[PATCH][tree-ssa-loop-ch] Add missing NULL test for dump_file

2020-10-06 Thread Tom de Vries
Hi,

If we change gimple_can_duplicate_bb_p to return false instead of true, we run
into a segfault in ch_base::copy_headers due to using dump_file while it's
NULL:
...
  if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
                                     true))
    {
      fprintf (dump_file, "Duplication failed.\n");
      continue;
    }
...

Fix this by adding the missing dump_file != NULL test.

Tested by rebuilding lto1 and rerunning the failing test-case.

[ Not committing this as obvious because I'm not sure about the
(dump_flags & TDF_DETAILS) bit. ]
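
For reference, the two guard idioms in question are (illustrative only, not
part of the patch):

  /* Print whenever any dump file was requested for the pass.  */
  if (dump_file)
    fprintf (dump_file, "Duplication failed.\n");

  /* Print only when -fdump-tree-...-details was requested.  */
  if (dump_file && (dump_flags & TDF_DETAILS))
    fprintf (dump_file, "Duplication failed.\n");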

OK for trunk?

Thanks,
- Tom

[tree-ssa-loop-ch] Add missing NULL test for dump_file

gcc/ChangeLog:

2020-10-07  Tom de Vries  

* tree-ssa-loop-ch.c (ch_base::copy_headers): Add missing NULL test
for dump_file.

---
 gcc/tree-ssa-loop-ch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b86acf7c39d..1f3d9321a55 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -425,7 +425,8 @@ ch_base::copy_headers (function *fun)
   if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
 true))
{
- fprintf (dump_file, "Duplication failed.\n");
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Duplication failed.\n");
  continue;
}
   copied.safe_push (std::make_pair (entry, loop));


Re: [PATCH] RISC-V: Derive ABI from -march if -mabi is not present.

2020-10-06 Thread Kito Cheng via Gcc-patches
Hi Maciej:

Thanks for sharing your experience with MIPS; that sounds like the
opposite derivation direction from this scheme.

>  The MIPS port used to have `-mcpu=' as well, which used to be roughly
> equivalent to modern `-mtune='; from your description I gather `-mcpu=' is
> going to be roughly equivalent to a combination of `-mtune=' and `-march='
> setting DFA scheduling for a specific CPU and the instruction set to the
> underlying architecture

Yes, -mcpu= = -mtune= + -march= is what I would like to do,
and that's what RISC-V clang/LLVM does currently.  RISC-V
clang/LLVM doesn't have an -mtune= option yet, but I plan to implement
one to bring the two compilers' interfaces closer together.

> (do we plan to allow vendor extensions?).

I guess supporting vendor extensions is another big issue for RISC-V GCC... :P

> In which case to compile a set of CPU-specific modules to be linked together
> (e.g. individual platform support in a generic piece of software like an
> OS kernel or a bare metal library) you'll always have to specify the ABI
> explicitly (though maybe you want anyway, hmm).
>
>  FWIW,
>
>   Maciej


Re: [PATCH] RISC-V: Derive ABI from -march if -mabi is not present.

2020-10-06 Thread Kito Cheng via Gcc-patches
Hi Andreas:

Thanks for your review; writing documentation is my weak point... :P

On Tue, Oct 6, 2020 at 3:34 PM Andreas Schwab  wrote:
>
> On Okt 06 2020, Kito Cheng wrote:
>
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index f623467b7637..c6ba738aa0b7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -25928,7 +25928,14 @@ allows floating-point values up to 32 bits long to 
> > be passed in registers; or
> >  @samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments 
> > will be
> >  passed in registers.
> >
> > -The default for this argument is system dependent, users who want a 
> > specific
> > +When @option{-mabi=} is not specified, the default value will derived from
> > +@option{-march=}, the rules is using @samp{d} ABI variant if D extension is
> > +enabled, otherwise using soft-float ABI variant even F extension is 
> > enabled,
> > +there is an special rule for @samp{rv32e} variant is it always use
> > +@samp{ilp32e}.
>
> When @option{-mabi=} is not specified, the default value will be derived
> from @option{-march=}.  If the D extension is enabled use the @samp{d}
> ABI variant, otherwise use the soft-float ABI variant even if the F
> extension is enabled.  For the @samp{rv32e} architecture the default is
> @samp{ilp32e}.
>
> > +
> > +If @option{-march} and @option{-mabi=} both are not specified, the default 
> > for
>
> If both ... are not specified
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: [committed, wwwdocs] gcc-11/changes: Add notes about column number changes

2020-10-06 Thread Lewis Hyatt via Gcc-patches
On Tue, Oct 6, 2020 at 11:09 AM David Malcolm  wrote:
>
> I've taken the liberty of pushing this website patch, having checked
> that it validates.
>
> It covers the changes by Lewis in 004bb936d6d5f177af26ad4905595e843d5665a5
> (PR 49973 and PR 86904).
>

Cool, thank you for mentioning it here! Sorry if that's something I
was supposed to include with the original patch...
BTW, one small note:

> +  Additionally, in previous releases of GCC, tab characters in the source
> +  would be emitted verbatim when quoting source code, but be prefixed
> +  with whitespace or line number information, leading to misalignments
> +  in the resulting output when compared with the actual source.  Tab
> +  characters are now printed as an appropriate number of spaces, using the
> +  <a href="https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#index-ftabstop">-ftabstop</a>
> +  option (which defaults to 8 spaces per tab stop).
> +
> +  
>  

In previous versions, a tab was output as a single space, not an
actual tab character, so it led to misalignment even without any
prefix, FWIW.

-Lewis


Re: [PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tobias Burnus

And as spotted by Thomas, Tom's patch also resolved an XFAIL in
gcc/testsuite.

Committed as r11-3687-ga9802204603616df14ed47d05f1b86f1bd08d8fb after
testing it on x86-64-gnu-linux.

Tobias

On 10/6/20 3:28 PM, Tom de Vries wrote:
...

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * gimplify.c (gimplify_bind_expr): Handle lookup in
  oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

commit a9802204603616df14ed47d05f1b86f1bd08d8fb
Author: Tobias Burnus 
Date:   Tue Oct 6 23:34:21 2020 +0200

c-c++-common/goacc/declare-pr90861.c: Remove xfail

gcc/testsuite/ChangeLog
PR middle-end/90861
* c-c++-common/goacc/declare-pr90861.c: Remove xfail.

diff --git a/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c b/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c
index 7c905624f7a..c5487bdc8ba 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c
@@ -17,5 +17,5 @@ void f2 (void)
   int A_f2[N_f2];
 #pragma acc declare copy(A_f2)
   /* { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(to:\(\*A_f2} 1 gimple } }
- { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(from:\(\*A_f2} 1 gimple { xfail *-*-* } } } TODO PR90861 */
+ { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(from:\(\*A_f2} 1 gimple } } */
 }


Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 3:27 PM, Martin Sebor wrote:

On 10/5/20 8:18 PM, Andrew MacLeod wrote:

On 10/5/20 4:16 PM, Martin Sebor wrote:
Long term, I would expect we might have some sort of general
access... probably through cfun, so any pass/routine would just ask for

     RANGE_INFO (cfun)->range_of_expr()

The default would be a general value_range implementation which
probably implements something like determine_value_range_1 (), and if
a pass wants to use a ranger, it could register one and delete it when
it's done.  And it would just work for everyone everywhere.


That would work too.

As a side note, I don't yet fully understand when it might be useful
to have different range_query instances.  We talked about value_query
and range_query but those are provided because some passes work with
just values and others with just ranges.  When might one want to have
an instance of some other type altogether and tie it to a function?



I don't know what you are asking. An instance of some other type 
altogether?


We have two instances of range_query already: m_range_analyzer and
ranger.  We have multiple instances of value_query as well, in that they
are used by CCP and the other converted passes, and they provide their
own version of value_of_expr to get what they want.  The hybrid ranger
utilizes two concurrent instances of different range_querys right now
and uses the combined results.


GCC currently has no concept of a "global" range analyzer.  It's all been
pass-specific, with EVRP and VRP being the two engines, and they both set
some minimal global ranges that can be picked up elsewhere.  EVRP was
enhanced to allow other passes to hook into it via a DOM walk (like DOM
and threading), but it was still pass-specific information.  We've added
the ranger with the longer-term goal of replacing both of the other engines.


The other passes have all been converted to this model fairly painlessly,
as most passes have some sort of local pass information awareness, so
creating and accessing a range query engine wasn't a big deal.


I don't know if or when we will "promote" a ranger to a globally
accessible thing; there hasn't been a need, but if more and more good
reasons appear, then it's something we will certainly look at.
Maintaining global ranges across passes is not easy to do, and not
something we are likely to do, so we'll need to respawn a new ranger for
any pass that needs it anyway.  So having a pass spawn its own doesn't
seem like a big stretch for now.





But we're not there yet, and we haven't worked out the details :-)  For
the moment, passes need to figure out themselves how to access the
ranger they create if they want one.  They had to manage a
range_analyzer before if they used anything other than global ranges,
so that is nothing new.


For EVRP, yes.  For everything else there's the global get_range_info.
It got ugly when a utility function like get_size_range was changed
to work with both.  I made that change and didn't like it then. I'm
trying to brainstorm ways to avoid spreading that wart too much
further.




It's not just EVRP, and I assure you, that's not the ugliest thing in GCC :-).

We can't do everything up front, or it would be another two years before
things happen.  Until then, we work around things and fix them up when
we can get to them.


Seems to me that either get_size_range takes a range_query object to get
a range, or there are two versions, or it is in the wrong place now
because it's a global function trying to access contextual information,
and it should become part of a class which has/is/uses a range_query
object.  Looking at it, it doesn't seem to be anything more than
range_of_expr, plus some odd ANTI_RANGE stuff which can be completely
avoided with a ranger and multi-range support.


In fact, I think

  if (range_of_expr (r, exp, stmt))

pretty much does it?  It handles all the integral values, constants and
symbolic lookups, and returns FALSE if it isn't integral.  You won't see
an anti-range if you use int_range<2> or higher, but you might see
multiple sub-ranges if it was an anti-range before, although you may only
care about the lower_bound() and upper_bound()... I don't know.
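
A rough sketch of that direction, assuming the range_query interface
(signatures abbreviated; this is not a committed implementation):

  /* Sketch only: query EXP's range at STMT and hand back its overall
     bounds.  Returns false for non-integral or undefined values.  */
  bool
  get_size_range_from_query (range_query &query, tree exp, gimple *stmt,
                             tree range[2])
  {
    if (!INTEGRAL_TYPE_P (TREE_TYPE (exp)))
      return false;

    int_range<2> r;        /* two pairs, so no ANTI_RANGE form  */
    if (!query.range_of_expr (r, exp, stmt) || r.undefined_p ())
      return false;

    tree type = TREE_TYPE (exp);
    range[0] = wide_int_to_tree (type, r.lower_bound ());
    range[1] = wide_int_to_tree (type, r.upper_bound ());
    return true;
  }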


Of course you still need to access a ranger to make that call :-)

Rangers are pretty lightweight for occasional queries... maybe the place
to start is simply spawning a new one for each request and seeing where
that gets you.  In fact, I think that's what Aldy used to do in
determine_value_range on the branch :-P  I think the bottom line is that
if a pass is going to move from purely global information to something
that is contextual, it has new constraints it didn't have before and may
need some additional reshuffling to look pretty.



Andrew

Re: [PATCH] make handling of zero-length arrays in C++ pretty printer more robust (PR 97201)

2020-10-06 Thread Jason Merrill via Gcc-patches

On 9/25/20 2:58 PM, Martin Sebor wrote:

The C and C++ representations of zero-length arrays are different:
C uses a null upper bound of the type's domain while C++ uses
SIZE_MAX.  This makes the middle end logic more complicated (and
prone to mistakes) because it has to be prepared for both.  A recent
change to -Warray-bounds has the middle end create a zero-length
array to print in a warning message.  I forgot about this gotcha
and, as a result, when the warning triggers under these conditions
in C++, it causes an ICE in the C++ pretty printer that in turn
isn't prepared for the C form of the domain.

In my mind, the "right fix" is to make the representation the same
between the front ends, but I'm certain that such a change would
cause more problems before it solved them. > Another solution might
be to provide APIs for creating (and querying) arrays and have them
call language hooks in cases where the representation might differ.
But that would likely be quite intrusive as well.  So with that in
mind, for the time being, the attached patch just continues to deal
with the difference by teaching the C++ pretty printer to also
recognize the C form of the zero-length domain.

While testing the one line fix I noticed that -Warray-bounds (and
therefore, I assume also all other warnings that detect out of bounds
accesses to allocated objects) triggers only for the ordinary form of
operator new and not for the nothrow overload, for instance.  That's
because the ordinary form is recognized as a built-in which has
the alloc_size attribute attached to it.  But because the other forms
are neither built-in nor declared in <new> with the same attribute,
the warning doesn't trigger.  So the patch also adds the attribute
to the declarations of these overloads in <new>.  In addition, it
adds attribute malloc to a couple of overloads of the operator that
it's missing from.
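
A sketch of what the <new> change amounts to for one overload (shown in
isolation and simplified; the actual header spells this via libstdc++
macros):

  // Give the nothrow overload the same allocation attributes the
  // built-in operator new already has, so -Warray-bounds and friends
  // can see the allocated size.
  void* operator new (std::size_t, const std::nothrow_t&) noexcept
    __attribute__ ((__alloc_size__ (1), __malloc__));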


OK, thanks.

Jason



Re: [PATCH] c++: typename in out-of-class member function definitions [PR97297]

2020-10-06 Thread Jason Merrill via Gcc-patches

On 10/5/20 7:07 PM, Marek Polacek wrote:

I was notified that our P0634R3 (Down with typename) implementation has
a flaw: when we have an out-of-class member function definition, we
still required 'typename' for its parameters.  For example here:

   template <typename T> struct S {
     int simple(T::type);
   };
   template <typename T>
   int S<T>::simple(/* typename */ T::type) { return 0; }

the 'typename' isn't necessary per [temp.res]/5.2.4.  We have a qualified
name here ("S<T>::simple"), so we know it's already been declared, and we
can look it up to see if it's a function template or a variable
template.

In this case, the P0634R3 code in cp_parser_direct_declarator wasn't
looking into uninstantiated templates and didn't find the member
function 'simple' -- cp_parser_lookup_name returned a SCOPE_REF which
means that the qualifying scope was dependent.  With this fix, we find
the BASELINK for 'simple', don't clear CP_PARSER_FLAGS_TYPENAME_OPTIONAL
from the flags, and the typename is implicitly assumed.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97297
* parser.c (cp_parser_direct_declarator): When checking if a
name is a function template declaration for the P0634R3 case,
look in uninstantiated templates too.

gcc/testsuite/ChangeLog:

PR c++/97297
* g++.dg/cpp2a/typename18.C: New test.
---
  gcc/cp/parser.c | 10 --
  gcc/testsuite/g++.dg/cpp2a/typename18.C | 21 +
  2 files changed, 29 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/typename18.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cb4422764ed..2002c05fdb5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21788,8 +21788,14 @@ cp_parser_direct_declarator (cp_parser* parser,
   templates, assume S<T>::p to name a type.  Otherwise,
   don't.  */
tree decl
- = cp_parser_lookup_name_simple (parser, unqualified_name,
- token->location);
+ = cp_parser_lookup_name (parser, unqualified_name,
+  none_type,
+  /*is_template=*/false,
+  /*is_namespace=*/false,
+  /*check_dependency=*/false,
+  /*ambiguous_decls=*/NULL,
+  token->location);
+
if (!is_overloaded_fn (decl)
/* Allow
   template
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename18.C 
b/gcc/testsuite/g++.dg/cpp2a/typename18.C
new file mode 100644
index 000..99468661491
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/typename18.C
@@ -0,0 +1,21 @@
+// PR c++/97297
+// { dg-do compile { target c++20 } }
+
+template <typename T>
+struct S {
+  int simple(T::type);
+
+  template <typename U>
+  int member(U::type);
+};
+
+template <typename T>
+int S<T>::simple(T::type) {
+  return 1;
+}
+
+template <typename T>
+template <typename U>
+int S<T>::member(U::type) {
+  return 2;
+}

base-commit: ea6da7f50fe2adc3a09fc10a3f437902c40ebff9





Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-10-06 Thread Jason Merrill via Gcc-patches

On 10/6/20 11:54 AM, Mark Wielaard wrote:

Hi,

On Fri, 2020-09-18 at 17:21 +0200, Mark Wielaard wrote:

On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote:

Ok, here it is in patch form.
I've briefly tested it, with the older binutils I have around (no --gdwarf-N
support), with latest gas (--gdwarf-N that can be passed to as even when
compiling C/C++ etc. code and emitting .debug_line) and latest gas with Mark's 
fix
reverted (--gdwarf-N support, but can only pass it to as when assembling
user .s/.S files, not when compiling C/C++ etc.).
Will bootstrap/regtest (with the older binutils) later tonight.

2020-09-15  Jakub Jelinek  

* configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
* gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
(ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
"--gdwarf2".  Use %{cond:opt1;:opt2} style.
(ASM_DEBUG_OPTION_DWARF_OPT): Define.
(ASM_DEBUG_OPTION_SPEC): Define.
(asm_debug_option): New variable.
(asm_options): Add "%(asm_debug_option)".
(static_specs): Add asm_debug_option entry.
(static_spec_functions): Add dwarf-version-gt.
(debug_level_greater_than_spec_func): New function.
* config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
* config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
* config.in: Regenerated.
* configure: Regenerated.


Once this is in we can more generally emit DW_FORM_line_str for
filepaths in CU DIEs for the name and comp_dir attribute. There
currently is a bit of a hack to do this in dwarf2out_early_finish, but
that only works when the assembler doesn't emit a DWARF5 .debug_line,
but gcc does it itself.

What do you think of the attached patch?

DWARF5 has a new string table specially for file paths. .debug_line
file and dir tables reference strings in .debug_line_str.  If a
.debug_line_str section is emitted then also place CU DIE file
names and comp dirs there.

gcc/ChangeLog:

* dwarf2out.c (add_filepath_AT_string): New function.
(asm_outputs_debug_line_str): Likewise.
(add_filename_attribute): Likewise.
(add_comp_dir_attribute): Call add_filepath_AT_string.
(gen_compile_unit_die): Call add_filename_attribute for name.
(init_sections_and_labels): Init debug_line_str_section when
asm_outputs_debug_line_str return true.
(dwarf2out_early_finish): Remove DW_AT_name and DW_AT_comp_dir
hack and call add_filename_attribute for the remap_debug_filename.


On top of that, we also need the following, which makes sure the actual
compilation directory is used in a DWARF5 .debug_line directory table
(and not just a relative path).


All three of these patches (Jakub's, and your two) look good to me, 
except that your add_filepath_AT_string patch is missing comments on 
some of the new functions.


Jason




[patch][DOC]PR97309--improve documentation of -fallow-store-data-races

2020-10-06 Thread Qing Zhao via Gcc-patches
Richard,

On behalf of John Henning, I am sending the following simple documentation
patch for official approval.  Hopefully this patch can get into GCC 11.

Per John, the wording of the documentation change was based on the discussion
 between him and you on 25-Aug-2020. 

I created a gcc bugzilla bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97309
to record this issue. 

Okay to commit?

thanks.

Qing

ChangeLog:


2020-10-06  John Henning  

* gcc/doc/invoke.texi: Improve documentation of -fallow-store-data-races

---
 gcc/doc/invoke.texi | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7c81d7f41bd..926ee1ff28e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11621,7 +11621,18 @@ Do not remove unused C++ allocations in dead code 
elimination.
 
 @item -fallow-store-data-races
 @opindex fallow-store-data-races
-Allow the compiler to introduce new data races on stores.
+Allow the compiler to perform optimizations that may introduce new data races
+on stores, without proving that the variable cannot be concurrently accessed
+by other threads.  Does not affect optimization of local data.  It is safe to
+use this option if it is known that global data will not be accessed by 
+multiple threads. 
+ 
+Examples of optimizations enabled by @option{-fallow-store-data-races} include
+hoisting or if-conversions that may cause a value that was already in memory 
+to be re-written with that same value.  Such re-writing is safe in a single 
+threaded context but may be unsafe in a multi-threaded context.  Note that on
+some processors, if-conversions may be required in order to enable 
+vectorization. 
 
 Enabled at level @option{-Ofast}.
 
-- 
2.11.0
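
To make the documented example concrete, here is a small illustration (not
part of the patch) of the kind of transformation the option permits:

/* With -fallow-store-data-races, the conditional store below may be
   if-converted into an unconditional store that rewrites *p with its
   old value when flag is false -- harmless single-threaded, but a new
   data race if another thread accesses *p concurrently.  */
void
update (int *p, int flag)
{
  if (flag)
    *p = 1;
  /* May effectively become:
       int tmp = *p;
       *p = flag ? 1 : tmp;   // unconditional store to *p  */
}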



Go patch committed: Avoid undefined behavior in Import::read

2020-10-06 Thread Ian Lance Taylor via Gcc-patches
This Go frontend patch by Nikhil Benesch avoids undefined behavior in
Import::read.  For some implementations of Stream, advancing the
stream will invalidate the previously-returned peek buffer.  Copy the
peek buffer before
advancing in Import::read to avoid this undefined behavior.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
1cf34967b5ae0439cd0b9955594aaabddc4c5f82
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 701b2d427e3..c5c02aa2392 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-d00febdab0535546ccbf1ef634be1f23b09c8b77
+613e530547549f4220c4571ea913acbe5fa56f72
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index aef1c47d26e..f40f13129e6 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -6212,7 +6212,7 @@ Function::import_func(Import* imp, std::string* pname,
  return false;
}
 
-  *body = imp->read(static_cast<size_t>(llen));
+  imp->read(static_cast<size_t>(llen), body);
 }
 
   return true;
diff --git a/gcc/go/gofrontend/import.cc b/gcc/go/gofrontend/import.cc
index 081afefa083..c6c1178bc24 100644
--- a/gcc/go/gofrontend/import.cc
+++ b/gcc/go/gofrontend/import.cc
@@ -1375,8 +1375,8 @@ Import::read_name()
 
 // Read LENGTH bytes from the stream.
 
-std::string
-Import::read(size_t length)
+void
+Import::read(size_t length, std::string* out)
 {
   const char* data;
   if (!this->stream_->peek(length, &data))
@@ -1385,10 +1385,11 @@ Import::read(size_t length)
go_error_at(this->location_, "import error at %d: expected %d bytes",
this->stream_->pos(), static_cast<int>(length));
   this->stream_->set_saw_error();
-  return "";
+  *out = std::string("");
+  return;
 }
+  *out = std::string(data, length);
   this->advance(length);
-  return std::string(data, length);
 }
 
 // Turn a string into a integer with appropriate error handling.
diff --git a/gcc/go/gofrontend/import.h b/gcc/go/gofrontend/import.h
index b12b3b843df..1d8aae493f2 100644
--- a/gcc/go/gofrontend/import.h
+++ b/gcc/go/gofrontend/import.h
@@ -240,10 +240,10 @@ class Import : public Import_expression
   get_char()
   { return this->stream_->get_char(); }
 
-  // Read LENGTH characters into a string and advance past them.  On
-  // EOF reports an error and returns an empty string.
-  std::string
-  read(size_t length);
+  // Read LENGTH characters into *OUT and advance past them.  On
+  // EOF reports an error and sets *OUT to an empty string.
+  void
+  read(size_t length, std::string* out);
 
   // Return true at the end of the stream.
   bool


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 11:58 AM, Andrew MacLeod wrote:

On 10/6/20 1:48 PM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 01:41:54PM -0400, Andrew MacLeod wrote:

On 10/6/20 1:32 PM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:

The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).
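
The idiom under discussion, in miniature (illustrative only, not GCC code):

  struct int_vec
  {
    unsigned length;
    int elems[1];     /* the "[1] hack": really N elements long  */
  };

  /* Over-allocate: room for the struct plus N-1 extra elements.  */
  struct int_vec *
  alloc_int_vec (unsigned n)
  {
    return (struct int_vec *) xmalloc (sizeof (struct int_vec)
                                       + (n - 1) * sizeof (int));
  }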

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub


Aldy is currently testing this.. I presume this is what you had in mind?

Yes.
I'm not actually sure if it is best to use struct irange (and why struct
irange rather than just irange?) for the r variable, whether it shouldn't
be unsigned char * or whatever is best suitable type for placement new.


bah, the struct is a carry over from when it was struct new_ir. doh.


as to the type for placement new, I dunno.


Placement new takes void* as an argument.  Since that's also what
obstack_alloc() returns that should obviate the need for a cast.

Martin






diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..c86301fe885 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,13 @@ irange_allocator::allocate (unsigned num_pairs)
    if (num_pairs < 2)
  num_pairs = 2;
-  struct newir {
-    irange range;
-    tree mem[1];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  // Allocate the irange and required memory for the vector.
+  struct irange *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));
+
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+  tree *mem = (tree *) obstack_alloc (&m_obstack, nbytes);
+
+  return new (r) irange (mem, num_pairs);
  }
  inline irange *


Jakub






Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-06 Thread Jonathan Wakely via Gcc-patches

On 06/10/20 15:55 -0400, Daniel Lemire via Libstdc++ wrote:

The updated patch looks good to me. It is indeed cleaner to have a separate
(static) function.

It might be nice to add a comment to explain the _S_nd function maybe with
a comment like "returns a random value in [0,__range)
without any bias" (or something to that effect).


Indeed. My current local branch has this comment on _S_nd:

+  // Lemire's nearly divisionless algorithm.
+  // Returns a random number from __g downscaled to [0,__range)
+  // using an unsigned type _Wp twice as wide as unsigned type _Up.

I think "Returns an unbiased random number from ..." would be an
improvement.


Otherwise, it is algorithmically correct.


Great, thanks for the review.

I'll get it committed tomorrow then. Thanks for the patch, and sorry
it took so long.



Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-06 Thread Daniel Lemire via Gcc-patches
The updated patch looks good to me. It is indeed cleaner to have a separate
(static) function.

It might be nice to add a comment to explain the _S_nd function maybe with
a comment like "returns a random value in [0,__range)
without any bias" (or something to that effect).

Otherwise, it is algorithmically correct.


On Mon, Oct 5, 2020 at 7:40 PM Jonathan Wakely  wrote:

> On 06/10/20 00:25 +0100, Jonathan Wakely wrote:
> >I'm sorry it's taken a year to review this properly. Comments below ...
> >
> >On 27/09/19 14:18 -0400, Daniel Lemire wrote:
> >>(This is a revised patch proposal. I am revising both the description
> >>and the code itself.)
> >>
> >>Even on recent processors, integer division is relatively expensive.
> >>The current implementation of  std::uniform_int_distribution typically
> >>requires two divisions by invocation:
> >>
> >>   // downscaling
> >>const __uctype __uerange = __urange + 1; // __urange can be zero
> >>const __uctype __scaling = __urngrange / __uerange;
> >>const __uctype __past = __uerange * __scaling;
> >>do
> >>  __ret = __uctype(__urng()) - __urngmin;
> >>while (__ret >= __past);
> >>__ret /= __scaling;
> >>
> >>We can achieve the same algorithmic result with at most one division,
> >>and typically no division at all without requiring more calls to the
> >>random number generator.
> >>This was recently added to Swift (
> https://github.com/apple/swift/pull/25286)
> >>
> >>The main challenge is that we need to be able to compute the "full"
> >>product. E.g., given two 64-bit integers, we want the 128-bit result;
> >>given two 32-bit integers we want the 64-bit result. This is fast on
> >>common processors.
> >>The 128-bit product is not natively supported in C/C++ but can be
> >>achieved with the
> >>__int128 extension when it is available. The patch checks for
> >>__int128 support; when
> >>support is lacking, we fallback on the existing approach which uses
> >>two divisions per
> >>call.
> >>
> >>For example, if we replace the above code by the following, we get a
> substantial
> >>performance boost on skylake microarchitectures. E.g., it can
> >>be twice as fast to shuffle arrays of 1 million elements (e.g., using
> >>the following benchmark:
> https://github.com/lemire/simple_cpp_shuffle_benchmark )
> >>
> >>
> >> unsigned __int128 __product = (unsigned
> >>__int128)(__uctype(__urng()) - __urngmin) * uint64_t(__uerange);
> >> uint64_t __lsb = uint64_t(__product);
> >> if(__lsb < __uerange)
> >> {
> >>   uint64_t __threshold = -uint64_t(__uerange) % uint64_t(__uerange);
> >>   while (__lsb < __threshold)
> >>   {
> >> __product = (unsigned __int128)(__uctype(__urng()) -
> >>__urngmin) * (unsigned __int128)(__uerange);
> >> __lsb = uint64_t(__product);
> >>   }
> >> }
> >> __ret = __product >> 64;
> >>
> >>Included is a patch that would bring better performance (e.g., 2x gain)
> to
> >>std::uniform_int_distribution  in some cases. Here are some actual
> numbers...
> >>
> >>With this patch:
> >>
> >>std::shuffle(testvalues, testvalues + size, g)  :  7952091
> >>ns total,  7.95 ns per input key
> >>
> >>Before this patch:
> >>
> >>std::shuffle(testvalues, testvalues + size, g)  :
> >>14954058 ns total,  14.95 ns per input key
> >>
> >>
> >>Compiler: GNU GCC 8.3 with -O3, hardware: Skylake (i7-6700).
> >>
> >>Furthermore, the new algorithm is unbiased, so the randomness of the
> >>result is not affected.
> >>
> >>I ran both performance and biases tests with the proposed patch.
> >>
> >>
> >>This patch proposal was improved following feedback by Jonathan
> >>Wakely. An earlier version used the __uint128_t type, which is widely
> >>supported but not used in the C++ library, instead we now use unsigned
> >>__int128. Furthermore, the previous patch was accidentally broken: it
> >>was not computing the full product since a rhs cast was missing. These
> >>issues are fixed and verified.
> >
> >After looking at GCC's internals, it looks like __uint128_t is
> >actually fine to use, even though we never currently use it in the
> >library. I didn't even know it was supported for C++ mode, sorry!
> >
> >>Reference: Fast Random Integer Generation in an Interval, ACM
> Transactions on
> >>Modeling and Computer Simulation 29 (1), 2019
> https://arxiv.org/abs/1805.10941
> >
> >>Index: libstdc++-v3/include/bits/uniform_int_dist.h
> >>===
> >>--- libstdc++-v3/include/bits/uniform_int_dist.h  (revision 276183)
> >>+++ libstdc++-v3/include/bits/uniform_int_dist.h  (working copy)
> >>@@ -33,7 +33,8 @@
> >>
> >>#include 
> >>#include 
> >>-
> >>+#include 
> >>+#include 
> >>namespace std _GLIBCXX_VISIBILITY(default)
> >>{
> >>_GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>@@ -239,18 +240,61 @@
> >>= __uctype(__param.b()) - __uctype(__param.a());
> >>
> >>  __uctype __ret;
> >>-
> >>-  

Re: [PATCH] c++: Distinguish btw. alignof and __alignof__ in cp_tree_equal [PR97273]

2020-10-06 Thread Jason Merrill via Gcc-patches

On 10/4/20 11:28 PM, Patrick Palka wrote:

cp_tree_equal currently considers alignof the same as __alignof__, but
these operators are semantically different ever since r8-7957.  In the
testcase below, this causes the second static_assert to fail on targets
where alignof(double) != __alignof__(double) because the specialization
cache (which uses cp_tree_equal as the equality predicate) conflates the
two dependent specializations integral_constant<__alignof__(T)> and
integral_constant<alignof(T)>.

This patch makes cp_tree_equal distinguish between these two operators
by inspecting the ALIGNOF_EXPR_STD_P flag.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also verified
that we now correctly compile the  PR97273 testcase, does this look OK
for trunk and the release branches?


OK.


gcc/cp/ChangeLog:

PR c++/88115
PR libstdc++/97273
* tree.c (cp_tree_equal) : Return false if
ALIGNOF_EXPR_STD_P differ.

gcc/testsuite/ChangeLog:

PR c++/88115
PR libstdc++/97273
* g++.dg/template/alignof3.C: New test.
---
  gcc/cp/tree.c|  2 ++
  gcc/testsuite/g++.dg/template/alignof3.C | 13 +
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/alignof3.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 8b7c6798ee9..a3fc5372cb7 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3800,6 +3800,8 @@ cp_tree_equal (tree t1, tree t2)
if (SIZEOF_EXPR_TYPE_P (t2))
  o2 = TREE_TYPE (o2);
  }
+   else if (ALIGNOF_EXPR_STD_P (t1) != ALIGNOF_EXPR_STD_P (t2))
+ return false;
  
  	if (TREE_CODE (o1) != TREE_CODE (o2))

  return false;
diff --git a/gcc/testsuite/g++.dg/template/alignof3.C 
b/gcc/testsuite/g++.dg/template/alignof3.C
new file mode 100644
index 000..e573727c5f2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/alignof3.C
@@ -0,0 +1,13 @@
+// PR c++/88115
+// { dg-do compile { target c++11 } }
+
+template<int __v>
+struct integral_constant {
+  static constexpr int value = __v;
+};
+
+template <typename T> using StdAlignOf = integral_constant<alignof(T)>;
+template <typename T> using GCCAlignOf = integral_constant<__alignof__(T)>;
+
+static_assert(StdAlignOf<double>::value == alignof(double), "");
+static_assert(GCCAlignOf<double>::value == __alignof__(double), "");





Re: [PATCH] c++: ICE in dependent_type_p with constrained auto [PR97052]

2020-10-06 Thread Jason Merrill via Gcc-patches

On 10/5/20 4:30 PM, Patrick Palka wrote:

On Wed, 30 Sep 2020, Jason Merrill wrote:


On 9/29/20 5:01 PM, Patrick Palka wrote:

This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
we get here during processing of the first 'auto' parameter of an
abbreviated function template.  In the testcase below, this leads to an
ICE when coerce_template_parms substitutes into C's dependent default
template argument.

Bootstrapped and regtested on x86_64-pc-linux-gnu and tested by building
cmcstl2 and range-v3.  Does this look OK for trunk?


This looks OK, but is there a place higher in the call stack where we should
have already set processing_template_decl?


The call stack at that point is:

   build_variable_check
   build_concept_check
   build_type_constraint
   finish_type_constraints
   cp_parser_placeholder_type_specifier
   cp_parser_simple_type_specifier
   ...

So it seems the most natural place to set processing_template_decl would
be in build_type_constraint, around the call to build_concept_check,
since that's where we create the WILDCARD_DECL that eventually reaches
coerce_template_parms.

And in order to additionally avoid a similar ICE when processing the
type constraint of a non-templated variable, we also need to guard the
call to build_concept check in make_constrained_placeholder_type.  The
testcase below now contains such an example.


Setting the flag in cp_parser_placeholder_type_specifier would cover 
both of those, right?



So something like this perhaps:

-- >8 --

Subject: [PATCH] c++: ICE in dependent_type_p with constrained auto [PR97052]

This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
get here during processing of the first 'auto' parameter of an
abbreviated function template, or if we're processing the type
constraint of a non-templated variable.  In the testcase below, this
leads to an ICE when coerce_template_parms instantiates C's dependent
default template argument.

gcc/cp/ChangeLog:

PR c++/97052
* constraint.cc (build_type_constraint): Temporarily increment
processing_template_decl before calling build_concept_check.
* pt.c (make_constrained_placeholder_type): Likewise.

gcc/testsuite/ChangeLog:

PR c++/97052
* g++.dg/cpp2a/concepts-defarg2: New test.
---
  gcc/cp/constraint.cc  |  2 ++
  gcc/cp/pt.c   |  2 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C | 13 +
  3 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d49957a6c4a..050b55ce092 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1427,7 +1427,9 @@ tree
  build_type_constraint (tree decl, tree args, tsubst_flags_t complain)
  {
tree wildcard = build_nt (WILDCARD_DECL);
+  ++processing_template_decl;
tree check = build_concept_check (decl, wildcard, args, complain);
+  --processing_template_decl;
if (check == error_mark_node)
  return error_mark_node;
return unpack_concept_check (check);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 72efecff37f..efdd017a4d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27914,7 +27914,9 @@ make_constrained_placeholder_type (tree type, tree con, 
tree args)
tree expr = tmpl;
if (TREE_CODE (con) == FUNCTION_DECL)
  expr = ovl_make (tmpl);
+  ++processing_template_decl;
expr = build_concept_check (expr, type, args, tf_warning_or_error);
+  --processing_template_decl;
  
PLACEHOLDER_TYPE_CONSTRAINTS (type) = expr;
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C

new file mode 100644
index 000..a63ca4e133d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C
@@ -0,0 +1,13 @@
+// PR c++/97052
+// { dg-do compile { target c++20 } }
+
+template
+concept C = true;
+
+constexpr bool f(C auto) {
+  return true;
+}
+
+static_assert(f(0));
+
+C auto x = 0;





Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/5/20 8:18 PM, Andrew MacLeod wrote:

On 10/5/20 4:16 PM, Martin Sebor wrote:

On 10/5/20 8:50 AM, Aldy Hernandez via Gcc-patches wrote:

[Martin, as the original author of this pass, do you have any concerns?]



@@ -1270,7 +1271,21 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

    enum value_range_kind range_type;

    if (integral)
-    range_type = determine_value_range (exp, , );
+    {
+  if (query)
+    {
+  value_range vr;
+  gcc_assert (TREE_CODE (exp) == SSA_NAME
+  || TREE_CODE (exp) == INTEGER_CST);
+  gcc_assert (query->range_of_expr (vr, exp, stmt));


Will the call to the function in the assert above not be eliminated
if the assert is turned into a no-op?  If it can't happen (it looks
like it shouldn't anymore), it would still be nice to break it out
of the macro.  Those of us used to the semantics of the C standard
assert macro might wonder.

gcc_assert contents will not be eliminated in a release compiler, only 
the actual check of the return value.    The body of the assert will 
continue to be executed.


This exists because if we were to try to check the return value, we'd 
have to do something like

   bool ret = range_of_expr (...);
   gcc_checking_assert (ret);

and when the checking assert goes away, we're left with an unused
variable 'ret' warning.  The gcc_assert () resolves that issue.


Right.  The unused warning for the variable would of course have to
be avoided.  There are several ways of doing that we're all familiar
with.  What I wasn't sure about and had to go out of my way to
convince myself of is that the gcc_assert() argument isn't eliminated
when checking is disabled.  (There's still a definition of gcc_assert
in gcc/system.h where it is eliminated, but that definition is never
used.)  I think the code should be changed so that others (or me if
I forget) don't wonder the same thing in the future.  Another
possibility is to add a comment reassuring the reader that
the argument is always evaluated.



I'm not a huge fan of them, but they do serve a purpose and seem better 
than the alternatives :-P


The first assert should however be a gcc_checking_assert since it's just
a check, and then that will go away in a release compiler.
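
In other words, something along these lines (sketch only, not the committed
hunk):

  /* The consistency check can compile away, but gcc_assert always
     evaluates its argument, so the range query itself is still executed
     in a release compiler.  */
  gcc_checking_assert (TREE_CODE (exp) == SSA_NAME
                       || TREE_CODE (exp) == INTEGER_CST);
  gcc_assert (query->range_of_expr (vr, exp, stmt));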



-/* Execute the pass for function FUN, walking in dominator order.  */
-
  unsigned
  pass_wrestrict::execute (function *fun)
  {
-  calculate_dominance_info (CDI_DOMINATORS);
-
-  wrestrict_dom_walker walker;
-  walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
+  gimple_ranger ranger;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    wrestrict_walk (&ranger, bb);

    return 0;
  }
@@ -159,11 +144,14 @@ public:
   only the destination reference is.  */
    bool strbounded_p;

-  builtin_memref (tree, tree);
+  builtin_memref (range_query *, gimple *, tree, tree);

    tree offset_out_of_bounds (int, offset_int[3]) const;

  private:
+  gimple *stmt;
+
+  range_query *query;


Also please add a comment above STMT to make it clear it's the call
statement to the builtin.

For QUERY, I'm not sure adding a member to every class that needs
to compute ranges is the right design.  At the same time, adding
an argument to every function that computes ranges isn't great
either.  It seems like there should be one shared ranger instance
that could be used by all clients/passes as well as utility
functions called from them.  It could be a global object set/pushed
by each pass when it starts and popped when it ends, and managed by
the ranger API.  Say a static member function:

  range_query* range_query::instance ();
  range_query* range_query::push_instance (range_query*);
  range_query* range_query::pop_instance ();
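
A pass might then use it roughly like this (purely hypothetical; none of
these entry points exists today):

  unsigned
  pass_wrestrict::execute (function *fun)
  {
    gimple_ranger ranger;
    range_query::push_instance (&ranger);
    /* ... walk the IL; utilities like get_size_range would call
       range_query::instance ()->range_of_expr (...) internally ...  */
    range_query::pop_instance ();
    return 0;
  }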

As some background, when I wrote the builtin_access class
I envisioned using it as a general-purpose class in other similar
contexts.  That hasn't quite happened yet but there is a class
kind of like it that might eventually end up taking the place of
builtin_access.  It's access_ref in builtins.h.  And while neither
class creates a lot of instances so far, I'm about to post a patch
that does create one or two instances of access_ref per SSA_NAME
of pointer type.  Having an extra member in each instance just
to gain access to an API would be excessive.

I'm not saying all this as an objection to the change but more
as something to think about going forward.


Long term, I would expect we might have some sort of general access...
probably through cfun, so any pass/routine would just ask for

     RANGE_INFO (cfun)->range_of_expr()

The default would be a general value_range implementation which probably
implements something like determine_value_range_1 (), and if a pass
wants to use a ranger, it could register one and delete it when it's
done.  And it would just work for everyone everywhere.


That would work too.

As a side note, I don't yet fully understand when it might be useful
to have different range_query instances.  We talked about value_query
and 

Re: [PATCH] RFC: add "deallocated_by" attribute for use by analyzer

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/5/20 5:12 PM, David Malcolm via Gcc-patches wrote:

This work-in-progress patch generalizes the malloc/free problem-checking
in -fanalyzer so that it can work on arbitrary acquire/release API pairs.

It adds a new __attribute__((deallocated_by(FOO))) that could be used
like this in a library header:

   struct foo;

   extern void foo_release (struct foo *);

   extern struct foo *foo_acquire (void)
 __attribute__ ((deallocated_by(foo_release)));

In theory, the analyzer then "knows" these functions are an
acquire/release pair, and can emit diagnostics for leaks, double-frees,
use-after-frees, mismatching deallocations, etc.

My hope was that this would provide a minimal level of markup that would
support library-checking without requiring lots of further markup.
I attempted to use this to detect a memory leak within a Linux
driver (CVE-2019-19078), by adding the attribute to mark these fns:
   extern struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags);
   extern void usb_free_urb(struct urb *urb);
where there is a leak of a "urb" on an error-handling path.
Unfortunately I ran into the problem that there are various other fns
that take "struct urb *" and the analyzer conservatively assumes that a
urb passed to them might or might not be freed and thus stops tracking
state for them.

So I don't know how much use this feature would be as-is.
(without either requiring lots of additional attributes for marking
fndecl args as being merely borrowed, or simply assuming that they
are borrowed in the absence of a function body to analyze)

Thoughts?


An attribute like this is close to what PR 94527 asks for, except
it goes beyond pointers and supports allocating resources of any
type.  The latter part seems a bit worrisome to me for some of
the same reason as Linus mentions in comment #8 on the request
(among others).

I've got a prototype solution for the pointer part of the request,
sans the attribute, but should be able to use it with just a few
lines of code.

As for the design of the attribute, my initial thought was to
implement it as an extension to attribute malloc, since I'd always
expect a deallocator to be paired with one (though not the other
way around, as in the case of alloca).

I also wanted the attribute to be suitable for standard allocation
functions, including malloc/realloc, as well as I/O functions like
fopen and freopen.

I wasn't thinking of imposing any particular order on the arguments
to the deallocator (it would rule out supporting freopen).

So with that, I was going to add this syntax to attribute malloc:

  malloc [(deallocator, argno)]

and allow multiple malloc attributes with different deallocators on
the same allocator function, like this:

  __attribute__ ((malloc (free, 1),
                  malloc (realloc, 1)))
  void* malloc (size_t);
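
Similarly, the stdio functions mentioned above might then be annotated like
this (hypothetical, following the same proposed syntax):

  __attribute__ ((malloc (fclose, 1),
                  malloc (freopen, 3)))
  FILE* fopen (const char*, const char*);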

None of this exists even as a prototype so there could be flaws in
the idea I'm not thinking of.  From what I can tell, deallocated_by
could be used this way as well although I don't see it discussed in
the documentation.

I don't view adding deallocated_by on top of malloc as a problem
per se, but it does raise the question of what the attribute alone
implies for points-to analysis.  For example:

  __attribute__ ((deallocated_by (my_dealloc)))
  void* my_alloc (int);

  int f (char *p)
  {
    char *q = my_alloc (1);
    int c = p < q;   // is this considered valid?
    my_dealloc (p);
    my_dealloc (q);  // if yes, is this valid?
    return c;
  }

With attribute malloc, my prototype diagnoses the inequality because
the two pointers point to distinct objects.  If the inequality in
the above were to be considered valid with deallocated_by (and without
malloc) it would imply that p and q could point to the same object,
and, in fact, be equal, making the second call to my_dealloc() invalid.

The attribute could be specified in a way to give sensible asnwers
to these questions, but I'm not sure they can be answered from
the documentation diff.  With attribute malloc, on the other hand,
the answers are well established.  That doesn't preclude adding
deallocated_by, but I'd think it should be a superset of malloc.
For instance, it could be required to go along with it when
specified for pointers.  As I mentioned, I have misgivings about
allowing it on other types but I haven't thought about those use
cases long or hard enough to either dispel or validate them.

After thinking about this over lunch I wonder if it might be better
to introduce two attributes: one for memory allocation and pointers
and another for non-pointer types.

...

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index c779d13f023..ce2a958a1bc 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -127,6 +127,7 @@ static tree handle_cleanup_attribute (tree *, tree, tree, 
int, bool *);
  static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
   

Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-06 Thread Daniel Lemire via Gcc-patches
>
> >The condition above checks that __urngrange == 2**64-1 which means
> >that U::max() - U::min() is the maximum 64-bit value, which means
> >means U::max()==2**64-1 and U::min()==0. So if U::min() is 0 we don't
> >need to subtract it.
>

That sounds correct.


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 2:41 PM, Andreas Schwab wrote:

On Okt 06 2020, Andrew MacLeod via Gcc-patches wrote:


diff --git a/gcc/value-range.h b/gcc/value-range.h
index 7031a823138..02a952bf81f 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,12 @@ irange_allocator::allocate (unsigned num_pairs)
if (num_pairs < 2)
  num_pairs = 2;
  
-  struct newir {
-    irange range;
-    tree mem[2];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+
+  // Allocate the irnge and  required memory for the vector

Typo: irange

Andreas.

Ha.  It's all good now.  THIS is actually the final final FINAL patch,
which is going through testing.



diff --git a/gcc/value-range.h b/gcc/value-range.h
index 7031a823138..63c96204cda 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,12 @@ irange_allocator::allocate (unsigned num_pairs)
   if (num_pairs < 2)
 num_pairs = 2;
 
-  struct newir {
-    irange range;
-    tree mem[2];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+
+  // Allocate the irange and required memory for the vector.
+  void *r = obstack_alloc (&m_obstack, sizeof (irange));
+  tree *mem = (tree *) obstack_alloc (&m_obstack, nbytes);
+  return new (r) irange (mem, num_pairs);
 }
 
 inline irange *


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andreas Schwab
On Okt 06 2020, Andrew MacLeod via Gcc-patches wrote:

> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 7031a823138..02a952bf81f 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -668,13 +668,12 @@ irange_allocator::allocate (unsigned num_pairs)
>if (num_pairs < 2)
>  num_pairs = 2;
>  
> -  struct newir {
> -    irange range;
> -    tree mem[2];
> -  };
> -  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
> -  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
> -  return new (r) irange (r->mem, num_pairs);
> +  size_t nbytes = sizeof (tree) * 2 * num_pairs;
> +
> +  // Allocate the irnge and  required memory for the vector

Typo: irange

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 2:18 PM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 02:09:42PM -0400, Andrew MacLeod wrote:

+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+
+  // Allocate the irnge and  required memory for the vector
+  void *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));

Then either
   void *r = (void *) obstack_alloc (&m_obstack, sizeof (irange));
or even better
   void *r = obstack_alloc (&m_obstack, sizeof (irange));



I'd already noticed that and went with the latter.




Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Tobias Burnus

The following cast looks odd:

On 10/6/20 8:09 PM, Andrew MacLeod via Gcc-patches wrote:


+  // Allocate the irnge and  required memory for the vector
+  void *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));

Tobias


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 02:09:42PM -0400, Andrew MacLeod wrote:
> +  size_t nbytes = sizeof (tree) * 2 * num_pairs;
> +
> +  // Allocate the irnge and  required memory for the vector
> +  void *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));

Then either
  void *r = (void *) obstack_alloc (&m_obstack, sizeof (irange));
or even better
  void *r = obstack_alloc (&m_obstack, sizeof (irange));

> +  tree *mem = (tree *) obstack_alloc (&m_obstack, nbytes);
> +  return new (r) irange (mem, num_pairs);
>  }
>  
>  inline irange *


Jakub



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 1:58 PM, Andrew MacLeod via Gcc-patches wrote:

On 10/6/20 1:48 PM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 01:41:54PM -0400, Andrew MacLeod wrote:

On 10/6/20 1:32 PM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:

The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub

Aldy is currently testing this.. I presume this is what you had in 
mind?

Yes.
I'm not actually sure if it is best to use struct irange (and why struct
irange rather than just irange?) for the r variable, whether it shouldn't
be unsigned char * or whatever is best suitable type for placement new.


bah, the struct is a carry over from when it was struct new_ir. doh.


as to the type for placement new, I dunno.

The authorities inform me that void * is preferred.. so.. final spin, in 
testing now:




diff --git a/gcc/value-range.h b/gcc/value-range.h
index 7031a823138..02a952bf81f 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,12 @@ irange_allocator::allocate (unsigned num_pairs)
   if (num_pairs < 2)
 num_pairs = 2;
 
-  struct newir {
-    irange range;
-    tree mem[2];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+
+  // Allocate the irnge and  required memory for the vector
+  void *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));
+  tree *mem = (tree *) obstack_alloc (&m_obstack, nbytes);
+  return new (r) irange (mem, num_pairs);
 }
 
 inline irange *


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 1:48 PM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 01:41:54PM -0400, Andrew MacLeod wrote:

On 10/6/20 1:32 PM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:

The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub


Aldy is currently testing this.. I presume this is what you had in mind?

Yes.
I'm not actually sure if it is best to use struct irange (and why struct
irange rather than just irange?) for the r variable, whether it shouldn't
be unsigned char * or whatever is best suitable type for placement new.


bah, the struct is a carry over from when it was struct new_ir. doh.


as to the type for placement new, I dunno.




diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..c86301fe885 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,13 @@ irange_allocator::allocate (unsigned num_pairs)
if (num_pairs < 2)
  num_pairs = 2;
  
-  struct newir {

-irange range;
-tree mem[1];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  // Allocate the irange and required memory for the vector.
+  struct irange *r = (irange *) obstack_alloc (_obstack, sizeof (irange));
+
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+  tree *mem = (tree *) obstack_alloc (_obstack, nbytes);
+
+  return new (r) irange (mem, num_pairs);
  }
  
  inline irange *


Jakub




Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 01:41:54PM -0400, Andrew MacLeod wrote:
> On 10/6/20 1:32 PM, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:
> > > The manual documents the [0] extension and mentions but discourages
> > > using [1].  Nothing is said about other sizes and the warnings such
> > > as -Warray-bounds have been increasingly complaining about accesses
> > > past the declared constant bound (it won't complain about past-
> > > the-end accesses to a mem[1], but will about those to mem[2]).
> > > 
> > > It would be nice if existing GCC code could eventually be converted
> > > to avoid relying on the [1] hack.  I would hope we would avoid making
> > > use of it in new code (and certainly avoid extending its uses to other
> > > sizes).
> > I don't see how we could, because [0] is an extension and GCC needs to
> > support host compilers that don't support it, and similarly [] is an
> > extension in C++ and can't be relied on.
> > Changing say rtl or gimple structs from the way we define them currently
> > to say templates dependent on the size of the embedded arrays is I'm
> > afraid not really possible.
> > 
> > Jakub
> > 
> Aldy is currently testing this.. I presume this is what you had in mind?

Yes.
I'm not actually sure if it is best to use struct irange (and why struct
irange rather than just irange?) for the r variable, whether it shouldn't
be unsigned char * or whatever is best suitable type for placement new.

> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 94b48e55e77..c86301fe885 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -668,13 +668,13 @@ irange_allocator::allocate (unsigned num_pairs)
>if (num_pairs < 2)
>  num_pairs = 2;
>  
> -  struct newir {
> -irange range;
> -tree mem[1];
> -  };
> -  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
> -  struct newir *r = (newir *) obstack_alloc (_obstack, nbytes);
> -  return new (r) irange (r->mem, num_pairs);
> +  // Allocate the irange and required memory for the vector.
> +  struct irange *r = (irange *) obstack_alloc (_obstack, sizeof (irange));
> +
> +  size_t nbytes = sizeof (tree) * 2 * num_pairs;
> +  tree *mem = (tree *) obstack_alloc (_obstack, nbytes);
> +
> +  return new (r) irange (mem, num_pairs);
>  }
>  
>  inline irange *


Jakub



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 1:32 PM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:

The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub


Aldy is currently testing this.. I presume this is what you had in mind?

Andrew
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..c86301fe885 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -668,13 +668,13 @@ irange_allocator::allocate (unsigned num_pairs)
   if (num_pairs < 2)
 num_pairs = 2;
 
-  struct newir {
-irange range;
-tree mem[1];
-  };
-  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
-  struct newir *r = (newir *) obstack_alloc (_obstack, nbytes);
-  return new (r) irange (r->mem, num_pairs);
+  // Allocate the irange and required memory for the vector.
+  struct irange *r = (irange *) obstack_alloc (_obstack, sizeof (irange));
+
+  size_t nbytes = sizeof (tree) * 2 * num_pairs;
+  tree *mem = (tree *) obstack_alloc (_obstack, nbytes);
+
+  return new (r) irange (mem, num_pairs);
 }
 
 inline irange *


Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Ian Lance Taylor via Gcc-patches
On Tue, Oct 6, 2020 at 3:20 AM Martin Liška  wrote:
>
> On 10/6/20 10:00 AM, Richard Biener wrote:
> > On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:
> >>
> >> On 10/5/20 6:34 PM, Ian Lance Taylor wrote:
> >>> On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
> 
>  The previous patch was not correct. This one should be.
> 
>  Ready for master?
> >>>
> >>> I don't understand why this code uses symtab_indices_shndx at all.
> >>> There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
> >>> any need for the symtab_indices_shndx vector.
> >>
> >> Well, the question is if we can have multiple .symtab sections in one ELF
> >> file? Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX 
> >> sections.
> >> Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
> >> by PR81968 which is about Solaris ld.
> >
> > It wasn't my code but I suppose this way the implementation was
> > "easiest".  There
> > should be exactly one symtab / shndx section.  Rainer authored this support.
>
> If we expect at most one SHT_SYMTAB_SHNDX section, then I'm suggesting
> an updated version of the patch. It's what Ian offered.

This is OK with me with one minor change.

> + return "Multiple SYMTAB SECTION INDICES sections";

I think simply "More than one SHT_SYMTAB_SHNDX section".  SYMTAB
SECTION INDICES doesn't mean anything to me, and at least people can
do a web search for SHT_SYMTAB_SHNDX.

Thanks.

Ian
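
For illustration, a standalone, hedged sketch of the approach being agreed on here (hypothetical types and function name, not the actual simple-object code): remember the single SHT_SYMTAB_SHNDX section and report an error if a second one shows up.

/* Hypothetical sketch, not the real simple-object-elf.c code.  */
#include <stddef.h>

#define SHT_SYMTAB_SHNDX 18	/* ELF section type for extended indices.  */

struct sect { unsigned int sh_type; };

static const char *
find_symtab_shndx (const struct sect *secs, int nsecs, int *idx)
{
  *idx = -1;
  for (int i = 0; i < nsecs; i++)
    if (secs[i].sh_type == SHT_SYMTAB_SHNDX)
      {
	if (*idx != -1)
	  return "More than one SHT_SYMTAB_SHNDX section";
	*idx = i;
      }
  return NULL;	/* Zero or one section found: nothing to complain about.  */
}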


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:
> The manual documents the [0] extension and mentions but discourages
> using [1].  Nothing is said about other sizes and the warnings such
> as -Warray-bounds have been increasingly complaining about accesses
> past the declared constant bound (it won't complain about past-
> the-end accesses to a mem[1], but will about those to mem[2]).
> 
> It would be nice if existing GCC code could eventually be converted
> to avoid relying on the [1] hack.  I would hope we would avoid making
> use of it in new code (and certainly avoid extending its uses to other
> sizes).

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub
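
To make the trade-off concrete, a small hedged illustration (hypothetical types, not GCC code) of the trailing-array idiom next to an explicitly sized allocation with placement new, which is the shape of the suggestion above and needs no trailing-array extension at all:

#include <stdlib.h>
#include <stddef.h>
#include <new>

struct header { int count; };

/* Idiom 1: trailing array of size 1, with the allocation padded by hand.
   This is the [1] hack the warnings increasingly dislike.  */
struct with_trailing
{
  struct header h;
  int mem[1];
};

struct with_trailing *
alloc_trailing (int count)	/* assumes count >= 1 */
{
  size_t nbytes = sizeof (struct with_trailing) + sizeof (int) * (count - 1);
  return static_cast<struct with_trailing *> (malloc (nbytes));
}

/* Idiom 2: size the block explicitly and construct the header with
   placement new; the payload simply follows the header.  */
struct header *
alloc_explicit (int count, int **mem)
{
  void *p = malloc (sizeof (struct header) + sizeof (int) * count);
  struct header *h = new (p) header ();
  *mem = reinterpret_cast<int *> (h + 1);
  return h;
}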



Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 1:10 PM, Tobias Burnus wrote:

On 10/6/20 6:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope


And now builds – as the "Hybrid EVRP and testcases" was pushed as well,
a bit more than a quarter of an hour later. (At least it finished
building the compiler itself, I do not expect surprises in the library
parts.)

Tobias
Guess I should have just pushed it all as one commit. I thought the 
first part was pretty separate from the second... and it was, except for 
one line :-P  Of course, I had problems getting the second one out, or it 
would have followed quicker.


Sorry for the noise.

Andrew



Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Tobias Burnus

On 10/6/20 6:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope


And now builds – as the "Hybrid EVRP and testcases" was pushed as well,
a bit more than a quarter of an hour later. (At least it finished
building the compiler itself, I do not expect surprises in the library
parts.)

Tobias


On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.  They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

* Makefile.in (OBJS): Add gimple-range*.o.
* gimple-range.h: New file.
* gimple-range.cc: New file.
* gimple-range-cache.h: New file.
* gimple-range-cache.cc: New file.
* gimple-range-edge.h: New file.
* gimple-range-edge.cc: New file.
* gimple-range-gori.h: New file.
* gimple-range-gori.cc: New file.




Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 12:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope

Tobias

On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.    They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :    Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

    * Makefile.in (OBJS): Add gimple-range*.o.
    * gimple-range.h: New file.
    * gimple-range.cc: New file.
    * gimple-range-cache.h: New file.
    * gimple-range-cache.cc: New file.
    * gimple-range-edge.h: New file.
    * gimple-range-edge.cc: New file.
    * gimple-range-gori.h: New file.
    * gimple-range-gori.cc: New file.





Dang.  The latest checkin fixes that.

D'oh.






[PUSHED] Hybrid EVRP and testcases

2020-10-06 Thread Andrew MacLeod via Gcc-patches

I have now checked in the hybrid EVRP pass.

We have resolved all the issues we are aware of with a full Fedora build, 
but if any more issues arise, please let us know. (And I'm sure you will :-)


I made some minor tweaks.  The options to the new -fevrp-mode flag are now:

legacy        : classic EVRP mode
ranger        : Ranger-only mode
*legacy-first : query ranges with EVRP first, and if that fails try the ranger*
ranger-first  : query the ranger first, then EVRP
ranger-trace  : Ranger-only mode, plus show range tracing info in the dump
ranger-debug  : Ranger-only mode, and also include all cache debugging info
trace         : Hybrid mode with range tracing info
debug         : Hybrid mode with cache debugging as well as tracing

The default is still *legacy-first*.


If there is a compilation problem and the problem goes away with
-fevrp-mode=legacy, the ranger is to blame; please let us know asap.


Attached is the patch which was applied.

---

These are the initial test case differences, and the detailed analysis 
is below:


1)  We now XPASS analyzer/pr94851-1.c.
2) -fdisable-tree-evrp added to gcc.dg/pr81192.c
3) -fdisable-tree-evrp added to gcc.dg/tree-ssa/pr77445-2.c
4) -fdisable-tree-evrp added to tree-ssa/ssa-dom-thread-6.c
5) -fdisable-tree-evrp added to tree-ssa/ssa-dom-thread-7.c


Mostly it is one of two things:

1) We propagate a constant into a PHI where that wasn't happening before; 
EVRP didn't handle anything other than single-entry blocks well.
2) Switches are processed in a lot more detail, which again propagates a 
lot of values into PHIs, and that then triggers more threading.


Last-minute update... it also turns out the analyzer XPASS may be noise. 
We did change the IL, but it sounds like there are other analyzer things 
at play at the moment :-)



1)  We now XPASS analyzer/pr94851-1.c.


It was xfailing with:
In function ‘pamark’:
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:43:13: 
warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:29:6: note: 
(1) following ‘false’ branch (when ‘p’ is NULL)...
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:23: note: 
(2) ...to here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:23: note: 
(3) allocated here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:8: note: 
(4) assuming ‘p’ is non-NULL
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:8: note: 
(5) following ‘false’ branch (when ‘p’ is non-NULL)...
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:35:15: note: 
(6) ...to here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:43:13: note: 
(7) ‘p’ leaks here; was allocated at (3)


now we produce:
XPASS: gcc.dg/analyzer/pr94851-1.c bogus leak (test for bogus messages, 
line 43)



The reason is in the IL:
  :
  # p_9 = PHI 
  # last_11 = PHI 
  if (p_9 != 0B)
    goto ; [INV]
  else
    goto ; [INV]  --> This outgoing edge

   :
  _3 = p_9->m_name;
  _4 = (char) _32;
  if (_3 != _4)
    goto ; [INV]
  else
    goto ; [INV]

   :
  # p_2 = PHI  <<<   This PHI node
  # last_17 = PHI 
  if (p_2 != 0B)
    goto ; [INV]
  else
    goto ; [INV]

   :
  printf ("over writing mark %c\n", _32);
  goto ; [INV]


The ranger propagates the p_9 == 0 from the else branch into the PHI 
argument on edge 4->6

  :
  # p_2 = PHI <0B(4), p_9(5)>

which lets the threaders bypass the print in bb7 on one path, and 
that seems to resolve the current issue.


The IL produced by the time we get to .optimized is identical; we just 
clean it up early enough for the analyzer to use now.


---

2) -fdisable-tree-evrp added to gcc.dg/pr81192.c to enable the test to pass

The new version of EVRP sees:
 :
  if (j_8(D) != 2147483647)
    goto ; [50.00%]
  else
    goto ; [50.00%]
 :
  iftmp.2_11 = j_8(D) + 1;
 :
  # iftmp.2_12 = PHI 

EVRP now recognizes a constant can be propagated into the 3->5 edge and
produces
  # iftmp.2_12 = PHI <2147483647(3), iftmp.2_11(4)>
which causes the situation being tested to disappear before we get to 
PRE.


---

3) -fdisable-tree-evrp added to gcc.dg/tree-ssa/pr77445-2.c

Aldy investigated this, and basically we are threading 6 more paths on 
x86_64, which changes the IL in visible ways.

Disabling evrp allows the threaders to test what they are looking for.

-

4) and 5)

In the same vein, we are threading new opportunities in PHIs... these 

[PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmin/vmax instructions for auto-vectorization.
MVE target is included in the expanders smin<mode>3, umin<mode>3, smax<mode>3 
and umax<mode>3 for vectorization.
Related insns for vmin/vmax in mve.md are modified to use smin, umin, 
smax and umax expressions instead of unspec to support the expanders.

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmaxq_): Replace with ...
(mve_vmaxq_s, mve_vmaxq_u): ... these new insns to
use smax/umax instead of VMAXQ.
(mve_vminq_): Replace with ...
(mve_vminq_s, mve_vminq_u): ... these new insns to
use smin/umin instead of VMINQ.
(mve_vmaxnmq_f): Use smax instead of VMAXNMQ_F.
(mve_vminnmq_f): Use smin instead of VMINNMQ_F.
* config/arm/vec-common.md (smin3): Use the new mode macros
ARM_HAVE__ARITH.
(umin3, smax3, umax3): Likewise.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vminmax_1.c: New test.
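
For reference, this is the kind of loop the new expanders are meant to pick up — a hedged sketch in the style of the vmul test later in this series (the actual mve-vminmax_1.c is not reproduced here, and the dg- directives and scan-assembler checks are omitted):

#include <stdint.h>

/* With -O3 and MVE enabled, this should now vectorize to vmin.s32.  */
void test_vmin_i32 (int32_t *dest, int32_t *a, int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] < b[i] ? a[i] : b[i];
}

/* And the unsigned counterpart should map to vmax.u32.  */
void test_vmax_u32 (uint32_t *dest, uint32_t *a, uint32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] > b[i] ? a[i] : b[i];
}
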
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..0d9f932e983 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1977,15 +1977,25 @@
 ;;
 ;; [vmaxq_u, vmaxq_s])
 ;;
-(define_insn "mve_vmaxq_"
+(define_insn "mve_vmaxq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMAXQ))
+	(smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmax.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vmaxq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
+  "vmax.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2037,15 +2047,25 @@
 ;;
 ;; [vminq_s, vminq_u])
 ;;
-(define_insn "mve_vminq_"
+(define_insn "mve_vminq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMINQ))
+	(smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
+  "vmin.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vminq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmin.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3030,9 +3050,8 @@
 (define_insn "mve_vmaxnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMAXNMQ_F))
+	(smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmaxnm.f%#	%q0, %q1, %q2"
@@ -3090,9 +3109,8 @@
 (define_insn "mve_vminnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMINNMQ_F))
+	(smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vminnm.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..6a330cc82f6 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -114,39 +114,29 @@
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smin:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "umin3"
   [(set (match_operand:VINTW 0 "s_register_operand")
 	(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
 		(match_operand:VINTW 2 "s_register_operand")))]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "smax3"
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smax:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  

error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Tobias Burnus

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope

Tobias

On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.  They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

* Makefile.in (OBJS): Add gimple-range*.o.
* gimple-range.h: New file.
* gimple-range.cc: New file.
* gimple-range-cache.h: New file.
* gimple-range-cache.cc: New file.
* gimple-range-edge.h: New file.
* gimple-range-edge.cc: New file.
* gimple-range-gori.h: New file.
* gimple-range-gori.cc: New file.




[PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmul instructions for auto-vectorization.
It includes MVE in expander mul<mode>3 to enable vectorization for MVE 
and modifies related vmul insns to support the expander by using 'mult' 
instead of unspec.
The mul<mode>3 for vectorization in vec-common.md uses mode iterator 
VDQWH instead of VALLW to cover all supported modes.
The macros ARM_HAVE_<MODE>_ARITH are used to select supported modes for 
different targets. The redundant mul<mode>3 in neon.md is removed.

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmulq): New entry for vmul instruction
using expression 'mult'.
(mve_vmulq_f): Use mult instead of VMULQ_F.
* config/arm/neon.md (mul3): Removed.
* config/arm/vec-common.md (mul3): Use the new mode macros
ARM_HAVE__ARITH. Use mode iterator VDQWH instead of VALLW.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vmul_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
 (const_string "neon_mul_")))]
 )
 
-(define_insn "mul3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (mult:VH
-(match_operand:VH 1 "s_register_operand" "w")
-(match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%0, %1, %2"
- [(set_attr "type" "neon_mul_")]
-)
-
 (define_insn "neon_vmulf"
  [(set
(match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-(mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		(match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE__ARITH"
+)
 
 (define_expand "smin3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = 

[PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches
I have checked in the ranger classes/files.    They are being built but 
not being invoked until the other passes are checked in.


there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :    Outgoing edge range calculations, 
particularly switch edge ranges.
gimple-range-gori.{h,cc} :     "Generate Outgoing Range Info" module 
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} :         gimple_ranger which pulls together the 
other components and provides on-demand ranges.


and the Makefile.

the patches are the same as in the previous post last week.  New 
streamlined ChangeLog :-)


I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

    * Makefile.in (OBJS): Add gimple-range*.o.
    * gimple-range.h: New file.
    * gimple-range.cc: New file.
    * gimple-range-cache.h: New file.
    * gimple-range-cache.cc: New file.
    * gimple-range-edge.h: New file.
    * gimple-range-edge.cc: New file.
    * gimple-range-gori.h: New file.
    * gimple-range-gori.cc: New file.



Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub3): Removed here. Integrated in the
>   sub3 in vec-common.md
>   * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> 

This patch is updated based on Richard Sandiford's patch adding new 
vector mode macros: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another 
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec 
expression to make the instructions recognizable for auto-vectorization.
The sub<mode>3 in mve.md is modified to use new mode macros which make 
the expander available when certain modes are supported. Then various 
targets can share this expander for vectorization. The redundant 
sub<mode>3 insns in neon.md are then removed.

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub3_neon): Use the new mode macros
ARM_HAVE__ARITH.
(sub3, sub3_fp16): Removed.
(neon_vsub): Use gen_sub3 instead of gen_sub3_fp16.
* config/arm/vec-common.md (sub3): Use the new mode macros
ARM_HAVE__ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 

Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 1:52 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 09:37:21AM +0200, Aldy Hernandez via Gcc-patches wrote:

Pushed as obvious.

gcc/ChangeLog:

* value-range.h (irange_allocator::allocate): Increase
newir storage by one.
---
  gcc/value-range.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

struct newir {
  irange range;
-tree mem[1];
+tree mem[2];
};
size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
struct newir *r = (newir *) obstack_alloc (_obstack, nbytes);


So, we essentially want a flexible array member, which C++ without extension
doesn't have, and thus need to rely on the compiler handling the trailing
array as a poor man's flexible array member (again, GCC does for any size,
but not 100% sure about other compilers, if they e.g. don't handle that way
just size of 1).


The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).

If it's difficult to write efficient C++ code without relying on
these hacks we are in the perfect position to propose a solution
to C++.  Otherwise, if a portable solution already exists, we
should be able to adopt it.

Martin


Is there any reason why the code is written that way?
I mean, we could just use:
   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
   irange *r = (irange *) obstack_alloc (_obstack, nbytes);
   return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.

Jakub





Re: [PATCH][openacc, libgomp, testsuite] Xfail declare-5.f90

2020-10-06 Thread Tobias Burnus

Hi Tom,

On 10/6/20 6:20 PM, Tom de Vries wrote:

FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test

A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"

Xfail the fails.

Tested on x86_64-linux with nvptx accelerator.
OK for trunk?


OK. I had hoped that it could be fixed soonish – as this obviously
didn't work out, XFAIL is the right solution.

Tobias



[openacc, libgomp, testsuite] Xfail declare-5.f90

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

  * testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.

---
  libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 1 +
  1 file changed, 1 insertion(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index 2fd25d611a9..ab434f7f127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,4 +1,5 @@
  ! { dg-do run }
+! { dg-xfail-run-if "PR92790 - acc declare device_resident - Fortran common blocks not handled" { 
*-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } }

  module vars
implicit none



[committed] libstdc++: Inline std::exception_ptr members [PR 90295]

2020-10-06 Thread Jonathan Wakely via Gcc-patches
This inlines most members of std::exception_ptr so that all operations
on a null exception_ptr can be optimized away. This benefits code like
std::future and coroutines where an exception_ptr object is present to
cope with exceptional cases, but is usually not used and remains null.

Since those functions were previously non-inline we have to continue to
export them from the library, for objects that were compiled against the
old headers and expect to find definitions in the library.

In order to inline the copy constructor and destructor we need to export
the _M_addref() and _M_release() members that increment/decrement the
reference count when copying/destroying a non-null exception_ptr. The
copy ctor and dtor check for null and don't call _M_addref and
_M_release unless they need to. The checks for null pointers in
_M_addref and _M_release are still needed because old code might call
them without checking for null first. But we can use __builtin_expect to
predict that they are usually called for the non-null case.
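
In other words (a hedged sketch with a hypothetical class name, not the actual libstdc++ source), the inlined special members now look roughly like this, so a null exception_ptr costs nothing to copy or destroy:

class eptr_sketch
{
  void *_M_exception_object = nullptr;

  // In the real library these manipulate the reference count and stay
  // exported (CXXABI_1.3.13); stubbed out here so the sketch is
  // self-contained.
  void _M_addref () noexcept { }
  void _M_release () noexcept { }

public:
  eptr_sketch () noexcept = default;

  eptr_sketch (const eptr_sketch &other) noexcept
    : _M_exception_object (other._M_exception_object)
  {
    if (_M_exception_object)
      _M_addref ();
  }

  ~eptr_sketch ()
  {
    if (_M_exception_object)
      _M_release ();
  }

  explicit operator bool () const noexcept
  { return _M_exception_object != nullptr; }
};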

libstdc++-v3/ChangeLog:

PR libstdc++/90295
* config/abi/pre/gnu.ver (CXXABI_1.3.13): New symbol version.
(exception_ptr::_M_addref(), exception_ptr::_M_release()):
Export symbols.
* libsupc++/eh_ptr.cc (exception_ptr::exception_ptr()):
Remove out-of-line definition.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
(exception_ptr::_M_addref()): Add branch prediction.
* libsupc++/exception_ptr.h (exception_ptr::operator bool):
Add noexcept.
[!_GLIBCXX_EH_PTR_COMPAT] (operator==, operator!=): Define
inline as hidden friends. Remove declarations at namespace
scope.
(exception_ptr::exception_ptr()): Define inline.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
* testsuite/util/testsuite_abi.cc: Add CXXABI_1.3.13.
* testsuite/18_support/exception_ptr/90295.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 1352ea192513e9a45808b8034df62b9434c674a7
Author: Jonathan Wakely 
Date:   Tue Oct 6 16:55:06 2020

libstdc++: Inline std::exception_ptr members [PR 90295]

This inlines most members of std::exception_ptr so that all operations
on a null exception_ptr can be optimized away. This benefits code like
std::future and coroutines where an exception_ptr object is present to
cope with exceptional cases, but is usually not used and remains null.

Since those functions were previously non-inline we have to continue to
export them from the library, for objects that were compiled against the
old headers and expect to find definitions in the library.

In order to inline the copy constructor and destructor we need to export
the _M_addref() and _M_release() members that increment/decrement the
reference count when copying/destroying a non-null exception_ptr. The
copy ctor and dtor check for null and don't call _M_addref and
_M_release unless they need to. The checks for null pointers in
_M_addref and _M_release are still needed because old code might call
them without checking for null first. But we can use __builtin_expect to
predict that they are usually called for the non-null case.

libstdc++-v3/ChangeLog:

PR libstdc++/90295
* config/abi/pre/gnu.ver (CXXABI_1.3.13): New symbol version.
(exception_ptr::_M_addref(), exception_ptr::_M_release()):
Export symbols.
* libsupc++/eh_ptr.cc (exception_ptr::exception_ptr()):
Remove out-of-line definition.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
(exception_ptr::_M_addref()): Add branch prediction.
* libsupc++/exception_ptr.h (exception_ptr::operator bool):
Add noexcept.
[!_GLIBCXX_EH_PTR_COMPAT] (operator==, operator!=): Define
inline as hidden friends. Remove declarations at namespace
scope.
(exception_ptr::exception_ptr()): Define inline.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): 

[PATCH][openacc, libgomp, testsuite] Xfail declare-5.f90

2020-10-06 Thread Tom de Vries
Hi,

We're currently running into:
...
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test
...

A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"

Xfail the fails.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[openacc, libgomp, testsuite] Xfail declare-5.f90

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

* testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.

---
 libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index 2fd25d611a9..ab434f7f127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-xfail-run-if "PR92790 - acc declare device_resident - Fortran common 
blocks not handled" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } }
 
 module vars
   implicit none


RE: [PATCH v2] arm: [MVE] Remove illegal intrinsics

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches
With gcc-patches on too.
Not sure why the reply-all function fails for your address
Kyrill

> -Original Message-
> From: Kyrylo Tkachov
> Sent: 06 October 2020 17:13
> To: Christophe Lyon 
> Subject: RE: [PATCH v2] arm: [MVE] Remove illegal intrinsics
> 
> 
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 06 October 2020 16:59
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH v2] arm: [MVE] Remove illegal intrinsics
> >
> > A few MVE intrinsics had an unsigned variant implemented while they are
> > not supported by the hardware.  This patch removes them:
> > __arm_vqrdmlashq_n_u8
> > __arm_vqrdmlahq_n_u8
> > __arm_vqdmlahq_n_u8
> > __arm_vqrdmlashq_n_u16
> > __arm_vqrdmlahq_n_u16
> > __arm_vqdmlahq_n_u16
> > __arm_vqrdmlashq_n_u32
> > __arm_vqrdmlahq_n_u32
> > __arm_vqdmlahq_n_u32
> > __arm_vmlaldavaxq_p_u32
> > __arm_vmlaldavaxq_p_u16
> >
> > v2: rebased after Srinath's reorganization patch
> 
> Ok.
> Thanks,
> Kyrill
> 
> >
> > 2020-10-06  Christophe Lyon  
> >
> > gcc/
> > PR target/96914
> > * config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
> > (vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
> > (vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16,
> > vqdmlahq_n_u32)
> > (vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
> > * config/arm/arm_mve_builtins.def (vqrdmlashq_n_u,
> > vqrdmlahq_n_u)
> > (vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
> > * config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
> > (VQRDMLASHQ_N_U)
> > (VMLALDAVAXQ_P_U): Remove unspecs.
> > * config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
> > (VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
> > (VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N,
> > VMLALDAVAXQ_P): Remove
> > unsigned variants from iterators.
> > * config/arm/mve.md (mve_vqdmlahq_n_)
> > (mve_vqrdmlahq_n_)
> > (mve_vqrdmlashq_n_,
> > mve_vmlaldavaxq_p_):
> > Update comment.
> >
> > gcc/testsuite/
> > PR target/96914
> > * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
> > ---
> >  gcc/config/arm/arm_mve.h   | 199 
> > +
> >  gcc/config/arm/arm_mve_builtins.def|   4 -
> >  gcc/config/arm/iterators.md|  16 +-
> >  gcc/config/arm/mve.md  |   8 +-
> >  gcc/config/arm/unspecs.md  |   4 -
> >  .../arm/mve/intrinsics/vmlaldavaxq_p_u16.c |  21 ---
> >  .../arm/mve/intrinsics/vmlaldavaxq_p_u32.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlahq_n_u16.c   |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlahq_n_u32.c   |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u16.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u32.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u8.c   |  21 ---
> >  16 files changed, 19 insertions(+), 443 deletions(-)
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > 

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Aldy Hernandez via Gcc-patches




On 10/6/20 4:51 PM, Martin Sebor wrote:

On 10/6/20 8:42 AM, Andrew MacLeod wrote:

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both 
DSTREF
   and SRCREF based on one another and the kind of the 
access.  */

-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

The point was we don't have a fully envisioned solution yet... that is 
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.


Yes, I got that.  Which is why I suggest to add a namespace scope
function to the restrict pass that can then be easily replaced with
whatever solution we ultimately end up with.

What's certain (in my mind anyway) is that storing a pointer to some
global (or per-pass) range instance as a member in each class that
needs to access it is not the solution we want long term.


Tell you what.  I'll make your class public, access its internal 
members as you describe (ughh), and you can do anything else post-commit.


Aldy
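
A hedged sketch of the namespace-scope accessor being suggested above (hypothetical names only — the eventual shared interface may look different): the pass keeps one translation-unit-local pointer, and everything else calls the free function instead of storing its own copy.

/* Hypothetical names, not the committed gimple-ssa-warn-restrict.c code.  */
class range_query_stub { };		/* stand-in for the query class */

static range_query_stub *active_query;	/* set once per pass invocation */

/* Single point of access; swapping in a global ranger instance later
   means changing only this function.  */
static range_query_stub *
get_active_query ()
{
  return active_query;
}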



Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-06 Thread Richard Earnshaw via Gcc-patches
On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote:
> When mi_delta is > 255 and -mpure-code is used, we cannot load delta
> from code memory (like we do without -mpure-code).
> 
> This patch builds the value of mi_delta into r3 with a series of
> movs/adds/lsls.
> 
> We also do some cleanup by not emitting the function address and delta
> via .word directives at the end of the thunk since we don't use them
> with -mpure-code.
> 
> No need for new testcases, this bug was already identified by
> eg. pr46287-3.C
> 
> 2020-09-29  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and
>   do not emit function address and delta when -mpure-code is used.

There are some optimizations you can make to this code.

Firstly, for values between 256 and 510 (inclusive), it would be better
to just expand a mov of 255 followed by an add.  This is also true for
the literal pool alternative, so it should be handled before all
this.  I also suspect (but haven't checked) that the base adjustment will
most commonly be a multiple of the machine word size (i.e. 4).  If that is
the case then you could generate n/4 and then shift it left by 2 for an
even greater range of literals.  More generally, any sequence of up to
three thumb1 instructions will be no larger than, and probably as fast as,
the existing literal pool fallback.

Secondly, if the value is, for example, 65536 (0x1), your code will
emit a mov followed by two shift-by-8 instructions; the two shifts could
be merged into a single shift-by-16.

Finally, I'd really like to see some executable tests for this, if at
all possible.

R.
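
To make the suggested sequences concrete, a standalone, hedged sketch (hypothetical helper, not the patch itself) that prints the Thumb-1 instructions a given delta could be built with:

#include <stdio.h>

/* Hypothetical illustration only: print a short Thumb-1 sequence that
   builds DELTA in r3, following the suggestions above (mov+add for
   256..510, divide-and-shift for word-aligned values, byte building
   with merged shifts otherwise).  */
static void
print_delta_sequence (long delta)
{
  if (delta <= 255)
    printf ("\tmovs\tr3, #%ld\n", delta);
  else if (delta <= 510)
    printf ("\tmovs\tr3, #255\n\tadds\tr3, #%ld\n", delta - 255);
  else if ((delta & 3) == 0 && delta / 4 <= 255)
    printf ("\tmovs\tr3, #%ld\n\tlsls\tr3, #2\n", delta / 4);
  else
    {
      /* General case: emit non-zero bytes high to low, merging the
	 shifts over zero bytes, e.g. 0x10000 -> movs r3, #1; lsls r3, #16.  */
      int started = 0, shift = 0;
      for (int i = 3; i >= 0; i--)
	{
	  int byte = (delta >> (8 * i)) & 0xff;
	  if (started)
	    shift += 8;
	  if (!byte)
	    continue;
	  if (!started)
	    {
	      printf ("\tmovs\tr3, #%d\n", byte);
	      started = 1;
	    }
	  else
	    {
	      printf ("\tlsls\tr3, #%d\n\tadds\tr3, #%d\n", shift, byte);
	      shift = 0;
	    }
	}
      if (shift)
	printf ("\tlsls\tr3, #%d\n", shift);
    }
}

int
main (void)
{
  print_delta_sequence (300);		/* movs r3, #255; adds r3, #45 */
  print_delta_sequence (0x10000);	/* movs r3, #1; lsls r3, #16 */
  return 0;
}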

> 
> ---
>  gcc/config/arm/arm.c | 91 
> +---
>  1 file changed, 66 insertions(+), 25 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index ceeb91f..62abeb5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -28342,9 +28342,43 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT 
> delta,
>  {
>if (mi_delta > 255)
>   {
> -   fputs ("\tldr\tr3, ", file);
> -   assemble_name (file, label);
> -   fputs ("+4\n", file);
> +   /* With -mpure-code, we cannot load delta from the constant
> +  pool: we build it explicitly.  */
> +   if (target_pure_code)
> + {
> +   bool mov_done_p = false;
> +   int i;
> +
> +   /* Emit upper 3 bytes if needed.  */
> +   for (i = 0; i < 3; i++)
> + {
> +   int byte = (mi_delta >> (8 * (3 - i))) & 0xff;
> +
> +   if (byte)
> + {
> +   if (mov_done_p)
> + asm_fprintf (file, "\tadds\tr3, #%d\n", byte);
> +   else
> + asm_fprintf (file, "\tmovs\tr3, #%d\n", byte);
> +   mov_done_p = true;
> + }
> +
> +   if (mov_done_p)
> + asm_fprintf (file, "\tlsls\tr3, #8\n");
> + }
> +
> +   /* Emit lower byte if needed.  */
> +   if (!mov_done_p)
> + asm_fprintf (file, "\tmovs\tr3, #%d\n", mi_delta & 0xff);
> +   else if (mi_delta & 0xff)
> + asm_fprintf (file, "\tadds\tr3, #%d\n", mi_delta & 0xff);
> + }
> +   else
> + {
> +   fputs ("\tldr\tr3, ", file);
> +   assemble_name (file, label);
> +   fputs ("+4\n", file);
> + }
> asm_fprintf (file, "\t%ss\t%r, %r, r3\n",
>  mi_op, this_regno, this_regno);
>   }
> @@ -28380,30 +28414,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, 
> HOST_WIDE_INT delta,
>   fputs ("\tpop\t{r3}\n", file);
>  
>fprintf (file, "\tbx\tr12\n");
> -  ASM_OUTPUT_ALIGN (file, 2);
> -  assemble_name (file, label);
> -  fputs (":\n", file);
> -  if (flag_pic)
> +
> +  /* With -mpure-code, we don't need to emit literals for the
> +  function address and delta since we emitted code to build
> +  them.  */
> +  if (!target_pure_code)
>   {
> -   /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn".  */
> -   rtx tem = XEXP (DECL_RTL (function), 0);
> -   /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC
> -  pipeline offset is four rather than eight.  Adjust the offset
> -  accordingly.  */
> -   tem = plus_constant (GET_MODE (tem), tem,
> -TARGET_THUMB1_ONLY ? -3 : -7);
> -   tem = gen_rtx_MINUS (GET_MODE (tem),
> -tem,
> -gen_rtx_SYMBOL_REF (Pmode,
> -ggc_strdup (labelpc)));
> -   assemble_integer (tem, 4, BITS_PER_WORD, 1);
> - }
> -  else
> - /* Output ".word .LTHUNKn".  */
> - assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1);
> 

[PATCH v2] arm: [MVE] Remove illegal intrinsics

2020-10-06 Thread Christophe Lyon via Gcc-patches
A few MVE intrinsics had an unsigned variant implemented although they are
not supported by the hardware.  This patch removes them:
__arm_vqrdmlashq_n_u8
__arm_vqrdmlahq_n_u8
__arm_vqdmlahq_n_u8
__arm_vqrdmlashq_n_u16
__arm_vqrdmlahq_n_u16
__arm_vqdmlahq_n_u16
__arm_vqrdmlashq_n_u32
__arm_vqrdmlahq_n_u32
__arm_vqdmlahq_n_u32
__arm_vmlaldavaxq_p_u32
__arm_vmlaldavaxq_p_u16

v2: rebased after Srinath's reorganization patch

2020-10-06  Christophe Lyon  

gcc/
PR target/96914
* config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
(vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
(vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32)
(vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
* config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u)
(vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
* config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U)
(VMLALDAVAXQ_P_U): Remove unspecs.
* config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
(VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove
unsigned variants from iterators.
* config/arm/mve.md (mve_vqdmlahq_n_)
(mve_vqrdmlahq_n_)
(mve_vqrdmlashq_n_, mve_vmlaldavaxq_p_):
Update comment.

gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
---
 gcc/config/arm/arm_mve.h   | 199 +
 gcc/config/arm/arm_mve_builtins.def|   4 -
 gcc/config/arm/iterators.md|  16 +-
 gcc/config/arm/mve.md  |   8 +-
 gcc/config/arm/unspecs.md  |   4 -
 .../arm/mve/intrinsics/vmlaldavaxq_p_u16.c |  21 ---
 .../arm/mve/intrinsics/vmlaldavaxq_p_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u16.c   |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u32.c   |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u16.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u32.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u8.c   |  21 ---
 16 files changed, 19 insertions(+), 443 deletions(-)
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 7626ad1..ccdac67 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1237,9 +1237,6 @@
 #define vpselq_u8(__a, __b, __p) __arm_vpselq_u8(__a, __b, __p)
 #define vpselq_s8(__a, __b, __p) __arm_vpselq_s8(__a, __b, __p)
 #define vrev64q_m_u8(__inactive, __a, __p) __arm_vrev64q_m_u8(__inactive, __a, 
__p)
-#define vqrdmlashq_n_u8(__a, __b, __c) __arm_vqrdmlashq_n_u8(__a, __b, __c)
-#define vqrdmlahq_n_u8(__a, __b, __c) __arm_vqrdmlahq_n_u8(__a, __b, __c)
-#define vqdmlahq_n_u8(__a, __b, __c) __arm_vqdmlahq_n_u8(__a, __b, __c)
 #define vmvnq_m_u8(__inactive, __a, __p) __arm_vmvnq_m_u8(__inactive, __a, __p)
 #define vmlasq_n_u8(__a, __b, __c) __arm_vmlasq_n_u8(__a, __b, __c)
 #define vmlaq_n_u8(__a, __b, __c) __arm_vmlaq_n_u8(__a, __b, __c)
@@ -1323,9 +1320,6 @@
 

[PATCH v2] arm: [MVE[ Add vqdmlashq intrinsics

2020-10-06 Thread Christophe Lyon via Gcc-patches
This patch adds:
vqdmlashq_m_n_s16
vqdmlashq_m_n_s32
vqdmlashq_m_n_s8
vqdmlashq_n_s16
vqdmlashq_n_s32
vqdmlashq_n_s8

v2: rebased after Srinath's reorganization patch
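
For reference, user code exercises the new intrinsics along these lines
(sketch only, s16 variant shown; needs MVE enabled, e.g.
-march=armv8.1-m.main+mve -mfloat-abi=hard):

#include <arm_mve.h>

int16x8_t
foo (int16x8_t a, int16x8_t b, int16_t c)
{
  return vqdmlashq (a, b, c);       /* resolves to vqdmlashq_n_s16 */
}

int16x8_t
foo_m (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
{
  return vqdmlashq_m (a, b, c, p);  /* resolves to vqdmlashq_m_n_s16 */
}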

2020-10-05  Christophe Lyon  

gcc/
PR target/96914
* config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define.
* config/arm/arm_mve_builtins.def (vqdmlashq_n_s)
(vqdmlashq_m_n_s,): New.
* config/arm/unspecs.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
unspecs.
* config/arm/iterators.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
attributes.
(VQDMLASHQ_N): New iterator.
* config/arm/mve.md (mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New
patterns.

gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
---
 gcc/config/arm/arm_mve.h   | 116 +
 gcc/config/arm/arm_mve_builtins.def|   2 +
 gcc/config/arm/iterators.md|   3 +
 gcc/config/arm/mve.md  |  33 ++
 gcc/config/arm/unspecs.md  |   2 +
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c  |  23 
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c   |  21 
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c   |  21 
 .../gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c |  21 
 11 files changed, 288 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index d9bfb203..7626ad1 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -141,6 +141,7 @@
 #define vrev64q_m(__inactive, __a, __p) __arm_vrev64q_m(__inactive, __a, __p)
 #define vqrdmlashq(__a, __b, __c) __arm_vqrdmlashq(__a, __b, __c)
 #define vqrdmlahq(__a, __b, __c) __arm_vqrdmlahq(__a, __b, __c)
+#define vqdmlashq(__a, __b, __c) __arm_vqdmlashq(__a, __b, __c)
 #define vqdmlahq(__a, __b, __c) __arm_vqdmlahq(__a, __b, __c)
 #define vmvnq_m(__inactive, __a, __p) __arm_vmvnq_m(__inactive, __a, __p)
 #define vmlasq(__a, __b, __c) __arm_vmlasq(__a, __b, __c)
@@ -260,6 +261,7 @@
 #define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a, __b, 
__p)
 #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, 
__b, __p)
 #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, 
__a, __b, __p)
+#define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
 #define vqdmladhxq_m(__inactive, __a, __b, __p) __arm_vqdmladhxq_m(__inactive, 
__a, __b, __p)
 #define vqdmlahq_m(__a, __b, __c, __p) __arm_vqdmlahq_m(__a, __b, __c, __p)
 #define vqdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhq_m(__inactive, 
__a, __b, __p)
@@ -1307,6 +1309,7 @@
 #define vqdmlsdhxq_s8(__inactive, __a, __b) __arm_vqdmlsdhxq_s8(__inactive, 
__a, __b)
 #define vqdmlsdhq_s8(__inactive, __a, __b) __arm_vqdmlsdhq_s8(__inactive, __a, 
__b)
 #define vqdmlahq_n_s8(__a, __b, __c) __arm_vqdmlahq_n_s8(__a, __b, __c)
+#define vqdmlashq_n_s8(__a, __b, __c) __arm_vqdmlashq_n_s8(__a, __b, __c)
 #define vqdmladhxq_s8(__inactive, __a, __b) __arm_vqdmladhxq_s8(__inactive, 
__a, __b)
 #define vqdmladhq_s8(__inactive, __a, __b) __arm_vqdmladhq_s8(__inactive, __a, 
__b)
 #define vmlsdavaxq_s8(__a, __b, __c) __arm_vmlsdavaxq_s8(__a, __b, __c)
@@ -1391,6 +1394,7 @@
 #define vqrdmladhq_s16(__inactive, __a, __b) __arm_vqrdmladhq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhxq_s16(__inactive, __a, __b) __arm_vqdmlsdhxq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhq_s16(__inactive, __a, __b) __arm_vqdmlsdhq_s16(__inactive, 
__a, __b)
+#define vqdmlashq_n_s16(__a, __b, __c) __arm_vqdmlashq_n_s16(__a, __b, __c)
 #define vqdmlahq_n_s16(__a, __b, __c) __arm_vqdmlahq_n_s16(__a, __b, __c)
 #define vqdmladhxq_s16(__inactive, __a, __b) __arm_vqdmladhxq_s16(__inactive, 
__a, __b)
 #define vqdmladhq_s16(__inactive, __a, __b) __arm_vqdmladhq_s16(__inactive, 
__a, __b)
@@ -1476,6 +1480,7 @@
 #define vqrdmladhq_s32(__inactive, __a, __b) 

Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-10-06 Thread Mark Wielaard
Hi,

On Fri, 2020-09-18 at 17:21 +0200, Mark Wielaard wrote:
> On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote:
> > Ok, here it is in patch form.
> > I've briefly tested it, with the older binutils I have around (no --gdwarf-N
> > support), with latest gas (--gdwarf-N that can be passed to as even when
> > compiling C/C++ etc. code and emitting .debug_line) and latest gas with 
> > Mark's fix
> > reverted (--gdwarf-N support, but can only pass it to as when assembling
> > user .s/.S files, not when compiling C/C++ etc.).
> > Will bootstrap/regtest (with the older binutils) later tonight.
> > 
> > 2020-09-15  Jakub Jelinek  
> > 
> > * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
> > HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
> > * gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
> > (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
> > "--gdwarf2".  Use %{cond:opt1;:opt2} style.
> > (ASM_DEBUG_OPTION_DWARF_OPT): Define.
> > (ASM_DEBUG_OPTION_SPEC): Define.
> > (asm_debug_option): New variable.
> > (asm_options): Add "%(asm_debug_option)".
> > (static_specs): Add asm_debug_option entry.
> > (static_spec_functions): Add dwarf-version-gt.
> > (debug_level_greater_than_spec_func): New function.
> > * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
> > * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
> > * config.in: Regenerated.
> > * configure: Regenerated.
> 
> Once this is in we can more generally emit DW_FORM_line_str for
> filepaths in CU DIEs for the name and comp_dir attribute. There
> currently is a bit of a hack to do this in dwarf2out_early_finish, but
> that only works when the assembler doesn't emit a DWARF5 .debug_line,
> but gcc does it itself.
> 
> What do you think of the attached patch?
>
> DWARF5 has a new string table specially for file paths. .debug_line
> file and dir tables reference strings in .debug_line_str.  If a
> .debug_line_str section is emitted then also place CU DIE file
> names and comp dirs there.
> 
> gcc/ChangeLog:
> 
>   * dwarf2out.c (add_filepath_AT_string): New function.
>   (asm_outputs_debug_line_str): Likewise.
>   (add_filename_attribute): Likewise.
>   (add_comp_dir_attribute): Call add_filepath_AT_string.
>   (gen_compile_unit_die): Call add_filename_attribute for name.
>   (init_sections_and_labels): Init debug_line_str_section when
>   asm_outputs_debug_line_str return true.
>   (dwarf2out_early_finish): Remove DW_AT_name and DW_AT_comp_dir
>   hack and call add_filename_attribute for the remap_debug_filename.

On top of that, we also need the following, which makes sure the actual
compilation directory is used in a DWARF5 .debug_line directory table
(and not just a relative path).


From 66b25bc0a5df06e211b48a54e3b5d33999c24fb6 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Tue, 6 Oct 2020 17:41:19 +0200
Subject: [PATCH] debug: Make sure to output .file 0 when generating DWARF5.

When gas outputs DWARF5 .debug_line[_str] then we have to tell it the
comp_dir and main file name for the zero entry line table. Otherwise
gas has to guess at the CU compilation directory and file.

Before a gcc -gdwarf-5 ../src/hello.c line table looked like:

Directory table:
 0 ../src (24)
 1 ../src (24)
 2 /usr/include (31)

File name table:
 0 hello.c (16),  0
 1 hello.c (16),  1
 2 stdio.h (44),  2

With this patch it looks like:

Directory table:
 0 /tmp/obj (0)
 1 ../src (24)
 2 /usr/include (31)

File name table:
 0 ../src/hello.c (9),  0
 1 hello.c (16),  1
 2 stdio.h (44),  2
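
In assembler terms, with this patch gcc feeds gas something like (using the
values from the example above; exact quoting depends on remap_debug_filename):

	.file 0 "/tmp/obj" "../src/hello.c"

so gas no longer has to guess the zero entries.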

gcc/ChangeLog:

	* dwarf2out.c (dwarf2out_finish): Emit .file 0 entry when
	generating DWARF5 .debug_line table through gas.
---
 gcc/dwarf2out.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index a43082864a75..399937a9f310 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -31764,6 +31764,27 @@ dwarf2out_finish (const char *filename)
   ASM_OUTPUT_LABEL (asm_out_file, debug_line_section_label);
   if (! output_asm_line_debug_info ())
 output_line_info (false);
+  else if (asm_outputs_debug_line_str ())
+{
+  /* When gas outputs DWARF5 .debug_line[_str] then we have to
+	 tell it the comp_dir and main file name for the zero entry
+	 line table.  */
+  const char *comp_dir, *filename0;
+
+  comp_dir = comp_dir_string ();
+  if (comp_dir == NULL)
+	comp_dir = "";
+
+  filename0 = get_AT_string (comp_unit_die (), DW_AT_name);
+  if (filename0 == NULL)
+	filename0 = "";
+
+  fprintf (asm_out_file, "\t.file 0 ");
+  output_quoted_string (asm_out_file, remap_debug_filename (comp_dir));
+  fputc (' ', asm_out_file);
+  output_quoted_string (asm_out_file, remap_debug_filename (filename0));
+  fputc ('\n', asm_out_file);
+}
 
   if (dwarf_split_debug_info && info_section_emitted)
 {
-- 
2.18.4



[committed][GCC 8] arm: Add missing part number for Neoverse V1

2020-10-06 Thread Alex Coplan via Gcc-patches
This patch adds the part number for Neoverse V1 which was missing from
the initial AArch32 support in GCC 8.

Bootstrapped and regtested on arm-none-linux-gnueabihf, pushing as
obvious.

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/driver-arm.c (arm_cpu_table): Add neoverse-v1.
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index 45ad92e..8352289 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -56,6 +56,7 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xd09", "armv8-a+crc", "cortex-a73"},
 {"0xd05", "armv8.2-a+fp16+dotprod", "cortex-a55"},
 {"0xd0a", "armv8.2-a+fp16+dotprod", "cortex-a75"},
+{"0xd40", "armv8.4-a+fp16", "neoverse-v1"},
 {"0xd49", "armv8.4-a+fp16", "neoverse-n2"},
 {"0xc14", "armv7-r", "cortex-r4"},
 {"0xc15", "armv7-r", "cortex-r5"},


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/6/20 5:02 PM, Jakub Jelinek wrote:
> On Tue, Oct 06, 2020 at 04:48:40PM +0200, Tom de Vries wrote:
>> On 10/5/20 3:15 PM, Tom de Vries wrote:
>>> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
 On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
> * {target-32.c, thread-limit-2.c}:
> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

 Please don't, I want to deal with that using declare variant, just didn't
 get yet around to finishing the last patch needed for that.  Will try next 
 week.

>>>
>>> Hi Jakub,
>>>
>>> Ping, any update on this?
> 
> Not finished the last step, I run into LTO issues.  Will need to return to
> that soon.
> Last progress in "[RFH] LTO cgraph support for late declare variant 
> resolution"
> mail from May on gcc-patches.
> 

Ack, thanks for the update.

>> --- a/libgomp/testsuite/libgomp.c/target-32.c
>> +++ b/libgomp/testsuite/libgomp.c/target-32.c
>> @@ -1,6 +1,26 @@
>>  #include 
>>  #include 
>>  
>> +extern void base_delay(int);
> 
> No need to declare this one early.
> 
>> +extern void nvptx_delay(int);
> 
> Space before (, and the definition could go here instead of
> the declaration.
> 
>> +#pragma omp declare variant( nvptx_delay ) match( construct={target}, 
>> implementation={vendor(nvidia)} )
> 
> This isn't the right declare variant for what we want though,
> we only provide gnu as accepted vendor, it is implementation's vendor,
> not vendor of one of the hw components.
> So, it ought to be instead
> #pragma omp declare variant (nvptx_delay) 
> match(construct={target},device={arch(nvptx)})
> 
>> +void base_delay(int d)
>> +{
>> +  usleep (d);
>> +}

I've updated the patch accordingly.

FWIW, I now run into an ICE which looks like PR96680:
...
lto1: internal compiler error: in lto_fixup_prevailing_decls, at
lto/lto-common.c:2595
0x93afcd lto_fixup_prevailing_decls
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2595
0x93b1d6 lto_fixup_decls
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2645
0x93bcc4 read_cgraph_and_symbols(unsigned int, char const**)
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2897
0x910358 lto_main()
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto.c:625
...

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..b8deae72b08 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,25 @@
 #include 
 #include 
 
+void
+nvptx_delay (int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case.  It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
+#pragma omp declare variant (nvptx_delay) match(construct={target},device={arch(nvptx)})
+void
+base_delay(int d)
+{
+  usleep (d);
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +37,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


Re: [PATCH] rs6000: Fix extraneous characters in the documentation

2020-10-06 Thread will schmidt via Gcc-patches
On Mon, 2020-10-05 at 17:23 -0300, Tulio Magno Quites Machado Filho via 
Gcc-patches wrote:
> Ping?
+cc Segher  :-)

> 
> Tulio Magno Quites Machado Filho via Gcc-patches  
> writes:
> 
> > Replace them with a whitespace in order to avoid artifacts in the HTML
> > document.
> > 
> > 2020-08-19  Tulio Magno Quites Machado Filho  
> > 
> > gcc/
> > * doc/extend.texi (PowerPC Built-in Functions): Replace
> > extraneous characters with whitespace.
> > ---
> >  gcc/doc/extend.texi | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index bcc251481ca..0c380322280 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -21538,10 +21538,10 @@ void amo_stdat_smin (int64_t *, int64_t);
> >  ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
> >  GCC provides support for these instructions through the following built-in
> >  functions which are enabled with the @code{-mmma} option.  The vec_t type
> > -below is defined to be a normal vector unsigned char type.  The uint2, 
> > uint4
> > +below is defined to be a normal vector unsigned char type.  The uint2, 
> > uint4

That looks like a non-breaking space (UTF-8 bytes c2 a0), so
2e c2 a0 20 becomes 2e 20 20


> >  and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
> > -respectively.  The compiler will verify that they are constants and that
> > -their values are within range. 
> > +respectively.  The compiler will verify that they are constants and that
> > +their values are within range.

2e c2 a0 20 becomes 2e 20 20

And drops a trailing whitespace.

Those seem reasonable. 
lgtm

Thanks
-Will

> >  
> >  The built-in functions supported are:
> >  
> > -- 
> > 2.25.4
> > 
> 
> 



[committed, wwwdocs] gcc-11/changes: Add notes about column number changes

2020-10-06 Thread David Malcolm via Gcc-patches
I've taken the liberty of pushing this website patch, having checked
that it validates.

It covers the changes by Lewis in 004bb936d6d5f177af26ad4905595e843d5665a5
(PR 49973 and PR 86904).
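
As a quick illustration of the difference (not part of the change itself):
in a line like "int π = x;" with "x" undeclared, "x" sits in the 9th column
but starts at the 10th byte, since π needs two bytes in UTF-8.  GCC 11 now
reports column 9 by default, while -fdiagnostics-column-unit=byte restores
the old report of 10.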

---
 htdocs/gcc-11/changes.html | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 64655120..e2a32e51 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -72,6 +72,45 @@ a work-in-progress.
   control if function entries and exits should be instrumented.
 
   
+  
+
+  In previous releases of GCC, the "column numbers" emitted in diagnostics
+  were actually a count of bytes from the start of the source line.  This
+  could be problematic, both because of:
+
+
+  multibyte characters (requiring more than one byte to encode), 
and
+  multicolumn characters (requiring more than one column to display in 
a monospace font)
+
+
+  For example, the character π ("GREEK SMALL LETTER PI (U+03C0)")
+  occupies one column, and its UTF-8 encoding requires two bytes; the
+  character 🙂 ("SLIGHTLY SMILING FACE (U+1F642)") occupies two
+  columns, and its UTF-8 encoding requires four bytes.
+
+
+  In GCC 11 the column numbers default to being column numbers, respecting
+  multi-column characters.  The old behavior can be restored using a new
+  option
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-column-unit;>-fdiagnostics-column-unit=byte.
+  There is also a new option
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-column-origin;>-fdiagnostics-column-origin=,
+  allowing the pre-existing default of the left-hand column being column
+  1 to be overridden if desired (e.g. for 0-based columns).  The output
+  of
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=json
+  has been extended to supply both byte counts and column numbers for all 
source locations.
+
+
+  Additionally, in previous releases of GCC, tab characters in the source
+  would be emitted verbatim when quoting source code, but be prefixed
+  with whitespace or line number information, leading to misalignments
+  in the resulting output when compared with the actual source.  Tab
+  characters are now printed as an appropriate number of spaces, using the
+  https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#index-ftabstop;>-ftabstop
+  option (which defaults to 8 spaces per tab stop).
+
+  
 
 
 
-- 
2.26.2



Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 04:48:40PM +0200, Tom de Vries wrote:
> On 10/5/20 3:15 PM, Tom de Vries wrote:
> > On 2/7/20 4:29 PM, Jakub Jelinek wrote:
> >> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
> >>> * {target-32.c, thread-limit-2.c}:
> >>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> >>
> >> Please don't, I want to deal with that using declare variant, just didn't
> >> get yet around to finishing the last patch needed for that.  Will try next 
> >> week.
> >>
> > 
> > Hi Jakub,
> > 
> > Ping, any update on this?

Not finished the last step, I run into LTO issues.  Will need to return to
that soon.
Last progress in "[RFH] LTO cgraph support for late declare variant resolution"
mail from May on gcc-patches.

> --- a/libgomp/testsuite/libgomp.c/target-32.c
> +++ b/libgomp/testsuite/libgomp.c/target-32.c
> @@ -1,6 +1,26 @@
>  #include 
>  #include 
>  
> +extern void base_delay(int);

No need to declare this one early.

> +extern void nvptx_delay(int);

Space before (, and the definition could go here instead of
the declaration.

> +#pragma omp declare variant( nvptx_delay ) match( construct={target}, 
> implementation={vendor(nvidia)} )

This isn't the right declare variant for what we want though,
we only provide gnu as accepted vendor, it is implementation's vendor,
not vendor of one of the hw components.
So, it ought to be instead
#pragma omp declare variant (nvptx_delay) 
match(construct={target},device={arch(nvptx)})

> +void base_delay(int d)
> +{
> +  usleep (d);
> +}

Jakub



Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts

2020-10-06 Thread Chung-Lin Tang

On 2020/9/29 6:16 PM, Jakub Jelinek wrote:

On Tue, Sep 01, 2020 at 09:16:23PM +0800, Chung-Lin Tang wrote:

this patch set implements parts of the target mapping changes introduced
in OpenMP 5.0, mainly the attachment requirements for pointer-based
list items, and the clause ordering.

The first patch here are the C/C++ front-end changes.


Do you think you could mention in detail which exact target mapping changes
in the spec is the patchset attempting to implement?
5.0 unfortunately contains many target mapping changes and this patchset
can't implement them all and it would be easier to see the list of rules
(e.g. from openmp-diff-full-4.5-5.0.pdf, if you don't have that one, I can
send it to you), rather than trying to guess them from the patchset.

Thanks.


Hi Jakub,
the main implemented features are the clause ordering rules:

 "For a given construct, the effect of a map clause with the to, from, or 
tofrom map-type is
  ordered before the effect of a map clause with the alloc, release, or delete 
map-type."

 "If item1 is a list item in a map clause, and item2 is another list item in a 
map clause on
  the same construct that has a base pointer that is, or is part of, item1, 
then:
* If the map clause(s) appear on a target, target data, or target enter 
data construct,
  then on entry to the corresponding region the effect of the map clause on 
item1 is ordered
  to occur before the effect of the map clause on item2.
    * If the map clause(s) appear on a target, target data, or target exit 
data construct then
  on exit from the corresponding region the effect of the map clause on 
item2 is ordered to
  occur before the effect of the map clause on item1."

and the base-pointer attachment behavior:

 "If a list item in a map clause has a base pointer, and a pointer variable is 
present in the device data
  environment that corresponds to the base pointer when the effect of the map 
clause occurs, then if
  the corresponding pointer or the corresponding list item is created in the 
device data environment
  on entry to the construct, then:
...
2. The corresponding pointer variable becomes an attached pointer for the 
corresponding list item."

(these passages are all in the "2.19.7.1 map Clause" section of the 5.0 spec, 
all are new as
also verified from the diff PDFs you sent us)

Also, because of these new features, having multiple maps of the same
variable now has meaning in OpenMP, so changes in the C/C++ frontends to
relax the no-duplicate rules are also included.
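
As an illustration of what these rules make well-defined (example code only,
not taken from the patch; the names are made up):

struct T { int *ptr; };

void
f (struct T *t, int n)
{
  /* On entry, the map of t[0:1] (which contains the base pointer t->ptr)
     takes effect before the map of t->ptr[0:n], and the device copy of
     t->ptr becomes an attached pointer for the array section; the reverse
     ordering applies on exit.  */
  #pragma omp target map(tofrom: t[0:1], t->ptr[0:n])
  {
    for (int i = 0; i < n; i++)
      t->ptr[i] += 1;
  }
}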


 gcc/c-family/
 * c-common.h (c_omp_adjust_clauses): New declaration.
 * c-omp.c (c_omp_adjust_clauses): New function.


This function name is too broad, it should have target in it as it is
for processing target* construct clauses only.

Jakub


Sure, I'll update this naming in a later version.

Thanks,
Chung-Lin


Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 8:42 AM, Andrew MacLeod wrote:

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

The point was we don't have a fully envisioned solution yet... that is 
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.


Yes, I got that.  Which is why I suggest to add a namespace scope
function to the restrict pass that can then be easily replaced with
whatever solution we ultimately end up with.

What's certain (in my mind anyway) is that storing a pointer to some
global (or per-pass) range instance as a member in each class that
needs to access it is not the solution we want long term.

Martin


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/5/20 3:15 PM, Tom de Vries wrote:
> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
>> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>>> * {target-32.c, thread-limit-2.c}:
>>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
>>
>> Please don't, I want to deal with that using declare variant, just didn't
>> get yet around to finishing the last patch needed for that.  Will try next 
>> week.
>>
> 
> Hi Jakub,
> 
> Ping, any update on this?

FWIW, I've tried it as in the patch attached below, but I didn't get it
compiling; I still got:
...
FAIL: libgomp.c/target-32.c (test for excess errors)
Excess errors:
unresolved symbol usleep
...

Jakub, is this already supposed to work?

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..7ddf8721ed3 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,26 @@
 #include 
 #include 
 
+extern void base_delay(int);
+extern void nvptx_delay(int);
+
+#pragma omp declare variant( nvptx_delay ) match( construct={target}, implementation={vendor(nvidia)} )
+void base_delay(int d)
+{
+  usleep (d);
+}
+
+void nvptx_delay(int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case. It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +38,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


Re: [PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tobias Burnus

LGTM.

Thanks,

Tobias

On 10/6/20 3:28 PM, Tom de Vries wrote:

Hi,

Consider test-case test.c, with VLA A:
...
int main (void) {
   int N = 1000;
   int A[N];
   #pragma acc declare copy(A)
   return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
   #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
   int[0:D.2074] * A.1;

   {
 int A[0:D.2074] [value-expr: *A.1];

 saved_stack.2 = __builtin_stack_save ();
 try
   {
 A.1 = __builtin_alloca_with_align (D.2078, 32);
 #pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
   }
 finally
   {
 __builtin_stack_restore (saved_stack.2);
   }
   }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracing the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked up value, to fix avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
  finally
{
+#pragma omp target oacc_declare map(from:(*A.1))
  __builtin_stack_restore (saved_stack.2);
}
...

Build on x86_64-linux with nvptx accelerator, tested libgomp.

OK for trunk?

Thanks,
- Tom

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * gimplify.c (gimplify_bind_expr): Handle lookup in
  oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

---
  gcc/gimplify.c| 13 ++---
  libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c |  5 -
  2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..fa89e797940 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1468,15 +1468,22 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)

if (flag_openacc && oacc_declare_returns != NULL)
  {
-   tree *c = oacc_declare_returns->get (t);
+   tree key = t;
+   if (DECL_HAS_VALUE_EXPR_P (key))
+ {
+   key = DECL_VALUE_EXPR (key);
+   if (TREE_CODE (key) == INDIRECT_REF)
+ key = TREE_OPERAND (key, 0);
+ }
+   tree *c = oacc_declare_returns->get (key);
if (c != NULL)
  {
if (ret_clauses)
  OMP_CLAUSE_CHAIN (*c) = ret_clauses;

-   ret_clauses = *c;
+   ret_clauses = unshare_expr (*c);

-   oacc_declare_returns->remove (t);
+   oacc_declare_returns->remove (key);

if (oacc_declare_returns->is_empty ())
  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51badca42..714935772c1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -59,8 +59,3 @@ main ()

return 0;
  }
-
-
-/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
-   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
-   address.  */

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

The point was we don't have a fully envisioned solution yet... that is  
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.






[PATCH][GCC-10 backport] arm: Add +nomve and +nomve.fp options to -mcpu=cortex-m55.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Backport of Joe's patch with no changes.

This patch rearranges feature bits for MVE and FP to implement the
following flags for -mcpu=cortex-m55.

  - +nomve: equivalent to armv8.1-m.main+fp.dp+dsp.
  - +nomve.fp: equivalent to armv8.1-m.main+mve+fp.dp (+dsp is implied by +mve).
  - +nofp: equivalent to armv8.1-m.main+mve (+dsp is implied by +mve).
  - +nodsp: equivalent to armv8.1-m.main+fp.dp.

Combinations of the above:

  - +nomve+nofp: equivalent to armv8.1-m.main+dsp.
  - +nodsp+nofp: equivalent to armv8.1-m.main.

Due to MVE and FP sharing vfp_base, some new syntax was required in the CPU
description to implement the concept of 'implied bits'. These are non-named
features added to the ISA late, depending on whether one or more features which
depend on them are present. This means vfp_base can be present when only one of
MVE and FP is removed, but absent when both are removed.

Ok for GCC-10 branch?

gcc/ChangeLog:

2020-07-31  Joe Ramsay  

* config/arm/arm-cpus.in:
(ALL_FPU_INTERNAL): Remove vfp_base.
(VFPv2): Remove vfp_base.
(MVE): Remove vfp_base.
(vfp_base): Redefine as implied bit dependent on MVE or FP
(cortex-m55): Add flags to disable MVE, MVE FP, FP and DSP extensions.
* config/arm/arm.c (arm_configure_build_target): Add implied bits to 
ISA.
* config/arm/parsecpu.awk:
(gen_isa): Print implied bits and their dependencies to ISA header.
(gen_data): Add parsing for implied feature bits.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cortex-m55-nodsp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nodsp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c: New test.
* gcc.target/arm/multilib.exp: Add tests for -mcpu=cortex-m55.

(cherry picked from commit 3e8fb15a8cfd0e62dd474af9f536863392ed7572)


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
d609113e969d69505bc2f1b13fab8b1dfd622472..db0b93f6bb74f6ddf42636caa0d9a3db38692982
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -135,10 +135,6 @@ define feature armv8_1m_main
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.
 
-# This feature bit is enabled for all VFP, MVE and
-# MVE with floating point extensions.
-define feature vfp_base
-
 # Vector floating point v2.
 define feature vfpv2
 
@@ -251,7 +247,7 @@ define fgroup ALL_SIMD  ALL_SIMD_INTERNAL 
ALL_SIMD_EXTERNAL
 
 # List of all FPU bits to strip out if -mfpu is used to override the
 # default.  fp16 is deliberately missing from this list.
-define fgroup ALL_FPU_INTERNAL vfp_base vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
+define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
 # Similarly, but including fp16 and other extensions that aren't part of
 # -mfpu support.
 define fgroup ALL_FPU_EXTERNAL fp16 bf16
@@ -296,11 +292,11 @@ define fgroup ARMv8r  ARMv8a
 define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main
 
 # Useful combinations.
-define fgroup VFPv2vfp_base vfpv2
+define fgroup VFPv2vfpv2
 define fgroup VFPv3VFPv2 vfpv3
 define fgroup VFPv4VFPv3 vfpv4 fp16conv
 define fgroup FPv5 VFPv4 fpv5
-define fgroup MVE  mve vfp_base armv7em
+define fgroup MVE  mve armv7em
 define fgroup MVE_FP   MVE FPv5 fp16 mve_float
 
 define fgroup FP_DBL   fp_dbl
@@ -310,6 +306,18 @@ define fgroup NEON FP_D32 neon
 define fgroup CRYPTO   NEON crypto
 define fgroup DOTPROD  NEON dotprod
 
+# Implied feature bits.  These are for non-named features shared between 
fgroups.
+# Shared feature f belonging to fgroups A and B will be erroneously removed if:
+# A and B are enabled by default AND A is disabled by a removal flag.
+# To ensure that f is retained, we must add such bits to the ISA after
+# processing the removal flags.  This is implemented by 'implied bits':
+# define implied <name> [<feature-or-fgroup>]+
+# This indicates that, if any of the listed features are enabled, or if any
+# member of a listed fgroup is enabled, then <name> will be implicitly enabled.
+
+# Enabled for all VFP, MVE and MVE with floating point extensions.
+define implied vfp_base MVE MVE_FP ALL_FP
+
 # List of all quirk bits to strip out when comparing CPU features with
 # architectures.
 # xscale isn't really a 'quirk', but it isn't an architecture either and we
@@ -1532,6 +1540,10 @@ begin cpu 

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making it 
public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

Martin


Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 12:20:14PM +0200, Martin Liška wrote:
> On 10/6/20 10:00 AM, Richard Biener wrote:
> > On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:
> > > 
> > > On 10/5/20 6:34 PM, Ian Lance Taylor wrote:
> > > > On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
> > > > > 
> > > > > The previous patch was not correct. This one should be.
> > > > > 
> > > > > Ready for master?
> > > > 
> > > > I don't understand why this code uses symtab_indices_shndx at all.
> > > > There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
> > > > any need for the symtab_indices_shndx vector.
> > > 
> > > Well, the question is if we can have multiple .symtab sections in one ELF
> > > file? Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX 
> > > sections.
> > > Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
> > > by PR81968 which is about Solaris ld.
> > 
> > It wasn't my code but I suppose this way the implementation was
> > "easiest".  There
> > should be exactly one symtab / shndx section.  Rainer authored this support.
> 
> If we expect at maximum one SHT_SYMTAB_SHNDX section section, then I'm 
> suggesting
> an updated version of the patch. It's what Ian offered.

gABI says on
SHT_SYMTAB/SHT_DYNSYM:
Currently, an object file may have only one section of each type, but this 
restriction may be relaxed in the future.
SHT_SYMTAB_SHNDX:
This section is associated with a symbol table section and is required if any 
of the section header indexes referenced by that symbol table contain the 
escape value SHN_XINDEX.
So, I guess only at most one SHT_SYMTAB_SHNDX can appear in ET_REL objects
which we are talking about, and at most two SHT_SYMTAB_SHNDX in
ET_EXEC/ET_DYN (though only in the very unlikely case that the binary/dso
contains more than 65536-epsilon sections and both .symtab and .dynsym need
to refer to those.  One would need to play with linker scripts to convince
ld.bfd to create many sections in ET_EXEC/ET_DYN.

Jakub



[GCC-10 backport][COMMITTED] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Backport approved here 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555618.html .

To maintain consistency with the other Arm architecture backends, iterators and
iterator attributes are moved from mve.md to iterators.md. Also move enumerators
for MVE unspecs from mve.md to unspecs.md.

gcc/ChangeLog:

2020-10-06  Srinath Parvathaneni  

* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 03:48:38PM +0200, Martin Liška wrote:
> On 10/6/20 9:47 AM, Richard Biener wrote:
> > But is it really extensible with the current implementation?  I doubt so.
> 
> I must agree with the statement. So let's implement the pass properly.
> I would need help with the algorithm, where I'm planning to do the following
> steps:
> 
> 1) for each BB ending with a gcond, parse index variable and its VR;
>I'll support:
>a) index == 123 ([123, 123])
>b) 1 <= index && index <= 9 ([1, 9])
>c) index == 123 || index == 12345 ([123, 123] [12345, 12345])
>d) index != 1 ([1, 1])
>e) index != 1 && index != 5 ([1, 1] [5, 5])

The fold_range_test created cases are essential to support, so
f) index - 123U < 456U ([123, 456+123])
g) (unsigned) index - 123U < 456U (ditto)
but the discovery should actually recurse on all of those forms, so it will
handle
(unsigned) index - 123U < 456U || (unsigned) index - 16384U <= 32711U
etc.
You can see what reassoc init_range_entry does and do something similar?

Jakub



Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/6/20 9:47 AM, Richard Biener wrote:

But is it really extensible with the current implementation?  I doubt so.


I must agree with the statement. So let's implement the pass properly.
I would need help with the algorithm, where I'm planning to do the following
steps:

1) for each BB ending with a gcond, parse index variable and its VR;
   I'll support:
   a) index == 123 ([123, 123])
   b) 1 <= index && index <= 9 ([1, 9])
   c) index == 123 || index == 12345 ([123, 123] [12345, 12345])
   d) index != 1 ([1, 1])
   e) index != 1 && index != 5 ([1, 1] [5, 5])

2) switch range edge is identified, e.g. true_edge for 1e, while false_edge for 
1a

3) I want to support forward code hoisting, so for each condition BB we need to 
identify
   if the block contains only stmts without a side-effect

4) we can ignore BBs with condition variables that has small number of potential
   case switches

5) for each condition variable we can iterate bottom up in dominator order and 
try
   to find a BB predecessor chain (in first phase no BB in between such 
"condition" BBs)
   that uses a same condition variable

6) the chain will be converted to a switch statement
7) code hoisting must be done in order to move gimple statements and fix
potential
   gphis that can be collapsed
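
As a concrete example of the kind of chain steps 1)-6) would recognize
(illustration only; foo/bar/baz/qux are made-up callees):

  if (index == 1 || index == 5)
    foo ();
  else if (index >= 7 && index <= 9)
    bar ();
  else if (index == 123)
    baz ();
  else
    qux ();

would ideally become:

  switch (index)
    {
    case 1:
    case 5:
      foo ();
      break;
    case 7 ... 9:	/* GNU case-range extension, for illustration.  */
      bar ();
      break;
    case 123:
      baz ();
      break;
    default:
      qux ();
      break;
    }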

Is it something feasible that can work?
Thanks,

Martin



[PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-06 Thread Qing Zhao via Gcc-patches
Hi, Gcc team,

This is the 3rd version of the implementation of patch -fzero-call-used-regs.

We will provide a new feature in GCC:

Add 
-fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
 command-line option
and
zero_call_used_regs("skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all")
 function attribues:

   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

   Don't zero call-used registers upon function return. This is the default 
behavior.

   2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")

   Zero used call-used general purpose registers that are used to pass 
parameters upon function return.

   3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

   Zero used call-used registers that are used to pass parameters upon function 
return.

   4. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

   Zero all call-used registers that are used to pass parameters upon function 
return.

   5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

   Zero used call-used general purpose registers upon function return.

   6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

   Zero all call-used general purpose registers upon function return.

   7. -fzero-call-used-regs=used and zero_call_used_regs("used")

   Zero used call-used registers upon function return.

   8. -fzero-call-used-regs=all and zero_call_used_regs("all")

   Zero all call-used registers upon function return.

Zero call-used registers at function return to increase the program
security by either mitigating Return-Oriented Programming (ROP) or
preventing information leak through registers.

{skip}, which is the default, doesn't zero call-used registers.

{used-gpr-arg} zeros used call-used general purpose registers that
pass parameters. {used-arg} zeros used call-used registers that
pass parameters. {all-arg} zeros all call-used registers that pass
parameters. These 3 choices are used for ROP mitigation.

{used-gpr} zeros call-used general purpose registers
which are used in function.  {all-gpr} zeros all
call-used registers.  {used} zeros call-used registers which
are used in function.  {all} zeros all call-used registers.
These 4 choices are used for preventing information leak through
registers.

You can control this behavior for a specific function by using the function
attribute {zero_call_used_regs}.
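
For example (illustrative use only), per-function control looks like:

/* Clear the call-used general purpose registers this function actually
   used before returning.  */
int __attribute__ ((zero_call_used_regs ("used-gpr")))
foo (int x)
{
  return x * x;
}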

**Tests done:
1. Gcc bootstrap on x86, aarch64 and rs6000.
2. Regression test on x86, aarch64 and rs6000.
(X86 and aarch64 have no issues; rs6000 failed at the new test case in the
middle end, which is expected.)

3. Cpu2017 on x86, -O2 
-fzero-call-used-regs=used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all

**runtime performance data of CPU2017 on x86
https://gitlab.com/x86-gcc/gcc/-/wikis/uploads/e9c5bedba6e387586364571f2eae3b8d/zero_call_used_regs_runtime_New.csv
 


**The major changes compared to the previous version are:

1. Add 3 new sub-options and corresponding function attributes:
  used-gpr-arg, used-arg, all-arg
  for ROP mitigation purpose;
2. Updated user manual;
3. Re-design of the implementation:

  3.1 data flow changes to reflect the newly added zeroing insns, to avoid
  these insns being deleted, moved, or merged by later passes:

  3.1.1.
  abstract EPILOGUE_USES into a new target-independent wrapper function that
  (a) returns true if EPILOGUE_USES itself returns true and (b) returns
  true for registers that need to be zero on return, if the zeroing
  instructions have already been inserted.  The places that currently
  test EPILOGUE_USES should then test this new wrapper function instead.

  Add this new wrapper function to df.h and df-scan.c.

  3.1.2.
  add a new utility routine "expand_asm_reg_clobber_mem_blockage" to generate
  a volatile asm insn that clobbers all the hard registers that are zeroed.

  emit this volatile asm in the very beginning of the zeroing sequence.

  3.2 new pass:
  add a new pass in the beginning of "late_compilation", before
  "pass_compute_alignment", called "pass_zero_call_used_regs".

  in this new pass,
  * compute the data flow information; (df_analyze ());
  * scan the exit block backwards to look for "return":
A. for each return, compute the "need_zeroed_hardregs" set based on
the user request, data flow information, and function ABI info.
B. pass this need_zeroed_hardregs set to the target hook "zero_call_used_regs"
to generate the instruction sequence that zeroes the regs.
C. Data flow maintenance. 
4. Use "lookup_attribute" to get the attribute information instead of setting
  the attribute information into "tree_decl_with_vis" in tree-core.h.

**The changelog:

gcc/ChangeLog: 
2020-10-05  Qing Zhao  
H.J. Lu  

RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 14:55
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> to maintain consistency.
> 
> Hi Kyrill,
> 
> > -Original Message-
> > From: Kyrylo Tkachov 
> > Sent: 06 October 2020 14:42
> > To: Srinath Parvathaneni ; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to
> iterators.md
> > to maintain consistency.
> >
> >
> >
> > > -Original Message-
> > > From: Srinath Parvathaneni 
> > > Sent: 06 October 2020 13:27
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov 
> > > Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> > > to maintain consistency.
> > >
> > > Hello,
> > >
> > > To maintain consistency with other Arm Architectures backend,
> > > iterators and iterator attributes are moved from mve.md file to
> > > iterators.md. Also move enumerators for MVE unspecs from mve.md file
> > > to unspecs.md file.
> > >
> > > Regression tested on arm-none-eabi and found no regressions.
> > >
> > > Ok for master? Ok for GCC-10 branch?
> >
> > Ok for trunk.
> > I'm not sure if this is needed for the GCC 10 branch (but am open to being
> > convinced otherwise?)
> 
> Thanks for approving this patch.
> Backporting this patch avoids conflicts when backporting any bug fix
> modifying MVE patterns (iterators and unspecs); I hope this convinces you.

Ok then, thanks for explaining.
Kyrill

> 
> Regards,
> SRI.
> >
> > Thanks,
> > Kyrill
> >
> > >
> > > Regards,
> > > Srinath.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-10-06  Srinath Parvathaneni  
> > >
> > >   * config/arm/iterators.md (MVE_types): Move mode iterator from
> > mve.md
> > > to
> > >   iterators.md.
> > >   (MVE_VLD_ST): Likewise.
> > >   (MVE_0): Likewise.
> > >   (MVE_1): Likewise.
> > >   (MVE_3): Likewise.
> > >   (MVE_2): Likewise.
> > >   (MVE_5): Likewise.
> > >   (MVE_6): Likewise.
> > >   (MVE_CNVT): Move mode attribute iterator from mve.md to
> > iterators.md.
> > >   (MVE_LANES): Likewise.
> > >   (MVE_constraint): Likewise.
> > >   (MVE_constraint1): Likewise.
> > >   (MVE_constraint2): Likewise.
> > >   (MVE_constraint3): Likewise.
> > >   (MVE_pred): Likewise.
> > >   (MVE_pred1): Likewise.
> > >   (MVE_pred2): Likewise.
> > >   (MVE_pred3): Likewise.
> > >   (MVE_B_ELEM): Likewise.
> > >   (MVE_H_ELEM): Likewise.
> > >   (V_sz_elem1): Likewise.
> > >   (V_extr_elem): Likewise.
> > >   (earlyclobber_32): Likewise.
> > >   (supf): Move int attribute from mve.md to iterators.md.
> > >   (mode1): Likewise.
> > >   (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
> > >   (VMVNQ_N): Likewise.
> > >   (VREV64Q): Likewise.
> > >   (VCVTQ_FROM_F): Likewise.
> > >   (VREV16Q): Likewise.
> > >   (VCVTAQ): Likewise.
> > >   (VMVNQ): Likewise.
> > >   (VDUPQ_N): Likewise.
> > >   (VCLZQ): Likewise.
> > >   (VADDVQ): Likewise.
> > >   (VREV32Q): Likewise.
> > >   (VMOVLBQ): Likewise.
> > >   (VMOVLTQ): Likewise.
> > >   (VCVTPQ): Likewise.
> > >   (VCVTNQ): Likewise.
> > >   (VCVTMQ): Likewise.
> > >   (VADDLVQ): Likewise.
> > >   (VCTPQ): Likewise.
> > >   (VCTPQ_M): Likewise.
> > >   (VCVTQ_N_TO_F): Likewise.
> > >   (VCREATEQ): Likewise.
> > >   (VSHRQ_N): Likewise.
> > >   (VCVTQ_N_FROM_F): Likewise.
> > >   (VADDLVQ_P): Likewise.
> > >   (VCMPNEQ): Likewise.
> > >   (VSHLQ): Likewise.
> > >   (VABDQ): Likewise.
> > >   (VADDQ_N): Likewise.
> > >   (VADDVAQ): Likewise.
> > >   (VADDVQ_P): Likewise.
> > >   (VANDQ): Likewise.
> > >   (VBICQ): Likewise.
> > >   (VBRSRQ_N): Likewise.
> > >   (VCADDQ_ROT270): Likewise.
> > >   (VCADDQ_ROT90): Likewise.
> > >   (VCMPEQQ): Likewise.
> > >   (VCMPEQQ_N): Likewise.
> > >   (VCMPNEQ_N): Likewise.
> > >   (VEORQ): Likewise.
> > >   (VHADDQ): Likewise.
> > >   (VHADDQ_N): Likewise.
> > >   (VHSUBQ): Likewise.
> > >   (VHSUBQ_N): Likewise.
> > >   (VMAXQ): Likewise.
> > >   (VMAXVQ): Likewise.
> > >   (VMINQ): Likewise.
> > >   (VMINVQ): Likewise.
> > >   (VMLADAVQ): Likewise.
> > >   (VMULHQ): Likewise.
> > >   (VMULLBQ_INT): Likewise.
> > >   (VMULLTQ_INT): Likewise.
> > >   (VMULQ): Likewise.
> > >   (VMULQ_N): Likewise.
> > >   (VORNQ): Likewise.
> > >   (VORRQ): Likewise.
> > >   (VQADDQ): Likewise.
> > >   (VQADDQ_N): Likewise.
> > >   (VQRSHLQ): Likewise.
> > >   (VQRSHLQ_N): Likewise.
> > >   (VQSHLQ): Likewise.
> > >   (VQSHLQ_N): Likewise.
> > >   (VQSHLQ_R): Likewise.
> > >   (VQSUBQ): Likewise.
> > >   (VQSUBQ_N): Likewise.
> > >   (VRHADDQ): Likewise.
> > >   (VRMULHQ): Likewise.
> > >   (VRSHLQ): Likewise.
> > >   (VRSHLQ_N): Likewise.
> > >   (VRSHRQ_N): Likewise.
> > >   (VSHLQ_N): Likewise.
> > >   (VSHLQ_R): Likewise.
> > >   (VSUBQ): Likewise.
> > >   (VSUBQ_N): Likewise.
> > >   (VADDLVAQ): Likewise.
> > >   (VBICQ_N): Likewise.
> > >   (VMLALDAVQ): Likewise.
> > >   (VMLALDAVXQ): Likewise.
> > >   (VMOVNBQ): Likewise.
> > >   (VMOVNTQ): 

RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hi Kyrill,

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: 06 October 2020 14:42
> To: Srinath Parvathaneni ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> to maintain consistency.
> 
> 
> 
> > -Original Message-
> > From: Srinath Parvathaneni 
> > Sent: 06 October 2020 13:27
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov 
> > Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> > to maintain consistency.
> >
> > Hello,
> >
> > To maintain consistency with other Arm Architectures backend,
> > iterators and iterator attributes are moved from mve.md file to
> > iterators.md. Also move enumerators for MVE unspecs from mve.md file
> > to unspecs.md file.
> >
> > Regression tested on arm-none-eabi and found no regressions.
> >
> > Ok for master? Ok for GCC-10 branch?
> 
> Ok for trunk.
> I'm not sure if this is needed for the GCC 10 branch (but am open to being
> convinced otherwise?)

Thanks for approving this patch.
Backporting this patch avoids conflicts when backporting any bug fix modifying
MVE patterns (iterators and unspecs); I hope this convinces you.

Regards,
SRI.
> 
> Thanks,
> Kyrill
> 
> >
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2020-10-06  Srinath Parvathaneni  
> >
> > * config/arm/iterators.md (MVE_types): Move mode iterator from
> mve.md
> > to
> > iterators.md.
> > (MVE_VLD_ST): Likewise.
> > (MVE_0): Likewise.
> > (MVE_1): Likewise.
> > (MVE_3): Likewise.
> > (MVE_2): Likewise.
> > (MVE_5): Likewise.
> > (MVE_6): Likewise.
> > (MVE_CNVT): Move mode attribute iterator from mve.md to
> iterators.md.
> > (MVE_LANES): Likewise.
> > (MVE_constraint): Likewise.
> > (MVE_constraint1): Likewise.
> > (MVE_constraint2): Likewise.
> > (MVE_constraint3): Likewise.
> > (MVE_pred): Likewise.
> > (MVE_pred1): Likewise.
> > (MVE_pred2): Likewise.
> > (MVE_pred3): Likewise.
> > (MVE_B_ELEM): Likewise.
> > (MVE_H_ELEM): Likewise.
> > (V_sz_elem1): Likewise.
> > (V_extr_elem): Likewise.
> > (earlyclobber_32): Likewise.
> > (supf): Move int attribute from mve.md to iterators.md.
> > (mode1): Likewise.
> > (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
> > (VMVNQ_N): Likewise.
> > (VREV64Q): Likewise.
> > (VCVTQ_FROM_F): Likewise.
> > (VREV16Q): Likewise.
> > (VCVTAQ): Likewise.
> > (VMVNQ): Likewise.
> > (VDUPQ_N): Likewise.
> > (VCLZQ): Likewise.
> > (VADDVQ): Likewise.
> > (VREV32Q): Likewise.
> > (VMOVLBQ): Likewise.
> > (VMOVLTQ): Likewise.
> > (VCVTPQ): Likewise.
> > (VCVTNQ): Likewise.
> > (VCVTMQ): Likewise.
> > (VADDLVQ): Likewise.
> > (VCTPQ): Likewise.
> > (VCTPQ_M): Likewise.
> > (VCVTQ_N_TO_F): Likewise.
> > (VCREATEQ): Likewise.
> > (VSHRQ_N): Likewise.
> > (VCVTQ_N_FROM_F): Likewise.
> > (VADDLVQ_P): Likewise.
> > (VCMPNEQ): Likewise.
> > (VSHLQ): Likewise.
> > (VABDQ): Likewise.
> > (VADDQ_N): Likewise.
> > (VADDVAQ): Likewise.
> > (VADDVQ_P): Likewise.
> > (VANDQ): Likewise.
> > (VBICQ): Likewise.
> > (VBRSRQ_N): Likewise.
> > (VCADDQ_ROT270): Likewise.
> > (VCADDQ_ROT90): Likewise.
> > (VCMPEQQ): Likewise.
> > (VCMPEQQ_N): Likewise.
> > (VCMPNEQ_N): Likewise.
> > (VEORQ): Likewise.
> > (VHADDQ): Likewise.
> > (VHADDQ_N): Likewise.
> > (VHSUBQ): Likewise.
> > (VHSUBQ_N): Likewise.
> > (VMAXQ): Likewise.
> > (VMAXVQ): Likewise.
> > (VMINQ): Likewise.
> > (VMINVQ): Likewise.
> > (VMLADAVQ): Likewise.
> > (VMULHQ): Likewise.
> > (VMULLBQ_INT): Likewise.
> > (VMULLTQ_INT): Likewise.
> > (VMULQ): Likewise.
> > (VMULQ_N): Likewise.
> > (VORNQ): Likewise.
> > (VORRQ): Likewise.
> > (VQADDQ): Likewise.
> > (VQADDQ_N): Likewise.
> > (VQRSHLQ): Likewise.
> > (VQRSHLQ_N): Likewise.
> > (VQSHLQ): Likewise.
> > (VQSHLQ_N): Likewise.
> > (VQSHLQ_R): Likewise.
> > (VQSUBQ): Likewise.
> > (VQSUBQ_N): Likewise.
> > (VRHADDQ): Likewise.
> > (VRMULHQ): Likewise.
> > (VRSHLQ): Likewise.
> > (VRSHLQ_N): Likewise.
> > (VRSHRQ_N): Likewise.
> > (VSHLQ_N): Likewise.
> > (VSHLQ_R): Likewise.
> > (VSUBQ): Likewise.
> > (VSUBQ_N): Likewise.
> > (VADDLVAQ): Likewise.
> > (VBICQ_N): Likewise.
> > (VMLALDAVQ): Likewise.
> > (VMLALDAVXQ): Likewise.
> > (VMOVNBQ): Likewise.
> > (VMOVNTQ): Likewise.
> > (VORRQ_N): Likewise.
> > (VQMOVNBQ): Likewise.
> > (VQMOVNTQ): Likewise.
> > (VSHLLBQ_N): Likewise.
> > (VSHLLTQ_N): Likewise.
> > (VRMLALDAVHQ): Likewise.
> > (VBICQ_M_N): Likewise.
> > (VCVTAQ_M): Likewise.
> > (VCVTQ_M_TO_F): Likewise.
> > (VQRSHRNBQ_N): Likewise.
> > (VABAVQ): Likewise.
> > (VSHLCQ): Likewise.
> > 

RE: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches
Hi Dennis,

> -Original Message-
> From: Dennis Zhang 
> Sent: 06 October 2020 14:37
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; nd ;
> Richard Earnshaw ; Ramana Radhakrishnan
> 
> Subject: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
> 
> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch enables SIMD modes for MVE auto-vectorization.
> > In this patch, the integer and float MVE SIMD modes are returned by
> > arm_preferred_simd_mode
> (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > MVE or MVE_FLOAT is enabled.
> > Then the expanders for auto-vectorization can be used for generating MVE
> > SIMD code.
> >
> > This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> > revealed by the enabled MVE SIMD modes.
> > The tests are for checking the MVE reinterpret intrinsics.
> > There are two functions in each of the tests. The two functions contain
> > the pattern of identical code so that they are folded in icf pass.
> > Because of icf, the instruction count only checks one function which is 8.
> > However when the SIMD modes are enabled, the estimation of the code
> size
> > becomes smaller so that inlining is applied after icf, then the
> > instruction count becomes 16 which causes failure of the tests.
> > Because the icf is not the expected pattern to be tested but causes
> > above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> > instruction count.
> >
> > This patch is separated from
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > because this part is not strongly connected to the aim of that one so
> > that causing confusion.
> >
> > Regtested and bootstraped.
> >
> > Is it OK for trunk please?

Ok.
Sorry for the delay.
Kyrill

> >
> > Thanks
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  
> >
> > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD
> modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  
> >
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> >
> 
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-
> September/554100.html



RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 13:27
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to
> maintain consistency.
> 
> Hello,
> 
> To maintain consistency with other Arm Architectures backend, iterators and
> iterator attributes are moved
> from mve.md file to iterators.md. Also move enumerators for MVE unspecs
> from mve.md file to unspecs.md file.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for master? Ok for GCC-10 branch?

Ok for trunk.
I'm not sure if this is needed for the GCC 10 branch (but am open to being 
convinced otherwise?)

Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2020-10-06  Srinath Parvathaneni  
> 
>   * config/arm/iterators.md (MVE_types): Move mode iterator from
> mve.md to
>   iterators.md.
>   (MVE_VLD_ST): Likewise.
>   (MVE_0): Likewise.
>   (MVE_1): Likewise.
>   (MVE_3): Likewise.
>   (MVE_2): Likewise.
>   (MVE_5): Likewise.
>   (MVE_6): Likewise.
>   (MVE_CNVT): Move mode attribute iterator from mve.md to
> iterators.md.
>   (MVE_LANES): Likewise.
>   (MVE_constraint): Likewise.
>   (MVE_constraint1): Likewise.
>   (MVE_constraint2): Likewise.
>   (MVE_constraint3): Likewise.
>   (MVE_pred): Likewise.
>   (MVE_pred1): Likewise.
>   (MVE_pred2): Likewise.
>   (MVE_pred3): Likewise.
>   (MVE_B_ELEM): Likewise.
>   (MVE_H_ELEM): Likewise.
>   (V_sz_elem1): Likewise.
>   (V_extr_elem): Likewise.
>   (earlyclobber_32): Likewise.
>   (supf): Move int attribute from mve.md to iterators.md.
>   (mode1): Likewise.
>   (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
>   (VMVNQ_N): Likewise.
>   (VREV64Q): Likewise.
>   (VCVTQ_FROM_F): Likewise.
>   (VREV16Q): Likewise.
>   (VCVTAQ): Likewise.
>   (VMVNQ): Likewise.
>   (VDUPQ_N): Likewise.
>   (VCLZQ): Likewise.
>   (VADDVQ): Likewise.
>   (VREV32Q): Likewise.
>   (VMOVLBQ): Likewise.
>   (VMOVLTQ): Likewise.
>   (VCVTPQ): Likewise.
>   (VCVTNQ): Likewise.
>   (VCVTMQ): Likewise.
>   (VADDLVQ): Likewise.
>   (VCTPQ): Likewise.
>   (VCTPQ_M): Likewise.
>   (VCVTQ_N_TO_F): Likewise.
>   (VCREATEQ): Likewise.
>   (VSHRQ_N): Likewise.
>   (VCVTQ_N_FROM_F): Likewise.
>   (VADDLVQ_P): Likewise.
>   (VCMPNEQ): Likewise.
>   (VSHLQ): Likewise.
>   (VABDQ): Likewise.
>   (VADDQ_N): Likewise.
>   (VADDVAQ): Likewise.
>   (VADDVQ_P): Likewise.
>   (VANDQ): Likewise.
>   (VBICQ): Likewise.
>   (VBRSRQ_N): Likewise.
>   (VCADDQ_ROT270): Likewise.
>   (VCADDQ_ROT90): Likewise.
>   (VCMPEQQ): Likewise.
>   (VCMPEQQ_N): Likewise.
>   (VCMPNEQ_N): Likewise.
>   (VEORQ): Likewise.
>   (VHADDQ): Likewise.
>   (VHADDQ_N): Likewise.
>   (VHSUBQ): Likewise.
>   (VHSUBQ_N): Likewise.
>   (VMAXQ): Likewise.
>   (VMAXVQ): Likewise.
>   (VMINQ): Likewise.
>   (VMINVQ): Likewise.
>   (VMLADAVQ): Likewise.
>   (VMULHQ): Likewise.
>   (VMULLBQ_INT): Likewise.
>   (VMULLTQ_INT): Likewise.
>   (VMULQ): Likewise.
>   (VMULQ_N): Likewise.
>   (VORNQ): Likewise.
>   (VORRQ): Likewise.
>   (VQADDQ): Likewise.
>   (VQADDQ_N): Likewise.
>   (VQRSHLQ): Likewise.
>   (VQRSHLQ_N): Likewise.
>   (VQSHLQ): Likewise.
>   (VQSHLQ_N): Likewise.
>   (VQSHLQ_R): Likewise.
>   (VQSUBQ): Likewise.
>   (VQSUBQ_N): Likewise.
>   (VRHADDQ): Likewise.
>   (VRMULHQ): Likewise.
>   (VRSHLQ): Likewise.
>   (VRSHLQ_N): Likewise.
>   (VRSHRQ_N): Likewise.
>   (VSHLQ_N): Likewise.
>   (VSHLQ_R): Likewise.
>   (VSUBQ): Likewise.
>   (VSUBQ_N): Likewise.
>   (VADDLVAQ): Likewise.
>   (VBICQ_N): Likewise.
>   (VMLALDAVQ): Likewise.
>   (VMLALDAVXQ): Likewise.
>   (VMOVNBQ): Likewise.
>   (VMOVNTQ): Likewise.
>   (VORRQ_N): Likewise.
>   (VQMOVNBQ): Likewise.
>   (VQMOVNTQ): Likewise.
>   (VSHLLBQ_N): Likewise.
>   (VSHLLTQ_N): Likewise.
>   (VRMLALDAVHQ): Likewise.
>   (VBICQ_M_N): Likewise.
>   (VCVTAQ_M): Likewise.
>   (VCVTQ_M_TO_F): Likewise.
>   (VQRSHRNBQ_N): Likewise.
>   (VABAVQ): Likewise.
>   (VSHLCQ): Likewise.
>   (VRMLALDAVHAQ): Likewise.
>   (VADDVAQ_P): Likewise.
>   (VCLZQ_M): Likewise.
>   (VCMPEQQ_M_N): Likewise.
>   (VCMPEQQ_M): Likewise.
>   (VCMPNEQ_M_N): Likewise.
>   (VCMPNEQ_M): Likewise.
>   (VDUPQ_M_N): Likewise.
>   (VMAXVQ_P): Likewise.
>   (VMINVQ_P): Likewise.
>   (VMLADAVAQ): Likewise.
>   (VMLADAVQ_P): Likewise.
>   (VMLAQ_N): Likewise.
>   (VMLASQ_N): Likewise.
>   (VMVNQ_M): Likewise.
>   (VPSELQ): Likewise.
>   (VQDMLAHQ_N): Likewise.
>   

RE: [PATCH][GCC-10 backport] arm: Remove coercion from scalar argument to vmin & vmax intrinsics.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 14:37
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH][GCC-10 backport] arm: Remove coercion from scalar
> argument to vmin & vmax intrinsics.
> 
> Hello,
> 
> Straight backport of Joe's patch with no changes.
> 
> This patch fixes an issue with vmin* and vmax* intrinsics which accept
> a scalar argument. Previously when the scalar was of different width
> to the vector elements this would generate __ARM_undef. This change
> allows the scalar argument to be implicitly converted to the correct
> width. Also tidied up the relevant unit tests, some of which would
> have passed even if only one of two or three intrinsic calls had
> compiled correctly.
> 
> Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
> testsuites are clean. OK for trunk?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Joe
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Joe Ramsay  
> 
>   * config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of
> scalar
>   argument.
>   (__arm_vmaxnmvq): Likewise.
>   (__arm_vminnmavq): Likewise.
>   (__arm_vminnmvq): Likewise.
>   (__arm_vmaxnmavq_p): Likewise.
>   (__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
>   (__arm_vminnmavq_p): Likewise.
>   (__arm_vminnmvq_p): Likewise.
>   (__arm_vmaxavq): Likewise.
>   (__arm_vmaxavq_p): Likewise.
>   (__arm_vmaxvq): Likewise.
>   (__arm_vmaxvq_p): Likewise.
>   (__arm_vminavq): Likewise.
>   (__arm_vminavq_p): Likewise.
>   (__arm_vminvq): Likewise.
>   (__arm_vminvq_p): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for
> mismatched
>   width of scalar argument.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
>   * 

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 3:09 PM Martin Liška  wrote:
>
> On 10/6/20 2:56 PM, Andrew MacLeod wrote:
> > Ah, by just using the outgoing_range class, all you are getting is static 
> > edges.  so a TRUE edge is always a [1,1] and a false edge is [0,0]
> > I provided that class so you could get the constant edges on switches.
> >
> > if you want to get actual ranges for ssa-names, you will need the ranger 
> > (which I think is going in today).  It uses those values as the starting 
> > point for winding back to calculate other dependent names.
>
> Ah, all right!
>
> >
> > Then  you will want to query the ranger for the range of index_5 on that 
> > edge..
>
> Fine! So the only tricky thing here is to select a proper SSA_NAME to query 
> right?
> In my case I need to cover situations like:

Note what ranger will get you has the very same limitations as what
register_edge_assert_for has - it will _not_ necessarily provide
something that, when "concatenated" with &&, reproduces the
original condition in its full semantics.  That is, it's not a condition
"decomposition" tool either.

Richard.

>index.0_1 = (unsigned int) index_5(D);
>_2 = index.0_1 + 4294967287;
>if (_2 <= 114)
>
> or
>
>  _1 = aChar_8(D) == 1;
>  _2 = aChar_8(D) == 10;
>  _3 = _1 | _2;
>  if (_3 != 0)
>
> Anything Ranger can help me with?
>
> Martin
>
> >
> > so you will need a gimple ranger instance instead of an outgoing_range 
> > object.
> >
> > Andrew
>


[PATCH][GCC-10 backport] arm: Remove coercion from scalar argument to vmin & vmax intrinsics.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hello,

Straight backport of Joe's patch with no changes.

This patch fixes an issue with vmin* and vmax* intrinsics which accept
a scalar argument. Previously when the scalar was of different width
to the vector elements this would generate __ARM_undef. This change
allows the scalar argument to be implicitly converted to the correct
width. Also tidied up the relevant unit tests, some of which would
have passed even if only one of two or three intrinsic calls had
compiled correctly.
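
For illustration, the shape of the case that used to fail is roughly the
following (hypothetical snippet; the real coverage is in the adjusted tests
listed below):

  #include <arm_mve.h>

  int8_t
  foo (int16_t a, int8x16_t b)
  {
    /* The scalar 'a' is wider than the elements of 'b'; this used to
       resolve to __ARM_undef, now 'a' is implicitly converted.  */
    return vmaxvq (a, b);
  }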

Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
testsuites are clean. OK for trunk?

Thanks,
Joe

gcc/ChangeLog:

2020-08-10  Joe Ramsay  

* config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of scalar
argument.
(__arm_vmaxnmvq): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmaxavq): Likewise.
(__arm_vmaxavq_p): Likewise.
(__arm_vmaxvq): Likewise.
(__arm_vmaxvq_p): Likewise.
(__arm_vminavq): Likewise.
(__arm_vminavq_p): Likewise.
(__arm_vminvq): Likewise.
(__arm_vminvq_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for mismatched
width of scalar argument.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.

(cherry picked from commit 251950d899bc3c18b5775fe9fe20bebbdc8d15cb)


### Attachment also inlined for ease of reply###


diff --git 

Re: [PATCH] dbgcnt: print list after compilation

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 12:29 PM Martin Liška  wrote:
>
> Hello.
>
> Motivation of the patch is to display debug counter values after a 
> compilation.
> It's handy for bisection of a debug counter. The new output is printed to 
> stderr
> (instead of stdout) and it works fine with LTO as well.
>
> Sample output:
>
>counter name  counter value closed intervals
> -
>asan_use_after_scope  0 unset
>auto_inc_dec  0 unset
>ccp   29473 unset
>cfg_cleanup   292   unset
>cprop 45unset
>cse2_move2add 451   unset
>dce   740   unset
>dce_fast  15unset
>dce_ud15unset
>delete_trivial_dead   5747  unset
>devirt0 unset
>df_byte_scan  0 unset
>dom_unreachable_edges 10unset
>tail_call 393   [1, 4], [100, 200]
> ...
>
>
> Ready for master?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * common.opt: Remove -fdbg-cnt-list from deferred options.
> * dbgcnt.c (dbg_cnt_set_limit_by_index): Make a copy
> to original_limits.
> (dbg_cnt_list_all_counters): Print also current counter value
> and print to stderr.
> * opts-global.c (handle_common_deferred_options): Do not handle
> -fdbg-cnt-list.
> * opts.c (common_handle_option): Likewise.
> * toplev.c (finalize): Handle it after compilation here.
> ---
>   gcc/common.opt|  2 +-
>   gcc/dbgcnt.c  | 25 +++--
>   gcc/opts-global.c |  4 
>   gcc/opts.c|  5 -
>   gcc/toplev.c  |  4 
>   5 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 292c2de694e..7e789d1c47f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1202,7 +1202,7 @@ Common Report Var(flag_data_sections)
>   Place data items into their own section.
>
>   fdbg-cnt-list
> -Common Report Var(common_deferred_options) Defer
> +Common Report Var(flag_dbg_cnt_list)
>   List all available debugging counters with their limits and counts.
>
>   fdbg-cnt=
> diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
> index 01893ce7238..2a2dd57507d 100644
> --- a/gcc/dbgcnt.c
> +++ b/gcc/dbgcnt.c
> @@ -45,6 +45,7 @@ static struct string2counter_map 
> map[debug_counter_number_of_counters] =
>   typedef std::pair limit_tuple;
>
>   static vec limits[debug_counter_number_of_counters];
> +static vec original_limits[debug_counter_number_of_counters];
>
>   static unsigned int count[debug_counter_number_of_counters];
>
> @@ -134,6 +135,8 @@ dbg_cnt_set_limit_by_index (enum debug_counter index, 
> const char *name,
> }
>   }
>
> +  original_limits[index] = limits[index].copy ();
> +
> return true;
>   }
>
> @@ -226,25 +229,27 @@ void
>   dbg_cnt_list_all_counters (void)
>   {
> int i;
> -  printf ("  %-30s %s\n", G_("counter name"), G_("closed intervals"));
> -  printf 
> ("-\n");
> +  fprintf (stderr, "  %-30s%-15s   %s\n", G_("counter name"),
> +  G_("counter value"), G_("closed intervals"));
> +  fprintf (stderr, 
> "-\n");
> for (i = 0; i < debug_counter_number_of_counters; i++)
>   {
> -  printf ("  %-30s ", map[i].name);
> -  if (limits[i].exists ())
> +  fprintf (stderr, "  %-30s%-15d   ", map[i].name, count[i]);
> +  if (original_limits[i].exists ())
> {
> - for (int j = limits[i].length () - 1; j >= 0; j--)
> + for (int j = original_limits[i].length () - 1; j >= 0; j--)
> {
> - printf ("[%u, %u]", limits[i][j].first, limits[i][j].second);
> + fprintf (stderr, "[%u, %u]", original_limits[i][j].first,
> +  original_limits[i][j].second);
>   if (j > 0)
> -   printf (", ");
> +   fprintf (stderr, ", ");
> }
> - putchar ('\n');
> + fprintf (stderr, "\n");
> }
> else
> -   printf ("unset\n");
> +   fprintf (stderr, "unset\n");
>   }
> -  printf ("\n");
> +  fprintf (stderr, "\n");
>   }
>
>   #if CHECKING_P
> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index b024ab8e18f..1816acf805b 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -378,10 +378,6 @@ handle_common_deferred_options (void)
>   dbg_cnt_process_opt (opt->arg);
>   break;
>
> -   case OPT_fdbg_cnt_list:
> - dbg_cnt_list_all_counters 

Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Dennis Zhang via Gcc-patches
On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
> 
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
> 
> This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests are for checking the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in icf pass.
> Because of icf, the instruction count only checks one function which is 8.
> However when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf, then the
> instruction count becomes 16 which causes failure of the tests.
> Because the icf is not the expected pattern to be tested but causes
> above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> instruction count.
> 
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one so
> that causing confusion.
> 
> Regtested and bootstraped.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html


Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 11:34 AM Alexandre Oliva  wrote:
>
> On Oct  6, 2020, Richard Biener  wrote:
>
> > OK, I see.  mathfn_built_in expects a type inter-operating with
> > the C ABI types (float_type_node, double_type_node, etc.) where
> > "inter-operating" means having the same main variant.
>
> Yup.
>
> > Now, I guess for the sincos pass we want to combine sinl + cosl
> > to sincosl, independent on the case where the result would be
> > assigned to a 'double' when 'double == long double'?
>
> Sorry, I goofed in the patch description and misled you.
>
> When looking at
>
>   _d = sin (_s);
>
> the sincos didn't take the type of _d, but that of _s.
>
> I changed it so it takes the type not from the actual passed to the
> intrinsic, but from the formal in the intrinsic declaration.

Yes, I understand.

> If we had conversions of _s to different precisions, the optimization
> wouldn't kick in: we'd have different actuals passed to sin and cos.
> I'm not sure it makes much sense to try to turn e.g.
>
>   _d1 = sin (_s);
>   _t = (float) _s;
>   _d2 = cosf (_t);
>
> into:
>
>   sincos (_s, &D1, &T);
>   _d1 = D1;
>   _td2 = T;
>   _d2 = (float) _td2;
>
> If someone goes through the trouble of computing sin and cos for the
> same angle at different precisions, you might as well leave it alone.
>
> > Now what about sinl + cos when 'double == long double'?
>
> Now that might make more sense to optimize, but if we're going to do
> such transformations, we might as well canonicalize the *l intrinsics to
> the equivalent double versions (assuming long double and double have the
> same precision), and then sincos will get it right.

Ah, we eventually already do that.

So how about that mathfn_type helper instead of hard-wiring this logic
in sincos()?

Richard.

>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


Re: [PATCH] optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

2020-10-06 Thread Richard Biener
On Tue, 6 Oct 2020, Richard Biener wrote:

> On Fri, 2 Oct 2020, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > This introduces a permute optimization phase for SLP which is
> > > intended to cover the existing permute eliding for SLP reductions
> > > plus handling commonizing the easy cases.
> > >
> > > It currently uses graphds to compute a postorder on the reverse
> > > SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
> > > did (hopefully - I've adjusted most testcases that triggered it
> > > a few days ago).  It restricts itself to move around bijective
> > > permutations to simplify things for now, mainly around constant nodes.
> > >
> > > As a prerequisite it makes the SLP graph cyclic (ugh).  It looks
> > > like it would pay off to compute a PRE/POST order visit array
> > > once and elide all the recursive SLP graph walks and their
> > > visited hash-set.  At least for the time where we do not change
> > > the SLP graph during such walk.
> > >
> > > I do not like using graphds too much but at least I don't have to
> > > re-implement yet another RPO walk, so maybe it isn't too bad.
> > >
> > > Comments are welcome - I do want to see vect_attempt_slp_rearrange_stmts
> > > go away for GCC 11 and the permute optimization helps non-store
> > > BB vectorization opportunities where we can end up with a lot of
> > > useless load permutes otherwise.
> > 
> > Looks really nice.  Got a couple of questions that probably just show
> > my misunderstanding :-)
> > 
> > Is this intended to compute an optimal-ish solution?
> 
> The intent was to keep it simple but compute a solution that will
> not increase the number of permutes.
> 
> > It looked from
> > a quick read like it tried to push permutes as far away from loads as
> > possible without creating permuted and unpermuted versions of the same
> > node.  But I guess there will be cases where the optimal placement is
> > somewhere between the two extremes of permuting at the loads and
> > permuting as far away as possible.
> 
> So what it does is that it pushes permutes away from the loads until
> there's a use requiring a different permutation.  But handling of
> constants/externals as having "all" permutations causes us to push
> permutes along binary ops with one constant/external argument (in
> addition to pushing it along all unary operations).
> 
> I have some patches that try to unify constant/external nodes during
> SLP build (we're currently _not_ sharing them, thus not computing their
> cost correctly) - once that's in (not sure if it happens this stage1)
> it would make sense to try to not have too many different permutes
> of constants/externals (esp. externals I guess).
> 
> Now, did you have some other sub-optimality in mind?
> 
> > Of course, whatever we do will be a heuristic.  I just wasn't sure how
> > often this would be best in practice.
> 
> Yeah, so I'm not sure where in a "series" of unary ops we'd want to
> push a permutation.  The argument could be to leave it at the load
> for as few changes as possible from the current handling.  That
> could be done with a reverse propagation stage.  I'll see if
> splitting out some predicates from the current code makes it not
> too much duplication to introduce this.
> 
> > It looks like the materialisation phase changes the choices for nodes
> > on the fly, is that right?  If so, how does that work for backedges?
> > I'd expected the materialisation phase to treat the permutation choice
> > as read-only, and simply implement what the graph already said.
> 
> The materialization phase is also the decision stage (wanted to avoid
> duplicating the loop).  When we materialize a permutation at the
> node which has differing uses we have to update the graph from there.
> As for backedges I wasn't sure and indeed there may be bugs - I do
> have to investigate one libgomp FAIL from the testing.  It would be
> odd to require iteration in the decision stage again but in case we're
> breaking a cycle we have to re-consider the backedge permutation as well.
> Which would mean we'd better to the decision where to materialize during
> the propagation stage(?)
> 
> I'm going to analyze the FAIL now.

OK, that one was a stupid mistake (passing hash_set<> by value).

The following adjusted patch computes the materialization points
during iteration so we should handle backedges more obviously
correct (I guess the previous variant worked because the SLP
graphs with backedges are quite special with only "perfect cycles"
allowed).

The question remains on whether we want to use graphds or whether
we want a (lazily filled?) SLP_TREE_PARENTS array and compute the
RPO on the reverse graph on the SLP data structure (we only need
an iteration order that has at least one child visited before
visiting parents, but we still need the reverse edges - still
a pre-order on the reverse graph will likely work as well, just
not converge as quickly eventually).

Thoughts on that?

Otherwise 

Re: [PATCH] options: Avoid unused variable mask warning [PR97305]

2020-10-06 Thread Richard Biener
On Tue, 6 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Oct 06, 2020 at 11:28:22AM +0200, Andreas Schwab wrote:
> > options-save.c: In function 'void cl_target_option_save(cl_target_option*, 
> > gcc_options*, gcc_options*)':
> > options-save.c:8526:26: error: unused variable 'mask' 
> > [-Werror=unused-variable]
> >  8526 |   unsigned HOST_WIDE_INT mask = 0;
> >   |  ^~~~
> > options-save.c: In function 'void cl_target_option_restore(gcc_options*, 
> > gcc_options*, cl_target_option*)':
> > options-save.c:8537:26: error: unused variable 'mask' 
> > [-Werror=unused-variable]
> >  8537 |   unsigned HOST_WIDE_INT mask;
> >   |  ^~~~
> 
> Oops, missed that, sorry.
> 
> The following patch should fix that, tested on x86_64-linux make
> options-save.c (same file as before) and -> ia64-linux cross make
> options-save.o (no warning anymore, just the unwanted declarations gone).
> 
> Ok for trunk if it passes bootstrap/regtest?

OK.

Richard.

> 2020-10-06  Jakub Jelinek  
> 
>   PR bootstrap/97305
>   * optc-save-gen.awk: Don't declare mask variable if explicit_mask
>   array is not present.
> 
> --- gcc/optc-save-gen.awk.jj  2020-10-05 09:34:26.561874335 +0200
> +++ gcc/optc-save-gen.awk 2020-10-06 14:44:04.679556591 +0200
> @@ -597,11 +597,13 @@ for (i = 0; i < n_target_string; i++) {
>  }
>  
>  print "";
> -print "  unsigned HOST_WIDE_INT mask = 0;";
>  
>  j = 0;
>  k = 0;
>  for (i = 0; i < n_extra_target_vars; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" extra_target_vars[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -617,6 +619,9 @@ for (i = 0; i < n_target_other; i++) {
>   print "  ptr->explicit_mask_" var_target_other[i] " = 
> opts_set->x_" var_target_other[i] ";";
>   continue;
>   }
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_other[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -628,6 +633,9 @@ for (i = 0; i < n_target_other; i++) {
>  }
>  
>  for (i = 0; i < n_target_enum; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_enum[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -643,6 +651,9 @@ for (i = 0; i < n_target_int; i++) {
>   print "  ptr->explicit_mask_" var_target_int[i] " = 
> opts_set->x_" var_target_int[i] ";";
>   continue;
>   }
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_int[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -654,6 +665,9 @@ for (i = 0; i < n_target_int; i++) {
>  }
>  
>  for (i = 0; i < n_target_short; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_short[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -665,6 +679,9 @@ for (i = 0; i < n_target_short; i++) {
>  }
>  
>  for (i = 0; i < n_target_char; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_char[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -676,6 +693,9 @@ for (i = 0; i < n_target_char; i++) {
>  }
>  
>  for (i = 0; i < n_target_string; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_string[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -732,7 +752,9 @@ for (i = 0; i < n_target_string; i++) {
>  }
>  
>  print "";
> -print "  unsigned HOST_WIDE_INT mask;";
> +if (has_target_explicit_mask) {
> + print "  unsigned HOST_WIDE_INT mask;";
> +}
>  
>  j = 64;
>  k = 0;
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


[PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tom de Vries
Hi,

Consider test-case test.c, with VLA A:
...
int main (void) {
  int N = 1000;
  int A[N];
  #pragma acc declare copy(A)
  return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
  #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
  int[0:D.2074] * A.1;

  {
int A[0:D.2074] [value-expr: *A.1];

saved_stack.2 = __builtin_stack_save ();
try
  {
A.1 = __builtin_alloca_with_align (D.2078, 32);
#pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
  }
finally
  {
__builtin_stack_restore (saved_stack.2);
  }
  }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracting the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked-up value, to avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
 finally
   {
+#pragma omp target oacc_declare map(from:(*A.1))
 __builtin_stack_restore (saved_stack.2);
   }
...

Build on x86_64-linux with nvptx accelerator, tested libgomp.

OK for trunk?

Thanks,
- Tom

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* gimplify.c (gimplify_bind_expr): Handle lookup in
oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

---
 gcc/gimplify.c| 13 ++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c |  5 -
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..fa89e797940 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1468,15 +1468,22 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
 
  if (flag_openacc && oacc_declare_returns != NULL)
{
- tree *c = oacc_declare_returns->get (t);
+ tree key = t;
+ if (DECL_HAS_VALUE_EXPR_P (key))
+   {
+ key = DECL_VALUE_EXPR (key);
+ if (TREE_CODE (key) == INDIRECT_REF)
+   key = TREE_OPERAND (key, 0);
+   }
+ tree *c = oacc_declare_returns->get (key);
  if (c != NULL)
{
  if (ret_clauses)
OMP_CLAUSE_CHAIN (*c) = ret_clauses;
 
- ret_clauses = *c;
+ ret_clauses = unshare_expr (*c);
 
- oacc_declare_returns->remove (t);
+ oacc_declare_returns->remove (key);
 
  if (oacc_declare_returns->is_empty ())
{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51badca42..714935772c1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -59,8 +59,3 @@ main ()
 
   return 0;
 }
-
-
-/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
-   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
-   address.  */


Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 9:09 AM, Martin Liška wrote:

On 10/6/20 2:56 PM, Andrew MacLeod wrote:
Ah, by just using the outgoing_range class, all you are getting is 
static edges.  so a TRUE edge is always a [1,1] and a false edge is [0,0]

I provided that class so you could get the constant edges on switches.

if you want to get actual ranges for ssa-names, you will need the 
ranger (which I think is going in today).  It uses those values as 
the starting point for winding back to calculate other dependent names.


Ah, all right!



Then  you will want to query the ranger for the range of index_5 on 
that edge..


Fine! So the only tricky thing here is to select a proper SSA_NAME to 
query right?

In my case I need to cover situations like:

  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)

or

    _1 = aChar_8(D) == 1;
    _2 = aChar_8(D) == 10;
    _3 = _1 | _2;
    if (_3 != 0)

Anything Ranger can help me with?

Martin



Well, it *does* assume you know the name of what you are looking for :-P


However, let's see.  It does know the names of things it can generate
ranges for.  We haven't gotten around to adding an API for querying
that, but that would be possible.


It maintains an export list of names it can calculate ranges for (as a
bitmap).  So for your 2 examples, the export list of the first block
contains

   _2, index.0_1, and index_5
and in the second case, it would contain
_3, _2, _1, and aChar_8

So even if you had access to the export list, you'd still have to figure
out which one you wanted, so I'm not sure that helps.  But I suppose you
could go through the list looking for something interesting.
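
Once you do have a candidate name, the query itself would just be something
along these lines (a rough sketch; 'e' and 'name' are whatever edge and SSA
name you picked, and exact spellings may differ slightly from what lands on
trunk):

  gimple_ranger ranger;
  int_range_max r;
  if (ranger.range_on_edge (r, e, name))
    r.dump (stderr);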


I have longer term plans to expose/determine the "control names" which
trigger the branch (that would be '_2' in the first example and
'aChar_8' in the second), but that facility is not built yet.






so you will need a gimple ranger instance instead of an 
outgoing_range object.


Andrew






[PATCH] options: Avoid unused variable mask warning [PR97305]

2020-10-06 Thread Jakub Jelinek via Gcc-patches
Hi!

On Tue, Oct 06, 2020 at 11:28:22AM +0200, Andreas Schwab wrote:
> options-save.c: In function 'void cl_target_option_save(cl_target_option*, 
> gcc_options*, gcc_options*)':
> options-save.c:8526:26: error: unused variable 'mask' 
> [-Werror=unused-variable]
>  8526 |   unsigned HOST_WIDE_INT mask = 0;
>   |  ^~~~
> options-save.c: In function 'void cl_target_option_restore(gcc_options*, 
> gcc_options*, cl_target_option*)':
> options-save.c:8537:26: error: unused variable 'mask' 
> [-Werror=unused-variable]
>  8537 |   unsigned HOST_WIDE_INT mask;
>   |  ^~~~

Oops, missed that, sorry.

The following patch should fix that, tested on x86_64-linux make
options-save.c (same file as before) and -> ia64-linux cross make
options-save.o (no warning anymore, just the unwanted declarations gone).

Ok for trunk if it passes bootstrap/regtest?

2020-10-06  Jakub Jelinek  

PR bootstrap/97305
* optc-save-gen.awk: Don't declare mask variable if explicit_mask
array is not present.

--- gcc/optc-save-gen.awk.jj2020-10-05 09:34:26.561874335 +0200
+++ gcc/optc-save-gen.awk   2020-10-06 14:44:04.679556591 +0200
@@ -597,11 +597,13 @@ for (i = 0; i < n_target_string; i++) {
 }
 
 print "";
-print "  unsigned HOST_WIDE_INT mask = 0;";
 
 j = 0;
 k = 0;
 for (i = 0; i < n_extra_target_vars; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" extra_target_vars[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -617,6 +619,9 @@ for (i = 0; i < n_target_other; i++) {
print "  ptr->explicit_mask_" var_target_other[i] " = 
opts_set->x_" var_target_other[i] ";";
continue;
}
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_other[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -628,6 +633,9 @@ for (i = 0; i < n_target_other; i++) {
 }
 
 for (i = 0; i < n_target_enum; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_enum[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -643,6 +651,9 @@ for (i = 0; i < n_target_int; i++) {
print "  ptr->explicit_mask_" var_target_int[i] " = 
opts_set->x_" var_target_int[i] ";";
continue;
}
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_int[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -654,6 +665,9 @@ for (i = 0; i < n_target_int; i++) {
 }
 
 for (i = 0; i < n_target_short; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_short[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -665,6 +679,9 @@ for (i = 0; i < n_target_short; i++) {
 }
 
 for (i = 0; i < n_target_char; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_char[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -676,6 +693,9 @@ for (i = 0; i < n_target_char; i++) {
 }
 
 for (i = 0; i < n_target_string; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_string[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -732,7 +752,9 @@ for (i = 0; i < n_target_string; i++) {
 }
 
 print "";
-print "  unsigned HOST_WIDE_INT mask;";
+if (has_target_explicit_mask) {
+   print "  unsigned HOST_WIDE_INT mask;";
+}
 
 j = 64;
 k = 0;


Jakub



Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/6/20 2:56 PM, Andrew MacLeod wrote:

Ah, by just using the outgoing_range class, all you are getting is static
edges, so a TRUE edge is always [1, 1] and a false edge is [0, 0].
I provided that class so you could get the constant edges on switches.

If you want to get actual ranges for ssa-names, you will need the ranger (which
I think is going in today).  It uses those values as the starting point for
winding back to calculate other dependent names.


Ah, all right!



Then  you will want to query the ranger for the range of index_5 on that edge..


Fine!  So the only tricky thing here is to select a proper SSA_NAME to query,
right?
In my case I need to cover situations like:

  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)

or

_1 = aChar_8(D) == 1;
_2 = aChar_8(D) == 10;
_3 = _1 | _2;
if (_3 != 0)

Anything Ranger can help me with?

Martin



so you will need a gimple ranger instance instead of an outgoing_range object.

Andrew
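
For illustration, a minimal sketch of that, assuming the gimple_ranger API
as it is being merged ("name" below is a stand-in for index_5, which the
pass itself has to recover from the condition feeding the branch):

  gimple_ranger ranger;
  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, bb->succs)
    {
      int_range_max r;
      /* Query the range of NAME on this edge instead of the static
         edge value.  */
      if (ranger.range_on_edge (r, e, name))
        {
          if (dump_file)
            {
              fprintf (dump_file, "%d->%d: ", e->src->index, e->dest->index);
              r.dump (dump_file);
              fprintf (dump_file, "\n");
            }
        }
    }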




Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 8:55 AM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 08:47:53AM -0400, Andrew MacLeod wrote:

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?



They do not, it just seemed wasteful to do 2 allocs each time, and it'd be
nice to have them co-located since accessing one inevitably leads to
accessing the other.

When using normal allocator like malloc or ggc allocation I'd totally agree
here, but I actually think with obstack it is better to do two successive
allocations.
obstack_alloc is generally pretty cheap; in the common case there is room
for the allocation and so it just bumps the next pointer in the structure
and that is it, and for two allocations that is the same as with one.
And, if there is room just for the irange and not for the subsequent
allocation, with two allocations they wouldn't be collocated, but wouldn't
waste the memory that would otherwise remain unused.

Jakub

okeydoke then.  we can do 2 allocations.
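
Roughly like this, i.e. just a sketch, reusing the constructor and obstack
member from the value-range.h code quoted elsewhere in the thread:

  irange *
  irange_allocator::allocate (unsigned num_pairs)
  {
    /* Two successive obstack allocations: one for the irange itself and
       one for the trailing array of trees; both are cheap on an obstack
       and neither wastes memory if they end up in different chunks.  */
    irange *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));
    tree *mem = (tree *) obstack_alloc (&m_obstack,
                                        sizeof (tree) * 2 * num_pairs);
    return new (r) irange (mem, num_pairs);
  }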




Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 8:09 AM, Martin Liška wrote:

On 10/2/20 4:19 PM, Andrew MacLeod wrote:

On 10/2/20 9:26 AM, Martin Liška wrote:
Yes, you simply get all sorts of conditions that hold when a 
condition is

true, not just those based on the SSA name you put in.  But it occurred
to me that the use-case is somewhat different - for switch-conversion
you want to know whether the test _exactly_ matches a range test,
the VRP worker will not tell you that.  For example if you had
if (x &&  a > 3 && a < 7) then it will give you 'a in [4, 6]' and 
it might
not give you 'x in [1, 1]' (for example if x is float).  But that's 
required

for correctness.


Hello.

Adding Ranger guys. Is it something that can be handled by the 
upcoming changes in VRP?


Presumably.  It depends on exactly how the code lays out.  We don't
process floats, so we won't know anything about the float (at least
this release :-).  We will sort through complex logicals and tell you
what we do know, so if x is integral



    if (x &&  a > 3 && a < 7)

will give you, on the final true edge:

x_5(D)  int [-INF, -1][1, +INF]
a_6(D)  int [4, 6]


If x is a float, then we won't give you anything for x obviously, but
on the eventual true edge we'd still give you

a_6(D)  int [4, 6]


Which is an acceptable limitation for me.

However, I can't convince ranger to give me proper ranges here.  I'm using
the following code snippet:

  outgoing_range query;

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, bb->succs)
    {
  int_range_max range;
  if (query.edge_range_p (range, e))
{
  if (dump_file)
    {
  fprintf (dump_file, "%d->%d: ", e->src->index, e->dest->index);
  range.dump(dump_file);
  fprintf (dump_file, "\n");
    }
}
    }


if (9 <= index && index <= 123)
    return 123;

   <bb 2> :
  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

I get:

2->3: _Bool [1, 1]
2->4: _Bool [0, 0]

Can I get to index_5 [9, 123] ?
Thanks,
Martin

Ah, by just using the outgoing_range class, all you are getting is
static edges, so a TRUE edge is always [1, 1] and a false edge is [0, 0].

I provided that class so you could get the constant edges on switches.

If you want to get actual ranges for ssa-names, you will need the ranger
(which I think is going in today).  It uses those values as the starting
point for winding back to calculate other dependent names.


Then  you will want to query the ranger for the range of index_5 on that 
edge..


so you will need a gimple ranger instance instead of an outgoing_range 
object.


Andrew



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 08:47:53AM -0400, Andrew MacLeod wrote:
> > I think the proper alignment will be guaranteed if irange and tree[] are
> > obstack_alloc'd separately.  They don't need to be adjacent, do they?
> > 
> > 
> They do not, it just seemed wasteful to do 2 allocs each time, and it'd be
> nice to have them co-located since accessing one inevitably leads to
> accessing the other.

When using normal allocator like malloc or ggc allocation I'd totally agree
here, but I actually think with obstack it is better to do two successive
allocations.
obstack_alloc is generally pretty cheap; in the common case there is room
for the allocation and so it just bumps the next pointer in the structure
and that is it, and for two allocations that is the same as with one.
And, if there is room just for the irange and not for the subsequent
allocation, with two allocations they wouldn't be collocated, but wouldn't
waste the memory that would otherwise remain unused.

Jakub



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 6:22 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 11:20:52AM +0200, Aldy Hernandez wrote:

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

 struct newir {
   irange range;
-tree mem[1];
+tree mem[2];
 };
 size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
 struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);

So, we essentially want a flexible array member, which C++ without extension
doesn't have, and thus need to rely on the compiler handling the trailing
array as a poor man's flexible array member (again, GCC does for any size,
but not 100% sure about other compilers, if they e.g. don't handle that way
just size of 1).

We know we need _at least_ two trees, so what's wrong with the above?

See the discussions we even had in GCC.  Some of us are arguing that only
flexible array member should be treated as such, others also add [0] to
that, others [1] and others any arrays at the trailing positions.
Because standard C++ lacks both [] and [0], at least [1] support is needed
even though perhaps pedantically it is invalid.  GCC after all heavily relies
on that elsewhere, e.g. in rtl or gimple structures.  But it is still all
just [1], not [2] or [32].  And e.g. Coverity complains about that a lot.
There is another way around it, using [MAXIMUM_POSSIBLE_COUNT] instead and
then allocating only a subset of those using offsetof to count the size.
But that is undefined in a different way, would probably make Coverity
happy and e.g. for RTL is doable because we have maximum number of operands,
and for many gimple stmts too, except that e.g. GIMPLE_CALLs don't really
have a maximum (well, have it as UINT_MAX - 3 or so).

GCC to my knowledge will treat all the trailing arrays that way, but it is
unclear if other compilers do the same or not.
You can use mem[1] and just use
   size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);


Sure, that is fine too.  I was not aware of any issue with changing [1] to
[2]; it just seemed like the obvious thing :-).


So everything is copacetic if we go back to [1] and add a sizeof (tree)
instead?
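
Something along these lines, just as a sketch (keeping the newir helper
from the code quoted above):

  struct newir {
    irange range;
    tree mem[1];
  };
  /* One tree is already accounted for by sizeof (newir), so only
     2 * num_pairs - 1 extra slots are needed.  */
  size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);
  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
  return new (&r->range) irange (r->mem, num_pairs);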


Andrew






Is there any reason why the code is written that way?
I mean, we could just use:
size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;

We had that originally, but IIRC, the alignment didn't come out right.

That surprises me, because I don't understand how it could (unless irange
didn't have a pointer member at that point).

Jakub





Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 6:40 AM, Andreas Schwab wrote:

On Okt 06 2020, Jakub Jelinek wrote:


On Tue, Oct 06, 2020 at 10:47:34AM +0200, Andreas Schwab wrote:

On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:


I mean, we could just use:
   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
   return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.

Modulo proper alignment.

Sure, but irange's last element is tree * which is pointer to pointer,
and we need here an array of tree, i.e. pointers.  So, it would indeed break
on a hypothetical host that has smaller struct X ** alignment than struct X *
alignment.  I'm not aware of any.
One could add a static_assert to verify that (that alignof (irange) >= alignof 
(tree)
and that sizeof (irange) % alignof (tree) == 0).

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?


They do not, it just seemed wasteful to do 2 allocs each time, and it'd 
be nice to have them co-located since accessing one inevitably leads to 
accessing the other.
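
The checks Jakub suggests above would be something along these lines, as a
sketch, using the names from value-range.h:

  static_assert (alignof (irange) >= alignof (tree),
                 "trailing tree array must not need stricter alignment");
  static_assert (sizeof (irange) % alignof (tree) == 0,
                 "a tree array placed right after an irange stays aligned");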






[PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hello,

To maintain consistency with other Arm Architectures backend, iterators and 
iterator attributes are moved
from mve.md file to iterators.md. Also move enumerators for MVE unspecs from 
mve.md file to unspecs.md file.

Regression tested on arm-none-eabi and found no regressions.

Ok for master? Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:

2020-10-06  Srinath Parvathaneni  

* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.

[PATCH] openmp: Improve composite simd vectorization

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 08:22:13AM +0200, Richard Biener wrote:
> > I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> > and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> > together with that inner loop another loop, but apparently loop discovery
> > thinks it is just one loop.
> > Any ideas what I'm doing wrong or is there any way how to make it two loops
> > (that would also survive all the cfg cleanups until vectorization)?
> 
> The early CFG looks like we have a common header with two latches
> so it boils down to how we disambiguate those in the end (we seem
> to unify the latches via a forwarder).  IIRC OMP lowering builds
> loops itself, could it not do the appropriate disambiguation itself?

I realized I emit the same stmts on both paths (before the goto doit; and
before falling through to it), at least the MIN_EXPR and PLUS_EXPR.  By
forcing an extra bb that performs those two and placing the "doit" label
before it, the innermost loop no longer has multiple latches and so is
vectorized fine.

Will commit this after full bootstrap/regtest.

Thanks.

2020-10-06  Jakub Jelinek  

* omp-expand.c (expand_omp_simd): Don't emit MIN_EXPR and PLUS_EXPR
at the end of entry_bb and innermost init_bb, instead force arguments
for MIN_EXPR into temporaries in both cases and jump to a new bb that
performs MIN_EXPR and PLUS_EXPR.

* gcc.dg/gomp/simd-2.c: New test.
* gcc.dg/gomp/simd-3.c: New test.

--- gcc/omp-expand.c.jj 2020-09-26 10:09:57.524001314 +0200
+++ gcc/omp-expand.c2020-10-06 13:38:14.295073351 +0200
@@ -6347,6 +6347,7 @@ expand_omp_simd (struct omp_region *regi
   tree n2var = NULL_TREE;
   tree n2v = NULL_TREE;
   tree *nonrect_bounds = NULL;
+  tree min_arg1 = NULL_TREE, min_arg2 = NULL_TREE;
   if (fd->collapse > 1)
 {
   if (broken_loop || gimple_omp_for_combined_into_p (fd->for_stmt))
@@ -6406,9 +6407,10 @@ expand_omp_simd (struct omp_region *regi
 fold_convert (itype, fd->loops[i].step));
  t = fold_convert (type, t);
  tree t2 = fold_build2 (MINUS_EXPR, type, n2, n1);
- t = fold_build2 (MIN_EXPR, type, t2, t);
- t = fold_build2 (PLUS_EXPR, type, fd->loop.v, t);
- expand_omp_build_assign (, n2var, t);
+ min_arg1 = create_tmp_var (type);
+ expand_omp_build_assign (, min_arg1, t2);
+ min_arg2 = create_tmp_var (type);
+ expand_omp_build_assign (, min_arg2, t);
}
   else
{
@@ -6815,7 +6817,16 @@ expand_omp_simd (struct omp_region *regi
}
  else
t = counts[i + 1];
- t = fold_build2 (MIN_EXPR, type, t2, t);
+ expand_omp_build_assign (, min_arg1, t2);
+ expand_omp_build_assign (, min_arg2, t);
+ e = split_block (init_bb, last_stmt (init_bb));
+ gsi = gsi_after_labels (e->dest);
+ init_bb = e->dest;
+ remove_edge (FALLTHRU_EDGE (entry_bb));
+ make_edge (entry_bb, init_bb, EDGE_FALLTHRU);
+ set_immediate_dominator (CDI_DOMINATORS, init_bb, entry_bb);
+ set_immediate_dominator (CDI_DOMINATORS, l1_bb, init_bb);
+ t = fold_build2 (MIN_EXPR, type, min_arg1, min_arg2);
  t = fold_build2 (PLUS_EXPR, type, fd->loop.v, t);
  expand_omp_build_assign (, n2var, t);
}
--- gcc/testsuite/gcc.dg/gomp/simd-2.c.jj   2020-10-06 13:33:53.568870663 
+0200
+++ gcc/testsuite/gcc.dg/gomp/simd-2.c  2020-10-06 13:32:59.674655600 +0200
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+/* { dg-final { scan-tree-dump-times "vectorized \[1-9]\[0-9]* loops in 
function" 5 "vect" } } */
+
+int a[1][128];
+
+void
+foo (void)
+{
+  #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+bar (void)
+{
+  #pragma omp parallel for simd schedule (simd: dynamic, 32) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+baz (void)
+{
+  #pragma omp distribute parallel for simd schedule (simd: dynamic, 32) 
collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+qux (void)
+{
+  #pragma omp distribute simd dist_schedule (static, 128) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+corge (void)
+{
+  #pragma omp taskloop simd collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
--- gcc/testsuite/gcc.dg/gomp/simd-3.c.jj   2020-10-06 13:33:59.543783638 
+0200
+++ gcc/testsuite/gcc.dg/gomp/simd-3.c  2020-10-06 13:36:25.650655684 +0200
@@ -0,0 +1,51 

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/2/20 4:19 PM, Andrew MacLeod wrote:

On 10/2/20 9:26 AM, Martin Liška wrote:

Yes, you simply get all sorts of conditions that hold when a condition is
true, not just those based on the SSA name you put in.  But it occurred
to me that the use-case is somewhat different - for switch-conversion
you want to know whether the test _exactly_ matches a range test,
the VRP worker will not tell you that.  For example if you had
if (x &&  a > 3 && a < 7) then it will give you 'a in [4, 6]' and it might
not give you 'x in [1, 1]' (for example if x is float).  But that's required
for correctness.


Hello.

Adding Ranger guys. Is it something that can be handled by the upcoming changes 
in VRP?


Presumably.  It depends on exactly how the code lays out.  We don't process
floats, so we won't know anything about the float (at least this release :-).
We will sort through complex logicals and tell you what we do know, so if x is
integral


    if (x &&  a > 3 && a < 7)

will give you, on the final true edge:

x_5(D)  int [-INF, -1][1, +INF]
a_6(D)  int [4, 6]


If x is a float, then we won't give you anything for x obviously, but on the
eventual true edge we'd still give you
a_6(D)  int [4, 6]


Which is an acceptable limitation for me.

However, I can't convince ranger to give me proper ranges here.  I'm using the
following code snippet:

  outgoing_range query;

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, bb->succs)
{
  int_range_max range;
  if (query.edge_range_p (range, e))
{
  if (dump_file)
{
  fprintf (dump_file, "%d->%d: ", e->src->index, e->dest->index);
  range.dump(dump_file);
  fprintf (dump_file, "\n");
}
}
}


if (9 <= index && index <= 123)
return 123;

   <bb 2> :
  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

I get:

2->3: _Bool [1, 1]
2->4: _Bool [0, 0]

Can I get to index_5 [9, 123] ?
Thanks,
Martin




Andrew







[RFC] Add support for the "retain" attribute utilizing SHF_GNU_RETAIN

2020-10-06 Thread Jozef Lawrynowicz
Hi,

I'd like to propose a new "retain" attribute, which can
be applied to function and variable declarations.

The attribute is used to protect the function or variable declaration it
is applied to from linker garbage collection, by applying the
SHF_GNU_RETAIN section flag to the section containing it. This flag is a
GNU OSABI ELF extension.

The SHF_GNU_RETAIN flag was discussed on the GNU gABI mailing list here:
https://sourceware.org/pipermail/gnu-gabi/2020q3/000429.html

The Binutils patch for SHF_GNU_RETAIN was discussed in the following
threads:
https://sourceware.org/pipermail/binutils/2020-September/113406.html
https://sourceware.org/pipermail/binutils/2020-September/113499.html

The Binutils patch is still being iterated on, and I'd like to get some
feedback on one particular aspect of the GCC functionality before
finalizing the Binutils side of things.

When the "retain" attribute is applied to a declaration, there are three
ways to apply the SHF_GNU_RETAIN flag to the section containing the
declaration:
(1) Mark the entire section containing the declaration with the
SHF_GNU_RETAIN flag
(2) Place the declaration in a new, uniquely named section with
SHF_GNU_RETAIN set.
(3) Place the declaration in a new section with its default name, and
SHF_GNU_RETAIN set.

I think that (2) is the best option, as it most closely corresponds to
the behavior the user wants to apply by using the "retain" attribute.
That is, only the declaration itself needs to be retained.
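
For example (a hypothetical usage sketch; the exact section names are an
assumption, they only illustrate option (2)):

  /* Each of these would be placed in its own uniquely named section,
     e.g. something like .text.keep_me and .data.keep_me_too, with the
     SHF_GNU_RETAIN flag set, so the linker cannot garbage-collect them
     even with --gc-sections.  */
  __attribute__((retain)) void
  keep_me (void)
  {
  }

  __attribute__((retain)) int keep_me_too = 42;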

Option (3) has the same advantage, but it requires some non-standard
assembler behavior to support. The assembler would normally emit
an error if two input sections have the same name, but different flags
set. At the moment, SHF_GNU_RETAIN is an exception to this, but
there is no fundamental reason that this exception is required, as the
associated behavior can be fully supported by just giving the section a
unique name.

As far as I'm aware, option (1) would be tricky to support in GCC.
We'd have to examine all the declarations within a section before the
first assembler directive that creates the section is emitted, which isn't
really compatible with the current, linear nature of the assembly output
stream. I guess there's probably something we could do in the middle-end
to set a flag somewhere to catch this without it getting too
complicated.
However, it would also lead to large portions of the program being
unnecessarily retained in the linked file, when only one declaration was
required.

If anyone has any strong opinions that option (2) isn't the best choice
for the "retain" attribute, please let me know. I plan on finalizing the
Binutils patch in the coming days, removing the added support for unique
input sections with the same name, but different states for the
SHF_GNU_RETAIN flag, which is required for option (3).

Should "used" apply SHF_GNU_RETAIN?
===
Another talking point is whether the existing "used" attribute should
apply the SHF_GNU_RETAIN flag to the containing section.

It seems unlikely that a user applies the "used" attribute to a
declaration, and means for it to be saved from only compiler
optimization, but *not* linker optimization. So perhaps it would be
beneficial for "used" to apply SHF_GNU_RETAIN in some way.

If "used" did apply SHF_GNU_RETAIN, we would also have to
consider the above options for how to apply SHF_GNU_RETAIN to the
section. Since the "used" attribute has been around for a while 
it might not be appropriate for its behavior to be changed to place the
associated declaration in its own, unique section, as in option (2).

However, I tested this "used" attribute modification on
x86_64-pc-linux-gnu, and there was only a small number of regressions
(27 PASS->FAIL, from 6 tests) across the GCC and G++ testsuites.

I briefly investigated these, and some failures are just due to a change
in the expected output of tests, but also some real errors from issues
with hot/cold function partitioning. I believe those would just require
some additional functional changes, and there isn't anything
fundamentally broken.

So nothing that can't be worked around, but I am more concerned about
the wider impact of changing the attribute, which is not represented by
this small subset of testing. The changes would also only affect targets
that support the GNU ELF OSABI, which would lead to inconsistent
behavior between non-GNU OS's. Perhaps this isn't an issue since we can
just document it in the description for the "used" attribute:
  As a GNU ELF extension, the declaration the "used" attribute is
  applied to will be placed in a new, uniquely named section with the
  SHF_GNU_RETAIN flag applied.

I think that unless "used" creates a new, uniquely named SHF_GNU_RETAIN
section for a declaration, there is merit to having a separate "retain"
attribute that has this behavior.

To summarize the talking points:
- Any downsides to the new "retain" attribute creating a new, uniquely
  

Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andreas Schwab
On Okt 06 2020, Jakub Jelinek wrote:

> On Tue, Oct 06, 2020 at 10:47:34AM +0200, Andreas Schwab wrote:
>> On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:
>> 
>> > I mean, we could just use:
>> >   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
>> >   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
>> >   return new (r) irange ((tree *) (r + 1), num_pairs);
>> > without any new type.
>> 
>> Modulo proper alignment.
>
> Sure, but irange's last element is tree * which is pointer to pointer,
> and we need here an array of tree, i.e. pointers.  So, it would indeed break
> on a hypothetical host that has smaller struct X ** alignment than struct X *
> alignment.  I'm not aware of any.
> One could add a static_assert to verify that (that alignof (irange) >= 
> alignof (tree)
> and that sizeof (irange) % alignof (tree) == 0).

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] dbgcnt: print list after compilation

2020-10-06 Thread Martin Liška

Hello.

The motivation of the patch is to display debug counter values after a
compilation.  It's handy when bisecting a debug counter.  The new output is
printed to stderr (instead of stdout) and it works fine with LTO as well.

Sample output:

  counter name  counter value closed intervals
-
  asan_use_after_scope  0 unset
  auto_inc_dec  0 unset
  ccp   29473 unset
  cfg_cleanup   292   unset
  cprop 45unset
  cse2_move2add 451   unset
  dce   740   unset
  dce_fast  15unset
  dce_ud15unset
  delete_trivial_dead   5747  unset
  devirt0 unset
  df_byte_scan  0 unset
  dom_unreachable_edges 10unset
  tail_call 393   [1, 4], [100, 200]
...
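
For instance, the tail_call line above corresponds to an invocation along
these lines (hypothetical source file; the interval syntax is the existing
-fdbg-cnt= one):

  gcc -O2 -fdbg-cnt=tail_call:1-4:100-200 -fdbg-cnt-list foo.c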


Ready for master?
Thanks,
Martin

gcc/ChangeLog:

* common.opt: Remove -fdbg-cnt-list from deferred options.
* dbgcnt.c (dbg_cnt_set_limit_by_index): Make a copy
to original_limits.
(dbg_cnt_list_all_counters): Print also current counter value
and print to stderr.
* opts-global.c (handle_common_deferred_options): Do not handle
-fdbg-cnt-list.
* opts.c (common_handle_option): Likewise.
* toplev.c (finalize): Handle it after compilation here.
---
 gcc/common.opt|  2 +-
 gcc/dbgcnt.c  | 25 +++--
 gcc/opts-global.c |  4 
 gcc/opts.c|  5 -
 gcc/toplev.c  |  4 
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 292c2de694e..7e789d1c47f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1202,7 +1202,7 @@ Common Report Var(flag_data_sections)
 Place data items into their own section.
 
 fdbg-cnt-list

-Common Report Var(common_deferred_options) Defer
+Common Report Var(flag_dbg_cnt_list)
 List all available debugging counters with their limits and counts.
 
 fdbg-cnt=

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 01893ce7238..2a2dd57507d 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -45,6 +45,7 @@ static struct string2counter_map 
map[debug_counter_number_of_counters] =
 typedef std::pair limit_tuple;
 
 static vec limits[debug_counter_number_of_counters];

+static vec original_limits[debug_counter_number_of_counters];
 
 static unsigned int count[debug_counter_number_of_counters];
 
@@ -134,6 +135,8 @@ dbg_cnt_set_limit_by_index (enum debug_counter index, const char *name,

}
 }
 
+  original_limits[index] = limits[index].copy ();

+
   return true;
 }
 
@@ -226,25 +229,27 @@ void

 dbg_cnt_list_all_counters (void)
 {
   int i;
-  printf ("  %-30s %s\n", G_("counter name"), G_("closed intervals"));
-  printf 
("-\n");
+  fprintf (stderr, "  %-30s%-15s   %s\n", G_("counter name"),
+  G_("counter value"), G_("closed intervals"));
+  fprintf (stderr, 
"-\n");
   for (i = 0; i < debug_counter_number_of_counters; i++)
 {
-  printf ("  %-30s ", map[i].name);
-  if (limits[i].exists ())
+  fprintf (stderr, "  %-30s%-15d   ", map[i].name, count[i]);
+  if (original_limits[i].exists ())
{
- for (int j = limits[i].length () - 1; j >= 0; j--)
+ for (int j = original_limits[i].length () - 1; j >= 0; j--)
{
- printf ("[%u, %u]", limits[i][j].first, limits[i][j].second);
+ fprintf (stderr, "[%u, %u]", original_limits[i][j].first,
+  original_limits[i][j].second);
  if (j > 0)
-   printf (", ");
+   fprintf (stderr, ", ");
}
- putchar ('\n');
+ fprintf (stderr, "\n");
}
   else
-   printf ("unset\n");
+   fprintf (stderr, "unset\n");
 }
-  printf ("\n");
+  fprintf (stderr, "\n");
 }
 
 #if CHECKING_P

diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b024ab8e18f..1816acf805b 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -378,10 +378,6 @@ handle_common_deferred_options (void)
  dbg_cnt_process_opt (opt->arg);
  break;
 
-	case OPT_fdbg_cnt_list:

- dbg_cnt_list_all_counters ();
- break;
-
case OPT_fdebug_prefix_map_:
  add_debug_prefix_map (opt->arg);
  break;
diff --git a/gcc/opts.c b/gcc/opts.c
index 3bda59afced..da503c32dd0 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2361,11 +2361,6 @@ common_handle_option (struct gcc_options *opts,
   /* Deferred.  */
   break;
 
-case 

Re: [PATCH] RISC-V: Derive ABI from -march if -mabi is not present.

2020-10-06 Thread Maciej W. Rozycki
On Tue, 6 Oct 2020, Kito Cheng wrote:

> I think this patch is kind of major change for GCC RISC-V port, so I cc all
> RISC-V gcc maintainer to make sure this change is fine with you guys.
> 
>  - Motivation of this patch:
>1. Sync behavior between clang/llvm.
>2. Preparation for -mcpu option support, -mcpu will set -march
>   according the core default arch, however it would be awkward
>   if we only change arch: user need to know the default arch of
>   the core and then set the right ABI, of cause user still can
>   specify arch and abi via -march and -mabi.
> 
>  - This patch has change the behavior for default value of ABI, the ABI
>will derive from -march if -mabi is not given, which is same behavior
>as clang/llvm.

 Just to warn you: it used to be the case with the MIPS target and the 
`-mips[1234]' ISA level options originating from SGI's MIPSpro toolchain 
and it has turned out confusing and troublesome.  After many discussions 
we ended up with the current `-march='/`-mtune='/`-mabi=' scheme, for the 
instruction set, the DFA scheduling and the ABI respectively.  Defaults 
are set with `--with-arch='/`--with-tune='/`--with-abi=' respectively, and 
in the absence of an override `-mtune=' is derived from `-march=', which 
is derived from `-mabi='.  Defaults for different ABIs can be set with 
respective `--with-arch*=' and `--with-tune*=' options.

 This prevents the ABI from being changed unexpectedly, especially if 
different though link-compatible `-march=' options are used for individual 
objects in a compilation.

 The MIPS port used to have `-mcpu=' as well, which used to be roughly 
equivalent to modern `-mtune='; from your description I gather `-mcpu=' is 
going to be roughly equivalent to a combination of `-mtune=' and `-march=' 
setting DFA scheduling for a specific CPU and the instruction set to the 
underlying architecture (do we plan to allow vendor extensions?).  In 
which case to compile a set of CPU-specific modules to be linked together 
(e.g. individual platform support in a generic piece of software like an 
OS kernel or a bare metal library) you'll always have to specify the ABI 
explicitly (though maybe you want anyway, hmm).

 FWIW,

  Maciej


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 11:20:52AM +0200, Aldy Hernandez wrote:
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 94b48e55e77..7031a823138 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)
> > > 
> > > struct newir {
> > >   irange range;
> > > -tree mem[1];
> > > +tree mem[2];
> > > };
> > > size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 
> > > 1));
> > > struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
> > 
> > So, we essentially want a flexible array member, which C++ without extension
> > doesn't have, and thus need to rely on the compiler handling the trailing
> > array as a poor man's flexible array member (again, GCC does for any size,
> > but not 100% sure about other compilers, if they e.g. don't handle that way
> > just size of 1).
> 
> We know we need _at least_ two trees, so what's wrong with the above?

See the discussions we even had in GCC.  Some of us are arguing that only
flexible array member should be treated as such, others also add [0] to
that, others [1] and others any arrays at the trailing positions.
Because standard C++ lacks both [] and [0], at least [1] support is needed
even though perhaps pedantically it is invalid.  GCC after all heavily relies
on that elsewhere, e.g. in rtl or gimple structures.  But it is still all
just [1], not [2] or [32].  And e.g. Coverity complains about that a lot.
There is another way around it, using [MAXIMUM_POSSIBLE_COUNT] instead and
then allocating only a subset of those using offsetof to count the size.
But that is undefined in a different way, would probably make Coverity
happy and e.g. for RTL is doable because we have maximum number of operands,
and for many gimple stmts too, except that e.g. GIMPLE_CALLs don't really
have a maximum (well, have it as UINT_MAX - 3 or so).

GCC to my knowledge will treat all the trailing arrays that way, but it is
unclear if other compilers do the same or not.
You can use mem[1] and just use
  size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);

> > Is there any reason why the code is written that way?
> > I mean, we could just use:
> >size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
> 
> We had that originally, but IIRC, the alignment didn't come out right.

That surprises me, because I don't understand how it could (unless irange
didn't have a pointer member at that point).

Jakub



[PATCH][obvious] dbgcnt: report upper limit when lower == upper

2020-10-06 Thread Martin Liška

Hey.

Here's an obvious patch that reports when the upper limit of a debug
counter is reached.

I'm going to install the patch.
Martin

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt): Report also upper limit.
---
 gcc/dbgcnt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index ae98a281d63..01893ce7238 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -79,7 +79,10 @@ dbg_cnt (enum debug_counter index)
 {
   print_limit_reach (map[index].name, v, false);
   if (min == max)
-   limits[index].pop ();
+   {
+ print_limit_reach (map[index].name, v, true);
+ limits[index].pop ();
+   }
   return true;
 }
   else if (v < max)
--
2.28.0



Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Martin Liška

On 10/6/20 10:00 AM, Richard Biener wrote:

On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:


On 10/5/20 6:34 PM, Ian Lance Taylor wrote:

On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:


The previous patch was not correct. This one should be.

Ready for master?


I don't understand why this code uses symtab_indices_shndx at all.
There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
any need for the symtab_indices_shndx vector.


Well, the question is whether we can have multiple .symtab sections in one
ELF file.  Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX
sections.  Note that the original usage of the SHT_SYMTAB_SHNDX section was
motivated by PR81968, which is about Solaris ld.


It wasn't my code but I suppose this way the implementation was
"easiest".  There
should be exactly one symtab / shndx section.  Rainer authored this support.


If we expect at most one SHT_SYMTAB_SHNDX section, then I'm suggesting
an updated version of the patch.  It's what Ian offered.

Thoughts?
Martin





But in any case this patch looks OK.


I also think the patch looks OK.  Rainer?

Richard.


Waiting for a feedback from Richi.

Thanks,
Martin



Thanks.

Ian





>From bb259b4dc2a79ef45d449896d05855122ecc2ef9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 5 Oct 2020 18:03:08 +0200
Subject: [PATCH] lto: fix LTO debug sections copying.

readelf -S prints:

There are 81999 section headers, starting at offset 0x1f488060:

Section Headers:
  [Nr] Name  TypeAddress  OffSize   ES Flg Lk Inf Al
  [ 0]   NULL 00 01404f 00 81998   0  0
  [ 1] .groupGROUP    40 08 04 81995 105027  4
...
  [81995] .symtab   SYMTAB   d5d9298 2db310 18 81997 105026  8
  [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 079dd8 04 81995   0  4
  [81997] .strtab   STRTAB   d92e380 80460c 00  0   0  1
...

Expect at most one .symtab_shndx section.

libiberty/ChangeLog:

PR lto/97290
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Expect only one .symtab_shndx section.
---
 libiberty/simple-object-elf.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 7c9d492f6a4..6dc5c60a842 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1109,7 +1109,7 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
   unsigned new_i;
   unsigned *sh_map;
   unsigned first_shndx = 0;
-  unsigned int *symtab_indices_shndx;
+  unsigned int symtab_shndx = 0;
 
   shdr_size = (ei_class == ELFCLASS32
 	   ? sizeof (Elf32_External_Shdr)
@@ -1151,9 +1151,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
   pfnret = XNEWVEC (int, shnum);
   pfnname = XNEWVEC (const char *, shnum);
 
-  /* Map of symtab to index section.  */
-  symtab_indices_shndx = XCNEWVEC (unsigned int, shnum - 1);
-
   /* First perform the callbacks to know which sections to preserve and
  what name to use for those.  */
   for (i = 1; i < shnum; ++i)
@@ -1188,10 +1185,9 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
  shdr, sh_type, Elf_Word);
   if (sh_type == SHT_SYMTAB_SHNDX)
 	{
-	  unsigned int sh_link;
-	  sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
- shdr, sh_link, Elf_Word);
-	  symtab_indices_shndx[sh_link - 1] = i;
+	  if (symtab_shndx != 0)
+	return "Multiple SYMTAB SECTION INDICES sections";
+	  symtab_shndx = i - 1;
 	  /* Always discard the extended index sections, after
 	 copying it will not be needed.  This way we don't need to
 	 update it and deal with the ordering constraints of
@@ -1323,7 +1319,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  *err = 0;
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return "ELF section name out of range";
 	}
 
@@ -1341,7 +1336,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	{
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return errmsg;
 	}
 
@@ -1362,7 +1356,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  XDELETEVEC (buf);
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return errmsg;
 	}
 
@@ -1372,19 +1365,22 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	{
 	  unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	  shdr, sh_entsize, Elf_Addr);
-	  unsigned strtab = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	 shdr, sh_link, Elf_Word);
 	  size_t prevailing_name_idx = 0;
 	  unsigned char 

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Aldy Hernandez via Gcc-patches

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making it 
public, or making builtin_access a friend of builtin_memref (eeech)?


Aldy



Re: [PATCH] divmod: Match and expand DIVMOD even in some cases of constant divisor [PR97282]

2020-10-06 Thread Christophe Lyon via Gcc-patches
Hi Jakub,

On Tue, 6 Oct 2020 at 10:13, Richard Biener  wrote:
>
> On Tue, 6 Oct 2020, Jakub Jelinek wrote:
>
> > Hi!
> >
> > As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
> > ifn call for division + modulo by constant for the fear that during
> > expansion we could generate better code for those cases.
> > If the divisoris a power of two, that is certainly the case always,
> > but otherwise expand_divmod can punt in many cases, e.g. if the division
> > type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
> > choose_multiplier, because it works on HOST_WIDE_INTs (true, something
> > we should fix eventually now that we have wide_ints), or if pre/post shift
> > is larger than BITS_PER_WORD.
> >
> > So, the following patch recognizes DIVMOD with constant last argument even
> > when it is unclear if expand_divmod will be able to optimize it, and then
> > during DIVMOD expansion if the divisor is constant attempts to expand it as
> > division + modulo and if they actually don't contain any libcalls or
> > division/modulo, they are kept as is, otherwise that sequence is thrown away
> > and divmod optab or libcall is used.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> OK.
>
> Richard.
>
> > 2020-10-06  Jakub Jelinek  
> >
> >   PR rtl-optimization/97282
> >   * tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
> >   constant op2 if it is not a power of two and the type has precision
> >   larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
> >   * internal-fn.c (contains_call_div_mod): New function.
> >   (expand_DIVMOD): If last argument is a constant, try to expand it as
> >   TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
> >   contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
> >   divmod optab or divmod libfunc.
> >

This patch causes ICEs on arm while building newlib or glibc

For instance with newlib when compiling vfwprintf.o:
during RTL pass: expand
In file included from
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdio/vfprintf.c:153:
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdio.h:
In function '_vfprintf_r':
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdio.h:503:9:
internal compiler error: in int_mode_for_mode, at stor-layout.c:404
  503 | int _vfprintf_r (struct _reent *, FILE *__restrict, const
char *__restrict, __VALIST)
  | ^~~
0xaed4e3 int_mode_for_mode(machine_mode)

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/stor-layout.c:404
0x7ff73d emit_move_via_integer
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3425
0x808f2d emit_move_insn_1(rtx_def*, rtx_def*)
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3793
0x8092d7 emit_move_insn(rtx_def*, rtx_def*)
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3935
0x6e703f emit_library_call_value_1(int, rtx_def*, rtx_def*,
libcall_type, machine_mode, int, std::pair*)

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/calls.c:5601
0xdff642 emit_library_call_value
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/rtl.h:4258
0xdff642 arm_expand_divmod_libfunc

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:33256
0x8c69af expand_DIVMOD

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/internal-fn.c:3084
0x7021b7 expand_call_stmt

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:2612
0x7021b7 expand_gimple_stmt_1

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3686
0x7021b7 expand_gimple_stmt

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3851
0x702cfd expand_gimple_basic_block

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5892
0x70533e execute

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6576

Christophe



> >   * gcc.target/i386/pr97282.c: New test.
> >
> > --- gcc/tree-ssa-math-opts.c.jj   2020-10-01 10:40:10.104755999 +0200
> > +++ gcc/tree-ssa-math-opts.c  2020-10-05 13:51:54.476628287 +0200
> > @@ -3567,9 +3567,24 @@ divmod_candidate_p (gassign *stmt)
> >
> >/* Disable the transform if either is a constant, since 
> > division-by-constant
> >   may have specialized expansion.  */
> > -  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
> > +  if (CONSTANT_CLASS_P (op1))
> >  return false;
> >
> > +  if (CONSTANT_CLASS_P (op2))
> > +{
> > +  if (integer_pow2p (op2))
> > + return false;
> > +
> > +  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> > +   && TYPE_PRECISION (type) <= BITS_PER_WORD)
> > + return false;
> > +
> > +  /* If the divisor is not power of 2 and the 

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Alexandre Oliva
On Oct  6, 2020, Richard Biener  wrote:

> OK, I see.  mathfn_built_in expects a type inter-operating with
> the C ABI types (float_type_node, double_type_node, etc.) where
> "inter-operating" means having the same main variant.

Yup.

> Now, I guess for the sincos pass we want to combine sinl + cosl
> to sincosl, independent on the case where the result would be
> assigned to a 'double' when 'double == long double'?

Sorry, I goofed in the patch description and misled you.

When looking at

  _d = sin (_s);

the sincos didn't take the type of _d, but that of _s.

I changed it so that it takes the type not from the actual passed to the
intrinsic, but from the formal in the intrinsic declaration.

If we had conversions of _s to different precisions, the optimization
wouldn't kick in: we'd have different actuals passed to sin and cos.
I'm not sure it makes much sense to try to turn e.g.

  _d1 = sin (_s);
  _t = (float) _s;
  _d2 = cosf (_t);

into:

  sincos (_s, &D1, &T);
  _d1 = D1;
  _td2 = T;
  _d2 = (float) _td2;

If someone goes through the trouble of computing sin and cos for the
same angle at different precisions, you might as well leave it alone.

> Now what about sinl + cos when 'double == long double'?

Now that might make more sense to optimize, but if we're going to do
such transformations, we might as well canonicalize the *l intrinsics to
the equivalent double versions (assuming long double and double have the
same precision), and then sincos will get it right.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


  1   2   >