Re: introduce -fcallgraph-info option
On Fri, 15 Nov 2019, Alexandre Oliva wrote:

> On Nov 14, 2019, Alexandre Oliva wrote:
>
> > %{!c:%{!S:-dumpbase %b}
>
> Uhh, I failed to adjust this one to add the executable output name to
> dumpbase.
>
> Anyway, getting the right semantics out of specs is proving to be a
> lot trickier than I had anticipated.  I'm now pondering a single spec
> function to deal with all of these dumpbase possibilities.
>
> I'm also a little uncertain about the behavior change WRT .dwo files.
> Though their names are built out of the .o files in the objcopy
> commands, they're built from aux_base_name in dwarf2out.  Currently,
> since aux_base_name is derived from the output object file name, this
> ensures they have the same name and directory, but once we enable
> -dumpdir to be specified to override it, that may no longer be the
> case.  Ugh...

Hmm, -dwo-base-name to the rescue? ;)

Well, I guess the debug info has to encode the full/relative path to the
.dwo files somewhere, so all that is needed is to keep that consistent?

Richard.
Re: [Patch] [mid-end][__RTL] Clean df state despite invalid __RTL startwith passes
On Thu, 14 Nov 2019, Matthew Malcomson wrote:

> Hi there,
>
> When compiling an __RTL function that has an invalid "startwith" pass we
> currently don't run the dfinish cleanup pass.  This means we ICE on the
> next function.
>
> This change ensures that all state is cleaned up for the next function
> to run correctly.
>
> As an example, before this change the following code would ICE when
> compiling the function `foo2` because the "peephole2" pass is not run at
> optimisation level -O0.
>
> When compiled with
> ./aarch64-none-linux-gnu-gcc -O0 -S missed-pass-error.c -o test.s
>
> ```
> int __RTL (startwith ("peephole2")) badfoo ()
> {
>   (function "badfoo"
>     (insn-chain
>       (block 2
>         (edge-from entry (flags "FALLTHRU"))
>         (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
>         (cinsn 101 (set (reg:DI x19) (reg:DI x0)))
>         (cinsn 10 (use (reg/i:SI x19)))
>         (edge-to exit (flags "FALLTHRU"))
>       ) ;; block 2
>     ) ;; insn-chain
>   ) ;; function "foo2"
> }
>
> int __RTL (startwith ("final")) foo2 ()
> {
>   (function "foo2"
>     (insn-chain
>       (block 2
>         (edge-from entry (flags "FALLTHRU"))
>         (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
>         (cinsn 101 (set (reg:DI x19) (reg:DI x0)))
>         (cinsn 10 (use (reg/i:SI x19)))
>         (edge-to exit (flags "FALLTHRU"))
>       ) ;; block 2
>     ) ;; insn-chain
>   ) ;; function "foo2"
> }
> ```
>
> Now it silently ignores the __RTL function and successfully compiles foo2.
>
> regtest done on aarch64
> regtest done on x86_64
>
> OK for trunk?

OK.

Richard.

> gcc/ChangeLog:
>
> 2019-11-14  Matthew Malcomson
>
> 	* passes.c (should_skip_pass_p): Always run "dfinish".
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-14  Matthew Malcomson
>
> 	* gcc.dg/rtl/aarch64/missed-pass-error.c: New test.
> ### Attachment also inlined for ease of reply ###
>
> diff --git a/gcc/passes.c b/gcc/passes.c
> index d86af115ecb16fcab6bfce070f1f3e4f1d90ce71..258f85ab4f8a1519b978b75dfa67536d2eacd106 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -2375,7 +2375,8 @@ should_skip_pass_p (opt_pass *pass)
>      return false;
>
>    /* Don't skip df init; later RTL passes need it.  */
> -  if (strstr (pass->name, "dfinit") != NULL)
> +  if (strstr (pass->name, "dfinit") != NULL
> +      || strstr (pass->name, "dfinish") != NULL)
>      return false;
>
>    if (!quiet_flag)
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c b/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c
> new file mode 100644
> index ..2f02ca9d0c40b372d86b24009540e157ed1a8c59
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c
> @@ -0,0 +1,45 @@
> +/* { dg-do compile { target aarch64-*-* } } */
> +/* { dg-additional-options "-O0" } */
> +
> +/*
> +   When compiling __RTL functions the startwith string can be either incorrect
> +   (i.e. not matching a pass) or be unused (i.e. can refer to a pass that is
> +   not run at the current optimisation level).
> +
> +   Here we ensure that the state clean up is still run, so that functions other
> +   than the faulty one can still be compiled.
> + */
> +
> +int __RTL (startwith ("peephole2")) badfoo ()
> +{
> +  (function "badfoo"
> +    (insn-chain
> +      (block 2
> +        (edge-from entry (flags "FALLTHRU"))
> +        (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +        (cinsn 101 (set (reg:DI x19) (reg:DI x0)))
> +        (cinsn 10 (use (reg/i:SI x19)))
> +        (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function "foo2"
> +}
> +
> +/* Compile a valid __RTL function to test state from the "dfinit" pass has been
> +   cleaned with the "dfinish" pass.  */
> +
> +int __RTL (startwith ("final")) foo2 ()
> +{
> +  (function "foo2"
> +    (insn-chain
> +      (block 2
> +        (edge-from entry (flags "FALLTHRU"))
> +        (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +        (cinsn 101 (set (reg:DI x19) (reg:DI x0)))
> +        (cinsn 10 (use (reg/i:SI x19)))
> +        (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function "foo2"
> +}
> +

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures
Hi Thomas,

On 11/11/19 10:55 PM, Thomas König wrote:
> the attached patch loads scalar INTENT(IN) variables into a local
> variable at the start of a procedure, as suggested in PR 67202, in
> order to aid optimization.
>
> This is controlled by front-end optimization so it is easier to catch
> if any bugs should turn up :-)

+  if (f->sym == NULL || f->sym->attr.dimension || f->sym->attr.allocatable
+      || f->sym->attr.optional || f->sym->attr.pointer
+      || f->sym->attr.codimension || f->sym->attr.value
+      || f->sym->attr.proc_pointer || f->sym->attr.target
+      || f->sym->attr.asynchronous
+      || f->sym->ts.type == BT_CHARACTER || f->sym->ts.type == BT_DERIVED
+      || f->sym->ts.type == BT_CLASS)
+    continue;

I think you need to add at least VOLATILE to this list.  Otherwise, I
have not thought much about corner cases, nor have I studied the patch,
sorry.

Cheers,

Tobias
Re: introduce -fcallgraph-info option
On Nov 14, 2019, Alexandre Oliva wrote:

> %{!c:%{!S:-dumpbase %b}

Uhh, I failed to adjust this one to add the executable output name to
dumpbase.

Anyway, getting the right semantics out of specs is proving to be a lot
trickier than I had anticipated.  I'm now pondering a single spec
function to deal with all of these dumpbase possibilities.

I'm also a little uncertain about the behavior change WRT .dwo files.
Though their names are built out of the .o files in the objcopy
commands, they're built from aux_base_name in dwarf2out.  Currently,
since aux_base_name is derived from the output object file name, this
ensures they have the same name and directory, but once we enable
-dumpdir to be specified to override it, that may no longer be the case.
Ugh...

-- 
Alexandre Oliva, freedom fighter       he/him       https://FSFLA.org/blogs/lxo
Free Software Evangelist               Stallman was right, but he's left :(
GNU Toolchain Engineer          FSMatrix: It was he who freed the first of us
FSF & FSFLA board member             The Savior shall return (true);
Re: [PATCH] PR92398: Fix testcase failure of pr72804.c
On 2019/11/15 11:12, Xiong Hu Luo wrote:
> The instructions generated for P9LE are not worse than for P8LE:
> mtvsrdd;xxlnot;stxv vs. not;not;std;std.  Update the test case to fix
> the failures.
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-15  Luo Xiong Hu
>
> 	testsuite/pr92398
> 	* gcc.target/powerpc/pr72804.h: New.
> 	* gcc.target/powerpc/pr72804.p8.c: New.
> 	* gcc.target/powerpc/pr72804.c: Rename to ...
> 	* gcc.target/powerpc/pr72804.p9.c: ... this one.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr72804.h    | 17 ++
>  gcc/testsuite/gcc.target/powerpc/pr72804.p8.c | 16 ++
>  .../powerpc/{pr72804.c => pr72804.p9.c}       | 22 ++-
>  3 files changed, 40 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr72804.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
>  rename gcc/testsuite/gcc.target/powerpc/{pr72804.c => pr72804.p9.c} (59%)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.h b/gcc/testsuite/gcc.target/powerpc/pr72804.h
> new file mode 100644
> index 000..8a5ea93cc17
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.h
> @@ -0,0 +1,17 @@
> +/* This test code is included into pr72804.p8.c and pr72804.p9.c
> +   The two files have the tests for the number of instructions generated for
> +   P8LE versus P9LE.  */
> +
> +__int128_t
> +foo (__int128_t *src)
> +{
> +  return ~*src;
> +}
> +
> +void
> +bar (__int128_t *dst, __int128_t src)
> +{
> +  *dst = ~src;
> +}
> +
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c b/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
> new file mode 100644
> index 000..ad968769aae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -mdejagnu-cpu=power8" } */
> +
> +/* { dg-final { scan-assembler-times "not " 4 {xfail be} } } */
> +/* { dg-final { scan-assembler-times "std " 2 {xfail be} } } */
> +/* { dg-final { scan-assembler-times "ld " 2 } } */
> +/* { dg-final { scan-assembler-not "lxvd2x" } } */
> +/* { dg-final { scan-assembler-not "stxvd2x" } } */
> +/* { dg-final { scan-assembler-not "xxpermdi" } } */

Updated to this after testing it on P8BE:

-/* { dg-final { scan-assembler-not "stxvd2x" } } */
-/* { dg-final { scan-assembler-not "xxpermdi" } } */
+/* { dg-final { scan-assembler-not "stxvd2x" {xfail be} } } */
+/* { dg-final { scan-assembler-not "xxpermdi" {xfail be} } } */

> +/* { dg-final { scan-assembler-not "mfvsrd" } } */
> +/* { dg-final { scan-assembler-not "mfvsrd" } } */
> +
> +/* Source code for the test in pr72804.h */
> +#include "pr72804.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.c b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
> similarity index 59%
> rename from gcc/testsuite/gcc.target/powerpc/pr72804.c
> rename to gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
> index 10e37caed6b..2059d7df1a2 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
> @@ -1,25 +1,17 @@
>  /* { dg-do compile { target { lp64 } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx -fno-inline-functions --param max-inline-insns-single-O2=200" } */
> +/* { dg-options "-O2 -mvsx -mdejagnu-cpu=power9" } */
> -__int128_t
> -foo (__int128_t *src)
> -{
> -  return ~*src;
> -}
> -
> -void
> -bar (__int128_t *dst, __int128_t src)
> -{
> -  *dst = ~src;
> -}
> -
> -/* { dg-final { scan-assembler-times "not " 4 } } */
> -/* { dg-final { scan-assembler-times "std " 2 } } */
> +/* { dg-final { scan-assembler-times "not " 2 } } */
> +/* { dg-final { scan-assembler-times "std " 0 } } */
>  /* { dg-final { scan-assembler-times "ld " 2 } } */
>  /* { dg-final { scan-assembler-not "lxvd2x" } } */
>  /* { dg-final { scan-assembler-not "stxvd2x" } } */
>  /* { dg-final { scan-assembler-not "xxpermdi" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */
>  /* { dg-final { scan-assembler-not "mfvsrd" } } */
> +
> +/* Source code for the test in pr72804.h */
> +#include "pr72804.h"
> +
Go patch committed: Fix inlining of sink names
This Go frontend patch by Than McIntosh fixes inlining of sink names.

When the compiler writes an inlinable function to the export data,
parameter names are written out (in Export::write_name) using the
Gogo::message_name as opposed to a raw/encoded name.  This means that
sink parameters (those named "_") get created with the name "_" instead
of "._" (the name created by the lexer/parser).  This confuses
Gogo::is_sink_name, which looks for the latter sequence and not just
"_".  This can cause issues later on if an inlinable function is
imported and fed through the rest of the compiler (things that are sinks
are not recognized as such).

To fix these issues, change Gogo::is_sink_name to return true for either
variant ("_" or "._").

This fixes https://golang.org/issue/35586.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE	(revision 277299)
+++ gcc/go/gofrontend/MERGE	(working copy)
@@ -1,4 +1,4 @@
-1e2d98b27701744cf0ec57b19d7fc8f594184b9a
+2d0504236c7236345ee17a0cb43a3bb9ce3acf7f
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h	(revision 277299)
+++ gcc/go/gofrontend/gogo.h	(working copy)
@@ -222,7 +222,9 @@ class Gogo
   {
     return (name[0] == '.'
 	    && name[name.length() - 1] == '_'
-	    && name[name.length() - 2] == '.');
+	    && name[name.length() - 2] == '.')
+        || (name[0] == '_'
+            && name.length() == 1);
   }

   // Helper used when adding parameters (including receiver param) to the
[PATCH] PR92398: Fix testcase failure of pr72804.c
The instructions generated for P9LE are not worse than for P8LE:
mtvsrdd;xxlnot;stxv vs. not;not;std;std.  Update the test case to fix
the failures.

gcc/testsuite/ChangeLog:

2019-11-15  Luo Xiong Hu

	testsuite/pr92398
	* gcc.target/powerpc/pr72804.h: New.
	* gcc.target/powerpc/pr72804.p8.c: New.
	* gcc.target/powerpc/pr72804.c: Rename to ...
	* gcc.target/powerpc/pr72804.p9.c: ... this one.
---
 gcc/testsuite/gcc.target/powerpc/pr72804.h    | 17 ++
 gcc/testsuite/gcc.target/powerpc/pr72804.p8.c | 16 ++
 .../powerpc/{pr72804.c => pr72804.p9.c}       | 22 ++-
 3 files changed, 40 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr72804.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
 rename gcc/testsuite/gcc.target/powerpc/{pr72804.c => pr72804.p9.c} (59%)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.h b/gcc/testsuite/gcc.target/powerpc/pr72804.h
new file mode 100644
index 000..8a5ea93cc17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr72804.h
@@ -0,0 +1,17 @@
+/* This test code is included into pr72804.p8.c and pr72804.p9.c
+   The two files have the tests for the number of instructions generated for
+   P8LE versus P9LE.  */
+
+__int128_t
+foo (__int128_t *src)
+{
+  return ~*src;
+}
+
+void
+bar (__int128_t *dst, __int128_t src)
+{
+  *dst = ~src;
+}
+
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c b/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
new file mode 100644
index 000..ad968769aae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p8.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -mdejagnu-cpu=power8" } */
+
+/* { dg-final { scan-assembler-times "not " 4 {xfail be} } } */
+/* { dg-final { scan-assembler-times "std " 2 {xfail be} } } */
+/* { dg-final { scan-assembler-times "ld " 2 } } */
+/* { dg-final { scan-assembler-not "lxvd2x" } } */
+/* { dg-final { scan-assembler-not "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+
+/* Source code for the test in pr72804.h */
+#include "pr72804.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.c b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
similarity index 59%
rename from gcc/testsuite/gcc.target/powerpc/pr72804.c
rename to gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
index 10e37caed6b..2059d7df1a2 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr72804.p9.c
@@ -1,25 +1,17 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx -fno-inline-functions --param max-inline-insns-single-O2=200" } */
+/* { dg-options "-O2 -mvsx -mdejagnu-cpu=power9" } */
-__int128_t
-foo (__int128_t *src)
-{
-  return ~*src;
-}
-
-void
-bar (__int128_t *dst, __int128_t src)
-{
-  *dst = ~src;
-}
-
-/* { dg-final { scan-assembler-times "not " 4 } } */
-/* { dg-final { scan-assembler-times "std " 2 } } */
+/* { dg-final { scan-assembler-times "not " 2 } } */
+/* { dg-final { scan-assembler-times "std " 0 } } */
 /* { dg-final { scan-assembler-times "ld " 2 } } */
 /* { dg-final { scan-assembler-not "lxvd2x" } } */
 /* { dg-final { scan-assembler-not "stxvd2x" } } */
 /* { dg-final { scan-assembler-not "xxpermdi" } } */
 /* { dg-final { scan-assembler-not "mfvsrd" } } */
 /* { dg-final { scan-assembler-not "mfvsrd" } } */
+
+/* Source code for the test in pr72804.h */
+#include "pr72804.h"
+

-- 
2.21.0.777.g83232e3864
Re: [PATCH] Add support for C++2a stop_token
Tested x86_64-pc-linux-gnu, committed to trunk.

Jonathan Wakely writes:

> On 13/11/19 17:59 -0800, Thomas Rodgers wrote:
>> +/** @file include/stop_token
>> + *  This is a Standard C++ Library header.
>> + */
>> +
>> +#ifndef _GLIBCXX_STOP_TOKEN
>> +#define _GLIBCXX_STOP_TOKEN
>> +
>> +#if __cplusplus >= 201703L
>
> This should be `>`, not `>=`.
>
> OK for trunk with that change.
Improve checks on C2x fallthrough attribute
When adding C2x attribute support, some [[fallthrough]] support appeared
as a side effect because code for that attribute goes through separate
paths from the normal attribute handling.  However, going through those
paths without the normal attribute handlers meant that certain checks,
such as for the invalid usage [[fallthrough()]], did not operate.

This patch improves the checks by adding this attribute to the standard
attribute table, so that the parser knows it expects no arguments, and
by adding an explicit check for "[[fallthrough]];" attribute-declarations
at top level.  As with other attributes, there are still cases where
warnings should be pedwarns because C2x constraints are violated, but
this patch improves the attribute handling.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to
mainline.

gcc/c:
2019-11-15  Joseph Myers

	* c-decl.c (std_attribute_table): Add fallthrough.
	* c-parser.c (c_parser_declaration_or_fndef): Diagnose fallthrough
	attribute at top level.

gcc/c-family:
2019-11-15  Joseph Myers

	* c-attribs.c (handle_fallthrough_attribute): Remove static.
	* c-common.h (handle_fallthrough_attribute): Declare.

gcc/testsuite:
2019-11-15  Joseph Myers

	* gcc.dg/c2x-attr-fallthrough-2.c, gcc.dg/c2x-attr-fallthrough-3.c:
	New tests.
Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c	(revision 278268)
+++ gcc/c/c-decl.c	(working copy)
@@ -4343,6 +4343,8 @@ const struct attribute_spec std_attribute_table[]
      affects_type_identity, handler, exclude } */
   { "deprecated", 0, 1, false, false, false, false,
     handle_deprecated_attribute, NULL },
+  { "fallthrough", 0, 0, false, false, false, false,
+    handle_fallthrough_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 278268)
+++ gcc/c/c-parser.c	(working copy)
@@ -1927,9 +1927,15 @@ c_parser_declaration_or_fndef (c_parser *parser, b
 	{
 	  if (fallthru_attr_p != NULL)
 	    *fallthru_attr_p = true;
-	  tree fn = build_call_expr_internal_loc (here, IFN_FALLTHROUGH,
-						  void_type_node, 0);
-	  add_stmt (fn);
+	  if (nested)
+	    {
+	      tree fn = build_call_expr_internal_loc (here, IFN_FALLTHROUGH,
+						      void_type_node, 0);
+	      add_stmt (fn);
+	    }
+	  else
+	    pedwarn (here, OPT_Wattributes,
+		     "%<fallthrough%> attribute at top level");
 	}
       else if (empty_ok && !(have_attrs && specs->non_std_attrs_seen_p))
Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 278268)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -144,7 +144,6 @@ static tree handle_simd_attribute (tree *, tree, t
 static tree handle_omp_declare_target_attribute (tree *, tree, tree, int,
 						 bool *);
 static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
-static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
 static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
 						       int, bool *);
 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
@@ -4114,7 +4113,7 @@ handle_designated_init_attribute (tree *node, tree
 /* Handle a "fallthrough" attribute; arguments as in
    struct attribute_spec.handler.  */
 
-static tree
+tree
 handle_fallthrough_attribute (tree *, tree name, tree, int,
 			      bool *no_add_attrs)
 {
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 278268)
+++ gcc/c-family/c-common.h	(working copy)
@@ -1359,6 +1359,7 @@ extern void warn_for_multistatement_macros (locati
 extern bool attribute_takes_identifier_p (const_tree);
 extern tree handle_deprecated_attribute (tree *, tree, tree, int, bool *);
 extern tree handle_unused_attribute (tree *, tree, tree, int, bool *);
+extern tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
 extern int parse_tm_stmt_attr (tree, int);
 extern int tm_attr_to_mask (tree);
 extern tree tm_mask_to_attr (int);
Index: gcc/testsuite/gcc.dg/c2x-attr-fallthrough-2.c
===
--- gcc/testsuite/gcc.dg/c2x-attr-fallthrough-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/c2x-attr-fallthrough-2.c	(working copy)
@@ -0,0 +1,35 @@
+/* Test C2x attribute syntax.  Invalid use of fallthrough attribute.  */
+/* { dg-do compile } */
+/* { dg-options "-s
Re: introduce -fcallgraph-info option
On Nov 8, 2019, Eric Gallager wrote:

> If you're touching the -auxbase option... is that related to the other
> options starting with -aux at all?

'fraid they're entirely unrelated.  We're talking about how the compiler
names aux and dump output files, which is not related to the contents of
an explicitly-named output file, as in the PR you mentioned.
Re: introduce -fcallgraph-info option
On Nov 8, 2019, Richard Biener wrote:

> Wow, thanks for the elaborate write-up!  I wonder if we can
> cut&paste this into documentation somewhere appropriate, maybe
> there's already a section for "auxiliary compiler outputs".

Sure, that makes sense.

>> I'm a little hesitant, this amounts to quite significant behavior
>> changes.  Do these seem acceptable and desirable nevertheless?

> I think the current state is somewhat of a mess and in some
> cases confusing and your suggestion sounds like an overall
> improvement to me (you didn't actually suggest to remove
> either of the -dump{base,dir} -auxbase{-strip} options?)

I was trying to narrow down the desired behavior before trying to figure
out what options we could do away with.  If what I proposed was
acceptable, I thought we could drop the internal -auxbase* options
altogether.

However, I missed one relevant case in my analysis.  I suggested the
auxbase internally derived from dumpbase would drop the dumpbase
extension iff the extension matched that of the input file name.  That
doesn't work when compilation takes an intermediate file rather than the
input, e.g., in a -save-temps compilation, in which we'll have separate
preprocessing and the actual compiler will take the saved preprocessed
input, but should still output dumps to files named after the .c input.
E.g.:

  $CC -S srcdir/foo.c -o objdir/foo.s -save-temps
  -> objdir/foo.i objdir/foo.s objdir/foo.su objdir/foo.c.#t.original

The compilation line would only take the .c from -dumpbase, but since
its input is .i, it wouldn't match, so we wouldn't strip the .c from aux
outputs and would instead generate:

  -> objdir/foo.i objdir/foo.s objdir/foo.c.su objdir/foo.c.#t.original
                                          ^^

(which would likely be ok for .su, but quite unexpected for .dwo)

In order to address this, I propose we add an internal option (not for
the driver), -dumpbase-ext, that names the extension to be discarded
from dumpbase to form aux output names:

  -dumpdir objdir -dumpbase foo.c -dumpbase-ext .c

The new -dumpbase-ext option specifies the extension to drop from the
specified -dumpbase to form aux output names, while dump output names
keep that intermediate extension.  When absent, we take it from the main
input file.  So aux outputs end up as objdir/foo.* whereas dump outputs
end up as objdir/foo.c.*, just as expected.

We could keep -dumpbase-ext an internal option, used only when doing
separate preprocessing, but it might make sense to expose it for use
along with -dumpbase for such tools as ccache and distcc, that call the
compiler driver with internal .i files, but would still prefer dumps and
aux files to be generated just as they would have been for the .c files.

Specs would change from:

  %{!dumpbase:-dumpbase %B}
  %{c|S:%{o*:-auxbase-strip %*} %{!o*:-auxbase %b}}}
  %{!c:%{!S:-auxbase %b}

to:

  %{!dumpdir:%{o*:-dumpdir %:dirname(%*)}}
  %{c|S:%{!dumpbase:%{o*:-dumpbase %:replace-extension(%:basename(%*) %:extension(%i))}
    %{!o*:-dumpbase %b}}}
  %{!c:%{!S:-dumpbase %b}

and add to separate preprocessing commands:

  %{!dumpbase-ext:-dumpbase-ext %:extension(%i)}

Then we'd set up aux_base_name from dump_base_name minus the extension,
given or taken from main_input_filename.
[PATCH], V8, #6 of 6, Testsuite: Test -fstack-protector-strong works with prefixed addressing
This patch checks whether the -fstack-protector-strong option works with
a large stack frame on -mcpu=future systems where prefixed instructions
are generated.  Can I check this into the FSF trunk?

2019-11-14  Michael Meissner

	* gcc.target/powerpc/prefix-stack-protect.c: New test to make sure
	-fstack-protector-strong works with prefixed addressing.

--- /tmp/byVdyb_prefix-stack-protect.c	2019-11-13 17:45:35.374176204 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c	2019-11-13 17:45:35.143174125 -0500
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */
+
+/* Test that we can handle large stack frames with -fstack-protector-strong and
+   prefixed addressing.  */
+
+extern long foo (char *);
+
+long
+bar (void)
+{
+  char buffer[0x2];
+  return foo (buffer) + 1;
+}
+
+/* { dg-final { scan-assembler {\mpld\M} } } */
+/* { dg-final { scan-assembler {\mpstd\M} } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH], V8, #5 of 6, Testsuite: Test PC-relative load/store instructions
This patch adds tests for using PC-relative addressing on the 'future'
system.  Can I check this patch into the FSF trunk after the patch in
the V7 series that enables PC-relative addressing by default on 64-bit
Linux systems has been committed?

2019-11-14  Michael Meissner

	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h: New set of tests
	to test prefixed addressing on 'future' system with PC-relative
	tests.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c: New test.

--- /tmp/79Y8V6_prefix-pcrel-dd.c	2019-11-13 17:43:34.462087329 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c	2019-11-13 17:43:34.183084816 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for SImode.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
--- /tmp/st8ftv_prefix-pcrel-df.c	2019-11-13 17:43:34.472087419 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c	2019-11-13 17:43:34.188084861 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DFmode.  */
+
+#define TYPE double
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
--- /tmp/Wo2P1T_prefix-pcrel-di.c	2019-11-13 17:43:34.479087482 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c	2019-11-13 17:43:34.194084915 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DImode.  */
+
+#define TYPE long
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
--- /tmp/KmOSBi_prefix-pcrel-hi.c	2019-11-13 17:43:34.487087554 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c	2019-11-13 17:43:34.199084960 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for HImode.  */
+
+#define TYPE short
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M} 2 } } */
--- /tmp/BalpdH_prefix-pcrel-kf.c	2019-11-13 17:43:34.494087617 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c	2019-11-13 17:43:34.205085014 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for KFmode.  */
+
+#define TYPE __float128
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
--- /tmp/FMdpQ5_prefix-pcrel-qi.c	2019-11-13 17:43:34.502087689 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c	2019-11-13 17:43:34.210085059 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for QImode.  */
+
+#define TYPE signed char
+
+#include "p
[PATCH], V8, #4 of 6, Testsuite: Test for prefixed instructions with large offsets
This patch tests whether using large numeric offsets causes prefixed loads or stores to be generated. Can I check this patch into the FSF trunk?

2019-11-14  Michael Meissner

	* gcc/testsuite/gcc.target/powerpc/prefix-large.h: New set of
	tests to test prefixed addressing on 'future' system with large
	numeric offsets.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-df.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-di.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-si.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c: New test.

--- /tmp/RMaUEu_prefix-large-dd.c	2019-11-13 17:42:31.960524470 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c	2019-11-13 17:42:31.719522299 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
--- /tmp/ASyj4G_prefix-large-df.c	2019-11-13 17:42:31.968524542 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c	2019-11-13 17:42:31.725522354 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE double
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
--- /tmp/uCv6uT_prefix-large-di.c	2019-11-13 17:42:31.975524605 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c	2019-11-13 17:42:31.730522399 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE long
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
--- /tmp/M6slX5_prefix-large-hi.c	2019-11-13 17:42:31.983524677 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c	2019-11-13 17:42:31.735522443 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE short
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M} 2 } } */
--- /tmp/iEQZqi_prefix-large-kf.c	2019-11-13 17:42:31.990524740 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c	2019-11-13 17:42:31.740522489 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE __float128
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
--- /tmp/01w3Vu_prefix-large-qi.c	2019-11-13 17:42:31.997524803 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c	2019-11-13 17:42:31.745522534 -0500
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.
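The prefix-large.h header itself is not included in this excerpt. As a rough illustration of what these tests exercise (a hypothetical sketch, not the patch's actual header): a field whose offset exceeds the 16-bit signed displacement of the traditional D/DS/DQ instruction forms needs either an indexed access or, on -mcpu=future, a single prefixed load/store with a 34-bit offset.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch (not the actual prefix-large.h): the offset of
   'field' is 128 KiB, far outside the 16-bit signed displacement range,
   so on -mcpu=future the accesses below can use one prefixed
   instruction (e.g. PLD/PSTD for TYPE long) instead of materializing
   the offset in a GPR first.  */
#ifndef TYPE
#define TYPE long
#endif

struct big {
  char pad[0x20000];		/* push 'field' beyond 32 KiB */
  TYPE field;
};

TYPE
load_large_offset (struct big *p)
{
  return p->field;
}

void
store_large_offset (struct big *p, TYPE v)
{
  p->field = v;
}
```

The function and struct names here are made up for illustration; the code itself is plain C and compiles on any target.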
[PATCH], V8, #3 of 6, Testsuite: Ensure no prefixed instruction uses update addressing
The prefixed instructions do not support the update form of the memory instruction (i.e. internally this is addressed using PRE_INC, PRE_DEC, or PRE_MODIFY). Can I check this into the FSF trunk?

2019-11-14  Michael Meissner

	* gcc.target/powerpc/prefix-premodify.c: New test to make sure we
	do not generate PRE_INC, PRE_DEC, or PRE_MODIFY on prefixed loads
	or stores.

--- /tmp/LMc94y_prefix-premodify.c	2019-11-13 17:41:36.037020850 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-premodify.c	2019-11-13 17:41:35.807018779 -0500
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't try to generate a prefixed form of the load and
+   store with update instructions.  */
+
+#ifndef SIZE
+#define SIZE 5
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mpli\M|\mpla\M|\mpaddi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mp?lwzu\M} } } */
+/* { dg-final { scan-assembler-not {\mp?stwzu\M} } } */
+/* { dg-final { scan-assembler-not {\maddis\M} } } */
+/* { dg-final { scan-assembler-not {\maddi\M}} } */
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH], V8, #2 of 6, Testsuite: Test illegal DS/DQ offsets become prefixed insns
This test tests whether traditional DS and DQ instructions that require the bottom 2 bits of the offset to be zero (DS) or the bottom 4 bits of the offset to be zero (DQ) properly generate the prefixed form of the instruction instead of loading the offset into a GPR and doing an indexed memory operation. Can I check this into the FSF trunk?

2019-11-14  Michael Meissner

	* gcc.target/powerpc/prefix-odd-memory.c: New test to make sure
	prefixed instructions are generated if an offset would not be
	legal for the non-prefixed DS/DQ instructions.

--- /tmp/Clb8P3_prefix-odd-memory.c	2019-11-13 17:40:31.750441916 -0500
+++ gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	2019-11-13 17:40:31.568440277 -0500
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we can generate a prefixed load/store operation for addresses
+   that don't meet DS/DQ alignment constraints.  */
+
+unsigned long
+load_uc_odd (unsigned char *p)
+{
+  return p[1];				/* should generate LBZ.  */
+}
+
+long
+load_sc_odd (signed char *p)
+{
+  return p[1];				/* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_odd (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);	/* should generate LHZ.  */
+}
+
+long
+load_ss_odd (unsigned char *p)
+{
+  return *(short *)(p + 1);		/* should generate LHA.  */
+}
+
+unsigned long
+load_ui_odd (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1);	/* should generate LWZ.  */
+}
+
+long
+load_si_odd (unsigned char *p)
+{
+  return *(int *)(p + 1);		/* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_odd (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);	/* should generate PLD.  */
+}
+
+long
+load_sl_odd (unsigned char *p)
+{
+  return *(long *)(p + 1);		/* should generate PLD.  */
+}
+
+float
+load_float_odd (unsigned char *p)
+{
+  return *(float *)(p + 1);		/* should generate LFS.  */
+}
+
+double
+load_double_odd (unsigned char *p)
+{
+  return *(double *)(p + 1);		/* should generate LFD.  */
+}
+
+__ieee128
+load_ieee128_odd (unsigned char *p)
+{
+  return *(__ieee128 *)(p + 1);		/* should generate PLXV.  */
+}
+
+void
+store_uc_odd (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;				/* should generate STB.  */
+}
+
+void
+store_sc_odd (signed char sc, signed char *p)
+{
+  p[1] = sc;				/* should generate STB.  */
+}
+
+void
+store_us_odd (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us;	/* should generate STH.  */
+}
+
+void
+store_ss_odd (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;	/* should generate STH.  */
+}
+
+void
+store_ui_odd (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;	/* should generate STW.  */
+}
+
+void
+store_si_odd (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si;		/* should generate STW.  */
+}
+
+void
+store_ul_odd (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;	/* should generate PSTD.  */
+}
+
+void
+store_sl_odd (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;		/* should generate PSTD.  */
+}
+
+void
+store_float_odd (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;		/* should generate STF.  */
+}
+
+void
+store_double_odd (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;		/* should generate STD.  */
+}
+
+void
+store_ieee128_odd (__ieee128 ieee, unsigned char *p)
+{
+  *(__ieee128 *)(p + 1) = ieee;		/* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\msth\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M} 2 } } */
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
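The DS/DQ constraint that the patch above tests can be stated as a pair of small predicates. This is a sketch for illustration only; the helper names are made up, and the actual legality checks in GCC live in the rs6000 backend.

```c
#include <assert.h>
#include <stdbool.h>

/* DS-form instructions (e.g. LD, STD) encode a 16-bit offset whose
   bottom 2 bits must be zero, and DQ-form instructions (e.g. LXV, STXV)
   require the bottom 4 bits to be zero.  An offset that fails the check
   must instead use an indexed form with the offset in a GPR, or a
   prefixed instruction on -mcpu=future.  */
static bool
ds_offset_ok (long offset)
{
  return (offset & 0x3) == 0;
}

static bool
dq_offset_ok (long offset)
{
  return (offset & 0xf) == 0;
}
```

For example, an offset of 8 is valid for a DS-form load but not for a DQ-form load, so a vector access at that offset needs the prefixed (or indexed) form.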
[PATCH], V8, #1 of 6, Testsuite: Add PADDI tests
This patch adds three tests that test whether PLI (PADDI) is generated to load up DImode constants, to load up SImode constants, and whether PADDI is generated to add 34-bit constants. Once the appropriate patches to generate this code have been checked in (V7, #1-3), can I check these new tests into the FSF trunk?

2019-11-14  Michael Meissner

	* gcc.target/powerpc/paddi-1.c: New test to test using PLI to load
	up a large DImode constant.
	* gcc.target/powerpc/paddi-2.c: New test to test using PLI to load
	up a large SImode constant.
	* gcc.target/powerpc/paddi-3.c: New test to test using PADDI to
	add a large DImode constant.

--- /tmp/s2UNQW_paddi-1.c	2019-11-13 17:39:21.274807246 -0500
+++ gcc/testsuite/gcc.target/powerpc/paddi-1.c	2019-11-13 17:39:21.067805382 -0500
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
--- /tmp/T53ePo_paddi-2.c	2019-11-13 17:39:21.283807328 -0500
+++ gcc/testsuite/gcc.target/powerpc/paddi-2.c	2019-11-13 17:39:21.069805400 -0500
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
--- /tmp/gyt7OQ_paddi-3.c	2019-11-13 17:39:21.291807400 -0500
+++ gcc/testsuite/gcc.target/powerpc/paddi-3.c	2019-11-13 17:39:21.071805418 -0500
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
PowerPC V8 testsuite patches
After the V7 patches have been processed, these patches add various tests for new features with -mcpu=future.

If we don't apply the last patch of V7 (that sets the defaults for 64-bit Linux to enable both prefixed and PC-relative addressing when -mcpu=future is used), some of these patches may need to be modified to have the appropriate -mpcrel, etc. switches. I would prefer to get the default changed, but if there are reasons why we can't do that patch, I would prefer to get these patches in ASAP.

There are 6 patches in this set.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Support C2x [[deprecated]] attribute
This patch adds support for the C2x [[deprecated]] attribute. All the actual logic for generating warnings can be identical to the GNU __attribute__ ((deprecated)), as can the attribute handler, so this is just a matter of wiring things up appropriately and adding the checks specified in the standard. Unlike for C++, this patch gives "deprecated" an entry in a table of standard attributes rather than remapping it internally to the GNU attribute, as that seems a cleaner approach to me. Specifically, the only form of arguments to the attribute permitted in the standard is (string-literal); empty parentheses are not permitted in the case of no arguments, and a string literal (which includes concatenated adjacent string literals, because concatenation is an earlier phase of translation) cannot have further redundant parentheses around it. For the case of empty parentheses, this patch makes the C parser disallow them for all known attributes using the [[]] syntax, as done for C++. For string literals (where the C++ front end is missing the check to avoid redundant parentheses, 92521 filed for that issue), a special case is inserted in the C parser. A known issue that I think can be addressed later as a bug fix is that the warnings for the attribute being ignored in certain cases (attribute declarations, statements, most uses on types) ought to be pedwarns, as those usages are constraint violations. Bad handling of wide string literals with this attribute is also a pre-existing bug (91182 - although that's filed as a C++ bug, the code in question is language-independent, in tree.c). Bootstrapped with no regressions on x86_64-pc-linux-gnu. Applied to mainline. gcc/c: 2019-11-15 Joseph Myers * c-decl.c (std_attribute_table): New. (c_init_decl_processing): Register attributes from std_attribute_table. * c-parser.c (c_parser_attribute_arguments): Add arguments require_string and allow_empty_args. All callers changed. 
(c_parser_std_attribute): Set require_string argument for "deprecated" attribute. gcc/c-family: 2019-11-15 Joseph Myers * c-attribs.c (handle_deprecated_attribute): Remove static. * c-common.h (handle_deprecated_attribute): Declare. gcc/testsuite: 2019-11-15 Joseph Myers * gcc.dg/c2x-attr-deprecated-1.c, gcc.dg/c2x-attr-deprecated-2.c, gcc.dg/c2x-attr-deprecated-3.c: New tests. Index: gcc/c/c-decl.c === --- gcc/c/c-decl.c (revision 278265) +++ gcc/c/c-decl.c (working copy) @@ -4336,6 +4336,16 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuz } +/* Table of supported standard (C2x) attributes. */ +const struct attribute_spec std_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "deprecated", 0, 1, false, false, false, false, +handle_deprecated_attribute, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + /* Create the predefined scalar types of C, and some nodes representing standard constants (0, 1, (void *) 0). Initialize the global scope. @@ -4349,6 +4359,8 @@ c_init_decl_processing (void) /* Initialize reserved words for parser. */ c_parse_init (); + register_scoped_attributes (std_attribute_table, NULL); + current_function_decl = NULL_TREE; gcc_obstack_init (&parser_obstack); Index: gcc/c/c-parser.c === --- gcc/c/c-parser.c(revision 278265) +++ gcc/c/c-parser.c(working copy) @@ -4478,7 +4478,8 @@ c_parser_gnu_attribute_any_word (c_parser *parser) allow identifiers declared as types to start the arguments? 
*/ static tree -c_parser_attribute_arguments (c_parser *parser, bool takes_identifier) +c_parser_attribute_arguments (c_parser *parser, bool takes_identifier, + bool require_string, bool allow_empty_args) { vec *expr_list; tree attr_args; @@ -4518,7 +4519,21 @@ static tree else { if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN)) - attr_args = NULL_TREE; + { + if (!allow_empty_args) + error_at (c_parser_peek_token (parser)->location, + "parentheses must be omitted if " + "attribute argument list is empty"); + attr_args = NULL_TREE; + } + else if (require_string) + { + /* The only valid argument for this attribute is a string +literal. Handle this specially here to avoid accepting +string literals with excess parentheses. */ + tree string = c_parser_string_literal (parser, false, true).value; + attr_args = build_tree_list (NULL_TREE, string); + } else { expr_list = c_parser_expr_list (parser, false, true, @@ -4601,7 +4616,8 @@ c_parser_gnu_attribute (c_pa
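As a small illustration of the semantics being wired up here (using the GNU __attribute__ spelling so it compiles under any -std level; with -std=c2x the standard [[deprecated("...")]] spelling behaves the same way, except that empty parentheses are rejected; the function names are made up):

```c
#include <assert.h>

/* Deprecation only affects diagnostics: a call to old_add still
   compiles (with a -Wdeprecated-declarations warning quoting the
   message) and behaves normally at run time.  */
__attribute__ ((deprecated ("use new_add instead")))
static int old_add (int a, int b) { return a + b; }

static int new_add (int a, int b) { return a + b; }
```

The string argument must be a single string literal; per the patch above, the C2x form rejects both empty parentheses and redundant parentheses around the literal.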
Re: Use known value ranges while evaluating ipa predicates
On Tue, Nov 12, 2019 at 02:03:32PM +0100, Jan Hubicka wrote: > this implements use of value ranges in ipa-predicates so inliner know > when some tests are going to be removed (especially NULL pointer > checks). > > Bootstrapped/regtested x86_64-linux. Martin, I would apprechiate if you > look on the patch. > * gcc.dg/ipa/inline-9.c: New testcase. The testcase is now UNRESOLVED on both x86_64-linux and i686-linux: > --- testsuite/gcc.dg/ipa/inline-9.c (nonexistent) > +++ testsuite/gcc.dg/ipa/inline-9.c (working copy) > @@ -0,0 +1,21 @@ > +/* { dg-options "-Os -fdump-ipa-inline" } */ > +int test(int a) > +{ > + if (a>100) > + { > + foo(); > + foo(); > + foo(); > + foo(); > + foo(); > + foo(); > + foo(); > + foo(); > + } > +} > +main() > +{ > + for (int i=0;i<100;i++) > +test(i); > +} > +/* { dg-final { scan-tree-dump "Inlined 1 calls" "inline" } } */ PASS: gcc.dg/ipa/inline-9.c (test for excess errors) gcc.dg/ipa/inline-9.c: dump file does not exist UNRESOLVED: gcc.dg/ipa/inline-9.c scan-tree-dump inline "Inlined 1 calls" but fixing the obvious bug in there, s/scan-tree-dump/scan-ipa-dump/ doesn't help, nothing is really inlined. Jakub
[PATCH], V7, #7 of 7, Turn on -mpcrel for Linux 64-bit, but not for other targets
This patch enables prefixed addressing and PC-relative addressing when the user uses -mcpu=future, based on whether the OS target support indicates that the OS supports prefixed addressing and, separately, whether it supports PC-relative addressing. At the moment, 64-bit Linux is the only system that enables both prefixed addressing and PC-relative addressing.

I have built bootstrap compilers with this patch and there were no regressions in the testsuite. In addition, during development, I set each of the two options, built a compiler with it, and observed the expected default behavior for whether prefixed addressing and PC-relative support are enabled.

Can I check this into the FSF trunk?

2019-11-14 Michael Meissner * config/rs6000/linux64.h (TARGET_PREFIXED_ADDR_DEFAULT): Enable prefixed addressing by default. (TARGET_PCREL_DEFAULT): Enable pc-relative addressing by default. * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Only enable -mprefixed-addr and -mpcrel if the OS tm.h says to enable it. (ADDRESSING_FUTURE_MASKS): New mask macro. (OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS. * config/rs6000/rs6000.c (TARGET_PREFIXED_ADDR_DEFAULT): Do not enable -mprefixed-addr unless the OS tm.h says to. (TARGET_PCREL_DEFAULT): Do not enable -mpcrel unless the OS tm.h says to. (rs6000_option_override_internal): Do not enable -mprefixed-addr or -mpcrel unless the OS tm.h says to enable it. Add more checks for -mcpu=future. Index: gcc/config/rs6000/linux64.h === --- gcc/config/rs6000/linux64.h (revision 278173) +++ gcc/config/rs6000/linux64.h (working copy) @@ -640,3 +640,11 @@ extern int dot_symbols; enabling the __float128 keyword. */ #undef TARGET_FLOAT128_ENABLE_TYPE +#define TARGET_FLOAT128_ENABLE_TYPE 1 + +/* Enable support for pc-relative and numeric prefixed addressing on the + 'future' system.
*/ +#undef TARGET_PREFIXED_ADDR_DEFAULT +#define TARGET_PREFIXED_ADDR_DEFAULT 1 + +#undef TARGET_PCREL_DEFAULT +#define TARGET_PCREL_DEFAULT 1 Index: gcc/config/rs6000/rs6000-cpus.def === --- gcc/config/rs6000/rs6000-cpus.def (revision 278173) +++ gcc/config/rs6000/rs6000-cpus.def (working copy) @@ -75,15 +75,21 @@ | OPTION_MASK_P8_VECTOR\ | OPTION_MASK_P9_VECTOR) -/* Support for a future processor's features. Do not enable -mpcrel until it - is fully functional. */ +/* Support for a future processor's features. The prefixed and pc-relative + addressing bits are not added here. Instead, rs6000.c adds them if the OS + tm.h says that it supports the addressing modes. */ #define ISA_FUTURE_MASKS_SERVER(ISA_3_0_MASKS_SERVER \ -| OPTION_MASK_FUTURE \ +| OPTION_MASK_FUTURE) + +/* Addressing related flags on a future processor. These flags are broken out + because not all targets will support either pc-relative addressing, or even + prefixed addressing, and we want to clear all of the addressing bits + on targets that cannot support prefixed/pcrel addressing. */ +#define ADDRESSING_FUTURE_MASKS(OPTION_MASK_PCREL \ | OPTION_MASK_PREFIXED_ADDR) /* Flags that need to be turned off if -mno-future. */ -#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \ -| OPTION_MASK_PREFIXED_ADDR) +#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS /* Flags that need to be turned off if -mno-power9-vector. */ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 278181) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -98,6 +98,16 @@ #endif #endif +/* Set up the defaults for whether prefixed addressing is used, and if it is + used, whether we want to turn on pc-relative support by default. */ +#ifndef TARGET_PREFIXED_ADDR_DEFAULT +#define TARGET_PREFIXED_ADDR_DEFAULT 0 +#endif + +#ifndef TARGET_PCREL_DEFAULT +#define TARGET_PCREL_DEFAULT 0 +#endif + /* Support targetm.vectorize.builtin_mask_for_load. 
*/ GTY(()) tree altivec_builtin_mask_for_load; @@ -2535,6 +2545,14 @@ rs6000_debug_reg_global (void) if (TARGET_DIRECT_MOVE_128) fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element", (int)VECTOR_ELEMENT_MFVSRLD_64BIT); + + if (TARGET_FUTURE) +{ + fprintf (stderr, DEBUG_FMT_D, "TARGET_PREFIXED_ADDR_DEFAULT", + TARGET_PREFIXED_ADDR_DEFAULT); + fprintf (stderr, DEBUG_FMT_D, "TAR
[PATCH], V7, #6 of 7, Fix issues with vector extract and prefixed instructions
This patch fixes two issues with vector extracts and prefixed instructions. The first occurs if you use a vector extract on a vector that is located in memory, where you access the vector via a PC-relative address with a variable index, i.e.:

#include <altivec.h>

static vector int vi;

int get (int n)
{
  return vec_extract (vi, n);
}

In this case, the current code re-uses the temporary for calculating the offset of the element to load up the address of the vector, losing the offset. This code prevents the combiner from combining loading the vector from memory and the vector extract if the vector is accessed via a PC-relative address. Instead, the vector is loaded up into a register, and the variable extract from a register is done.

I needed to add a new constraint (em) in addition to new predicate functions. I discovered that with the predicate function alone, the register allocator would re-create the address. The constraint prevents this combination. I also modified the vector extract code to generate a single PC-relative load if the vector has a PC-relative address and the offset is constant.

I have built a bootstrap compiler with this change, and there were no regressions in the test suite. Can I check this into the FSF trunk?

2019-11-14 Michael Meissner * config/rs6000/constraints.md (em constraint): New constraint for non-prefixed memory. * config/rs6000/predicates.md (non_prefixed_memory): New predicate. (reg_or_non_prefixed_memory): New predicate. * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support for optimizing extracting a constant vector element from a vector that uses a prefixed address. If the element number is variable and the address uses a prefixed address, abort. * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator): Do not allow combining prefixed memory with a variable vector extract. (vsx_extract_v4sf_var): Do not allow combining prefixed memory with a variable vector extract.
(vsx_extract__var, VSX_EXTRACT_I iterator): Do not allow combining prefixed memory with a variable vector extract. (vsx_extract__mode_var): Do not allow combining prefixed memory with a variable vector extract. * doc/md.texi (PowerPC constraints): Document the em constraint. Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 278173) +++ gcc/config/rs6000/constraints.md(working copy) @@ -202,6 +202,11 @@ (define_constraint "H" ;; Memory constraints +(define_memory_constraint "em" + "A memory operand that does not contain a prefixed address." + (and (match_code "mem") + (match_test "non_prefixed_memory (op, mode)"))) + (define_memory_constraint "es" "A ``stable'' memory operand; that is, one which does not include any automodification of the base register. Unlike @samp{m}, this constraint Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 278177) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1836,3 +1836,24 @@ (define_predicate "prefixed_memory" { return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); }) + +;; Return true if the operand is a memory address that does not use a prefixed +;; address. +(define_predicate "non_prefixed_memory" + (match_code "mem") +{ + /* If the operand is not a valid memory operand even if it is not prefixed, + do not return true. */ + if (!memory_operand (op, mode)) +return false; + + return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); +}) + +;; Return true if the operand is either a register or it is a non-prefixed +;; memory operand. 
+(define_predicate "reg_or_non_prefixed_memory" + (match_code "reg,subreg,mem") +{ + return gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode); +}) Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 278178) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6734,6 +6734,7 @@ rs6000_adjust_vec_address (rtx scalar_re rtx element_offset; rtx new_addr; bool valid_addr_p; + bool pcrel_p = pcrel_local_address (addr, Pmode); /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY. */ gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC); @@ -6771,6 +6772,38 @@ rs6000_adjust_vec_address (rtx scalar_re else if (REG_P (addr) || SUBREG_P (addr)) new_addr = gen_rtx_PLUS (Pmode, addr, element_offset); + /* Optimize PC-relative addresses with a constant offset. */ + else if (pcrel_p && CONST_INT_P (element_offset)) +{ + rtx addr2 = addr; + HOST_WIDE_INT offset = INTVAL (element_offset);
[PATCH], V7, #5 of 7, Add more effective targets for the 'future' system to target-supports.
This patch adds more effective targets to the target-supports.exp in the testsuite. I tried to break it down to whether prefixed instructions are allowed, whether the target is generating 64-bit code with prefixed instructions, and if -mpcrel support is available. I also enabled 'future' testing on the actual hardware (or simulator). The tests in V8 will use some of these capabilities. I have run the test suite on a little endian power8 system with no degradation. Can I check this into the FSF trunk? 2019-11-14 Michael Meissner * lib/target-supports.exp (check_effective_target_powerpc_future_ok): Do not require 64-bit or Linux support before doing the test. Use a 32-bit constant in PLI. (check_effective_target_powerpc_prefixed_addr_ok): New effective target test to see if prefixed memory instructions are supported. (check_effective_target_powerpc_pcrel_ok): New effective target test to test whether PC-relative addressing is supported. (is-effective-target): Add test for the PowerPC 'future' hardware support. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 278173) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -5345,16 +5345,14 @@ proc check_effective_target_powerpc_p9mo } } -# Return 1 if this is a PowerPC target supporting -mfuture. -# Limit this to 64-bit linux systems for now until other -# targets support FUTURE. +# Return 1 if this is a PowerPC target supporting -mcpu=future. proc check_effective_target_powerpc_future_ok { } { -if { ([istarget powerpc64*-*-linux*]) } { +if { ([istarget powerpc*-*-*]) } { return [check_no_compiler_messages powerpc_future_ok object { int main (void) { long e; - asm ("pli %0,%1" : "=r" (e) : "n" (0x12345)); + asm ("pli %0,%1" : "=r" (e) : "n" (0x1234)); return e; } } "-mfuture"] @@ -5363,6 +5361,46 @@ proc check_effective_target_powerpc_futu } } +# Return 1 if this is a PowerPC target supporting -mcpu=future. 
The compiler +# must support large numeric prefixed addresses by default when -mfuture is +# used. We test loading up a large constant to verify that the full 34-bit +# offset for prefixed instructions is supported and we check for a prefixed +# load as well. + +proc check_effective_target_powerpc_prefixed_addr_ok { } { +if { ([istarget powerpc*-*-*]) } { + return [check_no_compiler_messages powerpc_prefixed_addr_ok object { + int main (void) { + extern long l[]; + long e, e2; + asm ("pli %0,%1" : "=r" (e) : "n" (0x12345678)); + asm ("pld %0,0x12345678(%1)" : "=r" (e2) : "r" (& l[0])); + return e - e2; + } + } "-mfuture"] +} else { + return 0 +} +} + +# Return 1 if this is a PowerPC target supporting -mfuture. The compiler must +# support PC-relative addressing when -mcpu=future is used to pass this test. + +proc check_effective_target_powerpc_pcrel_ok { } { +if { ([istarget powerpc*-*-*]) } { + return [check_no_compiler_messages powerpc_pcrel_ok object { + int main (void) { + static int s __attribute__((__used__)); + int e; + asm ("plwa %0,s@pcrel(0),1" : "=r" (e)); + return e; + } + } "-mfuture"] + } else { + return 0 + } +} + # Return 1 if this is a PowerPC target supporting -mfloat128 via either # software emulation on power7/power8 systems or hardware support on power9. @@ -7261,6 +7299,7 @@ proc is-effective-target { arg } { "named_sections" { set selected [check_named_sections_available] } "gc_sections"{ set selected [check_gc_sections_available] } "cxa_atexit" { set selected [check_cxa_atexit_available] } + "powerpc_future_hw" { set selected [check_powerpc_future_hw_available] } default { error "unknown effective target keyword `$arg'" } } } -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH] V7, #4 of 7, Add explicit (0),1 to @pcrel references
In some of my previous work, I had made a mistake, forgetting that the PADDI instruction did not allow adding a PC-relative reference to a register (you can either load up a PC-relative address without adding a register, or you can add a register to a constant). The assembler allowed the instruction, but it didn't do what I expected.

This patch adds an explicit (0),1 to PC-relative references. This way, if you try to add the PC-relative offset to a register, you will get a syntax error.

I have built compilers with this patch installed on a little endian power8 Linux system, and there were no regressions. Can I check this into the trunk?

2019-11-14  Michael Meissner

	* config/rs6000/rs6000.c (print_operand_address): Add (0),1 to
	@pcrel to catch errant usage.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 278175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -13241,7 +13241,10 @@ print_operand_address (FILE *file, rtx x
       if (SYMBOL_REF_P (x) && !SYMBOL_REF_LOCAL_P (x))
 	fprintf (file, "@got");
 
-      fprintf (file, "@pcrel");
+      /* Specifically add (0),1 to catch uses where a @pcrel was added to an
+	 address with a base register, since the hardware does not support
+	 adding a base register to a PC-relative address.  */
+      fprintf (file, "@pcrel(0),1");
     }
   else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST || GET_CODE (x) == LABEL_REF
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[PATCH], V7, #3 of 7, Use PADDI for 34-bit immediate adds
This patch generates PADDI to add 34-bit immediate constants on the 'future' system, and prevents such adds from being split. I have built and bootstrapped compilers with the patch, and there were no regressions. Can I check this into the trunk?

2019-11-14  Michael Meissner

	* config/rs6000/predicates.md (add_operand): Add support for PADDI.
	* config/rs6000/rs6000.md (add<mode>3): Add support for PADDI.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md	(revision 278173)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
     (match_test "satisfies_constraint_I (op)
-		 || satisfies_constraint_L (op)")
+		 || satisfies_constraint_L (op)
+		 || satisfies_constraint_eI (op)")
     (match_operand 0 "gpc_reg_operand")))

 ;; Return 1 if the operand is either a non-special register, or 0, or -1.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md	(revision 278176)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -1761,15 +1761,17 @@ (define_expand "add<mode>3"
 })

 (define_insn "*add<mode>3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
-		  (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+		  (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
    add %0,%1,%2
    addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])

 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
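The new eI constraint is what gates the extra alternative: it accepts constants representable as a 34-bit signed immediate, the range PADDI can encode. As a rough stand-alone illustration (my own helper names, not GCC's actual constraint machinery), the range test looks like this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring what an "eI"-style constraint must check:
// a constant usable by PADDI must fit in a 34-bit signed immediate,
// i.e. the range [-2^33, 2^33 - 1].
bool fits_34bit_signed (int64_t value)
{
  const int64_t lo = -(int64_t (1) << 33);
  const int64_t hi = (int64_t (1) << 33) - 1;
  return value >= lo && value <= hi;
}

// The older 16-bit test used by plain ADDI (the "I" constraint), for
// comparison: [-0x8000, 0x7fff].
bool fits_16bit_signed (int64_t value)
{
  return value >= -0x8000 && value <= 0x7fff;
}
```

Constants that pass the 34-bit test but not the 16-bit one are exactly the adds that previously had to be split into an addis/addi pair.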
[PATCH], V7, #2 of 7, Use PLI to load up 32-bit SImode constants
This patch generates the PLI (PADDI) instruction to load up 32-bit SImode constants on the future system. It adds an alternative to movsi, and prevents the movsi load immediate from being split. I have built compilers with this patch which bootstrapped fine, and there were no regressions in the test suite. Can I check this into the trunk? 2019-11-14 Michael Meissner * config/rs6000/rs6000.md (movsi_internal1): Add support to load up 32-bit SImode integer constants with PADDI. (movsi integer constant splitter): Do not split constant if PADDI can load it up directly. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 278175) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -6891,22 +6891,22 @@ (define_split ;; MR LA LWZ LFIWZX LXSIWZX ;; STW STFIWX STXSIWX LI LIS -;; #XXLORXXSPLTIB 0 XXSPLTIB -1 VSPLTISW -;; XXLXOR 0 XXLORC -1P9 const MTVSRWZ MFVSRWZ -;; MF%1 MT%0 NOP +;; PLI #XXLORXXSPLTIB 0 XXSPLTIB -1 +;; VSPLTISW XXLXOR 0 XXLORC -1P9 const MTVSRWZ +;; MFVSRWZ MF%1 MT%0 NOP (define_insn "*movsi_internal1" [(set (match_operand:SI 0 "nonimmediate_operand" "=r, r, r, d, v, m, Z, Z, r, r, -r, wa, wa, wa, v, -wa, v, v, wa, r, -r, *h, *h") +r, r, wa, wa, wa, +v, wa, v, v, wa, +r, r, *h, *h") (match_operand:SI 1 "input_operand" "r, U, m, Z, Z, r, d, v, I, L, -n, wa, O, wM, wB, -O, wM, wS, r, wa, -*h, r, 0"))] +eI, n, wa, O, wM, +wB, O, wM, wS, r, +wa, *h, r, 0"))] "gpc_reg_operand (operands[0], SImode) || gpc_reg_operand (operands[1], SImode)" "@ @@ -6920,6 +6920,7 @@ (define_insn "*movsi_internal1" stxsiwx %x1,%y0 li %0,%1 lis %0,%v1 + li %0,%1 # xxlor %x0,%x1,%x1 xxspltib %x0,0 @@ -6936,21 +6937,21 @@ (define_insn "*movsi_internal1" [(set_attr "type" "*, *, load,fpload, fpload, store, fpstore, fpstore, *, *, -*, veclogical, vecsimple, vecsimple, vecsimple, -veclogical, veclogical, vecsimple, mffgpr, mftgpr, -*, *, *") +*, *, veclogical, vecsimple, vecsimple, +vecsimple, veclogical, veclogical, vecsimple, mffgpr, +mftgpr, *, *, *") 
(set_attr "length" "*, *, *, *, *, *, *, *, *, *, -8, *, *, *, *, -*, *, 8, *, *, -*, *, *") +*, 8, *, *, *, +*, *, *, 8, *, +*, *, *, *") (set_attr "isa" "*, *, *, p8v, p8v, *, p8v, p8v, *, *, -*, p8v, p9v, p9v, p8v, -p9v,p8v, p9v, p8v, p8v, -*, *, *")]) +fut,*, p8v, p9v, p9v, +p8v,p9v, p8v, p9v, p8v, +p8v,*, *, *")]) ;; Like movsi, but adjust a SF value to be used in a SI context, i.e. ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0)) @@ -7095,14 +7096,15 @@ (define_insn "*movsi_from_df" "xscvdpsp %x0,%x1" [(set_attr "type" "fp")]) -;; Split a load of a large constant into the appropriate two-insn -;; sequence. +;; Split a load of a large constant into the appropriate
[PATCH] V7, #1 of 7, Use PLI to load up 34-bit DImode constants
This patch adds an alternative to movdi to allow using PLI (PADDI) to load up 34-bit constants on the 'future' machine. I have built compilers with this patch, and there were no regressions in the test suite. Can I check this into the trunk? 2019-11-14 Michael Meissner * config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for PADDI to load up and/or add 34-bit integer constants. (rs6000_rtx_costs): Treat constants loaded up with PADDI with the same cost as normal 16-bit constants. * config/rs6000/rs6000.md (movdi_internal64): Add support to load up 34-bit integer constants with PADDI. (movdi integer constant splitter): Add comment about PADDI. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 278173) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5552,7 +5552,7 @@ static int num_insns_constant_gpr (HOST_WIDE_INT value) { /* signed constant loadable with addi */ - if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x1) + if (SIGNED_16BIT_OFFSET_P (value)) return 1; /* constant loadable with addis */ @@ -5560,6 +5560,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va && (value >> 31 == -1 || value >> 31 == 0)) return 1; + /* PADDI can support up to 34 bit signed integers. 
*/ + else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value)) +return 1; + else if (TARGET_POWERPC64) { HOST_WIDE_INT low = ((value & 0x) ^ 0x8000) - 0x8000; @@ -20679,7 +20683,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo || outer_code == PLUS || outer_code == MINUS) && (satisfies_constraint_I (x) - || satisfies_constraint_L (x))) + || satisfies_constraint_L (x) + || satisfies_constraint_eI (x))) || (outer_code == AND && (satisfies_constraint_K (x) || (mode == SImode Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 278173) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8808,24 +8808,24 @@ (define_split DONE; }) -;; GPR store GPR load GPR move GPR li GPR lis GPR # -;; FPR store FPR load FPR move AVX store AVX store AVX load -;; AVX load VSX move P9 0 P9 -1 AVX 0/-1VSX 0 -;; VSX -1 P9 const AVX const From SPR To SPR SPR<->SPR -;; VSX->GPR GPR->VSX +;; GPR store GPR load GPR move GPR li GPR lis GPR pli +;; GPR # FPR store FPR load FPR move AVX store AVX store +;; AVX load AVX load VSX move P9 0 P9 -1 AVX 0/-1 +;; VSX 0 VSX -1 P9 const AVX const From SPRTo SPR +;; SPR<->SPR VSX->GPR GPR->VSX (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=YZ, r, r, r, r, r, -m, ^d,^d,wY,Z, $v, -$v,^wa, wa,wa,v, wa, -wa,v, v, r, *h, *h, -?r,?wa") +r, m, ^d,^d,wY, Z, +$v,$v,^wa, wa,wa, v, +wa,wa,v, v, r, *h, +*h,?r,?wa") (match_operand:DI 1 "input_operand" - "r, YZ,r, I, L, nF, -^d,m, ^d,^v,$v, wY, -Z, ^wa, Oj,wM,OjwM, Oj, -wM,wS,wB,*h,r, 0, -wa,r"))] + "r, YZ,r, I, L, eI, +nF,^d,m, ^d,^v, $v, +wY,Z, ^wa, Oj,wM, OjwM, +Oj,wM,wS,wB,*h, r, +0, wa,r"))] "TARGET_POWERPC64 && (gpc_reg_operand (operands[0], DImode) || gpc_reg_operand (operands[1], DImode))" @@ -8835,6 +8835,7 @@ (define_insn "*movdi_internal64" mr %0,%1 li %0,%1 lis %0,%v1 + li %0,%1 # stfd%U0%X0 %1,%0 lfd%U1%X1 %0,%1 @@ -8858,26 +8859,28 @@ (define_insn "*movdi_internal64" mtvsrd %x0,%1" [(set_attr "type" "store, load, *, *, *, *, -fpstore,fpload, 
fpsimple, fpstore, fpstore, fpload, -
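On the cost side, num_insns_constant_gpr now treats any 34-bit signed constant as a single instruction when prefixed support is available. A hedged, self-contained sketch of that counting logic (greatly simplified — the real function handles shifted masks, POWER64 splitting, and more):

```cpp
#include <cassert>
#include <cstdint>

// Rough sketch (not GCC's real num_insns_constant_gpr) of how PADDI
// shortens constant loading: with prefixed-instruction support, any
// 34-bit signed constant becomes a single instruction.
int insns_for_constant (int64_t value, bool have_prefixed)
{
  // addi: 16-bit signed immediate.
  if (value >= -0x8000 && value <= 0x7fff)
    return 1;
  // addis: low 16 bits clear, value fits in a shifted 16-bit immediate.
  if ((value & 0xffff) == 0 && (value >> 31 == 0 || value >> 31 == -1))
    return 1;
  // pli/paddi: 34-bit signed immediate, one (prefixed) instruction.
  if (have_prefixed
      && value >= -(int64_t (1) << 33) && value < (int64_t (1) << 33))
    return 1;
  // Otherwise at least a two-instruction sequence (real code does more).
  return 2;
}
```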
Re: [RFC C++ PATCH] __builtin_source_location ()
On Thu, Nov 14, 2019 at 10:15:21PM +0000, Jonathan Wakely wrote:
> > namespace std {
> >   struct source_location {
> >     struct __impl {
>
> Will this work if the library type is actually in an inline namespace,
> such as std::__8::source_location::__impl (as for
> --enable-symvers=gnu-versioned-namespace) or
> std::__v1::source_location::__impl (as it probably would be in
> libc++).
>
> If I'm reading the patch correctly, it would work fine, because
> qualified lookup for std::source_location would find that name even if
> it's really in some inline namespace.

I'd say so, but the unfinished testsuite coverage would need to cover it of course.

> >       const char *__file;
> >       const char *__function;
> >       unsigned int __line, __column;
> >     };
> >     const void *__ptr;
>
> If the magic type the compiler generates is declared in the header,
> then this member might as well be 'const __impl*'.

Yes, with the static_cast on the __builtin_source_location () result, sure. I can't easily make it return const std::source_location::__impl* though, because the initialization of the builtins happens early, before <source_location> is parsed. And as it is a nested class, I think I can't predeclare it in the compiler. If it were std::__source_location_impl instead of std::source_location::__impl, perhaps I could pretend there is namespace std { struct __source_location_impl; }, but I bet that wouldn't work well with the inline namespaces. So, is the const void * return from the builtin ok?

	Jakub
[committed] Change range_operator:fold_range to return a boolean indicating success.
One final tweak to the range-ops API. Most fold_range calls were returning true, which was a trigger for the original change to return by value. When I changed it back to a return by reference, I should also have added the boolean result back. Now all 3 routines are similar... virtual bool fold_range (value_range &r, tree type, const value_range &lh, const value_range &rh) const; virtual bool op1_range (value_range &r, tree type, const value_range &lhs, const value_range &op2) const; virtual bool op2_range (value_range &r, tree type, const value_range &lhs, const value_range &op1) const; And again, this is pretty mechanical. bootstraps and passes all tests. Checked in as revision 278266 Andrew 2019-11-14 Andrew MacLeod * range-op.h (range_operator::fold_range): Return a bool. * range-op.cc (range_operator::wi_fold): Assert supported type. (range_operator::fold_range): Assert supported type and return true. (operator_equal::fold_range): Return true. (operator_not_equal::fold_range): Same. (operator_lt::fold_range): Same. (operator_le::fold_range): Same. (operator_gt::fold_range): Same. (operator_ge::fold_range): Same. (operator_plus::op1_range): Adjust call to fold_range. (operator_plus::op2_range): Same. (operator_minus::op1_range): Same. (operator_minus::op2_range): Same. (operator_exact_divide::op1_range): Same. (operator_lshift::fold_range): Return true and adjust fold_range call. (operator_rshift::fold_range): Same. (operator_cast::fold_range): Return true. (operator_logical_and::fold_range): Same. (operator_logical_or::fold_range): Same. (operator_logical_not::fold_range): Same. (operator_bitwise_not::fold_range): Adjust call to fold_range. (operator_bitwise_not::op1_range): Same. (operator_cst::fold_range): Return true. (operator_identity::fold_range): Return true. (operator_negate::fold_range): Return true and adjust fold_range call. (operator_addr_expr::fold_range): Return true. (operator_addr_expr::op1_range): Adjust call to fold_range. (range_cast): Same. 
* tree-vrp.c (range_fold_binary_symbolics_p): Adjust call to fold_range. (range_fold_unary_symbolics_p): Same. Index: range-op.h === *** range-op.h (revision 278265) --- range-op.h (working copy) *** class range_operator *** 50,56 { public: // Perform an operation between 2 ranges and return it. ! virtual void fold_range (value_range &r, tree type, const value_range &lh, const value_range &rh) const; --- 50,56 { public: // Perform an operation between 2 ranges and return it. ! virtual bool fold_range (value_range &r, tree type, const value_range &lh, const value_range &rh) const; *** public: *** 73,79 const value_range &op1) const; protected: ! // Perform an operation between 2 sub-ranges and return it. virtual void wi_fold (value_range &r, tree type, const wide_int &lh_lb, const wide_int &lh_ub, --- 73,79 const value_range &op1) const; protected: ! // Perform an integral operation between 2 sub-ranges and return it. virtual void wi_fold (value_range &r, tree type, const wide_int &lh_lb, const wide_int &lh_ub, Index: range-op.cc === *** range-op.cc (revision 278265) --- range-op.cc (working copy) *** range_operator::wi_fold (value_range &r, *** 131,149 const wide_int &rh_lb ATTRIBUTE_UNUSED, const wide_int &rh_ub ATTRIBUTE_UNUSED) const { r = value_range (type); } // The default for fold is to break all ranges into sub-ranges and // invoke the wi_fold method on each sub-range pair. ! void range_operator::fold_range (value_range &r, tree type, const value_range &lh, const value_range &rh) const { if (empty_range_check (r, lh, rh)) ! return; value_range tmp; r.set_undefined (); --- 131,151 const wide_int &rh_lb ATTRIBUTE_UNUSED, const wide_int &rh_ub ATTRIBUTE_UNUSED) const { + gcc_checking_assert (value_range::supports_type_p (type)); r = value_range (type); } // The default for fold is to break all ranges into sub-ranges and // invoke the wi_fold method on each sub-range pair. ! 
bool range_operator::fold_range (value_range &r, tree type, const value_range &lh, const value_range &rh) const { + gcc_checking_assert (value_range::supports_type_p (type)); if (empty_range_check (r, lh, rh)) ! return true; value_range tmp; r.set_undefined (); *** range_operator::fold_range (value_range *** 157,164 wi_fold (tmp, type, lh_lb, lh_ub, rh_lb, rh_ub); r.union_ (tmp); if (r.varying_p ()) ! return; } } // The default for op1_range is to return false. --- 159,167 wi_fold (
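Stripped of the value_range machinery, the reworked interface shape can be sketched like this (placeholder types and a toy operator of my own, not GCC's actual classes):

```cpp
#include <cassert>

// Minimal stand-in illustrating the reworked API: fold_range now returns
// a bool indicating whether a (supported) result was produced, instead of
// being void.  A simple [lo, hi] interval replaces value_range.
struct int_range
{
  long lo, hi;
};

struct range_op
{
  virtual bool fold_range (int_range &r, const int_range &lh,
                           const int_range &rh) const = 0;
  virtual ~range_op () {}
};

struct op_plus : range_op
{
  bool fold_range (int_range &r, const int_range &lh,
                   const int_range &rh) const override
  {
    // Interval addition: add the bounds pairwise.
    r.lo = lh.lo + rh.lo;
    r.hi = lh.hi + rh.hi;
    return true;   // Success is now reported explicitly to the caller.
  }
};
```

Callers such as op1_range/op2_range can then propagate the boolean instead of assuming the fold always succeeded.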
PowerPC V7 future machine patches
This set of patches from V6 got missed, and should go in ASAP. I am breaking the patches into 3 logical steps:

1) The V7 patches that were part of the V6 patches that have not been committed. There are 7 patches in the suite (6 to the compiler, 1 to the testsuite to enable finer granularity for running the 'future' tests).

2) The V8 patches will be all of the tests that have been modified. There are 6 testsuite patches.

3) The V9 patches will be the implementation of PCREL_OPT, which has not been submitted yet.

I acknowledge that there will be a larger patch after these to clean up the insn length issues, but none of these patches does anything but add a simple insn that may or may not be prefixed. I would prefer to get these out of the way first.

I have built compilers with these patches, and they have bootstrapped on a little endian power8 system with no issues. There have been no regressions in the test suite.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: [RFC C++ PATCH] __builtin_source_location ()
On 14/11/19 20:34 +0100, Jakub Jelinek wrote: Hi! The following WIP patch implements __builtin_source_location (), which returns const void pointer to a std::source_location::__impl struct that is required to contain __file, __function, __line and __column fields, the first two with const char * type, the latter some integral type. I don't have testcase coverage yet and the hash map to allow sharing of VAR_DECLs with the same location is commented out both because it doesn't compile for some reason and because hashing on location_t is not enough, we probably need to hash on both location_t and fndecl, as the baz case in the following shows. Comments? namespace std { struct source_location { struct __impl { Will this work if the library type is actually in an inline namespace, such as std::__8::source_location::__impl (as for --enable-symvers=gnu-versioned-namespace) or std::__v1::source_location::__impl (as it probably would be in libc++). If I'm reading the patch correctly, it would work fine, because qualified lookup for std::source_location would find that name even if it's really in some inline namespace. const char *__file; const char *__function; unsigned int __line, __column; }; const void *__ptr; If the magic type the compiler generates is declared in the header, then this member might as well be 'const __impl*'. constexpr source_location () : __ptr (nullptr) {} static consteval source_location current (const void *__p = __builtin_source_location ()) { source_location __ret; __ret.__ptr = __p; return __ret; } constexpr const char *file () const { return static_cast (__ptr)->__file; Not really relevant to your patch, but I'll say it here for the benefit of others reading these mails ... On IRC I suggested that the default constructor should set the __ptr member to null, and these member functions should check for null, e.g. 
    if (__ptr) [[likely]]
      return __ptr->__function;
    else
      return "";

The alternative is for the default constructor to call __builtin_source_location() or refer to some static object in the runtime library, but both options waste space. Adding a [[likely]] branch to the accessors wastes no space and should only penalise users who are misusing source_location by trying to get meaningful values out of default-constructed objects. If that's a bit slower I don't care.
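For readers following along, the accessor pattern under discussion can be sketched as below. This is my own simplified stand-in (plain if instead of [[likely]], no consteval), not the real libstdc++ implementation; the point is only that a default-constructed object holds a null pointer and the accessors check it rather than crash:

```cpp
#include <cassert>
#include <cstring>

// Sketch of the proposed shape: the compiler-known impl struct plus a
// single pointer member that is null for default-constructed objects.
struct source_location_sketch
{
  struct impl
  {
    const char *file;
    const char *function;
    unsigned line, column;
  };

  const impl *ptr = nullptr;   // default constructor leaves it null

  const char *function () const
  {
    // The branch the reviewer suggests marking [[likely]]: in all
    // meaningful uses ptr is non-null.
    if (ptr)
      return ptr->function;
    return "";                 // default-constructed: empty, not a crash
  }
};
```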
Re: [PATCH V2] Refactor tree-loop-distribution for thread safety
Previously, the suggested patch removed all of the tree-loop-distribution.c global variables, moving them into a struct and passing it around across the functions. This patch addresses that by using a C++ class instead, avoiding passing the struct as an argument since it will be accessible through the this pointer.

gcc/ChangeLog
2019-11-14  Giuliano Belinassi

	* cfgloop.c (get_loop_body_in_custom_order): New.
	* cfgloop.h (get_loop_body_in_custom_order): New prototype.
	* tree-loop-distribution.c (class loop_distribution): New.
	(bb_top_order_cmp): Remove.
	(bb_top_order_cmp_r): New.
	(create_rdg_vertices): Move into class loop_distribution.
	(stmts_from_loop): Same as above.
	(update_for_merge): Same as above.
	(partition_merge_into): Same as above.
	(get_data_dependence): Same as above.
	(data_dep_in_cycle_p): Same as above.
	(update_type_for_merge): Same as above.
	(build_rdg_partition_for_vertex): Same as above.
	(classify_builtin_ldst): Same as above.
	(classify_partition): Same as above.
	(share_memory_accesses): Same as above.
	(rdg_build_partitions): Same as above.
	(pg_add_dependence_edges): Same as above.
	(build_partition_graph): Same as above.
	(merge_dep_scc_partitions): Same as above.
	(break_alias_scc_partitions): Same as above.
	(finalize_partitions): Same as above.
	(distribute_loop): Same as above.
	(bb_top_order_init): New method.
	(bb_top_order_destroy): New method.
	(get_bb_top_order_index_size): New method.
	(get_bb_top_order_index_index): New method.
	(loop_distribution::execute): New method.
	(pass_loop_distribution::execute): Instantiate loop_distribution.

diff --git gcc/cfgloop.c gcc/cfgloop.c
index f18d2b3f24b..db0066ea859 100644
--- gcc/cfgloop.c
+++ gcc/cfgloop.c
@@ -980,6 +980,19 @@ get_loop_body_in_custom_order (const class loop *loop,
   return bbs;
 }

+/* Same as above, but use gcc_sort_r instead of qsort.  */
+
+basic_block *
+get_loop_body_in_custom_order (const class loop *loop, void *data,
+			       int (*bb_comparator) (const void *, const void *, void *))
+{
+  basic_block *bbs = get_loop_body (loop);
+
+  gcc_sort_r (bbs, loop->num_nodes, sizeof (basic_block), bb_comparator, data);
+
+  return bbs;
+}
+
 /* Get body of a LOOP in breadth first sort order.  */

 basic_block *
diff --git gcc/cfgloop.h gcc/cfgloop.h
index 0b0154ffd7b..6256cc01ff4 100644
--- gcc/cfgloop.h
+++ gcc/cfgloop.h
@@ -376,6 +376,8 @@ extern basic_block *get_loop_body_in_dom_order (const class loop *);
 extern basic_block *get_loop_body_in_bfs_order (const class loop *);
 extern basic_block *get_loop_body_in_custom_order (const class loop *,
 			       int (*) (const void *, const void *));
+extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
+			       int (*) (const void *, const void *, void *));

 extern vec<edge> get_loop_exit_edges (const class loop *);
 extern edge single_exit (const class loop *);
diff --git gcc/tree-loop-distribution.c gcc/tree-loop-distribution.c
index 81784866ad1..6afb3089ec1 100644
--- gcc/tree-loop-distribution.c
+++ gcc/tree-loop-distribution.c
@@ -155,21 +155,10 @@ ddr_hasher::equal (const data_dependence_relation *ddr1,
   return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
 }

-/* The loop (nest) to be distributed.  */
-static vec<loop_p> loop_nest;
-
-/* Vector of data references in the loop to be distributed.  */
-static vec<data_reference_p> datarefs_vec;
-
-/* If there is nonaddressable data reference in above vector.  */
-static bool has_nonaddressable_dataref_p;
-
 /* Store index of data reference in aux field.  */
 #define DR_INDEX(dr)      ((uintptr_t) (dr)->aux)

-/* Hash table for data dependence relation in the loop to be distributed.  */
-static hash_table<ddr_hasher> *ddrs_table;
-
 /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
 struct rdg_vertex
 {
@@ -216,6 +205,83 @@ struct rdg_edge

 #define RDGE_TYPE(E)		((struct rdg_edge *) ((E)->data))->type

+/* Kind of distributed loop.  */
+enum partition_kind {
+    PKIND_NORMAL,
+    /* Partial memset stands for a partition that can be distributed into a
+       loop of memset calls, rather than a single memset call.  It's handled
+       just like a normal partition, i.e., distributed as a separate loop, no
+       memset call is generated.
+
+       Note: This is a hacking fix trying to distribute ZERO-ing stmt in a
+       loop nest as deep as possible.  As a result, parloop achieves better
+       parallelization by parallelizing deeper loop nest.  This hack should
+       be unnecessary and removed once distributed memset can be understood
+       and analyzed in data reference analysis.  See PR82604 for more.  */
+    PKIND_PARTIAL_MEMSET,
+    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
+};
+
+/* Type of distributed loop.
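The shape of the refactoring is worth spelling out, since it is the whole point of the patch: file-scope state that made the pass thread-unsafe becomes members of a pass class, so each invocation carries its own state through this. A minimal stand-alone sketch (illustrative names, not the actual tree-loop-distribution ones):

```cpp
#include <cassert>
#include <vector>

// Before: file-scope globals shared by every invocation of the pass, e.g.
//   static std::vector<int> datarefs_vec;          // not thread-safe
// After: the same state lives in a per-invocation pass object.
class loop_distribution_sketch
{
  std::vector<int> datarefs_vec;        // formerly a file-scope global
  bool has_nonaddressable_dataref_p = false;

public:
  void record_dataref (int dr, bool addressable)
  {
    datarefs_vec.push_back (dr);
    if (!addressable)
      has_nonaddressable_dataref_p = true;
  }
  unsigned n_datarefs () const { return (unsigned) datarefs_vec.size (); }
  bool any_nonaddressable () const { return has_nonaddressable_dataref_p; }
};
```

Two concurrently running instances no longer observe each other's state, which is what passing everything through this (rather than a struct argument) buys over the earlier version of the patch.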
Re: [PATCH v3] gdbinit.in: allow to pass function argument explicitly
On 14.11.2019 23:50, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Nov 14, 2019 at 07:01:47PM +0300, Konstantin Kharlamov wrote:
> > Generally, people expect functions to accept arguments directly. But
> > ones defined in gdbinit did not use the argument, which may be confusing
> > for newcomers. But we can't change behavior to use the argument without
> > breaking existing users of the gdbinit. Let's fix this by adding a check
> > for whether a user passed an argument, and either use it or go with
> > older behavior.
> >
> > 2019-11-14  Konstantin Kharlamov
> >
> > 	* gdbinit.in (pp, pr, prl, pt, pct, pgg, pgq, pgs, pge, pmz, ptc, pdn,
> > 	ptn, pdd, prc, pi, pbm, pel, trt): Make use of $arg0 if a user passed
> > 	it
>
> (Lines in a changelog end in a dot.)
>
> I love this... if it works :-)  How was it tested?  With what GDB
> versions?

I ran GCC under GDB, and set a breakpoint somewhere in the GCC code where a `stmt` variable is defined (I don't remember the type; at the moment I'm on another PC). Then I tested 3 things:

1. that `pgg stmt` prints the `stmt` content
2. that `p 0` followed by `pgg` prints nothing (because the underlying function checks its argument for being zero)
3. that `p 1` followed by `pgg` crashes

Tested on Arch Linux; the gdb version is 8.3.1.
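For readers who have not seen the patch itself, the pattern being applied is roughly the following (reconstructed from the description above; the actual gdbinit.in hunks are not quoted in this message, and the helper called here is illustrative):

```
# Sketch of the $argc/$arg0 pattern: use the explicit argument when one
# is given, otherwise fall back to the historical behavior of operating
# on the last value ($).
define pgg
  if $argc == 1
    call debug ($arg0)
  else
    call debug ($)
  end
end
```

With this, both the new `pgg stmt` form and the old `p stmt` followed by a bare `pgg` keep working.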
Re: [PATCH] Check suitability of spill register for mode
On 11/14/19 7:34 AM, Kwok Cheung Yeung wrote:
> Hello
>
> Currently, when choosing a spill register, GCC just picks the first
> available register in the register class returned by the
> TARGET_SPILL_CLASS hook that doesn't conflict. On AMD GCN this can cause
> problems, as DImode values stored in SGPRs must start on an even register
> number and TImode values on a multiple of 4. This is enforced by defining
> TARGET_HARD_REGNO_MODE_OK to be false when this condition is not
> satisfied. However, assign_spill_hard_regs does not check
> TARGET_HARD_REGNO_MODE_OK, so it can assign an unsuitable hard register
> for the mode.
>
> I have fixed this by rejecting spill registers that do not satisfy
> TARGET_HARD_REGNO_MODE_OK for the largest mode of the spilled register.
>
> Built and tested for an AMD GCN target. This fixes failures in:
>
> gcc.dg/vect/no-scevccp-outer-9.c
> gcc.dg/vect/no-scevccp-outer-10.c
>
> I have also ensured that the code bootstraps on x86_64, though it
> currently does not use spill registers.
>
> Okay for trunk?

Spilling into registers of another class was an experimental feature. On some x86-64 micro-architectures, it resulted in worse-performing code. But I guess in your case it is profitable. I hope that after your patches it can be switched on for some x86-64 micro-architectures.

As for the patch, it is OK to commit. Thank you.

> 2019-11-14  Kwok Cheung Yeung
>
> gcc/
> 	* lra-spills.c (assign_spill_hard_regs): Check that the spill
> 	register is suitable for the mode.
> ---
>  gcc/lra-spills.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
> index 54f76cc..8fbd3a8 100644
> --- a/gcc/lra-spills.c
> +++ b/gcc/lra-spills.c
> @@ -283,7 +283,8 @@ assign_spill_hard_regs (int *pseudo_regnos, int n)
>        for (k = 0; k < spill_class_size; k++)
>  	{
>  	  hard_regno = ira_class_hard_regs[spill_class][k];
> -	  if (TEST_HARD_REG_BIT (eliminable_regset, hard_regno))
> +	  if (TEST_HARD_REG_BIT (eliminable_regset, hard_regno)
> +	      || !targetm.hard_regno_mode_ok (hard_regno, mode))
>  	    continue;
>  	  if (! overlaps_hard_reg_set_p (conflict_hard_regs, mode, hard_regno))
>  	    break;
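The constraint being enforced is easy to state concretely. A hedged sketch (a simplified stand-in for the target hook, not the real GCN code): a multi-register mode must start in a register whose number is a multiple of the register count, and the patched loop skips any candidate that fails this test in addition to the existing eliminable-register test.

```cpp
#include <cassert>

// Toy modes encoding how many consecutive registers each needs; on a
// GCN-like target the start register must be aligned to that count.
enum sketch_mode { SI_MODE = 1, DI_MODE = 2, TI_MODE = 4 };

// Stand-in for targetm.hard_regno_mode_ok on such a target.
bool gcn_like_hard_regno_mode_ok (int regno, sketch_mode mode)
{
  return regno % int (mode) == 0;
}

// Shape of the patched candidate filter in assign_spill_hard_regs:
// reject eliminable regs *and* regs the mode cannot legally start in.
bool spill_reg_usable (int regno, sketch_mode mode, bool eliminable)
{
  return !eliminable && gcn_like_hard_regno_mode_ok (regno, mode);
}
```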
[PATCH] PR c++/92078 Add access specifiers to specializations
Fixes mentioned issue. Tested on bootstrap and cmcsstl2. gcc/cp/ * pt.c (maybe_new_partial_specialization): Apply access to newly created partial specializations. Update comment style. gcc/testsuite/ * g++.dg/cpp2a/concepts-pr92078.C: New. Andrew pr92078.patch Description: Binary data
Re: [PATCH v3] gdbinit.in: allow to pass function argument explicitly
Hi! On Thu, Nov 14, 2019 at 07:01:47PM +0300, Konstantin Kharlamov wrote: > Generally, people expect functions to accept arguments directly. But > ones defined in gdbinit did not use the argument, which may be confusing > for newcomers. But we can't change behavior to use the argument without > breaking existing users of the gdbinit. Let's fix this by adding a check > for whether a user passed an argument, and either use it or go with > older behavior. > > 2019-11-14 Konstantin Kharlamov > > * gdbinit.in (pp, pr, prl, pt, pct, pgg, pgq, pgs, pge, pmz, ptc, pdn, > ptn, pdd, prc, pi, pbm, pel, trt): Make use of $arg0 if a user passed > it (Lines in a changelog end in a dot). I love this... if it works :-) How was it tested? With what GDB versions? Segher
Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)
> >> Cost model used by self-recursive cloning is mainly based on existing > >> stuffs > >> in ipa-cp cloning, size growth and time benefit are considered. But since > >> recursive cloning is a more aggressive cloning, we will actually have > >> another > >> problem, which is opposite to your concern. By default, current parameter > >> set used to control ipa-cp and recursive-inliner gives priority to code > >> size, > >> not completely for performance. This makes ipa-cp behave somewhat > > > Yes, for a while the cost model is quite off. On Firefox it does just > > few clonings where code size increases so it desprately needs retuning. > > > But since rescursive cloning is quite a different case from normal one, > > perhaps having independent set of limits would help in particular ... > I did consider this way, but this seems to be contradictory for normal > and recursive cloning. We could definitly discuss cost model incrementally. It is safe to do what you do currently (rely on the existing ipa-cp's overconservative heuristics). > > > > Do you have some data on code size/performance effects of this change? > > For spec2017, no obvious code size and performance change with default > > setting. > > Specifically, for exchange2, with ipa-cp-eval-threshold=1 and > > ipcp-unit-growth=80, > > performance +31%, size +7%, on aarch64. > > > ... it will help here since ipa-cp-eval-threshold value needed are quite > > off of what we need to do. > > > I wonder about the 80% of unit growth which is also more than we can > > enable by default. How it comes the overal size change is only 7%? > 343624 -> 365632 (this contains debug info, -g)recursion-depth=8 > 273488 -> 273760 (no debug info) recursion-depth=8 What seems bit odd is that ipcp's metrics ends up with 80% code growth. I will try to look into it and see if I can think better what to do about the costs. Honza
[PATCH, fortran] Extend the builtin directive
The builtin directive allows specifying the simd attribute for a builtin function. Similarly to how the C language simd attribute got extended to allow declaring a specific vector variant, update the fortran builtin directive too.

Before the patch, only the masking (inbranch/notinbranch) could be specified when declaring the availability of vector variants, e.g.:

!GCC$ builtin (expf) attributes simd (notinbranch) if('x86_64')

Now the simdlen and simdabi (aka ISA) can be specified too, and a different name may be used instead of the vector ABI name, e.g.:

!GCC$ builtin (expf) attributes simd (notinbranch, 4, 'b', 'vexpf') if('x86_64')

Tested on aarch64-linux-gnu.

The C language change is at https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01288.html

Note: I don't have much fortran experience, so I'm not sure if the syntax makes sense or if I modified the frontend reasonably.

2019-11-14  Szabolcs Nagy

	* gfortran.h (struct gfc_vect_builtin): Define.
	* decl.c (gfc_match_gcc_builtin): Parse new flags.
	* trans-intrinsic.c (add_simd_flag_for_built_in): Update.
	(gfc_adjust_builtins): Update.

gcc/testsuite/ChangeLog:

2019-11-14  Szabolcs Nagy

	* gfortran.dg/simd-builtins-9.f90: New test.
	* gfortran.dg/simd-builtins-9.h: New test.
	* gfortran.dg/simd-builtins-10.f90: New test.
	* gfortran.dg/simd-builtins-10.h: New test.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 7858973cc20..dab8a323148 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -105,7 +105,7 @@ bool directive_vector = false;
 bool directive_novector = false;

 /* Map of middle-end built-ins that should be vectorized.  */
-hash_map<nofree_string_hash, int> *gfc_vectorized_builtins;
+hash_map<nofree_string_hash, gfc_vect_builtin> *gfc_vectorized_builtins;

 /* If a kind expression of a component of a parameterized derived type is
    parameterized, temporarily store the expression here.  */
@@ -11600,9 +11600,13 @@ gfc_match_gcc_unroll (void)

 /* Match a !GCC$ builtin (b) attributes simd flags if('target') form:

    The parameter b is name of a middle-end built-in.
- FLAGS is optional and must be one of: - - (inbranch) - - (notinbranch) + FLAGS is optional and must be of the form: + (mask) + (mask, simdlen) + (mask, simdlen, 'simdabi') + (mask, simdlen, 'simdabi', 'name') + where mask is inbranch or notinbranch, simdlen is an integer, simdabi + and name are strings. IF('target') is optional and TARGET is a name of a multilib ABI. @@ -11613,15 +11617,44 @@ gfc_match_gcc_builtin (void) { char builtin[GFC_MAX_SYMBOL_LEN + 1]; char target[GFC_MAX_SYMBOL_LEN + 1]; + char simdabi[GFC_MAX_SYMBOL_LEN + 1] = ""; + char name[GFC_MAX_SYMBOL_LEN + 1] = ""; + bool inbranch; + bool flags = false; + int simdlen = 0; if (gfc_match (" ( %n ) attributes simd", builtin) != MATCH_YES) return MATCH_ERROR; - gfc_simd_clause clause = SIMD_NONE; - if (gfc_match (" ( notinbranch ) ") == MATCH_YES) -clause = SIMD_NOTINBRANCH; - else if (gfc_match (" ( inbranch ) ") == MATCH_YES) -clause = SIMD_INBRANCH; + if (gfc_match (" ( ") == MATCH_YES) +{ + flags = true; + if (gfc_match ("notinbranch") == MATCH_YES) + inbranch = false; + else if (gfc_match ("inbranch") == MATCH_YES) + inbranch = true; + else + { +syntax_error: + gfc_error ("Syntax error in !GCC$ BUILTIN directive at %C"); + return MATCH_ERROR; + } + + if (gfc_match (" , ") == MATCH_YES) + if (gfc_match_small_int (&simdlen) != MATCH_YES || simdlen < 0) + goto syntax_error; + + if (gfc_match (" , ") == MATCH_YES) + if (gfc_match (" '%n'", &simdabi) != MATCH_YES) + goto syntax_error; + + if (gfc_match (" , ") == MATCH_YES) + if (gfc_match (" '%n'", &name) != MATCH_YES) + goto syntax_error; + + if (gfc_match (" ) ") != MATCH_YES) + goto syntax_error; +} if (gfc_match (" if ( '%n' ) ", target) == MATCH_YES) { @@ -11631,16 +11664,27 @@ gfc_match_gcc_builtin (void) } if (gfc_vectorized_builtins == NULL) -gfc_vectorized_builtins = new hash_map (); +gfc_vectorized_builtins = + new hash_map (); char *r = XNEWVEC (char, strlen (builtin) + 32); sprintf (r, "__builtin_%s", builtin); bool existed; - int &value 
= gfc_vectorized_builtins->get_or_insert (r, &existed); - value |= clause; + gfc_vect_builtin *v = &gfc_vectorized_builtins->get_or_insert (r, &existed); if (existed) -free (r); +{ + free (r); + gfc_vect_builtin *next = v->next; + v->next = new gfc_vect_builtin; + v = v->next; + v->next = next; +} + v->flags = flags; + v->inbranch = inbranch; + v->simdlen = simdlen; + v->simdabi = simdabi[0] ? xstrdup (simdabi) : 0; + v->name = name[0] ? xstrdup (name) : 0; return MATCH_YES; } diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 920acdafc6b..56becb207b2 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -2812,26 +2812,18 @@ extern bool
Re: [C++ Patch] Use cp_expr_loc_or_input_loc in a few additional typeck.c places
Hi again, On 14/11/19 12:09, Paolo Carlini wrote: Hi, tested x86_64-linux. Instead of sending a separate patch, the below has two additional uses. Tested as usual. Thanks, Paolo. /// /cp 2019-11-14 Paolo Carlini * typeck.c (cp_build_addr_expr_1): Use cp_expr_loc_or_input_loc in three places. (cxx_sizeof_expr): Use it in one additional place. (cxx_alignof_expr): Likewise. (lvalue_or_else): Likewise. /testsuite 2019-11-14 Paolo Carlini * g++.dg/cpp0x/addressof2.C: Test locations too. * g++.dg/cpp0x/rv-lvalue-req.C: Likewise. * g++.dg/expr/crash2.C: Likewise. * g++.dg/expr/lval1.C: Likewise. * g++.dg/expr/unary2.C: Likewise. * g++.dg/ext/lvaddr.C: Likewise. * g++.dg/ext/lvalue1.C: Likewise. * g++.dg/tree-ssa/pr20280.C: Likewise. * g++.dg/warn/Wplacement-new-size.C: Likewise. * g++.old-deja/g++.brendan/alignof.C: Likewise. * g++.old-deja/g++.brendan/sizeof2.C: Likewise. * g++.old-deja/g++.law/temps1.C: Likewise. Index: cp/typeck.c === --- cp/typeck.c (revision 278253) +++ cp/typeck.c (working copy) @@ -1765,7 +1765,8 @@ cxx_sizeof_expr (tree e, tsubst_flags_t complain) if (bitfield_p (e)) { if (complain & tf_error) -error ("invalid application of % to a bit-field"); + error_at (cp_expr_loc_or_input_loc (e), + "invalid application of % to a bit-field"); else return error_mark_node; e = char_type_node; @@ -1825,7 +1826,8 @@ cxx_alignof_expr (tree e, tsubst_flags_t complain) else if (bitfield_p (e)) { if (complain & tf_error) -error ("invalid application of %<__alignof%> to a bit-field"); + error_at (cp_expr_loc_or_input_loc (e), + "invalid application of %<__alignof%> to a bit-field"); else return error_mark_node; t = size_one_node; @@ -6126,7 +6128,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue if (kind == clk_none) { if (complain & tf_error) - lvalue_error (input_location, lv_addressof); + lvalue_error (cp_expr_loc_or_input_loc (arg), lv_addressof); return error_mark_node; } if (strict_lvalue && (kind & (clk_rvalueref|clk_class))) @@ -6134,7 +6136,8 @@ 
cp_build_addr_expr_1 (tree arg, bool strict_lvalue if (!(complain & tf_error)) return error_mark_node; /* Make this a permerror because we used to accept it. */ - permerror (input_location, "taking address of rvalue"); + permerror (cp_expr_loc_or_input_loc (arg), +"taking address of rvalue"); } } @@ -6228,7 +6231,8 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue if (bitfield_p (arg)) { if (complain & tf_error) - error ("attempt to take address of bit-field"); + error_at (cp_expr_loc_or_input_loc (arg), + "attempt to take address of bit-field"); return error_mark_node; } @@ -10431,7 +10435,7 @@ lvalue_or_else (tree ref, enum lvalue_use use, tsu if (kind == clk_none) { if (complain & tf_error) - lvalue_error (input_location, use); + lvalue_error (cp_expr_loc_or_input_loc (ref), use); return 0; } else if (kind & (clk_rvalueref|clk_class)) Index: testsuite/g++.dg/cpp0x/addressof2.C === --- testsuite/g++.dg/cpp0x/addressof2.C (revision 278253) +++ testsuite/g++.dg/cpp0x/addressof2.C (working copy) @@ -8,19 +8,19 @@ addressof (T &x) noexcept return __builtin_addressof (x); } -auto a = __builtin_addressof (1); // { dg-error "lvalue required as unary" } -auto b = addressof (1);// { dg-error "cannot bind non-const lvalue reference of type" } +auto a = __builtin_addressof (1); // { dg-error "31:lvalue required as unary" } +auto b = addressof (1);// { dg-error "21:cannot bind non-const lvalue reference of type" } struct S { int s : 5; int t; void foo (); } s; auto c = __builtin_addressof (s); auto d = addressof (s); -auto e = __builtin_addressof (s.s);// { dg-error "attempt to take address of bit-field" } -auto f = addressof (s.s); // { dg-error "cannot bind bit-field" } -auto g = __builtin_addressof (S{});// { dg-error "taking address of rvalue" } -auto h = addressof (S{}); // { dg-error "cannot bind non-const lvalue reference of type" } -auto i = __builtin_addressof (S::t); // { dg-error "invalid use of non-static data member" } -auto j = __builtin_addressof (S::foo); 
// { dg-error "invalid use of non-static member function" } +auto e = __builtin_addressof (s.s);// { dg-error
[PATCH v3] Extend the simd function attribute
Sorry, v2 had a bug.

v2: added documentation and tests.
v3: fixed expand_simd_clones so different isa variants are actually generated.

GCC currently supports two ways to declare the availability of vector variants of a scalar function:

  #pragma omp declare simd
  void f (void);

and

  __attribute__ ((simd))
  void f (void);

However, these declare a set of symbols that are different simd variants of f, so a library either provides definitions for all those symbols or it cannot use these declarations. (The set of declared symbols can be narrowed down with additional omp clauses, but not enough to allow declaring a single symbol. OpenMP 5 has a declare variant feature that allows declaring more specific simd variants, but it is complicated and still requires a gcc or vendor extension for unambiguous declarations.)

This patch extends the gcc-specific simd attribute so that it can specify a single vector variant of simple scalar functions (functions that only take and return scalar integer or floating type values):

  __attribute__ ((simd (mask, simdlen, simdabi, name)))

where mask is "inbranch" or "notinbranch" like now, simdlen is an int with the same meaning as in omp declare simd, and simdabi is a string specifying the call ABI (which the Intel vector ABI calls ISA). The name is optional and allows a library to use a different symbol name than what the vector ABI specifies.

The simd attribute can currently be used for both declarations and definitions; in the latter case the simd variants of the function are generated, which should work with the extended simd attribute too.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/ChangeLog: 2019-11-14 Szabolcs Nagy * cgraph.h (struct cgraph_simd_clone): Add simdname field. * doc/extend.texi: Update the simd attribute documentation. * tree.h (OMP_CLAUSE__SIMDABI__EXPR): Define. (OMP_CLAUSE__SIMDNAME__EXPR): Define. * tree.c (walk_tree_1): Handle new omp clauses. * tree-core.h (enum omp_clause_code): Likewise.
* tree-nested.c (convert_nonlocal_omp_clauses): Likewise. * tree-pretty-print.c (dump_omp_clause): Likewise. * omp-low.c (scan_sharing_clauses): Likewise. * omp-simd-clone.c (simd_clone_clauses_extract): Likewise. (simd_clone_mangle): Handle simdname. (expand_simd_clones): Reset vecsize_mangle when generating clones. * config/aarch64/aarch64.c (aarch64_simd_clone_compute_vecsize_and_simdlen): Warn about unsupported SIMD ABI. * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen): Likewise. gcc/c-family/ChangeLog: 2019-11-14 Szabolcs Nagy * c-attribs.c (handle_simd_attribute): Handle 4 arguments. gcc/testsuite/ChangeLog: 2019-11-14 Szabolcs Nagy * c-c++-common/attr-simd-5.c: Update. * c-c++-common/attr-simd-6.c: New test. * c-c++-common/attr-simd-7.c: New test. * c-c++-common/attr-simd-8.c: New test. diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index c62cebf7bfd..bf2301eb790 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -448,7 +448,7 @@ const struct attribute_spec c_common_attribute_table[] = handle_omp_declare_variant_attribute, NULL }, { "omp declare variant variant", 0, -1, true, false, false, false, handle_omp_declare_variant_attribute, NULL }, - { "simd", 0, 1, true, false, false, false, + { "simd", 0, 4, true, false, false, false, handle_simd_attribute, NULL }, { "omp declare target", 0, -1, true, false, false, false, handle_omp_declare_target_attribute, NULL }, @@ -3094,13 +3094,22 @@ handle_simd_attribute (tree *node, tree name, tree args, int, bool *no_add_attrs { tree t = get_identifier ("omp declare simd"); tree attr = NULL_TREE; + + /* Allow + simd + simd (mask) + simd (mask, simdlen) + simd (mask, simdlen, simdabi) + simd (mask, simdlen, simdabi, name) + forms. 
*/ + if (args) { tree id = TREE_VALUE (args); if (TREE_CODE (id) != STRING_CST) { - error ("attribute %qE argument not a string", name); + error ("attribute %qE first argument not a string", name); *no_add_attrs = true; return NULL_TREE; } @@ -3113,13 +3122,75 @@ handle_simd_attribute (tree *node, tree name, tree args, int, bool *no_add_attrs OMP_CLAUSE_INBRANCH); else { - error ("only % and % flags are " - "allowed for %<__simd__%> attribute"); + error ("%qE attribute first argument must be % or " + "%", name); + *no_add_attrs = true; + return NULL_TREE; + } + + args = TREE_CHAIN (args); + } + + if (args) + { + tree arg = TREE_VALUE (args); + + arg = c_fully_fold (arg, false, NULL); + if (TREE_CODE (arg) != INTEGER_CST + || !INTEGRAL_TYPE_P (TREE_TYPE (arg)) + || tree_int_cst_sgn (ar
Support UTF-8 character constants for C2x
C2x adds u8'' character constants to C. This patch adds the corresponding GCC support. Most of the support was already present for C++ and just needed enabling for C2x. However, in C2x these constants have type unsigned char, which required corresponding adjustments in the compiler and the preprocessor to give them that type for C. For C, it seems clear to me that having type unsigned char means the constants are unsigned in the preprocessor (and thus treated as having type uintmax_t in #if conditionals), so this patch implements that. I included a conditional in the libcpp change to avoid affecting signedness for C++, but I'm not sure if in fact these constants should also be unsigned in the preprocessor for C++ in which case that !CPP_OPTION (pfile, cplusplus) conditional would not be needed. Bootstrapped with no regressions on x86_64-pc-linux-gnu. Applied to mainline. gcc/c: 2019-11-14 Joseph Myers * c-parser.c (c_parser_postfix_expression) (c_parser_check_literal_zero): Handle CPP_UTF8CHAR. * gimple-parser.c (c_parser_gimple_postfix_expression): Likewise. gcc/c-family: 2019-11-14 Joseph Myers * c-lex.c (lex_charconst): Make CPP_UTF8CHAR constants unsigned char for C. gcc/testsuite: 2019-11-14 Joseph Myers * gcc.dg/c11-utf8char-1.c, gcc.dg/c2x-utf8char-1.c, gcc.dg/c2x-utf8char-2.c, gcc.dg/c2x-utf8char-3.c, gcc.dg/gnu2x-utf8char-1.c: New tests. libcpp: 2019-11-14 Joseph Myers * charset.c (narrow_str_to_charconst): Make CPP_UTF8CHAR constants unsigned for C. * init.c (lang_defaults): Set utf8_char_literals for GNUC2X and STDC2X. 
Index: gcc/c/c-parser.c === --- gcc/c/c-parser.c(revision 278253) +++ gcc/c/c-parser.c(working copy) @@ -8783,6 +8783,7 @@ c_parser_postfix_expression (c_parser *parser) case CPP_CHAR: case CPP_CHAR16: case CPP_CHAR32: +case CPP_UTF8CHAR: case CPP_WCHAR: expr.value = c_parser_peek_token (parser)->value; /* For the purpose of warning when a pointer is compared with @@ -10459,6 +10460,7 @@ c_parser_check_literal_zero (c_parser *parser, uns case CPP_WCHAR: case CPP_CHAR16: case CPP_CHAR32: +case CPP_UTF8CHAR: /* If a parameter is literal zero alone, remember it for -Wmemset-transposed-args warning. */ if (integer_zerop (tok->value) Index: gcc/c/gimple-parser.c === --- gcc/c/gimple-parser.c (revision 278253) +++ gcc/c/gimple-parser.c (working copy) @@ -1395,6 +1395,7 @@ c_parser_gimple_postfix_expression (gimple_parser case CPP_CHAR: case CPP_CHAR16: case CPP_CHAR32: +case CPP_UTF8CHAR: case CPP_WCHAR: expr.value = c_parser_peek_token (parser)->value; set_c_expr_source_range (&expr, tok_range); Index: gcc/c-family/c-lex.c === --- gcc/c-family/c-lex.c(revision 278253) +++ gcc/c-family/c-lex.c(working copy) @@ -1376,7 +1376,9 @@ lex_charconst (const cpp_token *token) type = char16_type_node; else if (token->type == CPP_UTF8CHAR) { - if (flag_char8_t) + if (!c_dialect_cxx ()) + type = unsigned_char_type_node; + else if (flag_char8_t) type = char8_type_node; else type = char_type_node; Index: gcc/testsuite/gcc.dg/c11-utf8char-1.c === --- gcc/testsuite/gcc.dg/c11-utf8char-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/c11-utf8char-1.c (working copy) @@ -0,0 +1,7 @@ +/* Test C2x UTF-8 characters. Test not accepted for C11. */ +/* { dg-do compile } */ +/* { dg-options "-std=c11 -pedantic-errors" } */ + +#define z(x) 0 +#define u8 z( +unsigned char a = u8'a'); Index: gcc/testsuite/gcc.dg/c2x-utf8char-1.c === --- gcc/testsuite/gcc.dg/c2x-utf8char-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/c2x-utf8char-1.c (working copy) @@ -0,0 +1,29 @@ +/* Test C2x UTF-8 characters. 
Test valid usages. */ +/* { dg-do compile } */ +/* { dg-options "-std=c2x -pedantic-errors" } */ + +unsigned char a = u8'a'; +_Static_assert (u8'a' == 97); + +unsigned char b = u8'\0'; +_Static_assert (u8'\0' == 0); + +unsigned char c = u8'\xff'; +_Static_assert (u8'\xff' == 255); + +unsigned char d = u8'\377'; +_Static_assert (u8'\377' == 255); + +_Static_assert (sizeof (u8'a') == 1); +_Static_assert (sizeof (u8'\0') == 1); +_Static_assert (sizeof (u8'\xff') == 1); +_Static_assert (sizeof (u8'\377') == 1); + +_Static_assert (_Generic (u8'a', unsigned char: 1, default: 2) == 1); +_Static_assert (_Generic (u8'\0', unsigned char: 1, default: 2) == 1); +_Static_assert (_Generic (u8'\xff', unsigned char: 1, default: 2) == 1); +_Static_assert (_Generic (u8'\377', un
[RFC C++ PATCH] __builtin_source_location ()
Hi! The following WIP patch implements __builtin_source_location (), which returns a const void pointer to a std::source_location::__impl struct that is required to contain __file, __function, __line and __column fields, the first two with const char * type, the latter two of some integral type. I don't have testcase coverage yet, and the hash map to allow sharing of VAR_DECLs with the same location is commented out, both because it doesn't compile for some reason and because hashing on location_t is not enough; we probably need to hash on both location_t and fndecl, as the baz case in the following shows. Comments?

namespace std {
  struct source_location {
    struct __impl {
      const char *__file;
      const char *__function;
      unsigned int __line, __column;
    };
    const void *__ptr;
    constexpr source_location () : __ptr (nullptr) {}
    static consteval source_location
    current (const void *__p = __builtin_source_location ())
    {
      source_location __ret;
      __ret.__ptr = __p;
      return __ret;
    }
    constexpr const char *file () const
    { return static_cast <const __impl *> (__ptr)->__file; }
    constexpr const char *function () const
    { return static_cast <const __impl *> (__ptr)->__function; }
    constexpr unsigned line () const
    { return static_cast <const __impl *> (__ptr)->__line; }
    constexpr unsigned column () const
    { return static_cast <const __impl *> (__ptr)->__column; }
  };
}
using namespace std;
consteval source_location
bar (const source_location x = source_location::current ())
{
  return x;
}
void
foo (const char **p, unsigned *q)
{
  constexpr source_location s = source_location::current ();
  constexpr source_location t = bar ();
  p[0] = s.file (); p[1] = s.function ();
  q[0] = s.line (); q[1] = s.column ();
  p[2] = t.file (); p[3] = t.function ();
  q[2] = t.line (); q[3] = t.column ();
  constexpr const char *r = s.file ();
}
template <int N>
constexpr source_location
baz ()
{
  return source_location::current ();
}
constexpr source_location s1 = baz <0> ();
constexpr source_location s2 = baz <1> ();
const source_location *p1 = &s1;
const source_location *p2 = &s2;
---
gcc/cp/tree.c.jj2019-11-13 10:54:45.437045793 +0100 +++ gcc/cp/tree.c 2019-11-14 18:11:42.391213117 +0100 @@ -445,7 +445,9 @@ builtin_valid_in_constant_expr_p (const_ if (DECL_BUILT_IN_CLASS (decl) != BUILT_IN_NORMAL) { if (fndecl_built_in_p (decl, CP_BUILT_IN_IS_CONSTANT_EVALUATED, - BUILT_IN_FRONTEND)) +BUILT_IN_FRONTEND) + || fndecl_built_in_p (decl, CP_BUILT_IN_SOURCE_LOCATION, + BUILT_IN_FRONTEND)) return true; /* Not a built-in. */ return false; --- gcc/cp/constexpr.c.jj 2019-11-13 10:54:45.426045960 +0100 +++ gcc/cp/constexpr.c 2019-11-14 18:26:40.691581038 +0100 @@ -1238,6 +1238,9 @@ cxx_eval_builtin_function_call (const co return boolean_true_node; } + if (fndecl_built_in_p (fun, CP_BUILT_IN_SOURCE_LOCATION, BUILT_IN_FRONTEND)) +return fold_builtin_source_location (EXPR_LOCATION (t)); + /* Be permissive for arguments to built-ins; __builtin_constant_p should return constant false for a non-constant argument. */ constexpr_ctx new_ctx = *ctx; --- gcc/cp/name-lookup.c.jj 2019-11-13 10:54:45.495044911 +0100 +++ gcc/cp/name-lookup.c2019-11-14 18:38:30.765804391 +0100 @@ -5747,6 +5747,8 @@ get_std_name_hint (const char *name) {"shared_lock", "", cxx14}, {"shared_mutex", "", cxx17}, {"shared_timed_mutex", "", cxx14}, +/* . */ +{"source_location", "", cxx2a}, /* . */ {"basic_stringbuf", "", cxx98}, {"basic_istringstream", "", cxx98}, --- gcc/cp/cp-gimplify.c.jj 2019-11-06 08:58:38.036473709 +0100 +++ gcc/cp/cp-gimplify.c2019-11-14 20:22:32.905068438 +0100 @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3. #include "attribs.h" #include "asan.h" #include "gcc-rich-location.h" +#include "output.h" +#include "file-prefix-map.h" +#include "cgraph.h" /* Forward declarations. 
*/ @@ -896,8 +899,12 @@ cp_gimplify_expr (tree *expr_p, gimple_s tree decl = cp_get_callee_fndecl_nofold (*expr_p); if (decl && fndecl_built_in_p (decl, CP_BUILT_IN_IS_CONSTANT_EVALUATED, - BUILT_IN_FRONTEND)) + BUILT_IN_FRONTEND)) *expr_p = boolean_false_node; + else if (decl + && fndecl_built_in_p (decl, CP_BUILT_IN_SOURCE_LOCATION, +BUILT_IN_FRONTEND)) + *expr_p = fold_builtin_source_location (EXPR_LOCATION (*expr_p)); } break; @@ -2641,9 +2648,17 @@ cp_fold (tree x) /* Defer folding __builtin_is_constant_evaluated. */ if (callee && fndecl_built_in_p (callee, CP_BUILT_IN_IS_CONSTANT_EVALUATED, - BUILT_IN_FRONTEND)) +
[PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.
Hello,

This patch series adds support for the Arm MVE ACLE intrinsics. Please refer to the Arm reference manual [1] and the MVE intrinsics [2] for more details. Please refer to Chapter 13, MVE ACLE [3], for MVE intrinsics concepts.

This patch series depends on the upstream patches "Armv8.1-M Mainline Security Extension" [4], "CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] and "support for Armv8.1-M Mainline scalar shifts" [6].

[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
[2] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[3] https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914
[4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
[5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
[6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html

Srinath Parvathaneni (38):
  [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
  [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
  [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
  [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
  [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
  [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
  [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
  [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
  [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
  [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
  [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
  [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
  [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
  [PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
  [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
  [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
  [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
  [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
  [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
  [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
  [PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
  [PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
  [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
  [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
  [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.
  [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which load a halfword, word or doubleword from memory.
  [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which store a byte, halfword or word to memory.
  [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which store a halfword, word or doubleword to memory.
  [PATCH][ARM][GCC][6x]: MVE ACLE vaddq intrinsics using arithmetic plus operator.
  [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
  [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback.
  [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback.
  [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant.
  [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise subtract".
  [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics.
  [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
  [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
  [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics.

Regards,
Srinath.
[PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics.
Hello,

This patch supports the following MVE ACLE intrinsics, which are aliases of the vstr and vldr intrinsics: vst1q_p_u8, vst1q_p_s8, vld1q_z_u8, vld1q_z_s8, vst1q_p_u16, vst1q_p_s16, vld1q_z_u16, vld1q_z_s16, vst1q_p_u32, vst1q_p_s32, vld1q_z_u32, vld1q_z_s32, vld1q_z_f16, vst1q_p_f16, vld1q_z_f32, vst1q_p_f32.

This patch also supports the following MVE ACLE vector deinterleaving loads and vector interleaving stores: vst2q_s8, vst2q_u8, vld2q_s8, vld2q_u8, vld4q_s8, vld4q_u8, vst2q_s16, vst2q_u16, vld2q_s16, vld2q_u16, vld4q_s16, vld4q_u16, vst2q_s32, vst2q_u32, vld2q_s32, vld2q_u32, vld4q_s32, vld4q_u32, vld4q_f16, vld2q_f16, vst2q_f16, vld4q_f32, vld2q_f32, vst2q_f32.

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for more details.

[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions. Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog: 2019-11-08 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vst1q_p_u8): Define macro. (vst1q_p_s8): Likewise. (vst2q_s8): Likewise. (vst2q_u8): Likewise. (vld1q_z_u8): Likewise. (vld1q_z_s8): Likewise. (vld2q_s8): Likewise. (vld2q_u8): Likewise. (vld4q_s8): Likewise. (vld4q_u8): Likewise. (vst1q_p_u16): Likewise. (vst1q_p_s16): Likewise. (vst2q_s16): Likewise. (vst2q_u16): Likewise. (vld1q_z_u16): Likewise. (vld1q_z_s16): Likewise. (vld2q_s16): Likewise. (vld2q_u16): Likewise. (vld4q_s16): Likewise. (vld4q_u16): Likewise. (vst1q_p_u32): Likewise. (vst1q_p_s32): Likewise. (vst2q_s32): Likewise. (vst2q_u32): Likewise. (vld1q_z_u32): Likewise. (vld1q_z_s32): Likewise. (vld2q_s32): Likewise. (vld2q_u32): Likewise. (vld4q_s32): Likewise. (vld4q_u32): Likewise. (vld4q_f16): Likewise. (vld2q_f16): Likewise. (vld1q_z_f16): Likewise. (vst2q_f16): Likewise. (vst1q_p_f16): Likewise. (vld4q_f32): Likewise. (vld2q_f32): Likewise. (vld1q_z_f32): Likewise. (vst2q_f32): Likewise. (vst1q_p_f32): Likewise.
(__arm_vst1q_p_u8): Define intrinsic. (__arm_vst1q_p_s8): Likewise. (__arm_vst2q_s8): Likewise. (__arm_vst2q_u8): Likewise. (__arm_vld1q_z_u8): Likewise. (__arm_vld1q_z_s8): Likewise. (__arm_vld2q_s8): Likewise. (__arm_vld2q_u8): Likewise. (__arm_vld4q_s8): Likewise. (__arm_vld4q_u8): Likewise. (__arm_vst1q_p_u16): Likewise. (__arm_vst1q_p_s16): Likewise. (__arm_vst2q_s16): Likewise. (__arm_vst2q_u16): Likewise. (__arm_vld1q_z_u16): Likewise. (__arm_vld1q_z_s16): Likewise. (__arm_vld2q_s16): Likewise. (__arm_vld2q_u16): Likewise. (__arm_vld4q_s16): Likewise. (__arm_vld4q_u16): Likewise. (__arm_vst1q_p_u32): Likewise. (__arm_vst1q_p_s32): Likewise. (__arm_vst2q_s32): Likewise. (__arm_vst2q_u32): Likewise. (__arm_vld1q_z_u32): Likewise. (__arm_vld1q_z_s32): Likewise. (__arm_vld2q_s32): Likewise. (__arm_vld2q_u32): Likewise. (__arm_vld4q_s32): Likewise. (__arm_vld4q_u32): Likewise. (__arm_vld4q_f16): Likewise. (__arm_vld2q_f16): Likewise. (__arm_vld1q_z_f16): Likewise. (__arm_vst2q_f16): Likewise. (__arm_vst1q_p_f16): Likewise. (__arm_vld4q_f32): Likewise. (__arm_vld2q_f32): Likewise. (__arm_vld1q_z_f32): Likewise. (__arm_vst2q_f32): Likewise. (__arm_vst1q_p_f32): Likewise. (vld1q_z): Define polymorphic variant. (vld2q): Likewise. (vld4q): Likewise. (vst1q_p): Likewise. (vst2q): Likewise. * config/arm/arm_mve_builtins.def (STORE1): Use builtin qualifier. (LOAD1): Likewise. * config/arm/mve.md (mve_vst2q): Define RTL pattern. (mve_vld2q): Likewise. (mve_vld4q): Likewise. gcc/testsuite/ChangeLog: 2019-11-08 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: New test. * gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_f32.c: Likewi
[PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
Hello,

This patch supports the following MVE ACLE intrinsics to get and set a vector lane: vsetq_lane_f16, vsetq_lane_f32, vsetq_lane_s16, vsetq_lane_s32, vsetq_lane_s8, vsetq_lane_s64, vsetq_lane_u8, vsetq_lane_u16, vsetq_lane_u32, vsetq_lane_u64, vgetq_lane_f16, vgetq_lane_f32, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s8, vgetq_lane_s64, vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64.

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for more details.

[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions. Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog: 2019-11-08 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vsetq_lane_f16): Define macro. (vsetq_lane_f32): Likewise. (vsetq_lane_s16): Likewise. (vsetq_lane_s32): Likewise. (vsetq_lane_s8): Likewise. (vsetq_lane_s64): Likewise. (vsetq_lane_u8): Likewise. (vsetq_lane_u16): Likewise. (vsetq_lane_u32): Likewise. (vsetq_lane_u64): Likewise. (vgetq_lane_f16): Likewise. (vgetq_lane_f32): Likewise. (vgetq_lane_s16): Likewise. (vgetq_lane_s32): Likewise. (vgetq_lane_s8): Likewise. (vgetq_lane_s64): Likewise. (vgetq_lane_u8): Likewise. (vgetq_lane_u16): Likewise. (vgetq_lane_u32): Likewise. (vgetq_lane_u64): Likewise. (__ARM_NUM_LANES): Likewise. (__ARM_LANEQ): Likewise. (__ARM_CHECK_LANEQ): Likewise. (__arm_vsetq_lane_s16): Define intrinsic. (__arm_vsetq_lane_s32): Likewise. (__arm_vsetq_lane_s8): Likewise. (__arm_vsetq_lane_s64): Likewise. (__arm_vsetq_lane_u8): Likewise. (__arm_vsetq_lane_u16): Likewise. (__arm_vsetq_lane_u32): Likewise. (__arm_vsetq_lane_u64): Likewise. (__arm_vgetq_lane_s16): Likewise. (__arm_vgetq_lane_s32): Likewise. (__arm_vgetq_lane_s8): Likewise. (__arm_vgetq_lane_s64): Likewise. (__arm_vgetq_lane_u8): Likewise. (__arm_vgetq_lane_u16): Likewise. (__arm_vgetq_lane_u32): Likewise. (__arm_vgetq_lane_u64): Likewise. (__arm_vsetq_lane_f16): Likewise.
(__arm_vsetq_lane_f32): Likewise. (__arm_vgetq_lane_f16): Likewise. (__arm_vgetq_lane_f32): Likewise. (vgetq_lane): Define polymorphic variant. (vsetq_lane): Likewise. * config/arm/mve.md (mve_vec_extract): Define RTL pattern. (mve_vec_extractv2didi): Likewise. (mve_vec_extract_sext_internal): Likewise. (mve_vec_extract_zext_internal): Likewise. (mve_vec_set_internal): Likewise. (mve_vec_setv2di_internal): Likewise. * config/arm/neon.md (vec_set): Move RTL pattern to vec-common.md file. (vec_extract): Rename to "neon_vec_extract". (vec_extractv2didi): Rename to "neon_vec_extractv2didi". * config/arm/vec-common.md (vec_extract): Define RTL pattern common for MVE and NEON. (vec_set): Move RTL pattern from neon.md and modify to accept both MVE and NEON. gcc/testsuite/ChangeLog: 2019-11-08 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: New test. * gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index d0259d7bd96c565d901b7634e9f735e0e14bc9dc..9dcf8d692670cd8552fade9868bc051683553b91
[PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
Hello,

This patch supports the following MVE ACLE scalar shift intrinsics: sqrshr, sqrshrl, sqrshrl_sat48, sqshl, sqshll, srshr, srshrl, uqrshl, uqrshll, uqrshll_sat48, uqshl, uqshll, urshr, urshrl, lsll, asrl.

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for more details.

[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions. Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog: 2019-11-08 Srinath Parvathaneni * config/arm/arm-builtins.c (LSLL_QUALIFIERS): Define builtin qualifier. (UQSHL_QUALIFIERS): Likewise. (ASRL_QUALIFIERS): Likewise. (SQSHL_QUALIFIERS): Likewise. * config/arm/arm_mve.h (sqrshr): Define macro. (sqrshrl): Likewise. (sqrshrl_sat48): Likewise. (sqshl): Likewise. (sqshll): Likewise. (srshr): Likewise. (srshrl): Likewise. (uqrshl): Likewise. (uqrshll): Likewise. (uqrshll_sat48): Likewise. (uqshl): Likewise. (uqshll): Likewise. (urshr): Likewise. (urshrl): Likewise. (lsll): Likewise. (asrl): Likewise. (__arm_lsll): Define intrinsic. (__arm_asrl): Likewise. (__arm_uqrshll): Likewise. (__arm_uqrshll_sat48): Likewise. (__arm_sqrshrl): Likewise. (__arm_sqrshrl_sat48): Likewise. (__arm_uqshll): Likewise. (__arm_urshrl): Likewise. (__arm_srshrl): Likewise. (__arm_sqshll): Likewise. (__arm_uqrshl): Likewise. (__arm_sqrshr): Likewise. (__arm_uqshl): Likewise. (__arm_urshr): Likewise. (__arm_sqshl): Likewise. (__arm_srshr): Likewise. * config/arm/arm_mve_builtins.def (LSLL_QUALIFIERS): Use builtin qualifier. (UQSHL_QUALIFIERS): Likewise. (ASRL_QUALIFIERS): Likewise. (SQSHL_QUALIFIERS): Likewise. * config/arm/mve.md (mve_uqrshll_sat_di): Define RTL pattern.
(mve_sqrshrl_sat_di): Likewise (mve_uqrshl_si): Likewise (mve_sqrshr_si): Likewise (mve_uqshll_di): Likewise (mve_urshrl_di): Likewise (mve_uqshl_si): Likewise (mve_urshr_si): Likewise (mve_sqshl_si): Likewise (mve_srshr_si): Likewise (mve_srshrl_di): Likewise (mve_sqshll_di): Likewise gcc/testsuite/ChangeLog: 2019-11-08 Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/asrl.c: New test. * gcc.target/arm/mve/intrinsics/lsll.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshr.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c: Likewise. * gcc.target/arm/mve/intrinsics/sqshl.c: Likewise. * gcc.target/arm/mve/intrinsics/sqshll.c: Likewise. * gcc.target/arm/mve/intrinsics/srshr.c: Likewise. * gcc.target/arm/mve/intrinsics/srshrl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshll_sat48.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshll_sat64.c: Likewise. * gcc.target/arm/mve/intrinsics/uqshl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqshll.c: Likewise. * gcc.target/arm/mve/intrinsics/urshr.c: Likewise. * gcc.target/arm/mve/intrinsics/urshrl.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 9e87dff264d6b535f64407f669c6e83b0ed639a6..31bff8511e368f8e789297818e9b0b9f885463ae 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -738,6 +738,26 @@ arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_unsigned}; #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers) +static enum arm_type_qualifiers +arm_lsll_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_none}; +#define LSLL_QUALIFIERS (arm_lsll_qualifiers) + +static enum arm_type_qualifiers +arm_uqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_const}; +#define UQSHL_QUALIFIERS (arm_uqshl_qualifiers) + +static enum arm_type_qualifiers +arm_asrl_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none}; +#define ASRL_QUALIFIERS (arm_asrl_qualifiers) + +static enum arm_type_qualifiers +arm_sqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_const}; +#define SQSHL_QUALIFIERS (arm_sqshl_qualifiers) + /* End of Qualifier for MVE builtins. */ /* void ([T element type] *, T, immediate). */ diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 9dcf8d692670cd8552fade9868bc051683553b91..2adae7f8b21f44aa3b80231b89bd68bcd0812611 100644 ---
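To make the intent of these scalar shift intrinsics concrete, here is a plain-C model of the rounding shift right (urshr) and saturating shift left (uqshl) semantics as I read the Armv8.1-M documentation. This is only a sketch of the expected behaviour, not GCC's implementation; `model_urshr` and `model_uqshl` are hypothetical names, and the real intrinsics map to single URSHR/UQSHL instructions.

```c
#include <stdint.h>

/* Rounding shift right: add half the rounding increment before
   shifting, so the result rounds to nearest (shift in 1..32).  */
static uint32_t model_urshr(uint32_t value, int shift)
{
    uint64_t rounded = (uint64_t)value + (1ull << (shift - 1));
    return (uint32_t)(rounded >> shift);
}

/* Saturating shift left: the unsigned result clamps to UINT32_MAX
   instead of wrapping on overflow (shift in 0..31).  */
static uint32_t model_uqshl(uint32_t value, int shift)
{
    uint64_t wide = (uint64_t)value << shift;
    return wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
}
```

The 64-bit variants (urshrl, uqshll, sqrshrl and friends) follow the same pattern on a 64-bit value, with the `_sat48` forms saturating at 48 bits instead of 64.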
[PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics.
Hello,

This patch supports the following MVE ACLE whole vector left shift with carry intrinsics.

vshlcq_m_s8, vshlcq_m_s16, vshlcq_m_s32, vshlcq_m_u8, vshlcq_m_u16, vshlcq_m_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk? Thanks, Srinath.

gcc/ChangeLog:

2019-11-09  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* config/arm/arm_mve.h (vshlcq_m_s8): Define macro.
	(vshlcq_m_u8): Likewise.
	(vshlcq_m_s16): Likewise.
	(vshlcq_m_u16): Likewise.
	(vshlcq_m_s32): Likewise.
	(vshlcq_m_u32): Likewise.
	(__arm_vshlcq_m_s8): Define intrinsic.
	(__arm_vshlcq_m_u8): Likewise.
	(__arm_vshlcq_m_s16): Likewise.
	(__arm_vshlcq_m_u16): Likewise.
	(__arm_vshlcq_m_s32): Likewise.
	(__arm_vshlcq_m_u32): Likewise.
	(vshlcq_m): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_UNONE_IMM_UNONE):
	Use builtin qualifier.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise.
	* config/arm/mve.md (mve_vshlcq_m_vec_): Define RTL pattern.
	(mve_vshlcq_m_carry_): Likewise.
	(mve_vshlcq_m_): Likewise.

gcc/testsuite/ChangeLog:

2019-11-09  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c: Likewise.
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 2adae7f8b21f44aa3b80231b89bd68bcd0812611..5d385081a0affacc4dd21d010b01bceb38a9b699 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -2542,6 +2542,12 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t; #define urshrl(__p0, __p1) __arm_urshrl(__p0, __p1) #define lsll(__p0, __p1) __arm_lsll(__p0, __p1) #define asrl(__p0, __p1) __arm_asrl(__p0, __p1) +#define vshlcq_m_s8(__a, __b, __imm, __p) __arm_vshlcq_m_s8(__a, __b, __imm, __p) +#define vshlcq_m_u8(__a, __b, __imm, __p) __arm_vshlcq_m_u8(__a, __b, __imm, __p) +#define vshlcq_m_s16(__a, __b, __imm, __p) __arm_vshlcq_m_s16(__a, __b, __imm, __p) +#define vshlcq_m_u16(__a, __b, __imm, __p) __arm_vshlcq_m_u16(__a, __b, __imm, __p) +#define vshlcq_m_s32(__a, __b, __imm, __p) __arm_vshlcq_m_s32(__a, __b, __imm, __p) +#define vshlcq_m_u32(__a, __b, __imm, __p) __arm_vshlcq_m_u32(__a, __b, __imm, __p) #endif /* For big-endian, GCC's vector indices are reversed within each 64 bits @@ -16667,6 +16673,60 @@ __arm_srshr (int32_t value, const int shift) return __builtin_mve_srshr_si (value, shift); } +__extension__ extern __inline int8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vshlcq_m_s8 (int8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p) +{ + int8x16_t __res = __builtin_mve_vshlcq_m_vec_sv16qi (__a, *__b, __imm, __p); + *__b = __builtin_mve_vshlcq_m_carry_sv16qi (__a, *__b, __imm, __p); + return __res; +} + +__extension__ extern __inline uint8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vshlcq_m_u8 (uint8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p) +{ + uint8x16_t __res = __builtin_mve_vshlcq_m_vec_uv16qi (__a, *__b, __imm, __p); + *__b = __builtin_mve_vshlcq_m_carry_uv16qi (__a, *__b, __imm, __p); + return __res; +} + +__extension__ extern __inline int16x8_t +__attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vshlcq_m_s16 (int16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p) +{ + int16x8_t __res = __builtin_mve_vshlcq_m_vec_sv8hi (__a, *__b, __imm, __p); + *__b = __builtin_mve_vshlcq_m_carry_sv8hi (__a, *__b, __imm, __p); + return __res; +} + +__extension__ extern __inline uint16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vshlcq_m_u16 (uint16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p) +{ + uint16x8_t __res = __builtin_mve_vshlcq_m_vec_uv8hi (__a, *__b, __imm, __p); + *__b = __builtin_mve_vshlcq_m_carry_uv8hi (__a, *__b, __imm, __p); + return __res; +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vshlcq_m_s32 (int32x4_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p) +{ + int32x4_t __res = __builtin_mve_vshlcq_m_vec_sv4si (__a, *__b, __imm, __p); + *__b = __builtin_mve_vshlcq_m_ca
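For readers unfamiliar with VSHLC, here is a plain-C sketch of the non-predicated whole-vector shift on 32-bit lanes, as I understand the instruction: the 128-bit vector is shifted left by `shift` (1..32), the low bits of lane 0 are filled from `*carry`, and the bits shifted out of the top lane become the new `*carry`. `model_vshlc` is a hypothetical name; the `_m` variants above additionally merge the result per predicated beat.

```c
#include <stdint.h>

/* Whole-vector left shift with carry, modelled on four 32-bit lanes.
   Bits shifted out of lane i feed the low bits of lane i+1.  */
static void model_vshlc(uint32_t q[4], uint32_t *carry, int shift)
{
    /* Only the low `shift` bits of the carry register participate.  */
    uint32_t mask = (shift == 32) ? 0xFFFFFFFFu : ((1u << shift) - 1u);
    uint32_t in = *carry & mask;
    for (int i = 0; i < 4; i++) {
        uint64_t wide = ((uint64_t)q[i] << shift) | in;
        q[i] = (uint32_t)wide;
        in = (uint32_t)(wide >> 32);   /* bits that spill into the next lane */
    }
    *carry = in;                       /* bits shifted out of the top lane */
}
```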
[PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise subtract".
Hello,

This patch supports the following MVE ACLE "add with carry across beats" intrinsics and "beat-wise subtract" intrinsics.

vadciq_s32, vadciq_u32, vadciq_m_s32, vadciq_m_u32, vadcq_s32, vadcq_u32, vadcq_m_s32, vadcq_m_u32, vsbciq_s32, vsbciq_u32, vsbciq_m_s32, vsbciq_m_u32, vsbcq_s32, vsbcq_u32, vsbcq_m_s32, vsbcq_m_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk? Thanks, Srinath.

gcc/ChangeLog:

2019-11-08  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* config/arm/arm_mve.h (vadciq_s32): Define macro.
	(vadciq_u32): Likewise.
	(vadciq_m_s32): Likewise.
	(vadciq_m_u32): Likewise.
	(vadcq_s32): Likewise.
	(vadcq_u32): Likewise.
	(vadcq_m_s32): Likewise.
	(vadcq_m_u32): Likewise.
	(vsbciq_s32): Likewise.
	(vsbciq_u32): Likewise.
	(vsbciq_m_s32): Likewise.
	(vsbciq_m_u32): Likewise.
	(vsbcq_s32): Likewise.
	(vsbcq_u32): Likewise.
	(vsbcq_m_s32): Likewise.
	(vsbcq_m_u32): Likewise.
	(__arm_vadciq_s32): Define intrinsic.
	(__arm_vadciq_u32): Likewise.
	(__arm_vadciq_m_s32): Likewise.
	(__arm_vadciq_m_u32): Likewise.
	(__arm_vadcq_s32): Likewise.
	(__arm_vadcq_u32): Likewise.
	(__arm_vadcq_m_s32): Likewise.
	(__arm_vadcq_m_u32): Likewise.
	(__arm_vsbciq_s32): Likewise.
	(__arm_vsbciq_u32): Likewise.
	(__arm_vsbciq_m_s32): Likewise.
	(__arm_vsbciq_m_u32): Likewise.
	(__arm_vsbcq_s32): Likewise.
	(__arm_vsbcq_u32): Likewise.
	(__arm_vsbcq_m_s32): Likewise.
	(__arm_vsbcq_m_u32): Likewise.
	(vadciq_m): Define polymorphic variant.
	(vadciq): Likewise.
	(vadcq_m): Likewise.
	(vadcq): Likewise.
	(vsbciq_m): Likewise.
	(vsbciq): Likewise.
	(vsbcq_m): Likewise.
	(vsbcq): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_NONE): Use builtin
	qualifier.
	(BINOP_UNONE_UNONE_UNONE): Likewise.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (VADCIQ): Define iterator.
	(VADCIQ_M): Likewise.
	(VSBCQ): Likewise.
	(VSBCQ_M): Likewise.
	(VSBCIQ): Likewise.
	(VSBCIQ_M): Likewise.
	(VADCQ): Likewise.
	(VADCQ_M): Likewise.
	(mve_vadciq_m_v4si): Define RTL pattern.
	(mve_vadciq_v4si): Likewise.
	(mve_vadcq_m_v4si): Likewise.
	(mve_vadcq_v4si): Likewise.
	(mve_vsbciq_m_v4si): Likewise.
	(mve_vsbciq_v4si): Likewise.
	(mve_vsbcq_m_v4si): Likewise.
	(mve_vsbcq_v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-08  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: New test.
	* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Likewise.
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 31ad3fc5cddfedede02b10e194a426a98bd13024..1704b622c5d6e0abcf814ae1d439bb732f0bd76e 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -2450,6 +2450,22 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t; #define vrev32q_x_f16(__a, __p) __arm_vrev32q_x_f16(__a, __p) #define vrev64q_x_f16(__a, __p) __arm_vrev64q_x_f16(__a, __p) #define vrev64q_x_f32(__a, __p) __arm_vrev64q_x_f32(__a, __p) +#define vadciq_s32(__a, __b, __carry_out) __arm_vadciq_s32(__a, __b, __carry_out) +#define vadciq_u32(__a, __b, __carry_out) __arm_vadciq_u32(__a, __b, __carry_out) +#define vadciq_m_s32(__inactive, __a, __b, __carry_out, __p) __arm_vadciq_m_s32(__inactive, __a, __b, __carry_out, __p) +#define vadciq_m_u32(__inactive
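The "add with carry across beats" behaviour is what makes these intrinsics useful for multi-precision arithmetic: as I read the semantics, each 32-bit lane is added with the carry out of the previous lane, so one vadciq performs a 128-bit addition. A plain-C sketch (not GCC's implementation; `model_vadci` is a hypothetical name):

```c
#include <stdint.h>

/* VADCI: lane-wise add with carry chained across beats, initial
   carry fixed at 0; the final carry is written to *carry_out.  */
static void model_vadci(uint32_t res[4], const uint32_t a[4],
                        const uint32_t b[4], unsigned *carry_out)
{
    unsigned carry = 0;  /* vadcq instead takes the initial carry from *carry */
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        res[i] = (uint32_t)sum;
        carry = (unsigned)(sum >> 32);
    }
    *carry_out = carry;
}
```

The vsbciq/vsbcq forms are the beat-wise subtract analogue (borrow chained the same way, initial carry 1 for vsbciq).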
[PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (don't-care) variant.
Hello, This patch supports following MVE ACLE predicated intrinsic with `_x` (dont-care) variant. * ``_x`` (dont-care) which indicates that the false-predicated lanes have undefined values. These are syntactic sugar for merge intrinsics with a ``vuninitializedq`` inactive parameter. vabdq_x_f16, vabdq_x_f32, vabdq_x_s16, vabdq_x_s32, vabdq_x_s8, vabdq_x_u16, vabdq_x_u32, vabdq_x_u8, vabsq_x_f16, vabsq_x_f32, vabsq_x_s16, vabsq_x_s32, vabsq_x_s8, vaddq_x_f16, vaddq_x_f32, vaddq_x_n_f16, vaddq_x_n_f32, vaddq_x_n_s16, vaddq_x_n_s32, vaddq_x_n_s8, vaddq_x_n_u16, vaddq_x_n_u32, vaddq_x_n_u8, vaddq_x_s16, vaddq_x_s32, vaddq_x_s8, vaddq_x_u16, vaddq_x_u32, vaddq_x_u8, vandq_x_f16, vandq_x_f32, vandq_x_s16, vandq_x_s32, vandq_x_s8, vandq_x_u16, vandq_x_u32, vandq_x_u8, vbicq_x_f16, vbicq_x_f32, vbicq_x_s16, vbicq_x_s32, vbicq_x_s8, vbicq_x_u16, vbicq_x_u32, vbicq_x_u8, vbrsrq_x_n_f16, vbrsrq_x_n_f32, vbrsrq_x_n_s16, vbrsrq_x_n_s32, vbrsrq_x_n_s8, vbrsrq_x_n_u16, vbrsrq_x_n_u32, vbrsrq_x_n_u8, vcaddq_rot270_x_f16, vcaddq_rot270_x_f32, vcaddq_rot270_x_s16, vcaddq_rot270_x_s32, vcaddq_rot270_x_s8, vcaddq_rot270_x_u16, vcaddq_rot270_x_u32, vcaddq_rot270_x_u8, vcaddq_rot90_x_f16, vcaddq_rot90_x_f32, vcaddq_rot90_x_s16, vcaddq_rot90_x_s32, vcaddq_rot90_x_s8, vcaddq_rot90_x_u16, vcaddq_rot90_x_u32, vcaddq_rot90_x_u8, vclsq_x_s16, vclsq_x_s32, vclsq_x_s8, vclzq_x_s16, vclzq_x_s32, vclzq_x_s8, vclzq_x_u16, vclzq_x_u32, vclzq_x_u8, vcmulq_rot180_x_f16, vcmulq_rot180_x_f32, vcmulq_rot270_x_f16, vcmulq_rot270_x_f32, vcmulq_rot90_x_f16, vcmulq_rot90_x_f32, vcmulq_x_f16, vcmulq_x_f32, vcvtaq_x_s16_f16, vcvtaq_x_s32_f32, vcvtaq_x_u16_f16, vcvtaq_x_u32_f32, vcvtbq_x_f32_f16, vcvtmq_x_s16_f16, vcvtmq_x_s32_f32, vcvtmq_x_u16_f16, vcvtmq_x_u32_f32, vcvtnq_x_s16_f16, vcvtnq_x_s32_f32, vcvtnq_x_u16_f16, vcvtnq_x_u32_f32, vcvtpq_x_s16_f16, vcvtpq_x_s32_f32, vcvtpq_x_u16_f16, vcvtpq_x_u32_f32, vcvtq_x_f16_s16, vcvtq_x_f16_u16, vcvtq_x_f32_s32, vcvtq_x_f32_u32, vcvtq_x_n_f16_s16, 
vcvtq_x_n_f16_u16, vcvtq_x_n_f32_s32, vcvtq_x_n_f32_u32, vcvtq_x_n_s16_f16, vcvtq_x_n_s32_f32, vcvtq_x_n_u16_f16, vcvtq_x_n_u32_f32, vcvtq_x_s16_f16, vcvtq_x_s32_f32, vcvtq_x_u16_f16, vcvtq_x_u32_f32, vcvttq_x_f32_f16, vddupq_x_n_u16, vddupq_x_n_u32, vddupq_x_n_u8, vddupq_x_wb_u16, vddupq_x_wb_u32, vddupq_x_wb_u8, vdupq_x_n_f16, vdupq_x_n_f32, vdupq_x_n_s16, vdupq_x_n_s32, vdupq_x_n_s8, vdupq_x_n_u16, vdupq_x_n_u32, vdupq_x_n_u8, vdwdupq_x_n_u16, vdwdupq_x_n_u32, vdwdupq_x_n_u8, vdwdupq_x_wb_u16, vdwdupq_x_wb_u32, vdwdupq_x_wb_u8, veorq_x_f16, veorq_x_f32, veorq_x_s16, veorq_x_s32, veorq_x_s8, veorq_x_u16, veorq_x_u32, veorq_x_u8, vhaddq_x_n_s16, vhaddq_x_n_s32, vhaddq_x_n_s8, vhaddq_x_n_u16, vhaddq_x_n_u32, vhaddq_x_n_u8, vhaddq_x_s16, vhaddq_x_s32, vhaddq_x_s8, vhaddq_x_u16, vhaddq_x_u32, vhaddq_x_u8, vhcaddq_rot270_x_s16, vhcaddq_rot270_x_s32, vhcaddq_rot270_x_s8, vhcaddq_rot90_x_s16, vhcaddq_rot90_x_s32, vhcaddq_rot90_x_s8, vhsubq_x_n_s16, vhsubq_x_n_s32, vhsubq_x_n_s8, vhsubq_x_n_u16, vhsubq_x_n_u32, vhsubq_x_n_u8, vhsubq_x_s16, vhsubq_x_s32, vhsubq_x_s8, vhsubq_x_u16, vhsubq_x_u32, vhsubq_x_u8, vidupq_x_n_u16, vidupq_x_n_u32, vidupq_x_n_u8, vidupq_x_wb_u16, vidupq_x_wb_u32, vidupq_x_wb_u8, viwdupq_x_n_u16, viwdupq_x_n_u32, viwdupq_x_n_u8, viwdupq_x_wb_u16, viwdupq_x_wb_u32, viwdupq_x_wb_u8, vmaxnmq_x_f16, vmaxnmq_x_f32, vmaxq_x_s16, vmaxq_x_s32, vmaxq_x_s8, vmaxq_x_u16, vmaxq_x_u32, vmaxq_x_u8, vminnmq_x_f16, vminnmq_x_f32, vminq_x_s16, vminq_x_s32, vminq_x_s8, vminq_x_u16, vminq_x_u32, vminq_x_u8, vmovlbq_x_s16, vmovlbq_x_s8, vmovlbq_x_u16, vmovlbq_x_u8, vmovltq_x_s16, vmovltq_x_s8, vmovltq_x_u16, vmovltq_x_u8, vmulhq_x_s16, vmulhq_x_s32, vmulhq_x_s8, vmulhq_x_u16, vmulhq_x_u32, vmulhq_x_u8, vmullbq_int_x_s16, vmullbq_int_x_s32, vmullbq_int_x_s8, vmullbq_int_x_u16, vmullbq_int_x_u32, vmullbq_int_x_u8, vmullbq_poly_x_p16, vmullbq_poly_x_p8, vmulltq_int_x_s16, vmulltq_int_x_s32, vmulltq_int_x_s8, vmulltq_int_x_u16, vmulltq_int_x_u32, vmulltq_int_x_u8, 
vmulltq_poly_x_p16, vmulltq_poly_x_p8, vmulq_x_f16, vmulq_x_f32, vmulq_x_n_f16, vmulq_x_n_f32, vmulq_x_n_s16, vmulq_x_n_s32, vmulq_x_n_s8, vmulq_x_n_u16, vmulq_x_n_u32, vmulq_x_n_u8, vmulq_x_s16, vmulq_x_s32, vmulq_x_s8, vmulq_x_u16, vmulq_x_u32, vmulq_x_u8, vmvnq_x_n_s16, vmvnq_x_n_s32, vmvnq_x_n_u16, vmvnq_x_n_u32, vmvnq_x_s16, vmvnq_x_s32, vmvnq_x_s8, vmvnq_x_u16, vmvnq_x_u32, vmvnq_x_u8, vnegq_x_f16, vnegq_x_f32, vnegq_x_s16, vnegq_x_s32, vnegq_x_s8, vornq_x_f16, vornq_x_f32, vornq_x_s16, vornq_x_s32, vornq_x_s8, vornq_x_u16, vornq_x_u32, vornq_x_u8, vorrq_x_f16, vorrq_x_f32, vorrq_x_s16, vorrq_x_s32, vorrq_x_s8, vorrq_x_u16, vorrq_x_u32, vorrq_x_u8, vrev16q_x_s8, vrev16q_x_u8, vrev32q_x_f16, vrev32q_x_s16, vrev32q_x_s8, vrev32q_x_u16, vrev32q_x_u8, vrev64q_x_f16, vrev64q_x_f32, vrev64q_x_s16, vrev64q_x_s32, vrev64q_x_s8, vrev64q_x_u16, vrev64q_x_u32, vrev64q_x_u8, vrhaddq_x_s16, vrhaddq_x_s32, vrhaddq_x_s8, vrhaddq_x_u16, vrhaddq_x_u32, vrhaddq_x_u8, vrmu
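The `_x` predication model described above (false-predicated lanes undefined, i.e. `_m` with a `vuninitializedq` inactive argument) can be sketched in plain C. This is an illustration of the contract, not of GCC's code; `model_vaddq_x` is a hypothetical name, and the sentinel stands in for "any value is legal here".

```c
#include <stdint.h>

/* _x (don't-care) predicated add on 32-bit lanes.  mve_pred16_t has one
   bit per byte, so a 32-bit lane i is governed by bits 4*i..4*i+3; this
   model tests only the lane's low bit.  */
static void model_vaddq_x(uint32_t res[4], const uint32_t a[4],
                          const uint32_t b[4], uint16_t p)
{
    for (int i = 0; i < 4; i++) {
        if ((p >> (4 * i)) & 1)
            res[i] = a[i] + b[i];
        else
            res[i] = 0xDEADBEEF;  /* undefined lane: callers must not rely on it */
    }
}
```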
[PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback.
Hello, This patch supports following MVE ACLE intrinsics with writeback. vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64, vldrdq_gather_base_wb_z_s64, vldrdq_gather_base_wb_z_u64, vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32, vldrwq_gather_base_wb_z_f32, vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32, vstrdq_scatter_base_wb_p_s64, vstrdq_scatter_base_wb_p_u64, vstrdq_scatter_base_wb_s64, vstrdq_scatter_base_wb_u64, vstrwq_scatter_base_wb_p_s32, vstrwq_scatter_base_wb_p_f32, vstrwq_scatter_base_wb_p_u32, vstrwq_scatter_base_wb_s32, vstrwq_scatter_base_wb_u32, vstrwq_scatter_base_wb_f32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-07 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (LDRGBWBS_QUALIFIERS): Define builtin qualifier. (LDRGBWBU_QUALIFIERS): Likewise. (LDRGBWBS_Z_QUALIFIERS): Likewise. (LDRGBWBU_Z_QUALIFIERS): Likewise. (STRSBWBS_QUALIFIERS): Likewise. (STRSBWBU_QUALIFIERS): Likewise. (STRSBWBS_P_QUALIFIERS): Likewise. (STRSBWBU_P_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vldrdq_gather_base_wb_s64): Define macro. (vldrdq_gather_base_wb_u64): Likewise. (vldrdq_gather_base_wb_z_s64): Likewise. (vldrdq_gather_base_wb_z_u64): Likewise. (vldrwq_gather_base_wb_f32): Likewise. (vldrwq_gather_base_wb_s32): Likewise. (vldrwq_gather_base_wb_u32): Likewise. (vldrwq_gather_base_wb_z_f32): Likewise. (vldrwq_gather_base_wb_z_s32): Likewise. (vldrwq_gather_base_wb_z_u32): Likewise. (vstrdq_scatter_base_wb_p_s64): Likewise. (vstrdq_scatter_base_wb_p_u64): Likewise. (vstrdq_scatter_base_wb_s64): Likewise. (vstrdq_scatter_base_wb_u64): Likewise. (vstrwq_scatter_base_wb_p_s32): Likewise. (vstrwq_scatter_base_wb_p_f32): Likewise. 
(vstrwq_scatter_base_wb_p_u32): Likewise. (vstrwq_scatter_base_wb_s32): Likewise. (vstrwq_scatter_base_wb_u32): Likewise. (vstrwq_scatter_base_wb_f32): Likewise. (__arm_vldrdq_gather_base_wb_s64): Define intrinsic. (__arm_vldrdq_gather_base_wb_u64): Likewise. (__arm_vldrdq_gather_base_wb_z_s64): Likewise. (__arm_vldrdq_gather_base_wb_z_u64): Likewise. (__arm_vldrwq_gather_base_wb_s32): Likewise. (__arm_vldrwq_gather_base_wb_u32): Likewise. (__arm_vldrwq_gather_base_wb_z_s32): Likewise. (__arm_vldrwq_gather_base_wb_z_u32): Likewise. (__arm_vstrdq_scatter_base_wb_s64): Likewise. (__arm_vstrdq_scatter_base_wb_u64): Likewise. (__arm_vstrdq_scatter_base_wb_p_s64): Likewise. (__arm_vstrdq_scatter_base_wb_p_u64): Likewise. (__arm_vstrwq_scatter_base_wb_p_s32): Likewise. (__arm_vstrwq_scatter_base_wb_p_u32): Likewise. (__arm_vstrwq_scatter_base_wb_s32): Likewise. (__arm_vstrwq_scatter_base_wb_u32): Likewise. (__arm_vldrwq_gather_base_wb_f32): Likewise. (__arm_vldrwq_gather_base_wb_z_f32): Likewise. (__arm_vstrwq_scatter_base_wb_f32): Likewise. (__arm_vstrwq_scatter_base_wb_p_f32): Likewise. (vstrwq_scatter_base_wb): Define polymorphic variant. (vstrwq_scatter_base_wb_p): Likewise. (vstrdq_scatter_base_wb_p): Likewise. (vstrdq_scatter_base_wb): Likewise. * config/arm/arm_mve_builtins.def (LDRGBWBS_QUALIFIERS): Use builtin qualifier. * config/arm/mve.md (mve_vstrwq_scatter_base_wb_v4si): Define RTL pattern. (mve_vstrwq_scatter_base_wb_add_v4si): Likewise. (mve_vstrwq_scatter_base_wb_v4si_insn): Likewise. (mve_vstrwq_scatter_base_wb_p_v4si): Likewise. (mve_vstrwq_scatter_base_wb_p_add_v4si): Likewise. (mve_vstrwq_scatter_base_wb_p_v4si_insn): Likewise. (mve_vstrwq_scatter_base_wb_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_add_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_fv4sf_insn): Likewise. (mve_vstrwq_scatter_base_wb_p_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_p_add_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_p_fv4sf_insn): Likewise. 
(mve_vstrdq_scatter_base_wb_v2di): Likewise. (mve_vstrdq_scatter_base_wb_add_v2di): Likewise. (mve_vstrdq_scatter_base_wb_v2di_insn): Likewise. (mve_vstrdq_scatter_base_wb_p_v2di): Likewise. (mve_vstrdq_scatter_base_wb_p_add_v2di): Likewise. (mve_vstrdq_scatter_base_wb_p_v2di_insn): Likewise. (mve_vldrwq_gather_base_wb_v4si): Likewise. (mve_vldrwq_gather_base_wb_v4si_insn): Likewise. (
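The scatter-store-with-writeback behaviour can be sketched in plain C: as I understand the ACLE, each lane of the base vector is a byte address, lane i of the value is stored at base[i] + offset, and the updated addresses are written back to the base vector. Simulated memory and the name `model_vstrw_scatter_base_wb` are assumptions for illustration only.

```c
#include <stdint.h>

/* vstrwq_scatter_base_wb sketch: word scatter store with pre-indexed
   writeback, simulated over a word array indexed by byte address / 4.  */
static void model_vstrw_scatter_base_wb(uint32_t mem[], uint32_t base[4],
                                        int offset, const uint32_t value[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t addr = base[i] + offset;  /* offset in bytes, word-aligned */
        mem[addr / 4] = value[i];
        base[i] = addr;                    /* writeback of the updated base */
    }
}
```

The gather loads (vldrwq_gather_base_wb etc.) mirror this: load from base[i] + offset, then write the updated addresses back.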
[PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
Hello, This patch supports following MVE ACLE intrinsics. vreinterpretq_s16_s32, vreinterpretq_s16_s64, vreinterpretq_s16_s8, vreinterpretq_s16_u16, vreinterpretq_s16_u32, vreinterpretq_s16_u64, vreinterpretq_s16_u8, vreinterpretq_s32_s16, vreinterpretq_s32_s64, vreinterpretq_s32_s8, vreinterpretq_s32_u16, vreinterpretq_s32_u32, vreinterpretq_s32_u64, vreinterpretq_s32_u8, vreinterpretq_s64_s16, vreinterpretq_s64_s32, vreinterpretq_s64_s8, vreinterpretq_s64_u16, vreinterpretq_s64_u32, vreinterpretq_s64_u64, vreinterpretq_s64_u8, vreinterpretq_s8_s16, vreinterpretq_s8_s32, vreinterpretq_s8_s64, vreinterpretq_s8_u16, vreinterpretq_s8_u32, vreinterpretq_s8_u64, vreinterpretq_s8_u8, vreinterpretq_u16_s16, vreinterpretq_u16_s32, vreinterpretq_u16_s64, vreinterpretq_u16_s8, vreinterpretq_u16_u32, vreinterpretq_u16_u64, vreinterpretq_u16_u8, vreinterpretq_u32_s16, vreinterpretq_u32_s32, vreinterpretq_u32_s64, vreinterpretq_u32_s8, vreinterpretq_u32_u16, vreinterpretq_u32_u64, vreinterpretq_u32_u8, vreinterpretq_u64_s16, vreinterpretq_u64_s32, vreinterpretq_u64_s64, vreinterpretq_u64_s8, vreinterpretq_u64_u16, vreinterpretq_u64_u32, vreinterpretq_u64_u8, vreinterpretq_u8_s16, vreinterpretq_u8_s32, vreinterpretq_u8_s64, vreinterpretq_u8_s8, vreinterpretq_u8_u16, vreinterpretq_u8_u32, vreinterpretq_u8_u64, vreinterpretq_s32_f16, vreinterpretq_s32_f32, vreinterpretq_u16_f16, vreinterpretq_u16_f32, vreinterpretq_u32_f16, vreinterpretq_u32_f32, vreinterpretq_u64_f16, vreinterpretq_u64_f32, vreinterpretq_u8_f16, vreinterpretq_u8_f32, vreinterpretq_f16_f32, vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_s8, vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f16_u64, vreinterpretq_f16_u8, vreinterpretq_f32_f16, vreinterpretq_f32_s16, vreinterpretq_f32_s32, vreinterpretq_f32_s64, vreinterpretq_f32_s8, vreinterpretq_f32_u16, vreinterpretq_f32_u32, vreinterpretq_f32_u64, vreinterpretq_f32_u8, vreinterpretq_s16_f16, 
vreinterpretq_s16_f32, vreinterpretq_s64_f16, vreinterpretq_s64_f32, vreinterpretq_s8_f16, vreinterpretq_s8_f32, vuninitializedq_u8, vuninitializedq_u16, vuninitializedq_u32, vuninitializedq_u64, vuninitializedq_s8, vuninitializedq_s16, vuninitializedq_s32, vuninitializedq_s64, vuninitializedq_f16, vuninitializedq_f32 and vuninitializedq. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-14 Srinath Parvathaneni * config/arm/arm_mve.h (vreinterpretq_s16_s32): Define macro. (vreinterpretq_s16_s64): Likewise. (vreinterpretq_s16_s8): Likewise. (vreinterpretq_s16_u16): Likewise. (vreinterpretq_s16_u32): Likewise. (vreinterpretq_s16_u64): Likewise. (vreinterpretq_s16_u8): Likewise. (vreinterpretq_s32_s16): Likewise. (vreinterpretq_s32_s64): Likewise. (vreinterpretq_s32_s8): Likewise. (vreinterpretq_s32_u16): Likewise. (vreinterpretq_s32_u32): Likewise. (vreinterpretq_s32_u64): Likewise. (vreinterpretq_s32_u8): Likewise. (vreinterpretq_s64_s16): Likewise. (vreinterpretq_s64_s32): Likewise. (vreinterpretq_s64_s8): Likewise. (vreinterpretq_s64_u16): Likewise. (vreinterpretq_s64_u32): Likewise. (vreinterpretq_s64_u64): Likewise. (vreinterpretq_s64_u8): Likewise. (vreinterpretq_s8_s16): Likewise. (vreinterpretq_s8_s32): Likewise. (vreinterpretq_s8_s64): Likewise. (vreinterpretq_s8_u16): Likewise. (vreinterpretq_s8_u32): Likewise. (vreinterpretq_s8_u64): Likewise. (vreinterpretq_s8_u8): Likewise. (vreinterpretq_u16_s16): Likewise. (vreinterpretq_u16_s32): Likewise. (vreinterpretq_u16_s64): Likewise. (vreinterpretq_u16_s8): Likewise. (vreinterpretq_u16_u32): Likewise. (vreinterpretq_u16_u64): Likewise. (vreinterpretq_u16_u8): Likewise. (vreinterpretq_u32_s16): Likewise. (vreinterpretq_u32_s32): Likewise. (vreinterpretq_u32_s64): Likewise. 
(vreinterpretq_u32_s8): Likewise. (vreinterpretq_u32_u16): Likewise. (vreinterpretq_u32_u64): Likewise. (vreinterpretq_u32_u8): Likewise. (vreinterpretq_u64_s16): Likewise. (vreinterpretq_u64_s32): Likewise. (vreinterpretq_u64_s64): Likewise. (vreinterpretq_u64_s8): Likewise. (vreinterpretq_u64_u16): Likewise. (vreinterpretq_u64_u32): Likewise. (vreinterpretq_u64_u8): Likewise. (vreinterpretq_u8_s16): Likewise. (vreinterpretq_u8_s32): Likewise. (vreinterpretq_u8_s64): Likewise. (vreinterpretq_u8_s8): Likewise. (vreinterpretq_u8_u16): Likewise.
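vreinterpretq only relabels the 128 vector bits with a new element type; no value conversion happens. The scalar analogue is a bitwise copy, shown here for float to uint32 (the lane-wise equivalent of vreinterpretq_u32_f32); `bits_of_float` is just an illustrative helper name.

```c
#include <stdint.h>
#include <string.h>

/* Bit-level reinterpretation of a float, the well-defined way
   (memcpy rather than a pointer cast).  */
static uint32_t bits_of_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```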
[PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback.
Hello, This patch supports following MVE ACLE intrinsics with writeback. vddupq_m_n_u8, vddupq_m_n_u32, vddupq_m_n_u16, vddupq_m_wb_u8, vddupq_m_wb_u16, vddupq_m_wb_u32, vddupq_n_u8, vddupq_n_u32, vddupq_n_u16, vddupq_wb_u8, vddupq_wb_u16, vddupq_wb_u32, vdwdupq_m_n_u8, vdwdupq_m_n_u32, vdwdupq_m_n_u16, vdwdupq_m_wb_u8, vdwdupq_m_wb_u32, vdwdupq_m_wb_u16, vdwdupq_n_u8, vdwdupq_n_u32, vdwdupq_n_u16, vdwdupq_wb_u8, vdwdupq_wb_u32, vdwdupq_wb_u16, vidupq_m_n_u8, vidupq_m_n_u32, vidupq_m_n_u16, vidupq_m_wb_u8, vidupq_m_wb_u16, vidupq_m_wb_u32, vidupq_n_u8, vidupq_n_u32, vidupq_n_u16, vidupq_wb_u8, vidupq_wb_u16, vidupq_wb_u32, viwdupq_m_n_u8, viwdupq_m_n_u32, viwdupq_m_n_u16, viwdupq_m_wb_u8, viwdupq_m_wb_u32, viwdupq_m_wb_u16, viwdupq_n_u8, viwdupq_n_u32, viwdupq_n_u16, viwdupq_wb_u8, viwdupq_wb_u32, viwdupq_wb_u16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-07 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Define quinary builtin qualifier. * config/arm/arm_mve.h (vddupq_m_n_u8): Define macro. (vddupq_m_n_u32): Likewise. (vddupq_m_n_u16): Likewise. (vddupq_m_wb_u8): Likewise. (vddupq_m_wb_u16): Likewise. (vddupq_m_wb_u32): Likewise. (vddupq_n_u8): Likewise. (vddupq_n_u32): Likewise. (vddupq_n_u16): Likewise. (vddupq_wb_u8): Likewise. (vddupq_wb_u16): Likewise. (vddupq_wb_u32): Likewise. (vdwdupq_m_n_u8): Likewise. (vdwdupq_m_n_u32): Likewise. (vdwdupq_m_n_u16): Likewise. (vdwdupq_m_wb_u8): Likewise. (vdwdupq_m_wb_u32): Likewise. (vdwdupq_m_wb_u16): Likewise. (vdwdupq_n_u8): Likewise. (vdwdupq_n_u32): Likewise. (vdwdupq_n_u16): Likewise. (vdwdupq_wb_u8): Likewise. (vdwdupq_wb_u32): Likewise. (vdwdupq_wb_u16): Likewise. 
(vidupq_m_n_u8): Likewise. (vidupq_m_n_u32): Likewise. (vidupq_m_n_u16): Likewise. (vidupq_m_wb_u8): Likewise. (vidupq_m_wb_u16): Likewise. (vidupq_m_wb_u32): Likewise. (vidupq_n_u8): Likewise. (vidupq_n_u32): Likewise. (vidupq_n_u16): Likewise. (vidupq_wb_u8): Likewise. (vidupq_wb_u16): Likewise. (vidupq_wb_u32): Likewise. (viwdupq_m_n_u8): Likewise. (viwdupq_m_n_u32): Likewise. (viwdupq_m_n_u16): Likewise. (viwdupq_m_wb_u8): Likewise. (viwdupq_m_wb_u32): Likewise. (viwdupq_m_wb_u16): Likewise. (viwdupq_n_u8): Likewise. (viwdupq_n_u32): Likewise. (viwdupq_n_u16): Likewise. (viwdupq_wb_u8): Likewise. (viwdupq_wb_u32): Likewise. (viwdupq_wb_u16): Likewise. (__arm_vddupq_m_n_u8): Define intrinsic. (__arm_vddupq_m_n_u32): Likewise. (__arm_vddupq_m_n_u16): Likewise. (__arm_vddupq_m_wb_u8): Likewise. (__arm_vddupq_m_wb_u16): Likewise. (__arm_vddupq_m_wb_u32): Likewise. (__arm_vddupq_n_u8): Likewise. (__arm_vddupq_n_u32): Likewise. (__arm_vddupq_n_u16): Likewise. (__arm_vdwdupq_m_n_u8): Likewise. (__arm_vdwdupq_m_n_u32): Likewise. (__arm_vdwdupq_m_n_u16): Likewise. (__arm_vdwdupq_m_wb_u8): Likewise. (__arm_vdwdupq_m_wb_u32): Likewise. (__arm_vdwdupq_m_wb_u16): Likewise. (__arm_vdwdupq_n_u8): Likewise. (__arm_vdwdupq_n_u32): Likewise. (__arm_vdwdupq_n_u16): Likewise. (__arm_vdwdupq_wb_u8): Likewise. (__arm_vdwdupq_wb_u32): Likewise. (__arm_vdwdupq_wb_u16): Likewise. (__arm_vidupq_m_n_u8): Likewise. (__arm_vidupq_m_n_u32): Likewise. (__arm_vidupq_m_n_u16): Likewise. (__arm_vidupq_n_u8): Likewise. (__arm_vidupq_m_wb_u8): Likewise. (__arm_vidupq_m_wb_u16): Likewise. (__arm_vidupq_m_wb_u32): Likewise. (__arm_vidupq_n_u32): Likewise. (__arm_vidupq_n_u16): Likewise. (__arm_vidupq_wb_u8): Likewise. (__arm_vidupq_wb_u16): Likewise. (__arm_vidupq_wb_u32): Likewise. (__arm_vddupq_wb_u8): Likewise. (__arm_vddupq_wb_u16): Likewise. (__arm_vddupq_wb_u32): Likewise. (__arm_viwdupq_m_n_u8): Likewise. (__arm_viwdupq_m_n_u32): Likewise. (__arm_viwdupq_m_n_u16): Likewise. 
(__arm_viwdupq_m_wb_u8): Likewise. (__arm_viwdupq_m_wb_u32): Likewise. (__arm_viwdupq_m_wb_u16): Likewise. (__arm_viwdupq_n_u8): Likewise. (__arm_viwdupq_n_u32): Likewise. (__arm_viwdupq_n_u16): Likewise. (__arm_viwdupq_
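A plain-C sketch of the incrementing-dup family may help: as I read the ACLE, vidupq_n fills lane i with start + i*imm (imm in {1, 2, 4, 8}), viwdupq additionally wraps back to zero at the `wrap` limit, and the `_wb` forms write the next start value back. `model_viwdup` is a hypothetical name and the wrap test is my reading of the semantics, not GCC's code.

```c
#include <stdint.h>

/* viwdupq_n sketch: incrementing sequence of four 32-bit lanes,
   wrapping to 0 when the wrap limit is reached.  */
static void model_viwdup(uint32_t res[4], uint32_t start, uint32_t wrap,
                         uint32_t imm)
{
    uint32_t v = start;
    for (int i = 0; i < 4; i++) {
        res[i] = v;
        v += imm;
        if (v >= wrap)   /* wrap back to zero at the limit */
            v = 0;
    }
}
```

vddupq/vdwdupq are the decrementing mirror images of vidupq/viwdupq.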
[PATCH][ARM][GCC][7/5x]: MVE store intrinsics which store a byte, halfword, or word to memory.
Hello, This patch supports the following MVE ACLE store intrinsics which stores a byte, halfword, or word to memory. vst1q_f32, vst1q_f16, vst1q_s8, vst1q_s32, vst1q_s16, vst1q_u8, vst1q_u32, vst1q_u16, vstrhq_f16, vstrhq_scatter_offset_s32, vstrhq_scatter_offset_s16, vstrhq_scatter_offset_u32, vstrhq_scatter_offset_u16, vstrhq_scatter_offset_p_s32, vstrhq_scatter_offset_p_s16, vstrhq_scatter_offset_p_u32, vstrhq_scatter_offset_p_u16, vstrhq_scatter_shifted_offset_s32, vstrhq_scatter_shifted_offset_s16, vstrhq_scatter_shifted_offset_u32, vstrhq_scatter_shifted_offset_u16, vstrhq_scatter_shifted_offset_p_s32, vstrhq_scatter_shifted_offset_p_s16, vstrhq_scatter_shifted_offset_p_u32, vstrhq_scatter_shifted_offset_p_u16, vstrhq_s32, vstrhq_s16, vstrhq_u32, vstrhq_u16, vstrhq_p_f16, vstrhq_p_s32, vstrhq_p_s16, vstrhq_p_u32, vstrhq_p_u16, vstrwq_f32, vstrwq_s32, vstrwq_u32, vstrwq_p_f32, vstrwq_p_s32, vstrwq_p_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vst1q_f32): Define macro. (vst1q_f16): Likewise. (vst1q_s8): Likewise. (vst1q_s32): Likewise. (vst1q_s16): Likewise. (vst1q_u8): Likewise. (vst1q_u32): Likewise. (vst1q_u16): Likewise. (vstrhq_f16): Likewise. (vstrhq_scatter_offset_s32): Likewise. (vstrhq_scatter_offset_s16): Likewise. (vstrhq_scatter_offset_u32): Likewise. (vstrhq_scatter_offset_u16): Likewise. (vstrhq_scatter_offset_p_s32): Likewise. (vstrhq_scatter_offset_p_s16): Likewise. (vstrhq_scatter_offset_p_u32): Likewise. (vstrhq_scatter_offset_p_u16): Likewise. (vstrhq_scatter_shifted_offset_s32): Likewise. (vstrhq_scatter_shifted_offset_s16): Likewise. (vstrhq_scatter_shifted_offset_u32): Likewise. 
(vstrhq_scatter_shifted_offset_u16): Likewise. (vstrhq_scatter_shifted_offset_p_s32): Likewise. (vstrhq_scatter_shifted_offset_p_s16): Likewise. (vstrhq_scatter_shifted_offset_p_u32): Likewise. (vstrhq_scatter_shifted_offset_p_u16): Likewise. (vstrhq_s32): Likewise. (vstrhq_s16): Likewise. (vstrhq_u32): Likewise. (vstrhq_u16): Likewise. (vstrhq_p_f16): Likewise. (vstrhq_p_s32): Likewise. (vstrhq_p_s16): Likewise. (vstrhq_p_u32): Likewise. (vstrhq_p_u16): Likewise. (vstrwq_f32): Likewise. (vstrwq_s32): Likewise. (vstrwq_u32): Likewise. (vstrwq_p_f32): Likewise. (vstrwq_p_s32): Likewise. (vstrwq_p_u32): Likewise. (__arm_vst1q_s8): Define intrinsic. (__arm_vst1q_s32): Likewise. (__arm_vst1q_s16): Likewise. (__arm_vst1q_u8): Likewise. (__arm_vst1q_u32): Likewise. (__arm_vst1q_u16): Likewise. (__arm_vstrhq_scatter_offset_s32): Likewise. (__arm_vstrhq_scatter_offset_s16): Likewise. (__arm_vstrhq_scatter_offset_u32): Likewise. (__arm_vstrhq_scatter_offset_u16): Likewise. (__arm_vstrhq_scatter_offset_p_s32): Likewise. (__arm_vstrhq_scatter_offset_p_s16): Likewise. (__arm_vstrhq_scatter_offset_p_u32): Likewise. (__arm_vstrhq_scatter_offset_p_u16): Likewise. (__arm_vstrhq_scatter_shifted_offset_s32): Likewise. (__arm_vstrhq_scatter_shifted_offset_s16): Likewise. (__arm_vstrhq_scatter_shifted_offset_u32): Likewise. (__arm_vstrhq_scatter_shifted_offset_u16): Likewise. (__arm_vstrhq_scatter_shifted_offset_p_s32): Likewise. (__arm_vstrhq_scatter_shifted_offset_p_s16): Likewise. (__arm_vstrhq_scatter_shifted_offset_p_u32): Likewise. (__arm_vstrhq_scatter_shifted_offset_p_u16): Likewise. (__arm_vstrhq_s32): Likewise. (__arm_vstrhq_s16): Likewise. (__arm_vstrhq_u32): Likewise. (__arm_vstrhq_u16): Likewise. (__arm_vstrhq_p_s32): Likewise. (__arm_vstrhq_p_s16): Likewise. (__arm_vstrhq_p_u32): Likewise. (__arm_vstrhq_p_u16): Likewise. (__arm_vstrwq_s32): Likewise. (__arm_vstrwq_u32): Likewise. (__arm_vstrwq_p_s32): Likewise. (__arm_vstrwq_p_u32): Likewise. 
(__arm_vstrwq_p_f32): Likewise. (__arm_vstrwq_f32): Likewise. (__arm_vst1q_f32): Likewise. (__arm_vst1q_f16): Likewise. (__arm_vstrhq_f16): Likewise. (__arm_vstrhq_p_f16): Likewise. (vst1q): Define polymorphic variant. (vstrhq): Likewise. (vstrhq_p): Likewise. (vstrhq_scatter_offset_p): Likewise. (vstrhq_scatter_offset): Likewise.
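For orientation, the behaviour of the contiguous forms above can be sketched as a plain lane-by-lane store. The helper below is a hypothetical scalar model (the name is invented here); the real intrinsics live in arm_mve.h and compile only for an MVE-enabled target.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar model of the contiguous halfword store
   vstrhq_s16 (vst1q_s16 behaves the same for a full s16 vector):
   the 8 lanes of a 128-bit vector are written back-to-back
   starting at base.  */
enum { S16_LANES = 8 };

static void
model_vstrhq_s16 (int16_t *base, const int16_t value[S16_LANES])
{
  for (size_t i = 0; i < S16_LANES; i++)
    base[i] = value[i];
}
```

The widening variants such as vstrhq_s32 additionally narrow each lane to the element size before storing; that step is omitted from this sketch.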
[PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.
Hello, This patch supports the following MVE ACLE load intrinsics which load a byte, halfword, or word from memory. vld1q_s8, vld1q_s32, vld1q_s16, vld1q_u8, vld1q_u32, vld1q_u16, vldrhq_gather_offset_s32, vldrhq_gather_offset_s16, vldrhq_gather_offset_u32, vldrhq_gather_offset_u16, vldrhq_gather_offset_z_s32, vldrhq_gather_offset_z_s16, vldrhq_gather_offset_z_u32, vldrhq_gather_offset_z_u16, vldrhq_gather_shifted_offset_s32,vldrwq_f32, vldrwq_z_f32, vldrhq_gather_shifted_offset_s16, vldrhq_gather_shifted_offset_u32, vldrhq_gather_shifted_offset_u16, vldrhq_gather_shifted_offset_z_s32, vldrhq_gather_shifted_offset_z_s16, vldrhq_gather_shifted_offset_z_u32, vldrhq_gather_shifted_offset_z_u16, vldrhq_s32, vldrhq_s16, vldrhq_u32, vldrhq_u16, vldrhq_z_s32, vldrhq_z_s16, vldrhq_z_u32, vldrhq_z_u16, vldrwq_s32, vldrwq_u32, vldrwq_z_s32, vldrwq_z_u32, vld1q_f32, vld1q_f16, vldrhq_f16, vldrhq_z_f16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vld1q_s8): Define macro. (vld1q_s32): Likewise. (vld1q_s16): Likewise. (vld1q_u8): Likewise. (vld1q_u32): Likewise. (vld1q_u16): Likewise. (vldrhq_gather_offset_s32): Likewise. (vldrhq_gather_offset_s16): Likewise. (vldrhq_gather_offset_u32): Likewise. (vldrhq_gather_offset_u16): Likewise. (vldrhq_gather_offset_z_s32): Likewise. (vldrhq_gather_offset_z_s16): Likewise. (vldrhq_gather_offset_z_u32): Likewise. (vldrhq_gather_offset_z_u16): Likewise. (vldrhq_gather_shifted_offset_s32): Likewise. (vldrhq_gather_shifted_offset_s16): Likewise. (vldrhq_gather_shifted_offset_u32): Likewise. (vldrhq_gather_shifted_offset_u16): Likewise. (vldrhq_gather_shifted_offset_z_s32): Likewise. 
(vldrhq_gather_shifted_offset_z_s16): Likewise. (vldrhq_gather_shifted_offset_z_u32): Likewise. (vldrhq_gather_shifted_offset_z_u16): Likewise. (vldrhq_s32): Likewise. (vldrhq_s16): Likewise. (vldrhq_u32): Likewise. (vldrhq_u16): Likewise. (vldrhq_z_s32): Likewise. (vldrhq_z_s16): Likewise. (vldrhq_z_u32): Likewise. (vldrhq_z_u16): Likewise. (vldrwq_s32): Likewise. (vldrwq_u32): Likewise. (vldrwq_z_s32): Likewise. (vldrwq_z_u32): Likewise. (vld1q_f32): Likewise. (vld1q_f16): Likewise. (vldrhq_f16): Likewise. (vldrhq_z_f16): Likewise. (vldrwq_f32): Likewise. (vldrwq_z_f32): Likewise. (__arm_vld1q_s8): Define intrinsic. (__arm_vld1q_s32): Likewise. (__arm_vld1q_s16): Likewise. (__arm_vld1q_u8): Likewise. (__arm_vld1q_u32): Likewise. (__arm_vld1q_u16): Likewise. (__arm_vldrhq_gather_offset_s32): Likewise. (__arm_vldrhq_gather_offset_s16): Likewise. (__arm_vldrhq_gather_offset_u32): Likewise. (__arm_vldrhq_gather_offset_u16): Likewise. (__arm_vldrhq_gather_offset_z_s32): Likewise. (__arm_vldrhq_gather_offset_z_s16): Likewise. (__arm_vldrhq_gather_offset_z_u32): Likewise. (__arm_vldrhq_gather_offset_z_u16): Likewise. (__arm_vldrhq_gather_shifted_offset_s32): Likewise. (__arm_vldrhq_gather_shifted_offset_s16): Likewise. (__arm_vldrhq_gather_shifted_offset_u32): Likewise. (__arm_vldrhq_gather_shifted_offset_u16): Likewise. (__arm_vldrhq_gather_shifted_offset_z_s32): Likewise. (__arm_vldrhq_gather_shifted_offset_z_s16): Likewise. (__arm_vldrhq_gather_shifted_offset_z_u32): Likewise. (__arm_vldrhq_gather_shifted_offset_z_u16): Likewise. (__arm_vldrhq_s32): Likewise. (__arm_vldrhq_s16): Likewise. (__arm_vldrhq_u32): Likewise. (__arm_vldrhq_u16): Likewise. (__arm_vldrhq_z_s32): Likewise. (__arm_vldrhq_z_s16): Likewise. (__arm_vldrhq_z_u32): Likewise. (__arm_vldrhq_z_u16): Likewise. (__arm_vldrwq_s32): Likewise. (__arm_vldrwq_u32): Likewise. (__arm_vldrwq_z_s32): Likewise. (__arm_vldrwq_z_u32): Likewise. (__arm_vld1q_f32): Likewise. (__arm_vld1q_f16): Likewise. 
(__arm_vldrwq_f32): Likewise. (__arm_vldrwq_z_f32): Likewise. (__arm_vldrhq_z_f16): Likewise. (__arm_vldrhq_f16): Likewise. (vld1q): Define polymorphic variant. (vldrhq_gather_offset): Likewise. (vldrhq_gather_offset_z): Likewise. (vldrhq_gather_shifted_offset): Likewise. (vldrhq_gather_shifted_offset_z): Likewise. * config
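The two halfword gather addressing modes in this patch differ only in how the per-lane offsets are interpreted. The helpers below are hypothetical scalar models (names invented here): per the MVE ACLE description, the plain _gather_offset form treats each offset as a byte offset, while _gather_shifted_offset first scales it by the element size (a left shift by 1 for halfwords).

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar models of the two halfword gather loads.  */
static void
model_gather_offset_s16 (int16_t res[8], const int16_t *base,
                         const uint16_t off[8])
{
  for (size_t i = 0; i < 8; i++)   /* byte address: base + off[i] */
    res[i] = *(const int16_t *) ((const char *) base + off[i]);
}

static void
model_gather_shifted_offset_s16 (int16_t res[8], const int16_t *base,
                                 const uint16_t off[8])
{
  for (size_t i = 0; i < 8; i++)   /* byte address: base + (off[i] << 1) */
    res[i] = base[off[i]];
}
```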
[PATCH][ARM][GCC][6x]: MVE ACLE vaddq intrinsics using the arithmetic plus operator.
Hello, This patch supports the following MVE ACLE vaddq intrinsics: vaddq_s8, vaddq_s16, vaddq_s32, vaddq_u8, vaddq_u16, vaddq_u32, vaddq_f16, vaddq_f32. The RTL patterns for these intrinsics are added using the arithmetic "plus" operator. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-05 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vaddq_s8): Define macro. (vaddq_s16): Likewise. (vaddq_s32): Likewise. (vaddq_u8): Likewise. (vaddq_u16): Likewise. (vaddq_u32): Likewise. (vaddq_f16): Likewise. (vaddq_f32): Likewise. (__arm_vaddq_s8): Define intrinsic. (__arm_vaddq_s16): Likewise. (__arm_vaddq_s32): Likewise. (__arm_vaddq_u8): Likewise. (__arm_vaddq_u16): Likewise. (__arm_vaddq_u32): Likewise. (__arm_vaddq_f16): Likewise. (__arm_vaddq_f32): Likewise. (vaddq): Define polymorphic variant. * config/arm/iterators.md (VNIM): Define mode iterator for common types Neon, IWMMXT and MVE. (VNINOTM): Likewise. * config/arm/mve.md (mve_vaddq): Define RTL pattern. (mve_vaddq_f): Define RTL pattern. * config/arm/neon.md (add3): Rename to addv4hf3 RTL pattern. (addv8hf3_neon): Define RTL pattern. * config/arm/vec-common.md (add3): Modify standard add RTL pattern to support MVE. (addv8hf3): Define standard RTL pattern for MVE and Neon. (add3): Modify existing standard add RTL pattern for Neon and IWMMXT. gcc/testsuite/ChangeLog: 2019-11-05 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vaddq_f16.c: New test. * gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 42e98f9ad1e357fe974e58378a49bcaaf36c302a..89456589c9dcdff5b56e8707dd720fb15141 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -1898,6 +1898,14 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t; #define vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p) #define vstrwq_scatter_shifted_offset_s32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_s32(__base, __offset, __value) #define vstrwq_scatter_shifted_offset_u32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_u32(__base, __offset, __value) +#define vaddq_s8(__a, __b) __arm_vaddq_s8(__a, __b) +#define vaddq_s16(__a, __b) __arm_vaddq_s16(__a, __b) +#define vaddq_s32(__a, __b) __arm_vaddq_s32(__a, __b) +#define vaddq_u8(__a, __b) __arm_vaddq_u8(__a, __b) +#define vaddq_u16(__a, __b) __arm_vaddq_u16(__a, __b) +#define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b) +#define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b) +#define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b) #endif __extension__ extern __inline void @@ -12341,6 +12349,48 @@ __arm_vstrwq_scatter_shifted_offset_u32 (uint32_t * __base, uint32x4_t __offset, __builtin_mve_vstrwq_scatter_shifted_offset_uv4si ((__builtin_neon_si *) __base, __offset, __value); } +__extension__ extern __inline int8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vaddq_s8 (int8x16_t __a, int8x16_t __b) +{ + return __a + __b; +} + +__extension__ extern __inline int16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vaddq_s16 (int16x8_t __a, int16x8_t __b) +{ + return __a + __b; +} + +__extension__ extern __inline int32x4_t +__attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vaddq_s32 (int32x4_t __a, int32x4_t __b) +{ + return __a + __b; +} + +__extension__ extern __inline uint8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vaddq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return __a + __b; +} + +__extension__ extern __inline uint16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vaddq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return __a + __b; +} + +__extension__ extern __inline uint32x4_t +__attribute__ ((__always_inline__, __gnu_inl
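As the inlined diff shows, each __arm_vaddq_* simply returns __a + __b and lets the compiler select the vector add instruction. The same idiom can be reproduced portably with GCC's generic vector extensions; the type and function names below are invented for illustration and are not part of arm_mve.h.

```c
#include <stdint.h>
#include <assert.h>

/* Portable illustration of the vaddq implementation strategy in the
   patch: declare a 128-bit vector type and rely on the C-level '+'
   operator, which GCC lowers to a single vector add (VADD on an MVE
   target, or the host's equivalent here).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

static inline v4si
model_vaddq_s32 (v4si a, v4si b)
{
  return a + b;   /* lane-wise addition, as in __arm_vaddq_s32 */
}
```

This is why the patch can route vaddq through the standard add<mode>3 expanders in vec-common.md rather than an unspec: the semantics are exactly the generic vector addition GCC already understands.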
[PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which load a halfword, word, or double word from memory.
Hello, This patch supports the remaining MVE ACLE load intrinsics, which load a halfword, word, or double word from memory. vldrdq_gather_base_s64, vldrdq_gather_base_u64, vldrdq_gather_base_z_s64, vldrdq_gather_base_z_u64, vldrdq_gather_offset_s64, vldrdq_gather_offset_u64, vldrdq_gather_offset_z_s64, vldrdq_gather_offset_z_u64, vldrdq_gather_shifted_offset_s64, vldrdq_gather_shifted_offset_u64, vldrdq_gather_shifted_offset_z_s64, vldrdq_gather_shifted_offset_z_u64, vldrhq_gather_offset_f16, vldrhq_gather_offset_z_f16, vldrhq_gather_shifted_offset_f16, vldrhq_gather_shifted_offset_z_f16, vldrwq_gather_base_f32, vldrwq_gather_base_z_f32, vldrwq_gather_offset_f32, vldrwq_gather_offset_s32, vldrwq_gather_offset_u32, vldrwq_gather_offset_z_f32, vldrwq_gather_offset_z_s32, vldrwq_gather_offset_z_u32, vldrwq_gather_shifted_offset_f32, vldrwq_gather_shifted_offset_s32, vldrwq_gather_shifted_offset_u32, vldrwq_gather_shifted_offset_z_f32, vldrwq_gather_shifted_offset_z_s32, vldrwq_gather_shifted_offset_z_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vldrdq_gather_base_s64): Define macro. (vldrdq_gather_base_u64): Likewise. (vldrdq_gather_base_z_s64): Likewise. (vldrdq_gather_base_z_u64): Likewise. (vldrdq_gather_offset_s64): Likewise. (vldrdq_gather_offset_u64): Likewise. (vldrdq_gather_offset_z_s64): Likewise. (vldrdq_gather_offset_z_u64): Likewise. (vldrdq_gather_shifted_offset_s64): Likewise. (vldrdq_gather_shifted_offset_u64): Likewise. (vldrdq_gather_shifted_offset_z_s64): Likewise. (vldrdq_gather_shifted_offset_z_u64): Likewise. (vldrhq_gather_offset_f16): Likewise. (vldrhq_gather_offset_z_f16): Likewise. 
(vldrhq_gather_shifted_offset_f16): Likewise. (vldrhq_gather_shifted_offset_z_f16): Likewise. (vldrwq_gather_base_f32): Likewise. (vldrwq_gather_base_z_f32): Likewise. (vldrwq_gather_offset_f32): Likewise. (vldrwq_gather_offset_s32): Likewise. (vldrwq_gather_offset_u32): Likewise. (vldrwq_gather_offset_z_f32): Likewise. (vldrwq_gather_offset_z_s32): Likewise. (vldrwq_gather_offset_z_u32): Likewise. (vldrwq_gather_shifted_offset_f32): Likewise. (vldrwq_gather_shifted_offset_s32): Likewise. (vldrwq_gather_shifted_offset_u32): Likewise. (vldrwq_gather_shifted_offset_z_f32): Likewise. (vldrwq_gather_shifted_offset_z_s32): Likewise. (vldrwq_gather_shifted_offset_z_u32): Likewise. (__arm_vldrdq_gather_base_s64): Define intrinsic. (__arm_vldrdq_gather_base_u64): Likewise. (__arm_vldrdq_gather_base_z_s64): Likewise. (__arm_vldrdq_gather_base_z_u64): Likewise. (__arm_vldrdq_gather_offset_s64): Likewise. (__arm_vldrdq_gather_offset_u64): Likewise. (__arm_vldrdq_gather_offset_z_s64): Likewise. (__arm_vldrdq_gather_offset_z_u64): Likewise. (__arm_vldrdq_gather_shifted_offset_s64): Likewise. (__arm_vldrdq_gather_shifted_offset_u64): Likewise. (__arm_vldrdq_gather_shifted_offset_z_s64): Likewise. (__arm_vldrdq_gather_shifted_offset_z_u64): Likewise. (__arm_vldrwq_gather_offset_s32): Likewise. (__arm_vldrwq_gather_offset_u32): Likewise. (__arm_vldrwq_gather_offset_z_s32): Likewise. (__arm_vldrwq_gather_offset_z_u32): Likewise. (__arm_vldrwq_gather_shifted_offset_s32): Likewise. (__arm_vldrwq_gather_shifted_offset_u32): Likewise. (__arm_vldrwq_gather_shifted_offset_z_s32): Likewise. (__arm_vldrwq_gather_shifted_offset_z_u32): Likewise. (__arm_vldrhq_gather_offset_f16): Likewise. (__arm_vldrhq_gather_offset_z_f16): Likewise. (__arm_vldrhq_gather_shifted_offset_f16): Likewise. (__arm_vldrhq_gather_shifted_offset_z_f16): Likewise. (__arm_vldrwq_gather_base_f32): Likewise. (__arm_vldrwq_gather_base_z_f32): Likewise. (__arm_vldrwq_gather_offset_f32): Likewise. 
(__arm_vldrwq_gather_offset_z_f32): Likewise. (__arm_vldrwq_gather_shifted_offset_f32): Likewise. (__arm_vldrwq_gather_shifted_offset_z_f32): Likewise. (vldrhq_gather_offset): Define polymorphic variant. (vldrhq_gather_offset_z): Likewise. (vldrhq_gather_shifted_offset): Likewise. (vldrhq_gather_shifted_offset_z): Likewise. (vldrwq_gather_offset): Likewise. (vldrwq_gather_offset_z): Likewise. (vldrwq_gather_shifted_offset): Likewise. (vldrwq_gather_shifted_offs
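The _gather_base forms here differ from the _gather_offset forms: instead of a scalar base plus a vector of offsets, they take a whole vector of addresses plus a small immediate added to each lane's address. The helper below is a hypothetical scalar model of the double-word variant (on the real target the addresses travel in a uint64x2_t register).

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar model of vldrdq_gather_base_s64: lane i is
   loaded from (base[i] + imm), where base holds two addresses and
   imm is a small immediate (a multiple of 8 for the double-word
   variants).  */
static void
model_vldrdq_gather_base_s64 (int64_t res[2], const uint64_t base[2],
                              int32_t imm)
{
  for (size_t i = 0; i < 2; i++)
    res[i] = *(const int64_t *) (uintptr_t) (base[i] + (uint64_t) imm);
}
```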
[PATCH][ARM][GCC][3/5x]: MVE store intrinsics with the predicated (_p) suffix.
Hello, This patch supports the following MVE ACLE store intrinsics with predicated suffix. vstrbq_p_s8, vstrbq_p_s32, vstrbq_p_s16, vstrbq_p_u8, vstrbq_p_u32, vstrbq_p_u16, vstrbq_scatter_offset_p_s8, vstrbq_scatter_offset_p_s32, vstrbq_scatter_offset_p_s16, vstrbq_scatter_offset_p_u8, vstrbq_scatter_offset_p_u32, vstrbq_scatter_offset_p_u16, vstrwq_scatter_base_p_s32, vstrwq_scatter_base_p_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (STRS_P_QUALIFIERS): Define builtin qualifier. (STRU_P_QUALIFIERS): Likewise. (STRSU_P_QUALIFIERS): Likewise. (STRSS_P_QUALIFIERS): Likewise. (STRSBS_P_QUALIFIERS): Likewise. (STRSBU_P_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vstrbq_p_s8): Define macro. (vstrbq_p_s32): Likewise. (vstrbq_p_s16): Likewise. (vstrbq_p_u8): Likewise. (vstrbq_p_u32): Likewise. (vstrbq_p_u16): Likewise. (vstrbq_scatter_offset_p_s8): Likewise. (vstrbq_scatter_offset_p_s32): Likewise. (vstrbq_scatter_offset_p_s16): Likewise. (vstrbq_scatter_offset_p_u8): Likewise. (vstrbq_scatter_offset_p_u32): Likewise. (vstrbq_scatter_offset_p_u16): Likewise. (vstrwq_scatter_base_p_s32): Likewise. (vstrwq_scatter_base_p_u32): Likewise. (__arm_vstrbq_p_s8): Define intrinsic. (__arm_vstrbq_p_s32): Likewise. (__arm_vstrbq_p_s16): Likewise. (__arm_vstrbq_p_u8): Likewise. (__arm_vstrbq_p_u32): Likewise. (__arm_vstrbq_p_u16): Likewise. (__arm_vstrbq_scatter_offset_p_s8): Likewise. (__arm_vstrbq_scatter_offset_p_s32): Likewise. (__arm_vstrbq_scatter_offset_p_s16): Likewise. (__arm_vstrbq_scatter_offset_p_u8): Likewise. (__arm_vstrbq_scatter_offset_p_u32): Likewise. (__arm_vstrbq_scatter_offset_p_u16): Likewise. 
(__arm_vstrwq_scatter_base_p_s32): Likewise. (__arm_vstrwq_scatter_base_p_u32): Likewise. (vstrbq_p): Define polymorphic variant. (vstrbq_scatter_offset_p): Likewise. (vstrwq_scatter_base_p): Likewise. * config/arm/arm_mve_builtins.def (STRS_P_QUALIFIERS): Use builtin qualifier. (STRU_P_QUALIFIERS): Likewise. (STRSU_P_QUALIFIERS): Likewise. (STRSS_P_QUALIFIERS): Likewise. (STRSBS_P_QUALIFIERS): Likewise. (STRSBU_P_QUALIFIERS): Likewise. * config/arm/mve.md (mve_vstrbq_scatter_offset_p_): Define RTL pattern. (mve_vstrwq_scatter_base_p_v4si): Likewise. (mve_vstrbq_p_): Likewise. gcc/testsuite/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c: New test. * gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 02ea297937b18099f33a50c808964d1dd7eac1b3..b5639051bf07785d906ed596e08d670f4de1a67e 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -589,6 +589,41 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers) static enum arm_type_qualifiers +arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_unsigned}; +#define STRS_P_QUALIFIERS (arm_strs_p_qualifiers) + +static enum arm_type_qualifiers +arm_stru_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer, qualifier_unsigned, + qualifier_unsigned}; +#define STRU_P_QUALIFIERS (arm_stru_
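The _p suffix means each lane is stored only when its predicate bit is set, so memory behind false-predicated lanes is left untouched. The helper below is a hypothetical scalar model of vstrbq_p_s8, assuming the MVE convention of a 16-bit predicate with one bit per byte lane.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar model of the predicated byte store
   vstrbq_p_s8: lane i is written only if predicate bit i is set.  */
static void
model_vstrbq_p_s8 (int8_t *base, const int8_t value[16], uint16_t p)
{
  for (size_t i = 0; i < 16; i++)
    if (p & (1u << i))
      base[i] = value[i];
}
```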
[PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which store a halfword, word, or double word to memory.
Hello, This patch supports the following MVE ACLE store intrinsics, which store a halfword, word, or double word to memory. vstrdq_scatter_base_p_s64, vstrdq_scatter_base_p_u64, vstrdq_scatter_base_s64, vstrdq_scatter_base_u64, vstrdq_scatter_offset_p_s64, vstrdq_scatter_offset_p_u64, vstrdq_scatter_offset_s64, vstrdq_scatter_offset_u64, vstrdq_scatter_shifted_offset_p_s64, vstrdq_scatter_shifted_offset_p_u64, vstrdq_scatter_shifted_offset_s64, vstrdq_scatter_shifted_offset_u64, vstrhq_scatter_offset_f16, vstrhq_scatter_offset_p_f16, vstrhq_scatter_shifted_offset_f16, vstrhq_scatter_shifted_offset_p_f16, vstrwq_scatter_base_f32, vstrwq_scatter_base_p_f32, vstrwq_scatter_offset_f32, vstrwq_scatter_offset_p_f32, vstrwq_scatter_offset_p_s32, vstrwq_scatter_offset_p_u32, vstrwq_scatter_offset_s32, vstrwq_scatter_offset_u32, vstrwq_scatter_shifted_offset_f32, vstrwq_scatter_shifted_offset_p_f32, vstrwq_scatter_shifted_offset_p_s32, vstrwq_scatter_shifted_offset_p_u32, vstrwq_scatter_shifted_offset_s32, vstrwq_scatter_shifted_offset_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics In this patch a new predicate "Ri" is defined to check that the immediate is in the range +/-1016 and a multiple of 8. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-05 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vstrdq_scatter_base_p_s64): Define macro. (vstrdq_scatter_base_p_u64): Likewise. (vstrdq_scatter_base_s64): Likewise. (vstrdq_scatter_base_u64): Likewise. (vstrdq_scatter_offset_p_s64): Likewise. (vstrdq_scatter_offset_p_u64): Likewise. (vstrdq_scatter_offset_s64): Likewise. (vstrdq_scatter_offset_u64): Likewise. (vstrdq_scatter_shifted_offset_p_s64): Likewise. (vstrdq_scatter_shifted_offset_p_u64): Likewise. (vstrdq_scatter_shifted_offset_s64): Likewise. 
(vstrdq_scatter_shifted_offset_u64): Likewise. (vstrhq_scatter_offset_f16): Likewise. (vstrhq_scatter_offset_p_f16): Likewise. (vstrhq_scatter_shifted_offset_f16): Likewise. (vstrhq_scatter_shifted_offset_p_f16): Likewise. (vstrwq_scatter_base_f32): Likewise. (vstrwq_scatter_base_p_f32): Likewise. (vstrwq_scatter_offset_f32): Likewise. (vstrwq_scatter_offset_p_f32): Likewise. (vstrwq_scatter_offset_p_s32): Likewise. (vstrwq_scatter_offset_p_u32): Likewise. (vstrwq_scatter_offset_s32): Likewise. (vstrwq_scatter_offset_u32): Likewise. (vstrwq_scatter_shifted_offset_f32): Likewise. (vstrwq_scatter_shifted_offset_p_f32): Likewise. (vstrwq_scatter_shifted_offset_p_s32): Likewise. (vstrwq_scatter_shifted_offset_p_u32): Likewise. (vstrwq_scatter_shifted_offset_s32): Likewise. (vstrwq_scatter_shifted_offset_u32): Likewise. (__arm_vstrdq_scatter_base_p_s64): Define intrinsic. (__arm_vstrdq_scatter_base_p_u64): Likewise. (__arm_vstrdq_scatter_base_s64): Likewise. (__arm_vstrdq_scatter_base_u64): Likewise. (__arm_vstrdq_scatter_offset_p_s64): Likewise. (__arm_vstrdq_scatter_offset_p_u64): Likewise. (__arm_vstrdq_scatter_offset_s64): Likewise. (__arm_vstrdq_scatter_offset_u64): Likewise. (__arm_vstrdq_scatter_shifted_offset_p_s64): Likewise. (__arm_vstrdq_scatter_shifted_offset_p_u64): Likewise. (__arm_vstrdq_scatter_shifted_offset_s64): Likewise. (__arm_vstrdq_scatter_shifted_offset_u64): Likewise. (__arm_vstrwq_scatter_offset_p_s32): Likewise. (__arm_vstrwq_scatter_offset_p_u32): Likewise. (__arm_vstrwq_scatter_offset_s32): Likewise. (__arm_vstrwq_scatter_offset_u32): Likewise. (__arm_vstrwq_scatter_shifted_offset_p_s32): Likewise. (__arm_vstrwq_scatter_shifted_offset_p_u32): Likewise. (__arm_vstrwq_scatter_shifted_offset_s32): Likewise. (__arm_vstrwq_scatter_shifted_offset_u32): Likewise. (__arm_vstrhq_scatter_offset_f16): Likewise. (__arm_vstrhq_scatter_offset_p_f16): Likewise. (__arm_vstrhq_scatter_shifted_offset_f16): Likewise. 
(__arm_vstrhq_scatter_shifted_offset_p_f16): Likewise. (__arm_vstrwq_scatter_base_f32): Likewise. (__arm_vstrwq_scatter_base_p_f32): Likewise. (__arm_vstrwq_scatter_offset_f32): Likewise. (__arm_vstrwq_scatter_offset_p_f32): Likewise. (__arm_vstrwq_scatter_shifted_offset_f32): Likewise. (__arm_vstrwq_scatter_shifted_offset_p_f32): Likewise. (vstrhq_scatter_offset): Define polymorphic variant. (vstrhq_scatter_offset_p): Likewise. (vstrhq_scatter_shifted_offset): Likewise. (vstrhq_scatter
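The scatter stores are the mirror image of the gather loads from patch 6/5x: each lane's value goes out to its own computed address. The helper below is a hypothetical scalar model of the word-sized shifted-offset form; as with the loads, "shifted" means each offset is scaled by the element size before being added to the base.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar model of vstrwq_scatter_shifted_offset_s32:
   value[i] is stored at byte address base + (off[i] << 2), i.e. at
   word index off[i] from base.  */
static void
model_vstrwq_scatter_shifted_offset_s32 (int32_t *base,
                                         const uint32_t off[4],
                                         const int32_t value[4])
{
  for (size_t i = 0; i < 4; i++)
    base[off[i]] = value[i];
}
```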
[PATCH][ARM][GCC][4/5x]: MVE load intrinsics with the zero (_z) suffix.
Hello, This patch supports the following MVE ACLE load intrinsics with the zero (_z) suffix. The ``_z`` (zero) suffix indicates that false-predicated lanes are filled with zeroes; it is used only for load instructions. vldrbq_gather_offset_z_s16, vldrbq_gather_offset_z_u8, vldrbq_gather_offset_z_s32, vldrbq_gather_offset_z_u16, vldrbq_gather_offset_z_u32, vldrbq_gather_offset_z_s8, vldrbq_z_s16, vldrbq_z_u8, vldrbq_z_s8, vldrbq_z_s32, vldrbq_z_u16, vldrbq_z_u32, vldrwq_gather_base_z_u32, vldrwq_gather_base_z_s32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (LDRGBS_Z_QUALIFIERS): Define builtin qualifier. (LDRGBU_Z_QUALIFIERS): Likewise. (LDRGS_Z_QUALIFIERS): Likewise. (LDRGU_Z_QUALIFIERS): Likewise. (LDRS_Z_QUALIFIERS): Likewise. (LDRU_Z_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vldrbq_gather_offset_z_s16): Define macro. (vldrbq_gather_offset_z_u8): Likewise. (vldrbq_gather_offset_z_s32): Likewise. (vldrbq_gather_offset_z_u16): Likewise. (vldrbq_gather_offset_z_u32): Likewise. (vldrbq_gather_offset_z_s8): Likewise. (vldrbq_z_s16): Likewise. (vldrbq_z_u8): Likewise. (vldrbq_z_s8): Likewise. (vldrbq_z_s32): Likewise. (vldrbq_z_u16): Likewise. (vldrbq_z_u32): Likewise. (vldrwq_gather_base_z_u32): Likewise. (vldrwq_gather_base_z_s32): Likewise. (__arm_vldrbq_gather_offset_z_s8): Define intrinsic. (__arm_vldrbq_gather_offset_z_s32): Likewise. (__arm_vldrbq_gather_offset_z_s16): Likewise. (__arm_vldrbq_gather_offset_z_u8): Likewise. (__arm_vldrbq_gather_offset_z_u32): Likewise. (__arm_vldrbq_gather_offset_z_u16): Likewise. (__arm_vldrbq_z_s8): Likewise. (__arm_vldrbq_z_s32): Likewise. (__arm_vldrbq_z_s16): Likewise. (__arm_vldrbq_z_u8): Likewise. 
(__arm_vldrbq_z_u32): Likewise. (__arm_vldrbq_z_u16): Likewise. (__arm_vldrwq_gather_base_z_s32): Likewise. (__arm_vldrwq_gather_base_z_u32): Likewise. (vldrbq_gather_offset_z): Define polymorphic variant. * config/arm/arm_mve_builtins.def (LDRGBS_Z_QUALIFIERS): Use builtin qualifier. (LDRGBU_Z_QUALIFIERS): Likewise. (LDRGS_Z_QUALIFIERS): Likewise. (LDRGU_Z_QUALIFIERS): Likewise. (LDRS_Z_QUALIFIERS): Likewise. (LDRU_Z_QUALIFIERS): Likewise. * config/arm/mve.md (mve_vldrbq_gather_offset_z_): Define RTL pattern. (mve_vldrbq_z_): Likewise. (mve_vldrwq_gather_base_z_v4si): Likewise. gcc/testsuite/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c: New test. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index b5639051bf07785d906ed596e08d670f4de1a67e..c3d12375d2fbc933ad33f7a15a3bbf53079d0639 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -653,6 +653,40 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate}; #define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers) +static enum arm_type_qualifiers +arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned, qualifier_immediate, + qualifier_unsigned}; +#define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers) + +static enum arm_type_qualifiers +arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualif
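Combining gather addressing with the zeroing predication described above gives the _z forms. The helper below is a hypothetical scalar model of vldrbq_gather_offset_z_s8, assuming one predicate bit per byte lane: a true lane is gathered from memory and a false lane is filled with zero.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical scalar model of vldrbq_gather_offset_z_s8: lane i is
   loaded from base + off[i] when predicate bit i is set, otherwise
   the lane is zeroed (the "_z" behaviour).  */
static void
model_vldrbq_gather_offset_z_s8 (int8_t res[16], const int8_t *base,
                                 const uint8_t off[16], uint16_t p)
{
  for (size_t i = 0; i < 16; i++)
    res[i] = (p & (1u << i)) ? base[off[i]] : 0;
}
```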
[PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
Hello, This patch supports the following MVE ACLE load intrinsics. vldrbq_gather_offset_u8, vldrbq_gather_offset_s8, vldrbq_s8, vldrbq_u8, vldrbq_gather_offset_u16, vldrbq_gather_offset_s16, vldrbq_s16, vldrbq_u16, vldrbq_gather_offset_u32, vldrbq_gather_offset_s32, vldrbq_s32, vldrbq_u32, vldrwq_gather_base_s32, vldrwq_gather_base_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (LDRGU_QUALIFIERS): Define builtin qualifier. (LDRGS_QUALIFIERS): Likewise. (LDRS_QUALIFIERS): Likewise. (LDRU_QUALIFIERS): Likewise. (LDRGBS_QUALIFIERS): Likewise. (LDRGBU_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vldrbq_gather_offset_u8): Define macro. (vldrbq_gather_offset_s8): Likewise. (vldrbq_s8): Likewise. (vldrbq_u8): Likewise. (vldrbq_gather_offset_u16): Likewise. (vldrbq_gather_offset_s16): Likewise. (vldrbq_s16): Likewise. (vldrbq_u16): Likewise. (vldrbq_gather_offset_u32): Likewise. (vldrbq_gather_offset_s32): Likewise. (vldrbq_s32): Likewise. (vldrbq_u32): Likewise. (vldrwq_gather_base_s32): Likewise. (vldrwq_gather_base_u32): Likewise. (__arm_vldrbq_gather_offset_u8): Define intrinsic. (__arm_vldrbq_gather_offset_s8): Likewise. (__arm_vldrbq_s8): Likewise. (__arm_vldrbq_u8): Likewise. (__arm_vldrbq_gather_offset_u16): Likewise. (__arm_vldrbq_gather_offset_s16): Likewise. (__arm_vldrbq_s16): Likewise. (__arm_vldrbq_u16): Likewise. (__arm_vldrbq_gather_offset_u32): Likewise. (__arm_vldrbq_gather_offset_s32): Likewise. (__arm_vldrbq_s32): Likewise. (__arm_vldrbq_u32): Likewise. (__arm_vldrwq_gather_base_s32): Likewise. (__arm_vldrwq_gather_base_u32): Likewise. (vldrbq_gather_offset): Define polymorphic variant. 
* config/arm/arm_mve_builtins.def (LDRGU_QUALIFIERS): Use builtin qualifier. (LDRGS_QUALIFIERS): Likewise. (LDRS_QUALIFIERS): Likewise. (LDRU_QUALIFIERS): Likewise. (LDRGBS_QUALIFIERS): Likewise. (LDRGBU_QUALIFIERS): Likewise. * config/arm/mve.md (VLDRBGOQ): Define iterator. (VLDRBQ): Likewise. (VLDRWGBQ): Likewise. (mve_vldrbq_gather_offset_): Define RTL pattern. (mve_vldrbq_): Likewise. (mve_vldrwq_gather_base_v4si): Likewise. gcc/testsuite/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c: New test. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index ec88199bb5e7e9c15a346061c70841f3086004ef..02ea297937b18099f33a50c808964d1dd7eac1b3 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -588,6 +588,36 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned}; #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers) +static enum arm_type_qualifiers +arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned}; +#define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers) + +static enum arm_type_qualifiers +arm_ldrgs_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_pointer, qualifier_unsigned}; +#define LDRGS_QUALIFIERS (arm_ldrgs_qualifiers) + +static enum arm_type_qualifiers +arm_ldrs_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_pointer}; +#define LDRS_QUALIFIERS (arm_ldrs_qualifiers) + +static enum arm_typ
[PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
Hello, This patch supports the following MVE ACLE store intrinsics. vstrbq_scatter_offset_s8, vstrbq_scatter_offset_s32, vstrbq_scatter_offset_s16, vstrbq_scatter_offset_u8, vstrbq_scatter_offset_u32, vstrbq_scatter_offset_u16, vstrbq_s8, vstrbq_s32, vstrbq_s16, vstrbq_u8, vstrbq_u32, vstrbq_u16, vstrwq_scatter_base_s32, vstrwq_scatter_base_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (STRS_QUALIFIERS): Define builtin qualifier. (STRU_QUALIFIERS): Likewise. (STRSS_QUALIFIERS): Likewise. (STRSU_QUALIFIERS): Likewise. (STRSBS_QUALIFIERS): Likewise. (STRSBU_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vstrbq_s8): Define macro. (vstrbq_u8): Likewise. (vstrbq_u16): Likewise. (vstrbq_scatter_offset_s8): Likewise. (vstrbq_scatter_offset_u8): Likewise. (vstrbq_scatter_offset_u16): Likewise. (vstrbq_s16): Likewise. (vstrbq_u32): Likewise. (vstrbq_scatter_offset_s16): Likewise. (vstrbq_scatter_offset_u32): Likewise. (vstrbq_s32): Likewise. (vstrbq_scatter_offset_s32): Likewise. (vstrwq_scatter_base_s32): Likewise. (vstrwq_scatter_base_u32): Likewise. (__arm_vstrbq_scatter_offset_s8): Define intrinsic. (__arm_vstrbq_scatter_offset_s32): Likewise. (__arm_vstrbq_scatter_offset_s16): Likewise. (__arm_vstrbq_scatter_offset_u8): Likewise. (__arm_vstrbq_scatter_offset_u32): Likewise. (__arm_vstrbq_scatter_offset_u16): Likewise. (__arm_vstrbq_s8): Likewise. (__arm_vstrbq_s32): Likewise. (__arm_vstrbq_s16): Likewise. (__arm_vstrbq_u8): Likewise. (__arm_vstrbq_u32): Likewise. (__arm_vstrbq_u16): Likewise. (__arm_vstrwq_scatter_base_s32): Likewise. (__arm_vstrwq_scatter_base_u32): Likewise. (vstrbq): Define polymorphic variant. 
(vstrbq_scatter_offset): Likewise. (vstrwq_scatter_base): Likewise. * config/arm/arm_mve_builtins.def (STRS_QUALIFIERS): Use builtin qualifier. (STRU_QUALIFIERS): Likewise. (STRSS_QUALIFIERS): Likewise. (STRSU_QUALIFIERS): Likewise. (STRSBS_QUALIFIERS): Likewise. (STRSBU_QUALIFIERS): Likewise. * config/arm/mve.md (MVE_B_ELEM): Define mode attribute iterator. (VSTRWSBQ): Define iterators. (VSTRBSOQ): Likewise. (VSTRBQ): Likewise. (mve_vstrbq_): Define RTL pattern. (mve_vstrbq_scatter_offset_): Likewise. (mve_vstrwq_scatter_base_v4si): Likewise. gcc/testsuite/ChangeLog: 2019-11-01 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vstrbq_s16.c: New test. * gcc.target/arm/mve/intrinsics/vstrbq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrbq_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise. 
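The scatter-offset stores are the mirror image of the gather loads in the companion patch: lane i writes value[i] to base[offset[i]]. A Python sketch of the per-lane semantics (illustrative only, not the arm_mve.h C API):

```python
def vstrbq_scatter_offset(mem, offsets, value):
    # Lane i stores value[i] at mem[offsets[i]];
    # on overlapping offsets the later lane wins in this model.
    for o, v in zip(offsets, value):
        mem[o] = v

buf = [0] * 8
vstrbq_scatter_offset(buf, [1, 3, 5, 7], [11, 22, 33, 44])
print(buf)  # [0, 11, 0, 22, 0, 33, 0, 44]
```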
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 6dffb36fe179357c62cb4f35d486513971e3487d..ec88199bb5e7e9c15a346061c70841f3086004ef 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -555,6 +555,39 @@ arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \ (arm_quadop_unone_unone_unone_none_unone_qualifiers) +static enum arm_type_qualifiers +arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer, qualifier_none }; +#define STRS_QUALIFIERS (arm_strs_qualifiers) + +static enum arm_type_qualifiers +arm_stru_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer, qualifier_unsigned }; +#define STRU_QUALIFIERS (arm_str
[PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
Hello, This patch supports the following MVE ACLE intrinsics with quaternary operands. vsriq_m_n_s8, vsubq_m_s8, vsubq_x_s8, vcvtq_m_n_f16_u16, vcvtq_x_n_f16_u16, vqshluq_m_n_s8, vabavq_p_s8, vsriq_m_n_u8, vshlq_m_u8, vshlq_x_u8, vsubq_m_u8, vsubq_x_u8, vabavq_p_u8, vshlq_m_s8, vshlq_x_s8, vcvtq_m_n_f16_s16, vcvtq_x_n_f16_s16, vsriq_m_n_s16, vsubq_m_s16, vsubq_x_s16, vcvtq_m_n_f32_u32, vcvtq_x_n_f32_u32, vqshluq_m_n_s16, vabavq_p_s16, vsriq_m_n_u16, vshlq_m_u16, vshlq_x_u16, vsubq_m_u16, vsubq_x_u16, vabavq_p_u16, vshlq_m_s16, vshlq_x_s16, vcvtq_m_n_f32_s32, vcvtq_x_n_f32_s32, vsriq_m_n_s32, vsubq_m_s32, vsubq_x_s32, vqshluq_m_n_s32, vabavq_p_s32, vsriq_m_n_u32, vshlq_m_u32, vshlq_x_u32, vsubq_m_u32, vsubq_x_u32, vabavq_p_u32, vshlq_m_s32, vshlq_x_s32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-29 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Define builtin qualifier. (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise. (QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vsriq_m_n_s8): Define macro. (vsubq_m_s8): Likewise. (vcvtq_m_n_f16_u16): Likewise. (vqshluq_m_n_s8): Likewise. (vabavq_p_s8): Likewise. (vsriq_m_n_u8): Likewise. (vshlq_m_u8): Likewise. (vsubq_m_u8): Likewise. (vabavq_p_u8): Likewise. (vshlq_m_s8): Likewise. (vcvtq_m_n_f16_s16): Likewise. (vsriq_m_n_s16): Likewise. (vsubq_m_s16): Likewise. 
(vcvtq_m_n_f32_u32): Likewise. (vqshluq_m_n_s16): Likewise. (vabavq_p_s16): Likewise. (vsriq_m_n_u16): Likewise. (vshlq_m_u16): Likewise. (vsubq_m_u16): Likewise. (vabavq_p_u16): Likewise. (vshlq_m_s16): Likewise. (vcvtq_m_n_f32_s32): Likewise. (vsriq_m_n_s32): Likewise. (vsubq_m_s32): Likewise. (vqshluq_m_n_s32): Likewise. (vabavq_p_s32): Likewise. (vsriq_m_n_u32): Likewise. (vshlq_m_u32): Likewise. (vsubq_m_u32): Likewise. (vabavq_p_u32): Likewise. (vshlq_m_s32): Likewise. (__arm_vsriq_m_n_s8): Define intrinsic. (__arm_vsubq_m_s8): Likewise. (__arm_vqshluq_m_n_s8): Likewise. (__arm_vabavq_p_s8): Likewise. (__arm_vsriq_m_n_u8): Likewise. (__arm_vshlq_m_u8): Likewise. (__arm_vsubq_m_u8): Likewise. (__arm_vabavq_p_u8): Likewise. (__arm_vshlq_m_s8): Likewise. (__arm_vsriq_m_n_s16): Likewise. (__arm_vsubq_m_s16): Likewise. (__arm_vqshluq_m_n_s16): Likewise. (__arm_vabavq_p_s16): Likewise. (__arm_vsriq_m_n_u16): Likewise. (__arm_vshlq_m_u16): Likewise. (__arm_vsubq_m_u16): Likewise. (__arm_vabavq_p_u16): Likewise. (__arm_vshlq_m_s16): Likewise. (__arm_vsriq_m_n_s32): Likewise. (__arm_vsubq_m_s32): Likewise. (__arm_vqshluq_m_n_s32): Likewise. (__arm_vabavq_p_s32): Likewise. (__arm_vsriq_m_n_u32): Likewise. (__arm_vshlq_m_u32): Likewise. (__arm_vsubq_m_u32): Likewise. (__arm_vabavq_p_u32): Likewise. (__arm_vshlq_m_s32): Likewise. (__arm_vcvtq_m_n_f16_u16): Likewise. (__arm_vcvtq_m_n_f16_s16): Likewise. (__arm_vcvtq_m_n_f32_u32): Likewise. (__arm_vcvtq_m_n_f32_s32): Likewise. (vcvtq_m_n): Define polymorphic variant. (vqshluq_m_n): Likewise. (vshlq_m): Likewise. (vsriq_m_n): Likewise. (vsubq_m): Likewise. (vabavq_p): Likewise. * config/arm/arm_mve_builtins.def (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Use builtin qualifier. (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise. (QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise. 
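For reviewers new to the _m (merging) forms: the extra `inactive` vector and predicate are what give these intrinsics their quaternary shape — predicated lanes compute the operation, the remaining lanes are copied from `inactive`. Sketched in Python (a model of the lane semantics under that assumption, not the C API):

```python
def vsubq_m(inactive, a, b, p):
    # Lanes with the predicate bit set compute a - b;
    # the remaining lanes are taken unchanged from `inactive`.
    return [x - y if bit else keep
            for keep, x, y, bit in zip(inactive, a, b, p)]

print(vsubq_m([9, 9, 9, 9], [5, 6, 7, 8], [1, 1, 1, 1], [1, 0, 1, 0]))
# [4, 9, 6, 9]
```

The _x variants listed above differ only in that the inactive lanes are left unspecified rather than merged.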
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise. (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise. * config/arm/mve.md (VABAVQ_P): Define iterator. (VSHLQ_M): Likewise. (VSRIQ_M_N): Likewise. (V
[PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
Hello, This patch supports following MVE ACLE intrinsics with quaternary operands. vabdq_m_s8, vabdq_m_s32, vabdq_m_s16, vabdq_m_u8, vabdq_m_u32, vabdq_m_u16, vaddq_m_n_s8, vaddq_m_n_s32, vaddq_m_n_s16, vaddq_m_n_u8, vaddq_m_n_u32, vaddq_m_n_u16, vaddq_m_s8, vaddq_m_s32, vaddq_m_s16, vaddq_m_u8, vaddq_m_u32, vaddq_m_u16, vandq_m_s8, vandq_m_s32, vandq_m_s16, vandq_m_u8, vandq_m_u32, vandq_m_u16, vbicq_m_s8, vbicq_m_s32, vbicq_m_s16, vbicq_m_u8, vbicq_m_u32, vbicq_m_u16, vbrsrq_m_n_s8, vbrsrq_m_n_s32, vbrsrq_m_n_s16, vbrsrq_m_n_u8, vbrsrq_m_n_u32, vbrsrq_m_n_u16, vcaddq_rot270_m_s8, vcaddq_rot270_m_s32, vcaddq_rot270_m_s16, vcaddq_rot270_m_u8, vcaddq_rot270_m_u32, vcaddq_rot270_m_u16, vcaddq_rot90_m_s8, vcaddq_rot90_m_s32, vcaddq_rot90_m_s16, vcaddq_rot90_m_u8, vcaddq_rot90_m_u32, vcaddq_rot90_m_u16, veorq_m_s8, veorq_m_s32, veorq_m_s16, veorq_m_u8, veorq_m_u32, veorq_m_u16, vhaddq_m_n_s8, vhaddq_m_n_s32, vhaddq_m_n_s16, vhaddq_m_n_u8, vhaddq_m_n_u32, vhaddq_m_n_u16, vhaddq_m_s8, vhaddq_m_s32, vhaddq_m_s16, vhaddq_m_u8, vhaddq_m_u32, vhaddq_m_u16, vhcaddq_rot270_m_s8, vhcaddq_rot270_m_s32, vhcaddq_rot270_m_s16, vhcaddq_rot90_m_s8, vhcaddq_rot90_m_s32, vhcaddq_rot90_m_s16, vhsubq_m_n_s8, vhsubq_m_n_s32, vhsubq_m_n_s16, vhsubq_m_n_u8, vhsubq_m_n_u32, vhsubq_m_n_u16, vhsubq_m_s8, vhsubq_m_s32, vhsubq_m_s16, vhsubq_m_u8, vhsubq_m_u32, vhsubq_m_u16, vmaxq_m_s8, vmaxq_m_s32, vmaxq_m_s16, vmaxq_m_u8, vmaxq_m_u32, vmaxq_m_u16, vminq_m_s8, vminq_m_s32, vminq_m_s16, vminq_m_u8, vminq_m_u32, vminq_m_u16, vmladavaq_p_s8, vmladavaq_p_s32, vmladavaq_p_s16, vmladavaq_p_u8, vmladavaq_p_u32, vmladavaq_p_u16, vmladavaxq_p_s8, vmladavaxq_p_s32, vmladavaxq_p_s16, vmlaq_m_n_s8, vmlaq_m_n_s32, vmlaq_m_n_s16, vmlaq_m_n_u8, vmlaq_m_n_u32, vmlaq_m_n_u16, vmlasq_m_n_s8, vmlasq_m_n_s32, vmlasq_m_n_s16, vmlasq_m_n_u8, vmlasq_m_n_u32, vmlasq_m_n_u16, vmlsdavaq_p_s8, vmlsdavaq_p_s32, vmlsdavaq_p_s16, vmlsdavaxq_p_s8, vmlsdavaxq_p_s32, vmlsdavaxq_p_s16, vmulhq_m_s8, vmulhq_m_s32, vmulhq_m_s16, 
vmulhq_m_u8, vmulhq_m_u32, vmulhq_m_u16, vmullbq_int_m_s8, vmullbq_int_m_s32, vmullbq_int_m_s16, vmullbq_int_m_u8, vmullbq_int_m_u32, vmullbq_int_m_u16, vmulltq_int_m_s8, vmulltq_int_m_s32, vmulltq_int_m_s16, vmulltq_int_m_u8, vmulltq_int_m_u32, vmulltq_int_m_u16, vmulq_m_n_s8, vmulq_m_n_s32, vmulq_m_n_s16, vmulq_m_n_u8, vmulq_m_n_u32, vmulq_m_n_u16, vmulq_m_s8, vmulq_m_s32, vmulq_m_s16, vmulq_m_u8, vmulq_m_u32, vmulq_m_u16, vornq_m_s8, vornq_m_s32, vornq_m_s16, vornq_m_u8, vornq_m_u32, vornq_m_u16, vorrq_m_s8, vorrq_m_s32, vorrq_m_s16, vorrq_m_u8, vorrq_m_u32, vorrq_m_u16, vqaddq_m_n_s8, vqaddq_m_n_s32, vqaddq_m_n_s16, vqaddq_m_n_u8, vqaddq_m_n_u32, vqaddq_m_n_u16, vqaddq_m_s8, vqaddq_m_s32, vqaddq_m_s16, vqaddq_m_u8, vqaddq_m_u32, vqaddq_m_u16, vqdmladhq_m_s8, vqdmladhq_m_s32, vqdmladhq_m_s16, vqdmladhxq_m_s8, vqdmladhxq_m_s32, vqdmladhxq_m_s16, vqdmlahq_m_n_s8, vqdmlahq_m_n_s32, vqdmlahq_m_n_s16, vqdmlahq_m_n_u8, vqdmlahq_m_n_u32, vqdmlahq_m_n_u16, vqdmlsdhq_m_s8, vqdmlsdhq_m_s32, vqdmlsdhq_m_s16, vqdmlsdhxq_m_s8, vqdmlsdhxq_m_s32, vqdmlsdhxq_m_s16, vqdmulhq_m_n_s8, vqdmulhq_m_n_s32, vqdmulhq_m_n_s16, vqdmulhq_m_s8, vqdmulhq_m_s32, vqdmulhq_m_s16, vqrdmladhq_m_s8, vqrdmladhq_m_s32, vqrdmladhq_m_s16, vqrdmladhxq_m_s8, vqrdmladhxq_m_s32, vqrdmladhxq_m_s16, vqrdmlahq_m_n_s8, vqrdmlahq_m_n_s32, vqrdmlahq_m_n_s16, vqrdmlahq_m_n_u8, vqrdmlahq_m_n_u32, vqrdmlahq_m_n_u16, vqrdmlashq_m_n_s8, vqrdmlashq_m_n_s32, vqrdmlashq_m_n_s16, vqrdmlashq_m_n_u8, vqrdmlashq_m_n_u32, vqrdmlashq_m_n_u16, vqrdmlsdhq_m_s8, vqrdmlsdhq_m_s32, vqrdmlsdhq_m_s16, vqrdmlsdhxq_m_s8, vqrdmlsdhxq_m_s32, vqrdmlsdhxq_m_s16, vqrdmulhq_m_n_s8, vqrdmulhq_m_n_s32, vqrdmulhq_m_n_s16, vqrdmulhq_m_s8, vqrdmulhq_m_s32, vqrdmulhq_m_s16, vqrshlq_m_s8, vqrshlq_m_s32, vqrshlq_m_s16, vqrshlq_m_u8, vqrshlq_m_u32, vqrshlq_m_u16, vqshlq_m_n_s8, vqshlq_m_n_s32, vqshlq_m_n_s16, vqshlq_m_n_u8, vqshlq_m_n_u32, vqshlq_m_n_u16, vqshlq_m_s8, vqshlq_m_s32, vqshlq_m_s16, vqshlq_m_u8, vqshlq_m_u32, vqshlq_m_u16, 
vqsubq_m_n_s8, vqsubq_m_n_s32, vqsubq_m_n_s16, vqsubq_m_n_u8, vqsubq_m_n_u32, vqsubq_m_n_u16, vqsubq_m_s8, vqsubq_m_s32, vqsubq_m_s16, vqsubq_m_u8, vqsubq_m_u32, vqsubq_m_u16, vrhaddq_m_s8, vrhaddq_m_s32, vrhaddq_m_s16, vrhaddq_m_u8, vrhaddq_m_u32, vrhaddq_m_u16, vrmulhq_m_s8, vrmulhq_m_s32, vrmulhq_m_s16, vrmulhq_m_u8, vrmulhq_m_u32, vrmulhq_m_u16, vrshlq_m_s8, vrshlq_m_s32, vrshlq_m_s16, vrshlq_m_u8, vrshlq_m_u32, vrshlq_m_u16, vrshrq_m_n_s8, vrshrq_m_n_s32, vrshrq_m_n_s16, vrshrq_m_n_u8, vrshrq_m_n_u32, vrshrq_m_n_u16, vshlq_m_n_s8, vshlq_m_n_s32, vshlq_m_n_s16, vshlq_m_n_u8, vshlq_m_n_u32, vshlq_m_n_u16, vshrq_m_n_s8, vshrq_m_n_s32, vshrq_m_n_s16, vshrq_m_n_u8, vshrq_m_n_u32, vshrq_m_n_u16, vsliq_m_n_s8, vsliq_m_n_s32, vsliq_m_n_s16, vsliq_m_n_u8, vsliq_m_n_u32, vsliq_m_n_u16, vsubq_m_n_s8, vsubq_m_n_s32, vsubq_m_n_s16, vsubq_m_n_u8, vsubq_m_n_u32, vsubq_m_n_u16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more de
[PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
Hello, This patch supports the following MVE ACLE intrinsics with ternary operands. vabavq_s8, vabavq_s16, vabavq_s32, vbicq_m_n_s16, vbicq_m_n_s32, vbicq_m_n_u16, vbicq_m_n_u32, vcmpeqq_m_f16, vcmpeqq_m_f32, vcvtaq_m_s16_f16, vcvtaq_m_u16_f16, vcvtaq_m_s32_f32, vcvtaq_m_u32_f32, vcvtq_m_f16_s16, vcvtq_m_f16_u16, vcvtq_m_f32_s32, vcvtq_m_f32_u32, vqrshrnbq_n_s16, vqrshrnbq_n_u16, vqrshrnbq_n_s32, vqrshrnbq_n_u32, vqrshrunbq_n_s16, vqrshrunbq_n_s32, vrmlaldavhaq_s32, vrmlaldavhaq_u32, vshlcq_s8, vshlcq_u8, vshlcq_s16, vshlcq_u16, vshlcq_s32, vshlcq_u32, vabavq_u8, vabavq_u16, vabavq_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-23 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS): Define qualifier for ternary operands. (TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise. (TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS): Likewise. (TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS): Likewise. (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise. (TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise. (TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Likewise. (TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vabavq_s8): Define macro. (vabavq_s16): Likewise. (vabavq_s32): Likewise. (vbicq_m_n_s16): Likewise. (vbicq_m_n_s32): Likewise. (vbicq_m_n_u16): Likewise. (vbicq_m_n_u32): Likewise. (vcmpeqq_m_f16): Likewise. (vcmpeqq_m_f32): Likewise. 
(vcvtaq_m_s16_f16): Likewise. (vcvtaq_m_u16_f16): Likewise. (vcvtaq_m_s32_f32): Likewise. (vcvtaq_m_u32_f32): Likewise. (vcvtq_m_f16_s16): Likewise. (vcvtq_m_f16_u16): Likewise. (vcvtq_m_f32_s32): Likewise. (vcvtq_m_f32_u32): Likewise. (vqrshrnbq_n_s16): Likewise. (vqrshrnbq_n_u16): Likewise. (vqrshrnbq_n_s32): Likewise. (vqrshrnbq_n_u32): Likewise. (vqrshrunbq_n_s16): Likewise. (vqrshrunbq_n_s32): Likewise. (vrmlaldavhaq_s32): Likewise. (vrmlaldavhaq_u32): Likewise. (vshlcq_s8): Likewise. (vshlcq_u8): Likewise. (vshlcq_s16): Likewise. (vshlcq_u16): Likewise. (vshlcq_s32): Likewise. (vshlcq_u32): Likewise. (vabavq_u8): Likewise. (vabavq_u16): Likewise. (vabavq_u32): Likewise. (__arm_vabavq_s8): Define intrinsic. (__arm_vabavq_s16): Likewise. (__arm_vabavq_s32): Likewise. (__arm_vabavq_u8): Likewise. (__arm_vabavq_u16): Likewise. (__arm_vabavq_u32): Likewise. (__arm_vbicq_m_n_s16): Likewise. (__arm_vbicq_m_n_s32): Likewise. (__arm_vbicq_m_n_u16): Likewise. (__arm_vbicq_m_n_u32): Likewise. (__arm_vqrshrnbq_n_s16): Likewise. (__arm_vqrshrnbq_n_u16): Likewise. (__arm_vqrshrnbq_n_s32): Likewise. (__arm_vqrshrnbq_n_u32): Likewise. (__arm_vqrshrunbq_n_s16): Likewise. (__arm_vqrshrunbq_n_s32): Likewise. (__arm_vrmlaldavhaq_s32): Likewise. (__arm_vrmlaldavhaq_u32): Likewise. (__arm_vshlcq_s8): Likewise. (__arm_vshlcq_u8): Likewise. (__arm_vshlcq_s16): Likewise. (__arm_vshlcq_u16): Likewise. (__arm_vshlcq_s32): Likewise. (__arm_vshlcq_u32): Likewise. (__arm_vcmpeqq_m_f16): Likewise. (__arm_vcmpeqq_m_f32): Likewise. (__arm_vcvtaq_m_s16_f16): Likewise. (__arm_vcvtaq_m_u16_f16): Likewise. (__arm_vcvtaq_m_s32_f32): Likewise. (__arm_vcvtaq_m_u32_f32): Likewise. (__arm_vcvtq_m_f16_s16): Likewise. (__arm_vcvtq_m_f16_u16): Likewise. (__arm_vcvtq_m_f32_s32): Likewise. (__arm_vcvtq_m_f32_u32): Likewise. (vcvtaq_m): Define polymorphic variant. (vcvtq_m): Likewise. (vabavq): Likewise. (vshlcq): Likewise. (vbicq_m_n): Likewise. (vqrshrnbq_n): Likewise. (vqrshrunbq_n): Likewise. 
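As background on the vector-to-scalar shape of vabavq: it folds the absolute differences of two vectors into a scalar accumulator, which is what makes it ternary. A Python model of that semantics (illustrative only, not the arm_mve.h C API):

```python
def vabavq(acc, b, c):
    # Accumulate the sum of absolute lane differences into a scalar.
    return acc + sum(abs(x - y) for x, y in zip(b, c))

print(vabavq(100, [1, 5, 9], [4, 5, 2]))  # 100 + 3 + 0 + 7 = 110
```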
* config/arm/arm_mve_builtins.def (TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS): Use the builtin qualifier. (TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise. (TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise. (TERNOP_NONE_NON
[PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
Hello, This patch supports the following MVE ACLE intrinsics with quaternary operands. vabdq_m_f32, vabdq_m_f16, vaddq_m_f32, vaddq_m_f16, vaddq_m_n_f32, vaddq_m_n_f16, vandq_m_f32, vandq_m_f16, vbicq_m_f32, vbicq_m_f16, vbrsrq_m_n_f32, vbrsrq_m_n_f16, vcaddq_rot270_m_f32, vcaddq_rot270_m_f16, vcaddq_rot90_m_f32, vcaddq_rot90_m_f16, vcmlaq_m_f32, vcmlaq_m_f16, vcmlaq_rot180_m_f32, vcmlaq_rot180_m_f16, vcmlaq_rot270_m_f32, vcmlaq_rot270_m_f16, vcmlaq_rot90_m_f32, vcmlaq_rot90_m_f16, vcmulq_m_f32, vcmulq_m_f16, vcmulq_rot180_m_f32, vcmulq_rot180_m_f16, vcmulq_rot270_m_f32, vcmulq_rot270_m_f16, vcmulq_rot90_m_f32, vcmulq_rot90_m_f16, vcvtq_m_n_s32_f32, vcvtq_m_n_s16_f16, vcvtq_m_n_u32_f32, vcvtq_m_n_u16_f16, veorq_m_f32, veorq_m_f16, vfmaq_m_f32, vfmaq_m_f16, vfmaq_m_n_f32, vfmaq_m_n_f16, vfmasq_m_n_f32, vfmasq_m_n_f16, vfmsq_m_f32, vfmsq_m_f16, vmaxnmq_m_f32, vmaxnmq_m_f16, vminnmq_m_f32, vminnmq_m_f16, vmulq_m_f32, vmulq_m_f16, vmulq_m_n_f32, vmulq_m_n_f16, vornq_m_f32, vornq_m_f16, vorrq_m_f32, vorrq_m_f16, vsubq_m_f32, vsubq_m_f16, vsubq_m_n_f32, vsubq_m_n_f16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-31 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vabdq_m_f32): Define macro. (vabdq_m_f16): Likewise. (vaddq_m_f32): Likewise. (vaddq_m_f16): Likewise. (vaddq_m_n_f32): Likewise. (vaddq_m_n_f16): Likewise. (vandq_m_f32): Likewise. (vandq_m_f16): Likewise. (vbicq_m_f32): Likewise. (vbicq_m_f16): Likewise. (vbrsrq_m_n_f32): Likewise. (vbrsrq_m_n_f16): Likewise. (vcaddq_rot270_m_f32): Likewise. (vcaddq_rot270_m_f16): Likewise. (vcaddq_rot90_m_f32): Likewise. (vcaddq_rot90_m_f16): Likewise. (vcmlaq_m_f32): Likewise. (vcmlaq_m_f16): Likewise. (vcmlaq_rot180_m_f32): Likewise. 
(vcmlaq_rot180_m_f16): Likewise. (vcmlaq_rot270_m_f32): Likewise. (vcmlaq_rot270_m_f16): Likewise. (vcmlaq_rot90_m_f32): Likewise. (vcmlaq_rot90_m_f16): Likewise. (vcmulq_m_f32): Likewise. (vcmulq_m_f16): Likewise. (vcmulq_rot180_m_f32): Likewise. (vcmulq_rot180_m_f16): Likewise. (vcmulq_rot270_m_f32): Likewise. (vcmulq_rot270_m_f16): Likewise. (vcmulq_rot90_m_f32): Likewise. (vcmulq_rot90_m_f16): Likewise. (vcvtq_m_n_s32_f32): Likewise. (vcvtq_m_n_s16_f16): Likewise. (vcvtq_m_n_u32_f32): Likewise. (vcvtq_m_n_u16_f16): Likewise. (veorq_m_f32): Likewise. (veorq_m_f16): Likewise. (vfmaq_m_f32): Likewise. (vfmaq_m_f16): Likewise. (vfmaq_m_n_f32): Likewise. (vfmaq_m_n_f16): Likewise. (vfmasq_m_n_f32): Likewise. (vfmasq_m_n_f16): Likewise. (vfmsq_m_f32): Likewise. (vfmsq_m_f16): Likewise. (vmaxnmq_m_f32): Likewise. (vmaxnmq_m_f16): Likewise. (vminnmq_m_f32): Likewise. (vminnmq_m_f16): Likewise. (vmulq_m_f32): Likewise. (vmulq_m_f16): Likewise. (vmulq_m_n_f32): Likewise. (vmulq_m_n_f16): Likewise. (vornq_m_f32): Likewise. (vornq_m_f16): Likewise. (vorrq_m_f32): Likewise. (vorrq_m_f16): Likewise. (vsubq_m_f32): Likewise. (vsubq_m_f16): Likewise. (vsubq_m_n_f32): Likewise. (vsubq_m_n_f16): Likewise. (__attribute__): Likewise. (__arm_vabdq_m_f32): Likewise. (__arm_vabdq_m_f16): Likewise. (__arm_vaddq_m_f32): Likewise. (__arm_vaddq_m_f16): Likewise. (__arm_vaddq_m_n_f32): Likewise. (__arm_vaddq_m_n_f16): Likewise. (__arm_vandq_m_f32): Likewise. (__arm_vandq_m_f16): Likewise. (__arm_vbicq_m_f32): Likewise. (__arm_vbicq_m_f16): Likewise. (__arm_vbrsrq_m_n_f32): Likewise. (__arm_vbrsrq_m_n_f16): Likewise. (__arm_vcaddq_rot270_m_f32): Likewise. (__arm_vcaddq_rot270_m_f16): Likewise. (__arm_vcaddq_rot90_m_f32): Likewise. (__arm_vcaddq_rot90_m_f16): Likewise. (__arm_vcmlaq_m_f32): Likewise. (__arm_vcmlaq_m_f16): Likewise. (__arm_vcmlaq_rot180_m_f32): Likewise. (__arm_vcmlaq_rot180_m_f16): Likewise. (__arm_vcmlaq_rot270_m_f32): Likewise. (__arm_vcmlaq_rot270_m_f16): Likewise. 
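For the fused multiply-accumulate family above, the _m form keeps the first operand's lanes wherever the predicate bit is clear. A Python sketch of that lane behaviour (a model only, not the C intrinsic):

```python
def vfmaq_m(a, b, c, p):
    # Predicated lanes compute a + b * c; inactive lanes keep `a`.
    return [ai + bi * ci if bit else ai
            for ai, bi, ci, bit in zip(a, b, c, p)]

print(vfmaq_m([1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [1, 0]))  # [7.0, 1.0]
```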
(__arm_vcmlaq_rot90_m_f32): Likewise. (__arm_vcmlaq_rot90_m_f16): Likewise. (__arm_vcmulq_m_f32): Likewise. (__arm_vcmulq_m_f16): Likewise. (__arm_vcmulq_rot180_m_f32): Define intrinsic. (__arm_vcmulq_rot180_m_f16): Likewise. (__arm_vcmulq_rot270_m_
[PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
Hello, This patch supports the following MVE ACLE intrinsics with quaternary operands. vmlaldavaq_p_s16, vmlaldavaq_p_s32, vmlaldavaq_p_u16, vmlaldavaq_p_u32, vmlaldavaxq_p_s16, vmlaldavaxq_p_s32, vmlaldavaxq_p_u16, vmlaldavaxq_p_u32, vmlsldavaq_p_s16, vmlsldavaq_p_s32, vmlsldavaxq_p_s16, vmlsldavaxq_p_s32, vmullbq_poly_m_p16, vmullbq_poly_m_p8, vmulltq_poly_m_p16, vmulltq_poly_m_p8, vqdmullbq_m_n_s16, vqdmullbq_m_n_s32, vqdmullbq_m_s16, vqdmullbq_m_s32, vqdmulltq_m_n_s16, vqdmulltq_m_n_s32, vqdmulltq_m_s16, vqdmulltq_m_s32, vqrshrnbq_m_n_s16, vqrshrnbq_m_n_s32, vqrshrnbq_m_n_u16, vqrshrnbq_m_n_u32, vqrshrntq_m_n_s16, vqrshrntq_m_n_s32, vqrshrntq_m_n_u16, vqrshrntq_m_n_u32, vqrshrunbq_m_n_s16, vqrshrunbq_m_n_s32, vqrshruntq_m_n_s16, vqrshruntq_m_n_s32, vqshrnbq_m_n_s16, vqshrnbq_m_n_s32, vqshrnbq_m_n_u16, vqshrnbq_m_n_u32, vqshrntq_m_n_s16, vqshrntq_m_n_s32, vqshrntq_m_n_u16, vqshrntq_m_n_u32, vqshrunbq_m_n_s16, vqshrunbq_m_n_s32, vqshruntq_m_n_s16, vqshruntq_m_n_s32, vrmlaldavhaq_p_s32, vrmlaldavhaq_p_u32, vrmlaldavhaxq_p_s32, vrmlsldavhaq_p_s32, vrmlsldavhaxq_p_s32, vrshrnbq_m_n_s16, vrshrnbq_m_n_s32, vrshrnbq_m_n_u16, vrshrnbq_m_n_u32, vrshrntq_m_n_s16, vrshrntq_m_n_s32, vrshrntq_m_n_u16, vrshrntq_m_n_u32, vshllbq_m_n_s16, vshllbq_m_n_s8, vshllbq_m_n_u16, vshllbq_m_n_u8, vshlltq_m_n_s16, vshlltq_m_n_s8, vshlltq_m_n_u16, vshlltq_m_n_u8, vshrnbq_m_n_s16, vshrnbq_m_n_s32, vshrnbq_m_n_u16, vshrnbq_m_n_u32, vshrntq_m_n_s16, vshrntq_m_n_s32, vshrntq_m_n_u16, vshrntq_m_n_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-31 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-protos.h (arm_mve_immediate_check): * config/arm/arm.c (arm_mve_immediate_check): Define function to check mode and integer value. 
* config/arm/arm_mve.h (vmlaldavaq_p_s32): Define macro. (vmlaldavaq_p_s16): Likewise. (vmlaldavaq_p_u32): Likewise. (vmlaldavaq_p_u16): Likewise. (vmlaldavaxq_p_s32): Likewise. (vmlaldavaxq_p_s16): Likewise. (vmlaldavaxq_p_u32): Likewise. (vmlaldavaxq_p_u16): Likewise. (vmlsldavaq_p_s32): Likewise. (vmlsldavaq_p_s16): Likewise. (vmlsldavaxq_p_s32): Likewise. (vmlsldavaxq_p_s16): Likewise. (vmullbq_poly_m_p8): Likewise. (vmullbq_poly_m_p16): Likewise. (vmulltq_poly_m_p8): Likewise. (vmulltq_poly_m_p16): Likewise. (vqdmullbq_m_n_s32): Likewise. (vqdmullbq_m_n_s16): Likewise. (vqdmullbq_m_s32): Likewise. (vqdmullbq_m_s16): Likewise. (vqdmulltq_m_n_s32): Likewise. (vqdmulltq_m_n_s16): Likewise. (vqdmulltq_m_s32): Likewise. (vqdmulltq_m_s16): Likewise. (vqrshrnbq_m_n_s32): Likewise. (vqrshrnbq_m_n_s16): Likewise. (vqrshrnbq_m_n_u32): Likewise. (vqrshrnbq_m_n_u16): Likewise. (vqrshrntq_m_n_s32): Likewise. (vqrshrntq_m_n_s16): Likewise. (vqrshrntq_m_n_u32): Likewise. (vqrshrntq_m_n_u16): Likewise. (vqrshrunbq_m_n_s32): Likewise. (vqrshrunbq_m_n_s16): Likewise. (vqrshruntq_m_n_s32): Likewise. (vqrshruntq_m_n_s16): Likewise. (vqshrnbq_m_n_s32): Likewise. (vqshrnbq_m_n_s16): Likewise. (vqshrnbq_m_n_u32): Likewise. (vqshrnbq_m_n_u16): Likewise. (vqshrntq_m_n_s32): Likewise. (vqshrntq_m_n_s16): Likewise. (vqshrntq_m_n_u32): Likewise. (vqshrntq_m_n_u16): Likewise. (vqshrunbq_m_n_s32): Likewise. (vqshrunbq_m_n_s16): Likewise. (vqshruntq_m_n_s32): Likewise. (vqshruntq_m_n_s16): Likewise. (vrmlaldavhaq_p_s32): Likewise. (vrmlaldavhaq_p_u32): Likewise. (vrmlaldavhaxq_p_s32): Likewise. (vrmlsldavhaq_p_s32): Likewise. (vrmlsldavhaxq_p_s32): Likewise. (vrshrnbq_m_n_s32): Likewise. (vrshrnbq_m_n_s16): Likewise. (vrshrnbq_m_n_u32): Likewise. (vrshrnbq_m_n_u16): Likewise. (vrshrntq_m_n_s32): Likewise. (vrshrntq_m_n_s16): Likewise. (vrshrntq_m_n_u32): Likewise. (vrshrntq_m_n_u16): Likewise. (vshllbq_m_n_s8): Likewise. (vshllbq_m_n_s16): Likewise. (vshllbq_m_n_u8): Likewise. 
(vshllbq_m_n_u16): Likewise. (vshlltq_m_n_s8): Likewise. (vshlltq_m_n_s16): Likewise. (vshlltq_m_n_u8): Likewise. (vshlltq_m_n_u16): Likewise. (vshrnbq_m_n_s32): Likewise. (vshrnbq_m_n_s16): Likewise. (vshrnbq_m_n_u32): Likewise. (vshrnbq_m_n_u16): Likewise. (vshrntq_m_n_s32): Likewise. (vshrntq_m_n_s16): Likewise. (vshrntq_m_n_u32): Like
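Much of this patch is the saturating/rounding narrowing-shift family (vqshrnbq, vqrshrntq and friends). Setting aside the bottom/top lane interleaving, the per-lane arithmetic of a signed saturating shift-right-narrow can be sketched as follows (an assumed model for illustration, not the intrinsic itself):

```python
def sat_shr_narrow(x, n, bits=8):
    # Arithmetic shift right by n, then saturate to a `bits`-bit
    # signed result instead of wrapping.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x >> n))

print([sat_shr_narrow(x, 4) for x in (4096, -4096, 160)])  # [127, -128, 10]
```

The rounding (vqrshr*) variants additionally add 2**(n-1) before the shift.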
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
Hello, This patch supports following MVE ACLE intrinsics with ternary operands. vpselq_u8, vpselq_s8, vrev64q_m_u8, vqrdmlashq_n_u8, vqrdmlahq_n_u8, vqdmlahq_n_u8, vmvnq_m_u8, vmlasq_n_u8, vmlaq_n_u8, vmladavq_p_u8, vmladavaq_u8, vminvq_p_u8, vmaxvq_p_u8, vdupq_m_n_u8, vcmpneq_m_u8, vcmpneq_m_n_u8, vcmphiq_m_u8, vcmphiq_m_n_u8, vcmpeqq_m_u8, vcmpeqq_m_n_u8, vcmpcsq_m_u8, vcmpcsq_m_n_u8, vclzq_m_u8, vaddvaq_p_u8, vsriq_n_u8, vsliq_n_u8, vshlq_m_r_u8, vrshlq_m_n_u8, vqshlq_m_r_u8, vqrshlq_m_n_u8, vminavq_p_s8, vminaq_m_s8, vmaxavq_p_s8, vmaxaq_m_s8, vcmpneq_m_s8, vcmpneq_m_n_s8, vcmpltq_m_s8, vcmpltq_m_n_s8, vcmpleq_m_s8, vcmpleq_m_n_s8, vcmpgtq_m_s8, vcmpgtq_m_n_s8, vcmpgeq_m_s8, vcmpgeq_m_n_s8, vcmpeqq_m_s8, vcmpeqq_m_n_s8, vshlq_m_r_s8, vrshlq_m_n_s8, vrev64q_m_s8, vqshlq_m_r_s8, vqrshlq_m_n_s8, vqnegq_m_s8, vqabsq_m_s8, vnegq_m_s8, vmvnq_m_s8, vmlsdavxq_p_s8, vmlsdavq_p_s8, vmladavxq_p_s8, vmladavq_p_s8, vminvq_p_s8, vmaxvq_p_s8, vdupq_m_n_s8, vclzq_m_s8, vclsq_m_s8, vaddvaq_p_s8, vabsq_m_s8, vqrdmlsdhxq_s8, vqrdmlsdhq_s8, vqrdmlashq_n_s8, vqrdmlahq_n_s8, vqrdmladhxq_s8, vqrdmladhq_s8, vqdmlsdhxq_s8, vqdmlsdhq_s8, vqdmlahq_n_s8, vqdmladhxq_s8, vqdmladhq_s8, vmlsdavaxq_s8, vmlsdavaq_s8, vmlasq_n_s8, vmlaq_n_s8, vmladavaxq_s8, vmladavaq_s8, vsriq_n_s8, vsliq_n_s8, vpselq_u16, vpselq_s16, vrev64q_m_u16, vqrdmlashq_n_u16, vqrdmlahq_n_u16, vqdmlahq_n_u16, vmvnq_m_u16, vmlasq_n_u16, vmlaq_n_u16, vmladavq_p_u16, vmladavaq_u16, vminvq_p_u16, vmaxvq_p_u16, vdupq_m_n_u16, vcmpneq_m_u16, vcmpneq_m_n_u16, vcmphiq_m_u16, vcmphiq_m_n_u16, vcmpeqq_m_u16, vcmpeqq_m_n_u16, vcmpcsq_m_u16, vcmpcsq_m_n_u16, vclzq_m_u16, vaddvaq_p_u16, vsriq_n_u16, vsliq_n_u16, vshlq_m_r_u16, vrshlq_m_n_u16, vqshlq_m_r_u16, vqrshlq_m_n_u16, vminavq_p_s16, vminaq_m_s16, vmaxavq_p_s16, vmaxaq_m_s16, vcmpneq_m_s16, vcmpneq_m_n_s16, vcmpltq_m_s16, vcmpltq_m_n_s16, vcmpleq_m_s16, vcmpleq_m_n_s16, vcmpgtq_m_s16, vcmpgtq_m_n_s16, vcmpgeq_m_s16, vcmpgeq_m_n_s16, vcmpeqq_m_s16, vcmpeqq_m_n_s16, vshlq_m_r_s16, 
vrshlq_m_n_s16, vrev64q_m_s16, vqshlq_m_r_s16, vqrshlq_m_n_s16, vqnegq_m_s16, vqabsq_m_s16, vnegq_m_s16, vmvnq_m_s16, vmlsdavxq_p_s16, vmlsdavq_p_s16, vmladavxq_p_s16, vmladavq_p_s16, vminvq_p_s16, vmaxvq_p_s16, vdupq_m_n_s16, vclzq_m_s16, vclsq_m_s16, vaddvaq_p_s16, vabsq_m_s16, vqrdmlsdhxq_s16, vqrdmlsdhq_s16, vqrdmlashq_n_s16, vqrdmlahq_n_s16, vqrdmladhxq_s16, vqrdmladhq_s16, vqdmlsdhxq_s16, vqdmlsdhq_s16, vqdmlahq_n_s16, vqdmladhxq_s16, vqdmladhq_s16, vmlsdavaxq_s16, vmlsdavaq_s16, vmlasq_n_s16, vmlaq_n_s16, vmladavaxq_s16, vmladavaq_s16, vsriq_n_s16, vsliq_n_s16, vpselq_u32, vpselq_s32, vrev64q_m_u32, vqrdmlashq_n_u32, vqrdmlahq_n_u32, vqdmlahq_n_u32, vmvnq_m_u32, vmlasq_n_u32, vmlaq_n_u32, vmladavq_p_u32, vmladavaq_u32, vminvq_p_u32, vmaxvq_p_u32, vdupq_m_n_u32, vcmpneq_m_u32, vcmpneq_m_n_u32, vcmphiq_m_u32, vcmphiq_m_n_u32, vcmpeqq_m_u32, vcmpeqq_m_n_u32, vcmpcsq_m_u32, vcmpcsq_m_n_u32, vclzq_m_u32, vaddvaq_p_u32, vsriq_n_u32, vsliq_n_u32, vshlq_m_r_u32, vrshlq_m_n_u32, vqshlq_m_r_u32, vqrshlq_m_n_u32, vminavq_p_s32, vminaq_m_s32, vmaxavq_p_s32, vmaxaq_m_s32, vcmpneq_m_s32, vcmpneq_m_n_s32, vcmpltq_m_s32, vcmpltq_m_n_s32, vcmpleq_m_s32, vcmpleq_m_n_s32, vcmpgtq_m_s32, vcmpgtq_m_n_s32, vcmpgeq_m_s32, vcmpgeq_m_n_s32, vcmpeqq_m_s32, vcmpeqq_m_n_s32, vshlq_m_r_s32, vrshlq_m_n_s32, vrev64q_m_s32, vqshlq_m_r_s32, vqrshlq_m_n_s32, vqnegq_m_s32, vqabsq_m_s32, vnegq_m_s32, vmvnq_m_s32, vmlsdavxq_p_s32, vmlsdavq_p_s32, vmladavxq_p_s32, vmladavq_p_s32, vminvq_p_s32, vmaxvq_p_s32, vdupq_m_n_s32, vclzq_m_s32, vclsq_m_s32, vaddvaq_p_s32, vabsq_m_s32, vqrdmlsdhxq_s32, vqrdmlsdhq_s32, vqrdmlashq_n_s32, vqrdmlahq_n_s32, vqrdmladhxq_s32, vqrdmladhq_s32, vqdmlsdhxq_s32, vqdmlsdhq_s32, vqdmlahq_n_s32, vqdmladhxq_s32, vqdmladhq_s32, vmlsdavaxq_s32, vmlsdavaq_s32, vmlasq_n_s32, vmlaq_n_s32, vmladavaxq_s32, vmladavaq_s32, vsriq_n_s32, vsliq_n_s32, vpselq_u64, vpselq_s64. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. 
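To illustrate the shape of the ternary multiply-accumulate intrinsics listed above (vmlaq_n and relatives): the scalar operand multiplies every lane of the second vector before accumulation, which is what makes these three-operand forms. In Python (a semantic sketch, not the arm_mve.h C API):

```python
def vmlaq_n(acc, b, scalar):
    # Per lane: result[i] = acc[i] + b[i] * scalar.
    return [a + x * scalar for a, x in zip(acc, b)]

print(vmlaq_n([1, 2, 3], [10, 20, 30], 2))  # [21, 42, 63]
```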
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics In this patch new constraints "Rc" and "Re" are added, which check that the constant is within the range of 0 to 15 and 0 to 31 respectively. Also new predicates "mve_imm_15" and "mve_imm_31" are added, to check the matching constraints Rc and Re respectively. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-25 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vpselq_u8): Define macro. (vpselq_s8): Likewise. (vrev64q_m_u8): Likewise. (vqrdmlashq_n_u8): Likewise. (vqrdmlahq_n_u8): Likewise. (vqdmlahq_n_u8): Likewise. (vmvnq_m_u8): Likewise. (vmlasq_n_u8): Likewise. (vmlaq_n_u8): Likewise. (vmladavq_p_u8): Likewise. (vmladavaq_u8): Likewise. (vminvq_p_u8): Likewise. (vmaxvq_p_u8): Like
[PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
Hello, This patch supports following MVE ACLE intrinsics with ternary operands. vrmlaldavhaxq_s32, vrmlsldavhaq_s32, vrmlsldavhaxq_s32, vaddlvaq_p_s32, vcvtbq_m_f16_f32, vcvtbq_m_f32_f16, vcvttq_m_f16_f32, vcvttq_m_f32_f16, vrev16q_m_s8, vrev32q_m_f16, vrmlaldavhq_p_s32, vrmlaldavhxq_p_s32, vrmlsldavhq_p_s32, vrmlsldavhxq_p_s32, vaddlvaq_p_u32, vrev16q_m_u8, vrmlaldavhq_p_u32, vmvnq_m_n_s16, vorrq_m_n_s16, vqrshrntq_n_s16, vqshrnbq_n_s16, vqshrntq_n_s16, vrshrnbq_n_s16, vrshrntq_n_s16, vshrnbq_n_s16, vshrntq_n_s16, vcmlaq_f16, vcmlaq_rot180_f16, vcmlaq_rot270_f16, vcmlaq_rot90_f16, vfmaq_f16, vfmaq_n_f16, vfmasq_n_f16, vfmsq_f16, vmlaldavaq_s16, vmlaldavaxq_s16, vmlsldavaq_s16, vmlsldavaxq_s16, vabsq_m_f16, vcvtmq_m_s16_f16, vcvtnq_m_s16_f16, vcvtpq_m_s16_f16, vcvtq_m_s16_f16, vdupq_m_n_f16, vmaxnmaq_m_f16, vmaxnmavq_p_f16, vmaxnmvq_p_f16, vminnmaq_m_f16, vminnmavq_p_f16, vminnmvq_p_f16, vmlaldavq_p_s16, vmlaldavxq_p_s16, vmlsldavq_p_s16, vmlsldavxq_p_s16, vmovlbq_m_s8, vmovltq_m_s8, vmovnbq_m_s16, vmovntq_m_s16, vnegq_m_f16, vpselq_f16, vqmovnbq_m_s16, vqmovntq_m_s16, vrev32q_m_s8, vrev64q_m_f16, vrndaq_m_f16, vrndmq_m_f16, vrndnq_m_f16, vrndpq_m_f16, vrndq_m_f16, vrndxq_m_f16, vcmpeqq_m_n_f16, vcmpgeq_m_f16, vcmpgeq_m_n_f16, vcmpgtq_m_f16, vcmpgtq_m_n_f16, vcmpleq_m_f16, vcmpleq_m_n_f16, vcmpltq_m_f16, vcmpltq_m_n_f16, vcmpneq_m_f16, vcmpneq_m_n_f16, vmvnq_m_n_u16, vorrq_m_n_u16, vqrshruntq_n_s16, vqshrunbq_n_s16, vqshruntq_n_s16, vcvtmq_m_u16_f16, vcvtnq_m_u16_f16, vcvtpq_m_u16_f16, vcvtq_m_u16_f16, vqmovunbq_m_s16, vqmovuntq_m_s16, vqrshrntq_n_u16, vqshrnbq_n_u16, vqshrntq_n_u16, vrshrnbq_n_u16, vrshrntq_n_u16, vshrnbq_n_u16, vshrntq_n_u16, vmlaldavaq_u16, vmlaldavaxq_u16, vmlaldavq_p_u16, vmlaldavxq_p_u16, vmovlbq_m_u8, vmovltq_m_u8, vmovnbq_m_u16, vmovntq_m_u16, vqmovnbq_m_u16, vqmovntq_m_u16, vrev32q_m_u8, vmvnq_m_n_s32, vorrq_m_n_s32, vqrshrntq_n_s32, vqshrnbq_n_s32, vqshrntq_n_s32, vrshrnbq_n_s32, vrshrntq_n_s32, vshrnbq_n_s32, vshrntq_n_s32, vcmlaq_f32, 
vcmlaq_rot180_f32, vcmlaq_rot270_f32, vcmlaq_rot90_f32, vfmaq_f32, vfmaq_n_f32, vfmasq_n_f32, vfmsq_f32, vmlaldavaq_s32, vmlaldavaxq_s32, vmlsldavaq_s32, vmlsldavaxq_s32, vabsq_m_f32, vcvtmq_m_s32_f32, vcvtnq_m_s32_f32, vcvtpq_m_s32_f32, vcvtq_m_s32_f32, vdupq_m_n_f32, vmaxnmaq_m_f32, vmaxnmavq_p_f32, vmaxnmvq_p_f32, vminnmaq_m_f32, vminnmavq_p_f32, vminnmvq_p_f32, vmlaldavq_p_s32, vmlaldavxq_p_s32, vmlsldavq_p_s32, vmlsldavxq_p_s32, vmovlbq_m_s16, vmovltq_m_s16, vmovnbq_m_s32, vmovntq_m_s32, vnegq_m_f32, vpselq_f32, vqmovnbq_m_s32, vqmovntq_m_s32, vrev32q_m_s16, vrev64q_m_f32, vrndaq_m_f32, vrndmq_m_f32, vrndnq_m_f32, vrndpq_m_f32, vrndq_m_f32, vrndxq_m_f32, vcmpeqq_m_n_f32, vcmpgeq_m_f32, vcmpgeq_m_n_f32, vcmpgtq_m_f32, vcmpgtq_m_n_f32, vcmpleq_m_f32, vcmpleq_m_n_f32, vcmpltq_m_f32, vcmpltq_m_n_f32, vcmpneq_m_f32, vcmpneq_m_n_f32, vmvnq_m_n_u32, vorrq_m_n_u32, vqrshruntq_n_s32, vqshrunbq_n_s32, vqshruntq_n_s32, vcvtmq_m_u32_f32, vcvtnq_m_u32_f32, vcvtpq_m_u32_f32, vcvtq_m_u32_f32, vqmovunbq_m_s32, vqmovuntq_m_s32, vqrshrntq_n_u32, vqshrnbq_n_u32, vqshrntq_n_u32, vrshrnbq_n_u32, vrshrntq_n_u32, vshrnbq_n_u32, vshrntq_n_u32, vmlaldavaq_u32, vmlaldavaxq_u32, vmlaldavq_p_u32, vmlaldavxq_p_u32, vmovlbq_m_u16, vmovltq_m_u16, vmovnbq_m_u32, vmovntq_m_u32, vqmovnbq_m_u32, vqmovntq_m_u32, vrev32q_m_u16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-29 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vrmlaldavhaxq_s32): Define macro. (vrmlsldavhaq_s32): Likewise. (vrmlsldavhaxq_s32): Likewise. (vaddlvaq_p_s32): Likewise. (vcvtbq_m_f16_f32): Likewise. (vcvtbq_m_f32_f16): Likewise. (vcvttq_m_f16_f32): Likewise. (vcvttq_m_f32_f16): Likewise. (vrev16q_m_s8): Likewise. (vrev32q_m_f16): Likewise. 
(vrmlaldavhq_p_s32): Likewise. (vrmlaldavhxq_p_s32): Likewise. (vrmlsldavhq_p_s32): Likewise. (vrmlsldavhxq_p_s32): Likewise. (vaddlvaq_p_u32): Likewise. (vrev16q_m_u8): Likewise. (vrmlaldavhq_p_u32): Likewise. (vmvnq_m_n_s16): Likewise. (vorrq_m_n_s16): Likewise. (vqrshrntq_n_s16): Likewise. (vqshrnbq_n_s16): Likewise. (vqshrntq_n_s16): Likewise. (vrshrnbq_n_s16): Likewise. (vrshrntq_n_s16): Likewise. (vshrnbq_n_s16): Likewise. (vshrntq_n_s16): Likewise. (vcmlaq_f16): Likewise. (vcmlaq_rot180_f16): Likewise. (vcmlaq_rot270_f16): Likewise. (vcmlaq_rot90_f16): Likewise. (vfmaq_f16): Likewise. (vfmaq_n_f16): Likewise. (vfmasq_n_f16): Likewise. (vfmsq_f16): Likewise.
[PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
Hello, This patch supports following MVE ACLE intrinsics with binary operands. vsubq_u8, vsubq_n_u8, vrmulhq_u8, vrhaddq_u8, vqsubq_u8, vqsubq_n_u8, vqaddq_u8, vqaddq_n_u8, vorrq_u8, vornq_u8, vmulq_u8, vmulq_n_u8, vmulltq_int_u8, vmullbq_int_u8, vmulhq_u8, vmladavq_u8, vminvq_u8, vminq_u8, vmaxvq_u8, vmaxq_u8, vhsubq_u8, vhsubq_n_u8, vhaddq_u8, vhaddq_n_u8, veorq_u8, vcmpneq_n_u8, vcmphiq_u8, vcmphiq_n_u8, vcmpeqq_u8, vcmpeqq_n_u8, vcmpcsq_u8, vcmpcsq_n_u8, vcaddq_rot90_u8, vcaddq_rot270_u8, vbicq_u8, vandq_u8, vaddvq_p_u8, vaddvaq_u8, vaddq_n_u8, vabdq_u8, vshlq_r_u8, vrshlq_u8, vrshlq_n_u8, vqshlq_u8, vqshlq_r_u8, vqrshlq_u8, vqrshlq_n_u8, vminavq_s8, vminaq_s8, vmaxavq_s8, vmaxaq_s8, vbrsrq_n_u8, vshlq_n_u8, vrshrq_n_u8, vqshlq_n_u8, vcmpneq_n_s8, vcmpltq_s8, vcmpltq_n_s8, vcmpleq_s8, vcmpleq_n_s8, vcmpgtq_s8, vcmpgtq_n_s8, vcmpgeq_s8, vcmpgeq_n_s8, vcmpeqq_s8, vcmpeqq_n_s8, vqshluq_n_s8, vaddvq_p_s8, vsubq_s8, vsubq_n_s8, vshlq_r_s8, vrshlq_s8, vrshlq_n_s8, vrmulhq_s8, vrhaddq_s8, vqsubq_s8, vqsubq_n_s8, vqshlq_s8, vqshlq_r_s8, vqrshlq_s8, vqrshlq_n_s8, vqrdmulhq_s8, vqrdmulhq_n_s8, vqdmulhq_s8, vqdmulhq_n_s8, vqaddq_s8, vqaddq_n_s8, vorrq_s8, vornq_s8, vmulq_s8, vmulq_n_s8, vmulltq_int_s8, vmullbq_int_s8, vmulhq_s8, vmlsdavxq_s8, vmlsdavq_s8, vmladavxq_s8, vmladavq_s8, vminvq_s8, vminq_s8, vmaxvq_s8, vmaxq_s8, vhsubq_s8, vhsubq_n_s8, vhcaddq_rot90_s8, vhcaddq_rot270_s8, vhaddq_s8, vhaddq_n_s8, veorq_s8, vcaddq_rot90_s8, vcaddq_rot270_s8, vbrsrq_n_s8, vbicq_s8, vandq_s8, vaddvaq_s8, vaddq_n_s8, vabdq_s8, vshlq_n_s8, vrshrq_n_s8, vqshlq_n_s8, vsubq_u16, vsubq_n_u16, vrmulhq_u16, vrhaddq_u16, vqsubq_u16, vqsubq_n_u16, vqaddq_u16, vqaddq_n_u16, vorrq_u16, vornq_u16, vmulq_u16, vmulq_n_u16, vmulltq_int_u16, vmullbq_int_u16, vmulhq_u16, vmladavq_u16, vminvq_u16, vminq_u16, vmaxvq_u16, vmaxq_u16, vhsubq_u16, vhsubq_n_u16, vhaddq_u16, vhaddq_n_u16, veorq_u16, vcmpneq_n_u16, vcmphiq_u16, vcmphiq_n_u16, vcmpeqq_u16, vcmpeqq_n_u16, vcmpcsq_u16, vcmpcsq_n_u16, 
vcaddq_rot90_u16, vcaddq_rot270_u16, vbicq_u16, vandq_u16, vaddvq_p_u16, vaddvaq_u16, vaddq_n_u16, vabdq_u16, vshlq_r_u16, vrshlq_u16, vrshlq_n_u16, vqshlq_u16, vqshlq_r_u16, vqrshlq_u16, vqrshlq_n_u16, vminavq_s16, vminaq_s16, vmaxavq_s16, vmaxaq_s16, vbrsrq_n_u16, vshlq_n_u16, vrshrq_n_u16, vqshlq_n_u16, vcmpneq_n_s16, vcmpltq_s16, vcmpltq_n_s16, vcmpleq_s16, vcmpleq_n_s16, vcmpgtq_s16, vcmpgtq_n_s16, vcmpgeq_s16, vcmpgeq_n_s16, vcmpeqq_s16, vcmpeqq_n_s16, vqshluq_n_s16, vaddvq_p_s16, vsubq_s16, vsubq_n_s16, vshlq_r_s16, vrshlq_s16, vrshlq_n_s16, vrmulhq_s16, vrhaddq_s16, vqsubq_s16, vqsubq_n_s16, vqshlq_s16, vqshlq_r_s16, vqrshlq_s16, vqrshlq_n_s16, vqrdmulhq_s16, vqrdmulhq_n_s16, vqdmulhq_s16, vqdmulhq_n_s16, vqaddq_s16, vqaddq_n_s16, vorrq_s16, vornq_s16, vmulq_s16, vmulq_n_s16, vmulltq_int_s16, vmullbq_int_s16, vmulhq_s16, vmlsdavxq_s16, vmlsdavq_s16, vmladavxq_s16, vmladavq_s16, vminvq_s16, vminq_s16, vmaxvq_s16, vmaxq_s16, vhsubq_s16, vhsubq_n_s16, vhcaddq_rot90_s16, vhcaddq_rot270_s16, vhaddq_s16, vhaddq_n_s16, veorq_s16, vcaddq_rot90_s16, vcaddq_rot270_s16, vbrsrq_n_s16, vbicq_s16, vandq_s16, vaddvaq_s16, vaddq_n_s16, vabdq_s16, vshlq_n_s16, vrshrq_n_s16, vqshlq_n_s16, vsubq_u32, vsubq_n_u32, vrmulhq_u32, vrhaddq_u32, vqsubq_u32, vqsubq_n_u32, vqaddq_u32, vqaddq_n_u32, vorrq_u32, vornq_u32, vmulq_u32, vmulq_n_u32, vmulltq_int_u32, vmullbq_int_u32, vmulhq_u32, vmladavq_u32, vminvq_u32, vminq_u32, vmaxvq_u32, vmaxq_u32, vhsubq_u32, vhsubq_n_u32, vhaddq_u32, vhaddq_n_u32, veorq_u32, vcmpneq_n_u32, vcmphiq_u32, vcmphiq_n_u32, vcmpeqq_u32, vcmpeqq_n_u32, vcmpcsq_u32, vcmpcsq_n_u32, vcaddq_rot90_u32, vcaddq_rot270_u32, vbicq_u32, vandq_u32, vaddvq_p_u32, vaddvaq_u32, vaddq_n_u32, vabdq_u32, vshlq_r_u32, vrshlq_u32, vrshlq_n_u32, vqshlq_u32, vqshlq_r_u32, vqrshlq_u32, vqrshlq_n_u32, vminavq_s32, vminaq_s32, vmaxavq_s32, vmaxaq_s32, vbrsrq_n_u32, vshlq_n_u32, vrshrq_n_u32, vqshlq_n_u32, vcmpneq_n_s32, vcmpltq_s32, vcmpltq_n_s32, vcmpleq_s32, vcmpleq_n_s32, 
vcmpgtq_s32, vcmpgtq_n_s32, vcmpgeq_s32, vcmpgeq_n_s32, vcmpeqq_s32, vcmpeqq_n_s32, vqshluq_n_s32, vaddvq_p_s32, vsubq_s32, vsubq_n_s32, vshlq_r_s32, vrshlq_s32, vrshlq_n_s32, vrmulhq_s32, vrhaddq_s32, vqsubq_s32, vqsubq_n_s32, vqshlq_s32, vqshlq_r_s32, vqrshlq_s32, vqrshlq_n_s32, vqrdmulhq_s32, vqrdmulhq_n_s32, vqdmulhq_s32, vqdmulhq_n_s32, vqaddq_s32, vqaddq_n_s32, vorrq_s32, vornq_s32, vmulq_s32, vmulq_n_s32, vmulltq_int_s32, vmullbq_int_s32, vmulhq_s32, vmlsdavxq_s32, vmlsdavq_s32, vmladavxq_s32, vmladavq_s32, vminvq_s32, vminq_s32, vmaxvq_s32, vmaxq_s32, vhsubq_s32, vhsubq_n_s32, vhcaddq_rot90_s32, vhcaddq_rot270_s32, vhaddq_s32, vhaddq_n_s32, veorq_s32, vcaddq_rot90_s32, vcaddq_rot270_s32, vbrsrq_n_s32, vbicq_s32, vandq_s32, vaddvaq_s32, vaddq_n_s32, vabdq_s32, vshlq_n_s32, vrshrq_n_s32, vqshlq_n_s32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-set
[PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
Hello, This patch supports following MVE ACLE intrinsics with binary operands. vqmovntq_u16, vqmovnbq_u16, vmulltq_poly_p8, vmullbq_poly_p8, vmovntq_u16, vmovnbq_u16, vmlaldavxq_u16, vmlaldavq_u16, vqmovuntq_s16, vqmovunbq_s16, vshlltq_n_u8, vshllbq_n_u8, vorrq_n_u16, vbicq_n_u16, vcmpneq_n_f16, vcmpneq_f16, vcmpltq_n_f16, vcmpltq_f16, vcmpleq_n_f16, vcmpleq_f16, vcmpgtq_n_f16, vcmpgtq_f16, vcmpgeq_n_f16, vcmpgeq_f16, vcmpeqq_n_f16, vcmpeqq_f16, vsubq_f16, vqmovntq_s16, vqmovnbq_s16, vqdmulltq_s16, vqdmulltq_n_s16, vqdmullbq_s16, vqdmullbq_n_s16, vorrq_f16, vornq_f16, vmulq_n_f16, vmulq_f16, vmovntq_s16, vmovnbq_s16, vmlsldavxq_s16, vmlsldavq_s16, vmlaldavxq_s16, vmlaldavq_s16, vminnmvq_f16, vminnmq_f16, vminnmavq_f16, vminnmaq_f16, vmaxnmvq_f16, vmaxnmq_f16, vmaxnmavq_f16, vmaxnmaq_f16, veorq_f16, vcmulq_rot90_f16, vcmulq_rot270_f16, vcmulq_rot180_f16, vcmulq_f16, vcaddq_rot90_f16, vcaddq_rot270_f16, vbicq_f16, vandq_f16, vaddq_n_f16, vabdq_f16, vshlltq_n_s8, vshllbq_n_s8, vorrq_n_s16, vbicq_n_s16, vqmovntq_u32, vqmovnbq_u32, vmulltq_poly_p16, vmullbq_poly_p16, vmovntq_u32, vmovnbq_u32, vmlaldavxq_u32, vmlaldavq_u32, vqmovuntq_s32, vqmovunbq_s32, vshlltq_n_u16, vshllbq_n_u16, vorrq_n_u32, vbicq_n_u32, vcmpneq_n_f32, vcmpneq_f32, vcmpltq_n_f32, vcmpltq_f32, vcmpleq_n_f32, vcmpleq_f32, vcmpgtq_n_f32, vcmpgtq_f32, vcmpgeq_n_f32, vcmpgeq_f32, vcmpeqq_n_f32, vcmpeqq_f32, vsubq_f32, vqmovntq_s32, vqmovnbq_s32, vqdmulltq_s32, vqdmulltq_n_s32, vqdmullbq_s32, vqdmullbq_n_s32, vorrq_f32, vornq_f32, vmulq_n_f32, vmulq_f32, vmovntq_s32, vmovnbq_s32, vmlsldavxq_s32, vmlsldavq_s32, vmlaldavxq_s32, vmlaldavq_s32, vminnmvq_f32, vminnmq_f32, vminnmavq_f32, vminnmaq_f32, vmaxnmvq_f32, vmaxnmq_f32, vmaxnmavq_f32, vmaxnmaq_f32, veorq_f32, vcmulq_rot90_f32, vcmulq_rot270_f32, vcmulq_rot180_f32, vcmulq_f32, vcaddq_rot90_f32, vcaddq_rot270_f32, vbicq_f32, vandq_f32, vaddq_n_f32, vabdq_f32, vshlltq_n_s16, vshllbq_n_s16, vorrq_n_s32, vbicq_n_s32, vrmlaldavhq_u32, vctp8q_m, vctp64q_m, 
vctp32q_m, vctp16q_m, vaddlvaq_u32, vrmlsldavhxq_s32, vrmlsldavhq_s32, vrmlaldavhxq_s32, vrmlaldavhq_s32, vcvttq_f16_f32, vcvtbq_f16_f32, vaddlvaq_s32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics The above intrinsics are defined using the already defined builtin qualifiers BINOP_NONE_NONE_IMM, BINOP_NONE_NONE_NONE, BINOP_UNONE_NONE_NONE, BINOP_UNONE_UNONE_IMM, BINOP_UNONE_UNONE_NONE, BINOP_UNONE_UNONE_UNONE. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-23 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm_mve.h (vqmovntq_u16): Define macro. (vqmovnbq_u16): Likewise. (vmulltq_poly_p8): Likewise. (vmullbq_poly_p8): Likewise. (vmovntq_u16): Likewise. (vmovnbq_u16): Likewise. (vmlaldavxq_u16): Likewise. (vmlaldavq_u16): Likewise. (vqmovuntq_s16): Likewise. (vqmovunbq_s16): Likewise. (vshlltq_n_u8): Likewise. (vshllbq_n_u8): Likewise. (vorrq_n_u16): Likewise. (vbicq_n_u16): Likewise. (vcmpneq_n_f16): Likewise. (vcmpneq_f16): Likewise. (vcmpltq_n_f16): Likewise. (vcmpltq_f16): Likewise. (vcmpleq_n_f16): Likewise. (vcmpleq_f16): Likewise. (vcmpgtq_n_f16): Likewise. (vcmpgtq_f16): Likewise. (vcmpgeq_n_f16): Likewise. (vcmpgeq_f16): Likewise. (vcmpeqq_n_f16): Likewise. (vcmpeqq_f16): Likewise. (vsubq_f16): Likewise. (vqmovntq_s16): Likewise. (vqmovnbq_s16): Likewise. (vqdmulltq_s16): Likewise. (vqdmulltq_n_s16): Likewise. (vqdmullbq_s16): Likewise. (vqdmullbq_n_s16): Likewise. (vorrq_f16): Likewise. (vornq_f16): Likewise. (vmulq_n_f16): Likewise. (vmulq_f16): Likewise. (vmovntq_s16): Likewise. (vmovnbq_s16): Likewise. (vmlsldavxq_s16): Likewise. (vmlsldavq_s16): Likewise. (vmlaldavxq_s16): Likewise. (vmlaldavq_s16): Likewise. (vminnmvq_f16): Likewise. (vminnmq_f16): Likewise. (vminnmavq_f16): Likewise. (vminnmaq_f16): Likewise. (vmaxnmvq_f16): Likewise. 
(vmaxnmq_f16): Likewise. (vmaxnmavq_f16): Likewise. (vmaxnmaq_f16): Likewise. (veorq_f16): Likewise. (vcmulq_rot90_f16): Likewise. (vcmulq_rot270_f16): Likewise. (vcmulq_rot180_f16): Likewise. (vcmulq_f16): Likewise. (vcaddq_rot90_f16): Likewise. (vcaddq_rot270_f16): Likewise. (vbicq_f16): Likewise. (vandq_f16): Likewise. (vaddq_n_f16): Likewise. (vabdq_f16): Likewise. (vshlltq_n_s8): Likewise. (vshllbq_n_s8): Likewise. (vorrq_n_s16): Likewise.
[PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
Hello, This patch supports the following MVE ACLE intrinsics with binary operands. vaddlvq_p_s32, vaddlvq_p_u32, vcmpneq_s8, vcmpneq_s16, vcmpneq_s32, vcmpneq_u8, vcmpneq_u16, vcmpneq_u32, vshlq_s8, vshlq_s16, vshlq_s32, vshlq_u8, vshlq_u16, vshlq_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (BINOP_NONE_NONE_UNONE_QUALIFIERS): Define qualifier for binary operands. (BINOP_UNONE_NONE_NONE_QUALIFIERS): Likewise. (BINOP_UNONE_UNONE_NONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vaddlvq_p_s32): Define macro. (vaddlvq_p_u32): Likewise. (vcmpneq_s8): Likewise. (vcmpneq_s16): Likewise. (vcmpneq_s32): Likewise. (vcmpneq_u8): Likewise. (vcmpneq_u16): Likewise. (vcmpneq_u32): Likewise. (vshlq_s8): Likewise. (vshlq_s16): Likewise. (vshlq_s32): Likewise. (vshlq_u8): Likewise. (vshlq_u16): Likewise. (vshlq_u32): Likewise. (__arm_vaddlvq_p_s32): Define intrinsic. (__arm_vaddlvq_p_u32): Likewise. (__arm_vcmpneq_s8): Likewise. (__arm_vcmpneq_s16): Likewise. (__arm_vcmpneq_s32): Likewise. (__arm_vcmpneq_u8): Likewise. (__arm_vcmpneq_u16): Likewise. (__arm_vcmpneq_u32): Likewise. (__arm_vshlq_s8): Likewise. (__arm_vshlq_s16): Likewise. (__arm_vshlq_s32): Likewise. (__arm_vshlq_u8): Likewise. (__arm_vshlq_u16): Likewise. (__arm_vshlq_u32): Likewise. (vaddlvq_p): Define polymorphic variant. (vcmpneq): Likewise. (vshlq): Likewise. * config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_UNONE_QUALIFIERS): Use it. (BINOP_UNONE_NONE_NONE_QUALIFIERS): Likewise. (BINOP_UNONE_UNONE_NONE_QUALIFIERS): Likewise. * config/arm/mve.md (mve_vaddlvq_p_v4si): Define RTL pattern. (mve_vcmpneq_): Likewise. (mve_vshlq_): Likewise. 
gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: New test. * gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcmpneq_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlq_u8.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index ec56dbcdd2bb1de696142186955d75570772..73a0a3070bda2e35f3e400994dbb6ccfb3043f76 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -391,6 +391,24 @@ arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define BINOP_UNONE_NONE_IMM_QUALIFIERS \ (arm_binop_unone_none_imm_qualifiers) +static enum arm_type_qualifiers +arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned }; +#define BINOP_NONE_NONE_UNONE_QUALIFIERS \ + (arm_binop_none_none_unone_qualifiers) + +static enum arm_type_qualifiers +arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_none }; +#define BINOP_UNONE_NONE_NONE_QUALIFIERS \ + (arm_binop_unone_none_none_qualifiers) + +static enum arm_type_qualifiers +arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_none }; +#define 
BINOP_UNONE_UNONE_NONE_QUALIFIERS \ + (arm_binop_unone_unone_none_qualifiers) + /* End of Qualifier for MVE builtins. */ /* void ([T element type] *, T, immediate). */ diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 0318a2cd8c3cdea430a32506e75a5aceb4d3a475..0140b0cb38ccee5f9caa7570627bdc010e158e71 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -225,6 +225,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t; #define vshrq_n_u8(__a, __im
[PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
Hello, This patch supports the following MVE ACLE intrinsics with binary operands. vcvtq_n_s16_f16, vcvtq_n_s32_f32, vcvtq_n_u16_f16, vcvtq_n_u32_f32, vcreateq_u8, vcreateq_u16, vcreateq_u32, vcreateq_u64, vcreateq_s8, vcreateq_s16, vcreateq_s32, vcreateq_s64, vshrq_n_s8, vshrq_n_s16, vshrq_n_s32, vshrq_n_u8, vshrq_n_u16, vshrq_n_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics In this patch new constraints "Rb" and "Rf" are added, which check that the constant is within the range of 1 to 8 and 1 to 32 respectively. Also new predicates "mve_imm_8" and "mve_imm_32" are added, to check for the matching constraints Rb and Rf respectively. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (BINOP_UNONE_UNONE_IMM_QUALIFIERS): Define qualifier for binary operands. (BINOP_UNONE_UNONE_UNONE_QUALIFIERS): Likewise. (BINOP_UNONE_NONE_IMM_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vcvtq_n_s16_f16): Define macro. (vcvtq_n_s32_f32): Likewise. (vcvtq_n_u16_f16): Likewise. (vcvtq_n_u32_f32): Likewise. (vcreateq_u8): Likewise. (vcreateq_u16): Likewise. (vcreateq_u32): Likewise. (vcreateq_u64): Likewise. (vcreateq_s8): Likewise. (vcreateq_s16): Likewise. (vcreateq_s32): Likewise. (vcreateq_s64): Likewise. (vshrq_n_s8): Likewise. (vshrq_n_s16): Likewise. (vshrq_n_s32): Likewise. (vshrq_n_u8): Likewise. (vshrq_n_u16): Likewise. (vshrq_n_u32): Likewise. (__arm_vcreateq_u8): Define intrinsic. (__arm_vcreateq_u16): Likewise. (__arm_vcreateq_u32): Likewise. (__arm_vcreateq_u64): Likewise. (__arm_vcreateq_s8): Likewise. (__arm_vcreateq_s16): Likewise. (__arm_vcreateq_s32): Likewise. (__arm_vcreateq_s64): Likewise. (__arm_vshrq_n_s8): Likewise. (__arm_vshrq_n_s16): Likewise. (__arm_vshrq_n_s32): Likewise. 
(__arm_vshrq_n_u8): Likewise. (__arm_vshrq_n_u16): Likewise. (__arm_vshrq_n_u32): Likewise. (__arm_vcvtq_n_s16_f16): Likewise. (__arm_vcvtq_n_s32_f32): Likewise. (__arm_vcvtq_n_u16_f16): Likewise. (__arm_vcvtq_n_u32_f32): Likewise. (vshrq_n): Define polymorphic variant. * config/arm/arm_mve_builtins.def (BINOP_UNONE_UNONE_IMM_QUALIFIERS): Use it. (BINOP_UNONE_UNONE_UNONE_QUALIFIERS): Likewise. (BINOP_UNONE_NONE_IMM_QUALIFIERS): Likewise. * config/arm/constraints.md (Rb): Define constraint to check constant is in the range of 1 to 8. (Rf): Define constraint to check constant is in the range of 1 to 32. * config/arm/mve.md (mve_vcreateq_): Define RTL pattern. (mve_vshrq_n_): Likewise. (mve_vcvtq_n_from_f_): Likewise. * config/arm/predicates.md (mve_imm_8): Define predicate to check the matching constraint Rb. (mve_imm_32): Define predicate to check the matching constraint Rf. gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vcreateq_s16.c: New test. * gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshrq_n_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vshrq_n_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshrq_n_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vshrq_n_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vshrq_n_u32.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vshrq_n_u8.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index c2dad057d1365914477c64d559aa1fd1c32bbf19..ec56dbcdd2bb1de696142186955d75570772 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -373,6
[PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
Hello, This patch supports the following MVE ACLE intrinsics with binary operands. vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16, vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, vcreateq_f32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics In this patch a new constraint "Rd" is added, which checks that the constant is within the range of 1 to 16. Also a new predicate "mve_imm_16" is added, to check for the matching constraint Rd. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): Define qualifier for binary operands. (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vsubq_n_f16): Define macro. (vsubq_n_f32): Likewise. (vbrsrq_n_f16): Likewise. (vbrsrq_n_f32): Likewise. (vcvtq_n_f16_s16): Likewise. (vcvtq_n_f32_s32): Likewise. (vcvtq_n_f16_u16): Likewise. (vcvtq_n_f32_u32): Likewise. (vcreateq_f16): Likewise. (vcreateq_f32): Likewise. (__arm_vsubq_n_f16): Define intrinsic. (__arm_vsubq_n_f32): Likewise. (__arm_vbrsrq_n_f16): Likewise. (__arm_vbrsrq_n_f32): Likewise. (__arm_vcvtq_n_f16_s16): Likewise. (__arm_vcvtq_n_f32_s32): Likewise. (__arm_vcvtq_n_f16_u16): Likewise. (__arm_vcvtq_n_f32_u32): Likewise. (__arm_vcreateq_f16): Likewise. (__arm_vcreateq_f32): Likewise. (vsubq): Define polymorphic variant. (vbrsrq): Likewise. (vcvtq_n): Likewise. * config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_NONE_QUALIFIERS): Use it. (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise. 
* config/arm/constraints.md (Rd): Define constraint to check that the constant is in the range of 1 to 16. * config/arm/mve.md (mve_vsubq_n_f): Define RTL pattern. (mve_vbrsrq_n_f): Likewise. (mve_vcvtq_n_to_f_): Likewise. (mve_vcreateq_f): Likewise. * config/arm/predicates.md (mve_imm_16): Define predicate to check the matching constraint Rd. gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test. * gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define UNOP_UNONE_IMM_QUALIFIERS \ (arm_unop_unone_imm_qualifiers) +static enum arm_type_qualifiers +arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none }; +#define BINOP_NONE_NONE_NONE_QUALIFIERS \ + (arm_binop_none_none_none_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_immediate }; +#define BINOP_NONE_NONE_IMM_QUALIFIERS \ + (arm_binop_none_none_imm_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned, qualifier_immediate }; +#define BINOP_NONE_UNONE_IMM_QUALIFIERS \ + (arm_binop_none_unone_imm_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned, qualifier_unsigned }; +#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \ + (arm_binop_none_unone_unone_qualifiers) + /* End of Qualifier for MVE builtins. */ /* void ([T element type] *, T, immediate). */ diff --git a/gcc/config/arm/arm_mve.h b/
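For reference, the constraint-plus-predicate pairing this patch describes (constraint "Rd" for the range 1 to 16, predicate "mve_imm_16") can be sketched in GCC machine-description syntax roughly as follows. The names match the ChangeLog above, but the bodies are illustrative rather than copied from the patch:

```
;; config/arm/constraints.md (sketch): match a CONST_INT in the range 1 to 16.
(define_constraint "Rd"
  "@internal A constant in the range of 1 to 16 for MVE immediates."
  (and (match_code "const_int")
       (match_test "ival >= 1 && ival <= 16")))

;; config/arm/predicates.md (sketch): accept any operand satisfying "Rd".
;; satisfies_constraint_Rd is generated automatically by genpreds for
;; every named constraint.
(define_predicate "mve_imm_16"
  (match_test "satisfies_constraint_Rd (op)"))
```

The same shape applies to the other range constraints in this series (Rb, Rc, Re, Rf), with only the bounds changing.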
[PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
Hello, This patch supports the MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32, vcvtq_f16_u16, vcvtq_f32_u32, vrndxq_f16, vrndxq_f32, vrndq_f16, vrndq_f32, vrndpq_f16, vrndpq_f32, vrndnq_f16, vrndnq_f32, vrndmq_f16, vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, vrev64q_f32, vnegq_f16, vnegq_f32, vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16, vcvttq_f32_f16, vcvtbq_f32_f16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-17 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): Define macro. (UNOP_NONE_SNONE_QUALIFIERS): Likewise. (UNOP_NONE_UNONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vrndxq_f16): Define macro. (vrndxq_f32): Likewise. (vrndq_f16): Likewise. (vrndq_f32): Likewise. (vrndpq_f16): Likewise. (vrndpq_f32): Likewise. (vrndnq_f16): Likewise. (vrndnq_f32): Likewise. (vrndmq_f16): Likewise. (vrndmq_f32): Likewise. (vrndaq_f16): Likewise. (vrndaq_f32): Likewise. (vrev64q_f16): Likewise. (vrev64q_f32): Likewise. (vnegq_f16): Likewise. (vnegq_f32): Likewise. (vdupq_n_f16): Likewise. (vdupq_n_f32): Likewise. (vabsq_f16): Likewise. (vabsq_f32): Likewise. (vrev32q_f16): Likewise. (vcvttq_f32_f16): Likewise. (vcvtbq_f32_f16): Likewise. (vcvtq_f16_s16): Likewise. (vcvtq_f32_s32): Likewise. (vcvtq_f16_u16): Likewise. (vcvtq_f32_u32): Likewise. (__arm_vrndxq_f16): Define intrinsic. (__arm_vrndxq_f32): Likewise. (__arm_vrndq_f16): Likewise. (__arm_vrndq_f32): Likewise. (__arm_vrndpq_f16): Likewise. (__arm_vrndpq_f32): Likewise. (__arm_vrndnq_f16): Likewise. (__arm_vrndnq_f32): Likewise. (__arm_vrndmq_f16): Likewise. (__arm_vrndmq_f32): Likewise. (__arm_vrndaq_f16): Likewise. (__arm_vrndaq_f32): Likewise. (__arm_vrev64q_f16): Likewise. 
(__arm_vrev64q_f32): Likewise. (__arm_vnegq_f16): Likewise. (__arm_vnegq_f32): Likewise. (__arm_vdupq_n_f16): Likewise. (__arm_vdupq_n_f32): Likewise. (__arm_vabsq_f16): Likewise. (__arm_vabsq_f32): Likewise. (__arm_vrev32q_f16): Likewise. (__arm_vcvttq_f32_f16): Likewise. (__arm_vcvtbq_f32_f16): Likewise. (__arm_vcvtq_f16_s16): Likewise. (__arm_vcvtq_f32_s32): Likewise. (__arm_vcvtq_f16_u16): Likewise. (__arm_vcvtq_f32_u32): Likewise. (vrndxq): Define polymorphic variants. (vrndq): Likewise. (vrndpq): Likewise. (vrndnq): Likewise. (vrndmq): Likewise. (vrndaq): Likewise. (vrev64q): Likewise. (vnegq): Likewise. (vabsq): Likewise. (vrev32q): Likewise. (vcvtbq_f32): Likewise. (vcvttq_f32): Likewise. (vcvtq): Likewise. * config/arm/arm_mve_builtins.def (VAR2): Define. (VAR1): Define. * config/arm/mve.md (mve_vrndxq_f): Add RTL pattern. (mve_vrndq_f): Likewise. (mve_vrndpq_f): Likewise. (mve_vrndnq_f): Likewise. (mve_vrndmq_f): Likewise. (mve_vrndaq_f): Likewise. (mve_vrev64q_f): Likewise. (mve_vnegq_f): Likewise. (mve_vdupq_n_f): Likewise. (mve_vabsq_f): Likewise. (mve_vrev32q_fv8hf): Likewise. (mve_vcvttq_f32_f16v4sf): Likewise. (mve_vcvtbq_f32_f16v4sf): Likewise. (mve_vcvtq_to_f_): Likewise. gcc/testsuite/ChangeLog: 2019-10-17 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test. * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vr
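The VAR2/VAR1 entries added to arm_mve_builtins.def follow GCC's usual builtin-table convention: VARn (QUALIFIERS, name, mode1, ..., moden) instantiates one builtin per listed mode, with the qualifier set describing the return and argument types. A rough sketch of what such entries look like (the exact lines are illustrative assumptions, not copied from the patch):

```
;; VAR2: one entry covers both the f16 (V8HF) and f32 (V4SF) variants,
;; distinguished by the mode suffix appended to the pattern name
;; (mve_vrndxq_fv8hf, mve_vrndxq_fv4sf).
VAR2 (UNOP_NONE_NONE, vrndxq_f, v8hf, v4sf)

;; VAR1: a single-mode entry, e.g. for an intrinsic that exists only
;; for f16 vectors.
VAR1 (UNOP_NONE_NONE, vrev32q_f, v8hf)
```

With the CF macro defined as CODE_FOR_mve_##N##X, each expanded entry binds one builtin to the corresponding mve.md insn pattern.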
[PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
Hello, This patch supports the MVE ACLE intrinsics vst4q_s8, vst4q_s16, vst4q_s32, vst4q_u8, vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32. In this patch the file arm_mve_builtins.def is added to the source tree; in it, the builtins for the MVE ACLE intrinsics are defined using builtin qualifiers. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (CF): Define mve_builtin_data. (VAR1): Define. (ARM_BUILTIN_MVE_PATTERN_START): Define. (arm_init_mve_builtins): Define function. (arm_init_builtins): Add TARGET_HAVE_MVE check. (arm_expand_builtin_1): Check the range of fcode. (arm_expand_mve_builtin): Define function to expand MVE builtins. (arm_expand_builtin): Check the range of fcode. * config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE floating point types. (__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user namespace. (vst4q_s8): Define macro. (vst4q_s16): Likewise. (vst4q_s32): Likewise. (vst4q_u8): Likewise. (vst4q_u16): Likewise. (vst4q_u32): Likewise. (vst4q_f16): Likewise. (vst4q_f32): Likewise. (__arm_vst4q_s8): Define inline builtin. (__arm_vst4q_s16): Likewise. (__arm_vst4q_s32): Likewise. (__arm_vst4q_u8): Likewise. (__arm_vst4q_u16): Likewise. (__arm_vst4q_u32): Likewise. (__arm_vst4q_f16): Likewise. (__arm_vst4q_f32): Likewise. (__ARM_mve_typeid): Define macro with MVE types. (__ARM_mve_coerce): Define macro with _Generic feature. (vst4q): Define polymorphic variant for different vst4q builtins. * config/arm/arm_mve_builtins.def: New file. * config/arm/mve.md (MVE_VLD_ST): Define iterator. (unspec): Define unspec. (mve_vst4q): Define RTL pattern. * config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def. (arm-builtins.o): Likewise. 
gcc/testsuite/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test. * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] = }; #undef CF +#define CF(N,X) CODE_FOR_mve_##N##X +static arm_builtin_datum mve_builtin_data[] = +{ +#include "arm_mve_builtins.def" +}; + +#undef CF #undef VAR1 #define VAR1(T, N, A) \ {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS}, @@ -705,6 +712,13 @@ enum arm_builtins #include "arm_acle_builtins.def" + ARM_BUILTIN_MVE_BASE, + +#undef VAR1 +#define VAR1(T, N, X) \ + ARM_BUILTIN_MVE_##N##X, +#include "arm_mve_builtins.def" + ARM_BUILTIN_MAX }; @@ -714,6 +728,9 @@ enum arm_builtins #define ARM_BUILTIN_NEON_PATTERN_START \ (ARM_BUILTIN_NEON_BASE + 1) +#define ARM_BUILTIN_MVE_PATTERN_START \ + (ARM_BUILTIN_MVE_BASE + 1) + #define ARM_BUILTIN_ACLE_PATTERN_START \ (ARM_BUILTIN_ACLE_BASE + 1) @@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void) } } +/* Set up all the MVE builtins mentioned in arm_mve_builtins.def file. 
*/ +static void +arm_init_mve_builtins (void) +{ + volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START; + + arm_init_simd_builtin_scalar_types (); + arm_init_simd_builtin_types (); + + for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++) +{ + arm_builtin_datum *d = &mve_builtin_data[i]; + arm_init_builtin (fcode, d, "__builtin_mve"); +} +} + /* Set up all the NEON builtins, even builtins for instructions that are not in the current target ISA to allow the user to compile particular modules with different target specific options that differ from the command line @@ -1961,8 +1994,10 @@ arm_init_builtins (void) = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr, ARM_BUILT
[PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
Hello, This patch creates the required framework for MVE ACLE intrinsics. The following changes are done in this patch to support MVE ACLE intrinsics. Header file arm_mve.h is added to the source code, which contains the definitions of MVE ACLE intrinsics and different data types used in MVE. Machine description file mve.md is also added, which contains the RTL patterns defined for MVE. A new register "p0" is added, which is used by MVE predicated patterns. A new register class "VPR_REG" is added and its contents are defined in REG_CLASS_CONTENTS. The vec-common.md file is modified to support the standard move patterns. The prefix of Neon functions which are also used by MVE is changed from "neon_" to "simd_", e.g. neon_immediate_valid_for_move is changed to simd_immediate_valid_for_move. In the patch the standard patterns mve_move, mve_store and mve_load for MVE are added, and the neon.md and vfp.md files are modified to support these common patterns. Please refer to the Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath gcc/ChangeLog: 2019-11-11 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config.gcc (arm_mve.h): Add header file. * config/arm/aout.h (p0): Add new register name. * config/arm/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define. (ARM_BUILTIN_NEON_LANE_CHECK): Remove. (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check. (arm_init_neon_builtins): Move a check to arm_init_builtins function. (arm_init_builtins): Move a check from arm_init_neon_builtins function. (mve_dereference_pointer): Add new function. (arm_expand_builtin_args): Add TARGET_HAVE_MVE check. (arm_expand_neon_builtin): Move a check to arm_expand_builtin function. (arm_expand_builtin): Move a check from arm_expand_neon_builtin function. 
* config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE. * config/arm/arm-modes.def (INT_MODE): Add three new integer modes. * config/arm/arm-protos.h (neon_immediate_valid_for_move): Rename function. (simd_immediate_valid_for_move): Rename neon_immediate_valid_for_move function. * config/arm/arm.c (arm_options_perform_arch_sanity_checks):Enable mve isa bit. (use_return_insn): Add TARGET_HAVE_MVE check. (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check. (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check. (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check. (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check. (neon_valid_immediate): Rename to simd_valid_immediate. (simd_valid_immediate): Rename from neon_valid_immediate. (neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move. (neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function. (neon_make_constant): Modify call to neon_valid_immediate function. (neon_vector_mem_operand): Add TARGET_HAVE_MVE check. (output_move_neon): Add TARGET_HAVE_MVE check. (arm_compute_frame_layout): Add TARGET_HAVE_MVE check. (arm_save_coproc_regs): Add TARGET_HAVE_MVE check. (arm_print_operand): Add case 'E' to print memory operands. (arm_print_operand_address): Add TARGET_HAVE_MVE check. (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check. (arm_modes_tieable_p): Add TARGET_HAVE_MVE check. (arm_regno_class): Add VPR_REGNUM check. (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check. (arm_expand_epilogue): Add TARGET_HAVE_MVE check. (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for MVE vector modes. (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check. (arm_conditional_register_usage): For TARGET_HAVE_MVE enable VPR register. * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR register. (FIRST_PSEUDO_REGISTER): Modify. (VALID_MVE_MODE): Define. (VALID_MVE_SI_MODE): Define. 
(VALID_MVE_SF_MODE): Define. (VALID_MVE_STRUCT_MODE): Define. (REG_ALLOC_ORDER): Add VPR_REGNUM entry. (enum reg_class): Add VPR_REG entry. (REG_CLASS_NAMES): Add VPR_REG entry. * config/arm/arm.md (VPR_REGNUM): Define. (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE. (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check to allow writeback. (include "mve.md"): Include mve.md file. * config/arm/arm_mve.h: New file. * config/arm/constraints.md (Up): Define. * config/arm/iterators.md (VNIM1): Define. (VNINOTM1):
[PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
Hello, This patch supports the following MVE ACLE intrinsics with a unary operand. vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, vcvtq_s16_f16, vcvtq_s32_f32, vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, vcvtq_u16_f16, vcvtq_u32_f32, vrev64q. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define. (UNOP_SNONE_NONE_QUALIFIERS): Likewise. (UNOP_SNONE_IMM_QUALIFIERS): Likewise. (UNOP_UNONE_NONE_QUALIFIERS): Likewise. (UNOP_UNONE_UNONE_QUALIFIERS): Likewise. (UNOP_UNONE_IMM_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vmvnq_n_s16): Define macro. (vmvnq_n_s32): Likewise. (vrev64q_s8): Likewise. (vrev64q_s16): Likewise. (vrev64q_s32): Likewise. (vcvtq_s16_f16): Likewise. (vcvtq_s32_f32): Likewise. (vrev64q_u8): Likewise. (vrev64q_u16): Likewise. (vrev64q_u32): Likewise. (vmvnq_n_u16): Likewise. (vmvnq_n_u32): Likewise. (vcvtq_u16_f16): Likewise. (vcvtq_u32_f32): Likewise. (__arm_vmvnq_n_s16): Define intrinsic. (__arm_vmvnq_n_s32): Likewise. (__arm_vrev64q_s8): Likewise. (__arm_vrev64q_s16): Likewise. (__arm_vrev64q_s32): Likewise. (__arm_vrev64q_u8): Likewise. (__arm_vrev64q_u16): Likewise. (__arm_vrev64q_u32): Likewise. (__arm_vmvnq_n_u16): Likewise. (__arm_vmvnq_n_u32): Likewise. (__arm_vcvtq_s16_f16): Likewise. (__arm_vcvtq_s32_f32): Likewise. (__arm_vcvtq_u16_f16): Likewise. (__arm_vcvtq_u32_f32): Likewise. (vrev64q): Define polymorphic variant. * config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it. (UNOP_SNONE_NONE): Likewise. (UNOP_SNONE_IMM): Likewise. (UNOP_UNONE_UNONE): Likewise. (UNOP_UNONE_NONE): Likewise. (UNOP_UNONE_IMM): Likewise. 
* config/arm/mve.md (mve_vrev64q_): Define RTL pattern. (mve_vcvtq_from_f_): Likewise. (mve_vmvnq_n_): Likewise. gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test. * gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define UNOP_NONE_UNONE_QUALIFIERS \ (arm_unop_none_unone_qualifiers) +static enum arm_type_qualifiers +arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none }; +#define UNOP_SNONE_SNONE_QUALIFIERS \ + (arm_unop_snone_snone_qualifiers) + +static enum arm_type_qualifiers +arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none }; +#define UNOP_SNONE_NONE_QUALIFIERS \ + (arm_unop_snone_none_qualifiers) + +static enum arm_type_qualifiers +arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_immediate }; +#define UNOP_SNONE_IMM_QUALIFIERS \ + (arm_unop_snone_imm_qualifiers) + +static enum arm_type_qualifiers +arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none }; +#define UNOP_UNONE_NONE_QUALIFIERS \ + (arm_unop_unone_none_qualifiers) + +static enum arm_type_qualifiers +arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned }; +#define UNOP_UNONE_UNONE_QUALIFIERS \ + (arm_u
[PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
Hello, This patch supports the following MVE ACLE intrinsics with a unary operand. vctp16q, vctp32q, vctp64q, vctp8q, vpnot. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics There are a few conflicts in defining the machine registers, resolved by re-ordering VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (hi_UP): Define mode. * config/arm/arm.h (IS_VPR_REGNUM): Move. * config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM. (APSRQ_REGNUM): Modify. (APSRGE_REGNUM): Modify. * config/arm/arm_mve.h (vctp16q): Define macro. (vctp32q): Likewise. (vctp64q): Likewise. (vctp8q): Likewise. (vpnot): Likewise. (__arm_vctp16q): Define intrinsic. (__arm_vctp32q): Likewise. (__arm_vctp64q): Likewise. (__arm_vctp8q): Likewise. (__arm_vpnot): Likewise. * config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin qualifier. * config/arm/mve.md (mve_vctpqhi): Define RTL pattern. (mve_vpnothi): Likewise. gcc/testsuite/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vctp16q.c: New test. * gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise. * gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise. * gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise. * gcc.target/arm/mve/intrinsics/vpnot.c: Likewise. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define oi_UP E_OImode #define hf_UP E_HFmode #define si_UP E_SImode +#define hi_UPE_HImode #define void_UP E_VOIDmode #define UP(X) X##_UP diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -955,6 +955,9 @@ extern int arm_arch_cmse; #define IS_IWMMXT_GR_REGNUM(REGNUM) \ (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= LAST_IWMMXT_GR_REGNUM)) +#define IS_VPR_REGNUM(REGNUM) \ + ((REGNUM) == VPR_REGNUM) + /* Base register for access to local variables of the function. */ #define FRAME_POINTER_REGNUM 102 @@ -999,7 +1002,7 @@ extern int arm_arch_cmse; && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1)) /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP - + 1 APSRQ + 1 APSRGE + 1 VPR. */ + +1 VPR + 1 APSRQ + 1 APSRGE. */ /* Intel Wireless MMX Technology registers add 16 + 4 more. */ /* VFP (VFP3) adds 32 (64) + 1 VFPCC. */ #define FIRST_PSEUDO_REGISTER 107 @@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[]; /* Registers not for general use. */\ CC_REGNUM, VFPCC_REGNUM, \ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\ - SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM, \ - VPR_REGNUM \ + SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\ + APSRGE_REGNUM\ } -#define IS_VPR_REGNUM(REGNUM) \ - ((REGNUM) == VPR_REGNUM) - /* Use different register alloc ordering for Thumb. 
*/ #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc () diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -39,9 +39,9 @@ (LAST_ARM_REGNUM 15) ; (CC_REGNUM 100) ; Condition code pseudo register (VFPCC_REGNUM101) ; VFP Condition code pseudo register - (APSRQ_REGNUM104) ; Q bit pseudo register - (APSRGE_REGNUM 105) ; GE bits pseudo register - (VPR_REGNUM 106) ; Vector Predication Register - MVE register. + (VPR_REGNUM 104) ; Vector Predication Register - MVE register. + (APSRQ_REGNUM105) ; Q bit pseudo register + (APSRGE_REGNUM 106) ; GE bits pseudo register ] ) ;; 3rd operand to select_dominance_cc_mode diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa145810b506938 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/co
[PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
Hello, This patch is part of the MVE ACLE intrinsics framework. This patch adds support to update (read/write) the APSR (Application Program Status Register) register and the FPSCR (Floating-point Status and Control Register) register for MVE. This patch also enables the thumb2 mov RTL patterns for MVE. Please refer to the Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath gcc/ChangeLog: 2019-11-11 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check to not allow TARGET_HAVE_MVE for this pattern. (thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to update APSR register. * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define. (VUNSPEC_GET_FPSCR): Remove. * config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check. (thumb2_movhi_fp16): Add TARGET_HAVE_MVE check. (thumb2_movsi_vfp): Add TARGET_HAVE_MVE check. (movdi_vfp): Add TARGET_HAVE_MVE check. (thumb2_movdf_vfp): Add TARGET_HAVE_MVE check. (thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check. (thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check. (push_multi_vfp): Add TARGET_HAVE_MVE check. (set_fpscr): Add TARGET_HAVE_MVE check. (get_fpscr): Add TARGET_HAVE_MVE check. 
### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -435,10 +435,10 @@ (define_insn "*cmovsi_insn" [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r") (if_then_else:SI -(match_operator 1 "arm_comparison_operator" - [(match_operand 2 "cc_register" "") (const_int 0)]) -(match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1") -(match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))] + (match_operator 1 "arm_comparison_operator" +[(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1") + (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))] "TARGET_THUMB2 && TARGET_COND_ARITH && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx) || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))" @@ -540,7 +540,7 @@ [(match_operand 4 "cc_register" "") (const_int 0)]) (match_operand:SF 1 "s_register_operand" "0,r") (match_operand:SF 2 "s_register_operand" "r,0")))] - "TARGET_THUMB2 && TARGET_SOFT_FLOAT" + "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE" "@ it\\t%D3\;mov%D3\\t%0, %2 it\\t%d3\;mov%d3\\t%0, %1" @@ -1226,7 +1226,7 @@ ; added to clear the APSR and potentially the FPSCR if VFP is available, so ; we adapt the length accordingly. 
(set (attr "length") - (if_then_else (match_test "TARGET_HARD_FLOAT") + (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE") (const_int 34) (const_int 8))) ; We do not support predicate execution of returns from cmse_nonsecure_entry diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -170,6 +170,7 @@ UNSPEC_TORC ; Used by the intrinsic form of the iWMMXt TORC instruction. UNSPEC_TORVSC; Used by the intrinsic form of the iWMMXt TORVSC instruction. UNSPEC_TEXTRC; Used by the intrinsic form of the iWMMXt TEXTRC instruction. + UNSPEC_GET_FPSCR ; Represent fetch of FPSCR content. ]) @@ -216,7 +217,6 @@ VUNSPEC_SLX ; Represent a store-register-release-exclusive. VUNSPEC_LDA ; Represent a store-register-acquire. VUNSPEC_STL ; Represent a store-register-release. - VUNSPEC_GET_FPSCR; Represent fetch of FPSCR content. VUNSPEC_SET_FPSCR; Represent assign of FPSCR content. VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing. VUNSPEC_CDP ; Represent the coprocessor cdp instruction. diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 6349c0570540ec25a599166f5d427fcbdbf2af68..461a5d71ca8548cfc61c83f9716249425633ad21 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -74,10 +74,10 @@ (define_insn "*thumb2_movhi_vfp" [(set (match_operand:HI 0 "nonimmediate_operand" -"=rk, r, l, r, m, r, *t, r, *t") +"=rk, r, l, r, m, r, *
[PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
Hello, This patch supports the following MVE ACLE intrinsics with a unary operand. vdupq_n_s8, vdupq_n_s16, vdupq_n_s32, vabsq_s8, vabsq_s16, vabsq_s32, vclsq_s8, vclsq_s16, vclsq_s32, vclzq_s8, vclzq_s16, vclzq_s32, vnegq_s8, vnegq_s16, vnegq_s32, vaddlvq_s32, vaddvq_s8, vaddvq_s16, vaddvq_s32, vmovlbq_s8, vmovlbq_s16, vmovltq_s8, vmovltq_s16, vmvnq_s8, vmvnq_s16, vmvnq_s32, vrev16q_s8, vrev32q_s8, vrev32q_s16, vqabsq_s8, vqabsq_s16, vqabsq_s32, vqnegq_s8, vqnegq_s16, vqnegq_s32, vcvtaq_s16_f16, vcvtaq_s32_f32, vcvtnq_s16_f16, vcvtnq_s32_f32, vcvtpq_s16_f16, vcvtpq_s32_f32, vcvtmq_s16_f16, vcvtmq_s32_f32, vmvnq_u8, vmvnq_u16, vmvnq_u32, vdupq_n_u8, vdupq_n_u16, vdupq_n_u32, vclzq_u8, vclzq_u16, vclzq_u32, vaddvq_u8, vaddvq_u16, vaddvq_u32, vrev32q_u8, vrev32q_u16, vmovltq_u8, vmovltq_u16, vmovlbq_u8, vmovlbq_u16, vrev16q_u8, vaddlvq_u32, vcvtpq_u16_f16, vcvtpq_u32_f32, vcvtnq_u16_f16, vcvtmq_u16_f16, vcvtmq_u32_f32, vcvtaq_u16_f16, vcvtaq_u32_f32, vdupq_n, vabsq, vclsq, vclzq, vnegq, vaddlvq, vaddvq, vmovlbq, vmovltq, vmvnq, vrev16q, vrev32q, vqabsq, vqnegq. A new register class "EVEN_REGS" which allows only even registers is added in this patch. The new constraint "e" allows only registers of the EVEN_REGS class. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm.h (enum reg_class): Define new class EVEN_REGS. * config/arm/arm_mve.h (vdupq_n_s8): Define macro. (vdupq_n_s16): Likewise. (vdupq_n_s32): Likewise. (vabsq_s8): Likewise. (vabsq_s16): Likewise. (vabsq_s32): Likewise. (vclsq_s8): Likewise. (vclsq_s16): Likewise. (vclsq_s32): Likewise. (vclzq_s8): Likewise. (vclzq_s16): Likewise. (vclzq_s32): Likewise. (vnegq_s8): Likewise. (vnegq_s16): Likewise. 
(vnegq_s32): Likewise. (vaddlvq_s32): Likewise. (vaddvq_s8): Likewise. (vaddvq_s16): Likewise. (vaddvq_s32): Likewise. (vmovlbq_s8): Likewise. (vmovlbq_s16): Likewise. (vmovltq_s8): Likewise. (vmovltq_s16): Likewise. (vmvnq_s8): Likewise. (vmvnq_s16): Likewise. (vmvnq_s32): Likewise. (vrev16q_s8): Likewise. (vrev32q_s8): Likewise. (vrev32q_s16): Likewise. (vqabsq_s8): Likewise. (vqabsq_s16): Likewise. (vqabsq_s32): Likewise. (vqnegq_s8): Likewise. (vqnegq_s16): Likewise. (vqnegq_s32): Likewise. (vcvtaq_s16_f16): Likewise. (vcvtaq_s32_f32): Likewise. (vcvtnq_s16_f16): Likewise. (vcvtnq_s32_f32): Likewise. (vcvtpq_s16_f16): Likewise. (vcvtpq_s32_f32): Likewise. (vcvtmq_s16_f16): Likewise. (vcvtmq_s32_f32): Likewise. (vmvnq_u8): Likewise. (vmvnq_u16): Likewise. (vmvnq_u32): Likewise. (vdupq_n_u8): Likewise. (vdupq_n_u16): Likewise. (vdupq_n_u32): Likewise. (vclzq_u8): Likewise. (vclzq_u16): Likewise. (vclzq_u32): Likewise. (vaddvq_u8): Likewise. (vaddvq_u16): Likewise. (vaddvq_u32): Likewise. (vrev32q_u8): Likewise. (vrev32q_u16): Likewise. (vmovltq_u8): Likewise. (vmovltq_u16): Likewise. (vmovlbq_u8): Likewise. (vmovlbq_u16): Likewise. (vrev16q_u8): Likewise. (vaddlvq_u32): Likewise. (vcvtpq_u16_f16): Likewise. (vcvtpq_u32_f32): Likewise. (vcvtnq_u16_f16): Likewise. (vcvtmq_u16_f16): Likewise. (vcvtmq_u32_f32): Likewise. (vcvtaq_u16_f16): Likewise. (vcvtaq_u32_f32): Likewise. (__arm_vdupq_n_s8): Define intrinsic. (__arm_vdupq_n_s16): Likewise. (__arm_vdupq_n_s32): Likewise. (__arm_vabsq_s8): Likewise. (__arm_vabsq_s16): Likewise. (__arm_vabsq_s32): Likewise. (__arm_vclsq_s8): Likewise. (__arm_vclsq_s16): Likewise. (__arm_vclsq_s32): Likewise. (__arm_vclzq_s8): Likewise. (__arm_vclzq_s16): Likewise. (__arm_vclzq_s32): Likewise. (__arm_vnegq_s8): Likewise. (__arm_vnegq_s16): Likewise. (__arm_vnegq_s32): Likewise. (__arm_vaddlvq_s32): Likewise. (__arm_vaddvq_s8): Likewise. (__arm_vaddvq_s16): Likewise. (__arm_vaddvq_s32): Likewise. (__arm_vmovlbq_s8): Likewise. 
(__arm_vmovlbq_s16): Likewise. (__arm_vmovltq_s8): Likewise. (__arm_vmovltq_s16): Likewise. (__arm_vmvnq_s8): Likewise. (__arm_vmvnq_s16): Likewise. (__arm_vmvnq_s32): Likewise. (__ar
[PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
Hello, This patch is part of the MVE ACLE intrinsics framework. The patch supports the use of emulation for the double-precision arithmetic operations for MVE. These changes are to support the MVE ACLE intrinsics which operate on vector floating-point arithmetic operations. Please refer to the Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-11 Andre Vieira Srinath Parvathaneni * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to add emulator calls for double precision arithmetic operations for MVE. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall) /* Values from double-precision helper functions are returned in core registers if the selected core only supports single-precision arithmetic, even if we are using the hard-float ABI. The same is -true for single-precision helpers, but we will never be using the -hard-float ABI on a CPU which doesn't support single-precision -operations in hardware. */ +true for single-precision helpers except in case of MVE, because in +MVE we will be using the hard-float ABI on a CPU which doesn't support +single-precision operations in hardware. In MVE the following check +enables use of emulation for the double-precision arithmetic +operations. 
*/ + if (TARGET_HAVE_MVE) + { + add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode)); + } add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
[committed] operator_abs::fold_range() returning incorrect result for overflows (pr92506)
Traced it back to a typo in operator_abs::fold_range() when I did the conversion where the wrong line got copied in. Instead of returning value_range (type) when an overflow happens, it was returning the same result as the previous check, which was the case for all positives. This had EVRP setting the range of an ABS to [-MIN, -1] instead of varying, which later caused VRP to intersect that with 0 - [-MIN, -1] and all heck broke loose. doh. I also stumbled across a case where we should be starting with undefined in the default fold_range() and building with union for each sub-range. We previously declared a local value_range to work with, and that defaulted to undefined. When I changed it to a reference parameter, I needed to explicitly initialize it. Bootstraps, checked in as revision 278259. Andrew 2019-11-14 Andrew MacLeod * range-op.cc (range_operator::fold_range): Start with range undefined. (operator_abs::wi_fold): Fix wrong line copy... With wrapv, abs with overflow is varying. Index: range-op.cc === *** range-op.cc (revision 277979) --- range-op.cc (working copy) *** range_operator::fold_range (value_range *** 146,151 --- 146,152 return; value_range tmp; + r.set_undefined (); for (unsigned x = 0; x < lh.num_pairs (); ++x) for (unsigned y = 0; y < rh.num_pairs (); ++y) { *** operator_abs::wi_fold (value_range &r, t *** 2359,2365 wide_int max_value = wi::max_value (prec, sign); if (!TYPE_OVERFLOW_UNDEFINED (type) && wi::eq_p (lh_lb, min_value)) { ! r = value_range (type, lh_lb, lh_ub); return; } --- 2360,2366 wide_int max_value = wi::max_value (prec, sign); if (!TYPE_OVERFLOW_UNDEFINED (type) && wi::eq_p (lh_lb, min_value)) { ! r = value_range (type); return; }
Re: Tweak gcc.dg/vect/bb-slp-4[01].c (PR92366)
On November 14, 2019 7:10:10 PM GMT+01:00, Richard Sandiford wrote: >gcc.dg/vect/bb-slp-40.c was failing on some targets because the >explicit dg-options overrode things like -maltivec. This patch >uses dg-additional-options instead. > >Also, it seems safer not to require exactly 1 instance of each message, >since that depends on the target vector length. > >gcc.dg/vect/bb-slp-41.c contained invariant constructors that are >vectorised on AArch64 (foo) and constructors that aren't (bar). >This meant that the number of times we print "Found vectorizable >constructor" depended on how many vector sizes we try, since we'd >print it for each failed attempt. > >In foo, we create invariant { b[0], ... } and { b[1], ... }, >and the test is making sure that the two separate invariant vectors >can be fed from the same vector load at b. This is a different case >from bb-slp-40.c, where the constructors are naturally separate. >(The expected count is 4 rather than 2 because we can vectorise the >epilogue too.) > >However, due to limitations in the loop vectoriser, we still do the >addition of { b[0], ... } and { b[1], ... } in the loop. Hopefully >that'll be fixed at some point, so this patch adds an alternative test >that directly needs 4 separate invariant constructors. E.g. with >Joel's >SLP optimisation, the new test generates: > >ldr q4, [x1] >dup v7.4s, v4.s[0] >dup v6.4s, v4.s[1] >dup v5.4s, v4.s[2] >dup v4.4s, v4.s[3] > >instead of the somewhat bizarre: > >ldp s6, s5, [x1, 4] >ldr s4, [x1, 12] >ld1r{v7.4s}, [x1] >dup v6.4s, v6.s[0] >dup v5.4s, v5.s[0] >dup v4.4s, v4.s[0] > >The patch then disables vectorisation of the original foo in >bb-vect-slp-41.c, so that we get the same correctness testing >for bar but don't need to test for specific counts. > >Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu. >OK to install? Ok. Richard. 
>Richard > > >2019-11-14 Richard Sandiford > >gcc/testsuite/ > PR testsuite/92366 > * gcc.dg/vect/bb-slp-40.c: Use dg-additional-options instead > of dg-options. Remove expected counts. > * gcc.dg/vect/bb-slp-41.c: Remove dg-options and explicit > dg-do run. Suppress vectorization of foo. > * gcc.dg/vect/bb-slp-42.c: New test. > >Index: gcc/testsuite/gcc.dg/vect/bb-slp-40.c >=== >--- gcc/testsuite/gcc.dg/vect/bb-slp-40.c 2019-11-04 21:13:57.363758109 >+ >+++ gcc/testsuite/gcc.dg/vect/bb-slp-40.c 2019-11-14 18:08:36.323546916 >+ >@@ -1,5 +1,5 @@ > /* { dg-do compile } */ >-/* { dg-options "-O3 -fdump-tree-slp-all" } */ >+/* { dg-additional-options "-fvect-cost-model=dynamic" } */ > /* { dg-require-effective-target vect_int } */ > > char g_d[1024], g_s1[1024], g_s2[1024]; >@@ -30,5 +30,5 @@ void foo(void) > } > > /* See that we vectorize an SLP instance. */ >-/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" >1 "slp1" } } */ >-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 >"slp1" } } */ >+/* { dg-final { scan-tree-dump "Found vectorizable constructor" "slp1" >} } */ >+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "slp1" } >} */ >Index: gcc/testsuite/gcc.dg/vect/bb-slp-41.c >=== >--- gcc/testsuite/gcc.dg/vect/bb-slp-41.c 2019-11-04 21:13:57.363758109 >+ >+++ gcc/testsuite/gcc.dg/vect/bb-slp-41.c 2019-11-14 18:08:36.323546916 >+ >@@ -1,10 +1,9 @@ >-/* { dg-do run } */ >-/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */ > /* { dg-require-effective-target vect_int } */ > > #define ARR_SIZE 1000 > >-void foo (int *a, int *b) >+void __attribute__((optimize (0))) >+foo (int *a, int *b) > { > int i; > for (i = 0; i < (ARR_SIZE - 2); ++i) >@@ -56,6 +55,4 @@ int main () > return 0; > > } >-/* See that we vectorize an SLP instance. 
*/ >-/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" >12 "slp1" } } */ >-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 >"slp1" } } */ >+/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" >"slp1" } } */ >Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c >=== >--- /dev/null 2019-09-17 11:41:18.176664108 +0100 >+++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c 2019-11-14 18:08:36.323546916 >+ >@@ -0,0 +1,49 @@ >+/* { dg-require-effective-target vect_int } */ >+/* { dg-require-effective-target vect_perm } */ >+ >+#include "tree-vect.h" >+ >+#define ARR_SIZE 1024 >+ >+void __attribute__((noipa)) >+foo (int a[][ARR_SIZE], int *b) >+{ >+ int i; >+ for (i = 0; i < ARR_SIZE; ++i) >+{ >+ a[0][i] += b[0]; >+ a[1][i] += b[1]; >+ a[2][i] += b[2]; >+ a[3][i] += b[3]; >+} >+} >+ >+int >+main
[PATCH v2] Extend the simd function attribute
GCC currently supports two ways to declare the availability of vector variants of a scalar function: #pragma omp declare simd void f (void); and __attribute__ ((simd)) void f (void); However, these declare a set of symbols that are different simd variants of f, so a library either provides definitions for all those symbols or it cannot use these declarations. (The set of declared symbols can be narrowed down with additional omp clauses, but not enough to allow declaring a single symbol.) OpenMP 5 has a declare variant feature that allows declaring more specific simd variants, but it is complicated and still doesn't provide a reliable mechanism (requires gcc or vendor specific extension for unambiguous declarations). And it requires -fopenmp. A simpler approach is to extend the gcc specific simd attribute such that it can specify a single vector variant of simple scalar functions, where simple scalar functions are ones that only take and return scalar integer or floating type values. I believe this can be achieved by __attribute__ ((simd (mask, simdlen, simdabi, name))) where mask is "inbranch" or "notinbranch" like now, simdlen is an int with the same meaning as in omp declare simd and simdabi is a string specifying the call ABI (which the Intel vector ABI calls ISA). The name is optional and allows a library to use a different symbol name than what the vector ABI specifies. The simd attribute currently can be used for both declarations and definitions; in the latter case the simd variants of the function are generated, which should work with the extended simd attribute too. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ChangeLog: 2019-11-14 Szabolcs Nagy * cgraph.h (struct cgraph_simd_clone): Add simdname field. * doc/extend.texi: Update the simd attribute documentation. * tree.h (OMP_CLAUSE__SIMDABI__EXPR): Define. (OMP_CLAUSE__SIMDNAME__EXPR): Define. * tree.c (walk_tree_1): Handle new omp clauses. * tree-core.h (enum omp_clause_code): Likewise. 
* tree-nested.c (convert_nonlocal_omp_clauses): Likewise. * tree-pretty-print.c (dump_omp_clause): Likewise. * omp-low.c (scan_sharing_clauses): Likewise. * omp-simd-clone.c (simd_clone_clauses_extract): Likewise. (simd_clone_mangle): Handle simdname. * config/aarch64/aarch64.c (aarch64_simd_clone_compute_vecsize_and_simdlen): Warn about unsupported SIMD ABI. * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen): Likewise. gcc/c-family/ChangeLog: 2019-11-14 Szabolcs Nagy * c-attribs.c (handle_simd_attribute): Handle 4 arguments. gcc/testsuite/ChangeLog: 2019-11-14 Szabolcs Nagy * c-c++-common/attr-simd-5.c: Update. * c-c++-common/attr-simd-6.c: New test. * c-c++-common/attr-simd-7.c: New test. * c-c++-common/attr-simd-8.c: New test. diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index c62cebf7bfd..bf2301eb790 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -448,7 +448,7 @@ const struct attribute_spec c_common_attribute_table[] = handle_omp_declare_variant_attribute, NULL }, { "omp declare variant variant", 0, -1, true, false, false, false, handle_omp_declare_variant_attribute, NULL }, - { "simd", 0, 1, true, false, false, false, + { "simd", 0, 4, true, false, false, false, handle_simd_attribute, NULL }, { "omp declare target", 0, -1, true, false, false, false, handle_omp_declare_target_attribute, NULL }, @@ -3094,13 +3094,22 @@ handle_simd_attribute (tree *node, tree name, tree args, int, bool *no_add_attrs { tree t = get_identifier ("omp declare simd"); tree attr = NULL_TREE; + + /* Allow + simd + simd (mask) + simd (mask, simdlen) + simd (mask, simdlen, simdabi) + simd (mask, simdlen, simdabi, name) + forms. 
*/ + if (args) { tree id = TREE_VALUE (args); if (TREE_CODE (id) != STRING_CST) { - error ("attribute %qE argument not a string", name); + error ("attribute %qE first argument not a string", name); *no_add_attrs = true; return NULL_TREE; } @@ -3113,13 +3122,75 @@ handle_simd_attribute (tree *node, tree name, tree args, int, bool *no_add_attrs OMP_CLAUSE_INBRANCH); else { - error ("only %<inbranch%> and %<notinbranch%> flags are " - "allowed for %<__simd__%> attribute"); + error ("%qE attribute first argument must be %<inbranch%> or " + "%<notinbranch%>", name); + *no_add_attrs = true; + return NULL_TREE; + } + + args = TREE_CHAIN (args); + } + + if (args) + { + tree arg = TREE_VALUE (args); + + arg = c_fully_fold (arg, false, NULL); + if (TREE_CODE (arg) != INTEGER_CST + || !INTEGRAL_TYPE_P (TREE_TYPE (arg)) + || tree_int_cst_sgn (arg) < 0) + { + error ("%qE attribute second argument m
[SVE] PR89007 - Implement generic vector average expansion
Hi, As suggested in PR, the attached patch falls back to distributing rshift over plus_expr instead of fallback widening -> arithmetic -> narrowing sequence, if target support is not available. Bootstrap+tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu. OK to commit ? Thanks, Prathamesh 2019-11-15 Prathamesh Kulkarni PR tree-optimization/89007 * tree-vect-patterns.c (vect_recog_average_pattern): If there is no target support available, generate code to distribute rshift over plus and add one depending upon floor or ceil rounding. testsuite/ * gcc.target/aarch64/sve/pr89007.c: New test. diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c new file mode 100644 index 000..b682f3f3b74 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +#define N 1024 +unsigned char dst[N]; +unsigned char in1[N]; +unsigned char in2[N]; + +void +foo () +{ + for( int x = 0; x < N; x++ ) +dst[x] = (in1[x] + in2[x] + 1) >> 1; +} + +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */ +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */ diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index 8ebbcd76b64..7025a3b4dc2 100644 --- a/gcc/tree-vect-patterns.c +++ b/gcc/tree-vect-patterns.c @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info last_stmt_info, tree *type_out) /* Check for target support. */ tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type); - if (!new_vectype - || !direct_internal_fn_supported_p (ifn, new_vectype, - OPTIMIZE_FOR_SPEED)) + + if (!new_vectype) return NULL; + bool ifn_supported += direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED); + /* The IR requires a valid vector type for the cast result, even though it's likely to be discarded. */ *type_out = get_vectype_for_scalar_type (vinfo, type); if (!*type_out) return NULL; - /* Generate the IFN_AVG* call. 
*/ tree new_var = vect_recog_temp_ssa_var (new_type, NULL); tree new_ops[2]; vect_convert_inputs (last_stmt_info, 2, new_ops, new_type, unprom, new_vectype); + + if (!ifn_supported) +{ + /* If there is no target support available, generate code +to distribute rshift over plus and add one depending +upon floor or ceil rounding. */ + + tree one_cst = build_one_cst (new_type); + + tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], one_cst); + + tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g2 = gimple_build_assign (tmp2, RSHIFT_EXPR, new_ops[1], one_cst); + + tree tmp3 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g3 = gimple_build_assign (tmp3, PLUS_EXPR, tmp1, tmp2); + + tree tmp4 = vect_recog_temp_ssa_var (new_type, NULL); + tree_code c = (ifn == IFN_AVG_CEIL) ? BIT_IOR_EXPR : BIT_AND_EXPR; + gassign *g4 = gimple_build_assign (tmp4, c, new_ops[0], new_ops[1]); + + tree tmp5 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g5 = gimple_build_assign (tmp5, BIT_AND_EXPR, tmp4, one_cst); + + gassign *g6 = gimple_build_assign (new_var, PLUS_EXPR, tmp3, tmp5); + + append_pattern_def_seq (last_stmt_info, g1, new_vectype); + append_pattern_def_seq (last_stmt_info, g2, new_vectype); + append_pattern_def_seq (last_stmt_info, g3, new_vectype); + append_pattern_def_seq (last_stmt_info, g4, new_vectype); + append_pattern_def_seq (last_stmt_info, g5, new_vectype); + return vect_convert_output (last_stmt_info, type, g6, new_vectype); +} + + /* Generate the IFN_AVG* call. */ gcall *average_stmt = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]); gimple_call_set_lhs (average_stmt, new_var);
[Patch] [mid-end][__RTL] Clean df state despite invalid __RTL startwith passes
Hi there, When compiling an __RTL function that has an invalid "startwith" pass we currently don't run the dfinish cleanup pass. This means we ICE on the next function. This change ensures that all state is cleaned up for the next function to run correctly. As an example, before this change the following code would ICE when compiling the function `foo2` because the "peephole2" pass is not run at optimisation level -O0. When compiled with ./aarch64-none-linux-gnu-gcc -O0 -S missed-pass-error.c -o test.s ``` int __RTL (startwith ("peephole2")) badfoo () { (function "badfoo" (insn-chain (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:DI x19) (reg:DI x0))) (cinsn 10 (use (reg/i:SI x19))) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain ) ;; function "foo2" } int __RTL (startwith ("final")) foo2 () { (function "foo2" (insn-chain (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:DI x19) (reg:DI x0))) (cinsn 10 (use (reg/i:SI x19))) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain ) ;; function "foo2" } ``` Now it silently ignores the __RTL function and successfully compiles foo2. regtest done on aarch64 regtest done on x86_64 OK for trunk? gcc/ChangeLog: 2019-11-14 Matthew Malcomson * passes.c (should_skip_pass_p): Always run "dfinish". gcc/testsuite/ChangeLog: 2019-11-14 Matthew Malcomson * gcc.dg/rtl/aarch64/missed-pass-error.c: New test. ### Attachment also inlined for ease of reply### diff --git a/gcc/passes.c b/gcc/passes.c index d86af115ecb16fcab6bfce070f1f3e4f1d90ce71..258f85ab4f8a1519b978b75dfa67536d2eacd106 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -2375,7 +2375,8 @@ should_skip_pass_p (opt_pass *pass) return false; /* Don't skip df init; later RTL passes need it. 
*/ - if (strstr (pass->name, "dfinit") != NULL) + if (strstr (pass->name, "dfinit") != NULL + || strstr (pass->name, "dfinish") != NULL) return false; if (!quiet_flag) diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c b/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c new file mode 100644 index ..2f02ca9d0c40b372d86b24009540e157ed1a8c59 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/missed-pass-error.c @@ -0,0 +1,45 @@ +/* { dg-do compile { target aarch64-*-* } } */ +/* { dg-additional-options "-O0" } */ + +/* + When compiling __RTL functions the startwith string can be either incorrect + (i.e. not matching a pass) or be unused (i.e. can refer to a pass that is + not run at the current optimisation level). + + Here we ensure that the state clean up is still run, so that functions other + than the faulty one can still be compiled. + */ + +int __RTL (startwith ("peephole2")) badfoo () +{ +(function "badfoo" + (insn-chain +(block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cinsn 101 (set (reg:DI x19) (reg:DI x0))) + (cinsn 10 (use (reg/i:SI x19))) + (edge-to exit (flags "FALLTHRU")) +) ;; block 2 + ) ;; insn-chain +) ;; function "foo2" +} + +/* Compile a valid __RTL function to test state from the "dfinit" pass has been + cleaned with the "dfinish" pass. */ + +int __RTL (startwith ("final")) foo2 () +{ +(function "foo2" + (insn-chain +(block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cinsn 101 (set (reg:DI x19) (reg:DI x0))) + (cinsn 10 (use (reg/i:SI x19))) + (edge-to exit (flags "FALLTHRU")) +) ;; block 2 + ) ;; insn-chain +) ;; function "foo2" +}
[mid-end][__RTL] Clean state despite unspecified __RTL startwith passes
Hi there, When compiling an __RTL function that has an unspecified "startwith" pass we currently don't run the cleanup pass; this means that we ICE on the next function (if it's a basic function). I asked about this on the GCC mailing list a while ago and Richard mentioned it might be a good idea to clear bad state so it doesn't leak to other functions. https://gcc.gnu.org/ml/gcc/2019-02/msg00106.html This change ensures that the clean_state pass is run even if the startwith pass is unspecified. We also ensure the name of the startwith pass is always freed correctly. As an example, before this change the following code would ICE when compiling the function `foo_a`. When compiled with ./aarch64-none-linux-gnu-gcc -O0 -S unspecified-pass-error.c -o test.s ``` int __RTL () badfoo () { (function "badfoo" (insn-chain (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) (cinsn 101 (set (reg:DI x19) (reg:DI x0))) (cinsn 10 (use (reg/i:SI x19))) (edge-to exit (flags "FALLTHRU")) ) ;; block 2 ) ;; insn-chain ) ;; function "foo2" } int foo_a () { return 200; } ``` Now it silently ignores the __RTL function and successfully compiles foo_a. regtest done on aarch64 regtest done on x86_64 OK for trunk? gcc/ChangeLog: 2019-11-14 Matthew Malcomson * run-rtl-passes.c (run_rtl_passes): Accept and handle empty "initial_pass_name" argument -- by running "*clean_state" pass. Also free the "initial_pass_name" when done. gcc/c/ChangeLog: 2019-11-14 Matthew Malcomson * c-parser.c (c_parser_parse_rtl_body): Always call run_rtl_passes, even if startwith pass is not provided. gcc/testsuite/ChangeLog: 2019-11-14 Matthew Malcomson * gcc.dg/rtl/aarch64/unspecified-pass-error.c: New test. 
### Attachment also inlined for ease of reply### diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 9589cc68c25b5b15bb364fdae56e24dedbe91601..05485833d306cd79c5405543175b63c2e7e62538 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -20868,11 +20868,9 @@ c_parser_parse_rtl_body (c_parser *parser, char *start_with_pass) return; } - /* If a pass name was provided for START_WITH_PASS, run the backend - accordingly now, on the cfun created above, transferring - ownership of START_WITH_PASS. */ - if (start_with_pass) -run_rtl_passes (start_with_pass); + /* Run the backend on the cfun created above, transferring ownership of + START_WITH_PASS. */ + run_rtl_passes (start_with_pass); } #include "gt-c-c-parser.h" diff --git a/gcc/run-rtl-passes.c b/gcc/run-rtl-passes.c index f65c0af6dfd48aa9ca7ec29b63d7cd4108432178..38765ebbc288e7aef35d7c02693efd534c6b2ddc 100644 --- a/gcc/run-rtl-passes.c +++ b/gcc/run-rtl-passes.c @@ -49,24 +49,31 @@ run_rtl_passes (char *initial_pass_name) switch_to_section (text_section); (*debug_hooks->assembly_start) (); - /* Pass "expand" normally sets this up. */ + if (initial_pass_name) +{ + /* Pass "expand" normally sets this up. 
*/ #ifdef INSN_SCHEDULING - init_sched_attrs (); + init_sched_attrs (); #endif + bitmap_obstack_initialize (NULL); + bitmap_obstack_initialize (&reg_obstack); + opt_pass *rest_of_compilation + = g->get_passes ()->get_rest_of_compilation (); + gcc_assert (rest_of_compilation); + execute_pass_list (cfun, rest_of_compilation); - bitmap_obstack_initialize (NULL); - bitmap_obstack_initialize (&reg_obstack); - - opt_pass *rest_of_compilation -= g->get_passes ()->get_rest_of_compilation (); - gcc_assert (rest_of_compilation); - execute_pass_list (cfun, rest_of_compilation); - - opt_pass *clean_slate = g->get_passes ()->get_clean_slate (); - gcc_assert (clean_slate); - execute_pass_list (cfun, clean_slate); - - bitmap_obstack_release (&reg_obstack); + opt_pass *clean_slate = g->get_passes ()->get_clean_slate (); + gcc_assert (clean_slate); + execute_pass_list (cfun, clean_slate); + bitmap_obstack_release (&reg_obstack); +} + else +{ + opt_pass *clean_slate = g->get_passes ()->get_clean_slate (); + gcc_assert (clean_slate); + execute_pass_list (cfun, clean_slate); +} cfun->curr_properties |= PROP_rtl; + free (initial_pass_name); } diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/unspecified-pass-error.c b/gcc/testsuite/gcc.dg/rtl/aarch64/unspecified-pass-error.c new file mode 100644 index ..596501e977044132bd3e9a2d0afd0f8b2b789186 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/unspecified-pass-error.c @@ -0,0 +1,30 @@ +/* { dg-do compile { target aarch64-*-* } } */ +/* { dg-additional-options "-O0" } */ + +/* + Ensure an __RTL function with an unspecified "startwith" pass doesn't cause + an assertion error on the next function. + */ + +int __RTL () badfoo () +{ +(function "badfoo" + (insn-chain +(block 2 +
Tweak gcc.dg/vect/bb-slp-4[01].c (PR92366)
gcc.dg/vect/bb-slp-40.c was failing on some targets because the explicit dg-options overrode things like -maltivec. This patch uses dg-additional-options instead. Also, it seems safer not to require exactly 1 instance of each message, since that depends on the target vector length. gcc.dg/vect/bb-slp-41.c contained invariant constructors that are vectorised on AArch64 (foo) and constructors that aren't (bar). This meant that the number of times we print "Found vectorizable constructor" depended on how many vector sizes we try, since we'd print it for each failed attempt. In foo, we create invariant { b[0], ... } and { b[1], ... }, and the test is making sure that the two separate invariant vectors can be fed from the same vector load at b. This is a different case from bb-slp-40.c, where the constructors are naturally separate. (The expected count is 4 rather than 2 because we can vectorise the epilogue too.) However, due to limitations in the loop vectoriser, we still do the addition of { b[0], ... } and { b[1], ... } in the loop. Hopefully that'll be fixed at some point, so this patch adds an alternative test that directly needs 4 separate invariant constructors. E.g. with Joel's SLP optimisation, the new test generates: ldr q4, [x1] dup v7.4s, v4.s[0] dup v6.4s, v4.s[1] dup v5.4s, v4.s[2] dup v4.4s, v4.s[3] instead of the somewhat bizarre: ldp s6, s5, [x1, 4] ldr s4, [x1, 12] ld1r{v7.4s}, [x1] dup v6.4s, v6.s[0] dup v5.4s, v5.s[0] dup v4.4s, v4.s[0] The patch then disables vectorisation of the original foo in bb-vect-slp-41.c, so that we get the same correctness testing for bar but don't need to test for specific counts. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Richard 2019-11-14 Richard Sandiford gcc/testsuite/ PR testsuite/92366 * gcc.dg/vect/bb-slp-40.c: Use dg-additional-options instead of dg-options. Remove expected counts. * gcc.dg/vect/bb-slp-41.c: Remove dg-options and explicit dg-do run. 
Suppress vectorization of foo. * gcc.dg/vect/bb-slp-42.c: New test. Index: gcc/testsuite/gcc.dg/vect/bb-slp-40.c === --- gcc/testsuite/gcc.dg/vect/bb-slp-40.c 2019-11-04 21:13:57.363758109 + +++ gcc/testsuite/gcc.dg/vect/bb-slp-40.c 2019-11-14 18:08:36.323546916 + @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fdump-tree-slp-all" } */ +/* { dg-additional-options "-fvect-cost-model=dynamic" } */ /* { dg-require-effective-target vect_int } */ char g_d[1024], g_s1[1024], g_s2[1024]; @@ -30,5 +30,5 @@ void foo(void) } /* See that we vectorize an SLP instance. */ -/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 1 "slp1" } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */ +/* { dg-final { scan-tree-dump "Found vectorizable constructor" "slp1" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "slp1" } } */ Index: gcc/testsuite/gcc.dg/vect/bb-slp-41.c === --- gcc/testsuite/gcc.dg/vect/bb-slp-41.c 2019-11-04 21:13:57.363758109 + +++ gcc/testsuite/gcc.dg/vect/bb-slp-41.c 2019-11-14 18:08:36.323546916 + @@ -1,10 +1,9 @@ -/* { dg-do run } */ -/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */ /* { dg-require-effective-target vect_int } */ #define ARR_SIZE 1000 -void foo (int *a, int *b) +void __attribute__((optimize (0))) +foo (int *a, int *b) { int i; for (i = 0; i < (ARR_SIZE - 2); ++i) @@ -56,6 +55,4 @@ int main () return 0; } -/* See that we vectorize an SLP instance. 
*/ -/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 12 "slp1" } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" } } */ +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "slp1" } } */ Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c === --- /dev/null 2019-09-17 11:41:18.176664108 +0100 +++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c 2019-11-14 18:08:36.323546916 + @@ -0,0 +1,49 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_perm } */ + +#include "tree-vect.h" + +#define ARR_SIZE 1024 + +void __attribute__((noipa)) +foo (int a[][ARR_SIZE], int *b) +{ + int i; + for (i = 0; i < ARR_SIZE; ++i) +{ + a[0][i] += b[0]; + a[1][i] += b[1]; + a[2][i] += b[2]; + a[3][i] += b[3]; +} +} + +int +main () +{ + int a[4][ARR_SIZE]; + int b[4]; + + check_vect (); + + for (int i = 0; i < 4; ++i) +{ + b[i] = 20 * i; + for (int j = 0; j < ARR_SIZE; ++j) + a[i][j] = (i + 1) * ARR_SIZE - j; +
[arm] Follow up for asm-flags vs thumb1
What I committed today does in fact ICE for thumb1, as you suspected. I'm currently testing the following vs arm-sim/ arm-sim/-mthumb arm-sim/-mcpu=cortex-a15/-mthumb. which, with the default cpu for arm-elf-eabi, should test all of arm, thumb1, thumb2. I'm not thrilled about the ifdef in aarch-common.c, but I don't see a different way to catch this case for arm and still compile for aarch64. Ideas? Particularly ones that work with __attribute__((target("thumb")))? Which, now that I've thought about it I really should be testing... r~ gcc/ * config/arm/aarch-common.c (arm_md_asm_adjust): Sorry for asm flags in thumb1 mode. * config/arm/arm-c.c (arm_cpu_builtins): Do not define __GCC_ASM_FLAG_OUTPUTS__ in thumb1 mode. * doc/extend.texi (FlagOutputOperands): Document thumb1 restriction. gcc/testsuite/ * gcc.target/arm/asm-flag-1.c: Skip if arm_thumb1. * gcc.target/arm/asm-flag-3.c: Skip if arm_thumb1. * gcc.target/arm/asm-flag-5.c: Skip if arm_thumb1. * gcc.target/arm/asm-flag-6.c: Skip if arm_thumb1. 
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index 760ef6c9c0a..6f3db3838ba 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -544,6 +544,15 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/, if (strncmp (con, "=@cc", 4) != 0) continue; con += 4; + +#ifdef TARGET_THUMB1 + if (TARGET_THUMB1) + { + sorry ("asm flags not supported in thumb1 mode"); + break; + } +#endif + if (strchr (con, ',') != NULL) { error ("alternatives not allowed in %<asm%> flag output"); diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index c4485ce7af1..865c448d531 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -122,7 +122,9 @@ arm_cpu_builtins (struct cpp_reader* pfile) if (arm_arch_notm) builtin_define ("__ARM_ARCH_ISA_ARM"); builtin_define ("__APCS_32__"); - builtin_define ("__GCC_ASM_FLAG_OUTPUTS__"); + + if (!TARGET_THUMB1) +builtin_define ("__GCC_ASM_FLAG_OUTPUTS__"); def_or_undef_macro (pfile, "__thumb__", TARGET_THUMB); def_or_undef_macro (pfile, "__thumb2__", TARGET_THUMB2); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 1c8ae0d5cd3..62a98e939c8 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -9810,6 +9810,8 @@ signed greater than signed less than equal @end table +The flag output constraints are not supported in thumb1 mode. + @item x86 family The flag output constraints for the x86 family are of the form @samp{=@@cc@var{cond}} where @var{cond} is one of the standard diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c b/gcc/testsuite/gcc.target/arm/asm-flag-1.c index 9707ebfcebb..97104d3ac73 100644 --- a/gcc/testsuite/gcc.target/arm/asm-flag-1.c +++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c @@ -1,6 +1,7 @@ /* Test the valid @cc asm flag outputs. 
*/ /* { dg-do compile } */ /* { dg-options "-O" } */ +/* { dg-skip-if "" { arm_thumb1 } } */ #ifndef __GCC_ASM_FLAG_OUTPUTS__ #error "missing preprocessor define" diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-3.c b/gcc/testsuite/gcc.target/arm/asm-flag-3.c index e84e3431277..e2d616051cc 100644 --- a/gcc/testsuite/gcc.target/arm/asm-flag-3.c +++ b/gcc/testsuite/gcc.target/arm/asm-flag-3.c @@ -1,6 +1,7 @@ /* Test some of the valid @cc asm flag outputs. */ /* { dg-do compile } */ /* { dg-options "-O" } */ +/* { dg-skip-if "" { arm_thumb1 } } */ #define DO(C) \ void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); } diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-5.c b/gcc/testsuite/gcc.target/arm/asm-flag-5.c index 4d4394e1478..9a8ff586c29 100644 --- a/gcc/testsuite/gcc.target/arm/asm-flag-5.c +++ b/gcc/testsuite/gcc.target/arm/asm-flag-5.c @@ -1,6 +1,7 @@ /* Test error conditions of asm flag outputs. */ /* { dg-do compile } */ /* { dg-options "" } */ +/* { dg-skip-if "" { arm_thumb1 } } */ void f_B(void) { _Bool x; asm("" : "=@"(x)); } void f_c(void) { char x; asm("" : "=@"(x)); } diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-6.c b/gcc/testsuite/gcc.target/arm/asm-flag-6.c index 09174e04ae6..d862db4e106 100644 --- a/gcc/testsuite/gcc.target/arm/asm-flag-6.c +++ b/gcc/testsuite/gcc.target/arm/asm-flag-6.c @@ -1,5 +1,6 @@ /* Executable testcase for 'output flags.' */ /* { dg-do run } */ +/* { dg-skip-if "" { arm_thumb1 } } */ int test_bits (long nzcv) {
[PATCH] extend missing nul checks to all built-ins (PR 88226)
GCC 9 added checks for unsafe uses of unterminated constant char arrays to a few string functions but the checking is far from comprehensive. It has been on my to-do list to do a more thorough review and add the checks where they're missing. The attached patch does this for the majority of common built-ins. There still are a few where it could be added but this should cover most of the commonly used ones where the misuses are likely to come up. This patch depends on the one I posted earlier today for PR 92501: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01233.html I tested both patches together on x86_64-linux. Martin PS I considered introducing a new attribute, say string, first to reduce the extent of the changes in GCC, and second to provide a mechanism to let GCC check even user-defined functions for these bugs. I stopped short of doing this because most of the changes to the built-ins are necessary either way, and also because it seems late in the cycle to introduce such an extension. Unless there's a strong preference for adding it now I will revisit the decision for GCC 11. PR middle-end/88226 - missing warning on fprintf, fputs, and puts with an unterminated array gcc/ChangeLog: PR middle-end/88226 * builtins.c (check_nul_terminated_array): New function. (fold_builtin_0): Remove declaration. (fold_builtin_1): Same. (fold_builtin_2): Same. (fold_builtin_3): Same. (fold_builtin_strpbrk): Add argument. (fold_builtin_strspn): Same. (fold_builtin_strcspn): Same. (expand_builtin_strcat): Call it. Remove unused argument. (expand_builtin_stpncpy): Same. (expand_builtin_strncat): Same. (expand_builtin_strncpy): Same. Adjust indentation. (expand_builtin_strcmp): Same. (expand_builtin_strncmp): Same. (expand_builtin_fork_or_exec): Same. (expand_builtin): Handle more built-ins. (fold_builtin_2): Add argument. (fold_builtin_n): Make static. Add argument. (fold_call_expr): Pass new argument to fold_builtin_n and fold_builtin_2. 
(fold_builtin_call_array): Pass new argument to fold_builtin_n. (fold_builtin_strpbrk): Add argument. Call check_nul_terminated_array. (fold_call_stmt): Pass new argument to fold_builtin_n. * builtins.h: Correct a comment. * gimple-fold.c (gimple_fold_builtin_strchr): Call check_nul_terminated_array. * tree-ssa-strlen.c (handle_builtin_strlen): Call check_nul_terminated_array. (handle_builtin_strchr): Same. (handle_builtin_string_cmp): Same. gcc/testsuite/ChangeLog: PR middle-end/88226 * gcc.dg/Wstringop-overflow-22.c: New test. * gcc.dg/tree-ssa/builtin-fprintf-warn-1.c: Remove xfails. Index: gcc/builtins.c === --- gcc/builtins.c (revision 278253) +++ gcc/builtins.c (working copy) @@ -131,7 +131,7 @@ static rtx expand_builtin_memory_copy_args (tree d static rtx expand_builtin_memmove (tree, rtx); static rtx expand_builtin_mempcpy (tree, rtx); static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, memop_ret); -static rtx expand_builtin_strcat (tree, rtx); +static rtx expand_builtin_strcat (tree); static rtx expand_builtin_strcpy (tree, rtx); static rtx expand_builtin_strcpy_args (tree, tree, tree, rtx); static rtx expand_builtin_stpcpy (tree, rtx, machine_mode); @@ -166,15 +166,11 @@ static tree fold_builtin_fabs (location_t, tree, t static tree fold_builtin_abs (location_t, tree, tree); static tree fold_builtin_unordered_cmp (location_t, tree, tree, tree, enum tree_code, enum tree_code); -static tree fold_builtin_0 (location_t, tree); -static tree fold_builtin_1 (location_t, tree, tree); -static tree fold_builtin_2 (location_t, tree, tree, tree); -static tree fold_builtin_3 (location_t, tree, tree, tree, tree); static tree fold_builtin_varargs (location_t, tree, tree*, int); -static tree fold_builtin_strpbrk (location_t, tree, tree, tree); -static tree fold_builtin_strspn (location_t, tree, tree); -static tree fold_builtin_strcspn (location_t, tree, tree); +static tree fold_builtin_strpbrk (location_t, tree, tree, tree, tree); +static tree 
fold_builtin_strspn (location_t, tree, tree, tree); +static tree fold_builtin_strcspn (location_t, tree, tree, tree); static rtx expand_builtin_object_size (tree); static rtx expand_builtin_memory_chk (tree, rtx, machine_mode, @@ -564,6 +560,51 @@ warn_string_no_nul (location_t loc, const char *fn } } +/* For a call EXPR (which may be null) that expects a string argument + and SRC as the argument, returns false if SRC is a character array + with no terminating NUL. When nonnull, BOUND is the number of + characters in which to expect the terminating NUL. + When EXPR is nonnull also issues a warning. */ + +bool +check_nul_terminated_array (tree expr, tree src, tree bound /* = NULL_TREE */) +{ + tree size; + bool exact; + tree nonstr = unterminated_array (src, &size, &exact); + if (!nonstr) +return true; + + /* NONSTR refers to the non-nul terminated constant array and SI
Re: [patch, libgomp] Add tests for print from offload target
On Thu, Nov 14, 2019 at 05:18:41PM +, Andrew Stubbs wrote: > On 14/11/2019 17:05, Jakub Jelinek wrote: > > On Thu, Nov 14, 2019 at 04:47:49PM +, Andrew Stubbs wrote: > > > This patch adds new libgomp tests to ensure that C "printf" and Fortran > > > "write" work correctly within offload kernels. Both should work for > > > amdgcn, > > > but nvptx uses the libgfortran "minimal" mode which lacks "write" support. > > > > So, do those *.f90 testcases now FAIL with nvptx offloading? > > If yes, perhaps there should be effective target check that it is not nvptx > > offloading. > > Once the declare variant support is finished, at least in OpenMP it could be > > handled through that, but Fortran support for that will definitely not make > > GCC 10. > > Oops, I forgot to regenerate the patch before posting it. > > Here's the version with the nvptx xfails. > > OK? Ok. > Add tests for print from offload target. > > 2019-11-14 Andrew Stubbs > > libgomp/ > * testsuite/libgomp.c/target-print-1.c: New file. > * testsuite/libgomp.fortran/target-print-1.f90: New file. > * testsuite/libgomp.oacc-c/print-1.c: New file. > * testsuite/libgomp.oacc-fortran/print-1.f90: New file. > +int > +main () > +{ > +#pragma omp target > +{ > + printf ("The answer is %d\n", var); > +} Just a nit, #pragma omp target printf ("The answer is %d\n", var); would be valid too, but no need to change the testcase. Jakub
Re: [patch, libgomp] Add tests for print from offload target
On 14/11/2019 17:05, Jakub Jelinek wrote: On Thu, Nov 14, 2019 at 04:47:49PM +, Andrew Stubbs wrote: This patch adds new libgomp tests to ensure that C "printf" and Fortran "write" work correctly within offload kernels. Both should work for amdgcn, but nvptx uses the libgfortran "minimal" mode which lacks "write" support. So, do those *.f90 testcases now FAIL with nvptx offloading? If yes, perhaps there should be effective target check that it is not nvptx offloading. Once the declare variant support is finished, at least in OpenMP it could be handled through that, but Fortran support for that will definitely not make GCC 10. Oops, I forgot to regenerate the patch before posting it. Here's the version with the nvptx xfails. OK? Andrew Add tests for print from offload target. 2019-11-14 Andrew Stubbs libgomp/ * testsuite/libgomp.c/target-print-1.c: New file. * testsuite/libgomp.fortran/target-print-1.f90: New file. * testsuite/libgomp.oacc-c/print-1.c: New file. * testsuite/libgomp.oacc-fortran/print-1.f90: New file. diff --git a/libgomp/testsuite/libgomp.c/target-print-1.c b/libgomp/testsuite/libgomp.c/target-print-1.c new file mode 100644 index 000..5857b875ced --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-print-1.c @@ -0,0 +1,17 @@ +/* Ensure that printf on the offload device works. */ + +/* { dg-do run } */ +/* { dg-output "The answer is 42(\n|\r\n|\r)+" } */ + +#include <stdio.h> + +int var = 42; + +int +main () +{ +#pragma omp target +{ + printf ("The answer is %d\n", var); +} +} diff --git a/libgomp/testsuite/libgomp.fortran/target-print-1.f90 b/libgomp/testsuite/libgomp.fortran/target-print-1.f90 new file mode 100644 index 000..c71a0952483 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-print-1.f90 @@ -0,0 +1,15 @@ +! Ensure that printf on the offload device works. + +! { dg-do run } +! { dg-output "The answer is 42(\n|\r\n|\r)+" } +!
{ dg-xfail-if "no write for nvidia" { openacc_nvidia_accel_selected } } + +program main + implicit none + integer :: var = 42 + +!$omp target + write (0, '("The answer is ", I2)') var +!$omp end target + +end program main diff --git a/libgomp/testsuite/libgomp.oacc-c/print-1.c b/libgomp/testsuite/libgomp.oacc-c/print-1.c new file mode 100644 index 000..593885b5c2c --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/print-1.c @@ -0,0 +1,17 @@ +/* Ensure that printf on the offload device works. */ + +/* { dg-do run } */ +/* { dg-output "The answer is 42(\n|\r\n|\r)+" } */ + +#include <stdio.h> + +int var = 42; + +int +main () +{ +#pragma acc parallel +{ + printf ("The answer is %d\n", var); +} +} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 new file mode 100644 index 000..a83280d1edb --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 @@ -0,0 +1,15 @@ +! Ensure that printf on the offload device works. + +! { dg-do run } +! { dg-output "The answer is 42(\n|\r\n|\r)+" } +! { dg-xfail-if "no write for nvidia" { openacc_nvidia_accel_selected } } + +program main + implicit none + integer :: var = 42 + +!$acc parallel + write (0, '("The answer is ", I2)') var +!$acc end parallel + +end program main
Re: [patch, libgomp] Add tests for print from offload target
On 11/14/19 5:47 PM, Andrew Stubbs wrote: This patch adds new libgomp tests to ensure that C "printf" and Fortran "write" work correctly within offload kernels. Both should work for amdgcn, but nvptx uses the libgfortran "minimal" mode which lacks "write" support. Can't you add something like: ! { dg-do run { target { ! { openacc_nvidia_accel_selected } } } } ! For openacc_nvidia_accel_selected, there is no I/O support. To avoid FAILs? Cheers, Tobias
[COMMITTED] Remove range_intersect, range_invert, and range_union.
range_intersect, range_union, and range_invert are currently returning their results by value. After Andrew's change, these should also return their results in an argument. However, if we do this, the functions become superfluous since we have corresponding API methods with the same functionality: - r = range_intersect (op1, op2); + r = op1; + r.intersect (op2); I have removed all 3 functions and have adjusted the code throughout. Committed as mostly obvious, after having consulted with Andrew that it was these and not the range_true* ones as well that needed adjusting. Aldy commit e0f55e7de91f779fe12ab65fc9479e4df0fe2614 Author: Aldy Hernandez Date: Thu Nov 14 17:55:32 2019 +0100 Remove range_intersect, range_invert, and range_union. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 051b10ed953..4266f6b1655 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2019-11-14 Aldy Hernandez + + * range-op.cc (*operator*::*range): Remove calls to + range_intersect, range_invert, and range_union in favor of calling + the in-place API methods. + (range_tests): Same. + * range.cc (range_intersect): Remove. + (range_union): Remove. + (range_invert): Remove. + * range.h (range_intersect): Remove. + (range_union): Remove. + (range_invert): Remove. + 2019-11-14 Ilya Leoshkevich PR rtl-optimization/92430 diff --git a/gcc/range-op.cc b/gcc/range-op.cc index ae3025c6eea..4a23cca3dbb 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -396,7 +396,8 @@ operator_equal::fold_range (value_range &r, tree type, { // If ranges do not intersect, we know the range is not equal, // otherwise we don't know anything for sure. - r = range_intersect (op1, op2); + r = op1; + r.intersect (op2); if (r.undefined_p ()) r = range_false (type); else @@ -415,7 +416,10 @@ operator_equal::op1_range (value_range &r, tree type, // If the result is false, the only time we know anything is // if OP2 is a constant.
if (wi::eq_p (op2.lower_bound(), op2.upper_bound())) - r = range_invert (op2); + { + r = op2; + r.invert (); + } else r.set_varying (type); break; @@ -476,7 +480,8 @@ operator_not_equal::fold_range (value_range &r, tree type, { // If ranges do not intersect, we know the range is not equal, // otherwise we don't know anything for sure. - r = range_intersect (op1, op2); + r = op1; + r.intersect (op2); if (r.undefined_p ()) r = range_true (type); else @@ -495,7 +500,10 @@ operator_not_equal::op1_range (value_range &r, tree type, // If the result is true, the only time we know anything is if // OP2 is a constant. if (wi::eq_p (op2.lower_bound(), op2.upper_bound())) - r = range_invert (op2); + { + r = op2; + r.invert (); + } else r.set_varying (type); break; @@ -1974,7 +1982,8 @@ operator_logical_or::fold_range (value_range &r, tree type ATTRIBUTE_UNUSED, if (empty_range_check (r, lh, rh)) return; - r = range_union (lh, rh); + r = lh; + r.union_ (rh); } bool @@ -2221,7 +2230,10 @@ operator_logical_not::fold_range (value_range &r, tree type, if (lh.varying_p () || lh.undefined_p ()) r = lh; else -r = range_invert (lh); +{ + r = lh; + r.invert (); +} gcc_checking_assert (lh.type() == type); return; } @@ -2232,10 +2244,9 @@ operator_logical_not::op1_range (value_range &r, const value_range &lhs, const value_range &op2 ATTRIBUTE_UNUSED) const { - if (lhs.varying_p () || lhs.undefined_p ()) -r = lhs; - else -r = range_invert (lhs); + r = lhs; + if (!lhs.varying_p () && !lhs.undefined_p ()) +r.invert (); return true; } @@ -3033,13 +3044,6 @@ range_tests () r1.union_ (r2); ASSERT_TRUE (r0 == r1); - // [10,20] U [30,40] ==> [10,20][30,40]. - r0 = value_range (INT (10), INT (20)); - r1 = value_range (INT (30), INT (40)); - r0.union_ (r1); - ASSERT_TRUE (r0 == range_union (value_range (INT (10), INT (20)), - value_range (INT (30), INT (40; - // Make sure NULL and non-NULL of pointer types work, and that // inverses of them are consistent. 
tree voidp = build_pointer_type (void_type_node); @@ -3049,27 +3053,12 @@ range_tests () r0.invert (); ASSERT_TRUE (r0 == r1); - // [10,20][30,40] U [25,70] => [10,70]. - r0 = range_union (value_range (INT (10), INT (20)), - value_range (INT (30), INT (40))); - r1 = value_range (INT (25), INT (70)); - r0.union_ (r1); - ASSERT_TRUE (r0 == range_union (value_range (INT (10), INT (20)), - value_range (INT (25), INT (70; - // [10,20] U [15, 30] => [10, 30]. r0 = value_range (INT (10), INT (20)); r1 = value_range (INT (15), INT (30)); r0.union_ (r1); ASSERT_TRUE (r0 == value_range (INT (10), INT (30))); - // [10,20] U [25,25] => [10,20][25,25]. - r0 = value_range (INT (10), INT (20)); - r1 = value_range (INT (25),
[PATCH] fold strncmp of unterminated arrays (PR 92501)
Adding tests for unsafe uses of unterminated constant char arrays in string functions exposed the limitation in strncmp folding described in PR 92501: GCC only folds strncmp calls involving nul-terminated constant strings. The attached patch improves the folder to also handle unterminated constant character arrays. This capability is in turn relied on for the dependable detection of unsafe uses of unterminated arrays in strncpy, a patch for which I'm about to post separately. Tested on x86_64-linux. Martin PR tree-optimization/92501 - strncmp with constant unterminated arrays not folded gcc/testsuite/ChangeLog: PR tree-optimization/92501 * gcc.dg/strcmpopt_7.c: New test. gcc/ChangeLog: PR tree-optimization/92501 * gimple-fold.c (gimple_fold_builtin_string_compare): Let strncmp handle unterminated arrays. Rename local variables for clarity. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 278253) +++ gcc/gimple-fold.c (working copy) @@ -2323,8 +2323,7 @@ gimple_load_first_char (location_t loc, tree str, return var; } -/* Fold a call to the str{n}{case}cmp builtin pointed by GSI iterator. - FCODE is the name of the builtin. */ +/* Fold a call to the str{n}{case}cmp builtin pointed by GSI iterator. */ static bool gimple_fold_builtin_string_compare (gimple_stmt_iterator *gsi) @@ -2337,18 +2336,19 @@ gimple_fold_builtin_string_compare (gimple_stmt_it tree str1 = gimple_call_arg (stmt, 0); tree str2 = gimple_call_arg (stmt, 1); tree lhs = gimple_call_lhs (stmt); - HOST_WIDE_INT length = -1; + tree len = NULL_TREE; + HOST_WIDE_INT bound = -1; /* Handle strncmp and strncasecmp functions. */ if (gimple_call_num_args (stmt) == 3) { - tree len = gimple_call_arg (stmt, 2); + len = gimple_call_arg (stmt, 2); if (tree_fits_uhwi_p (len)) - length = tree_to_uhwi (len); + bound = tree_to_uhwi (len); } /* If the LEN parameter is zero, return zero.
*/ - if (length == 0) + if (bound == 0) { replace_call_with_value (gsi, integer_zero_node); return true; } @@ -2361,9 +2361,32 @@ gimple_fold_builtin_string_compare (gimple_stmt_it return true; } - const char *p1 = c_getstr (str1); - const char *p2 = c_getstr (str2); + /* Initially set to the number of characters, including the terminating + nul if each array has one. Nx == strnlen(Sx, Nx) implies that + the array is not terminated by a nul. + For nul-terminated strings then adjusted to their length. */ + unsigned HOST_WIDE_INT len1 = HOST_WIDE_INT_MAX, len2 = len1; + const char *p1 = c_getstr (str1, &len1); + const char *p2 = c_getstr (str2, &len2); + /* The position of the terminating nul character if one exists, otherwise + a value greater than LENx. */ + unsigned HOST_WIDE_INT nulpos1 = HOST_WIDE_INT_MAX, nulpos2 = nulpos1; + + if (p1) +{ + nulpos1 = strnlen (p1, len1); + if (nulpos1 < len1) + len1 = nulpos1; +} + + if (p2) +{ + nulpos2 = strnlen (p2, len2); + if (nulpos2 < len2) + len2 = nulpos2; +} + /* For known strings, return an immediate value.
*/ if (p1 && p2) { @@ -2374,17 +2397,19 @@ gimple_fold_builtin_string_compare (gimple_stmt_it { case BUILT_IN_STRCMP: case BUILT_IN_STRCMP_EQ: - { - r = strcmp (p1, p2); - known_result = true; + if (len1 != nulpos1 || len2 != nulpos2) break; - } + + r = strcmp (p1, p2); + known_result = true; + break; + case BUILT_IN_STRNCMP: case BUILT_IN_STRNCMP_EQ: { - if (length == -1) + if (bound == -1) break; - r = strncmp (p1, p2, length); + r = strncmp (p1, p2, bound); known_result = true; break; } @@ -2394,9 +2419,9 @@ gimple_fold_builtin_string_compare (gimple_stmt_it break; case BUILT_IN_STRNCASECMP: { - if (length == -1) + if (bound == -1) break; - r = strncmp (p1, p2, length); + r = strncmp (p1, p2, bound); if (r == 0) known_result = true; break; @@ -2412,7 +2437,7 @@ gimple_fold_builtin_string_compare (gimple_stmt_it } } - bool nonzero_length = length >= 1 + bool nonzero_bound = bound >= 1 || fcode == BUILT_IN_STRCMP || fcode == BUILT_IN_STRCMP_EQ || fcode == BUILT_IN_STRCASECMP; @@ -2420,7 +2445,7 @@ gimple_fold_builtin_string_compare (gimple_stmt_it location_t loc = gimple_location (stmt); /* If the second arg is "", return *(const unsigned char*)arg1. */ - if (p2 && *p2 == '\0' && nonzero_length) + if (p2 && *p2 == '\0' && nonzero_bound) { gimple_seq stmts = NULL; tree var = gimple_load_first_char (loc, str1, &stmts); @@ -2435,7 +2460,7 @@ gimple_fold_builtin_string_compare (gimple_stmt_it } /* If the first arg is "", return -*(const unsigned char*)arg2. */ - if (p1 && *p1 == '\0' && nonzero_length) + if (p1 && *p1 == '\0' && nonzero_bound) { gimple_se
Re: [patch, libgomp] Add tests for print from offload target
On Thu, Nov 14, 2019 at 04:47:49PM +, Andrew Stubbs wrote: > This patch adds new libgomp tests to ensure that C "printf" and Fortran > "write" work correctly within offload kernels. Both should work for amdgcn, > but nvptx uses the libgfortran "minimal" mode which lacks "write" support. So, do those *.f90 testcases now FAIL with nvptx offloading? If yes, perhaps there should be effective target check that it is not nvptx offloading. Once the declare variant support is finished, at least in OpenMP it could be handled through that, but Fortran support for that will definitely not make GCC 10. Jakub
Re: [Patch, fortran] PR69654 - ICE in gfc_trans_structure_assign
On Thu, Nov 14, 2019 at 03:52:26PM +, Paul Richard Thomas wrote: > As I remarked in PR, this fix probably comes 1,379 days too late. I am > not at all sure that I understand why I couldn't see the problem > because it is rather trivial. > > I am open to not adding the second gcc_assert - it does seem like overkill. > > Regtested on FC30/x86_64 - OK for trunk and ? > Yes. 7-branch is now closed. So, if you're inclined to backport then it is also ok for 8 and 9 branches after testing. -- Steve
[PATCH] libstdc++: Implement new predicate concepts from P1716R3
* include/bits/iterator_concepts.h (__iter_concept_impl): Add comments. (indirect_relation): Rename to indirect_binary_predicate and adjust definition as per P1716R3. (indirect_equivalence_relation): Define. (indirectly_comparable): Adjust definition. * include/std/concepts (equivalence_relation): Define. * testsuite/std/concepts/concepts.callable/relation.cc: Add tests for equivalence_relation. Tested powerpc64le-linux, committed to trunk. commit 01e0b14116ce56d4327362686334c37272faac43 Author: Jonathan Wakely Date: Wed Nov 13 22:26:06 2019 + libstdc++: Implement new predicate concepts from P1716R3 * include/bits/iterator_concepts.h (__iter_concept_impl): Add comments. (indirect_relation): Rename to indirect_binary_predicate and adjust definition as per P1716R3. (indirect_equivalence_relation): Define. (indirectly_comparable): Adjust definition. * include/std/concepts (equivalence_relation): Define. * testsuite/std/concepts/concepts.callable/relation.cc: Add tests for equivalence_relation. diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h index 7cc058eb8c9..90a8bc8071f 100644 --- a/libstdc++-v3/include/bits/iterator_concepts.h +++ b/libstdc++-v3/include/bits/iterator_concepts.h @@ -420,20 +420,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace __detail { template - struct __iter_concept_impl - { }; + struct __iter_concept_impl; +// ITER_CONCEPT(I) is ITER_TRAITS(I)::iterator_concept if that is valid. template requires requires { typename __iter_traits<_Iter>::iterator_concept; } struct __iter_concept_impl<_Iter> { using type = typename __iter_traits<_Iter>::iterator_concept; }; +// Otherwise, ITER_TRAITS(I)::iterator_category if that is valid. 
template requires (!requires { typename __iter_traits<_Iter>::iterator_concept; } && requires { typename __iter_traits<_Iter>::iterator_category; }) struct __iter_concept_impl<_Iter> { using type = typename __iter_traits<_Iter>::iterator_category; }; +// Otherwise, random_access_tag if iterator_traits is not specialized. template requires (!requires { typename __iter_traits<_Iter>::iterator_concept; } && !requires { typename __iter_traits<_Iter>::iterator_category; } @@ -441,7 +443,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct __iter_concept_impl<_Iter> { using type = random_access_iterator_tag; }; -// ITER_TRAITS +// Otherwise, there is no ITER_CONCEPT(I) type. +template + struct __iter_concept_impl + { }; + +// ITER_CONCEPT template using __iter_concept = typename __iter_concept_impl<_Iter>::type; } // namespace __detail @@ -615,15 +622,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION && predicate<_Fn&, iter_reference_t<_Iter>> && predicate<_Fn&, iter_common_reference_t<_Iter>>; - template -concept indirect_relation = readable<_I1> && readable<_I2> + template +concept indirect_binary_predicate = readable<_I1> && readable<_I2> && copy_constructible<_Fn> - && relation<_Fn&, iter_value_t<_I1>&, iter_value_t<_I2>&> - && relation<_Fn&, iter_value_t<_I1>&, iter_reference_t<_I2>> - && relation<_Fn&, iter_reference_t<_I1>, iter_value_t<_I2>&> - && relation<_Fn&, iter_reference_t<_I1>, iter_reference_t<_I2>> - && relation<_Fn&, iter_common_reference_t<_I1>, - iter_common_reference_t<_I2>>; + && predicate<_Fn&, iter_value_t<_I1>&, iter_value_t<_I2>&> + && predicate<_Fn&, iter_value_t<_I1>&, iter_reference_t<_I2>> + && predicate<_Fn&, iter_reference_t<_I1>, iter_value_t<_I2>&> + && predicate<_Fn&, iter_reference_t<_I1>, iter_reference_t<_I2>> + && predicate<_Fn&, iter_common_reference_t<_I1>, + iter_common_reference_t<_I2>>; + + template +concept indirect_equivalence_relation = readable<_I1> && readable<_I2> + && copy_constructible<_Fn> + && equivalence_relation<_Fn&, 
iter_value_t<_I1>&, iter_value_t<_I2>&> + && equivalence_relation<_Fn&, iter_value_t<_I1>&, iter_reference_t<_I2>> + && equivalence_relation<_Fn&, iter_reference_t<_I1>, iter_value_t<_I2>&> + && equivalence_relation<_Fn&, iter_reference_t<_I1>, + iter_reference_t<_I2>> + && equivalence_relation<_Fn&, iter_common_reference_t<_I1>, + iter_common_reference_t<_I2>>; template concept indirect_strict_weak_order = readable<_I1> && readable<_I2> @@ -767,7 +785,8 @@ namespace ranges template concept indirectly_comparable - = indirect_relation<_Rel, projected<_I1, _P1>, projected<_I2, _P2>>; + = indirect_binary_predicate<_Rel, projected<_I1, _P1>, +
[PATCH] libstdc++: Rename disable_sized_sentinel [P1871R1]
* include/bits/iterator_concepts.h (disable_sized_sentinel): Rename to disable_sized_sentinel_for. * testsuite/24_iterators/headers/iterator/synopsis_c++20.cc: Adjust. Tested powerpc64le-linux, committed to trunk. commit 3a7f3e87680cc0ca20318b0983d517cd32851fc5 Author: Jonathan Wakely Date: Wed Nov 13 22:27:59 2019 + libstdc++: Rename disable_sized_sentinel [P1871R1] * include/bits/iterator_concepts.h (disable_sized_sentinel): Rename to disable_sized_sentinel_for. * testsuite/24_iterators/headers/iterator/synopsis_c++20.cc: Adjust. diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h index 8b398616a56..7cc058eb8c9 100644 --- a/libstdc++-v3/include/bits/iterator_concepts.h +++ b/libstdc++-v3/include/bits/iterator_concepts.h @@ -524,11 +524,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION && __detail::__weakly_eq_cmp_with<_Sent, _Iter>; template -inline constexpr bool disable_sized_sentinel = false; +inline constexpr bool disable_sized_sentinel_for = false; template concept sized_sentinel_for = sentinel_for<_Sent, _Iter> -&& !disable_sized_sentinel, remove_cv_t<_Iter>> +&& !disable_sized_sentinel_for, remove_cv_t<_Iter>> && requires(const _Iter& __i, const _Sent& __s) { { __s - __i } -> same_as>; diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 411feba90e0..a707621c9ed 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -449,9 +449,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION # if __cplusplus > 201703L && defined __cpp_lib_concepts template requires (!sized_sentinel_for<_Iterator1, _Iterator2>) -inline constexpr bool disable_sized_sentinel, -reverse_iterator<_Iterator2>> - = true; +inline constexpr bool +disable_sized_sentinel_for, + reverse_iterator<_Iterator2>> = true; # endif // C++20 # endif // C++14 diff --git a/libstdc++-v3/testsuite/24_iterators/headers/iterator/synopsis_c++20.cc 
b/libstdc++-v3/testsuite/24_iterators/headers/iterator/synopsis_c++20.cc index 824b0b4f38c..fb3bb420a54 100644 --- a/libstdc++-v3/testsuite/24_iterators/headers/iterator/synopsis_c++20.cc +++ b/libstdc++-v3/testsuite/24_iterators/headers/iterator/synopsis_c++20.cc @@ -79,7 +79,7 @@ namespace std } struct I { }; -template<> constexpr bool std::disable_sized_sentinel = true; +template<> constexpr bool std::disable_sized_sentinel_for = true; namespace __gnu_test { @@ -87,8 +87,8 @@ namespace __gnu_test constexpr auto* iter_move = &std::ranges::iter_move; constexpr auto* iter_swap = &std::ranges::iter_swap; // sized sentinels - constexpr bool const* disable_sized_sentinel -= &std::disable_sized_sentinel; + constexpr bool const* disable_sized_sentinel_for += &std::disable_sized_sentinel_for; // default sentinels constexpr std::default_sentinel_t const* default_sentinel = &std::default_sentinel;
[patch, libgomp] Add tests for print from offload target
Hi, This patch adds new libgomp tests to ensure that C "printf" and Fortran "write" work correctly within offload kernels. Both should work for amdgcn, but nvptx uses the libgfortran "minimal" mode which lacks "write" support. Obviously, printing from offload kernels is not recommended in production, but can be useful in development. OK to commit? Thanks Andrew Add tests for print from offload target. 2019-11-14 Andrew Stubbs libgomp/ * testsuite/libgomp.c/target-print-1.c: New file. * testsuite/libgomp.fortran/target-print-1.f90: New file. * testsuite/libgomp.oacc-c/print-1.c: New file. * testsuite/libgomp.oacc-fortran/print-1.f90: New file. diff --git a/libgomp/testsuite/libgomp.c/target-print-1.c b/libgomp/testsuite/libgomp.c/target-print-1.c new file mode 100644 index 000..5857b875ced --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-print-1.c @@ -0,0 +1,17 @@ +/* Ensure that printf on the offload device works. */ + +/* { dg-do run } */ +/* { dg-output "The answer is 42(\n|\r\n|\r)+" } */ + +#include <stdio.h> + +int var = 42; + +int +main () +{ +#pragma omp target +{ + printf ("The answer is %d\n", var); +} +} diff --git a/libgomp/testsuite/libgomp.fortran/target-print-1.f90 b/libgomp/testsuite/libgomp.fortran/target-print-1.f90 new file mode 100644 index 000..73ee09a2b79 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-print-1.f90 @@ -0,0 +1,14 @@ +! Ensure that printf on the offload device works. + +! { dg-do run } +! { dg-output "The answer is 42(\n|\r\n|\r)+" } + +program main + implicit none + integer :: var = 42 + +!$omp target + write (0, '("The answer is ", I2)') var +!$omp end target + +end program main diff --git a/libgomp/testsuite/libgomp.oacc-c/print-1.c b/libgomp/testsuite/libgomp.oacc-c/print-1.c new file mode 100644 index 000..593885b5c2c --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/print-1.c @@ -0,0 +1,17 @@ +/* Ensure that printf on the offload device works.
*/ + +/* { dg-do run } */ +/* { dg-output "The answer is 42(\n|\r\n|\r)+" } */ + +#include <stdio.h> + +int var = 42; + +int +main () +{ +#pragma acc parallel +{ + printf ("The answer is %d\n", var); +} +} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 new file mode 100644 index 000..bef664df4fa --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 @@ -0,0 +1,14 @@ +! Ensure that printf on the offload device works. + +! { dg-do run } +! { dg-output "The answer is 42(\n|\r\n|\r)+" } + +program main + implicit none + integer :: var = 42 + +!$acc parallel + write (0, '("The answer is ", I2)') var +!$acc end parallel + +end program main