Re: [PATCH] c++: private inheritance access diagnostics fix [PR17314]

2021-01-08 Thread Anthony Sharp via Gcc-patches
Hi Jason,

Thank you!

> To start with, do you have a copyright assignment on file or in the
> works already?

Good point. I incorrectly assumed it would only be a minor
contribution copyright-wise. Mr Edelsohn gave me a template which I've
now filled out and sent to ass...@gnu.org. I'm assuming I just need to
wait for them to send me the form. I'll update this thread when that's
sorted. In the meantime I've hopefully fixed some of the issues.

> Second, your patch was mangled by word wrap so that it can't be applied
> without manual repair.  If you can't prevent word wrap in your mail
> client, please send it as an attachment rather than inline.

Oh yes I see where it's gotten mangled now. I'm attaching it as a
.patch file (I assume that's okay).

> Also, there are a few whitespace issues in the patch; please run
> contrib/check_GNU_style.sh on the patch before submitting.

Should be all fixed now (there is one style issue left but it's a
false positive). Visual Studio Code was lying to me about what the
file looks like so if there are any more formatting issues please let
me know.

> If you use contrib/gcc-git-customization.sh and then git
> gcc-commit-mklog you don't need to touch ChangeLog files at all, just
> adjust the generated ChangeLog entries in the git commit message.  I
> personally tend to commit first with a placeholder message and then use
> git gcc-commit-mklog --amend to generate the ChangeLog entries.

Wouldn't that require read-write access? (Just from looking here
https://gcc.gnu.org/gitwrite.html.)

> Probably.  Can you use sort/uniq/diff on the .sum testsuite output to
> determine which passes are missing in the patched sources?

According to contrib/dg-cmp-results.sh ...

I get a bunch of these weird NA->PASSes (and vice-versa), for example:

PASS->NA: g++.dg/modules/alias-1_a.H module-cmi
(gcm.cache/home/anthony/Desktop/GCC/builds_and_source/source_clean/gcc/testsuite/g++.dg/modules/alias-1_a.H.gcm)
NA->PASS: g++.dg/modules/alias-1_a.H module-cmi
(gcm.cache/home/anthony/Desktop/GCC/builds_and_source/source_pr17314/gcc/testsuite/g++.dg/modules/alias-1_a.H.gcm)
PASS->NA: g++.dg/modules/alias-1_a.H module-cmi
(gcm.cache/home/anthony/Desktop/GCC/builds_and_source/source_clean/gcc/testsuite/g++.dg/modules/alias-1_a.H.gcm)
NA->PASS: g++.dg/modules/alias-1_a.H module-cmi
(gcm.cache/home/anthony/Desktop/GCC/builds_and_source/source_pr17314/gcc/testsuite/g++.dg/modules/alias-1_a.H.gcm)

They're weird because I haven't actually touched those files (so I'm
assuming this is normal). There are about 400 of those and they're
all .gcm files. They seem to balance out.

dr142.C reports:

NA->PASS: g++.dg/tc1/dr142.C  -std=c++14  (test for warnings, line 11)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++14  (test for warnings, line 5)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++14  (test for warnings, line 7)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++14  (test for warnings, line 8)
NA->PASS: g++.dg/tc1/dr142.C  -std=c++17  (test for warnings, line 11)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++17  (test for warnings, line 5)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++17  (test for warnings, line 7)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++17  (test for warnings, line 8)
NA->PASS: g++.dg/tc1/dr142.C  -std=c++2a  (test for warnings, line 11)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++2a  (test for warnings, line 5)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++2a  (test for warnings, line 7)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++2a  (test for warnings, line 8)
NA->PASS: g++.dg/tc1/dr142.C  -std=c++98  (test for warnings, line 11)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++98  (test for warnings, line 5)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++98  (test for warnings, line 7)
PASS->NA: g++.dg/tc1/dr142.C  -std=c++98  (test for warnings, line 8)

In other words, there are 12 PASS->NAs and 4 NA->PASSes in this file,
meaning a net change of -8 (which explains why there are eight fewer).
My other changes also report PASS->NAs and vice-versa, but for those
the number of new NAs equals the number of new PASSes, so they don't
cause a change in quantity.
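
The tally above can be reproduced mechanically. The following is only a minimal C++ sketch of the counting, not part of contrib/dg-cmp-results.sh; the names `TransitionCounts`, `count_transitions`, `net_pass_change` and `dr142_transitions` are made up for illustration, and the sample data is rebuilt from the dr142.C lines quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tallies of the two transition kinds printed by a results comparison.
struct TransitionCounts {
  int pass_to_na;  // lines tagged "PASS->NA:"
  int na_to_pass;  // lines tagged "NA->PASS:"
};

// True if `line` begins with `prefix`.
bool starts_with(const std::string& line, const std::string& prefix) {
  return line.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: classify each line by its leading tag.
TransitionCounts count_transitions(const std::vector<std::string>& lines) {
  TransitionCounts counts = {0, 0};
  for (const std::string& line : lines) {
    if (starts_with(line, "PASS->NA:"))
      ++counts.pass_to_na;
    else if (starts_with(line, "NA->PASS:"))
      ++counts.na_to_pass;
  }
  return counts;
}

// Net change in the number of PASS results.
int net_pass_change(const TransitionCounts& counts) {
  return counts.na_to_pass - counts.pass_to_na;
}

// The dr142.C transitions quoted above: for each -std= value, one new
// PASS (line 11) and three lost PASSes (lines 5, 7 and 8).
std::vector<std::string> dr142_transitions() {
  std::vector<std::string> lines;
  for (const char* std_flag : {"c++14", "c++17", "c++2a", "c++98"}) {
    const std::string test =
        std::string("g++.dg/tc1/dr142.C  -std=") + std_flag;
    lines.push_back("NA->PASS: " + test + "  (test for warnings, line 11)");
    for (const char* line_no : {"5", "7", "8"})
      lines.push_back("PASS->NA: " + test + "  (test for warnings, line " +
                      line_no + ")");
  }
  return lines;
}
```

Calling `net_pass_change (count_transitions (dr142_transitions ()))` gives the net figure for this one file; the same helper could be fed the full comparison output to check that the other files balance out.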

Thanks for being patient with me. I'll let you know when I've
completed the forms.

Also if I need to adjust the .patch to deal with the changelogs issue
please let me know.

Kind regards,
Anthony
Index: gcc/testsuite/g++.old-deja/g++.jason/access8.C
===
--- gcc/testsuite/g++.old-deja/g++.jason/access8.C	2020-12-31 16:51:34.0 +
+++ gcc/testsuite/g++.old-deja/g++.jason/access8.C	2021-01-03 00:22:14.969854000 +
@@ -4,5 +4,5 @@
 // Bug: g++ forgets access decls after the definition.
 
-class inh { // { dg-message "" } inaccessible
+class inh { 
 int a;
 protected:
@@ -10,5 +10,6 @@ protected:
 };
 
-class mel : private inh {
+class mel : private inh // { dg-message "" } inaccessible
+{
 protected:
 int t;
Index: gcc/testsuite/g++.old-deja/g++.law/access4.C

Re: [PATCH] issue -Wstring-compare for member arrays (PR 98097)

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/7/21 5:53 PM, Martin Sebor via Gcc-patches wrote:
> In PR 98097 Richard expects -Wstring-compare for a call to strcmp()
> with a member array and a string literal of larger size, used in
> an equality test.
>
> In virtually all cases the test will indicate the two are unequal
> because the string stored in the member must be shorter (to fit
> the terminating nul), but GCC doesn't fold the result because
> there's wicked code out there that treats whole aggregates as if
> they were strings, up their full size.  Because the warning is
> based on the same conservative assumptions as the optimization,
> it doesn't trigger, letting the almost certain bug go unnoticed.
>
> The attached patch allows -Wstring-compare to trigger for these
> bugs by partly decoupling the warning from the underlying strcmp
> optimization.  Making this possible requires adding a new member
> to the c_strlen_data struct, which in turn called for changing
> the meaning of the existing decl member to nonstr.  That led to
> changes elsewhere, simply to adjust to the name change.  For
> the purposes of review, the meat of the warning changes is in
> tree-ssa-strlen.c.  All the rest of changes simply adjust code
> to the new name.
>
> Tested on x86_64-linux (None of Binutils, GDB, Glibc, or Valgrind
> triggers any instances of the warning with this change.)
>
> Martin
>
> gcc-98097.diff
>
> PR middle-end/98097 - missing -Wstring-compare with a member array
>
> gcc/ChangeLog:
>
>   PR middle-end/98097
>   * builtins.c (unterminated_array): Adjust to a name change.  Adjust
>   indentation.
>   (c_strlen): Use a member instead of a local variable.
>   (expand_builtin_stpcpy_1): Adjust to a name change.
>   (fold_builtin_strlen): Same.
>   * builtins.h (struct c_strlen_data::nonstr): New data member to use
>   instead of decl.
>   (struct c_strlen_data::decl): Adjust comment.
>   * gimple-fold.c (get_range_strlen_tree): Set c_strlen_data::nonstr
>   in addition to c_strlen_data::decl.
>   (get_maxval_strlen): Adjust to a name change.
>   (gimple_fold_builtin_stpcpy): Same.
>   (gimple_fold_builtin_strlen): Same.
>   * gimple-ssa-sprintf.c (get_string_length): Same.
>   * tree-ssa-strlen.c (get_range_strlen_dynamic): Same.  Also set
>   struct c_strlen_data::decl.
>   (get_len_or_size): Use c_strlen_data::decl.  Succeed even for
>   nonconstant member arrays.
>   (strxcmp_eqz_result): Handle member arrays.
>   (handle_builtin_string_cmp): Issue warnings for member arrays.
>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/98097
>   * gcc.dg/Wstring-compare.c:
>   * gcc.dg/strcmpopt_10.c:
>   * gcc.dg/Wstring-compare-4.c: New test.
>   * gcc.dg/Wstring-compare-5.c: New test.
I think you need to update the function comment for get_len_or_size to
describe the special case where we make the range invalid and inverted.

OK with that change.

jeff
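
For readers who have not looked at the PR, the kind of call under discussion can be sketched as below. This is an illustration only, not the testcase from the patch: `Record` and `is_hello` are invented names, and whether -Wstring-compare actually fires here depends on the GCC version and options in use.

```cpp
#include <cassert>
#include <cstring>

// A member array that can hold at most 3 characters plus the
// terminating nul.
struct Record {
  char tag[4];
};

// The literal needs 6 bytes (5 characters plus the nul), so it cannot
// compare equal to any properly nul-terminated string stored in tag[4].
// An equality test like this is almost certainly a bug.
bool is_hello(const Record& r) {
  return std::strcmp(r.tag, "hello") == 0;
}
```

With a properly terminated `tag`, `is_hello` always returns false, which is the "virtually all cases" situation described in the quoted message.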



[PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.

2021-01-08 Thread Iain Sandoe
Hi,

The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front
of the immediate rotation quantity.

Although, it seems, GAS is able to infer the # (or is lenient about
its absence), assemblers based on the LLVM back end expect it and error out.

tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental)

OK for master?
thanks
Iain

gcc/ChangeLog:

* config/aarch64/aarch64.md (_rol3): Add a '#'
mark in front of the immediate quantity.
(_rolsi3_uxtw): Likewise.
---
gcc/config/aarch64/aarch64.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 45d9c6ac45a..e0de82c938a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4416,7 +4416,7 @@
  (match_operand:QI 2 "aarch64_shift_imm_" "n"))
 (match_operand:GPI 3 "register_operand" "r")))]
  ""
-  "\\t%0, %3, %1, ror ( - %2)"
+  "\\t%0, %3, %1, ror #( - %2)"
  [(set_attr "type" "logic_shift_imm")]
)

@@ -4441,7 +4441,7 @@
  (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
 (match_operand:SI 3 "register_operand" "r"]
  ""
-  "\\t%w0, %w3, %w1, ror (32 - %2)"
+  "\\t%w0, %w3, %w1, ror #(32 - %2)"
  [(set_attr "type" "logic_shift_imm")]
)

-- 
2.24.1


Re: [PATCH] ira: Skip some pseudos in move_unallocated_pseudos

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/5/21 8:12 PM, Kewen.Lin wrote:
> on 2021/1/6 上午2:19, Jeff Law wrote:
>>
>> On 1/4/21 7:36 PM, Kewen.Lin wrote:
>>> Hi Jeff,
>>>
>>> on 2021/1/5 上午7:13, Jeff Law wrote:
 On 12/22/20 11:40 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Segher,
>
> on 2020/12/22 下午9:55, Segher Boessenkool wrote:
>> Hi!
>>
>> Just a dumb formatting comment:
>>
>> On Tue, Dec 22, 2020 at 04:05:39PM +0800, Kewen.Lin wrote:
>>> This patch is to make move_unallocated_pseudos consistent
>>> to what we have in function find_moveable_pseudos, where we
>>> record the original pseudo into pseudo_replaced_reg only if
>>> validate_change succeeds with newreg.  To ensure every
>>> unallocated pseudo in move_unallocated_pseudos has expected
>>> information, it's better to add a check and skip it if it's
>>> unexpected.  This avoids possible ICEs in future.
>>>
>>> btw, I happened to find this in the bootstrapping for one
>>> experimental local patch, which is considered as impractical.
>>> --- a/gcc/ira.c
>>> +++ b/gcc/ira.c
>>> @@ -5111,6 +5111,11 @@ move_unallocated_pseudos (void)
>>>{
>>> int idx = i - first_moveable_pseudo;
>>> rtx other_reg = pseudo_replaced_reg[idx];
>>> +   /* If there is no appropriate pseudo in pseudo_replaced_reg, it
>>> +  means validate_change fails for this new pseudo in function
>>> +  find_moveable_pseudos, then bypass it here.*/
>> Dot space space.
> Good catch, thanks!  I forgot to reformat after polishing the comments.
> Will fix it with other potential comments.
>
>> The patch sounds fine to me.  Hard to tell without seeing the patch that
>> exposed the problem (for onlookers like me who do not know this code
>> well, anyway ;-) )
> The patch which made this issue exposed looks like:
>
> +; Like *rotl3_insert_3 but work with nonzero_bits rather than
> +; explicit AND.
> +(define_insn "*rotl3_insert_8"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> +(ior:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "u6bit_cint_operand" 
> "n"))
> + (match_operand:GPR 3 "gpc_reg_operand" "0")))]
> +  "HOST_WIDE_INT_1U << INTVAL (operands[2])
> +   > nonzero_bits (operands[3], mode)"
> +{
> +  if (mode == SImode)
> +return "rlwimi %0,%1,%h2,0,31-%h2";
> +  else
> +return "rldimi %0,%1,%H2,0";
> +}
> +  [(set_attr "type" "insert")])
>
> Some insn matches this pattern in combine, later ira tries to introduce
> one new pseudo since it meets the checks in find_moveable_pseudos, but
> it fails in the call to validate_change since the nonzero_bits is more
> rough and can't satisfy the pattern condition, leaving the unexpected
> entry in pseudo_replaced_reg.
 But what doesn't make any sense to me is pseudo_replaced_reg[] is only
 set when validation is successful in find_moveable_pseudos.   So I can't
 see how this patch actually helps the problem you're describing.

>>> Yeah, pseudo_replaced_reg[] is only set when validation is successful,
>>> but we bump the max pseudo number in ira_create_new_reg as below
>>> regardless of whether validation succeeds or not:
>>>
>>>   rtx newreg = ira_create_new_reg (def_reg);
>>>   if (validate_change (def_insn, DF_REF_REAL_LOC (def), newreg, 0))
>>>
>>> Later in move_unallocated_pseudos, the iterating could cover those
>>> pseudos which were created but not used due to failed validation.
>>>
>>>   for (i = first_moveable_pseudo; i < last_moveable_pseudo; i++)
>>> if (reg_renumber[i] < 0)
>>>   {
>>> int idx = i - first_moveable_pseudo;
>>> rtx other_reg = pseudo_replaced_reg[idx];// (1)
>>> rtx_insn *def_insn = DF_REF_INSN (DF_REG_DEF_CHAIN (i));
>>> /* The use must follow all definitions of OTHER_REG, so we can
>>>insert the new definition immediately after any of them.  */
>>> df_ref other_def = DF_REG_DEF_CHAIN (REGNO (other_reg))
>>>
>>> Then we can get the NULL other_reg in (1), also have unexpected df info
>>> which causes ICE.  The patch skips the handlings on those pseudos which
>>> were intended to be used in validation INSN but failed to.
>> I was wondering if it was somehow related to creation of new pseudos. 
>> The other important tidbit here is we reset last_moveable_pseudo near the
>> end of find_moveable_pseudos.
> Yeah, the iterating will scan all new pseudos created in 
> find_moveable_pseudos,
> the problem occurs on those ones that fail to validate.
>
>> OK for the trunk with an expanded comment.
> Thanks!  Does the attached new version look good to you?
Yes.  Thanks.
jeff
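
The failure mode discussed in this thread can be modeled outside GCC. The sketch below is a toy under stated assumptions, not the ira.c code: `Reg`, `count_handled` and `sample_handled` are invented for illustration, a null pointer stands in for a pseudo_replaced_reg[] slot that was never filled because validate_change failed, and a negative reg_renumber entry stands in for an unallocated pseudo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an rtx register; a null pointer plays the role of an
// empty pseudo_replaced_reg[] slot.
struct Reg {
  int regno;
};

// Count the pseudos that would actually be handled.  Slot idx models
// pseudo first_moveable_pseudo + idx.
int count_handled(const std::vector<const Reg*>& pseudo_replaced_reg,
                  const std::vector<int>& reg_renumber) {
  assert(pseudo_replaced_reg.size() == reg_renumber.size());
  int handled = 0;
  for (std::size_t idx = 0; idx < pseudo_replaced_reg.size(); ++idx) {
    if (reg_renumber[idx] >= 0)
      continue;  // pseudo was allocated; nothing to move back
    const Reg* other_reg = pseudo_replaced_reg[idx];
    if (other_reg == nullptr)
      continue;  // created, but never recorded: skip it
    ++handled;   // safe to use other_reg->regno from here on
  }
  return handled;
}

// Sample: three new pseudos; the middle one was created but never
// recorded, and the last one was allocated.
int sample_handled() {
  static const Reg r10 = {10};
  static const Reg r12 = {12};
  const std::vector<const Reg*> replaced = {&r10, nullptr, &r12};
  const std::vector<int> renumber = {-1, -1, 3};
  return count_handled(replaced, renumber);
}
```

Without the null check, the middle slot is the one that would be dereferenced, which corresponds to the NULL other_reg at (1) in the quoted explanation.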



Re: [PATCH, libstdc++] GLIBCXX_HAVE_INT64_T

2021-01-08 Thread David Edelsohn via Gcc-patches
On Fri, Jan 8, 2021 at 1:52 PM Jakub Jelinek  wrote:
>
> On Fri, Jan 08, 2021 at 06:37:03PM +, Jonathan Wakely wrote:
> > This uses __INT64_TYPE__ if that's defined, and long long otherwise. I
> > think that should be equivalent in all practical cases (I can imagine
> > some strange target where __INT64_TYPE__ is defined by the compiler,
> > but int64_t isn't defined when the configure checks look for it, and
> > so the current code would use long long and with my patch would use
> > __INT64_TYPE__ which could be long ... but I think in practice that's
> > unlikely. It was probably more likely in older releases where the
> > configure test would have been done with -std=gnu++98 and so int64_t
> > might not have been declared by libc's , but if that was the
> > case then any ABI break it caused happened years ago.
>
> Does clang and ICC define __INT64_TYPE__ (at least on most architectures)
> and does it match what gcc defines it to?

Clang (at least back to 3.0.0) and ICC (at least back to 16.0.0)
define __INT64_TYPE__.  If the value is not compatible with the target
__int64_t type (matching GCC), there presumably are deeper problems.

Thanks, David
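
The selection being discussed can be sketched as follows. This is a minimal illustration rather than the libstdc++ patch itself; the typedef name `sketch_int64` is made up for the example.

```cpp
#include <climits>

// Prefer the compiler-provided 64-bit type when the macro is defined,
// and fall back to long long otherwise.
#ifdef __INT64_TYPE__
typedef __INT64_TYPE__ sketch_int64;
#else
typedef long long sketch_int64;
#endif

// Either branch is expected to yield a signed 64-bit type on the
// targets of practical interest.
static_assert(sizeof(sketch_int64) * CHAR_BIT == 64,
              "expected a 64-bit type");
static_assert(static_cast<sketch_int64>(-1) < 0,
              "expected a signed type");
```

On a target where __INT64_TYPE__ is long, the two branches name distinct types of the same width, which is where the ABI concern raised above would come from.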


Re: Pointer width in GCC?

2021-01-08 Thread H.J. Lu via Gcc-patches
On Fri, Jan 8, 2021 at 12:15 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 1/8/21 12:55 PM, Qing Zhao via Gcc-patches wrote:
> > Hi,
> >
> > Is there a utility routine in GCC to query the pointer width of the
> > current target? Whether it’s 32bit pointer or 64 bit pointer for the target?
> >
> > Thanks a lot for the help.
> You can look at the GET_MODE_SIZE (Pmode)  or POINTER_SIZE.  They can
> differ in some circumstances.
>

It is ptr_mode vs Pmode.  ptr_mode is the software pointer mode.  Pmode
is the hardware pointer mode.  They can be different.


-- 
H.J.


Re: [PATCH] Add pytest for a GCOV test-case

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/7/21 9:14 AM, Martin Liška wrote:
> On 1/6/21 12:36 AM, Jeff Law wrote:
>> unresolved "could not find python interpreter $testcase" in
>> run-gcov-pytest if you find the right magic in the output of your spawn.
>
> Achieved that with the updated patch.
>
> Ready for master?
> Thanks,
> Martin
>
> 0001-Add-pytest-for-a-GCOV-test-case.patch
>
> From 53f5169156044acf8ecec498aa89d6be44c7173a Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 21 Dec 2020 09:14:28 +0100
> Subject: [PATCH] Add pytest for a GCOV test-case
>
> gcc/testsuite/ChangeLog:
>
>   PR gcov-profile/98273
>   * lib/gcov.exp: Add run-gcov-pytest function which runs pytest.
>   * g++.dg/gcov/pr98273.C: New test.
>   * g++.dg/gcov/gcov.py: New test.
>   * g++.dg/gcov/test-pr98273.py: New test.
OK
jeff



Re: Pointer width in GCC?

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/8/21 12:55 PM, Qing Zhao via Gcc-patches wrote:
> Hi,
>
> Is there a utility routine in GCC to query the pointer width of the current
> target? Whether it’s 32bit pointer or 64 bit pointer for the target?
>
> Thanks a lot for the help.
You can look at the GET_MODE_SIZE (Pmode)  or POINTER_SIZE.  They can
differ in some circumstances.


jeff



Re: [PATCH 4/4] VAX: Remove a duplicate `cc' mode attribute

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/7/21 6:51 PM, Maciej W. Rozycki wrote:
> Remove the `cc' mode attribute that duplicates the implicitly defined 
> `mode' attribute.  No change to semantics.
>
>   gcc/
>   * config/vax/vax.md (cc): Remove mode attribute.
>   (subst_, subst_f): Rename to...
>   (subst_, subst_f): ... these respectively.
>   (*cbranch4_): Update for `cc' removal.
>   (*cbranch4_): Likewise.
>   (*branch_, *branch__reversed): Likewise.
OK
jeff



Re: [PATCH 3/4] VAX: Use a mode with `const_double_zero' expressions

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/7/21 6:50 PM, Maciej W. Rozycki wrote:
> For predictable semantics propagate the mode from operands referred by 
> the FP substitution to the `const_double_zero' expressions used with the 
> associated condition code calculation.  Use an iterator to make copies 
> of the FP substitution across the FP modes supported as the substitution 
> now has to match the mode of the operands.
>
>   gcc/
>   * config/pdp11/pdp11.md (subst_f): Add mode to operands and 
>   `const_double_zero'.
OK
jeff



Re: [PATCH 1/4] RTL: Update `const_double_zero' handling for mode and callable insns

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/7/21 6:50 PM, Maciej W. Rozycki wrote:
> Handle machine mode specification with `const_double_zero' and handle 
> the rtx with callable code produced from named insns.  Complementing 
> commit 20ab43b5cad6 ("RTL: Add `const_double_zero' syntactic rtx") and 
> removing a commit c60d0736dff7 ("PDP11: Use `const_double_zero' to 
> express double zero constant") build regression observed with the 
> `pdp11-aout' target:
>
> genemit: Internal error: abort in gen_exp, at genemit.c:202
> make[2]: *** [Makefile:2427: s-emit] Error 1
>
> where a:
>
> (const_double 0 [0] 0 [0] 0 [0] 0 [0])
>
> rtx coming from:
>
> (parallel [
> (set (reg:CC 16)
> (compare:CC (abs:DF (match_operand:DF 1 ("general_operand") 
> ("0,0")))
> (const_double 0 [0] 0 [0] 0 [0] 0 [0])))
> (set (match_operand:DF 0 ("nonimmediate_operand") ("=fR,Q"))
> (abs:DF (match_dup 1)))
> ])
>
> and ultimately `(const_double_zero)' referred in a named RTL insn cannot 
> be interpreted.  Handle the rtx then by supplying the constant 0 double 
> operand requested, resulting in the following update to insn-emit.c code 
> produced for the `pdp11-aout' target, relative to before the triggering 
> commit:
>
> @@ -1514,7 +1514,7 @@ gen_absdf2_cc (rtx operand0 ATTRIBUTE_UN
>   gen_rtx_COMPARE (CCmode,
>   gen_rtx_ABS (DFmode,
>   operand1),
> - const0_rtx)),
> + CONST_DOUBLE_ATOF ("0", VOIDmode))),
>   gen_rtx_SET (operand0,
>   gen_rtx_ABS (DFmode,
>   copy_rtx (operand1);
> @@ -1555,7 +1555,7 @@ gen_negdf2_cc (rtx operand0 ATTRIBUTE_UN
>   gen_rtx_COMPARE (CCmode,
>   gen_rtx_NEG (DFmode,
>   operand1),
> - const0_rtx)),
> + CONST_DOUBLE_ATOF ("0", VOIDmode))),
>   gen_rtx_SET (operand0,
>   gen_rtx_NEG (DFmode,
>   copy_rtx (operand1);
> @@ -1790,7 +1790,7 @@ gen_muldf3_cc (rtx operand0 ATTRIBUTE_UN
>   gen_rtx_MULT (DFmode,
>   operand1,
>   operand2),
> - const0_rtx)),
> + CONST_DOUBLE_ATOF ("0", VOIDmode))),
>   gen_rtx_SET (operand0,
>   gen_rtx_MULT (DFmode,
>   copy_rtx (operand1),
> @@ -1942,7 +1942,7 @@ gen_divdf3_cc (rtx operand0 ATTRIBUTE_UN
>   gen_rtx_DIV (DFmode,
>   operand1,
>   operand2),
> - const0_rtx)),
> + CONST_DOUBLE_ATOF ("0", VOIDmode))),
>   gen_rtx_SET (operand0,
>   gen_rtx_DIV (DFmode,
>   copy_rtx (operand1),
>
> This does not (yet) remove VOIDmode CONST_DOUBLE use, as it is up to 
> individual machine descriptions to choose.
>
>   gcc/
>   * genemit.c (gen_exp) : Handle `const_double_zero' 
>   rtx.
>   * read-rtl.c (rtx_reader::read_rtx_code): Handle machine mode 
>   with `const_double_zero'.
>   * doc/rtl.texi (Constant Expression Types): Document it.
OK
jeff



Re: [PATCH] VAX/testsuite: Remove notsi comparison elimination regressions

2021-01-08 Thread Jeff Law via Gcc-patches



On 1/8/21 5:49 AM, Maciej W. Rozycki wrote:
> Remove fallout from commit 0bd675183d94 ("match.pd: Add ~(X - Y) -> ~X 
> + Y simplification [PR96685]") and paper over the regression caused as 
> it is not the matter of the test cases affected.
>
> Previously assembly like this:
>
>   .text
>   .align 1
> .globl eq_notsi
>   .type   eq_notsi, @function
> eq_notsi:
>   .word 0 # 35[c=0]  procedure_entry_mask
>   subl2 $4,%sp# 46[c=32]  *addsi3
>   mcoml 4(%ap),%r0# 32[c=16]  *one_cmplsi2_ccz
>   jeql .L1# 34[c=26]  *branch_ccz
>   addl2 $2,%r0# 31[c=32]  *addsi3
> .L1:
>   ret # 40[c=0]  return
>   .size   eq_notsi, .-eq_notsi
>
> was produced.  Now this:
>
>   .text
>   .align 1
> .globl eq_notsi
>   .type   eq_notsi, @function
> eq_notsi:
>   .word 0 # 36[c=0]  procedure_entry_mask
>   subl2 $4,%sp# 48[c=32]  *addsi3
>   movl 4(%ap),%r0 # 33[c=16]  *movsi_2
>   cmpl %r0,$-1# 34[c=8]  *cmpsi_ccz/1
>   jeql .L3# 35[c=26]  *branch_ccz
>   subl3 %r0,$1,%r0# 32[c=32]  *subsi3/1
>   ret # 27[c=0]  return
> .L3:
>   clrl %r0# 31[c=2]  *movsi_2
>   ret # 41[c=0]  return
>   .size   eq_notsi, .-eq_notsi
>
> is, which cannot work with post-reload comparison elimination, due to 
> the comparison against -1 rather than 0.
>
> Use subtraction from a constant then rather than addition as the former 
> operation is not transformed, removing these regressions:
>
> FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-rtl-dump-times cmpelim 
> "deleting insn with uid" 1
> FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-assembler-not 
> \t(bit|cmpz?|tst).
> FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-assembler one_cmplsi[^ 
> ]*_ccz(/[0-9]+)?\n
> FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-rtl-dump-times cmpelim 
> "deleting insn with uid" 1
> FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-assembler-not 
> \t(bit|cmpz?|tst).
> FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-assembler one_cmplsi[^ 
> ]*_ccn(/[0-9]+)?\n
>
> and likewise across some of the other optimization levels verified.
>
> The LE variant appears unaffected as the new transformation produces 
> slightly different although still suboptimal code:
>
>   .text
>   .align 1
> .globl le_notsi
>   .type   le_notsi, @function
> le_notsi:
>   .word 0 # 27[c=0]  procedure_entry_mask
>   subl2 $4,%sp# 34[c=32]  *addsi3
>   movl 4(%ap),%r1 # 23[c=16]  *movsi_2
>   mcoml %r1,%r0   # 24[c=8]  *one_cmplsi2_ccnz
>   jleq .L1# 26[c=26]  *branch_ccnz
>   subl3 %r1,$1,%r0# 22[c=32]  *subsi3/1
> .L1:
>   ret # 32[c=0]  return
>   .size   le_notsi, .-le_notsi
>
> but update the test case too, for consistency with the other two.
>
>   gcc/testsuite/
>   * gcc.target/vax/cmpelim-eq-notsi.c: Use subtraction from a 
>   constant then rather than addition.
>   * gcc.target/vax/cmpelim-le-notsi.c: Likewise.
>   * gcc.target/vax/cmpelim-lt-notsi.c: Likewise.
OK
jeff
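
The match.pd identity named in the commit message holds in wrapping (unsigned) arithmetic, and a short check makes that concrete. This is only a sketch to illustrate the algebra, not code from the test cases; the function names are invented.

```cpp
#include <cstdint>

// Left-hand side of the simplification: ~(X - Y).
constexpr std::uint32_t not_of_difference(std::uint32_t x, std::uint32_t y) {
  return ~(x - y);
}

// Right-hand side of the simplification: ~X + Y.
constexpr std::uint32_t not_plus(std::uint32_t x, std::uint32_t y) {
  return ~x + y;
}

// Since ~X equals -X - 1 modulo 2^32, ~X + 2 equals 1 - X.
constexpr std::uint32_t one_minus(std::uint32_t x) {
  return 1u - x;
}

static_assert(not_of_difference(5u, 3u) == not_plus(5u, 3u),
              "~(X - Y) == ~X + Y");
static_assert(not_plus(7u, 2u) == one_minus(7u), "~X + 2 == 1 - X");
```

The second identity is presumably why the new assembly quoted above contains `subl3 %r0,$1,%r0` where the old one had `mcoml` followed by `addl2 $2,%r0`.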



Pointer width in GCC?

2021-01-08 Thread Qing Zhao via Gcc-patches
Hi,

Is there a utility routine in GCC to query the pointer width of the current
target? Whether it’s 32bit pointer or 64 bit pointer for the target?

Thanks a lot for the help.

Qing

Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]

2021-01-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 08, 2021 at 02:22:59PM -0500, Jason Merrill wrote:
> I like the idea to use *walk_subtrees to distinguish between walking
> syntactic subtrees and walking type-identity subtrees.  But it should be
> more general; how does this look to you?

LGTM, thanks.

> diff --git a/gcc/cp/class.c b/gcc/cp/class.c
> index c41ac7deefe..00c0dba0a55 100644
> --- a/gcc/cp/class.c
> +++ b/gcc/cp/class.c
> @@ -1507,6 +1507,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, 
> bool val)
>  static tree
>  find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
>  {
> +  if (TYPE_P (*tp) && *walk_subtrees == 1)
> +/* Tell cp_walk_subtrees to look though typedefs.  */
> +*walk_subtrees = 2;
> +
>if (!OVERLOAD_TYPE_P (*tp))
>  return NULL_TREE;
>  
> @@ -1527,6 +1531,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void 
> *data)
>  static tree
>  mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
>  {
> +  if (TYPE_P (*tp) && *walk_subtrees == 1)
> +/* Tell cp_walk_subtrees to look though typedefs.  */
> +*walk_subtrees = 2;
> +
>if (!OVERLOAD_TYPE_P (*tp))
>  return NULL_TREE;
>  
> diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
> index b10671091b5..b087753cfba 100644
> --- a/gcc/cp/decl2.c
> +++ b/gcc/cp/decl2.c
> @@ -2358,9 +2358,6 @@ min_vis_r (tree *tp, int *walk_subtrees, void *data)
>int this_vis = VISIBILITY_DEFAULT;
>if (! TYPE_P (*tp))
>  *walk_subtrees = 0;
> -  else if (typedef_variant_p (*tp))
> -/* Look through typedefs despite cp_walk_subtrees.  */
> -this_vis = type_visibility (DECL_ORIGINAL_TYPE (TYPE_NAME (*tp)));
>else if (OVERLOAD_TYPE_P (*tp)
>  && !TREE_PUBLIC (TYPE_MAIN_DECL (*tp)))
>  {
> @@ -2379,6 +2376,10 @@ min_vis_r (tree *tp, int *walk_subtrees, void *data)
>if (this_vis > *vis_p)
>  *vis_p = this_vis;
>  
> +  /* Tell cp_walk_subtrees to look through typedefs.  */
> +  if (*walk_subtrees == 1)
> +*walk_subtrees = 2;
> +
>return NULL;
>  }
>  
> diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
> index 82027cc9abf..c536eb581a7 100644
> --- a/gcc/cp/tree.c
> +++ b/gcc/cp/tree.c
> @@ -5146,16 +5146,26 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> walk_tree_fn func,
>  
>if (TYPE_P (*tp))
>  {
> -  /* Walk into template args without looking through typedefs.  */
> -  if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (*tp))
> - WALK_SUBTREE (TI_ARGS (ti));
> -  /* Don't look through typedefs; walk_tree_fns that want to look through
> -  typedefs (like min_vis_r) need to do that themselves.  */
> -  if (typedef_variant_p (*tp))
> +  /* If *WALK_SUBTREES_P is 1, we're interested in the syntactic form of
> +  the argument, so don't look through typedefs, but do walk into
> +  template arguments for alias templates (and non-typedefed classes).
> +
> +  If *WALK_SUBTREES_P > 1, we're interested in type identity or
> +  equivalence, so look through typedefs, ignoring template arguments for
> +  alias templates, and walk into template args of classes.
> +
> +  See find_abi_tags_r for an example of setting *WALK_SUBTREES_P to 2
> +  when that's the behavior the walk_tree_fn wants.  */
> +  if (*walk_subtrees_p == 1 && typedef_variant_p (*tp))
>   {
> +   if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (*tp))
> + WALK_SUBTREE (TI_ARGS (ti));
> *walk_subtrees_p = 0;
> return NULL_TREE;
>   }
> +
> +  if (tree ti = TYPE_TEMPLATE_INFO (*tp))
> + WALK_SUBTREE (TI_ARGS (ti));
>  }
>  
>/* Not one of the easy cases.  We must explicitly go through the


Jakub
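
The idea of letting the callback raise *walk_subtrees from 1 to 2 can be shown with a toy walker. Everything below is a simplified model made up for illustration (`TypeNode`, `walk_type`, the two callbacks and `visit_alias` are not GCC code), and it ignores template arguments entirely; it only shows how an int flag can select between a syntactic walk and a type-identity walk.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy type node; `aliased` is non-null when the node is a typedef.
struct TypeNode {
  std::string name;
  const TypeNode* aliased;
};

typedef std::vector<std::string> Names;

// Callback signature loosely modeled on walk_tree_fn.
typedef void (*Callback)(const TypeNode& t, int* walk_subtrees, Names& seen);

// Toy walker: 0 = stop, 1 = syntactic walk, 2 = type-identity walk.
void walk_type(const TypeNode& t, Callback cb, Names& seen) {
  int walk_subtrees = 1;  // every callback starts from the default
  cb(t, &walk_subtrees, seen);
  if (walk_subtrees == 0 || t.aliased == nullptr)
    return;
  if (walk_subtrees == 1)
    return;  // syntactic form only: do not look through the typedef
  walk_type(*t.aliased, cb, seen);
}

// Leaves the flag alone, so typedefs stay opaque.
void syntactic_cb(const TypeNode& t, int*, Names& seen) {
  seen.push_back(t.name);
}

// Opts in to looking through typedefs by raising the flag to 2.
void identity_cb(const TypeNode& t, int* walk_subtrees, Names& seen) {
  seen.push_back(t.name);
  if (*walk_subtrees == 1)
    *walk_subtrees = 2;
}

// Walk the equivalent of `typedef Tagged Alias;` starting at Alias.
Names visit_alias(Callback cb) {
  const TypeNode tagged = {"Tagged", nullptr};
  const TypeNode alias = {"Alias", &tagged};
  Names seen;
  walk_type(alias, cb, seen);
  return seen;
}
```

In this model `visit_alias (syntactic_cb)` sees only the typedef, while `visit_alias (identity_cb)` also reaches the underlying type, which is the behavior a callback interested in type identity would want.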



Re: [PATCH v2] testsuite: Fix test failures from outputs.exp [PR98225]

2021-01-08 Thread David Edelsohn via Gcc-patches
Hi, Bernd

Thanks for investigating this and creating a revised version of the
patch.  With the second patch, the gcc.misc-test/outputs.exp results
are clean on AIX.

Thanks, David

On Fri, Jan 8, 2021 at 1:59 PM Bernd Edlinger  wrote:
>
> On 1/8/21 3:23 PM, David Edelsohn wrote:
> > On Thu, Jan 7, 2021 at 5:18 PM Bernd Edlinger  
> > wrote:
> >>
> >> Hi,
> >>
> >> On 1/7/21 5:12 PM, Rainer Orth wrote:
> >>>   The unsetenv needs to be wrapped in
> >>>
> >>> if [info exists env(MAKEFLAGS)] {
> >>>
> >>
> >> Done.
> >>
> >>> @@ -163,6 +167,9 @@ proc outest { test sources opts dirs out
> >>>   if { $ogl != {} } {
> >>>   pass "$test: $d$o"
> >>>   file delete $ogl
> >>> + } elseif { [string match "*.ld1_args" $o] } {
> >>> + # This file may be missing if !HAVE_GNU_LD
> >>> + pass "$test: $d$o"
> >>>
> >>>   Always PASSing the test even if it isn't run is wrong.  Either wrap
> >>>   the whole group of tests with response files in
> >>>
> >>> if [check_effective_target_gld] {
> >>>
> >>>   or make the test for the *.ld1_args file conditional on that
> >>>   (e.g. along the lines of $ltop used elsewhere).  I'd welcome input
> >>>   from Alexandre which is preferred.
> >>>
> >>
> >> Ah, yes that is a good idea.  Thanks.
> >>
> >>
> >> I think the .cdtor.* handling, is probably a bad example that I followed 
> >> here.
> >> I don't know why that is there in the first place, as there
> >> are no C++ test cases, these files should not be created at all.
> >> If they are ever created we would have a couple of other files created
> >> as well IMHO.
> >> If there are still missing files in some cases,
> >> I'd prefer to track these per test case, instead of globally.
> >>
> >> Therefore I propose to remove that exception for now.
> >>
> >> Is it OK for trunk?
> >
> > As Alex said, please don't just remove features and functionality if
> > you don't know why they were added.  The history is online in the
> > mailing list and the repo history.
> >
> > AIX uses constructors to register EH frames and libgcc has an EH
> > frame.  ctors and dtors can be found in non-C++ code.
> >
>
> Okydoky.
>
> I think I understand now better what the issue is here.
> Although the name cdtor suggests that it has something to do with
> C++ it is also needed to collect EH frame info, in certain targets.
> Those are mainly AIX but also hppa*-*-hpux*.
> I believe those exceptions are only necessary for targets that
> define EH_FRAME_THROUGH_COLLECT2.
>
> I have tested this new version of my patch but only on not-affected
> x86_64-pc-linux-gnu.
>
> @David, @Rainer: I would very much appreciate if you could give this patch
> a test on your systems.
>
>
> Thanks
> Bernd.


Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]

2021-01-08 Thread Jason Merrill via Gcc-patches

On 1/7/21 11:47 AM, Jakub Jelinek wrote:

In GCC 10 cp_walk_subtrees was changed to walk template arguments.
As the following testcase shows, that changed the mangling of some functions.


Argh.


I believe the previous behavior that find_abi_tags_r doesn't recurse into
template args has been the correct one, but setting *walk_subtrees = 0
for the types and handling the types subtree walking manually in
find_abi_tags_r looks too hard, there are a lot of subtrees and details what
should and shouldn't be walked, both in tree.c (walk_type_fields there,
which is static) and in cp_walk_subtrees itself.

The following patch abuses the fact that *walk_subtrees is an int to
tell cp_walk_subtrees it shouldn't walk the template args.

Another option would be to have two separate cp_walk_subtrees-like
callbacks, one that wouldn't walk into template args and the other
that would and then would tail call the other one, and
cp_walk_tree_without_duplicates but call walk_tree_1 directly or use
some other macro.


I like the idea to use *walk_subtrees to distinguish between walking 
syntactic subtrees and walking type-identity subtrees.  But it should be 
more general; how does this look to you?


Jason
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index c41ac7deefe..00c0dba0a55 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1507,6 +1507,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, bool val)
 static tree
 find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 {
+  if (TYPE_P (*tp) && *walk_subtrees == 1)
+/* Tell cp_walk_subtrees to look through typedefs.  */
+*walk_subtrees = 2;
+
   if (!OVERLOAD_TYPE_P (*tp))
 return NULL_TREE;
 
@@ -1527,6 +1531,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 static tree
 mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 {
+  if (TYPE_P (*tp) && *walk_subtrees == 1)
+/* Tell cp_walk_subtrees to look through typedefs.  */
+*walk_subtrees = 2;
+
   if (!OVERLOAD_TYPE_P (*tp))
 return NULL_TREE;
 
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index b10671091b5..b087753cfba 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2358,9 +2358,6 @@ min_vis_r (tree *tp, int *walk_subtrees, void *data)
   int this_vis = VISIBILITY_DEFAULT;
   if (! TYPE_P (*tp))
 *walk_subtrees = 0;
-  else if (typedef_variant_p (*tp))
-/* Look through typedefs despite cp_walk_subtrees.  */
-this_vis = type_visibility (DECL_ORIGINAL_TYPE (TYPE_NAME (*tp)));
   else if (OVERLOAD_TYPE_P (*tp)
 	   && !TREE_PUBLIC (TYPE_MAIN_DECL (*tp)))
 {
@@ -2379,6 +2376,10 @@ min_vis_r (tree *tp, int *walk_subtrees, void *data)
   if (this_vis > *vis_p)
 *vis_p = this_vis;
 
+  /* Tell cp_walk_subtrees to look through typedefs.  */
+  if (*walk_subtrees == 1)
+*walk_subtrees = 2;
+
   return NULL;
 }
 
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 82027cc9abf..c536eb581a7 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5146,16 +5146,26 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func,
 
   if (TYPE_P (*tp))
 {
-  /* Walk into template args without looking through typedefs.  */
-  if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (*tp))
-	WALK_SUBTREE (TI_ARGS (ti));
-  /* Don't look through typedefs; walk_tree_fns that want to look through
-	 typedefs (like min_vis_r) need to do that themselves.  */
-  if (typedef_variant_p (*tp))
+  /* If *WALK_SUBTREES_P is 1, we're interested in the syntactic form of
+	 the argument, so don't look through typedefs, but do walk into
+	 template arguments for alias templates (and non-typedefed classes).
+
+	 If *WALK_SUBTREES_P > 1, we're interested in type identity or
+	 equivalence, so look through typedefs, ignoring template arguments for
+	 alias templates, and walk into template args of classes.
+
+	 See find_abi_tags_r for an example of setting *WALK_SUBTREES_P to 2
+	 when that's the behavior the walk_tree_fn wants.  */
+  if (*walk_subtrees_p == 1 && typedef_variant_p (*tp))
 	{
+	  if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (*tp))
+	WALK_SUBTREE (TI_ARGS (ti));
 	  *walk_subtrees_p = 0;
 	  return NULL_TREE;
 	}
+
+  if (tree ti = TYPE_TEMPLATE_INFO (*tp))
+	WALK_SUBTREE (TI_ARGS (ti));
 }
 
   /* Not one of the easy cases.  We must explicitly go through the
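
The convention discussed in this message (a callback raising *walk_subtrees from 1 to 2 to ask for a type-identity walk instead of a syntactic one) can be sketched outside GCC. What follows is a toy model only, not GCC code: Node, Visitor, walk and demo_alias_walk are hypothetical names, and the is_typedef/underlying/template_args fields merely stand in for typedef_variant_p, the typedef's original type and TI_ARGS.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a type node.  This is NOT GCC's tree representation;
// every field here is a hypothetical simplification.
struct Node {
  std::string name;
  bool is_typedef;                         // node names another type
  const Node* underlying;                  // the type a typedef refers to
  std::vector<const Node*> template_args;  // arguments written on this node
};

// Callback protocol modelled on walk_tree: the visitor may overwrite
// *walk_subtrees.  0 = skip subtrees, 1 = syntactic walk (default),
// 2 = type-identity walk that looks through typedefs.
using Visitor = std::function<void(const Node&, int* walk_subtrees)>;

void walk(const Node& n, const Visitor& visit) {
  assert(visit && "walk needs a callable visitor");
  int walk_subtrees = 1;
  visit(n, &walk_subtrees);
  if (walk_subtrees == 0)
    return;

  if (n.is_typedef) {
    if (walk_subtrees == 1) {
      // Syntactic form: keep the alias's own arguments, stop at the typedef.
      for (const Node* arg : n.template_args)
        walk(*arg, visit);
      return;
    }
    // Type identity: ignore the alias's arguments and look through it.
    if (n.underlying)
      walk(*n.underlying, visit);
    return;
  }

  // Not a typedef: walk the template arguments in either mode.
  for (const Node* arg : n.template_args)
    walk(*arg, visit);
}

// Walks a made-up alias "Alias<int>" that names "Vec<Tag>" and returns the
// visited names.  MODE is the value the visitor requests (1 or 2).
std::vector<std::string> demo_alias_walk(int mode) {
  Node tag{"Tag", false, nullptr, {}};
  Node int_node{"int", false, nullptr, {}};
  Node vec{"Vec", false, nullptr, {&tag}};
  Node alias{"Alias", true, &vec, {&int_node}};

  std::vector<std::string> seen;
  walk(alias, [&](const Node& n, int* walk_subtrees) {
    seen.push_back(n.name);
    if (mode == 2 && *walk_subtrees == 1)
      *walk_subtrees = 2;  // same trick as find_abi_tags_r in the patch
  });
  return seen;
}
```

In this toy, a mode-1 walk sees only the alias and its written argument, while a mode-2 walk sees the alias, the class it names, and that class's argument. That is the distinction an ABI-tag search cares about: a tag on Tag should be found even when the type is spelled through an alias.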


[PATCH v2] testsuite: Fix test failures from outputs.exp [PR98225]

2021-01-08 Thread Bernd Edlinger
On 1/8/21 3:23 PM, David Edelsohn wrote:
> On Thu, Jan 7, 2021 at 5:18 PM Bernd Edlinger  
> wrote:
>>
>> Hi,
>>
>> On 1/7/21 5:12 PM, Rainer Orth wrote:
>>>   The unsetenv needs to be wrapped in
>>>
>>> if [info exists env(MAKEFLAGS)] {
>>>
>>
>> Done.
>>
>>> @@ -163,6 +167,9 @@ proc outest { test sources opts dirs out
>>>   if { $ogl != {} } {
>>>   pass "$test: $d$o"
>>>   file delete $ogl
>>> + } elseif { [string match "*.ld1_args" $o] } {
>>> + # This file may be missing if !HAVE_GNU_LD
>>> + pass "$test: $d$o"
>>>
>>>   Always PASSing the test even if it isn't run is wrong.  Either wrap
>>>   the whole group of tests with response files in
>>>
>>> if [check_effective_target_gld] {
>>>
>>>   or make the test for the *.ld1_args file conditional on that
>>>   (e.g. along the lines of $ltop used elsewhere).  I'd welcome input
>>>   from Alexandre which is preferred.
>>>
>>
>> Ah, yes that is a good idea.  Thanks.
>>
>>
>> I think the .cdtor.* handling is probably a bad example that I followed here.
>> I don't know why that is there in the first place, as there
>> are no C++ test cases, these files should not be created at all.
>> If they are ever created we would have a couple of other files created
>> as well IMHO.
>> If there are still missing files in some cases,
>> I'd prefer to track these per test case, instead of globally.
>>
>> Therefore I propose to remove that exception for now.
>>
>> Is it OK for trunk?
> 
> As Alex said, please don't just remove features and functionality if
> you don't know why they were added.  The history is online in the
> mailing list and the repo history.
> 
> AIX uses constructors to register EH frames and libgcc has an EH
> frame.  ctors and dtors can be found in non-C++ code.
> 

Okydoky.

I think I understand now better what the issue is here.
Although the name cdtor suggests that it has something to do with
C++ it is also needed to collect EH frame info, in certain targets.
Those are mainly AIX but also hppa*-*-hpux*.
I believe those exceptions are only necessary for targets that
define EH_FRAME_THROUGH_COLLECT2.

I have tested this new version of my patch but only on not-affected
x86_64-pc-linux-gnu.

@David, @Rainer: I would very much appreciate if you could give this patch
a test on your systems.


Thanks
Berns.
From 861f6631c34bdcbc0d6f61247cc231c1f1b36708 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Thu, 7 Jan 2021 09:37:32 +0100
Subject: [PATCH] testsuite: Fix test failures from outputs.exp [PR98225]

The .ld1_args file is not created when HAVE_GNU_LD is false.
The ltrans0.ltrans_arg file is not created when the make jobserver
is available, so remove the MAKEFLAGS variable.
Add an exception for *.gcc_args files similar to the
exception for *.cdtor.* files.
Limit both exceptions to targets that define EH_FRAME_THROUGH_COLLECT2.
That means although the test case does not use C++ constructors
or destructors it is still using dwarf2 frame info.

2021-01-07  Bernd Edlinger  

	PR testsuite/98225
	* gcc.misc-tests/outputs.exp: Unset MAKEFLAGS.
	Expect .ld1_args only when GNU LD is used.
	Add an exception for *.gcc_args files.
---
 gcc/testsuite/gcc.misc-tests/outputs.exp | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp b/gcc/testsuite/gcc.misc-tests/outputs.exp
index 80d4b61..d5a9709 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -50,6 +50,9 @@ if !$skip_lto {
 set ltop [check_linker_plugin_available]
 }
 
+# Check for GNU LD.  Some files like .ld1_args depend on this.
+set gld [check_effective_target_gld]
+
 # Prepare additional options to be used for linking.
 # We do not compile to an executable, because that requires naming an output.
 set link_options ""
@@ -67,6 +70,12 @@ if {[board_info $dest exists output_format]} {
 append link_options " additional_flags=-Wl,-oformat,[board_info $dest output_format]"
 }
 
+# Avoid possible influence from the make jobserver,
+# otherwise ltrans0.ltrans_args files may be missing.
+if [info exists env(MAKEFLAGS)] {
+unsetenv MAKEFLAGS
+}
+
 # For the test named TEST, run the compiler with SOURCES and OPTS, and
 # look in DIRS for OUTPUTS.  SOURCES is a list of suffixes for source
 # files starting with $b in $srcdir/$subdir, OPTS is a string with
@@ -130,6 +139,7 @@ proc outest { test sources opts dirs outputs } {
 	foreach og $olist {
 	if { [string index $og 0] == "!" } {
 		global gspd ltop
+		global gld
 		set cond [expr $og]
 		continue
 	}
@@ -181,7 +191,10 @@ proc outest { test sources opts dirs outputs } {
 	file delete $f
 	# collect2 may create .cdtor* files in -save-temps link tests,
 	# ??? without regard to aux output naming conventions.
-	if ![string match "*.cdtor.*" $f] then {
+	# Limit this exception to targets that define EH_FRAME_THROUGH_COLLECT2.

Re: [PATCH, libstdc++] GLIBCXX_HAVE_INT64_T

2021-01-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 08, 2021 at 06:37:03PM +, Jonathan Wakely wrote:
> This uses __INT64_TYPE__ if that's defined, and long long otherwise. I
> think that should be equivalent in all practical cases (I can imagine
> some strange target where __INT64_TYPE__ is defined by the compiler,
> but int64_t isn't defined when the configure checks look for it, and
> so the current code would use long long and with my patch would use
> __INT64_TYPE__ which could be long ... but I think in practice that's
> unlikely. It was probably more likely in older releases where the
> configure test would have been done with -std=gnu++98 and so int64_t
> might not have been declared by libc's <stdint.h>, but if that was the
> case then any ABI break it caused happened years ago).

Do clang and ICC define __INT64_TYPE__ (at least on most architectures)
and does it match what gcc defines it to?

Jakub



Re: [PATCH, libstdc++] GLIBCXX_HAVE_INT64_T

2021-01-08 Thread Jonathan Wakely via Gcc-patches

On 06/01/21 19:41 -0500, David Edelsohn wrote:

Thanks for clarifying the issue.

As you implicitly point out, GCC knows the type of INT64 and defines
the macro __INT64_TYPE__ .  The revised code can use that directly,
such as:

#if defined(_GLIBCXX_HAVE_INT64_T_LONG) \
   || defined(_GLIBCXX_HAVE_INT64_T_LONG_LONG)
  typedef __INT64_TYPE__   streamoff;
#elif defined(_GLIBCXX_HAVE_INT64_T)
  typedef int64_t streamoff;
#else
  typedef long long streamoff;
#endif

Are there any additional issues not addressed by that approach, other
than possible further simplification?


That avoids the ABI break that Jakub pointed out. But I think we can
simplify it further, as in the attached patch.

This uses __INT64_TYPE__ if that's defined, and long long otherwise. I
think that should be equivalent in all practical cases (I can imagine
some strange target where __INT64_TYPE__ is defined by the compiler,
but int64_t isn't defined when the configure checks look for it, and
so the current code would use long long and with my patch would use
__INT64_TYPE__ which could be long ... but I think in practice that's
unlikely. It was probably more likely in older releases where the
configure test would have been done with -std=gnu++98 and so int64_t
might not have been declared by libc's <stdint.h>, but if that was the
case then any ABI break it caused happened years ago).
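
A minimal sketch of the selection described above (use __INT64_TYPE__ when the compiler predefines it, long long otherwise). The name streamoff_sketch and the two helper functions are hypothetical and only for illustration; the real typedef is std::streamoff in bits/postypes.h. The is_same query is one way to look at Jakub's question on a given compiler; its result is deliberately not asserted, since it may differ between targets.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical name for illustration; libstdc++'s real typedef is
// std::streamoff.
#ifdef __INT64_TYPE__
typedef __INT64_TYPE__ streamoff_sketch;  // compiler-provided 64-bit type
#else
typedef long long streamoff_sketch;       // fallback when the macro is absent
#endif

static_assert(sizeof(streamoff_sketch) * CHAR_BIT >= 64,
              "streamoff_sketch must hold 64-bit offsets");

// Reports whether the selected type is the very same type as std::int64_t.
// On a "strange target" as described in the message this could be false.
bool sketch_matches_int64_t() {
  return std::is_same<streamoff_sketch, std::int64_t>::value;
}

// Runtime sanity check: the selected type must be signed.
void check_streamoff_sketch() {
  assert(static_cast<streamoff_sketch>(-1) < 0);
}
```

Calling sketch_matches_int64_t() under GCC, clang and ICC on the same target would show whether the three agree with the C library's int64_t there.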



diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e4175ea3e64..f13c5d2467f 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -474,63 +474,6 @@ AC_DEFUN([GLIBCXX_CHECK_WRITEV], [
 ])
 
 
-dnl
-dnl Check whether int64_t is available in <stdint.h>, and define HAVE_INT64_T.
-dnl Also check whether int64_t is actually a typedef to long or long long.
-dnl
-AC_DEFUN([GLIBCXX_CHECK_INT64_T], [
-
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-
-  AC_MSG_CHECKING([for int64_t])
-  AC_CACHE_VAL(glibcxx_cv_INT64_T, [
-AC_TRY_COMPILE(
-  [#include <stdint.h>],
-  [int64_t var;],
-  [glibcxx_cv_INT64_T=yes],
-  [glibcxx_cv_INT64_T=no])
-  ])
-
-  if test $glibcxx_cv_INT64_T = yes; then
-AC_DEFINE(HAVE_INT64_T, 1, [Define if int64_t is available in <stdint.h>.])
-AC_MSG_RESULT($glibcxx_cv_INT64_T)
-
-AC_MSG_CHECKING([for int64_t as long])
-AC_CACHE_VAL(glibcxx_cv_int64_t_long, [
-  AC_TRY_COMPILE(
-	[#include <stdint.h>
-	template<typename, typename> struct same { enum { value = -1 }; };
-	template<typename Tp> struct same<Tp, Tp> { enum { value = 1 }; };
-	int array[same<int64_t, long>::value];], [],
-	[glibcxx_cv_int64_t_long=yes], [glibcxx_cv_int64_t_long=no])
-])
-
-if test $glibcxx_cv_int64_t_long = yes; then
-  AC_DEFINE(HAVE_INT64_T_LONG, 1, [Define if int64_t is a long.])
-  AC_MSG_RESULT($glibcxx_cv_int64_t_long)
-fi
-
-AC_MSG_CHECKING([for int64_t as long long])
-AC_CACHE_VAL(glibcxx_cv_int64_t_long_long, [
-  AC_TRY_COMPILE(
-	[#include <stdint.h>
-	template<typename, typename> struct same { enum { value = -1 }; };
-	template<typename Tp> struct same<Tp, Tp> { enum { value = 1 }; };
-	int array[same<int64_t, long long>::value];], [],
-	[glibcxx_cv_int64_t_long_long=yes], [glibcxx_cv_int64_t_long_long=no])
-])
-
-if test $glibcxx_cv_int64_t_long_long = yes; then
-  AC_DEFINE(HAVE_INT64_T_LONG_LONG, 1, [Define if int64_t is a long long.])
-  AC_MSG_RESULT($glibcxx_cv_int64_t_long_long)
-fi
-  fi
-
-  AC_LANG_RESTORE
-])
-
-
 dnl
 dnl Check whether LFS support is available.
 dnl
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 3c799be82b1..a816ff79d16 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -185,9 +185,6 @@ GLIBCXX_CHECK_STDIO_PROTO
 GLIBCXX_CHECK_MATH11_PROTO
 GLIBCXX_CHECK_UCHAR_H
 
-# For the streamoff typedef.
-GLIBCXX_CHECK_INT64_T
-
 # For LFS support.
 GLIBCXX_CHECK_LFS
 
diff --git a/libstdc++-v3/include/bits/postypes.h b/libstdc++-v3/include/bits/postypes.h
index 718ff44628c..d2fbfc35dee 100644
--- a/libstdc++-v3/include/bits/postypes.h
+++ b/libstdc++-v3/include/bits/postypes.h
@@ -39,32 +39,6 @@
 
 #include <cwchar> // For mbstate_t
 
-// XXX If <stdint.h> is really needed, make sure to define the macros
-// before including it, in order not to break <tr1/cstdint> (and <cstdint>
-// in C++11).  Reconsider all this as soon as possible...
-#if (defined(_GLIBCXX_HAVE_INT64_T) && !defined(_GLIBCXX_HAVE_INT64_T_LONG) \
- && !defined(_GLIBCXX_HAVE_INT64_T_LONG_LONG))
-
-#ifndef __STDC_LIMIT_MACROS
-# define _UNDEF__STDC_LIMIT_MACROS
-# define __STDC_LIMIT_MACROS
-#endif
-#ifndef __STDC_CONSTANT_MACROS
-# define _UNDEF__STDC_CONSTANT_MACROS
-# define __STDC_CONSTANT_MACROS
-#endif
-#include <stdint.h> // For int64_t
-#ifdef _UNDEF__STDC_LIMIT_MACROS
-# undef __STDC_LIMIT_MACROS
-# undef _UNDEF__STDC_LIMIT_MACROS
-#endif
-#ifdef _UNDEF__STDC_CONSTANT_MACROS
-# undef __STDC_CONSTANT_MACROS
-# undef _UNDEF__STDC_CONSTANT_MACROS
-#endif
-
-#endif
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -84,12 +58,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Note: In versions of GCC up to and including GCC 3.3, streamoff
*  was 

[PATCH] Testcase for old PR 47059

2021-01-08 Thread Martin Jambor
Hi,

I stumbled across PR 47059 from 2010 which has been addressed by
store-merging.  I am going to close it but would like to add its
testcase too.

OK for trunk?

Thanks,

Martin


gcc/testsuite/ChangeLog:

2021-01-08  Martin Jambor  

PR tree-optimization/47059
* gcc.dg/tree-ssa/pr47059.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr47059.c | 45 +
 1 file changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr47059.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c b/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c
new file mode 100644
index 000..9f9c61aa213
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-optimized" } */
+
+
+struct struct1
+{
+  void *data;
+  unsigned short f1;
+  unsigned short f2;
+};
+typedef struct struct1 S1;
+
+struct struct2
+{
+  int f3;
+  S1 f4;
+};
+typedef struct struct2 S2;
+
+
+extern void foo (S1 *ptr);
+extern S2 gstruct2_var;
+extern S1 gstruct1_var;
+
+static inline S1 bar (const S1 *ptr) __attribute__ ((always_inline));
+
+static inline S1
+bar (const S1 *ptr)
+{
+  S1 ls_var = *ptr;
+  foo (&ls_var);
+  return ls_var;
+}
+
+int
+main ()
+{
+  S2 *ps_var;
+
+  ps_var = &gstruct2_var;
+  ps_var->f4 = bar (&gstruct1_var);
+
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "short unsigned int\[^*\]*;" 0 "optimized"} } */
-- 
2.29.2



Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2021-01-08 Thread Antony Polukhin via Gcc-patches
On Thu, Nov 12, 2020, 21:55 Antony Polukhin  wrote:

> Final bits for libstdc/71579
>

Gentle reminder on last patch

>


Re: [PATCH] IBM Z: Fix constraints in vpdi patterns

2021-01-08 Thread Andreas Krebbel via Gcc-patches
On 1/8/21 5:35 PM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> The destination register is only partially overwritten, so + should be
> used instead of =.
> 
> gcc/ChangeLog:
> 
> 2021-01-08  Ilya Leoshkevich  
> 
>   * config/s390/vector.md (*tf_to_fprx2_0): Rename from
>   *mov_tf_to_fprx2_0 for consistency, fix constraint.
>   (*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
>   consistency, fix constraint.

Ok, thanks!

Andreas


[PATCH] x86-64: Require lp64 for PR target/98482 tests

2021-01-08 Thread H.J. Lu via Gcc-patches
On Fri, Jan 8, 2021 at 6:43 AM H.J. Lu  wrote:
>
> On Fri, Jan 8, 2021 at 6:31 AM Uros Bizjak  wrote:
> >
> > On Fri, Jan 8, 2021 at 2:28 PM H.J. Lu  wrote:
> > >
> > > On Fri, Jan 8, 2021 at 4:50 AM H.J. Lu  wrote:
> > > >
> > > > On Fri, Jan 8, 2021 at 1:24 AM Uros Bizjak  wrote:
> > > > >
> > > > > > Since R10 is preserved when calling mcount, R10 can be used as a scratch
> > > > > > register to call mcount in large model.
> > > > >
> > > > > Please mention that R10 can be used as a static chain register and is
> > > > > preserved when calling mcount for nested functions.
> > > > >
> > > > > > gcc/
> > > > > >
> > > > > > PR target/98482
> > > > > > * config/i386/i386.c (x86_function_profiler): Use R10 to call
> > > > > > mcount in large model. Sorry for large model with PIC.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > > PR target/98482
> > > > > > * gcc.target/i386/pr98482-1.c: New test.
> > > > > > * gcc.target/i386/pr98482-2.c: Likewise.
> > > > >
> > > > > OK with comment fixes.
> > > > >
> > > > > Thanks,
> > > > > Uros.
> > > > >
> > > > > +case CM_LARGE:
> > > > > +  /* NB: R10 can be used as a scratch register here since
> > > > > +R10 is preserved when calling mcount.  */
> > > > >
> > > > > Also mention that R10 can be used as a static chain register and is
> > > > > preserved when calling mcount for nested functions.
> > > > >
> > > > > +  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> > > > > +   mcount_name);
> > > > > +  break;
> > > >
> > > > This is the patch I am checking in.
> > > >
> > >
> > > For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
> > > R10 and R11 to call mcount in large model with PIC.
> > >
> > > OK for master?
> >
> > +  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n",
> > +   mcount_name);
> >
> > Please put mcount_name in the same line (and please do the same for
> > case CM_MEDIUM_PIC).
>
> Fixed.
>
> > OK with the above fixes.
> >
> > Thanks,
> > Uros.
>
> Here is the updated patch I am checking in.
>

I am checking in this patch since -mcmodel=large isn't
supported for x32.

-- 
H.J.
From e7797264b2b8bee6b9a429385d91af0858ee0c8a Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 8 Jan 2021 08:41:38 -0800
Subject: [PATCH] x86-64: Require lp64 for PR target/98482 tests

Require lp64 for PR target/98482 tests since -mcmodel=large isn't
supported for x32.

	PR target/98482
	* gcc.target/i386/pr98482-1.c: Require lp64.
	* gcc.target/i386/pr98482-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr98482-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr98482-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr98482-1.c b/gcc/testsuite/gcc.target/i386/pr98482-1.c
index 72d5ccb269c..912cbe09191 100644
--- a/gcc/testsuite/gcc.target/i386/pr98482-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr98482-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-require-effective-target mfentry } */
 /* { dg-options "-fprofile -mfentry -O2 -mcmodel=large" } */
 /* { dg-final { scan-assembler "movabsq\t\\\$__fentry__, %r10\n\tcall\t\\*%r10" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr98482-2.c b/gcc/testsuite/gcc.target/i386/pr98482-2.c
index 0ee142db12c..03c62a4b67b 100644
--- a/gcc/testsuite/gcc.target/i386/pr98482-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr98482-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-require-effective-target mfentry } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fpic -fprofile -mfentry -O2 -mcmodel=large" } */
-- 
2.29.2
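
The difference between the old selector ({ ! ia32 }) and the new one (lp64) matters because x32 is an ILP32 data model on x86-64: it is not ia32, yet it is not lp64 either. A rough sketch of the distinction follows, assuming the lp64 check amounts to "int is 4 bytes, long and pointers are 8 bytes". That is my reading of the effective-target keyword, not a quote of the testsuite code, and all function names below are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Assumed meaning of the lp64 effective target: 32-bit int, 64-bit long
// and 64-bit pointers.
constexpr bool target_is_lp64() {
  return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void*) == 8;
}

// ILP32 covers both ia32 and x32: everything is 32 bits wide.
constexpr bool target_is_ilp32() {
  return sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void*) == 4;
}

// Human-readable name of the data model this translation unit was built for.
const char* data_model_name() {
  return target_is_lp64() ? "lp64" : target_is_ilp32() ? "ilp32" : "other";
}

// The two models are mutually exclusive by construction.
void check_data_model() {
  assert(!(target_is_lp64() && target_is_ilp32()));
  assert(std::strlen(data_model_name()) > 0);
}
```

Built with -mx32, data_model_name() would report "ilp32" under these assumptions, which is why a test that needs -mcmodel=large has to ask for lp64 rather than merely exclude ia32.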



[PATCH] IBM Z: Fix constraints in vpdi patterns

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



The destination register is only partially overwritten, so + should be
used instead of =.

gcc/ChangeLog:

2021-01-08  Ilya Leoshkevich  

* config/s390/vector.md (*tf_to_fprx2_0): Rename from
*mov_tf_to_fprx2_0 for consistency, fix constraint.
(*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
consistency, fix constraint.
---
 gcc/config/s390/vector.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 5b8d75f18f0..0e3c31f5d4f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -737,16 +737,16 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+(define_insn "*tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
(subreg:DF (match_operand:TF 1 "general_operand"   "v") 0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+(define_insn "*tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
(subreg:DF (match_operand:TF 1 "general_operand"   "v") 8))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-- 
2.26.2



Re: [PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-08 Thread Andreas Krebbel via Gcc-patches
On 1/8/21 2:14 PM, Ilya Leoshkevich wrote:
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html
> v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P.
> 
> 
> 
> Give end users the opportunity to find out whether long doubles are
> stored in floating-point register pairs or in vector registers, so that
> they could fine-tune their asm statements.
> 
> gcc/ChangeLog:
> 
> 2020-12-14  Ilya Leoshkevich  
> 
>   * config/s390/s390-c.c (s390_def_or_undef_macro): Accept
>   callables instead of mask values.
>   (struct target_flag_set_p): New predicate.
>   (s390_cpu_cpp_builtins_internal): Define or undefine
>   __LONG_DOUBLE_VX__ macro.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-12-14  Ilya Leoshkevich  
> 
>   * gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
>   * gcc.target/s390/vector/long-double-vx-macro-on.c: New test.

Ok, thanks!

Andreas
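
A small sketch of how user code might consume the new macro when choosing an inline-asm operand constraint. It assumes that "v" (vector register) and "f" (floating-point register) are the constraint letters one would want, matching the letters used in the s390 machine description elsewhere in this digest. The function names are hypothetical, and on targets other than s390 the macro is simply undefined, so the fallback branch is taken.

```cpp
#include <cassert>
#include <cstring>

// Picks a constraint letter for a long double operand.  Assumption: with
// __LONG_DOUBLE_VX__ defined, long doubles live in vector registers ("v");
// otherwise they live in floating-point register pairs ("f").
const char* long_double_constraint() {
#if defined(__LONG_DOUBLE_VX__)
  return "v";
#else
  return "f";
#endif
}

// Sanity check that exactly one of the two expected letters was chosen.
void check_long_double_constraint() {
  const char* c = long_double_constraint();
  assert(std::strcmp(c, "v") == 0 || std::strcmp(c, "f") == 0);
}
```

In real code the same #if would more likely wrap two alternative asm statements rather than return a string; the string form is used here only so the choice is easy to inspect.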


Re: [PATCH] c++: ICE with constexpr call that returns a PMF [PR98551]

2021-01-08 Thread Patrick Palka via Gcc-patches
On Thu, 7 Jan 2021, Jason Merrill wrote:

> On 1/7/21 10:10 AM, Patrick Palka wrote:
> > We shouldn't do replace_result_decl after evaluating a call that returns
> > a PMF because PMF temporaries aren't wrapped in a TARGET_EXPR (and so we
> > can't trust ctx->object), and PMF initializers can't be self-referential
> > anyway, so replace_result_decl would always be a no-op.  This fixes an
> > ICE from the sanity check in replace_result_decl in the below testcase
> > during cxx_eval_call_expression of the call f() in the initializer g(f()).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/98551
> > * constexpr.c (cxx_eval_call_expression): Don't call
> > replace_result_decl when the result is a PMF.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/98551
> > * g++.dg/cpp0x/constexpr-pmf2.C: New test.
> > ---
> >   gcc/cp/constexpr.c  | 1 +
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C | 9 +
> >   2 files changed, 10 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C
> > 
> > diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> > index 0c12f608d36..a7272d49d0d 100644
> > --- a/gcc/cp/constexpr.c
> > +++ b/gcc/cp/constexpr.c
> > @@ -2788,6 +2788,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
> >current object under construction.  */
> > if (!*non_constant_p && ctx->object
> > && AGGREGATE_TYPE_P (TREE_TYPE (res))
> > +   && !TYPE_PTRMEMFUNC_P (TREE_TYPE (res))
> 
> It ought to work to change AGGREGATE_TYPE_P to CLASS_TYPE_P; we can't return
> an array, and a vector can't contain a pointer to itself.
> 
> Alternately, we could change the same-type assert in replace_result_decl to a
> test and return false if different.
> 
> OK with either change.

Sounds good.  I went with changing AGGREGATE_TYPE_P to CLASS_TYPE_P and
leaving alone the assert in replace_result_decl for now:

-- >8 --

Subject: [PATCH] c++: ICE with constexpr call that returns a PMF [PR98551]

We shouldn't do replace_result_decl after evaluating a call that returns
a PMF because PMF temporaries aren't wrapped in a TARGET_EXPR (and so we
can't trust ctx->object), and PMF initializers can't be self-referential
anyway, so replace_result_decl would always be a no-op.

To that end, this patch changes the relevant AGGREGATE_TYPE_P test to
CLASS_TYPE_P, which should rule out PMFs (as well as arrays, which we
can't return and therefore won't see here).  This fixes an ICE from the
sanity check in replace_result_decl in the below testcase during
constexpr evaluation of the call f() in the initializer g(f()).

gcc/cp/ChangeLog:

PR c++/98551
* constexpr.c (cxx_eval_call_expression): Check CLASS_TYPE_P
instead of AGGREGATE_TYPE_P before calling replace_result_decl.

gcc/testsuite/ChangeLog:

PR c++/98551
* g++.dg/cpp0x/constexpr-pmf2.C: New test.
---
 gcc/cp/constexpr.c  | 2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4a5e6384da6..9dddc53ca52 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2790,7 +2790,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
/* Rewrite all occurrences of the function's RESULT_DECL with the
   current object under construction.  */
if (!*non_constant_p && ctx->object
-   && AGGREGATE_TYPE_P (TREE_TYPE (res))
+   && CLASS_TYPE_P (TREE_TYPE (res))
&& !is_empty_class (TREE_TYPE (res)))
  if (replace_result_decl (, res, ctx->object))
cacheable = false;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C
new file mode 100644
index 000..a76e712afe1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-pmf2.C
@@ -0,0 +1,9 @@
+// PR c++/98551
+// { dg-do compile { target c++11 } }
+
+struct A {};
+struct B { int t(); };
+using pmf = decltype(&B::t);
+constexpr pmf f() { return &B::t; }
+constexpr A g(pmf) { return {}; };
+constexpr A x = g(f());
-- 
2.30.0
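
For readers who want to run something rather than only compile it, here is a variant of the testcase. It is a sketch, not part of the patch: B::t is given a body (the testcase only declares it), and call_through_pmf is a hypothetical helper added to show that the pointer to member function returned by the constexpr call is usable at run time.

```cpp
#include <cassert>

struct A {};
struct B { int t() { return 42; } };  // body added for illustration only
using pmf = decltype(&B::t);          // pointer to member function type

// A constexpr function returning a PMF; the result is a plain value, not
// an object under construction.
constexpr pmf f() { return &B::t; }
constexpr A g(pmf) { return {}; }

// Same shape as the initializer in the testcase: the call f() is evaluated
// while initializing x.
constexpr A x = g(f());

// Calls B::t through the PMF returned by f().
int call_through_pmf() {
  B b;
  int result = (b.*f())();
  assert(result == 42);
  return result;
}
```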



Re: [PATCH] c++: Fix access checking of scoped non-static member [PR98515]

2021-01-08 Thread Patrick Palka via Gcc-patches
On Thu, 7 Jan 2021, Jason Merrill wrote:

> On 1/7/21 5:47 PM, Patrick Palka wrote:
> > On Thu, 7 Jan 2021, Jason Merrill wrote:
> > 
> > > On 1/6/21 1:19 PM, Patrick Palka wrote:
> > > > In the first testcase below, we incorrectly reject the use of the
> > > > protected non-static member A::var0 from C::g() because
> > > > check_accessibility_of_qualified_id, at template parse time, determines
> > > > that the access doesn't go through 'this'.  (This happens because the
> > > > dependent base B of C doesn't have a binfo object, so it appears
> > > > to DERIVED_FROM_P that A is not an indirect base of C.)  From there
> > > > we create the corresponding deferred access check, which we then
> > > > perform at instantiation time and which (unsurprisingly) fails.
> > > > 
> > > > The problem ultimately seems to be that we can't, in general, know
> > > > whether a use of a scoped non-static member goes through 'this' until
> > > > instantiation time, as the second testcase below demonstrates.  So this
> > > > patch makes check_accessibility_of_qualified_id punt in this situation.
> > > > 
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
> > > > commit?
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > PR c++/98515
> > > > * semantics.c (check_accessibility_of_qualified_id): Punt if
> > > > we're checking the access of a scoped non-static member at
> > > > class template parse time.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > PR c++/98515
> > > > * g++.dg/template/access32.C: New test.
> > > > * g++.dg/template/access33.C: New test.
> > > > ---
> > > >gcc/cp/semantics.c   | 20 +++-
> > > >gcc/testsuite/g++.dg/template/access32.C |  8 
> > > >gcc/testsuite/g++.dg/template/access33.C |  9 +
> > > >3 files changed, 32 insertions(+), 5 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/template/access32.C
> > > >create mode 100644 gcc/testsuite/g++.dg/template/access33.C
> > > > 
> > > > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> > > > index b448efe024a..f52b2e4d1e7 100644
> > > > --- a/gcc/cp/semantics.c
> > > > +++ b/gcc/cp/semantics.c
> > > > @@ -2107,14 +2107,24 @@ check_accessibility_of_qualified_id (tree decl,
> > > >  /* If the reference is to a non-static member of the
> > > >  current class, treat it as if it were referenced through
> > > >  `this'.  */
> > > > -  tree ct;
> > > >  if (DECL_NONSTATIC_MEMBER_P (decl)
> > > > - && current_class_ptr
> > > > - && DERIVED_FROM_P (scope, ct = current_nonlambda_class_type ()))
> > > > -   qualifying_type = ct;
> > > > + && current_class_ptr)
> > > > +   {
> > > > + if (dependent_type_p (TREE_TYPE (current_class_ptr)))
> > > 
> > > This should also look at current_nonlambda_class_type.
> > 
> > Ah, ack.  But it seems to me we really only need to be checking
> > dependence of current_nonlambda_class_type here.
> 
> Yes, that's what I meant, sorry about the ambiguous use of "also".  :)

Oops, I see what you had meant now, sorry about the confusion :)

> 
> >  IIUC, dependence of
> > these two types should coincide except in the case where we're inside a
> > generic lambda within a non-template class (in which case
> > current_class_ptr will be dependent and current_nonlambda_class_type won't).
> > But in this case we have enough information to be able to resolve the
> > access check at parse time, I think (and so we shouldn't punt).
> > 
> > The below patch, which seems to pass 'make check-c++', checks the
> > dependence of current_nonlambda_class_type instead of that of
> > current_class_ptr.  Does this approach seem right?
> 
> OK.

Thanks.

> 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: Fix access checking of scoped non-static member
> >   [PR98515]
> > 
> > In the first testcase below, we incorrectly reject the use of the
> > protected non-static member A::var0 from C::g() because
> > check_accessibility_of_qualified_id, at template parse time, determines
> > that the access doesn't go through 'this'.  (This happens because the
> > dependent base B of C doesn't have a binfo object, so it appears
> > to DERIVED_FROM_P that A is not an indirect base of C.)  From there
> > we create the corresponding deferred access check, which we then
> > perform at instantiation time and which (unsurprisingly) fails.
> > 
> > The problem ultimately seems to be that we can't in general determine
> > whether a use of a scoped non-static member goes through 'this' until
> > instantiation time, as the second testcase below illustrates.  So this
> > patch makes check_accessibility_of_qualified_id punt in such situations
> > to avoid creating a bogus deferred access check.
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/98515
> > * semantics.c (check_accessibility_of_qualified_id): Punt if
> > we're 

Re: [PATCH] c++: Add support for -std=c++2b

2021-01-08 Thread Marek Polacek via Gcc-patches
I think we should consider making this -std=c++23 right away this time,
since we're on a three-year release schedule.  Up to Jason though.

Marek



Re: [PATCH 2/4] PDP11: Use a mode with `const_double_zero' expressions

2021-01-08 Thread Paul Koning via Gcc-patches



> On Jan 7, 2021, at 8:50 PM, Maciej W. Rozycki  wrote:
> 
> ...
> 
> Provide a new iterator to provide copies of FP substitutions across the 
> FP modes supported as the substitutions now need to match the mode of 
> the operands.
> 
>   gcc/
>   * config/pdp11/pdp11.md (PDPfp): New mode iterator.
>   (fcc_cc, fcc_ccnz): Use it.  Add mode to `const_double_zero' and 
>   operands.
> ---
> gcc/config/pdp11/pdp11.md |   10 ++
> 1 file changed, 6 insertions(+), 4 deletions(-)
> 
> gcc-pdp11-const-double-zero-mode.diff
> Index: gcc/gcc/config/pdp11/pdp11.md
> ===
> --- gcc.orig/gcc/config/pdp11/pdp11.md
> +++ gcc/gcc/config/pdp11/pdp11.md
> @@ -82,6 +82,8 @@
> 
> (define_code_iterator SHF [ashift ashiftrt lshiftrt])
> 
> +(define_mode_iterator PDPfp [SF DF])
> +
> ;; Substitution to turn a CC clobber into a CC setter.  We have four of
> ;; these: for CCmode vs. CCNZmode, and for CC_REGNUM vs. FCC_REGNUM.
> (define_subst "cc_cc"
> @@ -101,19 +103,19 @@
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst "fcc_cc"
> -  [(set (match_operand 0 "") (match_operand 1 ""))
> +  [(set (match_operand:PDPfp 0 "") (match_operand:PDPfp 1 ""))
>(clobber (reg FCC_REGNUM))]
>   ""
>   [(set (reg:CC FCC_REGNUM)
> - (compare:CC (match_dup 1) (const_double_zero)))
> + (compare:CC (match_dup 1) (const_double_zero:PDPfp)))
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst "fcc_ccnz"
> -  [(set (match_operand 0 "") (match_operand 1 ""))
> +  [(set (match_operand:PDPfp 0 "") (match_operand:PDPfp 1 ""))
>(clobber (reg FCC_REGNUM))]
>   ""
>   [(set (reg:CCNZ FCC_REGNUM)
> - (compare:CCNZ (match_dup 1) (const_double_zero)))
> + (compare:CCNZ (match_dup 1) (const_double_zero:PDPfp)))
>(set (match_dup 0) (match_dup 1))])
> 
> (define_subst_attr "cc_cc" "cc_cc" "_nocc" "_cc")


Ok.  Thanks Maciej.

paul



Re: [PATCH] x86-64: Use R10 and R11 for profiling large model with PIC

2021-01-08 Thread H.J. Lu via Gcc-patches
On Fri, Jan 8, 2021 at 6:31 AM Uros Bizjak  wrote:
>
> On Fri, Jan 8, 2021 at 2:28 PM H.J. Lu  wrote:
> >
> > On Fri, Jan 8, 2021 at 4:50 AM H.J. Lu  wrote:
> > >
> > > On Fri, Jan 8, 2021 at 1:24 AM Uros Bizjak  wrote:
> > > >
> > > > > Since R10 is preserved when calling mcount, R10 can be used as a scratch
> > > > > register to call mcount in large model.
> > > >
> > > > Please mention that R10 can be used as a static chain register and is
> > > > preserved when calling mcount for nested functions.
> > > >
> > > > > gcc/
> > > > >
> > > > > PR target/98482
> > > > > * config/i386/i386.c (x86_function_profiler): Use R10 to call
> > > > > mcount in large model. Sorry for large model with PIC.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR target/98482
> > > > > * gcc.target/i386/pr98482-1.c: New test.
> > > > > * gcc.target/i386/pr98482-2.c: Likewise.
> > > >
> > > > OK with comment fixes.
> > > >
> > > > Thanks,
> > > > Uros.
> > > >
> > > > +case CM_LARGE:
> > > > +  /* NB: R10 can be used as a scratch register here since
> > > > +R10 is preserved when calling mcount.  */
> > > >
> > > > Also mention that R10 can be used as a static chain register and is
> > > > preserved when calling mcount for nested functions.
> > > >
> > > > +  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> > > > +   mcount_name);
> > > > +  break;
> > >
> > > This is the patch I am checking in.
> > >
> >
> > For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
> > R10 and R11 to call mcount in large model with PIC.
> >
> > OK for master?
>
> +  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n",
> +   mcount_name);
>
> Please put mcount_name on the same line (and please do the same for
> case CM_MEDIUM_PIC).

Fixed.

> OK with the above fixes.
>
> Thanks,
> Uros.

Here is the updated patch I am checking in.

Thanks.

-- 
H.J.
From a9b5c1263cada2e0b3d59aa8a65b8bbc841775f2 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 8 Jan 2021 05:20:19 -0800
Subject: [PATCH] x86-64: Use R10 and R11 for profiling large model with PIC

For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
R10 and R11 to call mcount in large model with PIC.

gcc/

	PR target/98482
	* config/i386/i386.c (x86_function_profiler): Use R10 and R11
	to call mcount in large model with PIC for NO_PROFILE_COUNTERS
	targets.

gcc/testsuite/

	PR target/98482
	* gcc.target/i386/pr98482-2.c: Updated.
---
 gcc/config/i386/i386.c| 12 ++--
 gcc/testsuite/gcc.target/i386/pr98482-2.c |  3 ++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d3068462fcd..d35af37a49c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20806,12 +20806,20 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 		   mcount_name);
 	  break;
 	case CM_LARGE_PIC:
+#ifdef NO_PROFILE_COUNTERS
+	  fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
+	  fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
+	  fprintf (file, "\taddq\t%%r11, %%r10\n");
+	  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name);
+	  fprintf (file, "\taddq\t%%r11, %%r10\n");
+	  fprintf (file, "\tcall\t*%%r10\n");
+#else
 	  sorry ("profiling %<-mcmodel=large%> with PIC is not supported");
+#endif
 	  break;
 	case CM_SMALL_PIC:
 	case CM_MEDIUM_PIC:
-	  fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
-		   mcount_name);
+	  fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
 	  break;
 	default:
 	  x86_print_call_or_nop (file, mcount_name);
diff --git a/gcc/testsuite/gcc.target/i386/pr98482-2.c b/gcc/testsuite/gcc.target/i386/pr98482-2.c
index aed3ca4b6ff..0ee142db12c 100644
--- a/gcc/testsuite/gcc.target/i386/pr98482-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr98482-2.c
@@ -2,8 +2,9 @@
 /* { dg-require-effective-target mfentry } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fpic -fprofile -mfentry -O2 -mcmodel=large" } */
+/* { dg-final { scan-assembler "movabsq\t\\\$__fentry__@PLTOFF, %r11\n\taddq\t%r11, %r10\n\tcall\t\\*%r10" } } */
 
 void
 func (void)
 {
-} /* { dg-message "sorry, unimplemented: profiling '-mcmodel=large' with PIC is not supported" } */
+}
-- 
2.29.2
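
For readers following the thread, the instruction sequence added for
CM_LARGE_PIC can be sketched outside GCC.  The helper below is
hypothetical (large_pic_profiler_call is not a GCC function); it only
reproduces the text that the six fprintf calls in the patch print,
assuming the mcount symbol name is passed in as a string.  The comments
give one reading of the sequence: R10 ends up holding the GOT address
plus the PLT offset of the mcount symbol.

```cpp
#include <string>

// Hypothetical helper, not part of GCC: returns the assembly text that the
// NO_PROFILE_COUNTERS branch of the patch prints for CM_LARGE_PIC.
std::string
large_pic_profiler_call (const std::string &mcount_name)
{
  std::string s;
  // r11 = distance from local label 1 to the GOT.
  s += "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %r11\n";
  // r10 = run-time address of local label 1.
  s += "\tleaq\t1b(%rip), %r10\n";
  // r10 = run-time address of the GOT.
  s += "\taddq\t%r11, %r10\n";
  // r11 = offset of the symbol's PLT entry relative to the GOT.
  s += "\tmovabsq\t$" + mcount_name + "@PLTOFF, %r11\n";
  // r10 = run-time address of the PLT entry.
  s += "\taddq\t%r11, %r10\n";
  // Indirect call through r10.
  s += "\tcall\t*%r10\n";
  return s;
}
```

Called with "__fentry__", the returned text contains the three-instruction
tail that the updated scan-assembler directive in pr98482-2.c looks for.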



Re: [PATCH] if-to-switch: remove memory leaks

2021-01-08 Thread Richard Biener via Gcc-patches
On Fri, Jan 8, 2021 at 3:27 PM Martin Liška  wrote:
>
> The patch removes some memory leaks spotted by valgrind.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Richard.

> Thanks,
> Martin
>
>
> gcc/ChangeLog:
>
> * gimple-if-to-switch.cc (struct condition_info): Use auto_vec.
> (if_chain::is_beneficial): Delete clusters.
> (find_conditions): Make second argument of conditions_in_bbs a
> pointer so that we control its lifetime.
> (pass_if_to_switch::execute): Delete them.
> ---
>   gcc/gimple-if-to-switch.cc | 97 ++
>   1 file changed, 56 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
> index 6dba4e2c39c..560753d0311 100644
> --- a/gcc/gimple-if-to-switch.cc
> +++ b/gcc/gimple-if-to-switch.cc
> @@ -59,7 +59,7 @@ using namespace tree_switch_conversion;
>
>   struct condition_info
>   {
> -  typedef vec<std::pair<gphi *, tree>> mapping_vec;
> +  typedef auto_vec<std::pair<gphi *, tree>> mapping_vec;
>
> condition_info (gcond *cond): m_cond (cond), m_bb (gimple_bb (cond)),
>   m_forwarder_bb (NULL), m_ranges (), m_true_edge (NULL), m_false_edge 
> (NULL),
> @@ -75,7 +75,7 @@ struct condition_info
> gcond *m_cond;
> basic_block m_bb;
> basic_block m_forwarder_bb;
> -  vec<range_entry> m_ranges;
> +  auto_vec<range_entry> m_ranges;
> edge m_true_edge;
> edge m_false_edge;
> mapping_vec m_true_edge_phi_mapping;
> @@ -253,6 +253,10 @@ if_chain::is_beneficial ()
> r = output.length () < filtered_clusters.length ();
> if (r)
>   dump_clusters (, "BT can be built");
> +
> +  for (unsigned i = 0; i < output.length (); i++)
> +delete output[i];
> +
> output.release ();
> return r;
>   }
> @@ -377,7 +381,7 @@ convert_if_conditions_to_switch (if_chain *chain)
>
>   static void
>   find_conditions (basic_block bb,
> -hash_map<basic_block, condition_info> *conditions_in_bbs)
> +hash_map<basic_block, condition_info *> *conditions_in_bbs)
>   {
> gimple_stmt_iterator gsi = gsi_last_nondebug_bb (bb);
> if (gsi_end_p (gsi))
> @@ -394,7 +398,7 @@ find_conditions (basic_block bb,
> tree rhs = gimple_cond_rhs (cond);
> tree_code code = gimple_cond_code (cond);
>
> -  condition_info info (cond);
> +  condition_info *info = new condition_info (cond);
>
> gassign *def;
> if (code == NE_EXPR
> @@ -405,49 +409,53 @@ find_conditions (basic_block bb,
> enum tree_code rhs_code = gimple_assign_rhs_code (def);
> if (rhs_code == BIT_IOR_EXPR)
> {
> - info.m_ranges.safe_grow (2, true);
> - init_range_entry (&info.m_ranges[0], gimple_assign_rhs1 (def), NULL);
> - init_range_entry (&info.m_ranges[1], gimple_assign_rhs2 (def), NULL);
> + info->m_ranges.safe_grow (2, true);
> + init_range_entry (&info->m_ranges[0], gimple_assign_rhs1 (def), NULL);
> + init_range_entry (&info->m_ranges[1], gimple_assign_rhs2 (def), NULL);
> }
>   }
> else
>   {
> -  info.m_ranges.safe_grow (1, true);
> -  init_range_entry (&info.m_ranges[0], NULL_TREE, cond);
> +  info->m_ranges.safe_grow (1, true);
> +  init_range_entry (&info->m_ranges[0], NULL_TREE, cond);
>   }
>
> /* All identified ranges must have equal expression and IN_P flag.  */
> -  if (!info.m_ranges.is_empty ())
> +  if (!info->m_ranges.is_empty ())
>   {
> edge true_edge, false_edge;
> -  tree expr = info.m_ranges[0].exp;
> -  bool in_p = info.m_ranges[0].in_p;
> +  tree expr = info->m_ranges[0].exp;
> +  bool in_p = info->m_ranges[0].in_p;
>
> extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
> -  info.m_true_edge = in_p ? true_edge : false_edge;
> -  info.m_false_edge = in_p ? false_edge : true_edge;
> -
> -  for (unsigned i = 0; i < info.m_ranges.length (); ++i)
> -   if (info.m_ranges[i].exp == NULL_TREE
> -   || !INTEGRAL_TYPE_P (TREE_TYPE (info.m_ranges[i].exp))
> -   || info.m_ranges[i].low == NULL_TREE
> -   || info.m_ranges[i].high == NULL_TREE
> -   || (TYPE_PRECISION (TREE_TYPE (info.m_ranges[i].low))
> -   != TYPE_PRECISION (TREE_TYPE (info.m_ranges[i].high))))
> - return;
> -
> -  for (unsigned i = 1; i < info.m_ranges.length (); ++i)
> -   if (info.m_ranges[i].exp != expr
> -   || info.m_ranges[i].in_p != in_p)
> - return;
> -
> -  info.record_phi_mapping (info.m_true_edge,
> -  &info.m_true_edge_phi_mapping);
> -  info.record_phi_mapping (info.m_false_edge,
> -  &info.m_false_edge_phi_mapping);
> +  info->m_true_edge = in_p ? true_edge : false_edge;
> +  info->m_false_edge = in_p ? false_edge : true_edge;
> +
> +  for (unsigned i = 0; i < info->m_ranges.length (); ++i)
> +   if (info->m_ranges[i].exp == NULL_TREE
> +   || !INTEGRAL_TYPE_P (TREE_TYPE (info->m_ranges[i].exp))
> +   || info->m_ranges[i].low == 

Re: [PATCH] x86-64: Use R10 and R11 for profiling large model with PIC

2021-01-08 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 8, 2021 at 2:28 PM H.J. Lu  wrote:
>
> On Fri, Jan 8, 2021 at 4:50 AM H.J. Lu  wrote:
> >
> > On Fri, Jan 8, 2021 at 1:24 AM Uros Bizjak  wrote:
> > >
> > > > Since R10 is preserved when calling mcount, R10 can be used as a scratch
> > > > register to call mcount in large model.
> > >
> > > Please mention that R10 can be used as a static chain register and is
> > > preserved when calling mcount for nested functions.
> > >
> > > > gcc/
> > > >
> > > > PR target/98482
> > > > * config/i386/i386.c (x86_function_profiler): Use R10 to call
> > > > mcount in large model. Sorry for large model with PIC.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR target/98482
> > > > * gcc.target/i386/pr98482-1.c: New test.
> > > > * gcc.target/i386/pr98482-2.c: Likewise.
> > >
> > > OK with comment fixes.
> > >
> > > Thanks,
> > > Uros.
> > >
> > > +case CM_LARGE:
> > > +  /* NB: R10 can be used as a scratch register here since
> > > +R10 is preserved when calling mcount.  */
> > >
> > > Also mention that R10 can be used as a static chain register and is
> > > preserved when calling mcount for nested functions.
> > >
> > > +  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> > > +   mcount_name);
> > > +  break;
> >
> > This is the patch I am checking in.
> >
>
> For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
> R10 and R11 to call mcount in large model with PIC.
>
> OK for master?

+  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n",
+   mcount_name);

Please put mcount_name on the same line (and please do the same for
case CM_MEDIUM_PIC).

OK with the above fixes.

Thanks,
Uros.


[PATCH] if-to-switch: remove memory leaks

2021-01-08 Thread Martin Liška

The patch removes some memory leaks spotted by valgrind.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin


gcc/ChangeLog:

* gimple-if-to-switch.cc (struct condition_info): Use auto_vec.
(if_chain::is_beneficial): Delete clusters.
(find_conditions): Make second argument of conditions_in_bbs a
pointer so that we control its lifetime.
(pass_if_to_switch::execute): Delete them.
---
 gcc/gimple-if-to-switch.cc | 97 ++
 1 file changed, 56 insertions(+), 41 deletions(-)

diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
index 6dba4e2c39c..560753d0311 100644
--- a/gcc/gimple-if-to-switch.cc
+++ b/gcc/gimple-if-to-switch.cc
@@ -59,7 +59,7 @@ using namespace tree_switch_conversion;
 
 struct condition_info

 {
-  typedef vec<std::pair<gphi *, tree>> mapping_vec;
+  typedef auto_vec<std::pair<gphi *, tree>> mapping_vec;
 
   condition_info (gcond *cond): m_cond (cond), m_bb (gimple_bb (cond)),

 m_forwarder_bb (NULL), m_ranges (), m_true_edge (NULL), m_false_edge 
(NULL),
@@ -75,7 +75,7 @@ struct condition_info
   gcond *m_cond;
   basic_block m_bb;
   basic_block m_forwarder_bb;
-  vec<range_entry> m_ranges;
+  auto_vec<range_entry> m_ranges;
   edge m_true_edge;
   edge m_false_edge;
   mapping_vec m_true_edge_phi_mapping;
@@ -253,6 +253,10 @@ if_chain::is_beneficial ()
   r = output.length () < filtered_clusters.length ();
   if (r)
 dump_clusters (, "BT can be built");
+
+  for (unsigned i = 0; i < output.length (); i++)
+delete output[i];
+
   output.release ();
   return r;
 }
@@ -377,7 +381,7 @@ convert_if_conditions_to_switch (if_chain *chain)
 
 static void

 find_conditions (basic_block bb,
-hash_map<basic_block, condition_info> *conditions_in_bbs)
+hash_map<basic_block, condition_info *> *conditions_in_bbs)
 {
   gimple_stmt_iterator gsi = gsi_last_nondebug_bb (bb);
   if (gsi_end_p (gsi))
@@ -394,7 +398,7 @@ find_conditions (basic_block bb,
   tree rhs = gimple_cond_rhs (cond);
   tree_code code = gimple_cond_code (cond);
 
-  condition_info info (cond);

+  condition_info *info = new condition_info (cond);
 
   gassign *def;

   if (code == NE_EXPR
@@ -405,49 +409,53 @@ find_conditions (basic_block bb,
   enum tree_code rhs_code = gimple_assign_rhs_code (def);
   if (rhs_code == BIT_IOR_EXPR)
{
- info.m_ranges.safe_grow (2, true);
- init_range_entry (&info.m_ranges[0], gimple_assign_rhs1 (def), NULL);
- init_range_entry (&info.m_ranges[1], gimple_assign_rhs2 (def), NULL);
+ info->m_ranges.safe_grow (2, true);
+ init_range_entry (&info->m_ranges[0], gimple_assign_rhs1 (def), NULL);
+ init_range_entry (&info->m_ranges[1], gimple_assign_rhs2 (def), NULL);
}
 }
   else
 {
-  info.m_ranges.safe_grow (1, true);
-  init_range_entry (&info.m_ranges[0], NULL_TREE, cond);
+  info->m_ranges.safe_grow (1, true);
+  init_range_entry (&info->m_ranges[0], NULL_TREE, cond);
 }
 
   /* All identified ranges must have equal expression and IN_P flag.  */

-  if (!info.m_ranges.is_empty ())
+  if (!info->m_ranges.is_empty ())
 {
   edge true_edge, false_edge;
-  tree expr = info.m_ranges[0].exp;
-  bool in_p = info.m_ranges[0].in_p;
+  tree expr = info->m_ranges[0].exp;
+  bool in_p = info->m_ranges[0].in_p;
 
   extract_true_false_edges_from_block (bb, &true_edge, &false_edge);

-  info.m_true_edge = in_p ? true_edge : false_edge;
-  info.m_false_edge = in_p ? false_edge : true_edge;
-
-  for (unsigned i = 0; i < info.m_ranges.length (); ++i)
-   if (info.m_ranges[i].exp == NULL_TREE
-   || !INTEGRAL_TYPE_P (TREE_TYPE (info.m_ranges[i].exp))
-   || info.m_ranges[i].low == NULL_TREE
-   || info.m_ranges[i].high == NULL_TREE
-   || (TYPE_PRECISION (TREE_TYPE (info.m_ranges[i].low))
-   != TYPE_PRECISION (TREE_TYPE (info.m_ranges[i].high))))
- return;
-
-  for (unsigned i = 1; i < info.m_ranges.length (); ++i)
-   if (info.m_ranges[i].exp != expr
-   || info.m_ranges[i].in_p != in_p)
- return;
-
-  info.record_phi_mapping (info.m_true_edge,
-  &info.m_true_edge_phi_mapping);
-  info.record_phi_mapping (info.m_false_edge,
-  &info.m_false_edge_phi_mapping);
+  info->m_true_edge = in_p ? true_edge : false_edge;
+  info->m_false_edge = in_p ? false_edge : true_edge;
+
+  for (unsigned i = 0; i < info->m_ranges.length (); ++i)
+   if (info->m_ranges[i].exp == NULL_TREE
+   || !INTEGRAL_TYPE_P (TREE_TYPE (info->m_ranges[i].exp))
+   || info->m_ranges[i].low == NULL_TREE
+   || info->m_ranges[i].high == NULL_TREE
+   || (TYPE_PRECISION (TREE_TYPE (info->m_ranges[i].low))
+   != TYPE_PRECISION (TREE_TYPE (info->m_ranges[i].high))))
+ goto exit;
+
+  for (unsigned i = 1; i < info->m_ranges.length (); ++i)
+   if (info->m_ranges[i].exp != expr
+   || 
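
The if_chain::is_beneficial hunk above deletes each cluster before
releasing the vector that holds the pointers.  A minimal sketch of that
ordering follows, assuming std::vector in place of GCC's vec and a
hypothetical Cluster type with a live-object counter so that a leak
would be observable.  None of these names come from the patch.

```cpp
#include <vector>

// Hypothetical leak counter: number of Cluster objects currently alive.
static int live_clusters = 0;

// Hypothetical stand-in for a heap-allocated cluster.
struct Cluster
{
  Cluster () { ++live_clusters; }
  ~Cluster () { --live_clusters; }
};

// Frees the pointed-to objects first and the container storage second,
// mirroring the "delete output[i]" loop followed by "output.release ()".
// Clearing the container alone would leave every Cluster allocated.
void
destroy_clusters (std::vector<Cluster *> &output)
{
  for (unsigned i = 0; i < output.size (); i++)
    delete output[i];
  output.clear ();
}
```

After destroy_clusters runs on a vector of freshly allocated clusters,
live_clusters should be back at zero and the vector empty.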

Re: [PATCH] testsuite: Fix test failures from outputs.exp [PR98225]

2021-01-08 Thread David Edelsohn via Gcc-patches
On Thu, Jan 7, 2021 at 5:18 PM Bernd Edlinger  wrote:
>
> Hi,
>
> On 1/7/21 5:12 PM, Rainer Orth wrote:
> >   The unsetenv needs to be wrapped in
> >
> > if [info exists env(MAKEFLAGS)] {
> >
>
> Done.
>
> > @@ -163,6 +167,9 @@ proc outest { test sources opts dirs out
> >   if { $ogl != {} } {
> >   pass "$test: $d$o"
> >   file delete $ogl
> > + } elseif { [string match "*.ld1_args" $o] } {
> > + # This file may be missing if !HAVE_GNU_LD
> > + pass "$test: $d$o"
> >
> >   Always PASSing the test even if it isn't run is wrong.  Either wrap
> >   the whole group of tests with response files in
> >
> > if [check_effective_target_gld] {
> >
> >   or make the test for the *.ld1_args file conditional on that
> >   (e.g. along the lines of $ltop used elsewhere).  I'd welcome input
> >   from Alexandre on which is preferred.
> >
>
> Ah, yes that is a good idea.  Thanks.
>
>
> I think the .cdtor.* handling is probably a bad example that I followed here.
> I don't know why that is there in the first place, as there
> are no C++ test cases, these files should not be created at all.
> If they are ever created we would have a couple of other files created
> as well IMHO.
> If there are still missing files in some cases,
> I'd prefer to track these per test case, instead of globally.
>
> Therefore I propose to remove that exception for now.
>
> Is it OK for trunk?

As Alex said, please don't just remove features and functionality if
you don't know why they were added.  The history is online in the
mailing list and the repo history.

AIX uses constructors to register EH frames and libgcc has an EH
frame.  ctors and dtors can be found in non-C++ code.

Thanks, David
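
The point that ctors and dtors are not limited to C++ sources can be
illustrated with the GNU constructor attribute, which both the C and
the C++ front ends accept.  This is only a sketch: register_frames and
frames_registered are hypothetical names, and the code does not show
how AIX or libgcc actually register EH frames.

```cpp
// Set by the constructor function below before main () starts.
static int registered = 0;

// GNU extension: the function runs during program start-up, without any
// C++ class or global object being involved.
__attribute__ ((constructor))
void
register_frames (void)
{
  registered = 1;
}

// Reports whether the start-up hook has already run.
int
frames_registered (void)
{
  return registered;
}
```

By the time main () is entered, frames_registered () is expected to
return 1 on targets that support the attribute.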


[PATCH] reset the SCEV htab after FRE in loop pipeline

2021-01-08 Thread Richard Biener
When running FRE in the loop pipeline (as part of the conditionally
scheduled scalar cleanups) we have to reset the SCEV hashtable as
otherwise we can end up with stale entries and all sorts of problems.

Caught by my out-of-tree verifier for this problem.

Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

2021-01-08  Richard Biener  

* tree-ssa-sccvn.c (pass_fre::execute): Reset the SCEV hash table.
---
 gcc/tree-ssa-sccvn.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 17016853a34..0ba846f0be2 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -7883,6 +7883,9 @@ pass_fre::execute (function *fun)
   if (iterate_p)
 loop_optimizer_finalize ();
 
+  if (scev_initialized_p ())
+scev_reset_htab ();
+
   /* For late FRE after IVOPTs and unrolling, see if we can
  remove some TREE_ADDRESSABLE and rewrite stuff into SSA.  */
   if (!may_iterate)
-- 
2.26.2
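
The failure mode described in this patch, a cache that outlives a
transformation of the data it was computed from, can be shown with a
toy memoizing analysis.  This is a sketch under assumed names:
ToyAnalysis, analyze and reset are hypothetical and merely stand in for
the SCEV cache and scev_reset_htab.  Nothing here models what SCEV
actually computes.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-in for an analysis with a result cache.
struct ToyAnalysis
{
  const std::vector<int> &values;               // the "IR" being analyzed
  std::unordered_map<std::size_t, int> cache;   // index -> cached result

  explicit ToyAnalysis (const std::vector<int> &v) : values (v) {}

  // Returns twice the value at IDX, memoizing the result.
  int analyze (std::size_t idx)
  {
    auto it = cache.find (idx);
    if (it != cache.end ())
      return it->second;          // may be stale if VALUES changed
    int result = values[idx] * 2;
    cache.emplace (idx, result);
    return result;
  }

  // Counterpart of resetting the hash table after a transformation.
  void reset () { cache.clear (); }
};
```

If the underlying vector is modified after a query, analyze keeps
returning the old result until reset is called, which is the kind of
stale entry the added scev_reset_htab call is meant to avoid.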


[PATCH] fix vectorizer memleaks

2021-01-08 Thread Richard Biener
This plugs two memleaks in the vectorizer.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-01-08  Richard Biener  

* tree-vect-slp.c (scalar_stmts_to_slp_tree_map_t): Fix.
(vect_build_slp_tree): On cache hit release the matched
scalar stmts vector.
* tree-vect-stmts.c (vectorizable_store): Properly free
vec_oprnds before possibly gathering them again.
---
 gcc/tree-vect-slp.c   | 3 ++-
 gcc/tree-vect-stmts.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index e0f3539aa54..e7191ed3267 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1378,7 +1378,7 @@ bst_traits::equal (value_type existing, value_type candidate)
   return true;
 }
 
-typedef hash_map , slp_tree,
+typedef hash_map , slp_tree,
  simple_hashmap_traits  >
   scalar_stmts_to_slp_tree_map_t;
 
@@ -1405,6 +1405,7 @@ vect_build_slp_tree (vec_info *vinfo,
{
  SLP_TREE_REF_COUNT (*leader)++;
  vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
+ stmts.release ();
}
   return *leader;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 014f1aff4c1..068e4982303 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7717,11 +7717,11 @@ vectorizable_store (vec_info *vinfo,
}
}
  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
+ vec_oprnds.release ();
  if (slp)
break;
}
 
-  vec_oprnds.release ();
   return true;
 }
 
-- 
2.26.2
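
The vect_build_slp_tree hunk releases the caller's stmts vector on a
cache hit, since only a cache miss hands the vector over to the map.  A
minimal sketch of that ownership rule follows, assuming a std::map
keyed on heap-allocated vectors.  Cache, make_key, release_key,
lookup_or_build and clear_cache are hypothetical names, and live_keys
exists only to make a leak visible.

```cpp
#include <map>
#include <utility>
#include <vector>

// Number of heap-allocated keys that have not been freed yet.
static int live_keys = 0;

std::vector<int> *
make_key (std::vector<int> v)
{
  ++live_keys;
  return new std::vector<int> (std::move (v));
}

void
release_key (std::vector<int> *key)
{
  --live_keys;
  delete key;
}

// Orders keys by the contents they point to, not by pointer value.
struct KeyLess
{
  bool operator() (const std::vector<int> *a,
                   const std::vector<int> *b) const
  { return *a < *b; }
};

using Cache = std::map<std::vector<int> *, int, KeyLess>;

// Returns the id cached for KEY, creating one on a miss.
int
lookup_or_build (Cache &cache, std::vector<int> *key)
{
  auto it = cache.find (key);
  if (it != cache.end ())
    {
      release_key (key);        // hit: the caller's copy is redundant
      return it->second;
    }
  int id = static_cast<int> (cache.size ());
  cache.emplace (key, id);      // miss: the map now owns KEY
  return id;
}

// Frees every key still owned by the cache.
void
clear_cache (Cache &cache)
{
  for (auto &entry : cache)
    release_key (entry.first);
  cache.clear ();
}
```

Looking up the same contents twice should leave exactly one live key;
without the release on the hit path there would be two.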


[PATCH] x86-64: Use R10 and R11 for profiling large model with PIC

2021-01-08 Thread H.J. Lu via Gcc-patches
On Fri, Jan 8, 2021 at 4:50 AM H.J. Lu  wrote:
>
> On Fri, Jan 8, 2021 at 1:24 AM Uros Bizjak  wrote:
> >
> > > Since R10 is preserved when calling mcount, R10 can be used as a scratch
> > > register to call mcount in large model.
> >
> > Please mention that R10 can be used as a static chain register and is
> > preserved when calling mcount for nested functions.
> >
> > > gcc/
> > >
> > > PR target/98482
> > > * config/i386/i386.c (x86_function_profiler): Use R10 to call
> > > mcount in large model. Sorry for large model with PIC.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/98482
> > > * gcc.target/i386/pr98482-1.c: New test.
> > > * gcc.target/i386/pr98482-2.c: Likewise.
> >
> > OK with comment fixes.
> >
> > Thanks,
> > Uros.
> >
> > +case CM_LARGE:
> > +  /* NB: R10 can be used as a scratch register here since
> > +R10 is preserved when calling mcount.  */
> >
> > Also mention that R10 can be used as a static chain register and is
> > preserved when calling mcount for nested functions.
> >
> > +  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> > +   mcount_name);
> > +  break;
>
> This is the patch I am checking in.
>

For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
R10 and R11 to call mcount in large model with PIC.

OK for master?

Thanks.

-- 
H.J.
From b2e0bccbdba630a1f7f8b601e19b7302e375e240 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 8 Jan 2021 05:20:19 -0800
Subject: [PATCH] x86-64: Use R10 and R11 for profiling large model with PIC

For NO_PROFILE_COUNTERS targets, R11 is a scratch register.  We can use
R10 and R11 to call mcount in large model with PIC.

gcc/

	PR target/98482
	* config/i386/i386.c (x86_function_profiler): Use R10 and R11
	to call mcount in large model with PIC for NO_PROFILE_COUNTERS
	targets.

gcc/testsuite/

	PR target/98482
	* gcc.target/i386/pr98482-2.c: Updated.
---
 gcc/config/i386/i386.c| 10 ++
 gcc/testsuite/gcc.target/i386/pr98482-2.c |  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d3068462fcd..50380865acd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20806,7 +20806,17 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 		   mcount_name);
 	  break;
 	case CM_LARGE_PIC:
+#ifdef NO_PROFILE_COUNTERS
+	  fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
+	  fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
+	  fprintf (file, "\taddq\t%%r11, %%r10\n");
+	  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n",
+		   mcount_name);
+	  fprintf (file, "\taddq\t%%r11, %%r10\n");
+	  fprintf (file, "\tcall\t*%%r10\n");
+#else
 	  sorry ("profiling %<-mcmodel=large%> with PIC is not supported");
+#endif
 	  break;
 	case CM_SMALL_PIC:
 	case CM_MEDIUM_PIC:
diff --git a/gcc/testsuite/gcc.target/i386/pr98482-2.c b/gcc/testsuite/gcc.target/i386/pr98482-2.c
index aed3ca4b6ff..0ee142db12c 100644
--- a/gcc/testsuite/gcc.target/i386/pr98482-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr98482-2.c
@@ -2,8 +2,9 @@
 /* { dg-require-effective-target mfentry } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fpic -fprofile -mfentry -O2 -mcmodel=large" } */
+/* { dg-final { scan-assembler "movabsq\t\\\$__fentry__@PLTOFF, %r11\n\taddq\t%r11, %r10\n\tcall\t\\*%r10" } } */
 
 void
 func (void)
 {
-} /* { dg-message "sorry, unimplemented: profiling '-mcmodel=large' with PIC is not supported" } */
+}
-- 
2.29.2



[PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html
v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P.



Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they can fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c  | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11 
 .../s390/vector/long-double-vx-macro-on-off.c | 11 
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..a5f5f56311a 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the macro
MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is undefined.
Nothing is done if SET and WAS_SET have the same value.  */
+template <typename F>
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
 const struct cl_target_option *old_opts,
 const struct cl_target_option *new_opts,
 const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
 return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
 cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
struct cl_target_option *opts,
const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-  "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-  "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+  opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+  opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__vector=__attribute__((vector_size(16)))",
   "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__bool=__attribute__((s390_vector_bool)) unsigned",
-  "__bool");
+  s390_def_or_undef_macro (
+  pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+  "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
 char macro_def[64];
 gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
 cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+  pfile,
+  [] (const struct cl_target_option *opts) { return TARGET_VXE_P (opts); },
+  old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");
 
   if (!flag_iso)
 {
-  s390_def_or_undef_macro 
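
The ChangeLog item "Accept callables instead of mask values" can be
sketched in isolation.  The code below assumes a simplified Options
struct and records the decisions in a log rather than calling
cpp_define or cpp_undef.  Options, def_or_undef and flag_set_p are
hypothetical stand-ins for cl_target_option, s390_def_or_undef_macro
and target_flag_set_p.

```cpp
#include <string>
#include <vector>

// Hypothetical, reduced option record.
struct Options
{
  unsigned flags;
  int arch_level;
};

// Records "define NAME" or "undef NAME" when the predicate IS_SET changes
// value between OLD_OPTS and NEW_OPTS.  OLD_OPTS may be null, in which
// case the macro counts as previously unset.
template <typename F>
void
def_or_undef (std::vector<std::string> &log, F is_set,
              const Options *old_opts, const Options *new_opts,
              const char *name)
{
  bool was_set = old_opts ? is_set (old_opts) : false;
  bool set = is_set (new_opts);
  if (was_set == set)
    return;
  log.push_back (std::string (set ? "define " : "undef ") + name);
}

// Function object covering the old mask-based behaviour.
struct flag_set_p
{
  explicit flag_set_p (unsigned mask) : m_mask (mask) {}

  bool operator() (const Options *opts) const
  { return (opts->flags & m_mask) != 0; }

  unsigned m_mask;
};
```

The same template then accepts either flag_set_p (mask) for the
existing macros or a lambda for a condition that is not a single flag
bit, which is how the patch passes TARGET_VXE_P.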

[PATCH] tree-optimization/98544 - more permute optimization fixes

2021-01-08 Thread Richard Biener
Permute nodes are not transparent to the permute of their children.
Instead we have to materialize child permutes always and in future
may treat permute nodes as the source of arbitrary permutes as
we can permute the lane permutation vector at will (as the target
supports in the end).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-08  Richard Biener  

PR tree-optimization/98544
* tree-vect-slp.c (vect_optimize_slp): Always materialize
permutes at a permute node.

* gcc.dg/vect/bb-slp-pr98544.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr98544.c | 32 
 gcc/tree-vect-slp.c| 34 +-
 2 files changed, 53 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr98544.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr98544.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98544.c
new file mode 100644
index 000..756dc02ebad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98544.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+double a[2], b[2], c[2], d[2];
+
+void __attribute__((noipa))
+foo()
+{
+  double a0 = a[0];
+  double a1 = a[1];
+  double b0 = b[0];
+  double b1 = b[1];
+  double c0 = c[0];
+  double c1 = c[1];
+  double tem1 = a1 - b1;
+  double tem2 = a0 + b0;
+  d[0] = tem1 * c1;
+  d[1] = tem2 * c0;
+}
+
+int main()
+{
+  a[0] = 1.;
+  a[1] = 2.;
+  b[0] = 3.;
+  b[1] = 4.;
+  c[0] = 2.;
+  c[1] = 3.;
+  foo ();
+  if (d[0] != -6. || d[1] != 8.)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c9da8457e5e..e0f3539aa54 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3029,19 +3029,27 @@ vect_optimize_slp (vec_info *vinfo)
 
  /* Decide on permute materialization.  Look whether there's
 a use (pred) edge that is permuted differently than us.
-In that case mark ourselves so the permutation is applied.  */
- bool all_preds_permuted = slpg->vertices[idx].pred != NULL;
- for (graph_edge *pred = slpg->vertices[idx].pred;
-  pred; pred = pred->pred_next)
-   {
- gcc_checking_assert (bitmap_bit_p (n_visited, pred->src));
- int pred_perm = n_perm[pred->src];
- if (!vect_slp_perms_eq (perms, perm, pred_perm))
-   {
- all_preds_permuted = false;
- break;
-   }
-   }
+In that case mark ourselves so the permutation is applied.
+For VEC_PERM_EXPRs the permutation doesn't carry along
+from children to parents so force materialization at the
+point of the VEC_PERM_EXPR.  In principle VEC_PERM_EXPRs
+are a source of an arbitrary permutation again, similar
+to constants/externals - that's something we do not yet
+optimally handle.  */
+ bool all_preds_permuted = (SLP_TREE_CODE (node) != VEC_PERM_EXPR
+&& slpg->vertices[idx].pred != NULL);
+ if (all_preds_permuted)
+   for (graph_edge *pred = slpg->vertices[idx].pred;
+pred; pred = pred->pred_next)
+ {
+   gcc_checking_assert (bitmap_bit_p (n_visited, pred->src));
+   int pred_perm = n_perm[pred->src];
+   if (!vect_slp_perms_eq (perms, perm, pred_perm))
+ {
+   all_preds_permuted = false;
+   break;
+ }
+ }
  if (!all_preds_permuted)
{
  if (!bitmap_bit_p (n_materialize, idx))
-- 
2.26.2


Re: [PATCH] x86-64: Use R10 for profiling large model

2021-01-08 Thread H.J. Lu via Gcc-patches
On Fri, Jan 8, 2021 at 1:24 AM Uros Bizjak  wrote:
>
> > Since R10 is preserved when calling mcount, R10 can be used as a scratch
> > register to call mcount in large model.
>
> Please mention that R10 can be used as a static chain register and is
> preserved when calling mcount for nested functions.
>
> > gcc/
> >
> > PR target/98482
> > * config/i386/i386.c (x86_function_profiler): Use R10 to call
> > mcount in large model. Sorry for large model with PIC.
> >
> > gcc/testsuite/
> >
> > PR target/98482
> > * gcc.target/i386/pr98482-1.c: New test.
> > * gcc.target/i386/pr98482-2.c: Likewise.
>
> OK with comment fixes.
>
> Thanks,
> Uros.
>
> +case CM_LARGE:
> +  /* NB: R10 can be used as a scratch register here since
> +R10 is preserved when calling mcount.  */
>
> Also mention that R10 can be used as a static chain register and is
> preserved when calling mcount for nested functions.
>
> +  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> +   mcount_name);
> +  break;

This is the patch I am checking in.

Thanks.

-- 
H.J.
From 6ddaec60b84ccdfb11224440bfffa86112244d88 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 7 Jan 2021 14:27:49 -0800
Subject: [PATCH] x86-64: Use R10 for profiling large model

R10 is caller-saved.  Although it can be used as a static chain register,
it is preserved when calling mcount for nested functions.  Use R10 as a
scratch register to call mcount in large model.

gcc/

	PR target/98482
	* config/i386/i386.c (x86_function_profiler): Use R10 to call
	mcount in large model.  Sorry for large model with PIC.

gcc/testsuite/

	PR target/98482
	* gcc.target/i386/pr98482-1.c: New test.
	* gcc.target/i386/pr98482-2.c: Likewise.
---
 gcc/config/i386/i386.c| 26 +--
 gcc/testsuite/gcc.target/i386/pr98482-1.c |  9 
 gcc/testsuite/gcc.target/i386/pr98482-2.c |  9 
 3 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98482-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98482-2.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fad50e7e537..d3068462fcd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20794,8 +20794,30 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
   fprintf (file, "\tleaq\t%sP%d(%%rip),%%r11\n", LPREFIX, labelno);
 #endif
 
-  if (!TARGET_PECOFF && flag_pic)
-	fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
+  if (!TARGET_PECOFF)
+	{
+	  switch (ix86_cmodel)
+	{
+	case CM_LARGE:
+	  /* NB: R10 is caller-saved.  Although it can be used as a
+		 static chain register, it is preserved when calling
+		 mcount for nested functions.  */
+	  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
+		   mcount_name);
+	  break;
+	case CM_LARGE_PIC:
+	  sorry ("profiling %<-mcmodel=large%> with PIC is not supported");
+	  break;
+	case CM_SMALL_PIC:
+	case CM_MEDIUM_PIC:
+	  fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
+		   mcount_name);
+	  break;
+	default:
+	  x86_print_call_or_nop (file, mcount_name);
+	  break;
+	}
+	}
   else
 	x86_print_call_or_nop (file, mcount_name);
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr98482-1.c b/gcc/testsuite/gcc.target/i386/pr98482-1.c
new file mode 100644
index 000..72d5ccb269c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98482-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-require-effective-target mfentry } */
+/* { dg-options "-fprofile -mfentry -O2 -mcmodel=large" } */
+/* { dg-final { scan-assembler "movabsq\t\\\$__fentry__, %r10\n\tcall\t\\*%r10" } } */
+
+void
+func (void)
+{
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr98482-2.c b/gcc/testsuite/gcc.target/i386/pr98482-2.c
new file mode 100644
index 000..aed3ca4b6ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98482-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-require-effective-target mfentry } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-fpic -fprofile -mfentry -O2 -mcmodel=large" } */
+
+void
+func (void)
+{
+} /* { dg-message "sorry, unimplemented: profiling '-mcmodel=large' with PIC is not supported" } */
-- 
2.29.2
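
The code-model dispatch in the patch above can be sketched as standalone C.
This is only an illustration under stated assumptions: the enum and the
function emit_profiler_call are hypothetical stand-ins, not GCC's own
declarations, the default case is a simplified placeholder for whatever
x86_print_call_or_nop prints, and the real code calls sorry () for the
large PIC model rather than returning an error code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the code models the patch switches on.  */
enum cmodel { CM_SMALL, CM_SMALL_PIC, CM_MEDIUM_PIC, CM_LARGE, CM_LARGE_PIC };

/* Write the profiler call sequence for MODEL into BUF.  Returns 0 on
   success and -1 for the unsupported large PIC model.  */
static int
emit_profiler_call (char *buf, size_t len, enum cmodel model,
                    const char *mcount_name)
{
  switch (model)
    {
    case CM_LARGE:
      /* A 64-bit absolute address does not fit a direct call, so load
         it into R10 and call through the register.  */
      snprintf (buf, len, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
                mcount_name);
      return 0;
    case CM_LARGE_PIC:
      /* Not supported by the patch.  */
      buf[0] = '\0';
      return -1;
    case CM_SMALL_PIC:
    case CM_MEDIUM_PIC:
      /* Indirect call through the GOT.  */
      snprintf (buf, len, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
      return 0;
    default:
      /* Simplified placeholder: a plain direct call.  */
      snprintf (buf, len, "1:\tcall\t%s\n", mcount_name);
      return 0;
    }
}
```

With CM_LARGE and "__fentry__" as the name, the buffer holds the same
two-instruction sequence that pr98482-1.c scans for.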



[PATCH] VAX/testsuite: Remove notsi comparison elimination regressions

2021-01-08 Thread Maciej W. Rozycki
Remove fallout from commit 0bd675183d94 ("match.pd: Add ~(X - Y) -> 
~X + Y simplification [PR96685]") and paper over the regression it 
caused, as that regression is not what the affected test cases are 
meant to cover.

Previously assembly like this:

.text
.align 1
.globl eq_notsi
.type   eq_notsi, @function
eq_notsi:
.word 0 # 35[c=0]  procedure_entry_mask
subl2 $4,%sp# 46[c=32]  *addsi3
mcoml 4(%ap),%r0# 32[c=16]  *one_cmplsi2_ccz
jeql .L1# 34[c=26]  *branch_ccz
addl2 $2,%r0# 31[c=32]  *addsi3
.L1:
ret # 40[c=0]  return
.size   eq_notsi, .-eq_notsi

was produced.  Now this:

.text
.align 1
.globl eq_notsi
.type   eq_notsi, @function
eq_notsi:
.word 0 # 36[c=0]  procedure_entry_mask
subl2 $4,%sp# 48[c=32]  *addsi3
movl 4(%ap),%r0 # 33[c=16]  *movsi_2
cmpl %r0,$-1# 34[c=8]  *cmpsi_ccz/1
jeql .L3# 35[c=26]  *branch_ccz
subl3 %r0,$1,%r0# 32[c=32]  *subsi3/1
ret # 27[c=0]  return
.L3:
clrl %r0# 31[c=2]  *movsi_2
ret # 41[c=0]  return
.size   eq_notsi, .-eq_notsi

is, which cannot work with post-reload comparison elimination, due to 
the comparison against -1 rather than 0.
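
The arithmetic behind this can be checked with a small C sketch.  It 
assumes the test complements its argument before the comparison, which 
the mcoml in the expected assembly suggests but which is not visible in 
the hunks here; the helper names are made up for illustration.  Unsigned 
types are used so that wrap-around is well defined.

```c
#include <stdint.h>

/* The original form: complement of a difference.  */
static uint32_t
not_of_difference (uint32_t x, uint32_t y)
{
  return ~(x - y);
}

/* The form the match.pd rule rewrites it to.  Since ~v == -v - 1 in
   two's complement, ~(x - y) == -x + y - 1 == ~x + y.  */
static uint32_t
not_plus (uint32_t x, uint32_t y)
{
  return ~x + y;
}

/* What the old test body would compute on the complemented value.  */
static uint32_t
not_plus_two (uint32_t x)
{
  return ~x + 2u;
}

/* The same value without any complement: ~x + 2 == 1 - x.  This
   matches the subl3 %r0,$1,%r0 in the new assembly, and ~x == 0
   likewise becomes x == -1.  */
static uint32_t
one_minus (uint32_t x)
{
  return 1u - x;
}
```

Once the complement has been folded away like this, no one_cmplsi2 
instruction is left for comparison elimination to attach to.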

Use subtraction from a constant then rather than addition as the former 
operation is not transformed, removing these regressions:

FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-rtl-dump-times cmpelim 
"deleting insn with uid" 1
FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-assembler-not 
\t(bit|cmpz?|tst).
FAIL: gcc.target/vax/cmpelim-eq-notsi.c   -O1   scan-assembler one_cmplsi[^ 
]*_ccz(/[0-9]+)?\n
FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-rtl-dump-times cmpelim 
"deleting insn with uid" 1
FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-assembler-not 
\t(bit|cmpz?|tst).
FAIL: gcc.target/vax/cmpelim-lt-notsi.c   -O1   scan-assembler one_cmplsi[^ 
]*_ccn(/[0-9]+)?\n

and likewise across some of the other optimization levels verified.

The LE variant appears unaffected as the new transformation produces 
slightly different although still suboptimal code:

.text
.align 1
.globl le_notsi
.type   le_notsi, @function
le_notsi:
.word 0 # 27[c=0]  procedure_entry_mask
subl2 $4,%sp# 34[c=32]  *addsi3
movl 4(%ap),%r1 # 23[c=16]  *movsi_2
mcoml %r1,%r0   # 24[c=8]  *one_cmplsi2_ccnz
jleq .L1# 26[c=26]  *branch_ccnz
subl3 %r1,$1,%r0# 22[c=32]  *subsi3/1
.L1:
ret # 32[c=0]  return
.size   le_notsi, .-le_notsi

but update the test case too, for consistency with the other two.

gcc/testsuite/
* gcc.target/vax/cmpelim-eq-notsi.c: Use subtraction from a 
constant then rather than addition.
* gcc.target/vax/cmpelim-le-notsi.c: Likewise.
* gcc.target/vax/cmpelim-lt-notsi.c: Likewise.
---
 gcc/testsuite/gcc.target/vax/cmpelim-eq-notsi.c |4 ++--
 gcc/testsuite/gcc.target/vax/cmpelim-le-notsi.c |4 ++--
 gcc/testsuite/gcc.target/vax/cmpelim-lt-notsi.c |4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

gcc-test-vax-notsi-pr96685.diff
Index: gcc/gcc/testsuite/gcc.target/vax/cmpelim-eq-notsi.c
===
--- gcc.orig/gcc/testsuite/gcc.target/vax/cmpelim-eq-notsi.c
+++ gcc/gcc/testsuite/gcc.target/vax/cmpelim-eq-notsi.c
@@ -11,14 +11,14 @@ eq_notsi (int_t x)
   if (x == 0)
 return x;
   else
-return x + 2;
+return 2 - x;
 }
 
 /* Expect assembly like:
 
mcoml 4(%ap),%r0# 32[c=16]  *one_cmplsi2_ccz
jeql .L1# 34[c=26]  *branch_ccz
-   addl2 $2,%r0# 31[c=32]  *addsi3
+   subl3 %r0,$2,%r0# 31[c=32]  *subsi3/1
 .L1:
 
  */
Index: gcc/gcc/testsuite/gcc.target/vax/cmpelim-le-notsi.c
===
--- gcc.orig/gcc/testsuite/gcc.target/vax/cmpelim-le-notsi.c
+++ gcc/gcc/testsuite/gcc.target/vax/cmpelim-le-notsi.c
@@ -11,14 +11,14 @@ le_notsi (int_t x)
   if (x <= 0)
 return x;
   else
-return x + 2;
+return 2 - x;
 }
 
 /* Expect assembly like:
 
mcoml 4(%ap),%r0# 28[c=16]  *one_cmplsi2_ccnz
jleq .L1# 30[c=26]  *branch_ccnz
-   addl2 $2,%r0# 27[c=32]  *addsi3
+   subl3 %r0,$2,%r0# 27[c=32]  *subsi3/1
 .L1:
 
  */
Index: gcc/gcc/testsuite/gcc.target/vax/cmpelim-lt-notsi.c
===
--- gcc.orig/gcc/testsuite/gcc.target/vax/cmpelim-lt-notsi.c
+++ 

Re: [PATCH 1/3] arm: Add movmisalign patterns for MVE (PR target/97875)

2021-01-08 Thread Christophe Lyon via Gcc-patches
On Fri, 8 Jan 2021 at 10:50, Kyrylo Tkachov  wrote:
>
> Hi Christophe,
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 17 December 2020 17:48
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 1/3] arm: Add movmisalign patterns for MVE (PR
> > target/97875)
> >
> > This patch adds new movmisalign_mve_load and store patterns for
> > MVE to help vectorization. They are very similar to their Neon
> > counterparts, but use different iterators and instructions.
> >
> > Indeed MVE supports fewer vector modes than Neon, so we use
> > the MVE_VLD_ST iterator where Neon uses VQX.
> >
> > Since the supported modes are different from the ones valid for
> > arithmetic operators, we introduce two new sets of macros:
> >
> > ARM_HAVE_NEON_<MODE>_LDST
> >   true if Neon has vector load/store instructions for <MODE>
> >
> > ARM_HAVE_<MODE>_LDST
> >   true if any vector extension has vector load/store instructions for <MODE>
> >
>
> I'm not a big fan of the big number of these macros ☹ but I understand why 
> they're used, so I won't object.

Indeed, I tried to find other ways, but it seemed better to follow the
new practice of using this new style of macros.


> > We move the movmisalign expander from neon.md to vec-common.md, and
> > replace the TARGET_NEON enabler with ARM_HAVE_<MODE>_LDST.
> >
> > The patch also updates the mve-vneg.c test to scan for the better code
> > generation when loading and storing the vectors involved: it checks
> > that no 'orr' instruction is generated to cope with misalignment at
> > runtime.
> > This test was chosen among the other mve tests, but any other should
> > be OK. Using a plain vector copy loop (dest[i] = a[i]) is not a good
> > test because the compiler chooses to use memcpy.
> >
> > For instance we now generate:
> > test_vneg_s32x4:
> >   vldrw.32   q3, [r1]
> >   vneg.s32  q3, q3
> >   vstrw.32   q3, [r0]
> >   bx  lr
> >
> > instead of:
> > test_vneg_s32x4:
> >   orr r3, r1, r0
> >   lslsr3, r3, #28
> >   bne .L15
> >   vldrw.32q3, [r1]
> >   vneg.s32  q3, q3
> >   vstrw.32q3, [r0]
> >   bx  lr
> >   .L15:
> >   push{r4, r5}
> >   ldrdr2, r3, [r1, #8]
> >   ldrdr5, r4, [r1]
> >   rsbsr2, r2, #0
> >   rsbsr5, r5, #0
> >   rsbsr4, r4, #0
> >   rsbsr3, r3, #0
> >   strdr5, r4, [r0]
> >   pop {r4, r5}
> >   strdr2, r3, [r0, #8]
> >   bx  lr
> >
> > 2020-12-15  Christophe Lyon  
> >
> >   PR target/97875
> >   gcc/
> >   * config/arm/arm.h (ARM_HAVE_NEON_V8QI_LDST): New macro.
> >   (ARM_HAVE_NEON_V16QI_LDST, ARM_HAVE_NEON_V4HI_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V8HI_LDST, ARM_HAVE_NEON_V2SI_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V4SI_LDST, ARM_HAVE_NEON_V4HF_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V8HF_LDST, ARM_HAVE_NEON_V4BF_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V8BF_LDST, ARM_HAVE_NEON_V2SF_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V4SF_LDST, ARM_HAVE_NEON_DI_LDST):
> > Likewise.
> >   (ARM_HAVE_NEON_V2DI_LDST): Likewise.
> >   (ARM_HAVE_V8QI_LDST, ARM_HAVE_V16QI_LDST): Likewise.
> >   (ARM_HAVE_V4HI_LDST, ARM_HAVE_V8HI_LDST): Likewise.
> >   (ARM_HAVE_V2SI_LDST, ARM_HAVE_V4SI_LDST,
> > ARM_HAVE_V4HF_LDST): Likewise.
> >   (ARM_HAVE_V8HF_LDST, ARM_HAVE_V4BF_LDST,
> > ARM_HAVE_V8BF_LDST): Likewise.
> >   (ARM_HAVE_V2SF_LDST, ARM_HAVE_V4SF_LDST,
> > ARM_HAVE_DI_LDST): Likewise.
> >   (ARM_HAVE_V2DI_LDST): Likewise.
> >   * config/arm/mve.md (*movmisalign_mve_store): New
> > pattern.
> >   (*movmisalign_mve_load): New pattern.
> >   * config/arm/neon.md (movmisalign): Move to ...
> >   * config/arm/vec-common.md: ... here.
> >
> >   PR target/97875
> >   gcc/testsuite/
> >   * gcc.target/arm/simd/mve-vneg.c: Update test.
> > ---
> >  gcc/config/arm/arm.h | 40 
> > 
> >  gcc/config/arm/mve.md| 25 +
> >  gcc/config/arm/neon.md   | 25 -
> >  gcc/config/arm/vec-common.md | 24 +
> >  gcc/testsuite/gcc.target/arm/simd/mve-vneg.c |  3 +++
> >  5 files changed, 92 insertions(+), 25 deletions(-)
> >
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index 4a63d33..d44e0c6 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -1151,6 +1151,46 @@ extern const int arm_arch_cde_coproc_bits[];
> >  #define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH ||
> > TARGET_HAVE_MVE_FLOAT)
> >  #define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH ||
> > TARGET_HAVE_MVE_FLOAT)
> >
> > +/* The conditions under which vector modes are supported by load/store
> > +   instructions using Neon.  */
> > +
> > +#define ARM_HAVE_NEON_V8QI_LDST TARGET_NEON
> > +#define ARM_HAVE_NEON_V16QI_LDST 

Re: [PATCH v2] aarch64: Add cpu cost tables for A64FX

2021-01-08 Thread Richard Sandiford via Gcc-patches
Qian Jianhua  writes:
> This patch adds cost tables for A64FX.
>
> ChangeLog:
> 2021-01-08 Qian jianhua 
>
> gcc/
>   * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New.
>   * config/aarch64/aarch64.c (a64fx_addrcost_table): New.
>   (a64fx_regmove_cost, a64fx_vector_cost): New.
>   (a64fx_tunings): Use the new added cost tables.

OK for trunk, thanks.  The v1 patch is OK for branches that support
-mcpu=a64fx.

Would you like commit access, so that you can commit it yourself?
If so, please fill out the form mentioned at the beginning of
https://gcc.gnu.org/gitwrite.html listing me as sponsor.

Alternatively, if you'd rather not for any reason, I'm happy to apply
it for you.

Thanks,
Richard


[pushed] aarch64: Support unpacked CNOT on SVE

2021-01-08 Thread Richard Sandiford via Gcc-patches
This patch adds unpacked support for unconditional and
conditional CNOT.  The type suffix has to be taken from
the element size rather than the container size.
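
As a rough scalar model of what conditional CNOT computes per element
(a sketch only: the function name, the byte-wide element type and the
array-based predicate are assumptions made for illustration, not taken
from the new tests):

```c
#include <stddef.h>
#include <stdint.h>

/* Where PRED[i] is nonzero, store the logical inverse of B[i]
   (1 if B[i] is zero, otherwise 0); elsewhere keep B[i] unchanged,
   i.e. merge with the first input.  */
static void
cond_cnot_ref (uint8_t *r, const uint8_t *pred, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    r[i] = pred[i] ? (uint8_t) (b[i] == 0) : b[i];
}
```

The comparison against zero is done on the element itself, which is
the per-element behaviour the patterns describe with the (eq ...) and
UNSPEC_SEL operands.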

Tested on aarch64-linux-gnu and aarch64_be-elf.  Pushed to trunk.

Richard


gcc/
* config/aarch64/aarch64-sve.md (*cnot): Extend from
SVE_FULL_I to SVE_I.
(*cond_cnot_2, *cond_cnot_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/cnot_2.c: New test.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_5.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_6_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 36 +--
 gcc/testsuite/gcc.target/aarch64/sve/cnot_2.c | 29 +++
 .../gcc.target/aarch64/sve/cond_cnot_4.c  | 32 +
 .../gcc.target/aarch64/sve/cond_cnot_4_run.c  | 26 ++
 .../gcc.target/aarch64/sve/cond_cnot_5.c  | 32 +
 .../gcc.target/aarch64/sve/cond_cnot_5_run.c  | 26 ++
 .../gcc.target/aarch64/sve/cond_cnot_6.c  | 31 
 .../gcc.target/aarch64/sve/cond_cnot_6_run.c  | 26 ++
 8 files changed, 220 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cnot_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_5_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6_run.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index b83f9912cb6..2f5a5e3c914 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3227,16 +3227,16 @@ (define_expand "@aarch64_pred_cnot"
 )
 
 (define_insn "*cnot"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
  [(unspec:
 [(match_operand: 1 "register_operand" "Upl, Upl")
  (match_operand:SI 5 "aarch64_sve_ptrue_flag")
  (eq:
-   (match_operand:SVE_FULL_I 2 "register_operand" "0, w")
-   (match_operand:SVE_FULL_I 3 "aarch64_simd_imm_zero"))]
+   (match_operand:SVE_I 2 "register_operand" "0, w")
+   (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))]
 UNSPEC_PRED_Z)
-  (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_one")
+  (match_operand:SVE_I 4 "aarch64_simd_imm_one")
   (match_dup 3)]
  UNSPEC_SEL))]
   "TARGET_SVE"
@@ -3274,19 +3274,19 @@ (define_expand "@cond_cnot"
 
 ;; Predicated logical inverse, merging with the first input.
 (define_insn_and_rewrite "*cond_cnot_2"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl")
   ;; Logical inverse of operand 2 (as above).
-  (unspec:SVE_FULL_I
+  (unspec:SVE_I
 [(unspec:
[(match_operand 5)
 (const_int SVE_KNOWN_PTRUE)
 (eq:
-  (match_operand:SVE_FULL_I 2 "register_operand" "0, w")
-  (match_operand:SVE_FULL_I 3 "aarch64_simd_imm_zero"))]
+  (match_operand:SVE_I 2 "register_operand" "0, w")
+  (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))]
UNSPEC_PRED_Z)
- (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_one")
+ (match_operand:SVE_I 4 "aarch64_simd_imm_one")
  (match_dup 3)]
 UNSPEC_SEL)
   (match_dup 2)]
@@ -3310,22 +3310,22 @@ (define_insn_and_rewrite "*cond_cnot_2"
 ;; as earlyclobber helps to make the instruction more regular to the
 ;; register allocator.
 (define_insn_and_rewrite "*cond_cnot_any"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=, ?, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=, ?, ?")
+   (unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
   ;; Logical inverse of operand 2 (as above).
-  (unspec:SVE_FULL_I
+  (unspec:SVE_I
 [(unspec:
[(match_operand 5)
 (const_int SVE_KNOWN_PTRUE)
 (eq:
-  (match_operand:SVE_FULL_I 2 "register_operand" "w, w, w")
-  (match_operand:SVE_FULL_I 3 

[pushed] aarch64: Support conditional unpacked UXT on SVE

2021-01-08 Thread Richard Sandiford via Gcc-patches
This patch extends the conditional UXT patterns from SVE_FULL_I
to SVE_I.  It doesn't matter in this case whether the type suffix
is taken from the element size or the container size.
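
A concrete instance of the kind of loop these patterns are meant to
handle might look as follows.  It follows the shape of the DEF_LOOP
macro in cond_uxt_5.c, but the element types, the 0xff mask and the
function name are picked here for illustration and are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Conditionally zero-extend the low byte of B[i]: the AND with 0xff
   is applied only where A[i] > 20; other elements are copied as-is.  */
static void
cond_uxt_ref (uint32_t *restrict r, const uint16_t *restrict a,
              const uint32_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    r[i] = a[i] > 20 ? (b[i] & 0xffu) : b[i];
}
```

The AND with a constant mask is the canonical form the comment in
aarch64-sve.md refers to.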

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
* config/aarch64/aarch64-sve.md (*cond_uxt_2): Extend from
SVE_FULL_I to SVE_I.
(*cond_uxt_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_uxt_5.c: New test.
* gcc.target/aarch64/sve/cond_uxt_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_8_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 22 ++---
 .../gcc.target/aarch64/sve/cond_uxt_5.c   | 33 +++
 .../gcc.target/aarch64/sve/cond_uxt_5_run.c   | 26 +++
 .../gcc.target/aarch64/sve/cond_uxt_6.c   | 33 +++
 .../gcc.target/aarch64/sve/cond_uxt_6_run.c   | 26 +++
 .../gcc.target/aarch64/sve/cond_uxt_7.c   | 29 
 .../gcc.target/aarch64/sve/cond_uxt_7_run.c   | 26 +++
 .../gcc.target/aarch64/sve/cond_uxt_8.c   | 32 ++
 .../gcc.target/aarch64/sve/cond_uxt_8_run.c   | 26 +++
 9 files changed, 242 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_6_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_7_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_8_run.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 2ec9acbf38d..b83f9912cb6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3135,12 +3135,12 @@ (define_insn 
"@aarch64_cond_sxt"
 ;; The canonical form of this operation is an AND of a constant rather
 ;; than (zero_extend (truncate ...)).
 (define_insn "*cond_uxt_2"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl")
-  (and:SVE_FULL_I
-(match_operand:SVE_FULL_I 2 "register_operand" "0, w")
-(match_operand:SVE_FULL_I 3 "aarch64_sve_uxt_immediate"))
+  (and:SVE_I
+(match_operand:SVE_I 2 "register_operand" "0, w")
+(match_operand:SVE_I 3 "aarch64_sve_uxt_immediate"))
   (match_dup 2)]
  UNSPEC_SEL))]
   "TARGET_SVE"
@@ -3159,13 +3159,13 @@ (define_insn "*cond_uxt_2"
 ;; as early-clobber helps to make the instruction more regular to the
 ;; register allocator.
 (define_insn "*cond_uxt_any"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=, ?, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=, ?, ?")
+   (unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-  (and:SVE_FULL_I
-(match_operand:SVE_FULL_I 2 "register_operand" "w, w, w")
-(match_operand:SVE_FULL_I 3 "aarch64_sve_uxt_immediate"))
-  (match_operand:SVE_FULL_I 4 "aarch64_simd_reg_or_zero" "0, Dz, w")]
+  (and:SVE_I
+(match_operand:SVE_I 2 "register_operand" "w, w, w")
+(match_operand:SVE_I 3 "aarch64_sve_uxt_immediate"))
+  (match_operand:SVE_I 4 "aarch64_simd_reg_or_zero" "0, Dz, w")]
  UNSPEC_SEL))]
   "TARGET_SVE && !rtx_equal_p (operands[2], operands[4])"
   "@
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
new file mode 100644
index 000..18866286b7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include 
+
+#define DEF_LOOP(TYPE1, TYPE2, CONST, COUNT)   \
+  void __attribute__ ((noipa)) \
+  test_##CONST##_##TYPE1##_##TYPE2 (TYPE2 *restrict r, \
+   TYPE1 *restrict a,  \
+   TYPE2 *restrict b)  \
+  {\
+for (int i = 0; i < COUNT; ++i)\
+  r[i] = a[i] > 20 ? b[i] & CONST : b[i];  

[PATCH] aarch64: Reimplement most vpadal intrinsics using builtins

2021-01-08 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch reimplements most of the vpadal intrinsics to use RTL builtins in 
the normal way.
The ones that aren't converted are the int32x2_t -> int64x1_t ones as the RTL 
pattern doesn't currently handle
these modes. We don't have a V1DI mode so it would need to return a DImode 
value or a V2DI one with the first lane
being the result. It's not hard to do, but it would require a bit more 
refactoring so we can do it separately later.

This patch hopefully improves the status quo.

The new Vwhalf mode attribute is created because the existing Vwtype attribute 
maps V8QI wrongly (for this pattern) to "8h" as the
suffix rather than "4h" as needed.
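
For reference, the operation behind vpadal_s8 can be modelled in scalar C 
roughly as below.  This is a sketch, not the arm_neon.h implementation: the 
function name and the plain-array interface are assumptions for illustration.

```c
#include <stdint.h>

/* Pairwise add and accumulate long: each 16-bit accumulator lane
   gains the sum of one adjacent pair of sign-extended 8-bit inputs.
   Eight narrow inputs feed four wide lanes, which is why the "4h"
   suffix is the one needed for a V8QI input.  */
static void
padal_s8_ref (int16_t acc[4], const int8_t b[8])
{
  for (int i = 0; i < 4; ++i)
    /* The sum is computed in int; the cast back to int16_t wraps on
       GCC if the accumulator overflows.  */
    acc[i] = (int16_t) (acc[i] + b[2 * i] + b[2 * i + 1]);
}
```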

Bootstrapped and tested on aarch64-none-linux-gnu.

Pushing to trunk.
Thanks,
Kyrill

gcc/
* config/aarch64/iterators.md (Vwhalf): New iterator.
* config/aarch64/aarch64-simd.md (aarch64_adalp_3): Rename 
to...
(aarch64_adalp): ... This.  Make more builtin-friendly.
(sadv16qi): Adjust callsite of the above.
* config/aarch64/aarch64-simd-builtins.def (sadalp, uadalp): New 
builtins.
* config/aarch64/arm_neon.h (vpadal_s8): Reimplement using builtins.
(vpadal_s16): Likewise.
(vpadal_u8): Likewise.
(vpadal_u16): Likewise.
(vpadalq_s8): Likewise.
(vpadalq_s16): Likewise.
(vpadalq_s32): Likewise.
(vpadalq_u8): Likewise.
(vpadalq_u16): Likewise.
(vpadalq_u32): Likewise.


vpadal-int.patch
Description: vpadal-int.patch


[PATCH] aarch64: Reimplement vabd* intrinsics using builtins

2021-01-08 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch reimplements the vabd* intrinsics using RTL builtins.
It's fairly straightforward with new builtins + arm_neon.h changes.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64-simd.md (aarch64_abd_3): Rename to...
(aarch64_abd): ... This.
(sadv16qi): Adjust callsite of the above.
* config/aarch64/aarch64-simd-builtins.def (sabd, uabd): Define 
builtins.
* config/aarch64/arm_neon.h (vabd_s8): Reimplement using builtin.
(vabd_s16): Likewise.
(vabd_s32): Likewise.
(vabd_u8): Likewise.
(vabd_u16): Likewise.
(vabd_u32): Likewise.
(vabdq_s8): Likewise.
(vabdq_s16): Likewise.
(vabdq_s32): Likewise.
(vabdq_u8): Likewise.
(vabdq_u16): Likewise.
(vabdq_u32): Likewise.


vabd-int.patch
Description: vabd-int.patch


[PATCH] aarch64: Reimplement vaba* intrinsics using builtins

2021-01-08 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch reimplements the vaba* arm_neon.h intrinsics using RTL builtins that 
expand to proper RTL patterns
rather than using inline asm.
The implementation is fairly straightforward by defining new builtins and using 
them in the header.

Bootstrapped and tested on aarch64-none-linux-gnu.

Pushing to trunk.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64-simd-builtins.def (saba, uaba): Define 
builtins.
* config/aarch64/arm_neon.h (vaba_s8): Implement using builtin.
(vaba_s16): Likewise.
(vaba_s32): Likewise.
(vaba_u8): Likewise.
(vaba_u16): Likewise.
(vaba_u32): Likewise.
(vabaq_s8): Likewise.
(vabaq_s16): Likewise.
(vabaq_s32): Likewise.
(vabaq_u8): Likewise.
(vabaq_u16): Likewise.
(vabaq_u32): Likewise.


vaba-int.patch
Description: vaba-int.patch


[PATCH] aarch64: Fix RTL patterns for UABA/SABA

2021-01-08 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Some time ago we changed the RTL representation of the (SU)ABD instructions 
to a (MINUS (MAX) (MIN)) rather than a (MINUS (ABS) (ABS)) as it more correctly 
models the semantics.
We should do the same for the accumulation forms of these instructions: 
UABA/SABA.

This patch does that and allows the new pattern to generate the unsigned UABA 
form as well.
The new form also allows it to more easily be re-used to implement the relevant 
arm_neon.h intrinsics in the future.
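
The (MINUS (MAX) (MIN)) shape with accumulation can be written out as a 
scalar C model.  This is only a sketch for the unsigned 8-bit case; the 
function name and the array interface are assumptions for illustration.

```c
#include <stdint.h>

/* Unsigned absolute difference and accumulate: the difference is
   taken as max - min, so it is never negative and needs no abs, and
   the result is added to the accumulator with the usual modulo-256
   wrap-around of the 8-bit lane.  */
static void
uaba_u8_ref (uint8_t acc[8], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 8; ++i)
    {
      uint8_t max = a[i] > b[i] ? a[i] : b[i];
      uint8_t min = a[i] > b[i] ? b[i] : a[i];
      acc[i] = (uint8_t) (acc[i] + (max - min));
    }
}
```

The signed variant has the same structure with signed MAX and MIN, so one 
pattern can cover both forms.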

The testcase takes an -fno-tree-reassoc to work around a side-effect of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98581

Bootstrapped and tested on aarch64-none-linux-gnu.

Pushing to trunk.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64-simd.md (aba_3): Rename to...
(aarch64_aba): ... This.  Handle uaba as well.
Change RTL pattern to match.

gcc/testsuite/
* gcc.target/aarch64/usaba_1.c: New test.


fix-usaba.patch
Description: fix-usaba.patch


RE: [PATCH 6/8 v9]middle-end slp: support complex FMA and complex FMA conjugate

2021-01-08 Thread Tamar Christina via Gcc-patches
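
A scalar sketch of what the cmla and cmla_conj descriptions quoted in
this message compute, written against the lane layout the md.texi text
states (real parts in even lanes, imaginary parts in odd lanes).  The
function names and the choice of float are assumptions for
illustration; this is not code from the patch.

```c
#include <stddef.h>

/* C, A and B each hold N complex numbers as 2*N floats.
   Computes c[i] += a[i] * b[i].  */
static void
cmla_ref (float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];
      c[2 * i] += ar * br - ai * bi;      /* real part */
      c[2 * i + 1] += ar * bi + ai * br;  /* imaginary part */
    }
}

/* Computes c[i] += a[i] * conj (b[i]): the sign of b's imaginary
   part is flipped before multiplying.  */
static void
cmla_conj_ref (float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];
      c[2 * i] += ar * br + ai * bi;      /* real part */
      c[2 * i + 1] += ai * br - ar * bi;  /* imaginary part */
    }
}
```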


> -Original Message-
> From: Richard Biener 
> Sent: Friday, January 8, 2021 10:17 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: RE: [PATCH 6/8 v9]middle-end slp: support complex FMA and
> complex FMA conjugate
> 
> On Fri, 8 Jan 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, January 8, 2021 9:45 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > > Subject: Re: [PATCH 6/8 v9]middle-end slp: support complex FMA and
> > > complex FMA conjugate
> > >
> > > On Mon, 28 Dec 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This adds support for FMA and FMA conjugated to the slp pattern
> matcher.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > x86_64-pc-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
> > > > * optabs.def (cmla_optab, cmla_conj_optab): New.
> > > > * doc/md.texi: Document them.
> > > > * tree-vect-slp-patterns.c (vect_match_call_p,
> > > > class complex_fma_pattern, vect_slp_reset_pattern,
> > > > complex_fma_pattern::matches, complex_fma_pattern::recognize,
> > > > complex_fma_pattern::build): New.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > >
> > >
> b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a..6d5a98c4946d3ff4c2b8abea5c
> > > 2
> > > 9
> > > > caa6863fd3f7 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -6202,6 +6202,51 @@ The operation is only supported for vector
> > > modes @var{m}.
> > > >
> > > >  This pattern is not allowed to @code{FAIL}.
> > > >
> > > > +@cindex @code{cmla@var{m}4} instruction pattern @item
> > > > +@samp{cmla@var{m}4} Perform a vector multiply and accumulate
> that
> > > > +is semantically the same as a multiply and accumulate of complex
> > > > +numbers.
> > > > +
> > > > +@smallexample
> > > > +  complex TYPE c[N];
> > > > +  complex TYPE a[N];
> > > > +  complex TYPE b[N];
> > > > +  for (int i = 0; i < N; i += 1)
> > > > +@{
> > > > +  c[i] += a[i] * b[i];
> > > > +@}
> > > > +@end smallexample
> > > > +
> > > > +In GCC lane ordering the real part of the number must be in the
> > > > +even lanes with the imaginary part in the odd lanes.
> > > > +
> > > > +The operation is only supported for vector modes @var{m}.
> > > > +
> > > > +This pattern is not allowed to @code{FAIL}.
> > > > +
> > > > +@cindex @code{cmla_conj@var{m}4} instruction pattern @item
> > > > +@samp{cmla_conj@var{m}4} Perform a vector multiply by conjugate
> > > > +and accumulate that is semantically the same as a multiply and
> > > > +accumulate of complex numbers where the second multiply
> arguments is conjugated.
> > > > +
> > > > +@smallexample
> > > > +  complex TYPE c[N];
> > > > +  complex TYPE a[N];
> > > > +  complex TYPE b[N];
> > > > +  for (int i = 0; i < N; i += 1)
> > > > +@{
> > > > +  c[i] += a[i] * conj (b[i]);
> > > > +@}
> > > > +@end smallexample
> > > > +
> > > > +In GCC lane ordering the real part of the number must be in the
> > > > +even lanes with the imaginary part in the odd lanes.
> > > > +
> > > > +The operation is only supported for vector modes @var{m}.
> > > > +
> > > > +This pattern is not allowed to @code{FAIL}.
> > > > +
> > > >  @cindex @code{cmul@var{m}4} instruction pattern  @item
> > > > @samp{cmul@var{m}4}  Perform a vector multiply that is
> > > > semantically the same as multiply of diff --git
> > > > a/gcc/internal-fn.def b/gcc/internal-fn.def index
> > > >
> > >
> 5a0bbe3fe5dee591d54130e60f6996b28164ae38..305450e026d4b94ab62ceb9c
> > > a719
> > > > ec5570ff43eb 100644
> > > > --- a/gcc/internal-fn.def
> > > > +++ b/gcc/internal-fn.def
> > > > @@ -288,6 +288,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST,
> ldexp,
> > > > binary)
> > > >
> > > >  /* Ternary math functions.  */
> > > >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla,
> ternary)
> > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> > > cmla_conj,
> > > > +ternary)
> > > >
> > > >  /* Unary integer ops.  */
> > > >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> > > unary)
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > >
> > >
> e82396bae1117c6de91304761a560b7fbcb69ce1..8e2758d685ed85e02df10dac
> > > 571e
> > > > b40d45a294ed 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -294,6 +294,8 @@ OPTAB_D (cadd90_optab, "cadd90$a3")
> OPTAB_D
> > > > (cadd270_optab, "cadd270$a3")  OPTAB_D (cmul_optab, "cmul$a3")
> > > > OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > > > +OPTAB_D (cmla_optab, "cmla$a4")
> > > > +OPTAB_D (cmla_conj_optab, 

RE: [PATCH 6/8 v9]middle-end slp: support complex FMA and complex FMA conjugate

2021-01-08 Thread Richard Biener
On Fri, 8 Jan 2021, Tamar Christina wrote:

> Hi Richi,
> 
> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Friday, January 8, 2021 9:45 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH 6/8 v9]middle-end slp: support complex FMA and
> > complex FMA conjugate
> > 
> > On Mon, 28 Dec 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This adds support for FMA and FMA conjugated to the slp pattern matcher.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
> > >   * optabs.def (cmla_optab, cmla_conj_optab): New.
> > >   * doc/md.texi: Document them.
> > >   * tree-vect-slp-patterns.c (vect_match_call_p,
> > >   class complex_fma_pattern, vect_slp_reset_pattern,
> > >   complex_fma_pattern::matches, complex_fma_pattern::recognize,
> > >   complex_fma_pattern::build): New.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> > b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a..6d5a98c4946d3ff4c2b8abea5c2
> > 9
> > > caa6863fd3f7 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6202,6 +6202,51 @@ The operation is only supported for vector
> > modes @var{m}.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{cmla@var{m}4} instruction pattern @item
> > > +@samp{cmla@var{m}4} Perform a vector multiply and accumulate that is
> > > +semantically the same as a multiply and accumulate of complex
> > > +numbers.
> > > +
> > > +@smallexample
> > > +  complex TYPE c[N];
> > > +  complex TYPE a[N];
> > > +  complex TYPE b[N];
> > > +  for (int i = 0; i < N; i += 1)
> > > +@{
> > > +  c[i] += a[i] * b[i];
> > > +@}
> > > +@end smallexample
> > > +
> > > +In GCC lane ordering the real part of the number must be in the even
> > > +lanes with the imaginary part in the odd lanes.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmla_conj@var{m}4} instruction pattern @item
> > > +@samp{cmla_conj@var{m}4} Perform a vector multiply by conjugate and
> > > +accumulate that is semantically the same as a multiply and accumulate
> > > +of complex numbers where the second multiply argument is conjugated.
> > > +
> > > +@smallexample
> > > +  complex TYPE c[N];
> > > +  complex TYPE a[N];
> > > +  complex TYPE b[N];
> > > +  for (int i = 0; i < N; i += 1)
> > > +@{
> > > +  c[i] += a[i] * conj (b[i]);
> > > +@}
> > > +@end smallexample
> > > +
> > > +In GCC lane ordering the real part of the number must be in the even
> > > +lanes with the imaginary part in the odd lanes.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > >  @cindex @code{cmul@var{m}4} instruction pattern  @item
> > > @samp{cmul@var{m}4}  Perform a vector multiply that is semantically
> > > the same as multiply of diff --git a/gcc/internal-fn.def
> > > b/gcc/internal-fn.def index
> > >
> > 5a0bbe3fe5dee591d54130e60f6996b28164ae38..305450e026d4b94ab62ceb9c
> > a719
> > > ec5570ff43eb 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -288,6 +288,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp,
> > > binary)
> > >
> > >  /* Ternary math functions.  */
> > >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> > cmla_conj,
> > > +ternary)
> > >
> > >  /* Unary integer ops.  */
> > >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> > unary)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > >
> > e82396bae1117c6de91304761a560b7fbcb69ce1..8e2758d685ed85e02df10dac
> > 571e
> > > b40d45a294ed 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -294,6 +294,8 @@ OPTAB_D (cadd90_optab, "cadd90$a3")  OPTAB_D
> > > (cadd270_optab, "cadd270$a3")  OPTAB_D (cmul_optab, "cmul$a3")
> > > OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > > +OPTAB_D (cmla_optab, "cmla$a4")
> > > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > >  OPTAB_D (cos_optab, "cos$a2")
> > >  OPTAB_D (cosh_optab, "cosh$a2")
> > >  OPTAB_D (exp10_optab, "exp10$a2")
> > > diff --git a/gcc/tree-vect-slp-patterns.c
> > > b/gcc/tree-vect-slp-patterns.c index
> > >
> > 82721acbab8cf81c4d6f9954c98fb913a7bb6282..3625a80c08e3d70fd362fc52e1
> > 7e
> > > 65b3b2c7da83 100644
> > > --- a/gcc/tree-vect-slp-patterns.c
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -325,6 +325,24 @@ vect_match_expression_p (slp_tree node,
> > tree_code code)
> > >return true;
> > >  }
> > >
> > > +/* Checks to see if the expression 

RE: [PATCH 7/8 v9]middle-end slp: support complex FMS and complex FMS conjugate

2021-01-08 Thread Tamar Christina via Gcc-patches


> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, January 8, 2021 9:49 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH 7/8 v9]middle-end slp: support complex FMS and
> complex FMS conjugate
> 
> On Mon, 28 Dec 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This adds support for FMS and FMS conjugated to the slp pattern matcher.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> 
> Interestingly this patch looks different from the FMA one(!?).  I would have
> expected to have the same pattern for FMA and FMS in the end.

No, because the mid-end canonicalization of the tree differs between FMA and FMS.
Because FMS has two TWO_OPERANDS nodes, the order of the tree is swapped.

There's no real reason for it (as far as I can tell), but that results in a
reversed tree.
However, the operations are not sufficiently different that I can detect the MUL
part.

I have a note for next year's rewrite to fix this during SLP build so they can
be shared.
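
To make the two operations concrete, here is a minimal scalar sketch of what FMA and FMS compute per lane pair, assuming the layout described in the md.texi text (real part in even lanes, imaginary part in odd lanes). The function names are hypothetical and this is not how the SLP tree represents the operations; it only shows that the two differ in the sign applied to the product.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: c += a * b on interleaved complex data,
// where lane 2*i holds the real part and lane 2*i+1 the imaginary part.
void complex_fma_lanes(std::vector<double>& c,
                       const std::vector<double>& a,
                       const std::vector<double>& b) {
  for (std::size_t i = 0; i + 1 < c.size(); i += 2) {
    c[i]     += a[i] * b[i]     - a[i + 1] * b[i + 1];  // real lane
    c[i + 1] += a[i] * b[i + 1] + a[i + 1] * b[i];      // imaginary lane
  }
}

// Hypothetical helper: c -= a * b with the same lane layout.
void complex_fms_lanes(std::vector<double>& c,
                       const std::vector<double>& a,
                       const std::vector<double>& b) {
  for (std::size_t i = 0; i + 1 < c.size(); i += 2) {
    c[i]     -= a[i] * b[i]     - a[i + 1] * b[i + 1];  // real lane
    c[i + 1] -= a[i] * b[i + 1] + a[i + 1] * b[i];      // imaginary lane
  }
}
```

Both helpers read only from a and b and update c in place, so each can be checked against std::complex arithmetic on the same inputs.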

> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> > * optabs.def (cmls_optab, cmls_conj_optab): New.
> > * doc/md.texi: Document them.
> > * tree-vect-slp-patterns.c (class complex_fms_pattern,
> > complex_fms_pattern::matches, complex_fms_pattern::recognize,
> > complex_fms_pattern::build): New.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >
> 6d5a98c4946d3ff4c2b8abea5c29caa6863fd3f7..3f5a42df285b3ee162edc9ec66
> 1f
> > 25c0eec5e4fa 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6247,6 +6247,51 @@ The operation is only supported for vector
> modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cmls@var{m}4} instruction pattern @item
> > +@samp{cmls@var{m}4} Perform a vector multiply and subtract that is
> > +semantically the same as a multiply and subtract of complex numbers.
> > +
> > +@smallexample
> > +  complex TYPE c[N];
> > +  complex TYPE a[N];
> > +  complex TYPE b[N];
> > +  for (int i = 0; i < N; i += 1)
> > +@{
> > +  c[i] -= a[i] * b[i];
> > +@}
> > +@end smallexample
> > +
> > +In GCC lane ordering the real part of the number must be in the even
> > +lanes with the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls_conj@var{m}4} instruction pattern @item
> > +@samp{cmls_conj@var{m}4} Perform a vector multiply by conjugate and
> > +subtract that is semantically the same as a multiply and subtract of
> > +complex numbers where the second multiply argument is conjugated.
> > +
> > +@smallexample
> > +  complex TYPE c[N];
> > +  complex TYPE a[N];
> > +  complex TYPE b[N];
> > +  for (int i = 0; i < N; i += 1)
> > +@{
> > +  c[i] -= a[i] * conj (b[i]);
> > +@}
> > +@end smallexample
> > +
> > +In GCC lane ordering the real part of the number must be in the even
> > +lanes with the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{cmul@var{m}4} instruction pattern  @item
> > @samp{cmul@var{m}4}  Perform a vector multiply that is semantically
> > the same as multiply of diff --git a/gcc/internal-fn.def
> > b/gcc/internal-fn.def index
> >
> 305450e026d4b94ab62ceb9ca719ec5570ff43eb..c8161509d9497afe58f32bde1
> 2d8
> > e6bd7b876a3c 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -290,6 +290,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp,
> > binary)  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> cmla_conj,
> > ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST,
> cmls_conj,
> > +ternary)
> >
> >  /* Unary integer ops.  */
> >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> unary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> >
> 8e2758d685ed85e02df10dac571eb40d45a294ed..320bb5f3dce31867d312bbb
> b6a4c
> > 6e31c534254e 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -296,6 +296,8 @@ OPTAB_D (cmul_optab, "cmul$a3")  OPTAB_D
> > (cmul_conj_optab, "cmul_conj$a3")  OPTAB_D (cmla_optab, "cmla$a4")
> > OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > +OPTAB_D (cmls_optab, "cmls$a4")
> > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c
> > b/gcc/tree-vect-slp-patterns.c index
> >
> 

RE: [PATCH 6/8 v9]middle-end slp: support complex FMA and complex FMA conjugate

2021-01-08 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, January 8, 2021 9:45 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: Re: [PATCH 6/8 v9]middle-end slp: support complex FMA and
> complex FMA conjugate
> 
> On Mon, 28 Dec 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This adds support for FMA and FMA conjugated to the slp pattern matcher.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
> > * optabs.def (cmla_optab, cmla_conj_optab): New.
> > * doc/md.texi: Document them.
> > * tree-vect-slp-patterns.c (vect_match_call_p,
> > class complex_fma_pattern, vect_slp_reset_pattern,
> > complex_fma_pattern::matches, complex_fma_pattern::recognize,
> > complex_fma_pattern::build): New.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >
> b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a..6d5a98c4946d3ff4c2b8abea5c2
> 9
> > caa6863fd3f7 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6202,6 +6202,51 @@ The operation is only supported for vector
> modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cmla@var{m}4} instruction pattern @item
> > +@samp{cmla@var{m}4} Perform a vector multiply and accumulate that is
> > +semantically the same as a multiply and accumulate of complex
> > +numbers.
> > +
> > +@smallexample
> > +  complex TYPE c[N];
> > +  complex TYPE a[N];
> > +  complex TYPE b[N];
> > +  for (int i = 0; i < N; i += 1)
> > +@{
> > +  c[i] += a[i] * b[i];
> > +@}
> > +@end smallexample
> > +
> > +In GCC lane ordering the real part of the number must be in the even
> > +lanes with the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla_conj@var{m}4} instruction pattern @item
> > +@samp{cmla_conj@var{m}4} Perform a vector multiply by conjugate and
> > +accumulate that is semantically the same as a multiply and accumulate
> > +of complex numbers where the second multiply argument is conjugated.
> > +
> > +@smallexample
> > +  complex TYPE c[N];
> > +  complex TYPE a[N];
> > +  complex TYPE b[N];
> > +  for (int i = 0; i < N; i += 1)
> > +@{
> > +  c[i] += a[i] * conj (b[i]);
> > +@}
> > +@end smallexample
> > +
> > +In GCC lane ordering the real part of the number must be in the even
> > +lanes with the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{cmul@var{m}4} instruction pattern  @item
> > @samp{cmul@var{m}4}  Perform a vector multiply that is semantically
> > the same as multiply of diff --git a/gcc/internal-fn.def
> > b/gcc/internal-fn.def index
> >
> 5a0bbe3fe5dee591d54130e60f6996b28164ae38..305450e026d4b94ab62ceb9c
> a719
> > ec5570ff43eb 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -288,6 +288,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp,
> > binary)
> >
> >  /* Ternary math functions.  */
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> cmla_conj,
> > +ternary)
> >
> >  /* Unary integer ops.  */
> >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> unary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> >
> e82396bae1117c6de91304761a560b7fbcb69ce1..8e2758d685ed85e02df10dac
> 571e
> > b40d45a294ed 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -294,6 +294,8 @@ OPTAB_D (cadd90_optab, "cadd90$a3")  OPTAB_D
> > (cadd270_optab, "cadd270$a3")  OPTAB_D (cmul_optab, "cmul$a3")
> > OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > +OPTAB_D (cmla_optab, "cmla$a4")
> > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c
> > b/gcc/tree-vect-slp-patterns.c index
> >
> 82721acbab8cf81c4d6f9954c98fb913a7bb6282..3625a80c08e3d70fd362fc52e1
> 7e
> > 65b3b2c7da83 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -325,6 +325,24 @@ vect_match_expression_p (slp_tree node,
> tree_code code)
> >return true;
> >  }
> >
> > +/* Checks to see if the expression represented by NODE is a call to the
> internal
> > +   function FN.  */
> > +
> > +static inline bool
> > +vect_match_call_p (slp_tree node, internal_fn fn) {
> > +  if (!node
> > +  || !SLP_TREE_REPRESENTATIVE (node))
> > +return false;
> > +
> > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> 

Re: [PATCH]AArch64 SVE2: Fix aarch64-sve2-acle-asm tests.

2021-01-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> This fixes a logical inconsistency in the SVE2 ACLE tests, where the SVE2
> tests check for SVE support in the assembler instead of SVE2.
>
> This makes all these tests fail when the user has an SVE-enabled assembler but
> not an SVE2 one.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_asm_sve2_ok): New.
>   * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use it.

While you're there, could you do the same for the gcc.target version
(gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp)?

OK with that change, thanks.

Richard

>
> --- inline copy of patch -- 
> diff --git 
> a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp 
> b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
> index 
> aa8dc9ee11ec4e0a7144e79a36f52c6d9d83ec68..c3a3a01a7ed913720e31729300dc0a5c99232ec4
>  100644
> --- a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
> +++ b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
> @@ -66,7 +66,7 @@ set-torture-options {
>  # Main loop.
>  set files [glob -nocomplain $srcdir/$gcc_subdir/asm/*.c]
>  set save-dg-do-what-default ${dg-do-what-default}
> -if { [check_effective_target_aarch64_asm_sve_ok]
> +if { [check_effective_target_aarch64_asm_sve2_ok]
>   && [check_effective_target_aarch64_variant_pcs] } {
>  set dg-do-what-default assemble
>  } else {
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> 5cf0f4218a6420a5289a4be25ecca0915107b139..47d4c45e9eb9008148a5f8f26b3c7dd7292369fc
>  100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9735,7 +9735,7 @@ proc check_effective_target_aarch64_tiny { } {
>  # various architecture extensions via the .arch_extension pseudo-op.
>  
>  foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
> -   "i8mm" "f32mm" "f64mm" "bf16" "sb" } {
> +   "i8mm" "f32mm" "f64mm" "bf16" "sb" "sve2" } {
>  eval [string map [list FUNC $aarch64_ext] {
>   proc check_effective_target_aarch64_asm_FUNC_ok { } {
> if { [istarget aarch64*-*-*] } {


RE: [PATCH 1/3] arm: Add movmisalign patterns for MVE (PR target/97875)

2021-01-08 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -----Original Message-----
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 December 2020 17:48
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 1/3] arm: Add movmisalign patterns for MVE (PR
> target/97875)
> 
> This patch adds new movmisalign_mve_load and store patterns for
> MVE to help vectorization. They are very similar to their Neon
> counterparts, but use different iterators and instructions.
> 
> Indeed MVE supports fewer vector modes than Neon, so we use
> the MVE_VLD_ST iterator where Neon uses VQX.
> 
> Since the supported modes are different from the ones valid for
> arithmetic operators, we introduce two new sets of macros:
> 
> ARM_HAVE_NEON__LDST
>   true if Neon has vector load/store instructions for 
> 
> ARM_HAVE__LDST
>   true if any vector extension has vector load/store instructions for 
> 

I'm not a big fan of the large number of these macros ☹ but I understand why
they're used, so I won't object.
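
For readers skimming the thread, the layering under discussion can be sketched as below. This is only an illustration under stated assumptions: TARGET_NEON and TARGET_HAVE_MVE are replaced by constants, and the definition of the generic macro is assumed to follow the shape of the ARM_HAVE_*_ARITH macros quoted in the patch. The helper function names are hypothetical.

```cpp
// Stand-ins for the real target predicates (assumed constant for this demo).
#define TARGET_NEON 0
#define TARGET_HAVE_MVE 1

// As quoted in the patch: Neon-only load/store condition for one mode.
#define ARM_HAVE_NEON_V16QI_LDST TARGET_NEON

// Assumption: the generic macro ORs the Neon condition with the MVE one,
// mirroring the ARM_HAVE_*_ARITH macros already present in arm.h.
#define ARM_HAVE_V16QI_LDST (ARM_HAVE_NEON_V16QI_LDST || TARGET_HAVE_MVE)

// Hypothetical wrappers so the conditions can be checked at compile time.
constexpr bool neon_v16qi_ldst() { return ARM_HAVE_NEON_V16QI_LDST; }
constexpr bool any_v16qi_ldst()  { return ARM_HAVE_V16QI_LDST; }

static_assert(!neon_v16qi_ldst(), "Neon is disabled in this demo");
static_assert(any_v16qi_ldst(), "MVE alone satisfies the generic macro");
```

One macro per mode and per extension is verbose, but it lets an expander be enabled by a single per-mode condition rather than by a specific extension.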

> We move the movmisalign expander from neon.md to vec-common.md, and
> replace the TARGET_NEON enabler with ARM_HAVE__LDST.
> 
> The patch also updates the mve-vneg.c test to scan for the better code
> generation when loading and storing the vectors involved: it checks
> that no 'orr' instruction is generated to cope with misalignment at
> runtime.
> This test was chosen among the other mve tests, but any other should
> be OK. Using a plain vector copy loop (dest[i] = a[i]) is not a good
> test because the compiler chooses to use memcpy.
> 
> For instance we now generate:
> test_vneg_s32x4:
>   vldrw.32  q3, [r1]
>   vneg.s32  q3, q3
>   vstrw.32  q3, [r0]
>   bx        lr
> 
> instead of:
> test_vneg_s32x4:
>   orr       r3, r1, r0
>   lsls      r3, r3, #28
>   bne       .L15
>   vldrw.32  q3, [r1]
>   vneg.s32  q3, q3
>   vstrw.32  q3, [r0]
>   bx        lr
> .L15:
>   push      {r4, r5}
>   ldrd      r2, r3, [r1, #8]
>   ldrd      r5, r4, [r1]
>   rsbs      r2, r2, #0
>   rsbs      r5, r5, #0
>   rsbs      r4, r4, #0
>   rsbs      r3, r3, #0
>   strd      r5, r4, [r0]
>   pop       {r4, r5}
>   strd      r2, r3, [r0, #8]
>   bx        lr
> 
> 2020-12-15  Christophe Lyon  
> 
>   PR target/97875
>   gcc/
>   * config/arm/arm.h (ARM_HAVE_NEON_V8QI_LDST): New macro.
>   (ARM_HAVE_NEON_V16QI_LDST, ARM_HAVE_NEON_V4HI_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V8HI_LDST, ARM_HAVE_NEON_V2SI_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V4SI_LDST, ARM_HAVE_NEON_V4HF_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V8HF_LDST, ARM_HAVE_NEON_V4BF_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V8BF_LDST, ARM_HAVE_NEON_V2SF_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V4SF_LDST, ARM_HAVE_NEON_DI_LDST):
> Likewise.
>   (ARM_HAVE_NEON_V2DI_LDST): Likewise.
>   (ARM_HAVE_V8QI_LDST, ARM_HAVE_V16QI_LDST): Likewise.
>   (ARM_HAVE_V4HI_LDST, ARM_HAVE_V8HI_LDST): Likewise.
>   (ARM_HAVE_V2SI_LDST, ARM_HAVE_V4SI_LDST,
> ARM_HAVE_V4HF_LDST): Likewise.
>   (ARM_HAVE_V8HF_LDST, ARM_HAVE_V4BF_LDST,
> ARM_HAVE_V8BF_LDST): Likewise.
>   (ARM_HAVE_V2SF_LDST, ARM_HAVE_V4SF_LDST,
> ARM_HAVE_DI_LDST): Likewise.
>   (ARM_HAVE_V2DI_LDST): Likewise.
>   * config/arm/mve.md (*movmisalign_mve_store): New
> pattern.
>   (*movmisalign_mve_load): New pattern.
>   * config/arm/neon.md (movmisalign): Move to ...
>   * config/arm/vec-common.md: ... here.
> 
>   PR target/97875
>   gcc/testsuite/
>   * gcc.target/arm/simd/mve-vneg.c: Update test.
> ---
>  gcc/config/arm/arm.h                         | 40
>  gcc/config/arm/mve.md                        | 25 +
>  gcc/config/arm/neon.md                       | 25 -
>  gcc/config/arm/vec-common.md                 | 24 +
>  gcc/testsuite/gcc.target/arm/simd/mve-vneg.c |  3 +++
>  5 files changed, 92 insertions(+), 25 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 4a63d33..d44e0c6 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1151,6 +1151,46 @@ extern const int arm_arch_cde_coproc_bits[];
>  #define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
>  #define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH ||
> TARGET_HAVE_MVE_FLOAT)
> 
> +/* The conditions under which vector modes are supported by load/store
> +   instructions using Neon.  */
> +
> +#define ARM_HAVE_NEON_V8QI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V16QI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V4HI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V8HI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V2SI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V4SI_LDST TARGET_NEON
> +#define ARM_HAVE_NEON_V4HF_LDST TARGET_NEON_FP16INST
> +#define ARM_HAVE_NEON_V8HF_LDST TARGET_NEON_FP16INST
> +#define ARM_HAVE_NEON_V4BF_LDST TARGET_BF16_SIMD
> +#define ARM_HAVE_NEON_V8BF_LDST TARGET_BF16_SIMD
> 

Re: [PATCH 8/8 v9]middle-end slp: Add complex operations class to share first match among all matchers

2021-01-08 Thread Richard Biener
On Mon, 28 Dec 2020, Tamar Christina wrote:

> Hi All,
> 
> This introduces a common class complex_operations_pattern which encapsulates
> the complex add, mul, fma and fms patterns in such a way that the first
> match is shared.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.
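
As a rough illustration of the idea in the description (run the first match once, then let each matcher consume its result), here is a minimal standalone sketch. It is not the patch's code: Op, detect_op, Matcher and the string inputs are hypothetical stand-ins for complex_operation_t, vect_detect_pair_op and the pattern classes. Only the try-order (fms, mul, fma, add) is taken from the patch.

```cpp
#include <string>

// Hypothetical operation kinds; the patch uses complex_operation_t.
enum class Op { Unknown, Add, Mul, Fma, Fms };

// Counts how often the shared first match runs.
static int detect_calls = 0;

// Toy stand-in for the shared first match: classify a statement once.
Op detect_op(const std::string& stmt) {
  ++detect_calls;
  if (stmt == "c -= a * b") return Op::Fms;
  if (stmt == "c = a * b")  return Op::Mul;
  if (stmt == "c += a * b") return Op::Fma;
  if (stmt == "c = a + b")  return Op::Add;
  return Op::Unknown;
}

struct Matcher {
  const char* name;
  bool (*matches)(Op);  // consumes the shared result, does no detection itself
};

// Detect once, then try each matcher against that result in a fixed order.
std::string recognize(const std::string& stmt) {
  const Op op = detect_op(stmt);
  static const Matcher matchers[] = {
    {"fms", [](Op o) { return o == Op::Fms; }},
    {"mul", [](Op o) { return o == Op::Mul; }},
    {"fma", [](Op o) { return o == Op::Fma; }},
    {"add", [](Op o) { return o == Op::Add; }},
  };
  for (const Matcher& m : matchers)
    if (m.matches(op)) return m.name;
  return "";  // no pattern recognized
}
```

With this shape, adding another matcher does not add another detection pass; the counter stays at one call per recognize invocation.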

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp-patterns.c (class complex_operations_pattern,
>   complex_operations_pattern::matches,
>   complex_operations_pattern::recognize,
>   complex_operations_pattern::build): New.
>   (slp_patterns): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 
> ab6587f0b8522ec5f916f74e7e7401b1f7a35bbb..33d22e657ebf1d0454a134bab4febb2b65581822
>  100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1429,6 +1429,83 @@ complex_fms_pattern::build (vec_info *vinfo)
>complex_pattern::build (vinfo);
>  }
>  
> +/*******************************************************************************
> + * complex_operations_pattern class
> + ******************************************************************************/
> +
> +/* This function combines all the existing pattern matchers above into one class
> +   that shares the functionality between them.  The initial match is shared
> +   between all complex operations.  */
> +
> +class complex_operations_pattern : public complex_pattern
> +{
> +  protected:
> +    complex_operations_pattern (slp_tree *node, vec<slp_tree> *m_ops,
> +                                internal_fn ifn)
> +      : complex_pattern (node, m_ops, ifn)
> +    {
> +      this->m_num_args = 0;
> +    }
> +
> +  public:
> +    void build (vec_info *);
> +    static internal_fn
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> +             vec<slp_tree> *);
> +
> +    static vect_pattern*
> +    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +};
> +
> +/* Dummy matches implementation for proxy object.  */
> +
> +internal_fn
> +complex_operations_pattern::
> +matches (complex_operation_t /* op */,
> +         slp_tree_to_load_perm_map_t * /* perm_cache */,
> +         slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
> +{
> +  return IFN_LAST;
> +}
> +
> +/* Attempt to recognize a complex mul pattern.  */
> +
> +vect_pattern*
> +complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                                       slp_tree *node)
> +{
> +  auto_vec<slp_tree> ops;
> +  complex_operation_t op
> +    = vect_detect_pair_op (*node, true, &ops);
> +  internal_fn ifn = IFN_LAST;
> +
> +  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +  if (ifn != IFN_LAST)
> +    return complex_fms_pattern::mkInstance (node, &ops, ifn);
> +
> +  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +  if (ifn != IFN_LAST)
> +    return complex_mul_pattern::mkInstance (node, &ops, ifn);
> +
> +  ifn  = complex_fma_pattern::matches (op, perm_cache, node, &ops);
> +  if (ifn != IFN_LAST)
> +    return complex_fma_pattern::mkInstance (node, &ops, ifn);
> +
> +  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +  if (ifn != IFN_LAST)
> +    return complex_add_pattern::mkInstance (node, &ops, ifn);
> +
> +  return NULL;
> +}
> +
> +/* Dummy implementation of build.  */
> +
> +void
> +complex_operations_pattern::build (vec_info * /* vinfo */)
> +{
> +  gcc_unreachable ();
> +}
> +
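>  /*******************************************************************************
>   * Pattern matching definitions
>   ******************************************************************************/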
> @@ -1440,7 +1517,7 @@ vect_pattern_decl_t slp_patterns[]
>   order patterns from the largest to the smallest.  Especially if they
>   overlap in what they can detect.  */
>  
> -  SLP_PATTERN (complex_add_pattern),
> +  SLP_PATTERN (complex_operations_pattern),
>  };
>  #undef SLP_PATTERN
>  
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH 7/8 v9]middle-end slp: support complex FMS and complex FMS conjugate

2021-01-08 Thread Richard Biener
On Mon, 28 Dec 2020, Tamar Christina wrote:

> Hi All,
> 
> This adds support for FMS and FMS conjugated to the slp pattern matcher.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?

Interestingly this patch looks different from the FMA one(!?).  I
would have expected to have the same pattern for FMA and FMS in the
end.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
>   * optabs.def (cmls_optab, cmls_conj_optab): New.
>   * doc/md.texi: Document them.
>   * tree-vect-slp-patterns.c (class complex_fms_pattern,
>   complex_fms_pattern::matches, complex_fms_pattern::recognize,
>   complex_fms_pattern::build): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 6d5a98c4946d3ff4c2b8abea5c29caa6863fd3f7..3f5a42df285b3ee162edc9ec661f25c0eec5e4fa
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6247,6 +6247,51 @@ The operation is only supported for vector modes 
> @var{m}.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cmls@var{m}4} instruction pattern
> +@item @samp{cmls@var{m}4}
> +Perform a vector multiply and subtract that is semantically the same as
> +a multiply and subtract of complex numbers.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] -= a[i] * b[i];
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmls_conj@var{m}4} instruction pattern
> +@item @samp{cmls_conj@var{m}4}
> +Perform a vector multiply by conjugate and subtract that is semantically
> +the same as a multiply and subtract of complex numbers where the second
> +multiply argument is conjugated.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] -= a[i] * conj (b[i]);
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{cmul@var{m}4} instruction pattern
>  @item @samp{cmul@var{m}4}
>  Perform a vector multiply that is semantically the same as multiply of
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 305450e026d4b94ab62ceb9ca719ec5570ff43eb..c8161509d9497afe58f32bde12d8e6bd7b876a3c
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -290,6 +290,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
>  
>  /* Unary integer ops.  */
>  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 8e2758d685ed85e02df10dac571eb40d45a294ed..320bb5f3dce31867d3126a4c6e31c534254e
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -296,6 +296,8 @@ OPTAB_D (cmul_optab, "cmul$a3")
>  OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
>  OPTAB_D (cmla_optab, "cmla$a4")
>  OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> +OPTAB_D (cmls_optab, "cmls$a4")
> +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 
> 3625a80c08e3d70fd362fc52e17e65b3b2c7da83..ab6587f0b8522ec5f916f74e7e7401b1f7a35bbb
>  100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1254,6 +1254,181 @@ complex_fma_pattern::build (vec_info *vinfo)
>complex_pattern::build (vinfo);
>  }
>  
> +/*******************************************************************************
> + * complex_fms_pattern class
> + ******************************************************************************/
> +
> +class complex_fms_pattern : public complex_pattern
> +{
> +  protected:
> +    complex_fms_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> +      : complex_pattern (node, m_ops, ifn)
> +    {
> +      this->m_num_args = 3;
> +    }
> +
> +  public:
> +    void build (vec_info *);
> +    static internal_fn
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> +             vec<slp_tree> *);
> +
> +static 

Re: [PATCH 6/8 v9]middle-end slp: support complex FMA and complex FMA conjugate

2021-01-08 Thread Richard Biener
On Mon, 28 Dec 2020, Tamar Christina wrote:

> Hi All,
> 
> This adds support for FMA and FMA conjugated to the slp pattern matcher.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
>   * optabs.def (cmla_optab, cmla_conj_optab): New.
>   * doc/md.texi: Document them.
>   * tree-vect-slp-patterns.c (vect_match_call_p,
>   class complex_fma_pattern, vect_slp_reset_pattern,
>   complex_fma_pattern::matches, complex_fma_pattern::recognize,
>   complex_fma_pattern::build): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a..6d5a98c4946d3ff4c2b8abea5c29caa6863fd3f7 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6202,6 +6202,51 @@ The operation is only supported for vector modes @var{m}.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cmla@var{m}4} instruction pattern
> +@item @samp{cmla@var{m}4}
> +Perform a vector multiply and accumulate that is semantically the same as
> +a multiply and accumulate of complex numbers.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] += a[i] * b[i];
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmla_conj@var{m}4} instruction pattern
> +@item @samp{cmla_conj@var{m}4}
> +Perform a vector multiply by conjugate and accumulate that is semantically
> +the same as a multiply and accumulate of complex numbers where the second
> +multiply argument is conjugated.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] += a[i] * conj (b[i]);
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{cmul@var{m}4} instruction pattern
>  @item @samp{cmul@var{m}4}
>  Perform a vector multiply that is semantically the same as multiply of
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 5a0bbe3fe5dee591d54130e60f6996b28164ae38..305450e026d4b94ab62ceb9ca719ec5570ff43eb 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -288,6 +288,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>  
>  /* Ternary math functions.  */
>  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
>  
>  /* Unary integer ops.  */
>  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index e82396bae1117c6de91304761a560b7fbcb69ce1..8e2758d685ed85e02df10dac571eb40d45a294ed 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -294,6 +294,8 @@ OPTAB_D (cadd90_optab, "cadd90$a3")
>  OPTAB_D (cadd270_optab, "cadd270$a3")
>  OPTAB_D (cmul_optab, "cmul$a3")
>  OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> +OPTAB_D (cmla_optab, "cmla$a4")
> +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 82721acbab8cf81c4d6f9954c98fb913a7bb6282..3625a80c08e3d70fd362fc52e17e65b3b2c7da83 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -325,6 +325,24 @@ vect_match_expression_p (slp_tree node, tree_code code)
>return true;
>  }
>  
> +/* Checks to see if the expression represented by NODE is a call to the internal
> +   function FN.  */
> +
> +static inline bool
> +vect_match_call_p (slp_tree node, internal_fn fn)
> +{
> +  if (!node
> +  || !SLP_TREE_REPRESENTATIVE (node))
> +return false;
> +
> +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> +  if (!expr
> +  || !gimple_call_internal_p (expr, fn))
> +return false;
> +
> +   return true;
> +}
> +
>  /* Check if the given lane permute in PERMUTES matches an alternating sequence
> of {even odd even odd ...}.  This to account for unrolled loops.  Further
> mode there resulting permute must be linear.   */
> @@ -1081,6 +1099,161 @@ complex_mul_pattern::build (vec_info *vinfo)
>complex_pattern::build (vinfo);
>  }
>  
> 
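
For readers following the md.texi text quoted above, here is a minimal scalar sketch of what cmla and cmla_conj are documented to compute on an interleaved vector (real part in the even lanes, imaginary part in the odd lanes). The function name cmla_model and its signature are hypothetical and not part of the patch; it only models the documented semantics and says nothing about how a target implements the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the cmla / cmla_conj semantics described in
// the quoted md.texi text.  Lane 2*i holds the real part of element i and
// lane 2*i+1 holds its imaginary part.  All three vectors must have the
// same even length.
void cmla_model (std::vector<float> &acc, const std::vector<float> &a,
                 const std::vector<float> &b, bool conj_b)
{
  assert (acc.size () == a.size () && a.size () == b.size ()
          && a.size () % 2 == 0);
  for (std::size_t i = 0; i < a.size (); i += 2)
    {
      float ar = a[i], ai = a[i + 1];
      float br = b[i];
      // Conjugating b only negates its imaginary lane.
      float bi = conj_b ? -b[i + 1] : b[i + 1];
      acc[i] += ar * br - ai * bi;      // real part, even lane
      acc[i + 1] += ar * bi + ai * br;  // imaginary part, odd lane
    }
}
```

With conj_b set to false this corresponds to c[i] += a[i] * b[i]; with conj_b set to true it corresponds to c[i] += a[i] * conj (b[i]).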

Re: [PATCH 5/8 v9]middle-end slp: support complex multiply and complex multiply conjugate

2021-01-08 Thread Richard Biener
On Mon, 28 Dec 2020, Tamar Christina wrote:

> Hi All,
> 
> This adds support for complex multiply and complex multiply and accumulate to
> the vect pattern detector.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
>   * optabs.def (cmul_optab, cmul_conj_optab): New.
>   * doc/md.texi: Document them.
>   * tree-vect-slp-patterns.c (vect_match_call_complex_mla,
>   vect_normalize_conj_loc, is_eq_or_top, vect_validate_multiplication,
>   vect_build_combine_node, class complex_mul_pattern,
>   complex_mul_pattern::matches, complex_mul_pattern::recognize,
>   complex_mul_pattern::build): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index ec6ec180b91fcf9f481b6754c044483787fd923c..b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6202,6 +6202,50 @@ The operation is only supported for vector modes @var{m}.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cmul@var{m}4} instruction pattern
> +@item @samp{cmul@var{m}4}
> +Perform a vector multiply that is semantically the same as multiply of
> +complex numbers.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] = a[i] * b[i];
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmul_conj@var{m}4} instruction pattern
> +@item @samp{cmul_conj@var{m}4}
> +Perform a vector multiply by conjugate that is semantically the same as a
> +multiply of complex numbers where the second multiply argument is conjugated.
> +
> +@smallexample
> +  complex TYPE c[N];
> +  complex TYPE a[N];
> +  complex TYPE b[N];
> +  for (int i = 0; i < N; i += 1)
> +@{
> +  c[i] = a[i] * conj (b[i]);
> +@}
> +@end smallexample
> +
> +In GCC lane ordering the real part of the number must be in the even lanes with
> +the imaginary part in the odd lanes.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{ffs@var{m}2} instruction pattern
>  @item @samp{ffs@var{m}2}
>  Store into operand 0 one plus the index of the least significant 1-bit
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 511fe70162b5d9db3a61a5285d31c008f6835487..5a0bbe3fe5dee591d54130e60f6996b28164ae38 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -279,6 +279,8 @@ DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
>  
>  
>  /* FP scales.  */
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index e9727def4dbf941bb9ac8b56f83f8ea0f52b262c..e82396bae1117c6de91304761a560b7fbcb69ce1 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -292,6 +292,8 @@ OPTAB_D (copysign_optab, "copysign$F$a3")
>  OPTAB_D (xorsign_optab, "xorsign$F$a3")
>  OPTAB_D (cadd90_optab, "cadd90$a3")
>  OPTAB_D (cadd270_optab, "cadd270$a3")
> +OPTAB_D (cmul_optab, "cmul$a3")
> +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index dbc58f7c53868ed431fc67de1f0162eb0d3b2c24..82721acbab8cf81c4d6f9954c98fb913a7bb6282 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -719,6 +719,368 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>return new complex_add_pattern (node, , ifn);
>  }
>  
> +/***
> + * complex_mul_pattern
> + 
> **/
> +
> +/* Helper function that looks for a match in the CHILDth child of NODE.  The
> +   child used is stored in RES.
> +
> +   If the match is successful then ARGS will contain the operands matched
> +   and the complex_operation_t type is returned.  If match is not successful
> +   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> +
> +static inline complex_operation_t
> +vect_match_call_complex_mla (slp_tree node, unsigned child,
> +  vec 
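
As a reading aid for the cmul and cmul_conj documentation quoted above, the two products can be written out in terms of real and imaginary parts. This is only the standard complex arithmetic, with the symbols a_r, a_i, b_r and b_i introduced here for illustration; it is not taken from the patch.

```latex
\begin{aligned}
a &= a_r + i\,a_i, \qquad b = b_r + i\,b_i \\
a \cdot b &= (a_r b_r - a_i b_i) + i\,(a_r b_i + a_i b_r) \\
a \cdot \overline{b} &= (a_r b_r + a_i b_i) + i\,(a_i b_r - a_r b_i)
\end{aligned}
```

In the lane layout described in the documentation, the real terms land in the even lanes and the imaginary terms in the odd lanes; the conjugate form differs from the plain product only in the sign of every term that contains b_i.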

[PATCH]AArch64 SVE2: Fix aarch64-sve2-acle-asm tests.

2021-01-08 Thread Tamar Christina via Gcc-patches
Hi All,

This fixes a logical inconsistency with the SVE2 ACLE tests where the SVE2 tests
are checking for SVE support in the assembler instead of SVE2.

This makes all these tests fail when the user has an SVE enabled assembler but
not an SVE2 one.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_aarch64_asm_sve2_ok): New.
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use it.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index aa8dc9ee11ec4e0a7144e79a36f52c6d9d83ec68..c3a3a01a7ed913720e31729300dc0a5c99232ec4 100644
--- a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -66,7 +66,7 @@ set-torture-options {
 # Main loop.
 set files [glob -nocomplain $srcdir/$gcc_subdir/asm/*.c]
 set save-dg-do-what-default ${dg-do-what-default}
-if { [check_effective_target_aarch64_asm_sve_ok]
+if { [check_effective_target_aarch64_asm_sve2_ok]
  && [check_effective_target_aarch64_variant_pcs] } {
 set dg-do-what-default assemble
 } else {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5cf0f4218a6420a5289a4be25ecca0915107b139..47d4c45e9eb9008148a5f8f26b3c7dd7292369fc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9735,7 +9735,7 @@ proc check_effective_target_aarch64_tiny { } {
 # various architecture extensions via the .arch_extension pseudo-op.
 
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
-			  "i8mm" "f32mm" "f64mm" "bf16" "sb" } {
+			  "i8mm" "f32mm" "f64mm" "bf16" "sb" "sve2" } {
 eval [string map [list FUNC $aarch64_ext] {
 	proc check_effective_target_aarch64_asm_FUNC_ok { } {
 	  if { [istarget aarch64*-*-*] } {



Re: [PATCH] c++: Add support for -std=c++2b

2021-01-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 08, 2021 at 12:48:51AM +, Paul Fee via Gcc-patches wrote:
> Derived from the changes that added C++2a support in 2017.
> https://gcc.gnu.org/g:026a79f70cf33f836ea5275eda72d4870a3041e5
> 
> No C++2b features are added here.
> Use of -std=c++2b sets __cplusplus to 202101L.

What Jon wrote, plus:

> 
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 5541e694bb3..3a0d452b62b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2021-01-08  Paul Fee  
> +
> +Add support for -std=c++2b
> +* doc/cpp.texi (__cplusplus): Document value for -std=c++2b
> +or -std=gnu+2b.

One plus missing above.

> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -738,7 +738,9 @@ enum cxx_dialect {
>/* C++17 */
>cxx17,
>/* C++20 */
> -  cxx20
> +  cxx20,
> +  /* C++23? */

In the past we used
  /* C++2a (C++20?) */
here, so perhaps
  /* C++2b (C++23?) */
?
> +  cxx2b

> 
> +/* Set the C++ 2020 standard (without GNU extensions if ISO).  */

2020 is wrong.  In the past we used
/* Set the C++ 202a draft standard (without GNU extensions if ISO).  */
so perhaps
/* Set the C++ 202b draft standard (without GNU extensions if ISO).  */

> +static void
> +set_std_cxx2b (int iso)
> +{
> +cpp_set_lang (parse_in, iso ? CLK_CXX2B: CLK_GNUCXX2B);

Wrong formatting, only 2 char indentation for the whole function body.
> +flag_no_gnu_keywords = iso;
> +flag_no_nonansi_builtin = iso;
> +flag_iso = iso;
> +/* C++2b includes the C11 standard library.  */
> +flag_isoc94 = 1;
> +flag_isoc99 = 1;
> +flag_isoc11 = 1;
> +/* C++2b includes coroutines.  */
> +flag_coroutines = true;
> +cxx_dialect = cxx2b;
> +lang_hooks.name = "GNU C++20"; /* Pretend C++20 until standardization.  */
> +}
> +
>  /* Args to -d specify what to dump.  Silently ignore
> unrecognized options; they may be aimed at toplev.c.  */
>  static void
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 1766364806e..3464d72591b 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -2214,6 +2214,11 @@ std=c++20
>  C++ ObjC++
>  Conform to the ISO 2020 C++ draft standard (experimental and
> incomplete support).
> 
> +std=c++2b
> +C++ ObjC++
> +Conform to the ISO 2023 (?) C++ draft standard (experimental and
> +incomplete support).

Perhaps no space before (?) ?
And to be consistent with other similar entries, there was no line wrapping in
the description.

> +
>  std=c11
>  C ObjC
>  Conform to the ISO 2011 C standard.
> @@ -2292,6 +2297,11 @@ std=gnu++20
>  C++ ObjC++
>  Conform to the ISO 2020 C++ draft standard with GNU extensions
> (experimental and incomplete support).
> 
> +std=gnu++2b
> +C++ ObjC++
> +Conform to the ISO 2023 (?) C++ draft standard with GNU extensions

Likewise.

> (experimental
> +and incomplete support).

> +* lib/target-supports.exp: (check_effective_target_c++2a_only)
> +rename to check_effective_target_c++20_only.

The : belongs after ), and description after : should start with capital
letter.  So
* lib/target-supports.exp (check_effective_target_c++2a_only):
Rename to ...
(check_effective_target_c++20_only): ... this.
Though, what you are really doing is swapping the names of the two.

Also, while your patch keeps lang_name as GNU C++20 even for -std=c++2b,
I think it would be better to adjust dwarf2out.c too:
  else if (strcmp (language_string, "GNU C++17") == 0
   || strcmp (language_string, "GNU C++20") == 0)
/* For now.  */
language = DW_LANG_C_plus_plus_14;
and add there || strcmp (language_string, "GNU C++23") == 0
so that we don't forget.
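
To make the dwarf2out.c suggestion concrete, here is a small standalone sketch of the extended check. The enum and the function lang_for are hypothetical stand-ins, not GCC's actual declarations; only the numeric value of DW_LANG_C_plus_plus_14 (0x21) comes from the DWARF 5 standard.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the DWARF language codes used by dwarf2out.c.
enum dwarf_lang
{
  DW_LANG_unknown = 0,            // placeholder, not a real DWARF code
  DW_LANG_C_plus_plus_14 = 0x21   // value defined by DWARF 5
};

// Sketch of the suggested check: "GNU C++23" joins the existing
// "GNU C++17" and "GNU C++20" cases and maps to C++14 for now.
dwarf_lang lang_for (const char *language_string)
{
  assert (language_string != nullptr);
  if (std::strcmp (language_string, "GNU C++17") == 0
      || std::strcmp (language_string, "GNU C++20") == 0
      || std::strcmp (language_string, "GNU C++23") == 0)
    return DW_LANG_C_plus_plus_14;  /* For now.  */
  return DW_LANG_unknown;
}
```

Adding the string now means the debug-info mapping keeps working on the day lang_hooks.name is switched from "GNU C++20" to "GNU C++23".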

Jakub



Re: [PATCH] x86-64: Use R10 for profiling large model

2021-01-08 Thread Uros Bizjak via Gcc-patches
> Since R10 is preserved when calling mcount, R10 can be used as a scratch
> register to call mcount in large model.

Please mention that R10 can be used as a static chain register and is
preserved when calling mcount for nested functions.

> gcc/
>
> PR target/98482
> * config/i386/i386.c (x86_function_profiler): Use R10 to call
> mcount in large model. Sorry for large model with PIC.
>
> gcc/testsuite/
>
> PR target/98482
> * gcc.target/i386/pr98482-1.c: New test.
> * gcc.target/i386/pr98482-1.c: Likewise.

OK with comment fixes.

Thanks,
Uros.

+case CM_LARGE:
+  /* NB: R10 can be used as a scratch register here since
+R10 is preserved when calling mcount.  */

Also mention that R10 can be used as a static chain register and is
preserved when calling mcount for nested functions.

+  fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
+   mcount_name);
+  break;
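
For reference, the format string in the hunk above can be exercised outside the compiler to see the assembly it produces. The helper below is a hypothetical wrapper, not part of the patch, and the symbol name passed to it is only an example; the real name comes from mcount_name in the backend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper that formats the large-model profiling call with the
// same format string as the quoted hunk, but into a std::string instead of
// the assembler output file.
std::string large_model_mcount_call (const char *mcount_name)
{
  assert (mcount_name != nullptr);
  char buf[256];
  int n = std::snprintf (buf, sizeof buf,
                         "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
                         mcount_name);
  // Fail loudly rather than return a truncated instruction sequence.
  assert (n >= 0 && static_cast<std::size_t> (n) < sizeof buf);
  return buf;
}
```

The 64-bit address of the profiling routine is loaded into %r10 with movabsq and then called indirectly, which is what the large code model needs because a direct call cannot be assumed to reach the target.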


Re: [PATCH] c++: Add support for -std=c++2b

2021-01-08 Thread Jonathan Wakely via Gcc-patches
On Fri, 8 Jan 2021, 00:50 Paul Fee via Libstdc++ wrote:

> Derived from the changes that added C++2a support in 2017.
> https://gcc.gnu.org/g:026a79f70cf33f836ea5275eda72d4870a3041e5
>
> No C++2b features are added here.
> Use of -std=c++2b sets __cplusplus to 202101L.
>

I wonder if 202100L would be better, as it is obviously not a real date, but
it's still greater than the value for C++20. We didn't do that for C++2a
though.
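
To illustrate the point about ordering, here is a small sketch showing that either candidate value compares greater than the C++20 value, which is all that version checks in user code rely on. The names cxx20_value and newer_than_cxx20 are made up for this example.

```cpp
#include <cassert>

// Published value of __cplusplus for C++20.
constexpr long cxx20_value = 202002L;

// True if VALUE denotes something newer than C++20.  This mirrors the
// comparison a user would write as "#if __cplusplus > 202002L".
constexpr bool newer_than_cxx20 (long value)
{
  return value > cxx20_value;
}

static_assert (newer_than_cxx20 (202101L), "value used in the patch");
static_assert (newer_than_cxx20 (202100L), "alternative suggested above");
static_assert (!newer_than_cxx20 (cxx20_value), "C++20 itself");
```

Either choice keeps such checks working; the difference is only whether the placeholder looks like a real year and month.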




>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 5541e694bb3..3a0d452b62b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2021-01-08  Paul Fee  
> +
> +Add support for -std=c++2b
> +* doc/cpp.texi (__cplusplus): Document value for -std=c++2b
> +or -std=gnu+2b.
> +* doc/invoke.texi: Document -std=c++2b and -std=gnu++2b.
>

Do not patch the changelog files. Since last year they are automatically
edited by a nightly script, using the content of the git commit message. So
you should put the changelog entries in your commit message instead. See
any commits on the master branch for examples.

I only skimmed the rest of the patch but it looks ok to me.


Re: [PATCH] testsuite: Fix test failures from outputs.exp [PR98225]

2021-01-08 Thread Alexandre Oliva
On Jan 7, 2021, Bernd Edlinger wrote:

> I don't know why that is there in the first place, as there
> are no C++ test cases, these files should not be created at all.

collect2, on platforms that use it, creates .cdtor files even for C.
David Edelsohn told me so back then; the problem was on AIX IIRC.  That
was why I added code to tolerate such outputs.  Removing it would likely
bring that failure back.


> Is it OK for trunk?

It looks good to me, aside from the removal of the .cdtor handler.

I don't think I have authority to approve it with that change,
but I would if I did ;-)  Thanks!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] ipa-modref: avoid linebreak split in debug print

2021-01-08 Thread Richard Biener
On Thu, 7 Jan 2021, Sergei Trofimovich wrote:

> From: Sergei Trofimovich 
> 
>   * ipa-modref.c (merge_call_side_effects): Fix
>   linebreak split by reordering two print calls.


OK.

Richard.

> ---
>  gcc/ipa-modref.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fcc676d25e4..04613201f1f 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -835,10 +835,6 @@ merge_call_side_effects (modref_summary *cur_summary,
>auto_vec  parm_map;
>bool changed = false;
>  
> -  if (dump_file)
> -fprintf (dump_file, " - Merging side effects of %s with parm map:",
> -  callee_node->dump_name ());
> -
>/* We can not safely optimize based on summary of callee if it does
>   not always bind to current def: it is possible that memory load
>   was optimized out earlier which may not happen in the interposed
> @@ -850,6 +846,10 @@ merge_call_side_effects (modref_summary *cur_summary,
>cur_summary->loads->collapse ();
>  }
>  
> +  if (dump_file)
> +fprintf (dump_file, " - Merging side effects of %s with parm map:",
> +  callee_node->dump_name ());
> +
>parm_map.safe_grow_cleared (gimple_call_num_args (stmt), true);
>for (unsigned i = 0; i < gimple_call_num_args (stmt); i++)
>  {
> 
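
A rough model of the dump ordering after the patch, for anyone wondering what "linebreak split" means here. Everything in this sketch is hypothetical: the function dump_merge, the wording of the interposition message and the placeholder parm map entries are assumptions for illustration, not copied from ipa-modref.c.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the dump text.  The header line ends without a
// newline because the parm map entries are appended to it, so any other
// message must be printed before the header, not between the header and
// the entries.
std::string dump_merge (const std::string &callee, bool may_be_interposed)
{
  std::string dump;
  if (may_be_interposed)
    dump += " - May be interposed: collapsing loads.\n";  // assumed wording
  dump += " - Merging side effects of " + callee + " with parm map:";
  dump += " 0 1\n";  // placeholder for the per-argument entries
  return dump;
}
```

With the old order, the header would have been emitted first and the interposition message would have landed in the middle of the "parm map:" line.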

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] i386: Fix -mcmodel= vs. target attribute [PR98585]

2021-01-08 Thread Uros Bizjak via Gcc-patches
On Fri, Jan 8, 2021 at 9:24 AM Jakub Jelinek wrote:
>
> Hi!
>
> My patch to save/restore opts_set rather than essentially treating
> global_options_set as a logical or whether some option has ever been
> explicitly set somewhere apparently broke -mcmodel= vs. target attribute
> (and as the patch shows some other options too).
> The thing is, at least for options for which we ever test opts_set->x_*
> or global_options_set.x_*, we need to save/restore them next to the
> saving/restoring of the actual option values.
> If an option has Save keyword or in case of TargetVariable, it is the
> generic code that handles the saving and restoring of both the option
> and corresponding opts_set flag automatically, for other variables
> (TargetSave, or Target without Save) the backend needs to do that in the
> target hook manually and in that case should save/restore both the option
> values (the hooks mostly did that) and opts_set (they didn't).
>
> As it seems much easier to let the automatic saving/restoring do the work
> for us unless the saving/restoring of the option needs some specific magic,
> the following patch is a result of grepping through the backend for
> opts_set->x_ and global_options_set.x_ and for all such referenced
> variables, grepping whether it is saved/restored including opts_set properly
> in the generated options-save.c or not.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-07  Jakub Jelinek  
>
> PR target/98585
> * config/i386/i386.opt (ix86_cmodel, ix86_incoming_stack_boundary_arg,
> ix86_pmode, ix86_preferred_stack_boundary_arg, ix86_regparm,
> ix86_veclibabi_type): Remove x_ prefix, use TargetVariable instead of
> TargetSave and initialize for variables with enum types.
> (mfentry, mstack-protector-guard-reg=, mstack-protector-guard-offset=,
> mstack-protector-guard-symbol=): Add Save.
>
> * config/i386/i386-options.c (ix86_function_specific_save,
> ix86_function_specific_restore): Don't save or restore x_ix86_cmodel,
> x_ix86_incoming_stack_boundary_arg, x_ix86_pmode,
> x_ix86_preferred_stack_boundary_arg, x_ix86_regparm,
> x_ix86_veclibabi_type.
>
> * gcc.target/i386/pr98585.c: New test.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386.opt.jj 2021-01-07 21:08:41.017459644 +0100
> +++ gcc/config/i386/i386.opt 2021-01-07 21:16:02.468563270 +0100
> @@ -105,8 +105,8 @@ TargetSave
>  unsigned char arch_specified
>
>  ;; -mcmodel= model
> -TargetSave
> -enum cmodel x_ix86_cmodel
> +TargetVariable
> +enum cmodel ix86_cmodel = CM_32
>
>  ;; -mabi=
>  TargetSave
> @@ -133,24 +133,24 @@ TargetSave
>  int x_ix86_force_drap
>
>  ;; -mincoming-stack-boundary=
> -TargetSave
> -int x_ix86_incoming_stack_boundary_arg
> +TargetVariable
> +int ix86_incoming_stack_boundary_arg
>
>  ;; -maddress-mode=
> -TargetSave
> -enum pmode x_ix86_pmode
> +TargetVariable
> +enum pmode ix86_pmode = PMODE_SI
>
>  ;; -mpreferred-stack-boundary=
> -TargetSave
> -int x_ix86_preferred_stack_boundary_arg
> +TargetVariable
> +int ix86_preferred_stack_boundary_arg
>
>  ;; -mrecip=
>  TargetSave
>  const char *x_ix86_recip_name
>
>  ;; -mregparm=
> -TargetSave
> -int x_ix86_regparm
> +TargetVariable
> +int ix86_regparm
>
>  ;; -mlarge-data-threshold=
>  TargetSave
> @@ -189,8 +189,8 @@ TargetSave
>  int x_ix86_tune_no_default
>
>  ;; -mveclibabi=
> -TargetSave
> -enum ix86_veclibabi x_ix86_veclibabi_type
> +TargetVariable
> +enum ix86_veclibabi ix86_veclibabi_type = ix86_veclibabi_type_none
>
>  ;; x86 options
>  m128bit-long-double
> @@ -934,7 +934,7 @@ Target Mask(ISA_PREFETCHWT1) Var(ix86_is
>  Support PREFETCHWT1 built-in functions and code generation.
>
>  mfentry
> -Target Var(flag_fentry)
> +Target Save Var(flag_fentry)
>  Emit profiling counter call at function entry before prologue.
>
>  mrecord-mcount
> @@ -1005,21 +1005,21 @@ EnumValue
>  Enum(stack_protector_guard) String(global) Value(SSP_GLOBAL)
>
>  mstack-protector-guard-reg=
> -Target RejectNegative Joined Var(ix86_stack_protector_guard_reg_str)
> +Target Save RejectNegative Joined Var(ix86_stack_protector_guard_reg_str)
>  Use the given base register for addressing the stack-protector guard.
>
>  TargetVariable
>  addr_space_t ix86_stack_protector_guard_reg = ADDR_SPACE_GENERIC
>
>  mstack-protector-guard-offset=
> -Target RejectNegative Joined Integer Var(ix86_stack_protector_guard_offset_str)
> +Target Save RejectNegative Joined Integer Var(ix86_stack_protector_guard_offset_str)
>  Use the given offset for addressing the stack-protector guard.
>
>  TargetVariable
>  HOST_WIDE_INT ix86_stack_protector_guard_offset = 0
>
>  mstack-protector-guard-symbol=
> -Target RejectNegative Joined Integer Var(ix86_stack_protector_guard_symbol_str)
> +Target Save RejectNegative Joined Integer Var(ix86_stack_protector_guard_symbol_str)
>  Use the given symbol for addressing the stack-protector guard.
>
>  

Re: [PATCH] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-08 Thread Andreas Krebbel via Gcc-patches
On 1/8/21 1:17 AM, Ilya Leoshkevich wrote:

> +  s390_def_or_undef_macro (
> +  pfile,
> +  [] (const struct cl_target_option *opts) { return TARGET_Z14_P (opts); },
> +  old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");

Shouldn't this rather check TARGET_VXE_P instead?

Bye,

Andreas


[PATCH] i386: Fix -mcmodel= vs. target attribute [PR98585]

2021-01-08 Thread Jakub Jelinek via Gcc-patches
Hi!

My patch to save/restore opts_set rather than essentially treating
global_options_set as a logical or whether some option has ever been
explicitly set somewhere apparently broke -mcmodel= vs. target attribute
(and as the patch shows some other options too).
The thing is, at least for options for which we ever test opts_set->x_*
or global_options_set.x_*, we need to save/restore them next to the
saving/restoring of the actual option values.
If an option has Save keyword or in case of TargetVariable, it is the
generic code that handles the saving and restoring of both the option
and corresponding opts_set flag automatically, for other variables
(TargetSave, or Target without Save) the backend needs to do that in the
target hook manually and in that case should save/restore both the option
values (the hooks mostly did that) and opts_set (they didn't).

As it seems much easier to let the automatic saving/restoring do the work
for us unless the saving/restoring of the option needs some specific magic,
the following patch is a result of grepping through the backend for
opts_set->x_ and global_options_set.x_ and for all such referenced
variables, grepping whether it is saved/restored including opts_set properly
in the generated options-save.c or not.
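
The requirement described above can be shown with a deliberately tiny model. All type and function names here are hypothetical and far simpler than the generated options-save.c code; the sketch only illustrates that a save/restore pair has to carry the "explicitly set" flag alongside the option value.

```cpp
#include <cassert>

// Hypothetical miniature of one option and its opts_set-style flag.
struct options { int cmodel; };       // current option values
struct options_set { bool cmodel; };  // true if set explicitly

// What gets stored per function: the value and the explicit flag together.
struct saved_options
{
  int cmodel;
  bool cmodel_explicit;
};

saved_options save (const options &opts, const options_set &set)
{
  return { opts.cmodel, set.cmodel };
}

void restore (options &opts, options_set &set, const saved_options &saved)
{
  opts.cmodel = saved.cmodel;
  // Restoring only the value would leave a stale flag behind, so later
  // code testing the flag would see the wrong answer.
  set.cmodel = saved.cmodel_explicit;
}
```

Switching a variable to TargetVariable, or adding Save to an option, asks the generic code to produce this pairing automatically instead of relying on the target hooks to do it by hand.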

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-01-07  Jakub Jelinek  

PR target/98585
* config/i386/i386.opt (ix86_cmodel, ix86_incoming_stack_boundary_arg,
ix86_pmode, ix86_preferred_stack_boundary_arg, ix86_regparm,
ix86_veclibabi_type): Remove x_ prefix, use TargetVariable instead of
TargetSave and initialize for variables with enum types.
(mfentry, mstack-protector-guard-reg=, mstack-protector-guard-offset=,
mstack-protector-guard-symbol=): Add Save.

* config/i386/i386-options.c (ix86_function_specific_save,
ix86_function_specific_restore): Don't save or restore x_ix86_cmodel,
x_ix86_incoming_stack_boundary_arg, x_ix86_pmode,
x_ix86_preferred_stack_boundary_arg, x_ix86_regparm,
x_ix86_veclibabi_type.

* gcc.target/i386/pr98585.c: New test.

--- gcc/config/i386/i386.opt.jj 2021-01-07 21:08:41.017459644 +0100
+++ gcc/config/i386/i386.opt 2021-01-07 21:16:02.468563270 +0100
@@ -105,8 +105,8 @@ TargetSave
 unsigned char arch_specified
 
 ;; -mcmodel= model
-TargetSave
-enum cmodel x_ix86_cmodel
+TargetVariable
+enum cmodel ix86_cmodel = CM_32
 
 ;; -mabi=
 TargetSave
@@ -133,24 +133,24 @@ TargetSave
 int x_ix86_force_drap
 
 ;; -mincoming-stack-boundary=
-TargetSave
-int x_ix86_incoming_stack_boundary_arg
+TargetVariable
+int ix86_incoming_stack_boundary_arg
 
 ;; -maddress-mode=
-TargetSave
-enum pmode x_ix86_pmode
+TargetVariable
+enum pmode ix86_pmode = PMODE_SI
 
 ;; -mpreferred-stack-boundary=
-TargetSave
-int x_ix86_preferred_stack_boundary_arg
+TargetVariable
+int ix86_preferred_stack_boundary_arg
 
 ;; -mrecip=
 TargetSave
 const char *x_ix86_recip_name
 
 ;; -mregparm=
-TargetSave
-int x_ix86_regparm
+TargetVariable
+int ix86_regparm
 
 ;; -mlarge-data-threshold=
 TargetSave
@@ -189,8 +189,8 @@ TargetSave
 int x_ix86_tune_no_default
 
 ;; -mveclibabi=
-TargetSave
-enum ix86_veclibabi x_ix86_veclibabi_type
+TargetVariable
+enum ix86_veclibabi ix86_veclibabi_type = ix86_veclibabi_type_none
 
 ;; x86 options
 m128bit-long-double
@@ -934,7 +934,7 @@ Target Mask(ISA_PREFETCHWT1) Var(ix86_is
 Support PREFETCHWT1 built-in functions and code generation.
 
 mfentry
-Target Var(flag_fentry)
+Target Save Var(flag_fentry)
 Emit profiling counter call at function entry before prologue.
 
 mrecord-mcount
@@ -1005,21 +1005,21 @@ EnumValue
 Enum(stack_protector_guard) String(global) Value(SSP_GLOBAL)
 
 mstack-protector-guard-reg=
-Target RejectNegative Joined Var(ix86_stack_protector_guard_reg_str)
+Target Save RejectNegative Joined Var(ix86_stack_protector_guard_reg_str)
 Use the given base register for addressing the stack-protector guard.
 
 TargetVariable
 addr_space_t ix86_stack_protector_guard_reg = ADDR_SPACE_GENERIC
 
 mstack-protector-guard-offset=
-Target RejectNegative Joined Integer Var(ix86_stack_protector_guard_offset_str)
+Target Save RejectNegative Joined Integer Var(ix86_stack_protector_guard_offset_str)
 Use the given offset for addressing the stack-protector guard.
 
 TargetVariable
 HOST_WIDE_INT ix86_stack_protector_guard_offset = 0
 
 mstack-protector-guard-symbol=
-Target RejectNegative Joined Integer Var(ix86_stack_protector_guard_symbol_str)
+Target Save RejectNegative Joined Integer Var(ix86_stack_protector_guard_symbol_str)
 Use the given symbol for addressing the stack-protector guard.
 
 mmitigate-rop
--- gcc/config/i386/i386-options.c.jj   2021-01-04 10:25:45.426159170 +0100
+++ gcc/config/i386/i386-options.c  2021-01-07 21:08:57.175280431 +0100
@@ -651,18 +651,13 @@ ix86_function_specific_save (struct cl_t
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;