Re: [PATCH] Add __builtin_iseqsig()

2022-10-28 Thread Jeff Law via Gcc-patches



On 9/21/22 03:40, FX via Gcc-patches wrote:

ping*2




On 9 Sept. 2022 at 19:55, FX wrote:

ping



On 1 Sept. 2022 at 23:02, FX wrote:

Attached patch adds __builtin_iseqsig() to the middle-end and C family 
front-ends.
Testing does not currently check whether the signaling part works, because with 
optimisation it actually does not (preexisting compiler bug: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106805).
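
For context, iseqsig is an equality test that, unlike ==, is meant to
raise the "invalid" floating-point exception when either operand is a
NaN.  The following is only a rough C++ sketch of the returned value, not
the patch's implementation: iseqsig_sketch is a hypothetical helper, and
it assumes that <= and >= act as signaling comparisons.  Whether the
exception flag is actually observable depends on the target and on
optimisation (see the PR above), so the sketch relies only on the result.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper, not part of GCC or of the patch: computes the value
// iseqsig (x, y) would return by combining the two ordered comparisons,
// which IEEE 754 treats as signaling comparisons.
bool iseqsig_sketch (double x, double y)
{
  // volatile discourages the compiler from folding the comparisons away.
  volatile double a = x, b = y;
  bool le = a <= b;
  bool ge = a >= b;
  return le && ge;
}

// Example use: equal values compare equal, a NaN never does.
void iseqsig_demo ()
{
  double nan = std::numeric_limits<double>::quiet_NaN ();
  assert (iseqsig_sketch (1.0, 1.0));
  assert (iseqsig_sketch (0.0, -0.0));
  assert (!iseqsig_sketch (1.0, 2.0));
  assert (!iseqsig_sketch (nan, nan));
}
```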

Bootstrapped and regtested on x86_64-linux.
OK to commit?

(I’m not very skilled at middle-end hacking, so I’m sure there will be 
modifications to make.)

FX
<0001-Add-__builtin_iseqsig.patch>


Joseph, do you have bits in this space that are going to be landing 
soon, or is your C2X work focused elsewhere?  Are there other C2X 
routines we need to be providing builtins for?



Jeff



Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/20/22 13:01, Andrea Parri wrote:

On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:

 +Andrea, in case he has time to look at the memory model / ABI
 issues.

+Jeff, who was offering to help when the threads got crossed.  I'd punted on
a lot of this in the hope Andrea could help out, as I'm not really a memory
model guy and this is pretty far down the rabbit hole.  Happy to have the
help if you're offering, though, as what's there is likely a pretty big
performance issue for anyone with a reasonable memory system.

Thanks for linking me to the discussion and the remarks, Palmer.  I'm
happy to help (and to stay synchronized with Jeff/the community) as far
as possible, building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

   - Current mapping is missing synchronization, notably

atomic_compare_exchange_weak_explicit(-, -, -,
  memory_order_release,
  memory_order_relaxed);

 is unable to provide the (required) release ordering guarantees; for
 reference, I've reported a litmus test illustrating it at the bottom
 of this email, cf. c-cmpxchg.

   - [1] addressed the "memory_order_release" problem/bug mentioned above
 (as well as other quirks of the current mapping I won't detail here),
 but it doesn't address other problems present in the current mapping;
 in particular, both mappings translate the following

atomic_compare_exchange_weak_explicit(-, -, -,
  memory_order_acquire,
  memory_order_relaxed);

 to a sequence

lr.w
bne
sc.w.aq

 (without any other synchronization/fences), which contrasts with the
 Unprivileged Spec, Section 10.2 "Load-Reserved/Store-Conditional
 Instructions":

   "Software should not set the 'rl' bit on an LR instruction unless
   the 'aq' bit is also set, nor should software set the 'aq' bit on
   an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
   instructions are not guaranteed to provide any stronger ordering
   than those with both bits clear [...]"
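
For readers who want the source-level shape of the acquire/relaxed case
above, here is a minimal C++ sketch using std::atomic.  The lock word and
the function names are hypothetical, and nothing here says which
instruction sequence a given GCC version emits for it on RISC-V; that
mapping is exactly what is being discussed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical lock word for illustration: 0 = free, 1 = taken.
std::atomic<int> lock_word{0};

// Success ordering is acquire, failure ordering is relaxed, matching the
// second call quoted above.
bool take_lock ()
{
  int expected = 0;
  while (!lock_word.compare_exchange_weak (expected, 1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed))
    {
      if (expected != 0)
        return false;   // really taken by someone else
      // Otherwise the weak form failed spuriously (allowed on LR/SC
      // targets); expected is still 0, so simply retry.
    }
  return true;
}

// Release store so that the critical section is ordered before the unlock.
void release_lock ()
{
  lock_word.store (0, std::memory_order_release);
}
```

The retry loop is there because the weak form may fail even when the
value matches, which is the natural behaviour of an lr/sc pair.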


So it sounds like Christoph's patch is an improvement, but isn't 
complete.  Given the pain in this space, I'd be hesitant to put in an 
incomplete fix, even if it moves things in the right direction as it 
creates another compatibility headache if we don't get the complete 
solution in place for gcc-13.



Christoph, thoughts on the case Andrea pointed out?


Jeff




Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/24/22 00:04, Aldy Hernandez via Gcc-patches wrote:

PING


I'd be a lot more comfortable if Jakub would chime in here.


Jeff




Re: [PATCH] builtins: Add various __builtin_*f{16,32,64,128,32x,64x,128x} builtins

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/16/22 04:09, Jakub Jelinek wrote:

Hi!

When working on libstdc++ extended float support in , I found that
we need various builtins for the _Float{16,32,64,128,32x,64x,128x} types.
Glibc 2.26 and later provides the underlying libm routines (except for
_Float16 and _Float128x for the time being) and in libstdc++ I think we
need at least the _Float128 builtins on x86_64, i?86, powerpc64le and ia64
(when long double is IEEE quad, we can handle it by using __builtin_*l
instead), because without the builtins the overloads couldn't be constexpr
(say when it would declare the *f128 extern "C" routines itself and call
them).

The testcase covers just types of those builtins and their constant
folding, so doesn't need actual libm support.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-15  Jakub Jelinek  

* builtin-types.def (BT_FLOAT16_PTR, BT_FLOAT32_PTR, BT_FLOAT64_PTR,
BT_FLOAT128_PTR, BT_FLOAT32X_PTR, BT_FLOAT64X_PTR, BT_FLOAT128X_PTR):
New DEF_PRIMITIVE_TYPE.
(BT_FN_INT_FLOAT16, BT_FN_INT_FLOAT32, BT_FN_INT_FLOAT64,
BT_FN_INT_FLOAT128, BT_FN_INT_FLOAT32X, BT_FN_INT_FLOAT64X,
BT_FN_INT_FLOAT128X, BT_FN_LONG_FLOAT16, BT_FN_LONG_FLOAT32,
BT_FN_LONG_FLOAT64, BT_FN_LONG_FLOAT128, BT_FN_LONG_FLOAT32X,
BT_FN_LONG_FLOAT64X, BT_FN_LONG_FLOAT128X, BT_FN_LONGLONG_FLOAT16,
BT_FN_LONGLONG_FLOAT32, BT_FN_LONGLONG_FLOAT64,
BT_FN_LONGLONG_FLOAT128, BT_FN_LONGLONG_FLOAT32X,
BT_FN_LONGLONG_FLOAT64X, BT_FN_LONGLONG_FLOAT128X): New
DEF_FUNCTION_TYPE_1.
(BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64PTR, BT_FN_FLOAT128_FLOAT128_FLOAT128PTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32XPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64XPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128XPTR, BT_FN_FLOAT16_FLOAT16_INT,
BT_FN_FLOAT32_FLOAT32_INT, BT_FN_FLOAT64_FLOAT64_INT,
BT_FN_FLOAT128_FLOAT128_INT, BT_FN_FLOAT32X_FLOAT32X_INT,
BT_FN_FLOAT64X_FLOAT64X_INT, BT_FN_FLOAT128X_FLOAT128X_INT,
BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_INTPTR, BT_FN_FLOAT16_FLOAT16_LONG,
BT_FN_FLOAT32_FLOAT32_LONG, BT_FN_FLOAT64_FLOAT64_LONG,
BT_FN_FLOAT128_FLOAT128_LONG, BT_FN_FLOAT32X_FLOAT32X_LONG,
BT_FN_FLOAT64X_FLOAT64X_LONG, BT_FN_FLOAT128X_FLOAT128X_LONG): New
DEF_FUNCTION_TYPE_2.
(BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64_INTPTR,
BT_FN_FLOAT128_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_INTPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_INTPTR): New DEF_FUNCTION_TYPE_3.
* builtins.def (ACOSH_TYPE, ATAN2_TYPE, ATANH_TYPE, COSH_TYPE,
FDIM_TYPE, HUGE_VAL_TYPE, HYPOT_TYPE, ILOGB_TYPE, LDEXP_TYPE,
LGAMMA_TYPE, LLRINT_TYPE, LOG10_TYPE, LRINT_TYPE, MODF_TYPE,
NEXTAFTER_TYPE, REMQUO_TYPE, SCALBLN_TYPE, SCALBN_TYPE, SINH_TYPE):
Define and undefine later.
(FMIN_TYPE, SQRT_TYPE): Undefine at a later line.
(INF_TYPE): Define at a later line.
(BUILT_IN_ACOSH, BUILT_IN_ACOS, BUILT_IN_ASINH, BUILT_IN_ASIN,
BUILT_IN_ATAN2, BUILT_IN_ATANH, BUILT_IN_ATAN, BUILT_IN_CBRT,
BUILT_IN_COSH, BUILT_IN_COS, BUILT_IN_ERFC, BUILT_IN_ERF,
BUILT_IN_EXP2, BUILT_IN_EXP, BUILT_IN_EXPM1, BUILT_IN_FDIM,
BUILT_IN_FMOD, BUILT_IN_FREXP, BUILT_IN_HYPOT, BUILT_IN_ILOGB,
BUILT_IN_LDEXP, BUILT_IN_LGAMMA, BUILT_IN_LLRINT, BUILT_IN_LLROUND,
BUILT_IN_LOG10, BUILT_IN_LOG1P, BUILT_IN_LOG2, BUILT_IN_LOGB,
BUILT_IN_LOG, BUILT_IN_LRINT, BUILT_IN_LROUND, BUILT_IN_MODF,
BUILT_IN_NEXTAFTER, BUILT_IN_POW, BUILT_IN_REMAINDER, BUILT_IN_REMQUO,
BUILT_IN_SCALBLN, BUILT_IN_SCALBN, BUILT_IN_SINH, BUILT_IN_SIN,
BUILT_IN_TANH, BUILT_IN_TAN, BUILT_IN_TGAMMA): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
(BUILT_IN_HUGE_VAL): Use HUGE_VAL_TYPE instead of INF_TYPE in
DEF_GCC_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_ss): Add various CASE_CFN_*_FN:
cases when CASE_CFN_* is present.
(fold_const_call_sss): Likewise.
* builtins.cc (mathfn_built_in_2): Use CASE_MATHFN_FLOATN instead of
CASE_MATHFN for various builtins in SEQ_OF_CASE_MATHFN macro.
(builtin_with_linkage_p): Add CASE_FLT_FN_FLOATN_NX for various
builtins next to CASE_FLT_FN.
* fold-const.cc (tree_call_nonnegative_warnv_p): Add CASE_CFN_*_FN:
next to CASE_CFN_*: for various builtins.
* tree-call-cdce.cc (can_test_argument_range): Add
CASE_FLT_FN_FLOATN_NX next to CASE_FLT_FN for various 

Re: [PATCH] improved const shifts for AVR targets

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/15/22 06:08, A. Binzberger wrote:

On 12.10.22 19:57, Jeff Law wrote:


On 10/4/22 11:06, Alexander Binzberger via Gcc-patches wrote:

Hi,
recently I used an Arduino Uno for a project and noticed some areas
where the generated asm code is not optimal, especially around shifts
and function calls.
With this as motivation, and Hacktoberfest, I started patching things.
Since patch files do not provide a good overview, and I hope for a
"hacktoberfest-accepted" label on the PR on GitHub, I also opened it
there:

https://github.com/gcc-mirror/gcc/pull/73

This patch improves shifts with a constant right-hand operand. While
8-bit and 16-bit shifts were mostly fine, 24-bit and 32-bit shifts were
not handled well.

Testing
I checked the output with a local installation of Compiler Explorer in
asm and with a tiny unit test comparing shifts with mul/div by 2.
I did not, however, write any testcases in gcc for it.
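
A check of the kind described above (constant shifts compared against
repeated multiplication/division by 2) could look roughly like this.  It
is a host-side sketch, not the author's actual test; all names are
hypothetical, and on AVR the interesting part would be the code generated
for each constant shift count.

```cpp
#include <cassert>
#include <cstdint>

// Reference: multiply by 2, n times (unsigned arithmetic wraps mod 2^32).
std::uint32_t mul2_n (std::uint32_t x, unsigned n)
{
  while (n--)
    x *= 2u;
  return x;
}

// Reference: divide by 2, n times.
std::uint32_t div2_n (std::uint32_t x, unsigned n)
{
  while (n--)
    x /= 2u;
  return x;
}

// N is a template parameter so that each shift count is a compile-time
// constant, which is the case the patch is about.
template <unsigned N>
bool shift_matches (std::uint32_t x)
{
  return (x << N) == mul2_n (x, N) && (x >> N) == div2_n (x, N);
}

// Check a few counts, including ones that cross byte boundaries.
bool const_shifts_match (std::uint32_t x)
{
  return shift_matches<1> (x) && shift_matches<8> (x)
         && shift_matches<17> (x) && shift_matches<31> (x);
}
```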

Target
This patch is only targeting atmel avr family of chips.

Changelog
improved const shifts for AVR targets


It would be helpful if you could show the before/after code for the 
cases you're changing.  Extra credit if you include cycles & size 
information for those cases.  That would help someone like me who 
knows GCC well, but isn't particularly well versed in the AVR target 
evaluate the overarching goal of the patch (ie, better code).


about the avr family targets:

* consider every branch instruction = 1/2 cycles
* consider every 2byte/word instruction (besides move word if
  available) = 2 cycles
* consider multiplication (if available) = 2 cycles
* consider every load (besides load immediate "ldi", 1 cycle) = 2 cycles
  (+1 for prog mem)
* pop and jump mostly = 2 cycles
* call is basically = 2-4 cycles
* ret is about = 4/5 cycles
* consider every instruction (bit/bit-test, most compare, arithmetic,
  logic, some other) = 1 cycle
* division does not exist

or as a summary for this patch: branches and such are 2 cycles, the
rest is 1 cycle

note that shifts are 1 bit per cycle and the instructions are at least
mostly byte based.

also note that operations using an immediate only work with the upper
half of the registers.


All useful, but you should be giving me the summary for the things 
you're changing, not asking me to do it :-)  Presumably you've already 
done the analysis to ensure your changes are an improvement.  I'm asking 
you to provide that analysis for review and archival purposes.



A quick table like


Mode    Shift count    Shift type    original cycles (or size)    new cycles (or size)



That will make it very clear for me and anyone doing historical work in 
the future what was expected here.  It's OK if the cycle counts aren't 
100% accurate.



Including a testcase would be awesome as well, but isn't strictly required.



a description for the code before my change and what changed:

* shifts on 8bit (besides arithmetic shifts right) were optimized and
  always unrolled (only aligned with the rest of the code without actual
  change)
* arithmetic shift 8bit and 16bit shifts were mostly optimized and
  mostly unrolled - depending on registers and Os (I added the missing
  cases there)
* 24bit and 32bit shifts were basically not optimized at all and never
  unrolled (I added those cases and aligned the optimizer logic with the
  others. They also reuse the other shift code since they may reduce to
  those cases after a move for bigger shifts.)
* out_shift_with_cnt provides a fallback implementation as a loop over
  shifts which may get unrolled: in case of Os to about inner_len + 3, 4
  or 5, and in other optimizer cases, e.g. O2, it gets unrolled if the
  size is smaller than 10. See max_len (basically unchanged)
* did not touch non-const cases in this patch but may in a future
  patch for O2 and O3

note that in case of Os the smaller code is picked, which is the loop
at least in some cases, but the other optimizer cases profit a lot.

also note that it is debatable if Os needs to be that strict with size,
since the compute overhead of the loop is high, with 5 per loop
iteration, i.e. per bit shift. A lot more cases could be covered
with +1 or +2 more instructions.



about plen:

If plen is NULL the asm code gets returned.

If plen is a pointer the code does count the instruction count which I 
guess is used (or could be used) as a rough estimate of cycles as well 
as byte code size.


Some of the functions named this len. The 24bit functions mainly named 
this plen and used it like it is now in all functions. This is mostly 
a readability improvement.


I am not sure how this works together with the optimizer or the rest.

To my understanding however the functions may get called once by the 
optimizer with a length given, then to output code and potentially 
again with a len given over avr_adjust_length to return the size.


I may be wrong about this part but as far as I can tell I did not 
change the way it operates.



size and cycles summary:

The 

Re: [PATCH v4] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/28/22 06:34, Xiongchuan Tan via Gcc-patches wrote:

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

 * configure.tgt: Add riscv support.
 * config/riscv/asm.h: New file.
 * config/riscv/sjlj.S: New file.
 * config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

v4: Add a guard for unsupported RV32E

  libitm/config/riscv/asm.h|  58 ++
  libitm/config/riscv/sjlj.S   | 144 +++
  libitm/config/riscv/target.h |  62 +++
  libitm/configure.tgt |   2 +
  4 files changed, 266 insertions(+)
  create mode 100644 libitm/config/riscv/asm.h
  create mode 100644 libitm/config/riscv/sjlj.S
  create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..8d02117
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#ifdef __riscv_e
+#  error "rv32e unsupported"
+#endif


error "rv32e and rv64e unsupported" would probably be a better error 
here.  But it's probably not a big deal.




+#else
+#  define SZ_FPR 0
+#endif


Sneaky way to not allocate space for the FP regs.  ;)

Do you have commit access?  If so, go ahead and commit the change.  Else 
let me know and I can do it for you.



Thanks,



Jeff



Re: Ping^3 [PATCH V2] Add attribute hot judgement for INLINE_HINT_known_hot hint.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/20/22 19:52, Cui, Lili via Gcc-patches wrote:

Hi Honza,

Gentle ping  
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html

gcc/ChangeLog

   * ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute
   judgement for INLINE_HINT_known_hot hint.

gcc/testsuite/ChangeLog:

   * gcc.dg/ipa/inlinehint-6.c: New test.
---
  gcc/ipa-inline-analysis.cc  | 13 ---
  gcc/testsuite/gcc.dg/ipa/inlinehint-6.c | 47 +
  2 files changed, 56 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/ipa/inlinehint-6.c

diff --git a/gcc/ipa-inline-analysis.cc b/gcc/ipa-inline-analysis.cc
index 1ca685d1b0e..7bd29c36590 100644
--- a/gcc/ipa-inline-analysis.cc
+++ b/gcc/ipa-inline-analysis.cc
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "ipa-utils.h"
  #include "cfgexpand.h"
  #include "gimplify.h"
+#include "attribs.h"
  
  /* Cached node/edge growths.  */

  fast_call_summary *edge_growth_cache = NULL;
@@ -249,15 +250,19 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
hints = estimates.hints;
  }
  
-  /* When we have profile feedback, we can quite safely identify hot
- edges and for those we disable size limits.  Don't do that when
- probability that caller will call the callee is low however, since it
+  /* When we have profile feedback or function attribute, we can quite safely
+ identify hot edges and for those we disable size limits.  Don't do that
+ when probability that caller will call the callee is low however, since it
   may hurt optimization of the caller's hot path.  */
-  if (edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
+  if ((edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
&& (edge->count.ipa () * 2
  > (edge->caller->inlined_to
 ? edge->caller->inlined_to->count.ipa ()
 : edge->caller->count.ipa ())))
+  || (lookup_attribute ("hot", DECL_ATTRIBUTES (edge->caller->decl))
+ != NULL
+&& lookup_attribute ("hot", DECL_ATTRIBUTES (edge->callee->decl))
+ != NULL))
  hints |= INLINE_HINT_known_hot;


Is the theory here that if the user has marked the caller and callee as 
hot, then we're going to assume an edge between them is hot too?  That's 
not necessarily true, it could be they're both hot, but via other call 
chains.  But it's probably a reasonable heuristic in practice.
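
To make the heuristic concrete, here is a small C++ sketch of the
situation the new condition looks for: caller and callee both carry the
hot attribute, so the edge between them would get INLINE_HINT_known_hot
even without profile feedback.  The function names are hypothetical, and
the sketch says nothing about whether inlining actually happens.

```cpp
#include <cassert>

// Callee marked hot by the user.
__attribute__ ((hot)) int hot_callee (int x)
{
  return x * 3 + 1;
}

// Caller marked hot as well: with the patch, each hot_caller -> hot_callee
// edge is treated as known hot.
__attribute__ ((hot)) int hot_caller (int x)
{
  return hot_callee (x) + hot_callee (x + 1);
}

// Not marked hot: the edge cold_user -> hot_callee does not satisfy the
// new attribute check, since only the callee has the attribute.
int cold_user (int x)
{
  return hot_callee (x);
}
```

The caveat above still applies: both functions may be hot through other
call chains, so the attribute pair is a heuristic, not proof that this
particular edge is hot.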



OK


jeff




Re: [PATCH] libgcc: Special-case BFD ld unwind table encodings in find_fde_tail

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/17/22 03:06, Florian Weimer via Gcc-patches wrote:

BFD ld (and the other linkers) only produce one encoding of these
values.  It is not necessary to use the general
read_encoded_value_with_base decoding routine.  This avoids the
data-dependent branches in its implementation.

libgcc/

* unwind-dw2-fde-dip.c (find_fde_tail): Special-case encoding
values actually used by BFD ld.


OK.

jeff




Re: [committed] More infrastructure to avoid bogus RTL on H8

2022-10-28 Thread Jeff Law via Gcc-patches


On 10/25/22 13:59, Jan-Benedict Glaw wrote:

Hi Jeff!

On Mon, 2022-10-17 17:47:16 -0600, Jeff Law via Gcc-patches wrote:

--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -5531,6 +5531,32 @@ h8300_ok_for_sibcall_p (tree fndecl, tree)
  
return 1;

  }
+
+/* Return TRUE if OP is a PRE_INC or PRE_DEC
+   instruction using REG, FALSE otherwise.  */
+
+bool
+pre_incdec_with_reg (rtx op, int reg)
+{
+  /* OP must be a MEM.  */
+  if (GET_CODE (op) != MEM)
+return false;
+
+  /* The address must be a PRE_INC or PRE_DEC.  */
+  op = XEXP (op, 0);
+  if (GET_CODE (op) != PRE_DEC && GET_CODE (op) != PRE_INC)
+return false;
+
+  /* It must be a register that is being incremented
+ or decremented.  */
+  op = XEXP (op, 0);
+  if (!REG_P (op))
+return false;
+
+  /* Finally, check that the register number matches.  */
+  return REGNO (op) == reg;

This results in a new signed-vs-unsigned warning for me:

[all 2022-10-25 00:41:11] ../../gcc/gcc/config/h8300/h8300.cc: In function 
'bool pre_incdec_with_reg(rtx, int)':
[all 2022-10-25 00:41:11] ../../gcc/gcc/config/h8300/h8300.cc:5557:21: error: 
comparison of integer expressions of different signedness: 'unsigned int' and 
'int' [-Werror=sign-compare]
[all 2022-10-25 00:41:11]  5557 |   return REGNO (op) == reg;


Fixed via the attached patch.  Thanks for pointing it out.
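
The warning and the fix can be reproduced in miniature.  In the sketch
below, fake_reg and fake_regno are hypothetical stand-ins for an rtx and
for REGNO, which yields an unsigned value; only the parameter type
mirrors the actual change.

```cpp
#include <cassert>

// Hypothetical stand-in for a register rtx.
struct fake_reg { unsigned int regno; };

// Hypothetical stand-in for REGNO (op): returns an unsigned int.
unsigned int fake_regno (const fake_reg &r) { return r.regno; }

// With 'int reg' the comparison below mixes 'unsigned int' and 'int' and
// triggers -Wsign-compare (an error under -Werror).  Taking the register
// number as unsigned int gives both operands the same type.
bool regno_matches (const fake_reg &r, unsigned int reg)
{
  return fake_regno (r) == reg;
}
```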


jeff

commit 724d3f926b94672de960dbe88fb699bbdd7fde97
Author: Jeff Law 
Date:   Fri Oct 28 23:33:06 2022 -0400

Fix signed vs unsigned issue in H8 port

gcc/
* config/h8300/h8300.cc (pre_incdec_with_reg): Make reg argument
an unsigned int
* config/h8300/h8300-protos.h (pre_incdec_with_reg): Adjust 
prototype.

diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index 8c989495c29..77adfaba07b 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -100,7 +100,7 @@ extern int h8300_initial_elimination_offset (int, int);
 extern int h8300_regs_ok_for_stm (int, rtx[]);
 extern int h8300_hard_regno_rename_ok (unsigned int, unsigned int);
 extern bool h8300_move_ok (rtx, rtx);
-extern bool pre_incdec_with_reg (rtx, int);
+extern bool pre_incdec_with_reg (rtx, unsigned int);
 
 struct cpp_reader;
 extern void h8300_pr_interrupt (struct cpp_reader *);
diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index ce0702edecb..cd7975e2fff 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -5536,7 +5536,7 @@ h8300_ok_for_sibcall_p (tree fndecl, tree)
instruction using REG, FALSE otherwise.  */
 
 bool
-pre_incdec_with_reg (rtx op, int reg)
+pre_incdec_with_reg (rtx op, unsigned int reg)
 {
   /* OP must be a MEM.  */
   if (GET_CODE (op) != MEM)


Re: [PATCH 2/3] Add lto-dump tool.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/28/22 04:14, Thomas Schwinge wrote:

Hi!

This minor clean-up had fallen out of me working on something else in
GCC's options machinery, several months ago:

On 2019-03-12T18:14:04+0100, marxin  wrote:

gcc/lto/ChangeLog:
   * lang.opt: Add new language LTODump and options related
   to LTO dump tool.

As this new "Language" 'LTODump' does not share any options with 'LTO'
proper, it makes sense, in my opinion, to also make that obvious in
'gcc/lto/lang.opt', which your Subversion r270897 (Git
commit 66d62d9f2e6b059be6a018397fba555147133a9a) "Add lto-dump tool"
almost ;-) did:


--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -24,6 +24,9 @@
  Language
  LTO

+Language
+LTODump
+
  Enum
  Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown 
linker output %qs)

@@ -66,6 +69,65 @@ fwpa=
  LTO Driver RejectNegative Joined Var(flag_wpa)
  Whole program analysis (WPA) mode with number of parallel jobs specified.

+
+[LTODump option records]
+
+
  fresolution=
  LTO Joined
  The resolution file.

OK to push the attached
"Better separate 'LTO' vs. 'LTODump' in 'gcc/lto/lang.opt'"?


OK.

jeff




Re: [PATCH v2] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/27/22 22:23, Xi Ruoyao via Gcc-patches wrote:

On Thu, 2022-10-27 at 17:44 -0700, Palmer Dabbelt wrote:

though I don't have an opinion on whether libitm should be taking ports
to new targets, I'd never even heard of it before.

I asked this question to myself when I reviewed LoongArch libitm port.
But I remember one maintainer of Deepin (a distro) has complained that
some packages were depending on libitm (and/or libvtv).


I thought libitm had generic code to work if the target didn't provide 
an implementation.  But looking more closely I see that is not the 
case.  So yea, I guess cobbling together as straightforward an 
implementation as possible makes sense.



Jeff



[Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh

2022-10-28 Thread zhongyunde at huawei dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #11 from vfdff  ---
Created attachment 53787
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53787&action=edit
has different operand order base on different commit node

hi @Andrew Pinski

* As shown in the attached figure swap_order.jpg, we can introduce the :c
flag for the plus node m_13 to match the commuted node, according to
https://gcc.gnu.org/onlinedocs/gccint/The-Language.html.

And for the plus node _24, is there a similar flag to simplify the
matching?

[Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.

2022-10-28 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

bartoldeman at users dot sourceforge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53785|0                           |1
        is obsolete|                            |

--- Comment #3 from bartoldeman at users dot sourceforge.net ---
Created attachment 53786
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53786&action=edit
Corrected test case

In my eagerness to make it as short as possible I made it too short indeed!

[committed] libstdc++: Fix dangling reference in filesystem::path::filename()

2022-10-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk. Worth backporting too.

-- >8 --

The new -Wdangling-reference warning noticed this.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::filename()): Fix dangling
reference.
---
 libstdc++-v3/include/bits/fs_path.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 6e7b366d104..2fc7dcd98c9 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -1262,9 +1262,9 @@ namespace __detail
   {
if (_M_pathname.back() == preferred_separator)
  return {};
-   auto& __last = *--end();
-   if (__last._M_type() == _Type::_Filename)
- return __last;
+   auto __last = --end();
+   if (__last->_M_type() == _Type::_Filename)
+ return *__last;
   }
 return {};
   }
-- 
2.37.3
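
The pattern the warning looks for can be shown with a toy example.  The
sketch below is not libstdc++ code: stashing_iter is a hypothetical
iterator that keeps the current element inside the iterator object, which
is the situation where binding a reference to *--end() really does
dangle.  It illustrates the general hazard and the same style of fix
(keep the iterator, dereference it later).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical iterator whose operator* refers to storage inside the
// iterator object itself.
struct stashing_iter
{
  const std::vector<std::string> *v;
  std::size_t pos;
  std::string cache;

  stashing_iter &operator-- () { --pos; cache = (*v)[pos]; return *this; }
  const std::string &operator* () const { return cache; }
  const std::string *operator-> () const { return &cache; }
};

// Past-the-end iterator for v, returned by value (so it is a temporary at
// the call site, like end() in the patch).
stashing_iter end_of (const std::vector<std::string> &v)
{
  return stashing_iter{&v, v.size (), {}};
}

// Assumes v is non-empty.
std::string last_component (const std::vector<std::string> &v)
{
  // Risky form: auto &last = *--end_of (v);
  // The temporary iterator, and the string inside it, would be destroyed
  // at the end of that statement.
  auto last = --end_of (v);   // keep the iterator alive instead
  return *last;               // copy out while 'last' is still in scope
}
```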



[Bug tree-optimization/103035] [meta-bug] YARPGen bugs

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103035
Bug 103035 depends on bug 98694, which changed state.

Bug 98694 Summary: GCC produces incorrect code for loops with -O3 for 
skylake-avx512 and icelake-server
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98694

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug target/98694] GCC produces incorrect code for loops with -O3 for skylake-avx512 and icelake-server

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98694

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |10.4
             Status|NEW                         |RESOLVED
      Known to work|                            |10.4.0

--- Comment #14 from Vsevolod Livinskii ---
Should this issue be marked as Resolved and Fixed?

--- Comment #15 from Andrew Pinski  ---
Fixed.

[Bug c++/102786] [c++20] virtual pmf sometimes rejected as not a constant

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102786

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.4.0, 12.1.0, 9.5.0
   Target Milestone|---                         |9.5

[Bug c/101176] valgrind error for c-c++-common/builtin-has-attribute.c

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101176

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.5

[Bug rtl-optimization/101008] ICE: in native_encode_rtx, at simplify-rtx.c:6594 with -O -g

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101008

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.4

gcc-11-20221028 is now available

2022-10-28 Thread GCC Administrator via Gcc
Snapshot gcc-11-20221028 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20221028/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision f2298bd50109e5460e8949290b5337ec28310e91

You'll find:

 gcc-11-20221028.tar.xz   Complete GCC

  SHA256=66db3e6232f3a853df3c1e924504dacaeb260881e60744486a2cb55170b2c7be
  SHA1=dd28abe0fe1f198cca26b2a25517349480688469

Diffs from 11-20221021 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread H.J. Lu via Gcc-patches
On Fri, Oct 28, 2022 at 2:34 PM Segher Boessenkool
 wrote:
>
> On Wed, Oct 26, 2022 at 11:58:57AM -0700, H.J. Lu via Gcc-patches wrote:
> > In i386.md, neg patterns which set MODE_CC register like
> >
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0)))
> >
> > can lead to errors when operand 1 is a constant value.  If FLAGS_REG in
>
> But it cannot be.  general_reg_operand will not allow that:
> ===
> (define_predicate "general_reg_operand"
>   (and (match_code "reg")
>(match_test "GENERAL_REGNO_P (REGNO (op))")))
> ===
>
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (const_int 2) (const_int 0)))
> >
> > is set to 1, RTX simplifiers may simplify
>

Here is another example:

(define_insn "*neg_ccc_1"
  [(set (reg:CCC FLAGS_REG)
(ne:CCC
  (match_operand:SWI 1 "nonimmediate_operand" "0")
  (const_int 0)))
   (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
(neg:SWI (match_dup 1)))]
  ""
  "neg{}\t%0"
  [(set_attr "type" "negnot")
   (set_attr "mode" "")])

Operand 1 can be a known value.

H.J.


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Eric Botcazou via Gcc-patches
> You mean in CCV?  That works yes, but only because (or if) the setter
> and getter of the CC reg both use CCV (so never use any other flag at
> the same time; CCV has an empty intersection with all other CC modes).

We're talking about CCC here AFAIK, i.e. the carry, not CCV.

-- 
Eric Botcazou




[Bug fortran/107397] [10/11/12/13 Regression] ICE in gfc_arith_plus, at fortran/arith.cc:654

2022-10-28 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107397

--- Comment #6 from anlauf at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #5)
> No.  I have no idea how to add a testcase to git.
> Every time I've tried, I end up deleting my git 
> repository and grabbing a new clone.  Not a pleasant
> developer experience.

The workflow with git is really simple:

git add path/to/file
...
git commit

(git gcc-commit-mklog is a tailored version for working with the gcc repo.)

To create a patch for submitting,

git format-patch -1

(if you don't need fancy stuff like a patch series...)

With "git rebase" you can do a great many useful things you would never
dream of with svn.

It's also easy to remove a commit from your local worktree, reorder commits
(using rebase), or reset your worktree, ...

I really think you just need a good primer.

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Segher Boessenkool
Hi!

On Fri, Oct 28, 2022 at 10:35:03AM +0200, Eric Botcazou via Gcc-patches wrote:
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]))))
> > 
> > as
> > 
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]))))
> > 
> > which leads to incorrect results since LTU on MODE_CC register isn't the
> > same as "unsigned less than" in x86 backend.
> 
> That's not specific to the x86 back-end, i.e. it's a generic caveat.

A MODE_CC reg can never be "const_int 1".  That is total garbage.  It
cannot work.  It would mean all of
  (eq (reg:CC) (const_int 0))
  (lt (reg:CC) (const_int 0))
  (gt (reg:CC) (const_int 0))
  (ne (reg:CC) (const_int 0))
  (ge (reg:CC) (const_int 0))
  (le (reg:CC) (const_int 0))
(and more) are simultaneously true.

> > PR target/107172
> > * config/i386/i386.md (UNSPEC_CC_NE): New.
> > Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.
> 
> FWIW the SPARC back-end uses a COMPARE instead of an UNSPEC here.

You mean in CCV?  That works yes, but only because (or if) the setter
and getter of the CC reg both use CCV (so never use any other flag at
the same time; CCV has an empty intersection with all other CC modes).


Segher


[Bug fortran/107397] [10/11/12/13 Regression] ICE in gfc_arith_plus, at fortran/arith.cc:654

2022-10-28 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107397

--- Comment #5 from Steve Kargl  ---
On Fri, Oct 28, 2022 at 08:31:58PM +, anlauf at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107397
> 
> --- Comment #4 from anlauf at gcc dot gnu.org ---
> (In reply to kargl from comment #3)
> > This patch fixes the ICE and issues an error.  It has passed
> > regression testing.
> 
> Great!
> 
> Do you plan to submit your patch?  (Hint: git gcc-commit-mklog).
> 

No.  I have no idea how to add a testcase to git.
Every time I've tried, I end up deleting my git 
repository and grabbing a new clone.  Not a pleasant
developer experience.

[Bug target/107453] New stdarg tests in r13-3549-g4fe34cdcc80ac2 fail

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107453

--- Comment #1 from Andrew Pinski  ---
From https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604152.html:
It's possible further
target-specific fixes will be needed; target maintainers should watch
out for failures of c2x-stdarg-4.c, the execution test, which would
indicate that this feature is not working correctly.

[Bug other/107453] New: New stdarg tests in r13-3549-g4fe34cdcc80ac2 fail

2022-10-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107453

            Bug ID: 107453
           Summary: New stdarg tests in r13-3549-g4fe34cdcc80ac2 fail
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a, r13-3549-g4fe34cdcc80ac2

FAIL: gcc.dg/c2x-stdarg-4.c execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O0  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O1  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O3 -g  execution test
FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -Os  execution test


One traceback

(gdb) run
Starting program: /home/seurer/gcc/git/build/gcc-test/c2x-stdarg-4.exe 

Program received signal SIGABRT, Aborted.
0x202192a8 in raise () from /lib64/glibc-hwcaps/power9/libc-2.28.so
(gdb) where
#0  0x202192a8 in raise () from /lib64/glibc-hwcaps/power9/libc-2.28.so
#1  0x201f3eb4 in abort () from /lib64/glibc-hwcaps/power9/libc-2.28.so
#2  0x1fa0 in main () at
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/c2x-stdarg-4.c:153


The abort is from here:

  if (f (1, 2.0, 3, 4.0) != 10.0)
    abort ();



Author: Joseph Myers 
Date:   Fri Oct 28 14:40:25 2022 +

c: tree: target: C2x (...) function prototypes and va_start relaxation

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Segher Boessenkool
On Wed, Oct 26, 2022 at 11:58:57AM -0700, H.J. Lu via Gcc-patches wrote:
> In i386.md, neg patterns which set MODE_CC register like
> 
> (set (reg:CCC FLAGS_REG)
>  (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0)))
> 
> can lead to errors when operand 1 is a constant value.  If FLAGS_REG in

But it cannot be.  general_reg_operand will not allow that:
===
(define_predicate "general_reg_operand"
  (and (match_code "reg")
   (match_test "GENERAL_REGNO_P (REGNO (op))")))
===

> (set (reg:CCC FLAGS_REG)
>  (ne:CCC (const_int 2) (const_int 0)))
> 
> is set to 1, RTX simplifiers may simplify

"is set to 1"?  Do you mean you do something like
  (set (regs FLAGS_REG) (const_int 1))
?  That is invalid RTL, as I've said tens of times in the last few weeks.

> which leads to incorrect results since LTU on MODE_CC register isn't the
> same as "unsigned less than" in x86 backend.

The special notation
  (ltu (reg:CC) (const_int 0))
is not about comparing anything to 0, but simply means "did the
comparison-like thing that set that reg say ltu was true".
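
To illustrate the point about the special notation, here is a minimal C++
sketch.  It is not GCC code: the names Flags, compare_unsigned, ltu and
naive_ltu are made up for this illustration, and it assumes an x86-style
compare where the carry is the borrow out of a - b.  It models the flags
register as a record of what the last comparison said:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the flags register only records what the last
// comparison-like operation said.
struct Flags {
  bool carry;  // borrow out of a - b in this model
  bool zero;   // a == b in this model
};

// Model of an unsigned compare of a against b (assumed x86-like "cmp").
inline Flags compare_unsigned(std::uint32_t a, std::uint32_t b) {
  return Flags{a < b, a == b};
}

// Model of (ltu (reg:CC) (const_int 0)): read back the recorded answer.
inline bool ltu(const Flags& f) { return f.carry; }

// Literal "unsigned less than" on two plain numbers.
inline bool naive_ltu(std::uint32_t lhs, std::uint32_t rhs) {
  return lhs < rhs;
}

// Self-check of the model; returns true when all expectations hold.
inline bool check_flags_model() {
  const bool recorded = ltu(compare_unsigned(1, 2));      // 1 <u 2 was recorded
  const bool not_recorded = ltu(compare_unsigned(2, 1));  // 2 <u 1 was not
  const bool literal = naive_ltu(1, 0);                   // nothing is <u 0
  assert(recorded && !not_recorded && !literal);
  return recorded && !not_recorded && !literal;
}
```

In this model ltu () only reads back the recorded answer, whereas
naive_ltu (1, 0), which is roughly what replacing the flags register by a
number and comparing it against 0 amounts to, is false for every unsigned
input.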

> To prevent RTL optimizers
> from setting MODE_CC register to a constant, use UNSPEC_CC_NE to replace
> ne:CCC/ne:CCO when setting FLAGS_REG in neg patterns.

This is an indirect workaround, nothing more.  The unspec will naturally
not be folded to anything else (unless you arrange for that yourself),
there is nothing the generic code knows about the semantics of any
unspec after all.

AFAICS there is no way to express overflow in a CC, but an unspec can
help, sure.  You need to fix the setter side as well though.


Segher


[r13-3540 Regression] FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1 on Linux/x86_64

2022-10-28 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

0607307768b66a90e27c5bc91a247acc938f070e is the first bad commit
commit 0607307768b66a90e27c5bc91a247acc938f070e
Author: Thomas Schwinge 
Date:   Tue Oct 25 13:10:52 2022 +0200

Fix target selector syntax in 'gcc.dg/vect/bb-slp-cond-1.c'

caused

FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "loop vectorized" 1
FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3540/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-cond-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-cond-1.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com)


Re: --target=powerpc64-linux_altivec: Use rs6000_linux64_override_options()?

2022-10-28 Thread Segher Boessenkool
On Fri, Oct 28, 2022 at 10:07:41PM +0200, Jan-Benedict Glaw wrote:
> On Fri, 2022-10-28 14:19:10 -0500, Segher Boessenkool 
>  wrote:
> > Why do you use powerpc64-linux_altivec?  This thing (normally spelled
> > with a dash, not an underscore, btw) was made for 32-bit targets.  It
> > never has done anything useful for 64-bit targets, afaik?
> 
> Because it's listed in ./contrib/config-list.mk:
> 
> /var/cache/git/gcc [master] # make -f contrib/config-list.mk show | tr ' ' 
> $'\n' | grep altivec
> powerpc-eabisimaltivec
> powerpc-eabialtivec
> powerpc64-linux_altivec

Huh.  Okay, that is a bug.  Has that target ever worked (or
alternatively, has it ever existed at all, other than it is recognised
by config.gcc by not very tight REs)?

> It seems to be on the target list since the very beginning, when
> config-list.mk was created by Joern Rennecke. So somebody cared about
> this configuration I guess?

No idea.  rs6000_altivec_abi is always forced on on any linux
configuration that has VMX or VSX or 64 bit enabled:
===
  /* The AltiVec ABI is the default for PowerPC-64 GNU/Linux.  For
     PowerPC-32 GNU/Linux, -maltivec implies the AltiVec ABI.  It can
     be explicitly overridden in either case.  */
  if (TARGET_ELF)
    {
      if (!OPTION_SET_P (rs6000_altivec_abi)
          && (TARGET_64BIT || TARGET_ALTIVEC || TARGET_VSX))
        {
          if (main_target_opt != NULL &&
              !main_target_opt->x_rs6000_altivec_abi)
            error ("target attribute or pragma changes AltiVec ABI");
          else
            rs6000_altivec_abi = 1;
        }
    }
===

>   If this configuration isn't meant to be used, we'd just drop it from
> the list I guess.

Yeah, the config makes no sense.

Thanks,


Segher


[Bug fortran/103413] [10/11/12/13 Regression] ICE: Invalid expression in gfc_element_size since r10-2083-g8dc63166e0b85954

2022-10-28 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103413

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from anlauf at gcc dot gnu.org ---
Fixed on all open branches.  Closing.

Thanks for the report!

[Bug fortran/103413] [10/11/12/13 Regression] ICE: Invalid expression in gfc_element_size since r10-2083-g8dc63166e0b85954

2022-10-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103413

--- Comment #19 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:3b4c9e0658b13b8db6c7f38242ed270cdb8fc932

commit r10-11063-g3b4c9e0658b13b8db6c7f38242ed270cdb8fc932
Author: Harald Anlauf 
Date:   Wed Oct 26 21:00:44 2022 +0200

Fortran: BOZ literal constants are not compatible to any type [PR103413]

gcc/fortran/ChangeLog:

PR fortran/103413
* symbol.c (gfc_type_compatible): A boz-literal-constant has no
type
and thus is not considered compatible to any type.

gcc/testsuite/ChangeLog:

PR fortran/103413
* gfortran.dg/illegal_boz_arg_4.f90: New test.

(cherry picked from commit f7d28818179247685f3c101f9f2f16366f56309b)

[Bug target/93177] PPC: Missing many useful platform intrinsics

2022-10-28 Thread vital.had at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93177

--- Comment #22 from Sergey Fedorov  ---
(In reply to Iain Sandoe from comment #19)
> Created attachment 53779 [details]
> introduce ppc_intrinsics.h for powerpc*-darwin.
> 
> This takes the header from the GCC-4.x apple debt branch (as present in SVN:
> r113478) and 
>  - updates the license.
>  - installs for powerpc*-darwin
> 
> It needs the test cases forward porting too.
> However, it would be good to know if this solves the problems folks have
> encountered here (if other ports want to try it, they only need to amend
> their entry in gcc/config.gcc)

Thank you! I will try it.

[Bug fortran/103413] [10/11/12/13 Regression] ICE: Invalid expression in gfc_element_size since r10-2083-g8dc63166e0b85954

2022-10-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103413

--- Comment #18 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:f2298bd50109e5460e8949290b5337ec28310e91

commit r11-10343-gf2298bd50109e5460e8949290b5337ec28310e91
Author: Harald Anlauf 
Date:   Wed Oct 26 21:00:44 2022 +0200

Fortran: BOZ literal constants are not compatible to any type [PR103413]

gcc/fortran/ChangeLog:

PR fortran/103413
* symbol.c (gfc_type_compatible): A boz-literal-constant has no
type
and thus is not considered compatible to any type.

gcc/testsuite/ChangeLog:

PR fortran/103413
* gfortran.dg/illegal_boz_arg_4.f90: New test.

(cherry picked from commit f7d28818179247685f3c101f9f2f16366f56309b)

[PATCH] c++: Tweaks for -Wredundant-move [PR107363]

2022-10-28 Thread Marek Polacek via Gcc-patches
Two things here:

1) when we're pointing out that std::move on a constant object is
   redundant, don't say "in return statement" when we aren't in a
   return statement;
2) suppress the warning when the std::move call was dependent, because
   removing the std::move may not be correct for a different
   instantiation of the original template.
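
Both points can be seen in a small stand-alone C++ sketch.  This is not
taken from the patch or its testcase: Counted, ConstHolder, MutableHolder,
release and the two check functions are hypothetical names, and the global
counters only exist to make the chosen constructor observable.

```cpp
#include <cassert>
#include <utility>

int g_copies = 0;  // how often the copy constructor ran
int g_moves = 0;   // how often the move constructor ran

// Hypothetical type that counts which constructor was selected.
struct Counted {
  Counted() = default;
  Counted(const Counted&) { ++g_copies; }
  Counted(Counted&&) noexcept { ++g_moves; }
};

const Counted& const_ref() {
  static const Counted c{};
  return c;
}

// Point 1: std::move on a const object still selects the copy constructor,
// because const Counted&& cannot bind to Counted(Counted&&).
bool move_of_const_copies() {
  const int copies_before = g_copies;
  const int moves_before = g_moves;
  Counted c = std::move(const_ref());  // an initialization, not a return
  (void)c;
  return g_copies == copies_before + 1 && g_moves == moves_before;
}

struct ConstHolder {
  const Counted& value() { return const_ref(); }
};

struct MutableHolder {
  Counted item;
  Counted& value() { return item; }
};

// Point 2: the std::move call is dependent, so whether it is redundant
// differs from one instantiation to the next.
template <typename Holder>
Counted release(Holder& h) {
  Counted c = std::move(h.value());
  return c;
}

bool dependent_move_depends_on_type() {
  MutableHolder m;
  const int copies_before = g_copies;
  (void)release(m);  // really moves: no copy constructor call
  const bool mutable_moved = (g_copies == copies_before);
  ConstHolder k;
  (void)release(k);  // copies exactly once, with or without std::move
  return mutable_moved && g_copies == copies_before + 1;
}
```

Dropping the std::move in release () would be harmless for ConstHolder but
would turn the move into a copy for MutableHolder, which is why a warning on
the template itself could mislead.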

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/107363

gcc/cp/ChangeLog:

* semantics.cc (finish_call_expr): Suppress OPT_Wpessimizing_move.
* typeck.cc (maybe_warn_pessimizing_move): Check warn_redundant_move
and warning_suppressed_p.  Adjust a message depending on return_p.
(check_return_expr): Don't suppress OPT_Wpessimizing_move here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wredundant-move13.C: New test.
---
 gcc/cp/semantics.cc   |  4 ++
 gcc/cp/typeck.cc  | 16 ++---
 .../g++.dg/cpp0x/Wredundant-move13.C  | 61 +++
 3 files changed, 73 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 36aa9c4499f..caaa40fde19 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2738,6 +2738,10 @@ finish_call_expr (tree fn, vec **args, bool 
disallow_virtual,
  result = build_min_nt_call_vec (orig_fn, *args);
  SET_EXPR_LOCATION (result, cp_expr_loc_or_input_loc (fn));
  KOENIG_LOOKUP_P (result) = koenig_p;
+ /* Disable the std::move warnings since this call was dependent
+(c++/89780, c++/107363).  This also suppresses the
+-Wredundant-move warning.  */
+ suppress_warning (result, OPT_Wpessimizing_move);
  if (is_overloaded_fn (fn))
fn = get_fns (fn);
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 2e0fd8fbf17..5f5fb2a212b 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10885,7 +10885,9 @@ maybe_warn_pessimizing_move (tree expr, tree type, bool 
return_p)
  and where the std::move does nothing if T does not have a T(const T&&)
  constructor, because the argument is const.  It will not use T(T&&)
  because that would mean losing the const.  */
-  else if (TYPE_REF_P (TREE_TYPE (arg))
+  else if (warn_redundant_move
+  && !warning_suppressed_p (expr, OPT_Wredundant_move)
+  && TYPE_REF_P (TREE_TYPE (arg))
   && CP_TYPE_CONST_P (TREE_TYPE (TREE_TYPE (arg))))
 {
   tree rtype = TREE_TYPE (TREE_TYPE (arg));
@@ -10901,8 +10903,11 @@ maybe_warn_pessimizing_move (tree expr, tree type, 
bool return_p)
  return;
  }
   auto_diagnostic_group d;
-  if (warning_at (loc, OPT_Wredundant_move,
- "redundant move in return statement"))
+  if (return_p
+ ? warning_at (loc, OPT_Wredundant_move,
+   "redundant move in return statement")
+ : warning_at (loc, OPT_Wredundant_move,
+   "redundant move in initialization"))
inform (loc, "remove %<std::move%> call");
 }
 }
@@ -11126,11 +11131,6 @@ check_return_expr (tree retval, bool *no_warning)
   /* We don't know if this is an lvalue or rvalue use, but
 either way we can mark it as read.  */
   mark_exp_read (retval);
-  /* Disable our std::move warnings when we're returning
-a dependent expression (c++/89780).  */
-  if (retval && TREE_CODE (retval) == CALL_EXPR)
-   /* This also suppresses -Wredundant-move.  */
-   suppress_warning (retval, OPT_Wpessimizing_move);
   return retval;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C 
b/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C
new file mode 100644
index 000..80e7d80cd02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C
@@ -0,0 +1,61 @@
+// PR c++/107363
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wredundant-move" }
+
+// Define std::move.
+namespace std {
+  template<typename _Tp>
+struct remove_reference
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+constexpr typename std::remove_reference<_Tp>::type&&
+move(_Tp&& __t) noexcept
+{ return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
+}
+
+template 
+struct Optional {
+  U ();
+  T release_value() {
+T t = std::move (value ());
+return t;
+  }
+};
+
+struct Foo {};
+void test(Optional o) { o.release_value(); }
+
+struct F {
+  F(const F&);
+  F(F&&) = delete;
+};
+
+struct Z {
+  Z(const Z&) = delete;
+  Z(Z&&) = delete;
+  Z(const Z&&);
+};
+
+const F& constfref();
+const Z& constzref();
+
+void
+g ()
+{
+  // Will call F::F(const F&) w/ and w/o std::move.  So it's redundant.
+  F f = std::move (constfref()); // { dg-warning "redundant move in 
initialization" }
+  (void) f;
+  // Will 

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-28 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #32 from joseph at codesourcery dot com  ---
On Fri, 28 Oct 2022, jakub at gcc dot gnu.org via Gcc-bugs wrote:

> > That said, if C allows us to limit to 128bits then let's do that for now.
> > 32bit targets will still see all the complication when we give that a stab.
> 
> I'm afraid once we define BITINT_MAXWIDTH, it will become part of the ABI, so
> we can't increase it afterwards.

I don't think it's part of the ABI; I think it's always OK to increase 
BITINT_MAXWIDTH, as long as the wider types don't need more alignment than 
the previous choice of max_align_t.

Thus, starting with a 128-bit limit (or indeed a 64-bit limit on 32-bit 
platforms, so that all the types fit within existing modes supported for 
arithmetic), and adding support for wider _BitInt later, would be a 
reasonable thing to do.

(You still have ABI considerations even with such a limit: apart from the 
padding question, on x86_64 the ABI says _BitInt(128) is 64-bit aligned 
but __int128 is 128-bit aligned.)

> Anyway, I'm afraid we probably don't have enough time to implement this
> properly in stage1, so might need to target GCC 14 with it.  Unless somebody
> spends on it
> the remaining 2 weeks full time.

I think https://gcc.gnu.org/pipermail/gcc/2022-October/239704.html is 
still current as a list of C2x language features likely not to make it 
into GCC 13.  (I hope to get auto and constexpr done in the next two 
weeks, and the other C2x language features not on that list are done.)

[Bug fortran/103413] [10/11/12/13 Regression] ICE: Invalid expression in gfc_element_size since r10-2083-g8dc63166e0b85954

2022-10-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103413

--- Comment #17 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:9831a5f4843b573bbdb2688bbf2de864b4e8be8b

commit r12-8875-g9831a5f4843b573bbdb2688bbf2de864b4e8be8b
Author: Harald Anlauf 
Date:   Wed Oct 26 21:00:44 2022 +0200

Fortran: BOZ literal constants are not compatible to any type [PR103413]

gcc/fortran/ChangeLog:

PR fortran/103413
* symbol.cc (gfc_type_compatible): A boz-literal-constant has no
type
and thus is not considered compatible to any type.

gcc/testsuite/ChangeLog:

PR fortran/103413
* gfortran.dg/illegal_boz_arg_4.f90: New test.

(cherry picked from commit f7d28818179247685f3c101f9f2f16366f56309b)

[Bug fortran/107397] [10/11/12/13 Regression] ICE in gfc_arith_plus, at fortran/arith.cc:654

2022-10-28 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107397

--- Comment #4 from anlauf at gcc dot gnu.org ---
(In reply to kargl from comment #3)
> This patch fixes the ICE and issues an error.  It has passed
> regression testing.

Great!

Do you plan to submit your patch?  (Hint: git gcc-commit-mklog).

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-28 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #31 from joseph at codesourcery dot com  ---
On Fri, 28 Oct 2022, rguenth at gcc dot gnu.org via Gcc-bugs wrote:

> I wouldn't go with a new tree code, given semantics are INTEGER_TYPE it should
> be an INTEGER_TYPE.

Implementation note in that case: bit-precise integer types aren't allowed 
as underlying types for enums, so the code in 
c-parser.cc:c_parser_enum_specifier checking underlying types:

  else if (TREE_CODE (specs->type) != INTEGER_TYPE
           && TREE_CODE (specs->type) != BOOLEAN_TYPE)
    {
      error_at (enum_loc, "invalid %<enum%> underlying type");

would then need to check that the type isn't a bit-precise type.

[Bug target/105549] aarch64: Wrong va_arg alignment handling with packed bitfields and alignment

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105549

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|aarch64: Wrong va_arg       |aarch64: Wrong va_arg
                   |alignment handling          |alignment handling with
                   |                            |packed bitfields and
                   |                            |alignment
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-10-28
             Status|UNCONFIRMED                 |ASSIGNED
           Keywords|                            |ABI

--- Comment #2 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/105597] [13 Regression] ice in type, at value-range.h:223

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105597

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |tree-optimization
            Summary|ice in type, at             |[13 Regression] ice in
                   |value-range.h:223           |type, at value-range.h:223
   Target Milestone|---                         |13.0
            Version|12.0                        |13.0
           Keywords|                            |ice-on-valid-code

[Bug fortran/107441] optional arguments are identified as "present" when missing

2022-10-28 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107441

anlauf at gcc dot gnu.org changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |anlauf at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #10 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2022-October/058398.html

[Bug c++/107452] Failed to catch C++ exception thrown from multiarch-function (x64 CPUs)

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107452

--- Comment #2 from Andrew Pinski  ---
>Is this a known GCC issue? If needed I could also try to write a minimal test 
>that reproduces this issue.

Yes it is a known issue as shown by the duplicate bug report. The duplicate bug
report has a nice minimal testcase already so you don't need to write one but
thanks for the offer.

[PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

2022-10-28 Thread Harald Anlauf via Gcc-patches
Dear all,

the passing of procedure arguments in Fortran sometimes requires
ancillary parameters that are "hidden".  Examples are string length
and the presence status of scalar variables with optional+value
attribute.

The gfortran ABI is actually documented:

https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html

The reporter found that there was a discrepancy between the
caller and the callee.  This is corrected by the attached patch.
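
As a rough illustration of the ordering that the patch below implements,
here is a small C++ model.  It is not gfortran code: the Dummy struct, the
lowered_arguments function and the "_present_" / "_len_" name prefixes are
hypothetical, and the extra hidden arguments for the gfortran library, which
the patch appends last, are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one dummy argument of a procedure.
struct Dummy {
  std::string name;
  bool is_character;    // needs a hidden string length
  bool optional_value;  // OPTIONAL + VALUE: needs a hidden presence flag
};

// Order modelled after the patch: visible arguments first, then the
// presence flags, then the string lengths.
std::vector<std::string> lowered_arguments(const std::vector<Dummy>& dummies) {
  std::vector<std::string> visible, present, lengths;
  for (const Dummy& d : dummies) {
    visible.push_back(d.name);
    if (d.optional_value) present.push_back("_present_" + d.name);
    if (d.is_character) lengths.push_back("_len_" + d.name);
  }
  std::vector<std::string> out = visible;
  out.insert(out.end(), present.begin(), present.end());
  out.insert(out.end(), lengths.begin(), lengths.end());
  return out;
}

// Models "subroutine test2 (w, x)" with a character w and an integer,
// value, optional x, as in the new testcase.
bool test2_order_matches() {
  const std::vector<Dummy> dummies = {{"w", true, false}, {"x", false, true}};
  const std::vector<std::string> expected = {"w", "x", "_present_x", "_len_w"};
  const bool ok = lowered_arguments(dummies) == expected;
  assert(ok);
  return ok;
}
```

If caller and callee disagree on whether the presence flag or the string
length comes first, the callee reads the wrong slot for present(x), which
matches the symptom described in the report.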

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From b7646403557eca19612c81437f381d4b4dcd51c8 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 28 Oct 2022 21:58:08 +0200
Subject: [PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

gcc/fortran/ChangeLog:

	PR fortran/107441
	* trans-decl.cc (create_function_arglist): Adjust the ordering of
	automatically generated hidden procedure arguments to match the
	documented ABI for gfortran.

gcc/testsuite/ChangeLog:

	PR fortran/107441
	* gfortran.dg/optional_absent_6.f90: New test.
---
 gcc/fortran/trans-decl.cc | 15 +++--
 .../gfortran.dg/optional_absent_6.f90 | 60 +++
 2 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/optional_absent_6.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 63515b9072a..18842fe2c4b 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -2508,7 +2508,7 @@ create_function_arglist (gfc_symbol * sym)
   tree fndecl;
   gfc_formal_arglist *f;
   tree typelist, hidden_typelist;
-  tree arglist, hidden_arglist;
+  tree arglist, hidden_arglist, optional_arglist, strlen_arglist;
   tree type;
   tree parm;

@@ -2518,6 +2518,7 @@ create_function_arglist (gfc_symbol * sym)
  the new FUNCTION_DECL node.  */
   arglist = NULL_TREE;
   hidden_arglist = NULL_TREE;
+  strlen_arglist = optional_arglist = NULL_TREE;
   typelist = TYPE_ARG_TYPES (TREE_TYPE (fndecl));

   if (sym->attr.entry_master)
@@ -2644,7 +2645,7 @@ create_function_arglist (gfc_symbol * sym)
 	  length = build_decl (input_location,
 			   PARM_DECL, get_identifier (name), len_type);

-	  hidden_arglist = chainon (hidden_arglist, length);
+	  strlen_arglist = chainon (strlen_arglist, length);
 	  DECL_CONTEXT (length) = fndecl;
 	  DECL_ARTIFICIAL (length) = 1;
 	  DECL_ARG_TYPE (length) = len_type;
@@ -2712,7 +2713,7 @@ create_function_arglist (gfc_symbol * sym)
 			PARM_DECL, get_identifier (name),
 			boolean_type_node);

-  hidden_arglist = chainon (hidden_arglist, tmp);
+	  optional_arglist = chainon (optional_arglist, tmp);
   DECL_CONTEXT (tmp) = fndecl;
   DECL_ARTIFICIAL (tmp) = 1;
   DECL_ARG_TYPE (tmp) = boolean_type_node;
@@ -2863,10 +2864,16 @@ create_function_arglist (gfc_symbol * sym)
   typelist = TREE_CHAIN (typelist);
 }

+  /* Add hidden present status for optional+value arguments.  */
+  arglist = chainon (arglist, optional_arglist);
+
   /* Add the hidden string length parameters, unless the procedure
  is bind(C).  */
   if (!sym->attr.is_bind_c)
-arglist = chainon (arglist, hidden_arglist);
+arglist = chainon (arglist, strlen_arglist);
+
+  /* Add hidden extra arguments for the gfortran library.  */
+  arglist = chainon (arglist, hidden_arglist);

   gcc_assert (hidden_typelist == NULL_TREE
   || TREE_VALUE (hidden_typelist) == void_type_node);
diff --git a/gcc/testsuite/gfortran.dg/optional_absent_6.f90 b/gcc/testsuite/gfortran.dg/optional_absent_6.f90
new file mode 100644
index 000..b8abb06980a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_absent_6.f90
@@ -0,0 +1,60 @@
+! { dg-do run }
+! PR fortran/107441
+!
+! Test VALUE + OPTIONAL for integer/real/...
+! in the presence of non-optional character dummies
+
+program bugdemo
+  implicit none
+  character :: s = 'a'
+  integer   :: t
+
+  t = testoptional(s)
+  call test2 (s)
+  call test3 (s)
+  call test4 (w='123',x=42)
+
+contains
+
+  function testoptional (w, x) result(t)
+character, intent(in)  :: w
+integer,   intent(in), value, optional :: x
+integer :: t
+print *, 'present(x) is', present(x)
+t = 0
+if (present (x)) stop 1
+  end function testoptional
+
+  subroutine test2 (w, x)
+character, intent(in)  :: w
+integer,   intent(in), value, optional :: x
+print*, 'present(x) is', present(x)
+if (present (x)) stop 2
+  end subroutine test2
+
+  subroutine test3 (w, x)
+character, intent(in),optional :: w
+integer,   intent(in), value, optional :: x
+print *, 'present(w) is', present(w)
+print *, 'present(x) is', present(x)
+if (.not. present (w)) stop 3
+if (present (x)) stop 4
+  end subroutine test3
+
+  subroutine test4 (r, w, x)
+real, value, optional :: r
+character(*), intent(in),optional :: w
+integer,  value, 

[Bug c++/107452] Failed to catch C++ exception thrown from multiarch-function (x64 CPUs)

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107452

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 106627.

*** This bug has been marked as a duplicate of bug 106627 ***

[Bug ipa/106627] Exception from multiversion function cannot be caught

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106627

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kim.walisch at gmail dot com

--- Comment #5 from Andrew Pinski  ---
*** Bug 107452 has been marked as a duplicate of this bug. ***

Re: --target=powerpc64-linux_altivec: Use rs6000_linux64_override_options()?

2022-10-28 Thread Jan-Benedict Glaw
Hi!

On Fri, 2022-10-28 14:19:10 -0500, Segher Boessenkool 
 wrote:
> On Fri, Oct 28, 2022 at 07:34:24PM +0200, Jan-Benedict Glaw wrote:
> > While checking my bot build logs, I noticed that GCC configured for
> > --target=powerpc64-linux_altivec will pull in linux64.h and
> > linuxaltivec.h .
> > 
> > linux64.h
> >   * Will "#define TARGET_USES_LINUX64_OPT 1" (to make static void
> > rs6000_linux64_override_options() available in rs6000.cc).
> >   * Will "#define SUBSUBTARGET_OVERRIDE_OPTIONS" to use
> > rs6000_linux64_override_options().
> > 
> > linuxaltivec.h OTOH
> >   * Will undef / "#define SUBSUBTARGET_OVERRIDE_OPTIONS  rs6000_altivec_abi 
> > = 1"
> > and thus no longer use rs6000_linux64_override_options()
> >   * That triggers a warning (unused-function).
> > 
> > To silence that warning, should linuxaltivec.h undefine
> > TARGET_USES_LINUX64_OPT? Or set rs6000_altivec_abi=1 and call
> > rs6000_linux64_override_options()?
> 
> Why do you use powerpc64-linux_altivec?  This thing (normally spelled
> with a dash, not an underscore, btw) was made for 32-bit targets.  It
> never has done anything useful for 64-bit targets, afaik?

Because it's listed in ./contrib/config-list.mk:

/var/cache/git/gcc [master] # make -f contrib/config-list.mk show | tr ' ' 
$'\n' | grep altivec
powerpc-eabisimaltivec
powerpc-eabialtivec
powerpc64-linux_altivec

> (And not for 32-bit targets either really, but that is another issue.)

It seems to be on the target list since the very beginning, when
config-list.mk was created by Joern Rennecke. So somebody cared about
this configuration I guess?

  If this configuration isn't meant to be used, we'd just drop it from
the list I guess.

MfG, JBG



[Bug c++/107452] New: Failed to catch C++ exception thrown from multiarch-function (x64 CPUs)

2022-10-28 Thread kim.walisch at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107452

            Bug ID: 107452
           Summary: Failed to catch C++ exception thrown from
                    multiarch-function (x64 CPUs)
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kim.walisch at gmail dot com
  Target Milestone: ---

Hi,

Tested using: GCC 11.2.0, Ubuntu 22.10 x64
Tested using: GCC 9.4.0, Ubuntu 18.04 x64

I am using the GCC multiarch feature (also known as function multiversioning:
https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html) in my
primesieve C++ project to take advantage of the latest supported CPU
instruction set e.g. AVX, AVX2, AVX512 (on x64 CPUs).

Today I found out that if I throw a C++ exception from a multiarch-function and
I try to catch that exception outside of the originating multiarch-function but
within the same translation unit, then catching the exception fails and my
program simply aborts.

My exception is thrown from here:
https://github.com/kimwalisch/primesieve/blob/776c102f92905401613a83508d60744d41df7c73/src/PrimeGenerator.cpp#L332
It should be caught here:
https://github.com/kimwalisch/primesieve/blob/776c102f92905401613a83508d60744d41df7c73/src/iterator-c.cpp#L151



My bug can be reproduced using these steps:

git clone https://github.com/kimwalisch/primesieve.git
cd primesieve && mkdir build && cd build
git checkout 776c102f92905401613a83508d60744d41df7c73
CXXFLAGS="-O2 -Wall -Wextra -pedantic" cmake ..  -DBUILD_TESTS=ON
-DCMAKE_BUILD_TYPE=Debug -DWITH_MULTIARCH=ON  && make -j8
test/next_prime2

The test/next_prime2 will fail with the following error message:

terminate called after throwing an instance of 'primesieve::primesieve_error'
  what():  cannot generate primes > 2^64
Aborted



If I recompile without function multiversioning (-DWITH_MULTIARCH=OFF) the same
exception is caught successfully:

rm -rf *
CXXFLAGS="-O2 -Wall -Wextra -pedantic" cmake ..  -DBUILD_TESTS=ON
-DCMAKE_BUILD_TYPE=Debug -DWITH_MULTIARCH=OFF && make -j8
test/next_prime2

The test/next_prime2 completes successfully:

...
primesieve_iterator: cannot generate primes > 2^64
next_prime(18446744073709551615) = PRIMESIEVE_ERROR:   OK
primesieve_iterator: cannot generate primes > 2^64
next_prime(18446744073709551615) = PRIMESIEVE_ERROR:   OK

All tests passed successfully!



Clang also supports function multiversioning on Linux & x64 CPUs. With Clang
this issue is not present: catching C++ exceptions thrown from a
multiarch-function works flawlessly (tested using Clang 14.0.0 on Ubuntu
22.10 x64):

rm -rf *
CXX=clang++ CC=clang CXXFLAGS="-O2 -Wall -Wextra -pedantic" cmake .. \
  -DBUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Debug -DWITH_MULTIARCH=ON && make -j8
test/next_prime2

The test/next_prime2 completes successfully:

...
primesieve_iterator: cannot generate primes > 2^64
next_prime(18446744073709551615) = PRIMESIEVE_ERROR:   OK
primesieve_iterator: cannot generate primes > 2^64
next_prime(18446744073709551615) = PRIMESIEVE_ERROR:   OK

All tests passed successfully!



Is this a known GCC issue? If needed I could also try to write a minimal test
that reproduces this issue.

Re: --target=powerpc64-linux_altivec: Use rs6000_linux64_override_options()?

2022-10-28 Thread Segher Boessenkool
Hi!

On Fri, Oct 28, 2022 at 07:34:24PM +0200, Jan-Benedict Glaw wrote:
> While checking my bot build logs, I noticed that GCC configured for
> --target=powerpc64-linux_altivec will pull in linux64.h and
> linuxaltivec.h .
> 
> linux64.h
>   * Will "#define TARGET_USES_LINUX64_OPT 1" (to make static void
> rs6000_linux64_override_options() available in rs6000.cc).
>   * Will "#define SUBSUBTARGET_OVERRIDE_OPTIONS" to use
> rs6000_linux64_override_options().
> 
> linuxaltivec.h OTOH
>   * Will undef / "#define SUBSUBTARGET_OVERRIDE_OPTIONS  rs6000_altivec_abi = 1"
> and thus no longer use rs6000_linux64_override_options()
>   * That triggers a warning (unused-function).
> 
> To silence that warning, should linuxaltivec.h undefine
> TARGET_USES_LINUX64_OPT? Or set rs6000_altivec_abi=1 and call
> rs6000_linux64_override_options()?

Why do you use powerpc64-linux_altivec?  This thing (normally spelled
with a dash, not an underscore, btw) was made for 32-bit targets.  It
never has done anything useful for 64-bit targets, afaik?

(And not for 32-bit targets either really, but that is another issue.)


Segher


[Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.4.0
             Status|RESOLVED                    |NEW
   Target Milestone|---                         |11.4
   Last reconfirmed|                            |2022-10-28
            Summary|Segmentation fault with     |[11/12/13 Regression]
                   |vectorized code.            |Segmentation fault with
                   |                            |vectorized code.
         Resolution|INVALID                     |---
     Ever confirmed|0                           |1
      Known to fail|                            |11.1.0

--- Comment #2 from Andrew Pinski  ---
I think this code is undefined as x/y are arrays of size 1 but you access one
past.

But here is the main which makes this well defined:
int main(void)
{
  double x[2] = {0,0}, y[2] = {0,0};
  return dot(1, &x[0], 4096*4096, &y[0]);
}

Still an issue on the trunk.
Confirmed.

[Bug tree-optimization/107451] Segmentation fault with vectorized code.

2022-10-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Jakub Jelinek  ---
The bug is in the testcase:
gcc -fsanitize=undefined,address -g -o /tmp/pr107451{,.c}; /tmp/pr107451
=
==2296364==ERROR: AddressSanitizer: stack-buffer-overflow on address
0x7ffca382d798 at pc 0x0040148c bp 0x7ffca382d680 sp 0x7ffca382d678
READ of size 8 at 0x7ffca382d798 thread T0
#0 0x40148b in dot /tmp/pr107451.c:9
#1 0x4019f8 in main /tmp/pr107451.c:21
#2 0x7f8c74de858f in __libc_start_call_main (/lib64/libc.so.6+0x2958f)
#3 0x7f8c74de8648 in __libc_start_main@GLIBC_2.2.5
(/lib64/libc.so.6+0x29648)
#4 0x4010f4 in _start (/tmp/pr107451+0x4010f4)

Address 0x7ffca382d798 is located in stack of thread T0 at offset 40 in frame
#0 0x401922 in main /tmp/pr107451.c:19

  This frame has 2 object(s):
[32, 40) 'x' (line 20) <== Memory access at offset 40 overflows this
variable
[64, 72) 'y' (line 20)
HINT: this may be a false positive if your program uses some custom stack
unwind mechanism, swapcontext or vfork
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /tmp/pr107451.c:9 in dot
Shadow bytes around the buggy address:
  0x1000146fdaa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdad0: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
  0x1000146fdae0: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 f1 f1
=>0x1000146fdaf0: f1 f1 00[f2]f2 f2 00 f3 f3 f3 00 00 00 00 00 00
  0x1000146fdb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==2296364==ABORTING

x[ix+1] or y[ix+1] when ix is 0 and x is &x in main or y is &y in main
is an out of bounds access.

[Bug tree-optimization/107451] New: Segmentation fault with vectorized code.

2022-10-28 Thread bartoldeman at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Bug ID: 107451
   Summary: Segmentation fault with vectorized code.
   Product: gcc
   Version: 11.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
    int i, ix;
    double dot[4] = { 0.0, 0.0, 0.0, 0.0 };

    ix = 0;
    for (i = 0; i < n; i++) {
        dot[0] += x[ix]   * y[ix];
        dot[1] += x[ix+1] * y[ix+1];
        dot[2] += x[ix]   * y[ix+1];
        dot[3] += x[ix+1] * y[ix];
        ix += inc_x;
    }

    return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
    double x = 0, y = 0;
    return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021),
and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18

What happens here is that the vinsertf128 instructions take the elements from
one loop iteration later and put them in the high halves of ymm0 and ymm1.
The vpermpd instructions then throw away those high halves again, so e.g. they
turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.

So the result is correct, but the superfluous vinsertf128 instructions access
memory potentially past the end of x or y and thus produce a segfault.
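
The lane shuffles described here can be checked with a small scalar model of
vpermpd. This is only a sketch: permute4 is a hypothetical helper, and it
assumes the usual imm8 encoding in which bits 2i+1:2i of the immediate select
the source lane for destination lane i.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of vpermpd on four double lanes: destination
// lane i takes the source lane selected by bits [2i+1:2i] of the immediate.
std::array<double, 4> permute4(const std::array<double, 4> &src,
                               std::uint8_t imm)
{
  std::array<double, 4> dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}
```

With source lanes 1,2,3,4, the immediates 0x11 and 0x14 select only lanes 1 and
2, so the two high lanes filled by vinsertf128 never reach the result.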

related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387
may also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(the particular comment shows very similar code but it's for GCC 12 which
vectorizes by default, OpenBLAS worked around this by disabling the tree
vectorizer there but only on Mac OS and Windows).

Re: [PATCH] docs: document sanitizers can trigger warnings

2022-10-28 Thread Eric Gallager via Gcc-patches
On Wed, Oct 26, 2022 at 7:09 AM Martin Liška  wrote:
>
> PR sanitizer/107298
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Document sanitizers can trigger warnings.
> ---
>  gcc/doc/invoke.texi | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 64f77e8367a..1ffbba16a72 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16460,6 +16460,10 @@ by this option.
>
>  @end table
>
> +Note the enabled sanitizer options tend to increase a false-positive rate
> +of selected warnings, most notably @option{-Wmaybe-uninitialized}.
> +And thus we recommend to disable @option{-Werror}.
> +

I'd recommend rewording the second sentence there as:
"Thus, GCC developers recommend disabling @option{-Werror} when using
sanitizer options."

>  While @option{-ftrapv} causes traps for signed overflows to be emitted,
>  @option{-fsanitize=undefined} gives a diagnostic message.
>  This currently works only for the C family of languages.
> --
> 2.38.0
>


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Eric Botcazou via Gcc-patches
> COMPARE may also set CC register to a constant when both operands are
> known constants.

No, a COMPARE is never evaluated alone, only the CC user may be evaluated.

-- 
Eric Botcazou




Re: RFC - VRP1 default mode

2022-10-28 Thread Eric Botcazou via Gcc-patches
> I get a clean testsuite run configured and bootstrapped with
> 
> --enable-languages=c,c++,go,fortran,ada,obj-c++,jit --enable-host-shared
> 
> Is there a PR or specific tests in either fortran or ada for those
> improvements? ie, something specific I should check for? Part of rangers
> point is to be able to do symbolic relationships without storing the
> symbolic in the range, just picking it up from the IL as needed.

The motivating Ada example for symbolic ranges was gnat.dg/opt40.adb.

-- 
Eric Botcazou




--target=powerpc64-linux_altivec: Use rs6000_linux64_override_options()?

2022-10-28 Thread Jan-Benedict Glaw
Hi!

While checking my bot build logs, I noticed that GCC configured for
--target=powerpc64-linux_altivec will pull in linux64.h and
linuxaltivec.h .

linux64.h
  * Will "#define TARGET_USES_LINUX64_OPT 1" (to make static void
rs6000_linux64_override_options() available in rs6000.cc).
  * Will "#define SUBSUBTARGET_OVERRIDE_OPTIONS" to use
rs6000_linux64_override_options().

linuxaltivec.h OTOH
  * Will undef / "#define SUBSUBTARGET_OVERRIDE_OPTIONS  rs6000_altivec_abi = 1"
and thus no longer use rs6000_linux64_override_options()
  * That triggers a warning (unused-function).

To silence that warning, should linuxaltivec.h undefine
TARGET_USES_LINUX64_OPT? Or set rs6000_altivec_abi=1 and call
rs6000_linux64_override_options()?

Thanks,
  Jan-Benedict

-- 


signature.asc
Description: PGP signature


Re: [PATCH] c++: -Wdangling-reference and system headers

2022-10-28 Thread Jason Merrill via Gcc-patches

On 10/27/22 11:39, Marek Polacek wrote:

I got this testcase:

   auto f() -> std::optional<std::string>;
   for (char c : f().value()) { }

which has a dangling reference: std::optional::value returns
a reference to the contained value, but here it's the f() temporary.
We warn, which is great, but only with -Wsystem-headers, because
the function comes from a system header and warning_enabled_at used
in do_warn_dangling_reference checks diagnostic_report_warnings_p,
which in this case returned false so we didn't warn.
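
For illustration, one way to avoid the dangling reference in user code is to
give the temporary a name so that it outlives the loop. A minimal sketch,
assuming the optional holds a std::string; the body of f and the count_chars
helper are made up for the example:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the f() of the testcase; the element type
// std::string and the returned value are assumptions for this sketch.
std::optional<std::string> f()
{
  return std::string("abc");
}

// Safe variant: the named local keeps the optional alive for the whole
// loop, so the reference returned by value() stays valid.
std::size_t count_chars()
{
  auto opt = f();
  std::size_t n = 0;
  for (char c : opt.value())
    {
      (void) c;
      ++n;
    }
  return n;
}
```

Iterating over f().value() directly would instead bind the loop's range to a
member of a temporary that is destroyed before the first iteration, which is
the situation the warning is meant to flag.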

Fixed as below.  I could also override dc_warn_system_headers so that
the warning is enabled in system headers always.  With that, I found one
issue in libstdc++:

libstdc++-v3/include/bits/fs_path.h:1265:15: warning: possibly dangling reference to a temporary [-Wdangling-reference]
 1265 |         auto& __last = *--end();
      |               ^~~~~~

which looks like a true positive as well.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* call.cc (maybe_warn_dangling_reference): Enable the warning in
system headers if the decl isn't in a system header.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference4.C: New test.
---
  gcc/cp/call.cc   |  7 +++
  gcc/testsuite/g++.dg/warn/Wdangling-reference4.C | 14 ++
  2 files changed, 21 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference4.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 951b9fd2a88..c7c7a122045 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13539,6 +13539,13 @@ maybe_warn_dangling_reference (const_tree decl, tree init)
  return;
if (!TYPE_REF_P (TREE_TYPE (decl)))
  return;
+  /* Don't suppress the diagnostic just because the call comes from
+ a system header.  If the DECL is not in a system header, or if
+ -Wsystem-headers was provided, warn.  */
+  auto wsh
+= make_temp_override (global_dc->dc_warn_system_headers,
+ (!in_system_header_at (DECL_SOURCE_LOCATION (decl))
+  || global_dc->dc_warn_system_headers));


Hmm, this is OK, but maybe we want a 
warning_enabled_at_ignore_system_header?



if (tree call = do_warn_dangling_reference (init))
  {
auto_diagnostic_group d;
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C b/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C
new file mode 100644
index 000..aee7a29019b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wdangling-reference" }
+// Check that we warn here even without -Wsystem-headers.
+
+#include <optional>
+#include <string>
+
+auto f() -> std::optional<std::string>;
+
+void
+g ()
+{
+  for (char c : f().value()) { (void) c; } // { dg-warning "dangling reference" }
+}

base-commit: f95d3d5de72a1c43e8d529bad3ef59afc3214705




[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #30 from Andrew Pinski  ---
I have a use case for sizes up to 1k, except I don't need division. It will
come in handy while translating the P4 language
(https://p4.org/p4-spec/docs/P4-16-v-1.2.3.html) to C. P4 supports any bit size
you want, and there are some uses for > 128 for crypto; usually just a storage
area for the key at that point.

Re: [PATCH] libstdc++: std::to_chars std::{,b}float16_t support

2022-10-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 28, 2022 at 12:52:44PM -0400, Patrick Palka wrote:
> IIRC for hex formatting of denormals I opted to be consistent with how
> glibc printf formats them, instead of outputting the truly shortest
> form.

Note, it isn't just denormals,
1.18cp-4
2.318p-5
4.63p-6
8.c6p-7
463p-14
8c6p-15
also represent the same number, the first is what glibc emits (and
is certainly nicer to read), but some of the others are shorter.
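
The equivalence of the dotted spellings can be checked mechanically. A small
sketch using std::strtod, which accepts hexadecimal floating-point input;
hex_value is a hypothetical helper, not part of the patch:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a hex-float spelling such as "1.18cp-4"
// (written without the 0x prefix, as in the list above) into a double.
double hex_value(const std::string &digits)
{
  const std::string text = "0x" + digits;
  return std::strtod(text.c_str(), nullptr);
}
```

hex_value("1.18cp-4"), hex_value("2.318p-5"), hex_value("4.63p-6") and
hex_value("8.c6p-7") all compare equal, while the last two of those spellings
are one character shorter than the first two.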

Now, the printf %a/%A documentation says that there must be one hexadecimal
digit before the dot if any and that for normalized numbers it must be
non-zero.
So that rules out the last 2, and allows but doesn't require the denormal
treatment the library does right now.
If we shall go really for the shortest, we should handle denormals with
non-zero leading digit too and for all cases consider the 4 shifting
possibilities which one results in shortest (perhaps prefer the smallest
non-zero leading digit among the shortest)?
> > readelf -Ws libstdc++.so.6.0.31 | grep float16_t
> >    912: 000ae824   950 FUNC    GLOBAL DEFAULT   13 _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> >   5767: 000ae4a1   899 FUNC    GLOBAL DEFAULT   13 _ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> >    842: 0016d430   106 FUNC    LOCAL  DEFAULT   13 _ZN12_GLOBAL__N_113get_ieee_reprINS_23floating_type_float16_tEEENS_6ieee_tIT_EES3_
> >    865: 00170980  1613 FUNC    LOCAL  DEFAULT   13 _ZSt23__floating_to_chars_hexIN12_GLOBAL__N_123floating_type_float16_tEESt15to_chars_resultPcS3_T_St8optionalIiE.constprop.0.isra.0
> >   7205: 000ae824   950 FUNC    GLOBAL DEFAULT   13 _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format
> >   7985: 000ae4a1   899 FUNC    GLOBAL DEFAULT   13 _ZSt20__to_chars_float16_tPcS_fSt12chars_format
> > so 3568 code bytes together or so.
> 
> Ouch, the instantiation of __floating_to_chars_hex for float16 is
> responsible for nearly 50% of the .so size increase

True, but the increase isn't that huge.

Jakub



Re: [PATCH v2 3/3] p1689r5: initial support

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote:
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
> b/gcc/testsuite/g++.dg/modules/modules.exp
> index afb323d0efd..7fe8825144f 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -28,6 +28,7 @@
>  # { dg-module-do [link|run] [xfail] [options] } # link [and run]
>  
>  load_lib g++-dg.exp
> +load_lib modules.exp
>  
>  # If a testcase doesn't have special options, use these.
>  global DEFAULT_CXXFLAGS
> @@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
>  }
>  }
>  
> +# delete the specified set of dep files
> +proc cleanup_dep_files { files } {
> +foreach file $files {
> + file_on_host delete $file
> +}
> +}
> +
>  global testdir
>  set testdir $srcdir/$subdir
>  proc srcdir {} {
> @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set std_list [module-init $src]
>   foreach std $std_list {
>   set mod_files {}
> + set dep_files {}
>   global module_do
>   set module_do {"compile" "P"}
>   set asm_list {}
> @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set mod_files [find $DEFAULT_REPO *.gcm]
>   }
>   cleanup_module_files $mod_files
> +
> + cleanup_dep_files $dep_files
>   }
>  }
>  }

These `cleanup_dep_files` hunks are leftovers from my attempts at
getting the P1689 and flags tests working; they'll be gone in v3.

--Ben


Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread Ben Boeckel via Gcc
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel 
> > ---
> >  libcpp/ChangeLog  |  6 ++
> >  libcpp/charset.cc | 18 ++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  
> > +
> > +   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> > +   whether a C string is valid UTF-8 or not.
> > +   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  
> >  
> > * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.
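
As a rough illustration of the kind of yes/no check discussed in this thread
(not the libcpp implementation), a standalone validator could look like the
sketch below. The name is_valid_utf8 is hypothetical, and rejecting overlong
forms and surrogates is an assumption based on the usual definition of
well-formed UTF-8.

```cpp
// Hypothetical helper, not libcpp's code: returns true when the
// NUL-terminated string `s` is well-formed UTF-8.
bool is_valid_utf8(const char *s)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
  while (*p)
    {
      const unsigned char lead = *p++;
      if (lead < 0x80)
        continue;               // plain ASCII byte

      int extra = 0;            // continuation bytes still expected
      unsigned long cp = 0;     // code point being decoded
      unsigned long min = 0;    // smallest code point valid for this length
      if ((lead & 0xE0) == 0xC0)
        { extra = 1; cp = lead & 0x1F; min = 0x80; }
      else if ((lead & 0xF0) == 0xE0)
        { extra = 2; cp = lead & 0x0F; min = 0x800; }
      else if ((lead & 0xF8) == 0xF0)
        { extra = 3; cp = lead & 0x07; min = 0x10000; }
      else
        return false;           // stray continuation or invalid lead byte

      for (; extra > 0; --extra)
        {
          const unsigned char cont = *p;
          if ((cont & 0xC0) != 0x80)
            return false;       // also catches truncation at the NUL
          ++p;
          cp = (cp << 6) | (cont & 0x3F);
        }

      if (cp < min)
        return false;           // overlong encoding
      if (cp > 0x10FFFF)
        return false;           // beyond the Unicode range
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;           // UTF-16 surrogate
    }
  return true;
}
```

A caller that only needs the boolean answer can then write
`if (!is_valid_utf8 (name)) ...` without handling decoder state itself.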

Thanks,

--Ben


Re: [PATCH v4] RISC-V: Add support for inlining subword atomic operations

2022-10-28 Thread David Abdurachmanov via Gcc-patches
On Fri, Sep 2, 2022 at 1:09 PM Kito Cheng via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> LGTM with minor comments, it's time to move forward, thanks Patrick and
> Palmer.
>

Ping.

Any plans to finally land this one for GCC 13?

The hope is that this patch would make life significantly easier for
distributions. There are way too many packages failing to build due to
sub-word atomics, which is highly annoying considering that it's not
consistent between package versions. Build times on riscv64 are extremely
long which makes it even more annoying. Would love to see this finally
fixed.


> > +
> > +void
> > +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> > +  rtx *not_mask)
> > +{
> > +  /* Align the memory address to a word.  */
> > +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> > +
> > +  rtx aligned_addr = gen_reg_rtx (Pmode);
> > +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> > + gen_int_mode (-4, Pmode)));
> > +
> > +  *aligned_mem = change_address (mem, SImode, aligned_addr);
> > +
> > +  /* Calculate the shift amount.  */
> > +  *shift = gen_reg_rtx (SImode);
>
> Already allocated reg_rtx outside, this line could be removed.
>
> > +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> > + gen_int_mode (3, SImode)));
> > +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> > +gen_int_mode(3, SImode)));
> > +
> > +  /* Calculate the mask.  */
> > +  int unshifted_mask;
> > +  if (GET_MODE (mem) == QImode)
> > +unshifted_mask = 0xFF;
> > +  else
> > +unshifted_mask = 0xFFFF;
> > +
> > +  rtx mask_reg = gen_reg_rtx (SImode);
>
> Ditto.
>
> > @@ -152,6 +348,128 @@
> >DONE;
> >  })
> >
> > +(define_expand "atomic_compare_and_swap"
> > +  [(match_operand:SI 0 "register_operand" "");; bool output
> > +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> > +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> > +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> > +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> > +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> > +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> > +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> > +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> > +{
> > +  emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
> > +   operands[3], operands[4],
> > +   operands[6], operands[7]));
> > +
> > +  rtx val = gen_reg_rtx (SImode);
> > +  if (operands[1] != const0_rtx)
> > +emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
> > +  else
> > +emit_insn (gen_rtx_SET (val, const0_rtx));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +
> > +  rtx exp = gen_reg_rtx (SImode);
> > +  if (operands[3] != const0_rtx)
> > +emit_insn (gen_rtx_SET (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3])));
> > +  else
> > +emit_insn (gen_rtx_SET (exp, const0_rtx));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +
> > +  rtx compare = val;
> > +  if (exp != const0_rtx)
> > +{
> > +  rtx difference = gen_rtx_MINUS (SImode, val, exp);
> > +  compare = gen_reg_rtx (SImode);
> > +  emit_insn (gen_rtx_SET (compare, difference));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +}
> > +
> > +  if (word_mode != SImode)
> > +{
> > +  rtx reg = gen_reg_rtx (word_mode);
> > +  emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode, compare)));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
>
> > +  compare = reg;
> > +}
> > +
> > +  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx)));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
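
For readers following the quoted riscv_subword_address hunk, the address
arithmetic can be modeled on plain integers. This is a sketch, not the patch's
code: SubwordLayout and subword_layout are hypothetical names, the 0xFFFF
halfword mask and the final `mask << shift` and `~mask` steps are assumptions
(the quoted hunk stops before them), and little-endian byte order is assumed.

```cpp
#include <cstdint>

// Hypothetical result type: where a byte or halfword sits inside the
// aligned 32-bit word that contains it.
struct SubwordLayout
{
  std::uintptr_t aligned_addr;  // address rounded down to a multiple of 4
  unsigned shift;               // bit offset of the subword in that word
  std::uint32_t mask;           // bits occupied by the subword
  std::uint32_t not_mask;       // all other bits of the word
};

// size_bytes is 1 (QImode-like) or 2 (HImode-like); the access is assumed
// not to cross the aligned word.
SubwordLayout subword_layout(std::uintptr_t addr, unsigned size_bytes)
{
  SubwordLayout r;
  r.aligned_addr = addr & ~static_cast<std::uintptr_t>(3);  // addr & -4
  r.shift = static_cast<unsigned>(addr & 3) << 3;           // byte offset * 8
  const std::uint32_t unshifted = (size_bytes == 1) ? 0xFFu : 0xFFFFu;
  r.mask = unshifted << r.shift;
  r.not_mask = ~r.mask;
  return r;
}
```

For a halfword at address 0x1002 this gives the aligned address 0x1000, a
shift of 16 and a mask of 0xFFFF0000, which is the information an LR/SC loop
on the containing word would need.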


Re: [PATCH] libstdc++: std::to_chars std::{,b}float16_t support

2022-10-28 Thread Patrick Palka via Gcc-patches
On Thu, 27 Oct 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following patch on top of
> https://gcc.gnu.org/pipermail/libstdc++/2022-October/054849.html
> adds std::{,b}float16_t support for std::to_chars.
> When precision is specified (or for std::bfloat16_t for hex mode even if not),
> I believe we can just use the std::to_chars float (when float is mode
> compatible with std::float32_t) overloads, both formats are proper subsets
> of std::float32_t.
> Unfortunately when precision is not specified and we are supposed to emit
> shortest string, the std::{,b}float16_t strings are usually much shorter.
> E.g. 1.e7p-14f16 shortest fixed representation is
> 0.0001161 and shortest scientific representation is
> 1.161e-04 while 1.e7p-14f32 (same number promoted to std::float32_t)
> 0.00011610985 and
> 1.1610985e-04.
> Similarly for 1.38p-112bf16,
> 0.000000000000000000000000000000000235
> 2.35e-34 vs. 1.38p-112f32
> 0.00000000000000000000000000000000023472271
> 2.3472271e-34
> For std::float16_t there are differences even in the shortest hex, say:
> 0.01p-14 vs. 1p-22
> but only for denormal std::float16_t values (where all std::float16_t
> denormals converted to std::float32_t are normal), __FLT16_MIN__ and
> everything larger in absolute value than that is the same.  Unless
> that is a bug and we should try to discover shorter representations
> even for denormals...

IIRC for hex formatting of denormals I opted to be consistent with how
glibc printf formats them, instead of outputting the truly shortest
form.

I wouldn't be against using the float32 overloads even for shortest hex
formatting of float16.  The output is shorter but equivalent so it
shouldn't cause any problems.

> std::bfloat16_t has the same exponent range as std::float32_t, so all
> std::bfloat16_t denormals are also std::float32_t denormals and thus
> the shortest hex representations are the same.
> 
> As documented, ryu can handle arbitrary IEEE like floating point formats
> (probably not wider than IEEE quad) using the generic_128 handling, but
> ryu is hidden in libstdc++.so.  As only few architectures support
> std::float16_t right now and some of them have special ISA requirements
> for those (e.g. on i?86 one needs -msse2) and std::bfloat16_t is right
> now supported only on x86 (again with -msse2), perhaps with aarch64/arm
> coming next if ARM is interested, but I think it is possible that more
> will be added later, instead of exporting APIs from the library to handle
> directly the std::{,b}float16_t overloads this patch instead exports
> functions which take a float which is a superset of those and expects
> the inline overloads to promote the 16-bit formats to 32-bit, then inside
> of the library it ensures they are printed right.
> With the added [[gnu::cold]] attribute because I think most users
> will primarily use these formats as storage formats and perform arithmetics
> in the excess precision for them and print also as std::float32_t the
> added support doesn't seem to be too large, on x86_64:
> readelf -Ws libstdc++.so.6.0.31 | grep float16_t
>    912: 000ae824   950 FUNC    GLOBAL DEFAULT   13 _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>   5767: 000ae4a1   899 FUNC    GLOBAL DEFAULT   13 _ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>    842: 0016d430   106 FUNC    LOCAL  DEFAULT   13 _ZN12_GLOBAL__N_113get_ieee_reprINS_23floating_type_float16_tEEENS_6ieee_tIT_EES3_
>    865: 00170980  1613 FUNC    LOCAL  DEFAULT   13 _ZSt23__floating_to_chars_hexIN12_GLOBAL__N_123floating_type_float16_tEESt15to_chars_resultPcS3_T_St8optionalIiE.constprop.0.isra.0
>   7205: 000ae824   950 FUNC    GLOBAL DEFAULT   13 _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format
>   7985: 000ae4a1   899 FUNC    GLOBAL DEFAULT   13 _ZSt20__to_chars_float16_tPcS_fSt12chars_format
> so 3568 code bytes together or so.

Ouch, the instantiation of __floating_to_chars_hex for float16 is
responsible for nearly 50% of the .so size increase

> 
> Tested with the attached test (which doesn't prove the shortest
> representation, just prints std::{,b}float16_t and std::float32_t
> shortest strings side by side, then tries to verify it can be
> emitted even into the exact sized range and can't be into range
> one smaller than that and tries to read what is printed
> back using from_chars float32_t overload (so there could be
> double rounding, but apparently there is none for the shortest strings).
> The only differences printed are for NaNs, where sNaNs are canonicalized
> to canonical qNaNs and as to_chars doesn't print NaN mantissa, even qNaNs
> other than the canonical one are read back just as the canonical NaN.
> 
> Also attaching what Patrick wrote to generate the pow10_adjustment_tab,
> for std::float16_t only 1.0, 10.0, 100.0, 1000.0 and 10000.0 are powers
> of 10 in the range because __FLT16_MAX__ is 65504.0, and all of the above
> are exactly 

Re: [PATCH v3] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Palmer Dabbelt

On Fri, 28 Oct 2022 02:37:13 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

I guess we don't really care about RV32E here, but could you add a
guard for that, just in case?

#ifdef __riscv_e
#error "rv32e unsupported"
#endif


Ah, thanks.  There's rv64e now too, but that's just an error message 
problem so probably not a big deal.



On Fri, Oct 28, 2022 at 4:39 PM Xiongchuan Tan via Gcc-patches
 wrote:


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

* configure.tgt: Add riscv support.
* config/riscv/asm.h: New file.
* config/riscv/sjlj.S: New file.
* config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

 libitm/config/riscv/asm.h|  54 +
 libitm/config/riscv/sjlj.S   | 144 +++
 libitm/config/riscv/target.h |  62 +++
 libitm/configure.tgt |   2 +
 4 files changed, 262 insertions(+)
 create mode 100644 libitm/config/riscv/asm.h
 create mode 100644 libitm/config/riscv/sjlj.S
 create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..bb515f2
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#if __riscv_xlen == 64
+#  define GPR_L ld
+#  define GPR_S sd
+#  define SZ_GPR 8
+#  define LEN_GPR 14
+#elif __riscv_xlen == 32
+#  define GPR_L lw
+#  define GPR_S sw
+#  define SZ_GPR 4
+#  define LEN_GPR 16 /* Extra padding to align the stack to 16 bytes */
+#else
+#  error Unsupported XLEN (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__riscv_flen) && __riscv_flen == 64
+#  define FPR_L fld
+#  define FPR_S fsd
+#  define SZ_FPR 8
+#elif defined(__riscv_flen) && __riscv_flen == 32
+#  define FPR_L flw
+#  define FPR_S fsw
+#  define SZ_FPR 4


Also check here for __riscv_flen being defined but neither 32 nor 64,
so that we error out if the Q extension is ever added.
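
A minimal sketch of the guard suggested above. The posted hunk is cut off
before its #else branch, so the soft-float fallback (SZ_FPR 0), the error
text and the fpr_slot_size helper are assumptions for illustration, not part
of the patch:

```cpp
/* Sketch only: select the FP load/store mnemonics by FLEN and reject
   any FLEN this file does not know about (e.g. 128 for the Q extension).  */
#if defined(__riscv_flen) && __riscv_flen == 64
#  define FPR_L fld
#  define FPR_S fsd
#  define SZ_FPR 8
#elif defined(__riscv_flen) && __riscv_flen == 32
#  define FPR_L flw
#  define FPR_S fsw
#  define SZ_FPR 4
#elif defined(__riscv_flen)
#  error Unsupported FLEN (must be 64-bit or 32-bit).
#else
#  define SZ_FPR 0  /* Assumption: no FP registers to save.  */
#endif

/* Hypothetical helper so the selection can be checked from C or C++.  */
static inline int fpr_slot_size (void) { return SZ_FPR; }
```

On a host without __riscv_flen this falls through to the assumed soft-float
branch; with a future FLEN of 128 it would stop at the #error instead of
silently saving nothing.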


diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
new file mode 100644
index 000..93f12ec
--- /dev/null
+++ b/libitm/config/riscv/sjlj.S
@@ -0,0 +1,144 @@
+#include "asmcfi.h"
+#include "asm.h"
+
+   .text
+   .align  2
+   .global _ITM_beginTransaction
+   .type   _ITM_beginTransaction, @function
+
+_ITM_beginTransaction:
+   cfi_startproc
+   mv a1, sp
+   addi sp, sp, -(LEN_GPR*SZ_GPR+ 12*SZ_FPR)


This expression appears 4 times; maybe define a macro such as
ADJ_STACK_SIZE to hold it?
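
A rough sketch of what such a macro could look like, with the arithmetic
checked on the host. ADJ_STACK_SIZE is the name floated in this review; the
parameterised form and the adj_stack_size helper are assumptions made here so
that the values can be tested outside a RISC-V build. In asm.h itself it
would more likely be the argument-free
`#define ADJ_STACK_SIZE (LEN_GPR*SZ_GPR + 12*SZ_FPR)`:

```cpp
/* Sketch only: frame size used by _ITM_beginTransaction.  */
#define ADJ_STACK_SIZE(len_gpr, sz_gpr, sz_fpr) \
  ((len_gpr) * (sz_gpr) + 12 * (sz_fpr))

/* The same arithmetic as a function, for checking on any host.
   LEN_GPR is 14 on RV64 and 16 on RV32 (two padding slots).  */
constexpr int
adj_stack_size (int xlen, int flen)
{
  return ADJ_STACK_SIZE (xlen == 64 ? 14 : 16, xlen / 8, flen / 8);
}

static_assert (adj_stack_size (64, 64) == 208, "RV64 with D");
static_assert (adj_stack_size (32, 32) == 112, "RV32 with F");
static_assert (adj_stack_size (64, 64) % 16 == 0, "16-byte aligned");
static_assert (adj_stack_size (32, 32) % 16 == 0, "16-byte aligned");
```

With the LEN_GPR values from asm.h, every XLEN/FLEN combination checked here
comes out as a multiple of 16, which is what the v3 padding was meant to
guarantee.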


+   cfi_adjust_cfa_offset(LEN_GPR*SZ_GPR+ 12*SZ_FPR)



diff --git a/libitm/config/riscv/target.h b/libitm/config/riscv/target.h
new file mode 100644
index 000..b8a1665
--- /dev/null
+++ b/libitm/config/riscv/target.h
@@ -0,0 +1,62 @@
+typedef struct gtm_jmpbuf
+  {
+long int pc;
+void *cfa;
+long int s[12]; /* Saved registers, s0 is fp */
+
+#if __riscv_xlen == 32
+/* Ensure that the stack is 16-byte aligned */
+long int padding[2];
+#endif
+
+/* FP saved registers */
+#if defined(__riscv_flen) && __riscv_flen == 64
+double fs[12];
+#elif defined(__riscv_flen) && __riscv_flen == 32
+float fs[12];


Same here: emit an #error if __riscv_flen is defined but is neither 64 nor 32.


[PATCH 12/15 V3] arm: implement bti injection

2022-10-28 Thread Andrea Corallo via Gcc-patches
Hi all,

please find attached the third iteration of this patch addressing review
comments.

Thanks

  Andrea

From e3001bd662b84dafeca200b52fc644b7bf81c4af Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 7 Apr 2022 11:51:56 +0200
Subject: [PATCH] [PATCH 12/15] arm: implement bti injection

Hi all,

this patch enables the Armv8.1-M Branch Target Identification mechanism
[1].

This is achieved by using the BTI pass made common with AArch64.

The pass iterates through the instructions and adds the necessary BTI
instructions at the beginning of every function and at every landing
pad targeted by indirect jumps.
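
To make the description above concrete, here is a toy model of the insertion
step. It is not the real pass: toy_insn, its indirect_target flag and
toy_insert_bti are hypothetical names, and the real pass works on RTL insns
rather than strings.

```cpp
#include <string>
#include <vector>

/* Toy instruction: TEXT is the mnemonic or label, INDIRECT_TARGET marks
   a label that an indirect jump may land on.  Hypothetical model, not
   GCC RTL.  */
struct toy_insn
{
  std::string text;
  bool indirect_target;
};

/* Return a copy of BODY with a "bti" placed at function entry and
   right after every label flagged as an indirect-jump target.  */
std::vector<std::string>
toy_insert_bti (const std::vector<toy_insn> &body)
{
  std::vector<std::string> out;
  out.push_back ("bti");          /* Function entry.  */
  for (const toy_insn &insn : body)
    {
      out.push_back (insn.text);
      if (insn.indirect_target)
        out.push_back ("bti");    /* Landing pad.  */
    }
  return out;
}
```

For a body of "mov r0, #0", ".L2:" (flagged) and "bx lr" this yields five
entries, with a "bti" first and another right after ".L2:".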

Best Regards

  Andrea

[1]


gcc/ChangeLog

2022-04-07  Andrea Corallo  

* config.gcc (arm*-*-*): Add 'aarch-bti-insert.o' object.
* config/arm/arm-protos.h: Update.
* config/arm/arm.cc (aarch_bti_enabled, aarch_bti_j_insn_p)
(aarch_pac_insn_p, aarch_gen_bti_c, aarch_gen_bti_j): New
functions.
* config/arm/arm.md (bti_nop): New insn.
* config/arm/t-arm (PASSES_EXTRA): Add 'arm-passes.def'.
(aarch-bti-insert.o): New target.
* config/arm/unspecs.md (UNSPEC_BTI_NOP): New unspec.
* config/arm/aarch-bti-insert.cc (rest_of_insert_bti): Update
to verify arch compatibility.
* config/arm/arm-passes.def: New file.

gcc/testsuite/ChangeLog

2022-04-07  Andrea Corallo  

* gcc.target/arm/bti-1.c: New testcase.
* gcc.target/arm/bti-2.c: Likewise.
---
 gcc/config.gcc   |  2 +-
 gcc/config/arm/arm-passes.def| 21 ++
 gcc/config/arm/arm-protos.h  |  2 +
 gcc/config/arm/arm.cc| 61 +---
 gcc/config/arm/arm.md|  7 
 gcc/config/arm/t-arm | 10 +
 gcc/config/arm/unspecs.md|  1 +
 gcc/testsuite/gcc.target/arm/bti-1.c | 12 ++
 gcc/testsuite/gcc.target/arm/bti-2.c | 58 ++
 9 files changed, 167 insertions(+), 7 deletions(-)
 create mode 100644 gcc/config/arm/arm-passes.def
 create mode 100644 gcc/testsuite/gcc.target/arm/bti-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bti-2.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2021bdf9d2f..004e1dfa8d8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -353,7 +353,7 @@ arc*-*-*)
;;
 arm*-*-*)
cpu_type=arm
-   extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o"
+   extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o 
aarch-bti-insert.o"
extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h 
arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
target_type_format_char='%'
c_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-passes.def b/gcc/config/arm/arm-passes.def
new file mode 100644
index 000..71d6b563640
--- /dev/null
+++ b/gcc/config/arm/arm-passes.def
@@ -0,0 +1,21 @@
+/* Arm-specific passes declarations.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 84764bf27ce..6befb6c4445 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -24,6 +24,8 @@
 
 #include "sbitmap.h"
 
+rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
+
 extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index fa0f9a61498..26d4c1502f2 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -23374,12 +23374,6 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch_bti_enabled ()
-{
-  return false;
-}
-
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -32992,6 +32986,61 @@ arm_current_function_pac_enabled_p (void)
   && !crtl->is_leaf));
 }
 
+/* Return TRUE if Branch Target Identification Mechanism is enabled.  */
+bool

[PATCH 10/15 V3] arm: Implement cortex-M return signing address codegen

2022-10-28 Thread Andrea Corallo via Gcc-patches
Hi all,

the third iteration of this patch is attached, addressing review comments.

Thanks

  Andrea

From b42e28be75f374a4e1a5943c8c9002e07dbcc567 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 20 Jan 2022 15:36:23 +0100
Subject: [PATCH] [PATCH 10/15] arm: Implement cortex-M return signing address
 codegen

Hi all,

this patch enables return address signing and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This signs LR using SP and stores the
result in R12.  R12 is then pushed onto the stack.

During the function epilogue R12 is popped and AUT R12, LR, SP is
used to verify that the content of LR is still valid before returning.
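
As a purely conceptual illustration of the sign-then-verify flow, the sketch
below models PAC and AUT in software. toy_pac is a stand-in mixing function
and NOT the architecture's real signing algorithm; toy_pac, toy_aut and the
explicit key parameter are hypothetical names introduced here.

```cpp
#include <cstdint>

/* Stand-in for the hardware signing function.  NOT the real algorithm;
   it is merely injective in LR for a fixed SP and KEY, so any change to
   LR changes the resulting code.  */
constexpr std::uint32_t
toy_pac (std::uint32_t lr, std::uint32_t sp, std::uint32_t key)
{
  return (((lr ^ key) << 7) | ((lr ^ key) >> 25)) * 0x9e3779b1u + sp;
}

/* Model of AUT: recompute the code from the current LR and SP and
   compare it with the value that was saved at function entry.  */
constexpr bool
toy_aut (std::uint32_t saved, std::uint32_t lr, std::uint32_t sp,
         std::uint32_t key)
{
  return toy_pac (lr, sp, key) == saved;
}
```

In this model the prologue would compute ip = toy_pac (lr, sp, key) and push
it; the epilogue pops it and toy_aut fails if LR was altered in between.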

Here is an example of a PAC-instrumented function prologue and epilogue:

void foo (void);

int main()
{
  foo ();
  return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
        pac     ip, lr, sp
        push    {r3, r7, ip, lr}
        add     r7, sp, #0
        bl      foo
        movs    r3, #0
        mov     r0, r3
        pop     {r3, r7, ip, lr}
        aut     ip, lr, sp
        bx      lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
at the same time.

E.g. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
        pacbti  ip, lr, sp
        push    {r3, r7, ip, lr}
        add     r7, sp, #0
        bl      foo
        movs    r3, #0
        mov     r0, r3
        pop     {r3, r7, ip, lr}
        aut     ip, lr, sp
        bx      lr

Following earlier upstream review suggestions, a test for varargs has
been added, and '-mtpcs-frame' is treated as incompatible with the
return address signing feature introduced here.

[1] 


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.h (arm_arch8m_main): Declare it.
* config/arm/arm.cc (arm_arch8m_main): Define it.
(arm_option_reconfigure_globals): Set arm_arch8m_main.
(arm_compute_frame_layout, arm_expand_prologue)
(thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_IP_LR_SP)
(UNSPEC_PACBTI_IP_LR_SP, UNSPEC_AUT_IP_LR_SP): Add unspecs.

gcc/testsuite/Changelog

2021-11-03  Andrea Corallo  

* gcc.target/arm/pac.h : New file.
* gcc.target/arm/pac-1.c : New test case.
* gcc.target/arm/pac-2.c : Likewise.
* gcc.target/arm/pac-3.c : Likewise.
* gcc.target/arm/pac-4.c : Likewise.
* gcc.target/arm/pac-5.c : Likewise.
* gcc.target/arm/pac-6.c : Likewise.
* gcc.target/arm/pac-7.c : Likewise.
* gcc.target/arm/pac-8.c : Likewise.
---
 gcc/config/arm/arm-protos.h  |  1 +
 gcc/config/arm/arm.cc| 77 +++-
 gcc/config/arm/arm.h |  4 ++
 gcc/config/arm/arm.md| 23 +
 gcc/config/arm/unspecs.md|  3 ++
 gcc/testsuite/gcc.target/arm/pac-1.c | 12 +
 gcc/testsuite/gcc.target/arm/pac-2.c | 11 
 gcc/testsuite/gcc.target/arm/pac-3.c | 11 
 gcc/testsuite/gcc.target/arm/pac-4.c | 10 
 gcc/testsuite/gcc.target/arm/pac-5.c | 28 ++
 gcc/testsuite/gcc.target/arm/pac-6.c | 18 +++
 gcc/testsuite/gcc.target/arm/pac-7.c | 32 
 gcc/testsuite/gcc.target/arm/pac-8.c | 34 
 gcc/testsuite/gcc.target/arm/pac.h   | 17 ++
 14 files changed, 268 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-6.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-7.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-8.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac.h

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cff7ff1da2a..84764bf27ce 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -379,6 +379,7 @@ extern int vfp3_const_double_for_bits (rtx);
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
   rtx);
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
+extern bool 

Re: Rust frontend patches v3

2022-10-28 Thread David Malcolm via Gcc-patches
On Fri, 2022-10-28 at 17:20 +0200, Arthur Cohen wrote:
> 
> 
> On 10/28/22 15:06, David Malcolm wrote:
> > On Fri, 2022-10-28 at 13:48 +0200, Arthur Cohen wrote:
> > > Hi David,
> > > 
> > > On 10/26/22 23:15, David Malcolm wrote:
> > > > On Wed, 2022-10-26 at 10:17 +0200,
> > > > arthur.co...@embecosm.com wrote:
> > > > > This is the fixed version of our previous patch set for gccrs -
> > > > > We've addressed the comments raised in our previous emails.

[...snip...]

> > 
> > I'm guessing that almost all of gccrs testing so far has been on
> > relatively small examples, so that even if the GC considers collecting,
> > the memory usage might not have exceeded the threshold for actually
> > doing the mark-and-sweep collection, and so no collection has been
> > happening during your testing.
> > 
> > In case you haven't tried yet, you might want to try adding:
> >    --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> > which IIRC forces the GC to actually do its mark-and-sweep collection
> > at every potential point where it might collect.
> 
> That's very helpful, thanks a lot. I've run our testsuite with these
> and found no issues, but we might consider adding that to our CI setup
> to make sure.

Great!   Though as noted, for libgccjit it slows the testsuite down
*massively*, so you might want to bear that in mind.  I'm doing it for
libgccjit because libgccjit looks like a "frontend" to the rest of the
GCC codebase, but it's a deeply weird one, and so tends to uncover
weird issues :-/

Dave

> 
> Kindly,
> 
> Arthur
> 
> > I use these params in libgccjit's test suite; it massively slows things
> > down, but it makes any GC misuse crash immediately even on minimal test
> > cases, rather than hiding problems until you have a big (and thus
> > nasty) test case.
> > 
> > Hope this is helpful
> > Dave
> > 
> > 
> > > 
> > > > Hope this is constructive
> > > > Dave
> > > > 
> > > 
> > > Thanks a lot for the input,
> > > 
> > > All the best,
> > > 
> > > Arthur
> > > 
> > > 
> > > 
> > > 
> > 




[Bug testsuite/106806] [13 regression] gcc.dg/tree-ssa/gen-vect-34.c fails after r13-2333-gca8f4e8af14869

2022-10-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106806

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 Target|powerpc64le-linux-gnu   |powerpc64le-linux-gnu
   |hppa-linux-gnu  |powerpc64-linux-gnuhppa-lin
   ||ux-gnu
  Build|powerpc64le-linux-gnu   |powerpc64le-linux-gnu
   |hppa-linux-gnu  |powerpc64-linux-gnu
   ||hppa-linux-gnu
   Host|powerpc64le-linux-gnu   |powerpc64le-linux-gnu
   |hppa-linux-gnu  |powerpc64-linux-gnu
   ||hppa-linux-gnu
 CC||bergner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #2 from seurer at gcc dot gnu.org ---
Also occurs on powerpc64 big endian.

[Bug testsuite/107073] New test case gcc.dg/tree-ssa/gen-vect-34.c from r13-2333-gca8f4e8af14869 fails

2022-10-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107073

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from seurer at gcc dot gnu.org ---
This is a duplicate of one I tracked down on little endian.

*** This bug has been marked as a duplicate of bug 106806 ***

[Bug testsuite/106806] [13 regression] gcc.dg/tree-ssa/gen-vect-34.c fails after r13-2333-gca8f4e8af14869

2022-10-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106806

--- Comment #1 from seurer at gcc dot gnu.org ---
*** Bug 107073 has been marked as a duplicate of this bug. ***

[Bug testsuite/107073] New test case gcc.dg/tree-ssa/gen-vect-34.c from r13-2333-gca8f4e8af14869 fails

2022-10-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107073

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
  Build|powerpc64-linux-gnu |powerpc64-linux-gnu,
   ||powerpc64le-linux-gnu
   Host|powerpc64-linux-gnu |powerpc64-linux-gnu,
   ||powerpc64le-linux-gnu
Summary|New test case   |New test case
   |gcc.dg/tree-ssa/gen-vect-34 |gcc.dg/tree-ssa/gen-vect-34
   |.c fails|.c from
   ||r13-2333-gca8f4e8af14869
   ||fails
 Target|powerpc64-linux-gnu |powerpc64-linux-gnu,
   ||powerpc64le-linux-gnu

--- Comment #1 from seurer at gcc dot gnu.org ---
Correction: It also fails on powerpc64 LE in the same way.

[Bug analyzer/107345] - -Wanayzer-null-dereference false positive with giving weird path infomation

2022-10-28 Thread geoffreydgr at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107345

--- Comment #5 from Geoffrey  ---
(In reply to David Malcolm from comment #3)
> Fixed on trunk for GCC 13 by the above patch.
> 
> Keeping open for backporting to GCC 12.

That is really great! Thanks a lot!

[Bug analyzer/107345] - -Wanayzer-null-dereference false positive with giving weird path infomation

2022-10-28 Thread geoffreydgr at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107345

--- Comment #4 from Geoffrey  ---
(In reply to David Malcolm from comment #3)
> Fixed on trunk for GCC 13 by the above patch.
> 
> Keeping open for backporting to GCC 12.

That is really great! Thanks a lot!

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread H.J. Lu via Gcc-patches
On Fri, Oct 28, 2022 at 1:35 AM Eric Botcazou  wrote:
>
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]))))
> >
> > as
> >
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]))))
> >
> > which leads to incorrect results since LTU on MODE_CC register isn't the
> > same as "unsigned less than" in x86 backend.
>
> That's not specific to the x86 back-end, i.e. it's a generic caveat.
>
> >   PR target/107172
> >   * config/i386/i386.md (UNSPEC_CC_NE): New.
> >   Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.
>
> FWIW the SPARC back-end uses a COMPARE instead of an UNSPEC here.

COMPARE may also set CC register to a constant when both operands are
known constants.
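
As I read the thread, the mismatch can be sketched in plain C++ as follows.
The function names are hypothetical and this only models the semantics; it is
not how the compiler represents any of it.

```cpp
#include <cstdint>

/* x86 NEG sets the carry flag exactly when its operand is nonzero.  */
constexpr bool
carry_after_neg (std::uint32_t x)
{
  return x != 0;
}

/* Intended meaning of (neg:SI (ltu:SI (reg:CCC flags) (const_int 0))):
   0 - carry, i.e. all ones when the carry flag is set.  */
constexpr std::uint32_t
neg_of_carry (bool carry)
{
  return carry ? 0xffffffffu : 0u;
}

/* What a generic folder computes once the flags register has been
   replaced by an integer constant and LTU is read as a plain
   "unsigned less than" on its two operands.  */
constexpr std::uint32_t
naive_neg_ltu (std::uint32_t a, std::uint32_t b)
{
  return a < b ? 0xffffffffu : 0u;
}
```

For a nonzero input the intended result is all ones, whereas folding the
comparison on the constants 1 and 0 gives zero, since 1 is never below 0 as an
unsigned number.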


-- 
H.J.


[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-28 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #48 from H.J. Lu  ---
(In reply to Roger Sayle from comment #47)
> I really don't believe that using UNSPEC here is the correct way to go, but
> it appears to be the (only?) approach that Segher is prepared to approve. 
> Hohum.

I wish we could avoid UNSPEC.

Re: Rust frontend patches v3

2022-10-28 Thread Arthur Cohen



On 10/28/22 15:06, David Malcolm wrote:

On Fri, 2022-10-28 at 13:48 +0200, Arthur Cohen wrote:

Hi David,

On 10/26/22 23:15, David Malcolm wrote:

On Wed, 2022-10-26 at 10:17 +0200, arthur.co...@embecosm.com wrote:

This is the fixed version of our previous patch set for gccrs -
We've addressed the comments raised in our previous emails.


[...snip...]

(Caveat: I'm not a global reviewer)

Sorry if this is answered in the docs in the patch kit, but a high-level
question: what's the interaction between gccrs and gcc's garbage
collector?  Are the only GC-managed objects (such as trees) either (a)
created near the end of gccrs, or (b) common globals created at
initialization and with GTY roots?


We only create trees at the last point of our compilation pipeline,
before directly writing them to the backend. This then calls a
`write_global_definitions` method, that we ported over directly from the
Go frontend. Among other things, this method has the role of preserving
trees from the GC using `go_preserve_from_gc()` (or
`rust_preserve_from_gc()` in our case).

Elsewhere in our pipeline, we never call any garbage-collection routines
or GC-related functions.


Are there any points where a collection happens within gccrs?  Or is
almost everything stored using gccrs's own data structures, and are
these managed in the regular (non-GC) heap?


This is correct. We have an AST representation, implemented using unique
pointers, which is then lowered to an HIR, also using unique pointers.


I skimmed the patches and see that gccrs uses e.g. std::vector,
std::unique_ptr, std::map, and std::string; this seems reasonable to
me, but it got me thinking about memory management strategies.

I see various std::map e.g. in Rust::Compile::Context; so e.g.
is the GC guaranteed never to collect whilst this is live?


This is a really interesting question, and I hope the answer is yes! But
I'm unsure as to how to enforce that, as I am not too familiar with the
GCC GC. I'm hoping someone else will weigh in. As I said, we do not do
anything particular with the GC during the execution of our
`CompileCrate` visitor, so hopefully it shouldn't run.


I'm guessing that almost all of gccrs testing so far has been on
relatively small examples, so that even if the GC considers collecting,
the memory usage might not have exceeded the threshold for actually
doing the mark-and-sweep collection, and so no collection has been
happening during your testing.

In case you haven't tried yet, you might want to try adding:
   --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
which IIRC forces the GC to actually do its mark-and-sweep collection
at every potential point where it might collect.


That's very helpful, thanks a lot. I've run our testsuite with these and
found no issues, but we might consider adding that to our CI setup to
make sure.


Kindly,

Arthur


I use these params in libgccjit's test suite; it massively slows things
down, but it makes any GC misuse crash immediately even on minimal test
cases, rather than hiding problems until you have a big (and thus
nasty) test case.

Hope this is helpful
Dave





Hope this is constructive
Dave



Thanks a lot for the input,

All the best,

Arthur








-- 
Gcc-rust mailing list
Gcc-rust@gcc.gnu.org
https://gcc.gnu.org/mailman/listinfo/gcc-rust



[pushed] c++: apply friend attributes sooner

2022-10-28 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --

Comparing attributes between declarations of a friend function has been
complicated by pushdecl happening before decl_attributes.  I assumed there
was some complicated reason we weren't calling decl_attributes here, but it
doesn't break anything.

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Call decl_attributes before do_friend.
---
 gcc/cp/decl.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index bc085f8fcce..c7f1937ea48 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14206,13 +14206,16 @@ grokdeclarator (const cp_declarator *declarator,
else if (decl && DECL_NAME (decl))
  {
set_originating_module (decl, true);
-   
+
if (initialized)
  /* Kludge: We need funcdef_flag to be true in do_friend for
 in-class defaulted functions, but that breaks grokfndecl.
 So set it here.  */
  funcdef_flag = true;
 
+   cplus_decl_attributes (&decl, *attrlist, 0);
+   *attrlist = NULL_TREE;
+
decl = do_friend (ctype, unqualified_id, decl,
  flags, funcdef_flag);
return decl;

base-commit: 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a
-- 
2.31.1



[Bug c++/107450] GCC accepts invalid program involving multiple template parameter packs

2022-10-28 Thread jlame646 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107450

--- Comment #1 from Jason Liam  ---
Also, if we remove one of the template parameters (say T3), then MSVC
accepts this code as well. Demo: https://godbolt.org/z/qacMzoT3q


Additionally, this current bug is most probably a duplicate of:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69623

Re: [PATCH] libstdc++: Make placeholders inline when inline variables are available

2022-10-28 Thread Jonathan Wakely via Gcc-patches

On 20/10/22 16:58 +0200, Arsen Arsenović wrote:

This slightly lowers the dependency of generated code on libstdc++.so.


Looks good, I'll test and push, thanks.
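
For context, a small sketch of user code that names a placeholder object and
is therefore affected by how _1 is declared. add_five is a hypothetical
function for illustration; whether a given build really ends up without a
reference to the shared library's symbol depends on the language mode and
optimisation, which this sketch does not verify.

```cpp
#include <functional>

/* User code that names a placeholder.  With the extern declarations the
   _1 object is defined in libstdc++.so; with inline variables the
   definition is available in the header itself.  */
int
add_five (int x)
{
  using namespace std::placeholders;
  auto add = std::bind (std::plus<int> (), _1, 5);
  return add (x);
}
```

Calling add_five (2) binds 2 to _1 and returns 7 either way; only the linkage
of the placeholder object changes.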


libstdc++-v3/ChangeLog:

* include/std/functional: Make placeholders inline, if possible.
---
libstdc++-v3/include/std/functional | 66 -
1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index d22acaa3cb8..b396e8dbbdc 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -285,35 +285,43 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   * simplify this with variadic templates, because we're introducing
   * unique names for each.
   */
-extern const _Placeholder<1> _1;
-extern const _Placeholder<2> _2;
-extern const _Placeholder<3> _3;
-extern const _Placeholder<4> _4;
-extern const _Placeholder<5> _5;
-extern const _Placeholder<6> _6;
-extern const _Placeholder<7> _7;
-extern const _Placeholder<8> _8;
-extern const _Placeholder<9> _9;
-extern const _Placeholder<10> _10;
-extern const _Placeholder<11> _11;
-extern const _Placeholder<12> _12;
-extern const _Placeholder<13> _13;
-extern const _Placeholder<14> _14;
-extern const _Placeholder<15> _15;
-extern const _Placeholder<16> _16;
-extern const _Placeholder<17> _17;
-extern const _Placeholder<18> _18;
-extern const _Placeholder<19> _19;
-extern const _Placeholder<20> _20;
-extern const _Placeholder<21> _21;
-extern const _Placeholder<22> _22;
-extern const _Placeholder<23> _23;
-extern const _Placeholder<24> _24;
-extern const _Placeholder<25> _25;
-extern const _Placeholder<26> _26;
-extern const _Placeholder<27> _27;
-extern const _Placeholder<28> _28;
-extern const _Placeholder<29> _29;
+#if __cpp_inline_variables
+#  define _GLIBCXX_PLACEHOLDER inline
+#else
+#  define _GLIBCXX_PLACEHOLDER extern
+#endif
+
+_GLIBCXX_PLACEHOLDER const _Placeholder<1> _1;
+_GLIBCXX_PLACEHOLDER const _Placeholder<2> _2;
+_GLIBCXX_PLACEHOLDER const _Placeholder<3> _3;
+_GLIBCXX_PLACEHOLDER const _Placeholder<4> _4;
+_GLIBCXX_PLACEHOLDER const _Placeholder<5> _5;
+_GLIBCXX_PLACEHOLDER const _Placeholder<6> _6;
+_GLIBCXX_PLACEHOLDER const _Placeholder<7> _7;
+_GLIBCXX_PLACEHOLDER const _Placeholder<8> _8;
+_GLIBCXX_PLACEHOLDER const _Placeholder<9> _9;
+_GLIBCXX_PLACEHOLDER const _Placeholder<10> _10;
+_GLIBCXX_PLACEHOLDER const _Placeholder<11> _11;
+_GLIBCXX_PLACEHOLDER const _Placeholder<12> _12;
+_GLIBCXX_PLACEHOLDER const _Placeholder<13> _13;
+_GLIBCXX_PLACEHOLDER const _Placeholder<14> _14;
+_GLIBCXX_PLACEHOLDER const _Placeholder<15> _15;
+_GLIBCXX_PLACEHOLDER const _Placeholder<16> _16;
+_GLIBCXX_PLACEHOLDER const _Placeholder<17> _17;
+_GLIBCXX_PLACEHOLDER const _Placeholder<18> _18;
+_GLIBCXX_PLACEHOLDER const _Placeholder<19> _19;
+_GLIBCXX_PLACEHOLDER const _Placeholder<20> _20;
+_GLIBCXX_PLACEHOLDER const _Placeholder<21> _21;
+_GLIBCXX_PLACEHOLDER const _Placeholder<22> _22;
+_GLIBCXX_PLACEHOLDER const _Placeholder<23> _23;
+_GLIBCXX_PLACEHOLDER const _Placeholder<24> _24;
+_GLIBCXX_PLACEHOLDER const _Placeholder<25> _25;
+_GLIBCXX_PLACEHOLDER const _Placeholder<26> _26;
+_GLIBCXX_PLACEHOLDER const _Placeholder<27> _27;
+_GLIBCXX_PLACEHOLDER const _Placeholder<28> _28;
+_GLIBCXX_PLACEHOLDER const _Placeholder<29> _29;
+
+#undef _GLIBCXX_PLACEHOLDER
  }

  /**
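
For readers who have not seen the technique, the macro switch in the patch can be sketched outside libstdc++ as follows. This is only an illustration: `DEMO_PLACEHOLDER`, `DemoPlaceholder`, `demo_1`, `demo_2` and `placeholder_index` are hypothetical names, and only `__cpp_inline_variables` is the real feature-test macro. With a C++17 compiler the `inline` branch is taken, so the header itself defines the objects and no symbol from a shared library is needed.

```cpp
// Hypothetical stand-in for std::_Placeholder<N>.
template<int N>
struct DemoPlaceholder { };

// Use an inline variable when the compiler supports them (C++17), so the
// header itself provides the definition.  Otherwise fall back to an extern
// declaration, which needs a matching definition in some library or
// translation unit (not shown here).
#if __cpp_inline_variables
#  define DEMO_PLACEHOLDER inline
#else
#  define DEMO_PLACEHOLDER extern
#endif

DEMO_PLACEHOLDER const DemoPlaceholder<1> demo_1;
DEMO_PLACEHOLDER const DemoPlaceholder<2> demo_2;

#undef DEMO_PLACEHOLDER

// Recover the index from the placeholder's type without odr-using the object.
template<typename T>
struct placeholder_index;

template<int N>
struct placeholder_index<const DemoPlaceholder<N>>
{
  static constexpr int value = N;
};
```

The trait only inspects the declared type, so it works in both branches of the macro.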




Re: [PATCH v2] libstdc++: Don't use gstdint.h anymore

2022-10-28 Thread Jonathan Wakely via Gcc-patches

On 20/10/22 16:20 +0200, Arsen Arsenović wrote:

libstdc++-v3/ChangeLog:

* configure.ac: Stop generating gstdint.h.
* src/c++11/compatibility-atomic-c++0x.cc: Stop using gstdint.h.
---


> +using guintptr_t = __UINTPTR_TYPE__;

I think this should be local in the only function that uses it.

Sure.

Tested on x86_64-pc-linux-gnu.



Thanks, I'll test and push this.



libstdc++-v3/configure.ac| 6 --
libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc | 8 
2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 81d914b434a..c5ec976c026 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -440,12 +440,6 @@ GCC_CHECK_UNWIND_GETIPINFO

GCC_LINUX_FUTEX([AC_DEFINE(HAVE_LINUX_FUTEX, 1, [Define if futex syscall is 
available.])])

-if test "$is_hosted" = yes; then
-# TODO: remove this and change src/c++11/compatibility-atomic-c++0x.cc to
-# use <stdint.h> instead of <gstdint.h>.
-GCC_HEADER_STDINT(include/gstdint.h)
-fi
-
GLIBCXX_ENABLE_SYMVERS([yes])
AC_SUBST(libtool_VERSION)

diff --git a/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc 
b/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
index 5a0c5459088..e21bd76245d 100644
--- a/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
+++ b/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
@@ -22,7 +22,6 @@
// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
// <http://www.gnu.org/licenses/>.

-#include "gstdint.h"
#include 
#include 

@@ -119,13 +118,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _GLIBCXX_CONST __atomic_flag_base*
  __atomic_flag_for_address(const volatile void* __z) _GLIBCXX_NOTHROW
  {
-uintptr_t __u = reinterpret_cast<uintptr_t>(__z);
+using guintptr_t = __UINTPTR_TYPE__;
+guintptr_t __u = reinterpret_cast<guintptr_t>(__z);
__u += (__u >> 2) + (__u << 4);
__u += (__u >> 7) + (__u << 5);
__u += (__u >> 17) + (__u << 13);
-if (sizeof(uintptr_t) > 4)
+if (sizeof(guintptr_t) > 4)
  __u += (__u >> 31);
-__u &= ~((~uintptr_t(0)) << LOGSIZE);
+__u &= ~((~guintptr_t(0)) << LOGSIZE);
return flag_table + __u;
  }
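
As a standalone illustration of the function touched above, here is a minimal sketch of hashing an address into a small table without including any stdint header. `slot_for_address` and `kLogSize` are hypothetical names (the real file uses its own `LOGSIZE` constant and returns a pointer into its flag table); `__UINTPTR_TYPE__` is the macro predefined by GCC and Clang, so the sketch assumes one of those compilers.

```cpp
#include <cstddef>

// Hypothetical table size: 2^kLogSize slots.
constexpr int kLogSize = 4;

// Map an address to a slot index, mirroring the bit-mixing in the patch.
// __UINTPTR_TYPE__ is predefined by GCC and Clang, so no header is needed
// for the integer type.
std::size_t slot_for_address(const volatile void* p)
{
  using uintptr_type = __UINTPTR_TYPE__;  // local alias, as suggested in review
  uintptr_type u = reinterpret_cast<uintptr_type>(p);
  u += (u >> 2) + (u << 4);
  u += (u >> 7) + (u << 5);
  u += (u >> 17) + (u << 13);
  if (sizeof(uintptr_type) > 4)
    u += (u >> 31);
  u &= ~((~uintptr_type(0)) << kLogSize);  // keep only the low kLogSize bits
  return static_cast<std::size_t>(u);
}
```

The result is always below `1 << kLogSize`, and the same address always maps to the same slot.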





[Bug libstdc++/107376] regex executor requires allocator to be default constructible

2022-10-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107376

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |13.0

--- Comment #3 from Jonathan Wakely  ---
Fixed for GCC 13. Might be worth backporting too.

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-28 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 26 Oct 2022 at 21:07, Richard Sandiford
 wrote:
>
> Sorry for the slow response.  I wanted to find some time to think
> about this a bit more.
>
> Prathamesh Kulkarni  writes:
> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
> >  wrote:
> >>
> >> Richard Sandiford via Gcc-patches  writes:
> >> > Prathamesh Kulkarni  writes:
> >> >> Sorry to ask a silly question but in which case shall we select 2nd 
> >> >> vector ?
> >> >> For num_poly_int_coeffs == 2,
> >> >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
> >> >> If a1/trunc n1 succeeds,
> >> >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0.
> >> >> So, a1 has to be < n1.coeffs[0] ?
> >> >
> >> > Remember that a1 is itself a poly_int.  It's not necessarily a constant.
> >> >
> >> > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the selector:
> >> >
> >> >   { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... }
> >>
> >> Sorry, should have been:
> >>
> >>   { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... }
> > Hi Richard,
> > Thanks for the clarifications, and sorry for late reply.
> > I have attached POC patch that tries to implement the above approach.
> > Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for VLS 
> > vectors.
> >
> > For VLA vectors, I have only done limited testing so far.
> > It seems to pass couple of tests written in the patch for
> > nelts_per_pattern == 3,
> > and folds the following svld1rq test:
> > int32x4_t v = {1, 2, 3, 4};
> > return svld1rq_s32 (svptrue_b8 (), &v[0])
> > into:
> > return {1, 2, 3, 4, ...};
> > I will try to bootstrap+test it on SVE machine to test further for VLA 
> > folding.
> >
> > I have a couple of questions:
> > 1] When mask selects elements from same vector but from different patterns:
> > For eg:
> > arg0 = {1, 11, 2, 12, 3, 13, ...},
> > arg1 = {21, 31, 22, 32, 23, 33, ...},
> > mask = {0, 0, 0, 1, 0, 2, ... },
> > All have npatterns = 2, nelts_per_pattern = 3.
> >
> > With above mask,
> > Pattern {0, ...} selects arg0[0], ie {1, ...}
> > Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2], ie {1, 11, 2, ...}
> > While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs to 
> > different
> > pattern in arg0.
> > The result is:
> > res = {1, 1, 1, 11, 1, 2, ...}
> > In this case, res's 2nd pattern {1, 11, 2, ...} is encoded
> > with a0 = 1, a1 = 11, S = -9.
> > Is that expected tho ? It seems to create a new encoding which
> > wasn't present in the input vector. For instance, the next elem in
> > sequence would be -7,
> > which is not present originally in arg0.
>
> Yeah, you're right, sorry.  Going back to:
>
> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
> elements for any integer N.  This extended sequence can be reencoded
> as having N*Px patterns, with Ex staying the same.
>
> I guess we need to pick an N for the selector such that each new
> selector pattern (each one out of the N*Px patterns) selects from
> the *same pattern* of the same data input.
>
> So if a particular pattern in the selector has a step S, and the data
> input it selects from has Pi patterns, N*S must be a multiple of Pi.
> N must be a multiple of least_common_multiple(S,Pi)/S.
>
> I think that means that the total number of patterns in the result
> (Pr from previous messages) can safely be:
>
>   Ps * least_common_multiple(
> least_common_multiple(S[1], P[input(1)]) / S[1],
> ...
> least_common_multiple(S[Ps], P[input(Ps)]) / S[Ps]
>   )
>
> where:
>
>   Ps = the number of patterns in the selector
>   S[I] = the step for selector pattern I (I being 1-based)
>   input(I) = the data input selected by selector pattern I (I being 1-based)
>   P[I] = the number of patterns in data input I
>
> That's getting quite complicated :-)  If we allow arbitrary P[...]
> and S[...] then it could also get large.  Perhaps we should finally
> give up on the general case and limit this to power-of-2 patterns and
> power-of-2 steps, so that least_common_multiple becomes MAX.  Maybe that
> simplifies other things as well.
>
> What do you think?
Hi Richard,
Thanks for the suggestions. Yeah I suppose we can initially add support for
power-of-2 patterns and power-of-2 steps and try to generalize it in
follow up patches if possible.
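
To make the pattern-count formula quoted above concrete, here is a rough sketch of the arithmetic only. It is not GCC code: `SelectorPattern`, `pattern_multiplier` and `result_npatterns` are hypothetical names, and treating a zero step as needing no extra patterns is an assumption of this sketch (the formula as written would divide by zero there). It needs C++17 for `std::lcm`.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// One selector pattern: its step S and the number of patterns P in the data
// input it selects from.
struct SelectorPattern
{
  std::int64_t step;
  std::int64_t input_npatterns;
};

// Smallest N such that N * step is a multiple of input_npatterns,
// i.e. lcm(S, P) / S.  A zero step always picks the same element, so this
// sketch assumes a multiplier of 1 in that case.
std::int64_t pattern_multiplier(const SelectorPattern& p)
{
  std::int64_t s = p.step < 0 ? -p.step : p.step;
  if (s == 0)
    return 1;
  return std::lcm(s, p.input_npatterns) / s;
}

// Ps * lcm over all selector patterns I of (lcm(S[I], P[input(I)]) / S[I]).
std::int64_t result_npatterns(const std::vector<SelectorPattern>& sel)
{
  std::int64_t n = 1;
  for (const SelectorPattern& p : sel)
    n = std::lcm(n, pattern_multiplier(p));
  return static_cast<std::int64_t>(sel.size()) * n;
}
```

For the mask {0, 0, 0, 1, 0, 2, ...} discussed earlier (steps 0 and 1, both selecting from an input with 2 patterns), this gives 2 * 2 = 4 result patterns. With power-of-2 steps and pattern counts every `lcm` here reduces to a maximum, which is the simplification proposed in the thread.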

Sorry if this sounds like a silly question -- if each pattern in the
selector is going to select the *same pattern from the same input vector*,
then instead of re-encoding the selector to have N * Ps patterns, would it
make sense for the elements in the selector to denote the pattern number
itself instead of the element index when the input vectors are VLA?

For eg:
op0 = {1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 6, ...}
op1 = {...}
with npatterns == 4, nelts_per_pattern == 3,
sel = {0, 3} should pick pattern 0 and pattern 3 from op0,
so, res = {1, 4, 1, 5, 1, 6, ...}
Not sure if this is correct tho.
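
A rough sketch of what that pattern-number selection would compute, purely to illustrate the idea: `select_patterns` is a hypothetical helper, not an existing GCC function, and it assumes the encoded layout where the k-th element of pattern p sits at index k * npatterns + p.

```cpp
#include <cstddef>
#include <vector>

// Build the encoding of a result that keeps only the patterns listed in
// `sel`, in that order.  `encoded` holds npatterns * nelts_per_pattern
// elements, with pattern p's k-th element at encoded[k * npatterns + p].
std::vector<int> select_patterns(const std::vector<int>& encoded,
                                 std::size_t npatterns,
                                 std::size_t nelts_per_pattern,
                                 const std::vector<std::size_t>& sel)
{
  std::vector<int> result;
  result.reserve(sel.size() * nelts_per_pattern);
  for (std::size_t k = 0; k < nelts_per_pattern; ++k)
    for (std::size_t p : sel)
      result.push_back(encoded.at(k * npatterns + p));
  return result;
}
```

Applied to the op0 above with sel = {0, 3}, it returns the encoding {1, 4, 1, 5, 1, 6}, i.e. a result with 2 patterns and 3 elements per pattern.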

Thanks,
Prathamesh
>
> > I suppose it's fine since if the user defines mask to have pattern {0,
> > 1, 2, ...}
> > they intended result to have pattern with above encoding.
> > 
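
For reference, the "new encoding" effect described in question 1] can be reproduced with a small decoder for the stepped encoding. This is a sketch under the assumption that elements beyond the explicitly encoded ones continue from the last encoded element with step a2 - a1 when nelts_per_pattern == 3, and repeat the last element otherwise; `decode_prefix` is a hypothetical helper, not part of GCC.

```cpp
#include <cstddef>
#include <vector>

// Expand the first `count` elements of an encoded vector.  Pattern p's k-th
// encoded element is stored at encoded[k * npatterns + p].
std::vector<long> decode_prefix(const std::vector<long>& encoded,
                                std::size_t npatterns,
                                std::size_t nelts_per_pattern,
                                std::size_t count)
{
  std::vector<long> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    {
      std::size_t pattern = i % npatterns;
      std::size_t k = i / npatterns;  // position within the pattern
      if (k < nelts_per_pattern)
        {
          out.push_back(encoded.at(k * npatterns + pattern));
          continue;
        }
      // Past the explicit elements: continue from the last one, adding the
      // step (a2 - a1) per position when three elements were encoded.
      long last = encoded.at((nelts_per_pattern - 1) * npatterns + pattern);
      long step = 0;
      if (nelts_per_pattern == 3)
        step = last - encoded.at(npatterns + pattern);
      out.push_back(last
                    + static_cast<long>(k - (nelts_per_pattern - 1)) * step);
    }
  return out;
}
```

Decoding res = {1, 1, 1, 11, 1, 2, ...} with npatterns = 2 and nelts_per_pattern = 3 yields 1 and -7 as the next two elements, which is the -7 that does not occur in arg0.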

Re: RFC - VRP1 default mode

2022-10-28 Thread Andrew MacLeod via Gcc-patches



On 10/28/22 10:14, Richard Biener wrote:



On 28.10.2022 at 15:59, Andrew MacLeod wrote:



On 10/28/22 09:46, Richard Biener wrote:

On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:

On 10/28/22 03:17, Richard Biener wrote:

On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:

Figured I would ask what you guys think of making ranger the default for
the VRP1 pass now.

With partial equivalences and the other bits I checked in the past few
weeks I'm not aware of much that the legacy VRP pass gets that ranger
doesn't.  The only exception to that which I am aware of is the trick
played with the unreachable edges to set global ranges, but that is done
in the DOM passes now anyway... so it just happens slightly later in the
optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger driven VRP?

I have been working on that for the last 2 days.  Turns out VRP1 can
remove builtin_unreachable from the
if (X)
  __builtin_unreachable ()

idiom and set the appropriate global ranges, but it has to leave those
with 2 ssa-names:

if (a_1 != b_2)
  __builtin_unreachable()

until the second pass of VRP or we lose the relationship between a_1 and
b_2.  That triggers some failures.  Specifically a vectorizer failure
because it can't be sure that the start and end point are not the same
without the condition in the IL. Trying to store global relations over
multiple passes would be problematic at this stage of development, so I
don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?

So as I looked at builtin_unreachable(), it was very ad hoc.  That was one of the 
roots of that artificial testcase in the PR I opened. Cascading calls were not 
being handled in a consistent way. VRP1 removed some, dom removed some..  they 
just kind of disappeared at some point, but not consistently.  The PR that Uli 
opened that Aldy fixed, I could make fail again with minor adjustments to the 
conditions.  So I worked on a consistent approach.

My guess is the old range stored globally for that case for a_1 was probably 
~[b_2, b_2]  meaning it was carried in the range. Until we have an overall 
global relation tracker, we can't represent that across passes.

The global ranges were never symbolic, this was at most used during VRP itself.



Ah. Just took a closer look at what used to happen.

legacy vrp1 never removed the unreachable call, it hung around until the 
threadfull2 ran just before vrp2. The testcase was an artificial 
vectorizing test with an infinite loop and unreachable in the final 
block.  Just part of the inconsistent removal :-P:


Andrew
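
For readers following along, the two idioms discussed in this thread look roughly like this in source form. This is only an illustrative sketch with hypothetical function names; `__builtin_unreachable` is the GCC/Clang builtin, and what a given pass does with each form is as described in the messages above, not something this sketch demonstrates.

```cpp
// Single-name form: the condition constrains one value, so a range for x
// ([0, 9] here) can describe the fact even after the call is removed.
int digit_or_unreachable(int x)
{
  if (x < 0 || x > 9)
    __builtin_unreachable();
  return x;
}

// Two-name form: the condition relates two values.  The fact "a == b" is a
// relation between a and b, not a range of either one on its own.
int difference_known_equal(int a, int b)
{
  if (a != b)
    __builtin_unreachable();
  return a - b;
}
```

Both functions must only be called with arguments that keep the unreachable branch from executing, otherwise the behavior is undefined.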



[committed] libstdc++: Fix allocator propagation in regex algorithms [PR107376]

2022-10-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The PR points out that we assume the match_results allocator is default
constructible, which might not be true. We also have a related issue with
unwanted propagation from an object that might have an unequal
allocator.

Ideally we use the same allocator type for _State_info::_M_match_queue
but that would be an ABI change now. We should investigate if that can
be done without breaking anything, which might be possible because the
_Executor object is short-lived and never leaks out of the regex_match,
regex_search, and regex_replace algorithms. If we change the mangled
name for _Executor then there would be no ODR violations when mixing old
and new definitions. This commit does not attempt that.

libstdc++-v3/ChangeLog:

PR libstdc++/107376
* include/bits/regex_executor.h (_Executor::_Executor): Use same
allocator for _M_cur_results and _M_results.
* include/bits/regex_executor.tcc (_Executor::_M_main_dispatch):
Prevent possibly incorrect allocator propagating to
_M_cur_results.
* testsuite/28_regex/algorithms/regex_match/107376.cc: New test.
---
 libstdc++-v3/include/bits/regex_executor.h| 17 +++--
 libstdc++-v3/include/bits/regex_executor.tcc  |  3 +-
 .../28_regex/algorithms/regex_match/107376.cc | 76 +++
 3 files changed, 87 insertions(+), 9 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc

diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index dc0878ce678..cdafcd5523d 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -71,14 +71,15 @@ namespace __detail
_ResultsVec&__results,
const _RegexT&  __re,
_FlagT  __flags)
-  : _M_begin(__begin),
-  _M_end(__end),
-  _M_re(__re),
-  _M_nfa(*__re._M_automaton),
-  _M_results(__results),
-  _M_rep_count(_M_nfa.size()),
-  _M_states(_M_nfa._M_start(), _M_nfa.size()),
-  _M_flags(__flags)
+  : _M_cur_results(__results.get_allocator()),
+   _M_begin(__begin),
+   _M_end(__end),
+   _M_re(__re),
+   _M_nfa(*__re._M_automaton),
+   _M_results(__results),
+   _M_rep_count(_M_nfa.size()),
+   _M_states(_M_nfa._M_start(), _M_nfa.size()),
+   _M_flags(__flags)
   {
using namespace regex_constants;
if (__flags & match_prev_avail) // ignore not_bol and not_bow
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index b93e958075e..a5885ed34ba 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -124,9 +124,10 @@ namespace __detail
break;
  std::fill_n(_M_states._M_visited_states, _M_nfa.size(), false);
  auto __old_queue = std::move(_M_states._M_match_queue);
+ auto __alloc = _M_cur_results.get_allocator();
  for (auto& __task : __old_queue)
{
- _M_cur_results = std::move(__task.second);
+ _M_cur_results = _ResultsVec(std::move(__task.second), __alloc);
  _M_dfs(__match_mode, __task.first);
}
  if (__match_mode == _Match_mode::_Prefix)
diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc 
b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc
new file mode 100644
index 000..da4f7ad0a23
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc
@@ -0,0 +1,76 @@
+// { dg-do run { target c++11 } }
+#include 
+#include 
+#include 
+
+template<typename T>
+struct Alloc
+{
+  using value_type = T;
+  explicit Alloc(int) { }
+  template<typename U> Alloc(const Alloc<U>&) { }
+
+  T* allocate(std::size_t n)
+  { return std::allocator<T>().allocate(n); }
+  void deallocate(T* ptr, std::size_t n)
+  { std::allocator<T>().deallocate(ptr, n); }
+
+  bool operator==(const Alloc&) const { return true; }
+  bool operator!=(const Alloc&) const { return false; }
+};
+
+void
+test_non_default_constructible()
+{
+  using sub_match = std::sub_match<const char*>;
+  using alloc_type = Alloc<sub_match>;
+  using match_results = std::match_results<const char*, alloc_type>;
+  match_results res(alloc_type(1));
+
+  std::regex_match("x", res, std::regex(".")); // PR libstdc++/107376
+}
+
+template<typename T>
+struct PropAlloc
+{
+  int id;
+
+  using value_type = T;
+  explicit PropAlloc(int id) : id(id) { }
+  template<typename U> PropAlloc(const PropAlloc<U>& a) : id(a.id) { }
+
+  using propagate_on_container_move_assignment = std::true_type;
+  using propagate_on_container_copy_assignment = std::true_type;
+
+  PropAlloc select_on_container_copy_construction() const
+  { return PropAlloc(0); }
+
+  T* allocate(std::size_t n)
+  { return std::allocator<T>().allocate(n); }
+  void deallocate(T* ptr, std::size_t n)
+  { std::allocator<T>().deallocate(ptr, n); }
+
+  bool operator==(const 

[Bug libstdc++/107376] regex executor requires allocator to be default constructible

2022-10-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107376

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:988dd22ec6665117e8587389ac85389f1c321c45

commit r13-3548-g988dd22ec6665117e8587389ac85389f1c321c45
Author: Jonathan Wakely 
Date:   Tue Oct 25 13:03:12 2022 +0100

libstdc++: Fix allocator propagation in regex algorithms [PR107376]

The PR points out that we assume the match_results allocator is default
constuctible [sic: constructible], which might not be true. We also have a related issue with
unwanted propagation from an object that might have an unequal
allocator.

Ideally we use the same allocator type for _State_info::_M_match_queue
but that would be an ABI change now. We should investigate if that can
be done without breaking anything, which might be possible because the
_Executor object is short-lived and never leaks out of the regex_match,
regex_search, and regex_replace algorithms. If we change the mangled
name for _Executor then there would be no ODR violations when mixing old
and new definitions. This commit does not attempt that.

libstdc++-v3/ChangeLog:

PR libstdc++/107376
* include/bits/regex_executor.h (_Executor::_Executor): Use same
allocator for _M_cur_results and _M_results.
* include/bits/regex_executor.tcc (_Executor::_M_main_dispatch):
Prevent possibly incorrect allocator propagating to
_M_cur_results.
* testsuite/28_regex/algorithms/regex_match/107376.cc: New test.
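
The propagation issue this commit describes can be illustrated with a plain `std::vector` instead of the regex internals. The following is a minimal sketch, not the library code: `TaggedAlloc`, `Vec` and the two helper functions are hypothetical. It shows that a plain move assignment takes over the source's allocator when `propagate_on_container_move_assignment` is true, while first rebinding the source to the destination's allocator (the allocator-extended move constructor, as in the `_ResultsVec(std::move(...), __alloc)` change) keeps the destination's allocator.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stateful allocator that propagates on move assignment.
template<typename T>
struct TaggedAlloc
{
  using value_type = T;
  using propagate_on_container_move_assignment = std::true_type;

  int id;

  explicit TaggedAlloc(int i) : id(i) { }
  template<typename U> TaggedAlloc(const TaggedAlloc<U>& other) : id(other.id) { }

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template<typename T, typename U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return a.id == b.id; }

template<typename T, typename U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return a.id != b.id; }

using Vec = std::vector<int, TaggedAlloc<int>>;

// Plain move assignment: the destination ends up with the source's allocator.
int id_after_plain_move(int dst_id, int src_id)
{
  TaggedAlloc<int> dst_alloc(dst_id);
  Vec dst(dst_alloc);
  Vec src({1, 2, 3}, TaggedAlloc<int>(src_id));
  dst = std::move(src);
  return dst.get_allocator().id;
}

// Rebind the source to the destination's allocator first, so the allocator
// that propagates is the one the destination already had.
int id_after_rebound_move(int dst_id, int src_id)
{
  TaggedAlloc<int> dst_alloc(dst_id);
  Vec dst(dst_alloc);
  Vec src({1, 2, 3}, TaggedAlloc<int>(src_id));
  auto alloc = dst.get_allocator();
  dst = Vec(std::move(src), alloc);
  return dst.get_allocator().id;
}
```

When the two allocators compare unequal, the allocator-extended move has to move the elements one by one into new storage, which is the price paid for keeping the destination's allocator.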

[Bug c++/107439] use of static member function in requires-expression depends on declaration order

2022-10-28 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107439

--- Comment #2 from Patrick Palka  ---
So the question is if in C++20 mode we're allowed to reject ahead of time a
call to an unknown template-id with dependent template arguments and no
function arguments (as in the original testcase):

template<class T>
void f() {
  g<T>(); // OK? gcc rejects, clang/msvc accept
}

Re: [PATCH v3] LoongArch: Libvtv add loongarch support.

2022-10-28 Thread chenglulu



On 2022/10/28 17:38, WANG Xuerui wrote:

Hi,

The code change seems good but a few grammatical nits.

Patch subject should be a verb phrase, something like "libvtv: add 
LoongArch support" could be better.


Ok, thank you. I'll make the changes.




On 2022/10/28 16:01, Lulu Cheng wrote:
After several considerations, I decided to set VTV_PAGE_SIZE to 16KB 
under loongarch64.



v1 -> v2:

1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K.
2. In the vtv_malloc.cc file __vtv_malloc_init function, it does not check
   whether VTV_PAGE_SIZE is equal to the system page size, if the macro
   __loongarch_lp64 is defined.

v2 -> v3:

Set VTV_PAGE_SIZE to 16KB under loongarch64.



All regression tests of libvtv passed.

 === libvtv Summary ===

# of expected passes    176

-


Are the monologue and changelog supposed to be a part of the actual 
commit? If not, conventionally they should be placed *after* the "---" 
line separating the commit message and diffstat/patch content.




The loongarch64 kernel supports 4KB,16KB, or 64KB pages,
but only 16k pages are currently supported in this code.
This sentence feels a little bit unnatural. I suggest just "The 
LoongArch specification permits page sizes of 4KiB, 16KiB and 64KiB, 
but only 16KiB pages are supported for now".


Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64.

"for loongarch64" feels more natural.


What I want to say is that loongarch64 supports different page sizes,

but loongarch32 will be supported later, and loongarch32 only

supports 4KiB page sizes, so this is loongarch64.



libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
  include/vtv-change-permission.h | 5 +
  libvtv/configure.tgt    | 3 +++
  2 files changed, 8 insertions(+)

diff --git a/include/vtv-change-permission.h 
b/include/vtv-change-permission.h

index 70bdad92bca..f61d8b68ef6 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,11 @@ extern void __VLTChangePermission (int);
  #else
  #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
  #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch_lp64)
+/* The page size can be configured to 4, 16, or 64KB configuring the 
kernel.

"The page size is configurable by the kernel to be 4, 16 or 64 KiB."
+   However, only 16KB pages are supported here. Please modify this 
macro if you

+   want to support other page sizes.  */


Are we actually encouraging the users to modify the sources themselves 
if they decide to run with non-16KiB page size? This might not even be 
feasible, as you're essentially telling them to recompile part of the 
toolchain, which they may not want to, or cannot, do.


I think the message you want to convey here is for them to voice their 
need upstream so we can then discuss. In that case, the 2 sentences 
here could be:


"For now, only the default page size of 16KiB is supported. If you 
have a need for other page sizes, please get in touch."
Although I'm not sure if the vague "get in touch" wording is 
appropriate. What do others think?

I think ok, I can't think of a better way to say it.



+#define VTV_PAGE_SIZE 16384
  #else
  #define VTV_PAGE_SIZE 4096
  #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
  ;;
    x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
  ;;
+  loongarch*-*-linux*)
+    VTV_SUPPORTED=yes
+    ;;
    *)
  ;;
  esac
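
To illustrate the page-size concern raised in the review, here is a small sketch of comparing a compiled-in page size against what the running kernel reports. It is not libvtv code: `kDemoPageSize`, `runtime_page_size` and `page_size_matches` are hypothetical names, and the sketch assumes a POSIX system where `sysconf(_SC_PAGESIZE)` is available.

```cpp
#include <unistd.h>

// Hypothetical compile-time choice, following the patch's 16KiB value for
// loongarch64 and the existing 4KiB default.
#if defined(__loongarch_lp64)
constexpr long kDemoPageSize = 16384;
#else
constexpr long kDemoPageSize = 4096;
#endif

// Ask the running kernel for its page size (POSIX sysconf).
long runtime_page_size()
{
  return sysconf(_SC_PAGESIZE);
}

// True when a compiled-in value matches what the kernel reports.
bool page_size_matches(long compiled_in)
{
  return runtime_page_size() == compiled_in;
}
```

A caller would check `page_size_matches(kDemoPageSize)` at start-up; on a kernel configured with a different page size the check fails, which is the situation the reviewers are discussing.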


[Bug middle-end/107436] Is -fsignaling-nans still experimental?

2022-10-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107436

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
(In reply to Florian Schanda from comment #5)
> Richard, if I may rephrase your statement (for clarity), you're saying:
> 
> > Under your assumptions, -fsignaling-nans should work. There are no known 
> > bugs
> > in this setup, but if you find something please report it.
> 
> Is that accurate?

No.  See the See Also bugs referenced in this bug.

[Bug middle-end/107411] trivial-auto-var-init=zero invalid uninitialized variable warning

2022-10-28 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107411

qinzhao at gcc dot gnu.org changed:

   What|Removed |Added

 CC||qinzhao at gcc dot gnu.org

--- Comment #3 from qinzhao at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> 
> The gimplifier instead of
> 
>   _1 = t ();
>   D.2389 = _1;
>   e = &D.2389;
>   _2 = *e;
>   f (_2);
> 
> produces
> 
>   _1 = .DEFERRED_INIT (4, 2, &"D.2389"[0]);
>   D.2389 = _1;
>   e = .DEFERRED_INIT (8, 2, &"e"[0]);
>   _2 = t ();
>   D.2389 = _2;
>   e = &D.2389;
>   _3 = *e;
>   f (_3);
> 
> which is odd and sub-optimal at least.  Doing such things makes us rely
> on DSE to elide the uninit "inits".

It looks like "_1 = t ()" was not treated as an initializer, so "_1"
was identified as an uninitialized var. "e = &D.2389" has the same issue.
Should "_1 = t ()" be treated as an initializer to _1?

Re: RFC - VRP1 default mode

2022-10-28 Thread Richard Biener via Gcc-patches



> On 28.10.2022 at 15:59, Andrew MacLeod wrote:
> 
> 
>> On 10/28/22 09:46, Richard Biener wrote:
>>> On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:
>>> 
>>> On 10/28/22 03:17, Richard Biener wrote:
>>>> On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:
>>>>> Figured I would ask what you guys think of making ranger the default for
>>>>> the VRP1 pass now.
>>>>> 
>>>>> With partial equivalences and the other bits I checked in the past few
>>>>> weeks I'm not aware of much that the legacy VRP pass gets that ranger
>>>>> doesn't.  The only exception to that which I am aware of is the trick
>>>>> played with the unreachable edges to set global ranges, but that is done
>>>>> in the DOM passes now anyway... so it just happens slightly later in the
>>>>> optimization cycle.
>>>> Note DOM should go away at some point.  Why can this not happen during
>>>> ranger driven VRP?
>>> I have been working on that for the last 2 days.  Turns out VRP1 can
>>> remove builtin_unreachable from the
>>>if (X)
>>>  __builtin_unreachable ()
>>> 
>>> idiom and set the appropriate global ranges, but it has to leave those
>>> with 2 ssa-names:
>>> 
>>>if (a_1 != b_2)
>>>  __builtin_unreachable()
>>> 
>>> until the second pass of VRP or we lose the relationship between a_1 and
>>> b_2.  That triggers some failures.  Specifically a vectorizer failure
>>> because it can't be sure that the start and end point are not the same
>>> without the condition in the IL. Trying to store global relations over
>>> multiple passes would be problematic at this stage of development, so I
>>> don't see a problem with leaving it that way.
>> Hmm, I don't remember VRP1 doing anything special with the above though?
>> Did it somehow propagate the (un!)conditional equivalence?
> 
> So as I looked at builtin_unreachable(), it was very ad hoc.  That was one of the 
> roots of that artificial testcase in the PR I opened. Cascading calls were 
> not being handled in a consistent way. VRP1 removed some, dom removed some..  
> they just kind of disappeared at some point, but not consistently.  The PR 
> that Uli opened that Aldy fixed, I could make fail again with minor 
> adjustments to the conditions.  So I worked on a consistent approach.
> 
> My guess is the old range stored globally for that case for a_1 was probably 
> ~[b_2, b_2]  meaning it was carried in the range. Until we have an overall 
> global relation tracker, we can't represent that across passes.

The global ranges were never symbolic, this was at most used during VRP itself.

> 
> It appears that leaving those until VRP2 works fine...  testsuite currently 
> running tho ;-)
> 
> Andrew
> 


[Bug tree-optimization/107346] [13 Regression] gnat.dg/loop_optimization23_pkg.ad failure afer r13-3413-ge10ca9544632db

2022-10-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107346

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Andre Simoes Dias Vieira:

https://gcc.gnu.org/g:95decac3ce8c8c7c5302cd6fac005a10463de165

commit r13-3547-g95decac3ce8c8c7c5302cd6fac005a10463de165
Author: Andre Vieira 
Date:   Fri Oct 28 15:05:11 2022 +0100

vect: Reject non-byte offsets for gather/scatters [PR107346]

The ada failure reported in the PR was being caused by
vect_check_gather_scatter failing to deal with bit offsets that weren't
multiples of BITS_PER_UNIT.  This patch makes vect_check_gather_scatter
reject memory accesses with such offsets.

gcc/ChangeLog:

PR tree-optimization/107346
* tree-vect-data-refs.cc (vect_check_gather_scatter): Reject offsets
that aren't multiples of BITS_PER_UNIT.
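
The condition being rejected is simple to state in isolation. The following is only a sketch of the arithmetic, not the vectorizer code: `kBitsPerUnit` is a hypothetical stand-in for BITS_PER_UNIT (assumed to be 8 here) and `is_whole_byte_offset` is an illustrative helper.

```cpp
// Hypothetical stand-in for BITS_PER_UNIT on a target with 8-bit bytes.
constexpr long kBitsPerUnit = 8;

// A field can only be addressed as a byte offset from its base when its bit
// offset is a whole number of bytes.
constexpr bool is_whole_byte_offset(long bit_offset)
{
  return bit_offset % kBitsPerUnit == 0;
}
```

A field that follows a 4-bit member in a packed record starts at bit offset 4, which this check rejects, whereas a field at bit offset 32 passes.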

[Bug c++/107439] use of static member function in requires-expression depends on declaration order

2022-10-28 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107439

Patrick Palka  changed:

   What|Removed |Added

 CC||ppalka at gcc dot gnu.org

--- Comment #1 from Patrick Palka  ---
> This seems inconsistent, as member functions are normally expected to be 
> usable anywhere within the class definition.

IIUC associated constraints are part of the function signature and thus aren't
late parsed like the function body is, so later-declared members aren't
generally usable in a constraint.

Interestingly, if we add a dummy argument to 'check' then we accept the call
(and treat it as an ADL-enabled call to an unknown function template where
unqualified lookup found nothing):

struct A
{
  template<class T> requires (check<T>(0))
  auto func(T) { }

  template<class T>
  static consteval bool check(int) { return true; }
};

But if we then try to actually use func, its constraint will always fail due to
'check' not being visible at the point of use (since associated constraints
aren't late-parsed):

int main() {
  A a;
  a.func(0); // error: ‘check’ was not declared in this scope, and no
             // declarations were found by argument-dependent lookup at the
             // point of instantiation
}

This behavior (for the modified testcase) is correct AFAICT (Clang behaves the
same).

Re: RFC - VRP1 default mode

2022-10-28 Thread Andrew MacLeod via Gcc-patches



On 10/28/22 09:46, Richard Biener wrote:

On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:


On 10/28/22 03:17, Richard Biener wrote:

On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:

Figured I would ask what you guys think of making ranger the default for
the VRP1 pass now.

With partial equivalences and the other bits I checked in the past few
weeks I'm not aware of much that the legacy VRP pass gets that ranger
doesn't.  The only exception to that which I am aware of is the trick
played with the unreachable edges to set global ranges, but that is done
in the DOM passes now anyway... so it just happens slightly later in the
optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger driven VRP?

I have been working on that for the last 2 days.  Turns out VRP1 can
remove builtin_unreachable from the
if (X)
  __builtin_unreachable ()

idiom and set the appropriate global ranges, but it has to leave those
with 2 ssa-names:

if (a_1 != b_2)
  __builtin_unreachable()

until the second pass of VRP or we lose the relationship between a_1 and
b_2.  That triggers some failures.  Specifically a vectorizer failure
because it can't be sure that the start and end point are not the same
without the condition in the IL. Trying to store global relations over
multiple passes would be problematic at this stage of development, so I
don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?


So as I looked at builtin_unreachable(), it was very ad hoc.  That was one of 
the roots of that artificial testcase in the PR I opened. Cascading 
calls were not being handled in a consistent way. VRP1 removed some, dom 
removed some..  they just kind of disappeared at some point, but not 
consistently.  The PR that Uli opened that Aldy fixed, I could make fail 
again with minor adjustments to the conditions.  So I worked on a 
consistent approach.


My guess is the old range stored globally for that case for a_1 was 
probably ~[b_2, b_2]  meaning it was carried in the range. Until we have 
an overall global relation tracker, we can't represent that across passes.


It appears that leaving those until VRP2 works fine...  testsuite 
currently running tho ;-)


Andrew
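
For readers following the thread, the two shapes of the idiom discussed above can be sketched in C as follows. This is only an illustration under assumptions: the function names are hypothetical, and whether a given GCC version actually simplifies either function depends on the pass pipeline being discussed here.

```c
/* One-name form: the unreachable branch constrains a single value, so
   the fact "x <= 100" can be stored as a global range for x and the
   branch itself can be removed early.  */
int clamp_known_small (int x)
{
  if (x > 100)
    __builtin_unreachable ();
  return x > 100 ? 100 : x;	/* a range-aware pass may fold this to x */
}

/* Two-name form: the branch states a relation between start and end
   (they are never equal).  That relation is not a range of either
   name on its own, so it is lost once the branch is removed.  */
int sum_range (const int *a, int start, int end)
{
  if (start == end)
    __builtin_unreachable ();	/* promise: the loop runs at least once */
  int sum = 0;
  for (int i = start; i != end; i++)
    sum += a[i];
  return sum;
}
```

The second function is the kind of case where keeping the condition in the IL until a later pass still lets a consumer such as the vectorizer see that the start and end points differ.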



Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Oct 2022, Andre Vieira (lists) wrote:

> 
> On 24/10/2022 14:29, Richard Biener wrote:
> > On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:
> >
> >> Changing if-convert would merely change this testcase but we could still
> >> trigger using a different structure type, changing the size of Int24 to 32
> >> bits rather than 24:
> >> package Loop_Optimization23_Pkg is
> >>    type Nibble is mod 2**4;
> >>    type Int24  is mod 2**32;  -- Changed this from 24->32
> >>    type StructA is record
> >>      a : Nibble;
> >>      b : Int24;
> >>    end record;
> >>    pragma Pack(StructA);
> >>    type StructB is record
> >>      a : Nibble;
> >>      b : StructA;
> >>    end record;
> >>    pragma Pack(StructB);
> >>    type ArrayOfStructB is array(0..100) of StructB;
> >>    procedure Foo (X : in out ArrayOfStructB);
> >> end Loop_Optimization23_Pkg;
> >>
> >> This would yield a DR_REF (dr): (*x_7(D))[_1].b.b  where the last 'b' isn't
> >> a
> >> DECL_BIT_FIELD anymore, but the first one still is and still has the
> >> non-multiple of BITS_PER_UNIT offset. Thus passing the
> >> vect_find_stmt_data_reference check and triggering the
> >> vect_check_gather_scatter failure. So unless we go and make sure we always
> >> set
> >> the DECL_BIT_FIELD on all subsequent accesses of a DECL_BIT_FIELD 'struct'
> >> (which is odd enough on its own) then we are better off catching the issue
> >> in
> >> vect_check_gather_scatter ?
> > But it's not only an issue with scatter-gather, other load/store handling
> > assumes it can create a pointer to the start of the access and thus
> > requires BITS_PER_UNIT alignment for each of them.  So we need to fail
> > at data-ref analysis somehow.
> >
> > Richard.
> 
> Sorry for the delay on this, had some other things come in between. After our
> IRC discussion I believe we agreed that it would be neater to check this in
> vect_check_gather_scatter as I did in the original patch in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604139.html
> The main reasons being that to check earlier we'd need to walk the DR_REF to
> look for any FIELD_DECL that has DECL_BIT_FIELD set and we decided against
> that.
> 
> Can you confirm the original patch is OK for trunk?

Yes.

Thanks,
Richard.

> Kind regards,
> Andre


Re: RFC - VRP1 default mode

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:
>
>
> On 10/28/22 03:17, Richard Biener wrote:
> > On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:
> >> Figured I would ask what you guys think of making ranger the default for
> >> the VRP1 pass now.
> >>
> >> With partial equivalences and the other bits I checked in the past few
> >> weeks I'm not aware of much that the legacy VRP pass gets that ranger
> >> doesn't.  The only exception to that which I am aware of is the trick
> >> played with the unreachable edges to set global ranges, but that is done
> >> in the DOM passes now anyway... so it just happens slightly later in the
> >> optimization cycle.
> > Note DOM should go away at some point.  Why can this not happen during
> > ranger driven VRP?
>
> I have been working on that for the last 2 days.  Turns out VRP1 can
> remove builtin_unreachable from the
>   if (X)
>     __builtin_unreachable ()
>
> idiom and set the appropriate global ranges, but it has to leave those
> with 2 ssa-names:
>
>   if (a_1 != b_2)
>     __builtin_unreachable()
>
> until the second pass of VRP or we lose the relationship between a_1 and
> b_2.  That triggers some failures.  Specifically a vectorizer failure,
> because it can't be sure that the start and end points are not the same
> without the condition in the IL. Trying to store global relations over
> multiple passes would be problematic at this stage of development, so I
> don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?

> builtin_unreachable() calls from switches get removed during the second pass
> of switch-conversion... which I presume remains OK.
>
> Anyway, that's pretty much under control.  Patch probably coming later today.
>
>
>
> >> There is one test case that needs adjustment for
> >> that which was just checking for a mask in DOM2
> >> (gcc.dg/tree-ssa/pr107009.c).   At this point I am not aware of
> >> anything that I'd be concerned about, and the testsuite seems to run
> >> cleanly.
> > Did you enable Ada?  The only feature I don't see implemented is
> > symbolic range handling which boils down to general base + constant offset
> > range endpoints (that's what symbolic ranges allow).  That area was
> > specifically improved to optimize range checks emitted by the Ada frontend
> > but IIRC also applies to fortran -frange-check (not sure about test coverage
> > of that).
> I get a clean testsuite run configured and bootstrapped with
>
> --enable-languages=c,c++,go,fortran,ada,obj-c++,jit --enable-host-shared
>
> Is there a PR or specific tests in either fortran or ada for those
> improvements? i.e., something specific I should check for? Part of ranger's
> point is to be able to do symbolic relationships without storing the
> symbolic in the range, just picking it up from the IL as needed.

I'm deferring to Eric here.

Richard.

> Andrew
>
>
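
As a rough illustration of the "base + constant offset" endpoints mentioned above, here is a C sketch of a range check whose bounds depend on another variable. The function name and the window size are hypothetical, and nothing here is taken from the Ada or Fortran testsuites; it only shows the shape of check that symbolic ranges (or relations picked up from the IL) could prove redundant.

```c
/* Ada-style range check: i must lie in [lo, lo + 9].  The endpoints
   are symbolic: a base (lo) plus a constant offset.  Callers are
   assumed to pass lo small enough that lo + 9 does not overflow.  */
int get_in_window (const int *a, int lo, int i)
{
  if (i < lo || i > lo + 9)
    return -1;			/* first check: out of range */

  /* Redundant once the first check has passed.  Proving that needs
     the relation i >= lo, not a constant range for i.  */
  if (i < lo)
    return -2;

  return a[i - lo];
}
```

A pass that only tracks constant ranges cannot remove the second test, since neither `i` nor `lo` has a known numeric bound; one that tracks the relation between the two names can.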


Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-28 Thread Andre Vieira (lists) via Gcc-patches



On 24/10/2022 14:29, Richard Biener wrote:

On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:


Changing if-convert would merely change this testcase but we could still
trigger using a different structure type, changing the size of Int24 to 32
bits rather than 24:
package Loop_Optimization23_Pkg is
   type Nibble is mod 2**4;
   type Int24  is mod 2**32;  -- Changed this from 24->32
   type StructA is record
     a : Nibble;
     b : Int24;
   end record;
   pragma Pack(StructA);
   type StructB is record
     a : Nibble;
     b : StructA;
   end record;
   pragma Pack(StructB);
   type ArrayOfStructB is array(0..100) of StructB;
   procedure Foo (X : in out ArrayOfStructB);
end Loop_Optimization23_Pkg;

This would yield a DR_REF (dr): (*x_7(D))[_1].b.b  where the last 'b' isn't a
DECL_BIT_FIELD anymore, but the first one still is and still has the
non-multiple of BITS_PER_UNIT offset. Thus passing the
vect_find_stmt_data_reference check and triggering the
vect_check_gather_scatter failure. So unless we go and make sure we always set
the DECL_BIT_FIELD on all subsequent accesses of a DECL_BIT_FIELD 'struct'
(which is odd enough on its own) then we are better off catching the issue in
vect_check_gather_scatter ?

But it's not only an issue with scatter-gather, other load/store handling
assumes it can create a pointer to the start of the access and thus
requires BITS_PER_UNIT alignment for each of them.  So we need to fail
at data-ref analysis somehow.

Richard.


Sorry for the delay on this, had some other things come in between. 
After our IRC discussion I believe we agreed that it would be neater to 
check this in vect_check_gather_scatter as I did in the original patch 
in https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604139.html
The main reasons being that to check earlier we'd need to walk the 
DR_REF to look for any FIELD_DECL that has DECL_BIT_FIELD set and we 
decided against that.


Can you confirm the original patch is OK for trunk?

Kind regards,
Andre
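
The condition under discussion, that an access must start on a BITS_PER_UNIT boundary before a pointer to it can be formed, can be sketched in standalone C as below. This is not the patch itself: `SKETCH_BITS_PER_UNIT` and `bitpos_is_byte_addressable` are hypothetical names, and `CHAR_BIT` merely stands in for GCC's target macro.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's BITS_PER_UNIT.  */
#define SKETCH_BITS_PER_UNIT CHAR_BIT

/* Return true if an access starting BITPOS bits into an object begins
   on an addressable-unit boundary, i.e. a pointer to its first byte
   can be created.  Packed records can place fields at positions for
   which this is false.  */
bool bitpos_is_byte_addressable (long bitpos)
{
  return bitpos % SKETCH_BITS_PER_UNIT == 0;
}
```

With 8-bit units, a field placed 4 bits into a packed record fails the test while one placed 16 bits in passes; an analysis that wants to take the field's address would give up in the first case.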


