[PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-14 Thread Siddhesh Poyarekar
The size argument in strncmp only describe the maximum length to which
to compare two strings and is not an indication of sizes of the two
source strings.  Do not warn if it is larger than the two input strings
because it is entirely likely that the size argument is a conservative
maximum to accommodate inputs of different lengths and only a subset is
reachable through the current code path or that it is some other
application-specific property completely unrelated to the sizes of the
input strings.

gcc/ChangeLog:

middle-end/104854
* gimple-ssa-warn-access.cc
(pass_waccess::warn_zero_sized_strncmp_inputs): New function.
(pass_waccess::check_strncmp): Use it.

gcc/testsuite/ChangeLog:

middle-end/104854
* gcc.dg/Wstringop-overread.c (test_strncmp_array): Don't expect
failures for non-zero sizes.

Signed-off-by: Siddhesh Poyarekar 
---

Changes from v1:

A little better approach, ensuring that it tries to warn on zero length
inputs if the size of at least one of the two sources is known.

Also cc'ing Martin so that we can discuss approach on the list instead
of on the bug.  To summarize the discussion so far, Martin suggests that
the warning be split into levels but I'm contesting the utility of the
heuristics as a compiler warning given the looseness of the relationship
between the size argument and the inputs in the case of these functions.


 gcc/gimple-ssa-warn-access.cc | 69 +--
 gcc/testsuite/gcc.dg/Wstringop-overread.c |  2 +-
 2 files changed, 28 insertions(+), 43 deletions(-)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 75297ed7c9e..15299770e29 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2137,6 +2137,9 @@ private:
   /* Return true if use follows an invalidating statement.  */
   bool use_after_inval_p (gimple *, gimple *, bool = false);
 
+  /* Emit an overread warning for zero sized inputs to strncmp.  */
+  void warn_zero_sized_strncmp_inputs (gimple *, tree *, access_data *);
+
   /* A pointer_query object to store information about pointers and
  their targets in.  */
   pointer_query m_ptr_qry;
@@ -2619,8 +2622,20 @@ pass_waccess::check_stxncpy (gcall *stmt)
data.mode, , m_ptr_qry.rvals);
 }
 
-/* Check a call STMT to stpncpy() or strncpy() for overflow and warn
-   if it does.  */
+/* Warn for strncmp on a zero sized source or when an argument isn't
+   nul-terminated.  */
+void
+pass_waccess::warn_zero_sized_strncmp_inputs (gimple *stmt, tree *bndrng,
+ access_data *pad)
+{
+  tree func = get_callee_fndecl (stmt);
+  location_t loc = gimple_location (stmt);
+  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func, bndrng,
+   size_zero_node, pad);
+}
+
+/* Check a call STMT to strncmp () for overflow and warn if it does.  This is
+   limited to checking for NUL terminated arrays for now.  */
 
 void
 pass_waccess::check_strncmp (gcall *stmt)
@@ -2678,46 +2693,16 @@ pass_waccess::check_strncmp (gcall *stmt)
   if (!bndrng[0] || integer_zerop (bndrng[0]))
 return;
 
-  if (len1 && tree_int_cst_lt (len1, bndrng[0]))
-bndrng[0] = len1;
-  if (len2 && tree_int_cst_lt (len2, bndrng[0]))
-bndrng[0] = len2;
-
-  /* compute_objsize almost never fails (and ultimately should never
- fail).  Don't bother to handle the rare case when it does.  */
-  if (!compute_objsize (arg1, stmt, 1, , _ptr_qry)
-  || !compute_objsize (arg2, stmt, 1, , _ptr_qry))
-return;
-
-  /* Compute the size of the remaining space in each array after
- subtracting any offset into it.  */
-  offset_int rem1 = adata1.src.size_remaining ();
-  offset_int rem2 = adata2.src.size_remaining ();
-
-  /* Cap REM1 and REM2 at the other if the other's argument is known
- to be an unterminated array, either because there's no space
- left in it after adding its offset or because it's constant and
- has no nul.  */
-  if (rem1 == 0 || (rem1 < rem2 && lendata1.decl))
-rem2 = rem1;
-  else if (rem2 == 0 || (rem2 < rem1 && lendata2.decl))
-rem1 = rem2;
-
-  /* Point PAD at the array to reference in the note if a warning
- is issued.  */
-  access_data *pad = len1 ?  : 
-  offset_int maxrem = wi::max (rem1, rem2, UNSIGNED);
-  if (lendata1.decl || lendata2.decl
-  || maxrem < wi::to_offset (bndrng[0]))
-{
-  /* Warn when either argument isn't nul-terminated or the maximum
-remaining space in the two arrays is less than the bound.  */
-  tree func = get_callee_fndecl (stmt);
-  location_t loc = gimple_location (stmt);
-  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func,
-   bndrng, wide_int_to_tree (sizetype, maxrem),
-   pad);
-}
+  /* compute_objsize almost never fails (and ultimately should never fail).
+ Don't bother to handle the rare case 

Re: [PATCH] rs6000: Fix invalid address passed to __builtin_mma_disassemble_acc [PR104923]

2022-03-14 Thread Peter Bergner via Gcc-patches
I forgot to CC the gcc-patches mailing list on the original patch submission.
Adding it in now.  Sorry.



On 3/14/22 8:24 PM, Segher Boessenkool wrote:
>> gcc/
>>     PR target/104923
>>     * config/rs6000/predicates.md (mma_disassemble_output_operand):
>>     Restrict acceptable MEM addresses.
>>
>> gcc/testsuite/
>>     PR target/104923
>>     * gcc.target/powerpc/pr104923.c: New test.
> 
> Changelogs are indented with tabs, not with four spaces.  Maybe the
> commit script will fix this up though, dunno.

They are actually indented correctly.  I think when I was fixing one
of my Thunderbird setting for my IBM account, it reverted my settings
to always send in plain-text and probably munged the whitespace on
me.  I think I fixed that now.  Thanks for letting me know it was
broken.




> Including that tab the maximum line length is 80.  You don't have to end
> that first line with a colon (it looks like something is missing like
> this)

Ok, I can move "Restrict" to the line above and still stay within 80 chars.



>> +  if (MEM_P (op))
>> +    return indexed_or_indirect_address (XEXP (op, 0), mode)
>> +       || quad_address_p (XEXP (op, 0), mode, false);
> 
> The indent of this last line is wrong.  The || should align with
> indexed_or_indirect_address, and every leading eight spaces is a tab.

Again, it's indented correctly for me, but got munged when sending.
Hopefully fixed now.


> You might want to name that common expression, "rtx addr = XEXP (op, 0);"
> or something.  Dunno what is best

Will do.



> Please put that new MEM_P code first, followed by a blank line, and only
> then do the SUBREG thing.  As written it will allow subregs of mem.  And
> the blank line is important of course ;-)

Will do.


> Okay for trunk with those changes.  Also okay for 10 and 11 after an
> appropriate soak period.  Thanks!

Thanks.  I'll verify things are still working with the changes agreed
to above before committing.  Thanks.


> Btw.  A good cleanup would be to have mma_assemble_input_operand written
> like this, too?

Ok, I'll have a look at doing that as part of a separate change.

Peter



Re: [PATCH] libgomp : OMPD implementation

2022-03-14 Thread Mohamed Atef via Gcc-patches
On Tue, Mar 15, 2022 at 2:21 AM Mohamed Atef 
wrote:

> Hi Jakub,
> i will fix all the issues and i will use the script on the website.
> On Mon, Mar 14, 2022 at 6:23 PM Jakub Jelinek  wrote:
>
>> Hi!
>>
>> Sorry for the delay, GCC is currently in stage 4, which means a lot of us
>> concentrate on fixing GCC 12 so that it can be released soon and projects
>> that are clearly GCC 13 material are much lower priority.
>>
> Never mind, thank you anyway.
>
>>
>> On Wed, Feb 16, 2022 at 11:04:13PM +0200, Mohamed Atef via Gcc-patches
>> wrote:
>> > --- a/libgomp/ChangeLog
>> > +++ b/libgomp/ChangeLog
>> > @@ -1,3 +1,30 @@
>>
>> ChangeLog entries should be posted in the patches and eventually should be
>> in the commit message, but the patch shouldn't change ChangeLog files
>> directly, we have nightly scripts that handle that.
>>
>> > +2022-02-16  Mohamed Atef  
>> > +
>> > +* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
>>
>> For ChangeLog entries, https://gcc.gnu.org/contribute.html specifies
>> how it should be formatted.
>> E.g. there should be a tab instead of 8 spaces at the start of the lines
>> (except for the date/name/email line), the above has spaces.
>>
>> > +(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES,
>> libgompd_la_LINK,
>> > +libgompd_la_SOURCES, libgompd_version_dep,
>> libgompd_version_script,
>> > +libgompd.ver-sun, libgompd.ver, libgompd_version_info):
>> Defined.
>>
>> No trailing whitespace.  I think it would be better to use : New.
>>
>> > +* Makefile.in: Regenerate.
>> > +* aclocal.m4: Regenerate.
>> > +* config/darwin/plugin-suffix.h: Removed ().
>> > +* config/hpux/plugin-suffix.h: Removed ().
>> > +* config/posix/plugin-suffix.h: Removed ().
>>
>> These don't say what has changed, and the tense is wrong.  So should be
>> like:
>> * config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
>> or so.
>>
>> > +* configure: Regenerate.
>> > +* env.c: (#include "ompd-support.h") : Added.
>>
>> The above correctly mentions just the filename and not a particular thing
>> in it for the added include, but the rest is incorreclty formatted.
>> Should be:
>> * env.c: Include ompd-support.h
>> or so.
>>
>> > +(initialize_env) : Call ompd_load().
>>
>> There should be no space in between ) and :.  Just say
>> (initialize_env): Call ompd_load.
>> No need for the ()s in there.
>>
>> > +* parallel.c:(#include "ompd-support.h"): Added.
>>
>> See above.
>>
>> > +(GOMP_parallel) : Call ompd_bp_parallel_begin and
>> ompd_bp_parallel_end.
>> > +* libgomp.map: Add OMP_5.0.3 symobl versions.
>> > +* libgompd.map: New file.
>> > +* omp-tools.h.in : New file.
>> > +* omp-types.h.in : New file.
>> > +* ompd-support.h : New file.
>> > +* ompd-support.c : New file.
>> > +* ompd-helper.h : New file.
>>
>> See above for the superfluous spaces before :.
>>
>> > +* ompd-helper.c: New file.
>> > +* ompd-init.c: New file.
>> > +* testsuite/Makfile.in: Regenerate.
>>
>> Typo, Makefile.in instead of Makfile.in (the checking script would prevent
>> commit because of this).
>>
>> > --- a/libgomp/Makefile.am
>> > +++ b/libgomp/Makefile.am
>> ...
>> > +libgompd_version_info = -version-info $(libtool_VERSION)
>> >  libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script)
>> \
>> >  $(lt_host_flags)
>> > +libgompd_la_LDFLAGS = $(libgompd_version_info)
>> $(libgompd_version_script) \
>> > + $(lt_host_flags)
>> >  libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
>> > +libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
>> >  libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
>> > -
>> > +libgompd_la_LINK = $(LINK) $(libgompd_la_LDFLAGS)
>>
>> Please preserve the empty line above libgomp_la_SOURCES.
>>
>> > --- a/libgomp/Makefile.in
>> > +++ b/libgomp/Makefile.in
>> > @@ -1,7 +1,7 @@
>> > -# Makefile.in generated by automake 1.15.1 from Makefile.am.
>> > +# Makefile.in generated by automake 1.16.1 from Makefile.am.
>>
>> Please make sure you regenerate files with the exact autoconf/automake
>> versions as used previously, we don't want the generated files
>> jump backwards/forwards whenever somebody regenerates them.
>> You can e.g. build automake 1.15.1 and autoconf 2.69 if you don't have
>> those, make install DESTDIR=/your/home/automake-1.15.1 etc.
>> and then just regenerate with those in $PATH.
>>
> done.
>
>> > --- a/libgomp/env.c
>> > +++ b/libgomp/env.c
>> > @@ -1483,6 +1484,8 @@ initialize_env (void)
>> >   = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
>> >  }
>> >parse_int_secure ("GOMP_DEBUG", _debug_var, true);
>> > +  if(gomp_debug_var)
>> > + ompd_load();
>>
>> Formatting.  Again, https://gcc.gnu.org/contribute.html contains
>> the rules or just watch how does surrounding code look like.
>> The errors are:

Re: RFA: crc builtin functions & optimizations

2022-03-14 Thread Oleg Endo
On Mon, 2022-03-14 at 18:04 -0700, Andrew Pinski via Gcc-patches wrote:
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> > 
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> > 
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
> 
> 
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.
> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?
> Does it make sense to have to list the most common polynomials in the
> documentation?
> 
> Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.
> 
> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciocal.

In my own CRC library I've got ~30 'commonly used' CRC types, based on
the following generic definition:

template <
  // number of crc result bits (polynomial order in bits)
  unsigned int BitCount,

  // normal polynomial without the leading 1 bit.
  typename crc_impl_detail::select_int::type TruncPoly,

  // initial remainder
  typename crc_impl_detail::select_int::type InitRem = 0,

  // final xor value
  typename crc_impl_detail::select_int::type FinalXor = 0,

  // input data byte reflected before processing (LSB / MSB first)
  bool ReflectInput = false,

  // output CRC reflected before the xor
  bool ReflectRemainder = false >
class crc
{
...
};


and then it goes like ...

// CRC-1 (most hardware; also known as parity bit)
// x + 1
typedef crc < 1, 0x01 > crc_1;

// CRC-3
typedef crc < 3, 0x03, 0x07, 0x00, true, true> crc_3;

...

// CRC-32 (ISO 3309, ANSI X3.66, FIPS PUB 71, FED-STD-1003, ITU-T V.42, 
Ethernet, SATA, MPEG-2, Gzip, PKZIP, POSIX cksum, PNG, ZMODEM)
// x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + 
x^4 + x^2 + x + 1
typedef crc < 32, 0x04C11DB7, 0x, 0x, true, true > crc_32;

typedef crc < 32, 0x04C11DB7, 0x7FFF, 0x7FFF, false, false > 
crc_32_mpeg2;
typedef crc < 32, 0x04C11db7, 0x, 0x, false, false > 
crc_32_posix;

...


It then generates the lookup tables at compile time into .rodata for
the types that are used in the program, which is great for MCUs with
more flash/ROM than RAM.

Specific CRC types can be overridden if the system has a better way to
calculate the CRC, e.g. as hardware peripheral.

This being a library makes it relatively easy to tune and customize for
various systems.

How would that work together with your proposal?

Cheers,
Oleg



Ping^1 [PATCH, rs6000] Correct match pattern in pr56605.c

2022-03-14 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gentle ping this:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html
Thanks

On 28/2/2022 上午 11:17, HAO CHEN GUI wrote:
> Hi,
>   This patch corrects the match pattern in pr56605.c. The former pattern
> is wrong and test case fails with GCC11. It should match following insn on
> each subtarget after mode promotion is disabled. The patch need to be
> backported to GCC11.
> 
> //gimple
> _17 = (unsigned int) _20;
>  prolog_loop_niters.4_23 = _17 & 3;
> 
> //rtl
> (insn 19 18 20 2 (parallel [
> (set (reg:CC 208)
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3]))
> (const_int 0 [0])))
> (set (reg:SI 129 [ prolog_loop_niters.5 ])
> (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3])))
> ]) 197 {*andsi3_imm_mask_dot2}
> 
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no 
> regressions.
> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-02-28 Haochen Gui 
> 
> gcc/testsuite/
>   PR target/102146
>   * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index fdedbfc573d..231d808aa99 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>  ia[i] = (int) sb[i];
>  }
> 
> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
> \(reg:DI} 1 "combine" } } */
> 


Ping^1 [PATCH, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-14 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gentle ping this:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591507.html
Thanks

On 10/3/2022 下午 2:31, HAO CHEN GUI wrote:
> Hi,
>This patch adds V1TI mode into mode iterator used in vector comparison
> expands.With the patch, both built-ins and direct comparison could generate
> P10 new V1TI comparison instructions.
> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. Is
> this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-09 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..143effa89bf 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,6 +1994,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> +case RS6000_BIF_VCMPEQUT:
>  /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> folding produces worse code for 128-bit compares.  */
>fold_compare_helper (gsi, EQ_EXPR, stmt);
> @@ -2002,6 +2003,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> +case RS6000_BIF_VCMPNET:
>  /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> folding produces worse code for 128-bit compares.  */
>fold_compare_helper (gsi, NE_EXPR, stmt);
> @@ -2015,6 +2017,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>  /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> for now, because gimple folding produces worse code for 128-bit
> compares.  */
> @@ -2029,6 +2033,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>  /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> for now, because gimple folding produces worse code for 128-bit
> compares.  */
> @@ -2043,6 +2049,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>  /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> for now, because gimple folding produces worse code for 128-bit
> compares.  */
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..1afb8a6d786 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI V1TI])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,11 +536,12 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -(match_operand:VEC_I 3 "vint_operand")]))]
> -  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> +   [(match_operand:VEC_IC 2 "vint_operand")
> +(match_operand:VEC_IC 3 "vint_operand")]))]
> +  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && mode!= V1TImode)
> +   || (mode == V1TImode && TARGET_POWER10)"
>  {
>enum rtx_code code = GET_CODE (operands[1]);
>rtx tmp = gen_reg_rtx (mode);
> @@ -573,11 +577,12 @@ (define_expand "vec_cmp"
> 
>  ;; For unsigned integer vectors comparison.
>  (define_expand "vec_cmpu"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "unsigned_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -  

Re: RFA: crc builtin functions & optimizations

2022-03-14 Thread Andrew Pinski via Gcc-patches
On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
 wrote:
>
> Most microprocessors have efficient ways to perform CRC operations, be
> that with lookup tables, rotates, or even special instructions.
> However, because we lack a representation for CRC in the compiler, we
> can't do proper instruction selection.  With this patch I seek out to
> rectify this,
> I've avoided using a mode name for the built-in functions because that
> would tie the semantics to the size of the addressable unit.  We
> generally use abbreviations like s/l/ll for type names, which is all
> right when the type can be widened without changing semantics.  For
> the data input, however, we also have to consider the shift count that
> is tied to it.  That is why I used a number to designate the width of
> the data input and shift.
>
> For machine support, I made a start with 8 and 16 bit little-endian
> CRC for RISCV using a
> lookup table.  I am sure once we have the basic infrastructure in the
> tree, we'll get more
> contributions of suitable named patterns for various ports.


A few points.
There are at least 9 different polynomials for the CRC-8 in common use today.
For CRC-32 there are 5 different polynomials used.
You don't have a patch to invoke.texi adding the descriptions of the builtins.
How is your polynom 3rd argument described? Is it similar to how it is
done on the wiki for the CRC?
Does it make sense to have to list the most common polynomials in the
documentation?

Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it
is better to pick the CRC32 that is inside zip instead for a testcase
and benchmarking against?
Or even the CRC32C for iSCSI/ext4.

I see you also don't optimize the case where you have three other
variants of polynomials that are reversed, reciprocal and reversed
reciocal.

Also a huge problem, you don't check to make sure the third argument
to the crc builtin function is constant in the rsicv backend. Plus
since you expose the crc builtins as a non target specific builtin, I
assume there should be a libcall right and therefore either you need
to have a generic expansion for it or a function added to libgcc?
Also why not expand generically to the table or a loop based on a
target hook instead of in the backend? This way a target can choose
without even doing much.

Thanks,
Andrew Pinski



>
> bootstrapped on x86_64-pc-linux-gnu .


RFA: crc builtin functions & optimizations

2022-03-14 Thread Joern Rennecke
Most microprocessors have efficient ways to perform CRC operations, be
that with lookup tables, rotates, or even special instructions.
However, because we lack a representation for CRC in the compiler, we
can't do proper instruction selection.  With this patch I seek out to
rectify this,
I've avoided using a mode name for the built-in functions because that
would tie the semantics to the size of the addressable unit.  We
generally use abbreviations like s/l/ll for type names, which is all
right when the type can be widened without changing semantics.  For
the data input, however, we also have to consider the shift count that
is tied to it.  That is why I used a number to designate the width of
the data input and shift.

For machine support, I made a start with 8 and 16 bit little-endian
CRC for RISCV using a
lookup table.  I am sure once we have the basic infrastructure in the
tree, we'll get more
contributions of suitable named patterns for various ports.

bootstrapped on x86_64-pc-linux-gnu .
2022-03-14  Jon Beniston  
Joern Rennecke  

* Makefile.in (OBJS): Add tree-crc.o .
* builtin-types.def (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Define.
(BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
(BT_FN_UINT16_UINT16_UINT32_CONST_SIZE): Likewise.
* builtins.cc (associated_internal_fn):
Handle BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S.
* builtins.def (BUILT_IN_CRC8S, BUILT_IN_CRC16S, BUILT_IN_CRC32S):
New builtin functions.
* cgraph.cc (cgraph_node::verify_node):
Allow const calls without a callgraph edge.
* common.opt (fcrc): New option.
* doc/invoke.texi (-fcrc): Document.
* gimple-match-head.cc: #include predict.h .
* internal-fn.cc (crc_direct): Define.
(expand_crc_optab_fn): New function.
(direct_crc_optab_supported_p): Define.
* internal-fn.def (CRC, CRC_BE): New internal optab functions.
* match.pd: Match a pair of crc operations.
* optabs.def (crc_optab, crc_be_optab): New optabs.
* passes.def (pass_crc): Add new pass.
* tree-crc.cc: New file.
* tree-pass.h (make_pass_crc): Declare.

testsuite:
* gcc.c-torture/compile/crc.c: New test.
* gcc.dg/tree-ssa/crc.c: Likewise.
* gcc.dg/tree-ssa/crc-2.c: likewise.
* gcc.dg/tree-ssa/pr59597.c: Add flag -fno-crc .

config/riscv:
* crc.md: New file.
* riscv-protos.h (expand_crc_lookup, print_crc_table): Declare.
* riscv.cc (compute_crc): New function.
(print_crc_table, expand_crc_lookup): Likewise.
* riscv.md: Include crc.md.
* riscv.opt (msmall-memory): New option.
* tree-crc-doc.txt: New file.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31ff95500c9..a901925511b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1612,6 +1612,7 @@ OBJS = \
tree-cfgcleanup.o \
tree-chrec.o \
tree-complex.o \
+   tree-crc.o \
tree-data-ref.o \
tree-dfa.o \
tree-diagnostic.o \
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3a7cecdf087..aa7751a6d5a 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -872,3 +872,9 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_LDOUBLE, BT_VOID,
 BT_VOLATILE_PTR, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_SIZE, BT_VOID,
 BT_VOLATILE_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT8, BT_CONST_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT16, BT_CONST_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT16_UINT16_UINT32_CONST_SIZE, BT_UINT16,
+BT_UINT16, BT_UINT32, BT_CONST_SIZE)
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4c6c2939053..37c28c930ac 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2175,6 +2175,9 @@ associated_internal_fn (built_in_function fn, tree 
return_type)
return IFN_LDEXP;
   return IFN_LAST;
 
+case BUILT_IN_CRC8S: case BUILT_IN_CRC16S: case BUILT_IN_CRC32S:
+  return IFN_CRC;
+
 default:
   return IFN_LAST;
 }
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 005976f34e9..24aaca34406 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -850,6 +850,9 @@ DEF_GCC_BUILTIN(BUILT_IN_CLRSB, "clrsb", 
BT_FN_INT_INT, ATTR_CONST_NOTHR
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBIMAX, "clrsbimax", BT_FN_INT_INTMAX, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBL, "clrsbl", BT_FN_INT_LONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBLL, "clrsbll", BT_FN_INT_LONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CRC8S, "crc8s", 
BT_FN_UINT16_UINT16_UINT8_CONST_SIZE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CRC16S, "crc16s", 

Re: [PATCH] libgomp : OMPD implementation

2022-03-14 Thread Mohamed Atef via Gcc-patches
Hi Jakub,
i will fix all the issues and i will use the script on the website.
On Mon, Mar 14, 2022 at 6:23 PM Jakub Jelinek  wrote:

> Hi!
>
> Sorry for the delay, GCC is currently in stage 4, which means a lot of us
> concentrate on fixing GCC 12 so that it can be released soon and projects
> that are clearly GCC 13 material are much lower priority.
>
Never mind, thank you anyway.

>
> On Wed, Feb 16, 2022 at 11:04:13PM +0200, Mohamed Atef via Gcc-patches
> wrote:
> > --- a/libgomp/ChangeLog
> > +++ b/libgomp/ChangeLog
> > @@ -1,3 +1,30 @@
>
> ChangeLog entries should be posted in the patches and eventually should be
> in the commit message, but the patch shouldn't change ChangeLog files
> directly, we have nightly scripts that handle that.
>
> > +2022-02-16  Mohamed Atef  
> > +
> > +* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
>
> For ChangeLog entries, https://gcc.gnu.org/contribute.html specifies
> how it should be formatted.
> E.g. there should be a tab instead of 8 spaces at the start of the lines
> (except for the date/name/email line), the above has spaces.
>
> > +(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES,
> libgompd_la_LINK,
> > +libgompd_la_SOURCES, libgompd_version_dep,
> libgompd_version_script,
> > +libgompd.ver-sun, libgompd.ver, libgompd_version_info):
> Defined.
>
> No trailing whitespace.  I think it would be better to use : New.
>
> > +* Makefile.in: Regenerate.
> > +* aclocal.m4: Regenerate.
> > +* config/darwin/plugin-suffix.h: Removed ().
> > +* config/hpux/plugin-suffix.h: Removed ().
> > +* config/posix/plugin-suffix.h: Removed ().
>
> These don't say what has changed, and the tense is wrong.  So should be
> like:
> * config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
> or so.
>
> > +* configure: Regenerate.
> > +* env.c: (#include "ompd-support.h") : Added.
>
> The above correctly mentions just the filename and not a particular thing
> in it for the added include, but the rest is incorreclty formatted.
> Should be:
> * env.c: Include ompd-support.h
> or so.
>
> > +(initialize_env) : Call ompd_load().
>
> There should be no space in between ) and :.  Just say
> (initialize_env): Call ompd_load.
> No need for the ()s in there.
>
> > +* parallel.c:(#include "ompd-support.h"): Added.
>
> See above.
>
> > +(GOMP_parallel) : Call ompd_bp_parallel_begin and
> ompd_bp_parallel_end.
> > +* libgomp.map: Add OMP_5.0.3 symobl versions.
> > +* libgompd.map: New file.
> > +* omp-tools.h.in : New file.
> > +* omp-types.h.in : New file.
> > +* ompd-support.h : New file.
> > +* ompd-support.c : New file.
> > +* ompd-helper.h : New file.
>
> See above for the superfluous spaces before :.
>
> > +* ompd-helper.c: New file.
> > +* ompd-init.c: New file.
> > +* testsuite/Makfile.in: Regenerate.
>
> Typo, Makefile.in instead of Makfile.in (the checking script would prevent
> commit because of this).
>
> > --- a/libgomp/Makefile.am
> > +++ b/libgomp/Makefile.am
> ...
> > +libgompd_version_info = -version-info $(libtool_VERSION)
> >  libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
> >  $(lt_host_flags)
> > +libgompd_la_LDFLAGS = $(libgompd_version_info)
> $(libgompd_version_script) \
> > + $(lt_host_flags)
> >  libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
> > +libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
> >  libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
> > -
> > +libgompd_la_LINK = $(LINK) $(libgompd_la_LDFLAGS)
>
> Please preserve the empty line above libgomp_la_SOURCES.
>
> > --- a/libgomp/Makefile.in
> > +++ b/libgomp/Makefile.in
> > @@ -1,7 +1,7 @@
> > -# Makefile.in generated by automake 1.15.1 from Makefile.am.
> > +# Makefile.in generated by automake 1.16.1 from Makefile.am.
>
> Please make sure you regenerate files with the exact autoconf/automake
> versions as used previously, we don't want the generated files
> jump backwards/forwards whenever somebody regenerates them.
> You can e.g. build automake 1.15.1 and autoconf 2.69 if you don't have
> those, make install DESTDIR=/your/home/automake-1.15.1 etc.
> and then just regenerate with those in $PATH.
>
done.

> > --- a/libgomp/env.c
> > +++ b/libgomp/env.c
> > @@ -1483,6 +1484,8 @@ initialize_env (void)
> >   = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
> >  }
> >parse_int_secure ("GOMP_DEBUG", _debug_var, true);
> > +  if(gomp_debug_var)
> > + ompd_load();
>
> Formatting.  Again, https://gcc.gnu.org/contribute.html contains
> the rules or just watch how does surrounding code look like.
> The errors are:
> 1) space in between if and ( is missing
> 2) ompd_load should be indented 2 further columns from if (and
>any consecutive 8 spaces at the start of lines replaced with
>tab in *.c/*.C 

Call for testers: shrink wrapping without a prologue

2022-03-14 Thread Joern Rennecke
I noticed that when there are registers to save (that can vary with
ABI), shrink-wrapping would
arrange for a more expeditious early return than when there were no
registers to save,
but still some dull argument copies to make for the main function,
even if they are not
needed for the early return path.  Most of the logic to do
shrink-wrapping also in the absence
of register saves is already there, and the generated code indeed
looks better when this
is thus used.  However, I couldn't find a difference in the execution
time of the benchmarks
I was looking at, presumably because the function didn't actually
return early (doing
things with an array of N elements where N might be zero... but it
isn't for the actual data).

Does someone have a benchmark / computing load where the early return
is beneficial?  Or conversely, harmful?
2022-03-14  Joern Rennecke  

* common.opt (fearly-return): New option.
* shrink-wrap.cc (try_early_return): New function.
(try_shrink_wrapping): Call try_early_return.

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..901287fcad6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3607,4 +3607,8 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fearly-return
+Common Var(flag_early_return) Optimization Init(1)
+Extend shrink-wrapping to prologue-free functions.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index 30166bd20eb..31ab0ecff10 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -586,6 +586,42 @@ handle_simple_exit (edge e)
 INSN_UID (ret), e->src->index);
 }
 
+/* Even if there is no prologue, we might have a number of argument
+   copy and initialization statements in the first basic block that
+   might be unnecessary if we return early.  */
+/* ??? This might be overly agressive for super-scalar processors without
+   speculative execution in that we migth want to keep enough instructions
+   in front of the branch to fill all issue slots.
+
+   If the branch depends on a register copied from another register
+   immediately before, later passes already take care of propagating the
+   copy into the branch.  */
+void
+try_early_return (edge *entry_edge)
+{
+  basic_block entry = (*entry_edge)->dest;
+  if (EDGE_COUNT (entry->succs) != 2 || !single_pred_p (entry))
+return;
+  edge e;
+  edge_iterator ei;
+  const int max_depth = 20;
+
+  FOR_EACH_EDGE (e, ei, entry->succs)
+{
+  basic_block dst = e->dest;
+  for (int i = max_depth; --i; dst = single_succ (dst))
+   {
+ if (dst == EXIT_BLOCK_PTR_FOR_FN (cfun))
+   {
+ prepare_shrink_wrap (entry);
+ return;
+   }
+ if (!single_succ_p (dst))
+   break;
+   }
+}
+}
+
 /* Try to perform a kind of shrink-wrapping, making sure the
prologue/epilogue is emitted only around those parts of the
function that require it.
@@ -666,7 +702,11 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn 
*prologue_seq)
break;
   }
   if (empty_prologue)
-return;
+{
+  if (flag_early_return)
+   try_early_return (entry_edge);
+  return;
+}
 
   /* Move some code down to expose more shrink-wrapping opportunities.  */
 


Re: [PATCH] c++: Fix up cp_parser_skip_to_pragma_eol [PR104623]

2022-03-14 Thread Jason Merrill via Gcc-patches

On 3/12/22 14:22, Jakub Jelinek wrote:

Hi!

We ICE on the following testcase, because we tentatively parse it multiple
times and the erroneous attribute syntax results in
cp_parser_skip_to_end_of_statement, which when seeing CPP_PRAGMA (can be
any deferred one, OpenMP/OpenACC/ivdep etc.) it calls
cp_parser_skip_to_pragma_eol, which calls cp_lexer_purge_tokens_after.
That call purges all the tokens from CPP_PRAGMA until CPP_PRAGMA_EOL,
excluding the initial CPP_PRAGMA though (but including the final
CPP_PRAGMA_EOL).  This means the second time we parse this, we see
CPP_PRAGMA with no tokens after it from the pragma, most importantly
not the CPP_PRAGMA_EOL, so either if it is the last pragma in the TU,
we ICE, or if there are other pragmas we treat everything in between
as a pragma.

I've tried various things, including making the CPP_PRAGMA token
itself also purged, or changing the cp_parser_skip_to_end_of_statement
(and cp_parser_skip_to_end_of_block_or_statement) to call it with
NULL instead of token, so that this purging isn't done there,
but each patch resulted in lots of regressions.
But removing the purging altogether surprisingly doesn't regress anything,
and I think it is the right thing, if we e.g. parse tentatively, why can't
we parse the pragma multiple times or at least skip over it?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.

The purging seems to date back to rth's r109336 that introduced CPP_PRAGMA.


2022-03-12  Jakub Jelinek  

PR c++/104623
* parser.cc (cp_parser_skip_to_pragma_eol): Don't purge any tokens.

* g++.dg/gomp/pr104623.C: New test.

--- gcc/cp/parser.cc.jj 2022-03-11 13:11:53.622094878 +0100
+++ gcc/cp/parser.cc2022-03-11 14:45:36.877647173 +0100
@@ -4111,8 +4111,6 @@ cp_parser_skip_to_pragma_eol (cp_parser*
  
if (pragma_tok)

  {
-  /* Ensure that the pragma is not parsed again.  */
-  cp_lexer_purge_tokens_after (parser->lexer, pragma_tok);
parser->lexer->in_pragma = false;
if (parser->lexer->in_omp_attribute_pragma
  && cp_lexer_next_token_is (parser->lexer, CPP_EOF))
--- gcc/testsuite/g++.dg/gomp/pr104623.C.jj 2022-03-11 14:22:15.724288282 
+0100
+++ gcc/testsuite/g++.dg/gomp/pr104623.C2022-03-11 14:22:06.746413835 
+0100
@@ -0,0 +1,9 @@
+// PR c++/104623
+// { dg-do compile }
+
+void
+foo ()
+{
+  struct __attribute__() a // { dg-error "expected primary-expression 
before" }
+  #pragma omp task
+}

Jakub





Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-14 Thread Jason Merrill via Gcc-patches

On 3/14/22 13:13, Patrick Palka wrote:

On Fri, 11 Mar 2022, Jason Merrill wrote:


On 3/10/22 11:27, Patrick Palka wrote:

On Wed, 9 Mar 2022, Jason Merrill wrote:


On 3/1/22 18:08, Patrick Palka wrote:

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means it comes with bloated debug info,
which
persists even after the call has been inlined away, for an operation
that
is never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward into casts as part of the frontend's general
expression folding routine.  After this patch with -O2 and a
non-checking
compiler, debug info size for some testcases decreases by about ~10% and
overall compile time and memory usage decreases by ~2%.


Impressive.  Which testcases?


I saw the largest percent reductions in debug file object size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc



Do you also want to handle addressof and as_const in this patch, as
Jonathan
suggested?


Yes, good idea.  Since each of their argument and return types are
indirect types, I think we can use the same NOP_EXPR-based folding for
them.



I think we can do this now, and think about generalizing more in stage 1.


Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
want to consider for GCC 12?

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward into simple casts.
* cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
* typeck.cc (is_std_move_p, is_std_forward_p): Export.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
gcc/cp/cp-gimplify.cc  | 18 ++
gcc/cp/cp-tree.h   |  2 ++
gcc/cp/typeck.cc   |  6 ++
gcc/testsuite/g++.dg/opt/pr96780.C | 24 
4 files changed, 46 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..0b009b631c7 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,6 +2756,24 @@ cp_fold (tree x)
  case CALL_EXPR:
  {
+   if (optimize


I think this should check flag_no_inline rather than optimize.


Sounds good.

Here's a patch that extends the folding to as_const and addressof (as
well as __addressof, which I'm kind of unsure about since it's
non-standard).  I suppose it also doesn't hurt to verify that the return
and argument type of the function are sane before we commit to folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases decreases by about ~10% and overall compile time and
memory usage decreases by ~2%.

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : When optimizing,
fold calls to std::move/forward and other cast-like functions
into simple casts.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
   gcc/cp/cp-gimplify.cc  | 36 +++-
   gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++
   2 files changed, 73 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)
 case CALL_EXPR:
 {
-   int sv = optimize, nw = sv;
tree callee = get_callee_fndecl (x);
   +/* "Inline" calls to std::move/forward and other cast-like functions
+  by simply folding them into the corresponding cast determined by
+  their return type.  This is cheaper than relying on the middle-end
+  to do so, and also means we avoid generating useless debug info for
+  them at all.
+
+  At this point the argument has already been converted into a
+  reference, so it suffices to use a NOP_EXPR to express the
+  cast.  */
+ 

Re: [PATCH] c++: extraneous access error with ambiguous lookup [PR103177]

2022-03-14 Thread Jason Merrill via Gcc-patches

On 3/14/22 13:24, Patrick Palka wrote:

When a lookup is ambiguous, lookup_member still attempts to check
access of the first member found before diagnosing the ambiguity and
erroring out, which may cause us to issue an extraneous access error
in case of ambiguous lookup as in the testcase below (for B1::foo).

This patch fixes this by swapping the order of the ambiguity and access
checks within lookup_member.  In passing, since the only possible error
that could happen during lookup_field_r is an ambiguity error, we might
as well hardcode that in lookup_member instead and remove the unneeded
lookup_field_info::errstr.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/103177

gcc/cp/ChangeLog:

* search.cc (lookup_field_info::errstr): Remove this data
member.
(lookup_field_r): Don't set errstr.
(lookup_member): Check ambiguity before checking access.
Simplify accordingly after errstr removal.  Exit early upon
error or empty result.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/ambig6.C: New test.
---
  gcc/cp/search.cc | 37 
  gcc/testsuite/g++.dg/lookup/ambig6.C | 18 ++
  2 files changed, 34 insertions(+), 21 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/lookup/ambig6.C

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 2b8210408fb..85e3e7cb487 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -941,8 +941,6 @@ struct lookup_field_info {
tree ambiguous;
/* If nonzero, we are looking for types, not data members.  */
int want_type;
-  /* If something went wrong, a message indicating what.  */
-  const char *errstr;
  };
  
  /* True for a class member means that it is shared between all objects

@@ -1055,7 +1053,6 @@ lookup_field_r (tree binfo, void *data)
  /* Add the new value.  */
  lfi->ambiguous = tree_cons (NULL_TREE, nval, lfi->ambiguous);
  TREE_TYPE (lfi->ambiguous) = error_mark_node;
- lfi->errstr = G_("request for member %qD is ambiguous");
}
  }
else
@@ -1127,8 +1124,6 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
   checks.  Whereas rval is only set if a proper (not hidden)
   non-function member is found.  */
  
-  const char *errstr = 0;

-
if (name == error_mark_node
|| xbasetype == NULL_TREE
|| xbasetype == error_mark_node)
@@ -1172,7 +1167,6 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
rval_binfo = lfi.rval_binfo;
if (rval_binfo)
  type = BINFO_TYPE (rval_binfo);
-  errstr = lfi.errstr;
  
/* If we are not interested in ambiguities, don't report them;

   just return NULL_TREE.  */
@@ -1187,6 +1181,19 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
protect = 0;
  }
  
+  if (protect == 1 && lfi.ambiguous)

+{
+  if (complain & tf_error)
+   {
+ error ("request for member %qD is ambiguous", name);
+ print_candidates (lfi.ambiguous);
+   }
+  return error_mark_node;
+}
+
+  if (!rval)
+return NULL_TREE;
+
/* [class.access]
  
   In the case of overloaded function names, access control is

@@ -1206,8 +1213,7 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
  
  only the first call to "f" is valid.  However, if the function is

  static, we can check.  */
-  if (rval && protect
-  && !really_overloaded_fn (rval))
+  if (protect && !really_overloaded_fn (rval))
  {
tree decl = is_overloaded_fn (rval) ? get_first_fn (rval) : rval;
decl = strip_using_decl (decl);
@@ -1216,21 +1222,10 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
  && !DECL_NONSTATIC_MEMBER_FUNCTION_P (decl)
  && !perform_or_defer_access_check (basetype_path, decl, decl,
 complain, afi))
-   rval = error_mark_node;
-}
-
-  if (errstr && protect)
-{
-  if (complain & tf_error)
-   {
- error (errstr, name, type);
- if (lfi.ambiguous)
-   print_candidates (lfi.ambiguous);
-   }
-  rval = error_mark_node;
+   return error_mark_node;
  }
  
-  if (rval && is_overloaded_fn (rval)

+  if (is_overloaded_fn (rval)
/* Don't use a BASELINK for class-scope deduction guides since
 they're not actually member functions.  */
&& !dguide_name_p (name))
diff --git a/gcc/testsuite/g++.dg/lookup/ambig6.C 
b/gcc/testsuite/g++.dg/lookup/ambig6.C
new file mode 100644
index 000..45e8effba43
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/ambig6.C
@@ -0,0 +1,18 @@
+// PR c++/103177
+
+struct B1 {
+private:
+  static int foo();
+};
+
+struct B2 {
+  int foo();
+};
+
+struct D : B1, B2 { };
+
+void f() {
+  D d;
+  d.foo(); // { dg-error "ambiguous" }
+  // { dg-bogus "private" "" { 

Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported

2022-03-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Mar 2022 at 14:17, Patrick Palka via Libstdc++
 wrote:
>
> On Fri, 11 Mar 2022, Jonathan Wakely wrote:
>
> > Patrick, I think this is right, but please take a look to double check.
> >
> > I think we should fix the feature-test macro conditions for gcc-11 too,
> > although it's a bit more complicated there. It should depend on IEEE
> > float and double *and* uselocale. We don't need the other changes on the
> > branch.
>
> Don't we still depend on uselocale in GCC 12 for long double from_chars,
> at least on targets where long double != binary64?

Not after this patch:

from_chars(const char* first, const char* last, long double& value,
  chars_format fmt) noexcept
{
-#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
-  && ! USE_STRTOD_FOR_FROM_CHARS
+#if ! USE_STRTOD_FOR_FROM_CHARS
+  // Either long double is the same as double, or we can't use strtold.
+  // In the latter case, this might give an incorrect result (e.g. values
+  // out of range of double give an error, even if they fit in long double).

If uselocale isn't available, this defines the long double overload in
terms of the double one, even if that doesn't always give the right
answers. That greatly simplifies the preprocessor conditions for when
it's supported. If the float and double forms are present, so is the
long double one.


[PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-14 Thread Roger Sayle

I've been wondering about the possible performance/missed-optimization
impact of my patch for PR middle-end/98420 and similar IEEE correctness
fixes that disable constant folding optimizations when worrying about -0.0.
In the common situation where the floating point result is used by a
FP comparison, there's no distinction between +0.0 and -0.0, so some
HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.

Consider the following interesting example:

int foo(int x, double y) {
return (x * 0.0) < y;
}

Although we know that x (when converted to double) can't be NaN or Inf,
we still worry that for negative values of x that (x * 0.0) may be -0.0
and so perform the multiplication at run-time.  But in this case, the
result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
for any y, hence the above may be safely constant folded to "0.0 < y"
avoiding the multiplication at run-time.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and allows GCC to continue to
optimize cases that we optimized in GCC 11 (without regard to correctness).
Ok for mainline?


2022-03-14  Roger Sayle  

gcc/ChangeLog
* match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
(X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
(X CMP X -> true): Test tree_expr_maybe_nan_p instead of HONOR_NANS.
(X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
this can't trap/signal.

gcc/testsuite/ChangeLog
* gcc.dg/fold-compare-9.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 8b44b5c..f3f7865 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4671,6 +4671,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (single_use (@2))
 (cmp @0 @1)
 
+/* Floating point comparisons can ignore signed zeros.  */
+(for cmp (tcc_comparison)
+ /* Simplify (X - X) CMP Y even with -frounding-math.  */
+ (simplify
+  (cmp (minus @0 @0) @1)
+  (if (FLOAT_TYPE_P (TREE_TYPE (@0))
+   && !tree_expr_maybe_nan_p (@0)
+   && !tree_expr_maybe_infinite_p (@0))
+   (cmp { build_zero_cst (TREE_TYPE (@0)); } @1)))
+ /* Simplify X CMP (Y - Y) even with -frounding-math.  */
+ (simplify
+  (cmp @0 (minus @1 @1))
+  (if (FLOAT_TYPE_P (TREE_TYPE (@1))
+   && !tree_expr_maybe_nan_p (@1)
+   && !tree_expr_maybe_infinite_p (@1))
+   (cmp @0 { build_zero_cst (TREE_TYPE (@1)); })))
+ /* Simplify (X * 0.0) CMP Y.  */
+ (simplify
+  (cmp (mult @0 real_zerop@1) @2)
+  (if (!tree_expr_maybe_nan_p (@0)
+   && !tree_expr_maybe_infinite_p (@0))
+   (cmp @1 @2)))
+ /* Simplify X CMP (Y * 0.0).  */
+ (simplify
+  (cmp @0 (mult @1 real_zerop@2))
+  (if (!tree_expr_maybe_nan_p (@1)
+   && !tree_expr_maybe_infinite_p (@0))
+   (cmp @0 @2
+
 /* Simplify (x < 0) ^ (y < 0) to (x ^ y) < 0 and
(x >= 0) ^ (y >= 0) to (x ^ y) < 0.  */
 (for cmp (lt ge)
@@ -4743,7 +4772,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (simplify
   (cmp @0 @0)
   (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
-   || ! HONOR_NANS (@0))
+   || ! tree_expr_maybe_nan_p (@0))
{ constant_boolean_node (true, type); }
(if (cmp != EQ_EXPR
/* With -ftrapping-math conversion to EQ loses an exception.  */
@@ -4755,7 +4784,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cmp @0 @0)
   (if (cmp != NE_EXPR
|| ! FLOAT_TYPE_P (TREE_TYPE (@0))
-   || ! HONOR_NANS (@0))
+   || ! tree_expr_maybe_nan_p (@0))
{ constant_boolean_node (false, type); })))
 (for cmp (unle unge uneq)
  (simplify
@@ -4767,7 +4796,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (unordered @0 @0)))
 (simplify
  (ltgt @0 @0)
- (if (!flag_trapping_math)
+ (if (!flag_trapping_math || !tree_expr_maybe_nan_p (@0))
   { constant_boolean_node (false, type); }))
 
 /* x == ~x -> false */
diff --git a/gcc/testsuite/gcc.dg/fold-compare-9.c 
b/gcc/testsuite/gcc.dg/fold-compare-9.c
new file mode 100644
index 000..e56f63d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-compare-9.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int x, double y) {
+return (x * 0.0) < y;
+}
+
+/* { dg-final { scan-tree-dump " y_\[0-9\]\\(D\\) > 0.0" "optimized" } } */


PING: [PATCH v2] x86: Fix -fsplit-stack feature detection via TARGET_CAN_SPLIT_STACK

2022-03-14 Thread Sören Tempel via Gcc-patches
Ping.

Summary: Currently, the macro guards for TARGET_CAN_SPLIT_STACK are not
aligned with the implementation of ix86_supports_split_stack on x86.
That is, on musl systems TARGET_CAN_SPLIT_STACK is defined even though
-fsplit-stack is not supported (via ix86_supports_split_stack). This
prevents gccgo (which uses TARGET_CAN_SPLIT_STACK) from compiling on
musl systems. The proposed patch attempts to align the behavior of
ix86_supports_split_stack and TARGET_CAN_SPLIT_STACK.

See: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590688.html

Greetings,
Sören

Sören Tempel  wrote:
> Since commit c163647ffbc9a20c8feb6e079dbecccfe016c82e -fsplit-stack
> is only supported on glibc targets. However, this original commit
> required some fixups. As part of the fixup, the changes to the
> gnu-user-common.h and gnu.h were partially reverted in commit
> 60953a23d57b13a672f751bec0c6eefc059eb1ab thus causing TARGET_CAN_SPLIT_STACK
> to be defined for non-glibc targets even though -fsplit-stack is
> actually not supported and attempting to use it causes a runtime error.
> 
> This causes gcc internal code, such as ./gcc/go/gospec.c to not
> correctly detect that -fsplit-stack is not supported and thus causes
> gccgo to fail compilation on non-glibc targets.
> 
> This commit ensures that TARGET_CAN_SPLIT_STACK is only set if the
> default libc is glibc. It is presently unclear to me if there is a
> better way to detect glibc at pre-processor time.
> 
> The proposed changes have been tested on x86 and x86_64 Alpine Linux
> (which uses musl libc) and fix compilation of gccgo for this target.
> 
> Signed-off-by: Sören Tempel 
> 
> gcc/ChangeLog:
> 
>   * config/i386/gnu-user-common.h (defined): Only define
> TARGET_CAN_SPLIT_STACK for glibc targets.
>   * config/i386/gnu.h (defined): Ditto.
> ---
> Changes since v1: Use (DEFAULT_LIBC == LIBC_GLIBC) instead of
> OPTION_GLIBC_P to detect use of glibc in a pre-processor context.
> 
> Is there a better way to detect use of glibc in the config header?
> 
>  gcc/config/i386/gnu-user-common.h | 5 +++--
>  gcc/config/i386/gnu.h | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/i386/gnu-user-common.h 
> b/gcc/config/i386/gnu-user-common.h
> index 00226f5a455..e126c3fa9fa 100644
> --- a/gcc/config/i386/gnu-user-common.h
> +++ b/gcc/config/i386/gnu-user-common.h
> @@ -66,7 +66,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define STACK_CHECK_STATIC_BUILTIN 1
>  
>  /* We only build the -fsplit-stack support in libgcc if the
> -   assembler has full support for the CFI directives.  */
> -#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
> +   assembler has full support for the CFI directives.  Also
> +   we only support -fsplit-stack on glibc targets.  */
> +#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && (DEFAULT_LIBC == LIBC_GLIBC)
>  #define TARGET_CAN_SPLIT_STACK
>  #endif
> diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
> index 25fbc07f58c..17494333bb9 100644
> --- a/gcc/config/i386/gnu.h
> +++ b/gcc/config/i386/gnu.h
> @@ -41,8 +41,9 @@ along with GCC.  If not, see .
>  #define TARGET_THREAD_SSP_OFFSET0x14
>  
>  /* We only build the -fsplit-stack support in libgcc if the
> -   assembler has full support for the CFI directives.  */
> -#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
> +   assembler has full support for the CFI directives.  Also
> +   we only support -fsplit-stack on glibc targets.  */
> +#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && (DEFAULT_LIBC == LIBC_GLIBC)
>  #define TARGET_CAN_SPLIT_STACK
>  #endif
>  /* We steal the last transactional memory word.  */


[PATCH] fortran: Separate associate character lengths earlier [PR104570]

2022-03-14 Thread Mikael Morin

Hello,

this workarounds the regression I introduced with the fix for pr104228.
The problem comes from the evaluation of character length for the 
associate target (y) in the testcase.
The expression is non-scalar which is not supported at that point, as no 
scalarization setup is made there.


My initial direction to fix this was trying to modify the evaluation of 
expressions so that the rank was ignored in the context of character 
length evaluation.

That’s the patch I attached to the PR.
The patch I’m proposing here tries instead to avoid the need to evaluate 
character length through sharing of gfc_charlen between symbols and 
expressions.
I slightly prefer the latter direction, though I’m not sure it’s more 
robust.


At least it passes regression testing, which proves some basic level of 
goodness.

OK for master?
Even if PR104570 is a 12 regression only, the PR104228 fix is targeted 
at all the open branches so this one as well.

OK for 11/10/9?
From af11a90c730a57be86b94331f31a611b31276b83 Mon Sep 17 00:00:00 2001
From: Mikael Morin 
Date: Sun, 13 Mar 2022 22:22:55 +0100
Subject: [PATCH] fortran: Separate associate character lengths earlier
 [PR104570]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This change workarounds an ICE in the evaluation of the character length of an
array expression referencing an associate variable; the code is not prepared
to see a non-scalar expression as it doesn’t initialize the scalarizer.

Before this change, associate length symbols get a new gfc_charlen at resolution
stage to unshare them from the associate expression, so that at translation
stage it is a decl specific to the associate symbol that is initialized, not the
decl of some other symbol.  This reinitialization of gfc_charlen happens after
expressions referencing the associate symbol have been parsed, so that those
expressions retain the original gfc_charlen they have copied from the symbol.
At translation stage, the gfc_charlen for the associate symbol is setup with the
decl holding the actual length value, but the expressions have retained the
original gfc_charlen without any decl.  So they need to evaluate the character
length, and this is where the ICE happens.

This change moves the reinitialization of gfc_charlen earlier at parsing stage,
so that at resolution stage the gfc_charlen can be retained as it’s already not
shared with any other symbol, and the expressions which now share their
gfc_charlen with the symbol are automatically updated when the length decl is
setup at translation stage.  There is no need any more to evaluate the character
length as it has all the required information, and the ICE doesn’t happen.

The first resolve.cc hunk is necessary to avoid regressing on the
associate_35.f90 testcase.

	PR fortran/104228
	PR fortran/104570

gcc/fortran/ChangeLog:

	* parse.cc (parse_associate): Use a new distinct gfc_charlen if the
	copied type has one whose length is not known to be constant.
	* resolve.cc (resolve_assoc_var): Reset charlen if it’s shared with
	the associate target regardless of the expression type.
	Don’t reinitialize charlen if it’s deferred.

gcc/testsuite/ChangeLog:

	* gfortran.dg/associate_58.f90: New test.
---
 gcc/fortran/parse.cc   | 18 ++
 gcc/fortran/resolve.cc |  9 ++---
 gcc/testsuite/gfortran.dg/associate_58.f90 | 21 +
 3 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_58.f90

diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index db918291b10..e6e915d2a5e 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -4924,6 +4924,24 @@ parse_associate (void)
 	 in case of association to a derived-type.  */
   sym->ts = a->target->ts;
 
+  /* Don’t share the character length information between associate
+	 variable and target if the length is not a compile-time constant,
+	 as we don’t want to touch some other character length variable when
+	 we try to initialize the associate variable’s character length
+	 variable.
+	 We do it here rather than later so that expressions referencing the
+	 associate variable will automatically have the correctly setup length
+	 information.  If we did it at resolution stage the expressions would
+	 use the original length information, and the variable a new different
+	 one, but only the latter one would be correctly initialized at
+	 translation stage, and the former one would need some additional setup
+	 there.  */
+  if (sym->ts.type == BT_CHARACTER
+	  && sym->ts.u.cl
+	  && !(sym->ts.u.cl->length
+	   && sym->ts.u.cl->length->expr_type == EXPR_CONSTANT))
+	sym->ts.u.cl = gfc_new_charlen (gfc_current_ns, NULL);
+
   /* Check if the target expression is array valued.  This cannot always
 	 be done by looking at target.rank, because that might not have been
 	 set yet.  Therefore traverse the chain of refs, 

[PATCH] configure: cache result of "sys/sdt.h" header check

2022-03-14 Thread David Seifert via Gcc-patches
Use AC_CACHE_CHECK to store the result of the header check for
systemtap's "sys/sdt.h", which is similar in spirit to libstdc++'s
AC_CACHE_CHECK(..., glibcxx_cv_sys_sdt_h).

gcc/
* configure.ac: Add AC_CACHE_CHECK(..., gcc_cv_sys_sdt_h).
* configure: Regenerate.
---
 gcc/configure| 20 +++-
 gcc/configure.ac | 16 +---
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 14b19c8fe0c..1dfc5cc7344 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -31389,15 +31389,25 @@ fi
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking sys/sdt.h in the target C 
library" >&5
 $as_echo_n "checking sys/sdt.h in the target C library... " >&6; }
-have_sys_sdt_h=no
-if test -f $target_header_dir/sys/sdt.h; then
-  have_sys_sdt_h=yes
+if ${gcc_cv_sys_sdt_h+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+
+  gcc_cv_sys_sdt_h=no
+  if test -f $target_header_dir/sys/sdt.h; then
+gcc_cv_sys_sdt_h=yes
+  fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_sys_sdt_h" >&5
+$as_echo "$gcc_cv_sys_sdt_h" >&6; }
+if test x$gcc_cv_sys_sdt_h = xyes; then :
+
 
 $as_echo "#define HAVE_SYS_SDT_H 1" >>confdefs.h
 
+
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $have_sys_sdt_h" >&5
-$as_echo "$have_sys_sdt_h" >&6; }
 
 # Check if TFmode long double should be used by default or not.
 # Some glibc targets used DFmode long double, but with glibc 2.4
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 90cad309762..c86ce5e0a9b 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6904,14 +6904,16 @@ AC_SUBST([enable_default_ssp])
 
 # Test for  on the target.
 GCC_TARGET_TEMPLATE([HAVE_SYS_SDT_H])
-AC_MSG_CHECKING(sys/sdt.h in the target C library)
-have_sys_sdt_h=no
-if test -f $target_header_dir/sys/sdt.h; then
-  have_sys_sdt_h=yes
-  AC_DEFINE(HAVE_SYS_SDT_H, 1,
+AC_CACHE_CHECK([sys/sdt.h in the target C library], [gcc_cv_sys_sdt_h], [
+  gcc_cv_sys_sdt_h=no
+  if test -f $target_header_dir/sys/sdt.h; then
+gcc_cv_sys_sdt_h=yes
+  fi
+])
+AS_IF([test x$gcc_cv_sys_sdt_h = xyes], [
+  AC_DEFINE([HAVE_SYS_SDT_H], [1],
 [Define if your target C library provides sys/sdt.h])
-fi
-AC_MSG_RESULT($have_sys_sdt_h)
+])
 
 # Check if TFmode long double should be used by default or not.
 # Some glibc targets used DFmode long double, but with glibc 2.4
-- 
2.35.1



[PATCH] c++: extraneous access error with ambiguous lookup [PR103177]

2022-03-14 Thread Patrick Palka via Gcc-patches
When a lookup is ambiguous, lookup_member still attempts to check
access of the first member found before diagnosing the ambiguity and
erroring out, which may cause us to issue an extraneous access error
in case of ambiguous lookup as in the testcase below (for B1::foo).

This patch fixes this by swapping the order of the ambiguity and access
checks within lookup_member.  In passing, since the only possible error
that could happen during lookup_field_r is an ambiguity error, we might
as well hardcode that in lookup_member instead and remove the unneeded
lookup_field_info::errstr.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/103177

gcc/cp/ChangeLog:

* search.cc (lookup_field_info::errstr): Remove this data
member.
(lookup_field_r): Don't set errstr.
(lookup_member): Check ambiguity before checking access.
Simplify accordingly after errstr removal.  Exit early upon
error or empty result.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/ambig6.C: New test.
---
 gcc/cp/search.cc | 37 
 gcc/testsuite/g++.dg/lookup/ambig6.C | 18 ++
 2 files changed, 34 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lookup/ambig6.C

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 2b8210408fb..85e3e7cb487 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -941,8 +941,6 @@ struct lookup_field_info {
   tree ambiguous;
   /* If nonzero, we are looking for types, not data members.  */
   int want_type;
-  /* If something went wrong, a message indicating what.  */
-  const char *errstr;
 };
 
 /* True for a class member means that it is shared between all objects
@@ -1055,7 +1053,6 @@ lookup_field_r (tree binfo, void *data)
  /* Add the new value.  */
  lfi->ambiguous = tree_cons (NULL_TREE, nval, lfi->ambiguous);
  TREE_TYPE (lfi->ambiguous) = error_mark_node;
- lfi->errstr = G_("request for member %qD is ambiguous");
}
 }
   else
@@ -1127,8 +1124,6 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
  checks.  Whereas rval is only set if a proper (not hidden)
  non-function member is found.  */
 
-  const char *errstr = 0;
-
   if (name == error_mark_node
   || xbasetype == NULL_TREE
   || xbasetype == error_mark_node)
@@ -1172,7 +1167,6 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
   rval_binfo = lfi.rval_binfo;
   if (rval_binfo)
 type = BINFO_TYPE (rval_binfo);
-  errstr = lfi.errstr;
 
   /* If we are not interested in ambiguities, don't report them;
  just return NULL_TREE.  */
@@ -1187,6 +1181,19 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
protect = 0;
 }
 
+  if (protect == 1 && lfi.ambiguous)
+{
+  if (complain & tf_error)
+   {
+ error ("request for member %qD is ambiguous", name);
+ print_candidates (lfi.ambiguous);
+   }
+  return error_mark_node;
+}
+
+  if (!rval)
+return NULL_TREE;
+
   /* [class.access]
 
  In the case of overloaded function names, access control is
@@ -1206,8 +1213,7 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
 
 only the first call to "f" is valid.  However, if the function is
 static, we can check.  */
-  if (rval && protect 
-  && !really_overloaded_fn (rval))
+  if (protect && !really_overloaded_fn (rval))
 {
   tree decl = is_overloaded_fn (rval) ? get_first_fn (rval) : rval;
   decl = strip_using_decl (decl);
@@ -1216,21 +1222,10 @@ lookup_member (tree xbasetype, tree name, int protect, 
bool want_type,
  && !DECL_NONSTATIC_MEMBER_FUNCTION_P (decl)
  && !perform_or_defer_access_check (basetype_path, decl, decl,
 complain, afi))
-   rval = error_mark_node;
-}
-
-  if (errstr && protect)
-{
-  if (complain & tf_error)
-   {
- error (errstr, name, type);
- if (lfi.ambiguous)
-   print_candidates (lfi.ambiguous);
-   }
-  rval = error_mark_node;
+   return error_mark_node;
 }
 
-  if (rval && is_overloaded_fn (rval)
+  if (is_overloaded_fn (rval)
   /* Don't use a BASELINK for class-scope deduction guides since
 they're not actually member functions.  */
   && !dguide_name_p (name))
diff --git a/gcc/testsuite/g++.dg/lookup/ambig6.C 
b/gcc/testsuite/g++.dg/lookup/ambig6.C
new file mode 100644
index 000..45e8effba43
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/ambig6.C
@@ -0,0 +1,18 @@
+// PR c++/103177
+
+struct B1 {
+private:
+  static int foo();
+};
+
+struct B2 {
+  int foo();
+};
+
+struct D : B1, B2 { };
+
+void f() {
+  D d;
+  d.foo(); // { dg-error "ambiguous" }
+  // { dg-bogus "private" "" { target *-*-* } .-1 }
+}
-- 
2.35.1.455.g1a4874565f



RE: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Roger Sayle


Hi Jeff,

> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't analogous code for the caller side.
> 
> Similarly while you fix things for arguments in expand_expr_real_1, that's 
> again
> just the callee side.  Don't you need to do something on the caller side too?

I've taken the pragmatic approach for this fix to PR target/104489, that this
patch only needs to modify/fix the parts of the middle-end that are broken.
With this patch, gcc can compile the following with -O2 -misa=sm_80 -ffast-math

_Float16 p;
_Float16 q;
_Float16 r;

_Float16 foo(_Float16 x, _Float16 y)
{
  return x * y;
}

_Float16 mid(_Float16 x, _Float16 y)
{
  return foo(x,y) + foo(y,x);
}

void bar()
{
  p = mid(q,r);
}

which I assume covers all of the paths that I/we need to care about.
Technically, the blocker is that without this patch, GCC's build fails
in libgcc (compiling __mulhc3) when/if HFmode is enabled by default.
I'm hoping any remaining issues, not caught by the current testsuite,
can be handled as regular Bugzilla PRs to be fixed/added to the
testsuite.


Let me if there's anything I've missed or need to worry about.
I believe most PC laptops/desktops contain Nvidia graphics cards, so it's
relatively easy for GCC developers to try things out (on real hardware)
for themselves.
 
Cheers,
Roger
--

> -Original Message-
> From: Jeff Law 
> Sent: 14 March 2022 15:30
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
> 
> 
> 
> On 2/9/2022 1:12 PM, Roger Sayle wrote:
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example, is the nvptx backend where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode, and likewise when going the
> > other way, parameters truncated to HImode then converted to HFmode
> > (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched precision integer/FP
> > conversions should be explicit in the RTL, and these semantics not generally
> visible/implicit in user code.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and on nvptx-none, where it is
> > the middle-end portion of a pair of patches to allow the default ISA
> > to be advanced.  Ok for mainline?
> >
> > 2022-02-09  Roger Sayle  
> >
> > gcc/ChangeLog
> > * cfgexpand.cc (expand_value_return): Allow backends to promote
> > a scalar floating point return value to a wider integer mode.
> > * expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
> > backends to promote scalar FP PARM_DECLs to wider integer modes.
> 
> Buried somewhere in our calling conventions code is the ability to pass around
> BLKmode objects in registers along with the ability to tune left vs right 
> padding
> adjustments.   Much of this support grew out of the PA
> 32 bit SOM ABI.
> 
> While I think we could probably make those bits do what we want, I suspect the
> result will actually be uglier than what you've done here and I wouldn't be
> surprised if there was a performance hit as the code to handle those cases was
> pretty dumb in its implementation.
> 
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't analogous code for the caller side.
> 
> Similarly while you fix things for arguments in expand_expr_real_1, that's 
> again
> just the callee side.  Don't you need to so something on the caller side too?
> 
> Jeff
> 




Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-14 Thread Patrick Palka via Gcc-patches
On Fri, 11 Mar 2022, Jason Merrill wrote:

> On 3/10/22 11:27, Patrick Palka wrote:
> > On Wed, 9 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/1/22 18:08, Patrick Palka wrote:
> > > > A well-formed call to std::move/forward is equivalent to a cast, but the
> > > > former being a function call means it comes with bloated debug info,
> > > > which
> > > > persists even after the call has been inlined away, for an operation
> > > > that
> > > > is never interesting to debug.
> > > > 
> > > > This patch addresses this problem in a relatively ad-hoc way by folding
> > > > calls to std::move/forward into casts as part of the frontend's general
> > > > expression folding routine.  After this patch with -O2 and a
> > > > non-checking
> > > > compiler, debug info size for some testcases decreases by about ~10% and
> > > > overall compile time and memory usage decreases by ~2%.
> > > 
> > > Impressive.  Which testcases?
> > 
> > I saw the largest percent reductions in debug file object size in
> > various tests from cmcstl2 and range-v3, e.g.
> > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
> > (which are among their biggest tests).
> > 
> > Significant reductions in debug object file size can be observed in
> > some libstdc++ testcases too, such as a 5.5% reduction in
> > std/ranges/adaptor/join.cc
> > 
> > > 
> > > Do you also want to handle addressof and as_const in this patch, as
> > > Jonathan
> > > suggested?
> > 
> > Yes, good idea.  Since each of their argument and return types are
> > indirect types, I think we can use the same NOP_EXPR-based folding for
> > them.
> > 
> > > 
> > > I think we can do this now, and think about generalizing more in stage 1.
> > > 
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
> > > > want to consider for GCC 12?
> > > > 
> > > > PR c++/96780
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * cp-gimplify.cc (cp_fold) : When optimizing,
> > > > fold calls to std::move/forward into simple casts.
> > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
> > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/opt/pr96780.C: New test.
> > > > ---
> > > >gcc/cp/cp-gimplify.cc  | 18 ++
> > > >gcc/cp/cp-tree.h   |  2 ++
> > > >gcc/cp/typeck.cc   |  6 ++
> > > >gcc/testsuite/g++.dg/opt/pr96780.C | 24 
> > > >4 files changed, 46 insertions(+), 4 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> > > > 
> > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > index d7323fb5c09..0b009b631c7 100644
> > > > --- a/gcc/cp/cp-gimplify.cc
> > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x)
> > > >  case CALL_EXPR:
> > > >  {
> > > > +   if (optimize
> > > 
> > > I think this should check flag_no_inline rather than optimize.
> > 
> > Sounds good.
> > 
> > Here's a patch that extends the folding to as_const and addressof (as
> > well as __addressof, which I'm kind of unsure about since it's
> > non-standard).  I suppose it also doesn't hurt to verify that the return
> > and argument type of the function are sane before we commit to folding.
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
> > 
> > A well-formed call to std::move/forward is equivalent to a cast, but the
> > former being a function call means the compiler generates debug info for
> > it, which persists even after the call has been inlined away, for an
> > operation that's never interesting to debug.
> > 
> > This patch addresses this problem in a relatively ad-hoc way by folding
> > calls to std::move/forward and other cast-like functions into simple
> > casts as part of the frontend's general expression folding routine.
> > After this patch with -O2 and a non-checking compiler, debug info size
> > for some testcases decreases by about ~10% and overall compile time and
> > memory usage decreases by ~2%.
> > 
> > PR c++/96780
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold) : When optimizing,
> > fold calls to std::move/forward and other cast-like functions
> > into simple casts.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/opt/pr96780.C: New test.
> > ---
> >   gcc/cp/cp-gimplify.cc  | 36 +++-
> >   gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++
> >   2 files changed, 73 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index d7323fb5c09..efc4c8f0eb9 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -2756,9 +2756,43 @@ cp_fold (tree x)
> > case CALL_EXPR:

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-03-14 at 16:04 +, Richard Sandiford wrote:

> Xi Ruoyao  writes:

> > Now I think the only rational ways are:
> > 
> > (1) allow zeroing more registers than need_zeroed_hardregs.
> 
> I think this is the way to go.  I agree it's a bit hacky, but it seems
> like the least worst option.

> > (2) allow zeroing less registers than need_zeroed_hardregs (then I'll
> > skip ST_REGS, after all they are just 8 bits in total).
> 
> Yeah, this is explicitly OK, provided that the target maintainers
> feel that the contents of the registers in question are not a significant
> security concern.  I don't feel I can make that call though.  It's really
> a question for the userbase.

To me, I believe nobody will write some security-critical code depending
on a floating-point compare result :).  Maybe I'm wrong by mixing up the
concept of "security" and "safety".
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[pushed] Enable libsanitizer build on mips64

2022-03-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-03-14 at 16:18 +, Richard Sandiford wrote:
> Xi Ruoyao  writes:

> > * configure.tgt: Enable build on mips64.

I changed this to "mips*64*-*-linux*" to be more precise.

> OK, thanks.
> 
> Richard

Two patches pushed as r12-7645 and r12-7646, with the minor changelog
modification above.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] libgomp : OMPD implementation

2022-03-14 Thread Jakub Jelinek via Gcc-patches
Hi!

Sorry for the delay, GCC is currently in stage 4, which means a lot of us
concentrate on fixing GCC 12 so that it can be released soon and projects
that are clearly GCC 13 material are much lower priority.

On Wed, Feb 16, 2022 at 11:04:13PM +0200, Mohamed Atef via Gcc-patches wrote:
> --- a/libgomp/ChangeLog
> +++ b/libgomp/ChangeLog
> @@ -1,3 +1,30 @@

ChangeLog entries should be posted in the patches and eventually should be
in the commit message, but the patch shouldn't change ChangeLog files
directly, we have nightly scripts that handle that.

> +2022-02-16  Mohamed Atef  
> +
> +* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.

For ChangeLog entries, https://gcc.gnu.org/contribute.html specifies
how it should be formatted.
E.g. there should be a tab instead of 8 spaces at the start of the lines
(except for the date/name/email line), the above has spaces.

> +(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
> +libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
> +libgompd.ver-sun, libgompd.ver, libgompd_version_info): Defined. 

No trailing whitespace.  I think it would be better to use : New.

> +* Makefile.in: Regenerate.
> +* aclocal.m4: Regenerate.
> +* config/darwin/plugin-suffix.h: Removed ().
> +* config/hpux/plugin-suffix.h: Removed ().
> +* config/posix/plugin-suffix.h: Removed ().

These don't say what has changed, and the tense is wrong.  So should be
like:
* config/posix/plugin-suffix.h (SONAME_SUFFIX): Remove ()s.
or so.

> +* configure: Regenerate.
> +* env.c: (#include "ompd-support.h") : Added.

The above correctly mentions just the filename and not a particular thing
in it for the added include, but the rest is incorreclty formatted.
Should be:
* env.c: Include ompd-support.h
or so.

> +(initialize_env) : Call ompd_load().

There should be no space in between ) and :.  Just say
(initialize_env): Call ompd_load.
No need for the ()s in there.

> +* parallel.c:(#include "ompd-support.h"): Added.

See above.

> +(GOMP_parallel) : Call ompd_bp_parallel_begin and 
> ompd_bp_parallel_end.
> +* libgomp.map: Add OMP_5.0.3 symobl versions.
> +* libgompd.map: New file.
> +* omp-tools.h.in : New file.
> +* omp-types.h.in : New file.
> +* ompd-support.h : New file.
> +* ompd-support.c : New file.
> +* ompd-helper.h : New file.

See above for the superfluous spaces before :.

> +* ompd-helper.c: New file.
> +* ompd-init.c: New file.
> +* testsuite/Makfile.in: Regenerate.

Typo, Makefile.in instead of Makfile.in (the checking script would prevent
commit because of this).

> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
...
> +libgompd_version_info = -version-info $(libtool_VERSION)
>  libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
>  $(lt_host_flags)
> +libgompd_la_LDFLAGS = $(libgompd_version_info) $(libgompd_version_script) \
> + $(lt_host_flags)
>  libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
> +libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
>  libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
> -
> +libgompd_la_LINK = $(LINK) $(libgompd_la_LDFLAGS)

Please preserve the empty line above libgomp_la_SOURCES.

> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -1,7 +1,7 @@
> -# Makefile.in generated by automake 1.15.1 from Makefile.am.
> +# Makefile.in generated by automake 1.16.1 from Makefile.am.

Please make sure you regenerate files with the exact autoconf/automake
versions as used previously, we don't want the generated files
jump backwards/forwards whenever somebody regenerates them.
You can e.g. build automake 1.15.1 and autoconf 2.69 if you don't have
those, make install DESTDIR=/your/home/automake-1.15.1 etc.
and then just regenerate with those in $PATH.
> --- a/libgomp/env.c
> +++ b/libgomp/env.c
> @@ -1483,6 +1484,8 @@ initialize_env (void)
>   = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
>  }
>parse_int_secure ("GOMP_DEBUG", _debug_var, true);
> +  if(gomp_debug_var)
> + ompd_load();

Formatting.  Again, https://gcc.gnu.org/contribute.html contains
the rules or just watch how does surrounding code look like.
The errors are:
1) space in between if and ( is missing
2) ompd_load should be indented 2 further columns from if (and
   any consecutive 8 spaces at the start of lines replaced with
   tab in *.c/*.C files)
3) space between ompd_load and ( is missing

>  #ifndef HAVE_SYNC_BUILTINS
>gomp_mutex_init (_managed_threads_lock);
>  #endif
> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
> index 2ac58094169..5c57b1a2d08 100644
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -226,6 +226,16 @@ OMP_5.1 {
>   omp_get_teams_thread_limit_;
>  } OMP_5.0.2;
>  
> +OMP_5.0.3 {
> +  

Re: [PATCH 2/2] Enable libsanitizer build on mips64

2022-03-14 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao  writes:
> Bootstrapped and regtested on mips64el-linux-gnuabi64.
>
> bootstrap-ubsan revealed 3 bugs (PR 104842, 104843, 104851).
> bootstrap-asan did not reveal any new bug.
>
> gcc/
>
>   * config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Define.
>   * config/mips/mips.cc (mips_option_override): Make
>   -fsanitize=address imply -fasynchronous-unwind-tables.  This is
>   needed by libasan for stack backtrace on MIPS.
>   (mips_asan_shadow_offset): Return SUBTARGET_SHADOW_OFFSET.
>
> gcc/testsuite:
>
>   * c-c++-common/asan/global-overflow-1.c: Skip for MIPS with some
>   optimization levels because inaccurate debug info is causing
>   dg-output mismatch on line numbers.
>   * g++.dg/asan/large-func-test-1.C: Likewise.
>
> libsanitizer/
>
>   * configure.tgt: Enable build on mips64.

OK, thanks.

Richard

> ---
>  gcc/config/mips/mips.cc | 9 -
>  gcc/config/mips/mips.h  | 7 +++
>  gcc/testsuite/c-c++-common/asan/global-overflow-1.c | 1 +
>  gcc/testsuite/g++.dg/asan/large-func-test-1.C   | 1 +
>  libsanitizer/configure.tgt  | 4 
>  5 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 59eef515826..6b06c6380f6 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -19974,6 +19974,13 @@ mips_option_override (void)
>   target_flags |= MASK_64BIT;
>  }
>  
> +  /* -fsanitize=address needs to turn on -fasynchronous-unwind-tables in
> + order for tracebacks to be complete but not if any
> + -fasynchronous-unwind-table were already specified.  */
> +  if (flag_sanitize & SANITIZE_USER_ADDRESS
> +  && !global_options_set.x_flag_asynchronous_unwind_tables)
> +flag_asynchronous_unwind_tables = 1;
> +
>if ((target_flags_explicit & MASK_FLOAT64) != 0)
>  {
>if (mips_isa_rev >= 6 && !TARGET_FLOAT64)
> @@ -22591,7 +22598,7 @@ mips_constant_alignment (const_tree exp, 
> HOST_WIDE_INT align)
>  static unsigned HOST_WIDE_INT
>  mips_asan_shadow_offset (void)
>  {
> -  return 0x0aaa;
> +  return SUBTARGET_SHADOW_OFFSET;
>  }
>  
>  /* Implement TARGET_STARTING_FRAME_OFFSET.  See mips_compute_frame_info
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 0029864fdcd..858bbba3a36 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -3463,3 +3463,10 @@ struct GTY(())  machine_function {
> && !TARGET_MICROMIPS && !TARGET_FIX_24K)
>  
>  #define NEED_INDICATE_EXEC_STACK 0
> +
> +/* Define the shadow offset for asan. Other OS's can override in the
> +   respective tm.h files.  */
> +#ifndef SUBTARGET_SHADOW_OFFSET
> +#define SUBTARGET_SHADOW_OFFSET \
> +  (POINTER_SIZE == 64 ? HOST_WIDE_INT_1 << 37 : HOST_WIDE_INT_C (0x0aaa))
> +#endif
> diff --git a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c 
> b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
> index 1092a316681..ec412231be0 100644
> --- a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
> +++ b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
> @@ -22,6 +22,7 @@ int main() {
>return res;
>  }
>  
> +/* { dg-skip-if "inaccurate debug info" { mips*-*-* } { "*" } { "-O0" } } */
>  /* { dg-output "READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
>  /* { dg-output "#0 0x\[0-9a-f\]+ +(in _*main 
> (\[^\n\r]*global-overflow-1.c:20|\[^\n\r]*:0|\[^\n\r]*\\+0x\[0-9a-z\]*)|\[(\])\[^\n\r]*(\n|\r\n|\r).*"
>  } */
>  /* { dg-output "0x\[0-9a-f\]+ is located 0 bytes to the right of global 
> variable" } */
> diff --git a/gcc/testsuite/g++.dg/asan/large-func-test-1.C 
> b/gcc/testsuite/g++.dg/asan/large-func-test-1.C
> index b42c09e3b0d..ac9deb898c8 100644
> --- a/gcc/testsuite/g++.dg/asan/large-func-test-1.C
> +++ b/gcc/testsuite/g++.dg/asan/large-func-test-1.C
> @@ -35,6 +35,7 @@ int main() {
>delete x;
>  }
>  
> +// { dg-skip-if "inaccurate debug info" { mips*-*-* } { "-Os" } { "" }  }
>  // { dg-output "ERROR: AddressSanitizer:? heap-buffer-overflow on 
> address\[^\n\r]*" }
>  // { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
> 0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" }
>  // { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
> T0\[^\n\r]*(\n|\r\n|\r)" }
> diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
> index 5a59ea6a1b5..fb89df4935c 100644
> --- a/libsanitizer/configure.tgt
> +++ b/libsanitizer/configure.tgt
> @@ -54,10 +54,6 @@ case "${target}" in
>   ;;
>arm*-*-linux*)
>   ;;
> -  mips*64*-*-linux*)
> - # This clause is only here to not match the supported mips*-*-linux*.
> - UNSUPPORTED=1
> - ;;
>mips*-*-linux*)
>   ;;
>aarch64*-*-linux*)


Re: [PATCH 1/2] libsanitizer: cherry-pick db7bca28638e from upstream

2022-03-14 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao  writes:
> libsanitizer/
>
>   * sanitizer_common/sanitizer_atomic_clang.h: Ensures to only
>   include sanitizer_atomic_clang_mips.h for O32.

OK, thanks.

Richard

> ---
>  libsanitizer/sanitizer_common/sanitizer_atomic_clang.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h 
> b/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
> index fc13ca52dda..ccf18f0786d 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
> @@ -96,8 +96,8 @@ inline bool atomic_compare_exchange_weak(volatile T *a,
>  // This include provides explicit template instantiations for atomic_uint64_t
>  // on MIPS32, which does not directly support 8 byte atomics. It has to
>  // proceed the template definitions above.
> -#if defined(_MIPS_SIM) && defined(_ABIO32)
> -  #include "sanitizer_atomic_clang_mips.h"
> +#if defined(_MIPS_SIM) && defined(_ABIO32) && _MIPS_SIM == _ABIO32
> +#  include "sanitizer_atomic_clang_mips.h"
>  #endif
>  
>  #undef ATOMIC_ORDER


Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-14 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response, was out for a few days.

Xi Ruoyao  writes:
> On Sat, 2022-03-12 at 18:48 +0800, Xi Ruoyao via Gcc-patches wrote:
>> On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote:
>> > Hi, Ruoyao,
>> > 
>> > (I might not be able to reply to this thread till next Wed due to a
>> > short vacation).
>> > 
>> > First, some comments on opening bugs against Gcc:
>> > 
>> > I took a look at the bug reports PR104817 and PR104820:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104820
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104817
>> > 
>> > I didn’t see a testing case and a script to repeat the error, so I
>> > cannot repeat the error at my side.
>> 
>> I've put the test case, but maybe you didn't see it because it is too
>> simple:
>> 
>> echo 'int t() {}' | /home/xry111/git-repos/gcc-test-mips/gcc/cc1 -
>> nostdinc -fzero-call-used-regs=all
>> 
>> An empty function is enough to break -fzero-call-used-regs=all.  And
>> if
>> you append -mips64r2 to the cc1 command line you'll get 104820.  I
>> enabled 4 existing tests for MIPS (reported "not work" on MIPS) in the
>> patch so I think it's unnecessary to add new test cases.
>> 
>> Richard: can we use MIPS_EPILOGUE_TEMP as a scratch register in the
>> sequence for zeroing the call-used registers, and then zero itself
>> (despite it's not in need_zeroed_hardregs)?
>
> No, it leads to an ICE at stage 3 bootstrapping :(.
>
> Now I think the only rational ways are:
>
> (1) allow zeroing more registers than need_zeroed_hardregs.

I think this is the way to go.  I agree it's a bit hacky, but it seems
like the least worst option.

A less hacky alternative would be to pass an extra argument to the hook
that contains the set of registers that the hook is *allowed* to clobber.
For -fzero-call-used-regs=X, this new argument would be the set that
would have been chosen for -fzero-call-used-regs=all, regardless of
what X actually is.  We could then assert that the extra registers we
want to clobber are in that set (which will be true for all values of X).

> Or
>
> (2) allow zeroing less registers than need_zeroed_hardregs (then I'll
> skip ST_REGS, after all they are just 8 bits in total).

Yeah, this is explicitly OK, provided that the target maintainers
feel that the contents of the registers in question are not a significant
security concern.  I don't feel I can make that call though.  It's really
a question for the userbase.

Thanks,
Richard

> If all these are unacceptable, then
>
> (3) I'll just call sorry in MIPS target hook to tell not to use this
> feature on MIPS.


Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener via Gcc-patches  writes:
>> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
>> >>
>> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
>> >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
>> >> >  wrote:
>> >> > >
>> >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold 
>> >> > > memcpy.
>> >> > > The default is
>> >> > >
>> >> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
>> >> > >
>> >> > > For x86, it is MOVE_MAX to restore the old behavior before
>> >> >
>> >> > I know we've discussed this to death in the PR, I just want to repeat 
>> >> > here
>> >> > that the GIMPLE folding expects to generate a single load and a single
>> >> > store (that is what it does on the GIMPLE level) which is why MOVE_MAX
>> >> > was chosen originally (it's documented to what a "single instruction" 
>> >> > does).
>> >> > In practice MOVE_MAX does not seem to cover vector register sizes
>> >> > so Richard pulled MOVE_RATIO which is really intended to cover
>> >> > the case of using multiple instructions for moving memory (but then I
>> >> > don't remember whether for the ARM case the single load/store GIMPLE
>> >> > will be expanded to multiple load/store instructions).
>> >> >
>> >> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
>> >> > being very specific for memcpy folding (we also fold memmove btw).
>> >> >
>> >> > There is also MOVE_MAX_PIECES which _might_ be more appropriate
>> >> > than MOVE_MAX here and still honor the idea of single instructions.
>> >> > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
>> >> > not MOVE_MAX * MOVE_RATIO.
>> >> >
>> >> > So if we need a new hook then that hook should at least get the
>> >> > 'speed' argument of MOVE_RATIO and it should get a better name.
>> >> >
>> >> > I still think that it should be possible to improve the insn check to
>> >> > avoid use of "disabled" modes, maybe that's also a point to add
>> >> > a new hook like .move_with_mode_p or so?  To quote, we do
>> >>
>> >> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
>> >
>> > Again I'd like to shine light on MOVE_MAX_PIECES which explicitely
>> > mentions "a load or store used TO COPY MEMORY" (emphasis mine)
>> > and whose x86 implementation would already be fine (doing larger moves
>> > and also not doing too large moves).  But appearantly the arm folks
>> > decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO.
>>
>> It seems like there are old comments and old documentation that justify
>> both interpretations, so there are good arguments on both sides.  But
>> with this kind of thing I think we have to infer the meaning of the
>> macro from the way it's currently used, rather than trusting such old
>> and possibly out-of-date and contradictory information.
>>
>> FWIW, I agree that (if we exclude old reload, which we should!) the
>> only direct uses of MOVE_MAX before the patch were not specific to
>> integer registers and so MOVE_MAX should include vectors if the
>> target wants vector modes to be used for general movement.
>>
>> Even if people disagree that that's the current meaning, I think it's
>> at least a sensible meaning.  It provides information that AFAIK isn't
>> available otherwise, and it avoids overlap with MAX_FIXED_MODE_SIZE.
>>
>> So FWIW, I think it'd be reasonable to change non-x86 targets if they
>> want vector modes to be used for single-insn copies.
>
> Note a slight complication in the GIMPLE folding case is that we
> do not end up using vector modes but we're using "fake"
> integer modes like OImode which x86 has move patterns for.
> If we'd use vector modes we could use existing target hooks to
> eventually decide whether auto-using those is desired or not.

Hmm, yeah.  Certainly we shouldn't require the target to support
a scalar integer equivalent of every vector mode.

Richard


Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-03-14 Thread Marcel Vollweiler

Hi Tobias,


Minor remark to the test:

On 11.03.22 13:30, Marcel Vollweiler wrote:

+  int d = omp_get_default_device ();

...

+  int shared_mem = 0;
+  #pragma omp target map (alloc: shared_mem) device (d)
+shared_mem = 1;
+  if (omp_target_is_accessible (p, sizeof (int), d) != shared_mem)
+__builtin_abort ();


I wonder whether it makes sense to do instead
   for (d = 0; d <= omp_get_num_devices(); ++d)
instead of just
   d = omp_get_default_device();
given that we have already found once in a while bugs when testing more
than just the default device - be it because devices differed or because
'0' was special.

In particular, I could image having at the same time two or three devices
available of type intelmic + gcn + nvptx, possibly mixing shared memory,
nonshared memory and semi-shared memory*


Good hint, thanks. I updated the C(++) and Fortran tests accordingly and
attached the updated patch.

Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

gcc/ChangeLog:

* omp-low.cc (omp_runtime_api_call): Added target_is_accessible to
omp_runtime_apis array.

libgomp/ChangeLog:

* libgomp.map: Added omp_target_is_accessible.
* libgomp.texi: Tagged omp_target_is_accessible as supported.
* omp.h.in: Added omp_target_is_accessible.
* omp_lib.f90.in: Added interface for omp_target_is_accessible.
* omp_lib.h.in: Likewise.
* target.c (omp_target_is_accessible): Added implementation of
omp_target_is_accessible.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: New test.
* testsuite/libgomp.fortran/target-is-accessible-1.f90: New test.

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 77176ef..bf38fad 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -3959,6 +3959,7 @@ omp_runtime_api_call (const_tree fndecl)
   "target_associate_ptr",
   "target_disassociate_ptr",
   "target_free",
+  "target_is_accessible",
   "target_is_present",
   "target_memcpy",
   "target_memcpy_rect",
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac5809..1764380 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -226,6 +226,11 @@ OMP_5.1 {
omp_get_teams_thread_limit_;
 } OMP_5.0.2;
 
+OMP_5.1.1 {
+  global:
+   omp_target_is_accessible;
+} OMP_5.1;
+
 GOMP_1.0 {
   global:
GOMP_atomic_end;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 161a423..58e432c 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -311,7 +311,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
   @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
   routines @tab Y @tab
-@item @code{omp_target_is_accessible} runtime routine @tab N @tab
+@item @code{omp_target_is_accessible} runtime routine @tab Y @tab
 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
   runtime routines @tab N @tab
 @item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 89c5d65..1ec7415 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -282,6 +282,8 @@ extern int omp_target_memcpy_rect (void *, const void *, 
__SIZE_TYPE__, int,
 extern int omp_target_associate_ptr (const void *, const void *, __SIZE_TYPE__,
 __SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern int omp_target_disassociate_ptr (const void *, int) __GOMP_NOTHROW;
+extern int omp_target_is_accessible (const void *, __SIZE_TYPE__, int)
+  __GOMP_NOTHROW;
 
 extern void omp_set_affinity_format (const char *) __GOMP_NOTHROW;
 extern __SIZE_TYPE__ omp_get_affinity_format (char *, __SIZE_TYPE__)
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index daf40dc..f369507 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -835,6 +835,16 @@
   end function omp_target_disassociate_ptr
 end interface
 
+interface
+  function omp_target_is_accessible (ptr, size, device_num) bind(c)
+use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
+integer(c_int) :: omp_target_is_accessible
+type(c_ptr), value :: ptr
+integer(c_size_t), value :: size
+integer(c_int), value :: device_num
+  end function omp_target_is_accessible
+end interface
+
 #if _OPENMP >= 201811
 !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
 #endif
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index ff857a4..5ea0366 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -416,3 +416,14 @@
   

Re: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> Hi Richard,
>> Yes, which is why I think the target should claim argument passing happens
> in reg:HI.
>
> Unfortunately, this hits another "feature" of the nvptx backend; it's a 
>
> /* Implement TARGET_MODES_TIEABLE_P.  */
> bool nvptx_modes_tieable_p (machine_mode, machine_mode) { return false; }
> /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> bool nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t) {
> return false; }
> /* Implement TARGET_TRULY_NOOP_TRUNCATION.  */
> bool nvptx_truly_noop_truncation (poly_uint64, poly_uint64) { return false;
> }
>
> Basically, HImode is considered as a different register (bank) to SImode,
> and requires
> explicit move instructions to move data from HImode to SImode, and back,
> unlike
> most targets that can simply re-interpret the contents of GPRs.
>
> Passing an argument as HImode would be incompatible with the requirement to
> pass as SImode.

Ah, OK, this was the bit I was missing from all of this.

Like Richard says, how important is the zero-extension
vs. sign-extension thing?  Are callers required to zero-extend,
and can callees rely on that?  Or does the ABI treat the upper
bits as undefined, so that callees can't rely on their contents?

If the bits are undefined, have you tried making the ABI code return:

  (parallel:HF [(expr_list (reg:SI rN) (const_int 0))])

It's admittedly unusual to have the register be *bigger* than the
value, but supporting that feels less hacky to me than the current
approach.

Thanks,
Richard


Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Jeff Law via Gcc-patches




On 2/9/2022 1:12 PM, Roger Sayle wrote:

This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example, is the nvptx backend where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?) sanity
checks that (very sensibly) prevent creating paradoxical SUBREGs of
floating point registers.  The approach below is to explicitly perform the
conversion/promotion in two steps, via an integer mode of same precision
as the floating point value.  So on nvptx, 16-bit HFmode is initially
converted to 16-bit HImode (using SUBREG), then zero-extended to SImode,
and likewise when going the other way, parameters truncated to HImode
then converted to HFmode (using SUBREG).  These changes are localized
to expand_value_return and expanding DECL_RTL to support strange ABIs,
rather than inside convert_modes or gen_lowpart, as mismatched
precision integer/FP conversions should be explicit in the RTL,
and these semantics not generally visible/implicit in user code.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and on nvptx-none, where it is
the middle-end portion of a pair of patches to allow the default ISA to
be advanced.  Ok for mainline?

2022-02-09  Roger Sayle  

gcc/ChangeLog
* cfgexpand.cc (expand_value_return): Allow backends to promote
a scalar floating point return value to a wider integer mode.
* expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
backends to promote scalar FP PARM_DECLs to wider integer modes.


Buried somewhere in our calling conventions code is the ability to pass 
around BLKmode objects in registers along with the ability to tune left 
vs right padding adjustments.   Much of this support grew out of the PA 
32 bit SOM ABI.


While I think we could probably make those bits do what we want, I 
suspect the result will actually be uglier than what you've done here 
and I wouldn't be surprised if there was a performance hit as the code 
to handle those cases was pretty dumb in its implementation.


What I find rather surprising is the location of your changes -- they 
feel incomplete.  For example, you fix the callee side of returns in 
expand_value_return, but I don't analogous code for the caller side.


Similarly while you fix things for arguments in expand_expr_real_1, 
that's again just the callee side.  Don't you need to so something on 
the caller side too?


Jeff




Re: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Jeff Law via Gcc-patches




On 3/14/2022 2:39 AM, Roger Sayle wrote:


The only (other?) possible point of contention is the (arbitrary) decision that
when passing floating point values in a larger integer register, the code is
hardwired to use zero-extension.  This in theory could be turned into a target
hook to support sign-extension, but given these are padding bits, zero seems
appropriate. [On x86_64, if passing DFmode argument in a V2DFmode vector,
say, it seems reasonable to use movq and zero the high bits].

I'm not terribly concerned about the zero vs sign extension.



The final point is that at the moment, the only affected target is nvptx-none,
as I don't believe any other backend specifies an ABI that requires passing
floating point values in wider integer registers.  Having said that, most 
targets
don't yet support HFmode, and their ABI specifications haven't yet been
updated as what to do in that eventuality.  If they choose to treat these like
HImode, they'll run into the same issues, as most ABIs pass HImode in
wider word_mode registers.
The plan for our target has been to pass around HFmode in the GPF 
registers, but in general HFmode is new enough that not many targets 
have them handled explicitly in their ABI.


Passing HFmode objects around in GPRs is odd, but not crazy so. We've 
certainly had targets where FP values would get passed around in the 
integer registers -- for example the 32 bit PA ABI mandated this kind of 
behavior in some circumstances.  Of course they also relied on the 
linker to look at the caller & callee and add a stub in between them to 
deal with mismatches :(


jeff



Re: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Jeff Law via Gcc-patches




On 3/14/2022 3:09 AM, Richard Biener wrote:

On Mon, 14 Mar 2022, Roger Sayle wrote:


I thought I'd add a few comments that might help reviewers of this patch.
Most importantly, this patch should be completely safe, as both changes
only trigger with FLOAT vs INT size mismatches which currently both ICE in
the compiler a few lines later.  Admittedly, this indicates something odd
about a target's choice of ABI, but one alternative might be to issue a
"sorry, unimplemented" error message rather than ICE, perhaps with a
TODO or FIXME comment that support for mixed size FP/integer ABIs be
added in future.

The only (other?) possible point of contention is the (arbitrary) decision that
when passing floating point values in a larger integer register, the code is
hardwired to use zero-extension.  This in theory could be turned into a target
hook to support sign-extension, but given these are padding bits, zero seems
appropriate. [On x86_64, if passing DFmode argument in a V2DFmode vector,
say, it seems reasonable to use movq and zero the high bits].

The final point is that at the moment, the only affected target is nvptx-none,
as I don't believe any other backend specifies an ABI that requires passing
floating point values in wider integer registers.  Having said that, most 
targets
don't yet support HFmode, and their ABI specifications haven't yet been
updated as what to do in that eventuality.  If they choose to treat these like
HImode, they'll run into the same issues, as most ABIs pass HImode in
wider word_mode registers.

I hope this helps.  If folks could air their concerns out loud, I can try my 
best
to address those worries.

First of all I'm not familiar with the ABI related code as all, so I
refrained from commenting.  But now I've looked closer (still unfamiliar
code).

I suppose there's targets passing SFmode in a SImode GPR and DFmode
in a DImode GPR (all soft-float targets?), so that already works somehow.
Why does nvptx choose DImode for SFmode passing then?  Can't it choose
(subreg:SI di:..) or so?  Does it require zero-extending to DImode
on the caller side?  It seems your expand_expr_real_1 code does
not rely on that?  So, why does nvptx function_arg hook (?) insist
on returning a DImode reg for an SFmode argument rather than
an SImode subreg of that?
I think so.  V850 for example might follow this pattern, but those 
extensions came in after the original port, so I'm not 100% sure.


I'd hazard a guess DI is a better match than SI simply because nvptx is 
a 64bit target.  I'd be surprised if sign vs zero extension is actually 
important.  It may be following from existing conventions on the port.


jeff



RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, 14 Mar 2022, Roger Sayle wrote:

> 
> Hi Richard,
> > Yes, which is why I think the target should claim argument passing happens
> in reg:HI.
> 
> Unfortunately, this hits another "feature" of the nvptx backend; it's a 
> 
> /* Implement TARGET_MODES_TIEABLE_P.  */
> bool nvptx_modes_tieable_p (machine_mode, machine_mode) { return false; }
> /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> bool nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t) {
> return false; }
> /* Implement TARGET_TRULY_NOOP_TRUNCATION.  */
> bool nvptx_truly_noop_truncation (poly_uint64, poly_uint64) { return false;
> }
> 
> Basically, HImode is considered as a different register (bank) to SImode,
> and requires
> explicit move instructions to move data from HImode to SImode, and back,
> unlike
> most targets that can simply re-interpret the contents of GPRs.
> 
> Passing an argument as HImode would be incompatible with the requirement to
> pass as SImode.

OK - I'm out of good suggestions but it somehow feels the ugly details
should hide inside the backend somehow.  Just not sure how.

Richard.

> Cheers,
> Roger
> --
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: 14 March 2022 13:28
> > To: Roger Sayle 
> > Cc: 'Tom de Vries' ; 'Jeff Law' ;
> > 'Tobias Burnus' ; richard.sandif...@arm.com;
> 'Jeff
> > Law' ; gcc-patches@gcc.gnu.org
> > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> values as
> > wider integers.
> > 
> > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > 
> > >
> > > Hi Richard,
> > >
> > > > The above seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> > >
> > > Line 949 of emit-rtl.cc in the definition of "validate_subreg" contains:
> > >
> > >   /* Subregs involving floating point modes are not allowed to
> > >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > >  (subreg:SI (reg:DF) 0) isn't.  */
> > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > {
> > >   if (! (known_eq (isize, osize)
> > >
> > > Hence, we ICE on line 1022 of emit-rtl.cc in gen_rtx_SUBREG, that
> > > asserts validate_subreg, when called with omode=HFmode and
> > imode=SImode.
> > 
> > Yes, which is why I think the target should claim argument passing happens
> in
> > reg:HI.  But as said, I'm not very familiar with argument passing
> internals.  But
> > still the patch looks quite bolted on.
> > 
> > For the expand_value_return I have to guess that 'val' has HFmode, the
> > DECL_RESULT also has HFmode(?), promote_function_mode then returns
> > SImode which isn't supported.  So make it return HImode?
> > 
> > > Cheers,
> > > Roger
> > > --
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: 14 March 2022 10:15
> > > > To: Roger Sayle 
> > > > Cc: 'Tom de Vries' ; 'Jeff Law'
> > > > ; 'Tobias Burnus' ;
> > > > richard.sandif...@arm.com;
> > > 'Jeff
> > > > Law' ; gcc-patches@gcc.gnu.org
> > > > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> > > values as
> > > > wider integers.
> > > >
> > > > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > > >
> > > > >
> > > > > Hi Richard,
> > > > > Thanks for asking.
> > > > > The issue is with 16-bit floating point HFmode, where clang on
> > > > > nvptx passes HFmode values in SImode registers, and for binary
> > > > > compatibility GCC needs to do the same.
> > > > > Their motivation is that for compatibility with older GPUs and the
> > > > > x86_64 host, HFmode parameters are treated like "unsigned int".
> > > > > Hence, libgcc functions like __sqrthf behave as though declared
> > > > > (are binary compatible with) "unsigned short __sqrthf(unsigned
> short)".
> > > > > As you point out, this also greatly simplifies soft-float, as both
> > > > > ABIs remain the same.
> > > > >
> > > > > Alas, the subreg approach doesn't work as we'd end up with
> > > > > (subreg:SI (subreg:HI (reg:HF)), though technically that is what
> > > > > this patch does, inserting a pair of conversion instructions.
> > > >
> > > > So what does FUNCTION_ARG return for the HFmode parameters?  The
> > > > above seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> > > >
> > > > Or are we talking about the caller side?  Or the return value case
> > > > (in
> > > which case I
> > > > ask the same question wrt FUNCTION_VALUE)?
> > > >
> > > > Does nvptx use the promotion hooks in those cases?
> > > >
> > > > > Until recently (subreg:SI (reg:HF)) did work, but that support was
> > > > > removed/cleaned-up,
> > > > > as we need sensible subreg semantics in the RTL passes.   The
> proposed
> > > patch
> > > > > simply
> > > > > adds support back in the minimal places where changing FP/non-FP
> > > > > and precision at the same time is required: argument passing and
> > > > > return values.
> > > > >
> > > > > Hopefully this answers your question.  An alternate viewpoint
> > > > > might be "is there a reason for GCC not to be able to support
> > > > > targets 

Re: [PATCH] x86: Ignore OPTION_MASK_ISA_64BIT for -m32 when inlining

2022-03-14 Thread H.J. Lu via Gcc-patches
On Mon, Mar 14, 2022 at 7:03 AM Jakub Jelinek  wrote:
>
> On Mon, Mar 14, 2022 at 06:49:01AM -0700, H.J. Lu via Gcc-patches wrote:
> > gcc/
> >
> >   PR target/104890
> >   * config/i386/i386.cc (ix86_can_inline_p): Ignore
> >   OPTION_MASK_ISA_64BIT for -m32.
> >
> > gcc/testsuite/
> >
> >   PR target/104890
> >   * gcc.target/i386/pr104890.c: New test.
> > ---
> >  gcc/config/i386/i386.cc  |  4 
> >  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
> >  2 files changed, 15 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 23bedea92bd..f2bb4765e5b 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -587,6 +587,10 @@ ix86_can_inline_p (tree caller, tree callee)
> >if (TARGET_GENERAL_REGS_ONLY_P (callee_opts->x_ix86_target_flags))
> >  always_inline_safe_mask |= MASK_80387;
> >
> > +  /* Ignore OPTION_MASK_ISA_64BIT for -m32.  */
> > +  if (!TARGET_64BIT)
> > +always_inline_safe_mask |= OPTION_MASK_ISA_64BIT;
>
> This looks wrong.  (1 << 1) in alwyas_inline_safe_mask and
> *->x_target_flags is MASK_80387, not OPTION_MASK_ISA_64BIT.
>
> Jakub

Here is the v2 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591716.html

-- 
H.J.


[PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-14 Thread H.J. Lu via Gcc-patches
Push target("general-regs-only") in  if x87 is enabled.

gcc/

PR target/104890
* config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
pushing target("general-regs-only").

gcc/testsuite/

PR target/104890
* gcc.target/i386/pr104890.c: New test.
---
 gcc/config/i386/x86gprintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c

diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h
index 017ec299793..e0be01d5e78 100644
--- a/gcc/config/i386/x86gprintrin.h
+++ b/gcc/config/i386/x86gprintrin.h
@@ -24,7 +24,7 @@
 #ifndef _X86GPRINTRIN_H_INCLUDED
 #define _X86GPRINTRIN_H_INCLUDED
 
-#if defined __MMX__ || defined __SSE__
+#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
 #pragma GCC push_options
 #pragma GCC target("general-regs-only")
 #define __DISABLE_GENERAL_REGS_ONLY__
diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
b/gcc/testsuite/gcc.target/i386/pr104890.c
new file mode 100644
index 000..cb430eef688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104890.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mshstk -march=i686" } */
+
+#include 
+
+__attribute__((target ("general-regs-only")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
-- 
2.35.1



RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Roger Sayle


Hi Richard,
> Yes, which is why I think the target should claim argument passing happens
in reg:HI.

Unfortunately, this hits another "feature" of the nvptx backend; it's a 

/* Implement TARGET_MODES_TIEABLE_P.  */
bool nvptx_modes_tieable_p (machine_mode, machine_mode) { return false; }
/* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
bool nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t) {
return false; }
/* Implement TARGET_TRULY_NOOP_TRUNCATION.  */
bool nvptx_truly_noop_truncation (poly_uint64, poly_uint64) { return false;
}

Basically, HImode is considered as a different register (bank) to SImode,
and requires
explicit move instructions to move data from HImode to SImode, and back,
unlike
most targets that can simply re-interpret the contents of GPRs.

Passing an argument as HImode would be incompatible with the requirement to
pass as SImode.

Cheers,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 14 March 2022 13:28
> To: Roger Sayle 
> Cc: 'Tom de Vries' ; 'Jeff Law' ;
> 'Tobias Burnus' ; richard.sandif...@arm.com;
'Jeff
> Law' ; gcc-patches@gcc.gnu.org
> Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
values as
> wider integers.
> 
> On Mon, 14 Mar 2022, Roger Sayle wrote:
> 
> >
> > Hi Richard,
> >
> > > The above seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> >
> > Line 949 of emit-rtl.cc in the definition of "validate_subreg" contains:
> >
> >   /* Subregs involving floating point modes are not allowed to
> >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >  (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > {
> >   if (! (known_eq (isize, osize)
> >
> > Hence, we ICE on line 1022 of emit-rtl.cc in gen_rtx_SUBREG, that
> > asserts validate_subreg, when called with omode=HFmode and
> imode=SImode.
> 
> Yes, which is why I think the target should claim argument passing happens
in
> reg:HI.  But as said, I'm not very familiar with argument passing
internals.  But
> still the patch looks quite bolted on.
> 
> For the expand_value_return I have to guess that 'val' has HFmode, the
> DECL_RESULT also has HFmode(?), promote_function_mode then returns
> SImode which isn't supported.  So make it return HImode?
> 
> > Cheers,
> > Roger
> > --
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: 14 March 2022 10:15
> > > To: Roger Sayle 
> > > Cc: 'Tom de Vries' ; 'Jeff Law'
> > > ; 'Tobias Burnus' ;
> > > richard.sandif...@arm.com;
> > 'Jeff
> > > Law' ; gcc-patches@gcc.gnu.org
> > > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> > values as
> > > wider integers.
> > >
> > > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > >
> > > >
> > > > Hi Richard,
> > > > Thanks for asking.
> > > > The issue is with 16-bit floating point HFmode, where clang on
> > > > nvptx passes HFmode values in SImode registers, and for binary
> > > > compatibility GCC needs to do the same.
> > > > Their motivation is that for compatibility with older GPUs and the
> > > > x86_64 host, HFmode parameters are treated like "unsigned int".
> > > > Hence, libgcc functions like __sqrthf behave as though declared
> > > > (are binary compatible with) "unsigned short __sqrthf(unsigned
short)".
> > > > As you point out, this also greatly simplifies soft-float, as both
> > > > ABIs remain the same.
> > > >
> > > > Alas, the subreg approach doesn't work as we'd end up with
> > > > (subreg:SI (subreg:HI (reg:HF)), though technically that is what
> > > > this patch does, inserting a pair of conversion instructions.
> > >
> > > So what does FUNCTION_ARG return for the HFmode parameters?  The
> > > above seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> > >
> > > Or are we talking about the caller side?  Or the return value case
> > > (in
> > which case I
> > > ask the same question wrt FUNCTION_VALUE)?
> > >
> > > Does nvptx use the promotion hooks in those cases?
> > >
> > > > Until recently (subreg:SI (reg:HF)) did work, but that support was
> > > > removed/cleaned-up,
> > > > as we need sensible subreg semantics in the RTL passes.   The
proposed
> > patch
> > > > simply
> > > > adds support back in the minimal places where changing FP/non-FP
> > > > and precision at the same time is required: argument passing and
> > > > return values.
> > > >
> > > > Hopefully this answers your question.  An alternate viewpoint
> > > > might be "is there a reason for GCC not to be able to support
> > > > targets with slightly wacky parameter passing conventions".
> > > >
> > > > Thanks for thinking about this.
> > > > Roger
> > > > --
> > > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: 14 March 2022 09:09
> > > > > To: Roger Sayle 
> > > > > Cc: 'Tom de Vries' ; 'Jeff Law'
> > > > > ; 'Tobias Burnus'
> > > > > ; richard.sandif...@arm.com;
> > > > 'Jeff
> > > > > Law' ; gcc-patches@gcc.gnu.org
> > > > > 

Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported

2022-03-14 Thread Patrick Palka via Gcc-patches
On Fri, 11 Mar 2022, Jonathan Wakely wrote:

> Patrick, I think this is right, but please take a look to double check.
> 
> I think we should fix the feature-test macro conditions for gcc-11 too,
> although it's a bit more complicated there. It should depend on IEEE
> float and double *and* uselocale. We don't need the other changes on the
> branch.

Don't we still depend on uselocale in GCC 12 for long double from_chars,
at least on targets where long double != binary64?

> 
> 
> -- >8 --
> 
> This adjusts the declarations in  to match when the definition
> is present. This solves the issue that std::from_chars is present on
> Solaris 11.3 (using fast_float) but was not declared in the header
> (because the declarations were guarded by _GLIBCXX_HAVE_USELOCALE).
> 
> Additionally, do not define __cpp_lib_to_chars unless both from_chars
> and to_chars are supported (which is only true for IEEE float and
> double). We might still provide from_chars (via strtold) but if to_chars
> isn't provided, we shouldn't define the feature test macro.
> 
> Finally, this simplifies some of the preprocessor checks in the bodies
> of std::from_chars in src/c++17/floating_from_chars.cc and hoists the
> repeated code for the strtod version into a new function template.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/charconv (__cpp_lib_to_chars): Only define when
>   both from_chars and to_chars are supported for floating-point
>   types.
>   (from_chars, to_chars): Adjust preprocessor conditions guarding
>   declarations.
>   * include/std/version (__cpp_lib_to_chars): Adjust condition to
>   match  definition.
>   * src/c++17/floating_from_chars.cc (from_chars_strtod): New
>   function template.
>   (from_chars): Simplify preprocessor checks and use
>   from_chars_strtod when appropriate.
> ---
>  libstdc++-v3/include/std/charconv |   8 +-
>  libstdc++-v3/include/std/version  |   3 +-
>  libstdc++-v3/src/c++17/floating_from_chars.cc | 120 ++
>  3 files changed, 45 insertions(+), 86 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/charconv 
> b/libstdc++-v3/include/std/charconv
> index a3f8c7718b2..2ce9c7d4cb9 100644
> --- a/libstdc++-v3/include/std/charconv
> +++ b/libstdc++-v3/include/std/charconv
> @@ -43,7 +43,8 @@
>  #include  // for std::errc
>  #include 
>  
> -#if _GLIBCXX_HAVE_USELOCALE
> +#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> +&& __SIZE_WIDTH__ >= 32

So ISTM we need to also check

  _GLIBCXX_HAVE_USELOCALE || (__LDBL_MANT_DIG__ == __DBL_MANT_DIG__)

or something like that :/

>  # define __cpp_lib_to_chars 201611L
>  #endif
>  
> @@ -686,7 +687,7 @@ namespace __detail
>operator^=(chars_format& __lhs, chars_format __rhs) noexcept
>{ return __lhs = __lhs ^ __rhs; }
>  
> -#if _GLIBCXX_HAVE_USELOCALE
> +#if defined __cpp_lib_to_chars || _GLIBCXX_HAVE_USELOCALE
>from_chars_result
>from_chars(const char* __first, const char* __last, float& __value,
>chars_format __fmt = chars_format::general) noexcept;
> @@ -700,8 +701,7 @@ namespace __detail
>chars_format __fmt = chars_format::general) noexcept;
>  #endif
>  
> -#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> -&& __SIZE_WIDTH__ >= 32
> +#if defined __cpp_lib_to_chars
>// Floating-point std::to_chars
>  
>// Overloads for float.
> diff --git a/libstdc++-v3/include/std/version 
> b/libstdc++-v3/include/std/version
> index 461e65b5fab..d730a7ea3c7 100644
> --- a/libstdc++-v3/include/std/version
> +++ b/libstdc++-v3/include/std/version
> @@ -171,7 +171,8 @@
>  #endif
>  #define __cpp_lib_shared_ptr_weak_type 201606L
>  #define __cpp_lib_string_view 201803L
> -#if _GLIBCXX_HAVE_USELOCALE
> +#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> +&& __SIZE_WIDTH__ >= 32
>  # define __cpp_lib_to_chars 201611L
>  #endif
>  #define __cpp_lib_unordered_map_try_emplace 201411L
> diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
> b/libstdc++-v3/src/c++17/floating_from_chars.cc
> index ba0426b3344..4aa2483bc28 100644
> --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> @@ -65,6 +65,7 @@ extern "C" __ieee128 __strtoieee128(const char*, char**);
>  && __SIZE_WIDTH__ >= 32
>  # define USE_LIB_FAST_FLOAT 1
>  # if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
> +// No need to use strtold.
>  #  undef USE_STRTOD_FOR_FROM_CHARS
>  # endif
>  #endif
> @@ -420,6 +421,33 @@ namespace
>  return true;
>}
>  #endif
> +
> +  template
> +  from_chars_result
> +  from_chars_strtod(const char* first, const char* last, T& value,
> + chars_format fmt) noexcept
> +  {
> +errc ec = errc::invalid_argument;
> +#if _GLIBCXX_USE_CXX11_ABI
> +buffer_resource mr;
> +pmr::string buf();
> +#else
> +string buf;
> +if (!reserve_string(buf))
> +  return 

Re: [PATCH] gcc: add --enable-systemtap switch [PR61257]

2022-03-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 14, 2022 at 09:26:57AM -0400, Marek Polacek via Gcc-patches wrote:
> Thanks for the patch.
> 
> The new configure option needs documenting in doc/install.texi, and configure
> needs to be regenerated.

More importantly, I don't see explanation why the patch is needed, analysis
why did the HAVE_SYS_SDT_H configure check say that the header exists but
trying to include it in libgcc doesn't.

Jakub



Re: [PATCH] x86: Ignore OPTION_MASK_ISA_64BIT for -m32 when inlining

2022-03-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 14, 2022 at 06:49:01AM -0700, H.J. Lu via Gcc-patches wrote:
> gcc/
> 
>   PR target/104890
>   * config/i386/i386.cc (ix86_can_inline_p): Ignore
>   OPTION_MASK_ISA_64BIT for -m32.
> 
> gcc/testsuite/
> 
>   PR target/104890
>   * gcc.target/i386/pr104890.c: New test.
> ---
>  gcc/config/i386/i386.cc  |  4 
>  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
>  2 files changed, 15 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 23bedea92bd..f2bb4765e5b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -587,6 +587,10 @@ ix86_can_inline_p (tree caller, tree callee)
>if (TARGET_GENERAL_REGS_ONLY_P (callee_opts->x_ix86_target_flags))
>  always_inline_safe_mask |= MASK_80387;
>  
> +  /* Ignore OPTION_MASK_ISA_64BIT for -m32.  */
> +  if (!TARGET_64BIT)
> +always_inline_safe_mask |= OPTION_MASK_ISA_64BIT;

This looks wrong.  (1 << 1) in alwyas_inline_safe_mask and
*->x_target_flags is MASK_80387, not OPTION_MASK_ISA_64BIT.

Jakub



Re: [PATCH] x86: Ignore OPTION_MASK_ISA_64BIT for -m32 when inlining

2022-03-14 Thread H.J. Lu via Gcc-patches
On Mon, Mar 14, 2022 at 6:49 AM H.J. Lu  wrote:
>
> gcc/
>
> PR target/104890
> * config/i386/i386.cc (ix86_can_inline_p): Ignore
> OPTION_MASK_ISA_64BIT for -m32.

Please ignore this.

> gcc/testsuite/
>
> PR target/104890
> * gcc.target/i386/pr104890.c: New test.
> ---
>  gcc/config/i386/i386.cc  |  4 
>  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
>  2 files changed, 15 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 23bedea92bd..f2bb4765e5b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -587,6 +587,10 @@ ix86_can_inline_p (tree caller, tree callee)
>if (TARGET_GENERAL_REGS_ONLY_P (callee_opts->x_ix86_target_flags))
>  always_inline_safe_mask |= MASK_80387;
>
> +  /* Ignore OPTION_MASK_ISA_64BIT for -m32.  */
> +  if (!TARGET_64BIT)
> +always_inline_safe_mask |= OPTION_MASK_ISA_64BIT;
> +
>cgraph_node *callee_node = cgraph_node::get (callee);
>/* Callee's isa options should be a subset of the caller's, i.e. a SSE4
>   function can inline a SSE2 function but a SSE2 function can't inline
> diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
> b/gcc/testsuite/gcc.target/i386/pr104890.c
> new file mode 100644
> index 000..cb430eef688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104890.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O2 -mshstk -march=i686" } */
> +
> +#include 
> +
> +__attribute__((target ("general-regs-only")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> --
> 2.35.1
>


-- 
H.J.


[PATCH] x86: Ignore OPTION_MASK_ISA_64BIT for -m32 when inlining

2022-03-14 Thread H.J. Lu via Gcc-patches
gcc/

PR target/104890
* config/i386/i386.cc (ix86_can_inline_p): Ignore
OPTION_MASK_ISA_64BIT for -m32.

gcc/testsuite/

PR target/104890
* gcc.target/i386/pr104890.c: New test.
---
 gcc/config/i386/i386.cc  |  4 
 gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 23bedea92bd..f2bb4765e5b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -587,6 +587,10 @@ ix86_can_inline_p (tree caller, tree callee)
   if (TARGET_GENERAL_REGS_ONLY_P (callee_opts->x_ix86_target_flags))
 always_inline_safe_mask |= MASK_80387;
 
+  /* Ignore OPTION_MASK_ISA_64BIT for -m32.  */
+  if (!TARGET_64BIT)
+always_inline_safe_mask |= OPTION_MASK_ISA_64BIT;
+
   cgraph_node *callee_node = cgraph_node::get (callee);
   /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
  function can inline a SSE2 function but a SSE2 function can't inline
diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
b/gcc/testsuite/gcc.target/i386/pr104890.c
new file mode 100644
index 000..cb430eef688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104890.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mshstk -march=i686" } */
+
+#include 
+
+__attribute__((target ("general-regs-only")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
-- 
2.35.1



Re: [PATCH] lra: Fix up debug_p handling in lra_substitute_pseudo [PR104778]

2022-03-14 Thread Vladimir Makarov via Gcc-patches



On 2022-03-12 14:37, Jakub Jelinek wrote:

Hi!

The following testcase ICEs on powerpc-linux, because lra_substitute_pseudo
substitutes (const_int 1) into a subreg operand.  First a subreg of subreg
of a reg appears in a debug insn (which surely is invalid outside of
debug insns, but in debug insns we allow even what is normally invalid in
RTL like subregs which the target doesn't like, because either dwarf2out
is able to handle it, or we just throw away the location expression,
making some var .

lra_substitute_pseudo already has some code to deal with specifically
SUBREG of REG with the REG being substituted for VOIDmode constant,
but that doesn't cover this case, so the following patch extends
lra_substitute_pseudo for debug_p mode to treat stuff like e.g.
combiner's subst function to ensure we don't lose mode which is essential
for the IL.

Bootstrapped/regtested on {powerpc64{,le},x86_64,i686}-linux, ok for trunk?



Sure.  Thank you for working on this PR, Jakub.



2022-03-12  Jakub Jelinek  

PR debug/104778
* lra.cc (lra_substitute_pseudo): For debug_p mode, simplify
SUBREG, ZERO_EXTEND, SIGN_EXTEND, FLOAT or UNSIGNED_FLOAT if recursive
call simplified the first operand into VOIDmode constant.

* gcc.target/powerpc/pr104778.c: New test.





RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, 14 Mar 2022, Roger Sayle wrote:

> 
> Hi Richard,
> 
> > The above seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> 
> Line 949 of emit-rtl.cc in the definition of "validate_subreg" contains:
> 
>   /* Subregs involving floating point modes are not allowed to
>  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>  (subreg:SI (reg:DF) 0) isn't.  */
>   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> {
>   if (! (known_eq (isize, osize)
> 
> Hence, we ICE on line 1022 of emit-rtl.cc in gen_rtx_SUBREG, that asserts
> validate_subreg, when called with omode=HFmode and imode=SImode.

Yes, which is why I think the target should claim argument passing
happens in reg:HI.  But as said, I'm not very familiar with argument
passing internals.  But still the patch looks quite bolted on.

For the expand_value_return I have to guess that 'val' has HFmode,
the DECL_RESULT also has HFmode(?), promote_function_mode then
returns SImode which isn't supported.  So make it return HImode?

> Cheers,
> Roger
> --
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: 14 March 2022 10:15
> > To: Roger Sayle 
> > Cc: 'Tom de Vries' ; 'Jeff Law' ;
> > 'Tobias Burnus' ; richard.sandif...@arm.com;
> 'Jeff
> > Law' ; gcc-patches@gcc.gnu.org
> > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> values as
> > wider integers.
> > 
> > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > 
> > >
> > > Hi Richard,
> > > Thanks for asking.
> > > The issue is with 16-bit floating point HFmode, where clang on nvptx
> > > passes HFmode values in SImode registers, and for binary compatibility
> > > GCC needs to do the same.
> > > Their motivation is that for compatibility with older GPUs and the
> > > x86_64 host, HFmode parameters are treated like "unsigned int".
> > > Hence, libgcc functions like __sqrthf behave as though declared (are
> > > binary compatible with) "unsigned short __sqrthf(unsigned short)".
> > > As you point out, this also greatly simplifies soft-float, as both
> > > ABIs remain the same.
> > >
> > > Alas, the subreg approach doesn't work as we'd end up with (subreg:SI
> > > (subreg:HI (reg:HF)), though technically that is what this patch does,
> > > inserting a pair of conversion instructions.
> > 
> > So what does FUNCTION_ARG return for the HFmode parameters?  The above
> > seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> > 
> > Or are we talking about the caller side?  Or the return value case (in
> which case I
> > ask the same question wrt FUNCTION_VALUE)?
> > 
> > Does nvptx use the promotion hooks in those cases?
> > 
> > > Until recently (subreg:SI (reg:HF)) did work, but that support was
> > > removed/cleaned-up,
> > > as we need sensible subreg semantics in the RTL passes.   The proposed
> patch
> > > simply
> > > adds support back in the minimal places where changing FP/non-FP and
> > > precision at the same time is required: argument passing and return
> > > values.
> > >
> > > Hopefully this answers your question.  An alternate viewpoint might be
> > > "is there a reason for GCC not to be able to support targets with
> > > slightly wacky parameter passing conventions".
> > >
> > > Thanks for thinking about this.
> > > Roger
> > > --
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: 14 March 2022 09:09
> > > > To: Roger Sayle 
> > > > Cc: 'Tom de Vries' ; 'Jeff Law'
> > > > ; 'Tobias Burnus' ;
> > > > richard.sandif...@arm.com;
> > > 'Jeff
> > > > Law' ; gcc-patches@gcc.gnu.org
> > > > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> > > values as
> > > > wider integers.
> > > >
> > > > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > > >
> > > > >
> > > > > I thought I'd add a few comments that might help reviewers of this
> > > patch.
> > > > > Most importantly, this patch should be completely safe, as both
> > > > > changes only trigger with FLOAT vs INT size mismatches which
> > > > > currently both ICE in the compiler a few lines later.  Admittedly,
> > > > > this indicates something odd about a target's choice of ABI, but
> > > > > one alternative might be to issue a "sorry, unimplemented" error
> > > > > message rather than ICE, perhaps with a TODO or FIXME comment that
> > > > > support for mixed size FP/integer ABIs be added in future.
> > > > >
> > > > > The only (other?) possible point of contention is the (arbitrary)
> > > > > decision that when passing floating point values in a larger
> > > > > integer register, the code is hardwired to use zero-extension.
> > > > > This in theory could be turned into a target hook to support
> > > > > sign-extension, but given these are padding bits, zero seems
> > > > > appropriate. [On x86_64, if passing DFmode argument in a V2DFmode
> > > > > vector, say, it seems reasonable
> > > to
> > > > use movq and zero the high bits].
> > > > >
> > > > > The final point is that at the moment, the only affected target is
> > > > 

Re: [PATCH] gcc: add --enable-systemtap switch [PR61257]

2022-03-14 Thread Marek Polacek via Gcc-patches
On Mon, Mar 14, 2022 at 01:48:21PM +0100, David Seifert via Gcc-patches wrote:
> Bug: https://bugs.gentoo.org/654748
> 
> gcc/configure.ac
> * Add --enable-systemtap switch to explicitly allow for
> disabling systemtap even if the target libc supports it.
> ---
>  gcc/configure.ac | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 90cad309762..7dfb9a58cdd 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -6904,14 +6904,24 @@ AC_SUBST([enable_default_ssp])
>  
>  # Test for  on the target.
>  GCC_TARGET_TEMPLATE([HAVE_SYS_SDT_H])
> -AC_MSG_CHECKING(sys/sdt.h in the target C library)
> -have_sys_sdt_h=no
> -if test -f $target_header_dir/sys/sdt.h; then
> -  have_sys_sdt_h=yes
> -  AC_DEFINE(HAVE_SYS_SDT_H, 1,
> -[Define if your target C library provides sys/sdt.h])
> -fi
> -AC_MSG_RESULT($have_sys_sdt_h)
> +
> +AC_ARG_ENABLE([systemtap],
> +  [AS_HELP_STRING([--enable-systemtap],
> +[enable systemtap static probe points [default=auto]])])
> +
> +AS_IF([test x$enable_systemtap != xno], [
> +  AC_MSG_CHECKING([sys/sdt.h in the target C library])
> +  have_sys_sdt_h=no
> +  AS_IF([test -f "$target_header_dir/sys/sdt.h"], [
> +have_sys_sdt_h=yes
> +AC_DEFINE([HAVE_SYS_SDT_H], [1],
> +  [Define if your target C library provides sys/sdt.h])
> +  ])
> +  AC_MSG_RESULT([$have_sys_sdt_h])
> +  AS_IF([test x$enable_systemtap = xyes && test x$have_sys_sdt_h = xno], [
> +AC_MSG_ERROR([sys/sdt.h was not found])
> +  ])
> +])

Thanks for the patch.

The new configure option needs documenting in doc/install.texi, and configure
needs to be regenerated.

Marek



Re: [PATCH] top-level: Fix comment about --enable-libstdcxx in configure

2022-03-14 Thread Jonathan Wakely via Gcc-patches

Pushed to trunk now.

On 11/03/22 18:38 +, Jonathan Wakely wrote:

I'm going to push this as obvious, but do I need to do anything special
to sync it with binutils, or will that happen next time somebody needs a
proper fix?

-- >8 --

The custom option for enabling/disabling libstdc++ is not spelled the
same as the directory name:

AC_ARG_ENABLE(libstdcxx,
AS_HELP_STRING([--disable-libstdcxx],
 [do not build libstdc++-v3 directory])

The comment referring to it later use the wrong name.

ChangeLog:

* configure.ac: Fix incorrect option in comment.
* configure: Regenerate.
---
configure| 2 +-
configure.ac | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 9c2d7df1bb2..f7e0fa46c9c 100755
--- a/configure
+++ b/configure
@@ -3390,7 +3390,7 @@ case "${target}" in
esac

# Disable libstdc++-v3 for some systems.
-# Allow user to override this if they pass --enable-libstdc++-v3
+# Allow user to override this if they pass --enable-libstdcxx
if test "${ENABLE_LIBSTDCXX}" = "default" ; then
  case "${target}" in
*-*-vxworks*)
diff --git a/configure.ac b/configure.ac
index 68cc5cc31fe..434b1a267a4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -649,7 +649,7 @@ case "${target}" in
esac

# Disable libstdc++-v3 for some systems.
-# Allow user to override this if they pass --enable-libstdc++-v3
+# Allow user to override this if they pass --enable-libstdcxx
if test "${ENABLE_LIBSTDCXX}" = "default" ; then
  case "${target}" in
*-*-vxworks*)




[committed] libstdc++: Fix reading UTF-8 characters for 16-bit targets [PR104875]

2022-03-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux and sparc-sun-solaris2.11, pushed to trunk.

This should be backported too. It's not a regression, but this code is
just broken on 16-bit targets.

-- >8 --

The current code in read_utf8_code_point assumes that integer promotion
will create a 32-bit int, but that's not true for 16-bit targets like
msp430 and avr. This changes the intermediate variables used for each
octet from unsigned char to char32_t, so that (c << N) works correctly
when N > 8.

libstdc++-v3/ChangeLog:

PR libstdc++/104875
* src/c++11/codecvt.cc (read_utf8_code_point): Use char32_t to
hold octets that will be left-shifted.
---
 libstdc++-v3/src/c++11/codecvt.cc | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index d9f2dacb647..9f8cb767732 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -254,7 +254,7 @@ namespace
 const size_t avail = from.size();
 if (avail == 0)
   return incomplete_mb_character;
-unsigned char c1 = from[0];
+char32_t c1 = (unsigned char) from[0];
 // https://en.wikipedia.org/wiki/UTF-8#Sample_code
 if (c1 < 0x80)
 {
@@ -267,7 +267,7 @@ namespace
 {
   if (avail < 2)
return incomplete_mb_character;
-  unsigned char c2 = from[1];
+  char32_t c2 = (unsigned char) from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   char32_t c = (c1 << 6) + c2 - 0x3080;
@@ -279,12 +279,12 @@ namespace
 {
   if (avail < 3)
return incomplete_mb_character;
-  unsigned char c2 = from[1];
+  char32_t c2 = (unsigned char) from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) // overlong
return invalid_mb_sequence;
-  unsigned char c3 = from[2];
+  char32_t c3 = (unsigned char) from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
   char32_t c = (c1 << 12) + (c2 << 6) + c3 - 0xE2080;
@@ -296,17 +296,17 @@ namespace
 {
   if (avail < 4)
return incomplete_mb_character;
-  unsigned char c2 = from[1];
+  char32_t c2 = (unsigned char) from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   if (c1 == 0xF0 && c2 < 0x90) // overlong
return invalid_mb_sequence;
   if (c1 == 0xF4 && c2 >= 0x90) // > U+10
   return invalid_mb_sequence;
-  unsigned char c3 = from[2];
+  char32_t c3 = (unsigned char) from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
-  unsigned char c4 = from[3];
+  char32_t c4 = (unsigned char) from[3];
   if ((c4 & 0xC0) != 0x80)
return invalid_mb_sequence;
   char32_t c = (c1 << 18) + (c2 << 12) + (c3 << 6) + c4 - 0x3C82080;
-- 
2.34.1



[PATCH] gcc: add --enable-systemtap switch [PR61257]

2022-03-14 Thread David Seifert via Gcc-patches
Bug: https://bugs.gentoo.org/654748

gcc/configure.ac
* Add --enable-systemtap switch to explicitly allow for
disabling systemtap even if the target libc supports it.
---
 gcc/configure.ac | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 90cad309762..7dfb9a58cdd 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6904,14 +6904,24 @@ AC_SUBST([enable_default_ssp])
 
 # Test for  on the target.
 GCC_TARGET_TEMPLATE([HAVE_SYS_SDT_H])
-AC_MSG_CHECKING(sys/sdt.h in the target C library)
-have_sys_sdt_h=no
-if test -f $target_header_dir/sys/sdt.h; then
-  have_sys_sdt_h=yes
-  AC_DEFINE(HAVE_SYS_SDT_H, 1,
-[Define if your target C library provides sys/sdt.h])
-fi
-AC_MSG_RESULT($have_sys_sdt_h)
+
+AC_ARG_ENABLE([systemtap],
+  [AS_HELP_STRING([--enable-systemtap],
+[enable systemtap static probe points [default=auto]])])
+
+AS_IF([test x$enable_systemtap != xno], [
+  AC_MSG_CHECKING([sys/sdt.h in the target C library])
+  have_sys_sdt_h=no
+  AS_IF([test -f "$target_header_dir/sys/sdt.h"], [
+have_sys_sdt_h=yes
+AC_DEFINE([HAVE_SYS_SDT_H], [1],
+  [Define if your target C library provides sys/sdt.h])
+  ])
+  AC_MSG_RESULT([$have_sys_sdt_h])
+  AS_IF([test x$enable_systemtap = xyes && test x$have_sys_sdt_h = xno], [
+AC_MSG_ERROR([sys/sdt.h was not found])
+  ])
+])
 
 # Check if TFmode long double should be used by default or not.
 # Some glibc targets used DFmode long double, but with glibc 2.4
-- 
2.35.1



Re: [PATCH] i386: Fix up _mm_loadu_si{16,32} [PR99754]

2022-03-14 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 14, 2022 at 8:20 PM Hongtao Liu  wrote:
>
> On Mon, Mar 14, 2022 at 7:25 PM Jakub Jelinek  wrote:
> >
> > On Sun, Mar 13, 2022 at 09:34:10PM +0800, Hongtao Liu wrote:
> > > LGTM, thanks for handling this.
> >
> > Thanks, committed.
> >
> > > > Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
> > > > for _mm_loadu_si16 it says strangely SSE.  But the intrinsics
> > > > returns __m128i, which is only defined in emmintrin.h, and
> > > > _mm_set_epi16 is also only SSE2 and later in emmintrin.h.
> > > > Even clang defines it in emmintrin.h and ends up with inlining
> > > > failure when calling _mm_loadu_si16 from sse,no-sse2 function.
> > > > So, isn't that a bug in the intrinsic guide instead?
> > > I think it's a bug, it's supposed to generate movzx + movd, and movd
> > > is under sse2, and have reported it to the colleague who maintains
> > > Intel intrinsic guide.
> > >
> > > Similar bug for
> > > _mm_loadu_si64
> > > _mm_storeu_si16
> > > _mm_storeu_si64
> >
> > Currently it emits pxor + pinsrw, but even those are SSE2 instructions,
> > unless they use a MMX register (then it is MMX and SSE).
> > I agree that movzwl + movd seems better than pxor + pinsrw though.
> > So, do we want to help it a little bit then?  Like:
> >
> > 2022-03-14  Jakub Jelinek  
> >
> > * config/i386/eemintrin.h (_mm_loadu_si16): Use _mm_set_epi32 
> > instead
> > of _mm_set_epi16 and zero extend the memory load.
> >
> > * gcc.target/i386/pr95483-1.c: Use -msse2 instead of -msse in
> > dg-options, allow movzwl+movd instead of pxor with pinsrw.
> >
> > --- gcc/config/i386/emmintrin.h.jj  2022-03-14 10:44:29.402617685 +0100
> > +++ gcc/config/i386/emmintrin.h 2022-03-14 11:58:18.062666257 +0100
> > @@ -724,7 +724,7 @@ _mm_loadu_si32 (void const *__P)
> >  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> >  _mm_loadu_si16 (void const *__P)
> >  {
> > -  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, (*(__m16_u *)__P)[0]);
> > +  return _mm_set_epi32 (0, 0, 0, (unsigned short) ((*(__m16_u *)__P)[0]));
> >  }
> Under avx512fp16,  the former directly generates vmovw, but the latter
> still generates movzx + vmovd. There's still a miss optimization.
> Thus I prefer to optimize it in the backend pxor + pinsrw -> movzx +
> movd -> vmovw (under avx512fp16).
> I'll open a PR for that and optimize it in GCC13.
PR104915.
> >
> >  extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > --- gcc/testsuite/gcc.target/i386/pr95483-1.c.jj2020-10-14 
> > 22:05:19.380856952 +0200
> > +++ gcc/testsuite/gcc.target/i386/pr95483-1.c   2022-03-14 
> > 12:11:07.716891710 +0100
> > @@ -1,7 +1,7 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -msse" } */
> > -/* { dg-final { scan-assembler-times "pxor\[ 
> > \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > -/* { dg-final { scan-assembler-times "pinsrw\[ 
> > \\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > +/* { dg-options "-O2 -msse2" } */
> > +/* { dg-final { scan-assembler-times "(?:movzwl\[ \\t\]+\[^\n\]*|pxor\[ 
> > \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 1 } } */
> > +/* { dg-final { scan-assembler-times "(?:movd|pinsrw)\[ 
> > \\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> >  /* { dg-final { scan-assembler-times "pextrw\[ 
> > \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*(?:\n|\[ \\t\]+#)" 1 } } */
> >
> >
> >
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] i386: Fix up _mm_loadu_si{16,32} [PR99754]

2022-03-14 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 14, 2022 at 7:25 PM Jakub Jelinek  wrote:
>
> On Sun, Mar 13, 2022 at 09:34:10PM +0800, Hongtao Liu wrote:
> > LGTM, thanks for handling this.
>
> Thanks, committed.
>
> > > Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
> > > for _mm_loadu_si16 it says strangely SSE.  But the intrinsics
> > > returns __m128i, which is only defined in emmintrin.h, and
> > > _mm_set_epi16 is also only SSE2 and later in emmintrin.h.
> > > Even clang defines it in emmintrin.h and ends up with inlining
> > > failure when calling _mm_loadu_si16 from sse,no-sse2 function.
> > > So, isn't that a bug in the intrinsic guide instead?
> > I think it's a bug, it's supposed to generate movzx + movd, and movd
> > is under sse2, and have reported it to the colleague who maintains
> > Intel intrinsic guide.
> >
> > Similar bug for
> > _mm_loadu_si64
> > _mm_storeu_si16
> > _mm_storeu_si64
>
> Currently it emits pxor + pinsrw, but even those are SSE2 instructions,
> unless they use a MMX register (then it is MMX and SSE).
> I agree that movzwl + movd seems better than pxor + pinsrw though.
> So, do we want to help it a little bit then?  Like:
>
> 2022-03-14  Jakub Jelinek  
>
> * config/i386/eemintrin.h (_mm_loadu_si16): Use _mm_set_epi32 instead
> of _mm_set_epi16 and zero extend the memory load.
>
> * gcc.target/i386/pr95483-1.c: Use -msse2 instead of -msse in
> dg-options, allow movzwl+movd instead of pxor with pinsrw.
>
> --- gcc/config/i386/emmintrin.h.jj  2022-03-14 10:44:29.402617685 +0100
> +++ gcc/config/i386/emmintrin.h 2022-03-14 11:58:18.062666257 +0100
> @@ -724,7 +724,7 @@ _mm_loadu_si32 (void const *__P)
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_loadu_si16 (void const *__P)
>  {
> -  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, (*(__m16_u *)__P)[0]);
> +  return _mm_set_epi32 (0, 0, 0, (unsigned short) ((*(__m16_u *)__P)[0]));
>  }
Under avx512fp16,  the former directly generates vmovw, but the latter
still generates movzx + vmovd. There's still a miss optimization.
Thus I prefer to optimize it in the backend pxor + pinsrw -> movzx +
movd -> vmovw (under avx512fp16).
I'll open a PR for that and optimize it in GCC13.
>
>  extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> --- gcc/testsuite/gcc.target/i386/pr95483-1.c.jj2020-10-14 
> 22:05:19.380856952 +0200
> +++ gcc/testsuite/gcc.target/i386/pr95483-1.c   2022-03-14 12:11:07.716891710 
> +0100
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -msse" } */
> -/* { dg-final { scan-assembler-times "pxor\[ 
> \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> -/* { dg-final { scan-assembler-times "pinsrw\[ 
> \\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> +/* { dg-options "-O2 -msse2" } */
> +/* { dg-final { scan-assembler-times "(?:movzwl\[ \\t\]+\[^\n\]*|pxor\[ 
> \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "(?:movd|pinsrw)\[ 
> \\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
>  /* { dg-final { scan-assembler-times "pextrw\[ 
> \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*(?:\n|\[ \\t\]+#)" 1 } } */
>
>
>
>
> Jakub
>


-- 
BR,
Hongtao


RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Roger Sayle


Hi Richard,

> The above seems backwards - it should be (subreg:HF (reg:SI 0)), no?

Line 949 of emit-rtl.cc in the definition of "validate_subreg" contains:

  /* Subregs involving floating point modes are not allowed to
 change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
 (subreg:SI (reg:DF) 0) isn't.  */
  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
{
  if (! (known_eq (isize, osize)

Hence, we ICE on line 1022 of emit-rtl.cc in gen_rtx_SUBREG, that asserts
validate_subreg, when called with omode=HFmode and imode=SImode.

Cheers,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 14 March 2022 10:15
> To: Roger Sayle 
> Cc: 'Tom de Vries' ; 'Jeff Law' ;
> 'Tobias Burnus' ; richard.sandif...@arm.com;
'Jeff
> Law' ; gcc-patches@gcc.gnu.org
> Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
values as
> wider integers.
> 
> On Mon, 14 Mar 2022, Roger Sayle wrote:
> 
> >
> > Hi Richard,
> > Thanks for asking.
> > The issue is with 16-bit floating point HFmode, where clang on nvptx
> > passes HFmode values in SImode registers, and for binary compatibility
> > GCC needs to do the same.
> > Their motivation is that for compatibility with older GPUs and the
> > x86_64 host, HFmode parameters are treated like "unsigned int".
> > Hence, libgcc functions like __sqrthf behave as though declared (are
> > binary compatible with) "unsigned short __sqrthf(unsigned short)".
> > As you point out, this also greatly simplifies soft-float, as both
> > ABIs remain the same.
> >
> > Alas, the subreg approach doesn't work as we'd end up with (subreg:SI
> > (subreg:HI (reg:HF)), though technically that is what this patch does,
> > inserting a pair of conversion instructions.
> 
> So what does FUNCTION_ARG return for the HFmode parameters?  The above
> seems backwards - it should be (subreg:HF (reg:SI 0)), no?
> 
> Or are we talking about the caller side?  Or the return value case (in
which case I
> ask the same question wrt FUNCTION_VALUE)?
> 
> Does nvptx use the promotion hooks in those cases?
> 
> > Until recently (subreg:SI (reg:HF)) did work, but that support was
> > removed/cleaned-up,
> > as we need sensible subreg semantics in the RTL passes.   The proposed
patch
> > simply
> > adds support back in the minimal places where changing FP/non-FP and
> > precision at the same time is required: argument passing and return
> > values.
> >
> > Hopefully this answers your question.  An alternate viewpoint might be
> > "is there a reason for GCC not to be able to support targets with
> > slightly wacky parameter passing conventions".
> >
> > Thanks for thinking about this.
> > Roger
> > --
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: 14 March 2022 09:09
> > > To: Roger Sayle 
> > > Cc: 'Tom de Vries' ; 'Jeff Law'
> > > ; 'Tobias Burnus' ;
> > > richard.sandif...@arm.com;
> > 'Jeff
> > > Law' ; gcc-patches@gcc.gnu.org
> > > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> > values as
> > > wider integers.
> > >
> > > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > >
> > > >
> > > > I thought I'd add a few comments that might help reviewers of this
> > patch.
> > > > Most importantly, this patch should be completely safe, as both
> > > > changes only trigger with FLOAT vs INT size mismatches which
> > > > currently both ICE in the compiler a few lines later.  Admittedly,
> > > > this indicates something odd about a target's choice of ABI, but
> > > > one alternative might be to issue a "sorry, unimplemented" error
> > > > message rather than ICE, perhaps with a TODO or FIXME comment that
> > > > support for mixed size FP/integer ABIs be added in future.
> > > >
> > > > The only (other?) possible point of contention is the (arbitrary)
> > > > decision that when passing floating point values in a larger
> > > > integer register, the code is hardwired to use zero-extension.
> > > > This in theory could be turned into a target hook to support
> > > > sign-extension, but given these are padding bits, zero seems
> > > > appropriate. [On x86_64, if passing DFmode argument in a V2DFmode
> > > > vector, say, it seems reasonable
> > to
> > > use movq and zero the high bits].
> > > >
> > > > The final point is that at the moment, the only affected target is
> > > > nvptx-none, as I don't believe any other backend specifies an ABI
> > > > that requires passing floating point values in wider integer
registers.
> > > > Having said that, most targets don't yet support HFmode, and their
> > > > ABI specifications haven't yet been updated as what to do in that
> > > > eventuality.  If they choose to treat these like HImode, they'll
> > > > run into the same issues, as most ABIs pass HImode in wider
> > > > word_mode
> > registers.
> > > >
> > > > I hope this helps.  If folks could air their concerns out loud, I
> > > > can try my best to address those worries.
> > >
> > > First of all I'm not familiar with the ABI related 

Re: [PATCH] i386: Fix up _mm_loadu_si{16,32} [PR99754]

2022-03-14 Thread Jakub Jelinek via Gcc-patches
On Sun, Mar 13, 2022 at 09:34:10PM +0800, Hongtao Liu wrote:
> LGTM, thanks for handling this.

Thanks, committed.

> > Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
> > for _mm_loadu_si16 it says strangely SSE.  But the intrinsics
> > returns __m128i, which is only defined in emmintrin.h, and
> > _mm_set_epi16 is also only SSE2 and later in emmintrin.h.
> > Even clang defines it in emmintrin.h and ends up with inlining
> > failure when calling _mm_loadu_si16 from sse,no-sse2 function.
> > So, isn't that a bug in the intrinsic guide instead?
> I think it's a bug, it's supposed to generate movzx + movd, and movd
> is under sse2, and have reported it to the colleague who maintains
> Intel intrinsic guide.
> 
> Similar bug for
> _mm_loadu_si64
> _mm_storeu_si16
> _mm_storeu_si64

Currently it emits pxor + pinsrw, but even those are SSE2 instructions,
unless they use a MMX register (then it is MMX and SSE).
I agree that movzwl + movd seems better than pxor + pinsrw though.
So, do we want to help it a little bit then?  Like:

2022-03-14  Jakub Jelinek  

* config/i386/eemintrin.h (_mm_loadu_si16): Use _mm_set_epi32 instead
of _mm_set_epi16 and zero extend the memory load.

* gcc.target/i386/pr95483-1.c: Use -msse2 instead of -msse in
dg-options, allow movzwl+movd instead of pxor with pinsrw.

--- gcc/config/i386/emmintrin.h.jj  2022-03-14 10:44:29.402617685 +0100
+++ gcc/config/i386/emmintrin.h 2022-03-14 11:58:18.062666257 +0100
@@ -724,7 +724,7 @@ _mm_loadu_si32 (void const *__P)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loadu_si16 (void const *__P)
 {
-  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, (*(__m16_u *)__P)[0]);
+  return _mm_set_epi32 (0, 0, 0, (unsigned short) ((*(__m16_u *)__P)[0]));
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
--- gcc/testsuite/gcc.target/i386/pr95483-1.c.jj2020-10-14 
22:05:19.380856952 +0200
+++ gcc/testsuite/gcc.target/i386/pr95483-1.c   2022-03-14 12:11:07.716891710 
+0100
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse" } */
-/* { dg-final { scan-assembler-times "pxor\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "pinsrw\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-final { scan-assembler-times "(?:movzwl\[ \\t\]+\[^\n\]*|pxor\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "(?:movd|pinsrw)\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "pextrw\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*(?:\n|\[ \\t\]+#)" 1 } } */
 
 


Jakub



Re: [PATCH 2/2] c++tools: Work around a BSD bug in getaddrinfo().

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, Mar 14, 2022 at 9:03 AM Iain Sandoe  wrote:
>
>
>
> > On 14 Mar 2022, at 07:45, Richard Biener  wrote:
> >
> > On Mon, Mar 14, 2022 at 12:17 AM Iain Sandoe via Gcc-patches
> >  wrote:
> >>
> >> Some versions of the BSD getaddrinfo() call do not work with the specific
> >> input of "0" for the servname entry (a segv results).  Since we are making
> >> the call with a dummy port number, the value is actually not important, 
> >> other
> >> than it should be in range.  Work around the BSD bug by using "1" instead.
> >>
> >> tested on powerpc,i686-darwin9, x86-64-darwin10,17,20
> >> powerpc64le,powerpc64,x86_64-linux-gnu,
> >>
> >> OK for master?
> >> eventual backports?
> >
> > Why does the nullptr solution not work here?
>
> Because this code is setting a netmask and, if I read it correctly (we do not 
> seem to have
> a test for it), it can be called with “/NN” (where NN is the number of bits 
> and there is no host
> name before the slash).
>
> So in this case, in the getaddrinfo() call, the “hostname” parm is set to 
> NULL, and we must
> supply a service-name/port number (for the “servname” parm), since 
> getaddrinfo() does not
> permit both to be NULL.

I see.  I guess the fix is OK then.

Richard.

>
> >
> >> thanks
> >> Iain
> >>
> >> Signed-off-by: Iain Sandoe 
> >>
> >> c++tools/ChangeLog:
> >>
> >>* server.cc (accept_from): Use "1" as the dummy port number.
> >> ---
> >> c++tools/server.cc | 6 +-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/c++tools/server.cc b/c++tools/server.cc
> >> index 8c6ad314886..00154a05925 100644
> >> --- a/c++tools/server.cc
> >> +++ b/c++tools/server.cc
> >> @@ -360,7 +360,11 @@ accept_from (char *arg ATTRIBUTE_UNUSED)
> >>   hints.ai_next = NULL;
> >>
> >>   struct addrinfo *addrs = NULL;
> >> -  if (int e = getaddrinfo (slash == arg ? NULL : arg, "0", , 
> >> ))
> >> +  /* getaddrinfo requires either hostname or servname to be non-null, so 
> >> that we must
> >> + set a port number (to cover the case that the string passed contains 
> >> just /NN).
> >> + Use an arbitrary in-range port number, but avoiding "0" which 
> >> triggers a bug on
> >> + some BSD variants.  */
> >> +  if (int e = getaddrinfo (slash == arg ? NULL : arg, "1", , 
> >> ))
> >> {
> >>   noisy ("cannot resolve '%s': %s", arg, gai_strerror (e));
> >>   ok = false;
> >> --
> >> 2.24.3 (Apple Git-128)
> >>
>


Re: [PATCH] ifcvt, v2: Punt if not onlyjump_p for find_if_case_{1, 2} [PR104814]

2022-03-14 Thread Eric Botcazou via Gcc-patches
> Here is an updated patch for it.  Ok for trunk?
> 
> 2022-03-14  Jakub Jelinek  
> 
>   PR rtl-optimization/104814
> * ifcvt.cc (find_if_case_1, find_if_case_2): Punt if test_bb doesn't
>   end with onlyjump_p.  Assume BB_END (test_bb) is always non-NULL.
> 
>   * gcc.c-torture/execute/pr104814.c: New test.

OK, thanks.

-- 
Eric Botcazou




[PATCH] ifcvt, v2: Punt if not onlyjump_p for find_if_case_{1,2} [PR104814]

2022-03-14 Thread Jakub Jelinek via Gcc-patches
On Sun, Mar 13, 2022 at 02:43:18PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > Are the !BB_END tests really necessary?  cond_exec_process_if_block has the 
> > same test on onlyjump_p without it.  Likewise for noce_find_if_block.
> 
> Probably not, I've put it there only because the neighboring code is testing
> it too:
>   if ((BB_END (then_bb)
>&& JUMP_P (BB_END (then_bb))
>&& CROSSING_JUMP_P (BB_END (then_bb)))
>   || (BB_END (test_bb)
>   && JUMP_P (BB_END (test_bb))
>   && CROSSING_JUMP_P (BB_END (test_bb)))
>   || (BB_END (else_bb)
>   && JUMP_P (BB_END (else_bb))
>   && CROSSING_JUMP_P (BB_END (else_bb
> return FALSE;
> find_if_header which calls find_if_case_* has:
>   if (EDGE_COUNT (test_bb->succs) != 2)
> return NULL;
> at the start, and I think an empty bb can't have more than one successor,
> because there is nothing to cause different control flow.

Here is an updated patch for it.  Ok for trunk?

2022-03-14  Jakub Jelinek  

PR rtl-optimization/104814
* ifcvt.cc (find_if_case_1, find_if_case_2): Punt if test_bb doesn't
end with onlyjump_p.  Assume BB_END (test_bb) is always non-NULL.

* gcc.c-torture/execute/pr104814.c: New test.

--- gcc/ifcvt.cc.jj 2022-03-14 10:34:16.350172371 +0100
+++ gcc/ifcvt.cc2022-03-14 11:20:32.425384101 +0100
@@ -5251,14 +5251,17 @@ find_if_case_1 (basic_block test_bb, edg
   if ((BB_END (then_bb)
&& JUMP_P (BB_END (then_bb))
&& CROSSING_JUMP_P (BB_END (then_bb)))
-  || (BB_END (test_bb)
- && JUMP_P (BB_END (test_bb))
+  || (JUMP_P (BB_END (test_bb))
  && CROSSING_JUMP_P (BB_END (test_bb)))
   || (BB_END (else_bb)
  && JUMP_P (BB_END (else_bb))
  && CROSSING_JUMP_P (BB_END (else_bb
 return FALSE;
 
+  /* Verify test_bb ends in a conditional jump with no other side-effects.  */
+  if (!onlyjump_p (BB_END (test_bb)))
+return FALSE;
+
   /* THEN has one successor.  */
   if (!single_succ_p (then_bb))
 return FALSE;
@@ -5372,14 +5375,17 @@ find_if_case_2 (basic_block test_bb, edg
   if ((BB_END (then_bb)
&& JUMP_P (BB_END (then_bb))
&& CROSSING_JUMP_P (BB_END (then_bb)))
-  || (BB_END (test_bb)
- && JUMP_P (BB_END (test_bb))
+  || (JUMP_P (BB_END (test_bb))
  && CROSSING_JUMP_P (BB_END (test_bb)))
   || (BB_END (else_bb)
  && JUMP_P (BB_END (else_bb))
  && CROSSING_JUMP_P (BB_END (else_bb
 return FALSE;
 
+  /* Verify test_bb ends in a conditional jump with no other side-effects.  */
+  if (!onlyjump_p (BB_END (test_bb)))
+return FALSE;
+
   /* ELSE has one successor.  */
   if (!single_succ_p (else_bb))
 return FALSE;
--- gcc/testsuite/gcc.c-torture/execute/pr104814.c.jj   2022-03-14 
11:18:57.188717248 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr104814.c  2022-03-14 
11:18:57.188717248 +0100
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/104814 */
+
+short a = 0;
+static long b = 0;
+int c = 7;
+char d = 0;
+short *e = 
+long f = 0;
+
+unsigned long
+foo (unsigned long h, long j)
+{
+  return j == 0 ? h : h / j;
+}
+
+int
+main ()
+{
+  long k = f;
+  for (; c; --c)
+{
+  for (int i = 0; i < 7; ++i)
+   ;
+  long m = foo (f, --b);
+  d = ((char) m | *e) <= 43165;
+}
+  if (b != -7)
+__builtin_abort ();
+  return 0;
+}


Jakub



RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, 14 Mar 2022, Roger Sayle wrote:

> 
> Hi Richard,
> Thanks for asking.
> The issue is with 16-bit floating point HFmode, where clang on nvptx passes
> HFmode
> values in SImode registers, and for binary compatibility GCC needs to do the
> same.
> Their motivation is that for compatibility with older GPUs and the x86_64
> host, HFmode
> parameters are treated like "unsigned int".  Hence, libgcc functions like
> __sqrthf behave
> as though declared (are binary compatible with) "unsigned short
> __sqrthf(unsigned short)".
> As you point out, this also greatly simplifies soft-float, as both ABIs
> remain the same.
> 
> Alas, the subreg approach doesn't work as we'd end up with (subreg:SI
> (subreg:HI (reg:HF)),
> though technically that is what this patch does, inserting a pair of
> conversion instructions.

So what does FUNCTION_ARG return for the HFmode parameters?  The above
seems backwards - it should be (subreg:HF (reg:SI 0)), no?

Or are we talking about the caller side?  Or the return value case
(in which case I ask the same question wrt FUNCTION_VALUE)?

Does nvptx use the promotion hooks in those cases?

> Until recently (subreg:SI (reg:HF)) did work, but that support was
> removed/cleaned-up,
> as we need sensible subreg semantics in the RTL passes.   The proposed patch
> simply
> adds support back in the minimal places where changing FP/non-FP and
> precision at the
> same time is required: argument passing and return values.
> 
> Hopefully this answers your question.  An alternate viewpoint might be "is
> there a reason for
> GCC not to be able to support targets with slightly wacky parameter passing
> conventions".
> 
> Thanks for thinking about this.
> Roger
> --
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: 14 March 2022 09:09
> > To: Roger Sayle 
> > Cc: 'Tom de Vries' ; 'Jeff Law' ;
> > 'Tobias Burnus' ; richard.sandif...@arm.com;
> 'Jeff
> > Law' ; gcc-patches@gcc.gnu.org
> > Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> values as
> > wider integers.
> > 
> > On Mon, 14 Mar 2022, Roger Sayle wrote:
> > 
> > >
> > > I thought I'd add a few comments that might help reviewers of this
> patch.
> > > Most importantly, this patch should be completely safe, as both
> > > changes only trigger with FLOAT vs INT size mismatches which currently
> > > both ICE in the compiler a few lines later.  Admittedly, this
> > > indicates something odd about a target's choice of ABI, but one
> > > alternative might be to issue a "sorry, unimplemented" error message
> > > rather than ICE, perhaps with a TODO or FIXME comment that support for
> > > mixed size FP/integer ABIs be added in future.
> > >
> > > The only (other?) possible point of contention is the (arbitrary)
> > > decision that when passing floating point values in a larger integer
> > > register, the code is hardwired to use zero-extension.  This in theory
> > > could be turned into a target hook to support sign-extension, but
> > > given these are padding bits, zero seems appropriate. [On x86_64, if
> > > passing DFmode argument in a V2DFmode vector, say, it seems reasonable
> to
> > use movq and zero the high bits].
> > >
> > > The final point is that at the moment, the only affected target is
> > > nvptx-none, as I don't believe any other backend specifies an ABI that
> > > requires passing floating point values in wider integer registers.
> > > Having said that, most targets don't yet support HFmode, and their ABI
> > > specifications haven't yet been updated as what to do in that
> > > eventuality.  If they choose to treat these like HImode, they'll run
> > > into the same issues, as most ABIs pass HImode in wider word_mode
> registers.
> > >
> > > I hope this helps.  If folks could air their concerns out loud, I can
> > > try my best to address those worries.
> > 
> > First of all I'm not familiar with the ABI related code as all, so I
> refrained from
> > commenting.  But now I've looked closer (still unfamiliar code).
> > 
> > I suppose there's targets passing SFmode in a SImode GPR and DFmode in a
> > DImode GPR (all soft-float targets?), so that already works somehow.
> > Why does nvptx choose DImode for SFmode passing then?  Can't it choose
> > (subreg:SI di:..) or so?  Does it require zero-extending to DImode on the
> caller
> > side?  It seems your expand_expr_real_1 code does not rely on that?  So,
> why
> > does nvptx function_arg hook (?) insist on returning a DImode reg for an
> SFmode
> > argument rather than an SImode subreg of that?
> > 
> > Richard.
> > 
> > >
> > > Many thanks in advance (and thanks to Tobias and Tom for pushing for
> this).
> > > Roger
> > > --
> > >
> > > > -Original Message-
> > > > From: Tom de Vries 
> > > > Sent: 14 March 2022 08:06
> > > > To: Jeff Law ; Richard Biener
> > > > ; Tobias Burnus 
> > > > Cc: richard.sandif...@arm.com; Jeff Law ; gcc-
> > > > patc...@gcc.gnu.org; Roger Sayle 
> > > > Subject: PING**4 - [PATCH] middle-end: Support 

Re: [committed] libstdc++: Move closing brace outside #endif [PR104866]

2022-03-14 Thread Detlef Vollmann

On 3/11/22 20:24, Detlef Vollmann wrote:

On 3/11/22 18:59, Jonathan Wakely wrote:


Thanks. Now I'm getting a build failure because libtol wasn't created
in the avr/libstdc++-v3 directory of the build tree, but I'll have to
look into that next week.

/bin/sh: ../libtool: No such file or directory


Here's my configure call:

$REPO_DIR/configure \
     --prefix=$PREFIX \
     --target=avr \
     --enable-languages=c,c++ \
     --with-dwarf2 \
     --enable-multilib \
     --enable-libstdcxx \
     --disable-decimal-float \
     --disable-libffi \
     --disable-libgomp \
     --disable-libmudflap \
     --disable-libquadmath \
     --disable-libssp \
     --disable-libstdcxx-pch \
     --disable-nls \
     --without-included-gettext \
     --disable-libstdcxx-verbose \
     --disable-shared \
     --disable-threads \
     --disable-tls \
     --disable-plugin \
     --with-system-zlib \
     --with-native-system-header-dir=$PREFIX/port/include \
     --with-headers=yes \
     --with-gnu-as \
     --with-gnu-ld \
     --with-avrlibc \
     --with-build-time-tools=$PREFIX/lib/avr/bin

--disable-threads is probably wrong, as I definitly have threads
(any ISR counts as thread).  I added it in the (wrong) assumption
that it would build support for std::thread...


Just for completeness: for the previous steps building the toolchain
I followed
.
Using binutils: 2.37 and AVR LibC 2.0.0 (from SVN).

  Detlef


RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Roger Sayle


Hi Richard,
Thanks for asking.
The issue is with 16-bit floating point HFmode, where clang on nvptx passes
HFmode
values in SImode registers, and for binary compatibility GCC needs to do the
same.
Their motivation is that for compatibility with older GPUs and the x86_64
host, HFmode
parameters are treated like "unsigned int".  Hence, libgcc functions like
__sqrthf behave
as though declared (are binary compatible with) "unsigned short
__sqrthf(unsigned short)".
As you point out, this also greatly simplifies soft-float, as both ABIs
remain the same.

Alas, the subreg approach doesn't work as we'd end up with (subreg:SI
(subreg:HI (reg:HF)),
though technically that is what this patch does, inserting a pair of
conversion instructions.
Until recently (subreg:SI (reg:HF)) did work, but that support was
removed/cleaned-up,
as we need sensible subreg semantics in the RTL passes.   The proposed patch
simply
adds support back in the minimal places where changing FP/non-FP and
precision at the
same time is required: argument passing and return values.

Hopefully this answers your question.  An alternate viewpoint might be "is
there a reason for
GCC not to be able to support targets with slightly wacky parameter passing
conventions".

Thanks for thinking about this.
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 14 March 2022 09:09
> To: Roger Sayle 
> Cc: 'Tom de Vries' ; 'Jeff Law' ;
> 'Tobias Burnus' ; richard.sandif...@arm.com;
'Jeff
> Law' ; gcc-patches@gcc.gnu.org
> Subject: RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
values as
> wider integers.
> 
> On Mon, 14 Mar 2022, Roger Sayle wrote:
> 
> >
> > I thought I'd add a few comments that might help reviewers of this
patch.
> > Most importantly, this patch should be completely safe, as both
> > changes only trigger with FLOAT vs INT size mismatches which currently
> > both ICE in the compiler a few lines later.  Admittedly, this
> > indicates something odd about a target's choice of ABI, but one
> > alternative might be to issue a "sorry, unimplemented" error message
> > rather than ICE, perhaps with a TODO or FIXME comment that support for
> > mixed size FP/integer ABIs be added in future.
> >
> > The only (other?) possible point of contention is the (arbitrary)
> > decision that when passing floating point values in a larger integer
> > register, the code is hardwired to use zero-extension.  This in theory
> > could be turned into a target hook to support sign-extension, but
> > given these are padding bits, zero seems appropriate. [On x86_64, if
> > passing DFmode argument in a V2DFmode vector, say, it seems reasonable
to
> use movq and zero the high bits].
> >
> > The final point is that at the moment, the only affected target is
> > nvptx-none, as I don't believe any other backend specifies an ABI that
> > requires passing floating point values in wider integer registers.
> > Having said that, most targets don't yet support HFmode, and their ABI
> > specifications haven't yet been updated as what to do in that
> > eventuality.  If they choose to treat these like HImode, they'll run
> > into the same issues, as most ABIs pass HImode in wider word_mode
registers.
> >
> > I hope this helps.  If folks could air their concerns out loud, I can
> > try my best to address those worries.
> 
> First of all I'm not familiar with the ABI related code as all, so I
refrained from
> commenting.  But now I've looked closer (still unfamiliar code).
> 
> I suppose there's targets passing SFmode in a SImode GPR and DFmode in a
> DImode GPR (all soft-float targets?), so that already works somehow.
> Why does nvptx choose DImode for SFmode passing then?  Can't it choose
> (subreg:SI di:..) or so?  Does it require zero-extending to DImode on the
caller
> side?  It seems your expand_expr_real_1 code does not rely on that?  So,
why
> does nvptx function_arg hook (?) insist on returning a DImode reg for an
SFmode
> argument rather than an SImode subreg of that?
> 
> Richard.
> 
> >
> > Many thanks in advance (and thanks to Tobias and Tom for pushing for
this).
> > Roger
> > --
> >
> > > -Original Message-
> > > From: Tom de Vries 
> > > Sent: 14 March 2022 08:06
> > > To: Jeff Law ; Richard Biener
> > > ; Tobias Burnus 
> > > Cc: richard.sandif...@arm.com; Jeff Law ; gcc-
> > > patc...@gcc.gnu.org; Roger Sayle 
> > > Subject: PING**4 - [PATCH] middle-end: Support ABIs that pass FP
> > > values as wider integers.
> > >
> > > On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:
> > > >
> > > >
> > > > On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:
> > > >> On Mon, 28 Feb 2022, Tobias Burnus wrote:
> > > >>
> > > >>> Ping**3
> > > >>>
> > > >>> On 23.02.22 09:42, Tobias Burnus wrote:
> > >  PING**2 for the ME review or at least comments to that patch,
> > >  which fixes a build issue/ICE with nvptx
> > > 
> > >  Patch:
> > >  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.
> > >  html 

[committed] Spelling fix - cannott -> cannot [PR104899]

2022-03-14 Thread Jakub Jelinek via Gcc-patches
Hi!

This fixes typos and while changing that, also uses %< %> around attribute
names and fixes up formatting.

2022-03-14  Jakub Jelinek  

PR other/104899
* config/bfin/bfin.cc (bfin_handle_longcall_attribute): Fix a typo
in diagnostic message - cannott -> cannot.  Use %< and %> around
names of attribute.  Avoid too long line.
* range-op.cc (operator_logical_and::op1_range): Fix up a typo
in comment - cannott -> cannot.  Use 2 spaces after . instead of one.

--- gcc/config/bfin/bfin.cc.jj  2022-02-04 14:36:54.301615132 +0100
+++ gcc/config/bfin/bfin.cc 2022-03-14 10:37:13.452701382 +0100
@@ -4763,7 +4763,8 @@ bfin_handle_longcall_attribute (tree *no
  && lookup_attribute ("longcall", TYPE_ATTRIBUTES (*node
 {
   warning (OPT_Wattributes,
-  "cannott apply both longcall and shortcall attributes to the 
same function");
+  "cannot apply both % and % attributes "
+  "to the same function");
   *no_add_attrs = true;
 }
 
--- gcc/range-op.cc.jj  2022-02-09 15:15:59.479837374 +0100
+++ gcc/range-op.cc 2022-03-14 10:38:07.038953734 +0100
@@ -2532,7 +2532,7 @@ operator_logical_and::op1_range (irange
break;
  default:
// Any other result means only one side has to be false, the
-   // other side can be anything. So we cannott be sure of any
+   // other side can be anything.  So we cannot be sure of any
// result here.
r = range_true_and_false (type);
break;

Jakub



RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, 14 Mar 2022, Roger Sayle wrote:

> 
> I thought I'd add a few comments that might help reviewers of this patch.
> Most importantly, this patch should be completely safe, as both changes
> only trigger with FLOAT vs INT size mismatches which currently both ICE in
> the compiler a few lines later.  Admittedly, this indicates something odd 
> about a target's choice of ABI, but one alternative might be to issue a
> "sorry, unimplemented" error message rather than ICE, perhaps with a
> TODO or FIXME comment that support for mixed size FP/integer ABIs be
> added in future.
> 
> The only (other?) possible point of contention is the (arbitrary) decision 
> that
> when passing floating point values in a larger integer register, the code is
> hardwired to use zero-extension.  This in theory could be turned into a target
> hook to support sign-extension, but given these are padding bits, zero seems
> appropriate. [On x86_64, if passing DFmode argument in a V2DFmode vector,
> say, it seems reasonable to use movq and zero the high bits].
> 
> The final point is that at the moment, the only affected target is nvptx-none,
> as I don't believe any other backend specifies an ABI that requires passing 
> floating point values in wider integer registers.  Having said that, most 
> targets
> don't yet support HFmode, and their ABI specifications haven't yet been
> updated as what to do in that eventuality.  If they choose to treat these like
> HImode, they'll run into the same issues, as most ABIs pass HImode in 
> wider word_mode registers.
> 
> I hope this helps.  If folks could air their concerns out loud, I can try my 
> best
> to address those worries.

First of all I'm not familiar with the ABI related code as all, so I
refrained from commenting.  But now I've looked closer (still unfamiliar 
code).

I suppose there's targets passing SFmode in a SImode GPR and DFmode
in a DImode GPR (all soft-float targets?), so that already works somehow.
Why does nvptx choose DImode for SFmode passing then?  Can't it choose
(subreg:SI di:..) or so?  Does it require zero-extending to DImode
on the caller side?  It seems your expand_expr_real_1 code does
not rely on that?  So, why does nvptx function_arg hook (?) insist
on returning a DImode reg for an SFmode argument rather than
an SImode subreg of that?

Richard.

> 
> Many thanks in advance (and thanks to Tobias and Tom for pushing for this).
> Roger
> --
> 
> > -Original Message-
> > From: Tom de Vries 
> > Sent: 14 March 2022 08:06
> > To: Jeff Law ; Richard Biener ;
> > Tobias Burnus 
> > Cc: richard.sandif...@arm.com; Jeff Law ; gcc-
> > patc...@gcc.gnu.org; Roger Sayle 
> > Subject: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as
> > wider integers.
> > 
> > On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:
> > >
> > >
> > > On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:
> > >> On Mon, 28 Feb 2022, Tobias Burnus wrote:
> > >>
> > >>> Ping**3
> > >>>
> > >>> On 23.02.22 09:42, Tobias Burnus wrote:
> >  PING**2 for the ME review or at least comments to that patch, which
> >  fixes a build issue/ICE with nvptx
> > 
> >  Patch:
> >  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
> >  (for gcc/cfgexpand.cc + gcc/expr.cc)
> > 
> >  (There is some discussion by Tom and Roger about the BE in the
> >  patch thread, which only not relate to the ME patch. But there is
> >  no ME-patch comment so far.)
> > >>> The related BE patch has been already committed, but to be
> > >>> effective, it needs the ME patch.
> > >> I'm not sure I'm qualified to review this - maybe Richard is.
> > > I'd initially ignored the patch as it didn't seem a good fit for
> > > stage4, subsequent messages changed my mind about it, but I never went
> > > back to take a deeper look at Roger's patch.
> > 
> > Ping.
> > 
> > [ FWIW, I'd appreciate it if a response came before the end of stage 4, 
> > such that
> > I have some time left to deal with fallout in case the patch is not 
> > approved. ]
> > 
> > Thanks,
> > - Tom
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PING] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-03-14 Thread Dan Li via Gcc-patches

Gentile ping for this :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html


R30_REGNUM could also be used as a component in shrink-wrapping
separate, this patch enables it in aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_separate_components):
Remove bitmap clear of R30_REGNUM.
(aarch64_components_for_bb): Support R30_REGNUM as a component.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_separate_1.c: New test.


---
 gcc/config/aarch64/aarch64.cc   |  4 ++--
 .../gcc.target/aarch64/shrink_wrap_separate_1.c | 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8bcee8be9eb..6e1589b0312 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8463,7 +8463,6 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
 bitmap_clear_bit (components, reg1);
 
-  bitmap_clear_bit (components, LR_REGNUM);

   bitmap_clear_bit (components, SP_REGNUM);
 
   return components;

@@ -8500,7 +8499,8 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if (!fixed_regs[regno]
-   && !crtl->abi->clobbers_full_reg_p (regno)
+   && (regno == R30_REGNUM
+   || !crtl->abi->clobbers_full_reg_p (regno))
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c 
b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
new file mode 100644
index 000..34002705ace
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_separate_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fomit-frame-pointer -fdump-rtl-pro_and_epilogue" } */
+
+void f();
+
+int g(int x)
+{
+  if (x == 0)
+{
+  __asm__ ("":::"x19", "x20");
+  return 1;
+}
+  f();
+  return 2;
+}
+
+/* { dg-final { scan-rtl-dump {The components we wrap separately are \[sep 30\]} 
"pro_and_epilogue"  } } */
--
2.17.1


RE: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Roger Sayle

I thought I'd add a few comments that might help reviewers of this patch.
Most importantly, this patch should be completely safe, as both changes
only trigger with FLOAT vs INT size mismatches which currently both ICE in
the compiler a few lines later.  Admittedly, this indicates something odd 
about a target's choice of ABI, but one alternative might be to issue a
"sorry, unimplemented" error message rather than ICE, perhaps with a
TODO or FIXME comment that support for mixed size FP/integer ABIs be
added in future.

The only (other?) possible point of contention is the (arbitrary) decision that
when passing floating point values in a larger integer register, the code is
hardwired to use zero-extension.  This in theory could be turned into a target
hook to support sign-extension, but given these are padding bits, zero seems
appropriate. [On x86_64, if passing DFmode argument in a V2DFmode vector,
say, it seems reasonable to use movq and zero the high bits].

The final point is that at the moment, the only affected target is nvptx-none,
as I don't believe any other backend specifies an ABI that requires passing 
floating point values in wider integer registers.  Having said that, most 
targets
don't yet support HFmode, and their ABI specifications haven't yet been
updated as what to do in that eventuality.  If they choose to treat these like
HImode, they'll run into the same issues, as most ABIs pass HImode in 
wider word_mode registers.

I hope this helps.  If folks could air their concerns out loud, I can try my 
best
to address those worries.

Many thanks in advance (and thanks to Tobias and Tom for pushing for this).
Roger
--

> -Original Message-
> From: Tom de Vries 
> Sent: 14 March 2022 08:06
> To: Jeff Law ; Richard Biener ;
> Tobias Burnus 
> Cc: richard.sandif...@arm.com; Jeff Law ; gcc-
> patc...@gcc.gnu.org; Roger Sayle 
> Subject: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as
> wider integers.
> 
> On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:
> >
> >
> > On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:
> >> On Mon, 28 Feb 2022, Tobias Burnus wrote:
> >>
> >>> Ping**3
> >>>
> >>> On 23.02.22 09:42, Tobias Burnus wrote:
>  PING**2 for the ME review or at least comments to that patch, which
>  fixes a build issue/ICE with nvptx
> 
>  Patch:
>  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
>  (for gcc/cfgexpand.cc + gcc/expr.cc)
> 
>  (There is some discussion by Tom and Roger about the BE in the
>  patch thread, which only not relate to the ME patch. But there is
>  no ME-patch comment so far.)
> >>> The related BE patch has been already committed, but to be
> >>> effective, it needs the ME patch.
> >> I'm not sure I'm qualified to review this - maybe Richard is.
> > I'd initially ignored the patch as it didn't seem a good fit for
> > stage4, subsequent messages changed my mind about it, but I never went
> > back to take a deeper look at Roger's patch.
> 
> Ping.
> 
> [ FWIW, I'd appreciate it if a response came before the end of stage 4, such 
> that
> I have some time left to deal with fallout in case the patch is not approved. 
> ]
> 
> Thanks,
> - Tom
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index d51af2e..c377f16 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3715,7 +3715,22 @@ expand_value_return (rtx val)
 mode = promote_function_mode (type, old_mode, , funtype, 1);
 
   if (mode != old_mode)
-   val = convert_modes (mode, old_mode, val, unsignedp);
+   {
+ /* Some ABIs require scalar floating point modes to be returned
+in a wider scalar integer mode.  We need to explicitly
+reinterpret to an integer mode of the correct precision
+before extending to the desired result.  */
+ if (SCALAR_INT_MODE_P (mode)
+ && SCALAR_FLOAT_MODE_P (old_mode)
+ && known_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (old_mode)))
+   {
+ scalar_int_mode imode = int_mode_for_mode (old_mode).require ();
+ val = force_reg (imode, gen_lowpart (imode, val));
+ val = convert_modes (mode, imode, val, 1);
+   }
+ else
+   val = convert_modes (mode, old_mode, val, unsignedp);
+   }
 
   if (GET_CODE (return_reg) == PARALLEL)
emit_group_load (return_reg, val, type, int_size_in_bytes (type));
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 35e4029..e4efdcd 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10674,6 +10674,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
pmode = promote_ssa_mode (ssa_name, );
  gcc_assert (GET_MODE (decl_rtl) == pmode);
 
+ /* Some ABIs require scalar floating point modes to be passed
+in a wider scalar integer mode.  We need to explicitly
+truncate to an integer mode of the correct precision before
+using a SUBREG to 

PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Tom de Vries via Gcc-patches

On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:



On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:

On Mon, 28 Feb 2022, Tobias Burnus wrote:


Ping**3

On 23.02.22 09:42, Tobias Burnus wrote:

PING**2 for the ME review or at least comments to that patch,
which fixes a build issue/ICE with nvptx

Patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
(for gcc/cfgexpand.cc + gcc/expr.cc)

(There is some discussion by Tom and Roger about the BE in the patch
thread, which only not relate to the ME patch. But there is no
ME-patch comment so far.)

The related BE patch has been already committed, but to be effective, it
needs the ME patch.

I'm not sure I'm qualified to review this - maybe Richard is.
I'd initially ignored the patch as it didn't seem a good fit for stage4, 
subsequent messages changed my mind about it, but I never went back to 
take a deeper look at Roger's patch.


Ping.

[ FWIW, I'd appreciate it if a response came before the end of stage 4, 
such that I have some time left to deal with fallout in case the patch 
is not approved. ]


Thanks,
- Tom


Re: [PATCH 2/2] c++tools: Work around a BSD bug in getaddrinfo().

2022-03-14 Thread Iain Sandoe



> On 14 Mar 2022, at 07:45, Richard Biener  wrote:
> 
> On Mon, Mar 14, 2022 at 12:17 AM Iain Sandoe via Gcc-patches
>  wrote:
>> 
>> Some versions of the BSD getaddrinfo() call do not work with the specific
>> input of "0" for the servname entry (a segv results).  Since we are making
>> the call with a dummy port number, the value is actually not important, other
>> than it should be in range.  Work around the BSD bug by using "1" instead.
>> 
>> tested on powerpc,i686-darwin9, x86-64-darwin10,17,20
>> powerpc64le,powerpc64,x86_64-linux-gnu,
>> 
>> OK for master?
>> eventual backports?
> 
> Why does the nullptr solution not work here?

Because this code is setting a netmask and, if I read it correctly (we do not 
seem to have
a test for it), it can be called with “/NN” (where NN is the number of bits and 
there is no host
name before the slash).

So in this case, in the getaddrinfo() call, the “hostname” parm is set to NULL, 
and we must
supply a service-name/port number (for the “servname” parm), since 
getaddrinfo() does not
permit both to be NULL.


> 
>> thanks
>> Iain
>> 
>> Signed-off-by: Iain Sandoe 
>> 
>> c++tools/ChangeLog:
>> 
>>* server.cc (accept_from): Use "1" as the dummy port number.
>> ---
>> c++tools/server.cc | 6 +-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/c++tools/server.cc b/c++tools/server.cc
>> index 8c6ad314886..00154a05925 100644
>> --- a/c++tools/server.cc
>> +++ b/c++tools/server.cc
>> @@ -360,7 +360,11 @@ accept_from (char *arg ATTRIBUTE_UNUSED)
>>   hints.ai_next = NULL;
>> 
>>   struct addrinfo *addrs = NULL;
>> -  if (int e = getaddrinfo (slash == arg ? NULL : arg, "0", , ))
>> +  /* getaddrinfo requires either hostname or servname to be non-null, so 
>> that we must
>> + set a port number (to cover the case that the string passed contains 
>> just /NN).
>> + Use an arbitrary in-range port number, but avoiding "0" which triggers 
>> a bug on
>> + some BSD variants.  */
>> +  if (int e = getaddrinfo (slash == arg ? NULL : arg, "1", , ))
>> {
>>   noisy ("cannot resolve '%s': %s", arg, gai_strerror (e));
>>   ok = false;
>> --
>> 2.24.3 (Apple Git-128)
>> 



Re: [PATCH][Middle-end][Backport to GCC11][PR100775]Updating the reg use in exit block for -fzero-call-used-regs

2022-03-14 Thread Richard Biener via Gcc-patches
On Fri, Mar 11, 2022 at 5:31 PM Qing Zhao via Gcc-patches
 wrote:
>
>
> Hi,
>
> I plan to backport the patch to fix PR100775:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100775
>
> To gcc11 since this is a general bug to -fzero-call-used-regs. And should be 
> fixed in gcc11 as well.
>
> I have tested the patch with gcc11 release branch on both x86 and aarch64, no 
> regression.
>
> Okay for commit?

OK

> thanks.
>
> Qing.
>
> ==
>
>
> From 737a0b0824111f46da44c5578b9769070c52 Mon Sep 17 00:00:00 2001
> From: Qing Zhao 
> Date: Thu, 10 Mar 2022 23:22:29 +
> Subject: [PATCH] middle-end: updating the reg use in exit block for
>  -fzero-call-used-regs [PR100775] GCC11 backport.
>
> In the pass_zero_call_used_regs, when updating dataflow info after adding
> the register zeroing sequence in the epilogue of the function, we should
> call "df_update_exit_block_uses" to update the register use information in
> the exit block to include all the registers that have been zeroed.
>
> gcc/ChangeLog:
>
> PR middle-end/100775
> * function.c (gen_call_used_regs_seq): Call
> df_update_exit_block_uses when updating df.
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/100775
> * gcc.target/arm/pr100775.c: New test.
> ---
>  gcc/function.c  | 2 +-
>  gcc/testsuite/gcc.target/arm/pr100775.c | 9 +
>  2 files changed, 10 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr100775.c
>
> diff --git a/gcc/function.c b/gcc/function.c
> index fc7b147b5f1..0495e9f1b81 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -5922,7 +5922,7 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int 
> zero_regs_type)
>
>/* Update the data flow information.  */
>crtl->must_be_zero_on_return |= zeroed_hardregs;
> -  df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +  df_update_exit_block_uses ();
>  }
>  }
>
> diff --git a/gcc/testsuite/gcc.target/arm/pr100775.c 
> b/gcc/testsuite/gcc.target/arm/pr100775.c
> new file mode 100644
> index 000..c648cd5e8f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr100775.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
> +/* { dg-options "-mthumb -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> --
> 2.27.0
>
>


Re: [PATCH 2/2] c++tools: Work around a BSD bug in getaddrinfo().

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, Mar 14, 2022 at 12:17 AM Iain Sandoe via Gcc-patches
 wrote:
>
> Some versions of the BSD getaddrinfo() call do not work with the specific
> input of "0" for the servname entry (a segv results).  Since we are making
> the call with a dummy port number, the value is actually not important, other
> than it should be in range.  Work around the BSD bug by using "1" instead.
>
> tested on powerpc,i686-darwin9, x86-64-darwin10,17,20
> powerpc64le,powerpc64,x86_64-linux-gnu,
>
> OK for master?
> eventual backports?

Why does the nullptr solution not work here?

> thanks
> Iain
>
> Signed-off-by: Iain Sandoe 
>
> c++tools/ChangeLog:
>
> * server.cc (accept_from): Use "1" as the dummy port number.
> ---
>  c++tools/server.cc | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/c++tools/server.cc b/c++tools/server.cc
> index 8c6ad314886..00154a05925 100644
> --- a/c++tools/server.cc
> +++ b/c++tools/server.cc
> @@ -360,7 +360,11 @@ accept_from (char *arg ATTRIBUTE_UNUSED)
>hints.ai_next = NULL;
>
>struct addrinfo *addrs = NULL;
> -  if (int e = getaddrinfo (slash == arg ? NULL : arg, "0", , ))
> +  /* getaddrinfo requires either hostname or servname to be non-null, so 
> that we must
> + set a port number (to cover the case that the string passed contains 
> just /NN).
> + Use an arbitrary in-range port number, but avoiding "0" which triggers 
> a bug on
> + some BSD variants.  */
> +  if (int e = getaddrinfo (slash == arg ? NULL : arg, "1", , ))
>  {
>noisy ("cannot resolve '%s': %s", arg, gai_strerror (e));
>ok = false;
> --
> 2.24.3 (Apple Git-128)
>


Re: [PATCH 1/2] libcody: Do not use a dummy port number in getaddrinfo().

2022-03-14 Thread Richard Biener via Gcc-patches
On Mon, Mar 14, 2022 at 12:16 AM Iain Sandoe via Gcc-patches
 wrote:
>
> The getaddrinfo() call requires either a non-null name for the server or
> a port service / number.  In the code that opens a connection we have
> been calling this with a dummy port number of "0".  Unfortunately this
> triggers a crashing bug in some BSD versions (and OSes importing that code).
>
> In this part of the code we do not really need a port number, since it
> is not reasonable to open a connection to an unspecified host.
>
> Setting hints ai_flags field to 0, and the servname parm to nullptr works
> around the BSD bug in this case.
>
> Also posted upstream.
>
> (fixes bad-mapper-2/3 on the versions affected).
>
> tested on powerpc,i686-darwin9, x86-64-darwin10,17,20
> powerpc64le,powerpc64,x86_64-linux-gnu,
>
> OK for master?

LGTM.

> eventual backports?

After a while, yes.

> thanks
> Iain
>
> Signed-off-by: Iain Sandoe 
>
> libcody/ChangeLog:
>
> * netclient.cc (OpenInet6): Do not provide a dummy port number
> in the getaddrinfo() call.
> ---
>  libcody/netclient.cc | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/libcody/netclient.cc b/libcody/netclient.cc
> index 7f81dd91810..558808be485 100644
> --- a/libcody/netclient.cc
> +++ b/libcody/netclient.cc
> @@ -93,7 +93,7 @@ int OpenInet6 (char const **e, char const *name, int port)
>  }
>
>addrinfo hints;
> -  hints.ai_flags = AI_NUMERICSERV;
> +  hints.ai_flags = 0;
>hints.ai_family = AF_INET6;
>hints.ai_socktype = SOCK_STREAM;
>hints.ai_protocol = 0;
> @@ -102,9 +102,7 @@ int OpenInet6 (char const **e, char const *name, int port)
>hints.ai_canonname = nullptr;
>hints.ai_next = nullptr;
>
> -  /* getaddrinfo requires a port number, but is quite happy to accept
> - invalid ones.  So don't rely on it.  */
> -  if (int err = getaddrinfo (name, "0", , ))
> +  if (int err = getaddrinfo (name, nullptr, , ))
>  {
>errstr = gai_strerror (err);
>// What's the best errno to set?
> --
> 2.24.3 (Apple Git-128)
>


Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.

2022-03-14 Thread Richard Biener via Gcc-patches
On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches
 wrote:
>
> On Fri, 11 Mar 2022, Roger Sayle wrote:
>
> +(match vec_same_elem_p
> +  CONSTRUCTOR@0
> +  (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> +? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : 
> @0
>
> Ah, I didn't remember we needed that, we don't seem to be very consistent
> about it. Probably for this reason, the transformation "Prefer vector1 <<
> scalar to vector1 << vector2" does not match
>
> typedef int vec __attribute__((vector_size(16)));
> vec f(vec a, int b){
>vec bb = { b, b, b, b };
>return a << bb;
> }
>
> which is only optimized at vector lowering time.

Few more comments - since match.pd is matching in match.pd order the

(match vec_same_elem_p
  @0
  (...))

should come last.  Please use

+(match vec_same_elem_p
+  CONSTRUCTOR@0
(if (TREE_CODE (@0) == SSA_NAME
 && uniform_vector_p (...

since otherwise we'll try uniform_vector_p twice on all CTORs (that
are not uniform).

> +/* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
> +(for plusminus (plus minus)
> +  (simplify
> +(plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> +(plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
>
> Don't you want :s on mult and vec_perm?

Yes.  Also for plus you want :c on it , likewise you want :c on the
mult.  The :c
on the plus will require splitting the plus and minus case :/

Otherwise looks reasonable.

Richard.

>
> --
> Marc Glisse


[PATCH] c++/96765: warn when casting "this" to Derived* in Base ctor/dtor

2022-03-14 Thread Zhao Wei Liew via Gcc-patches
Hi!

This patch adds a warning when casting "this" to Derived* in the Base
class constructor and destructor. I've added the warning under the
-Wextra flag as I can't find a more appropriate flag for it.

The patch has been bootstrapped and regression tested on x86_64-pc-linux-gnu.

Appreciate reviews and feedback. Thanks!
From c70e087c7ee9db7497da293f8a85891abe895d9a Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew 
Date: Tue, 22 Feb 2022 16:03:17 +0800
Subject: [PATCH] c++: warn when casting "this" to Derived* in Base ctor/dtor
 [PR96765]

Casting "this" in a base class constructor to a derived class type is
undefined behaviour, but there is no warning when doing so.

Add a warning for this.

Signed-off-by: Zhao Wei Liew 

PR c++/96765

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Add a warning when casting
  "this" to Derived * in Base constructor and destructor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wextra-5.C: New test.
---
 gcc/cp/typeck.cc |  9 
 gcc/testsuite/g++.dg/warn/Wextra-5.C | 33 
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wextra-5.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 516fa574ef6..782f70b27e6 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8079,6 +8079,15 @@ build_static_cast_1 (location_t loc, tree type, tree 
expr, bool c_cast_p,
 {
   tree base;
 
+  if (current_function_decl
+  && (DECL_CONSTRUCTOR_P (current_function_decl)
+  || DECL_DESTRUCTOR_P (current_function_decl))
+  && TREE_CODE (expr) == NOP_EXPR
+  && is_this_parameter (TREE_OPERAND (expr, 0)))
+warning_at(loc, OPT_Wextra,
+   "invalid % from type %qT to type %qT before 
the latter is constructed",
+   intype, type);
+
   if (processing_template_decl)
return expr;
 
diff --git a/gcc/testsuite/g++.dg/warn/Wextra-5.C 
b/gcc/testsuite/g++.dg/warn/Wextra-5.C
new file mode 100644
index 000..7e82c4c6121
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wextra-5.C
@@ -0,0 +1,33 @@
+// PR c++/96765
+// { dg-options "-Wextra" }
+
+struct Derived;
+struct Base {
+  Derived *x;
+  Derived *y;
+  Base();
+  ~Base();
+};
+
+struct Derived : Base {};
+
+Base::Base()
+: x(static_cast(this)), // { dg-warning "invalid 'static_cast'" 
}
+  y((Derived *)this) // { dg-warning "invalid 'static_cast'" }
+{ 
+  static_cast(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+Base::~Base() {
+  static_cast(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+struct Other {
+  Other() {
+Base b;
+static_cast();
+(Derived *)();
+  }
+};
-- 
2.25.1