[gcc12 backport] PR target/105930: Split *xordi3_doubleword after reload on x86.

2022-07-09 Thread Roger Sayle

This is a backport of the fix for PR target/105930 from mainline to the
gcc12 release branch.  This patch has been retested against the gcc12
branch on x86_64-pc-linux-gnu with make bootstrap and make -k check,
both with and without --target_board=unix{-m32} with no new failures.
Ok for the gcc12 branch?


2022-07-09  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
PR target/105930
* config/i386/i386.md (*di3_doubleword): Split after
reload.  Use rtx_equal_p to avoid creating memory-to-memory moves,
and emit NOTE_INSN_DELETED if operand[2] is zero (i.e. with -O0).


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7c9560fc4..1c4781d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10400,22 +10400,25 @@
   "ix86_expand_binary_operator (, mode, operands); DONE;")
 
 (define_insn_and_split "*di3_doubleword"
-  [(set (match_operand:DI 0 "nonimmediate_operand")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=ro,r")
(any_or:DI
-(match_operand:DI 1 "nonimmediate_operand")
-(match_operand:DI 2 "x86_64_szext_general_operand")))
+(match_operand:DI 1 "nonimmediate_operand" "0,0")
+(match_operand:DI 2 "x86_64_szext_general_operand" "re,o")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT
-   && ix86_binary_operator_ok (, DImode, operands)
-   && ix86_pre_reload_split ()"
+   && ix86_binary_operator_ok (, DImode, operands)"
   "#"
-  "&& 1"
+  "&& reload_completed"
   [(const_int 0)]
 {
+  /* This insn may disappear completely when operands[2] == const0_rtx
+ and operands[0] == operands[1], which requires a NOTE_INSN_DELETED.  */
+  bool emit_insn_deleted_note_p = false;
+
   split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
 
   if (operands[2] == const0_rtx)
-emit_move_insn (operands[0], operands[1]);
+emit_insn_deleted_note_p = true;
   else if (operands[2] == constm1_rtx)
 {
   if ( == IOR)
@@ -10427,7 +10430,10 @@
 ix86_expand_binary_operator (, SImode, &operands[0]);
 
   if (operands[5] == const0_rtx)
-emit_move_insn (operands[3], operands[4]);
+{
+  if (emit_insn_deleted_note_p)
+   emit_note (NOTE_INSN_DELETED);
+}
   else if (operands[5] == constm1_rtx)
 {
   if ( == IOR)


Re: [gcc12 backport] PR target/105930: Split *xordi3_doubleword after reload on x86.

2022-07-09 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 9, 2022 at 11:26 AM Roger Sayle  wrote:
>
>
> This is a backport of the fix for PR target/105930 from mainline to the
> gcc12 release branch.  This patch has been retested against the gcc12
> branch on x86_64-pc-linux-gnu with make bootstrap and make -k check,
> both with and without --target_board=unix{-m32} with no new failures.
> Ok for the gcc12 branch?
>
>
> 2022-07-09  Roger Sayle  
> Uroš Bizjak  
>
> gcc/ChangeLog
> PR target/105930
> * config/i386/i386.md (*di3_doubleword): Split after
> reload.  Use rtx_equal_p to avoid creating memory-to-memory moves,
> and emit NOTE_INSN_DELETED if operand[2] is zero (i.e. with -O0).

OK.

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>


Modula-2: merge followup (brief update on the progress of the new linking implementation)

2022-07-09 Thread Gaius Mulley via Gcc-patches


A very brief update to say that I've merged the new linking
implementation back onto the devel/modula-2 branch,

regards,
Gaius


[x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.

2022-07-09 Thread Roger Sayle

This patch upgrades x86_64's scalar-to-vector (STV) pass to more
aggressively transform 128-bit scalar TImode operations into vector
V1TImode operations performed on SSE registers.  TImode functionality
already exists in STV, but only for move operations, this changes
brings support for logical operations (AND, IOR, XOR, NOT and ANDN)
and comparisons.

The effect of these changes are conveniently demonstrated by the new
sse4_1-stv-5.c test case:

__int128 a[16];
__int128 b[16];
__int128 c[16];

void foo()
{
  for (unsigned int i=0; i<16; i++)
a[i] = b[i] & ~c[i];
}

which when currently compiled on mainline wtih -O2 -msse4 produces:

foo:xorl%eax, %eax
.L2:movqc(%rax), %rsi
movqc+8(%rax), %rdi
addq$16, %rax
notq%rsi
notq%rdi
andqb-16(%rax), %rsi
andqb-8(%rax), %rdi
movq%rsi, a-16(%rax)
movq%rdi, a-8(%rax)
cmpq$256, %rax
jne .L2
ret

but with this patch now produces:

foo:xorl%eax, %eax
.L2:movdqa  c(%rax), %xmm0
pandn   b(%rax), %xmm0
addq$16, %rax
movaps  %xmm0, a-16(%rax)
cmpq$256, %rax
jne .L2
ret

Technically, the STV pass is implemented by three C++ classes, a common
abstract base class "scalar_chain" that contains common functionality,
and two derived classes: general_scalar_chain (which handles SI and
DI modes) and timode_scalar_chain (which handles TI modes).  As
mentioned previously, because only TI mode moves were handled the
two worker classes behaved significantly differently.  These changes
bring the functionality of these two classes closer together, which
is reflected by refactoring more shared code from general_scalar_chain
to the parent scalar_chain and reusing it from timode.  There still
remain significant differences (and simplifications) so the existing
division of classes (as specializations) continues to make sense.

Obviously, there are more changes to come (shifts and rotates),
and compute_convert_gain doesn't yet have its final (tuned) form,
but is already an improvement over the "return 1;" used previously.

This patch has been tested on x86_64-pc-linux-gnu with make boostrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2022-07-09  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.h (scalar_chain): Add fields
insns_conv, n_sse_to_integer and n_integer_to_sse to this
parent class, moved from general_scalar_chain.
(scalar_chain::convert_compare): Protected method moved
from general_scalar_chain.
(mark_dual_mode_def): Make protected, not private virtual.
(scalar_chain:convert_op): New private virtual method.

(general_scalar_chain::general_scalar_chain): Simplify constructor.
(general_scalar_chain::~general_scalar_chain): Delete destructor.
(general_scalar_chain): Move insns_conv, n_sse_to_integer and
n_integer_to_sse fields to parent class, scalar_chain.
(general_scalar_chain::mark_dual_mode_def): Delete prototype.
(general_scalar_chain::convert_compare): Delete prototype.

(timode_scalar_chain::compute_convert_gain): Remove simplistic
implementation, convert to a method prototype.
(timode_scalar_chain::mark_dual_mode_def): Delete prototype.
(timode_scalar_chain::convert_op): Prototype new virtual method.

* config/i386/i386-features.cc (scalar_chain::scalar_chain):
Allocate insns_conv and initialize n_sse_to_integer and
n_integer_to_sse fields in constructor.
(scalar_chain::scalar_chain): Free insns_conv in destructor.

(general_scalar_chain::general_scalar_chain): Delete
constructor, now defined in the class declaration.
(general_scalar_chain::~general_scalar_chain): Delete destructor.

(scalar_chain::mark_dual_mode_def): Renamed from
general_scalar_chain::mark_dual_mode_def.
(timode_scalar_chain::mark_dual_mode_def): Delete.
(scalar_chain::convert_compare): Renamed from
general_scalar_chain::convert_compare.

(timode_scalar_chain::compute_convert_gain): New method to
determine the gain from converting a TImode chain to V1TImode.
(timode_scalar_chain::convert_op): New method to convert an
operand from TImode to V1TImode.

(timode_scalar_chain::convert_insn) : Only PUT_MODE
on REG_EQUAL notes that were originally TImode (not CONST_INT).
Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
(timode_mem_p): Helper predicate to check where operand is
memory reference with sufficient alignment for TImode STV.
(timode_scalar_to_vector_candidate_p): Use convertible_comparison_p
to check whether COMPARE is convertible.  Handle SET_DESTs that
that are REG_P or MEM_P and SET_SRCs that are R

[committed] libstdc++: Remove obsolete comment in

2022-07-09 Thread François Dumont via Gcc-patches

   libstdc++: Remove obsolete comment in  header

    The comment is obsolete because char_traits.h do not include 
stl_algobase.h
    anymore and stl_algobase.h is included directly from  a few 
lines

    below.

    libstdc++-v3/ChangeLog:

    * include/std/string: Remove obsolete comment about 
char_traits.h including

    stl_algobase.h.

François
diff --git a/libstdc++-v3/include/std/string b/libstdc++-v3/include/std/string
index 37a4aaba9cd..62ecdb3af45 100644
--- a/libstdc++-v3/include/std/string
+++ b/libstdc++-v3/include/std/string
@@ -37,7 +37,7 @@
 
 #include 
 #include 
-#include   // NB: In turn includes stl_algobase.h
+#include 
 #include 
 #include 
 #include // For operators >>, <<, and getline.


Re: Mips: Fix kernel_stat structure size

2022-07-09 Thread Hans-Peter Nilsson
On Sat, 9 Jul 2022, Xi Ruoyao wrote:

> On Fri, 2022-07-08 at 21:42 -0400, Hans-Peter Nilsson wrote:
> > On Fri, 1 Jul 2022, Dimitrije Milosevic wrote:
> >
> > > Fix kernel_stat structure size for non-Android 32-bit Mips.
> > > LLVM currently has this value for the kernel_stat structure size,
> > > as per compiler-rt/lib/sanitizer-
> > > common/sanitizer_platform_limits_posix.h.
> > > This also resolves one of the build issues for non-Android 32-bit
> > > Mips.
> >
> > I insist that PR105614 comment #7 is the way to go, i.e. fix
> > the merge error, avoiding the faulty include that it
> > reintroduced.? Was this tested on O32?
>
> I'm pretty sure it is *not* the way to go.
>
> Sanitizer does not really intercept system call.  It intercepts libc
> stat() or lstat() etc. calls.  So you need to keep struct_kernel_stat_sz
> same as the size of struct stat in libc, i. e. "the size of buffer which
> *libc* stat()-like functions writing into".
>
> The "kernel_" in the name is just misleading.

You make a sound argument, and I just refer to my old commit
9f943b2446f2d0a.  Either way, the proof is in the pussing: was
this tested for O32 or not?

> And, if you still think it should be the way to go, let's submit the
> change to LLVM and get it reviewed properly.

Sorry, I've no longer interest in mips besides I'd like to see
un-broken what I helped getting in a working state (support ASAN
in gcc for mips O32).

brgds, H-P


Re: [committed] libstdc++: Remove obsolete comment in

2022-07-09 Thread Jonathan Wakely via Gcc-patches
On Sat, 9 Jul 2022 at 13:26, François Dumont via Libstdc++
 wrote:
>
> libstdc++: Remove obsolete comment in  header
>
>  The comment is obsolete because char_traits.h do not include
> stl_algobase.h
>  anymore and stl_algobase.h is included directly from  a few
> lines
>  below.

Nice, thanks


>
>  libstdc++-v3/ChangeLog:
>
>  * include/std/string: Remove obsolete comment about
> char_traits.h including
>  stl_algobase.h.
>
> François



Re: [PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-07-09 Thread Jeff Law via Gcc-patches




On 6/21/2022 12:12 PM, Noah Goldstein via Gcc-patches wrote:

This patch allows for strchr(x, c) to the replace with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.

bootstrapped and tested on x86_64-linux.

PR tree-optimization/95821

gcc/

* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr instead of strchr if strlen already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c: New test.
* c-c++-common/pr95821-2.c: New test.
* c-c++-common/pr95821-3.c: New test.
* c-c++-common/pr95821-4.c: New test.
* c-c++-common/pr95821-5.c: New test.
* c-c++-common/pr95821-6.c: New test.
* c-c++-common/pr95821-7.c: New test.
* c-c++-common/pr95821-8.c: New test.
Given Jakub's involvement to-date and the fact this touches 
tree-ssa-strlen.cc I think Jakub should have final ACK/NAK on this.


jeff



Re: [PATCH] match.pd: Add new bitwise arithmetic pattern [PR98304]

2022-07-09 Thread Jeff Law via Gcc-patches




On 7/7/2022 7:59 AM, Sam Feifer via Gcc-patches wrote:

Hi!

This patch is meant to solve a missed optimization in match.pd. It optimizes the following 
expression: n - (((n > 63) ? n : 63) & -64) where the constant being negated (in this case 
-64) is a power of 2 and the sum of the two constants is -1. For the signed case, this gets 
optimized to (n <= 63) ? n : (n & 63). For the unsigned case, it gets optimized to (n 
& 63). In both scenarios, the number of instructions produced decreases.

There are also tests for this optimization making sure the optimization happens 
when it is supposed to, and does not happen when it isn't.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR tree-optimization/98304

gcc/ChangeLog:

* match.pd (n - (((n > C1) ? n : C1) & -C2)): New simplification.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr98304-2.c: New test.
* gcc.dg/pr98304-1.c: New test.
OK.   I'm going to assume Red Hat's assignment covers you and/or you 
want to contribute under the DCO.    Going forward, if you're part of 
the tools team for Red Hat and expect to be contributing regularly, 
you'll probably want to get authenticated write access so that you can 
commit  approved changes (anyone on the team should be able to help with 
that).


I'll go ahead and push this one to the trunk.

Thanks,
Jeff



Re: [PATCH v2] Support --disable-fixincludes.

2022-07-09 Thread Jeff Law via Gcc-patches




On 7/8/2022 5:14 AM, Martin Liška wrote:

On 5/25/22 07:37, Alexandre Oliva wrote:

On May 24, 2022, Martin Liška  wrote:


Allways install limits.h and syslimits.h header files
to include folder.

typo: s/Allways/Always/

Hello.

Fixed.


I'm a little worried about this bit, too.  limitx.h includes
"syslimits.h", mentioning it should be in the same directory.  Perhaps
it could be left in include-fixed?

Well, I would like to go w/o include-fixed if the option --disable-fixincludes 
is used.


The patch also changes syslimits.h from either the fixincluded file or
gsyslimits.h to use gsyslimits.h unconditionally, which seemed wrong at
first.

Now I see how these two hunks work together: syslimits.h will now always
#include_next , which will find it in include-fixed if it's
there, and system header files otherwise.  Nice!, but IMHO the commit
message could be a little more verbose on the extent of the change and
why that (is supposed to) work.

Oh, to be honest I'm not fully familiar with how these 2 files work together.
Can you explain it to me so that I can adjust the changelog entry 
correspondingly?



It also looks like install-mkheaders installs limits-related headers for
when fixincludes runs; we could probably skip the whole thing if
fixincludes is disabled, but I'm also worried about how the changes
above might impact post-install fixincludes: if that installs
gsyslimits.h et al in include-fixed while your changes moves it to
include, headers might end up in a confusing state.  I haven't worked
out whether that's safe, but there appears to be room for cleanups
there.

I've check that 'make install-mkheaders' work fine w/ and w/o 
--disable-fixincludes
after the patch.


gcc/config/mips/t-sdemtk also places syslimits.h explicitly in include/
rather than include-fixed/, as part of disabling fixincludes, which is
good, but it could be cleaned up as well.

Can we do that as a follow-up patch?


I don't see other config fragments that might require adjustments, so I
think the patch looks good; hopefully my worries are unjustified, and
the cleanups it enables could be

Good.



We still create the README file in there and install it, even with
fixincludes disabled and thus unavailable, don't we?  That README then
becomes misleading; we might be better off not installing it.

Sure, fixed in v2 of the patch.




When --disable-fixincludes is used, then no systen header files
are fixed by the tools in fixincludes. Moreover, the fixincludes
tools are not built any longer.

typo: s/systen/system/

Fixed.



Could you please check that a post-install mkheaders still has a
functional limits.h with these changes?

How do I check that, please?


The patch is ok (with the typo
fixes) if so.  The cleanups it enables would be welcome as separate
patches ;-)

Can I install the v2?

Once Alex is OK with this patch, then it'll be good to go.

jeff


Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).

2022-07-09 Thread Jeff Law via Gcc-patches




On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote:

Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses
half the precision in the process because its use of legacy
value_range's.  This patch rewrites all the SSA_NAME_RANGE_INFO
(nonzero bits included) to use the recently contributed
vrange_storage.  With it, we'll be able to efficiently save any ranges
supported by ranger in GC memory.  Presently this will only be
irange's, but shortly we'll add floating ranges and others to the mix.

As per the discussion with the trailing_wide_ints adjustments and
vrange_storage, we'll be able to save integer ranges with a maximum of
5 sub-ranges.  This could be adjusted later if more sub-ranges are
needed (unlikely).

Since this is a behavior changing patch, I would like to take a few
days for discussion, and commit early next week if all goes well.

A few notes.

First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME
since we store full resolution ranges.  Perhaps it could be re-used
for something else.

The range_info_def struct is gone in favor of an opaque type handled
by vrange_storage.  It currently supports irange, but will support
frange, prange, etc, in due time.

 From the looks of it, set_range_info was an update operation despite
its name, as we improved the nonzero bits with each call, even though
we clobbered the ranges.  Presumably this was because doing a proper
intersect of ranges lost information with the anti-range hack.  We no
longer have this limitation so now we formalize both set_range_info
and set_nonzero_bits to an update operation.  After all, we should
never be losing information, but enhancing it whenever possible.  This
means, that if folks' finger-memory is not offended, as a follow-up,
I'd like to rename set_nonzero_bits and set_range_info to update_*.

I have kept the same global API we had in tree-ssanames.h, with the
caveat that all set operations are now update as discussed above.

There is a 2% performance penalty for evrp and a 3% penalty for VRP
that is coincidentally in line with a previous improvement of the same
amount in the vrange abstraction patchset.  Interestingly, this
penalty is mostly due to the wide int to tree dance we keep doing with
irange and legacy.  In a first draft of this patch where I was
streaming trees directly, there was actually a small improvement
instead.  I hope to get some of the gain back when we move irange's to
wide-ints, though I'm not in a hurry ;-).

Tested and benchmarked on x86-64 Linux.  I will also test on ppc64le
before the final commit.

Comments welcome.

gcc/ChangeLog:

* gimple-range.cc (gimple_ranger::export_global_ranges): Remove
verification against legacy value_range.
* tree-core.h (struct range_info_def): Remove.
(struct irange_storage_slot): New.
(struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation.
(struct tree_ssa_name): Add vrange_storage support.
* tree-ssanames.cc (range_info_p): New.
(range_info_fits_p): New.
(range_info_alloc): New.
(range_info_free): New.
(range_info_get_range): New.
(range_info_set_range): New.
(set_range_info_raw): Remove.
(set_range_info): Adjust to use vrange_storage.
(set_nonzero_bits): Same.
(get_nonzero_bits): Same.
(duplicate_ssa_name_range_info): Remove overload taking
value_range_kind.
Rewrite tree overload to use vrange_storage.
(duplicate_ssa_name_fn): Adjust to use vrange_storage.
* tree-ssanames.h (struct range_info_def): Remove.
(set_range_info): Adjust prototype to take vrange.
* tree-vrp.cc (vrp_asserts::remove_range_assertions): Call
duplicate_ssa_name_range_info.
* tree.h (SSA_NAME_ANTI_RANGE_P): Remove.
(SSA_NAME_RANGE_TYPE): Remove.
* value-query.cc (get_ssa_name_range_info): Adjust to use
vrange_storage.
(update_global_range): Use int_range_max.
(get_range_global): Remove as_a.
I'll be so happy once we don't have to keep doing the conversions 
between the types.


Anti-ranges no more!

I've got no real concerns here.  So unless someone objects, your plan is OK.

jeff



Re: [PATCH v3] c: Extend the -Wpadded message with actual padding size

2022-07-09 Thread Jeff Law via Gcc-patches




On 6/27/2022 2:04 AM, Vit Kabele wrote:

gcc/ChangeLog:

* stor-layout.cc (finalize_record_size): Extend warning message.

gcc/testsuite/ChangeLog:

* c-c++-common/Wpadded.c: New test.

Thanks.  I've pushed this to the trunk.

jeff



Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required

2022-07-09 Thread Jeff Law via Gcc-patches




On 5/5/2022 8:41 PM, Alexandre Oliva via Gcc-patches wrote:

On May  5, 2022, Segher Boessenkool  wrote:


On Thu, May 05, 2022 at 03:52:01AM -0300, Alexandre Oliva wrote:

+  else if (reg && MEM_P (reg)
+  && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
+return false;

Please fix the line breaks?  Either do a break before every &&, or put
as many things as possible on one line?

I was going for conceptual grouping of alignment-related subexprs,
but I don't care enough to fight for it.


Note that you should never have paradoxical subregs of mem on rs6000 or
any other target with INSN_SCHEDULING.

Great, that alleviates some of my concerns about overreaching in this patch.


+#include "../../gcc.c-torture/compile/pr100106.c"

It is better to copy the 11 lines of code.

'k


Please comment what the ilp32 is for (namely, the -mcpu= will barf
without it)..

Ack


The testcase is okay with those changes, thanks!

Thanks.  Here's the revised patch.

I'm now testing on several platforms a follow-up patch that introduces
TARGET_ALLOW_SUBREG_OF_MEM.


[PR100106] Reject unaligned subregs when strict alignment is required

From: Alexandre Oliva 

The testcase for pr100106, compiled with optimization for 32-bit
powerpc -mcpu=604 with -mstrict-align expands the initialization of a
union from a float _Complex value into a load from an SCmode
constant pool entry, aligned to 4 bytes, into a DImode pseudo,
requiring 8-byte alignment.

The patch that introduced the testcase modified simplify_subreg to
avoid changing the MEM to outermode, but simplify_gen_subreg still
creates a SUBREG or a MEM that would require stricter alignment than
MEM's, and lra_constraints appears to get confused by that, repeatedly
creating unsatisfiable reloads for the SUBREG until it exceeds the
insn count.

Avoiding the unaligned SUBREG, expand splits the DImode dest into
SUBREGs and loads each SImode word of the constant pool with the
proper alignment.


for  gcc/ChangeLog

PR target/100106
* emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.

OK.
jeff



Re: [PATCH] postscan_insn hook not called after input_asm

2022-07-09 Thread Jeff Law via Gcc-patches




On 3/29/2022 8:10 AM, Paul Iannetta via Gcc-patches wrote:

Hi,

While working on the Kalray port of gcc, I noticed that the hook 
TARGET_ASM_FINAL_POSTSCAN_INSN is not called after emitting an instruction 
coming from a basic asm block.  Here is a patch which fixes this behavior.

The following check:
```
$ find gcc/config/ -type f -exec grep "#define TARGET_ASM_FINAL" "{}" +
gcc/config/m68k/m68k.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN 
m68k_asm_final_postscan_insn
gcc/config/avr/avr.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN 
avr_asm_final_postscan_insn
gcc/config/mips/mips.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN 
mips_final_postscan_insn
```
reveals that m68k, avr and mips are the only affected targets upstream.
So I know it's been a while since you posted this patch.   Do you recall 
the motivation behind it?  ie, was there something your port is doing 
that required the post-scan hook to be called on an ASM?


I'm not objecting to the patch, I just want to understand better what 
drove you to write it in the first place.


IIRC you can't ask for attributes on an ASM, meaning that the m68k hook 
will probably abort with this change.   You may need to adjust the m68k 
implementation so that it  doesn't fault if presented with an ASM.   
Something like this at the start should do the trick.


if (recog_memoized (insn) < 0)
  {
    flags_operand1 = flags_operand2 = NULL_RTX;
    return;
  }




Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]

2022-07-09 Thread Jeff Law via Gcc-patches




On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote:

Thanks for the detailed review. I have uploaded patch v2 based on the review.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
Changes since v1:
1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
test regressions caused by those patterns.
2. Support signed integers except for the possible INT_MIN / -1 case.
3. Support cases where X and Y have different precisions.

On Tue, 22 Feb 2022 at 15:54, Richard Biener  wrote:

+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0))
+  && TYPE_UNSIGNED (TREE_TYPE (@1))
+  && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+  && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1

since you are requiring the types of @0 and @1 to match it's easier to write

  && types_match (TREE_TYPE(@0), TREE_TYPE (@1))


Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
have different precisions, so I believe that I can't use types_match()
anymore.


that allows you to elide checks on either @0 or @1.  I suppose the transform
does not work for signed types because of the -INT_MIN / -1 overflow case?
It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
expr_not_equal_to (@1, -1)
(correctly written, lookup the existing examples in match.pd for the X
% -Y transform)

That's right. I have made changes to support signed types except for
the INT_MIN / -1 case.


I'll note that as written the transform will not catch CST / (T)x or
(T)x / CST since
you'll not see conversions around constants.  I'm not sure whether
using (convert[12]? ...)
or writing special patterns with INTEGER_CST operands is more convenient.
There is int_fits_type_p to check whether a constant will fit in a
type without truncation
or sign change.

I see. I couldn't find an easy way to use (convert[12]? ...) in this
case so I added 2 other patterns for the CST / (T)x and (T)x / CST
cases.


When @0 and @1 do not have the same type there might still be a common type
that can be used and is smaller than 'type', it might be as simple as using
build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).

Thanks for the tip. I've made a similar change, but using either
TREE_TYPE (@0) or TREE_TYPE (@1) instead of
build_nonstandard_integer_type().


In the past there have been attempts to more globally narrow operations using
a new pass rather than using individual patterns.  So for more complicated cases
that might be the way to go.  There's now also the ISEL pass which does
pre-RTL expansion transforms that need some global context and for example
can look at SSA uses.

I had wanted to do something similar to handle all integral
trunc_divs, but I'm not sure where/how to start. It seems out of my
league at this moment. I'll gladly explore it after this change
though!

Just a couple general notes in case you want to poke further in this space.

Earlier you mentioned you were trying to do some of this work in 
expand_divmod, but the operands had rtx types rather than tree types, 
thus you're unable to get at the range data.


In expand_expr_divmod there's treeop0, treeop1.  So if they are 
SSA_NAMEs, then you can query for their range.


Richi mentioned attempts to globally narrow during a new pass. That's a 
much broader chance, but has the potential to apply in a lot more 
places.    Narrowing early would potentially expose the narrowed 
statements to the vectorizer which could be a win. Narrowing late is 
likely a win too, but it likely only helps the expansion phase generate 
narrower operations.  This is obviously a larger project than just 
handling the division cases in match.pd. Tackling it is not (IMHO) a 
requirement to move forward.


Finally, narrowing isn't always a win.  There have been 
micro-architectures were sub-word accesses are slower than word 
accesses.  I'm not too worried about it for this patch, but it could 
become more important if you were to look into a more general solution.


As far as the patch is concerned.  At one point I was a bit worried that 
expr_not_equal_to wasn't right.  But after digging a bit deeper into its 
implementation, it should do the right thing here.  Note if you wanted 
to see how to get to underlying range data, you can find a simple 
example in that function.



Your new testcase needs to limit itself to targets where integers are at 
least 32 bits wide.    Something like this at the top of the new test:


/* { dg-require-effective-target int32plus } */

There are additional cases Andrew Pinski pointed out in B

[COMMITTED] Set VR_VARYING in irange::irange_single_pair_union.

2022-07-09 Thread Aldy Hernandez via Gcc-patches
The fast union operation is sometimes setting a range of the entire
domain, but leaving the kind bit as VR_RANGE instead of downgrading it
to VR_VARYING.

Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range.cc (irange::irange_single_pair_union): Set
VR_VARYING when appropriate.
---
 gcc/value-range.cc | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 25f1acff4a3..fd549b9ca59 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1777,11 +1777,7 @@ irange::irange_single_pair_union (const irange &r)
   // Check for overlap/touching ranges, or single target range.
   if (m_max_ranges == 1
  || wi::to_widest (m_base[1]) + 1 >= wi::to_widest (r.m_base[0]))
-   {
- m_base[1] = r.m_base[1];
- if (varying_compatible_p ())
-   m_kind = VR_VARYING;
-   }
+   m_base[1] = r.m_base[1];
   else
{
  // This is a dual range result.
@@ -1789,8 +1785,8 @@ irange::irange_single_pair_union (const irange &r)
  m_base[3] = r.m_base[1];
  m_num_ranges = 2;
}
-  if (flag_checking)
-   verify_range ();
+  if (varying_compatible_p ())
+   m_kind = VR_VARYING;
   return true;
 }
 
@@ -1817,8 +1813,8 @@ irange::irange_single_pair_union (const irange &r)
   m_base[3] = m_base[1];
   m_base[1] = r.m_base[1];
 }
-  if (flag_checking)
-verify_range ();
+  if (varying_compatible_p ())
+m_kind = VR_VARYING;
   return true;
 }
 
@@ -1857,6 +1853,8 @@ irange::irange_union (const irange &r)
 {
   bool ret = irange_single_pair_union (r);
   set_nonzero_bits (saved_nz);
+  if (flag_checking)
+   verify_range ();
   return ret || ret_nz;
 }
 
-- 
2.36.1



Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).

2022-07-09 Thread Aldy Hernandez via Gcc-patches
On Sat, Jul 9, 2022 at 6:16 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote:
> > Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses
> > half the precision in the process because its use of legacy
> > value_range's.  This patch rewrites all the SSA_NAME_RANGE_INFO
> > (nonzero bits included) to use the recently contributed
> > vrange_storage.  With it, we'll be able to efficiently save any ranges
> > supported by ranger in GC memory.  Presently this will only be
> > irange's, but shortly we'll add floating ranges and others to the mix.
> >
> > As per the discussion with the trailing_wide_ints adjustments and
> > vrange_storage, we'll be able to save integer ranges with a maximum of
> > 5 sub-ranges.  This could be adjusted later if more sub-ranges are
> > needed (unlikely).
> >
> > Since this is a behavior changing patch, I would like to take a few
> > days for discussion, and commit early next week if all goes well.
> >
> > A few notes.
> >
> > First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME
> > since we store full resolution ranges.  Perhaps it could be re-used
> > for something else.
> >
> > The range_info_def struct is gone in favor of an opaque type handled
> > by vrange_storage.  It currently supports irange, but will support
> > frange, prange, etc, in due time.
> >
> >  From the looks of it, set_range_info was an update operation despite
> > its name, as we improved the nonzero bits with each call, even though
> > we clobbered the ranges.  Presumably this was because doing a proper
> > intersect of ranges lost information with the anti-range hack.  We no
> > longer have this limitation so now we formalize both set_range_info
> > and set_nonzero_bits to an update operation.  After all, we should
> > never be losing information, but enhancing it whenever possible.  This
> > means, that if folks' finger-memory is not offended, as a follow-up,
> > I'd like to rename set_nonzero_bits and set_range_info to update_*.
> >
> > I have kept the same global API we had in tree-ssanames.h, with the
> > caveat that all set operations are now update as discussed above.
> >
> > There is a 2% performance penalty for evrp and a 3% penalty for VRP
> > that is coincidentally in line with a previous improvement of the same
> > amount in the vrange abstraction patchset.  Interestingly, this
> > penalty is mostly due to the wide int to tree dance we keep doing with
> > irange and legacy.  In a first draft of this patch where I was
> > streaming trees directly, there was actually a small improvement
> > instead.  I hope to get some of the gain back when we move irange's to
> > wide-ints, though I'm not in a hurry ;-).
> >
> > Tested and benchmarked on x86-64 Linux.  I will also test on ppc64le
> > before the final commit.
> >
> > Comments welcome.
> >
> > gcc/ChangeLog:
> >
> >   * gimple-range.cc (gimple_ranger::export_global_ranges): Remove
> >   verification against legacy value_range.
> >   * tree-core.h (struct range_info_def): Remove.
> >   (struct irange_storage_slot): New.
> >   (struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation.
> >   (struct tree_ssa_name): Add vrange_storage support.
> >   * tree-ssanames.cc (range_info_p): New.
> >   (range_info_fits_p): New.
> >   (range_info_alloc): New.
> >   (range_info_free): New.
> >   (range_info_get_range): New.
> >   (range_info_set_range): New.
> >   (set_range_info_raw): Remove.
> >   (set_range_info): Adjust to use vrange_storage.
> >   (set_nonzero_bits): Same.
> >   (get_nonzero_bits): Same.
> >   (duplicate_ssa_name_range_info): Remove overload taking
> >   value_range_kind.
> >   Rewrite tree overload to use vrange_storage.
> >   (duplicate_ssa_name_fn): Adjust to use vrange_storage.
> >   * tree-ssanames.h (struct range_info_def): Remove.
> >   (set_range_info): Adjust prototype to take vrange.
> >   * tree-vrp.cc (vrp_asserts::remove_range_assertions): Call
> >   duplicate_ssa_name_range_info.
> >   * tree.h (SSA_NAME_ANTI_RANGE_P): Remove.
> >   (SSA_NAME_RANGE_TYPE): Remove.
> >   * value-query.cc (get_ssa_name_range_info): Adjust to use
> >   vrange_storage.
> >   (update_global_range): Use int_range_max.
> >   (get_range_global): Remove as_a.
> I'll be so happy once we don't have to keep doing the conversions
> between the types.
>
> Anti-ranges no more!

Yeah, it took a little longer than the 6 weeks Andrew had estimated
originally :-P.

Note that anti range kinda sorta still exist in two forms:

a) If you use value_range, as it still uses the legacy stuff
underneath.  But any new consumers (evrp, DOM, etc), all pass an
int_range or an int_range_max, so anyone who cares about ranges
should never see an anti range.  Later this cycle value_range will be
typedefed to what is now Value_Range, which is an infinite precision
range that works 

[PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-09 Thread Lewis Hyatt via Gcc-patches
Hello-

PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR
related to the fact that imprecise locations for _Pragma result in
counterintuitive behavior for GCC diagnostic pragmas, which inhibit the
ability to make convenient wrapper macros for enabling and disabling
diagnostics in specific scopes.

It looks like David did a lot of work a few years ago improving this
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular
r233637 added a lot of new test coverage for cases that had regressed in the
past.

I think the main source of problems for all remaining issues is that we use
the global input_location for deciding when/if a diagnostic should apply. I
think it should be eventually doable to eliminate this, and rather properly
resolve the token locations to the place they need to be so that _Pragma
type wrapper macros just work the way people expect.

That said, PR97498 can be solved easily with a 2-line fix without removing
input_location, and I think the resulting change to input_location's value
is an improvement that will benefit other areas, so I thought I'd see what
you think about this patch please?

Here is a typical testcase. Note the line continuations so it's all one
logical line.

===
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
static void f() {} \
_Pragma("GCC diagnostic pop")
===

What happens is that in C++ mode, input_location is always updated to the
most recently-lexed token, so the above case works fine and does not warn
when compiled with "g++ -Wunused-functions". However, in C mode, it does
warn because input_location in C is almost always set to the start of the
line, and is in this case. So the pop is deemed to take place prior to the
definition of f().

Initially, I thought it best to change input_location for C mode to behave
like C++, and always update to the most recently lexed token. Maybe that's
still the right way to go, but there was a fair amount of testsuite fallout
from that. Most of it, was just that we would need to change the tests to look
for the new locations, and in many cases, the new locations seemed
preferable to the old ones, but it seemed a bit much for now, so I took a
more measured approach and just changed input_location in the specific case
of processing a pragma, to be the location of the CPP_PRAGMA token.

Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides
to represent the _Pragma() expression doesn't have a valid location with
which input_location could be overridden. Looking into that, in r232893
David added logic which sets the location of all tokens inside the
_Pragma(...) to a reasonable place (namely it points to "_Pragma" at the
expansion point). However, that patch didn't change the location of the
CPP_PRAGMA token itself to similarly point there, so the 2nd line of this
patch does that.

The rest of it is just tweaking a couple tests which were sensitive to the
location being output. In all these cases, the new locations seem more
informative to me than the old ones. With those tweaks, bootstrap + regtest
all languages looks good with no regressions.

Please let me know what you think? Thanks!

-Lewis
[PATCH] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:

a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.

This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.

gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/

Re: [PATCH v3] loongarch: fix mulsidi3_64bit instruction

2022-07-09 Thread Lulu Cheng



在 2022/7/9 上午10:56, Xi Ruoyao 写道:

v3: Relax scan-assembler pattern in test case mulw_d_w.c.  It's because
multiplication is Abelian and the compiler may switch the order of
operands in the future.
-- >8 --

(mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
"mulw.d.w", not "mul.d".

gcc/ChangeLog:

* config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w
instead of mul.d.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/mulw_d_w.c: New test.
* gcc.c-torture/execute/mul-sext.c: New test.
---


I think there is no problem with this modification.

Thankes!




Re: [PATCH 0/2] loongarch: improve code generation for integer division

2022-07-09 Thread Lulu Cheng



在 2022/7/7 上午10:23, Xi Ruoyao 写道:

We were generating some unnecessary instructions for integer division.
These two patches improve the code generation to compile

 template  T div(T a, T b) { return a / b; }

into a single division instruction (along with a return instruction of
course) as we expected for T in {int32_t, uint32_t, int64_t}.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (2):
   loongarch: add alternatives for idiv insns to improve code generation
   loongarch: avoid unnecessary sign-extend after 32-bit division

  gcc/config/loongarch/loongarch-protos.h|  1 +
  gcc/config/loongarch/loongarch.cc  |  2 +-
  gcc/config/loongarch/loongarch.md  | 34 --
  gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++
  gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++
  gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++
  gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++
  7 files changed, 63 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c


LGTM and spec has been tested.

Thanks!



Re: PING^2: [PATCH] Add --enable-first-stage-cross configure option

2022-07-09 Thread Jeff Law via Gcc-patches




On 1/9/2022 2:26 PM, Serge Belyshev wrote:

Ping: [PATCH] Add --enable-first-stage-cross configure option
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575318.html


Add --enable-first-stage-cross configure option

Build static-only, C-only compiler that is sufficient to cross compile
glibc.  This option disables various runtime libraries that require
libc to compile, turns on --with-newlib, --without-headers,
--disable-decimal-float, --disable-shared, --disable-threads, and sets
--enable-languages=c.
 
Rationale: current way of building first stage compiler of a cross

toolchain requires specifying a list of target libraries that are not
going to be compiled due to their dependency on target libc.  This
list is not documented in gccinstall.texi and sometimes changes.  To
simplify the procedure, it is better to maintain that list in the GCC
itself.

Usage example as a patch to glibc's scripts/build-many-libcs.py:

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 580d25e8ee..3a6a7be76b 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -1446,17 +1446,7 @@ class Config(object):
  # required to define inhibit_libc (to stop some parts of
  # libgcc including libc headers); --without-headers is not
  # sufficient.
-cfg_opts += ['--enable-languages=c', '--disable-shared',
- '--disable-threads',
- '--disable-libatomic',
- '--disable-decimal-float',
- '--disable-libffi',
- '--disable-libgomp',
- '--disable-libitm',
- '--disable-libmpx',
- '--disable-libquadmath',
- '--disable-libsanitizer',
- '--without-headers', '--with-newlib',
+cfg_opts += ['--enable-first-stage-cross',
   '--with-glibc-version=%s' % self.ctx.glibc_version
   ]
  cfg_opts += self.first_gcc_cfg

Bootstrapped/regtested on x86_64-pc-linux-gnu, and
tested with build-many-glibcs.py with the above patch.

OK for mainline?


ChangeLog:

* configure.ac: Add --enable-first-stage-cross.
* configure: Regenerate.

gcc/ChangeLog:

* doc/install.texi: Document --enable-first-stage-cross.
I'm not really sure we need a patch for this.  Isn't it sufficient to 
"make all-gcc && make all-target-libgcc"?  Folks have been doing that 
for decades.


Jeff


Re: [PATCH] middle-end/104854: Avoid overread warning for strnlen and strndup

2022-07-09 Thread Jeff Law via Gcc-patches




On 3/9/2022 5:39 PM, Siddhesh Poyarekar wrote:

The size argument larger than size of SRC for strnlen and strndup is
problematic only if SRC is not NULL terminated, which invokes undefined
behaviour.  In all other cases, as long as SRC is large enough to have a
NULL char (i.e. size 1 or more), a larger N should not invoke a warning
during compilation.

Such a warning may be a suitable check for the static analyzer instead
with slightly different wording suggesting that choice of size argument
makes the function call equivalent to strlen/strdup.

This change results in the following code going through without a
warning:

--
char *buf;

char *
foo (void)
{
   buf = __builtin_malloc (4);
   __builtin_memset (buf, 'A', 4);

   return __builtin_strndup (buf,  5);
}

int
main ()
{
   __builtin_printf ("%s\n", foo ());
}
--

but the problem above is a missing NULL, not N being larger than the
size of SRC and the overread warning in this context is confusing at
best and misleading (and hinting at the wrong solution) in the worst
case.

gcc/ChangeLog:

middle-end/104854
* gimple-ssa-warn-access.cc (check_access):
New parameter.  Skip warning if in read-only mode, source string
is NULL terminated and has non-zero object size.
(check_access): New parameter.
(check_access): Adjust.
(check_read_access): New parameter.  Adjust for check_access
change.
(pass_waccess::check_builtin): Adjust check_read_access call for
memcmp, memchr.
(pass_waccess::maybe_check_access_sizes): Likewise.

gcc/testsuite/ChangeLog:

middle-end/104854
* gcc.dg/Wstringop-overread.c
(test_strnlen_array, test_strndup_array): Don't expect warning
for non-zero source sizes.
* gcc.dg/attr-nonstring-4.c (strnlen_range): Likewise.
* gcc.dg/pr78902.c: Likewise.
* gcc.dg/warn-strnlen-no-nul.c: Likewise.
I know this is old and the BZ has been set as CLOSED/INVALID.  But it 
was in my TODO list,  and I've got thoughts here so I might as well 
chime in ;-)


The potential overread warning for that code seems quite reasonable to 
me.    Yes it is the case that the length argument is sometimes 
unrelated to the source string.  But even then where's the line for when 
we should and should not warn?


In my mind these middle end warnings can not and will not ever be 100% 
accurate.  Their goal is to say "hey, can someone please look at this 
code very closely because it might be wrong" -- and that's probably 
about the best we can do since many of these are trying to infer intent 
of the programmer.


It almost feels like we want not just to have finer grained control over 
this particular class of wrnings, but even within the class we may want 
levels (and just to be clear, I'm not generally a fan of leveled 
warnings as the higher levels rarely get used in practice). The 
different levels could correspond to our ability to discern the source 
of the length operand -- ie, it is related to the source operand, some 
other operand or a constant and we could potentially treat each of those 
cases differently.


Jeff


Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-07-09 Thread Jeff Law via Gcc-patches




On 3/21/2022 10:14 AM, Tom de Vries via Gcc-patches wrote:

On 3/21/22 14:49, Richard Biener wrote:

On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries  wrote:


On 3/21/22 08:58, Richard Biener wrote:

On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:


On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // Start: Added by -minit-regs=3:
   // #NO_APP
   mov.u32 %r26, 0;
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // End: Added by -minit-regs=3:
   // #NO_APP
...

The comments where generated using the compiler-generated 
equivalent of:

...
 asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary 
for a

compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == 
UNKNOWN_LOCATION in

final_scan_insn_1, such what we simply get:
...
   // Start: Added by -minit-regs=3:
   mov.u32 %r26, 0;
   // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to known whether this patch is acceptable for stage 4 or 
not.


If not, I need to fix PR104968 in a different way.  Say, disable
-mcomment by default, or trying harder to propagate source info on
outlined functions.




Hi,

thanks for the review.


Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.


Ack. [ I could go down that route eventually, but for now I'm hoping to
implement this without having to change the port. ]


I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)



I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was
introduced (2007), there was no INSN_LOCATION yet (introduced in 2012),
only INSN_LOCATOR, my guess is that it has something to do with that.


Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general?


Haven't found it.


If not maybe there's an unused
bit on ASMs we can enable this way.


Done.  I've used the jump flag for that.

Updated, untested patch attached.

Is this what you meant?


Hmm.  I now read that ASM_INPUT is in every PATTERN of an insn


Maybe I misunderstand, but that sounds incorrect to me.  That is, can 
you point me to where you read that?


Maybe you're referring to the fact that an ASM_INPUT may occur inside 
an ASM_OPERANDS, as "a convenient way to hold a string" (quoting 
rtl.def)?



and wonder how this all works out there.  That is, by default the
ASM_INPUT would be artificial (for regular define_insn) but asm("")
in source would mark them ASM_INPUT_USER_P or so.



If you're suggesting to make it by default artificial, then that 
doesn't sound like a bad idea to me.  In this iteration I haven't 
implemented this (yet), but instead explicitly marked as artificial 
some other uses of ASM_INPUT.



But then I know nothing here.  I did expect us to look at
ASM_OPERANDS instead of just ASM_INPUT (but the code you
are changing is about ASM_INPUT).



I extended the rationale in the commit log a bit to include a 
description of what the rtl-equivalent of 'asm ("// Comment")' looks 
like, and there's no ASM_OPERANDS there.



That said, the comments should probably explicitely say this
is about ASM_INPUT in an ASM_OPERANDS  instruction
template, not some other pattern.



AFAIU, this isn't about an ASM_INPUT in an ASM_OPERANDS instruction 
template, so at this point I haven't updated the comment.


Thanks,
- Tom

0015-final-Handle-compiler-generated-asm-insn.patch

[final] Handle compiler-generated asm insn

For the nvptx port, with -mptx-comment we have for test-case pr53465.c at
mach:
...
(insn 66 43 65 3 (asm_input ("// Start: Added by -minit-regs=3:")) -1
  (nil))
(insn 65 66 67 3 (set (reg/v:SI 26 [ d ])
 (const_int 0 [0])) 6 {*movsi_insn}
  (nil))
(insn 67 65 44 3 (asm_input ("// End: Added by -minit-regs=3:")) -1
  (nil))
...
and in pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

[ The comment insns were modelled after:
...
   asm ("// Comment");
...
which expands to:
...
(insn 5 2 6 2 (parallel [
 (asm_input/v ("// Comment") test.c:4)
 (clobber (mem:BLK (scratch) [0  A8]))
 ]) "test.c":4:3 -1
  (nil))
...
Note btw the differences: the comment insn has no clobber, and A

Re: [PATCH v3] loongarch: fix mulsidi3_64bit instruction

2022-07-09 Thread Xi Ruoyao via Gcc-patches
On Sun, 2022-07-10 at 09:45 +0800, Lulu Cheng wrote:
> 
> 在 2022/7/9 上午10:56, Xi Ruoyao 写道:
> > v3: Relax scan-assembler pattern in test case mulw_d_w.c.  It's
> > because
> > multiplication is Abelian and the compiler may switch the order of
> > operands in the future.
> > -- >8 --
> > 
> > (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
> > "mulw.d.w", not "mul.d".
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.md (mulsidi3_64bit): Use
> > mulw.d.w
> > instead of mul.d.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/loongarch/mulw_d_w.c: New test.
> > * gcc.c-torture/execute/mul-sext.c: New test.
> > ---
> 
> I think there is no problem with this modification.
> 
> Thankes!
> 

Pushed r13-1591 and r12-8562.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).

2022-07-09 Thread Jeff Law via Gcc-patches




On 7/9/2022 1:31 PM, Aldy Hernandez wrote:

On Sat, Jul 9, 2022 at 6:16 PM Jeff Law via Gcc-patches
 wrote:



On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote:

Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses
half the precision in the process because its use of legacy
value_range's.  This patch rewrites all the SSA_NAME_RANGE_INFO
(nonzero bits included) to use the recently contributed
vrange_storage.  With it, we'll be able to efficiently save any ranges
supported by ranger in GC memory.  Presently this will only be
irange's, but shortly we'll add floating ranges and others to the mix.

As per the discussion with the trailing_wide_ints adjustments and
vrange_storage, we'll be able to save integer ranges with a maximum of
5 sub-ranges.  This could be adjusted later if more sub-ranges are
needed (unlikely).

Since this is a behavior changing patch, I would like to take a few
days for discussion, and commit early next week if all goes well.

A few notes.

First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME
since we store full resolution ranges.  Perhaps it could be re-used
for something else.

The range_info_def struct is gone in favor of an opaque type handled
by vrange_storage.  It currently supports irange, but will support
frange, prange, etc, in due time.

  From the looks of it, set_range_info was an update operation despite
its name, as we improved the nonzero bits with each call, even though
we clobbered the ranges.  Presumably this was because doing a proper
intersect of ranges lost information with the anti-range hack.  We no
longer have this limitation so now we formalize both set_range_info
and set_nonzero_bits to an update operation.  After all, we should
never be losing information, but enhancing it whenever possible.  This
means, that if folks' finger-memory is not offended, as a follow-up,
I'd like to rename set_nonzero_bits and set_range_info to update_*.

I have kept the same global API we had in tree-ssanames.h, with the
caveat that all set operations are now update as discussed above.

There is a 2% performance penalty for evrp and a 3% penalty for VRP
that is coincidentally in line with a previous improvement of the same
amount in the vrange abstraction patchset.  Interestingly, this
penalty is mostly due to the wide int to tree dance we keep doing with
irange and legacy.  In a first draft of this patch where I was
streaming trees directly, there was actually a small improvement
instead.  I hope to get some of the gain back when we move irange's to
wide-ints, though I'm not in a hurry ;-).

Tested and benchmarked on x86-64 Linux.  I will also test on ppc64le
before the final commit.

Comments welcome.

gcc/ChangeLog:

   * gimple-range.cc (gimple_ranger::export_global_ranges): Remove
   verification against legacy value_range.
   * tree-core.h (struct range_info_def): Remove.
   (struct irange_storage_slot): New.
   (struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation.
   (struct tree_ssa_name): Add vrange_storage support.
   * tree-ssanames.cc (range_info_p): New.
   (range_info_fits_p): New.
   (range_info_alloc): New.
   (range_info_free): New.
   (range_info_get_range): New.
   (range_info_set_range): New.
   (set_range_info_raw): Remove.
   (set_range_info): Adjust to use vrange_storage.
   (set_nonzero_bits): Same.
   (get_nonzero_bits): Same.
   (duplicate_ssa_name_range_info): Remove overload taking
   value_range_kind.
   Rewrite tree overload to use vrange_storage.
   (duplicate_ssa_name_fn): Adjust to use vrange_storage.
   * tree-ssanames.h (struct range_info_def): Remove.
   (set_range_info): Adjust prototype to take vrange.
   * tree-vrp.cc (vrp_asserts::remove_range_assertions): Call
   duplicate_ssa_name_range_info.
   * tree.h (SSA_NAME_ANTI_RANGE_P): Remove.
   (SSA_NAME_RANGE_TYPE): Remove.
   * value-query.cc (get_ssa_name_range_info): Adjust to use
   vrange_storage.
   (update_global_range): Use int_range_max.
   (get_range_global): Remove as_a.

I'll be so happy once we don't have to keep doing the conversions
between the types.

Anti-ranges no more!

Yeah, it took a little longer than the 6 weeks Andrew had estimated
originally :-P.

Note that anti range kinda sorta still exist in two forms:

a) If you use value_range, as it still uses the legacy stuff
underneath.  But any new consumers (evrp, DOM, etc), all pass an
int_range or an int_range_max, so anyone who cares about ranges
should never see an anti range.  Later this cycle value_range will be
typedefed to what is now Value_Range, which is an infinite precision
range that works for all types the ranger supports.  So anti-ranges
here will die a quick death.

b) There are some passes which still use the deprecated
irange::kind().  This method will return VR_ANTI_RANGE if the range
looks like this [-MIN, 123][567,+MAX].  But kind() is 

Re: [PATCH 0/2] loongarch: improve code generation for integer division

2022-07-09 Thread Xi Ruoyao via Gcc-patches
On Sun, 2022-07-10 at 10:20 +0800, Lulu Cheng wrote:
> 
> 在 2022/7/7 上午10:23, Xi Ruoyao 写道:
> > We were generating some unnecessary instructions for integer
> > division.
> > These two patches improve the code generation to compile
> > 
> >  template  T div(T a, T b) { return a / b; }
> > 
> > into a single division instruction (along with a return instruction
> > of
> > course) as we expected for T in {int32_t, uint32_t, int64_t}.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> > Xi Ruoyao (2):
> >    loongarch: add alternatives for idiv insns to improve code
> > generation
> >    loongarch: avoid unnecessary sign-extend after 32-bit division
> > 
> >   gcc/config/loongarch/loongarch-protos.h    |  1 +
> >   gcc/config/loongarch/loongarch.cc  |  2 +-
> >   gcc/config/loongarch/loongarch.md  | 34 --
> > 
> >   gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++
> >   gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++
> >   gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++
> >   gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++
> >   7 files changed, 63 insertions(+), 10 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c
> > 
> LGTM and spec has been tested.
> 
> Thanks!

Pushed r13-1592 and r13-1593.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-09 Thread Jeff Law via Gcc-patches




On 7/9/2022 2:52 PM, Lewis Hyatt via Gcc-patches wrote:

Hello-

PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR
related to the fact that imprecise locations for _Pragma result in
counterintuitive behavior for GCC diagnostic pragmas, which inhibit the
ability to make convenient wrapper macros for enabling and disabling
diagnostics in specific scopes.

It looks like David did a lot of work a few years ago improving this
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular
r233637 added a lot of new test coverage for cases that had regressed in the
past.

I think the main source of problems for all remaining issues is that we use
the global input_location for deciding when/if a diagnostic should apply. I
think it should be eventually doable to eliminate this, and rather properly
resolve the token locations to the place they need to be
I've long wanted to see our dependency on input_location be diminished 
with the goal of making it go away completely.

so that _Pragma
type wrapper macros just work the way people expect.
Certainly desirable since many projects have built wrapper macros which 
use Pragmas to control warnings.  One of the biggest QOI implementation 
details we've had with the warnings has been problems with location data 
leading to an inability to turn them off in specific locations.


So I'm all for improvements, in terms of getting our location data more 
correct.






That said, PR97498 can be solved easily with a 2-line fix without removing
input_location, and I think the resulting change to input_location's value
is an improvement that will benefit other areas, so I thought I'd see what
you think about this patch please?

Here is a typical testcase. Note the line continuations so it's all one
logical line.

===
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
static void f() {} \
_Pragma("GCC diagnostic pop")
===

What happens is that in C++ mode, input_location is always updated to the
most recently-lexed token, so the above case works fine and does not warn
when compiled with "g++ -Wunused-functions". However, in C mode, it does
warn because input_location in C is almost always set to the start of the
line, and is in this case. So the pop is deemed to take place prior to the
definition of f().

Initially, I thought it best to change input_location for C mode to behave
like C++, and always update to the most recently lexed token. Maybe that's
still the right way to go, but there was a fair amount of testsuite fallout
from that. Most of it, was just that we would need to change the tests to look
for the new locations, and in many cases, the new locations seemed
preferable to the old ones, but it seemed a bit much for now, so I took a
more measured approach and just changed input_location in the specific case
of processing a pragma, to be the location of the CPP_PRAGMA token.

Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides
to represent the _Pragma() expression doesn't have a valid location with
which input_location could be overridden. Looking into that, in r232893
David added logic which sets the location of all tokens inside the
_Pragma(...) to a reasonable place (namely it points to "_Pragma" at the
expansion point). However, that patch didn't change the location of the
CPP_PRAGMA token itself to similarly point there, so the 2nd line of this
patch does that.

The rest of it is just tweaking a couple tests which were sensitive to the
location being output. In all these cases, the new locations seem more
informative to me than the old ones. With those tweaks, bootstrap + regtest
all languages looks good with no regressions.

Please let me know what you think? Thanks!
gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations.
* c-c++-common/gomp/pragma-5.c: Likewise.
* gcc.dg/pragma-message.c: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for
improved warning locations.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

OK for the trunk.  Thanks for digging into this!

jeff



Re: [PATCH] middle-end/104854: Avoid overread warning for strnlen and strndup

2022-07-09 Thread Siddhesh Poyarekar

On 10/07/2022 08:59, Jeff Law via Gcc-patches wrote:



On 3/9/2022 5:39 PM, Siddhesh Poyarekar wrote:

The size argument larger than size of SRC for strnlen and strndup is
problematic only if SRC is not NULL terminated, which invokes undefined
behaviour.  In all other cases, as long as SRC is large enough to have a
NULL char (i.e. size 1 or more), a larger N should not invoke a warning
during compilation.

Such a warning may be a suitable check for the static analyzer instead
with slightly different wording suggesting that choice of size argument
makes the function call equivalent to strlen/strdup.

This change results in the following code going through without a
warning:

--
char *buf;

char *
foo (void)
{
   buf = __builtin_malloc (4);
   __builtin_memset (buf, 'A', 4);

   return __builtin_strndup (buf,  5);
}

int
main ()
{
   __builtin_printf ("%s\n", foo ());
}
--

but the problem above is a missing NULL, not N being larger than the
size of SRC and the overread warning in this context is confusing at
best and misleading (and hinting at the wrong solution) in the worst
case.

gcc/ChangeLog:

middle-end/104854
* gimple-ssa-warn-access.cc (check_access):
New parameter.  Skip warning if in read-only mode, source string
is NULL terminated and has non-zero object size.
(check_access): New parameter.
(check_access): Adjust.
(check_read_access): New parameter.  Adjust for check_access
change.
(pass_waccess::check_builtin): Adjust check_read_access call for
memcmp, memchr.
(pass_waccess::maybe_check_access_sizes): Likewise.

gcc/testsuite/ChangeLog:

middle-end/104854
* gcc.dg/Wstringop-overread.c
(test_strnlen_array, test_strndup_array): Don't expect warning
for non-zero source sizes.
* gcc.dg/attr-nonstring-4.c (strnlen_range): Likewise.
* gcc.dg/pr78902.c: Likewise.
* gcc.dg/warn-strnlen-no-nul.c: Likewise.
I know this is old and the BZ has been set as CLOSED/INVALID.  But it 
was in my TODO list,  and I've got thoughts here so I might as well 
chime in ;-)


The potential overread warning for that code seems quite reasonable to 
me.    Yes it is the case that the length argument is sometimes 
unrelated to the source string.  But even then where's the line for when 
we should and should not warn?


The argument I was trying to make in the context of strnlen and strndup 
was that it is more likely in practice for the length argument to be a 
function of some other property, e.g. a destination buffer or an 
external limit that it is to be related to the source string.  However I 
don't have any concrete evidence (or the cycles to find it at the 
moment) to either back up my claim or refute it.  strndup for example 
seems popular for a substring alloc+copy and also for a general string 
copy with an application-specific upper bound, e.g. PATH_MAX.


Thanks,
Sid