[gcc r15-1071] AArch64: correct constraint on Upl early clobber alternatives

2024-06-06 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:afe85f8e22a703280b17c701f3490d89337f674a commit r15-1071-gafe85f8e22a703280b17c701f3490d89337f674a Author: Tamar Christina Date: Thu Jun 6 14:35:48 2024 +0100 AArch64: correct constraint on Upl early clobber alternatives I made an oversight in the previous

[PATCH]AArch64: correct constraint on Upl early clobber alternatives

2024-06-06 Thread Tamar Christina
Hi All, I made an oversight in the previous patch, where I added a ?Upa alternative to the Upl cases. This causes it to create the tie between the larger register file rather than the constrained one. This fixes the affected patterns. Bootstrapped Regtested on aarch64-none-linux-gnu and no

[gcc r15-1041] AArch64: enable new predicate tuning for Neoverse cores.

2024-06-05 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:3eb9f6eab9802d5ae65ead6b1f2ae6fe0833e06e commit r15-1041-g3eb9f6eab9802d5ae65ead6b1f2ae6fe0833e06e Author: Tamar Christina Date: Wed Jun 5 19:32:16 2024 +0100 AArch64: enable new predicate tuning for Neoverse cores. This enables the new tuning flag

[gcc r15-1040] AArch64: add new alternative with early clobber to patterns

2024-06-05 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:2de3bbde1ebea8689f3596967769f66bf903458e commit r15-1040-g2de3bbde1ebea8689f3596967769f66bf903458e Author: Tamar Christina Date: Wed Jun 5 19:31:39 2024 +0100 AArch64: add new alternative with early clobber to patterns This patch adds new alternatives

[gcc r15-1039] AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-06-05 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:35f17c680ca650f8658994f857358e5a529c0b93 commit r15-1039-g35f17c680ca650f8658994f857358e5a529c0b93 Author: Tamar Christina Date: Wed Jun 5 19:31:11 2024 +0100 AArch64: add new tuning param and attribute for enabling conditional early clobber This adds a new

[gcc r15-1038] AArch64: convert several predicate patterns to new compact syntax

2024-06-05 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:fd4898891ae0c73d6b7aa433cd1ef4539aaa2457 commit r15-1038-gfd4898891ae0c73d6b7aa433cd1ef4539aaa2457 Author: Tamar Christina Date: Wed Jun 5 19:30:39 2024 +0100 AArch64: convert several predicate patterns to new compact syntax This converts the single

RE: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]

2024-06-05 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, June 5, 2024 9:07 AM > To: Manolis Tsamis > Cc: gcc-patches@gcc.gnu.org; Christoph Müllner ; > Kewen . Lin ; Philipp Tomsich ; > Tamar Christina ; Jiangning Liu > > Subject: Re: [PATCH] Rearran

RE: [PATCH] [RFC] lower SLP load permutation to interleaving

2024-06-05 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, June 4, 2024 3:33 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Tamar Christina > > Subject: [PATCH] [RFC] lower SLP load permutation to interleaving > > The following emulates classica

RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-28 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 22, 2024 12:24 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 3/4]AArch64: add new altern

RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-28 Thread Tamar Christina
> -Original Message- > From: Tamar Christina > Sent: Wednesday, May 22, 2024 10:29 AM > To: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org > Subject: RE: [PATCH 2/4]AArch64: add new tu

RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 22, 2024 10:48 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 3/4]AArch64: add new altern

[PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Tamar Christina
Hi All, This patch adds new alternatives to the patterns which are affected. The new alternatives with the conditional early clobbers are added before the normal ones in order for LRA to prefer them in the event that we have enough free registers to accommodate them. In case register pressure

[PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-22 Thread Tamar Christina
Hi All, This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2. It is kept off for generic codegen. Note the reason for the +sve even though they are in aarch64-sve.exp is if the testsuite is run with a forced SVE off option, e.g. -march=armv8-a+nosve then the intrinsics

RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-22 Thread Tamar Christina
> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? > (I'm open to other suggestions.) Just looking for something that describes > either the architecture or the end result that we want to achieve. > And preferable something fairly short :) > > avoid_* would be

RE: [RFC] Merge strathegy for all-SLP vectorizer

2024-05-21 Thread Tamar Christina via Gcc
> -Original Message- > From: Richard Biener > Sent: Friday, May 17, 2024 1:54 PM > To: Richard Sandiford > Cc: Richard Biener via Gcc ; Tamar Christina > > Subject: Re: [RFC] Merge strathegy for all-SLP vectorizer > > On Fri, 17 May 2024, Richard Sand

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-20 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 15, 2024 10:31 PM > To: Tamar Christina > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd > ; Richard Earnshaw ; Marcus > Shawcroft ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 0/4]AArch64: s

RE: [PATCH v3] Match: Extract ternary_integer_types_match_p helper func [NFC]

2024-05-20 Thread Tamar Christina
> -Original Message- > From: pan2...@intel.com > Sent: Tuesday, May 21, 2024 2:13 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; Pan Li > > Subject: [

RE: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD

2024-05-20 Thread Tamar Christina
Hi Pan, > -Original Message- > From: pan2...@intel.com > Sent: Monday, May 20, 2024 12:01 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; Pan Li > > Subject: [PATCH v1 1/2] Ma

RE: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread Tamar Christina
> -Original Message- > From: pan2...@intel.com > Sent: Sunday, May 19, 2024 5:17 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; Pan Li > > Subject: [PATCH v1] Match: Extract int

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-17 Thread Tamar Christina
t: 1073741824]: > _1 = x_3(D) + y_4(D); > if (_1 >= x_3(D)) > goto ; [65.00%] > else > goto ; [35.00%] > >[local count: 697932184]: > > [local count: 1073741824]: > # _2 = PHI <65535(2), _1(3)> > return _2; > } > >

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 17, 2024 10:46 AM > To: Tamar Christina > Cc: Victor Do Nascimento ; gcc- > patc...@gcc.gnu.org; Richard Sandiford ; Richard > Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-e

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message- > From: Hongtao Liu > Sent: Friday, May 17, 2024 3:14 AM > To: Victor Do Nascimento > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > Richard Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 17, 2024 6:51 AM > To: Victor Do Nascimento > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > Richard Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in >

RE: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Tamar Christina
Hi, > -Original Message- > From: Victor Do Nascimento > Sent: Thursday, May 16, 2024 2:57 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Richard Earnshaw > ; Victor Do Nascimento > > Subject: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization > [PR114061]'

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Tamar Christina
Hi Victor, > -Original Message- > From: Victor Do Nascimento > Sent: Thursday, May 16, 2024 3:39 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Richard Earnshaw > ; Victor Do Nascimento > > Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer > >

RE: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Tamar Christina
> -Original Message- > From: pan2...@intel.com > Sent: Thursday, May 16, 2024 5:06 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; Richard Sandiford > ; Pan Li > Subject: [PATCH v2

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 15, 2024 10:31 PM > To: Tamar Christina > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd > ; Richard Earnshaw ; Marcus > Shawcroft ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 0/4]AArch64: s

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina > >> wrote: > >> > > >> > Hi All, > >> > > >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that > >> > state > >>

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, May 15, 2024 12:20 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org; Richard Sandiford > > Subject: Re: [PATCH 0/4]AArch64

RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 15, 2024 11:56 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 2/4]AArch64: add new tu

[PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-15 Thread Tamar Christina
Hi All, This patch adds new alternatives to the patterns which are affected. The new alternatives with the conditional early clobbers are added before the normal ones in order for LRA to prefer them in the event that we have enough free registers to accommodate them. In case register pressure

[PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-15 Thread Tamar Christina
Hi All, This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2. It is kept off for generic codegen. Note the reason for the +sve even though they are in aarch64-sve.exp is if the testsuite is run with a forced SVE off option, e.g. -march=armv8-a+nosve then the intrinsics

[PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax

2024-05-15 Thread Tamar Christina
Hi All, This converts the single alternative patterns to the new compact syntax such that when I add the new alternatives it's clearer what's being changed. Note that this will spew out a bunch of warnings from geninsn as it'll warn that @ is useless for a single alternative pattern. These are

[PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-15 Thread Tamar Christina
Hi All, This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to allow us to conditionally enable the early clobber alternatives based on the tuning models. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog:

[PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
Hi All, Some Neoverse Software Optimization Guides (SWoG) have a clause that states that for predicated operations that also produce a predicate it is preferred that the codegen should use a different register for the destination than that of the input predicate in order to avoid a performance

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-15 Thread Tamar Christina
Hi Pan, Thanks! > -Original Message- > From: pan2...@intel.com > Sent: Wednesday, May 15, 2024 3:14 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li &

RE: [PATCH v1 1/3] Vect: Support loop len in vectorizable early exit

2024-05-13 Thread Tamar Christina
> -Original Message- > From: pan2...@intel.com > Sent: Monday, May 13, 2024 3:54 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Tamar Christina ; Richard Sandiford > ; Pan Li > Subject: [PATCH v1

RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Tamar Christina
for each shape here ? Both works for > me. > Yeah, I think that's better than iterating over the statements twice. It also fits better in the existing code. Tamar. > #define SAT_ADD_U_1(T) \ > T sat_add_u_1_##T(T x, T y) \ > { \ > return (T)(x + y) >= x ? (x + y) : -1;
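The macro quoted in this message is cut off by the archive snippet. A minimal sketch of how that branchless form could be completed and instantiated follows; the closing brace and the four instantiations are assumptions added for illustration, not text from the thread.

```c
#include <stdint.h>

/* Branchless unsigned saturating add, following the shape quoted above:
   if x + y wraps around, the truncated sum is smaller than x, so the
   all-ones value (the type's maximum) is returned instead.
   The closing brace and the instantiations below are assumed. */
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

/* Illustrative instantiations for the common unsigned widths. */
SAT_ADD_U_1(uint8_t)
SAT_ADD_U_1(uint16_t)
SAT_ADD_U_1(uint32_t)
SAT_ADD_U_1(uint64_t)
```

For the narrow types the addition is performed in `int` after integer promotion, which is why the cast back to `T` is needed before the comparison.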

RE: [PATCH] Allow patterns in SLP reductions

2024-05-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 10, 2024 2:07 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Allow patterns in SLP reductions > > On Fri, Mar 1, 2024 at 10:21 AM Richard Biener wrote: > > > > The following removes the

RE: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int

2024-05-13 Thread Tamar Christina
Hi Pan, > -Original Message- > From: pan2...@intel.com > Sent: Monday, May 6, 2024 3:49 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li > Subj

RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Tamar Christina
Hi Pan, > -Original Message- > From: pan2...@intel.com > Sent: Monday, May 6, 2024 3:48 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li > Subject:

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-02 Thread Tamar Christina
t y) noexcept { uint64_t z; if (!__builtin_add_overflow(x, y, &z)) return z; return -1u; } Is a valid and common way to do saturation too. But for now, it's fine. Cheers, Tamar > Sorry not sure if my understanding is correct, feel free to correct me. > > Pan >
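The overflow-builtin form of saturation mentioned in this reply can be sketched in plain C as below. The function name is hypothetical, and the sketch returns `UINT64_MAX` on overflow as an assumption about the intended saturation value; `__builtin_add_overflow` is the GCC/Clang builtin named in the message.

```c
#include <stdint.h>

/* Unsigned 64-bit saturating add written with the overflow builtin.
   __builtin_add_overflow stores the wrapped sum in z and returns
   nonzero when the mathematical result does not fit in z's type.
   The function name and the UINT64_MAX return value are assumptions. */
uint64_t sat_add_u64_overflow(uint64_t x, uint64_t y)
{
  uint64_t z;
  if (!__builtin_add_overflow(x, y, &z))
    return z;            /* no overflow: return the exact sum */
  return UINT64_MAX;     /* overflow: clamp to the maximum */
}
```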

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-01 Thread Tamar Christina
> -Original Message- > From: Li, Pan2 > Sent: Thursday, May 2, 2024 4:11 AM > To: Tamar Christina ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Liu, Hongtao > Subject: RE: [PATCH v3] Internal-fn: Introduce

RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

2024-05-01 Thread Tamar Christina
Hi, > From: Pan Li > > Update in v3: > * Rebase upstream for conflict. > > Update in v2: > * Fix one failure for x86 bootstrap. > > Original log: > > This patch would like to add the middle-end presentation for the > saturation add. Aka set the result of add to the max when overflow. > It

[gcc r14-10040] middle-end: refactory vect_recog_absolute_difference to simplify flow [PR114769]

2024-04-19 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:1216460e7023cd8ec49933866107417c70e933c9 commit r14-10040-g1216460e7023cd8ec49933866107417c70e933c9 Author: Tamar Christina Date: Fri Apr 19 15:22:13 2024 +0100 middle-end: refactory vect_recog_absolute_difference to simplify flow [PR114769] Hi All

[PATCH]middle-end: refactory vect_recog_absolute_difference to simplify flow [PR114769]

2024-04-19 Thread Tamar Christina
Hi All, As the reporter in PR114769 points out the control flow for the abd detection is hard to follow. This is because vect_recog_absolute_difference has two different ways it can return true. 1. It can return true when the widening operation is matched, in which case unprom is set,

[gcc r14-10014] AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741]

2024-04-18 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:a2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6 commit r14-10014-ga2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6 Author: Tamar Christina Date: Thu Apr 18 11:47:42 2024 +0100 AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741] In PR114741 we

[PATCH]AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741]

2024-04-18 Thread Tamar Christina
Hi All, In PR114741 we see that we have a regression in codegen when SVE is enabled where the simple testcase: void foo(unsigned v, unsigned *p) { *p = v & 1; } generates foo: fmov s31, w0 and z31.s, z31.s, #1 str s31, [x1] ret instead of: foo:

gcc-wwwdocs branch master updated. 3530b8d820658fb3add4b06def91672a0053f2b2

2024-04-16 Thread Tamar Christina via Gcc-cvs-wwwdocs
--- commit 3530b8d820658fb3add4b06def91672a0053f2b2 Author: Tamar Christina Date: Mon Apr 15 16:00:21 2024 +0100 gcc-14/docs: document early break support and pragma novector diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index 6035ae37..c98ebe5a 100644 --- a/htdocs/gcc-14/changes.html +++ b/htd

[gcc r14-9997] testsuite: Fix data check loop on vect-early-break_124-pr114403.c

2024-04-16 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:f438acf7ce2e6cb862cf62f2543c36639e2af233 commit r14-9997-gf438acf7ce2e6cb862cf62f2543c36639e2af233 Author: Tamar Christina Date: Tue Apr 16 20:56:26 2024 +0100 testsuite: Fix data check loop on vect-early-break_124-pr114403.c The testcase had the wrong

RE: [PATCH]middle-end: skip vectorization check on ilp32 on vect-early-break_124-pr114403.c

2024-04-16 Thread Tamar Christina
> On Tue, Apr 16, 2024 at 09:00:53AM +0200, Richard Biener wrote: > > > PR tree-optimization/114403 > > > * gcc.dg/vect/vect-early-break_124-pr114403.c: Skip in ilp32. > > > > > > --- > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c >

[PATCH]middle-end: skip vectorization check on ilp32 on vect-early-break_124-pr114403.c

2024-04-15 Thread Tamar Christina
Hi all, The testcase seems to fail vectorization on -m32 since the access pattern is determined as too complex. This skips the vectorization check on ilp32 systems as I couldn't find a better proxy for being able to do strided 64-bit loads and I suspect it would fail on all 32-bit targets.

docs: document early break support and pragma novector

2024-04-15 Thread Tamar Christina
docs: document early break support and pragma novector --- diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index b4c602a523717c1d64333e44aefb60ba0ed02e7a..aceecb86f17443cfae637e90987427b98c42f6eb 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@

[gcc r11-11323] [AArch64]: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-04-15 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:0c2fcf3ddfe93d1f403962c4bacbb5d55ab7d19d commit r11-11323-g0c2fcf3ddfe93d1f403962c4bacbb5d55ab7d19d Author: Tamar Christina Date: Mon Apr 15 12:32:24 2024 +0100 [AArch64]: Do not allow SIMD clones with simdlen 1 [PR113552] This is a backport of g

[gcc r12-10329] AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-04-15 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:642cfd049780f03335da9fe0a51415f130232334 commit r12-10329-g642cfd049780f03335da9fe0a51415f130232334 Author: Tamar Christina Date: Mon Apr 15 12:16:53 2024 +0100 AArch64: Do not allow SIMD clones with simdlen 1 [PR113552] This is a backport of g

[gcc r13-8604] AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-04-15 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:1e08e39c743692afdd5d3546b2223474beac1dbc commit r13-8604-g1e08e39c743692afdd5d3546b2223474beac1dbc Author: Tamar Christina Date: Mon Apr 15 12:11:48 2024 +0100 AArch64: Do not allow SIMD clones with simdlen 1 [PR113552] This is a backport of g

[gcc r14-9969] middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-15 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:85002f8085c25bb3e74ab013581a74e7c7ae006b commit r14-9969-g85002f8085c25bb3e74ab013581a74e7c7ae006b Author: Tamar Christina Date: Mon Apr 15 12:06:21 2024 +0100 middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403]. This fixes

[PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-12 Thread Tamar Christina
Hi All, This is a story all about how the peeling for gaps introduces a bug in the upper bounds. Before I go further, I'll first explain how I understand this to work for loops with a single exit. When peeling for gaps we peel N < VF iterations to scalar. This happens by removing N iterations

[PATCH]middle-end vect: adjust loop upper bounds when peeling for gaps and early break [PR114403]

2024-04-04 Thread Tamar Christina
Hi All, The report shows that we end up in a situation where the code has been peeled for gaps and we have an early break. The code for peeling for gaps assumes that a scalar loop needs to perform at least one iteration. However this doesn't take into account early break where the scalar loop

[gcc r14-9493] match.pd: Only merge truncation with conversion for -fno-signed-zeros

2024-03-15 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:7dd3b2b09cbeb6712ec680a0445cb0ad41070423 commit r14-9493-g7dd3b2b09cbeb6712ec680a0445cb0ad41070423 Author: Joe Ramsay Date: Fri Mar 15 09:20:45 2024 + match.pd: Only merge truncation with conversion for -fno-signed-zeros This optimisation does not honour

Summary: [PATCH][committed]AArch64: Do not allow SIMD clones with simdlen 1 [PR113552][GCC 13/12/11 backport]

2024-03-12 Thread Tamar Christina
Hi All, This is a backport of g:306713c953d509720dc394c43c0890548bb0ae07. The AArch64 vector PCS does not allow simd calls with simdlen 1, however due to a bug we currently do allow it for num == 0. This causes us to emit a symbol that doesn't exist and we fail to link. Bootstrapped Regtested

RE: [PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-07 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Thursday, March 7, 2024 8:47 AM > To: Robin Dapp > Cc: gcc-patches ; Tamar Christina > > Subject: Re: [PATCH] vect: Do not peel epilogue for partial vectors > [PR114196]. > > On Wed, Mar 6, 202

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-27 Thread Tamar Christina
e. This would allow us to better understand what kind of gimple we would have to deal with in ISEL and VECT if we decide not to lower early. Thanks, Tamar > Pan > > -Original Message- > From: Tamar Christina > Sent: Tuesday, February 27, 2024 5:57 PM > To: Richard Biener

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-27 Thread Tamar Christina
> Am 19.02.24 um 08:36 schrieb Richard Biener: > > On Sat, Feb 17, 2024 at 11:30 AM wrote: > >> > >> From: Pan Li > >> > >> This patch would like to add the middle-end presentation for the > >> unsigned saturation add. Aka set the result of add to the max > >> when overflow. It will take the

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-27 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, February 27, 2024 9:44 AM > To: Tamar Christina > Cc: pan2...@intel.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; > yanzhang.w...@intel.com; kito.ch...@gmail.com; > richard.sandiford@arm.com2;

RE: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Tamar Christina
> > The testcase shows an interesting case where we have multiple loops sharing > > a > > live value and have an early exit that go to the same location. The > > additional > > complication is that on x86_64 with -mavx we seem to also do prologue > > peeling > > on the loops. > > > > We

[PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-25 Thread Tamar Christina
Hi All, The testcase shows an interesting case where we have multiple loops sharing a live value and have an early exit that go to the same location. The additional complication is that on x86_64 with -mavx we seem to also do prologue peeling on the loops. We correctly identify which BB we need

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-25 Thread Tamar Christina
Hi Pan, > From: Pan Li > > Hi Richard & Tamar, > > Try the DEF_INTERNAL_INT_EXT_FN as your suggestion. By mapping > us_plus$a3 to the RTL representation (us_plus:m x y) in optabs.def. > And then expand_US_PLUS in internal-fn.cc. Not very sure if my > understanding is correct for

[PATCH]middle-end: update vuses out of loop which use a vdef that's moved [PR114068]

2024-02-23 Thread Tamar Christina
Hi All, In certain cases we can have a situation where the merge block has a vUSE virtual PHI and the exits do not. In this case for instance the exits lead to an abort so they have no virtual PHIs. If we have a store before the first exit and we move it to a later block during vectorization we

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Tamar Christina
> -Original Message- > From: Li, Pan2 > Sent: Monday, February 19, 2024 12:59 PM > To: Tamar Christina ; Richard Biener > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > ; kito.ch...@gmail.com > Subject: RE: [PATCH v1] Internal-fn: Add new in

RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-19 Thread Tamar Christina
> -Original Message- > From: Tamar Christina > Sent: Thursday, February 15, 2024 11:05 AM > To: Richard Earnshaw (lists) ; gcc- > patc...@gcc.gnu.org > Cc: nd ; Marcus Shawcroft ; Kyrylo > Tkachov ; Richard Sandiford > > Subject: RE: [PATCH]AArch64:

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Tamar Christina
Thanks for doing this! > -Original Message- > From: Li, Pan2 > Sent: Monday, February 19, 2024 8:42 AM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > ; kito.ch...@gmail.com; Tamar Christina > > Subject: RE: [PATCH

RE: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding whole vector shift right [PR113872]

2024-02-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Thursday, February 15, 2024 2:56 PM > To: Andrew Pinski > Cc: gcc-patches@gcc.gnu.org; Tamar Christina > Subject: Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding > whole > vector shif

RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Tamar Christina
> -Original Message- > From: Richard Earnshaw (lists) > Sent: Thursday, February 15, 2024 11:01 AM > To: Tamar Christina ; gcc-patches@gcc.gnu.org > Cc: nd ; Marcus Shawcroft ; Kyrylo > Tkachov ; Richard Sandiford > > Subject: Re: [PATCH]AArch64: xfail modes_1.f

[PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Tamar Christina
Hi All, This test has never worked on AArch64 since the day it was committed. It has a number of issues that prevent it from working on AArch64: 1. IEEE does not require that FP operations raise a SIGFPE for FP operations, only that an exception is raised somehow. 2. Most Arm designed

RE: [PATCH]AArch64: remove ls64 from being mandatory on armv8.7-a..

2024-02-15 Thread Tamar Christina
Hi, this is a new version of the patch updating some additional tests because some of the LTO tests required a newer binutils than my distro had. --- The Arm Architectural Reference Manual (Version J.a, section A2.9 on FEAT_LS64) shows that ls64 is an optional extension and should not be

RE: [PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Thursday, February 1, 2024 4:42 PM > To: Tamar Christina > Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; nd > ; Richard Earnshaw ; Marcus > Shawcroft ; Kyrylo Tkachov > > Subject: Re: [PATCH]AArch64: up

[PATCH]AArch64: remove ls64 from being mandatory on armv8.7-a..

2024-02-14 Thread Tamar Christina
Hi All, The Arm Architectural Reference Manual (Version J.a, section A2.9 on FEAT_LS64) shows that ls64 is an optional extension and should not be enabled by default for Armv8.7-a. This drops it from the mandatory bits for the architecture and brings GCC in line with LLVM and the architecture.

RE: [PATCH]middle-end: inspect all exits for additional annotations for loop.

2024-02-14 Thread Tamar Christina
> > I think this isn't entirely good. For simple cases for > do {} while the condition ends up in the latch while for while () {} > loops it ends up in the header. In your case the latch isn't empty > so it doesn't end up with the conditional. > > I think your patch is OK to the point of

[PATCH]middle-end: inspect all exits for additional annotations for loop.

2024-02-14 Thread Tamar Christina
Hi All, Attaching a pragma to a loop which has a complex condition often gets the pragma dropped. e.g. #pragma GCC novector while (i < N && parse_tables_n--) before lowering this is represented as: if (ANNOTATE_EXPR ) ... But after lowering the condition is broken apart and attached to
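A minimal sketch of the kind of loop described here, a `while` loop with a compound condition annotated with `#pragma GCC novector`, is shown below. The function, its parameters and its body are hypothetical and only illustrate where the pragma sits; compilers that do not know the pragma will typically ignore it, possibly with a warning.

```c
#include <stddef.h>

/* Hypothetical loop with a two-part exit condition, mirroring the
   "while (i < N && parse_tables_n--)" shape quoted above.  The pragma
   asks GCC not to vectorize the loop that follows it.  Names and the
   loop body are assumptions made for illustration. */
size_t count_leading_nonzero(const unsigned char *buf, size_t n, int budget)
{
  size_t i = 0;
#pragma GCC novector
  while (i < n && budget--)
    {
      if (buf[i] == 0)
        break;          /* early exit on the first zero byte */
      i++;
    }
  return i;
}
```

Because the condition has two parts, lowering splits it into separate tests, which is the situation in which the message says the annotation can get dropped.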

[PATCH]middle-end: update vector loop upper bounds when early break vect [PR113734]

2024-02-13 Thread Tamar Christina
Hi All, When doing early break vectorization we should treat the final iteration as possibly being partial. This is so that when we calculate the vector loop upper bounds we take into account that the final iteration could have done some work. The attached testcase shows that if we don't then cunroll

RE: [PATCH]middle-end: add two debug counters for early-break vectorization debugging

2024-02-08 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Thursday, February 8, 2024 2:16 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: add two debug counters for early-break > vectorization debuggi

[PATCH]middle-end: add two debug counters for early-break vectorization debugging

2024-02-08 Thread Tamar Christina
Hi All, This adds two new debug counters to aid in debugging early break code. - vect_force_last_exit: when reached will always force the final loop exit. - vect_skip_exit: when reached will skip selecting the current candidate exit as the loop exit. The first counter

RE: [PATCH]middle-end: don't cache restart_loop in vectorizable_live_operations [PR113808]

2024-02-08 Thread Tamar Christina
> Please either drop lastprivate(k) clause or use linear(k:1) > The iteration var of simd loop without collapse or with > collapse(1) is implicitly linear with the step, and even linear > means the value from the last iteration can be used after the > simd construct. Overriding the data sharing

[PATCH]middle-end: don't cache restart_loop in vectorizable_live_operations [PR113808]

2024-02-08 Thread Tamar Christina
Hi All, There's a bug in vectorizable_live_operation that restart_loop is defined outside the loop. This variable is supposed to indicate whether we are doing a first or last index reduction. The problem is that by defining it outside the loop it becomes dependent on the order we visit the
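
The general hazard described here, a flag that lives outside a loop and so carries state from one visited item to the next, can be sketched in isolation. The code below is not GCC's `vectorizable_live_operation`; the `struct use` type, its field, and both function names are hypothetical and exist only to show how the result starts to depend on visiting order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one visited item. */
struct use {
    bool needs_restart;
};

/* Flag defined outside the loop: once set it stays set, so every
   later item is treated as needing a restart as well. */
size_t count_restarts_cached(const struct use *uses, size_t n)
{
    bool restart = false;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        restart = restart || uses[i].needs_restart;
        if (restart)
            count++;
    }
    return count;
}

/* Flag recomputed per item: the answer no longer depends on the
   order in which the items are visited. */
size_t count_restarts_fresh(const struct use *uses, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        bool restart = uses[i].needs_restart;
        if (restart)
            count++;
    }
    return count;
}
```

With the same three items in two different orders, the cached variant returns different counts while the per-item variant does not.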

[PATCH][committed]middle-end: fix pointer conversion error in testcase vect-early-break_110-pr113467.c

2024-02-08 Thread Tamar Christina
Hi All, I had missed a conversion from unsigned long to uint64_t. This fixes the failing test on -m32. Regtested on x86_64-pc-linux-gnu with -m32 and no issues. Committed as obvious. Thanks, Tamar gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-early-break_110-pr113467.c: Change unsigned
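
A small sketch of why the type choice matters on 32-bit targets: with `-m32`, `unsigned long` is typically 32 bits wide while `uint64_t` remains 64 bits, so the two types (and pointers to them) stop being interchangeable. The helpers `sum_u64` and `demo_sum` are hypothetical and are not taken from the testcase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 64-bit values. */
uint64_t sum_u64(const uint64_t *vals, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}

/* Declaring the buffer as uint64_t, not unsigned long, keeps the
   pointer type matching on both 32-bit and 64-bit targets and lets
   the first constant, which needs more than 32 bits, fit. */
uint64_t demo_sum(void)
{
    uint64_t vals[2] = { UINT64_C(0x100000000), 1 };
    return sum_u64(vals, 2);
}
```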

RE: [PATCH]middle-end: fix ICE when moving statements to empty BB [PR113731]

2024-02-05 Thread Tamar Christina
> It looks like LOOP_VINFO_EARLY_BRK_STORES is "reverse"? Is that > why you are doing gsi_move_before + gsi_prev? Why do gsi_prev > at all? > As discussed on IRC, then how about this one. Incremental building passed all tests and bootstrap is running. Ok for master if bootstrap and regtesting

RE: [PATCH]middle-end: add additional runtime test for [PR113467]

2024-02-05 Thread Tamar Christina
> > Ok for master? > > I think you need a lp64 target check for the large constants or > alternatively use uint64_t? > Ok, how about this one. Regtested on x86_64-pc-linux-gnu with -m32,-m64 and no issues. Ok for master? Thanks, Tamar gcc/testsuite/ChangeLog: PR

RE: [PATCH]middle-end: fix ICE when moving statements to empty BB [PR113731]

2024-02-05 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Monday, February 5, 2024 1:22 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: fix ICE when moving statements to empty BB > [PR113731] > >

[PATCH]middle-end: fix ICE when destination BB for stores starts with a label [PR113750]

2024-02-05 Thread Tamar Christina
Hi All, The report shows that if the FE leaves a label as the first thing in the dest BB then we ICE because we move the stores before the label. This is easy to fix if we know that there's still only one way into the BB. We would have already rejected the loop if there were multiple paths into

[PATCH]middle-end: fix ICE when moving statements to empty BB [PR113731]

2024-02-05 Thread Tamar Christina
Hi All, We use gsi_move_before (_gsi, _gsi); to request that the new statement be placed before any other statement. Typically this then moves the current pointer to be after the statement we just inserted. However it looks like when the BB is empty, this does not happen and the CUR pointer

[PATCH]middle-end: add additional runtime test for [PR113467]

2024-02-05 Thread Tamar Christina
Hi All, This just adds an additional runtime testcase for the fixed issue. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/testsuite/ChangeLog: PR tree-optimization/113467 * gcc.dg/vect/vect-early-break_110-pr113467.c: New

RE: [PATCH]middle-end: check memory accesses in the destination block [PR113588].

2024-02-01 Thread Tamar Christina
> > > > If the above is correct then I think I understand what you're saying and > > will update the patch and do some Checks. > > Yes, I think that's what I wanted to say. > As discussed: Bootstrapped Regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu no issues. Also checked both

RE: [PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-01 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Thursday, February 1, 2024 2:24 PM > To: Andrew Pinski > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd > ; Richard Earnshaw ; Marcus > Shawcroft ; Kyrylo Tkachov > > Subject: Re: [PATCH]AArch64: up

[PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-01 Thread Tamar Christina
Hi All, In the vget_set_lane_1.c test the following entries now generate a zip1 instead of an INS BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0) BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0) BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0) This is because the non-Q variant for

[PATCH 2/2][libsanitizer] hwasan: Remove testsuite check for a complaint message [PR112644]

2024-01-31 Thread Tamar Christina
Hi All, With recent updates to hwasan runtime libraries, the error reporting for this particular check has been reworked. I would question why it has lost this message. To me it looks strange that num_descriptions_printed is incremented whenever we call PrintHeapOrGlobalCandidate whether

[PATCH 1/2][libsanitizer] hwasan: Remove testsuite check for a complaint message [PR112644]

2024-01-31 Thread Tamar Christina
Hi All, Recent libhwasan updates[1] intercept various string and memory functions. These functions have checking in them, which means there's no need to inline the checking. This patch marks said functions as intercepted, and adjusts a testcase to handle the difference. It also looks for HWASAN

RE: [PATCH][libsanitizer]: Sync fixes for asan interceptors from upstream [PR112644]

2024-01-31 Thread Tamar Christina
> -Original Message- > From: Andrew Pinski > Sent: Monday, January 29, 2024 9:55 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; ja...@redhat.com; > do...@redhat.com; k...@google.com; dvyu...@google.com > Subject: Re: [PATCH][libsanitizer]: Sync fixes

RE: [PATCH]middle-end: check memory accesses in the destination block [PR113588].

2024-01-30 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, January 30, 2024 9:51 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: check memory accesses in the destination block > [PR113588]. >

[PATCH]middle-end: check memory accesses in the destination block [PR113588].

2024-01-29 Thread Tamar Christina
Hi All, When analyzing loads for early break it was always the intention that for the exit where things get moved to we only check the loads that can be reached from the condition. However the main loop checks all loads and we skip the destination BB. As such we never actually check the loads
