[pushed] wwwdocs: onlinedocs: Fix markup

2023-05-07 Thread Gerald Pfeifer
Pushed.

Gerald

---

Commit 7c0b3155efaecf8c9bfa5192ca99acc7356bec71 for GCC 13.1 "stole" an
opening  from the existing GCC 13.2 entry.
---
 htdocs/onlinedocs/index.html | 1 +
 1 file changed, 1 insertion(+)

diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
index 421d1d0b..e72f5bfa 100644
--- a/htdocs/onlinedocs/index.html
+++ b/htdocs/onlinedocs/index.html
@@ -111,6 +111,7 @@
   
 
 
+
  GCC 12.2 manuals:
   
 https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/";>GCC
-- 
2.40.1


Re: [PATCH V3] rs6000: Load high and low part of 64bit constant independently

2023-05-07 Thread guojiufu via Gcc-patches

Hi,

On 2023-04-26 17:35, Kewen.Lin wrote:

Hi Jeff,

on 2023/1/4 14:51, Jiufu Guo wrote:

Hi,

Compared with the previous version, this patch updates the comments only.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608293.html

For a complicated 64-bit constant, below is one instruction sequence to
build it:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

while we can also use the following sequence to build it:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence uses two registers to build the high and low parts first,
and then merges them.

In terms of parallelism, this sequence would be faster.  (Of course, it uses
one more register, with potential register pressure.)

The instruction sequence with two registers for the parallel version is
generated only if can_create_pseudo_p.  Otherwise, the one-register
sequence is generated.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?


OK for trunk, thanks for the improvement!


Thanks! Committed via r14-555-gb05b529125fa51.

BR,
Jeff (Jiufu)



BR,
Kewen




BR,
Jeff(Jiufu)


gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate
more parallel code if can_create_pseudo_p.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/parall_5insn_const.c: New test.



Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread Jeff Law via Gcc-patches




On 5/4/23 07:25, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

This patch is fixing V3 patch:
https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/

Fix issues according to Richard Sandiford && Richard Biener.

1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
2. Support multiple-rgroup for non-SLP auto-vectorization.

For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the total 
length:

  _36 = MIN_EXPR ;

  First length (MIN (X, VF/N)):
loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;

  Second length (X - MIN (X, 1 * VF/N)):
loop_len_16 = _36 - loop_len_15;

  Third length (X - MIN (X, 2 * VF/N)):
_38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
loop_len_17 = _36 - _38;

  Fourth length (X - MIN (X, 3 * VF/N)):
_39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
loop_len_18 = _36 - _39;

The reason I use MIN_EXPR instead of SELECT_VL to calculate the total
length is that using SELECT_VL to adapt the induction IV consumes more
instructions than just using MIN_EXPR.  Also, during testing, I found it
hard to adjust the length correctly according to SELECT_VL.

So, in this patch we only use SELECT_VL for a single rgroup with single
length control.

3. Fix document of select_vl for Richard Biener (remove mode N).
4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
Biener.
5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
6. Move the requirement check of get_while_len_data_ref_ptr outside, letting it
be gated at the caller site.

More comments from Richard Biener:

So it's not actually saturating.  The saturating operation is done by 
.WHILE_LEN?

I define the outcome of SELECT_VL (n, vf) (WHILE_LEN) = IN_RANGE (0, min (n,
vf)), which makes the loop control counter never underflow below zero.


I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
the above scheme should also work for single rgroups, no?
As said, it _looks_ like you can progress without .WHILE_LEN and using
.WHILE_LEN is a pure optimization?

Yes, SELECT_VL (WHILE_LEN) is a pure optimization for single-rgroup and allows
the target to adjust to any length = IN_RANGE (0, min (n, vf)) each iteration.

Let me know if I missed something in the V3 patch.
So at a high level this is pretty good.  I think there's some 
improvements we should make in the documentation and comments, but I'm 
comfortable with most of the implementation details.






diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cc4a93a8763..99cf0cdbdca 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
  @end smallexample
  
+@cindex @code{select_vl@var{m}} instruction pattern

+@item @code{select_vl@var{m}}
+Set operand 0 to the number of active elements in vector will be updated value.

This reads rather poorly.  Is this still accurate?

Set operand 0 to the number of active elements in a vector to be updated 
in a loop iteration based on the total number of elements to be updated, 
the vectorization factor and vector properties of the target.




+operand 1 is the total elements need to be updated value.

operand 1 is the total elements in the vector to be updated.



+
+The output of this pattern is not only used as IV of loop control counter, but 
also
+is used as the IV of address calculation with multiply/shift operation. This 
allow
+us dynamic adjust the number of elements is processed in each iteration of the 
loop.
This allows dynamic adjustment of the number of elements processed each 
loop iteration. -- is that still accurate and does it read better?




@@ -47,7 +47,9 @@ along with GCC; see the file COPYING3.  If not see
 so that we can free them all at once.  */
  static bitmap_obstack loop_renamer_obstack;
  
-/* Creates an induction variable with value BASE + STEP * iteration in LOOP.

+/* Creates an induction variable with value BASE (+/-) STEP * iteration in 
LOOP.
+   If CODE is PLUS_EXPR, the induction variable is BASE + STEP * iteration.
+   If CODE is MINUS_EXPR, the induction variable is BASE - STEP * iteration.
 It is expected that neither BASE nor STEP are shared with other expressions
 (unless the sharing rules allow this).  Use VAR as a base var_decl for it
 (if NULL, a new temporary will be created).  The increment will occur at
It's been pretty standard to stick with just PLUS_EXPR for this stuff 
and instead negate the constant to produce the same effect as 
MINUS_EXPR.  Is there a reason we're not continuing that practice? 
Sorry if you've answered this already -- if you have, you can just point 
me at the prior discussion and I'll read it.





diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 44bd5f2c805..d63ded5d4f0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,48 @@ vect_maybe_

Re: [x86_64 PATCH] Introduce insvti_highpart define_insn_and_split.

2023-05-07 Thread Uros Bizjak via Gcc-patches
On Sat, May 6, 2023 at 4:00 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> This is a repost/respin of a patch that was conditionally approved:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
>
> This patch adds a convenient post-reload splitter for setting/updating
> the highpart of a TImode variable, using i386's previously added
> split_double_concat infrastructure.
>
> For the new test case below:
>
> __int128 foo(__int128 x, unsigned long long y)
> {
>   __int128 t = (__int128)y << 64;
>   __int128 r = (x & ~0ull) | t;
>   return r;
> }
>
> mainline GCC with -O2 currently generates:
>
> foo:    movq    %rdi, %rcx
>         xorl    %eax, %eax
>         xorl    %edi, %edi
>         orq     %rcx, %rax
>         orq     %rdi, %rdx
>         ret
>
> with this patch, GCC instead now generates the much better:
>
> foo:    movq    %rdi, %rcx
>         movq    %rcx, %rax
>         ret
>
> It turns out that the -m32 equivalent of this testcase already
> avoids using explicit orl/xor instructions, as it gets optimized
> (in combine) by a completely different path.  Given that this idiom
> isn't seen in 32-bit code (so this pattern doesn't match with -m32),
> and also that the shorter 32-bit AND bitmask is represented as a
> CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
> is implemented for just TARGET_64BIT rather than contort a "generic"
> implementation using DWI mode iterators.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline now that we're back in stage 1?
>
>
> 2023-05-06  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (any_or_plus): Move definition earlier.
> (*insvti_highpart_1): New define_insn_and_split to overwrite
> (insv) the highpart of a TImode register/memory.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/insvti_highpart-1.c: New test case.

@@ -3479,6 +3479,31 @@
   "mov{b}\t{%h1, %h0|%h0, %h1}"
   [(set_attr "type" "imov")
(set_attr "mode" "QI")])
+
+(define_code_iterator any_or_plus [plus ior xor])

Please add a line of vertical space here.

+(define_insn_and_split "*insvti_highpart_1"

...

+   && CONST_WIDE_INT_P (operands[3])
+   && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+   && CONST_WIDE_INT_ELT (operands[3], 0) == -1
+   && CONST_WIDE_INT_ELT (operands[3], 1) == 0"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]

The above RTX is unreachable, but please use

[(const_int 0)]

as is the case with similar patterns.

OK with the above changes.

Thanks,
Uros.


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Jeff Law via Gcc-patches




On 5/6/23 19:55, Li, Pan2 wrote:

It looks like we cannot simply swap the code and mode in rtx_def; the code may
have to occupy the same bits as the tree_code in tree_base, or we will hit an
ICE like the one below.

rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.

static inline decl_or_value
dv_from_value (rtx value)
{
   decl_or_value dv;
   dv = value;
   gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
   return dv;
Ugh.  We really just need to fix this code.  It assumes particular 
structure layouts and that's just wrong/dumb.


So I think the first step is to fix up this crap code in 
var-tracking.  That should be a patch unto itself.  Then we'd have the 
structure changes as a separate change.


Jeff


[PATCH] Fix aarch64/109762: push_options/push_options does not work sometimes

2023-05-07 Thread Andrew Pinski via Gcc-patches
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags 
(uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long.  This
does not make a difference for LP64 targets, but on ILP32 and LLP64IL32 targets
it means they do not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.

Committed as obvious after a bootstrap/test.

gcc/ChangeLog:

PR target/109762
* config/aarch64/aarch64-builtins.cc 
(aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.
---
 gcc/config/aarch64/aarch64-builtins.cc | 2 +-
 gcc/config/aarch64/aarch64-protos.h| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 94ad364b997..cb6aae3f1fa 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1547,7 +1547,7 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
 
 /* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
set.  */
-aarch64_simd_switcher::aarch64_simd_switcher (unsigned int extra_flags)
+aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
extra_flags)
   : m_old_asm_isa_flags (aarch64_asm_isa_flags),
 m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
 {
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4904d193647..b138494384b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -733,11 +733,11 @@ const unsigned int AARCH64_BUILTIN_CLASS = (1 << 
AARCH64_BUILTIN_SHIFT) - 1;
 class aarch64_simd_switcher
 {
 public:
-  aarch64_simd_switcher (unsigned int extra_flags = 0);
+  aarch64_simd_switcher (aarch64_feature_flags extra_flags = 0);
   ~aarch64_simd_switcher ();
 
 private:
-  unsigned long m_old_asm_isa_flags;
+  aarch64_feature_flags m_old_asm_isa_flags;
   bool m_old_general_regs_only;
 };
 
-- 
2.31.1



Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Jerry D via Gcc-patches

On 5/6/23 11:15 AM, Harald Anlauf via Fortran wrote:

Hi Jerry, Steve,

I think I have to pour a little water into the wine.

The patch fixes the reported issue only for a comma after
the namelist name, but we still accept a few other illegal
characters, e.g. ';', because:

#define is_separator(c) (c == '/' ||  c == ',' || c == '\n' || c == ' ' \
  || c == '\t' || c == '\r' || c == ';' || \
  (dtp->u.p.namelist_mode && c == '!'))

We don't want that in standard conformance mode, or do we?

Cheers,
Harald

On 5/6/23 06:02, Steve Kargl via Gcc-patches wrote:

On Fri, May 05, 2023 at 08:41:48PM -0700, Jerry D via Fortran wrote:
The attached patch adds a check for the invalid comma and emits a 
runtime

error if -std=f95,f2003,f2018 are specified at compile time.

Attached patch includes a new test case.

Regression tested on x86_64-linux-gnu.

OK for mainline?



Yes.  Thanks for the fix.  It's been a long time since
I looked at libgfortran code and couldn't quite determine
where to start to fix this.





As I think back, I don't recall ever seeing a semi-colon used after a 
NAMELIST name, so I think we should reject it always.  The other "soft" 
blanks we should allow.


I will make another patch on trunk to reject the semi-colon, and if no 
one objects here I will test and push it.


Regards,

Jerry



Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Maciej W. Rozycki
On Wed, 3 May 2023, Jiaxun Yang wrote:

> Since it’s possible to run R2- binary on R2+ processor, we’d better find a
> semantic that do eliminate speculation on all processors. While SSNOPs
> on R2+ processors is pretty much undefined, there is no guarantee that
> SSNOP sequence can eliminate speculation.

 Not exactly undefined on R2+, SSNOP is still required to single-issue, so 
it does act as an execution barrier.  Good point otherwise.

 Both EHB and J[AL]R.HB are backwards compatible however (except for an 
obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
longer relevant), so I think the legacy sequence ought to just return via 
JR.HB as well, therefore providing the required semantics with newer 
hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
ignore it for bare metal, deferring to whoever might be interested for a 
workaround).

> My proposal is for R2- CPUs we can do a dummy syscall to act as instruction
> hazard barrier, since exception must clear the pipeline this should be true
> for all known implementations.

 I think the SSNOP approach should be sufficient.

> The most lightweight syscall I know is to do a MIPS_ATOMIC_SET with
> sysmips. A dummy variable on stack should do the track. Do let me know if 
> there
> is a better option.

 That would have to be gettimeofday(2) then, the most performance-critical 
one, and also one that does not have side effects.  The real syscall and 
not VDSO emulation of course (there's a reason it's emulated via the VDSO, 
which is exactly our reason too).

> I have a vague memory about a discussion finding that exception does not 
> indicate
> a memory barrier, so perhaps we still need a sync preceding to that syscall.

 There is no claim that I could find in the architecture specification 
saying that taking an exception implies a memory barrier and therefore we 
must conclude it does not.  Likewise executing ERET.

 As I say I think the SSNOP approach should be sufficient, along with 
relying on SYNC emulation.

> > I think there may be no authoritative source of information here, this is 
> > a grey area.  The longest SSNOP sequences I have seen were for the various 
> > Broadcom implementations and counted 7 instructions.  Both the Linux 
> > kernel and the CFE firmware has them.
> 
> Was it for SiByte or BMIPS?

 Both AFAICT.

> > Also we may not be able to fully enforce ordering for the oldest devices 
> > that do not implement SYNC, as this is system-specific, e.g. involving 
> > branching on the CP0 condition with the BC0F instruction, and inventing an 
> > OS interface for that seems unreasonable at this point of history.
> 
> I guess this is not a valid concern for user space applications?
> As per R4000 manual BC0F will issue “Coprocessor unusable exception”
> exception and it’s certain that we have Staus.CU0 = 0 in user space.

 Exactly, which is why an OS service would have to provide the required 
semantics to the userland, and none might be available.  And we probably 
do not care anyway, because I gather this is a security feature to prevent 
certain types of data leaks via a side channel.  I wouldn't expect anyone 
doing any serious security-sensitive processing with legacy MIPS hardware.  
And then there's no speculative execution with all these pieces of legacy 
hardware (R3000, eh?) that have no suitable barriers provided, mostly 
because they are not relevant for them anyway.

 For bare metal we probably do not care about such legacy hardware either 
way.

 Overall I'd say let's do the best we can without bending backwards and 
then rely on people's common sense.

  Maciej


Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Harald Anlauf via Gcc-patches

Hi Jerry,

I've made a small survey of how compilers behave on a namelist read
from an internal unit when:

1.) there is a single input line of the type
"&stuff" // testchar // " n = 666/"

2.) the input spans 2 lines split after the testchar

3.) same as 2.) but first line right-adjusted

See attached source code.

Competitors: Intel, NAG, NVidia, gfortran at r14-547 with -std=f2018.

My findings were (last column is iostat, next-to-last is n read or -1):

NAG:

 Compiler version = NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
 1-line:   > < 666 0
 2-line/left:  > < 666 0
 2-line/right: > < 666 0
 1-line:   >!< -1 187
 2-line/left:  >!< -1 187
 2-line/right: >!< -1 187
 1-line:   >/< -1 187
 2-line/left:  >/< -1 187
 2-line/right: >/< -1 187
 1-line:   >,< -1 187
 2-line/left:  >,< -1 187
 2-line/right: >,< -1 187
 1-line:   >;< -1 187
 2-line/left:  >;< -1 187
 2-line/right: >;< -1 187
 1-line:   tab 666 0
 2-line/left:  tab 666 0
 2-line/right: tab 666 0
 1-line:   lf -1 187
 2-line/left:  lf -1 187
 2-line/right: lf -1 187
 1-line:   ret -1 187
 2-line/left:  ret -1 187
 2-line/right: ret -1 187

My interpretation of this is that NAG treats tab as (white)space,
everything else gives an error.  This is the strictest compiler.

Intel:

 Compiler version = Intel(R) Fortran Intel(R) 64 Compiler Classic for
applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_00
 1-line:   > < 666   0
 2-line/left:  > < 666   0
 2-line/right: > < 666   0
 1-line:   >!<  -1  -1
 2-line/left:  >!< 666   0
 2-line/right: >!< 666   0
 1-line:   >/<  -1   0
 2-line/left:  >/<  -1   0
 2-line/right: >/<  -1   0
 1-line:   >,<  -1  17
 2-line/left:  >,<  -1  17
 2-line/right: >,<  -1  17
 1-line:   >;<  -1  17
 2-line/left:  >;<  -1  17
 2-line/right: >;<  -1  17
 1-line:   tab 666   0
 2-line/left:  tab 666   0
 2-line/right: tab 666   0
 1-line:   lf 666   0
 2-line/left:  lf 666   0
 2-line/right: lf 666   0
 1-line:   ret  -1  17
 2-line/left:  ret  -1  17
 2-line/right: ret  -1  17

Nvidia:

 Compiler version = nvfortran 23.3-0
 1-line:   > <  666    0
 2-line/left:  > <  666    0
 2-line/right: > <  666    0
 1-line:   >!<   -1   -1
 2-line/left:  >!<   -1   -1
 2-line/right: >!<   -1   -1
 1-line:   >/<   -1   -1
 2-line/left:  >/<   -1   -1
 2-line/right: >/<   -1   -1
 1-line:   >,<   -1   -1
 2-line/left:  >,<   -1   -1
 2-line/right: >,<   -1   -1
 1-line:   >;<   -1   -1
 2-line/left:  >;<   -1   -1
 2-line/right: >;<   -1   -1
 1-line:   tab  666    0
 2-line/left:  tab  666    0
 2-line/right: tab  666    0
 1-line:   lf    -1   -1
 2-line/left:  lf   666    0
 2-line/right: lf   666    0
 1-line:   ret  666    0
 2-line/left:  ret  666    0
 2-line/right: ret  666    0

gfortran (see above):

 Compiler version = GCC version 14.0.0 20230506 (experimental)
 1-line:   > < 666   0
 2-line/left:  > < 666   0
 2-line/right: > < 666   0
 1-line:   >!<  -1  -1
 2-line/left:  >!<  -1   0
 2-line/right: >!< 666   0
 1-line:   >/<  -1   0
 2-line/left:  >/<  -1   0
 2-line/right: >/<  -1   0
 1-line:   >,< 666  5010
 2-line/left:  >,< 666  5010
 2-line/right: >,< 666  5010
 1-line:   >;< 666   0
 2-line/left:  >;< 666   0
 2-line/right: >;< 666   0
 1-line:   tab 666   0
 2-line/left:  tab 666   0
 2-line/right: tab 666   0
 1-line:   lf 666   0
 2-line/left:  lf 666   0
 2-line/right: lf 666   0
 1-line:   ret 666   0
 2-line/left:  ret 666   0
 2-line/right: ret 666   0


So there seems to be a consensus that "," and ";" must be rejected,
and tab is accepted (makes real sense), but already the termination
character "/" and comment character "!" are treated differently.
And how do we want to treat lf and ret in internal files with
-std=f20xx?

Cheers,

Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Jiaxun Yang via Gcc-patches



> On May 7, 2023, at 18:34, Maciej W. Rozycki  wrote:
> 
> On Wed, 3 May 2023, Jiaxun Yang wrote:
> 
>> Since it’s possible to run R2- binary on R2+ processor, we’d better find a
>> semantic that do eliminate speculation on all processors. While SSNOPs
>> on R2+ processors is pretty much undefined, there is no guarantee that
>> SSNOP sequence can eliminate speculation.
> 
> Not exactly undefined on R2+, SSNOP is still required to single-issue, so 
> it does act as an execution barrier.  Good point otherwise.
> 
> Both EHB and J[AL]R.HB are backwards compatible however (except for an 
> obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
> longer relevant), so I think the legacy sequence ought to just return via 
> JR.HB as well, therefore providing the required semantics with newer 
> hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
> ignore it for bare metal, deferring to whoever might be interested for a 
> workaround).

Hmm, I just checked the MIPS-IV manual; it seems the HB bit (bit 10) is defined
as zero for both JR and JALR.

Is it actually omitted in implementation?

Thanks
Jiaxun




Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Maciej W. Rozycki
On Sun, 7 May 2023, Jiaxun Yang wrote:

> > Both EHB and J[AL]R.HB are backwards compatible however (except for an 
> > obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
> > longer relevant), so I think the legacy sequence ought to just return via 
> > JR.HB as well, therefore providing the required semantics with newer 
> > hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
> > ignore it for bare metal, deferring to whoever might be interested for a 
> > workaround).
> 
> Hmm, I just checked MIPS-IV manual, it seems like HB bit (bit 10) is defined 
> as
> zero for both JR and JALR.
> 
> Is it actually omitted in implementation?

 What has become the hint field was supposed not to be decoded in earlier 
silicon, which is why such encodings were chosen for JR.HB and JALR.HB.  
I was told it was actually verified across existing silicon at the time 
the instructions were being defined with MIPSr1 (i.e. when the hint field 
was added to the ISA).  The issue with the 4Kc was an unfortunate mistake.

 Many instructions are not fully decoded, causing encoding aliases; MFHI, 
etc. come to mind.

  Maciej


[PATCH] [GCC 13] Fix aarch64/109762: push_options/push_options does not work sometimes

2023-05-07 Thread Andrew Pinski via Gcc-patches
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags 
(uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long.  This
does not make a difference for LP64 targets, but on ILP32 and LLP64IL32 targets
it means they do not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.

Committed as obvious to gcc 13 branch after a bootstrap/test.

gcc/ChangeLog:

PR target/109762
* config/aarch64/aarch64-builtins.cc 
(aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.

(cherry picked from commit a1a9ce2441df0675540faee8476523164e12578b)
---
 gcc/config/aarch64/aarch64-builtins.cc | 2 +-
 gcc/config/aarch64/aarch64-protos.h| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index cc6b7c01fd1..4ee6cb00f8b 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1547,7 +1547,7 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
 
 /* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
set.  */
-aarch64_simd_switcher::aarch64_simd_switcher (unsigned int extra_flags)
+aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
extra_flags)
   : m_old_asm_isa_flags (aarch64_asm_isa_flags),
 m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
 {
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 63339fa47df..e727e207367 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -733,11 +733,11 @@ const unsigned int AARCH64_BUILTIN_CLASS = (1 << 
AARCH64_BUILTIN_SHIFT) - 1;
 class aarch64_simd_switcher
 {
 public:
-  aarch64_simd_switcher (unsigned int extra_flags = 0);
+  aarch64_simd_switcher (aarch64_feature_flags extra_flags = 0);
   ~aarch64_simd_switcher ();
 
 private:
-  unsigned long m_old_asm_isa_flags;
+  aarch64_feature_flags m_old_asm_isa_flags;
   bool m_old_general_regs_only;
 };
 
-- 
2.31.1



Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread 钟居哲
>> It's been pretty standard to stick with just PLUS_EXPR for this stuff
>> and instead negate the constant to produce the same effect as
>> MINUS_EXPR.  Is there a reason we're not continuing that practice?
>> Sorry if you've answered this already -- if you have, you can just point
>> me at the prior discussion and I'll read it.

Richard (Sandiford) has answered this question:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616745.html 
And as Richard (Biener) said, it will make IVOPTs fail if we have variable IVs.
However, we have already implemented variable IVs in the downstream RVV GCC,
and it works fine; I haven't seen any bad codegen so far.  So, I think it may
not be a serious issue for RVV.

Besides, this implementation is not my own idea; it just follows the guidance
from the RVV ISA spec, and it is also inspired by the implementation in LLVM.
Reference:
1). RVV ISA: 
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s 
2). LLVM's length-control implementation (note that the "get_vector_length" 
pattern in LLVM does the same thing as the "select_vl" pattern in GCC here):
 https://reviews.llvm.org/D99750 

vvaddint32:
vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
vle32.v v0, (a1) # Get first vector
  sub a0, a0, t0 # Decrement number done
  slli t0, t0, 2 # Multiply number done by 4 bytes
  add a1, a1, t0 # Bump pointer
vle32.v v1, (a2) # Get second vector
  add a2, a2, t0 # Bump pointer
vadd.vv v2, v0, v1   # Sum vectors
vse32.v v2, (a3) # Store result
  add a3, a3, t0 # Bump pointer
  bnez a0, vvaddint32# Loop back
  ret# Finished

Notice "sub a0, a0, t0", the "t0" is the variable coming from "vsetvli t0, a0, 
e32, ta, ma"
which is generated by "get_vector_length" in LLVM, or similiar it is also 
generatd by "select_vl" in GCC too.

Other comments from Jeff will be addressed in the next patch (V5).  I will wait 
for comments from the Richards (both Sandiford and Biener, who are the experts 
in the Loop Vectorizer), then send the V5 patch, which will include all comments 
from Jeff and the Richards (Sandiford and Biener).

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-07 23:19
To: juzhe.zhong; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by 
variable amount support
 
 
On 5/4/23 07:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
> For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the 
> total length:
> 
>   _36 = MIN_EXPR ;
> 
>   First length (MIN (X, VF/N)):
> loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>   Second length (X - MIN (X, 1 * VF/N)):
> loop_len_16 = _36 - loop_len_15;
> 
>   Third length (X - MIN (X, 2 * VF/N)):
> _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
> loop_len_17 = _36 - _38;
> 
>   Fourth length (X - MIN (X, 3 * VF/N)):
> _39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
> loop_len_18 = _36 - _39;
> 
> The reason that I use MIN_EXPR instead of SELECT_VL to calculate total length 
> since using SELECT_VL
> to adapt induction IV consumes more instructions than just using MIN_EXPR. 
> Also, during testing,
> I found it's hard to adjust length correctly according to SELECT_VL.
> 
> So, this patch we only use SELECT_VL for single-rgroup with single length 
> control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. make requirement of get_while_len_data_ref_ptr outside, let it to be gated 
> at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf)  (WHILE_LEN) = IN_RANGE (0, min (n, 
> vf)) will make
> the loop control counter never underflow zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is pure optimization for single-rgroup and allow
> target adjust any length = INRANGE (0, min (n, vf)) each iteration.
> 
> Let me known if I missed something for the V3 patch.
So at a high level this is pretty good.  I think there's some 
improvements we should make in the documentation and comments, but I'm 
comfortable with most of the implementation details.

[PATCH] MATCH: Move `a <= CST1 ? MAX : a` optimization to match

2023-05-07 Thread Andrew Pinski via Gcc-patches
This moves the `a <= CST1 ? MAX : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.

There is one more change needed before being able to remove
minmax_replacement from phiopt.

A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1 so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1
so need to change the scanning.
* pr66726-5.c currently needs two phiopt passes to optimize
to the right thing; it needed 2 phiopt passes before as well,
and the cast from int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before the fold was able to optimize it.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(a CMP CST1) ? max : a`): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.
---
 gcc/match.pd  | 18 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c |  5 +++-
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c | 28 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c | 17 ++
 5 files changed, 68 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ceae1c34abc..a55ede838cd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4954,6 +4954,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (code == MAX_EXPR)
   (minmax (max @1 @2) @4)))
 
+/* Optimize (a CMP CST1) ? max : a */
+(for cmp(gt  ge  lt  le)
+ minmax (min min max max)
+ (simplify
+  (cond (cmp @0 @1) (minmax:c@2 @0 @3) @4)
+   (with
+{
+  tree_code code = minmax_from_comparison (cmp, @0, @1, @0, @4);
+}
+(if ((cmp == LT_EXPR || cmp == LE_EXPR)
+&& code == MIN_EXPR
+ && integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node, @3, 
@1)))
+ (min @2 @4)
+ (if ((cmp == GT_EXPR || cmp == GE_EXPR)
+ && code == MAX_EXPR
+  && integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node, @3, 
@1)))
+  (max @2 @4))
+
 /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
 (simplify
  (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
index 5f78a1ba6dc..e78d9d8b83d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
@@ -39,7 +39,7 @@ float repl2 (float vary)
 
 /* phiopt1 confused by predictors.  */
 /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt1" } } */
-/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" } } */
 /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt2"} } */
 /* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt2"} } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
index 4e43522f3a3..930ad5fb79f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
@@ -9,4 +9,7 @@ foo (unsigned char *p, int i)
   *p = SAT (i);
 }
 
-/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 1 "phiopt1" } } */
+/* fold could optimize SAT before phiopt1 so only match on the
+   MIN/MAX here.  */
+/* { dg-final { scan-tree-dump-times "= MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "= MAX_EXPR" 1 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
new file mode 100644
index 000..4b5066cdb6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1-details -fdump-tree-phiopt2-details 
-fdump-tree-optimized" } */
+
+#define SAT(x) (x < 0 ? 0 : (x > 255 ? 255 : x))
+
+unsigned char
+foo (unsigned char *p, int i)
+{
+  if (i < 0)
+return 0;
+  {
+int t;
+if (i > 255)
+  t = 255;
+else
+  t = i;
+return t;
+  }
+}
+
+/* Because of the way PHIOPT works, it only does the merging of BBs after it 
is done so we get the case were we can't
+   optimize the above until phiopt2 right now.  */
+/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 2 "phiopt1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 0 "phiopt2" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "= MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-ti

aarch64: Add scheduling model for Neoverse V1

2023-05-07 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse V1, based on the information 
from the “Arm Neoverse V1 Software Optimization Guide” and on static and 
dynamic analysis of internal and public benchmarks.  Results are forthcoming. 

-- 
Evandro Menezes


0001-aarch64-Add-scheduling-model-for-Neoverse-V1.patch
Description: Binary data



Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Steve Kargl via Gcc-patches
Harald,
Thanks for keeping us honest.  I didn't check what other 
separators might cause a problem.

After 2 decades of working on gfortran, I've come to the conclusion
that -std=f2018 should be the default.  When f2023 is ratified,
the default becomes -std=f2023.  All GNU Fortran extensions
should be behind an option, and we should be aggressive in
eliminating extensions.

Yes, this means that 'real*4' and similar would require 
a -fallow-nonstandard-declaration option.

-- 
steve

On Sun, May 07, 2023 at 08:33:43PM +0200, Harald Anlauf wrote:
> Hi Jerry,
> 
> I've made a small compiler survey how they behave on namelist read
> from an internal unit when:
> 
> 1.) there is a single input line of the type
> "&stuff" // testchar // " n = 666/"
> 
> 2.) the input spans 2 lines split after the testchar
> 
> 3.) same as 2.) but first line right-adjusted
> 
> See attached source code.
> 
> Competitors: Intel, NAG, NVidia, gfortran at r14-547 with -std=f2018.
> 
> My findings were (last column is iostat, next-to-last is n read or -1):
> 
> NAG:
> 
>  Compiler version = NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
>  1-line:   > < 666 0
>  2-line/left:  > < 666 0
>  2-line/right: > < 666 0
>  1-line:   >!< -1 187
>  2-line/left:  >!< -1 187
>  2-line/right: >!< -1 187
>  1-line:   >/< -1 187
>  2-line/left:  >/< -1 187
>  2-line/right: >/< -1 187
>  1-line:   >,< -1 187
>  2-line/left:  >,< -1 187
>  2-line/right: >,< -1 187
>  1-line:   >;< -1 187
>  2-line/left:  >;< -1 187
>  2-line/right: >;< -1 187
>  1-line:   tab 666 0
>  2-line/left:  tab 666 0
>  2-line/right: tab 666 0
>  1-line:   lf -1 187
>  2-line/left:  lf -1 187
>  2-line/right: lf -1 187
>  1-line:   ret -1 187
>  2-line/left:  ret -1 187
>  2-line/right: ret -1 187
> 
> My interpretation of this is that NAG treats tab as (white)space,
> everything else gives an error.  This is the strictest compiler.
> 
> Intel:
> 
>  Compiler version = Intel(R) Fortran Intel(R) 64 Compiler Classic for
> applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_00
>  1-line:   > < 666   0
>  2-line/left:  > < 666   0
>  2-line/right: > < 666   0
>  1-line:   >!<  -1  -1
>  2-line/left:  >!< 666   0
>  2-line/right: >!< 666   0
>  1-line:   >/<  -1   0
>  2-line/left:  >/<  -1   0
>  2-line/right: >/<  -1   0
>  1-line:   >,<  -1  17
>  2-line/left:  >,<  -1  17
>  2-line/right: >,<  -1  17
>  1-line:   >;<  -1  17
>  2-line/left:  >;<  -1  17
>  2-line/right: >;<  -1  17
>  1-line:   tab 666   0
>  2-line/left:  tab 666   0
>  2-line/right: tab 666   0
>  1-line:   lf 666   0
>  2-line/left:  lf 666   0
>  2-line/right: lf 666   0
>  1-line:   ret  -1  17
>  2-line/left:  ret  -1  17
>  2-line/right: ret  -1  17
> 
> Nvidia:
> 
>  Compiler version = nvfortran 23.3-0
>  1-line:   > <  6660
>  2-line/left:  > <  6660
>  2-line/right: > <  6660
>  1-line:   >!<   -1   -1
>  2-line/left:  >!<   -1   -1
>  2-line/right: >!<   -1   -1
>  1-line:   >/<   -1   -1
>  2-line/left:  >/<   -1   -1
>  2-line/right: >/<   -1   -1
>  1-line:   >,<   -1   -1
>  2-line/left:  >,<   -1   -1
>  2-line/right: >,<   -1   -1
>  1-line:   >;<   -1   -1
>  2-line/left:  >;<   -1   -1
>  2-line/right: >;<   -1   -1
>  1-line:   tab  6660
>  2-line/left:  tab  6660
>  2-line/right: tab  6660
>  1-line:   lf   -1   -1
>  2-line/left:  lf  6660
>  2-line/right: lf  6660
>  1-line:   ret  6660
>  2-line/left:  ret  6660
>  2-line/right: ret  6660
> 
> gfortran (see above):
> 
>  Compiler version = GCC version 14.0.0 20230506 (experimental)
>  1-line:   > < 666   0
>  2-line/left:  > < 666   0
>  2-line/right: > < 666   0
>  1-line:   >!<  -1  -1
>  2-line/left:  >!<  -1   0
>  2-line/right: >!< 666   0
>  1-line:   >/<  -1   0
>  2-line/left:  >/<  -1   0
>  2-line/right: >/<  -1   0
>  1-line:   >,< 6665010
>  2-line/left:  >,< 6665010
>  2-line/right: >,< 6665010
>  1-line:   

RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Li, Pan2 via Gcc-patches
I see. Thank you, will have a try soon.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, May 7, 2023 11:24 PM
To: Li, Pan2 ; Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; gcc-patches ; palmer 
; jakub 
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit



On 5/6/23 19:55, Li, Pan2 wrote:
> It looks like we cannot simply swap the code and mode in rtx_def; the code 
> may have to be the same bits as the tree_code in tree_base, or we will meet 
> an ICE like the one below.
> 
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> 
> static inline decl_or_value
> dv_from_value (rtx value)
> {
>decl_or_value dv;
>dv = value;
>gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
>return dv;
Ugh.  We really just need to fix this code.  It assumes particular structure 
layouts and that's just wrong/dumb.

So I think the first step is to fix up this crap code in var-tracking.  
That should be a patch unto itself.  Then we'd have the structure changes as a 
separate change.

Jeff


Re: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743]

2023-05-07 Thread juzhe.zh...@rivai.ai
Gentle ping this patch.

Is this Ok for trunk? Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-06 19:14
To: gcc-patches
CC: kito.cheng; Juzhe-Zhong
Subject: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli 
[PR 109743]
From: Juzhe-Zhong 
 
This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743.
 
This issue happens because we are currently very conservative in the 
optimization of user vsetvli.
 
Consider this following case:
 
bb 1:
  vsetvli a5,a4... (demand AVL = a4).
bb 2:
  RVV insn use a5 (demand AVL = a5).
 
LCM will hoist vsetvl of bb 2 into bb 1.
We don't do AVL propagation for this situation since it's complicated: we
would need to analyze the code sequence between the vsetvli in bb 1 and the
RVV insn in bb 2.
They are not necessarily consecutive blocks.
 
This patch does the optimization after LCM: we check and eliminate the
vsetvli on an LCM-inserted edge if such a vsetvli is redundant.  Such an
approach is much simpler and safe.
 
code:
void
foo2 (int32_t *a, int32_t *b, int n)
{
  if (n <= 0)
  return;
  int i = n;
  size_t vl = __riscv_vsetvl_e32m1 (i);
 
  for (; i >= 0; i--)
  {
vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
__riscv_vse32_v_i32m1 (b, v, vl);
 
if (i >= vl)
  continue;
 
if (i == 0)
  return;
 
vl = __riscv_vsetvl_e32m1 (i);
  }
}
 
Before this patch:
foo2:
.LFB2:
.cfi_startproc
ble a2,zero,.L1
mv  a4,a2
li  a3,-1
vsetvli a5,a2,e32,m1,ta,mu
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeua4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,mu
addia4,a4,-1
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw   a2,a2,-1
bltua4,a5,.L10
.L3:
addiw   a2,a2,-1
addia4,a4,-1
bne a2,a3,.L5
.L1:
ret
 
After this patch:
f:
ble a2,zero,.L1
mv  a4,a2
li  a3,-1
vsetvli a5,a2,e32,m1,ta,ma
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeua4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,ma
addia4,a4,-1
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw   a2,a2,-1
bltua4,a5,.L10
.L3:
addiw   a2,a2,-1
addia4,a4,-1
bne a2,a3,.L5
.L1:
ret
 
PR target/109743
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::commit_vsetvls): Add 
optimization for LCM inserted edge.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.
 
---
gcc/config/riscv/riscv-vsetvl.cc  | 42 +++
.../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  | 26 
.../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  | 27 
.../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  | 28 +
.../gcc.target/riscv/rvv/vsetvl/pr109743-4.c  | 28 +
5 files changed, 151 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f55907a410e..fcee7fdf323 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3834,6 +3834,48 @@ pass_vsetvl::commit_vsetvls (void)
  const vector_insn_info *require
= m_vector_manager->vector_exprs[i];
  gcc_assert (require->valid_or_dirty_p ());
+
+   /* Here we optimize the VSETVL is hoisted by LCM:
+
+ Before LCM:
+bb 1:
+  vsetvli a5,a2,e32,m1,ta,mu
+bb 2:
+  vsetvli zero,a5,e32,m1,ta,mu
+  ...
+
+ After LCM:
+bb 1:
+  vsetvli a5,a2,e32,m1,ta,mu
+  LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate
+bb 2:
+  ...
+*/
+   const basic_block pred_cfg_bb = eg->src;
+   const auto block_info
+ = m_vector_manager->vector_block_infos[pred_cfg_bb->index];
+   const insn_info *pred_insn = block_info.reaching_out.get_insn ();
+   if (pred_insn && vsetvl_insn_p (pred_insn->rtl ())
+   && require->get_avl_source ()
+   && require->get_avl_source ()->insn ()
+   && require->skip_avl_compatible_p (block_info.reaching_out))
+ {
+   vector_insn_info new_info = *require;
+   new_info.set_avl_info (
+ block_info.reaching_out.get_avl_info ());
+   new_info
+ = block_info.reaching_out.merge (new_info, LOCAL_MERGE);
+   change_vsetvl_insn (pred_insn, 

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Li, Pan2 via Gcc-patches
Updated X86 memory bytes allocated by the below changes (including Kito's 
patch for the tree common part).

rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.

Bytes allocated with O2:
-
Benchmark   |  upstream | with the PATCH
-
400.perlbench   | 25286185160   | 25286590847 ~0.0%
401.bzip2   | 1429883731| 1430373103 ~0.0%
403.gcc | 55023568981   | 55027574220 ~0.0%
429.mcf | 1360975660| 1360959361 ~0.0%
445.gobmk   | 12791636502   | 12789648370 ~0.0%
456.hmmer   | 9354433652| 9353899089 ~0.0%
458.sjeng   | 1991260562| 1991107773 ~0.0%
462.libquantum  | 1725112078| 1724972077 ~0.0%
464.h264ref | 8597673515| 8597748172 ~0.0%
471.omnetpp | 37613034778   | 37614346380 ~0.0%
473.astar   | 3817295518| 3817226365 ~0.0%
483.xalancbmk   | 149418776991  | 149405214817 ~0.0%

Bytes allocated with Ofast + funroll-loops:
--
Benchmark   |  upstream | with the PATCH
--
400.perlbench   | 30438407499   | 30568217795 +0.4% 
401.bzip2   | 2277114519| 2318588280 +1.8%
403.gcc | 64499664264   | 64764400606 +0.4%
429.mcf | 1361486758| 1399872438 +2.8%
445.gobmk   | 15258056111   | 15392769408 +0.9%
456.hmmer   | 10896615649   | 10934649010 +0.3%
458.sjeng   | 2592620709| 2641551464 +1.9%
462.libquantum  | 1814487525| 1856446214 +2.3%
464.h264ref | 13528736878   | 13606989269 +0.6%
471.omnetpp | 38721066702   | 38908678658 +0.5%
473.astar   | 3924015756| 3967867190 +1.1%
483.xalancbmk   | 165897692838  | 166818255397 +0.6%

Pan

-Original Message-
From: Li, Pan2 
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yes, that makes sense, will have a try and keep you posted.

Pan

-Original Message-
From: Kito Cheng 
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I think x86 first?  The major thing we want to make sure of is that this 
change won't affect too much those targets which do not really require a 
16-bit machine_mode.


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will 
> send out the overall diff before starting, to make sure my understanding is 
> correct). BTW, which target do we prefer first? X86 or RISC-V.
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; 
> gcc-patches ; palmer ; 
> jakub 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to 
> Richard suggestion since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener;
> 钟居哲;
> richard.sandiford; Jeff 
> Law;
> gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit Hi Pan:
>
> Could you try to apply the following diff and measure again? This 
> makes tree_type_common size unchanged.
>
>
> sizeof tree_type_common= 128 (mode = 8 bit) sizeof tree_type_common=
> 136 (mode = 16 bit) sizeof tree_type_common= 128 (mode = 8 bit w/ this
> diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h index
> af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +16

[PATCH V2] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-05-07 Thread liuhongt via Gcc-patches
> > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info 
> > *vinfo, vec *vec_oprnds,
> >stmt_vec_info stmt_info,
> >vec &vec_dsts,
> >gimple_stmt_iterator *gsi,
> > -  slp_tree slp_node, enum tree_code 
> > code)
> > +  slp_tree slp_node, enum tree_code 
> > code,
> > +  bool last_stmt_p)
>
> Can you please document this new parameter?
>
Changed.

>
> I understand what you are doing, but somehow it looks a bit awkward?
> Maybe we should split the NARROW case into NARROW_SRC and NARROW_DST?
> The case of narrowing the source because we know its range isn't a
> good fit for the
> flow.
Changed.

Here's updated patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Similar to WIDEN FLOAT_EXPR, when the direct optab does not exist, try an
intermediate integer type whenever the gimple ranger can tell it's safe.

I.e., when there's no direct optab for vector long long -> vector float, but
the value range of the integer can be represented as int, try vector int
-> vector float if available.

gcc/ChangeLog:

PR tree-optimization/108804
* tree-vect-patterns.cc (vect_get_range_info): Remove static.
* tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
Add new parameter narrow_src_p.
(vectorizable_conversion): Enhance NARROW FLOAT_EXPR
vectorization by truncating to lower precision.
* tree-vectorizer.h (vect_get_range_info): New declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108804.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr108804.c |  15 +++
 gcc/tree-vect-patterns.cc|   2 +-
 gcc/tree-vect-stmts.cc   | 135 +--
 gcc/tree-vectorizer.h|   1 +
 4 files changed, 121 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108804.c

diff --git a/gcc/testsuite/gcc.target/i386/pr108804.c 
b/gcc/testsuite/gcc.target/i386/pr108804.c
new file mode 100644
index 000..2a43c1e1848
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108804.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -Ofast -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 1 "vect" } } */
+
+typedef unsigned long long uint64_t;
+uint64_t d[512];
+float f[1024];
+
+void foo() {
+for (int i=0; i<512; ++i) {
+uint64_t k = d[i];
+f[i]=(k & 0x3F30);
+}
+}
+
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a49b0953977..dd546b488a4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -61,7 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Return true if we have a useful VR_RANGE range for VAR, storing it
in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
 
-static bool
+bool
 vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
 {
   value_range vr;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6b7dbfd4a23..3da89a8402d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
+#include "gimple-range.h"
 #include "tree-ssa-loop-niter.h"
 #include "gimple-fold.h"
 #include "regs.h"
@@ -4791,7 +4792,9 @@ vect_gen_widened_results_half (vec_info *vinfo, enum 
tree_code code,
 
 /* Create vectorized demotion statements for vector operands from VEC_OPRNDS.
For multi-step conversions store the resulting vectors and call the function
-   recursively.  */
+   recursively. When NARROW_SRC_P is true, there's still a conversion after
+   narrowing, don't store the vectors in the SLP_NODE or in vector info of
+   the scalar statement(or in STMT_VINFO_RELATED_STMT chain).  */
 
 static void
 vect_create_vectorized_demotion_stmts (vec_info *vinfo, vec *vec_oprnds,
@@ -4799,7 +4802,8 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
vec *vec_oprnds,
   stmt_vec_info stmt_info,
   vec &vec_dsts,
   gimple_stmt_iterator *gsi,
-  slp_tree slp_node, enum tree_code code)
+  slp_tree slp_node, enum tree_code code,
+  bool narrow_src_p)
 {
   unsigned int i;
   tree vop0, vop1, new_tmp, vec_dest;
@@ -4815,9 +4819,9 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
vec *vec_oprnds,
   new_tmp = make_ssa_name (vec_dest, new_stmt);
   gimple_assign_set_lhs (new_stmt, new_tmp);
   vect_finish_stmt_generation (vinfo, stmt_in

Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-07 Thread Kito Cheng via Gcc-patches
shrink-wrapping is already gated by -Os, so I think we don't need to add
another gate here, unless we are trying to claim we force optimizing for
size if zcmp is present.

On Sat, May 6, 2023 at 4:41 PM Fei Gao  wrote:
>
> zcmp aims to reduce code size, while shrink-wrap-separate prefers
> speed to code size.  So disable shrink-wrap-separate if zcmp is
> enabled, just like what save-restore has done.
>
> author: Zhangjin Liao liaozhang...@eswincomputing.com
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_get_separate_components):
> ---
>  gcc/config/riscv/riscv.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 45a63cab9c9..629e5e45cac 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>
>if (riscv_use_save_libcall (&cfun->machine->frame)
>|| cfun->machine->interrupt_handler_p
> -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
> +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
> +  || TARGET_ZCMP)
>  return components;
>
>offset = cfun->machine->frame.gp_sp_offset.to_constant ();
> --
> 2.17.1
>


Re: [PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-07 Thread Kito Cheng via Gcc-patches
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 000..1c2f390269e
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,55 @@
...
+(define_insn "gpr_multi_pop"
+  [(unspec_volatile [(match_operand 0 "const_int_operand")
+ (match_operand 1 "const_int_operand")]
+UNSPECV_GPR_MULTI_POP)]

I would strongly suggest modeling the memory and register accesses
here correctly instead of using an unspec,
and the same for the other two patterns.

That will help GCC know the semantics of this operation.


[PATCH] RISC-V: Fix ugly && incorrect codes of RVV auto-vectorization

2023-05-07 Thread juzhe . zhong
From: Juzhe-Zhong 

1. Add movmisalign pattern for the TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
   target hook; current RISC-V has supported this target hook, but we can't
   actually support it without a movmisalign pattern.

2. Remove global extern of get_mask_policy_no_pred && get_tail_policy_no_pred.
   These 2 functions come from the intrinsic builtin framework.
   We are sure we don't need them in the auto-vectorization implementation.

3. Refine mask mode implementation.

4. We should not have a "riscv_vector_" prefix in the riscv_vector namespace
   since it makes the code inconsistent and ugly.
   
   For example:
   Before this patch:
   static opt_machine_mode
   riscv_get_mask_mode (machine_mode mode)
   {
 machine_mode mask_mode = VOIDmode;
 if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode 
(mode).exists (&mask_mode))
  return mask_mode;
   ..

   After this patch:
   riscv_get_mask_mode (machine_mode mode)
   {
 machine_mode mask_mode = VOIDmode;
 if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists 
(&mask_mode))
  return mask_mode;
   ..

5. Fix fail testcase fixed-vlmax-1.c.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign): New pattern.
* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove 
global extern.
(get_tail_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): 
Ditto.
(get_mask_policy_for_pred): Ditto
* config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.

---
 gcc/config/riscv/autovec.md   | 11 ++
 gcc/config/riscv/riscv-protos.h   |  5 +--
 gcc/config/riscv/riscv-v.cc   | 36 ++-
 gcc/config/riscv/riscv-vector-builtins.cc |  4 +--
 gcc/config/riscv/riscv-vector-builtins.h  |  3 --
 gcc/config/riscv/riscv.cc |  3 +-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  4 +--
 7 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5d46ff57ab..f1c5ff5951b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -47,3 +47,14 @@
  operands[1], operands[2], mode);
   DONE;
 })
+
+(define_expand "movmisalign"
+  [(set (match_operand:V 0 "nonimmediate_operand")
+   (match_operand:V 1 "general_operand"))]
+  "TARGET_VECTOR"
+  {
+/* Equivalent to a normal move for our purposes.  */
+emit_move_insn (operands[0], operands[1]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d83ea2c77e4..c0293a306f9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,10 +218,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (machine_mode, rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
-extern bool riscv_vector_mask_mode_p (machine_mode);
-extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
-extern rtx get_mask_policy_no_pred (void);
-extern rtx get_tail_policy_no_pred (void);
+opt_machine_mode get_mask_mode (machine_mode);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8c7f3206771..7ca49ca67c1 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -43,7 +43,6 @@
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
-#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 
@@ -479,43 +478,12 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
-/* Return the mask policy for no predication.  */
-rtx
-get_mask_policy_no_pred (void)
-{
-  return get_mask_policy_for_pred (PRED_TYPE_none);
-}
-
-/* Return the tail policy for no predication.  */
-rtx
-get_tail_policy_no_pred (void)
-{
-  return get_tail_policy_for_pred (PRED_TYPE_none);
-}
-
-/* Return true if it is a RVV mask mode.  */
-bool
-riscv_vector_mask_mode_p (machine_mode mode)
-{
-  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
- || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
- || mode == VNx64BImode);
-}
-
 /* Return the appropriate mask mode for MODE.  */
 
 opt_machine_mode
-riscv_vector_get_m

[PATCH] Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison

2023-05-07 Thread Andrew Pinski via Gcc-patches
This patch adds to match the support that was implemented for PR 87913 in 
phiopt.
It implements it by adding support for the check to minmax_from_comparison.
It uses the range information if available, which allows producing a MIN/MAX 
expression
when comparing against the lower/upper bound of the range instead of the
lower/upper bound of the type.

minmax-20.c is the new testcase which tests the ranges part.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax : minmax pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-20.c: New test.
---
 gcc/fold-const.cc | 26 +++
 gcc/match.pd  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c | 12 +++
 3 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index db54bfc5662..d90671b9975 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -173,6 +173,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
exp1, tree exp2, tree exp
  /* X > Y - 1 equals to X >= Y.  */
  if (cmp == GT_EXPR)
code = GE_EXPR;
+ /* a != MIN_RANGE ? a : MIN_RANGE+1 -> MAX_EXPR+1, 
a> */
+ if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
+   {
+ value_range r;
+ get_range_query (cfun)->range_of_expr (r, exp0);
+ if (r.undefined_p ())
+   r.set_varying (TREE_TYPE (exp0));
+
+ widest_int min = widest_int::from (r.lower_bound (),
+TYPE_SIGN (TREE_TYPE (exp0)));
+ if (min == wi::to_widest (exp1))
+   code = MAX_EXPR;
+   }
}
   if (wi::to_widest (exp1) == (wi::to_widest (exp3) + 1))
{
@@ -182,6 +195,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
exp1, tree exp2, tree exp
  /* X >= Y + 1 equals to X > Y.  */
  if (cmp == GE_EXPR)
  code = GT_EXPR;
+ /* a != MAX_RANGE ? a : MAX_RANGE-1 -> MIN_EXPR-1, 
a> */
+ if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
+   {
+ value_range r;
+ get_range_query (cfun)->range_of_expr (r, exp0);
+ if (r.undefined_p ())
+   r.set_varying (TREE_TYPE (exp0));
+
+ widest_int max = widest_int::from (r.upper_bound (),
+TYPE_SIGN (TREE_TYPE (exp0)));
+ if (max == wi::to_widest (exp1))
+   code = MIN_EXPR;
+   }
}
 }
   if (code != ERROR_MARK
diff --git a/gcc/match.pd b/gcc/match.pd
index a55ede838cd..95f7e9a6abc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4751,7 +4751,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
Case 2)
(cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).  */
-(for cmp (lt le gt ge eq)
+(for cmp (lt le gt ge eq ne)
  (simplify
   (cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
   (with
@@ -4942,7 +4942,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
    to minmax<min/max<a, b>, c>  */
 (for minmax (min max)
- (for cmp (lt le gt ge)
+ (for cmp (lt le gt ge ne)
   (simplify
(cond (cmp @1 @3) (minmax:c @1 @4) (minmax:c @2 @4))
(with
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
new file mode 100644
index 000..481c375f5f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2" } */
+
+int f(int num)
+{
+  if (num < 3) __builtin_unreachable();
+  return num != 3 ?  num : 4;
+}
+
+/* In phiopt2 with the range information, this should be turned into
+   a MAX_EXPR.  */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt2" } } */
-- 
2.31.1



Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi,

on 2023/5/4 21:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
>For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the 
> total length:
> 
>  _36 = MIN_EXPR ;
> 
>  First length (MIN (X, VF/N)):
>loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>  Second length (X - MIN (X, 1 * VF/N)):
>loop_len_16 = _36 - loop_len_15;
> 
>  Third length (X - MIN (X, 2 * VF/N)):
>_38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
>loop_len_17 = _36 - _38;
> 
>  Forth length (X - MIN (X, 3 * VF/N)):
>_39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
>loop_len_18 = _36 - _39;
> 
> The reason that I use MIN_EXPR instead of SELECT_VL to calculate total length 
> since using SELECT_VL
> to adapt induction IV consumes more instructions than just using MIN_EXPR. 
> Also, during testing,
> I found it's hard to adjust length correctly according to SELECT_VL.
> 
> So, this patch we only use SELECT_VL for single-rgroup with single length 
> control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. make requirement of get_while_len_data_ref_ptr outside, let it to be gated 
> at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf)  (WHILE_LEN) = IN_RANGE (0, min (n, 
> vf)) will make 
> the loop control counter never underflow zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is pure optimization for single-rgroup and allow
> target adjust any length = INRANGE (0, min (n, vf)) each iteration.
> 
> Let me known if I missed something for the V3 patch.
> Thanks.
> 
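[For illustration only — a hand-written C model, not compiler output: with the SELECT_VL outcome constrained to IN_RANGE (0, min (n, vf)) as described above, the decrement-IV loop control behaves like the simple MIN choice in the single-rgroup case, where only the last iteration is partial.]

```c
#include <assert.h>

/* Hypothetical model of select_vl with the simplest legal choice,
   MIN (remaining, vf); a target may return any value in
   [0, MIN (remaining, vf)] per the proposed semantics.  */
static int select_vl (int remaining, int vf)
{
  return remaining < vf ? remaining : vf;
}

/* Count iterations and total processed elements for n scalars.  */
static void run_loop (int n, int vf, int *iters, int *processed)
{
  *iters = 0;
  *processed = 0;
  int remaining = n;
  while (remaining > 0)
    {
      int vl = select_vl (remaining, vf);
      *processed += vl;   /* loop body would handle vl elements */
      remaining -= vl;    /* IV decremented by the chosen length */
      (*iters)++;
    }
}
```

With this MIN choice the counter never underflows zero, matching the stated invariant.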
> ---
>  gcc/cfgloopmanip.cc|   2 +-
>  gcc/doc/md.texi|  34 +++
>  gcc/gimple-loop-interchange.cc |   2 +-
>  gcc/internal-fn.def|   1 +
>  gcc/optabs.def |   1 +
>  gcc/tree-ssa-loop-ivcanon.cc   |   2 +-
>  gcc/tree-ssa-loop-ivopts.cc|   2 +-
>  gcc/tree-ssa-loop-manip.cc |  18 +-
>  gcc/tree-ssa-loop-manip.h  |   4 +-
>  gcc/tree-vect-data-refs.cc |   8 +-
>  gcc/tree-vect-loop-manip.cc| 374 -
>  gcc/tree-vect-loop.cc  |  32 ++-
>  gcc/tree-vect-stmts.cc |  89 +++-
>  gcc/tree-vectorizer.h  |   4 +-
>  14 files changed, 535 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 0e3ad8ed742..6e09dcbb0b1 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -826,7 +826,7 @@ create_empty_loop_on_edge (edge entry_edge,
>  }
>  
>gsi = gsi_last_bb (loop_header);
> -  create_iv (initial_value, stride, iv, loop, &gsi, false,
> +  create_iv (initial_value, PLUS_EXPR, stride, iv, loop, &gsi, false,
>iv_before, iv_after);
>  
>/* Insert loop exit condition.  */
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of active elements in vector will be updated 
> value.
> +operand 1 is the total elements need to be updated value.
> +operand 2 is the vectorization factor.
> +The value of operand 0 is target dependent and flexible in each iteration.
> +The operation of this pattern can be:
> +
> +@smallexample
> +Case 1:
> +operand0 = MIN (operand1, operand2);
> +operand2 can be const_poly_int or poly_int related to vector mode size.
> +Some target like RISC-V has a standalone instruction to get MIN (n, MODE 
> SIZE) so
> +that we can reduce a use of general purpose register.
> +
> +In this case, only the last iteration of the loop is partial iteration.
> +@end smallexample
> +
> +@smallexample
> +Case 2:
> +if (operand1 <= operand2)
> +  operand0 = operand1;
> +else if (operand1 < 2 * operand2)
> +  operand0 = IN_RANGE (ceil (operand1 / 2), operand2);
> +else
> +  operand0 = operand2;
> +
> +This case will evenly distribute work over the last 2 iterations o

Re: ping Re: [PATCH 3/3] Fortran: Fix mpz and mpfr memory leaks

2023-05-07 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 17 Apr 2023 15:18:27 -0700
Steve Kargl  wrote:
> On Mon, Apr 17, 2023 at 09:47:50PM +0200, Bernhard Reutner-Fischer via 
> Fortran wrote:
> > On Mon, 03 Apr 2023 23:42:06 +0200
> > Bernhard Reutner-Fischer  wrote:
> >   
> > > On 3 April 2023 21:50:49 CEST, Harald Anlauf  wrote:  
> > > >Hi Bernhard,
> > > >
> > > >there is neither context nor a related PR with a testcase showing
> > > >that this patch fixes issues seen there.
> > > 
> > > Yes, i forgot to mention the PR:
> > > 
> > > PR fortran/68800
> > > 
> > > I did not construct individual test cases but it should be obvious that 
> > > we should not leak these.
> > >   
> > > >
> > > >On 4/2/23 17:05, Bernhard Reutner-Fischer via Gcc-patches wrote:
> > > >> From: Bernhard Reutner-Fischer 
> > > >> 
> > > >> Cc: fort...@gcc.gnu.org
> > > >> 
> > > >> gcc/fortran/ChangeLog:
> > > >> 
> > > >>* array.cc (gfc_ref_dimen_size): Free mpz memory before ICEing.
> > > >>* expr.cc (find_array_section): Fix mpz memory leak.
> > > >>* simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
> > > >>error paths.
> > > >>(gfc_simplify_set_exponent): Fix mpfr memory leak.
> > > >> ---
> > > >>   gcc/fortran/array.cc| 3 +++
> > > >>   gcc/fortran/expr.cc | 8 
> > > >>   gcc/fortran/simplify.cc | 7 ++-
> > > >>   3 files changed, 13 insertions(+), 5 deletions(-)
> > > >> 
> > > >> diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
> > > >> index be5eb8b6a0f..8b1e816a859 100644
> > > >> --- a/gcc/fortran/array.cc
> > > >> +++ b/gcc/fortran/array.cc
> > > >> @@ -2541,6 +2541,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int 
> > > >> dimen, mpz_t *result, mpz_t *end)
> > > >> return t;
> > > >> 
> > > >>   default:
> > > >> +  mpz_clear (lower);
> > > >> +  mpz_clear (stride);
> > > >> +  mpz_clear (upper);
> > > >> gfc_internal_error ("gfc_ref_dimen_size(): Bad dimen_type");
> > > >>   }
> > > >
> > > >  What is the point of clearing variables before issuing
> > > >  a gfc_internal_error?
> > > 
> > > To make it obvious that we are aware that we allocated these.  
> 
> I must admit I agree with Harald on this portion
> of the diff.  What's the point?  There is alot more
> allocated than just those 3 mzp_t variables when the
> internal error occurs.  For example, gfortran does not
> walk the namespaces and free those before the ICE.
> I suppose silencing valgrind might be sufficient 
> reason for the clutter.  So, ok.

I've dropped this hunk and pushed the rest as r14-567-g2521390dd2f8e5
Harald fixed the leak of expr in gfc_simplify_set_exponent in the
meantime.
thanks,


Re: [PATCH][stage1] Remove conditionals around free()

2023-05-07 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 1 Mar 2023 16:07:14 -0800
Steve Kargl  wrote:

> On Wed, Mar 01, 2023 at 10:28:56PM +0100, Bernhard Reutner-Fischer via 
> Fortran wrote:
> >  libgfortran/caf/single.c |6 ++
> >  libgfortran/io/async.c   |6 ++
> >  libgfortran/io/format.c  |3 +--
> >  libgfortran/io/transfer.c|6 ++
> >  libgfortran/io/unix.c|3 +--  
> 
> The Fortran ones are OK.
> 

I've pushed the fortran and libgfortran bits as r14-568-gca2f64d5d08c16
thanks,


Re: [PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

2023-05-07 Thread Richard Biener via Gcc-patches
On Fri, May 5, 2023 at 5:18 PM Andrew Pinski via Gcc-patches
 wrote:
>
> While looking into a different issue, I noticed that it
> would take until the second forwprop pass to do some
> forward propagation, and it was because the SSA name was
> used more than once but the second statement was
> "dead" and we don't remove it until much later.
>
> So this uses simple_dce_from_worklist instead of manually
> removing the known-unused statements.
> The propagate engine does not do a cleanup_cfg afterwards either, but manually
> cleans up possible EH edges, so simple_dce_from_worklist
> needs to communicate that back to the propagate engine.
>
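[A minimal sketch of the situation being described — hypothetical user code, not from the patch: the temporary has two uses, but one feeds a dead result, so until DCE runs, forwprop sees a multi-use SSA name and cannot propagate.]

```c
#include <assert.h>

/* 't' has two uses below; the second feeds a value that is never
   consumed, so once DCE removes it, 't' becomes single-use and the
   addition can be forward-propagated into the return.  */
int g (int a)
{
  int t = a + 1;
  int dead = t * 2;   /* dead: result is never consumed */
  (void) dead;
  return t + 3;       /* after DCE this folds to a + 4 */
}
```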
> Some testcases needed to be updated/changed even because of better 
> optimization.
> gcc.dg/pr81192.c even had to be changed to be using the gimple FE so it would
> be less fragile in the future too.
> gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being 
> matched
> but in those cases, the result was not being used so both __atomic_fetch_ and
> __atomic_x_and_fetch_ are valid choices and would not make a code generation 
> difference.
> evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slightly change as the 
> removal message
> is different slightly.
> kernels-alias-8.c: ccp1 is able to remove an unused load which causes ealias 
> to have
> one less load to analysis so update the expected scan #.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/109691
> * tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
> argument.
> If the removed statement can throw, have need_eh_cleanup
> include the bb of that statement.
> * tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
> * tree-ssa-propagate.cc (struct prop_stats_d): Remove
> num_dce.
> (substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
> Initialize dceworklist instead of stmts_to_remove.
> (substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
> Destroy dceworklist instead of stmts_to_remove.
> (substitute_and_fold_dom_walker::before_dom_children):
> Set dceworklist instead of adding to stmts_to_remove.
> (substitute_and_fold_engine::substitute_and_fold):
> Call simple_dce_from_worklist instead of popping
> from the list.
> Don't update the stat on removal statements.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/evrp7.c: Update for output change.
> * gcc.dg/tree-ssa/evrp8.c: Likewise.
> * gcc.dg/tree-ssa/vrp35.c: Likewise.
> * gcc.dg/tree-ssa/vrp36.c: Likewise.
> * gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not
> to check for assignment too instead of just a call.
> * c-c++-common/goacc/kernels-alias-8.c: Update test
> for removal of load.
> * gcc.dg/pr81192.c: Rewrite testcase in gimple based test.
> ---
>  .../c-c++-common/goacc/kernels-alias-8.c  |  6 +-
>  gcc/testsuite/gcc.dg/pr81192.c| 64 ---
>  gcc/testsuite/gcc.dg/tree-ssa/evrp7.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/evrp8.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr98737-1.c |  7 +-
>  gcc/testsuite/gcc.dg/tree-ssa/vrp35.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/vrp36.c |  2 +-
>  gcc/tree-ssa-dce.cc   |  7 +-
>  gcc/tree-ssa-dce.h|  2 +-
>  gcc/tree-ssa-propagate.cc | 39 ++-
>  10 files changed, 82 insertions(+), 51 deletions(-)
>
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c 
> b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
> index 69200ccf192..c3922e33241 100644
> --- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
> @@ -16,7 +16,9 @@ foo (int *a, size_t n)
>}
>  }
>
> -/* Only the omp_data_i related loads should be annotated with cliques.  */
> -/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
> +/* Only the omp_data_i related loads should be annotated with cliques.
> +   Note ccp can remove one of the omp_data_i loads which is why there
> +   is there only one clique base still there.  */
> +/* { dg-final { scan-tree-dump-times "clique 1 base 1" 1 "ealias" } } */
>  /* { dg-final { scan-tree-dump-times "(?n)clique 1 base 0" 2 "ealias" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
> index 6cab6056558..f6d201ee71a 100644
> --- a/gcc/testsuite/gcc.dg/pr81192.c
> +++ b/gcc/testsuite/gcc.dg/pr81192.c
> @@ -1,5 +1,58 @@
> -/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp 
> -fno-tree-dse" } */
> +/* { dg-options "-Os -fgimple -fdump-tree-pre-details -fdisable-tree-evrp 
> -fno-tree-dse" } */
>
> +#if __SIZEOF_INT__ == 2
> +#define unsigned __UINT32_TY

Re: [PATCH-1, rs6000] Change mode and insn condition for scalar extract exp instruction

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/1/4 14:16, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the return type of __builtin_vsx_scalar_extract_exp
> from const signed long to const signed int, as the exponent can be put in
> a signed int. It is also inline with the external interface definition of
> the bif. The mode of exponent operand in "xsxexpdp" is changed to GPR mode
> and TARGET_64BIT check is removed, as the instruction can be executed on
> a 32-bit environment.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2022-12-23  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_exp): Set return type to const unsigned

typo, s/unsigned/signed/

>   int and set its bif-pattern to xsxexpdp_si, move it from power9-64 to
>   power9 catalog.
>   * config/rs6000/vsx.md (xsxexpdp): Rename to ...
>   (xsxexpdp_<mode>): ..., set mode of operand 0 to GPR and remove
>   TARGET_64BIT check.
>   * doc/extend.texi (scalar_extract_exp): Remove 64-bit environment
>   requirement when it has a 64-bit argument.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Deleted as the case is
>   invalid.

s/Deleted/Delete/

OK for trunk with these nits fixed, thanks!

BR,
Kewen

>   * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Remove lp64 check.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..a8f1d3f1b3d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2833,6 +2833,8 @@
>const signed int __builtin_dtstsfi_ov_td (const int<6>, _Decimal128);
>  TSTSFI_OV_TD dfptstsfi_unordered_td {}
> 
> +  const signed int  __builtin_vsx_scalar_extract_exp (double);
> +VSEEDP xsxexpdp_si {}
> 
>  [power9-64]
>void __builtin_altivec_xst_len_r (vsc, void *, long);
> @@ -2847,9 +2849,6 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_exp (double);
> -VSEEDP xsxexpdp {}
> -
>const signed long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 992fbc983be..229c26c3a61 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5089,11 +5089,11 @@ (define_insn "xsxexpqp_<mode>"
>[(set_attr "type" "vecmove")])
> 
>  ;; VSX Scalar Extract Exponent Double-Precision
> -(define_insn "xsxexpdp"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> - (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
> +(define_insn "xsxexpdp_<mode>"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> + (unspec:GPR [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR"
>"xsxexpdp %0,%x1"
>[(set_attr "type" "integer")])
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d3812fa55b0..7c087967234 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -19598,7 +19598,10 @@ bool scalar_test_neg (double source);
>  bool scalar_test_neg (__ieee128 source);
>  @end smallexample
> 
> -The @code{scalar_extract_exp} and @code{scalar_extract_sig}
> +The @code{scalar_extract_exp} with a 64-bit source argument
> +function requires an environment supporting ISA 3.0 or later.
> +The @code{scalar_extract_exp} with a 128-bit source argument
> +and @code{scalar_extract_sig}
>  functions require a 64-bit environment supporting ISA 3.0 or later.
>  The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in
>  functions return the significand and the biased exponent value
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> index 35bf1b240f3..d971833748e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> @@ -1,9 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> -/* This test should succeed only on 64-bit configurations.  */
>  #include 
> 
>  unsigned int
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-1.c
> index 9737762c1d4..1cb438f9b70 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extra

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-07 Thread Alexander Monakov via Gcc-patches
On Fri, 5 May 2023, Alexander Monakov wrote:

> > > gimple-head-export.cc does not exist.
> > > 
> > > gimple-match-exports.cc is not a generated file. It's under source 
> > > control and
> > > edited independently from genmatch.cc. It is compiled separately, 
> > > producing
> > > gimple-match-exports.o.
> > > 
> > > gimple-match-head.cc is also not a generated file, also under source 
> > > control.
> > > It is transitively included into gimple-match-N.o files. If it changes, 
> > > they will be
> > > rebuilt. This is not changed by my patch.
> > > 
> > > gimple-match-auto.h is a generated file. It depends on s-gimple-match 
> > > stamp
> > > file, which in turn depends on genmatch and match.pd. If either changes, 
> > > the
> > > rule for the stamp file triggers. gimple-match-N.o files also depend on 
> > > the
> > > stamp file, so they will be rebuilt as well.
> > 
> > s-gimple-match does not depend on gimple-match-head.cc. if it changes the 
> > stamp
> > is not invalidated. 
> 
> Right, this is correct: there's no need to rerun the recipe for the stamp,
> because contents of gimple-match-head.cc do not affect it.
> 
> > This happens to work because gimple-match-N.cc does depend on 
> > gimple-match-head.cc,
> > but if the gimple-match-N.cc already exists then nothing changes.
> 
> No, if gimple-match-N.cc already exist, make notices they are out-of-date via
> 
> $(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> 
> and this triggers rebuilding gimple-match-N.o.
> 
> I tested this. After 'touch gimple-match-head.cc' all ten gimple-match-N.o 
> files
> are rebuilt.

My explanation was incomplete here. The gcc/Makefile.in rule quoted above
applies to .cc files and does not trigger rebuilds of .o files on its own.
The reason .o files get rebuilt is implicit dependency tracking: initial
build records header dependencies in gcc/.deps/*.Po files, and incremental
rebuild sees that gimple-match-1.o depends on gimple-match-head.cc.

Alexander


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-07 Thread Richard Biener via Gcc-patches
On Fri, 5 May 2023, Jakub Jelinek wrote:

> On Fri, May 05, 2023 at 01:32:02PM -0400, Jason Merrill wrote:
> > > --- gcc/ada/gcc-interface/utils2.cc.jj2023-01-16 23:19:05.539727388 
> > > +0100
> > > +++ gcc/ada/gcc-interface/utils2.cc   2023-05-05 15:37:30.193990948 
> > > +0200
> > > @@ -3332,6 +3332,7 @@ gnat_invariant_expr (tree expr)
> > >   case IMAGPART_EXPR:
> > >   case VIEW_CONVERT_EXPR:
> > >   CASE_CONVERT:
> > > + case SAVE_EXPR:
> > 
> > I guess doing this would allow gnat_invariant_expr to handle
> > DECL_INVARIANT_P that save_expr doesn't know about.  But it seems that it
> > makes the same assumption as tree_invariant_p_1 about the pointed-to object
> > not changing:
> > 
> > > case INDIRECT_REF:
> > >   if ((!invariant_p && !TREE_READONLY (t)) || TREE_SIDE_EFFECTS 
> > > (t))
> > > return NULL_TREE;
> > 
> > I don't know if this assumption is any more valid in Ada than in C/C++.
> 
> I think we really need Eric (as one who e.g. introduced the
> DECL_INVARIANT_P apparently for this kind of stuff) to have a look at that on 
> the
> Ada side.
> 
> The question is if the posted tree.cc (smallest) patch + 3 new testcases
> + the 7 ada testsuite workarounds are ok for trunk if it passes
> bootstrap/regtest, then I'd file a PR about the Ada regression and only once
> it is dealt with would consider backporting, or if we need to wait for Eric
> before making progress.

I wonder if we should defer some of the choices to a langhook
like make the tree_invariant_p_1 a langhook invocation with the
default to call tree_invariant_p_1.  After lowering we can reset
the langhook to the default.

I think we at least need the generic save_expr to be langhook
aware during the time we fold the original non-gimplified
GENERIC IL (but no longer for the generated GENERIC
and to be gimplified IL the middle-end eventually generates
later).
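A schematic of the langhook-with-default idea (illustrative C with invented names; real GCC langhooks live in langhooks.h and are structured differently):

```c
#include <assert.h>

/* Default middle-end behaviour, standing in for tree_invariant_p_1:
   this toy version treats "const"-kind expressions as invariant.  */
static int default_invariant_p (const char *expr_kind)
{
  return expr_kind[0] == 'c';
}

/* The hook starts out pointing at the default; a front end may
   override it while the original GENERIC is being folded, and it is
   reset to the default once lowering is done.  */
static int (*lang_invariant_p) (const char *) = default_invariant_p;

/* A front end that additionally knows its own invariant decls
   (compare Ada's DECL_INVARIANT_P).  */
static int frontend_invariant_p (const char *expr_kind)
{
  return default_invariant_p (expr_kind) || expr_kind[0] == 'i';
}
```

The point of the pattern is that callers only ever go through lang_invariant_p, so the middle end need not know which front end, if any, has extended the predicate.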

Richard.


Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread juzhe.zh...@rivai.ai
Hi, Kewen.

>> Sorry for chiming in, I had some concern here.
>> We already have some handlings for partial vector in length in 
>> vect_set_loop_controls_directly
>>(actually it deals with both mask and length), the proposed 
>>vect_set_loop_controls_by_select_vl
>>for select_vl looks like a variant for partial vector in length (comparing to 
>>the normal MIN),
>>and also adopts decrement IV.  IMHO, it seems better to modify/extend the 
>>current handling in
>>vect_set_loop_controls_directly for length, or factor out the handlings for 
>>length there and
>>extend the factored one.  Otherwise, it means we have two sets of handlings 
>>for partial vector
>>in lengths, it looks bad to maintain.  As the previous discussion, adopting 
>>decrement IV is an
>>enhancement for loop control, it's good for both cases w/ or w/o select_vl.  
>>If the above
>>understanding is correct, the possible steps seem to be:
>>  - factor out the handling for length (* optional)
>>  - modify it with decrement IV
>>  - extend it with select_vl.
>>In future if some RVV vendor wants to degenerate select_vl to min, it can 
>>just adopt the same
>>handlings with min by not defining select_vl optab.
You mean like this:
doing this inside vect_set_loop_controls_directly ?
if (use_while_len_p)
  return vect_set_loop_controls_by_while_len (...)

Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-05-08 13:35
To: juzhe.zhong; rguenther; richard.sandiford
CC: gcc-patches
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by 
variable amount support
Hi,
 
on 2023/5/4 21:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
>For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the 
> total length:
> 
>  _36 = MIN_EXPR ;
> 
>  First length (MIN (X, VF/N)):
>loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>  Second length (X - MIN (X, 1 * VF/N)):
>loop_len_16 = _36 - loop_len_15;
> 
>  Third length (X - MIN (X, 2 * VF/N)):
>_38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
>loop_len_17 = _36 - _38;
> 
>  Forth length (X - MIN (X, 3 * VF/N)):
>_39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
>loop_len_18 = _36 - _39;
> 
> The reason that I use MIN_EXPR instead of SELECT_VL to calculate total length 
> since using SELECT_VL
> to adapt induction IV consumes more instructions than just using MIN_EXPR. 
> Also, during testing,
> I found it's hard to adjust length correctly according to SELECT_VL.
> 
> So, this patch we only use SELECT_VL for single-rgroup with single length 
> control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. make requirement of get_while_len_data_ref_ptr outside, let it to be gated 
> at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf)  (WHILE_LEN) = IN_RANGE (0, min (n, 
> vf)) will make 
> the loop control counter never underflow zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is pure optimization for single-rgroup and allow
> target adjust any length = INRANGE (0, min (n, vf)) each iteration.
> 
> Let me known if I missed something for the V3 patch.
> Thanks.
> 
> ---
>  gcc/cfgloopmanip.cc|   2 +-
>  gcc/doc/md.texi|  34 +++
>  gcc/gimple-loop-interchange.cc |   2 +-
>  gcc/internal-fn.def|   1 +
>  gcc/optabs.def |   1 +
>  gcc/tree-ssa-loop-ivcanon.cc   |   2 +-
>  gcc/tree-ssa-loop-ivopts.cc|   2 +-
>  gcc/tree-ssa-loop-manip.cc |  18 +-
>  gcc/tree-ssa-loop-manip.h  |   4 +-
>  gcc/tree-vect-data-refs.cc |   8 +-
>  gcc/tree-vect-loop-manip.cc| 374 -
>  gcc/tree-vect-loop.cc  |  32 ++-
>  gcc/tree-vect-stmts.cc |  89 +++-
>  gcc/tree-vectorizer.h  |   4 +-
>  14 files changed, 535 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 0e3ad8ed742..6e09dcbb0b1 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -826,7 +826,7 @@ create_empty_loop_on_edge (edge entry_edge,
>  }
>  
>gsi = gsi_last_bb (loop_he

Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Richard Biener via Gcc-patches
On Sun, 7 May 2023, Jeff Law wrote:

> 
> 
> On 5/6/23 19:55, Li, Pan2 wrote:
> > It looks like we cannot simply swap the code and mode in rtx_def, the code
> > may have to be the same bits as the tree_code in tree_base. Or we will meet
> > ICE like below.
> > 
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > 
> > static inline decl_or_value
> > dv_from_value (rtx value)
> > {
> >decl_or_value dv;
> >dv = value;
> >gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
> >return dv;
> Ugh.  We really just need to fix this code.  It assumes particular structure
> layouts and that's just wrong/dumb.

Well, it's a neat trick ... we just need to adjust it to

static inline bool
dv_is_decl_p (decl_or_value dv)
{ 
  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
}  

I think (and hope for the 'decl' case the bits inspected are never 
'VALUE').  Of course the above stinks from a TBAA perspective ...
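To make the layout assumption concrete, here is a stand-alone sketch (invented struct names, not the real rtx/tree layouts): the trick only works because both object kinds place a small code field at the same offset, which is exactly what reordering the rtx_def fields would break.

```c
#include <assert.h>
#include <stddef.h>

enum obj_code { CODE_DECL = 1, CODE_VALUE = 2 };

/* Both layouts put 'code' first, so a void * can be classified by
   peeking at that byte -- mirroring how dv_is_value_p peeks at the
   code without storing a separate discriminator.  */
struct decl_obj  { unsigned char code; int uid; };
struct value_obj { unsigned char code; long val; };

typedef void *decl_or_value;

static int dv_is_value_p (decl_or_value dv)
{
  /* Well defined only while the 'code' offsets coincide; this is the
     fragile structure-layout assumption the thread is about.  */
  return dv && *(unsigned char *) dv == CODE_VALUE;
}
```

A "real" fix would store an explicit tag next to the pointer, at the cost of extra memory per entry, which is the resource concern mentioned above for var-tracking.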

Any "real" fix would require allocating storage for a discriminator
and thus hurt the resource constrained var-tracking a lot.

Richard.


Re: [PATCH-2, rs6000] Change mode and insn condition for scalar extract sig instruction

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/1/4 14:16, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the return type of __builtin_vsx_scalar_extract_sig
> from const signed long to const signed long long, so that it can be called
> with "-m32/-mpowerpc64" option. The bif needs TARGET_POWERPC64 instead of
> TARGET_64BIT. So the condition check in the expander is changed.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

This patch is OK for trunk, thanks!

BR,
Kewen

> 
> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_sig): Set return type to const signed
>   long long.
>   * config/rs6000/vsx.md (xsxsigdp): Replace TARGET_64BIT with
>   TARGET_POWERPC64.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index a8f1d3f1b3d..25647b7bdd2 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2849,7 +2849,7 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_sig (double);
> +  const signed long long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 229c26c3a61..27e03a4cf6c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5111,7 +5111,7 @@ (define_insn "xsxsigdp"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UNSPEC_VSX_SXSIG))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsxsigdp %0,%x1"
>[(set_attr "type" "integer")])
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> index 637080652b7..d22f7d1b274 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> index f12eed3d9d5..64747d73a51 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> index c85072da138..561be53fb9b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> @@ -1,7 +1,7 @@
>  /* { dg-do run { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target p9vector_hw } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 


Re: [PATCH-3, rs6000] Change mode and insn condition for scalar insert exp instruction

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/1/4 14:17, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the mode of exponent to GPR in scalar insert exp
> pattern, as the exponent can be put into a 32-bit register. Also the
> condition check is changed from TARGET_64BIT to TARGET_POWERPC64.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 

This patch is OK for trunk, thanks!

BR,
Kewen

> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_insert_exp): Replace bif-pattern from xsiexpdp
>   to xsiexpdp_di.
>   (__builtin_vsx_scalar_insert_exp_dp): Replace bif-pattern from
>   xsiexpdpf to xsiexpdpf_di.
>   * config/rs6000/vsx.md (xsiexpdp): Rename to...
>   (xsiexpdp_): ..., set the mode of second operand to GPR and
>   replace TARGET_64BIT with TARGET_POWERPC64.
>   (xsiexpdpf): Rename to...
>   (xsiexpdpf_): ..., set the mode of second operand to GPR and
>   replace TARGET_64BIT with TARGET_POWERPC64.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-4.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 25647b7bdd2..b1b5002d7d9 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2854,10 +2854,10 @@
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
>  unsigned long long);
> -VSIEDP xsiexpdp {}
> +VSIEDP xsiexpdp_di {}
> 
>const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned long 
> long);
> -VSIEDPF xsiexpdpf {}
> +VSIEDPF xsiexpdpf_di {}
> 
>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>  XL_LEN_R xl_len_r {}
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 27e03a4cf6c..3376090cc6f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5137,22 +5137,22 @@ (define_insn "xsiexpqp_"
>[(set_attr "type" "vecmove")])
> 
>  ;; VSX Scalar Insert Exponent Double-Precision
> -(define_insn "xsiexpdp"
> +(define_insn "xsiexpdp_"
>[(set (match_operand:DF 0 "vsx_register_operand" "=wa")
>   (unspec:DF [(match_operand:DI 1 "register_operand" "r")
> - (match_operand:DI 2 "register_operand" "r")]
> + (match_operand:GPR 2 "register_operand" "r")]
>UNSPEC_VSX_SIEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsiexpdp %x0,%1,%2"
>[(set_attr "type" "fpsimple")])
> 
>  ;; VSX Scalar Insert Exponent Double-Precision Floating Point Argument
> -(define_insn "xsiexpdpf"
> +(define_insn "xsiexpdpf_"
>[(set (match_operand:DF 0 "vsx_register_operand" "=wa")
>   (unspec:DF [(match_operand:DF 1 "register_operand" "r")
> - (match_operand:DI 2 "register_operand" "r")]
> + (match_operand:GPR 2 "register_operand" "r")]
>UNSPEC_VSX_SIEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsiexpdp %x0,%1,%2"
>[(set_attr "type" "fpsimple")])
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> index d8243258a67..88d77564158 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> index 8260b107178..2f219ddc83a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target has_arch_ppc64 

Re: [PATCH-4, rs6000] Change ilp32 target check for some scalar-extract-sig and scalar-insert-exp test cases

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/1/4 14:17, HAO CHEN GUI wrote:
> Hi,
>   "ilp32" is used in these test cases to make sure test cases only run on a
> 32-bit environment. Unfortunately, these cases also run with
> "-m32/-mpowerpc64" which causes unexpected errors. This patch changes the
> target check to skip if "has_arch_ppc64" is set. So the test cases won't run
> when arch_ppc64 has been set.

I think you meant one artificial run with runtestflags like
RUNTESTFLAGS="--target_board=unix/-m32/-mpowerpc64 ...", thanks for catching,
this patch is OK for trunk.

BR,
Kewen

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Replace ilp32 check
>   with dg-skip-if has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> index 39ee74c94dc..148b5fbd9fa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> index efd69725905..956c1183beb 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> index f85966a6fdf..9a7949fb89a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 



RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Li, Pan2 via Gcc-patches
Oops. Actually I am already preparing a version along the lines you mentioned,
with separate storage allocation. Thank you Richard, I will try your suggestion
and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, May 8, 2023 2:30 PM
To: Jeff Law 
Cc: Li, Pan2 ; Kito Cheng ; 
juzhe.zh...@rivai.ai; richard.sandiford ; 
gcc-patches ; palmer ; jakub 

Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

On Sun, 7 May 2023, Jeff Law wrote:

> 
> 
> On 5/6/23 19:55, Li, Pan2 wrote:
> > It looks like we cannot simply swap the code and mode in rtx_def;
> > the code may have to occupy the same bits as the tree_code in tree_base,
> > or we hit an ICE like the one below.
> > 
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > 
> > static inline decl_or_value
> > dv_from_value (rtx value)
> > {
> >   decl_or_value dv;
> >   dv = value;
> >   gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
> >   return dv;
> Ugh.  We really just need to fix this code.  It assumes particular 
> structure layouts and that's just wrong/dumb.

Well, it's a neat trick ... we just need to adjust it to

static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
}

I think (and hope for the 'decl' case the bits inspected are never 'VALUE').  
Of course the above stinks from a TBAA perspective ...
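The discriminated-pointer trick above can be sketched in isolation. The following is a minimal illustration, not GCC's real types: a single pointer designates one of two structure kinds, and the kind is recovered by reading a code field that both layouts keep as their first member, instead of storing a separate tag.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal sketch of a discriminated pointer (not GCC's real types).
   Both structures begin with an `enum kind' field, so a single read
   through the pointer disambiguates them without extra storage --
   the same idea dv_is_decl_p relies on, including the same TBAA
   caveat when the read goes through a differently-typed pointer.  */

enum kind { KIND_DECL = 1, KIND_VALUE = 2 };

struct decl  { enum kind code; const char *name; };
struct value { enum kind code; long id; };

typedef void *decl_or_value;    /* points to either structure */

/* Reading the first member through a cast is valid C here because a
   pointer to a structure also points to its first member.  */
static enum kind
dv_kind (decl_or_value dv)
{
  return *(const enum kind *) dv;
}

static int
dv_is_decl_p (decl_or_value dv)
{
  return dv == NULL || dv_kind (dv) != KIND_VALUE;
}
```

A null pointer conventionally counts as a decl here, mirroring the `!dv` test in the adjusted predicate above.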

Any "real" fix would require allocating storage for a discriminator and thus 
hurt the resource constrained var-tracking a lot.

Richard.


Re: [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

2023-05-07 Thread Richard Biener via Gcc-patches
On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/6/23 06:57, Roger Sayle wrote:
> >
> > Following up on posts/reviews by Segher and Uros, there's some question
> > over why the middle-end's lower subreg pass emits a clobber (of a
> > multi-word register) into the instruction stream before emitting the
> > sequence of moves of the word-sized parts.  This clobber interferes
> > with (LRA) register allocation, preventing the multi-word pseudo to
> > remain in the same hard registers.  This patch eliminates this
> > (presumably superfluous) clobber and thereby improves register allocation.
> Those clobbers used to help dataflow analysis know that a multi-word
> register was fully assigned by a subsequent sequence.  I suspect they
> haven't been terribly useful in quite a while.

Likely - maybe they still make a difference for some targets though.
It might be interesting to see whether combining the clobber with the
first set or making the set a multi-set with a parallel would be any
better?

>
>
> >
> > A concrete example of the observed improvement is PR target/43644.
> > For the test case:
> > __int128 foo(__int128 x, __int128 y) { return x+y; }
> >
> > on x86_64-pc-linux-gnu, gcc -O2 currently generates:
> >
> > foo:    movq    %rsi, %rax
> >         movq    %rdi, %r8
> >         movq    %rax, %rdi
> >         movq    %rdx, %rax
> >         movq    %rcx, %rdx
> >         addq    %r8, %rax
> >         adcq    %rdi, %rdx
> >         ret
> >
> > with this patch, we now generate the much improved:
> >
> > foo:    movq    %rdx, %rax
> >         movq    %rcx, %rdx
> >         addq    %rdi, %rax
> >         adcq    %rsi, %rdx
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32} with
> > no new failures.  OK for mainline?
> >
> >
> > 2023-05-06  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR target/43644
> >  * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
> >  immediately before moving a multi-word register by parts.
> >
> > gcc/testsuite/ChangeLog
> >  PR target/43644
> >  * gcc.target/i386/pr43644.c: New test case.
> OK for the trunk.  I won't be at all surprised to see fallout in the
> various target tests.  We can fault in fixes as needed.  More
> importantly I think we want as much soak time for this change as we can
> in case there are unexpected consequences.
>
> jeff
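At the word level, the improved sequence quoted above is a plain add of the low halves followed by an add-with-carry of the high halves. A small sketch of that decomposition (illustrative only; this is not literally what resolve_simple_move emits):

```c
#include <assert.h>
#include <stdint.h>

/* Word-level view of the __int128 addition on a 64-bit target:
   one add (addq) plus one add-with-carry (adcq).  Illustrative only. */

typedef struct { uint64_t lo, hi; } u128;

static u128
add_u128 (u128 x, u128 y)
{
  u128 r;
  r.lo = x.lo + y.lo;            /* addq: add the low words */
  uint64_t carry = r.lo < x.lo;  /* carry out of the low word */
  r.hi = x.hi + y.hi + carry;    /* adcq: high words plus carry */
  return r;
}
```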


Re: [PATCH 3/3] PHIOPT: factor out unary operations instead of just conversions

2023-05-07 Thread Richard Biener via Gcc-patches
On Sun, May 7, 2023 at 6:44 AM Andrew Pinski via Gcc-patches
 wrote:
>
> After using factor_out_conditional_conversion with diamond bb,
> we should be able to use it for all normal unary gimple operations, not
> just conversions. This allows PR 59424 to be optimized, for example.
> This is also a start to optimize PR 64700 and a few others.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
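For the bitwise-not case in the new minmax-19 testcase below, the equivalence the factoring relies on can be sketched at the source level (the helper names here are illustrative, not part of the patch):

```c
#include <assert.h>

/* Source-level picture of factoring a common unary operation out of
   both arms of a conditional.  The two functions are equivalent since
   ~ is applied to whichever operand the condition selects either way.
   Helper names are illustrative only.  */

static int
branchy (int x, int y)
{
  return (x > y) ? ~x : ~y;
}

static int
factored (int x, int y)
{
  int m = (x > y) ? x : y;  /* becomes a MAX_EXPR */
  return ~m;                /* a single ~ on the PHI result */
}
```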

> An example of this is:
> ```
> static inline unsigned long long g(int t)
> {
>   unsigned t1 = t;
>   return t1;
> }
> static int abs1(int a)
> {
>   if (a < 0)
> a = -a;
>   return a;
> }
> unsigned long long f(int c, int d, int e)
> {
>   unsigned long long t;
>   if (d > e)
> t = g(abs1(d));
>   else
> t = g(abs1(e));
>   return t;
> }
> ```
>
> Which should be optimized to:
>   _9 = MAX_EXPR ;
>   _4 = ABS_EXPR <_9>;
>   t_3 = (long long unsigned intD.16) _4;
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to 
> ...
> (factor_out_conditional_operation): This and add support for all unary
> operations.
> (pass_phiopt::execute): Update call to 
> factor_out_conditional_conversion
> to call factor_out_conditional_operation instead.
>
> PR tree-optimization/109424
> PR tree-optimization/59424
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/abs-2.c: Update tree scan for
> details change in wording.
> * gcc.dg/tree-ssa/minmax-17.c: Likewise.
> * gcc.dg/tree-ssa/pr103771.c: Likewise.
> * gcc.dg/tree-ssa/minmax-18.c: New test.
> * gcc.dg/tree-ssa/minmax-19.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/abs-2.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c | 27 +++
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c | 10 
>  gcc/testsuite/gcc.dg/tree-ssa/pr103771.c  |  2 +-
>  gcc/tree-ssa-phiopt.cc| 56 +--
>  6 files changed, 71 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> index 328b1802541..f8bbeb43237 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> @@ -16,5 +16,5 @@ test_abs(int *cur)
>
>  /* We should figure out that test_abs has an ABS_EXPR in it. */
>  /* { dg-final { scan-tree-dump " = ABS_EXPR" "phiopt1"} } */
> -/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 
> 1 "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 
> 1 "phiopt1"} } */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
> index bd737e6b4cb..7c76cfc62a9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
> @@ -18,4 +18,4 @@ unsigned long long test_max(int c, int d, int e)
>
>  /* We should figure out that test_max has an MAX_EXPR in it. */
>  /* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
> -/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 
> 2 "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 
> 2 "phiopt1"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
> new file mode 100644
> index 000..c8e1670f64a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1-details" } */
> +
> +static inline unsigned long long g(int t)
> +{
> +  unsigned t1 = t;
> +  return t1;
> +}
> +static inline int abs1(int a)
> +{
> +  if (a < 0)
> +a = -a;
> +  return a;
> +}
> +unsigned long long f(int c, int d, int e)
> +{
> +  unsigned long long t;
> +  if (d > e)
> +t = g(abs1(d));
> +  else
> +t = g(abs1(e));
> +  return t;
> +}
> +
> +/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times " = ABS_EXPR" 2 "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 
> 3 "phiopt1"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c
> new file mode 100644
> index 000..5ed55fe2e23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c
> @@ -0,0 +1,10 @@
> +/* PR tree-optimization/109424 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1-details" } */
> +
> +int f2(int x, int y)
> +{
> +return (x > y) ? ~x : ~y;
> +}
> +/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 
> 1 "phiopt1"} } */
> diff --git a/gcc/tests

Re: [PATCH 2/3] PHIOPT: Loop over calling factor_out_conditional_conversion

2023-05-07 Thread Richard Biener via Gcc-patches
On Sun, May 7, 2023 at 6:45 AM Andrew Pinski via Gcc-patches
 wrote:
>
> After adding diamond shaped bb support to factor_out_conditional_conversion,
> we can get a case where two conversions need to be factored out,
> after which another phiopt opportunity appears.
> An example is:
> ```
> static inline unsigned long long g(int t)
> {
>   unsigned t1 = t;
>   return t1;
> }
> unsigned long long f(int c, int d, int e)
> {
>   unsigned long long t;
>   if (c > d)
> t = g(c);
>   else
> t = g(d);
>   return t;
> }
> ```
> In this case we should get a MAX_EXPR in phiopt1 with two casts.
> Before this patch, we would just factor out the outer cast and then
> wait till phiopt2 to factor out the inner cast.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.
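The two-cast case in the f()/g() example quoted above can be spelled out at the source level. A hedged sketch (helper names illustrative) of the code before and after both iterations have factored a cast:

```c
#include <assert.h>

/* Sketch of the two-cast case.  The first call to
   factor_out_conditional_conversion pulls the outer (unsigned long
   long) cast past the PHI; looping lets a second call pull the inner
   (unsigned) cast too, leaving one MAX_EXPR followed by two casts.  */

static unsigned long long
branchy (int c, int d)
{
  unsigned long long t;
  if (c > d)
    t = (unsigned long long) (unsigned) c;
  else
    t = (unsigned long long) (unsigned) d;
  return t;
}

static unsigned long long
factored (int c, int d)
{
  int m = c > d ? c : d;                     /* MAX_EXPR */
  return (unsigned long long) (unsigned) m;  /* both casts factored */
}
```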

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
> over factor_out_conditional_conversion.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/minmax-17.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c | 21 ++
>  gcc/tree-ssa-phiopt.cc| 27 +--
>  2 files changed, 36 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
> new file mode 100644
> index 000..bd737e6b4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1-details" } */
> +
> +static inline unsigned long long g(int t)
> +{
> +  unsigned t1 = t;
> +  return t1;
> +}
> +unsigned long long test_max(int c, int d, int e)
> +{
> +  unsigned long long t;
> +  if (c > d)
> +t = g(c);
> +  else
> +t = g(d);
> +  return t;
> +}
> +
> +/* We should figure out that test_max has an MAX_EXPR in it. */
> +/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 
> 2 "phiopt1"} } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 41fea78dc8d..7fe088b13ff 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -4085,20 +4085,23 @@ pass_phiopt::execute (function *)
>   node.  */
>gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
>
> -  gphi *newphi;
>if (single_pred_p (bb1)
> - && EDGE_COUNT (merge->preds) == 2
> - && (newphi = factor_out_conditional_conversion (e1, e2, phi,
> - arg0, arg1,
> - cond_stmt)))
> + && EDGE_COUNT (merge->preds) == 2)
> {
> - phi = newphi;
> - /* factor_out_conditional_conversion may create a new PHI in
> -BB2 and eliminate an existing PHI in BB2.  Recompute values
> -that may be affected by that change.  */
> - arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
> - arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> - gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
> + gphi *newphi = phi;
> + while (newphi)
> +   {
> + phi = newphi;
> + /* factor_out_conditional_conversion may create a new PHI in
> +BB2 and eliminate an existing PHI in BB2.  Recompute values
> +that may be affected by that change.  */
> + arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
> + arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
> + newphi = factor_out_conditional_conversion (e1, e2, phi,
> + arg0, arg1,
> + cond_stmt);
> +   }
> }
>
>/* Do the replacement of conditional if it can be done.  */
> --
> 2.31.1
>


Re: [PATCH 1/3] PHIOPT: Add diamond bb form to factor_out_conditional_conversion

2023-05-07 Thread Richard Biener via Gcc-patches
On Sun, May 7, 2023 at 7:05 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So the function factor_out_conditional_conversion already supports
> diamond shaped bb forms; it just needs to be called for them.
>
> harden-cond-comp.c needed to be changed as we now optimize out the
> conversion, which means the compare hardening no longer needs to
> split the block the test was checking. So change the test so that
> there is no chance of the optimization firing.
>
> Also add two testcases that show the improvement. PR 103771 was
> previously solved in ifconvert for the vectorizer, but now it is
> solved in a general sense.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

So what does it do for the diamond case?  Factor out _common_ operations
only or also promote conditionally executed operations to unconditionally
executed?  Apart from cost considerations (with the looping patch any number
might end up promoted) there's correctness issues for negation (signed overflow)
and for non-call exceptions there's possible external throws.

If it only factors out common operations the patch is OK (but the testcases
suggest that's not the case).  Otherwise we should add a limit as to how many
ops we hoist (maybe one for the diamond case?).  Thinking over it the same
issue of course exists for the non-diamond case?
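The negation concern can be made concrete. A hedged sketch (not from the patch) of why promoting a conditionally executed negation to unconditional execution can introduce signed overflow, while factoring an operation common to both arms cannot:

```c
#include <assert.h>
#include <limits.h>

/* In guarded(), -a only executes when a != INT_MIN, so it can never
   overflow.  Hoisting the negation above the branch, e.g.
       int neg = -a;  if (a != INT_MIN) return neg;  return 0;
   would evaluate -INT_MIN (signed overflow, undefined behavior) on
   an input where the original never negated.  An operation that
   appears in *both* arms already executes on every path, so factoring
   it out adds no new executions.  */

static int
guarded (int a)
{
  if (a != INT_MIN)
    return -a;
  return 0;
}
```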

Richard.

> PR tree-optimization/49959
> PR tree-optimization/103771
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (pass_phiopt::execute): Support
> Diamond shaped bb form for factor_out_conditional_conversion.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/torture/harden-cond-comp.c: Change testcase
> slightly to avoid the new phiopt optimization.
> * gcc.dg/tree-ssa/abs-2.c: New test.
> * gcc.dg/tree-ssa/pr103771.c: New test.
> ---
>  .../c-c++-common/torture/harden-cond-comp.c   |  6 +++---
>  gcc/testsuite/gcc.dg/tree-ssa/abs-2.c | 20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr103771.c  | 18 +
>  gcc/tree-ssa-phiopt.cc|  2 +-
>  4 files changed, 42 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
>
> diff --git a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c 
> b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> index 5aad890a1d3..dcf364ee993 100644
> --- a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> +++ b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> @@ -1,11 +1,11 @@
>  /* { dg-do compile } */
>  /* { dg-options "-fharden-conditional-branches -fharden-compares 
> -fdump-tree-hardcbr -fdump-tree-hardcmp -ffat-lto-objects" } */
>
> -int f(int i, int j) {
> +int f(int i, int j, int k, int l) {
>if (i == 0)
> -return j != 0;
> +return (j != 0) + l;
>else
> -return i * j != 0;
> +return (i * j != 0) * k;
>  }
>
>  /* { dg-final { scan-tree-dump-times "Splitting edge" 2 "hardcbr" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> new file mode 100644
> index 000..328b1802541
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/49959 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1-details" } */
> +
> +#define ABS(X)(((X)>0)?(X):-(X))
> +unsigned long
> +test_abs(int *cur)
> +{
> +  unsigned long sad = 0;
> +  if (cur[0] > 0)
> +sad = cur[0];
> +  else
> +sad = -cur[0];
> +  return sad;
> +}
> +
> +/* We should figure out that test_abs has an ABS_EXPR in it. */
> +/* { dg-final { scan-tree-dump " = ABS_EXPR" "phiopt1"} } */
> +/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 
> 1 "phiopt1"} } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
> new file mode 100644
> index 000..97c9db846cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-phiopt1-details" } */
> +/* { dg-final { scan-tree-dump-times "changed to factor conversion out from 
> COND_EXPR." 1 "phiopt1" } } */
> +
> +typedef unsigned char uint8_t;
> +
> +static uint8_t x264_clip_uint8 (int x)
> +{
> +  return x & (~255) ? (-x) >> 31 : x;
> +}
> +
> +void
> +mc_weight (uint8_t* __restrict dst, uint8_t* __restrict src,
> +  int i_width,int i_scale)
> +{
> +  for(int x = 0; x < i_width; x++)
> +dst[x] = x264_clip_uint8 (src[x] * i_scale);
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index f14b7e8b7e6..41fea78dc8d 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -4087,7 +4087,7 @@ pass_phiopt::execute (function *)
>
>gphi *newphi;
>if (single_pred_p (bb1)
> - && !diamond_p
> + && EDGE_COUNT (merge->preds) == 2
>   &

Re: [PATCH 1/3] PHIOPT: Add diamond bb form to factor_out_conditional_conversion

2023-05-07 Thread Richard Biener via Gcc-patches
On Mon, May 8, 2023 at 8:51 AM Richard Biener
 wrote:
>
> On Sun, May 7, 2023 at 7:05 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > So the function factor_out_conditional_conversion already supports
> > diamond shaped bb forms; it just needs to be called for them.
> >
> > harden-cond-comp.c needed to be changed as we now optimize out the
> > conversion, which means the compare hardening no longer needs to
> > split the block the test was checking. So change the test so that
> > there is no chance of the optimization firing.
> >
> > Also add two testcases that show the improvement. PR 103771 was
> > previously solved in ifconvert for the vectorizer, but now it is
> > solved in a general sense.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> So what does it do for the diamond case?  Factor out _common_ operations
> only or also promote conditionally executed operations to unconditionally
> executed?  Apart from cost considerations (with the looping patch any number
> might end up promoted) there's correctness issues for negation (signed 
> overflow)
> and for non-call exceptions there's possible external throws.
>
> If it only factors out common operations the patch is OK (but the testcases
> suggest that's not the case).  Otherwise we should add a limit as to how many
> ops we hoist (maybe one for the diamond case?).  Thinking over it the same
> issue of course exists for the non-diamond case?

Oh, and we hoist the stuff without being sure the final PHI will be if-converted
in the end, correct?  Can we somehow improve on that, aka tentatively
if-convert the PHI first?

Richard.

> Richard.
>
> > PR tree-optimization/49959
> > PR tree-optimization/103771
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-phiopt.cc (pass_phiopt::execute): Support
> > Diamond shaped bb form for factor_out_conditional_conversion.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * c-c++-common/torture/harden-cond-comp.c: Change testcase
> > slightly to avoid the new phiopt optimization.
> > * gcc.dg/tree-ssa/abs-2.c: New test.
> > * gcc.dg/tree-ssa/pr103771.c: New test.
> > ---
> >  .../c-c++-common/torture/harden-cond-comp.c   |  6 +++---
> >  gcc/testsuite/gcc.dg/tree-ssa/abs-2.c | 20 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr103771.c  | 18 +
> >  gcc/tree-ssa-phiopt.cc|  2 +-
> >  4 files changed, 42 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
> >
> > diff --git a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c 
> > b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> > index 5aad890a1d3..dcf364ee993 100644
> > --- a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> > +++ b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> > @@ -1,11 +1,11 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-fharden-conditional-branches -fharden-compares 
> > -fdump-tree-hardcbr -fdump-tree-hardcmp -ffat-lto-objects" } */
> >
> > -int f(int i, int j) {
> > +int f(int i, int j, int k, int l) {
> >if (i == 0)
> > -return j != 0;
> > +return (j != 0) + l;
> >else
> > -return i * j != 0;
> > +return (i * j != 0) * k;
> >  }
> >
> >  /* { dg-final { scan-tree-dump-times "Splitting edge" 2 "hardcbr" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> > new file mode 100644
> > index 000..328b1802541
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
> > @@ -0,0 +1,20 @@
> > +/* PR tree-optimization/49959 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-phiopt1-details" } */
> > +
> > +#define ABS(X)(((X)>0)?(X):-(X))
> > +unsigned long
> > +test_abs(int *cur)
> > +{
> > +  unsigned long sad = 0;
> > +  if (cur[0] > 0)
> > +sad = cur[0];
> > +  else
> > +sad = -cur[0];
> > +  return sad;
> > +}
> > +
> > +/* We should figure out that test_abs has an ABS_EXPR in it. */
> > +/* { dg-final { scan-tree-dump " = ABS_EXPR" "phiopt1"} } */
> > +/* { dg-final { scan-tree-dump-times "changed to factor conversion out 
> > from" 1 "phiopt1"} } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
> > new file mode 100644
> > index 000..97c9db846cb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-phiopt1-details" } */
> > +/* { dg-final { scan-tree-dump-times "changed to factor conversion out 
> > from COND_EXPR." 1 "phiopt1" } } */
> > +
> > +typedef unsigned char uint8_t;
> > +
> > +static uint8_t x264_clip_uint8 (int x)
> > +{
> > +  return x & (~255) ? (-x) >> 31 : x;
> > +}
> > +
> > +void
> > +mc_weight (uint8_t* __restrict dst, uint8_t* __restrict src,
> > +