[PATCH RESEND] libatomic: drop redundant all-multi command

2023-07-31 Thread Jan Beulich via Gcc-patches
./multilib.am already specifies this same command, and make warns about
the earlier one being ignored when seeing the later one. All that needs
retaining to still satisfy the preceding comment is the extra
dependency.

libatomic/

* Makefile.am (all-multi): Drop commands.
* Makefile.in: Update accordingly.
---
While originally sent over a year ago and pinged subsequently, I can't
quite view changes like this as "trivial" ...

--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -149,12 +149,11 @@ endif
 libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
 libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
 
-# Override the automake generated all-multi rule to guarantee that all-multi
+# Amend the automake generated all-multi rule to guarantee that all-multi
 # is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
 # makefile fragments to avoid broken *.Ppo getting included into the Makefile
 # when it is reloaded during the build of all-multi.
 all-multi: $(libatomic_la_LIBADD)
-   $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
 
 # target overrides
 -include $(tmake_file)
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -892,12 +892,11 @@ vpath % $(strip $(search_path))
 %_.lo: Makefile
$(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC)
 
-# Override the automake generated all-multi rule to guarantee that all-multi
+# Amend the automake generated all-multi rule to guarantee that all-multi
 # is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
 # makefile fragments to avoid broken *.Ppo getting included into the Makefile
 # when it is reloaded during the build of all-multi.
 all-multi: $(libatomic_la_LIBADD)
-   $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
 
 # target overrides
 -include $(tmake_file)


[PATCH] x86: fold two of vec_dupv2df's alternatives

2023-07-31 Thread Jan Beulich via Gcc-patches
By using Yvm as the source constraint, the middle two alternatives can be expressed as one.

gcc/

* sse.md (vec_dupv2df): Fold the middle two of the
alternatives.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13784,21 +13784,20 @@
(set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
 
 (define_insn "vec_dupv2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,v")
+  [(set (match_operand:V2DF 0 "register_operand" "=x,v,v")
(vec_duplicate:V2DF
- (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
+ (match_operand:DF 1 "nonimmediate_operand" "0,Yvm,vm")))]
   "TARGET_SSE2"
   "@
unpcklpd\t%0, %0
%vmovddup\t{%1, %0|%0, %1}
-   vmovddup\t{%1, %0|%0, %1}
vbroadcastsd\t{%1, }%g0{|, %1}"
-  [(set_attr "isa" "noavx,sse3,avx512vl,*")
-   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
-   (set_attr "prefix" "orig,maybe_vex,evex,evex")
-   (set_attr "mode" "V2DF,DF,DF,V8DF")
+  [(set_attr "isa" "noavx,sse3,*")
+   (set_attr "type" "sselog1,ssemov,ssemov")
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "mode" "V2DF,DF,V8DF")
(set (attr "enabled")
-   (cond [(eq_attr "alternative" "3")
+   (cond [(eq_attr "alternative" "2")
 (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
  && !TARGET_PREFER_AVX256")
   (match_test "")


[PATCH] Adjust testcase for more optimal codegen.

2023-07-31 Thread liuhongt via Gcc-patches
After
b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb is the first bad commit
commit b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb
Author: Jan Hubicka 
Date:   Fri Jul 28 09:16:09 2023 +0200

loop-split improvements, part 1

Now we have
vpbroadcastd %ecx, %xmm0
vpaddd .LC3(%rip), %xmm0, %xmm0
vpextrd $3, %xmm0, %eax
vmovddup %xmm3, %xmm0
vrndscalepd $9, %xmm0, %xmm0
vunpckhpd %xmm0, %xmm0, %xmm3

For vrndscalepd there is no need to insert pxor, since it reuses the input
register xmm0 and thus avoids the partial SSE dependence.
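For readers unfamiliar with the issue, a minimal sketch (my illustration,
modeled on the pr87007 testcases below):

```
extern double d1, d3;

void
foo (void)
{
  /* With -O2 -march=skylake-avx512 -mfpmath=sse, ceil lowers to
     vrndscalepd.  When the instruction's source operand is the same
     register it writes, no vxorps/pxor needs to be emitted beforehand
     to break a false dependence on the destination's old contents.  */
  d1 = __builtin_ceil (d3);
}
```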

Pushed to trunk.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr87007-4.c: Adjust testcase.
* gcc.target/i386/pr87007-5.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/pr87007-4.c | 6 +++---
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c 
b/gcc/testsuite/gcc.target/i386/pr87007-4.c
index e91bdcbac44..23b5c5dcc52 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
-
+/* { dg-options "-O2 -march=skylake-avx512 -mfpmath=sse" } */
+/* Load of d2/d3 is hoisted out, vrndscalesd will reuse the loaded register to avoid partial dependence.  */
 
 #include<math.h>
 
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = ceil (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index 20d13cf650b..b0b0a7b70ef 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
-
+/* { dg-options "-O2 -march=skylake-avx512 -mfpmath=sse" } */
+/* Load of d2/d3 is hoisted out, vrndscalesd will reuse the loaded register to avoid partial dependence.  */
 
 #include<math.h>
 
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = sqrt (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
-- 
2.39.1.388.g2fc9e9ca3c



[PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-31 Thread Surya Kumari Jangala via Gcc-patches
Ping

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:
> The improve_allocation() routine does not update the
> allocated_hardreg_p[] array after an allocno is assigned a register.
> 
> If the register chosen in improve_allocation() is one that already has
> been assigned to a conflicting allocno, then allocated_hardreg_p[]
> already has the corresponding bit set to TRUE, so nothing needs to be
> done.
> 
> But improve_allocation() can also choose a register that has not been
> assigned to a conflicting allocno, and also has not been assigned to any
> other allocno. In this case, allocated_hardreg_p[] has to be updated.
> 
> 2023-07-21  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/110254
>   * ira-color.cc (improve_allocation): Update allocated_hardreg_p[]
>   array.
> ---
> 
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 1fb2958bddd..5807d6d26f6 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -3340,6 +3340,10 @@ improve_allocation (void)
>   }
>/* Assign the best chosen hard register to A.  */
>ALLOCNO_HARD_REGNO (a) = best;
> +
> +  for (j = nregs - 1; j >= 0; j--)
> + allocated_hardreg_p[best + j] = true;
> +
>if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
>   fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
>best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


Re: [RFC v2] RISC-V: Add Ztso atomic mappings

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/17/23 15:28, Patrick O'Neill wrote:

The RISC-V Ztso extension currently has no effect on generated code.
With the additional ordering constraints guaranteed by Ztso, we can emit
more optimized atomic mappings than the RVWMO mappings.

This patch defines a strengthened version of Andrea Parri's proposed Ztso mappings 
("Proposed Mapping") [1]. The changes were discussed by Andrea Parri and Hans 
Boehm on the GCC mailing list and are required in order to be compatible with the RVWMO 
ABI [2].

This change corresponds to the Ztso psABI proposal[3].

[1] https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
[2] https://inbox.sourceware.org/gcc-patches/ZFV8pNAstwrF2qBb@andrea/T/#t
[3] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391
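As a concrete (hedged) illustration of what the new mappings buy, consider
a plain acquire load; the snippet and comments are mine, not part of the
patch:

```
#include <atomic>

std::atomic<int> flag;

int
load_acquire ()
{
  // Under the RVWMO mappings an acquire load needs an explicit
  // annotation/fence; under Ztso the hardware already guarantees
  // TSO ordering, so a plain load is sufficient.
  return flag.load (std::memory_order_acquire);
}
```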

gcc/ChangeLog:

2023-07-17  Patrick O'Neill  

* common/config/riscv/riscv-common.cc: Add Ztso and mark Ztso as
dependent on 'a' extension.
* config/riscv/riscv-opts.h (MASK_ZTSO): New mask.
(TARGET_ZTSO): New target.
* config/riscv/riscv.cc (riscv_memmodel_needs_amo_acquire): Add
Ztso case.
(riscv_memmodel_needs_amo_release): Add Ztso case.
(riscv_print_operand): Add Ztso case for LR/SC annotations.
* config/riscv/riscv.md: Import sync-rvwmo.md and sync-ztso.md.
* config/riscv/riscv.opt: Add Ztso target variable.
* config/riscv/sync.md (mem_thread_fence_1): Expand to RVWMO or
Ztso specific insn.
(atomic_load): Expand to RVWMO or Ztso specific insn.
(atomic_store): Expand to RVWMO or Ztso specific insn.
* config/riscv/sync-rvwmo.md: New file. Separate out RVWMO
specific load/store/fence mappings.
* config/riscv/sync-ztso.md: New file. Separate out Ztso
specific load/store/fence mappings.

gcc/testsuite/ChangeLog:

2023-07-17  Patrick O'Neill  

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-1.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-2.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-3.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-4.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-5.c: New test.
* gcc.target/riscv/amo-table-ztso-load-1.c: New test.
* gcc.target/riscv/amo-table-ztso-load-2.c: New test.
* gcc.target/riscv/amo-table-ztso-load-3.c: New test.
* gcc.target/riscv/amo-table-ztso-store-1.c: New test.
* gcc.target/riscv/amo-table-ztso-store-2.c: New test.
* gcc.target/riscv/amo-table-ztso-store-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---





diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 195f0019e06..432d4389985 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4483,6 +4483,10 @@ riscv_union_memmodels (enum memmodel model1, enum 
memmodel model2)
  static bool
  riscv_memmodel_needs_amo_acquire (enum memmodel model)
  {
+  /* ZTSO amo mappings require no annotations.  */
+  if (TARGET_ZTSO)
+return false;

Formatting nit.  Should be indented two spaces from the open curly brace.


+
switch (model)
  {
case MEMMODEL_ACQ_REL:
@@ -4506,6 +4510,10 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
  static bool
  riscv_memmodel_needs_amo_release (enum memmodel model)
  {
+  /* ZTSO amo mappings require no annotations.  */
+  if (TARGET_ZTSO)
+return false;

Likewise.






+
+(define_insn "mem_thread_fence_rvwmo"
+  [(set (match_operand:BLK 0 "" "")
+   (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+   (match_operand:SI 1 "const_int_operand" "")]  ;; model
Just another formatting nit.  The (unspec... should line up with the 
preceding (match_operand..


Similarly for the other new patterns/expanders you've 

Re: [r14-2834 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 1 on Linux/x86_64

2023-07-31 Thread Hongtao Liu via Gcc-patches
On Sat, Jul 29, 2023 at 11:55 AM haochen.jiang via Gcc-regression
 wrote:
>
> On Linux/x86_64,
>
> b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb is the first bad commit
> commit b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb
> Author: Jan Hubicka 
> Date:   Fri Jul 28 09:16:09 2023 +0200
>
> loop-split improvements, part 1
>
> caused
>
> FAIL: gcc.target/i386/pr87007-4.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 
> 1
> FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 
> 1
>
> with GCC configured with
I'll adjust the testcase for this one.
Now we have
vpbroadcastd %ecx, %xmm0
vpaddd .LC3(%rip), %xmm0, %xmm0
vpextrd $3, %xmm0, %eax
vmovddup %xmm3, %xmm0
vrndscalepd $9, %xmm0, %xmm0
vunpckhpd %xmm0, %xmm0, %xmm3

For vrndscalepd there is no need to insert pxor, since it reuses the input
operand xmm0, which is loaded from memory.
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2834/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m32\ 
> -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m64\ 
> -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ 
> -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ 
> -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact 
> me at haochen dot jiang at intel.com.)
> (If you met problems with cascadelake related, disabling AVX512F in command 
> line might save that.)
> (However, please make sure that there is no potential problems with AVX512.)



-- 
BR,
Hongtao


[PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-07-31 Thread Lewis Hyatt via Gcc-patches
`#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
when running gcc -E or gcc -save-temps). As noted in the PR, this means that
if the target pragma defines any macros, those macros are not effective in
preprocess-only mode. Similarly, such macros are not effective when
compiling with C++ (even when compiling without -save-temps), because C++
does not process the pragma until after all tokens have been obtained from
libcpp, at which point it is too late for macro expansion to take place.
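A minimal reproducer in the spirit of the PR (my sketch, not one of the
new testcases verbatim):

```
#pragma GCC push_options
#pragma GCC target ("avx2")
/* Before this patch, running gcc -E left __AVX2__ undefined here,
   because the pragma's macro side effects were skipped in
   preprocess-only mode.  */
#ifdef __AVX2__
void have_avx2 (void);
#endif
#pragma GCC pop_options
```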

Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
under these conditions as well, so resolve the PR by using the new "early
pragma" support.

toplev.cc required some changes because the target-specific handlers for
`#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
that function to be called in preprocess-only mode.

I added some additional testcases from the PR for x86. The other targets
that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
already had tests verifying that the pragma sets macros as expected; here I
have added -save-temps to some of them, to test that it now works in
preprocess-only mode as well.

gcc/c-family/ChangeLog:

PR preprocessor/87299
* c-pragma.cc (init_pragma): Register `#pragma GCC target' and
related pragmas in preprocess-only mode, and enable early handling.
(c_reset_target_pragmas): New function refactoring code from...
(handle_pragma_reset_options): ...here.
* c-pragma.h (c_reset_target_pragmas): Declare.

gcc/cp/ChangeLog:

PR preprocessor/87299
* parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
after preprocessing is complete, before starting compilation.

gcc/ChangeLog:

PR preprocessor/87299
* toplev.cc (no_backend): New static global.
(finalize): Remove argument no_backend, which is now a
static global.
(process_options): Likewise.
(do_compile): Likewise.
(target_reinit): Don't do anything in preprocess-only mode.
(toplev::main): Adapt to no_backend change.
(toplev::finalize): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/87299
* c-c++-common/pragma-target-1.c: New test.
* c-c++-common/pragma-target-2.c: New test.
* g++.target/i386/pr87299-1.C: New test.
* g++.target/i386/pr87299-2.C: New test.
* gcc.target/i386/pr87299-1.c: New test.
* gcc.target/i386/pr87299-2.c: New test.
* gcc.target/s390/target-attribute/tattr-2.c: Add -save-temps to the
options, to test preprocess-only mode as well.
* gcc.target/aarch64/pragma_cpp_predefs_1.c: Likewise.
* gcc.target/arm/pragma_arch_attribute.c: Likewise.
* gcc.target/nios2/custom-fp-2.c: Likewise.
* gcc.target/powerpc/float128-3.c: Likewise.
---

Notes:
Hello-

This patch fixes the PR by enabling early pragma handling for `#pragma GCC
target' and related pragmas such as `#pragma GCC push_options'. I did not
need to touch any target-specific code, however I did need to make a change
to toplev.cc, affecting all targets, to make it safe to call target_reinit()
in preprocess-only mode. (Otherwise, it would be necessary to modify the
implementation of target pragmas in every target, to avoid this code path.)
That was the only complication I ran into.

Regarding testing, I did: (thanks to GCC compile farm for the non-x86
targets)

bootstrap + regtest all languages - x86_64-pc-linux-gnu
bootstrap + regtest c/c++ - powerpc64le-unknown-linux-gnu,
aarch64-unknown-linux-gnu

The following backends also implement this pragma so ought to be tested:
arm
nios2
s390

I am not able to test those directly. I did add coverage to their testsuites
(basically, adding -save-temps to any existing test, causes it to test the
pragma in preprocess-only mode.) Then, I verified on x86_64 with a cross
compiler, that the modified testcases fail before the patch and pass
afterwards. nios2 is an exception, it does not set any libcpp macros when
handling the pragma, so there is nothing to test, but I did verify that
processing the pragma in preprocess-only mode does not cause any problems.
The cross compilers tested were targets arm-unknown-linux-gnueabi,
nios2-unknown-linux, and s390-ibm-linux.

Please let me know if it looks OK? Thanks!

-Lewis

 gcc/c-family/c-pragma.cc  | 49 ---
 gcc/c-family/c-pragma.h   |  2 +-
 gcc/cp/parser.cc  |  6 +++
 gcc/testsuite/c-c++-common/pragma-target-1.c  | 19 +++
 gcc/testsuite/c-c++-common/pragma-target-2.c  | 27 ++
 gcc/testsuite/g++.target/i386/pr87299-1.C |  8 +++
 

[PATCH] c++: improve debug_tree for templated types/decls

2023-07-31 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

gcc/cp/ChangeLog:

* ptree.cc (cxx_print_decl): Check for DECL_LANG_SPECIFIC and
TS_DECL_COMMON only when necessary.  Print DECL_TEMPLATE_INFO
for all decls that have it, not just VAR_DECL or FUNCTION_DECL.
Also print DECL_USE_TEMPLATE.
(cxx_print_type): Print TYPE_TEMPLATE_INFO.
: Don't print TYPE_TI_ARGS
anymore.
: Print TEMPLATE_TYPE_PARM_INDEX
instead of printing the index, level and original level
individually.
---
 gcc/cp/ptree.cc | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/gcc/cp/ptree.cc b/gcc/cp/ptree.cc
index 33af7b81f58..13306fc8762 100644
--- a/gcc/cp/ptree.cc
+++ b/gcc/cp/ptree.cc
@@ -38,10 +38,6 @@ cxx_print_decl (FILE *file, tree node, int indent)
   return;
 }
 
-  if (!CODE_CONTAINS_STRUCT (TREE_CODE (node), TS_DECL_COMMON)
-  || !DECL_LANG_SPECIFIC (node))
-return;
-
   if (TREE_CODE (node) == FUNCTION_DECL)
 {
   int flags = TFF_DECL_SPECIFIERS|TFF_RETURN_TYPE
@@ -106,7 +102,10 @@ cxx_print_decl (FILE *file, tree node, int indent)
   need_indent = false;
 }
 
-  if (DECL_EXTERNAL (node) && DECL_NOT_REALLY_EXTERN (node))
+  if (CODE_CONTAINS_STRUCT (TREE_CODE (node), TS_DECL_COMMON)
+  && DECL_LANG_SPECIFIC (node)
+  && DECL_EXTERNAL (node)
+  && DECL_NOT_REALLY_EXTERN (node))
 {
   if (need_indent)
indent_to (file, indent + 3);
@@ -115,6 +114,7 @@ cxx_print_decl (FILE *file, tree node, int indent)
 }
 
   if (TREE_CODE (node) == FUNCTION_DECL
+  && DECL_LANG_SPECIFIC (node)
   && DECL_PENDING_INLINE_INFO (node))
 {
   if (need_indent)
@@ -124,27 +124,29 @@ cxx_print_decl (FILE *file, tree node, int indent)
   need_indent = false;
 }
   
-  if (VAR_OR_FUNCTION_DECL_P (node)
+  if (DECL_LANG_SPECIFIC (node)
   && DECL_TEMPLATE_INFO (node))
-print_node (file, "template-info", DECL_TEMPLATE_INFO (node),
-   indent + 4);
+{
+  print_node (file, "template-info", DECL_TEMPLATE_INFO (node),
+ indent + 4);
+  indent_to (file, indent + 3);
+  fprintf (file, " use_template=%d", DECL_USE_TEMPLATE (node));
+}
 }
 
 void
 cxx_print_type (FILE *file, tree node, int indent)
 {
+  if (TYPE_LANG_SPECIFIC (node)
+  && TYPE_TEMPLATE_INFO (node))
+print_node (file, "template-info", TYPE_TEMPLATE_INFO (node), indent + 4);
+
   switch (TREE_CODE (node))
 {
 case BOUND_TEMPLATE_TEMPLATE_PARM:
-  print_node (file, "args", TYPE_TI_ARGS (node), indent + 4);
-  gcc_fallthrough ();
-
 case TEMPLATE_TYPE_PARM:
 case TEMPLATE_TEMPLATE_PARM:
-  indent_to (file, indent + 3);
-  fprintf (file, "index %d level %d orig_level %d",
-  TEMPLATE_TYPE_IDX (node), TEMPLATE_TYPE_LEVEL (node),
-  TEMPLATE_TYPE_ORIG_LEVEL (node));
+  print_node (file, "tpi", TEMPLATE_TYPE_PARM_INDEX (node), indent + 4);
   return;
 
 case FUNCTION_TYPE:
-- 
2.41.0.478.gee48e70a82



[PATCH] tree-pretty-print: handle COMPONENT_REF with non-decl RHS

2023-07-31 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

In the C++ front end, a COMPONENT_REF's second operand isn't always a
decl (at least at template parse time).  This patch makes the generic
pretty printer not ICE when printing such a COMPONENT_REF.
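A hedged example of such a tree (my illustration, not from the patch):

```
// At template parse time, 't.x' below is a COMPONENT_REF whose second
// operand is still the identifier 'x' rather than a FIELD_DECL, since
// the member cannot be looked up until T is known.
template <typename T>
auto
f (T t)
{
  return t.x;
}
```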

gcc/ChangeLog:

* tree-pretty-print.cc (dump_generic_node) :
Don't call component_ref_field_offset if the RHS isn't a decl.
---
 gcc/tree-pretty-print.cc | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 25d191b10fd..da8dd002a3b 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -2482,14 +2482,16 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   if (op_prio (op0) < op_prio (node))
pp_right_paren (pp);
   pp_string (pp, str);
-  dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
-  op0 = component_ref_field_offset (node);
-  if (op0 && TREE_CODE (op0) != INTEGER_CST)
-   {
- pp_string (pp, "{off: ");
- dump_generic_node (pp, op0, spc, flags, false);
+  op1 = TREE_OPERAND (node, 1);
+  dump_generic_node (pp, op1, spc, flags, false);
+  if (DECL_P (op1))
+   if (tree off = component_ref_field_offset (node))
+ if (TREE_CODE (off) != INTEGER_CST)
+   {
+ pp_string (pp, "{off: ");
+ dump_generic_node (pp, off, spc, flags, false);
  pp_right_brace (pp);
-   }
+   }
   break;
 
 case BIT_FIELD_REF:
-- 
2.41.0.478.gee48e70a82



[COMMITTEDv2] Fix PR 93044: extra cast is not removed

2023-07-31 Thread Andrew Pinski via Gcc-patches
In this case we were not removing a conversion to a bigger size
followed by a conversion back to the same size (or smaller) when
the signedness of the casts did not match.
For an example:
```
  signed char _1;
...
  _1 = *a_4(D);
  b_5 = (short unsigned int) _1;
  _2 = (unsigned char) b_5;
```
The inner cast is not needed and can be removed, but was not.
The match pattern for removing the extra cast is overly complex,
so I decided to add a new case rather than trying to modify the
current if statement.

Committed as approved. Bootstrapped and tested on x86_64-linux-gnu with no 
regressions.

gcc/ChangeLog:

* match.pd (nested int casts): A truncation (to the same size or 
smaller)
can always remove the inner cast.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cast-1.c: New test.
* gcc.dg/tree-ssa/cast-2.c: New test.
---
 gcc/match.pd   | 10 ++
 gcc/testsuite/gcc.dg/tree-ssa/cast-1.c | 12 
 gcc/testsuite/gcc.dg/tree-ssa/cast-2.c | 12 
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cast-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cast-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 74f0a84f31d..cfd6ea08807 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4340,6 +4340,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && ! (final_ptr && inside_prec != inter_prec))
  (ocvt @0))
 
+   /* `(outer:M)(inter:N) a:O`
+  can be converted to `(outer:M) a`
+  if M <= O && N >= O. No matter what signedness of the casts,
+  as the final is either a truncation from the original or just
+  a sign change of the type. */
+   (if (inside_int && inter_int && final_int
+&& final_prec <= inside_prec
+   && inter_prec >= inside_prec)
+(convert @0))
+
 /* A truncation to an unsigned type (a zero-extension) should be
canonicalized as bitwise and of a mask.  */
 (if (GIMPLE /* PR70366: doing this in GENERIC breaks -Wconversion.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cast-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cast-1.c
new file mode 100644
index 000..0f33ab58b3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cast-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+
+void f(signed char *a, unsigned char *c)
+{
+  unsigned short b = *a;
+  *c = ((unsigned char)b);
+}
+
+
+/* { dg-final { scan-tree-dump-not "\\(short unsigned int\\)" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cast-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cast-2.c
new file mode 100644
index 000..d665e924831
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cast-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+
+void f(signed short *a, unsigned char *c)
+{
+  unsigned long b = *a;
+  *c = ((unsigned char)b);
+}
+
+
+/* { dg-final { scan-tree-dump-not "\\(long unsigned int\\)" "optimized"} } */
-- 
2.31.1



libbacktrace patch committed

2023-07-31 Thread Ian Lance Taylor via Gcc-patches
This libbacktrace patch, based on one by Andres Freund, uses the
_pgmptr variable declared on Windows to find the executable file name
if none is specified.  Bootstrapped and ran libbacktrace testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
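For reference, a minimal sketch of the variable being probed (assuming a
Windows toolchain whose stdlib.h declares _pgmptr):

```
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
#if defined (_WIN32)
  /* _pgmptr is a CRT global holding the program's path; the new
     configure check (HAVE_DECL__PGMPTR) guards its use in fileline.c.  */
  printf ("executable: %s\n", _pgmptr);
#endif
  return 0;
}
```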

Patch from Andres Freund:
* configure.ac: Check for _pgmptr declaration.
* fileline.c (fileline_initialize): Check for _pgmptr before
/proc/self/exe.
* configure, config.h.in: Regenerate.
a349ba16f18b66b70c7a1bdb1ab5c5b6247676da
diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac
index 39e6bf41e35..72ff2b30053 100644
--- a/libbacktrace/configure.ac
+++ b/libbacktrace/configure.ac
@@ -407,6 +407,9 @@ if test "$have_getexecname" = "yes"; then
   AC_DEFINE(HAVE_GETEXECNAME, 1, [Define if getexecname is available.])
 fi
 
+# Check for _pgmptr variable, contains the executable filename on windows
+AC_CHECK_DECLS([_pgmptr])
+
 # Check for sysctl definitions.
 
 AC_CACHE_CHECK([for KERN_PROC],
diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c
index 674bf33cdcf..0e560b44e7a 100644
--- a/libbacktrace/fileline.c
+++ b/libbacktrace/fileline.c
@@ -155,6 +155,16 @@ macho_get_executable_path (struct backtrace_state *state,
 
 #endif /* !defined (HAVE_MACH_O_DYLD_H) */
 
+#if HAVE_DECL__PGMPTR
+
+#define windows_executable_filename() _pgmptr
+
+#else /* !HAVE_DECL__PGMPTR */
+
+#define windows_executable_filename() NULL
+
+#endif /* !HAVE_DECL__PGMPTR */
+
 /* Initialize the fileline information from the executable.  Returns 1
on success, 0 on failure.  */
 
@@ -192,7 +202,7 @@ fileline_initialize (struct backtrace_state *state,
 
   descriptor = -1;
   called_error_callback = 0;
-  for (pass = 0; pass < 8; ++pass)
+  for (pass = 0; pass < 9; ++pass)
 {
   int does_not_exist;
 
@@ -205,23 +215,28 @@ fileline_initialize (struct backtrace_state *state,
  filename = getexecname ();
  break;
case 2:
- filename = "/proc/self/exe";
+ /* Test this before /proc/self/exe, as the latter exists but points
+to the wine binary (and thus doesn't work).  */
+ filename = windows_executable_filename ();
  break;
case 3:
- filename = "/proc/curproc/file";
+ filename = "/proc/self/exe";
  break;
case 4:
+ filename = "/proc/curproc/file";
+ break;
+   case 5:
  snprintf (buf, sizeof (buf), "/proc/%ld/object/a.out",
(long) getpid ());
  filename = buf;
  break;
-   case 5:
+   case 6:
  filename = sysctl_exec_name1 (state, error_callback, data);
  break;
-   case 6:
+   case 7:
  filename = sysctl_exec_name2 (state, error_callback, data);
  break;
-   case 7:
+   case 8:
  filename = macho_get_executable_path (state, error_callback, data);
  break;
default:


Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 15:43, Prathamesh Kulkarni via Gcc-patches wrote:

On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via
Gcc-patches  wrote:


Comparisons between memory and constants might be done in a smaller mode
resulting in smaller constants which might finally end up as immediates
instead of in the literal pool.

For example, on s390x a non-symmetric comparison like
   x <= 0x3fffffffffffffff
results in the constant being spilled to the literal pool and an 8 byte
memory comparison is emitted.  Ideally, an equivalent comparison
   x0 <= 0x3f
where x0 is the most significant byte of x, is emitted where the
constant is smaller and more likely to materialize as an immediate.

Similarly, comparisons of the form
   x >= 0x4000000000000000
can be shortened into x0 >= 0x40.
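In C terms the claimed equivalence looks as follows (my illustration; x0
is the most significant byte, i.e. byte 0 on a big-endian target such as
s390x):

```
int
le (const unsigned long long *x)
{
  /* The 8-byte comparison, whose constant would be spilled to the
     literal pool ...  */
  int wide = *x <= 0x3fffffffffffffffULL;
  /* ... is equivalent to a 1-byte comparison against an immediate.  */
  int narrow = ((const unsigned char *) x)[0] <= 0x3f;
  return wide == narrow;  /* always 1 on big-endian targets */
}
```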

Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
Note, the new tests show that for the mentioned little-endian targets
the optimization does not materialize since either the costs of the new
instructions are higher or they do not match.  Still ok for mainline?

Hi Stefan,
Unfortunately this patch (committed in 7cdd0860949c6c3232e6cff1d7ca37bb5234074c)
caused the following ICE on armv8l-unknown-linux-gnu:
during RTL pass: combine
../../../gcc/libgcc/fixed-bit.c: In function ‘__gnu_saturate1sq’:
../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in
decompose, at rtl.h:2297
   210 | }
   | ^
0xaa23e3 wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long long*, unsigned int, std::pair<rtx_def*, machine_mode> const&)
 ../../gcc/gcc/rtl.h:2297

[ ... ]
Yea, we're seeing something very similar on nios2-linux-gnu building the 
kernel.


Prathamesh, can you extract the .i file for fixed-bit on armv8 and open 
a bug for this issue, attaching the .i file as well as the right command 
line options necessary to reproduce the failure.  That way Stefan can 
tackle it with a cross compiler.


Thanks,
jeff


Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948]

2023-07-31 Thread David Malcolm via Gcc-patches
On Mon, 2023-07-31 at 13:46 +0200, Benjamin Priour wrote:
> Hi Dave,
> 
> On Fri, Jul 21, 2023 at 10:10 PM David Malcolm 
> wrote:

[...snip...]

> > 
> > I see that we have test coverage for:
> >   noexcept-new.C: -fno-exceptions with new vs nothrow-new
> > whereas:
> >   new-2.C has (implicitly) -fexceptions with new
> > 
> > It seems that of the four combinations for:
> >   - exceptions enabled or disabled
> > and:
> >   - throwing versus non-throwing new
> > this is covering three of the cases but is missing the case of
> > nothrow-
> > new when exceptions are enabled.
> > Presumably new-2.C should gain test coverage for this case.  Or am
> > I
> > missing something here?  Am I right in thinking that it's possible
> > for
> > the user to use nothrow new when exceptions are enabled to get a
> > new
> > that can fail and return nullptr?  Or is that not possible?
> > 
> > 
> Thanks a lot for spotting that, the new test pointed out an issue
> with the
> detection of nothrow.
> It has been fixed and now both test cases behave similarly.
> However, this highlighted a faulty test case I had written.
> 
> int* y = new(std::nothrow) int();
> int z = *y + 2; /* { dg-warning "dereference of NULL 'y'" } */
> /* { dg-warning "use of uninitialized value '\\*y'" "" { xfail *-*-*
> } .-1
> } */ // (#) should be a bogus
> delete y;
> 
> The test labelled (#) is wrong and should be marked bogus instead.

Am I right in thinking that by this you mean that with the patch, the
analyzer complains about "use of uninitialized value '*y'" ? (which
would be an incorrect warning)

> If "y" is null then the allocation failed and dereferencing "y" will
> cause
> a segfault, not a "use-of-uninitialized-value".
> Thus we should stick to 'dereference of NULL 'y'" only.
> If "y" is non-null then the allocation succeeded and "*y" is
> initialized
> since we are calling a default initialization with the empty
> parenthesis.
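A small aside on the initialization forms involved (standard C++
semantics, independent of the analyzer):

```
int *y = new (std::nothrow) int ();  // value-initialized: *y == 0
int *z = new (std::nothrow) int;     // default-initialized: *z indeterminate
```

So with the empty parentheses, *y is (strictly speaking) value-initialized,
which is why a use-of-uninitialized-value warning on the non-null path
would be bogus.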

I *think* it's possible to have the region_model have y pointing to a
heap_allocated_region of sizeof(int) size that's been initialized, but
still have the malloc state machine part of the program_state say that
the pointer is maybe-null.

What does the gimple look like and what does the program_state look
like after the assignment to y?

> 
> This led me to consider having "null-dereference" supersedes
> "use-of-uninitialized-value", but
> new PR 110830 made me reexamine it.
> 
> I believe fixing PR 110830 is thus required before submitting this
> patch,
> or we would have some extra irrelevant warnings.

How bad would the problem be?  PR 110830 looks a little involved, so is
there a way to get the current patch in without dragging that extra
complexity in?


[...snip...]

Thanks
Dave



Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-07-31 Thread Lewis Hyatt via Gcc-patches
On Fri, Jul 28, 2023 at 6:58 PM David Malcolm  wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Add a new linemap reason LC_GEN which enables encoding the location
> > of data
> > that was generated during compilation and does not appear in any
> > source file.
> > There could be many use cases, such as, for instance, referring to
> > the content
> > of builtin macros (not yet implemented, but an easy lift after this
> > one.) The
> > first intended application is to create a place to store the input to
> > a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> >
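For context, the motivating case: the tokens inside a _Pragma operator
come from a string literal and so have no spelling location in any source
file; a generated buffer gives them one.  A tiny example:

```
/* The tokens of the pragma below originate inside the string literal,
   so an LC_GEN map can hold the text they were lexed from.  */
_Pragma ("GCC diagnostic ignored \"-Wunused-variable\"")
static int unused_var;
```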
> > The actual change needed to the line-maps API in libcpp is not too
> > large and
> > requires no space overhead in the line map data structures (on 64-bit
> > systems
> > that is; one newly added data member to class line_map_ordinary sits
> > inside
> > former padding bytes.) An LC_GEN map is just an ordinary map like any
> > other,
> > but the TO_FILE member that normally points to the file name points
> > instead to
> > the actual data.  This works automatically with PCH as well, for the
> > same
> > reason that the file name makes its way into a PCH.  In order to
> > avoid
> > confusion, the member has been renamed from TO_FILE to DATA, and
> > associated
> > accessors adjusted.
> >
> > Outside libcpp, there are many small changes but most of them are to
> > selftests, which are necessarily more sensitive to implementation
> > details. From the perspective of the user (the "user", here, being a
> > frontend
> > using line maps or else the diagnostics infrastructure), the chief
> > visible
> > change is that the function location_get_source_line() should be
> > passed an
> > expanded_location object instead of a separate filename and line
> > number.  This
> > is not a big change because in most cases, this information came
> > anyway from a
> > call to expand_location and the needed expanded_location object is
> > readily
> > available. The new overload of location_get_source_line() uses the
> > extra
> > information in the expanded_location object to obtain the data from
> > the
> > in-memory buffer when it originated from an LC_GEN map.
> >
> > Until the subsequent patch that starts using LC_GEN maps, none are
> > yet
> > generated within GCC, hence nothing is added to the testsuite here;
> > but all
> > relevant selftests have been extended to cover generated data maps in
> > addition
> > to normal files.
>
> [..snip...]
>
> Thanks for the updated patch.
>
> Reading this patch, it felt a bit unnatural to me to have an
>   (exploded location, source line)
> pair where the exploded location seems to be representing "which source
> file or generated buffer", but the line/column info in that
> exploded_location is to be ignored in favor of the 2nd source line.
>
> I think we're missing a class: something that identifies either a
> specific source file, or a specific generated buffer.
>
> How about something like either:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_filename_or_buffer (filename),
> m_len (0)
>   {
>   }
>
>   explicit source_id (const char *buffer, unsigned buffer_len)
>   : m_filename_or_buffer (buffer),
> m_len (buffer_len)
>   {
> linemap_assert (buffer_len > 0);
>   }
>
> private:
>   const char *m_filename_or_buffer;
>   unsigned m_len;  // where 0 means "it's a filename"
> };
>
> or:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_ptr (filename),
> m_is_buffer (false)
>   {
>   }
>
>   explicit source_id (const linemap_ordinary *buffer_linemap)
>   : m_ptr (buffer_linemap),
> m_is_buffer (true)
>   {
>   }
>
> private:
>   const void *m_ptr;
>   bool m_is_buffer;
> };
>
> and use one of these "source_id file" in place of "const char *file",
> rather than replacing such things with expanded_location?
>
> > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > index e8d3dece770..4164fa0b1ba 100644
> > --- a/gcc/c-family/c-indentation.cc
> > +++ b/gcc/c-family/c-indentation.cc
> > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> >  unsigned int *first_nws,
> >  unsigned int tab_width)
> >  {
> > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > +  char_span line = location_get_source_line (exploc);
>
> ...so this might contine to be:
>
>   char_span line = location_get_source_line (exploc.file, exploc.line);
>
> ...but expanded_location's "file" field would become a source_id,
> rather than a const char *.  It looks like doing do might make a lot of
> "is this the same file or buffer?"  turn into comparisons of source_id
> instances.
>
> So I think expanded_location would become:
>
> typedef struct
> {
>   /* Either the name of the source file involved, or the
>  specific generated buffer.  */
>   source_id file;
>
>   /* The line-location in 

Re: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread 钟居哲
>> From my recollection this is usually 30-40% faster than the naive tree
>> adder and also amenable to vectorization.  As long as the multiplication
>> is not terribly slow, that is.  Mula's algorithm should be significantly
>> faster even, another 30% IIRC.
 
>> I'm not against continuing with the more well-known approach for now
>> but we should keep in mind that might still be potential for improvement.

No. I don't think it's faster.

>> Wait, why do we need vec_pack_trunc for popcountll?  For me vectorizing
>> it "just works" when the output is a uint64_t just like the standard
>> name demands.

>> If you're referring to something else, please detail in the comment.

I have no idea. I saw ARM SVE generate:
POP_COUNT
POP_COUNT
VEC_PACK_TRUNC.

I am gonna drop this patch since it's meaningless.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-01 03:38
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng
Subject: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization
Hi Juzhe,
 
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)  (1U << (c))
> +   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> + n = COUNT(n, 0);
> + n = COUNT(n, 1);
> + n = COUNT(n, 2);
> + n = COUNT(n, 3);
> + n = COUNT(n, 4);
> +   // n = COUNT(n, 5);  // uncomment this line for 64-bit integers
> + return n;
> +   #undef COUNT
> +   #undef MASK
> +   #undef POW2
> +   }
 
That's quite a heavy implementation but I suppose with the proper cost
function it can still be worth it.  Did you also try some alternatives?
WWG comes to mind:
 
uint64_t c1 = 0x5555555555555555;
uint64_t c2 = 0x3333333333333333;
uint64_t c4 = 0x0F0F0F0F0F0F0F0F;
 
uint64_t wwg (uint64_t x) {
x -= (x >> 1) & c1;
x = ((x >> 2) & c2) + (x & c2);
x = (x + (x >> 4) ) & c4;
x *= 0x0101010101010101;
return x >> 56;
}
 
From my recollection this is usually 30-40% faster than the naive tree
adder and also amenable to vectorization.  As long as the multiplication
is not terribly slow, that is.  Mula's algorithm should be significantly
faster even, another 30% IIRC.
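A self-contained sanity check of the sketch above against GCC's builtin,
with the masks written out (my addition, for the record):

```
#include <assert.h>
#include <stdint.h>

static uint64_t
wwg (uint64_t x)
{
  /* Wilkes-Wheeler-Gill popcount: pairwise sums, then nibble sums,
     then a multiply to accumulate the per-byte counts into the top
     byte.  */
  x -= (x >> 1) & 0x5555555555555555ULL;
  x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  x *= 0x0101010101010101ULL;
  return x >> 56;
}

int
main (void)
{
  uint64_t tests[] = { 0, 1, 0xff, 0x8000000000000000ULL,
                       0x0123456789abcdefULL, ~0ULL };
  for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
    assert (wwg (tests[i]) == (uint64_t) __builtin_popcountll (tests[i]));
  return 0;
}
```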
 
I'm not against continuing with the more well-known approach for now
but we should keep in mind that might still be potential for improvement.
 
>  } // namespace riscv_vector
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 
Any particular reason why the tests are in widen?
 
> +extern void abort (void) __attribute__ ((noreturn));
 
Why no __builtin_unreachable as in the other tests? 
 
> +  asm volatile ("" ::: "memory");
 
Is this necessary?  It doesn't hurt of course, just wondering.
 
All in all LGTM in case you'd rather get this upstream now.  We can
always improve later.
 
Regards
Robin
 


[PATCH] c++: parser cleanup, remove dummy arguments

2023-07-31 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

Now that cp_parser_constant_expression accepts a null non_constant_p,
we can transitively remove dummy arguments in the call chain.

Running dg.exp and counting the # of is_rvalue_constant_expression calls
from cp_parser_constant_expression:
pre-r14-2800: 2,459,145
this patch  : 1,719,454
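The shape of the cleanup, distilled (hypothetical names, not the actual
parser code):

```
// Before: every caller had to declare a dummy bool to pass.  After:
// the out-parameter defaults to nullptr and uninterested callers
// simply omit it.
static int
parse_braced_list (bool *non_constant_p = nullptr)
{
  bool dummy;
  if (!non_constant_p)
    non_constant_p = &dummy;  // scratch slot for callers that passed nothing
  *non_constant_p = false;
  // ... parse, setting *non_constant_p as needed ...
  return 0;
}
```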

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Adjust the call to
cp_parser_braced_list.
(cp_parser_postfix_open_square_expression): Likewise.
(cp_parser_new_initializer): Likewise.
(cp_parser_assignment_expression): Adjust the call to
cp_parser_initializer_clause.
(cp_parser_lambda_introducer): Adjust the call to cp_parser_initializer.
(cp_parser_range_for): Adjust the call to cp_parser_braced_list.
(cp_parser_jump_statement): Likewise.
(cp_parser_mem_initializer): Likewise.
(cp_parser_template_argument): Likewise.
(cp_parser_default_argument): Adjust the call to cp_parser_initializer.
(cp_parser_initializer): Handle null is_direct_init and non_constant_p
arguments.
(cp_parser_initializer_clause): Handle null non_constant_p argument.
(cp_parser_braced_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_member_declaration): Adjust the call to
cp_parser_initializer_clause and cp_parser_initializer.
(cp_parser_yield_expression): Adjust the call to cp_parser_braced_list.
(cp_parser_functional_cast): Likewise.
(cp_parser_late_parse_one_default_arg): Adjust the call to
cp_parser_initializer.
(cp_parser_omp_for_loop_init): Likewise.
(cp_parser_omp_declare_reduction_exprs): Likewise.
---
 gcc/cp/parser.cc | 102 +++
 1 file changed, 41 insertions(+), 61 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index b1d2e141e35..957eb705b2a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2483,11 +2483,11 @@ static tree cp_parser_default_argument
 static void cp_parser_function_body
   (cp_parser *, bool);
 static tree cp_parser_initializer
-  (cp_parser *, bool *, bool *, bool = false);
+  (cp_parser *, bool * = nullptr, bool * = nullptr, bool = false);
 static cp_expr cp_parser_initializer_clause
-  (cp_parser *, bool *);
+  (cp_parser *, bool * = nullptr);
 static cp_expr cp_parser_braced_list
-  (cp_parser*, bool*);
+  (cp_parser*, bool * = nullptr);
 static vec *cp_parser_initializer_list
   (cp_parser *, bool *, bool *);
 
@@ -7734,12 +7734,8 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
/* If things aren't going well, there's no need to
   keep going.  */
if (!cp_parser_error_occurred (parser))
- {
-   bool non_constant_p;
-   /* Parse the brace-enclosed initializer list.  */
-   initializer = cp_parser_braced_list (parser,
-_constant_p);
- }
+ /* Parse the brace-enclosed initializer list.  */
+ initializer = cp_parser_braced_list (parser);
/* If that worked, we're definitely looking at a
   compound-literal expression.  */
if (cp_parser_parse_definitely (parser))
@@ -8203,10 +8199,9 @@ cp_parser_postfix_open_square_expression (cp_parser 
*parser,
}
   else if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
{
- bool expr_nonconst_p;
  cp_lexer_set_source_position (parser->lexer);
  maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
- index = cp_parser_braced_list (parser, _nonconst_p);
+ index = cp_parser_braced_list (parser);
}
   else
index = cp_parser_expression (parser, NULL, /*cast_p=*/false,
@@ -9640,12 +9635,10 @@ cp_parser_new_initializer (cp_parser* parser)
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
 {
-  tree t;
-  bool expr_non_constant_p;
   cp_lexer_set_source_position (parser->lexer);
   maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
-  t = cp_parser_braced_list (parser, _non_constant_p);
-  CONSTRUCTOR_IS_DIRECT_INIT (t) = 1;
+  tree t = cp_parser_braced_list (parser);
+  CONSTRUCTOR_IS_DIRECT_INIT (t) = true;
   expression_list = make_tree_vector_single (t);
 }
   else
@@ -10505,11 +10498,8 @@ cp_parser_assignment_expression (cp_parser* parser, 
cp_id_kind * pidk,
= cp_parser_assignment_operator_opt (parser);
  if (assignment_operator != ERROR_MARK)
{
- bool non_constant_p;
-
  /* Parse the right-hand side of the assignment.  */
- cp_expr rhs = cp_parser_initializer_clause (parser,
- _constant_p);
+ cp_expr rhs = 

Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-07-31 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 1 Aug 2023 at 03:13, Prathamesh Kulkarni
 wrote:
>
> On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via
> Gcc-patches  wrote:
> >
> > Comparisons between memory and constants might be done in a smaller mode
> > resulting in smaller constants which might finally end up as immediates
> > instead of in the literal pool.
> >
> > For example, on s390x a non-symmetric comparison like
> >   x <= 0x3fffffffffffffff
> > results in the constant being spilled to the literal pool and an 8 byte
> > memory comparison is emitted.  Ideally, an equivalent comparison
> >   x0 <= 0x3f
> > where x0 is the most significant byte of x, is emitted where the
> > constant is smaller and more likely to materialize as an immediate.
> >
> > Similarly, comparisons of the form
> >   x >= 0x4000000000000000
> > can be shortened into x0 >= 0x40.
> >
> > Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
> > Note, the new tests show that for the mentioned little-endian targets
> > the optimization does not materialize since either the costs of the new
> > instructions are higher or they do not match.  Still ok for mainline?
> Hi Stefan,
> Unfortunately this patch (committed in 
> 7cdd0860949c6c3232e6cff1d7ca37bb5234074c)
> caused the following ICE on armv8l-unknown-linux-gnu:
Sorry I meant armv8l-unknown-linux-gnueabihf.
> during RTL pass: combine
> ../../../gcc/libgcc/fixed-bit.c: In function ‘__gnu_saturate1sq’:
> ../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in
> decompose, at rtl.h:2297
>   210 | }
>   | ^
> 0xaa23e3 wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long long*, unsigned int, std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/rtl.h:2297
> 0xaf5ab3 wide_int_ref_storage<false, true>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/wide-int.h:1030
> 0xaf5023 generic_wide_int<wide_int_ref_storage<false, true> >::generic_wide_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/wide-int.h:788
> 0xf916f9 simplify_const_unary_operation(rtx_code, machine_mode,
> rtx_def*, machine_mode)
> ../../gcc/gcc/simplify-rtx.cc:2131
> 0xf8bad5 simplify_context::simplify_unary_operation(rtx_code,
> machine_mode, rtx_def*, machine_mode)
> ../../gcc/gcc/simplify-rtx.cc:889
> 0xf8a591 simplify_context::simplify_gen_unary(rtx_code, machine_mode,
> rtx_def*, machine_mode)
> ../../gcc/gcc/simplify-rtx.cc:360
> 0x9bd1b7 simplify_gen_unary(rtx_code, machine_mode, rtx_def*, machine_mode)
> ../../gcc/gcc/rtl.h:3520
> 0x1bd5677 simplify_comparison
> ../../gcc/gcc/combine.cc:13125
> 0x1bc2b2b simplify_set
> ../../gcc/gcc/combine.cc:6848
> 0x1bc1647 combine_simplify_rtx
> ../../gcc/gcc/combine.cc:6353
> 0x1bbf97f subst
> ../../gcc/gcc/combine.cc:5609
> 0x1bb864b try_combine
> ../../gcc/gcc/combine.cc:3302
> 0x1bb30fb combine_instructions
> ../../gcc/gcc/combine.cc:1264
> 0x1bd8d25 rest_of_handle_combine
> ../../gcc/gcc/combine.cc:15059
> 0x1bd8dd5 execute
> ../../gcc/gcc/combine.cc:15103
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
> Could you please take a look ?
>
> Thanks,
> Prathamesh
> >
> > gcc/ChangeLog:
> >
> > * combine.cc (simplify_compare_const): Narrow comparison of
> > memory and constant.
> > (try_combine): Adapt new function signature.
> > (simplify_comparison): Adapt new function signature.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/cmp-mem-const-1.c: New test.
> > * gcc.dg/cmp-mem-const-2.c: New test.
> > * gcc.dg/cmp-mem-const-3.c: New test.
> > * gcc.dg/cmp-mem-const-4.c: New test.
> > * gcc.dg/cmp-mem-const-5.c: New test.
> > * gcc.dg/cmp-mem-const-6.c: New test.
> > * gcc.target/s390/cmp-mem-const-1.c: New test.
> > ---
> >  gcc/combine.cc| 79 +--
> >  gcc/testsuite/gcc.dg/cmp-mem-const-1.c| 17 
> >  gcc/testsuite/gcc.dg/cmp-mem-const-2.c| 17 
> >  gcc/testsuite/gcc.dg/cmp-mem-const-3.c| 17 
> >  gcc/testsuite/gcc.dg/cmp-mem-const-4.c| 17 
> >  gcc/testsuite/gcc.dg/cmp-mem-const-5.c| 17 
> >  gcc/testsuite/gcc.dg/cmp-mem-const-6.c| 17 
> >  .../gcc.target/s390/cmp-mem-const-1.c | 24 ++
> >  8 files changed, 200 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c
> >  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c
> >
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > 

Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-07-31 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via
Gcc-patches  wrote:
>
> Comparisons between memory and constants might be done in a smaller mode
> resulting in smaller constants which might finally end up as immediates
> instead of in the literal pool.
>
> For example, on s390x a non-symmetric comparison like
>   x <= 0x3fffffffffffffff
> results in the constant being spilled to the literal pool and an 8 byte
> memory comparison is emitted.  Ideally, an equivalent comparison
>   x0 <= 0x3f
> where x0 is the most significant byte of x, is emitted where the
> constant is smaller and more likely to materialize as an immediate.
>
> Similarly, comparisons of the form
>   x >= 0x4000000000000000
> can be shortened into x0 >= 0x40.
>
> Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
> Note, the new tests show that for the mentioned little-endian targets
> the optimization does not materialize since either the costs of the new
> instructions are higher or they do not match.  Still ok for mainline?
Hi Stefan,
Unfortunately this patch (committed in 7cdd0860949c6c3232e6cff1d7ca37bb5234074c)
caused the following ICE on armv8l-unknown-linux-gnu:
during RTL pass: combine
../../../gcc/libgcc/fixed-bit.c: In function ‘__gnu_saturate1sq’:
../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in
decompose, at rtl.h:2297
  210 | }
  | ^
0xaa23e3 wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long long*, unsigned int, std::pair<rtx_def*, machine_mode> const&)
../../gcc/gcc/rtl.h:2297
0xaf5ab3 wide_int_ref_storage<false, true>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
../../gcc/gcc/wide-int.h:1030
0xaf5023 generic_wide_int<wide_int_ref_storage<false, true> >::generic_wide_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
../../gcc/gcc/wide-int.h:788
0xf916f9 simplify_const_unary_operation(rtx_code, machine_mode,
rtx_def*, machine_mode)
../../gcc/gcc/simplify-rtx.cc:2131
0xf8bad5 simplify_context::simplify_unary_operation(rtx_code,
machine_mode, rtx_def*, machine_mode)
../../gcc/gcc/simplify-rtx.cc:889
0xf8a591 simplify_context::simplify_gen_unary(rtx_code, machine_mode,
rtx_def*, machine_mode)
../../gcc/gcc/simplify-rtx.cc:360
0x9bd1b7 simplify_gen_unary(rtx_code, machine_mode, rtx_def*, machine_mode)
../../gcc/gcc/rtl.h:3520
0x1bd5677 simplify_comparison
../../gcc/gcc/combine.cc:13125
0x1bc2b2b simplify_set
../../gcc/gcc/combine.cc:6848
0x1bc1647 combine_simplify_rtx
../../gcc/gcc/combine.cc:6353
0x1bbf97f subst
../../gcc/gcc/combine.cc:5609
0x1bb864b try_combine
../../gcc/gcc/combine.cc:3302
0x1bb30fb combine_instructions
../../gcc/gcc/combine.cc:1264
0x1bd8d25 rest_of_handle_combine
../../gcc/gcc/combine.cc:15059
0x1bd8dd5 execute
../../gcc/gcc/combine.cc:15103
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

Could you please take a look ?

Thanks,
Prathamesh
>
> gcc/ChangeLog:
>
> * combine.cc (simplify_compare_const): Narrow comparison of
> memory and constant.
> (try_combine): Adapt new function signature.
> (simplify_comparison): Adapt new function signature.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/cmp-mem-const-1.c: New test.
> * gcc.dg/cmp-mem-const-2.c: New test.
> * gcc.dg/cmp-mem-const-3.c: New test.
> * gcc.dg/cmp-mem-const-4.c: New test.
> * gcc.dg/cmp-mem-const-5.c: New test.
> * gcc.dg/cmp-mem-const-6.c: New test.
> * gcc.target/s390/cmp-mem-const-1.c: New test.
> ---
>  gcc/combine.cc| 79 +--
>  gcc/testsuite/gcc.dg/cmp-mem-const-1.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-2.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-3.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-4.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-5.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-6.c| 17 
>  .../gcc.target/s390/cmp-mem-const-1.c | 24 ++
>  8 files changed, 200 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 5aa0ec5c45a..56e15a93409 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code, 
> machine_mode, rtx,
>  static int recog_for_combine (rtx *, rtx_insn *, rtx *);
>  static rtx gen_lowpart_for_combine (machine_mode, rtx);
>  static enum rtx_code simplify_compare_const (enum 

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-31 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 21 Jul 2023 at 16:52, Martin Uecker via Gcc-patches
 wrote:
>
>
>
> This patch adds a warning for allocations with insufficient size
> based on the "alloc_size" attribute and the type of the pointer
> the result is assigned to. While it is theoretically legal to
> assign to the wrong pointer type and cast it to the right type
> later, this almost always indicates an error. Since this catches
> common mistakes and is simple to diagnose, it is suggested to
> add this warning.
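A hedged sketch of the mistake the warning targets (my example, modeled
on the description; in C the malloc result converts implicitly, and that
implicit conversion is what the new check in convert_for_assignment
inspects — the cast below only keeps the snippet C++-compilable):

```
#include <stdlib.h>

struct big { double d[4]; };

void
f (void)
{
  /* sizeof(int) bytes allocated where the pointer's target type
     needs sizeof(struct big); -Walloc-type diagnoses the plain C
     assignment form of this.  */
  struct big *p = (struct big *) malloc (sizeof (int));
  free (p);
}
```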
>
>
> Bootstrapped and regression tested on x86.
>
>
> Martin
>
>
>
> Add option Walloc-type that warns about allocations that have
> insufficient storage for the target type of the pointer the
> storage is assigned to.
>
> gcc:
> * doc/invoke.texi: Document -Walloc-type option.
>
> gcc/c-family:
>
> * c.opt (Walloc-type): New option.
>
> gcc/c:
> * c-typeck.cc (convert_for_assignment): Add Walloc-type warning.
>
> gcc/testsuite:
>
> * gcc.dg/Walloc-type-1.c: New test.
>
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 4abdc8d0e77..8b9d148582b 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -319,6 +319,10 @@ Walloca
>  C ObjC C++ ObjC++ Var(warn_alloca) Warning
>  Warn on any use of alloca.
>
> +Walloc-type
> +C ObjC Var(warn_alloc_type) Warning
> +Warn when allocating insufficient storage for the target type of the
> assigned pointer.
> +
>  Walloc-size-larger-than=
>  C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
> ByteSize Warning Init(HOST_WIDE_INT_MAX)
>  -Walloc-size-larger-than=<bytes>  Warn for calls to allocation
> functions that
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 7cf411155c6..2e392f9c952 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
> location_t expr_loc, tree type,
> "request for implicit conversion "
> "from %qT to %qT not permitted in C++", rhstype,
> type);
>
> +  /* Warn if new allocations are not big enough for the target
> type.  */
> +  tree fndecl;
> +  if (warn_alloc_type
> + && TREE_CODE (rhs) == CALL_EXPR
> + && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> + && DECL_IS_MALLOC (fndecl))
> +   {
> + tree fntype = TREE_TYPE (fndecl);
> + tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> + tree alloc_size = lookup_attribute ("alloc_size",
> fntypeattrs);
> + if (alloc_size)
> +   {
> + tree args = TREE_VALUE (alloc_size);
> + int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> + /* For calloc only use the second argument.  */
> + if (TREE_CHAIN (args))
> +   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
> (args))) - 1;
> + tree arg = CALL_EXPR_ARG (rhs, idx);
> + if (TREE_CODE (arg) == INTEGER_CST
> + && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
Hi Martin,
Just wondering if it'd be a good idea perhaps to warn if alloc size is
not a multiple of TYPE_SIZE_UNIT instead of just less-than?
So it can catch cases like:
int *p = malloc (sizeof (int) + 2); // probably intended malloc
(sizeof (int) * 2)

FWIW, this is caught using -fanalyzer:
f.c: In function 'f':
f.c:3:12: warning: allocated buffer size is not a multiple of the
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
3 |   int *p = __builtin_malloc (sizeof(int) + 2);
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Thanks,
Prathamesh
> +warning_at (location, OPT_Walloc_type, "allocation of
> "
> +"insufficient size %qE for type %qT with
> "
> +"size %qE", arg, ttl, TYPE_SIZE_UNIT
> (ttl));
> +   }
> +   }
> +
>/* See if the pointers point to incompatible address spaces.  */
>asl = TYPE_ADDR_SPACE (ttl);
>asr = TYPE_ADDR_SPACE (ttr);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 88e3c625030..6869bed64c3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold}
> function such as wrappers of
>  C++ @code{throw} or fatal error reporting functions leading to
> @code{abort}.
>  @end table
>
> +@opindex Wno-alloc-type
> +@opindex Walloc-type
> +@item -Walloc-type
> +Warn about calls to allocation functions decorated with attribute
> +@code{alloc_size} that specify insufficient size for the target type
> of
> +the pointer the result is assigned to, including those to the built-in
> +forms of the functions @code{aligned_alloc}, @code{alloca},
> @code{calloc},
> +@code{malloc}, and @code{realloc}.
> +
>  @opindex Wno-alloc-zero
>  @opindex Walloc-zero
>  @item -Walloc-zero
> diff --git a/gcc/testsuite/gcc.dg/Walloc-type-1.c
> b/gcc/testsuite/gcc.dg/Walloc-type-1.c
> new file mode 100644
> index 000..bc62e5e9aa3
> --- 

Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Robin Dapp via Gcc-patches
> +/* FIXME: We don't allow vectorize "__builtin_popcountll" yet since it needs 
> "vec_pack_trunc" support
> +  and such pattern may cause inferior codegen.
> +   We will enable "vec_pack_trunc" when we support reasonable vector 
> cost model.  */

Wait, why do we need vec_pack_trunc for popcountll?  For me vectorizing
it "just works" when the output is a uint64_t just like the standard
name demands.

If you're referring to something else, please detail in the comment.
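
For concreteness, a minimal loop of the kind I mean (my sketch, not from
the patch), where source and destination are both 64 bit so no narrowing
is needed:

#include <stdint.h>

void
vpopcnt (uint64_t *restrict dst, uint64_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}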

Regards
 Robin


[PATCH] PHIOPT: Mark the conditional lhs and rhs as to look at to see if DCEable

2023-07-31 Thread Andrew Pinski via Gcc-patches
In some cases (usually dealing with bools only), there could be some statements
left behind which are considered trivially dead.
An example is:
```
bool f(bool a, bool b)
{
if (!a && !b)
return 0;
if (!a && b)
return 0;
if (a && !b)
return 0;
return 1;
}
```
Where during phiopt2, the IR had:
```
  _3 = ~b_7(D);
  _4 = _3 & a_6(D);
  _4 != 0 ? 0 : 1
```
match-and-simplify would transform that into:
```
  _11 = ~a_6(D);
  _12 = b_7(D) | _11;
```
But phiopt would leave around the statements defining _4 and _3.
This helps by marking the conditional's lhs and rhs to see if they are
trivially dead.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (match_simplify_replacement): Mark the cond
statement's lhs and rhs to check if trivially dead.
Rename inserted_exprs to exprs_maybe_dce; also move it so
bitmap is not allocated if not needed.
---
 gcc/tree-ssa-phiopt.cc | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index cb4e2da023d..ff36bb0119b 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -767,7 +767,6 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
   tree result;
   gimple *stmt_to_move = NULL;
   gimple *stmt_to_move_alt = NULL;
-  auto_bitmap inserted_exprs;
   tree arg_true, arg_false;
 
   /* Special case A ? B : B as this will always simplify to B. */
@@ -844,6 +843,18 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
   if (!result)
 return false;
 
+  auto_bitmap exprs_maybe_dce;
+
+  /* Mark the cond statements' lhs/rhs as maybe dce.  */
+  if (TREE_CODE (gimple_cond_lhs (stmt)) == SSA_NAME
+  && !SSA_NAME_IS_DEFAULT_DEF (gimple_cond_lhs (stmt)))
+bitmap_set_bit (exprs_maybe_dce,
+   SSA_NAME_VERSION (gimple_cond_lhs (stmt)));
+  if (TREE_CODE (gimple_cond_rhs (stmt)) == SSA_NAME
+  && !SSA_NAME_IS_DEFAULT_DEF (gimple_cond_rhs (stmt)))
+bitmap_set_bit (exprs_maybe_dce,
+   SSA_NAME_VERSION (gimple_cond_rhs (stmt)));
+
   gsi = gsi_last_bb (cond_bb);
   /* Insert the sequence generated from gimple_simplify_phiopt.  */
   if (seq)
@@ -855,7 +866,7 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
  gimple *stmt = gsi_stmt (gsi1);
  tree name = gimple_get_lhs (stmt);
  if (name && TREE_CODE (name) == SSA_NAME)
-   bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
+   bitmap_set_bit (exprs_maybe_dce, SSA_NAME_VERSION (name));
}
   if (dump_file && (dump_flags & TDF_FOLDING))
{
@@ -867,10 +878,10 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
 
   /* If there was a statement to move, move it to right before
  the original conditional.  */
-  move_stmt (stmt_to_move, &gsi, inserted_exprs);
-  move_stmt (stmt_to_move_alt, &gsi, inserted_exprs);
+  move_stmt (stmt_to_move, &gsi, exprs_maybe_dce);
+  move_stmt (stmt_to_move_alt, &gsi, exprs_maybe_dce);
 
-  replace_phi_edge_with_variable (cond_bb, e1, phi, result, inserted_exprs);
+  replace_phi_edge_with_variable (cond_bb, e1, phi, result, exprs_maybe_dce);
 
   /* Add Statistic here even though replace_phi_edge_with_variable already
  does it as we want to be able to count when match-simplify happens vs
-- 
2.31.1



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-31 Thread Qing Zhao via Gcc-patches
Hi,

After some detailed study and consideration on how to use the new attribute 
“counted_by”
 in __builtin_dynamic_object_size, I came up with the following example with 
detailed explanation 
on the expected behavior from GCC on using this new attribute. 

Please take a look at this example and the explanation embedded, and let me
know if you have further
comments or suggestions.

Thanks a lot.

Qing

===
#include 
#include 
#include 
#include 

struct annotated {
size_t foo;
int array[] __attribute__((counted_by (foo)));
};

#define expect(p, _v) do { \
size_t v = _v; \
if (p == v) \
__builtin_printf ("ok:  %s == %zd\n", #p, p); \
else \
{  \
  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
} \
} while (0);

#define noinline __attribute__((__noinline__))
#define SIZE_BUMP 2 

/* In general, due to type casting, the type for the pointee of a pointer
   does not say anything about the object it points to,
   so __builtin_object_size can not directly use the type of the pointee
   to decide the size of the object the pointer points to.

   There are only two reliable ways:
   A. observed allocations  (call to the allocation functions in the routine)
   B. observed accesses (read or write access to the location the
 pointer points to)

   that provide information about the type/existence of an object at
   the corresponding address.

   for A, we use the "alloc_size" attribute for the corresponding allocation
   functions to determine the object size;

   For B, we use the SIZE info of the TYPE attached to the corresponding access.
   (We treat counted_by attribute as a complement to the SIZE info of the TYPE
 for FAM)

   The only other way in C which ensures that a pointer actually points
   to an object of the correct type is 'static':

   void foo(struct P *p[static 1]);   

   See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624814.html
   for more details.  */

/* in the following function, malloc allocated more space than the value
   of counted_by attribute.  Then what's the correct behavior we expect
   the __builtin_dynamic_object_size should have for each of the cases?  */ 

static struct annotated * noinline alloc_buf (int index)
{
  struct annotated *p;
  p = malloc(sizeof (*p) + (index + SIZE_BUMP) * sizeof (int));
  p->foo = index;

  /*when checking the observed access p->array, we have info on both
 observed allocation and observed access, 
A. from observed allocation: (index + SIZE_BUMP) * sizeof (int)
B. from observed access: p->foo * sizeof (int)

in the above, p->foo = index.
   */
   
  /* for MAXIMUM sub-object size: choose the smaller of A and B.
   * Please see https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625891.html
   * for details on why.  */
  expect(__builtin_dynamic_object_size(p->array, 1), (p->foo) * sizeof(int));
  expect(__builtin_dynamic_object_size(p->array, 0), sizeof (*p) + (p->foo) * 
sizeof(int));

  /* for MINIMUM sub-object size: choose the smaller of A and B too.  */
  expect(__builtin_dynamic_object_size(p->array, 3), p->foo * sizeof(int));
  expect(__builtin_dynamic_object_size(p->array, 2), sizeof (*p) + p->foo * 
sizeof(int));

  /*when checking the pointer p, we only have info on the observed allocation.
So, the object size info can only been obtained from the call to malloc.
for both MAXIMUM and MINIMUM: A = (index + SIZE_BUMP) * sizeof (int)  */ 
  expect(__builtin_dynamic_object_size(p, 1), sizeof (*p) + (p->foo + 
SIZE_BUMP) * sizeof(int));
  expect(__builtin_dynamic_object_size(p, 0), sizeof (*p) + (p->foo + 
SIZE_BUMP) * sizeof(int));
  expect(__builtin_dynamic_object_size(p, 3), sizeof (*p) + (p->foo + 
SIZE_BUMP) * sizeof(int));
  expect(__builtin_dynamic_object_size(p, 2), sizeof (*p) + (p->foo + 
SIZE_BUMP) * sizeof(int));
  return p;
}


int main ()
{
  struct annotated *p; 
  p = alloc_buf (10);
  /*when checking the observed access p->array, we only have info on the
 observed access, i.e., the TYPE_SIZE info from the access. We don't have
info on the whole object.  */

  /*For MAXIMUM size, We know the SIZE info of the TYPE from the access to the
sub-object p->array.
but don't know the whole object the pointer p points to.  */ 
  expect(__builtin_dynamic_object_size(p->array, 1), p->foo * sizeof(int));
  expect(__builtin_dynamic_object_size(p->array, 0), -1);

  /*for MINIMUM size, We know the TYPE_SIZE from the access to the sub-oject
p->array.
but don't know the whole object the pointer p points to.  */
  expect(__builtin_dynamic_object_size(p->array, 3), p->foo * sizeof(int));
  expect(__builtin_dynamic_object_size(p->array, 2), 0);

  /*when checking the pointer p, we have no observed allocation nor observed 
access.
therefore, we cannot determine the size info here.  */
  expect(__builtin_dynamic_object_size(p, 1), 

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-31 Thread Martin Uecker via Gcc-patches
On Monday, 2023-07-31 at 15:39 -0400, Siddhesh Poyarekar wrote:
> On 2023-07-21 07:21, Martin Uecker via Gcc-patches wrote:
> > 
> > 
> > This patch adds a warning for allocations with insufficient size
> > based on the "alloc_size" attribute and the type of the pointer
> > the result is assigned to. While it is theoretically legal to
> > assign to the wrong pointer type and cast it to the right type
> > later, this almost always indicates an error. Since this catches
> > common mistakes and is simple to diagnose, it is suggested to
> > add this warning.
> >   

...

> > 
> 
> Wouldn't this be much more useful in later phases with ranger feedback 
> like with the warn_access warnings?  That way the comparison won't be 
> limited to constant sizes.

Possibly. Having it in the FE made it simple to implement and
also reliable.  One thing I considered is also looking deeper
into the argument and detect obvious mistakes, e.g. if the
type in a sizeof is the right one. Such extensions would be
easier in the FE.
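
For example, a sketch of the kind of mistake I mean (hypothetical; not
handled by the current patch, since the size is not a constant and is
not smaller than the target type):

#include <stdlib.h>

struct big { double d[8]; };

void g (int n)
{
  /* sizeof applied to the wrong type; deeper argument inspection
     in the FE could flag the suspicious sizeof (int) here */
  struct big *p = malloc (sizeof (int) * n);
  (void) p;
}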

But I wouldn't mind replacing or extending this with something
smarter emitted from later phases. I probably do not have time
to work on this myself in the near future though.

Martin




Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-31 Thread Siddhesh Poyarekar

On 2023-07-21 07:21, Martin Uecker via Gcc-patches wrote:



This patch adds a warning for allocations with insufficient size
based on the "alloc_size" attribute and the type of the pointer
the result is assigned to. While it is theoretically legal to
assign to the wrong pointer type and cast it to the right type
later, this almost always indicates an error. Since this catches
common mistakes and is simple to diagnose, it is suggested to
add this warning.
  


Bootstrapped and regression tested on x86.


Martin



Add option Walloc-type that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to.

gcc:
* doc/invoke.texi: Document -Walloc-type option.

gcc/c-family:

* c.opt (Walloc-type): New option.

gcc/c:
* c-typeck.cc (convert_for_assignment): Add Walloc-type warning.

gcc/testsuite:

* gcc.dg/Walloc-type-1.c: New test.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4abdc8d0e77..8b9d148582b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -319,6 +319,10 @@ Walloca
  C ObjC C++ ObjC++ Var(warn_alloca) Warning
  Warn on any use of alloca.
  
+Walloc-type

+C ObjC Var(warn_alloc_type) Warning
+Warn when allocating insufficient storage for the target type of the
assigned pointer.
+
  Walloc-size-larger-than=
  C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
ByteSize Warning Init(HOST_WIDE_INT_MAX)
  -Walloc-size-larger-than=<bytes>  Warn for calls to allocation
functions that
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..2e392f9c952 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
location_t expr_loc, tree type,
"request for implicit conversion "
"from %qT to %qT not permitted in C++", rhstype,
type);
  
+  /* Warn if new allocations are not big enough for the target

type.  */
+  tree fndecl;
+  if (warn_alloc_type
+ && TREE_CODE (rhs) == CALL_EXPR
+ && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
+ && DECL_IS_MALLOC (fndecl))
+   {
+ tree fntype = TREE_TYPE (fndecl);
+ tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
+ tree alloc_size = lookup_attribute ("alloc_size",
fntypeattrs);
+ if (alloc_size)
+   {
+ tree args = TREE_VALUE (alloc_size);
+ int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
+ /* For calloc only use the second argument.  */
+ if (TREE_CHAIN (args))
+   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
(args))) - 1;
+ tree arg = CALL_EXPR_ARG (rhs, idx);
+ if (TREE_CODE (arg) == INTEGER_CST
+ && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
+warning_at (location, OPT_Walloc_type, "allocation of
"
+"insufficient size %qE for type %qT with
"
+"size %qE", arg, ttl, TYPE_SIZE_UNIT
(ttl));
+   }
+   }
+


Wouldn't this be much more useful in later phases with ranger feedback 
like with the warn_access warnings?  That way the comparison won't be 
limited to constant sizes.


Thanks,
Sid


Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)  (1U << (c))
> +   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> + n = COUNT(n, 0);
> + n = COUNT(n, 1);
> + n = COUNT(n, 2);
> + n = COUNT(n, 3);
> + n = COUNT(n, 4);
> +   //n = COUNT(n, 5);  // uncomment this line for 64-bit integers
> + return n;
> +   #undef COUNT
> +   #undef MASK
> +   #undef POW2
> +   }

That's quite a heavy implementation but I suppose with the proper cost
function it can still be worth it.  Did you also try some alternatives?
WWG comes to mind:

uint64_t c1 = 0x5555555555555555;
uint64_t c2 = 0x3333333333333333;
uint64_t c4 = 0x0F0F0F0F0F0F0F0F;

uint64_t wwg (uint64_t x) {
x -= (x >> 1) & c1;
x = ((x >> 2) & c2) + (x & c2);
x = (x + (x >> 4) ) & c4;
x *= 0x0101010101010101;
return x >> 56;
}

From my recollection this is usually 30-40% faster than the naive tree
adder and also amenable to vectorization.  As long as the multiplication
is not terribly slow, that is.  Mula's algorithm should be significantly
faster even, another 30% IIRC.
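
(A quick sanity check of the wwg routine above against the builtin --
just a sketch, not part of the patch:)

#include <stdint.h>
#include <assert.h>

int
main (void)
{
  uint64_t vals[] = { 0, 1, 0xff, 0x8000000000000000ull, ~0ull };
  for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++)
    assert (wwg (vals[i]) == (uint64_t) __builtin_popcountll (vals[i]));
  return 0;
}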

I'm not against continuing with the more well-known approach for now
but we should keep in mind that might still be potential for improvement.

>  } // namespace riscv_vector
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c

Any particular reason why the tests are in widen?

> +extern void abort (void) __attribute__ ((noreturn));

Why no __builtin_unreachable as in the other tests? 

> +  asm volatile ("" ::: "memory");

Is this necessary?  It doesn't hurt of course, just wondering.

All in all LGTM in case you'd rather get this upstream now.  We can
always improve later.

Regards
 Robin


Re: [PATCH RESEND] c: add -Wmissing-variable-declarations [PR65213]

2023-07-31 Thread Joseph Myers
On Mon, 31 Jul 2023, Hamza Mahfooz wrote:

> Hey Joseph,
> 
> On Fri, Jul 28 2023 at 08:32:31 PM +00:00:00, Joseph Myers
>  wrote:
> > > OK.
> > 
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
> 
> Since I don't have write access, do you mind pushing this for me?

Done.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: One question on the source code of tree-object-size.cc

2023-07-31 Thread Qing Zhao via Gcc-patches


> On Jul 31, 2023, at 2:23 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-07-31 14:13, Qing Zhao wrote:
>> Okay. I see.
>> Then if the size info from the TYPE is smaller than the size info from the 
>> malloc,
>>  then based on the current code, we use the smaller one between these two,
>>  i.e., the size info from the TYPE.  (Even for the OST_MAXIMUM).
>> Is such behavior correct?
> 
> Yes, it's correct even for OST_MAXIMUM.  The smaller one between the two is 
> the more precise estimate, which is why the mode doesn't matter.
> 
>> This is for the new “counted_by” attribute and how to use it in 
>> __builtin_dynamic_object_size.
>> for example:
>> ===
>> struct annotated {
>> size_t foo;
>> int array[] __attribute__((counted_by (foo)));
>> };
>> #define noinline __attribute__((__noinline__))
>> #define SIZE_BUMP 2
>> /* in the following function, malloc allocated more space than the value
>>of counted_by attribute.  Then what's the correct behavior we expect
>>the __builtin_dynamic_object_size should have?  */
>> static struct annotated * noinline alloc_buf (int index)
>> {
>>   struct annotated *p;
>>   p = malloc(sizeof (*p) + (index + SIZE_BUMP) * sizeof (int));
>>   p->foo = index;
>>   /*when checking the observed access p->array, we have info on both
>> observed allocation and observed access,
>> A. from observed allocation: (index + SIZE_BUMP) * sizeof (int)
>> B. from observed access: p->foo * sizeof (int)
>> in the above, p->foo = index.
>>*/
>>   /* for MAXIMUM size, based on the current code, we will use the size info 
>> from the TYPE,
>>  i.e., the “counted_by” attribute, which is the smaller one.   */
>>   expect(__builtin_dynamic_object_size(p->array, 1), (p->foo) * sizeof(int));
> 
> If the counted_by is less than what is allocated, it is the more correct 
> value to return because that's what the application asked for through the 
> attribute.  If the allocated size is less, we return the allocated size 
> because in that case, despite what the application said, the actual allocated 
> size is less and hence that's the safer value.

Thanks a lot for the clear explanation. This makes good sense.
> 
> In fact in the latter case it may even make sense to emit a warning because 
> it is more likely than not to be a bug.

Agreed. 

Here is a patch from Martin on a new similar warning (-Walloc-type):  
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625172.html. 

I guess that I will also need to issue warning for such cases for the new 
attribute “counted_by”.

Qing

> Thanks,
> Sid



Re: One question on the source code of tree-object-size.cc

2023-07-31 Thread Siddhesh Poyarekar

On 2023-07-31 14:13, Qing Zhao wrote:

Okay. I see.

Then if the size info from the TYPE is smaller than the size info from the 
malloc,
  then based on the current code, we use the smaller one between these two,
  i.e., the size info from the TYPE.  (Even for the OST_MAXIMUM).

Is such behavior correct?


Yes, it's correct even for OST_MAXIMUM.  The smaller one between the two 
is the more precise estimate, which is why the mode doesn't matter.




This is for the new “counted_by” attribute and how to use it in 
__builtin_dynamic_object_size.
for example:

===

struct annotated {
 size_t foo;
 int array[] __attribute__((counted_by (foo)));
};

#define noinline __attribute__((__noinline__))
#define SIZE_BUMP 2

/* in the following function, malloc allocated more space than the value
of counted_by attribute.  Then what's the correct behavior we expect
the __builtin_dynamic_object_size should have?  */

static struct annotated * noinline alloc_buf (int index)
{
   struct annotated *p;
   p = malloc(sizeof (*p) + (index + SIZE_BUMP) * sizeof (int));
   p->foo = index;

   /*when checking the observed access p->array, we have info on both
 observed allocation and observed access,
 A. from observed allocation: (index + SIZE_BUMP) * sizeof (int)
 B. from observed access: p->foo * sizeof (int)

 in the above, p->foo = index.
*/

   /* for MAXIMUM size, based on the current code, we will use the size info 
from the TYPE,
  i.e., the “counted_by” attribute, which is the smaller one.   */
   expect(__builtin_dynamic_object_size(p->array, 1), (p->foo) * sizeof(int));


If the counted_by is less than what is allocated, it is the more correct 
value to return because that's what the application asked for through 
the attribute.  If the allocated size is less, we return the allocated 
size because in that case, despite what the application said, the actual 
allocated size is less and hence that's the safer value.


In fact in the latter case it may even make sense to emit a warning 
because it is more likely than not to be a bug.
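
For instance (a sketch reusing struct annotated from above; no such
warning is implemented yet):

static struct annotated *
alloc_too_small (void)
{
  struct annotated *q = malloc (sizeof (*q) + 4 * sizeof (int));
  q->foo = 10;  /* counted_by claims 10 elements but only 4 were
                   allocated; most likely a bug worth diagnosing */
  return q;
}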


Thanks,
Sid


Re: One question on the source code of tree-object-size.cc

2023-07-31 Thread Qing Zhao via Gcc-patches
Hi, Sid,

Thanks a lot.

> On Jul 31, 2023, at 1:07 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-07-31 13:03, Siddhesh Poyarekar wrote:
>> On 2023-07-31 12:47, Qing Zhao wrote:
>>> Hi, Sid and Jakub,
>>> 
>>> I have a question in the following source portion of the routine 
>>> “addr_object_size” of gcc/tree-object-size.cc:
>>> 
>>>   743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
>>>   744   if (bytes != error_mark_node)
>>>   745 {
>>>   746   bytes = size_for_offset (var_size, bytes);
>>>   747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
>>> MEM_REF)
>>>   748 {
>>>   749   tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 
>>> 0),
>>>   750pt_var);
>>>   751   if (bytes2 != error_mark_node)
>>>   752 {
>>>   753   bytes2 = size_for_offset (pt_var_size, bytes2);
>>>   754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
>>>   755 }
>>>   756 }
>>>   757 }
>>> 
>>> At line 754, why we always use “MIN_EXPR” whenever it’s for OST_MINIMUM or 
>>> not?
>>> Shall we use
>>> 
>>> (object_size_type & OST_MINIMUM
>>>  ? MIN_EXPR : MAX_EXPR)
>>> 
>> That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations like 
>> this:
>> typedef struct
>> {
>>   int a;
>> } A;
>> size_t f()
>> {
>>   A *p = malloc (1);
>>   return __builtin_object_size (p, 0);
> 
> Correction, that should be __builtin_object_size (&p->a, 0)

Okay. I see.

Then if the size info from the TYPE is smaller than the size info from the 
malloc,
 then based on the current code, we use the smaller one between these two,
 i.e., the size info from the TYPE.  (Even for the OST_MAXIMUM). 

Is such behavior correct?

This is for the new “counted_by” attribute and how to use it in 
__builtin_dynamic_object_size. 
for example:

===

struct annotated {
size_t foo;
int array[] __attribute__((counted_by (foo)));
};

#define noinline __attribute__((__noinline__))
#define SIZE_BUMP 2 

/* in the following function, malloc allocated more space than the value
   of counted_by attribute.  Then what's the correct behavior we expect
   the __builtin_dynamic_object_size should have?  */

static struct annotated * noinline alloc_buf (int index)
{
  struct annotated *p;
  p = malloc(sizeof (*p) + (index + SIZE_BUMP) * sizeof (int));
  p->foo = index;

  /*when checking the observed access p->array, we have info on both
 observed allocation and observed access, 
A. from observed allocation: (index + SIZE_BUMP) * sizeof (int)
B. from observed access: p->foo * sizeof (int)

in the above, p->foo = index.
   */

  /* for MAXIMUM size, based on the current code, we will use the size info 
from the TYPE, 
 i.e., the “counted_by” attribute, which is the smaller one.   */
  expect(__builtin_dynamic_object_size(p->array, 1), (p->foo) * sizeof(int));

  return p;
}


Is the above the correct behavior?

thanks.

Qing
> 
>> }
>> where the returned size should be 1 and not sizeof (int).  The mode doesn't 
>> really matter in this case.
>> HTH.
>> Sid



[PATCH 2/2] Slightly improve bitwise_inverted_equal_p comparisons

2023-07-31 Thread Andrew Pinski via Gcc-patches
This slightly improves bitwise_inverted_equal_p
for comparisons. Instead of just comparing the
comparisons' operands, also valueize them.
This will allow ccp and others to match the 2 comparisons
without an extra pass happening.
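
An illustration of the kind of case this catches (my example, not a
testcase from the patch):

_Bool f (int x)
{
  int y = x;           /* y valueizes to x */
  _Bool c = x < 10;
  _Bool d = y >= 10;   /* now recognized as the inverse of c */
  return c & d;        /* folds to 0 without waiting for another pass */
}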

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Valueize
the comparison operands before comparing them.
---
 gcc/gimple-match-head.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 0265e55be93..b1e96304d7c 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -319,12 +319,12 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, 
tree (*valueize) (tree)
   && TREE_CODE_CLASS (gimple_assign_rhs_code (a1)) == tcc_comparison
   && TREE_CODE_CLASS (gimple_assign_rhs_code (a2)) == tcc_comparison)
 {
-  tree op10 = gimple_assign_rhs1 (a1);
-  tree op20 = gimple_assign_rhs1 (a2);
+  tree op10 = do_valueize (valueize, gimple_assign_rhs1 (a1));
+  tree op20 = do_valueize (valueize, gimple_assign_rhs1 (a2));
   if (!operand_equal_p (op10, op20))
 return false;
-  tree op11 = gimple_assign_rhs2 (a1);
-  tree op21 = gimple_assign_rhs2 (a2);
+  tree op11 = do_valueize (valueize, gimple_assign_rhs2 (a1));
+  tree op21 = do_valueize (valueize, gimple_assign_rhs2 (a2));
   if (!operand_equal_p (op11, op21))
 return false;
   if (invert_tree_comparison (gimple_assign_rhs_code (a1),
-- 
2.31.1



[PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-07-31 Thread Andrew Pinski via Gcc-patches
This is a simple patch to move these 2 patterns over to use
bitwise_inverted_equal_p. It also allows us to remove 2 other patterns
which were used on comparisons as they are now handled by
the original pattern.
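
For illustration (my examples, not from the testsuite), both of these
now go through the one generic pattern:

int f (int x)
{
  return ~x | x;              /* -> -1 */
}

_Bool g (int a, int b)
{
  return (a < b) | (a >= b);  /* -> 1, formerly the range-test patterns */
}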

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`~X & X`, `~X | X`): Move over to
use bitwise_inverted_equal_p, removing :c as bitwise_inverted_equal_p
handles that already.
Remove range test simplifications to true/false as they
are now handled by these patterns.
---
 gcc/match.pd | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 74f0a84f31d..7d030262698 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simplify ~X & X as zero.  */
 (simplify
- (bit_and:c (convert? @0) (convert? (bit_not @0)))
-  { build_zero_cst (type); })
+ (bit_and (convert? @0) (convert? @1))
+ (if (bitwise_inverted_equal_p (@0, @1))
+  { build_zero_cst (type); }))
 
 /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
 (simplify
@@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~x ^ x -> -1 */
 (for op (bit_ior bit_xor)
  (simplify
-  (op:c (convert? @0) (convert? (bit_not @0)))
-  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
+  (op (convert? @0) (convert? @1))
+  (if (bitwise_inverted_equal_p (@0, @1))
+   (convert { build_all_ones_cst (TREE_TYPE (@0)); }
 
 /* x ^ x -> 0 */
 (simplify
@@ -5994,24 +5996,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1))
  @2)
 
-/* Simple range test simplifications.  */
-/* A < B || A >= B -> true.  */
-(for test1 (lt le le le ne ge)
- test2 (ge gt ge ne eq ne)
- (simplify
-  (bit_ior:c (test1 @0 @1) (test2 @0 @1))
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
-   || VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
-   { constant_boolean_node (true, type); })))
-/* A < B && A >= B -> false.  */
-(for test1 (lt lt lt le ne eq)
- test2 (ge gt eq gt eq gt)
- (simplify
-  (bit_and:c (test1 @0 @1) (test2 @0 @1))
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
-   || VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
-   { constant_boolean_node (false, type); })))
-
 /* A & (2**N - 1) <= 2**K - 1 -> A & (2**N - 2**K) == 0
A & (2**N - 1) >  2**K - 1 -> A & (2**N - 2**K) != 0
 
-- 
2.31.1



Re: [PATCH 2/2] MATCH: Add `a == b | a cmp b` and `a != b & a cmp b` simplifications

2023-07-31 Thread Andrew Pinski via Gcc-patches
On Mon, Jul 31, 2023 at 3:53 AM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, Jul 31, 2023 at 7:35 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > Even though these are done by combine_comparisons, we can add them to match
> > to allow simplifcations during match rather than just during 
> > reassoc/ifcombine.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> OK.  Note we want to get rid of the GENERIC folding parts of
> maybe_fold_{and,or}_comparisons which is what those passes rely on.

Yes that was the idea here but there are still a few more cases that
need to be handled: Floating point, and some integer constants due to
`a <= 1` changing to `a < 0`.

Thanks,
Andrew

>
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR tree-optimization/106164
> > * match.pd (`a != b & a <= b`, `a != b & a >= b`,
> > `a == b | a < b`, `a == b | a > b`): Handle these cases
> > too.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/106164
> > * gcc.dg/tree-ssa/cmpbit-2.c: New test.
> > ---
> >  gcc/match.pd | 32 +--
> >  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c | 39 
> >  2 files changed, 69 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 00af5d99119..cf8057701ea 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -2832,7 +2832,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (switch
> >(if (code1 == EQ_EXPR && val) @3)
> >(if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, 
> > type); })
> > -  (if (code1 == NE_EXPR && !val) @4)))
> > +  (if (code1 == NE_EXPR && !val) @4)
> > +  (if (code1 == NE_EXPR
> > +   && code2 == GE_EXPR
> > +  && cmp == 0)
> > +   (gt @0 @1))
> > +  (if (code1 == NE_EXPR
> > +   && code2 == LE_EXPR
> > +  && cmp == 0)
> > +   (lt @0 @1))
> > + )
> > +)
> > +   )
> > +  )
> > + )
> > +)
> >
> >  /* Convert (X OP1 CST1) && (X OP2 CST2).
> > Convert (X OP1 Y) && (X OP2 Y).  */
> > @@ -2917,7 +2931,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (switch
> >(if (code1 == EQ_EXPR && val) @4)
> >(if (code1 == NE_EXPR && val) { constant_boolean_node (true, type); 
> > })
> > -  (if (code1 == NE_EXPR && !val) @3)))
> > +  (if (code1 == NE_EXPR && !val) @3)
> > +  (if (code1 == EQ_EXPR
> > +   && code2 == GT_EXPR
> > +  && cmp == 0)
> > +   (ge @0 @1))
> > +  (if (code1 == EQ_EXPR
> > +   && code2 == LT_EXPR
> > +  && cmp == 0)
> > +   (le @0 @1))
> > + )
> > +)
> > +   )
> > +  )
> > + )
> > +)
> >
> >  /* Convert (X OP1 CST1) || (X OP2 CST2).
> > Convert (X OP1 Y)|| (X OP2 Y).  */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
> > new file mode 100644
> > index 000..c4226ef01af
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fno-tree-reassoc -fdump-tree-optimized-raw" } */
> > +
> > +_Bool f(int a, int b)
> > +{
> > +  _Bool c = a == b;
> > +  _Bool d = a > b;
> > +  return c | d;
> > +}
> > +
> > +_Bool f1(int a, int b)
> > +{
> > +  _Bool c = a != b;
> > +  _Bool d = a >= b;
> > +  return c & d;
> > +}
> > +
> > +_Bool g(int a, int b)
> > +{
> > +  _Bool c = a == b;
> > +  _Bool d = a < b;
> > +  return c | d;
> > +}
> > +
> > +_Bool g1(int a, int b)
> > +{
> > +  _Bool c = a != b;
> > +  _Bool d = a <= b;
> > +  return c & d;
> > +}
> > +
> > +
> > +/* We should be able to optimize these without reassociation too. */
> > +/* { dg-final { scan-tree-dump-not "bit_and_expr," "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not "bit_ior_expr," "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "gt_expr," 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "ge_expr," 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "lt_expr," 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "le_expr," 1 "optimized" } } */
> > --
> > 2.31.1
> >


[COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-07-31 Thread Andrew Pinski via Gcc-patches
This is a new version of the patch.
Instead of doing the matching of the inverted comparison directly inside
match, create a new function (bitwise_inverted_equal_p) to do it.
It is very similar to bitwise_equal_p that was added in 
r14-2751-g2a3556376c69a1fb
but instead it says `expr1 == ~expr2`. A follow on patch, will
use this function in other patterns where we try to match `@0` and `(bit_not 
@0)`.
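
As a sketch of what this enables (my example, in the spirit of the new
bitops-3.c test rather than copied from it):

_Bool f (int x, int y, _Bool z)
{
  _Bool a = x < y;
  _Bool b = x >= y;     /* recognized as ~a */
  return (b | z) & a;   /* (~X | Y) & X -> X & Y, i.e. z & a */
}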

Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.

Committed as approved after a bootstrap and test on x86_64-linux-gnu with no
regressions.

PR tree-optimization/100864

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): New function.
* gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
(gimple_bitwise_inverted_equal_p): New function.
* match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
instead of direct matching bit_not.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-3.c: New test.
---
 gcc/generic-match-head.cc| 42 ++
 gcc/gimple-match-head.cc | 71 
 gcc/match.pd |  5 +-
 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
 4 files changed, 183 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index a71c0727b0b..ddaf22f2179 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
 return wi::to_wide (expr1) == wi::to_wide (expr2);
   return operand_equal_p (expr1, expr2, 0);
 }
+
+/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
+   but not necessarily same type.
+   The types can differ through nop conversions.  */
+
+static inline bool
+bitwise_inverted_equal_p (tree expr1, tree expr2)
+{
+  STRIP_NOPS (expr1);
+  STRIP_NOPS (expr2);
+  if (expr1 == expr2)
+return false;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  if (operand_equal_p (expr1, expr2, 0))
+return false;
+  if (TREE_CODE (expr1) == BIT_NOT_EXPR
+  && bitwise_equal_p (TREE_OPERAND (expr1, 0), expr2))
+return true;
+  if (TREE_CODE (expr2) == BIT_NOT_EXPR
+  && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
+return true;
+  if (COMPARISON_CLASS_P (expr1)
+  && COMPARISON_CLASS_P (expr2))
+{
+  tree op10 = TREE_OPERAND (expr1, 0);
+  tree op20 = TREE_OPERAND (expr2, 0);
+  if (!operand_equal_p (op10, op20))
+   return false;
+  tree op11 = TREE_OPERAND (expr1, 1);
+  tree op21 = TREE_OPERAND (expr2, 1);
+  if (!operand_equal_p (op11, op21))
+   return false;
+  if (invert_tree_comparison (TREE_CODE (expr1),
+ HONOR_NANS (op10))
+ == TREE_CODE (expr2))
+   return true;
+}
+  return false;
+}
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 5d6d26d009b..0265e55be93 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -263,3 +263,74 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree 
(*valueize) (tree))
 return true;
   return false;
 }
+
+/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
+   but not necessarily same type.
+   The types can differ through nop conversions.  */
+#define bitwise_inverted_equal_p(expr1, expr2) \
+  gimple_bitwise_inverted_equal_p (expr1, expr2, valueize)
+
+/* Helper function for bitwise_inverted_equal_p macro.  */
+
+static inline bool
+gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) 
(tree))
+{
+  if (expr1 == expr2)
+return false;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  if (operand_equal_p (expr1, expr2, 0))
+return false;
+
+  tree other;
+  if (gimple_nop_convert (expr1, &other, valueize)
+  && gimple_bitwise_inverted_equal_p (other, expr2, valueize))
+return true;
+
+  if (gimple_nop_convert (expr2, &other, valueize)
+  && gimple_bitwise_inverted_equal_p (expr1, other, valueize))
+return true;
+
+  if (TREE_CODE (expr1) != SSA_NAME
+  || TREE_CODE (expr2) != SSA_NAME)
+return false;
+
+  gimple *d1 = get_def (valueize, expr1);
+  gassign *a1 = safe_dyn_cast <gassign *> (d1);
+  gimple *d2 = get_def (valueize, expr2);
+  gassign *a2 = safe_dyn_cast <gassign *> (d2);
+  if (a1
+  && gimple_assign_rhs_code (a1) == BIT_NOT_EXPR
+  && gimple_bitwise_equal_p (do_valueize (valueize,
+ gimple_assign_rhs1 (a1)),
+expr2, valueize))
+   return true;
+  if 

Re: One question on the source code of tree-object-size.cc

2023-07-31 Thread Siddhesh Poyarekar

On 2023-07-31 13:03, Siddhesh Poyarekar wrote:

On 2023-07-31 12:47, Qing Zhao wrote:

Hi, Sid and Jakub,

I have a question in the following source portion of the routine 
“addr_object_size” of gcc/tree-object-size.cc:


  743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
  744   if (bytes != error_mark_node)
  745 {
  746   bytes = size_for_offset (var_size, bytes);
  747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) 
== MEM_REF)

  748 {
  749   tree bytes2 = compute_object_offset (TREE_OPERAND 
(ptr, 0),

  750    pt_var);
  751   if (bytes2 != error_mark_node)
  752 {
  753   bytes2 = size_for_offset (pt_var_size, bytes2);
  754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
  755 }
  756 }
  757 }

At line 754, why we always use “MIN_EXPR” whenever it’s for 
OST_MINIMUM or not?

Shall we use

(object_size_type & OST_MINIMUM
 ? MIN_EXPR : MAX_EXPR)



That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations 
like this:


typedef struct
{
   int a;
} A;

size_t f()
{
   A *p = malloc (1);

   return __builtin_object_size (p, 0);


Correction, that should be __builtin_object_size (&p->a, 0)


}

where the returned size should be 1 and not sizeof (int).  The mode 
doesn't really matter in this case.


HTH.

Sid



[PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-07-31 Thread Martin Jambor
Hi,

when IPA-SRA detects whether a parameter passed by reference is
written to, it does not special case CLOBBERs which means it often
bails out unnecessarily, especially when dealing with C++ destructors.
Fixed by the obvious continue in the two relevant loops.

The (slightly) more complex testcases in the PR need surprisingly more
effort, but the simple one can be fixed easily now by this patch, and I'll
work on the others incrementally.

Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
if it passes too?

Thanks,

Martin




gcc/ChangeLog:

2023-07-31  Martin Jambor  

PR ipa/110378
* ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
(ptr_parm_has_nonarg_uses): Likewise.

gcc/testsuite/ChangeLog:

2023-07-31  Martin Jambor  

PR ipa/110378
* g++.dg/ipa/pr110378-1.C: New test.
---
 gcc/ipa-sra.cc|  6 ++--
 gcc/testsuite/g++.dg/ipa/pr110378-1.C | 47 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C

diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index c35e03b7abd..edba364f56e 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -898,7 +898,8 @@ isra_track_scalar_value_uses (function *fun, cgraph_node 
*node, tree name,
 
   FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
 {
-  if (is_gimple_debug (stmt))
+  if (is_gimple_debug (stmt)
+ || gimple_clobber_p (stmt))
continue;
 
   /* TODO: We could handle at least const builtin functions like arithmetic
@@ -1056,7 +1057,8 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function 
*fun, tree parm,
   unsigned uses_ok = 0;
   use_operand_p use_p;
 
-  if (is_gimple_debug (stmt))
+  if (is_gimple_debug (stmt)
+ || gimple_clobber_p (stmt))
continue;
 
   if (gimple_assign_single_p (stmt))
diff --git a/gcc/testsuite/g++.dg/ipa/pr110378-1.C 
b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
new file mode 100644
index 000..aabe326b8b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-sra -fdump-tree-optimized-slim"  } */
+
+/* Test that even though destructors end with clobbering all of *this, it
+   should not prevent IPA-SRA.  */
+
+namespace {
+
+  class foo
+  {
+  public:
+int *a;
+foo(int c)
+{
+  a = new int[c];
+  a[0] = 4;
+}
+__attribute__((noinline)) ~foo();
+int f ()
+{
+  return a[0] + 1;
+}
+  };
+
+  volatile int v1 = 4;
+
+  __attribute__((noinline)) foo::~foo()
+  {
+delete[] a;
+return;
+  }
+
+
+}
+
+volatile int v2 = 20;
+
+int test (void)
+{
+  foo shouldnotexist(v2);
+  v2 = shouldnotexist.f();
+  return 0;
+}
+
+
+/* { dg-final { scan-ipa-dump "Will split parameter 0" "sra"  } } */
+/* { dg-final { scan-tree-dump-not "shouldnotexist" "optimized" } } */
-- 
2.41.0



Re: One question on the source code of tree-object-size.cc

2023-07-31 Thread Siddhesh Poyarekar

On 2023-07-31 12:47, Qing Zhao wrote:

Hi, Sid and Jakub,

I have a question in the following source portion of the routine 
“addr_object_size” of gcc/tree-object-size.cc:

  743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
  744   if (bytes != error_mark_node)
  745 {
  746   bytes = size_for_offset (var_size, bytes);
  747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
MEM_REF)
  748 {
  749   tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
  750pt_var);
  751   if (bytes2 != error_mark_node)
  752 {
  753   bytes2 = size_for_offset (pt_var_size, bytes2);
  754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
  755 }
  756 }
  757 }

At line 754, why we always use “MIN_EXPR” whenever it’s for OST_MINIMUM or not?
Shall we use

(object_size_type & OST_MINIMUM
 ? MIN_EXPR : MAX_EXPR)



That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations 
like this:


typedef struct
{
  int a;
} A;

size_t f()
{
  A *p = malloc (1);

  return __builtin_object_size (p, 0);
}

where the returned size should be 1 and not sizeof (int).  The mode 
doesn't really matter in this case.


HTH.

Sid


Re: [PATCH] gcc-ar: Handle response files properly [PR77576]

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/28/23 15:11, Joseph Myers wrote:

This patch is OK.
I fixed the whitespace errors in the patch as well as a couple minor 
ChangeLog entry items and pushed Costas's patch to the trunk.

jeff


Re: [PATCH] Read global value/mask in IPA.

2023-07-31 Thread Martin Jambor
Hello,

On Tue, Jul 18 2023, Aldy Hernandez wrote:
> On 7/17/23 15:14, Aldy Hernandez wrote:
>> Instead of reading the known zero bits in IPA, read the value/mask
>> pair which is available.
>> 
>> There is a slight change of behavior here.  I have removed the check
>> for SSA_NAME, as the ranger can calculate the range and value/mask for
>> INTEGER_CST.  This simplifies the code a bit, since there's no special
>> casing when setting the jfunc bits.  The default range for VR is
>> undefined, so I think it's safe just to check for undefined_p().
>
> Final round of tests revealed a regression for which I've adjusted the 
> testcase.
>
> It turns out g++.dg/ipa/pure-const-3.C fails because IPA can now pick up 
> value/mask from any pass that has an integrated ranger.  The test was 
> previously disabling evrp and CCP, but now VRP[12], jump threading, and 
> DOM can make value/mask adjustments visible to IPA so they must be 
> disabled as well.

So can this be then converted into a new testcase that would test that
we can now derive something we could not in the past?
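
Something along these lines, perhaps (my sketch, untested):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-ipa-cp" } */

static int __attribute__ ((noinline))
callee (int x)
{
  return x & 1;   /* bit 0 is known zero in the only caller */
}

int
caller (int y)
{
  return callee (y << 1);   /* value/mask from ranger: low bit clear */
}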

The patch is OK (but the testcase above is highly desirable).

Thanks for continuing to look at IPA-VR.

Martin


>
> We've run into these scenarios multiple times in the past-- any 
> improvements to the ranger pipeline causes everyone to get smarter, 
> making changes visible earlier in the pipeline.
>
> Aldy
> From e1dfd4d6b3d3bf09d55b6ea3ac732462c7030802 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Fri, 14 Jul 2023 12:38:16 +0200
> Subject: [PATCH] Read global value/mask in IPA.
>
> Instead of reading the known zero bits in IPA, read the value/mask
> pair which is available.
>
> There is a slight change of behavior here.  I have removed the check
> for SSA_NAME, as the ranger can calculate the range and value/mask for
> INTEGER_CST.  This simplifies the code a bit, since there's no special
> casing when setting the jfunc bits.  The default range for VR is
> undefined, so I think it's safe just to check for undefined_p().
>
> gcc/ChangeLog:
>
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global
>   value/mask.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/ipa/pure-const-3.C: Adjust for smarter value/mask being
>   read by ranger earlier than expected by test.
> ---
>  gcc/ipa-prop.cc | 18 --
>  gcc/testsuite/g++.dg/ipa/pure-const-3.C |  2 +-
>  2 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 5d790ff1265..4f6ed7b89bd 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>   }
>else
>   {
> -   if (TREE_CODE (arg) == SSA_NAME
> -   && param_type
> +   if (param_type
> && Value_Range::supports_type_p (TREE_TYPE (arg))
> && Value_Range::supports_type_p (param_type)
> && irange::supports_p (TREE_TYPE (arg))
> @@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>   gcc_assert (!jfunc->m_vr);
>   }
>  
> -  if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
> -   && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ())
>   {
> -   if (TREE_CODE (arg) == SSA_NAME)
> - ipa_set_jfunc_bits (jfunc, 0,
> - widest_int::from (get_nonzero_bits (arg),
> -   TYPE_SIGN (TREE_TYPE (arg;
> -   else
> - ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0);
> +   irange &r = as_a <irange> (vr);
> +   irange_bitmask bm = r.get_bitmask ();
> +   signop sign = TYPE_SIGN (TREE_TYPE (arg));
> +   ipa_set_jfunc_bits (jfunc,
> +   widest_int::from (bm.value (), sign),
> +   widest_int::from (bm.mask (), sign));
>   }
>else if (POINTER_TYPE_P (TREE_TYPE (arg)))
>   {
> diff --git a/gcc/testsuite/g++.dg/ipa/pure-const-3.C 
> b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> index b4a4673e86e..e43cf09af27 100644
> --- a/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> +++ b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp 
> -fdisable-tree-evrp"  } */
> +/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp 
> -fdisable-tree-evrp -fdisable-tree-vrp1 -fdisable-tree-vrp2 -fno-thread-jumps 
> -fno-tree-dominator-opts"  } */
>  int *ptr;
>  static int barvar;
>  static int b(int a);
> -- 
> 2.40.1


One question on the source code of tree-object-size.cc

2023-07-31 Thread Qing Zhao via Gcc-patches
Hi, Sid and Jakub,

I have a question in the following source portion of the routine 
“addr_object_size” of gcc/tree-object-size.cc:

 743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
 744   if (bytes != error_mark_node)
 745 {
 746   bytes = size_for_offset (var_size, bytes);
 747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
MEM_REF)
 748 {
 749   tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
 750pt_var);
 751   if (bytes2 != error_mark_node)
 752 {
 753   bytes2 = size_for_offset (pt_var_size, bytes2);
 754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
 755 }
 756 }
 757 }

At line 754, why we always use “MIN_EXPR” whenever it’s for OST_MINIMUM or not? 
Shall we use 

(object_size_type & OST_MINIMUM
? MIN_EXPR : MAX_EXPR)

Instead?

Thanks a lot for the help.

Qing

Re: [PING 3] [PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]

2023-07-31 Thread Joseph Myers
On Mon, 31 Jul 2023, Martin Uecker via Gcc-patches wrote:

> Joseph, I would appreciate if you could take a look at this?  
> 
> This fixes the remaining issues which requires me to turn the
> warnings off with -Wno-vla-parameter and -Wno-nonnull in my
> projects.

The front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Committed] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-07-31 Thread Patrick O'Neill

GCC 13.2 was released[2], so I merged the series now that the branch is unfrozen.

Thanks,
Patrick

[2] https://inbox.sourceware.org/gcc/ZMJeq%2FY5SN+7i8a+@tucnak/T/#u

On 7/25/23 11:01, Patrick O'Neill wrote:

Discussed during the weekly RISC-V GCC meeting[1] and pre-approved by
Jeff Law.
If there aren't any objections I'll commit this cherry-picked series
on Thursday (July 27th).

Patchset on trunk:
https://inbox.sourceware.org/gcc-patches/20230427162301.1151333-1-patr...@rivosinc.com/
First commit: f37a36bce81b50a43ec1613c1d08d803642f7506

Also includes bugfix from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109713
commit: 4bd434fbfc7865961a8e10d7e9601b28765ce7be

[1] 
https://inbox.sourceware.org/gcc/mhng-b7423fca-67ec-4ce4-9694-4e062632ceb0@palmer-ri-x1c9/T/#t

Martin Liska (1):
   riscv: fix error: control reaches end of non-void function

Patrick O'Neill (11):
   RISC-V: Eliminate SYNC memory models
   RISC-V: Enforce Libatomic LR/SC SEQ_CST
   RISC-V: Enforce subword atomic LR/SC SEQ_CST
   RISC-V: Enforce atomic compare_exchange SEQ_CST
   RISC-V: Add AMO release bits
   RISC-V: Strengthen atomic stores
   RISC-V: Eliminate AMO op fences
   RISC-V: Weaken LR/SC pairs
   RISC-V: Weaken mem_thread_fence
   RISC-V: Weaken atomic loads
   RISC-V: Table A.6 conformance tests

  gcc/config/riscv/riscv-protos.h   |   3 +
  gcc/config/riscv/riscv.cc |  66 --
  gcc/config/riscv/sync.md  | 196 --
  .../riscv/amo-table-a-6-amo-add-1.c   |  15 ++
  .../riscv/amo-table-a-6-amo-add-2.c   |  15 ++
  .../riscv/amo-table-a-6-amo-add-3.c   |  15 ++
  .../riscv/amo-table-a-6-amo-add-4.c   |  15 ++
  .../riscv/amo-table-a-6-amo-add-5.c   |  15 ++
  .../riscv/amo-table-a-6-compare-exchange-1.c  |   9 +
  .../riscv/amo-table-a-6-compare-exchange-2.c  |   9 +
  .../riscv/amo-table-a-6-compare-exchange-3.c  |   9 +
  .../riscv/amo-table-a-6-compare-exchange-4.c  |   9 +
  .../riscv/amo-table-a-6-compare-exchange-5.c  |   9 +
  .../riscv/amo-table-a-6-compare-exchange-6.c  |  10 +
  .../riscv/amo-table-a-6-compare-exchange-7.c  |   9 +
  .../gcc.target/riscv/amo-table-a-6-fence-1.c  |  14 ++
  .../gcc.target/riscv/amo-table-a-6-fence-2.c  |  15 ++
  .../gcc.target/riscv/amo-table-a-6-fence-3.c  |  15 ++
  .../gcc.target/riscv/amo-table-a-6-fence-4.c  |  15 ++
  .../gcc.target/riscv/amo-table-a-6-fence-5.c  |  15 ++
  .../gcc.target/riscv/amo-table-a-6-load-1.c   |  16 ++
  .../gcc.target/riscv/amo-table-a-6-load-2.c   |  17 ++
  .../gcc.target/riscv/amo-table-a-6-load-3.c   |  18 ++
  .../gcc.target/riscv/amo-table-a-6-store-1.c  |  16 ++
  .../gcc.target/riscv/amo-table-a-6-store-2.c  |  17 ++
  .../riscv/amo-table-a-6-store-compat-3.c  |  18 ++
  .../riscv/amo-table-a-6-subword-amo-add-1.c   |   9 +
  .../riscv/amo-table-a-6-subword-amo-add-2.c   |   9 +
  .../riscv/amo-table-a-6-subword-amo-add-3.c   |   9 +
  .../riscv/amo-table-a-6-subword-amo-add-4.c   |   9 +
  .../riscv/amo-table-a-6-subword-amo-add-5.c   |   9 +
  gcc/testsuite/gcc.target/riscv/pr89835.c  |   9 +
  libgcc/config/riscv/atomic.c  |   4 +-
  33 files changed, 563 insertions(+), 75 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-6.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-7.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-5.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-1.c
  create mode 100644 

[COMMITTED] PR tree-optimization/110582 - fur_list should not use the range vector for non-ssa, operands.

2023-07-31 Thread Andrew MacLeod via Gcc-patches
The fold_using_range operand fetching mechanism has a variety of modes.
The "normal" mechanism simply invokes the current or supplied
range_query to fetch the current range info for any ssa-names
used during the evaluation of the statement.


I also added support for fur_list, which allows a list of ranges to be
supplied that is used to satisfy ssa-names as they appear in the stmt.
Once the list is exhausted, it reverts to using the range query.


This allows us to fold a stmt using whatever values we want, i.e. for

a_2 = b_3 + c_4


I can call fold_stmt (r, stmt, [1,2], [4,5])

and a_2 would be calculated using [1,2] for the first ssa_name, and 
[4,5] for the second encountered name.  This allows us to manually fold 
stmts when we desire.
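
A rough sketch of what such a call could look like via the fold_range
overload in gimple-range-fold.h that takes a count and a vector of
vrange pointers (an assumption on my part; the value construction below
is purely illustrative, not code from the patch):

  // Fold a_2 = b_3 + c_4, supplying b_3 = [1,2] and c_4 = [4,5].
  int_range<1> op1 (build_int_cst (integer_type_node, 1),
                    build_int_cst (integer_type_node, 2));
  int_range<1> op2 (build_int_cst (integer_type_node, 4),
                    build_int_cst (integer_type_node, 5));
  vrange *vals[] = { &op1, &op2 };
  int_range_max r;
  if (fold_range (r, stmt, 2, vals))
    ;  // r now holds [5, 7] for a_2.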


There was a bug in the implementation of fur_list where it was using the 
supplied values for *any* encountered operand, not just ssa_names.


The PHI analyzer is the first consumer of the fur_list API, and was 
tripping over this.



     [local count: 1052266993]:
  # a_lsm.12_29 = PHI 
  iftmp.1_15 = 3 / a_lsm.12_29;

   [local count: 1063004408]:
  # iftmp.1_11 = PHI 
  # ivtmp_2 = PHI 
  ivtmp_36 = ivtmp_2 - 1;
  if (ivtmp_36 != 0)
    goto ; [98.99%]
  else
    goto ; [1.01%]

It determined that the initial value of iftmp.1_11 was [2, 2] (from the
edge 2->4), and that the only modifying statement is

iftmp.1_15 = 3 / a_lsm.12_29;

One of the things it tries to do is determine whether a few iterations
feeding the initial value and combining it with the result of the
statement converge, thus providing a complete initial range.  It uses
fold_range, supplying the value for the ssa-operand directly, but
tripped over the bug.


So for the first iteration, instead of calculating _15 = 3 / [2,2]
and coming up with [1,1], it was instead calculating [2,2]/VARYING
and coming up with [-2, 2].  The next pass of the iteration checker then
erroneously calculated [-2,2]/VARYING, the result was [-2,2],
convergence was achieved, and the initial value of the PHI was set to
[-2, 2] ... incorrectly.  And of course bad things happened.


This patch fixes fur_list::get_operand to check for an ssa-name before
pulling a value from the supplied list.  With this, no particularly
good starting value for the PHI node can be determined.


Andrew

From 914fa35a7f7db76211ca259606578193773a254e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 31 Jul 2023 10:08:51 -0400
Subject: [PATCH] fur_list should not use the range vector for non-ssa
 operands.

	gcc/
	PR tree-optimization/110582
	* gimple-range-fold.cc (fur_list::get_operand): Do not use the
	range vector for non-ssa names.

	gcc/testsuite/
	* gcc.dg/pr110582.c: New.
---
 gcc/gimple-range-fold.cc|  3 ++-
 gcc/testsuite/gcc.dg/pr110582.c | 18 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110582.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index d07246008f0..ab2d996c4eb 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -262,7 +262,8 @@ fur_list::fur_list (unsigned num, vrange **list, range_query *q)
 bool
fur_list::get_operand (vrange &r, tree expr)
 {
-  if (m_index >= m_limit)
+  // Do not use the vector for non-ssa-names, or if it has been emptied.
+  if (TREE_CODE (expr) != SSA_NAME || m_index >= m_limit)
 return m_query->range_of_expr (r, expr);
   r = *m_list[m_index++];
   gcc_checking_assert (range_compatible_p (TREE_TYPE (expr), r.type ()));
diff --git a/gcc/testsuite/gcc.dg/pr110582.c b/gcc/testsuite/gcc.dg/pr110582.c
new file mode 100644
index 000..ae0650d3ae7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110582.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp2" } */
+
+int a, b;
+int main() {
+  char c = a = 0;
+  for (; c != -3; c++) {
+int d = 2;
+d ^= 2 && a;
+b = a == 0 ? d : d / a;
+a = b;
+  }
+  for (; (1 + 95 << 24) + b + 1 + 686658714L + b - 2297271457;)
+;
+}
+
+/* { dg-final { scan-tree-dump-not "Folding predicate" "vrp2" } } */
+
-- 
2.40.1



Re: [PATCH] gcse: Extract reg pressure handling into separate file.

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/28/23 04:17, Robin Dapp via Gcc-patches wrote:

Hi,

this patch extracts the hoist-pressure handling from gcse and puts it
into a separate file so it can be used by other passes in the future.
No functional change and I also abstained from c++ifying the code.
The naming with the regpressure_ prefix might be a bit clunky for
now and I'm open to a better scheme.

Some minor helper functions are added that just encapsulate BB aux
data manipulation.  All of this is in preparation for fwprop to
use register pressure data if needed.

Bootstrapped and regtested on x86, aarch64 and power.

Regards
  Robin

 From 65e69834eeb08ba093786e386ac16797cec4d8a7 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Mon, 24 Jul 2023 16:25:38 +0200
Subject: [PATCH] gcse: Extract reg pressure handling into separate file.

This patch extracts the hoist-pressure handling from gcse into a separate
file so it can be used by other passes in the future.  No functional change.

gcc/ChangeLog:

* Makefile.in: Add regpressure.o.
* gcse.cc (struct bb_data): Move to regpressure.cc.
(BB_DATA): Ditto.
(get_regno_pressure_class): Ditto.
(get_pressure_class_and_nregs): Ditto.
(record_set_data): Ditto.
(update_bb_reg_pressure): Ditto.
(should_hoist_expr_to_dom): Ditto.
(hoist_code): Ditto.
(change_pressure): Ditto.
(calculate_bb_reg_pressure): Ditto.
(one_code_hoisting_pass): Ditto.
* gcse.h (single_set_gcse): Export single_set_gcse.
* regpressure.cc: New file.
* regpressure.h: New file.
OK.  Feel free to C++ify if you want now ;-)  Having a reasonably well 
encapsulated module to allow us to query register pressure seems like a 
step forward.


jeff


Re: [PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 04:53, Richard Biener via Gcc-patches wrote:

On Tue, 25 Jul 2023, Richard Biener wrote:


The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE.  Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment.  That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu with
all languages and {,-m32}.

OK?


Ping.


Thanks,
Richard.

PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.

OK
jeff


Re: [PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 04:54, Richard Biener via Gcc-patches wrote:

On Tue, 25 Jul 2023, Richard Biener wrote:


The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
passes?


Ping.


Thanks,
Richard.

PR rtl-optimization/110587
* lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
---
  gcc/lra-assigns.cc | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index b8582dcafff..d2ebcfd5056 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int 
try_only_hard_regno,
   r2 != NULL;
   r2 = r2->start_next)
{
- if (r2->regno >= lra_constraint_new_regno_start
+ if (live_pseudos_reg_renumber[r2->regno] < 0
+ && r2->regno >= lra_constraint_new_regno_start
  && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
- && live_pseudos_reg_renumber[r2->regno] < 0
  && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
   r2->regno);
- if (live_pseudos_reg_renumber[r2->regno] >= 0
- && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+ else if (live_pseudos_reg_renumber[r2->regno] >= 0
+  && rclass_intersect_p
+   [regno_allocno_class_array[r2->regno]])
sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
My biggest concern here would be r2->regno < 0  in the new code which 
could cause an OOB array reference in the first condition of the test.


Isn't that the point of the original ordering?  Test that r2->regno is 
reasonable before using it as an array index?


jeff


Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-31 Thread Joseph Myers
On Fri, 28 Jul 2023, Jason Merrill via Gcc-patches wrote:

> > Thanks, I had thought there could be a potential issue with needing to also
> > check cpp_get_options(pfile)->traditional. But looking at it more, there's
> > no
> > code path currently that can end up here in traditional mode, so yes we can
> > eliminate stream_tokens_to_preprocessor and just check flag_preprocess_only.
> > 
> > The attached simplified patch does this, bootstrap + regtest look good as
> > well.
> 
> LGTM, I'll let the C maintainers comment on the C parser change.

The C parser change is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Maciej W. Rozycki
On Mon, 31 Jul 2023, Jeff Law wrote:

> > That's a good suggestion! Thanks, let me try to apply it to my workflow :)
> I'm thinking that as part of the CI POC being done by RISE that the base AMI
> image ought to be gcc-13 based and that we should configure the toolchains we
> build with -enable-werror-always.
> 
> While we can't necessarily get every developer to embrace this workflow, we
> ought to be catching it quicker than we currently are.

 I wonder if we should enable the option by default, perhaps under certain 
conditions such as matching the build compiler version, for builds made 
from a Git checkout rather than a release tarball.  I suspect some people 
are simply not aware of this option.

  Maciej


Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Maciej W. Rozycki
On Mon, 31 Jul 2023, Kito Cheng wrote:

> >  I just configure with `--enable-werror-always', which we want to keep
> > our standards up to anyway,
> 
> I rely on the host GCC which is 11 relatively old compared to the
> trunk, so --enable-werror-always will get many -Wformat* warning :(

 If building a cross-compiler for upstream submissions or regression runs 
I always bootstrap a native compiler of the same checkout first and then 
use it for the build.  I think it's good practice, and it's needed for the 
Ada frontend anyway.  That's one way to avoid introducing warnings by 
chance, and it takes less than an hour to bootstrap native GCC on decent 
contemporary hardware (and then you don't have to be pedantic, and neither 
I am, and you can keep reusing an older native build for some time).

  Maciej


Re: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 06:14, Wang, Yanzhang wrote:

Thanks for your comments, Jeff and Robin


Is the mulh case somehow common or critical?

Well, I would actually back up even further.  What were the
circumstances that led to the mulh with a zero operand?


I think you both asked why we should add the mulh * 0 simplification.
Unfortunately, I have no benchmark to demonstrate how critical it is. We found
there are some cases that exist in simplify_binary_operation in simplify-rtx.cc
but do not work for the RISC-V backend. For example,

- mult * 0 exists, but RISC-V has the additional mulh * 0
- add + 0 / sub - 0 exists, but RISC-V has the additional (madc + adc) + 0
- ...

So we want to complement this so that the simplification covers more cases.
That's the basic idea behind these shortcut optimizations.
But the right place to handle this stuff is probably in the generic 
code, with a few exceptions.


So even if you don't have a benchmark, just having non-intrinsic/builtin 
code which triggers these cases would be helpful so that we can figure 
out the best place to fix this problem.  What I want to avoid is adding 
a bunch of patterns in the RISC-V backend for cases that are better 
handled by generic optimization passes.
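
For reference, the identity under discussion in plain C (a hedged
sketch: the __int128 widening stands in for RISC-V's mulh, and a 64-bit
long is assumed):

  /* mulh yields the high half of a widening multiply; the high half of
     x * 0 is 0 for every x, which is the simplification requested.  */
  long mulh (long a, long b)
  {
    return (long) (((__int128) a * (__int128) b) >> 64);
  }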




Jeff


Re: _BitInt vs. _Atomic

2023-07-31 Thread Martin Uecker
Am Montag, dem 31.07.2023 um 14:33 + schrieb Michael Matz:
> Hello,
> 
> On Fri, 28 Jul 2023, Martin Uecker wrote:
> 
> > > > Sorry, somehow I must be missing something here.
> > > > 
> > > > If you add something you would create a new value and this may (in
> > > > an object) have random new padding.  But the "expected" value should
> > > > be updated by a failed atomic_compare_exchange cycle and then have
> > > > same padding as the value stored in the atomic. So the next cycle
> > > > should succeed.  The user would not change the representation of
> > > > the "expected" value but create a new value for another object
> > > > by adding something.
> > > 
> > > You're right that it would pass the expected value not something after an
> > > operation on it usually.  But still, expected type will be something like
> > > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > > atomic_compare_exchange copies back on failure is guaranteed to have the
> > > padding bits preserved.
> > 
> > For atomic_load in C a value is returned. A value does not care about
> > padding and when stored into a new object can produce new and different
> > padding.  
> > 
> > But for atomic_compare_exchange the memory content is copied into 
> > an object passed by pointer, so here the C standard requires
> > that the padding is preserved. It explicitly states that the effect
> > is like:
> > 
> > if (memcmp(object, expected, sizeof(*object)) == 0)
> >   memcpy(object, &desired, sizeof(*object));
> > else
> >   memcpy(expected, object, sizeof(*object));
> > 
> > > It is true that if it is larger than 16 bytes the libatomic 
> > > atomic_compare_exchange will memcpy the value back which copies the 
> > > padding bits, but is there a guarantee that the user code doesn't 
> > > actually copy that value further into some other variable?  
> > 
> > I do not think it would be surprising for C user when
> > the next atomic_compare_exchange fails in this case.
> 
> But that is a problem (the same one the cited C++ paper tries to resolve, 
> IIUC). 

I do not quite understand the paper. I can't see
how the example 3 in

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0528r0.html

could loop indefinitely with the memcpy / memcmp semantics,
but somehow the authors seem to claim this. 


>  Say you have a loop like so:
> 
> _Atomic T obj;
> ...
> T expected1, expected2, newval;
> newval = ...;
> expected1 = ...;
> do {
>   expected2 = expected1;
>   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> break;
>   expected1 = expected2;
> } while (1);
> 
> As written this looks of course stupid, and you may say "don't do that", 
> but internally the copies might result from temporaries (compiler 
> generated or wrapper function arguments, or suchlike). 
>  Now, while 
> expected2 will contain the copied padding bits after the cmpxchg the 
> copies to and from expected1 will possibly destroy them.  Either way I 
> don't see why the above loop should be out-of-spec, so I can write it and 
> expect it to proceed eventually (certainly when the _strong variant is 
> used).  Any argument that would declare the above loop out-of-spec I would 
> consider a defect in the spec.

It is "out-of-spec" for C in the sense that it can not be
expected work with the semantics as specified in the C standard.

But I agree with you that it would be better if it just worked.

A compiler could, for example, always clear the padding when
initializing or storing atomic values.  It might also clear
the padding of the initial "expected", when it is
initialized or stored to.

But it should not clear / ignore the padding when copying 
to "expected" using atomic_compare_exhange or when comparing
to the memory content. See below why I think this would not
be helpful.

> 
> It's never a good idea to introduce reliance on padding bits.  Exactly 
> because you can trivially destroy them with simple value copies.
> 
> > > Anyway, for smaller or equal to 16 (or 8) bytes if 
> > > atomic_compare_exchange is emitted inline I don't see what would 
> > > preserve the bits.
> > 
> > This then seems to be incorrect for C.
> 
> Or the spec is.

In practice, what the semantics specified using memcpy/memcmp
allow one to do is to also apply atomic operations on non-atomic 
types.  This is not guaranteed to work by the C standard, but
in practice  people often have to do this.  For example, nobody
is going to copy a 256 GB numerical array with non-atomic types
into another data structure with atomic versions of the same
type just so that you can apply atomic operations on it. So
one simply does an unsafe cast and hopes the compiler does not
break this.

If the non-atomic struct now has non-zero values in the padding, 
and the compiler would clear those automatically for "expected", 
you would create the problem of an infinite loop (this time 
for real).
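
To make that concrete, here is a hedged sketch of the hazard (types and
names are illustrative, not from any patch):

  /* obj aliases pre-existing data whose padding is non-zero.  If the
     compiler canonicalized (zeroed) the padding of "expected" on every
     store, the memcmp inside the compare-exchange could never report
     equality:  */
  _Atomic T *obj = (_Atomic T *) ptr;   /* the unsafe cast from above */
  T expected = atomic_load (obj);
  while (!atomic_compare_exchange_strong (obj, &expected, newval))
    ;  /* failure copies *obj back into expected, but re-zeroing its
          padding on that store keeps the comparison failing forever */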


Martin









Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:

The following delays sinking of loads within the same innermost
loop when it was unconditional before.  That's a not uncommon
issue preventing vectorization when masked loads are not available.
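
As a hedged illustration of the kind of loop in question (not a
testcase from the patch):

  void f (int *a, const int *b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int t = a[i];    /* unconditional load */
        if (b[i])
          a[i] = t + 1;  /* t's only use; sinking the load here makes it
                            conditionally executed, needing a masked load */
      }
  }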

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I have a followup patch improving sinking that without this would
cause more of the problematic sinking - now that we have a second
sink pass after loop opts this looks like a reasonable approach?

OK?

Thanks,
Richard.

PR tree-optimization/92335
* tree-ssa-sink.cc (select_best_block): Before loop
optimizations avoid sinking unconditional loads/stores
in innermost loops to conditional executed places.

* gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
* gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
expect predictive commoning to happen instead of sinking.
* gcc.dg/vect/pr65947-3.c: Adjust.
I think it's reasonable -- there's probably going to be cases where it's 
not great, but more often than not I think it's going to be a reasonable 
heuristic.


If there is undesirable fallout, better to find it over the coming 
months than next spring.  So I'd suggest we go forward now to give more 
time to find any pathological cases (if they exist).


Jeff


Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 08:01, Richard Biener via Gcc-patches wrote:

The following makes sure to limit the shift operand when vectorizing
(short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
operand otherwise invokes undefined behavior.  When we determine
whether we can demote the operand we know we at most shift in the
sign bit so we can adjust the shift amount.

Note this has the possibility of un-CSEing common shift operands
as there's no good way to share pattern stmts between patterns.
We'd have to separately pattern recognize the definition.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Not sure about LSHIFT_EXPR, it probably has the same issue but
the fallback optimistic zero for out-of-range shifts is at least
"corrrect".  Not sure we ever try to demote rotates (probably not).

OK?

Thanks,
Richard.

PR tree-optimization/110838
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Adjust the shift operand of RSHIFT_EXPRs.

* gcc.dg/torture/pr110838.c: New testcase.
I'm not a fan of the asymmetric handling across RSHIFT/LSHIFT.  But if 
you think the asymmetry isn't a problem in practice, then I won't object.


Jeff


Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/31/23 08:52, Kito Cheng via Gcc-patches wrote:

On Mon, Jul 31, 2023 at 10:03 PM Maciej W. Rozycki  wrote:


On Mon, 31 Jul 2023, Kito Cheng wrote:


Sorry for disturbing, pushed a fix for that, and...added
-Werror=unused-variable to my build script to prevent that happen
again :(


  I just configure with `--enable-werror-always', which we want to keep
our standards up to anyway,


I rely on the host GCC which is 11 relatively old compared to the
trunk, so --enable-werror-always will get many -Wformat* warning :(


but if you find this infeasible for some
reason with your workflow, then there's always an option to grep for
warnings in the build log and diff that against the previous iteration.


That's a good suggestion! Thanks, let me try to apply it to my workflow :)
I'm thinking that as part of the CI POC being done by RISE that the base 
AMI image ought to be gcc-13 based and that we should configure the 
toolchains we build with -enable-werror-always.


While we can't necessarily get every developer to embrace this workflow, 
we ought to be catching it quicker than we currently are.


jeff


Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Kito Cheng via Gcc-patches
On Mon, Jul 31, 2023 at 10:03 PM Maciej W. Rozycki  wrote:
>
> On Mon, 31 Jul 2023, Kito Cheng wrote:
>
> > Sorry for disturbing, pushed a fix for that, and...added
> > -Werror=unused-variable to my build script to prevent that happen
> > again :(
>
>  I just configure with `--enable-werror-always', which we want to keep
> our standards up to anyway,

I rely on the host GCC which is 11 relatively old compared to the
trunk, so --enable-werror-always will get many -Wformat* warning :(

> but if you find this infeasible for some
> reason with your workflow, then there's always an option to grep for
> warnings in the build log and diff that against the previous iteration.

That's a good suggestion! Thanks, let me try to apply it to my workflow :)

>
>   Maciej


Re: [PATCH RESEND] c: add -Wmissing-variable-declarations [PR65213]

2023-07-31 Thread Hamza Mahfooz

Hey Joseph,

On Fri, Jul 28 2023 at 08:32:31 PM +00:00:00, Joseph Myers 
 wrote:

OK.


--
Joseph S. Myers
jos...@codesourcery.com


Since I don't have write access, do you mind pushing this for me?




Re: [PATCH] Read global value/mask in IPA.

2023-07-31 Thread Aldy Hernandez via Gcc-patches
PING * 2

On Tue, Jul 25, 2023 at 8:32 AM Aldy Hernandez  wrote:
>
> Ping
>
> On Mon, Jul 17, 2023, 15:14 Aldy Hernandez  wrote:
>>
>> Instead of reading the known zero bits in IPA, read the value/mask
>> pair which is available.
>>
>> There is a slight change of behavior here.  I have removed the check
>> for SSA_NAME, as the ranger can calculate the range and value/mask for
>> INTEGER_CST.  This simplifies the code a bit, since there's no special
>> casing when setting the jfunc bits.  The default range for VR is
>> undefined, so I think it's safe just to check for undefined_p().
>>
>> OK?
>>
>> gcc/ChangeLog:
>>
>> * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global
>> value/mask.
>> ---
>>  gcc/ipa-prop.cc | 18 --
>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
>> index 5d790ff1265..4f6ed7b89bd 100644
>> --- a/gcc/ipa-prop.cc
>> +++ b/gcc/ipa-prop.cc
>> @@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct 
>> ipa_func_body_info *fbi,
>> }
>>else
>> {
>> - if (TREE_CODE (arg) == SSA_NAME
>> - && param_type
>> + if (param_type
>>   && Value_Range::supports_type_p (TREE_TYPE (arg))
>>   && Value_Range::supports_type_p (param_type)
>>   && irange::supports_p (TREE_TYPE (arg))
>> @@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct 
>> ipa_func_body_info *fbi,
>> gcc_assert (!jfunc->m_vr);
>> }
>>
>> -  if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
>> - && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
>> +  if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ())
>> {
>> - if (TREE_CODE (arg) == SSA_NAME)
>> -   ipa_set_jfunc_bits (jfunc, 0,
>> -   widest_int::from (get_nonzero_bits (arg),
>> - TYPE_SIGN (TREE_TYPE 
>> (arg))));
>> - else
>> -   ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0);
>> + irange &r = as_a <irange> (vr);
>> + irange_bitmask bm = r.get_bitmask ();
>> + signop sign = TYPE_SIGN (TREE_TYPE (arg));
>> + ipa_set_jfunc_bits (jfunc,
>> + widest_int::from (bm.value (), sign),
>> + widest_int::from (bm.mask (), sign));
>> }
>>else if (POINTER_TYPE_P (TREE_TYPE (arg)))
>> {
>> --
>> 2.40.1
>>



Re: _BitInt vs. _Atomic

2023-07-31 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 28 Jul 2023, Martin Uecker wrote:

> > > Sorry, somehow I must be missing something here.
> > > 
> > > If you add something you would create a new value and this may (in
> > > an object) have random new padding.  But the "expected" value should
> > > be updated by a failed atomic_compare_exchange cycle and then have
> > > same padding as the value stored in the atomic. So the next cycle
> > > should succeed.  The user would not change the representation of
> > > the "expected" value but create a new value for another object
> > > by adding something.
> > 
> > You're right that it would pass the expected value not something after an
> > operation on it usually.  But still, expected type will be something like
> > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > atomic_compare_exchange copies back on failure is guaranteed to have the
> > padding bits preserved.
> 
> For atomic_load in C a value is returned. A value does not care about
> padding and when stored into a new object can produce new and different
> padding.  
> 
> But for atomic_compare_exchange the memory content is copied into 
> an object passed by pointer, so here the C standard requires
> that the padding is preserved. It explicitly states that the effect
> is like:
> 
> if (memcmp(object, expected, sizeof(*object)) == 0)
>   memcpy(object, &desired, sizeof(*object));
> else
>   memcpy(expected, object, sizeof(*object));
> 
> > It is true that if it is larger than 16 bytes the libatomic 
> > atomic_compare_exchange will memcpy the value back which copies the 
> > padding bits, but is there a guarantee that the user code doesn't 
> > actually copy that value further into some other variable?  
> 
> I do not think it would be surprising for C user when
> the next atomic_compare_exchange fails in this case.

But that is a problem (the same one the cited C++ paper tries to resolve, 
IIUC).  Say you have a loop like so:

_Atomic T obj;
...
T expected1, expected2, newval;
newval = ...;
expected1 = ...;
do {
  expected2 = expected1;
  if (atomic_compare_exchange_weak(&obj, &expected2, newval))
break;
  expected1 = expected2;
} while (1);

As written this looks of course stupid, and you may say "don't do that", 
but internally the copies might result from temporaries (compiler 
generated or wrapper function arguments, or suchlike).  Now, while 
expected2 will contain the copied padding bits after the cmpxchg the 
copies to and from expected1 will possibly destroy them.  Either way I 
don't see why the above loop should be out-of-spec, so I can write it and 
expect it to proceed eventually (certainly when the _strong variant is 
used).  Any argument that would declare the above loop out-of-spec I would 
consider a defect in the spec.

It's never a good idea to introduce reliance on padding bits.  Exactly 
because you can trivially destroy them with simple value copies.
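
For instance (an illustrative struct; any type with padding behaves the
same way):

  struct S { char c; int i; };  /* typically 3 padding bytes after c */

  void copy (struct S *dst, const struct S *src)
  {
    *dst = *src;  /* a value copy: dst's padding bytes are unspecified
                     and need not match src's object representation */
  }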

> > Anyway, for smaller or equal to 16 (or 8) bytes if 
> > atomic_compare_exchange is emitted inline I don't see what would 
> > preserve the bits.
> 
> This then seems to be incorrect for C.

Or the spec is.


Ciao,
Michael.


Re: Re: [PATCH] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread 钟居哲
Addressed the comments and fixed in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625870.html 
Ok for trunk?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-31 21:38
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support POPCOUNT auto-vectorization
On Mon, Jul 31, 2023 at 8:03 PM Juzhe-Zhong  wrote:
>
> This patch is inspired by "lowerCTPOP" in LLVM.
> Support popcount auto-vectorization by following LLVM approach.
> https://godbolt.org/z/3K3GzvY7f
>
> Before this patch:
>
> <source>:7:21: missed: couldn't vectorize loop
> <source>:8:14: missed: not vectorized: relevant stmt not supported: _5 = 
> __builtin_popcount (_4);
>
> After this patch:
>
> popcount_32:
> ble a2,zero,.L5
> li  t3,1431654400
> li  a7,858992640
> li  t1,252645376
> li  a6,16711680
> li  a3,65536
> addiw   t3,t3,1365
> addiw   a7,a7,819
> addiw   t1,t1,-241
> addiw   a6,a6,255
> addiw   a3,a3,-1
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v1,0(a1)
> vsetivli zero,4,e32,m1,ta,ma
> vsrl.vi v2,v1,1
> vand.vx v2,v2,t3
> vsub.vv v1,v1,v2
> vsrl.vi v2,v1,2
> vand.vx v2,v2,a7
> vand.vx v1,v1,a7
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,4
> vadd.vv v1,v1,v2
> vand.vx v1,v1,t1
> vsrl.vi v2,v1,8
> vand.vx v2,v2,a6
> slli a4,a5,2
> vand.vx v1,v1,a6
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,16
> vand.vx v1,v1,a3
> vand.vx v2,v2,a3
> vadd.vv v1,v1,v2
> vmv.v.v v1,v1
> vsetvli zero,a2,e32,m1,ta,ma
> sub a2,a2,a5
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (popcount<mode>2): New pattern.
> * config/riscv/riscv-protos.h (expand_popcount): New function.
> * config/riscv/riscv-v.cc (expand_popcount): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 13 +++
>  gcc/config/riscv/riscv-protos.h   |  1 +
>  gcc/config/riscv/riscv-v.cc   | 95 +++
>  .../riscv/rvv/autovec/widen/popcount-1.c  | 23 +
>  .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++
>  5 files changed, 182 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index b5152bc91fd..9d32b91bdca 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -922,6 +922,19 @@
>DONE;
>  })
>
> +;; 
> ---
> +;; - [INT] POPCOUNT.
> +;; 
> ---
> +
> +(define_expand "popcount<mode>2"
> +  [(match_operand:VI 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_popcount (operands);
> +  DONE;
> +})
> +
>  ;; 
> ---
>  ;;  [FP] Unary operations
>  ;; 
> ---
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a729db44c32..ae40fbb4b53 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
>  void expand_load_store (rtx *, bool);
>  void expand_gather_scatter (rtx *, bool);
>  void expand_cond_len_ternop (unsigned, rtx *);
> +void expand_popcount (rtx *);
>
>  /* Rounding mode bitfield for fixed point VXRM.  */
>  enum fixed_point_rounding_mode
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index c10e51b362e..b3caa4b188d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3614,4 +3614,99 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
> reduction_type type)
>emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
>  }
>
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)  (1U << (c))
> +   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> +   n = COUNT(n, 0);
> +   n = COUNT(n, 1);
> +   n = COUNT(n, 2);
> +   n = COUNT(n, 3);
> +   n = COUNT(n, 4);
> +   //  n = COUNT(n, 5);  // 

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread 钟居哲
Oh, thanks a lot.
I can test it in the RISC-V backend now.

But I have another question:
>> I'm a bit confused (but also by the existing mask code), whether
>>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
>>If if-conversion recognizes a .COND_ADD then we need to add nothing
>>for masking (that is, ifn == cond_fn already).  In your code above
>>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
>>isn't that the very same?!  So how come you in one case add two
>>and in the other add four args?
>>Please make sure to place gcc_unreachable () in each arm and check
>>you have test coverage.  I believe that the else arm is unreachable
>>but when you vectorize .FMA you will need to add 4 and when you
>>vectorize .COND_FMA you will need to add two arguments (as said,
>>no idea why we special case reduc_idx >= 0 at the moment).

Do you mean I should add gcc_unreachable in the else, like this:

  if (len_loop_p)
{
  if (len_opno >= 0)
{
  ifn = cond_len_fn;
  /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
  vect_nargs += 2;
}
  else if (reduc_idx >= 0)
{
  /* FMA -> COND_LEN_FMA takes 4 extra arguments:MASK,ELSE,LEN,BIAS.  */
  ifn = get_len_internal_fn (cond_fn);
  vect_nargs += 4;
}
  else
    gcc_unreachable ();
}

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-31 21:58
To: 钟居哲
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Mon, 31 Jul 2023, ??? wrote:
 
> Yeah. I have tried this case too.
> 
> But this case doesn't need to be vectorized as COND_FMA, am I right?
 
Only when you enable loop masking.  Alternatively use
 
double foo (double *a, double *b, double *c)
{
  double result = 0.0;
  for (int i = 0; i < 1024; ++i)
result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
  return result;
}
 
but then for me if-conversion produces
 
  iftmp.0_18 = __builtin_fma (_8, _10, _5);
  _ifc__43 = _26 ? iftmp.0_18 : 0.0;
 
with -ffast-math (probably rightfully so).  I then get .FMAs
vectorized and .COND_FMA folded.
 
> The thing I wonder is whether this condition:
> 
> if  (mask_opno >= 0 && reduc_idx >= 0)
> 
> or similar as len
> if  (len_opno >= 0 && reduc_idx >= 0)
> 
> Whether they are redundant in vectorizable_call ?
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 21:33
> To: juzhe.zh...@rivai.ai
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > >> I think you need to use fma from math.h together with -ffast-math
> > >>to get fma.
> > 
> > As you said, this is one of the case I tried:
> > https://godbolt.org/z/xMzrrv5dT 
> > GCC failed to vectorize.
> > 
> > Could you help me with this?
>  
> double foo (double *a, double *b, double *c)
> {
>   double result = 0.0;
>   for (int i = 0; i < 1024; ++i)
> result += __builtin_fma (a[i], b[i], c[i]);
>   return result;
> }
>  
> with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
>  
> ...
>   vect__9.13_27 = MEM <vector(4) double> [(double *)vectp_a.11_29];
>   _9 = *_8;
>   vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
>   vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
> ...
>  
> but ifcvt still shows
>  
>   _9 = *_8;
>   _10 = __builtin_fma (_7, _9, _4);
>   result_17 = _10 + result_20;
>  
> still vectorizable_call has IFN_FMA with
>  
>   /* First try using an internal function.  */
>   code_helper convert_code = MAX_TREE_CODES;
>   if (cfn != CFN_LAST
>   && (modifier == NONE
>   || (modifier == NARROW
>   && simple_integer_narrowing (vectype_out, vectype_in,
>_code
> ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>  
> from CFN_BUILT_IN_FMA
>  
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 20:00
> > To: juzhe.zh...@rivai.ai
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Ok . Thanks Richard.
> > > 
> > > Could you give me a case that SVE can vectorize a reduction with FMA?
> > > Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> > > 
> > > I tried many times to reproduce such cases but I failed.
> >  
> > I think you need to use fma from math.h together with -ffast-math
> > to get fma.
> >  
> > Richard.
> >  
> > > Thanks.
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Sandiford
> > > Date: 2023-07-31 18:19
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; rguenther
> > > Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > > Juzhe-Zhong  

[PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Juzhe-Zhong
This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by following the LLVM approach.
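
For reference, a worked trace of the parallel reduction on one byte,
n = 0xF0 (an illustration, not part of the patch):

  COUNT(n,0): (0xF0 & 0x55) + ((0xF0 >> 1) & 0x55) = 0x50 + 0x50 = 0xA0  (2-bit fields 2,2,0,0)
  COUNT(.,1): (0xA0 & 0x33) + ((0xA0 >> 2) & 0x33) = 0x20 + 0x20 = 0x40  (4-bit fields 4,0)
  COUNT(.,2): (0x40 & 0x0F) + ((0x40 >> 4) & 0x0F) = 0x00 + 0x04 = 0x04  = popcount(0xF0)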

Before this patch:

<source>:7:21: missed: couldn't vectorize loop
<source>:8:14: missed: not vectorized: relevant stmt not supported: _5 = 
__builtin_popcount (_4);

After this patch:

popcount_32:
ble a2,zero,.L5
li  t3,1431654400
li  a7,858992640
li  t1,252645376
li  a6,16711680
li  a3,65536
addiw   t3,t3,1365
addiw   a7,a7,819
addiw   t1,t1,-241
addiw   a6,a6,255
addiw   a3,a3,-1
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
vsetivli zero,4,e32,m1,ta,ma
vsrl.vi v2,v1,1
vand.vx v2,v2,t3
vsub.vv v1,v1,v2
vsrl.vi v2,v1,2
vand.vx v2,v2,a7
vand.vx v1,v1,a7
vadd.vv v1,v1,v2
vsrl.vi v2,v1,4
vadd.vv v1,v1,v2
vand.vx v1,v1,t1
vsrl.vi v2,v1,8
vand.vx v2,v2,a6
slli a4,a5,2
vand.vx v1,v1,a6
vadd.vv v1,v1,v2
vsrl.vi v2,v1,16
vand.vx v1,v1,a3
vand.vx v2,v2,a3
vadd.vv v1,v1,v2
vmv.v.v v1,v1
vsetvli zero,a2,e32,m1,ta,ma
sub a2,a2,a5
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret

gcc/ChangeLog:

* config/riscv/autovec.md (popcount<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_popcount): New function.
* config/riscv/riscv-v.cc (expand_popcount): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 13 +++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 81 +++
 .../riscv/rvv/autovec/widen/popcount-1.c  | 23 ++
 .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 
 5 files changed, 168 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2094a77a9a7..7babc9756a1 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -922,6 +922,19 @@
   DONE;
 })
 
+;; 
---
+;; - [INT] POPCOUNT.
+;; 
---
+
+(define_expand "popcount<mode>2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
 ;; 
---
 ;;  [FP] Unary operations
 ;; 
---
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 9e2c3d3e2cc..446ba7b559e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
 void expand_gather_scatter (rtx *, bool);
 void expand_cond_len_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index aa8a6763716..ac7dae952be 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3610,4 +3610,85 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
reduction_type type)
   emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
 }
 
+/* Expand Vector POPCOUNT by parallel popcnt:
+
+   int parallel_popcnt(uint32_t n) {
+   #define POW2(c)  (1U << (c))
+   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
+   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
+   n = COUNT(n, 0);
+   n = COUNT(n, 1);
+   n = COUNT(n, 2);
+   n = COUNT(n, 3);
+   n = COUNT(n, 4);
+   //  n = COUNT(n, 5);  // uncomment this line for 64-bit integers
+   return n;
+   #undef COUNT
+   #undef MASK
+   #undef POW2
+   }
+*/
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode smode = GET_MODE_INNER (mode);
+  static const uint64_t mask_values[6]
+= {0xULL, 0xULL, 0x0F0F0F0F0F0F0F0FULL,
+   0x00FF00FF00FF00FFULL, 0xULL, 0xULL};
+
+  unsigned bit_size = GET_MODE_BITSIZE (smode);
+  rtx count = CONST0_RTX (mode);
+
+  rtx part_value = src;
+  /* Currently we don't have TI vector modes so bit_size is always <= 64.  */
+  for 

Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Maciej W. Rozycki
On Mon, 31 Jul 2023, Kito Cheng wrote:

> Sorry for disturbing, pushed a fix for that, and...added
> -Werror=unused-variable to my build script to prevent that happen
> again :(

 I just configure with `--enable-werror-always', which we want to keep 
our standards up to anyway, but if you find this infeasible for some 
reason with your workflow, then there's always an option to grep for 
warnings in the build log and diff that against the previous iteration.

  Maciej


[PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-07-31 Thread Richard Biener via Gcc-patches
The following makes sure to limit the shift operand when vectorizing
(short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
operand otherwise invokes undefined behavior.  When we determine
whether we can demote the operand we know we at most shift in the
sign bit so we can adjust the shift amount.
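
As a hedged illustration of the clamping (types chosen for concreteness:
for a 16-bit short, new_precision - 1 is 15, and GCC's arithmetic right
shift of negative values is assumed):

  short before (short x) { return (short) ((int) x >> 31); }  /* 31 out of range for short */
  short after  (short x) { return x >> (31 & 15); }           /* same result: 0 or -1 */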

Note this has the possibility of un-CSEing common shift operands
as there's no good way to share pattern stmts between patterns.
We'd have to separately pattern recognize the definition.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Not sure about LSHIFT_EXPR, it probably has the same issue but
the fallback optimistic zero for out-of-range shifts is at least
"corrrect".  Not sure we ever try to demote rotates (probably not).

OK?

Thanks,
Richard.

PR tree-optimization/110838
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Adjust the shift operand of RSHIFT_EXPRs.

* gcc.dg/torture/pr110838.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110838.c | 43 +
 gcc/tree-vect-patterns.cc   | 24 ++
 2 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110838.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110838.c 
b/gcc/testsuite/gcc.dg/torture/pr110838.c
new file mode 100644
index 000..f039bd6c8ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110838.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+typedef __UINT32_TYPE__ uint32_t;
+typedef __UINT8_TYPE__ uint8_t;
+typedef __INT8_TYPE__ int8_t;
+typedef uint8_t pixel;
+
+/* get the sign of input variable (TODO: this is a dup, make common) */
+static inline int8_t signOf(int x)
+{
+  return (x >> 31) | ((int)(((uint32_t)-x) >> 31));
+}
+
+__attribute__((noipa))
+static void calSign_bug(int8_t *dst, const pixel *src1, const pixel *src2, 
const int endX)
+{
+  for (int x = 0; x < endX; x++)
+dst[x] = signOf(src1[x] - src2[x]);
+}
+
+__attribute__((noipa, optimize(0)))
+static void calSign_ok(int8_t *dst, const pixel *src1, const pixel *src2, 
const int endX)
+{
+  for (int x = 0; x < endX; x++)
+dst[x] = signOf(src1[x] - src2[x]);
+}
+
+__attribute__((noipa, optimize(0)))
+int main()
+{
+  const pixel s1[9] = { 0xcd, 0x33, 0xd4, 0x3e, 0xb0, 0xfb, 0x95, 0x64, 0x70, 
};
+  const pixel s2[9] = { 0xba, 0x9f, 0xab, 0xa1, 0x3b, 0x29, 0xb1, 0xbd, 0x64, 
};
+  int endX = 9;
+  int8_t dst[9];
+  int8_t dst_ok[9];
+
+  calSign_bug(dst, s1, s2, endX);
+  calSign_ok(dst_ok, s1, s2, endX);
+
+  if (__builtin_memcmp(dst, dst_ok, endX) != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ef806e2346e..e4ab8c2d65b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3099,9 +3099,33 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
   tree ops[3] = {};
   for (unsigned int i = 1; i < first_op; ++i)
 ops[i - 1] = gimple_op (last_stmt, i);
+  /* For right shifts limit the shift operand.  */
   vect_convert_inputs (vinfo, last_stmt_info, nops, &ops[first_op - 1],
   op_type, &unprom[0], op_vectype);
 
+  /* Limit shift operands.  */
+  if (code == RSHIFT_EXPR)
+{
+  wide_int min_value, max_value;
+  if (TREE_CODE (ops[1]) == INTEGER_CST)
+   ops[1] = wide_int_to_tree (op_type,
+  wi::bit_and (wi::to_wide (ops[1]),
+   new_precision - 1));
+  else if (!vect_get_range_info (ops[1], _value, _value)
+  || wi::ge_p (max_value, new_precision, TYPE_SIGN (op_type)))
+   {
+ /* ???  Note the following bad for SLP as that only supports
+same argument widened shifts and it un-CSEs same arguments.  */
+ tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
+ gimple *pattern_stmt
+   = gimple_build_assign (new_var, BIT_AND_EXPR, ops[1],
+  build_int_cst (op_type, new_precision - 1));
+ ops[1] = new_var;
+ gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+ append_pattern_def_seq (vinfo, last_stmt_info, pattern_stmt);
+   }
+}
+
   /* Use the operation to produce a result of type OP_TYPE.  */
   tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
   gimple *pattern_stmt = gimple_build_assign (new_var, code,
-- 
2.35.3


Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-07-31 Thread Jeff Law via Gcc-patches




On 6/19/23 08:23, Stefan Schulze Frielinghaus via Gcc-patches wrote:

Comparisons between memory and constants might be done in a smaller mode
resulting in smaller constants which might finally end up as immediates
instead of in the literal pool.

For example, on s390x a non-symmetric comparison like
   x <= 0x3fff
results in the constant being spilled to the literal pool and an 8 byte
memory comparison is emitted.  Ideally, an equivalent comparison
   x0 <= 0x3f
where x0 is the most significant byte of x, is emitted where the
constant is smaller and more likely to materialize as an immediate.

Similarly, comparisons of the form
   x >= 0x4000
can be shortened into x0 >= 0x40.
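
A hedged sketch of why the narrowing is sound, spelling out the
full-width constants (an assumption, as is s390x's big-endian byte
order):

  /* x <= 0x3fffffffffffffff holds iff the most significant byte of x
     is <= 0x3f, since the constant's low 56 bits are all ones; likewise
     x >= 0x4000000000000000 iff that byte is >= 0x40 (low bits zero). */
  int wide (const unsigned long long *x) { return *x <= 0x3fffffffffffffffULL; }
  int narrow (const unsigned char *msb) { return *msb <= 0x3f; }  /* first byte, big-endian */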

Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
Note, the new tests show that for the mentioned little-endian targets
the optimization does not materialize since either the costs of the new
instructions are higher or they do not match.  Still ok for mainline?

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Narrow comparison of
memory and constant.
(try_combine): Adapt new function signature.
(simplify_comparison): Adapt new function signature.

gcc/testsuite/ChangeLog:

* gcc.dg/cmp-mem-const-1.c: New test.
* gcc.dg/cmp-mem-const-2.c: New test.
* gcc.dg/cmp-mem-const-3.c: New test.
* gcc.dg/cmp-mem-const-4.c: New test.
* gcc.dg/cmp-mem-const-5.c: New test.
* gcc.dg/cmp-mem-const-6.c: New test.
* gcc.target/s390/cmp-mem-const-1.c: New test.
Sorry.  I'd looked at this a while back, wanted to take another looksie 
and totally forgot about it.


OK for the trunk.  Thanks for your patience.

jeff



Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, ??? wrote:

> Yeah. I have tried this case too.
> 
> But this case doesn't need to be vectorized as COND_FMA, am I right?

Only when you enable loop masking.  Alternatively use

double foo (double *a, double *b, double *c)
{
  double result = 0.0;
  for (int i = 0; i < 1024; ++i)
result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
  return result;
}

but then for me if-conversion produces

  iftmp.0_18 = __builtin_fma (_8, _10, _5);
  _ifc__43 = _26 ? iftmp.0_18 : 0.0;

with -ffast-math (probably rightfully so).  I then get .FMAs
vectorized and .COND_FMA folded.

> The thing I wonder is whether this condition:
> 
> if  (mask_opno >= 0 && reduc_idx >= 0)
> 
> or similar as len
> if  (len_opno >= 0 && reduc_idx >= 0)
> 
> Whether they are redundant in vectorizable_call ?
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 21:33
> To: juzhe.zh...@rivai.ai
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > >> I think you need to use fma from math.h together with -ffast-math
> > >>to get fma.
> > 
> > As you said, this is one of the case I tried:
> > https://godbolt.org/z/xMzrrv5dT 
> > GCC failed to vectorize.
> > 
> > Could you help me with this?
>  
> double foo (double *a, double *b, double *c)
> {
>   double result = 0.0;
>   for (int i = 0; i < 1024; ++i)
> result += __builtin_fma (a[i], b[i], c[i]);
>   return result;
> }
>  
> with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
>  
> ...
>   vect__9.13_27 = MEM <vector(4) double> [(double *)vectp_a.11_29];
>   _9 = *_8;
>   vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
>   vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
> ...
>  
> but ifcvt still shows
>  
>   _9 = *_8;
>   _10 = __builtin_fma (_7, _9, _4);
>   result_17 = _10 + result_20;
>  
> still vectorizable_call has IFN_FMA with
>  
>   /* First try using an internal function.  */
>   code_helper convert_code = MAX_TREE_CODES;
>   if (cfn != CFN_LAST
>   && (modifier == NONE
>   || (modifier == NARROW
>   && simple_integer_narrowing (vectype_out, vectype_in,
>_code
> ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>  
> from CFN_BUILT_IN_FMA
>  
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 20:00
> > To: juzhe.zh...@rivai.ai
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Ok . Thanks Richard.
> > > 
> > > Could you give me a case that SVE can vectorize a reduction with FMA?
> > > Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> > > 
> > > I tried many times to reproduce such cases but I failed.
> >  
> > I think you need to use fma from math.h together with -ffast-math
> > to get fma.
> >  
> > Richard.
> >  
> > > Thanks.
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Sandiford
> > > Date: 2023-07-31 18:19
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; rguenther
> > > Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > > Juzhe-Zhong  writes:
> > > > Hi, Richard and Richi.
> > > >
> > > > Base on the suggestions from Richard:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> > > >
> > > > This patch choose (1) approach that Richard provided, meaning:
> > > >
> > > > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > > > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > > > are needed at the gimple level.
> > > >
> > > > Such approach can make codes much cleaner and reasonable.
> > > >
> > > > Consider this following case:
> > > > void foo (float * __restrict a, float * __restrict b, int * __restrict 
> > > > cond, int n)
> > > > {
> > > >   for (int i = 0; i < n; i++)
> > > > if (cond[i])
> > > >   a[i] = b[i] + a[i];
> > > > }
> > > >
> > > >
> > > > Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> > > > <source>:5:21: missed: couldn't vectorize loop
> > > > <source>:5:21: missed: not vectorized: control flow in loop.
> > > >
> > > > ARM SVE:
> > > >
> > > > ...
> > > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > > ...
> > > > vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> > > > ...
> > > > vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, 
> > > > vect__8.16_60, vect__6.13_56);
> > > >
> > > > For RVV, we want IR as follows:
> > > >
> > > > ...
> > > > _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> > > > ...
> > > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > > ...
> > > > vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, 
> > > > vect__8.16_59, 

Re: Re: [PATCH] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread 钟居哲

>> Drop outer loop if word_size never larger than 1?
Yeah, we never have TI vector modes for now.

The code I just copied directly from LLVM's generic intrinsic handling :)
since the LLVM generic code also considers handling INT128 vectors.

I will remove all redundant code for INT128 vector modes in V2.
Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-31 21:38
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support POPCOUNT auto-vectorization
On Mon, Jul 31, 2023 at 8:03 PM Juzhe-Zhong  wrote:
>
> This patch is inspired by "lowerCTPOP" in LLVM.
> Support popcount auto-vectorization by following LLVM approach.
> https://godbolt.org/z/3K3GzvY7f
>
> Before this patch:
>
> <source>:7:21: missed: couldn't vectorize loop
> <source>:8:14: missed: not vectorized: relevant stmt not supported: _5 = 
> __builtin_popcount (_4);
>
> After this patch:
>
> popcount_32:
> ble a2,zero,.L5
> li  t3,1431654400
> li  a7,858992640
> li  t1,252645376
> li  a6,16711680
> li  a3,65536
> addiw   t3,t3,1365
> addiw   a7,a7,819
> addiw   t1,t1,-241
> addiw   a6,a6,255
> addiw   a3,a3,-1
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v1,0(a1)
> vsetivli zero,4,e32,m1,ta,ma
> vsrl.vi v2,v1,1
> vand.vx v2,v2,t3
> vsub.vv v1,v1,v2
> vsrl.vi v2,v1,2
> vand.vx v2,v2,a7
> vand.vx v1,v1,a7
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,4
> vadd.vv v1,v1,v2
> vand.vx v1,v1,t1
> vsrl.vi v2,v1,8
> vand.vx v2,v2,a6
> slli a4,a5,2
> vand.vx v1,v1,a6
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,16
> vand.vx v1,v1,a3
> vand.vx v2,v2,a3
> vadd.vv v1,v1,v2
> vmv.v.v v1,v1
> vsetvli zero,a2,e32,m1,ta,ma
> sub a2,a2,a5
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (popcount<mode>2): New pattern.
> * config/riscv/riscv-protos.h (expand_popcount): New function.
> * config/riscv/riscv-v.cc (expand_popcount): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 13 +++
>  gcc/config/riscv/riscv-protos.h   |  1 +
>  gcc/config/riscv/riscv-v.cc   | 95 +++
>  .../riscv/rvv/autovec/widen/popcount-1.c  | 23 +
>  .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++
>  5 files changed, 182 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index b5152bc91fd..9d32b91bdca 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -922,6 +922,19 @@
>DONE;
>  })
>
> +;; 
> ---
> +;; - [INT] POPCOUNT.
> +;; 
> ---
> +
> +(define_expand "popcount<mode>2"
> +  [(match_operand:VI 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_popcount (operands);
> +  DONE;
> +})
> +
>  ;; 
> ---
>  ;;  [FP] Unary operations
>  ;; 
> ---
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a729db44c32..ae40fbb4b53 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
>  void expand_load_store (rtx *, bool);
>  void expand_gather_scatter (rtx *, bool);
>  void expand_cond_len_ternop (unsigned, rtx *);
> +void expand_popcount (rtx *);
>
>  /* Rounding mode bitfield for fixed point VXRM.  */
>  enum fixed_point_rounding_mode
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index c10e51b362e..b3caa4b188d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3614,4 +3614,99 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
> reduction_type type)
>emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
>  }
>
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)  (1U << (c))
> +   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))

Re: [PATCH 3/3] genmatch: Log line numbers indirectly

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 1:07 PM Andrzej Turko via Gcc-patches wrote:
>
> Currently fprintf calls logging to a dump file take line numbers
> in the match.pd file directly as arguments.
> When match.pd is edited, the line numbers of the referenced code change,
> which causes changes to many fprintf calls and, thus, to many
> (usually all) .cc files generated by genmatch. This forces make
> to (unnecessarily) rebuild many .o files.
>
> With this change those logging fprintf calls reference an array
> of line numbers, which is defined in one of the produced files.
> Thanks to this, when match.pd changes, it is enough to rebuild
> that single file and, of course, those actually affected by the
> change.
>
> Signed-off-by: Andrzej Turko 

How does this affect the size of the executable?  We are replacing
pushing a small immediate to the stack with an indexed load plus push.

Maybe further indirecting the whole dumping, passing an index of the
match and __FILE__/__LINE__ would help here, so instead of

  if (UNLIKELY (debug_dump)) fprintf
(dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", 2522,
__FILE__, __LINE__);

we emit sth like

  if (UNLIKELY (debug_dump)) dump_match (2522,
__FILE__, __LINE__);

with 2522 replaced by the ID?  That would also get rid of the inline
varargs invocation which might help code size as well (on some targets).

Richard.
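
[Illustration: a minimal sketch of the indirection suggested above.
The helper name dump_match and its body are assumptions for
illustration only, not part of any posted patch; dump_file is GCC's
global dump stream.]

  #include <stdio.h>

  extern FILE *dump_file;
  extern int __gimple_dbg_line_numbers[];  /* table emitted by genmatch */

  static void
  dump_match (int id, const char *file, int line)
  {
    /* One out-of-line call replaces the inline varargs fprintf; the
       match.pd line is looked up by table index.  */
    fprintf (dump_file, "Matching expression %s:%d, %s:%d\n",
             "match.pd", __gimple_dbg_line_numbers[id], file, line);
  }

  /* A generated call site then shrinks to
       if (UNLIKELY (debug_dump)) dump_match (17, __FILE__, __LINE__);
     where 17 is a stable table index rather than a match.pd line.  */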

> gcc/ChangeLog:
>
> * genmatch.cc: Keep line numbers from match.pd in an array.
>
> Signed-off-by: Andrzej Turko 
> ---
>  gcc/genmatch.cc | 73 +++--
>  1 file changed, 65 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 1deca505603..0a480a140c9 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -217,9 +217,48 @@ fp_decl_done (FILE *f, const char *trailer)
>  fprintf (header_file, "%s;", trailer);
>  }
>
> +/* Line numbers for use by indirect line directives.  */
> +static vec<int> dbg_line_numbers;
> +
> +static void
> +write_header_declarations (bool gimple, FILE *f)
> +{
> +  if (gimple)
> +fprintf (f, "\nextern int __gimple_dbg_line_numbers[];\n");
> +  else
> +fprintf (f, "\nextern int __generic_dbg_line_numbers[];\n");
> +}
> +
> +static void
> +define_dbg_line_numbers (bool gimple, FILE *f)
> +{
> +  if (gimple)
> +fprintf (f, "\nint __gimple_dbg_line_numbers[%d] = {",
> +   dbg_line_numbers.length ());
> +  else
> +fprintf (f, "\nint __generic_dbg_line_numbers[%d] = {",
> +   dbg_line_numbers.length ());
> +
> +   if (dbg_line_numbers.is_empty ())
> +{
> +  fprintf (f, "};\n\n");
> +  return;
> +}
> +
> +  for (int i = 0; i < (int)dbg_line_numbers.length () - 1; i++)
> +{
> +  if (i % 20 == 0)
> +   fprintf (f, "\n\t");
> +
> +  fprintf (f, "%d, ", dbg_line_numbers[i]);
> +}
> +  fprintf (f, "%d\n};\n\n", dbg_line_numbers.last ());
> +}
> +
>  static void
>  output_line_directive (FILE *f, location_t location,
> -  bool dumpfile = false, bool fnargs = false)
> + bool dumpfile = false, bool fnargs = false,
> + bool indirect_line_numbers = false, bool gimple = false)
>  {
>const line_map_ordinary *map;
>linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, 
> &map);
> @@ -239,7 +278,20 @@ output_line_directive (FILE *f, location_t location,
> ++file;
>
>if (fnargs)
> -   fprintf (f, "\"%s\", %d", file, loc.line);
> +  {
> +  if (indirect_line_numbers)
> +{
> +  if (gimple)
> +  fprintf (f, "\"%s\", __gimple_dbg_line_numbers[%d]",
> + file, dbg_line_numbers.length ());
> +  else
> +  fprintf (f, "\"%s\", __generic_dbg_line_numbers[%d]",
> + file, dbg_line_numbers.length ());
> +  dbg_line_numbers.safe_push (loc.line);
> +}
> +  else
> +fprintf (f, "\"%s\", %d", file, loc.line);
> +  }
>else
> fprintf (f, "%s:%d", file, loc.line);
>  }
> @@ -3378,7 +3430,8 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
> depth)
>  /* Emit a fprintf to the debug file to the file F, with the INDENT from
> either the RESULT location or the S's match location if RESULT is null. */
>  static void
> -emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result)
> +emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result,
> + bool gimple)
>  {
>fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
>"fprintf (dump_file, \"%s ",
> @@ -3387,7 +3440,7 @@ emit_debug_printf (FILE *f, int indent, class simplify 
> *s, operand *result)
>fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
>output_line_directive (f,
>  result ? result->location : s->match->location, true,
> -true);
> +true, true, gimple);
>fprintf (f, ", __FILE__, __LINE__);\n");
>  }
>

Re: Re: [committed] RISC-V: Fix bug of get_mask_mode

2023-07-31 Thread 钟居哲
Ok. Thanks. Li Pan is still testing. 



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-31 21:45
To: Kito Cheng
CC: Juzhe-Zhong; gcc-patches; jeffreyalaw; macro; pan2.li; rdapp.gcc
Subject: Re: [committed] RISC-V: Fix bug of get_mask_mode
I saw you didn't push yet, so I pushed another patch to fix those
unused variable issues.
 
On Mon, Jul 31, 2023 at 9:12 PM Kito Cheng  wrote:
>
> Ooops, I guess my code base was too old, and forgot to check that after 
> rebase, thanks for fixing that!
>
> Juzhe-Zhong wrote on Monday, July 31, 2023 at 20:21:
>>
>> Fix bugs:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>   ^~~
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_nonvlmax_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1057:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>   ^~~
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_nonvlmax_fp_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1076:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>
>> Obvious fix. Pushed.
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Fix bug.
>> (emit_nonvlmax_tumu_insn): Ditto.
>> (emit_nonvlmax_fp_tumu_insn): Ditto.
>> (expand_vec_series): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 8 +++-
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 76b437cc55e..40e4574dcc0 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -996,7 +996,7 @@ static void
>>  emit_vlmax_masked_fp_mu_insn (unsigned icode, int op_num, rtx *ops)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1054,7 +1054,7 @@ static void
>>  emit_nonvlmax_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1073,7 +1073,7 @@ static void
>>  emit_nonvlmax_fp_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1306,7 +1306,6 @@ void
>>  expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>machine_mode mode = GET_MODE (dest);
>> -  machine_mode mask_mode = get_mask_mode (mode);
>>poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
>>poly_int64 value;
>>
>> @@ -2375,7 +2374,6 @@ expand_vector_init_insert_elems (rtx target, const 
>> rvv_builder ,
>>  int nelts_reqd)
>>  {
>>machine_mode mode = GET_MODE (target);
>> -  machine_mode mask_mode = get_mask_mode (mode);
>>rtx dup = expand_vector_broadcast (mode, builder.elt (0));
>>emit_move_insn (target, dup);
>>int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
>> --
>> 2.36.3
>>
 


Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Kito Cheng via Gcc-patches
Hi Maciej:

Sorry for disturbing, pushed a fix for that, and...added
-Werror=unused-variable to my build script to prevent that happening
again :(

On Mon, Jul 31, 2023 at 7:08 PM Maciej W. Rozycki  wrote:
>
> On Mon, 31 Jul 2023, Kito Cheng via Gcc-patches wrote:
>
> > Pushed, thanks :)
>
>  This breaks compilation:
>
> .../gcc/config/riscv/riscv-v.cc: In function 'void 
> riscv_vector::expand_vec_series(rtx, rtx, rtx)':
> .../gcc/config/riscv/riscv-v.cc:1251:16: error: unused variable 'mask_mode' 
> [-Werror=unused-variable]
>  1251 |   machine_mode mask_mode = get_mask_mode (mode);
>   |^
> .../gcc/config/riscv/riscv-v.cc: In function 'void 
> riscv_vector::expand_vector_init_insert_elems(rtx, const 
> riscv_vector::rvv_builder&, int)':
> .../gcc/config/riscv/riscv-v.cc:2320:16: error: unused variable 'mask_mode' 
> [-Werror=unused-variable]
>  2320 |   machine_mode mask_mode = get_mask_mode (mode);
>   |^
>
> Please always at the very least build changes and verify that they cause
> no new issues before submitting patches.
>
>   Maciej


Re: [committed] RISC-V: Fix bug of get_mask_mode

2023-07-31 Thread Kito Cheng via Gcc-patches
I saw you didn't push yet, so I pushed another patch to fix those
unused variable issues.

On Mon, Jul 31, 2023 at 9:12 PM Kito Cheng  wrote:
>
> Ooops, I guess my code base was too old, and forgot to check that after 
> rebase, thanks for fixing that!
>
> Juzhe-Zhong wrote on Monday, July 31, 2023 at 20:21:
>>
>> Fix bugs:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>   ^~~
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_nonvlmax_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1057:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>   ^~~
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
>> riscv_vector::emit_nonvlmax_fp_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
>> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1076:54: error: request for 
>> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of 
>> non-class type ‘machine_mode’
>>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>>
>> Obvious fix. Pushed.
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Fix bug.
>> (emit_nonvlmax_tumu_insn): Ditto.
>> (emit_nonvlmax_fp_tumu_insn): Ditto.
>> (expand_vec_series): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 8 +++-
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 76b437cc55e..40e4574dcc0 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -996,7 +996,7 @@ static void
>>  emit_vlmax_masked_fp_mu_insn (unsigned icode, int op_num, rtx *ops)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1054,7 +1054,7 @@ static void
>>  emit_nonvlmax_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1073,7 +1073,7 @@ static void
>>  emit_nonvlmax_fp_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>>  {
>>machine_mode dest_mode = GET_MODE (ops[0]);
>> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>>insn_expander e (/*OP_NUM*/ op_num,
>>   /*HAS_DEST_P*/ true,
>>   /*FULLY_UNMASKED_P*/ false,
>> @@ -1306,7 +1306,6 @@ void
>>  expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>machine_mode mode = GET_MODE (dest);
>> -  machine_mode mask_mode = get_mask_mode (mode);
>>poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
>>poly_int64 value;
>>
>> @@ -2375,7 +2374,6 @@ expand_vector_init_insert_elems (rtx target, const 
>> rvv_builder ,
>>  int nelts_reqd)
>>  {
>>machine_mode mode = GET_MODE (target);
>> -  machine_mode mask_mode = get_mask_mode (mode);
>>rtx dup = expand_vector_broadcast (mode, builder.elt (0));
>>emit_move_insn (target, dup);
>>int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
>> --
>> 2.36.3
>>


Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread 钟居哲
Yeah. I have tried this case too.

But this case doesn't need to be vectorized as COND_FMA, am I right?

The thing I wonder is whether this condition:

if  (mask_opno >= 0 && reduc_idx >= 0)

or similarly for len:
if  (len_opno >= 0 && reduc_idx >= 0)

Are they redundant in vectorizable_call?


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-31 21:33
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> >> I think you need to use fma from math.h together with -ffast-math
> >> to get fma.
> 
> As you said, this is one of the cases I tried:
> https://godbolt.org/z/xMzrrv5dT 
> GCC failed to vectorize.
> 
> Could you help me with this?
 
double foo (double *a, double *b, double *c)
{
  double result = 0.0;
  for (int i = 0; i < 1024; ++i)
result += __builtin_fma (a[i], b[i], c[i]);
  return result;
}
 
with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
 
...
  vect__9.13_27 = MEM <vector(4) double> [(double *)vectp_a.11_29];
  _9 = *_8;
  vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
  vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
...
 
but ifcvt still shows
 
  _9 = *_8;
  _10 = __builtin_fma (_7, _9, _4);
  result_17 = _10 + result_20;
 
still vectorizable_call has IFN_FMA with
 
  /* First try using an internal function.  */
  code_helper convert_code = MAX_TREE_CODES;
  if (cfn != CFN_LAST
  && (modifier == NONE
  || (modifier == NARROW
  && simple_integer_narrowing (vectype_out, vectype_in,
   &convert_code))))
ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);
 
from CFN_BUILT_IN_FMA
 
 
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 20:00
> To: juzhe.zh...@rivai.ai
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Ok . Thanks Richard.
> > 
> > Could you give me a case that SVE can vectorize a reduction with FMA?
> > Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> > 
> > I tried many times to reproduce such cases but I failed.
>  
> I think you need to use fma from math.h together with -ffast-math
> to get fma.
>  
> Richard.
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Sandiford
> > Date: 2023-07-31 18:19
> > To: Juzhe-Zhong
> > CC: gcc-patches; rguenther
> > Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > Juzhe-Zhong  writes:
> > > Hi, Richard and Richi.
> > >
> > > Base on the suggestions from Richard:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> > >
> > > This patch choose (1) approach that Richard provided, meaning:
> > >
> > > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > > are needed at the gimple level.
> > >
> > > Such approach can make codes much cleaner and reasonable.
> > >
> > > Consider this following case:
> > > void foo (float * __restrict a, float * __restrict b, int * __restrict 
> > > cond, int n)
> > > {
> > >   for (int i = 0; i < n; i++)
> > > if (cond[i])
> > >   a[i] = b[i] + a[i];
> > > }
> > >
> > >
> > > Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> > > <source>:5:21: missed: couldn't vectorize loop
> > > <source>:5:21: missed: not vectorized: control flow in loop.
> > >
> > > ARM SVE:
> > >
> > > ...
> > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > ...
> > > vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> > > ...
> > > vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> > > vect__6.13_56);
> > >
> > > For RVV, we want IR as follows:
> > >
> > > ...
> > > _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> > > ...
> > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > ...
> > > vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, 
> > > vect__8.16_59, vect__6.13_55, _68, 0);
> > > ...
> > >
> > > Both len and mask of COND_LEN_ADD are real not dummy.
> > >
> > > This patch has been fully tested in RISC-V port with supporting both 
> > > COND_* and COND_LEN_*.
> > >
> > > And also, Bootstrap and Regression on X86 passed.
> > >
> > > OK for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> > > (get_len_internal_fn): New function.
> > > (CASE): Ditto.
> > > * internal-fn.h (get_len_internal_fn): Ditto.
> > > * tree-vect-stmts.cc (vectorizable_call): Support CALL 
> > > vectorization with COND_LEN_*.
> > >
> > > ---
> > >  gcc/internal-fn.cc | 46 ++
> > >  gcc/internal-fn.h  |  1 +
> > >  gcc/tree-vect-stmts.cc | 87 
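
[Illustration: the shape of the IFN_COND_* -> IFN_COND_LEN_* mapping
that get_len_internal_fn provides, reconstructed from the ChangeLog
above.  The real body is generated via the FOR_EACH_LEN_FN_PAIR macro;
the explicit cases here are an abbreviation.]

  internal_fn
  get_len_internal_fn (internal_fn fn)
  {
    switch (fn)
      {
      case IFN_COND_ADD:
        return IFN_COND_LEN_ADD;
      case IFN_COND_FMA:
        return IFN_COND_LEN_FMA;
      /* ... one case per COND_ / COND_LEN_ pair ... */
      default:
        return IFN_LAST;
      }
  }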

Re: [PATCH 2/3] genmatch: Reduce variability of generated code

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 1:06 PM Andrzej Turko via Gcc-patches wrote:
>
> So far genmatch has been using an unordered map to store information about
> functions to be generated. Since corresponding locations from match.pd were
> used as keys in the map, even small changes to match.pd which caused
> line number changes would change the order in which the functions are
> generated. This would reshuffle the functions between the generated .cc files.
> This way even a minimal modification to match.pd forces recompilation of all
> object files originating from match.pd on rebuild.
>
> This commit makes sure that functions are generated in the order of their
> processing (in contrast to the random order based on hashes of their
> locations in match.pd). This is done by replacing the unordered map with an
> ordered one. This way small changes to match.pd do not cause function
> renaming and reshuffling among generated source files.
> Together with the subsequent change to logging fprintf calls, this
> removes unnecessary changes to the files generated by genmatch allowing
> for reuse of already built object files during rebuild. The aim is to
> make editing of match.pd and subsequent testing easier.
>
> Signed-off-by: Andrzej Turko 

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * genmatch.cc: Make sinfo map ordered.
>
> Signed-off-by: Andrzej Turko 
> ---
>  gcc/genmatch.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 2302f2a7ff0..1deca505603 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "hash-table.h"
>  #include "hash-set.h"
>  #include "is-a.h"
> +#include "ordered-hash-map.h"
>
>
>  /* Stubs for GGC referenced through instantiations triggered by hash-map.  */
> @@ -1684,7 +1685,7 @@ struct sinfo_hashmap_traits : 
> simple_hashmap_traits<pointer_hash<dt_simplify>,
>template <typename T> static inline void remove (T &) {}
>  };
>
> -typedef hash_map<void * /* unused */, sinfo *, sinfo_hashmap_traits>
> +typedef ordered_hash_map<void * /* unused */, sinfo *, sinfo_hashmap_traits>
>sinfo_map_t;
>
>  /* Current simplifier ID we are processing during insertion into the
> --
> 2.34.1
>


Re: [PATCH 1/3] Support get_or_insert in ordered_hash_map

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 1:06 PM Andrzej Turko via Gcc-patches wrote:
>
> Get_or_insert method is already supported by the unordered hash map.
> Adding it to the ordered map enables us to replace the unordered map
> with the ordered one in cases where ordering may be useful.

OK.  Note the Makefile.in change really belongs to another patch in
the series, without this
it could be pushed independently.

Thanks,
Richard.

> Signed-off-by: Andrzej Turko 
>
> gcc/ChangeLog:
>
> * ordered-hash-map.h: Add get_or_insert.
> * Makefile.in: Require the ordered map header for genmatch.o.
> * ordered-hash-map-tests.cc: Use get_or_insert in tests.
>
> Signed-off-by: Andrzej Turko 
> ---
>  gcc/Makefile.in   |  4 ++--
>  gcc/ordered-hash-map-tests.cc | 19 +++
>  gcc/ordered-hash-map.h| 26 ++
>  3 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index e99628cec07..2429128cbf2 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3004,8 +3004,8 @@ build/genhooks.o : genhooks.cc $(TARGET_DEF) 
> $(C_TARGET_DEF)  \
>$(COMMON_TARGET_DEF) $(D_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
>  build/genmddump.o : genmddump.cc $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)  
>   \
>$(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
> -build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) \
> -  $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
> +build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \
> +  errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h ordered-hash-map.h \
>tree.def builtins.def internal-fn.def case-cfn-macros.h $(CPPLIB_H)
>  build/gencfn-macros.o : gencfn-macros.cc $(BCONFIG_H) $(SYSTEM_H)  \
>$(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def  \
> diff --git a/gcc/ordered-hash-map-tests.cc b/gcc/ordered-hash-map-tests.cc
> index 1c26bbfa979..55894c25fa0 100644
> --- a/gcc/ordered-hash-map-tests.cc
> +++ b/gcc/ordered-hash-map-tests.cc
> @@ -58,6 +58,7 @@ static void
>  test_map_of_strings_to_int ()
>  {
>ordered_hash_map <const char *, int> m;
> +  bool existed;
>
>const char *ostrich = "ostrich";
>const char *elephant = "elephant";
> @@ -74,17 +75,23 @@ test_map_of_strings_to_int ()
>ASSERT_EQ (false, m.put (ostrich, 2));
>ASSERT_EQ (false, m.put (elephant, 4));
>ASSERT_EQ (false, m.put (ant, 6));
> -  ASSERT_EQ (false, m.put (spider, 8));
> +  existed = true;
> +  int &value = m.get_or_insert (spider, &existed);
> +  value = 8;
> +  ASSERT_EQ (false, existed);
>ASSERT_EQ (false, m.put (millipede, 750));
>ASSERT_EQ (false, m.put (eric, 3));
>
> +
>/* Verify that we can recover the stored values.  */
>ASSERT_EQ (6, m.elements ());
>ASSERT_EQ (2, *m.get (ostrich));
>ASSERT_EQ (4, *m.get (elephant));
>ASSERT_EQ (6, *m.get (ant));
>ASSERT_EQ (8, *m.get (spider));
> -  ASSERT_EQ (750, *m.get (millipede));
> +  existed = false;
> +  ASSERT_EQ (750, m.get_or_insert (millipede, &existed));
> +  ASSERT_EQ (true, existed);
>ASSERT_EQ (3, *m.get (eric));
>
>/* Verify that the order of insertion is preserved.  */
> @@ -113,6 +120,7 @@ test_map_of_int_to_strings ()
>  {
>const int EMPTY = -1;
>const int DELETED = -2;
> +  bool existed;
>typedef int_hash <int, EMPTY, DELETED> int_hash_t;
>ordered_hash_map <int_hash_t, const char *> m;
>
> @@ -131,7 +139,9 @@ test_map_of_int_to_strings ()
>ASSERT_EQ (false, m.put (2, ostrich));
>ASSERT_EQ (false, m.put (4, elephant));
>ASSERT_EQ (false, m.put (6, ant));
> -  ASSERT_EQ (false, m.put (8, spider));
> +  const char *&value = m.get_or_insert (8, &existed);
> +  value = spider;
> +  ASSERT_EQ (false, existed);
>ASSERT_EQ (false, m.put (750, millipede));
>ASSERT_EQ (false, m.put (3, eric));
>
> @@ -141,7 +151,8 @@ test_map_of_int_to_strings ()
>ASSERT_EQ (*m.get (4), elephant);
>ASSERT_EQ (*m.get (6), ant);
>ASSERT_EQ (*m.get (8), spider);
> -  ASSERT_EQ (*m.get (750), millipede);
> +  ASSERT_EQ (m.get_or_insert (750, &existed), millipede);
> +  ASSERT_EQ (existed, TRUE);
>ASSERT_EQ (*m.get (3), eric);
>
>/* Verify that the order of insertion is preserved.  */
> diff --git a/gcc/ordered-hash-map.h b/gcc/ordered-hash-map.h
> index 6b68cc96305..9fc875182e1 100644
> --- a/gcc/ordered-hash-map.h
> +++ b/gcc/ordered-hash-map.h
> @@ -76,6 +76,32 @@ public:
>  return m_map.get (k);
>}
>
> +  /* Return a reference to the value for the passed in key, creating the 
> entry
> +if it doesn't already exist.  If existed is not NULL then it is set to
> +false if the key was not previously in the map, and true otherwise.  */
> +
> +  Value &get_or_insert (const Key &k, bool *existed = NULL)
> +  {
> +bool _existed;
> +Value &ret = m_map.get_or_insert (k, &_existed);
> +
> +if (!_existed)
> +  {
> +   bool key_present;
> +   int &ix = m_key_index.get_or_insert (k, &key_present);
> +   if (!key_present)
> + {
> + 
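
[Illustration: how the new ordered_hash_map::get_or_insert is used,
distilled from the test changes quoted above; names follow the tests.]

  ordered_hash_map <const char *, int> m;
  bool existed;
  const char *spider = "spider";

  /* New key: existed is set to false and the returned reference lets
     the caller fill in the value.  */
  int &value = m.get_or_insert (spider, &existed);
  if (!existed)
    value = 8;

  /* Existing key: existed is set to true and the stored value is
     returned; the map still iterates in insertion order.  */
  int current = m.get_or_insert (spider, &existed);  /* 8, existed == true */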

Re: [PATCH] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Kito Cheng via Gcc-patches
On Mon, Jul 31, 2023 at 8:03 PM Juzhe-Zhong  wrote:
>
> This patch is inspired by "lowerCTPOP" in LLVM.
> Support popcount auto-vectorization by following LLVM approach.
> https://godbolt.org/z/3K3GzvY7f
>
> Before this patch:
>
> <source>:7:21: missed: couldn't vectorize loop
> <source>:8:14: missed: not vectorized: relevant stmt not supported: _5 = 
> __builtin_popcount (_4);
>
> After this patch:
>
> popcount_32:
> ble a2,zero,.L5
> li  t3,1431654400
> li  a7,858992640
> li  t1,252645376
> li  a6,16711680
> li  a3,65536
> addiw   t3,t3,1365
> addiw   a7,a7,819
> addiw   t1,t1,-241
> addiw   a6,a6,255
> addiw   a3,a3,-1
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v1,0(a1)
> vsetivli zero,4,e32,m1,ta,ma
> vsrl.vi v2,v1,1
> vand.vx v2,v2,t3
> vsub.vv v1,v1,v2
> vsrl.vi v2,v1,2
> vand.vx v2,v2,a7
> vand.vx v1,v1,a7
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,4
> vadd.vv v1,v1,v2
> vand.vx v1,v1,t1
> vsrl.vi v2,v1,8
> vand.vx v2,v2,a6
> slli a4,a5,2
> vand.vx v1,v1,a6
> vadd.vv v1,v1,v2
> vsrl.vi v2,v1,16
> vand.vx v1,v1,a3
> vand.vx v2,v2,a3
> vadd.vv v1,v1,v2
> vmv.v.v v1,v1
> vsetvli zero,a2,e32,m1,ta,ma
> sub a2,a2,a5
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (popcount<mode>2): New pattern.
> * config/riscv/riscv-protos.h (expand_popcount): New function.
> * config/riscv/riscv-v.cc (expand_popcount): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 13 +++
>  gcc/config/riscv/riscv-protos.h   |  1 +
>  gcc/config/riscv/riscv-v.cc   | 95 +++
>  .../riscv/rvv/autovec/widen/popcount-1.c  | 23 +
>  .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++
>  5 files changed, 182 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index b5152bc91fd..9d32b91bdca 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -922,6 +922,19 @@
>DONE;
>  })
>
> +;; 
> ---
> +;; - [INT] POPCOUNT.
> +;; 
> ---
> +
> +(define_expand "popcount2"
> +  [(match_operand:VI 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_popcount (operands);
> +  DONE;
> +})
> +
>  ;; 
> ---
>  ;;  [FP] Unary operations
>  ;; 
> ---
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a729db44c32..ae40fbb4b53 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
>  void expand_load_store (rtx *, bool);
>  void expand_gather_scatter (rtx *, bool);
>  void expand_cond_len_ternop (unsigned, rtx *);
> +void expand_popcount (rtx *);
>
>  /* Rounding mode bitfield for fixed point VXRM.  */
>  enum fixed_point_rounding_mode
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index c10e51b362e..b3caa4b188d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3614,4 +3614,99 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
> reduction_type type)
>emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
>  }
>
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)  (1U << (c))
> +   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> +   n = COUNT(n, 0);
> +   n = COUNT(n, 1);
> +   n = COUNT(n, 2);
> +   n = COUNT(n, 3);
> +   n = COUNT(n, 4);
> +   //  n = COUNT(n, 5);  // uncomment this line for 64-bit integers
> +   return n;
> +   #undef COUNT
> +   #undef MASK
> +   #undef POW2
> +   }
> +*/
> +void
> +expand_popcount (rtx *ops)
> +{
> +  rtx dst = ops[0];
> +  rtx src = ops[1];
> +  machine_mode mode = GET_MODE (dst);
> +  scalar_mode smode = GET_MODE_INNER (mode);
> +  
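
[Illustration: a runnable C rendition of the SWAR bit-counting scheme
the patch comment describes, with the 32-bit masks spelled out instead
of computed by the MASK macro.]

  #include <assert.h>
  #include <stdint.h>

  static unsigned
  parallel_popcnt (uint32_t n)
  {
    n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);   /* bit pairs */
    n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);   /* nibbles   */
    n = (n & 0x0f0f0f0fu) + ((n >> 4) & 0x0f0f0f0fu);   /* bytes     */
    n = (n & 0x00ff00ffu) + ((n >> 8) & 0x00ff00ffu);   /* halfwords */
    n = (n & 0x0000ffffu) + ((n >> 16) & 0x0000ffffu);  /* words     */
    return n;
  }

  int
  main (void)
  {
    assert (parallel_popcnt (0) == 0);
    assert (parallel_popcnt (0xffffffffu) == 32);
    assert (parallel_popcnt (0x80000001u) == 2);
    return 0;
  }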

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> I think you need to use fma from math.h together with -ffast-math
> >> to get fma.
> 
> As you said, this is one of the cases I tried:
> https://godbolt.org/z/xMzrrv5dT 
> GCC failed to vectorize.
> 
> Could you help me with this?

double foo (double *a, double *b, double *c)
{
  double result = 0.0;
  for (int i = 0; i < 1024; ++i)
result += __builtin_fma (a[i], b[i], c[i]);
  return result;
}

with -mavx2 -mfma -Ofast this is vectorized on x86_64 to

...
  vect__9.13_27 = MEM <vector(4) double> [(double *)vectp_a.11_29];
  _9 = *_8;
  vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
  vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
...

but ifcvt still shows

  _9 = *_8;
  _10 = __builtin_fma (_7, _9, _4);
  result_17 = _10 + result_20;

still vectorizable_call has IFN_FMA with

  /* First try using an internal function.  */
  code_helper convert_code = MAX_TREE_CODES;
  if (cfn != CFN_LAST
  && (modifier == NONE
  || (modifier == NARROW
  && simple_integer_narrowing (vectype_out, vectype_in,
   &convert_code))))
ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);

from CFN_BUILT_IN_FMA



> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 20:00
> To: juzhe.zh...@rivai.ai
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Ok . Thanks Richard.
> > 
> > Could you give me a case that SVE can vectorize a reduction with FMA?
> > Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> > 
> > I tried many times to reproduce such cases but I failed.
>  
> I think you need to use fma from math.h together with -ffast-math
> to get fma.
>  
> Richard.
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Sandiford
> > Date: 2023-07-31 18:19
> > To: Juzhe-Zhong
> > CC: gcc-patches; rguenther
> > Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > Juzhe-Zhong  writes:
> > > Hi, Richard and Richi.
> > >
> > > Base on the suggestions from Richard:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> > >
> > > This patch choose (1) approach that Richard provided, meaning:
> > >
> > > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > > are needed at the gimple level.
> > >
> > > Such approach can make codes much cleaner and reasonable.
> > >
> > > Consider this following case:
> > > void foo (float * __restrict a, float * __restrict b, int * __restrict 
> > > cond, int n)
> > > {
> > >   for (int i = 0; i < n; i++)
> > > if (cond[i])
> > >   a[i] = b[i] + a[i];
> > > }
> > >
> > >
> > > Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> > > <source>:5:21: missed: couldn't vectorize loop
> > > <source>:5:21: missed: not vectorized: control flow in loop.
> > >
> > > ARM SVE:
> > >
> > > ...
> > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > ...
> > > vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> > > ...
> > > vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> > > vect__6.13_56);
> > >
> > > For RVV, we want IR as follows:
> > >
> > > ...
> > > _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> > > ...
> > > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > > ...
> > > vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, 
> > > vect__8.16_59, vect__6.13_55, _68, 0);
> > > ...
> > >
> > > Both len and mask of COND_LEN_ADD are real not dummy.
> > >
> > > This patch has been fully tested in RISC-V port with supporting both 
> > > COND_* and COND_LEN_*.
> > >
> > > And also, Bootstrap and Regression on X86 passed.
> > >
> > > OK for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> > > (get_len_internal_fn): New function.
> > > (CASE): Ditto.
> > > * internal-fn.h (get_len_internal_fn): Ditto.
> > > * tree-vect-stmts.cc (vectorizable_call): Support CALL 
> > > vectorization with COND_LEN_*.
> > >
> > > ---
> > >  gcc/internal-fn.cc | 46 ++
> > >  gcc/internal-fn.h  |  1 +
> > >  gcc/tree-vect-stmts.cc | 87 +-
> > >  3 files changed, 125 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 8e294286388..379220bebc7 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
> > >  }
> > >  }
> > >  
> > > +/* Invoke T(IFN) for each internal function IFN that also has an
> > > +   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
> > > +#define 

Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-07-31 Thread Stefan Schulze Frielinghaus via Gcc-patches
ping

On Mon, Jun 19, 2023 at 04:23:57PM +0200, Stefan Schulze Frielinghaus wrote:
> Comparisons between memory and constants might be done in a smaller mode
> resulting in smaller constants which might finally end up as immediates
> instead of in the literal pool.
> 
> For example, on s390x a non-symmetric comparison like
>   x <= 0x3fffffffffffffff
> results in the constant being spilled to the literal pool and an 8 byte
> memory comparison is emitted.  Ideally, an equivalent comparison
>   x0 <= 0x3f
> where x0 is the most significant byte of x, is emitted where the
> constant is smaller and more likely to materialize as an immediate.
> 
> Similarly, comparisons of the form
>   x >= 0x4000000000000000
> can be shortened into x0 >= 0x40.
> 
> Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
> Note, the new tests show that for the mentioned little-endian targets
> the optimization does not materialize since either the costs of the new
> instructions are higher or they do not match.  Still ok for mainline?
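
[Illustration: a brute-force check of the narrowing on a 16-bit analog
of the example above.  For an unsigned x, x <= 0x3fff holds exactly
when the most significant byte is <= 0x3f, because the low bits of the
constant are all ones.]

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    for (uint32_t x = 0; x <= 0xffff; x++)
      assert ((x <= 0x3fffu) == ((x >> 8) <= 0x3fu));
    return 0;
  }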
> 
> gcc/ChangeLog:
> 
>   * combine.cc (simplify_compare_const): Narrow comparison of
>   memory and constant.
>   (try_combine): Adapt new function signature.
>   (simplify_comparison): Adapt new function signature.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/cmp-mem-const-1.c: New test.
>   * gcc.dg/cmp-mem-const-2.c: New test.
>   * gcc.dg/cmp-mem-const-3.c: New test.
>   * gcc.dg/cmp-mem-const-4.c: New test.
>   * gcc.dg/cmp-mem-const-5.c: New test.
>   * gcc.dg/cmp-mem-const-6.c: New test.
>   * gcc.target/s390/cmp-mem-const-1.c: New test.
> ---
>  gcc/combine.cc| 79 +--
>  gcc/testsuite/gcc.dg/cmp-mem-const-1.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-2.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-3.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-4.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-5.c| 17 
>  gcc/testsuite/gcc.dg/cmp-mem-const-6.c| 17 
>  .../gcc.target/s390/cmp-mem-const-1.c | 24 ++
>  8 files changed, 200 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 5aa0ec5c45a..56e15a93409 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code, 
> machine_mode, rtx,
>  static int recog_for_combine (rtx *, rtx_insn *, rtx *);
>  static rtx gen_lowpart_for_combine (machine_mode, rtx);
>  static enum rtx_code simplify_compare_const (enum rtx_code, machine_mode,
> -  rtx, rtx *);
> +  rtx *, rtx *);
>  static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
>  static void update_table_tick (rtx);
>  static void record_value_for_reg (rtx, rtx_insn *, rtx);
> @@ -3185,7 +3185,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
> compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
> if (is_a <scalar_int_mode> (GET_MODE (i2dest), &mode))
>   compare_code = simplify_compare_const (compare_code, mode,
> -op0, &op1);
> +&op0, &op1);
> target_canonicalize_comparison (&compare_code, &op0, &op1, 1);
>   }
>  
> @@ -11796,13 +11796,14 @@ gen_lowpart_for_combine (machine_mode omode, rtx x)
> (CODE OP0 const0_rtx) form.
>  
> The result is a possibly different comparison code to use.
> -   *POP1 may be updated.  */
> +   *POP0 and *POP1 may be updated.  */
>  
>  static enum rtx_code
>  simplify_compare_const (enum rtx_code code, machine_mode mode,
> - rtx op0, rtx *pop1)
> + rtx *pop0, rtx *pop1)
>  {
>scalar_int_mode int_mode;
> +  rtx op0 = *pop0;
>HOST_WIDE_INT const_op = INTVAL (*pop1);
>  
>/* Get the constant we are comparing against and turn off all bits
> @@ -11987,6 +11988,74 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>break;
>  }
>  
> +  /* Narrow non-symmetric comparison of memory and constant as e.g.
> + x0...x7 <= 0x3fffffffffffffff into x0 <= 0x3f where x0 is the most
> + significant byte.  Likewise, transform x0...x7 >= 0x4000000000000000
> + into x0 >= 0x40.  */
> +  if ((code == LEU || code == LTU || code == GEU || code == GTU)
> +  && is_a <scalar_int_mode> (GET_MODE (op0), &int_mode)
> +  && MEM_P (op0)
> +  && !MEM_VOLATILE_P (op0)
> +  /* The 

Re: [committed] RISC-V: Fix bug of get_mask_mode

2023-07-31 Thread Kito Cheng via Gcc-patches
Ooops, I guess my code base was too old, and forgot to check that after
rebase, thanks for fixing that!

Juzhe-Zhong wrote on Monday, July 31, 2023 at 20:21:

> Fix bugs:
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void
> riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)’:
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for
> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of
> non-class type ‘machine_mode’
>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>   ^~~
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void
> riscv_vector::emit_nonvlmax_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1057:54: error: request for
> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of
> non-class type ‘machine_mode’
>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>   ^~~
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void
> riscv_vector::emit_nonvlmax_fp_tumu_insn(unsigned int, int, rtx_def**,
> rtx)’:
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1076:54: error: request for
> member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is of
> non-class type ‘machine_mode’
>machine_mode mask_mode = get_mask_mode (dest_mode).require ();
>
> Obvious fix. Pushed.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Fix bug.
> (emit_nonvlmax_tumu_insn): Ditto.
> (emit_nonvlmax_fp_tumu_insn): Ditto.
> (expand_vec_series): Ditto.
> (expand_vector_init_insert_elems): Ditto.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 76b437cc55e..40e4574dcc0 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -996,7 +996,7 @@ static void
>  emit_vlmax_masked_fp_mu_insn (unsigned icode, int op_num, rtx *ops)
>  {
>machine_mode dest_mode = GET_MODE (ops[0]);
> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>insn_expander e (/*OP_NUM*/ op_num,
>   /*HAS_DEST_P*/ true,
>   /*FULLY_UNMASKED_P*/ false,
> @@ -1054,7 +1054,7 @@ static void
>  emit_nonvlmax_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>  {
>machine_mode dest_mode = GET_MODE (ops[0]);
> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>insn_expander e (/*OP_NUM*/ op_num,
>   /*HAS_DEST_P*/ true,
>   /*FULLY_UNMASKED_P*/ false,
> @@ -1073,7 +1073,7 @@ static void
>  emit_nonvlmax_fp_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
>  {
>machine_mode dest_mode = GET_MODE (ops[0]);
> -  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
> +  machine_mode mask_mode = get_mask_mode (dest_mode);
>insn_expander e (/*OP_NUM*/ op_num,
>   /*HAS_DEST_P*/ true,
>   /*FULLY_UNMASKED_P*/ false,
> @@ -1306,7 +1306,6 @@ void
>  expand_vec_series (rtx dest, rtx base, rtx step)
>  {
>machine_mode mode = GET_MODE (dest);
> -  machine_mode mask_mode = get_mask_mode (mode);
>poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
>poly_int64 value;
>
> @@ -2375,7 +2374,6 @@ expand_vector_init_insert_elems (rtx target, const
> rvv_builder ,
>  int nelts_reqd)
>  {
>machine_mode mode = GET_MODE (target);
> -  machine_mode mask_mode = get_mask_mode (mode);
>rtx dup = expand_vector_broadcast (mode, builder.elt (0));
>emit_move_insn (target, dup);
>int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
> --
> 2.36.3
>
>


[PATCH] Improve sinking with unrelated defs

2023-07-31 Thread Richard Biener via Gcc-patches
statement_sink_location for loads is currently confused about
stores that are not on the paths we are sinking across.  The
following avoids this by explicitly checking whether a block
with a store is on any of those paths.  To not perform too many
walks over the sub-part of the CFG between the original stmt
location and the found sinking candidate we first collect all
blocks to check and then perform a single walk from the sinking
candidate location to the original stmt location.  We avoid enlarging
the region by conservatively handling backedges.

The original heuristics about which store locations to ignore have been
refactored; some can possibly be removed now.

If anybody knows a cheaper way to check whether a BB is on a path
from block A to block B which is dominated by A I'd be happy to
know (or if there would be a clever caching method at least - I'm
probably going to limit the number of blocks to walk to avoid
quadraticness).
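
[Illustration: one way to answer the question above with a backward
walk.  Because A dominates B, every predecessor chain from B stays
inside A's dominated region until it reaches A, so BB lies on an
A-to-B path iff the walk visits it.  A GCC-style sketch, not the
patch itself.]

  static bool
  bb_on_path_p (basic_block a, basic_block b, basic_block bb)
  {
    auto_vec<basic_block> worklist;
    auto_bitmap visited;
    worklist.safe_push (b);
    while (!worklist.is_empty ())
      {
        basic_block cur = worklist.pop ();
        if (cur == bb)
          return true;
        /* Stop at A and do not revisit blocks.  */
        if (cur == a || !bitmap_set_bit (visited, cur->index))
          continue;
        edge e;
        edge_iterator ei;
        FOR_EACH_EDGE (e, ei, cur->preds)
          worklist.safe_push (e->src);
      }
    return false;
  }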

Bootstrapped and tested on x86_64-unknown-linux-gnu.  This depends
on the previously sent RFC to limit testsuite fallout.

* tree-ssa-sink.cc (pass_sink_code::execute): Mark backedges.
(statement_sink_location): Do not consider
stores that are not on any path from the original to the
destination location.

* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 +++
 gcc/tree-ssa-sink.cc| 125 
 2 files changed, 121 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..266ceb000a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details" } */
+
+void bar ();
+int foo (int *p, int x)
+{
+  int res = *p;
+  if (x)
+{
+  bar ();
+  res = 1;
+}
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink1" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index cf0a32a954b..e996f46c864 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -388,13 +388,32 @@ statement_sink_location (gimple *stmt, basic_block frombb,
 
  imm_use_iterator imm_iter;
  use_operand_p use_p;
+ auto_bitmap bbs_to_check;
  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (stmt))
{
  gimple *use_stmt = USE_STMT (use_p);
  basic_block bb = gimple_bb (use_stmt);
+
+ /* If there is no virtual definition here, continue.  */
+ if (gimple_code (use_stmt) != GIMPLE_PHI
+ && !gimple_vdef (use_stmt))
+   continue;
+
+ /* When the virtual definition is possibly within the same
+irreducible region as the current sinking location all
+bets are off.  */
+ if (((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
+  && bb->loop_father == commondom->loop_father)
+ || ((commondom->flags & BB_IRREDUCIBLE_LOOP)
+ && flow_loop_nested_p (commondom->loop_father,
+bb->loop_father))
+ || ((bb->flags & BB_IRREDUCIBLE_LOOP)
+ && flow_loop_nested_p (bb->loop_father,
+commondom->loop_father)))
+   ;
  /* For PHI nodes the block we know sth about is the incoming block
 with the use.  */
- if (gimple_code (use_stmt) == GIMPLE_PHI)
+ else if (gimple_code (use_stmt) == GIMPLE_PHI)
{
  /* If the PHI defines the virtual operand, ignore it.  */
  if (gimple_phi_result (use_stmt) == gimple_vuse (stmt))
@@ -402,32 +421,97 @@ statement_sink_location (gimple *stmt, basic_block frombb,
  /* In case the PHI node post-dominates the current insert
 location we can disregard it.  But make sure it is not
 dominating it as well as can happen in a CFG cycle.  */
- if (commondom != bb
- && !dominated_by_p (CDI_DOMINATORS, commondom, bb)
- && dominated_by_p (CDI_POST_DOMINATORS, commondom, bb)
- /* If the blocks are possibly within the same irreducible
-cycle the above check breaks down.  */
- && !((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
-  && bb->loop_father == commondom->loop_father)
- && !((commondom->flags & BB_IRREDUCIBLE_LOOP)
-  && flow_loop_nested_p (commondom->loop_father,
- bb->loop_father))
- && !((bb->flags & 

[committed] RISC-V: Fix bug of get_mask_mode

2023-07-31 Thread Juzhe-Zhong
Fix bugs:
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
riscv_vector::emit_vlmax_masked_fp_mu_insn(unsigned int, int, rtx_def**)’:
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:999:54: error: request for 
member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is 
of non-class type ‘machine_mode’
   machine_mode mask_mode = get_mask_mode (dest_mode).require ();
  ^~~
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
riscv_vector::emit_nonvlmax_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1057:54: error: request for 
member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is 
of non-class type ‘machine_mode’
   machine_mode mask_mode = get_mask_mode (dest_mode).require ();
  ^~~
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc: In function ‘void 
riscv_vector::emit_nonvlmax_fp_tumu_insn(unsigned int, int, rtx_def**, rtx)’:
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1076:54: error: request for 
member ‘require’ in ‘riscv_vector::get_mask_mode(dest_mode)’, which is 
of non-class type ‘machine_mode’
   machine_mode mask_mode = get_mask_mode (dest_mode).require ();

Obvious fix. Pushed.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Fix bug.
(emit_nonvlmax_tumu_insn): Ditto.
(emit_nonvlmax_fp_tumu_insn): Ditto.
(expand_vec_series): Ditto.
(expand_vector_init_insert_elems): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 76b437cc55e..40e4574dcc0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -996,7 +996,7 @@ static void
 emit_vlmax_masked_fp_mu_insn (unsigned icode, int op_num, rtx *ops)
 {
   machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  machine_mode mask_mode = get_mask_mode (dest_mode);
   insn_expander e (/*OP_NUM*/ op_num,
  /*HAS_DEST_P*/ true,
  /*FULLY_UNMASKED_P*/ false,
@@ -1054,7 +1054,7 @@ static void
 emit_nonvlmax_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
 {
   machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  machine_mode mask_mode = get_mask_mode (dest_mode);
   insn_expander e (/*OP_NUM*/ op_num,
  /*HAS_DEST_P*/ true,
  /*FULLY_UNMASKED_P*/ false,
@@ -1073,7 +1073,7 @@ static void
 emit_nonvlmax_fp_tumu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
 {
   machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  machine_mode mask_mode = get_mask_mode (dest_mode);
   insn_expander e (/*OP_NUM*/ op_num,
  /*HAS_DEST_P*/ true,
  /*FULLY_UNMASKED_P*/ false,
@@ -1306,7 +1306,6 @@ void
 expand_vec_series (rtx dest, rtx base, rtx step)
 {
   machine_mode mode = GET_MODE (dest);
-  machine_mode mask_mode = get_mask_mode (mode);
   poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
   poly_int64 value;
 
@@ -2375,7 +2374,6 @@ expand_vector_init_insert_elems (rtx target, const 
rvv_builder ,
 int nelts_reqd)
 {
   machine_mode mode = GET_MODE (target);
-  machine_mode mask_mode = get_mask_mode (mode);
   rtx dup = expand_vector_broadcast (mode, builder.elt (0));
   emit_move_insn (target, dup);
   int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
-- 
2.36.3



RE: [PATCH] internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions

2023-07-31 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches On Behalf Of Richard Sandiford via Gcc-patches
Sent: Monday, July 31, 2023 6:17 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de
Subject: Re: [PATCH] internal-fn: Refine macro define of COND_* and COND_LEN_* 
internal functions

juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
>
> Base on previous disscussions, we should make COND_* and COND_LEN_*
> consistent.
>
> So, this patch define these internal function together by these 2
> wrappers:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE)
>  \
>   DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE)   
>  \
>   DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB,
>  \
>cond_len_##TYPE)
> #endif
>
> #ifndef DEF_INTERNAL_SIGNED_COND_FN
> #define DEF_INTERNAL_SIGNED_COND_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB,  
>  \
>   UNSIGNED_OPTAB, TYPE)  \
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR, 
>  \
>   cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB,\
>   cond_##TYPE)   \
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR, 
>  \
>   cond_len_##SIGNED_OPTAB,   \
>   cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE)
> #endif
>
> Bootstrap and Regression on X86 passed.
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * internal-fn.def (DEF_INTERNAL_COND_FN): New macro.
> (DEF_INTERNAL_SIGNED_COND_FN): Ditto.
> (COND_ADD): Remove.
> (COND_SUB): Ditto.
> (COND_MUL): Ditto.
> (COND_DIV): Ditto.
> (COND_MOD): Ditto.
> (COND_RDIV): Ditto.
> (COND_MIN): Ditto.
> (COND_MAX): Ditto.
> (COND_FMIN): Ditto.
> (COND_FMAX): Ditto.
> (COND_AND): Ditto.
> (COND_IOR): Ditto.
> (COND_XOR): Ditto.
> (COND_SHL): Ditto.
> (COND_SHR): Ditto.
> (COND_FMA): Ditto.
> (COND_FMS): Ditto.
> (COND_FNMA): Ditto.
> (COND_FNMS): Ditto.
> (COND_NEG): Ditto.
> (COND_LEN_ADD): Ditto.
> (COND_LEN_SUB): Ditto.
> (COND_LEN_MUL): Ditto.
> (COND_LEN_DIV): Ditto.
> (COND_LEN_MOD): Ditto.
> (COND_LEN_RDIV): Ditto.
> (COND_LEN_MIN): Ditto.
> (COND_LEN_MAX): Ditto.
> (COND_LEN_FMIN): Ditto.
> (COND_LEN_FMAX): Ditto.
> (COND_LEN_AND): Ditto.
> (COND_LEN_IOR): Ditto.
> (COND_LEN_XOR): Ditto.
> (COND_LEN_SHL): Ditto.
> (COND_LEN_SHR): Ditto.
> (COND_LEN_FMA): Ditto.
> (COND_LEN_FMS): Ditto.
> (COND_LEN_FNMA): Ditto.
> (COND_LEN_FNMS): Ditto.
> (COND_LEN_NEG): Ditto.
> (ADD): New macro define.
> (SUB): Ditto.
> (MUL): Ditto.
> (DIV): Ditto.
> (MOD): Ditto.
> (RDIV): Ditto.
> (MIN): Ditto.
> (MAX): Ditto.
> (FMIN): Ditto.
> (FMAX): Ditto.
> (AND): Ditto.
> (IOR): Ditto.
> (XOR): Ditto.
> (SHL): Ditto.
> (SHR): Ditto.
> (FMA): Ditto.
> (FMS): Ditto.
> (FNMA): Ditto.
> (FNMS): Ditto.
> (NEG): Ditto.

OK, thanks.  (And sorry for the slow review.)

Richard

> ---
>  gcc/internal-fn.def | 123 
>  1 file changed, 56 insertions(+), 67 deletions(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 04f3812326e..bf6825c5d00 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -34,10 +34,12 @@ along with GCC; see the file COPYING3.  If not see
>  UNSIGNED_OPTAB, TYPE)
>   DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
>   DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
> + DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
> + DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
>  
> where NAME is the name of the function, FLAGS is a set of
> ECF_* flags and FNSPEC is a string describing functions fnspec.
> -   
> +
> DEF_INTERNAL_OPTAB_FN defines an internal function that maps to a
> direct optab.  The function should only be called with a given
> set of types if the associated optab is available for the modes
> @@ -74,7 +76,8 @@ along with GCC; see the file COPYING3.  If not see
>  
> - cond_len_unary: a conditional unary optab, such as cond_len_neg
> - cond_len_binary: a conditional binary optab, such as cond_len_add
> -   - cond_len_ternary: a conditional ternary optab, such as 
> cond_len_fma_rev
> +   - cond_len_ternary: a conditional ternary optab, such 

RE: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-31 Thread Wang, Yanzhang via Gcc-patches
Thanks for your comments, Jeff and Robin

> > Is the mulh case somehow common or critical?
> Well, I would actually back up even further.  What were the
> circumstances that led to the mulh with a zero operand?   

I think you both asked why we should add the mulh * 0 simplification.
Unfortunately, I have no benchmark to demonstrate how critical it is. We found
some cases that simplify_binary_operation in simplify-rtx.cc handles
but that do not work for the RISC-V backend. For example,

- mult * 0 exists, but RISC-V has additional mulh * 0
- add + 0 / sub - 0 exists, but RISC-V has additional (madc + adc) + 0
- ...

So we want to complement the existing rules so that the simplification
covers more cases. That's the basic idea behind these shortcut optimizations.
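
[Illustration: the scalar fact the mulh * 0 shortcut relies on; a
sketch of the semantics, not the RVV md pattern.]

  #include <assert.h>
  #include <stdint.h>

  /* mulh returns the high half of a widening multiply.  A zero operand
     zeroes the whole 64-bit product and hence its high half, which is
     the same argument that lets simplify-rtx fold mult * 0.  */
  static int32_t
  mulh32 (int32_t a, int32_t b)
  {
    return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
  }

  int
  main (void)
  {
    assert (mulh32 (123456789, 0) == 0);
    assert (mulh32 (-1, 0) == 0);
    return 0;
  }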

> > However, adding new rtl expressions, especially generic ones that are
> > useful for others and the respective optimizations is a tedious
> > process as well.  Still, just recently Roger Sayle added bitreverse
> > and copysign.  You can refer to his patch as well as the follow-up
> > ones to get an idea of what would need to be done.
> > ("Add RTX codes for BITREVERSE and COPYSIGN")

Great advice. I'll check whether the generic operations can be implemented in
this patch's style. It seems we have to write a specific pattern for each
unspec-related insn, unfortunately.

Thanks,
Yanzhang

> -Original Message-
> From: Jeff Law 
> Sent: Saturday, July 29, 2023 7:07 AM
> To: Robin Dapp ; Wang, Yanzhang
> ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> 
> Subject: Re: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.
> 
> 
> 
> On 7/28/23 06:31, Robin Dapp via Gcc-patches wrote:
> >> This is a draft patch. I would like to explain it's hard to make the
> >> simplify generic and ask for some help.
> >>
> >> There're 2 categories we need to optimize.
> >>
> >> - The op in optab such as div / 1.
> >> - The unspec operation such as mulh * 0, (vadc+vmadc) + 0.
> >>
> >> Especially for the unspec operation, I found we need to write one by
> >> one to match the special pattern. Seems there's no way to write a
> >> generic pattern that will match mulh, (vadc+vmadc), sll... This way
> >> is too complicated and not so elegant because need to write so much
> >> md patterns.
> >>
> >> Do you have any ideas?
> >
> > Yes, it's cumbersome having to add the patterns individually and it
> > would be nicer to have the middle end optimize for us.
> >
> > However, adding new rtl expressions, especially generic ones that are
> > useful for others and the respective optimizations is a tedious
> > process as well.  Still, just recently Roger Sayle added bitreverse
> > and copysign.  You can refer to his patch as well as the follow-up
> > ones to get an idea of what would need to be done.
> > ("Add RTX codes for BITREVERSE and COPYSIGN")
> >
> > So if we have few patterns that are really performance critical (like
> > for some benchmark) my take is to add them in a similar way you were
> > proposing but I would advise against using this excessively.
> > Is the mulh case somehow common or critical?
> Well, I would actually back up even further.  What were the
> circumstances that led to the mulh with a zero operand?   That would
> tend to be an indicator of a problem earlier.  Perhaps in the gimple
> pipeline or the gimple->rtl conversion.  I'd be a bit surprised to see a
> const0_rtx propagate in during the RTL pipeline, I guess it's possible, but
> I'd expect it to be relatively rare.
> 
> The one case I could see happening would be cases from the builtin apis...
> Of course one might call that user error ;-)
> 
> 
> jeff


Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> I think you need to use fma from math.h together with -ffast-math
>> to get fma.

As you said, this is one of the cases I tried:
https://godbolt.org/z/xMzrrv5dT 
GCC failed to vectorize.

Could you help me with this?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-31 20:00
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok . Thanks Richard.
> 
> Could you give me a case that SVE can vectorize a reduction with FMA?
> Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> 
> I tried many times to reproduce such cases but I failed.
 
I think you need to use fma from math.h together with -ffast-math
to get fma.
 
Richard.
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-07-31 18:19
> To: Juzhe-Zhong
> CC: gcc-patches; rguenther
> Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> Juzhe-Zhong  writes:
> > Hi, Richard and Richi.
> >
> > Base on the suggestions from Richard:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> >
> > This patch choose (1) approach that Richard provided, meaning:
> >
> > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > are needed at the gimple level.
> >
> > Such approach can make codes much cleaner and reasonable.
> >
> > Consider this following case:
> > void foo (float * __restrict a, float * __restrict b, int * __restrict 
> > cond, int n)
> > {
> >   for (int i = 0; i < n; i++)
> > if (cond[i])
> >   a[i] = b[i] + a[i];
> > }
> >
> >
> > Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> > :5:21: missed: couldn't vectorize loop
> > :5:21: missed: not vectorized: control flow in loop.
> >
> > ARM SVE:
> >
> > ...
> > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > ...
> > vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> > ...
> > vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> > vect__6.13_56);
> >
> > For RVV, we want IR as follows:
> >
> > ...
> > _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> > ...
> > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > ...
> > vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, 
> > vect__8.16_59, vect__6.13_55, _68, 0);
> > ...
> >
> > Both len and mask of COND_LEN_ADD are real not dummy.
> >
> > This patch has been fully tested in RISC-V port with supporting both COND_* 
> > and COND_LEN_*.
> >
> > And also, Bootstrap and Regression on X86 passed.
> >
> > OK for trunk?
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> > (get_len_internal_fn): New function.
> > (CASE): Ditto.
> > * internal-fn.h (get_len_internal_fn): Ditto.
> > * tree-vect-stmts.cc (vectorizable_call): Support CALL 
> > vectorization with COND_LEN_*.
> >
> > ---
> >  gcc/internal-fn.cc | 46 ++
> >  gcc/internal-fn.h  |  1 +
> >  gcc/tree-vect-stmts.cc | 87 +-
> >  3 files changed, 125 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 8e294286388..379220bebc7 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
> >  }
> >  }
> >  
> > +/* Invoke T(IFN) for each internal function IFN that also has an
> > +   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
> > +#define FOR_EACH_LEN_FN_PAIR(T)
> > \
> > +  T (MASK_LOAD, MASK_LEN_LOAD) 
> > \
> > +  T (MASK_STORE, MASK_LEN_STORE)   
> > \
> > +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD)   
> > \
> > +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE)   
> > \
> > +  T (COND_ADD, COND_LEN_ADD)   
> > \
> > +  T (COND_SUB, COND_LEN_SUB)   
> > \
> > +  T (COND_MUL, COND_LEN_MUL)   
> > \
> > +  T (COND_DIV, COND_LEN_DIV)   
> > \
> > +  T (COND_MOD, COND_LEN_MOD)   
> > \
> > +  T (COND_RDIV, COND_LEN_RDIV) 
> > \
> > +  T (COND_FMIN, COND_LEN_FMIN) 
> > \
> > +  T (COND_FMAX, COND_LEN_FMAX) 
> > \
> > +  T (COND_MIN, COND_LEN_MIN)   
> > \
> > +  T (COND_MAX, COND_LEN_MAX)   
> >

Re: [gcc-13] Backport PR10280 fix

2023-07-31 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 27 Jul 2023 at 12:04, Richard Biener  wrote:
>
> On Wed, 26 Jul 2023, Prathamesh Kulkarni wrote:
>
> > Sorry, I meant PR110280 in subject line (not PR10280).
>
> OK after 13.2 is released and the branch is open again.
Thanks, committed the patch to releases/gcc-13 branch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f4029de35fb1b293a4fd586574b1b4b73ddf7880

Thanks,
Prathamesh
>
> Richard.
>
> > On Wed, 26 Jul 2023 at 23:03, Prathamesh Kulkarni
> >  wrote:
> > >
> > > Hi Richard,
> > > Sorry for the delay in backport to gcc-13.
> > > The attached patch (cherry picked from master) is bootstrapped+tested
> > > on aarch64-linux-gnu with SVE enabled on gcc-13 branch.
> > > OK to commit to gcc-13 branch ?
> > >
> > > Thanks,
> > > Prathamesh
> >


[PATCH] RISC-V: Support POPCOUNT auto-vectorization

2023-07-31 Thread Juzhe-Zhong
This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by following LLVM's approach.
https://godbolt.org/z/3K3GzvY7f

Before this patch:

:7:21: missed: couldn't vectorize loop
:8:14: missed: not vectorized: relevant stmt not supported: _5 = 
__builtin_popcount (_4);

After this patch:

popcount_32:
ble a2,zero,.L5
li  t3,1431654400
li  a7,858992640
li  t1,252645376
li  a6,16711680
li  a3,65536
addiw   t3,t3,1365
addiw   a7,a7,819
addiw   t1,t1,-241
addiw   a6,a6,255
addiw   a3,a3,-1
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
	vsetivli zero,4,e32,m1,ta,ma
vsrl.vi v2,v1,1
vand.vx v2,v2,t3
vsub.vv v1,v1,v2
vsrl.vi v2,v1,2
vand.vx v2,v2,a7
vand.vx v1,v1,a7
vadd.vv v1,v1,v2
vsrl.vi v2,v1,4
vadd.vv v1,v1,v2
vand.vx v1,v1,t1
vsrl.vi v2,v1,8
vand.vx v2,v2,a6
	slli a4,a5,2
vand.vx v1,v1,a6
vadd.vv v1,v1,v2
vsrl.vi v2,v1,16
vand.vx v1,v1,a3
vand.vx v2,v2,a3
vadd.vv v1,v1,v2
vmv.v.v v1,v1
vsetvli zero,a2,e32,m1,ta,ma
sub a2,a2,a5
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
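
The loop above is the classic SWAR popcount. For reference, a runnable
scalar C++ rendition of the same five COUNT steps (illustration only; the
expander's actual output applies minor algebraic shortcuts, such as the
vsub in the first step, and the constants match t3/a7/t1/a6/a3 above):

#include <cstdint>
#include <cassert>

static uint32_t
popcount32 (uint32_t n)
{
  n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);   /* 2-bit sums */
  n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);   /* 4-bit sums */
  n = (n & 0x0F0F0F0Fu) + ((n >> 4) & 0x0F0F0F0Fu);   /* 8-bit sums */
  n = (n & 0x00FF00FFu) + ((n >> 8) & 0x00FF00FFu);   /* 16-bit sums */
  n = (n & 0x0000FFFFu) + ((n >> 16) & 0x0000FFFFu);  /* full sum */
  return n;
}

int
main ()
{
  assert (popcount32 (0) == 0);
  assert (popcount32 (0xFFFFFFFFu) == 32);
  assert (popcount32 (0x80000001u) == 2);
  return 0;
}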

gcc/ChangeLog:

* config/riscv/autovec.md (popcount2): New pattern.
* config/riscv/riscv-protos.h (expand_popcount): New function.
* config/riscv/riscv-v.cc (expand_popcount): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 13 +++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 95 +++
 .../riscv/rvv/autovec/widen/popcount-1.c  | 23 +
 .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++
 5 files changed, 182 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5152bc91fd..9d32b91bdca 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -922,6 +922,19 @@
   DONE;
 })
 
+;; 
---
+;; - [INT] POPCOUNT.
+;; 
---
+
+(define_expand "popcount2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
 ;; 
---
 ;;  [FP] Unary operations
 ;; 
---
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a729db44c32..ae40fbb4b53 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
 void expand_gather_scatter (rtx *, bool);
 void expand_cond_len_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c10e51b362e..b3caa4b188d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3614,4 +3614,99 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
reduction_type type)
   emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
 }
 
+/* Expand Vector POPCOUNT by parallel popcnt:
+
+   int parallel_popcnt(uint32_t n) {
+   #define POW2(c)  (1U << (c))
+   #define MASK(c)  (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
+   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
+   n = COUNT(n, 0);
+   n = COUNT(n, 1);
+   n = COUNT(n, 2);
+   n = COUNT(n, 3);
+   n = COUNT(n, 4);
+   //  n = COUNT(n, 5);  // uncomment this line for 64-bit integers
+   return n;
+   #undef COUNT
+   #undef MASK
+   #undef POW2
+   }
+*/
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode smode = GET_MODE_INNER (mode);
+  static const uint64_t mask_values[6]
+    = {0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
+       0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL};
+
+  unsigned bit_size = GET_MODE_BITSIZE (smode);
+  unsigned word_size
+= (bit_size + LONG_LONG_TYPE_SIZE - 1) / LONG_LONG_TYPE_SIZE;
+  rtx count = 

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Ok . Thanks Richard.
> 
> Could you give me a case that SVE can vectorize a reduction with FMA?
> Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?
> 
> I tried many times to reproduce such cases but I failed.

I think you need to use fma from math.h together with -ffast-math
to get fma.
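
A sketch of such a testcase (untested; assumes -ffast-math plus suitable
vector flags so the call is contracted to an FMA feeding the reduction):

#include <math.h>

double
dot (double *__restrict b, double *__restrict c, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s = fma (b[i], c[i], s);
  return s;
}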

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-07-31 18:19
> To: Juzhe-Zhong
> CC: gcc-patches; rguenther
> Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> Juzhe-Zhong  writes:
> > Hi, Richard and Richi.
> >
> > Base on the suggestions from Richard:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> >
> > This patch choose (1) approach that Richard provided, meaning:
> >
> > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > are needed at the gimple level.
> >
> > Such approach can make codes much cleaner and reasonable.
> >
> > Consider this following case:
> > void foo (float * __restrict a, float * __restrict b, int * __restrict 
> > cond, int n)
> > {
> >   for (int i = 0; i < n; i++)
> > if (cond[i])
> >   a[i] = b[i] + a[i];
> > }
> >
> >
> > Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> > :5:21: missed: couldn't vectorize loop
> > :5:21: missed: not vectorized: control flow in loop.
> >
> > ARM SVE:
> >
> > ...
> > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > ...
> > vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> > ...
> > vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> > vect__6.13_56);
> >
> > For RVV, we want IR as follows:
> >
> > ...
> > _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> > ...
> > mask__27.10_51 = vect__4.9_49 != { 0, ... };
> > ...
> > vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, 
> > vect__8.16_59, vect__6.13_55, _68, 0);
> > ...
> >
> > Both len and mask of COND_LEN_ADD are real not dummy.
> >
> > This patch has been fully tested in RISC-V port with supporting both COND_* 
> > and COND_LEN_*.
> >
> > And also, Bootstrap and Regression on X86 passed.
> >
> > OK for trunk?
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> > (get_len_internal_fn): New function.
> > (CASE): Ditto.
> > * internal-fn.h (get_len_internal_fn): Ditto.
> > * tree-vect-stmts.cc (vectorizable_call): Support CALL 
> > vectorization with COND_LEN_*.
> >
> > ---
> >  gcc/internal-fn.cc | 46 ++
> >  gcc/internal-fn.h  |  1 +
> >  gcc/tree-vect-stmts.cc | 87 +-
> >  3 files changed, 125 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 8e294286388..379220bebc7 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
> >  }
> >  }
> >  
> > +/* Invoke T(IFN) for each internal function IFN that also has an
> > +   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
> > +#define FOR_EACH_LEN_FN_PAIR(T)
> > \
> > +  T (MASK_LOAD, MASK_LEN_LOAD) 
> > \
> > +  T (MASK_STORE, MASK_LEN_STORE)   
> > \
> > +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD)   
> > \
> > +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE)   
> > \
> > +  T (COND_ADD, COND_LEN_ADD)   
> > \
> > +  T (COND_SUB, COND_LEN_SUB)   
> > \
> > +  T (COND_MUL, COND_LEN_MUL)   
> > \
> > +  T (COND_DIV, COND_LEN_DIV)   
> > \
> > +  T (COND_MOD, COND_LEN_MOD)   
> > \
> > +  T (COND_RDIV, COND_LEN_RDIV) 
> > \
> > +  T (COND_FMIN, COND_LEN_FMIN) 
> > \
> > +  T (COND_FMAX, COND_LEN_FMAX) 
> > \
> > +  T (COND_MIN, COND_LEN_MIN)   
> > \
> > +  T (COND_MAX, COND_LEN_MAX)   
> > \
> > +  T (COND_AND, COND_LEN_AND)   
> > \
> > +  T (COND_IOR, COND_LEN_IOR)   
> > \
> > +  T (COND_XOR, COND_LEN_XOR)   
> > \
> > +  T (COND_SHL, COND_LEN_SHL)   
> > \
> > +  T (COND_SHR, COND_LEN_SHR)   

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread juzhe.zh...@rivai.ai
OK. Thanks, Richard.

Could you give me a case that SVE can vectorize a reduction with FMA?
Meaning it will go into vectorize_call and vectorize FMA into COND_FMA ?

I tried many times to reproduce such cases but I failed.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-31 18:19
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
Juzhe-Zhong  writes:
> Hi, Richard and Richi.
>
> Base on the suggestions from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
>
> This patch choose (1) approach that Richard provided, meaning:
>
> RVV implements cond_* optabs as expanders.  RVV therefore supports
> both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> are needed at the gimple level.
>
> Such approach can make codes much cleaner and reasonable.
>
> Consider this following case:
> void foo (float * __restrict a, float * __restrict b, int * __restrict cond, 
> int n)
> {
>   for (int i = 0; i < n; i++)
> if (cond[i])
>   a[i] = b[i] + a[i];
> }
>
>
> Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> :5:21: missed: couldn't vectorize loop
> :5:21: missed: not vectorized: control flow in loop.
>
> ARM SVE:
>
> ...
> mask__27.10_51 = vect__4.9_49 != { 0, ... };
> ...
> vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> ...
> vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> vect__6.13_56);
>
> For RVV, we want IR as follows:
>
> ...
> _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> ...
> mask__27.10_51 = vect__4.9_49 != { 0, ... };
> ...
> vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, 
> vect__6.13_55, _68, 0);
> ...
>
> Both len and mask of COND_LEN_ADD are real not dummy.
>
> This patch has been fully tested in RISC-V port with supporting both COND_* 
> and COND_LEN_*.
>
> And also, Bootstrap and Regression on X86 passed.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> (get_len_internal_fn): New function.
> (CASE): Ditto.
> * internal-fn.h (get_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_call): Support CALL vectorization 
> with COND_LEN_*.
>
> ---
>  gcc/internal-fn.cc | 46 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 87 +-
>  3 files changed, 125 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8e294286388..379220bebc7 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
>  }
>  }
>  
> +/* Invoke T(IFN) for each internal function IFN that also has an
> +   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
> +#define FOR_EACH_LEN_FN_PAIR(T)  
>   \
> +  T (MASK_LOAD, MASK_LEN_LOAD)   
>   \
> +  T (MASK_STORE, MASK_LEN_STORE) 
>   \
> +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD) 
>   \
> +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE) 
>   \
> +  T (COND_ADD, COND_LEN_ADD) 
>   \
> +  T (COND_SUB, COND_LEN_SUB) 
>   \
> +  T (COND_MUL, COND_LEN_MUL) 
>   \
> +  T (COND_DIV, COND_LEN_DIV) 
>   \
> +  T (COND_MOD, COND_LEN_MOD) 
>   \
> +  T (COND_RDIV, COND_LEN_RDIV)   
>   \
> +  T (COND_FMIN, COND_LEN_FMIN)   
>   \
> +  T (COND_FMAX, COND_LEN_FMAX)   
>   \
> +  T (COND_MIN, COND_LEN_MIN) 
>   \
> +  T (COND_MAX, COND_LEN_MAX) 
>   \
> +  T (COND_AND, COND_LEN_AND) 
>   \
> +  T (COND_IOR, COND_LEN_IOR) 
>   \
> +  T (COND_XOR, COND_LEN_XOR) 
>   \
> +  T (COND_SHL, COND_LEN_SHL) 
>   \
> +  T (COND_SHR, COND_LEN_SHR) 
>   \
> +  T (COND_NEG, COND_LEN_NEG) 
>   \
> +  T (COND_FMA, COND_LEN_FMA) 
>   \
> +  T (COND_FMS, COND_LEN_FMS) 
>   \
> +  T (COND_FNMA, COND_LEN_FNMA)   
>   \
> +  T (COND_FNMS, COND_LEN_FNMS)
 
With the earlier patch 

Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948]

2023-07-31 Thread Benjamin Priour via Gcc-patches
Hi Dave,

On Fri, Jul 21, 2023 at 10:10 PM David Malcolm  wrote:
 [...]

> It looks like something's gone wrong with the indentation in the above:
> previously we had tab characters, but now I'm seeing a pair of spaces,
> which means this wouldn't line up properly.  This might be a glitch
> somewhere in our email workflow, but please check it in your editor
> (with visual whitespace enabled).
>

I'll double check that before submitting.


> [...snip...]
>
> Some comments on the test cases:
>
> > diff --git a/gcc/testsuite/g++.dg/analyzer/new-2.C
> b/gcc/testsuite/g++.dg/analyzer/new-2.C
> > new file mode 100644
> > index 000..4e696040a54
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/analyzer/new-2.C
> > @@ -0,0 +1,50 @@
> > +// { dg-additional-options "-O0" }
> > +
> > +struct A
> > +{
> > +  int x;
> > +  int y;
> > +};
> > +
> > +void test_spurious_null_warning_throwing ()
> > +{
> > +  int *x = new int; /* { dg-bogus "dereference of possibly-NULL" } */
> > +  int *y = new int (); /* { dg-bogus "dereference of possibly-NULL"
> "non-throwing" } */
> > +  int *arr = new int[3]; /* { dg-bogus "dereference of possibly-NULL" }
> */
> > +  A *a = new A (); /* { dg-bogus "dereference of possibly-NULL"
> "throwing new cannot be null" } */
> > +
> > +  int z = *y + 2;
> > +  z = *x + 4; /* { dg-bogus "dereference of possibly-NULL 'x'" } */
> > +  /* { dg-warning "use of uninitialized value '\\*x'" "" { target *-*-*
> } .-1 } */
> > +  z = arr[0] + 4; /* { dg-bogus "dereference of possibly-NULL" } */
> > +
> > +  delete a;
> > +  delete y;
> > +  delete x;
> > +  delete[] arr;
> > +}
> > +
> > +void test_default_initialization ()
> > +{
> > +int *y = ::new int;
> > +int *x = ::new int (); /* { dg-bogus "dereference of possibly-NULL
> 'operator new" } */
> > +
> > +int b = *x + 3; /* { dg-bogus "dereference of possibly-NULL" } */
> > +/* { dg-bogus "use of uninitialized ‘*x’" "" { target *-*-* } .-1 }
> */
> > +int a = *y + 2; /* { dg-bogus "dereference of possibly-NULL 'y'" }
> */
> > +/* { dg-warning "use of uninitialized value '\\*y'" "no default
> init" { target *-*-* } .-1 } */
> > +
> > +delete x;
> > +delete y;
> > +}
> > +
> > +/* From clang core.uninitialized.NewArraySize
> > +new[] should not be called with an undefined size argument */
> > +
> > +void test_garbage_new_array ()
> > +{
> > +  int n;
> > +  int *arr = ::new int[n]; /* { dg-warning "use of uninitialized value
> 'n'" } */
> > +  arr[0] = 7;
> > +  ::delete[] arr; /* no warnings emitted here either */
> > +}
> > diff --git a/gcc/testsuite/g++.dg/analyzer/noexcept-new.C
> b/gcc/testsuite/g++.dg/analyzer/noexcept-new.C
> > new file mode 100644
> > index 000..7699cd99cff
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/analyzer/noexcept-new.C
> > @@ -0,0 +1,48 @@
> > +/* { dg-additional-options "-O0 -fno-exceptions
> -fno-analyzer-suppress-followups" } */
> > +#include 
> > +
> > +/* Test non-throwing variants of operator new */
> > +
> > +struct A
> > +{
> > +  int x;
> > +  int y;
> > +};
> > +
> > +void test_throwing ()
> > +{
> > +  int* x = new int;
> > +  int* y = new int(); /* { dg-warning "dereference of possibly-NULL" }
> */
> > +  int* arr = new int[10];
> > +  A *a = new A(); /* { dg-warning "dereference of possibly-NULL" } */
> > +
> > +  int z = *y + 2;
> > +  z = *x + 4; /* { dg-warning "dereference of possibly-NULL 'x'" } */
> > +  /* { dg-warning "use of uninitialized value '\\*x'" "" { target *-*-*
> } .-1 } */
> > +  z = arr[0] + 4; /* { dg-warning "dereference of possibly-NULL 'arr'"
> } */
> > +  /* { dg-warning "use of uninitialized value '\\*arr'" "" { target
> *-*-* } .-1 } */
> > +  a->y = a->x + 3;
> > +
> > +  delete a;
> > +  delete y;
> > +  delete x;
> > +  delete[] arr;
> > +}
> > +
> > +void test_nonthrowing ()
> > +{
> > +  int* x = new(std::nothrow) int;
> > +  int* y = new(std::nothrow) int();
> > +  int* arr = new(std::nothrow) int[10];
> > +
> > +  int z = *y + 2; /* { dg-warning "dereference of NULL 'y'" } */
> > +  /* { dg-warning "use of uninitialized value '\\*y'" "" { target *-*-*
> } .-1 } */
> > +  z = *x + 4; /* { dg-warning "dereference of possibly-NULL 'x'" } */
> > +  /* { dg-warning "use of uninitialized value '\\*x'" "" { target *-*-*
> } .-1 } */
> > +  z = arr[0] + 4; /* { dg-warning "dereference of possibly-NULL 'arr'"
> } */
> > +  /* { dg-warning "use of uninitialized value '\\*arr'" "" { target
> *-*-* } .-1 } */
> > +
> > +  delete y;
> > +  delete x;
> > +  delete[] arr;
> > +}
>
> I see that we have test coverage for:
>   noexcept-new.C: -fno-exceptions with new vs nothrow-new
> whereas:
>   new-2.C has (implicitly) -fexceptions with new
>
> It seems that of the four combinations for:
>   - exceptions enabled or disabled
> and:
>   - throwing versus non-throwing new
> this is covering three of the cases but is missing the case of nothrow-
> new when exceptions are enabled.
> Presumably new-2.C should gain test coverage for this case.  Or am 

Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-07-31 Thread Maciej W. Rozycki
On Mon, 31 Jul 2023, Kito Cheng via Gcc-patches wrote:

> Pushed, thanks :)

 This breaks compilation:

.../gcc/config/riscv/riscv-v.cc: In function 'void 
riscv_vector::expand_vec_series(rtx, rtx, rtx)':
.../gcc/config/riscv/riscv-v.cc:1251:16: error: unused variable 'mask_mode' 
[-Werror=unused-variable]
 1251 |   machine_mode mask_mode = get_mask_mode (mode);
  |^
.../gcc/config/riscv/riscv-v.cc: In function 'void 
riscv_vector::expand_vector_init_insert_elems(rtx, const 
riscv_vector::rvv_builder&, int)':
.../gcc/config/riscv/riscv-v.cc:2320:16: error: unused variable 'mask_mode' 
[-Werror=unused-variable]
 2320 |   machine_mode mask_mode = get_mask_mode (mode);
  |^

Please always at the very least build changes and verify that they cause 
no new issues before submitting patches.

  Maciej


[PATCH 3/3] genmatch: Log line numbers indirectly

2023-07-31 Thread Andrzej Turko via Gcc-patches
Currently fprintf calls logging to a dump file take line numbers
in the match.pd file directly as arguments.
When match.pd is edited, the line numbers of the referenced code change,
which alters many fprintf calls and, thus, many (usually all) .cc files
generated by genmatch. This forces make to (unnecessarily) rebuild many
.o files.

With this change those logging fprintf calls reference an array
of line numbers, which is defined in one of the produced files.
Thanks to this, when match.pd changes, it is enough to rebuild
that single file and, of course, those actually affected by the
change.
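
To make the effect concrete, a sketch of the shape of one generated logging
call before and after the change (the message text and the index 57 are
illustrative, not actual genmatch output):

/* Before: the match.pd line number is a literal in every generated file.  */
if (UNLIKELY (debug_dump))
  fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n",
	   "match.pd", 1234, __FILE__, __LINE__);

/* After: the literal lives in the __gimple_dbg_line_numbers table, so this
   call site no longer changes when match.pd lines shift.  */
if (UNLIKELY (debug_dump))
  fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n",
	   "match.pd", __gimple_dbg_line_numbers[57], __FILE__, __LINE__);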

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Keep line numbers from match.pd in an array.

Signed-off-by: Andrzej Turko 
---
 gcc/genmatch.cc | 73 +++--
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 1deca505603..0a480a140c9 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -217,9 +217,48 @@ fp_decl_done (FILE *f, const char *trailer)
 fprintf (header_file, "%s;", trailer);
 }
 
+/* Line numbers for use by indirect line directives.  */
+static vec dbg_line_numbers;
+
+static void
+write_header_declarations (bool gimple, FILE *f)
+{
+  if (gimple)
+fprintf (f, "\nextern int __gimple_dbg_line_numbers[];\n");
+  else
+fprintf (f, "\nextern int __generic_dbg_line_numbers[];\n");
+}
+
+static void
+define_dbg_line_numbers (bool gimple, FILE *f)
+{
+  if (gimple)
+fprintf (f, "\nint __gimple_dbg_line_numbers[%d] = {",
+   dbg_line_numbers.length ());
+  else
+fprintf (f, "\nint __generic_dbg_line_numbers[%d] = {",
+   dbg_line_numbers.length ());
+
+   if (dbg_line_numbers.is_empty ())
+{
+  fprintf (f, "};\n\n");
+  return;
+}
+
+  for (int i = 0; i < (int)dbg_line_numbers.length () - 1; i++)
+{
+  if (i % 20 == 0)
+   fprintf (f, "\n\t");
+
+  fprintf (f, "%d, ", dbg_line_numbers[i]);
+}
+  fprintf (f, "%d\n};\n\n", dbg_line_numbers.last ());
+}
+
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+ bool dumpfile = false, bool fnargs = false,
+ bool indirect_line_numbers = false, bool gimple = false)
 {
   const line_map_ordinary *map;
  linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, &map);
@@ -239,7 +278,20 @@ output_line_directive (FILE *f, location_t location,
++file;
 
   if (fnargs)
-   fprintf (f, "\"%s\", %d", file, loc.line);
+  {
+  if (indirect_line_numbers)
+{
+  if (gimple)
+  fprintf (f, "\"%s\", __gimple_dbg_line_numbers[%d]",
+ file, dbg_line_numbers.length ());
+  else
+  fprintf (f, "\"%s\", __generic_dbg_line_numbers[%d]",
+ file, dbg_line_numbers.length ());
+  dbg_line_numbers.safe_push (loc.line);
+}
+  else
+fprintf (f, "\"%s\", %d", file, loc.line);
+  }
   else
fprintf (f, "%s:%d", file, loc.line);
 }
@@ -3378,7 +3430,8 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
depth)
 /* Emit a fprintf to the debug file to the file F, with the INDENT from
either the RESULT location or the S's match location if RESULT is null. */
 static void
-emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result)
+emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result,
+ bool gimple)
 {
   fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
   "fprintf (dump_file, \"%s ",
@@ -3387,7 +3440,7 @@ emit_debug_printf (FILE *f, int indent, class simplify 
*s, operand *result)
   fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
   output_line_directive (f,
 result ? result->location : s->match->location, true,
-true);
+true, true, gimple);
   fprintf (f, ", __FILE__, __LINE__);\n");
 }
 
@@ -3524,7 +3577,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   if (!result)
 {
   /* If there is no result then this is a predicate implementation.  */
-  emit_debug_printf (f, indent, s, result);
+  emit_debug_printf (f, indent, s, result, gimple);
   fprintf_indent (f, indent, "return true;\n");
 }
   else if (gimple)
@@ -3615,7 +3668,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
}
   else
gcc_unreachable ();
-  emit_debug_printf (f, indent, s, result);
+  emit_debug_printf (f, indent, s, result, gimple);
   fprintf_indent (f, indent, "return true;\n");
 }
   else /* GENERIC */
@@ -3670,7 +3723,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
}
  if (is_predicate)
{
- emit_debug_printf (f, indent, s, result);
+ emit_debug_printf (f, indent, s, result, gimple);
  

[PATCH 0/3] genmatch: Speed up recompilation after changes to match.pd

2023-07-31 Thread Andrzej Turko via Gcc-patches
The following reduces the number of object files that need to be rebuilt
after match.pd has been modified. Right now a change to match.pd which
adds/removes a line almost always forces recompilation of all files that
genmatch generates from it. This is because of unnecessary changes to
the generated .cc files:

1. Function names and ordering change as does the way the functions are
distributed across multiple source files.
2. Code locations from match.pd are quoted directly (including line
numbers) by logging fprintf calls.

This patch addresses those issues without changing the behaviour
of the generated code. The first one is solved by making sure that minor
changes to match.pd do not influence the order in which functions are
generated. The second one by using a lookup table with line numbers.

Now a change to a single function will trigger a rebuild of 4 object
files (the one containing the function and the one with the lookup table,
both for gimple and generic) instead of all of them (20 by default).
For reference, this decreased the rebuild time with 48 threads from 3.5
minutes to 1.5 minutes on my machine.


Note for reviewers: I do not have write access.

Andrzej Turko (3):
  Support get_or_insert in ordered_hash_map
  genmatch: Reduce variability of generated code
  genmatch: Log line numbers indirectly

 gcc/Makefile.in   |  4 +-
 gcc/genmatch.cc   | 76 ++-
 gcc/ordered-hash-map-tests.cc | 19 +++--
 gcc/ordered-hash-map.h| 26 
 4 files changed, 110 insertions(+), 15 deletions(-)

-- 
2.34.1



[PATCH 1/3] Support get_or_insert in ordered_hash_map

2023-07-31 Thread Andrzej Turko via Gcc-patches
The get_or_insert method is already supported by the unordered hash map.
Adding it to the ordered map enables us to replace the unordered map
with the ordered one in cases where ordering may be useful.
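
For illustration, a minimal usage sketch (count_key and its map parameter
are invented for this example; the semantics mirror hash_map::get_or_insert):

/* Count occurrences of a key while preserving first-seen order.  */
template <typename Map, typename Key>
static void
count_key (Map &m, const Key &k)
{
  bool existed;
  int &count = m.get_or_insert (k, &existed);
  if (!existed)
    count = 0;  /* new key: entry created, insertion order recorded */
  ++count;
}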

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* ordered-hash-map.h: Add get_or_insert.
* Makefile.in: Require the ordered map header for genmatch.o.
* ordered-hash-map-tests.cc: Use get_or_insert in tests.

Signed-off-by: Andrzej Turko 
---
 gcc/Makefile.in   |  4 ++--
 gcc/ordered-hash-map-tests.cc | 19 +++
 gcc/ordered-hash-map.h| 26 ++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e99628cec07..2429128cbf2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3004,8 +3004,8 @@ build/genhooks.o : genhooks.cc $(TARGET_DEF) 
$(C_TARGET_DEF)  \
   $(COMMON_TARGET_DEF) $(D_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/genmddump.o : genmddump.cc $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)
\
   $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
-build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) \
-  $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
+build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \
+  errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h ordered-hash-map.h \
   tree.def builtins.def internal-fn.def case-cfn-macros.h $(CPPLIB_H)
 build/gencfn-macros.o : gencfn-macros.cc $(BCONFIG_H) $(SYSTEM_H)  \
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def  \
diff --git a/gcc/ordered-hash-map-tests.cc b/gcc/ordered-hash-map-tests.cc
index 1c26bbfa979..55894c25fa0 100644
--- a/gcc/ordered-hash-map-tests.cc
+++ b/gcc/ordered-hash-map-tests.cc
@@ -58,6 +58,7 @@ static void
 test_map_of_strings_to_int ()
 {
   ordered_hash_map  m;
+  bool existed;
 
   const char *ostrich = "ostrich";
   const char *elephant = "elephant";
@@ -74,17 +75,23 @@ test_map_of_strings_to_int ()
   ASSERT_EQ (false, m.put (ostrich, 2));
   ASSERT_EQ (false, m.put (elephant, 4));
   ASSERT_EQ (false, m.put (ant, 6));
-  ASSERT_EQ (false, m.put (spider, 8));
+  existed = true;
+  int &value = m.get_or_insert (spider, &existed);
+  value = 8;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (millipede, 750));
   ASSERT_EQ (false, m.put (eric, 3));
 
+
   /* Verify that we can recover the stored values.  */
   ASSERT_EQ (6, m.elements ());
   ASSERT_EQ (2, *m.get (ostrich));
   ASSERT_EQ (4, *m.get (elephant));
   ASSERT_EQ (6, *m.get (ant));
   ASSERT_EQ (8, *m.get (spider));
-  ASSERT_EQ (750, *m.get (millipede));
+  existed = false;
+  ASSERT_EQ (750, m.get_or_insert (millipede, &existed));
+  ASSERT_EQ (true, existed);
   ASSERT_EQ (3, *m.get (eric));
 
   /* Verify that the order of insertion is preserved.  */
@@ -113,6 +120,7 @@ test_map_of_int_to_strings ()
 {
   const int EMPTY = -1;
   const int DELETED = -2;
+  bool existed;
  typedef int_hash <int, EMPTY, DELETED> int_hash_t;
  ordered_hash_map <int_hash_t, const char *> m;
 
@@ -131,7 +139,9 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (false, m.put (2, ostrich));
   ASSERT_EQ (false, m.put (4, elephant));
   ASSERT_EQ (false, m.put (6, ant));
-  ASSERT_EQ (false, m.put (8, spider));
+  const char* &value = m.get_or_insert (8, &existed);
+  value = spider;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (750, millipede));
   ASSERT_EQ (false, m.put (3, eric));
 
@@ -141,7 +151,8 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (*m.get (4), elephant);
   ASSERT_EQ (*m.get (6), ant);
   ASSERT_EQ (*m.get (8), spider);
-  ASSERT_EQ (*m.get (750), millipede);
+  ASSERT_EQ (m.get_or_insert (750, &existed), millipede);
+  ASSERT_EQ (existed, true);
   ASSERT_EQ (*m.get (3), eric);
 
   /* Verify that the order of insertion is preserved.  */
diff --git a/gcc/ordered-hash-map.h b/gcc/ordered-hash-map.h
index 6b68cc96305..9fc875182e1 100644
--- a/gcc/ordered-hash-map.h
+++ b/gcc/ordered-hash-map.h
@@ -76,6 +76,32 @@ public:
 return m_map.get (k);
   }
 
+  /* Return a reference to the value for the passed in key, creating the entry
+     if it doesn't already exist.  If existed is not NULL then it is set to
+     false if the key was not previously in the map, and true otherwise.  */
+
+  Value &get_or_insert (const Key &k, bool *existed = NULL)
+  {
+    bool _existed;
+    Value &ret = m_map.get_or_insert (k, &_existed);
+
+    if (!_existed)
+      {
+        bool key_present;
+        int &slot = m_key_index.get_or_insert (k, &key_present);
+        if (!key_present)
+          {
+            slot = m_keys.length ();
+            m_keys.safe_push (k);
+          }
+      }
+
+    if (existed)
+      *existed = _existed;
+
+    return ret;
+  }
+
   /* Removing a key removes it from the map, but retains the insertion
  order.  */
 
-- 
2.34.1



[PATCH 2/3] genmatch: Reduce variability of generated code

2023-07-31 Thread Andrzej Turko via Gcc-patches
So far genmatch has been using an unordered map to store information about
functions to be generated. Since corresponding locations from match.pd were
used as keys in the map, even small changes to match.pd which caused
line number changes would change the order in which the functions are
generated. This would reshuffle the functions between the generated .cc files.
This way even a minimal modification to match.pd forces recompilation of all
object files originating from match.pd on rebuild.

This commit makes sure that functions are generated in the order of their
processing (in contrast to the random order based on hashes of their
locations in match.pd). This is done by replacing the unordered map with an
ordered one. This way small changes to match.pd do not cause function
renaming and reshuffling among the generated source files.
Together with the subsequent change to logging fprintf calls, this
removes unnecessary changes to the files generated by genmatch, allowing
for reuse of already built object files during rebuild. The aim is to
make editing of match.pd and subsequent testing easier.
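
A standalone demonstration of the underlying problem (std::unordered_map
stands in for genmatch's hash_map here; this is not genmatch code):

#include <cstdio>
#include <string>
#include <unordered_map>

int
main ()
{
  std::unordered_map<std::string, int> m;
  m["match.pd:100"] = 1;  /* keys modelled on source locations */
  m["match.pd:200"] = 2;
  m["match.pd:300"] = 3;
  /* Iteration order depends on the key hashes, so shifting every location
     by one line can permute the output; an insertion-ordered map keeps
     the 1, 2, 3 insertion order unconditionally.  */
  for (const auto &kv : m)
    std::printf ("%s -> %d\n", kv.first.c_str (), kv.second);
  return 0;
}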

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Make sinfo map ordered.

Signed-off-by: Andrzej Turko 
---
 gcc/genmatch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2302f2a7ff0..1deca505603 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hash-table.h"
 #include "hash-set.h"
 #include "is-a.h"
+#include "ordered-hash-map.h"
 
 
 /* Stubs for GGC referenced through instantiations triggered by hash-map.  */
@@ -1684,7 +1685,7 @@ struct sinfo_hashmap_traits : 
simple_hashmap_traits,
  template <typename T> static inline void remove (T &) {}
 };
 
-typedef hash_map
+typedef ordered_hash_map
   sinfo_map_t;
 
 /* Current simplifier ID we are processing during insertion into the
-- 
2.34.1



Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread juzhe.zh...@rivai.ai
>> Ah.  So then just feed it cond_fn?  I mean, we don't have
>> LEN_FMA, the only LEN-but-not-MASK ifns are those used by
>> power/s390, LEN_LOAD and LEN_STORE?

Yes, that's why I feed cond_fn with get_len_internal_fn (cond_fn)
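
A sketch of what get_len_internal_fn does, per the patch's pair table
(abbreviated here; the posted patch spells the cases out with a CASE macro):

internal_fn
get_len_internal_fn (internal_fn fn)
{
  switch (fn)
    {
#define T(IFN, LEN_IFN) case IFN_##IFN: return IFN_##LEN_IFN;
      FOR_EACH_LEN_FN_PAIR (T)
#undef T
    default:
      /* Plain FMA has no entry in the table, hence IFN_LAST.  */
      return IFN_LAST;
    }
}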

>> Yes, but all of this depends on what the original ifn is, no?

Yes.

>> reduc_idx < 0 means this stmt isn't part of a reduction.  So yes,
>> you can vectorize FMA as COND_LEN_FMA with dummy mask and len if you
>> don't have FMA expanders?

Could you give me an example where reduc_idx >= 0 when vectorizing FMA into
COND_LEN_FMA?

Actually, I failed to produce such a circumstance in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625697.html 

I only fully tested vectorizing COND_* into COND_LEN_*, but I failed to
produce the case:

FMA ---> COND_LEN_FMA.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-31 18:45
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richard. Thanks a lot for the comment
> 
> >> In your code above
> >> you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> >> isn't that the very same?!  So how come you in one case add two
> >> and in the other add four args?
> 
> cond_len_fn is not the same as get_len_internal_fn (cond_fn) when vectorizing 
> FMA into COND_LEN_FMA.
> 
> since "internal_fn cond_len_fn = get_len_internal_fn (ifn);"
> 
> and the iterators:
> > +#define FOR_EACH_LEN_FN_PAIR(T)
> > \
> > +  T (MASK_LOAD, MASK_LEN_LOAD) 
> > \
> > +  T (MASK_STORE, MASK_LEN_STORE)   
> > \
> > +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD)   
> > \
> > +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE)   
> > \
> > +  T (COND_ADD, COND_LEN_ADD)   
> > \
> > +  T (COND_SUB, COND_LEN_SUB)   
> > \
> > +  T (COND_MUL, COND_LEN_MUL)   
> > \
> > +  T (COND_DIV, COND_LEN_DIV)   
> > \
> > +  T (COND_MOD, COND_LEN_MOD)   
> > \
> > +  T (COND_RDIV, COND_LEN_RDIV) 
> > \
> > +  T (COND_FMIN, COND_LEN_FMIN) 
> > \
> > +  T (COND_FMAX, COND_LEN_FMAX) 
> > \
> > +  T (COND_MIN, COND_LEN_MIN)   
> > \
> > +  T (COND_MAX, COND_LEN_MAX)   
> > \
> > +  T (COND_AND, COND_LEN_AND)   
> > \
> > +  T (COND_IOR, COND_LEN_IOR)   
> > \
> > +  T (COND_XOR, COND_LEN_XOR)   
> > \
> > +  T (COND_SHL, COND_LEN_SHL)   
> > \
> > +  T (COND_SHR, COND_LEN_SHR)   
> > \
> > +  T (COND_NEG, COND_LEN_NEG)   
> > \
> > +  T (COND_FMA, COND_LEN_FMA)   
> > \
> > +  T (COND_FMS, COND_LEN_FMS)   
> > \
> > +  T (COND_FNMA, COND_LEN_FNMA) 
> > \
> > +  T (COND_FNMS, COND_LEN_FNMS)
> 
> So, cond_len_fn will be IFN_LAST when ifn = FMA.
 
Ah.  So then just feed it cond_fn?  I mean, we don't have
LEN_FMA, the only LEN-but-not-MASK ifns are those used by
power/s390, LEN_LOAD and LEN_STORE?
 
> Maybe is it reasonable that I add 4 more iterators here?
> > +  T (FMA, COND_LEN_FMA)   \
> > +  T (FMS, COND_LEN_FMS)   \
> > +  T (FNMA, COND_LEN_FNMA) \
> > +  T (FNMS, COND_LEN_FNMS)
> 
> So that we won't need to have "get_len_internal_fn (cond_fn)"
 
No, as said we don't have LEN_FMA.
 
> When vectorizing COND_ADD into COND_LEN_ADD we already have "mask" and "else" 
> value.
> So we only need to add 2 arguments.
> 
> But when vectorizing FMA into COND_LEN_FMA, we need to add 4 arguments 
> (mask,else,len,bias).
 
Yes, but all of this depends on what the original ifn is, no?
 
> >>as said,
> >>no idea why we special case reduc_idx >= 0 at the moment
> 
> I also want to vectorize FMA into COND_LEN_FMA even reduc_idx < 0.
> Could I relax this condition for COND_LEN_* since it will improve RVV codegen 
> a lot.
 
reduc_idx < 0 means this stmt isn't part of a reduction.  So yes,
you can vectorize 

Re: [PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

2023-07-31 Thread Richard Biener via Gcc-patches
On Tue, 25 Jul 2023, Richard Biener wrote:

> The following applies a micro-optimization to find_hard_regno_for_1,
> re-ordering the check so we can easily jump-thread by using an else.
> This reduces the time spent in this function by 15% for the testcase
> in the PR.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
> passes?

Ping.
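
The shape of the reordering, with invented names (not the LRA code): the two
guards are disjoint on the sign of the renumber entry, so chaining them with
else-if lets the compiler thread the repeated sign test:

#include <vector>

static void
classify (int r, const int *renumber, bool pref_ok, bool class_ok,
	  std::vector<int> &conflict, std::vector<int> &live)
{
  if (renumber[r] < 0 && pref_ok)
    conflict.push_back (r);
  else if (renumber[r] >= 0 && class_ok)
    live.push_back (r);
}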

> Thanks,
> Richard.
> 
>   PR rtl-optimization/110587
>   * lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
> ---
>  gcc/lra-assigns.cc | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index b8582dcafff..d2ebcfd5056 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int 
> try_only_hard_regno,
>  r2 != NULL;
>  r2 = r2->start_next)
>   {
> -   if (r2->regno >= lra_constraint_new_regno_start
> +   if (live_pseudos_reg_renumber[r2->regno] < 0
> +   && r2->regno >= lra_constraint_new_regno_start
> && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
> -   && live_pseudos_reg_renumber[r2->regno] < 0
> && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
>   sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
>  r2->regno);
> -   if (live_pseudos_reg_renumber[r2->regno] >= 0
> -   && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
> +   else if (live_pseudos_reg_renumber[r2->regno] >= 0
> +&& rclass_intersect_p
> + [regno_allocno_class_array[r2->regno]])
>   sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
>   }
>   }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

2023-07-31 Thread Richard Biener via Gcc-patches
On Tue, 25 Jul 2023, Richard Biener wrote:

> The following removes the code checking whether a noop copy
> is between something involved in the return sequence composed
> of a SET and USE.  Instead of checking for this special-case
> the following makes us only ever remove noop copies between
> pseudos - which is the case that is necessary for IRA/LRA
> interfacing to function according to the comment.  That makes
> looking for the return reg special case unnecessary, reducing
> the compile-time in LRA non-specific to zero for the testcase.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with
> all languages and {,-m32}.
> 
> OK?

Ping.

> Thanks,
> Richard.
> 
>   PR rtl-optimization/110587
>   * lra-spills.cc (return_regno_p): Remove.
>   (regno_in_use_p): Likewise.
>   (lra_final_code_change): Do not remove noop moves
>   between hard registers.
> ---
>  gcc/lra-spills.cc | 69 +--
>  1 file changed, 1 insertion(+), 68 deletions(-)
> 
> diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
> index 3a7bb7e8cd9..fe58f162d05 100644
> --- a/gcc/lra-spills.cc
> +++ b/gcc/lra-spills.cc
> @@ -705,72 +705,6 @@ alter_subregs (rtx *loc, bool final_p)
>return res;
>  }
>  
> -/* Return true if REGNO is used for return in the current
> -   function.  */
> -static bool
> -return_regno_p (unsigned int regno)
> -{
> -  rtx outgoing = crtl->return_rtx;
> -
> -  if (! outgoing)
> -return false;
> -
> -  if (REG_P (outgoing))
> -return REGNO (outgoing) == regno;
> -  else if (GET_CODE (outgoing) == PARALLEL)
> -{
> -  int i;
> -
> -  for (i = 0; i < XVECLEN (outgoing, 0); i++)
> - {
> -   rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
> -
> -   if (REG_P (x) && REGNO (x) == regno)
> - return true;
> - }
> -}
> -  return false;
> -}
> -
> -/* Return true if REGNO is in one of subsequent USE after INSN in the
> -   same BB.  */
> -static bool
> -regno_in_use_p (rtx_insn *insn, unsigned int regno)
> -{
> -  static lra_insn_recog_data_t id;
> -  static struct lra_static_insn_data *static_id;
> -  struct lra_insn_reg *reg;
> -  int i, arg_regno;
> -  basic_block bb = BLOCK_FOR_INSN (insn);
> -
> -  while ((insn = next_nondebug_insn (insn)) != NULL_RTX)
> -{
> -  if (BARRIER_P (insn) || bb != BLOCK_FOR_INSN (insn))
> - return false;
> -  if (! INSN_P (insn))
> - continue;
> -  if (GET_CODE (PATTERN (insn)) == USE
> -   && REG_P (XEXP (PATTERN (insn), 0))
> -   && regno == REGNO (XEXP (PATTERN (insn), 0)))
> - return true;
> -  /* Check that the regno is not modified.  */
> -  id = lra_get_insn_recog_data (insn);
> -  for (reg = id->regs; reg != NULL; reg = reg->next)
> - if (reg->type != OP_IN && reg->regno == (int) regno)
> -   return false;
> -  static_id = id->insn_static_data;
> -  for (reg = static_id->hard_regs; reg != NULL; reg = reg->next)
> - if (reg->type != OP_IN && reg->regno == (int) regno)
> -   return false;
> -  if (id->arg_hard_regs != NULL)
> - for (i = 0; (arg_regno = id->arg_hard_regs[i]) >= 0; i++)
> -   if ((int) regno == (arg_regno >= FIRST_PSEUDO_REGISTER
> -   ? arg_regno : arg_regno - FIRST_PSEUDO_REGISTER))
> - return false;
> -}
> -  return false;
> -}
> -
>  /* Final change of pseudos got hard registers into the corresponding
> hard registers and removing temporary clobbers.  */
>  void
> @@ -821,8 +755,7 @@ lra_final_code_change (void)
> if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
> && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
> && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
> -   && (! return_regno_p (REGNO (SET_SRC (pat)))
> -   || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)
> +   && REGNO (SET_SRC (pat)) >= FIRST_PSEUDO_REGISTER)
>   {
> lra_invalidate_insn_data (insn);
> delete_insn (insn);
> 


Re: [PATCH 2/2] MATCH: Add `a == b | a cmp b` and `a != b & a cmp b` simplifications

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 7:35 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Even though these are done by combine_comparisons, we can add them to match
> to allow simplifcations during match rather than just during 
> reassoc/ifcombine.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.  Note we want to get rid of the GENERIC folding parts of
maybe_fold_{and,or}_comparisons which is what those passes rely on.

Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/106164
> * match.pd (`a != b & a <= b`, `a != b & a >= b`,
> `a == b | a < b`, `a == b | a > b`): Handle these cases
> too.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/106164
> * gcc.dg/tree-ssa/cmpbit-2.c: New test.
> ---
>  gcc/match.pd | 32 +--
>  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c | 39 
>  2 files changed, 69 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 00af5d99119..cf8057701ea 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2832,7 +2832,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (switch
>(if (code1 == EQ_EXPR && val) @3)
>(if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, type); 
> })
> -  (if (code1 == NE_EXPR && !val) @4)))
> +  (if (code1 == NE_EXPR && !val) @4)
> +  (if (code1 == NE_EXPR
> +   && code2 == GE_EXPR
> +  && cmp == 0)
> +   (gt @0 @1))
> +  (if (code1 == NE_EXPR
> +   && code2 == LE_EXPR
> +  && cmp == 0)
> +   (lt @0 @1))
> + )
> +)
> +   )
> +  )
> + )
> +)
>
>  /* Convert (X OP1 CST1) && (X OP2 CST2).
> Convert (X OP1 Y) && (X OP2 Y).  */
> @@ -2917,7 +2931,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (switch
>(if (code1 == EQ_EXPR && val) @4)
>(if (code1 == NE_EXPR && val) { constant_boolean_node (true, type); })
> -  (if (code1 == NE_EXPR && !val) @3)))
> +  (if (code1 == NE_EXPR && !val) @3)
> +  (if (code1 == EQ_EXPR
> +   && code2 == GT_EXPR
> +  && cmp == 0)
> +   (ge @0 @1))
> +  (if (code1 == EQ_EXPR
> +   && code2 == LT_EXPR
> +  && cmp == 0)
> +   (le @0 @1))
> + )
> +)
> +   )
> +  )
> + )
> +)
>
>  /* Convert (X OP1 CST1) || (X OP2 CST2).
> Convert (X OP1 Y)|| (X OP2 Y).  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
> new file mode 100644
> index 000..c4226ef01af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-reassoc -fdump-tree-optimized-raw" } */
> +
> +_Bool f(int a, int b)
> +{
> +  _Bool c = a == b;
> +  _Bool d = a > b;
> +  return c | d;
> +}
> +
> +_Bool f1(int a, int b)
> +{
> +  _Bool c = a != b;
> +  _Bool d = a >= b;
> +  return c & d;
> +}
> +
> +_Bool g(int a, int b)
> +{
> +  _Bool c = a == b;
> +  _Bool d = a < b;
> +  return c | d;
> +}
> +
> +_Bool g1(int a, int b)
> +{
> +  _Bool c = a != b;
> +  _Bool d = a <= b;
> +  return c & d;
> +}
> +
> +
> +/* We should be able to optimize these without reassociation too. */
> +/* { dg-final { scan-tree-dump-not "bit_and_expr," "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "bit_ior_expr," "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "gt_expr," 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "ge_expr," 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "lt_expr," 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "le_expr," 1 "optimized" } } */
> --
> 2.31.1
>


Re: [PATCH 1/2] MATCH: PR 106164 : Optimize `(X CMP1 Y) AND/IOR (X CMP2 Y)`

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 7:35 AM Andrew Pinski via Gcc-patches
 wrote:
>
> I noticed that there are patterns that optimize
> `(X CMP1 CST1) AND/IOR (X CMP2 CST2)` and we can easily extend
> them to support the  `(X CMP1 Y) AND/IOR (X CMP2 Y)` by saying they
> compare equal. This allows for this kind of optimization for integral
> and pointer types (which have the same semantics).
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/106164
> * match.pd: Extend the `(X CMP1 CST1) AND/IOR (X CMP2 CST2)`
> patterns to support `(X CMP1 Y) AND/IOR (X CMP2 Y)`.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/106164
> * gcc.dg/tree-ssa/cmpbit-1.c: New test.
> ---
>  gcc/match.pd | 66 +++-
>  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-1.c | 38 ++
>  2 files changed, 90 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 73eb249f704..00af5d99119 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2799,14 +2799,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* Convert (X == CST1) && (X OP2 CST2) to a known value
> based on CST1 OP2 CST2.  Similarly for (X != CST1).  */
> +/* Convert (X == Y) && (X OP2 Y) to a known value if X is an integral type.
> +   Similarly for (X != Y).  */
>
>  (for code1 (eq ne)
>   (for code2 (eq ne lt gt le ge)
>(simplify
> -   (bit_and:c (code1@3 @0 INTEGER_CST@1) (code2@4 @0 INTEGER_CST@2))
> +   (bit_and:c (code1@3 @0 @1) (code2@4 @0 @2))
> +   (if ((TREE_CODE (@1) == INTEGER_CST
> +&& TREE_CODE (@2) == INTEGER_CST)
> +   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +|| POINTER_TYPE_P (TREE_TYPE (@1)))
> +   && operand_equal_p (@1, @2)))
>  (with
>   {
> -  int cmp = tree_int_cst_compare (@1, @2);
> +  int cmp = 0;
> +  if (TREE_CODE (@1) == INTEGER_CST
> + && TREE_CODE (@2) == INTEGER_CST)
> +   cmp = tree_int_cst_compare (@1, @2);
>bool val;
>switch (code2)
>  {
> @@ -2822,17 +2832,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (switch
>(if (code1 == EQ_EXPR && val) @3)
>(if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, type); 
> })
> -  (if (code1 == NE_EXPR && !val) @4))
> +  (if (code1 == NE_EXPR && !val) @4)))
>
> -/* Convert (X OP1 CST1) && (X OP2 CST2).  */
> +/* Convert (X OP1 CST1) && (X OP2 CST2).
> +   Convert (X OP1 Y) && (X OP2 Y).  */
>
>  (for code1 (lt le gt ge)
>   (for code2 (lt le gt ge)
>(simplify
> -  (bit_and (code1:c@3 @0 INTEGER_CST@1) (code2:c@4 @0 INTEGER_CST@2))
> +  (bit_and (code1:c@3 @0 @1) (code2:c@4 @0 @2))
> +  (if ((TREE_CODE (@1) == INTEGER_CST
> +   && TREE_CODE (@2) == INTEGER_CST)
> +   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +   || POINTER_TYPE_P (TREE_TYPE (@1)))
> +  && operand_equal_p (@1, @2)))
> (with
>  {
> - int cmp = tree_int_cst_compare (@1, @2);
> + int cmp = 0;
> + if (TREE_CODE (@1) == INTEGER_CST
> +&& TREE_CODE (@2) == INTEGER_CST)
> +   cmp = tree_int_cst_compare (@1, @2);
>  }
>  (switch
>   /* Choose the more restrictive of two < or <= comparisons.  */
> @@ -2861,18 +2880,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   && (code1 == GT_EXPR || code1 == GE_EXPR)
>   && (code2 == LT_EXPR || code2 == LE_EXPR))
>{ constant_boolean_node (false, type); })
> - )
> + ))
>
>  /* Convert (X == CST1) || (X OP2 CST2) to a known value
> based on CST1 OP2 CST2.  Similarly for (X != CST1).  */
> +/* Convert (X == Y) || (X OP2 Y) to a known value if X is an integral type.
> +   Similarly for (X != Y).  */
>
>  (for code1 (eq ne)
>   (for code2 (eq ne lt gt le ge)
>(simplify
> -   (bit_ior:c (code1@3 @0 INTEGER_CST@1) (code2@4 @0 INTEGER_CST@2))
> +   (bit_ior:c (code1@3 @0 @1) (code2@4 @0 @2))
> +   (if ((TREE_CODE (@1) == INTEGER_CST
> +&& TREE_CODE (@2) == INTEGER_CST)
> +   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +   || POINTER_TYPE_P (TREE_TYPE (@1)))
> +   && operand_equal_p (@1, @2)))
>  (with
>   {
> -  int cmp = tree_int_cst_compare (@1, @2);
> +  int cmp = 0;
> +  if (TREE_CODE (@1) == INTEGER_CST
> + && TREE_CODE (@2) == INTEGER_CST)
> +   cmp = tree_int_cst_compare (@1, @2);
>bool val;
>switch (code2)
> {
> @@ -2888,17 +2917,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (switch
>(if (code1 == EQ_EXPR && val) @4)
>(if (code1 == NE_EXPR && val) { constant_boolean_node (true, type); })
> -  (if (code1 == NE_EXPR && !val) @3))
> +  (if (code1 == NE_EXPR && !val) @3)))
>
> -/* Convert (X OP1 CST1) || (X OP2 CST2).  */
> +/* Convert (X OP1 CST1) || (X OP2 CST2).
> +   Convert (X OP1 Y)|| 

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richard. Thanks a lot for the comments.
> 
> >> In your code above
> >> you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> >> isn't that the very same?!  So how come you in one case add two
> >> and in the other add four args?
> 
> cond_len_fn is not the same as get_len_internal_fn (cond_fn) when vectorizing 
> FMA into COND_LEN_FMA.
> 
> since "internal_fn cond_len_fn = get_len_internal_fn (ifn);"
> 
> and the iterators:
> > +#define FOR_EACH_LEN_FN_PAIR(T) \
> > +  T (MASK_LOAD, MASK_LEN_LOAD) \
> > +  T (MASK_STORE, MASK_LEN_STORE) \
> > +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD) \
> > +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE) \
> > +  T (COND_ADD, COND_LEN_ADD) \
> > +  T (COND_SUB, COND_LEN_SUB) \
> > +  T (COND_MUL, COND_LEN_MUL) \
> > +  T (COND_DIV, COND_LEN_DIV) \
> > +  T (COND_MOD, COND_LEN_MOD) \
> > +  T (COND_RDIV, COND_LEN_RDIV) \
> > +  T (COND_FMIN, COND_LEN_FMIN) \
> > +  T (COND_FMAX, COND_LEN_FMAX) \
> > +  T (COND_MIN, COND_LEN_MIN) \
> > +  T (COND_MAX, COND_LEN_MAX) \
> > +  T (COND_AND, COND_LEN_AND) \
> > +  T (COND_IOR, COND_LEN_IOR) \
> > +  T (COND_XOR, COND_LEN_XOR) \
> > +  T (COND_SHL, COND_LEN_SHL) \
> > +  T (COND_SHR, COND_LEN_SHR) \
> > +  T (COND_NEG, COND_LEN_NEG) \
> > +  T (COND_FMA, COND_LEN_FMA) \
> > +  T (COND_FMS, COND_LEN_FMS) \
> > +  T (COND_FNMA, COND_LEN_FNMA) \
> > +  T (COND_FNMS, COND_LEN_FNMS)
> 
> So, cond_len_fn will be IFN_LAST when ifn = FMA.

Ah.  So then just feed it cond_fn?  I mean, we don't have
LEN_FMA, the only LEN-but-not-MASK ifns are those used by
power/s390, LEN_LOAD and LEN_STORE?

> Maybe it would be reasonable to add 4 more iterators here?
> > +  T (FMA, COND_LEN_FMA)   \
> > +  T (FMS, COND_LEN_FMS)   \
> > +  T (FNMA, COND_LEN_FNMA) \
> > +  T (FNMS, COND_LEN_FNMS)
> 
> So that we won't need to have "get_len_internal_fn (cond_fn)"

No, as said we don't have LEN_FMA.

> When vectorizing COND_ADD into COND_LEN_ADD we already have "mask" and "else" 
> value.
> So we only need to add 2 arguments.
> 
> But when vectorizing FMA into COND_LEN_FMA, we need to add 4 arguments 
> (mask,else,len,bias).

Yes, but all of this depends on what the original ifn is, no?

> >>as said,
> >>no idea why we special case reduc_idx >= 0 at the moment
> 
> I also want to vectorize FMA into COND_LEN_FMA even when reduc_idx < 0.
> Could I relax this condition for COND_LEN_*, since it would improve RVV
> codegen a lot?

reduc_idx < 0 means this stmt isn't part of a reduction.  So yes,
you can vectorize FMA as COND_LEN_FMA with dummy mask and len if you
don't have FMA expanders?
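(A sketch of what that could look like in the IR, with the operand order
assumed from the COND_LEN_ADD example elsewhere in the thread, i.e. mask,
the three FMA operands, else value, len, bias, using an all-true dummy
mask plus the real length:

  _len = .SELECT_VL (ivtmp, POLY_INT_CST [4, 4]);
  vect__9 = .COND_LEN_FMA ({ -1, ... }, vect_a, vect_b, vect_c, vect_c, _len, 0);

where vect_a/vect_b/vect_c are placeholder names, not from any dump.)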

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 17:26
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Fri, 28 Jul 2023, Juzhe-Zhong wrote:
>  
> > Hi, Richard and Richi.
> > 
> > Based on the suggestions from Richard:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
> > 
> > This patch chooses approach (1) that Richard provided, meaning:
> > 
> > RVV implements cond_* optabs as expanders.  RVV therefore supports
> > both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> > are needed at the gimple level.
> > 
> > Such an approach makes the code much cleaner and more reasonable.
> > 
> > Consider this following case:
> > void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
> > {
> >   for (int i = 0; i < n; i++)
> > if (cond[i])
> >   

Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-31 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong  writes:
> Hi, Richard and Richi.
>
> Based on the suggestions from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
>
> This patch chooses approach (1) that Richard provided, meaning:
>
> RVV implements cond_* optabs as expanders.  RVV therefore supports
> both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
> are needed at the gimple level.
>
> Such an approach makes the code much cleaner and more reasonable.
>
> Consider this following case:
> void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
> {
>   for (int i = 0; i < n; i++)
> if (cond[i])
>   a[i] = b[i] + a[i];
> }
>
>
> Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
> <source>:5:21: missed: couldn't vectorize loop
> <source>:5:21: missed: not vectorized: control flow in loop.
>
> ARM SVE:
>
> ...
> mask__27.10_51 = vect__4.9_49 != { 0, ... };
> ...
> vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
> ...
> vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
>
> For RVV, we want IR as follows:
>
> ...
> _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
> ...
> mask__27.10_51 = vect__4.9_49 != { 0, ... };
> ...
> vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
> ...
>
> Both len and mask of COND_LEN_ADD are real not dummy.
>
> This patch has been fully tested on the RISC-V port, supporting both COND_*
> and COND_LEN_*.
>
> Bootstrap and regression tests on x86 also passed.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
> (get_len_internal_fn): New function.
> (CASE): Ditto.
> * internal-fn.h (get_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_call): Support CALL vectorization 
> with COND_LEN_*.
>
> ---
>  gcc/internal-fn.cc | 46 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 87 +-
>  3 files changed, 125 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8e294286388..379220bebc7 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
>  }
>  }
>  
> +/* Invoke T(IFN) for each internal function IFN that also has an
> +   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
> +#define FOR_EACH_LEN_FN_PAIR(T) \
> +  T (MASK_LOAD, MASK_LEN_LOAD) \
> +  T (MASK_STORE, MASK_LEN_STORE) \
> +  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD) \
> +  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE) \
> +  T (COND_ADD, COND_LEN_ADD) \
> +  T (COND_SUB, COND_LEN_SUB) \
> +  T (COND_MUL, COND_LEN_MUL) \
> +  T (COND_DIV, COND_LEN_DIV) \
> +  T (COND_MOD, COND_LEN_MOD) \
> +  T (COND_RDIV, COND_LEN_RDIV) \
> +  T (COND_FMIN, COND_LEN_FMIN) \
> +  T (COND_FMAX, COND_LEN_FMAX) \
> +  T (COND_MIN, COND_LEN_MIN) \
> +  T (COND_MAX, COND_LEN_MAX) \
> +  T (COND_AND, COND_LEN_AND) \
> +  T (COND_IOR, COND_LEN_IOR) \
> +  T (COND_XOR, COND_LEN_XOR) \
> +  T (COND_SHL, COND_LEN_SHL) \
> +  T (COND_SHR, COND_LEN_SHR) \
> +  T (COND_NEG, COND_LEN_NEG) \
> +  T (COND_FMA, COND_LEN_FMA) \
> +  T (COND_FMS, COND_LEN_FMS) \
> +  T (COND_FNMA, COND_LEN_FNMA) \
> +  T (COND_FNMS, COND_LEN_FNMS)

With the earlier patch to add DEF_INTERNAL_COND_FN and
DEF_INTERNAL_SIGNED_COND_FN, I think we should use those to handle
the COND_* cases, rather than putting them in this macro.

Thanks,
Richard


Re: [PATCH] internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions

2023-07-31 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
>
> Based on previous discussions, we should make COND_* and COND_LEN_*
> consistent.
>
> So, this patch define these internal function together by these 2
> wrappers:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
>   DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE) \
>   DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB, \
>                          cond_len_##TYPE)
> #endif
>
> #ifndef DEF_INTERNAL_SIGNED_COND_FN
> #define DEF_INTERNAL_SIGNED_COND_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB, \
>                                     UNSIGNED_OPTAB, TYPE) \
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR, \
>                                 cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB, \
>                                 cond_##TYPE) \
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR, \
>                                 cond_len_##SIGNED_OPTAB, \
>                                 cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE)
> #endif
>
> Bootstrap and regression tests on x86 passed.
> OK for trunk?
>
> gcc/ChangeLog:
>
> * internal-fn.def (DEF_INTERNAL_COND_FN): New macro.
> (DEF_INTERNAL_SIGNED_COND_FN): Ditto.
> (COND_ADD): Remove.
> (COND_SUB): Ditto.
> (COND_MUL): Ditto.
> (COND_DIV): Ditto.
> (COND_MOD): Ditto.
> (COND_RDIV): Ditto.
> (COND_MIN): Ditto.
> (COND_MAX): Ditto.
> (COND_FMIN): Ditto.
> (COND_FMAX): Ditto.
> (COND_AND): Ditto.
> (COND_IOR): Ditto.
> (COND_XOR): Ditto.
> (COND_SHL): Ditto.
> (COND_SHR): Ditto.
> (COND_FMA): Ditto.
> (COND_FMS): Ditto.
> (COND_FNMA): Ditto.
> (COND_FNMS): Ditto.
> (COND_NEG): Ditto.
> (COND_LEN_ADD): Ditto.
> (COND_LEN_SUB): Ditto.
> (COND_LEN_MUL): Ditto.
> (COND_LEN_DIV): Ditto.
> (COND_LEN_MOD): Ditto.
> (COND_LEN_RDIV): Ditto.
> (COND_LEN_MIN): Ditto.
> (COND_LEN_MAX): Ditto.
> (COND_LEN_FMIN): Ditto.
> (COND_LEN_FMAX): Ditto.
> (COND_LEN_AND): Ditto.
> (COND_LEN_IOR): Ditto.
> (COND_LEN_XOR): Ditto.
> (COND_LEN_SHL): Ditto.
> (COND_LEN_SHR): Ditto.
> (COND_LEN_FMA): Ditto.
> (COND_LEN_FMS): Ditto.
> (COND_LEN_FNMA): Ditto.
> (COND_LEN_FNMS): Ditto.
> (COND_LEN_NEG): Ditto.
> (ADD): New macro define.
> (SUB): Ditto.
> (MUL): Ditto.
> (DIV): Ditto.
> (MOD): Ditto.
> (RDIV): Ditto.
> (MIN): Ditto.
> (MAX): Ditto.
> (FMIN): Ditto.
> (FMAX): Ditto.
> (AND): Ditto.
> (IOR): Ditto.
> (XOR): Ditto.
> (SHL): Ditto.
> (SHR): Ditto.
> (FMA): Ditto.
> (FMS): Ditto.
> (FNMA): Ditto.
> (FNMS): Ditto.
> (NEG): Ditto.

OK, thanks.  (And sorry for the slow review.)
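
For reference, a single line such as

  DEF_INTERNAL_COND_FN (FMAX, ECF_CONST, fmax, binary)

is then shorthand for both definitions (a sketch only; the FLAGS and
optab arguments are whatever the .def entry actually passes):

  DEF_INTERNAL_OPTAB_FN (COND_FMAX, ECF_CONST, cond_fmax, cond_binary)
  DEF_INTERNAL_OPTAB_FN (COND_LEN_FMAX, ECF_CONST, cond_len_fmax, cond_len_binary)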

Richard

> ---
>  gcc/internal-fn.def | 123 
>  1 file changed, 56 insertions(+), 67 deletions(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 04f3812326e..bf6825c5d00 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -34,10 +34,12 @@ along with GCC; see the file COPYING3.  If not see
>  UNSIGNED_OPTAB, TYPE)
>   DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
>   DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
> + DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
> + DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
>  
> where NAME is the name of the function, FLAGS is a set of
> ECF_* flags and FNSPEC is a string describing functions fnspec.
> -   
> +
> DEF_INTERNAL_OPTAB_FN defines an internal function that maps to a
> direct optab.  The function should only be called with a given
> set of types if the associated optab is available for the modes
> @@ -74,7 +76,8 @@ along with GCC; see the file COPYING3.  If not see
>  
> - cond_len_unary: a conditional unary optab, such as cond_len_neg
> - cond_len_binary: a conditional binary optab, such as cond_len_add
> -   - cond_len_ternary: a conditional ternary optab, such as cond_len_fma_rev
> +   - cond_len_ternary: a conditional ternary optab, such as
> +   cond_len_fma_rev
>  
> DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
> maps to one of two optabs, depending on the signedness of an input.
> @@ -106,6 +109,16 @@ along with GCC; see the file COPYING3.  If not see
> These five internal functions will require two optabs each, a SIGNED_OPTAB
> 

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-31 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener  wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping instructions.
> >
> > The new option is enabled by default, because even with sanitization,
> > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > benchmark can be achieved vs. scalar code.
> >
> > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > vs. scalar code.  This is what clang does by default, as it defaults
> > to -fno-trapping-math.
>
> I like the new option, note you lack invoke.texi documentation where
> I'd also elaborate a bit on the interaction with -fno-trapping-math
> and the possible performance impact then NaNs or denormals leak
> into the upper halves and cross-reference -mdaz-ftz.

Yes, this is my plan (lack of documentation is due to RFC status of
the patch). OTOH, Hongtao has some other ideas in the PR, so I'll wait
with the patch a bit.
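
For reference, the kind of V2SFmode code the new option gates can be
reproduced with the generic vector extension (illustrative example, not
taken from the patch or the PR):

  typedef float v2sf __attribute__ ((vector_size (8)));

  v2sf
  add2 (v2sf a, v2sf b)
  {
    /* With -mmmxfp-with-sse (the default per the patch) this may be
       vectorized via the partial-vector SSE patterns; with
       -mno-mmxfp-with-sse it falls back to scalar code.  */
    return a + b;
  }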

Thanks,
Uros.

> Thanks,
> Richard.
>
> > PR target/110832
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (TARGET_MMXFP_WITH_SSE): New macro.
> > * config/i386/i386.opt (mmmxfp-with-sse): New option.
> > * config/i386/mmx.md (movq__to_sse): Do not sanitize
> > upper part of V2SFmode register with -fno-trapping-math.
> > (v2sf3): Enable for TARGET_MMXFP_WITH_SSE.
> > (divv2sf3): Ditto.
> > (v2sf3): Ditto.
> > (sqrtv2sf2): Ditto.
> > (*mmx_haddv2sf3_low): Ditto.
> > (*mmx_hsubv2sf3_low): Ditto.
> > (vec_addsubv2sf3): Ditto.
> > (vec_cmpv2sfv2si): Ditto.
> > (vcondv2sf): Ditto.
> > (fmav2sf4): Ditto.
> > (fmsv2sf4): Ditto.
> > (fnmav2sf4): Ditto.
> > (fnmsv2sf4): Ditto.
> > (fix_truncv2sfv2si2): Ditto.
> > (fixuns_truncv2sfv2si2): Ditto.
> > (floatv2siv2sf2): Ditto.
> > (floatunsv2siv2sf2): Ditto.
> > (nearbyintv2sf2): Ditto.
> > (rintv2sf2): Ditto.
> > (lrintv2sfv2si2): Ditto.
> > (ceilv2sf2): Ditto.
> > (lceilv2sfv2si2): Ditto.
> > (floorv2sf2): Ditto.
> > (lfloorv2sfv2si2): Ditto.
> > (btruncv2sf2): Ditto.
> > (roundv2sf2): Ditto.
> > (lroundv2sfv2si2): Ditto.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Uros.
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

