Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-03 Thread Hao Liu OS via Gcc-patches
Gentle ping. Is it OK for master? I'm afraid the ICE may cause trouble and hope it can be fixed ASAP. Thanks, Hao From: Hao Liu OS Sent: Wednesday, August 2, 2023 11:45 To: Richard Sandiford Cc: Richard Biener; GCC-patches@gcc.gnu.org Subject: Re:

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
Hi Richard, I've updated the patch with a simple test case (see the case and comments below). It shows that a live stmt may not have a reduction def, which introduces the ICE. Is it OK for trunk? Fix the assertion failure on an empty reduction definition in info_for_reduction. Even if a stmt is live, it may still have

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
Hi Richard, This is a quick fix for the several ICEs. It seems that even if STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a REDUC_DEF. So I changed the check to STMT_VINFO_REDUC_DEF. Is it OK for trunk? --- Fix the ICEs on an empty reduction definition. Even if STMT_VINFO_LIVE_P is true, some
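The fix described in these messages amounts to guarding a lookup on the reduction definition itself rather than on the liveness flag. The C sketch below illustrates that idea on a toy record. `toy_stmt_info`, `toy_info_for_reduction` and `toy_reduction_latency` are hypothetical names invented for illustration; they only loosely mirror `STMT_VINFO_LIVE_P`, `STMT_VINFO_REDUC_DEF` and `info_for_reduction` and are not GCC's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a vectorizer stmt record. */
struct toy_stmt_info
{
  bool live;                         /* analogue of STMT_VINFO_LIVE_P */
  struct toy_stmt_info *reduc_def;   /* analogue of STMT_VINFO_REDUC_DEF */
  unsigned reduc_latency;            /* data reachable via the reduction info */
};

/* Analogue of a lookup that asserts a reduction def exists.
   Calling it on a stmt with an empty def aborts, like the reported ICE. */
static const struct toy_stmt_info *
toy_info_for_reduction (const struct toy_stmt_info *s)
{
  assert (s->reduc_def != NULL);
  return s->reduc_def;
}

/* Guarded caller: test the def itself, not the liveness flag,
   because a live stmt may still have no reduction def. */
static unsigned
toy_reduction_latency (const struct toy_stmt_info *s)
{
  if (s->reduc_def == NULL)
    return 0;
  return toy_info_for_reduction (s)->reduc_latency;
}
```

With a live stmt whose `reduc_def` is `NULL`, a check on `live` alone would reach the assertion, while the guarded caller simply skips it.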

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-31 Thread Hao Liu OS via Gcc-patches
Sure, the helper makes the code simpler. I'll test the new patch and push if there is no other issue. Thanks, Hao From: Richard Sandiford Sent: Monday, July 31, 2023 17:11 To: Hao Liu OS Cc: Richard Biener; GCC-patches@gcc.gnu.org Subject: Re: [PATCH]

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-30 Thread Hao Liu OS via Gcc-patches
ency = MAX (ops->reduction_latency, base); > > Thanks, > Hao > > > From: Richard Sandiford > Sent: Wednesday, July 26, 2023 17:14 > To: Richard Biener > Cc: Hao Liu OS; GCC-patches@gcc.gnu.org > Subject: Re: [PATCH] AArch

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-28 Thread Hao Liu OS via Gcc-patches
From: Richard Sandiford Sent: Wednesday, July 26, 2023 17:14 To: Richard Biener Cc: Hao Liu OS; GCC-patches@gcc.gnu.org Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] Richard Biener writes: > On Wed,

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Hao Liu OS via Gcc-patches
t: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] Richard Biener writes: > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches > wrote: >> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make s

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-25 Thread Hao Liu OS via Gcc-patches
> When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that we're not > papering over an issue elsewhere. Yes, I also wonder if this is an issue in vectorizable_reduction. Below is the gimple of "gcc.target/aarch64/sve/cost_model_13.c": : # res_18 = PHI # i_20 = PHI _1

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-25 Thread Hao Liu OS via Gcc-patches
Hi, Thanks for the suggestion. I tested it and found a gcc_assert failure: gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in info_for_reduction, at tree-vect-loop.cc:5473) It is caused by an empty STMT_VINFO_REDUC_DEF. So, I added an extra check before checking

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-23 Thread Hao Liu OS via Gcc-patches
Hi Richard, Gentle ping. Is it OK for trunk? Or will you have a patch covering such a fix? Thanks, -Hao From: Hao Liu OS Sent: Wednesday, July 19, 2023 12:33 To: GCC-patches@gcc.gnu.org Cc: richard.sandif...@arm.com Subject: [PATCH] AArch64: Do not

[PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-18 Thread Hao Liu OS via Gcc-patches
This only affects the new costs in the aarch64 backend. Currently, the reduction latency of the vector body is too large because it is multiplied by the stmt count. As the scalar reduction latency is small, the new cost model may conclude that "scalar code would issue more quickly" and increase the vector body cost a
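To make the effect concrete, the C sketch below contrasts a latency that is scaled by the stmt count with the unscaled `MAX (ops->reduction_latency, base)` form quoted in the 2023-07-30 reply. The function names are hypothetical and the numbers are illustrative only; this is not the aarch64 backend's code and not measured costs. `base` stands for the latency of one in-loop reduction and `count` for the number of statements being costed.

```c
#include <assert.h>

static unsigned
max_u (unsigned a, unsigned b)
{
  return a > b ? a : b;
}

/* Behaviour the patch title describes avoiding: the per-stmt latency
   BASE is multiplied by COUNT before being folded into the running max. */
static unsigned
latency_scaled_by_count (unsigned current, unsigned base, unsigned count)
{
  return max_u (current, base * count);
}

/* Unscaled form quoted in the thread:
   latency = MAX (ops->reduction_latency, base).  */
static unsigned
latency_unscaled (unsigned current, unsigned base)
{
  return max_u (current, base);
}
```

For example, with `base = 4` and `count = 8`, the scaled form reports 32 while the unscaled form reports 4; an inflated figure like 32 is what can make the vector body look slower than the scalar loop.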

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Hao Liu OS via Gcc-patches
r; Hao Liu OS Cc: GCC-patches@gcc.gnu.org Subject: Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449) On 7/6/23 06:44, Richard Biener via Gcc-patches wrote: > On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches >

[PATCH] Vect: select small VF for epilog of unrolled loop (PR tree-optimization/110474)

2023-07-05 Thread Hao Liu OS via Gcc-patches
Hi, If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1), the VFs of both the main and the epilog loop are enlarged. The epilog vect loop is intended for loops with small iteration counts, so a large VF may hurt performance. This patch unscales the main loop VF by
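The arithmetic behind this concern can be sketched in C. The snippet is truncated before it says exactly how the VF is unscaled, so the sketch *assumes* the epilogue target VF is the main-loop VF divided by `suggested_unroll_factor`. All function names are hypothetical, and masking or partial vectors are ignored.

```c
#include <assert.h>

/* Assumption: the epilogue aims for the main-loop VF with the
   unroll scaling undone. */
static unsigned
epilog_target_vf (unsigned main_vf, unsigned suggested_unroll_factor)
{
  assert (suggested_unroll_factor > 0);
  assert (main_vf % suggested_unroll_factor == 0);
  return main_vf / suggested_unroll_factor;
}

/* Scalar iterations left for the epilogue after the main vector loop. */
static unsigned
epilog_iters (unsigned niters, unsigned main_vf)
{
  assert (main_vf > 0);
  return niters % main_vf;
}

/* Full vector iterations an epilogue with EPIL_VF lanes can execute. */
static unsigned
epilog_vector_iters (unsigned remaining, unsigned epil_vf)
{
  assert (epil_vf > 0);
  return remaining / epil_vf;
}
```

With 30 iterations and a main VF of 16 (a 4-lane vector unrolled 4 times), 14 iterations remain. An epilogue that keeps VF 16 cannot run even one full vector iteration, whereas one at VF 4 covers 12 of the 14.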

[PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-05 Thread Hao Liu OS via Gcc-patches
Hi, If a loop is unrolled n times during vectorization, two steps are used to calculate the induction variable: - The small step for the i-th unrolled copy: vec_1 = vec_iv + (VF/n * Step) - The large step for the whole loop: vec_loop = vec_iv + (VF * Step) This patch calculates an extra
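The two steps can be checked with a scalar model of one vector lane. In this C sketch, which is an illustration and not GCC's code, each vector holds VF/n lanes. The initial vector for a loop iteration is advanced by the large step `VF * Step`, and each further unrolled copy adds the small step `VF/n * Step` to the previous copy. `induction_lane` and `scalar_iv` are hypothetical helpers; the lane value must equal the scalar IV at index `iter*VF + copy*(VF/n) + lane`.

```c
#include <assert.h>

/* Value of lane LANE in unrolled copy COPY of vector-loop iteration ITER. */
static long
induction_lane (long init, long step, unsigned vf, unsigned n,
                unsigned iter, unsigned copy, unsigned lane)
{
  assert (n > 0 && vf % n == 0);
  unsigned lanes = vf / n;               /* lanes per vector */
  assert (copy < n && lane < lanes);
  long small_step = (long) lanes * step; /* VF/n * Step */
  long large_step = (long) vf * step;    /* VF * Step */
  /* vec_iv for this iteration: initial lane value plus ITER large steps. */
  long v = init + (long) lane * step + (long) iter * large_step;
  /* Each unrolled copy adds the small step to the previous copy. */
  for (unsigned c = 0; c < copy; c++)
    v += small_step;
  return v;
}

/* Reference: scalar induction value at scalar iteration IDX. */
static long
scalar_iv (long init, long step, unsigned idx)
{
  return init + (long) idx * step;
}
```

For VF = 8 and n = 2 (4 lanes per vector), copy 1 of iteration 1 holds {12, 13, 14, 15} when init = 0 and Step = 1, which matches the scalar indices 12 to 15.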

[PATCH] Vect: avoid using uninitialized variable (PR tree-optimization/110531)

2023-07-04 Thread Hao Liu OS via Gcc-patches
slp_done_for_suggested_uf is used in vect_analyze_loop_2 without initialization, which is undefined behavior. Initialize it to false according to the discussion. gcc/ChangeLog: PR tree-optimization/110531 * tree-vect-loop.cc (vect_analyze_loop_1): Initialize

RE: Add libcody

2020-12-21 Thread Hao Liu OS via Gcc-patches
Hi Nathan, This patch causes a build failure on CentOS. More information: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=98318#c3 Thanks, -Hao > -----Original Message----- > From: Gcc-patches On Behalf Of Nathan > Sidwell > Sent: Tuesday, December 15, 2020 11:46 PM > To: GCC Patches >

Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimization/89430)

2020-06-04 Thread Hao Liu OS via Gcc-patches
Hi Jakub, I've updated the incorrect ChangeLog. gcc/: PR tree-optimization/89430 * tree-ssa-phiopt.c (struct name_to_bb): Rename to ref_to_bb; add a new field exp; remove ssa_name_ver, store, offset fields. (struct ssa_names_hasher): Rename to

Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimization/89430)

2020-06-03 Thread Hao Liu OS via Gcc-patches
Hi All, The patch has been refactored a little according to the last comment. Do you have more comments? If not, I will commit it later. Tested on x86_64 and AArch64. gcc/: PR tree-optimization/89430 * tree-ssa-phiopt.c (cond_store_replacement): Extend non-trap checking to

RE: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimization/89430)

2020-05-29 Thread Hao Liu OS via Gcc-patches
Hi Richard, Thanks for your comments. It's a good idea to simplify the code and remove get_inner_reference. I've updated the patch accordingly. I also simplified the code to ignore other loads, which cannot help determine whether a store can trap. About tests: 1. All previously XFAIL

[PATCH] extend cselim to check non-trapping for more references (PR tree-optimization/89430)

2020-05-26 Thread Hao Liu OS via Gcc-patches
Hi all, Previously, the fix for PR89430 was reverted by PR94734 due to a bug. The root cause is a missing non-trapping check against a dominating LOAD/STORE. This patch extends the cselim
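Conditional store elimination (cselim) turns a store that happens on only one branch into an unconditional store of a selected value. That rewrite is only safe if the unconditional access cannot trap. One way to know this is that the same reference is already accessed on every path to the store, which is the dominating LOAD/STORE mentioned above. The C sketch below shows the before and after shapes by hand; the function names are hypothetical, and it is not the tree-ssa-phiopt implementation. A real pass must also avoid introducing data races on memory that other threads may observe, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Before: *p is read on every path (a dominating load of the same
   reference), and the store happens only when COND is true. */
static int
cond_store_original (int *p, int cond, int x)
{
  assert (p != NULL);
  int old = *p;             /* dominating access: *p is known not to trap */
  if (cond)
    *p = x;                 /* conditional store */
  return old;
}

/* After: a hand-written version of the cselim rewrite. Because *p was
   already accessed above, storing to it unconditionally adds no new
   trap; the value to store is chosen with a select. */
static int
cond_store_replaced (int *p, int cond, int x)
{
  assert (p != NULL);
  int old = *p;
  int tmp = cond ? x : old; /* reuse the loaded value on the else path */
  *p = tmp;                 /* unconditional store */
  return old;
}
```

Both functions leave `*p` and the returned value identical for any `cond`. Without the dominating access, the unconditional store could fault on a pointer that the original program never wrote through.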