Re: [PATCH] Improve code generation of strided SLP loads

2024-06-11 Thread Richard Sandiford
Richard Biener writes: > This avoids falling back to elementwise accesses for strided SLP > loads when the group size is not a multiple of the vector element > size. Instead we can use a smaller vector or integer type for the load. > > For stores we can do the same though restrictions on stores
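
As a rough illustration of the idea in this preview (not the vectorizer's implementation; the function names are hypothetical), here is a C loop whose strided groups contain two floats, next to a hand-written version that moves each group with one 64-bit integer access instead of two element accesses:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch relies on two floats filling exactly one 64-bit integer.  */
_Static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

/* Scalar form: iteration i reads a group of two adjacent floats that
   starts at a strided position in IN.  */
void gather_pairs_scalar(float *restrict out, const float *restrict in,
                         long stride, int n)
{
    for (int i = 0; i < n; i++) {
        out[2 * i]     = in[i * stride];
        out[2 * i + 1] = in[i * stride + 1];
    }
}

/* Same result, but each two-float group is moved as one 64-bit integer
   rather than as two separate 32-bit elements.  */
void gather_pairs_u64(float *restrict out, const float *restrict in,
                      long stride, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        uint64_t group;
        memcpy(&group, &in[i * stride], sizeof group);  /* one 8-byte load  */
        memcpy(&out[2 * i], &group, sizeof group);      /* one 8-byte store */
    }
}
```

The vectorizer makes this choice on vector and integer types during code generation rather than by rewriting the source. The sketch only shows why one wider access per group can replace an elementwise fallback.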

Re: [PATCH] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-11 Thread Richard Sandiford
Don't think it makes any difference, but: Richard Biener writes: > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, > stmt_vec_info stmt_info, >access excess elements. >??? Enhancements include peeling multiple iterations >

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > > On 11/06/24 9:41 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>>>> Thanks a lot. Can I know what should we be doing with neg (fma) >>>>> correctness failures with load fusion. >>>>

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-11 Thread Richard Sandiford
Robin Dapp writes: >> I was looking at the code in more detail and just wanted to check. >> We have: >> >> int last_needs_comparison = -1; >> >> bool ok = noce_convert_multiple_sets_1 >> (if_info, _no_cmov, _src, , , >> _insns, _needs_comparison); >> if (!ok) >> return

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: >>> Thanks a lot. Can I know what should we be doing with neg (fma) >>> correctness failures with load fusion. >> >> I think it would involve: >> >> - describing lxvp and stxvp as unspec patterns, as I mentioned >> in the previous reply >> >> - making plain movoo split

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-11 Thread Richard Sandiford
Robin Dapp writes: > The attached v3 tracks the use of cond_earliest as you suggested > and adds its cost in default_noce_conversion_profitable_p. > > Bootstrapped and regtested on x86 and p10, aarch64 still > running. Regtested on riscv64. > > Regards > Robin > > Before noce_find_if_block

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > On 11/06/24 7:07 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> Hello Richard: >>> On 11/06/24 6:12 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>> Hello Richard: >>>>> >>

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > On 11/06/24 6:12 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> Hello Richard: >>> >>> On 11/06/24 5:15 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>> Hello Ri

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > > On 11/06/24 5:15 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> Hello Richard: >>> On 11/06/24 4:56 pm, Ajit Agarwal wrote: >>>> Hello Richard: >>>> >>>> On 11/06/24 4:36 pm,

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > On 11/06/24 4:56 pm, Ajit Agarwal wrote: >> Hello Richard: >> >> On 11/06/24 4:36 pm, Richard Sandiford wrote: >>> Ajit Agarwal writes: >>>>>>>> After LRA reload: >>>>>>>

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-11 Thread Richard Sandiford
Ajit Agarwal writes: > After LRA reload: > > (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] > [240]) > (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285]) > (const_int 16 [0x10])) [1 MEM > [(real(kind=8)

Re: [PATCH] Add SLP_TREE_MEMORY_ACCESS_TYPE

2024-06-11 Thread Richard Sandiford
Richard Biener writes: > It turns out target costing code looks at STMT_VINFO_MEMORY_ACCESS_TYPE > to identify operations from (emulated) gathers for example. This > doesn't work for SLP loads since we do not set STMT_VINFO_MEMORY_ACCESS_TYPE > there as the vectorization strategy might differ

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-10 Thread Richard Sandiford
Robin Dapp writes: >> Is there any way we can avoid using pattern_cost here? Using it means >> that we can make use of targetm.insn_cost for the jump but circumvent >> it for the condition, giving a bit of a mixed metric. >> >> (I realise there are existing calls to pattern_cost in ifcvt.cc, >>

Re: [PATCH v3 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-10 Thread Richard Sandiford
Thanks for the update. Parts 1-5 look good to me. Some minor comments below about part 6: Evgeny Karpov writes: > This patch reuses the MinGW implementation to enable DLL import/export > functionality for the aarch64-w64-mingw32 target. It also modifies > environment configurations for MinGW.

Re: [PATCH v2] vect: Merge loop mask and cond_op mask in fold-left, reduction [PR115382].

2024-06-10 Thread Richard Sandiford
Robin Dapp writes: >> Actually, as Richard mentioned in the PR, it would probably be better >> to use prepare_vec_mask instead. It should work in this context too >> and would avoid redundant double masking. > > Attached is v2 that uses prepare_vec_mask. > > Regtested on riscv64 and

Re: [PATCH] internal-fn: Force to reg if operand doesn't match.

2024-06-10 Thread Richard Sandiford
Richard Biener writes: > On Mon, Jun 10, 2024 at 9:35 AM Robin Dapp wrote: >> >> Hi, >> >> despite looking good on cfarm185 and Linaro's pre-commit CI >> gcc-15-638-g7ca35f2e430 now appears to have caused several >> regressions on arm-eabi cortex-m55 as found by Linaro's CI: >> >>

Re: [PATCH] aarch64: Add fcsel to cmov integer and csel to float cmov [PR98477]

2024-06-10 Thread Richard Sandiford
Andrew Pinski writes: > This patch adds an alternative to the integer cmov and one to floating > point cmov so we avoid in some more moving > > PR target/98477 > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (*cmov_insn[GPI]): Add 'w' > alternative. > (*cmov_insn[GPF]):

Re: [PATCH] aarch64: Add vector floating point trunc pattern

2024-06-10 Thread Richard Sandiford
Pengxuan Zheng writes: > This patch is a follow-up of r15-1079-g230d62a2cdd16c to add vector floating > point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions by renaming the > existing aarch64_float_truncate_lo_ pattern to the > standard > optab one, i.e., trunc2. This allows the

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal writes: > On 10/06/24 3:20 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> On 10/06/24 2:52 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>> On 10/06/24 2:12 pm, Richard

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > > On 10/06/24 2:52 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> On 10/06/24 2:12 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>>>>>>>>>> + >>

Re: [PATCH] vect: Merge loop mask and cond_op mask in fold-left, reduction.

2024-06-10 Thread Richard Sandiford
Richard Sandiford writes: > Robin Dapp writes: >> Hi, >> >> currently we discard the cond-op mask when the loop is fully masked >> which causes wrong code in >> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c >> when compiled with >> -O3 -march=

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal writes: > On 10/06/24 2:12 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>>>>>>>>>> + >>>>>>>>>>> + rtx set = single_set (insn); >>>>>>>>>>> + i

Re: [PATCH] vect: Merge loop mask and cond_op mask in fold-left, reduction.

2024-06-10 Thread Richard Sandiford
Robin Dapp writes: > Hi, > > currently we discard the cond-op mask when the loop is fully masked > which causes wrong code in > gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > when compiled with > -O3 -march=cascadelake --param vect-partial-vector-usage=2. > > This patch ANDs both masks
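
A scalar model of what "ANDs both masks" means for an in-order (fold-left) reduction may help; this is an illustrative sketch, not the vectorizer's code, and the function name is hypothetical. A lane contributes only when it is active in both the loop mask and the conditional-operation mask:

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of one masked fold-left PLUS reduction over LANES
   elements.  Lanes are visited strictly left to right, and a lane is
   added only if both masks enable it.  */
double fold_left_plus(double acc, const double *v, const bool *loop_mask,
                      const bool *cond_mask, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        bool active = loop_mask[i] && cond_mask[i];   /* AND of both masks */
        if (active)
            acc += v[i];
    }
    return acc;
}
```

Skipping inactive lanes, rather than adding some neutral-looking value for them, matters for signed zeros: in IEEE arithmetic -0.0 + 0.0 is +0.0, so a masked-off lane that still contributed 0.0 could flip the sign of a -0.0 accumulator.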

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal writes: > + > + rtx set = single_set (insn); > + if (set == NULL_RTX) > + return false; > + > + rtx op0 = SET_SRC (set); > + rtx_code code = GET_CODE (op0); > + > + // This

Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation

2024-06-08 Thread Richard Sandiford
Mariam Arutunian writes: > This patch introduces two new expanders for the aarch64 backend, > dedicated to generating optimized code for CRC computations. > The new expanders are designed to leverage specific hardware capabilities > to achieve faster CRC calculations, > particularly using the pmul
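
For reference, a plain bit-at-a-time CRC is the kind of computation such expanders aim to speed up. The sketch uses the common reflected CRC-32 (polynomial 0x04C11DB7, reflected constant 0xEDB88320) purely as an example; which CRC variants and loop shapes the series actually handles is not stated in this preview:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time reflected CRC-32: one shift and conditional XOR per
   input bit.  -(crc & 1u) is all ones when the low bit is set.  */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}
```

The standard check value for this CRC-32 over the ASCII string "123456789" is 0xCBF43926; faster table-driven or carry-less-multiply versions must reproduce the same results.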

Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-06-08 Thread Richard Sandiford
Thanks a lot for doing this! It's a really nice series. Just had a comment on the long division helper: Mariam Arutunian writes: > +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL > + in GF (2^N). */ It looks like there might be an off-by-one discrepancy between
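
The helper's comment describes polynomial long division. A textbook version over GF(2), with polynomials stored as bit masks (bit k is the coefficient of x^k), is sketched here. The representation (POLY includes its leading x^N term, N at most 31) and the names are assumptions for illustration, and the sketch does not settle the off-by-one question raised in the review:

```c
#include <assert.h>
#include <stdint.h>

/* Degree of a nonzero GF(2) polynomial stored as a bit mask.  */
static int gf2_degree(uint64_t p)
{
    int d = -1;
    while (p) { p >>= 1; d++; }
    return d;
}

/* Quotient of x^(2N) divided by POLY, where N is the degree of POLY.
   In GF(2), subtracting a shifted divisor is an XOR.  */
uint64_t gf2_quotient_x2n(uint64_t poly)
{
    int n = gf2_degree(poly);
    assert(n >= 1 && n <= 31);              /* x^(2N) must fit in 64 bits */
    uint64_t rem = (uint64_t)1 << (2 * n);  /* dividend x^(2N) */
    uint64_t quot = 0;
    for (int shift = n; shift >= 0; shift--) {
        if (rem & ((uint64_t)1 << (n + shift))) {
            quot |= (uint64_t)1 << shift;
            rem ^= poly << shift;           /* cancel the leading term */
        }
    }
    return quot;                            /* degree N; rem has degree < N */
}
```

For example, x^4 = (x^2 + x + 1)(x^2 + x) + x over GF(2), so POLY = 0b111 gives the quotient 0b110.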

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-07 Thread Richard Sandiford
Ajit Agarwal writes: >>> + >>> + df_ref use; >>> + df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ()); >>> + FOR_EACH_INSN_INFO_DEF (use, insn_info) >>> +{ >>> + struct df_link *def_link = DF_REF_CHAIN (use); >>> + >>> + if (!def_link ||

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Ajit Agarwal writes: > On 06/06/24 8:03 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> On 06/06/24 2:28 pm, Richard Sandiford wrote: >>>> Hi, >>>> >>>> Just some comments on the fuseable_load_p part, since that's what >>>

Re: [PATCH V2] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-06-06 Thread Richard Sandiford
config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros): > Add missing __ARM_NEON_SVE_BRIDGE. > > On 6/6/24 13:20, Richard Sandiford wrote: >> Richard Ball writes: >>> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is >>> added by

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Ajit Agarwal writes: > On 06/06/24 2:28 pm, Richard Sandiford wrote: >> Hi, >> >> Just some comments on the fuseable_load_p part, since that's what >> we were discussing last time. >> >> It looks like this now relies on: >> >> Ajit Agarwal w

Re: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]

2024-06-06 Thread Richard Sandiford
Pengxuan Zheng writes: > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is > implemented > using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 (V4SI->V4HI). > > PR target/113882 > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2):
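
The two-step decomposition described in the patch can be mirrored at source level with GCC's generic vector extensions. This is only an analogy (the typedef and function names are hypothetical, and `__builtin_convertvector` needs GCC 9 or later, or Clang):

```c
#include <assert.h>

typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* V4SF -> V4HI in two steps: float -> 32-bit int (truncating toward
   zero), then 32-bit int -> 16-bit int.  Inputs outside the integer
   ranges are not handled specially in this sketch.  */
v4hi fix_trunc_v4sf_v4hi(v4sf x)
{
    v4si wide = __builtin_convertvector(x, v4si);
    return __builtin_convertvector(wide, v4hi);
}
```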

Re: [PATCH] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-06-06 Thread Richard Sandiford
Richard Ball writes: > __ARM_NEON_SVE_BRIDGE was missed in the original patch and is > added by this patch. > > Ok for trunk and a backport into gcc-14? > > gcc/ChangeLog: > > * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): > Add missing __ARM_NEON_SVE_BRIDGE. After this
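
Code normally detects ACLE features through predefined macros, so a missing predefine silently sends such code down its fallback path. A minimal sketch of that usage pattern (the `HAVE_NEON_SVE_BRIDGE` name is hypothetical):

```c
#include <assert.h>

/* Detect the NEON-SVE bridge at preprocessing time.  Without the
   predefine this evaluates to 0, even on a compiler that provides the
   bridge intrinsics.  */
#if defined(__ARM_NEON_SVE_BRIDGE)
#define HAVE_NEON_SVE_BRIDGE 1
#else
#define HAVE_NEON_SVE_BRIDGE 0
#endif

int have_neon_sve_bridge(void)
{
    return HAVE_NEON_SVE_BRIDGE;
}
```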

Re: [PATCH]AArch64: correct constraint on Upl early clobber alternatives

2024-06-06 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > I made an oversight in the previous patch, where I added a ?Upa > alternative to the Upl cases. This causes it to create the tie > between the larger register file rather than the constrained one. > > This fixes the affected patterns. > > Bootstrapped

Re: [PATCH v2] aarch64: Add vector floating point extend pattern [PR113880, PR113869]

2024-06-06 Thread Richard Sandiford
Pengxuan Zheng writes: > This patch adds vector floating point extend pattern for V2SF->V2DF and > V4HF->V4SF conversions by renaming the existing > aarch64_float_extend_lo_ > pattern to the standard optab one, i.e., extend2. This allows the > vectorizer to vectorize certain floating point
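
A float-to-double widening loop of the kind this change is meant to help is shown below. This is an illustrative sketch with a hypothetical name; whether a given loop is actually vectorized still depends on options such as `-O3` and on the cost model:

```c
#include <assert.h>

/* Each iteration widens one float to double; with a standard V2SF->V2DF
   extend pattern the vectorizer can process two elements at a time.  */
void widen_f32_to_f64(double *restrict out, const float *restrict in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (double)in[i];
}
```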

Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-06-06 Thread Richard Sandiford
YunQiang Su writes: > YunQiang Su wrote on Wed, 29 May 2024 at 10:02: >> >> Richard Sandiford wrote on Wed, 29 May 2024 at 05:28: >> > >> > YunQiang Su writes: >> > > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross >> > > toolchain, >>

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Hi, Just some comments on the fuseable_load_p part, since that's what we were discussing last time. It looks like this now relies on: Ajit Agarwal writes: > + /* We use DF data flow because we change location rtx > + which is easier to find and modify. > + We use mix of rtl-ssa

Re: [PATCH] expmed: TRUNCATE value1 if needed in store_bit_field_using_insv

2024-06-05 Thread Richard Sandiford
YunQiang Su writes: > Richard Sandiford wrote on Wed, 5 Jun 2024 at 23:20: >> >> YunQiang Su writes: >> > Richard Sandiford wrote on Wed, 5 Jun 2024 at 22:14: >> >> >> >> YunQiang Su writes: >> >> > PR target/113179. >> >> > >> >

Re: [PATCH] expmed: TRUNCATE value1 if needed in store_bit_field_using_insv

2024-06-05 Thread Richard Sandiford
YunQiang Su writes: > Richard Sandiford wrote on Wed, 5 Jun 2024 at 22:14: >> >> YunQiang Su writes: >> > PR target/113179. >> > >> > In `store_bit_field_using_insv`, we just use SUBREG if value_mode >> >>= op_mode, while in some ports, a sign_extend wil

Re: [PATCH] expmed: TRUNCATE value1 if needed in store_bit_field_using_insv

2024-06-05 Thread Richard Sandiford
YunQiang Su writes: > PR target/113179. > > In `store_bit_field_using_insv`, we just use SUBREG if value_mode >>= op_mode, while in some ports, a sign_extend will be needed, > such as MIPS64: > If either GPR rs or GPR rt does not contain sign-extended 32-bit > values (bits 63..31 equal), then
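
The quoted MIPS64 condition ("bits 63..31 equal") can be written out in plain C. This hypothetical sketch shows the check and what a sign extension of the low 32 bits produces, which is why reusing a wider value through a plain SUBREG is not always enough on such ports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit register image V holds a properly sign-extended
   32-bit value: bits 63..31 (33 bits) are all zeros or all ones.  */
bool is_sign_extended_32(uint64_t v)
{
    uint64_t top = v >> 31;
    return top == 0 || top == ((uint64_t)1 << 33) - 1;
}

/* Sign-extend the low 32 bits of V, using only unsigned arithmetic.  */
uint64_t sign_extend_32(uint64_t v)
{
    uint64_t low = v & UINT64_C(0xFFFFFFFF);
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}
```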

Re: PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-05 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >>> Essentially anything covered by HWCAP doesn't need an explicit check. So I >>> kept >>> the LS64 and PREDRES checks since they don't have a HWCAP allocated (I'm not >>> entirely convinced we need these, let alone having 3 individual bits for >>> LS64,

Re: [PATCH] [RFC] lower SLP load permutation to interleaving

2024-06-05 Thread Richard Sandiford
Richard Biener writes: > On Tue, 4 Jun 2024, Richard Sandiford wrote: > >> Richard Biener writes: >> > The following emulates classical interleaving for SLP load permutes >> > that we are unlikely handling natively. This is to handle cases >> >

Re: [PATCH v4 1/3] [RFC] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2024-06-05 Thread Richard Sandiford
Sorry for the slow review. Manolis Tsamis writes: > This is an extension of what was done in PR106590. > > Currently if a sequence generated in noce_convert_multiple_sets clobbers the > condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards > (sequences that emit the comparison

Re: [PATCH] libgcc/aarch64: also provide AT_HWCAP2 fallback

2024-06-05 Thread Richard Sandiford
Jan Beulich writes: > Much like AT_HWCAP is already provided in case the platform headers > don't have the value (yet). > > libgcc/ > > * config/aarch64/cpuinfo.c: Provide AT_HWCAP2. OK for trunk and GCC 14. Thanks, Richard > --- > Observed as build failure with 14.1.0, so may want
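
The fallback pattern described here is an `#ifndef` guard around the auxiliary-vector constant. A minimal sketch (not the actual cpuinfo.c text), using the Linux values AT_HWCAP = 16 and AT_HWCAP2 = 26:

```c
#include <assert.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* normally supplies AT_HWCAP and AT_HWCAP2 */
#endif

/* Provide the values if the platform headers are too old to have them.  */
#ifndef AT_HWCAP
#define AT_HWCAP 16
#endif
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
```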

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-06-05 Thread Richard Sandiford
Hongyu Wang writes: > CC'd Richard for ccmp part as previously it was added only for aarch64. > The original logic will not be interrupted since if > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > succeed, the cmp/fcmp and ccmp/fccmp support all GPI/GPF, and the >

Re: [PATCH-1v2] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-05 Thread Richard Sandiford
HAO CHEN GUI writes: > Hi, > This patch replaces rtx_cost with insn_cost in forward propagation. > In the PR, one constant vector should be propagated and replace a > pseudo in a store insn if we know it's a duplicated constant vector. > It reduces the insn cost but not rtx cost. In this case,

Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-05 Thread Richard Sandiford
Evgeny Karpov writes: > Richard and Uros, could you please review the changes for v2? > Additionally, we have detected an issue with GCC GC in winnt-dll.cc. The fix > will be included in v2. Would it be possible to have a more "purposeful" name than CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC? What's

Re: [PATCH] [RFC] lower SLP load permutation to interleaving

2024-06-04 Thread Richard Sandiford
Richard Biener writes: > The following emulates classical interleaving for SLP load permutes > that we are unlikely handling natively. This is to handle cases > where interleaving (or load/store-lanes) is the optimal choice for > vectorizing even when we are doing that within SLP. An example >

Re: PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-04 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > > I've reworded the commit message a bit: > > The CPU features initialization code uses CPUID registers (rather than > HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE > is not set if SVE2 is available. Using HWCAPs for these is
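
The equality problem in the commit message can be illustrated generically. For unsigned ID register fields, a higher value normally implies the features of lower ones, so testing for one exact value misses newer implementations. In this sketch the field position and values are hypothetical, and the patch itself moves to HWCAP checks instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID register field that starts at bit SHIFT.  */
static unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)(reg >> shift) & 0xFu;
}

/* Fragile: recognizes only the exact value that introduced the feature.  */
bool has_feature_eq(uint64_t reg, unsigned shift, unsigned min_value)
{
    return id_field(reg, shift) == min_value;
}

/* Usual rule for unsigned fields: any value at or above MIN_VALUE.  */
bool has_feature_ge(uint64_t reg, unsigned shift, unsigned min_value)
{
    return id_field(reg, shift) >= min_value;
}
```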

Re: PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-04 Thread Richard Sandiford
Wilco Dijkstra writes: > Fix CPU features initialization. Use HWCAP rather than explicit accesses > to CPUID registers. Perform the initialization atomically to avoid multi- > threading issues. Please describe the problem that the patch is fixing. I think the PR description would make a

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > > On 03/06/24 7:47 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> On 03/06/24 5:03 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>>> [...] >>>>>>

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal writes: > On 03/06/24 5:03 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>>> [...] >>>> If it is intentional, what distinguishes things like vperm and xxinsertw >>>> (and all other unspecs) from plain addition? >>>>

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-03 Thread Richard Sandiford
Robin Dapp writes: > Hi, > > before noce_find_if_block processes a block it sets up an if_info > structure that holds the original costs. At that point the costs of > the then/else blocks have not been added so we only care about the > "if" cost. > > The code originally used BRANCH_COST for that

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal writes: >> [...] >> If it is intentional, what distinguishes things like vperm and xxinsertw >> (and all other unspecs) from plain addition? >> >> [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa") >> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")

Re: [PATCH 36/52] aarch64: New hook implementation aarch64_c_mode_for_floating_type

2024-06-03 Thread Richard Sandiford
Kewen Lin writes: > This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE > defines in aarch64 port, and add new port specific hook > implementation aarch64_c_mode_for_floating_type. > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type): > New

Re: [PATCH] aarch64: adjust enum writeback after rename

2024-06-03 Thread Richard Sandiford
Marc Poulhiès writes: > gcc/ChangeLog: > > * config/aarch64/aarch64-ldp-fusion.cc (struct aarch64_pair_fusion): > Use new type name. > --- > My previous change fixed the generic code, but I forgot to adjust the > overload in aarch64. > > I don't have an aarch64 setup to check it

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > On 31/05/24 8:08 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> On 31/05/24 3:23 pm, Richard Sandiford wrote: >>>> Ajit Agarwal writes: >>>>> Hello All: >>>>> >>>>

Re: [PATCH] pair-fusion: fix for older GCC

2024-06-03 Thread Richard Sandiford
Marc Poulhiès writes: > Older GCCs fail with: > > .../gcc/pair-fusion.cc: In member function ‘bool > pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, > rtl_ssa::in > sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’: > .../gcc/pair-fusion.cc:1790:40:

Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-31 Thread Richard Sandiford
Marc Poulhiès writes: > Hello, > > I can't bootstrap using gcc 5.5 since this change. It fails with: > > .../gcc/pair-fusion.cc: In member function ‘bool > pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, > rtl_ssa::in > sn_info*, base_cand&, const

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >> I think this should be in a push_options/pop_options block, as for other >> intrinsics that require certain features. > > But then the intrinsic would always be defined, which is contrary to what the > ACLE spec demands - it would not give a compilation

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra writes: > Add __ARM_FEATURE_MOPS predefine. Add support for ACLE __arm_mops_memset_tag. > > Passes regress, OK for commit? > > gcc: > * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): > Add __ARM_FEATURE_MOPS predefine. > *

Re: [PATCH] testsuite: Improve check-function-bodies

2024-05-31 Thread Richard Sandiford
Wilco Dijkstra writes: > Improve check-function-bodies by allowing single-character function names. > Also skip '#' comments which may be emitted from inline assembler. > > Passes regress, OK for commit? > > gcc/testsuite: > * lib/scanasm.exp (configure_check-function-bodies): Allow

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Ajit Agarwal writes: > On 31/05/24 3:23 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>> Hello All: >>> >>> Common infrastructure using generic code for pair mem fusion of different >>> targets. >>> >>> rs6000 tar

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Reviewing my review :) Richard Sandiford writes: >> + >> + for (auto def : info->defs ()) >> +{ >> + auto set = dyn_cast (def); >> + if (set && set->has_any_uses ()) >> +{ >> + for (auto use : set->all_us

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Ajit Agarwal writes: > Hello All: > > Common infrastructure using generic code for pair mem fusion of different > targets. > > rs6000 target specific code implements virtual functions defined > by generic code. > > Code is implemented with pure virtual functions to interface with target

Re: [PATCH 2/4] resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN

2024-05-31 Thread Richard Sandiford
Hans-Peter Nilsson writes: > [...] > (Not-so-)fun fact: add_insn_after takes a bb parameter which > reorg.cc always passes as NULL. But - the argument is > *always ignored* and the bb in the "after" insn is used. > I traced that ignored parameter as far as > r0-81421-g6fb5fa3cbc0d78 "Merge

Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Richard Sandiford
Jakub Jelinek writes: > On Fri, May 31, 2024 at 08:45:54AM +0100, Richard Sandiford wrote: >> > When you say same way, do you mean the way SVE ABI defines the rules for >> > SVE types? >> >> No, sorry, I meant that if the choice isn't purely local to a sourc

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Richard Sandiford
Segher Boessenkool writes: > Hi! > > On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote: >> Code is implemented with pure virtual functions to interface with target >> code. > > It's not a pure function. A pure function -- by definition -- has no > side effects. These things have side

Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Richard Sandiford
Tejas Belagod writes: > On 5/30/24 6:28 PM, Richard Sandiford wrote: >> Tejas Belagod writes: >>> Currently poly-int type structures are passed by value to OpenMP runtime >>> functions for shared clauses etc. This patch improves on this by passing >>> a

Re: [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-30 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2. > It is kept off for generic codegen. > > Note the reason for the +sve even though they are in aarch64-sve.exp is if the > testsuite is run with a forced SVE off option, e.g.

Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-30 Thread Richard Sandiford
Tamar Christina writes: > [...] > @@ -6651,8 +6661,10 @@ (define_insn "and3" > (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") > (match_operand:PRED_ALL 2 "register_operand")))] >"TARGET_SVE" > - {@ [ cons: =0, 1 , 2 ] > - [ Upa , Upa, Upa ]

Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-30 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Tamar Christina >> Sent: Wednesday, May 22, 2024 10:29 AM >> To: Richard Sandiford >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; ktkac...@gcc.gnu.org

Re: [PATCH 00/11] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-05-30 Thread Richard Sandiford
Tejas Belagod writes: > Note: This patch series is based on Richard's initial patch > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html > and Jakub's suggestion > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html > > The following patch series handles

Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-30 Thread Richard Sandiford
Tejas Belagod writes: > Currently poly-int type structures are passed by value to OpenMP runtime > functions for shared clauses etc. This patch improves on this by passing > around poly-int structures by address to avoid copy-overhead. > > gcc/ChangeLog > * omp-low.c

Re: [PATCH 03/11] AArch64: Diagnose OpenMP offloading when SVE types involved.

2024-05-30 Thread Richard Sandiford
Tejas Belagod writes: > The target clause in OpenMP is used to offload loop kernels to accelerator > peripherals. target's 'map' clause is used to move data from and to the > accelerator. When the data is SVE type, it may not be suitable because of > various reasons i.e. the two SVE targets

Re: [PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-05-30 Thread Richard Sandiford
Tejas Belagod writes: > This patch tests various shared clauses with SVE types. It also adds a test > scaffold to run OpenMP tests in under the gcc.target testsuite. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold. Hopefully Jakub can

Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-30 Thread Richard Sandiford
Thanks for the update. Some comments below, but looks very close to ready. Ajit Agarwal writes: > diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc > new file mode 100644 > index 000..060fd95 > --- /dev/null > +++ b/gcc/pair-fusion.cc > @@ -0,0 +1,3012 @@ > +// Pass to fuse

Re: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-30 Thread Richard Sandiford
Pengxuan Zheng writes: > vget_low_2.c is a test case for little-endian, but we missed the > -mlittle-endian > flag in r15-697-ga2e4fe5a53cf75. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian. Ok, thanks. If you'd like write access, please follow

Re: [PATCH] aarch64: Add vector floating point extend patterns [PR113880, PR113869]

2024-05-30 Thread Richard Sandiford
Pengxuan Zheng writes: > This patch improves vectorization of certain floating point widening > operations > for the aarch64 target by adding vector floating point extend patterns for > V2SF->V2DF and V4HF->V4SF conversions. > > PR target/113880 > PR target/113869 > > gcc/ChangeLog:

[PATCH] ira: Fix go_through_subreg offset calculation [PR115281]

2024-05-30 Thread Richard Sandiford
go_through_subreg used: else if (!can_div_trunc_p (SUBREG_BYTE (x), REGMODE_NATURAL_SIZE (GET_MODE (x)), offset)) to calculate the register offset for a pseudo subreg x. In the blessed days before poly-int, this was: *offset = (SUBREG_BYTE (x) /

[PATCH] aarch64: Split aarch64_combinev16qi before RA [PR115258]

2024-05-29 Thread Richard Sandiford
Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose purpose is to put the two input data vectors into consecutive registers. This aarch64_combinev16qi was then split after reload into individual moves (from the first input to the first half of the output, and from the second

Re: [PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-29 Thread Richard Sandiford
Richard Biener writes: > Code generation for contiguous load vectorization can already deal > with generalized avoidance of loading from a gap. The following > extends detection of peeling for gaps requirement with that, > gets rid of the old special casing of a half load and makes sure > when

Re: [PATCH 1/5] Do single-lane SLP discovery for reductions

2024-05-29 Thread Richard Sandiford
Richard Biener writes: > On Fri, 24 May 2024, Richard Biener wrote: > >> This is the second merge proposed from the SLP vectorizer branch. >> I have again managed without adding and using --param vect-single-lane-slp >> but instead this provides always enabled functionality. >> >> This makes us

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Sandiford
HAO CHEN GUI writes: > Hi, > This patch adds an optab for __builtin_isfinite. The finite check can be > implemented on rs6000 by a single instruction. It needs an optab to be > expanded to the certain sequence of instructions. > > The subsequent patches will implement the expand on rs6000. >
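
Source code that would benefit from such an expander is simply a call to the builtin. The sketch below uses a hypothetical wrapper name; per the patch, rs6000 can then implement the check with a single instruction once its expander exists:

```c
#include <assert.h>
#include <math.h>

/* Nonzero for finite X; zero for infinities and NaNs.  */
int is_finite_double(double x)
{
    return __builtin_isfinite(x);
}
```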

Re: [PATCH v3] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

2024-05-28 Thread Richard Sandiford
Richard Biener writes: > On Mon, May 27, 2024 at 9:48 AM Jiawei wrote: >> >> Return NULL_TREE when genop3 equal EXACT_DIV_EXPR. >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html >> >> version log v3: remove additional POLY_INT_CST check. >>

Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread Richard Sandiford
YunQiang Su writes: > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain, > the final fallback is `as/ld` of system. In fact, we can have a try with > -as/ld/objcopy before fallback to native as/ld/objcopy. > > This patch is derived from Debian's patch: >

Re: [PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-28 Thread Richard Sandiford
Andrew Carlotti writes: > The existing implementation of this function was convoluted, and had > multiple control flow errors that became apparent to me while reading > the code: > > 1. The initial early return only checked the properties of the first > exclusion in the list, when these

[PATCH] vect: Fix access size alignment assumption [PR115192]

2024-05-24 Thread Richard Sandiford
create_intersect_range_checks checks whether two access ranges a and b are alias-free using something equivalent to: end_a <= start_b || end_b <= start_a It has two ways of doing this: a "vanilla" way that calculates the exact exclusive end pointers, and another way that uses the last
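
The quoted alias-free condition can be sketched for half-open ranges `[start, end)` with exact exclusive end pointers. This corresponds to the "vanilla" form the excerpt describes. It is a generic illustration, not GCC's `create_intersect_range_checks`. The helper name `ranges_alias_free` and the use of `uintptr_t` addresses are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the half-open ranges [start_a, end_a) and
   [start_b, end_b) do not overlap iff one ends at or before the
   other starts.  Ends are exclusive, so adjacent ranges are alias-free.  */
static bool ranges_alias_free(uintptr_t start_a, uintptr_t end_a,
                              uintptr_t start_b, uintptr_t end_b)
{
    return end_a <= start_b || end_b <= start_a;
}
```

Because the ends are exclusive, `[0, 16)` and `[16, 32)` count as alias-free, while any shared byte makes the check fail.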

Re: [PATCH] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Richard Sandiford
Pengxuan Zheng writes: > This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold vget_high_* > intrinsics to BIT_FIELD_REF and remove the vget_high_* definitions from > arm_neon.h to use the new intrinsics framework. > > PR target/102171 > > gcc/ChangeLog: > > *

Re: [PATCH v1 5/6] Adjust DLL import/export implementation for AArch64

2024-05-22 Thread Richard Sandiford
Evgeny Karpov writes: > The DLL import/export mingw implementation, originally from ix86, requires > minor adjustments to be compatible with AArch64. > > gcc/ChangeLog: > > * config/mingw/mingw32.h (defined): Use the correct DllMainCRTStartup > entry function. > *

Re: [PATCH v1 4/6] aarch64: Add selectany attribute handling

2024-05-22 Thread Richard Sandiford
Evgeny Karpov writes: > This patch extends the aarch64 attributes list with the selectany > attribute for the aarch64-w64-mingw32 target and reuses the mingw > implementation to handle it. > > * config/aarch64/aarch64.cc: > Extend the aarch64 attributes list. > *

Re: [PATCH v1 3/6] Rename functions for reuse in AArch64

2024-05-22 Thread Richard Sandiford
Evgeny Karpov writes: > This patch renames functions related to dllimport/dllexport > and selectany functionality. These functions will be reused > in the aarch64-w64-mingw32 target. > > gcc/ChangeLog: > > * config/i386/cygming.h (mingw_pe_record_stub): > Rename functions in mingw

Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Richard Sandiford
Richard Biener writes: > On Tue, 21 May 2024, Richard Biener wrote: > >> The gcc.dg/vect/slp-12a.c case is interesting as we currently split >> the 8 store group into lanes 0-5 which we SLP with an unroll factor >> of two (on x86-64 with SSE) and the remaining two lanes are using >> interleaving

Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, May 22, 2024 10:48 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; ktkac...@gcc.gnu.org &

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-22 Thread Richard Sandiford
Evgeny Karpov writes: > This patch extracts the ix86 implementation for expanding a SYMBOL > into its corresponding dllimport, far-address, or refptr symbol. > It will be reused in the aarch64-w64-mingw32 target. > The implementation is copied as is from i386/i386.cc with > minor changes to

Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-22 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > This patch adds new alternatives to the patterns which are affected. The new > alternatives with the conditional early clobbers are added before the normal > ones in order for LRA to prefer them in the event that we have enough free > registers to

Re: [PATCH] Fix mixed input kind permute optimization

2024-05-22 Thread Richard Sandiford
Richard Sandiford writes: > Richard Biener writes: >> When change_vec_perm_layout runs into a permute combining two >> nodes where one is invariant and one internal the partition of >> one input can be -1 but the other might not be. The following >> supports this cas

Re: [PATCH 3/4] Avoid splitting store dataref groups during SLP discovery

2024-05-21 Thread Richard Sandiford
Richard Biener writes: > The following avoids splitting store dataref groups during SLP > discovery but instead forces (eventually single-lane) consecutive > lane SLP discovery for all lanes of the group, creating VEC_PERM > SLP nodes merging them so the store will always cover the whole group. >

Re: [PATCH] Fix mixed input kind permute optimization

2024-05-21 Thread Richard Sandiford
Richard Biener writes: > When change_vec_perm_layout runs into a permute combining two > nodes where one is invariant and one internal the partition of > one input can be -1 but the other might not be. The following > supports this case by simply ignoring inputs with input partition -1. > > I'm

Re: [PATCH v3] aarch64: Fix normal returns inside functions which use eh_returns [PR114843]

2024-05-21 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Andrew, > > A few comments on the implementation, I think it can be simplified a lot: FWIW, I agree with Wilco's comments, except: >> +++ b/gcc/config/aarch64/aarch64.h >> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = >> AARCH64_FL_SM_OFF; >>

[PATCH] Cache the set of EH_RETURN_DATA_REGNOs

2024-05-21 Thread Richard Sandiford
While reviewing Andrew's fix for PR114843, it seemed like it would be convenient to have a HARD_REG_SET of EH_RETURN_DATA_REGNOs. This patch adds one and uses it to simplify a couple of use sites. Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install? Richard gcc/ *
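
In plain C, caching the data registers as a set looks roughly like this. In GCC, `EH_RETURN_DATA_REGNO (N)` yields `INVALID_REGNUM` once `N` runs past the last data register. The sketch mimics that convention with a hypothetical `eh_return_data_regno` function, and a 64-bit mask stands in for `HARD_REG_SET`. The register numbers (0 to 3) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_REGNUM (~0u)  /* stand-in for GCC's INVALID_REGNUM */

/* Hypothetical target hook: data registers 0..3, then INVALID_REGNUM.  */
static unsigned eh_return_data_regno(unsigned n)
{
    return n < 4 ? n : INVALID_REGNUM;
}

/* Cached set, computed once; a 64-bit mask stands in for HARD_REG_SET.  */
static uint64_t eh_return_data_regs;

static void init_eh_return_data_regs(void)
{
    eh_return_data_regs = 0;
    for (unsigned i = 0;; ++i) {
        unsigned regno = eh_return_data_regno(i);
        if (regno == INVALID_REGNUM)
            break;
        eh_return_data_regs |= UINT64_C(1) << regno;
    }
}

/* Use sites become a single bit test instead of a loop over N.  */
static int is_eh_return_data_reg(unsigned regno)
{
    return regno < 64 && ((eh_return_data_regs >> regno) & 1) != 0;
}
```

The loop over `EH_RETURN_DATA_REGNO` then runs once at initialization rather than at every use site.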
