Richard Biener writes:
> This avoids falling back to elementwise accesses for strided SLP
> loads when the group size is not a multiple of the vector element
> size. Instead we can use a smaller vector or integer type for the load.
>
> For stores we can do the same though restrictions on stores
Don't think it makes any difference, but:
Richard Biener writes:
> @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo,
> stmt_vec_info stmt_info,
>access excess elements.
>??? Enhancements include peeling multiple iterations
>
Ajit Agarwal writes:
> Hello Richard:
>
> On 11/06/24 9:41 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>>>> Thanks a lot. Can I know what should we be doing with neg (fma)
>>>>> correctness failures with load fusion.
>>>>
Robin Dapp writes:
>> I was looking at the code in more detail and just wanted to check.
>> We have:
>>
>> int last_needs_comparison = -1;
>>
>> bool ok = noce_convert_multiple_sets_1
>> (if_info, _no_cmov, _src, , ,
>> _insns, _needs_comparison);
>> if (!ok)
>> return
Ajit Agarwal writes:
>>> Thanks a lot. Can I know what should we be doing with neg (fma)
>>> correctness failures with load fusion.
>>
>> I think it would involve:
>>
>> - describing lxvp and stxvp as unspec patterns, as I mentioned
>> in the previous reply
>>
>> - making plain movoo split
Robin Dapp writes:
> The attached v3 tracks the use of cond_earliest as you suggested
> and adds its cost in default_noce_conversion_profitable_p.
>
> Bootstrapped and regtested on x86 and p10, aarch64 still
> running. Regtested on riscv64.
>
> Regards
> Robin
>
> Before noce_find_if_block
Ajit Agarwal writes:
> On 11/06/24 7:07 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> Hello Richard:
>>> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>> Hello Richard:
>>>>>
>>>
Ajit Agarwal writes:
> Hello Richard:
> On 11/06/24 6:12 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> Hello Richard:
>>>
>>> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>> Hello Ri
Ajit Agarwal writes:
> Hello Richard:
>
> On 11/06/24 5:15 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> Hello Richard:
>>> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>>>> Hello Richard:
>>>>
>>>> On 11/06/24 4:36 pm,
Ajit Agarwal writes:
> Hello Richard:
> On 11/06/24 4:56 pm, Ajit Agarwal wrote:
>> Hello Richard:
>>
>> On 11/06/24 4:36 pm, Richard Sandiford wrote:
>>> Ajit Agarwal writes:
>>>>>>>> After LRA reload:
>>>>>>>>
Ajit Agarwal writes:
> After LRA reload:
>
> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ]
> [240])
> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
> (const_int 16 [0x10])) [1 MEM
> [(real(kind=8)
Richard Biener writes:
> It turns out target costing code looks at STMT_VINFO_MEMORY_ACCESS_TYPE
> to identify operations from (emulated) gathers for example. This
> doesn't work for SLP loads since we do not set STMT_VINFO_MEMORY_ACCESS_TYPE
> there as the vectorization strategy might differ
Robin Dapp writes:
>> Is there any way we can avoid using pattern_cost here? Using it means
>> that we can make use of targetm.insn_cost for the jump but circumvent
>> it for the condition, giving a bit of a mixed metric.
>>
>> (I realise there are existing calls to pattern_cost in ifcvt.cc,
>>
Thanks for the update. Parts 1-5 look good to me. Some minor comments
below about part 6:
Evgeny Karpov writes:
> This patch reuses the MinGW implementation to enable DLL import/export
> functionality for the aarch64-w64-mingw32 target. It also modifies
> environment configurations for MinGW.
Robin Dapp writes:
>> Actually, as Richard mentioned in the PR, it would probably be better
>> to use prepare_vec_mask instead. It should work in this context too
>> and would avoid redundant double masking.
>
> Attached is v2 that uses prepare_vec_mask.
>
> Regtested on riscv64 and
Richard Biener writes:
> On Mon, Jun 10, 2024 at 9:35 AM Robin Dapp wrote:
>>
>> Hi,
>>
>> despite looking good on cfarm185 and Linaro's pre-commit CI
>> gcc-15-638-g7ca35f2e430 now appears to have caused several
>> regressions on arm-eabi cortex-m55 as found by Linaro's CI:
>>
>>
Andrew Pinski writes:
> This patch adds an alternative to the integer cmov and one to floating
> point cmov so we avoid in some more moving
>
> PR target/98477
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (*cmov_insn[GPI]): Add 'w'
> alternative.
> (*cmov_insn[GPF]):
Pengxuan Zheng writes:
> This patch is a follow-up of r15-1079-g230d62a2cdd16c to add vector floating
> point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions by renaming the
> existing aarch64_float_truncate_lo_ pattern to the
> standard
> optab one, i.e., trunc2. This allows the
Ajit Agarwal writes:
> On 10/06/24 3:20 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> On 10/06/24 2:52 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>> On 10/06/24 2:12 pm, Richard
Ajit Agarwal writes:
> Hello Richard:
>
> On 10/06/24 2:52 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>>>>>>>>>> +
>>>
Richard Sandiford writes:
> Robin Dapp writes:
>> Hi,
>>
>> currently we discard the cond-op mask when the loop is fully masked
>> which causes wrong code in
>> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>> when compiled with
>> -O3 -march=
Ajit Agarwal writes:
> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>>>>>>>>>> +
>>>>>>>>>>> + rtx set = single_set (insn);
>>>>>>>>>>> + i
Robin Dapp writes:
> Hi,
>
> currently we discard the cond-op mask when the loop is fully masked
> which causes wrong code in
> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> when compiled with
> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>
> This patch ANDs both masks
Ajit Agarwal writes:
> +
> + rtx set = single_set (insn);
> + if (set == NULL_RTX)
> + return false;
> +
> + rtx op0 = SET_SRC (set);
> + rtx_code code = GET_CODE (op0);
> +
> + // This
Mariam Arutunian writes:
> This patch introduces two new expanders for the aarch64 backend,
> dedicated to generate optimized code for CRC computations.
> The new expanders are designed to leverage specific hardware capabilities
> to achieve faster CRC calculations,
> particularly using the pmul
Thanks a lot for doing this! It's a really nice series.
Just had a comment on the long division helper:
Mariam Arutunian writes:
> +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL
> + in GF (2^N). */
It looks like there might be an off-by-one discrepancy between
Ajit Agarwal writes:
>>> +
>>> + df_ref use;
>>> + df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>> +{
>>> + struct df_link *def_link = DF_REF_CHAIN (use);
>>> +
>>> + if (!def_link ||
Ajit Agarwal writes:
> On 06/06/24 8:03 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>>>> Hi,
>>>>
>>>> Just some comments on the fuseable_load_p part, since that's what
>>>>
config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
> Add missing __ARM_NEON_SVE_BRIDGE.
>
> On 6/6/24 13:20, Richard Sandiford wrote:
>> Richard Ball writes:
>>> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
>>> added by
Ajit Agarwal writes:
> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>> Hi,
>>
>> Just some comments on the fuseable_load_p part, since that's what
>> we were discussing last time.
>>
>> It looks like this now relies on:
>>
>> Ajit Agarwal w
Pengxuan Zheng writes:
> This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is
> implemented
> using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 (V4SI->V4HI).
>
> PR target/113882
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2):
Richard Ball writes:
> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
> added by this patch.
>
> Ok for trunk and a backport into gcc-14?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
> Add missing __ARM_NEON_SVE_BRIDGE.
After this
Tamar Christina writes:
> Hi All,
>
> I made an oversight in the previous patch, where I added a ?Upa
> alternative to the Upl cases. This causes it to create the tie
> between the larger register file rather than the constrained one.
>
> This fixes the affected patterns.
>
> Bootstrapped
Pengxuan Zheng writes:
> This patch adds vector floating point extend pattern for V2SF->V2DF and
> V4HF->V4SF conversions by renaming the existing
> aarch64_float_extend_lo_
> pattern to the standard optab one, i.e., extend2. This allows the
> vectorizer to vectorize certain floating point
YunQiang Su writes:
> On Wed, May 29, 2024 at 10:02, YunQiang Su wrote:
>>
>> On Wed, May 29, 2024 at 05:28, Richard Sandiford wrote:
>> >
>> > YunQiang Su writes:
>> > > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross
>> > > toolchain,
>>
Hi,
Just some comments on the fuseable_load_p part, since that's what
we were discussing last time.
It looks like this now relies on:
Ajit Agarwal writes:
> + /* We use DF data flow because we change location rtx
> + which is easier to find and modify.
> + We use mix of rtl-ssa
YunQiang Su writes:
> On Wed, Jun 5, 2024 at 23:20, Richard Sandiford wrote:
>>
>> YunQiang Su writes:
>> > On Wed, Jun 5, 2024 at 22:14, Richard Sandiford wrote:
>> >>
>> >> YunQiang Su writes:
>> >> > PR target/113179.
>> >> >
>> >
YunQiang Su writes:
> On Wed, Jun 5, 2024 at 22:14, Richard Sandiford wrote:
>>
>> YunQiang Su writes:
>> > PR target/113179.
>> >
>> > In `store_bit_field_using_insv`, we just use SUBREG if value_mode
>> > >= op_mode, while in some ports, a sign_extend wil
YunQiang Su writes:
> PR target/113179.
>
> In `store_bit_field_using_insv`, we just use SUBREG if value_mode
> >= op_mode, while in some ports, a sign_extend will be needed,
> such as MIPS64:
> If either GPR rs or GPR rt does not contain sign-extended 32-bit
> values (bits 63..31 equal), then
Wilco Dijkstra writes:
> Hi Richard,
>
>>> Essentially anything covered by HWCAP doesn't need an explicit check. So I
>>> kept
>>> the LS64 and PREDRES checks since they don't have a HWCAP allocated (I'm not
>>> entirely convinced we need these, let alone having 3 individual bits for
>>> LS64,
Richard Biener writes:
> On Tue, 4 Jun 2024, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > The following emulates classical interleaving for SLP load permutes
>> > that we are unlikely to handle natively. This is to handle cases
>> >
Sorry for the slow review.
Manolis Tsamis writes:
> This is an extension of what was done in PR106590.
>
> Currently if a sequence generated in noce_convert_multiple_sets clobbers the
> condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
> (sequences that emit the comparison
Jan Beulich writes:
> Much like AT_HWCAP is already provided in case the platform headers
> don't have the value (yet).
>
> libgcc/
>
> * config/aarch64/cpuinfo.c: Provide AT_HWCAP2.
OK for trunk and GCC 14.
Thanks,
Richard
> ---
> Observed as build failure with 14.1.0, so may want
Hongyu Wang writes:
> CC'd Richard for ccmp part as previously it is added only for aarch64.
> The original logic will not be interrupted since if
> aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> succeed, the cmp/fcmp and ccmp/fccmp support all GPI/GPF, and the
>
HAO CHEN GUI writes:
> Hi,
> This patch replaces rtx_cost with insn_cost in forward propagation.
> In the PR, one constant vector should be propagated and replace a
> pseudo in a store insn if we know it's a duplicated constant vector.
> It reduces the insn cost but not rtx cost. In this case,
Evgeny Karpov writes:
> Richard and Uros, could you please review the changes for v2?
> Additionally, we have detected an issue with GCC GC in winnt-dll.cc. The fix
> will be included in v2.
Would it be possible to have a more "purposeful" name than
CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC? What's
Richard Biener writes:
> The following emulates classical interleaving for SLP load permutes
> that we are unlikely to handle natively. This is to handle cases
> where interleaving (or load/store-lanes) is the optimal choice for
> vectorizing even when we are doing that within SLP. An example
>
Wilco Dijkstra writes:
> Hi Richard,
>
> I've reworded the commit message a bit:
>
> The CPU features initialization code uses CPUID registers (rather than
> HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE
> is not set if SVE2 is available. Using HWCAPs for these is
Wilco Dijkstra writes:
> Fix CPU features initialization. Use HWCAP rather than explicit accesses
> to CPUID registers. Perform the initialization atomically to avoid multi-
> threading issues.
Please describe the problem that the patch is fixing. I think the
PR description would make a
Ajit Agarwal writes:
> Hello Richard:
>
> On 03/06/24 7:47 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> On 03/06/24 5:03 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>>> [...]
>>>>>>
Ajit Agarwal writes:
> On 03/06/24 5:03 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>>> [...]
>>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>>> (and all other unspecs) from plain addition?
>>>>
Robin Dapp writes:
> Hi,
>
> before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs. At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
>
> The code originally used BRANCH_COST for that
Ajit Agarwal writes:
>> [...]
>> If it is intentional, what distinguishes things like vperm and xxinsertw
>> (and all other unspecs) from plain addition?
>>
>> [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
Kewen Lin writes:
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in aarch64 port, and add new port specific hook
> implementation aarch64_c_mode_for_floating_type.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type):
> New
Marc Poulhiès writes:
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-ldp-fusion.cc (struct aarch64_pair_fusion):
> Use new type name.
> ---
> My previous change fixed the generic code, but I forgot to adjust the
> overload in aarch64.
>
> I don't have an aarch64 setup to check it
Ajit Agarwal writes:
> Hello Richard:
> On 31/05/24 8:08 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> On 31/05/24 3:23 pm, Richard Sandiford wrote:
>>>> Ajit Agarwal writes:
>>>>> Hello All:
>>>>>
>>>>
Marc Poulhiès writes:
> Older GCCs fail with:
>
> .../gcc/pair-fusion.cc: In member function ‘bool
> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*,
> rtl_ssa::in
> sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
> .../gcc/pair-fusion.cc:1790:40:
Marc Poulhiès writes:
> Hello,
>
> I can't bootstrap using gcc 5.5 since this change. It fails with:
>
> .../gcc/pair-fusion.cc: In member function ‘bool
> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*,
> rtl_ssa::in
> sn_info*, base_cand&, const
Wilco Dijkstra writes:
> Hi Richard,
>
>> I think this should be in a push_options/pop_options block, as for other
>> intrinsics that require certain features.
>
> But then the intrinsic would always be defined, which is contrary to what the
> ACLE spec demands - it would not give a compilation
Wilco Dijkstra writes:
> Add __ARM_FEATURE_MOPS predefine. Add support for ACLE __arm_mops_memset_tag.
>
> Passes regress, OK for commit?
>
> gcc:
> * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
> Add __ARM_FEATURE_MOPS predefine.
> *
Wilco Dijkstra writes:
> Improve check-function-bodies by allowing single-character function names.
> Also skip '#' comments which may be emitted from inline assembler.
>
> Passes regress, OK for commit?
>
> gcc/testsuite:
> * lib/scanasm.exp (configure_check-function-bodies): Allow
Ajit Agarwal writes:
> On 31/05/24 3:23 pm, Richard Sandiford wrote:
>> Ajit Agarwal writes:
>>> Hello All:
>>>
>>> Common infrastructure using generic code for pair mem fusion of different
>>> targets.
>>>
>>> rs6000 tar
Reviewing my review :)
Richard Sandiford writes:
>> +
>> + for (auto def : info->defs ())
>> +{
>> + auto set = dyn_cast (def);
>> + if (set && set->has_any_uses ())
>> +{
>> + for (auto use : set->all_us
Ajit Agarwal writes:
> Hello All:
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target-specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
Hans-Peter Nilsson writes:
> [...]
> (Not-so-)fun fact: add_insn_after takes a bb parameter which
> reorg.cc always passes as NULL. But - the argument is
> *always ignored* and the bb in the "after" insn is used.
> I traced that ignored parameter as far as
> r0-81421-g6fb5fa3cbc0d78 "Merge
Jakub Jelinek writes:
> On Fri, May 31, 2024 at 08:45:54AM +0100, Richard Sandiford wrote:
>> > When you say same way, do you mean the way SVE ABI defines the rules for
>> > SVE types?
>>
>> No, sorry, I meant that if the choice isn't purely local to a sourc
Segher Boessenkool writes:
> Hi!
>
> On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
>> Code is implemented with pure virtual functions to interface with target
>> code.
>
> It's not a pure function. A pure function -- by definition -- has no
> side effects. These things have side
Tejas Belagod writes:
> On 5/30/24 6:28 PM, Richard Sandiford wrote:
>> Tejas Belagod writes:
>>> Currently poly-int type structures are passed by value to OpenMP runtime
>>> functions for shared clauses etc. This patch improves on this by passing
>>> a
Tamar Christina writes:
> Hi All,
>
> This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
> It is kept off for generic codegen.
>
> Note the reason for the +sve even though they are in aarch64-sve.exp is if the
> testsuite is run with a forced SVE off option, e.g.
Tamar Christina writes:
> [...]
> @@ -6651,8 +6661,10 @@ (define_insn "and3"
> (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
> (match_operand:PRED_ALL 2 "register_operand")))]
>"TARGET_SVE"
> - {@ [ cons: =0, 1 , 2 ]
> - [ Upa , Upa, Upa ]
Tamar Christina writes:
>> -Original Message-
>> From: Tamar Christina
>> Sent: Wednesday, May 22, 2024 10:29 AM
>> To: Richard Sandiford
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; ktkac...@gcc.gnu.org
Tejas Belagod writes:
> Note: This patch series is based on Richard's initial patch
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html
> and Jakub's suggestion
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html
>
> The following patch series handles
Tejas Belagod writes:
> Currently poly-int type structures are passed by value to OpenMP runtime
> functions for shared clauses etc. This patch improves on this by passing
> around poly-int structures by address to avoid copy-overhead.
>
> gcc/ChangeLog
> * omp-low.c
Tejas Belagod writes:
> The target clause in OpenMP is used to offload loop kernels to accelerator
> peripherals. target's 'map' clause is used to move data from and to the
> accelerator. When the data is SVE type, it may not be suitable because of
> various reasons i.e. the two SVE targets
Tejas Belagod writes:
> This patch tests various shared clauses with SVE types. It also adds a test
> scaffold to run OpenMP tests under the gcc.target testsuite.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.
Hopefully Jakub can
Thanks for the update. Some comments below, but looks very close
to ready.
Ajit Agarwal writes:
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> new file mode 100644
> index 000..060fd95
> --- /dev/null
> +++ b/gcc/pair-fusion.cc
> @@ -0,0 +1,3012 @@
> +// Pass to fuse
Pengxuan Zheng writes:
> vget_low_2.c is a test case for little-endian, but we missed the
> -mlittle-endian
> flag in r15-697-ga2e4fe5a53cf75.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.
Ok, thanks.
If you'd like write access, please follow
Pengxuan Zheng writes:
> This patch improves vectorization of certain floating point widening
> operations
> for the aarch64 target by adding vector floating point extend patterns for
> V2SF->V2DF and V4HF->V4SF conversions.
>
> PR target/113880
> PR target/113869
>
> gcc/ChangeLog:
go_through_subreg used:
else if (!can_div_trunc_p (SUBREG_BYTE (x),
REGMODE_NATURAL_SIZE (GET_MODE (x)), offset))
to calculate the register offset for a pseudo subreg x. In the blessed
days before poly-int, this was:
*offset = (SUBREG_BYTE (x) /
Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second
Richard Biener writes:
> Code generation for contiguous load vectorization can already deal
> with generalized avoidance of loading from a gap. The following
> extends detection of peeling for gaps requirement with that,
> gets rid of the old special casing of a half load and makes sure
> when
Richard Biener writes:
> On Fri, 24 May 2024, Richard Biener wrote:
>
>> This is the second merge proposed from the SLP vectorizer branch.
>> I have again managed without adding and using --param vect-single-lane-slp
>> but instead this provides always enabled functionality.
>>
>> This makes us
HAO CHEN GUI writes:
> Hi,
> This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
>
> The subsequent patches will implement the expand on rs6000.
>
Richard Biener writes:
> On Mon, May 27, 2024 at 9:48 AM Jiawei wrote:
>>
>> Return NULL_TREE when genop3 equal EXACT_DIV_EXPR.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html
>>
>> version log v3: remove additional POLY_INT_CST check.
>>
YunQiang Su writes:
> If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> the final fallback is `as/ld` of system. In fact, we can have a try with
> -as/ld/objcopy before fallback to native as/ld/objcopy.
>
> This patch is derived from Debian's patch:
>
Andrew Carlotti writes:
> The existing implementation of this function was convoluted, and had
> multiple control flow errors that became apparent to me while reading
> the code:
>
> 1. The initial early return only checked the properties of the first
> exclusion in the list, when these
create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:
end_a <= start_b || end_b <= start_a
It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last
Pengxuan Zheng writes:
> This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold vget_high_*
> intrinsics to BIT_FIELD_REF and remove the vget_high_* definitions from
> arm_neon.h to use the new intrinsics framework.
>
> PR target/102171
>
> gcc/ChangeLog:
>
> *
Evgeny Karpov writes:
> The DLL import/export mingw implementation, originally from ix86, requires
> minor adjustments to be compatible with AArch64.
>
> gcc/ChangeLog:
>
> * config/mingw/mingw32.h (defined): Use the correct DllMainCRTStartup
> entry function.
> *
Evgeny Karpov writes:
> This patch extends the aarch64 attributes list with the selectany
> attribute for the aarch64-w64-mingw32 target and reuses the mingw
> implementation to handle it.
>
> * config/aarch64/aarch64.cc:
> Extend the aarch64 attributes list.
> *
Evgeny Karpov writes:
> This patch renames functions related to dllimport/dllexport
> and selectany functionality. These functions will be reused
> in the aarch64-w64-mingw32 target.
>
> gcc/ChangeLog:
>
> * config/i386/cygming.h (mingw_pe_record_stub):
> Rename functions in mingw
Richard Biener writes:
> On Tue, 21 May 2024, Richard Biener wrote:
>
>> The gcc.dg/vect/slp-12a.c case is interesting as we currently split
>> the 8 store group into lanes 0-5 which we SLP with an unroll factor
>> of two (on x86-64 with SSE) and the remaining two lanes are using
>> interleaving
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Wednesday, May 22, 2024 10:48 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; ktkac...@gcc.gnu.org
Evgeny Karpov writes:
> This patch extracts the ix86 implementation for expanding a SYMBOL
> into its corresponding dllimport, far-address, or refptr symbol.
> It will be reused in the aarch64-w64-mingw32 target.
> The implementation is copied as is from i386/i386.cc with
> minor changes to
Tamar Christina writes:
> Hi All,
>
> This patch adds new alternatives to the patterns which are affected. The new
> alternatives with the conditional early clobbers are added before the normal
> ones in order for LRA to prefer them in the event that we have enough free
> registers to
Richard Sandiford writes:
> Richard Biener writes:
>> When change_vec_perm_layout runs into a permute combining two
>> nodes where one is invariant and one internal the partition of
>> one input can be -1 but the other might not be. The following
>> supports this cas
Richard Biener writes:
> The following avoids splitting store dataref groups during SLP
> discovery but instead forces (eventually single-lane) consecutive
> lane SLP discovery for all lanes of the group, creating VEC_PERM
> SLP nodes merging them so the store will always cover the whole group.
>
Richard Biener writes:
> When change_vec_perm_layout runs into a permute combining two
> nodes where one is invariant and one internal the partition of
> one input can be -1 but the other might not be. The following
> supports this case by simply ignoring inputs with input partition -1.
>
> I'm
Wilco Dijkstra writes:
> Hi Andrew,
>
> A few comments on the implementation, I think it can be simplified a lot:
FWIW, I agree with Wilco's comments, except:
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
>> AARCH64_FL_SM_OFF;
>>
While reviewing Andrew's fix for PR114843, it seemed like it would
be convenient to have a HARD_REG_SET of EH_RETURN_DATA_REGNOs.
This patch adds one and uses it to simplify a couple of use sites.
Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
Richard
gcc/
*