Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-09 Thread Hongyu Wang via Gcc-patches
Vladimir Makarov via Gcc-patches wrote on Sat, Sep 9, 2023 at 01:04: > > > On 8/31/23 04:20, Hongyu Wang wrote: > > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression > > (@code{MEM} for the top level > > of an address, @code{ADDRESS} for something that occurs in an > >

Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek wrote on Fri, Sep 1, 2023 at 17:20: > > On Fri, Sep 01, 2023 at 05:07:53PM +0800, Hongyu Wang wrote: > > Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:44: > > > > > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches > >

Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:44: > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote: > > For vector move insns like vmovdqa/vmovdqu, their evex counterparts > > require explicit suffix 64/32/16/8. The usage of these instructions

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-01 Thread Hongyu Wang via Gcc-patches
Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:16: > > On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > > > From: Kong Lingling > > > > Current reload infrastructure does not support selective base_reg_class > > for backend insn. Add insn argument to base_reg_class for > > lra/reload

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Hongyu Wang via Gcc-patches
Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:01: > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches > wrote: > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > > From: Kong Lingling > > > > >

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:18: > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > From: Kong Lingling > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR > > usage by default f

Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-09-01 Thread Hongyu Wang via Gcc-patches
Richard Biener via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:31: > > On Thu, Aug 31, 2023 at 11:26 AM Richard Biener > wrote: > > > > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches > > wrote: > > > > > > From: Kong Lingling > > > >

Re: [PATCH 00/13] [RFC] Support Intel APX EGPR

2023-09-01 Thread Hongyu Wang via Gcc-patches
Richard Biener via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:21: > > On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches > wrote: > > > > Intel Advanced Performance Extensions (APX) has been released in [1]. > > It contains several extensions such as extended 16 general p

[PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These vex insns may have legacy counterparts that could support EGPR, but they do not have evex counterparts. Split out the vex part from patterns and set the vex part to non-EGPR supported by adjusting constraints and attr_gpr32. insn list: 1. vmovmskpd/vmovmskps 2. vpmovmskb

[PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling The APX enabled hardware should also be AVX10 enabled, thus for map2/3 insns with evex counterpart, we assume auto promotion to EGPR under APX_F if the insn uses GPR32. So for below insns, we disabled EGPR usage for their sse mnemonics, while allowing egpr generation of their

[PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These legacy insns in opcode map2/3 have vex but no evex counterpart, disable EGPR for them by adjusting alternatives and attr_gpr32. insn list: 1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw 2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw 3. psignb/vpsignb,

[PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These legacy insns in opcode map0/1 only support GPR16 and do not have vex/evex counterparts; directly adjust constraints and add gpr32 attr to patterns. insn list: 1. xsave/xsave64, xrstor/xrstor64 2. xsaves/xsaves64, xrstors/xrstors64 3. xsavec/xsavec64 4.

[PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Extend GENERAL_REGS with extra r16-r31 registers like REX registers, named as REX2 registers. They will only be enabled under TARGET_APX_EGPR. gcc/ChangeLog: * config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p): New function prototype. *

[PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Disable EGPR usage for below legacy insns in opcode map2/3 that have vex but no evex counterpart. insn list: 1. phminposuw/vphminposuw 2. ptest/vptest 3. roundps/vroundps, roundpd/vroundpd, roundss/vroundss, roundsd/vroundsd 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm

[PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Add backend helper functions to verify if a rtx_insn can adopt EGPR to its base/index reg of memory operand. The verification rule goes like 1. For asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32. 2. Disable EGPR for unrecognized insn. 3. If

[PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling For APX, as we extended the GENERAL_REG_CLASS, new constraints are needed to restrict insns that cannot adopt EGPR either in its reg or memory operands. gcc/ChangeLog: * config/i386/constraints.md (h): New register constraint for GENERAL_GPR16. (Bt):

[PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Current reload infrastructure does not support selective base_reg_class for backend insn. Add insn argument to base_reg_class for lra/reload usage. gcc/ChangeLog: * addresses.h (base_reg_class): Add insn argument. Pass to MODE_CODE_BASE_REG_CLASS.

[PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling In inline asm, we do not know if the insn can use EGPR, so disable EGPR usage by default by mapping the common reg/mem constraint to non-EGPR constraints. Use a flag -mapx-inline-asm-use-gpr32 to enable EGPR usage for inline asm. gcc/ChangeLog: *

[PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Add -mapx-features= enumeration to separate subfeatures of APX_F. -mapxf is treated same as previous ISA flag, while it sets -mapx-features=apx_all that enables all subfeatures. gcc/ChangeLog: * common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.

[PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-08-31 Thread Hongyu Wang via Gcc-patches
For vector move insns like vmovdqa/vmovdqu, their evex counterparts require explicit suffix 64/32/16/8. The usage of these instructions is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select vmovaps/vmovups for vector load/store insns that contain EGPR. gcc/ChangeLog: *

[PATCH 00/13] [RFC] Support Intel APX EGPR

2023-08-31 Thread Hongyu Wang via Gcc-patches
Intel Advanced Performance Extensions (APX) has been released in [1]. It contains several extensions such as extended 16 general purpose registers (EGPRs), push2/pop2, new data destination (NDD), conditional compare (CCMP/CTEST) combined with suppress flags write version of common instructions

[PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.

2023-08-31 Thread Hongyu Wang via Gcc-patches
Like base_reg_class, INDEX_REG_CLASS also does not support backend insn. Add index_reg_class with insn argument for lra/reload usage. gcc/ChangeLog: * addresses.h (index_reg_class): New wrapper function like base_reg_class. * doc/tm.texi: Document INSN_INDEX_REG_CLASS.

[PATCH] Fix avx512ne2ps2bf16 wrong code [PR 111127]

2023-08-24 Thread Hongyu Wang via Gcc-patches
Hi, For PR111127, the wrong code was caused by a wrong expander for maskz. Correct the parameter order for the avx512ne2ps2bf16_maskz expander. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. OK for master and backport to GCC13? gcc/ChangeLog: PR target/111127 *

[PATCH] i386: Update document for inlining rules

2023-07-06 Thread Hongyu Wang via Gcc-patches
Hi, This is a follow-up patch for https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623525.html that updates the documentation of x86 inlining rules. Ok for trunk? gcc/ChangeLog: * doc/extend.texi: Move x86 inlining rule to a new subsubsection and add description for inlining of

Re: [PATCH V2] i386: Inline function with default arch/tune to caller

2023-07-05 Thread Hongyu Wang via Gcc-patches
Thanks, this is the updated patch I'm going to check in. Uros Bizjak wrote on Tue, Jul 4, 2023 at 16:57: > > On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang wrote: > > > > > In a follow-up patch, can you please document inlining rules involving > > > -march and -mtune to "x86 Function Attributes" section?

Re: [PATCH V2] i386: Inline function with default arch/tune to caller

2023-07-04 Thread Hongyu Wang via Gcc-patches
> In a follow-up patch, can you please document inlining rules involving > -march and -mtune to "x86 Function Attributes" section? Currently, the > inlining rules at the end of "target function attribute" section does > not even mention -march and -mtune. Maybe a subsubsection "Inlining > rules"

[PATCH V2] i386: Inline function with default arch/tune to caller

2023-07-03 Thread Hongyu Wang via Gcc-patches
Hi, For functions with different target attributes, current logic refuses to inline the callee when any arch or tune is mismatched. Relax the condition to allow callee with default arch/tune to be inlined. Bootstrapped/regtested on x86-64-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: *

Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-28 Thread Hongyu Wang via Gcc-patches
> If the user specified a different arch for callee than the caller, > then the compiler will switch on different ISAs (-march is just a > shortcut for different ISA packs), and the programmer is aware that > inlining isn't intended here (we have -mtune, which is not as strong > as -march, but

Re: [PATCH] i386: Sync tune_string with arch_string for target attribute arch=*

2023-06-27 Thread Hongyu Wang via Gcc-patches
The testcase fails with --with-arch=native build on cascadelake, here is the patch to adjust it gcc/testsuite/ChangeLog: * gcc.target/i386/mvc17.c: Add -march=x86-64 to dg-options. --- gcc/testsuite/gcc.target/i386/mvc17.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-27 Thread Hongyu Wang via Gcc-patches
> I don't think this is desirable. If we inline something with different > ISAs, we get some strange mix of ISAs when the function is inlined. > OTOH - we already inline with mismatched tune flags if the function is > marked with always_inline. Previously ix86_can_inline_p has if

Re: [PATCH] i386: Sync tune_string with arch_string for target attribute arch=*

2023-06-26 Thread Hongyu Wang via Gcc-patches
Thanks, I'll backport it down to GCC10 after this passed all bootstrap/regtest. Uros Bizjak via Gcc-patches wrote on Mon, Jun 26, 2023 at 14:05: > > On Mon, Jun 26, 2023 at 4:31 AM Hongyu Wang wrote: > > > > Hi, > > > > For function with target attribute arch=*, current logic will set its > > tune to -mtune

[PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-25 Thread Hongyu Wang via Gcc-patches
Hi, For functions with different target attributes, current logic refuses to inline the callee when any arch or tune is mismatched. Relax the condition to honor just prefer_vector_width_type and other flags that may cause safety issues so the caller can get more optimization opportunities.

[PATCH] i386: Sync tune_string with arch_string for target attribute arch=*

2023-06-25 Thread Hongyu Wang via Gcc-patches
Hi, For a function with target attribute arch=*, current logic sets its tune to -mtune from the command line, so all target_clones get the same tuning flags, which would affect the performance of each clone. Override tune with arch if tune was not explicitly specified to get proper tuning flags for

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches
> Seems for many ICVs the default values are done through > gomp_default_icv_values, but that doesn't cover wait_policy. > For other vars, the defaults are provided through just initializers of > those vars on the var definitions, e.g.: > char *gomp_affinity_format_var = "level %L thread %i

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches
Hongyu Wang wrote on Wed, Mar 8, 2023 at 16:07: > > > I think the right spot to fix this would be instead in initialize_icvs, > > change the > > icvs->wait_policy = 0; > > in there to > > icvs->wait_policy = -1; > > That way it will be the default for all the devices, not just the > > initial one. > > It

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches
> I think the right spot to fix this would be instead in initialize_icvs, > change the > icvs->wait_policy = 0; > in there to > icvs->wait_policy = -1; > That way it will be the default for all the devices, not just the > initial one. It doesn't work, for the code that determines value of

[PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-07 Thread Hongyu Wang via Gcc-patches
Hi, When OMP_WAIT_POLICY is not specified, the current implementation leaves the ICV flag GOMP_ICV_WAIT_POLICY unset, so the global variable wait_policy keeps its uninitialized value. Set it to -1 when the flag is not specified to keep GOMP_SPINCOUNT behavior consistent with its description.
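
The tri-state convention described in this post can be sketched in C as follows. This is only an illustration, not libgomp's code: the function name default_spin_count, the enum names, and the spin-count values are assumptions made for the example. Only the convention that -1 stands for "OMP_WAIT_POLICY not specified" comes from the patch description.

```c
#include <assert.h>

/* Hypothetical names for the three states of the wait policy:
   -1 = OMP_WAIT_POLICY not specified, 0 = passive, 1 = active.  */
enum { WAIT_UNSPECIFIED = -1, WAIT_PASSIVE = 0, WAIT_ACTIVE = 1 };

/* Pick a default spin count when GOMP_SPINCOUNT is not set.
   The numeric values are illustrative assumptions.  */
static unsigned long long
default_spin_count (int wait_policy)
{
  assert (wait_policy >= WAIT_UNSPECIFIED && wait_policy <= WAIT_ACTIVE);

  if (wait_policy > 0)
    return 30000000000ULL;  /* active: keep spinning for a long time */
  if (wait_policy < 0)
    return 300000ULL;       /* unspecified: spin briefly, then block */
  return 0ULL;              /* passive: block immediately */
}
```

If the "unspecified" case were left at 0 instead of -1, it would be indistinguishable from an explicit passive policy and default_spin_count would return 0, which is the kind of mismatch the patch description points at.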

Re: [PATCH] i386: Avoid fma_chain for -march=alderlake and sapphirerapids.

2022-12-14 Thread Hongyu Wang via Gcc-patches
If there is no objection, I'm going to backport the m_SAPPHIRERAPIDS and m_ALDERLAKE change to GCC 12. Uros Bizjak via Gcc-patches wrote on Wed, Dec 7, 2022 at 15:11: > > On Wed, Dec 7, 2022 at 7:36 AM Hongyu Wang wrote: > > > > For Alderlake there is similar issue like PR 81616, enable > >

[PATCH] i386: Avoid fma_chain for -march=alderlake and sapphirerapids.

2022-12-06 Thread Hongyu Wang via Gcc-patches
For Alderlake there is an issue similar to PR 81616; enabling avoid_fma256_chain also benefits Intel's latest platforms, Alderlake and Sapphire Rapids. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * config/i386/x86-tune.def

Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-22 Thread Hongyu Wang via Gcc-patches
Hi Jeff, > The reversion of the loop-init.cc changes is fine. The x86 maintainers > will need to chime in on the rest. Consider installing the loop-init.cc > reversion immediately as the current state has regressed s390 and > potentially other targets. I've posted a patch in

Re: [PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-22 Thread Hongyu Wang via Gcc-patches
Hi, Segher and Richard > > Something in your patch was wrong, please fix that (or revert the > > patch). You should not have to touch config/rs6000/ at all. > > Sure something is wrong, but I think there's the opportunity to > simplify rs6000/ and s390x/, the only other two implementors of > the

Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-20 Thread Hongyu Wang via Gcc-patches
> It's not necessarily right. unroll_factor will be set as 1 when > -fno-unroll-loops, which is exactly what -fno-unroll-loops means. Not exactly: -fno-unroll-loops previously prevented the pass from running, and on the current trunk the pass still runs. Actually I think the implementation on

[PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-18 Thread Hongyu Wang via Gcc-patches
Hi, Following the discussion in PR 107602, -munroll-only-small-loops does not turn -funroll-loops on or off, and the current check in pass_rtl_unroll_loops::gate would cause -funroll-loops to not take effect. Revert the change about targetm.loop_unroll_adjust and apply the backend option change to

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-16 Thread Hongyu Wang via Gcc-patches
> I assume the "full permutation" condition is to avoid performing some > extra operations that would raise exception flags. If so, are there > conditions (-fno-trapping-math?) where the transformation would be safe > with arbitrary shuffles? Yes, that could be an alternative choice with

[PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-16 Thread Hongyu Wang via Gcc-patches
Hi, r13-3950-g071e428c24ee8c enables O2 small loop unrolling, but it breaks -fno-unroll-loops for rs6000 with loop_unroll_adjust hook. Adjust the option handling and target hook accordingly. Bootstrapped & regtested on powerpc64le-linux-gnu, OK for trunk? gcc/ChangeLog: PR

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-16 Thread Hongyu Wang via Gcc-patches
> Please use 'git commit --author' to indicate authorship of the patch > (or simply let me push it once approved). Yes, just change the author and push it. Thanks for your help!

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Hongyu Wang via Gcc-patches
> When emitting a compare-and-swap loop for @ref{__sync Builtins} > and @ref{__atomic Builtins} lacking a native instruction, optimize > for the highly contended case by issuing an atomic load before the > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > to save CPU power when
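
The loop shape described in this proposed wording can be sketched by hand with C11 atomics. This is an illustration of the idea only, not the code GCC emits: fetch_nand_relaxed_loop and cpu_relax are hypothetical names, and NAND is picked merely as an example of an operation with no single native x86 instruction.

```c
#include <assert.h>
#include <stdatomic.h>

/* Spin-wait hint: PAUSE on x86 with GCC-compatible compilers,
   otherwise a no-op.  */
static inline void
cpu_relax (void)
{
#if (defined (__x86_64__) || defined (__i386__)) && defined (__GNUC__)
  __builtin_ia32_pause ();
#endif
}

/* Atomically store ~(*ptr & val) and return the previous value.  */
static unsigned int
fetch_nand_relaxed_loop (_Atomic unsigned int *ptr, unsigned int val)
{
  assert (ptr != 0);

  unsigned int expected = atomic_load_explicit (ptr, memory_order_relaxed);
  for (;;)
    {
      /* Plain atomic load first: if another thread already changed the
         value, skip the expensive locked compare-exchange this round.  */
      unsigned int current = atomic_load_explicit (ptr, memory_order_relaxed);
      if (current != expected)
        {
          expected = current;
          cpu_relax ();
          continue;
        }

      unsigned int desired = ~(expected & val);
      if (atomic_compare_exchange_weak_explicit (ptr, &expected, desired,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
        return expected;  /* unchanged on success, so this is the old value */

      /* On failure the call refreshed 'expected' with the current value;
         without that refresh the loop could retry forever.  */
      cpu_relax ();
    }
}
```

Under contention the extra load keeps most retries on a shared cache line read instead of a locked read-modify-write, which is the trade-off the option description refers to.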

[PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-14 Thread Hongyu Wang via Gcc-patches
Hi, According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical. Adjust the wording according to the comments. Bootstrapped on x86_64-pc-linux-gnu, ok for trunk? gcc/ChangeLog: PR target/107676 * doc/invoke.texi: Reword the description of

Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-13 Thread Hongyu Wang via Gcc-patches
> Ok, Note GCC documents have been ported to sphinx, so you need to > adjust changes in invoke.texi to new sphinx files. Yes, this is the patch I'm going to check-in. Thanks. Hongtao Liu wrote on Mon, Nov 14, 2022 at 09:35: > > On Wed, Nov 9, 2022 at 9:29 AM Hongyu Wang wrote: > > > > > Although

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-10 Thread Hongyu Wang via Gcc-patches
> full_perm_p = count == nelts; > > I'll note that you should still check .encoding ().encoded_full_vector_p () > and only bother to check that case, that's a very simple check. > > > > > Attached updated patch. > > > > Richard Biener via Gcc-patches wrote on 2022

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-09 Thread Hongyu Wang via Gcc-patches
22:38 wrote: > > On Fri, Nov 4, 2022 at 7:44 AM Prathamesh Kulkarni via Gcc-patches > wrote: > > > > On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches > > wrote: > > > > > > Hi, > > > > > > This is a follow-up patch for PR98167

Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-08 Thread Hongyu Wang via Gcc-patches
> Although ix86_small_unroll_insns is coming from issue_rate, it's tuned > for codesize. > Make it exact as issue_rate and using factor * issue_width / > loop->ninsns may increase code size too much. > So I prefer to add those 2 parameters to the cost table for core > tunings instead of 1. Yes,

[PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-03 Thread Hongyu Wang via Gcc-patches
Hi, This is a follow-up patch for PR98167. The sequence c1 = VEC_PERM_EXPR (a, a, mask); c2 = VEC_PERM_EXPR (b, b, mask); c3 = c1 op c2 can be optimized to c = a op b; c3 = VEC_PERM_EXPR (c, c, mask) for all integer vector operations, and float operations with full
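
The equivalence stated in this post can be checked with a small scalar model in C. This is a sketch under assumptions: plain arrays stand in for vectors, addition stands in for "op", and every function name here is made up for the example; none of it is GCC's internal code.

```c
#include <assert.h>

#define NELTS 4

/* Model of VEC_PERM_EXPR (v, v, mask): with both inputs equal, element i
   of the result is v[mask[i] % NELTS].  */
static void
vec_perm (const int *v, const unsigned *mask, int *out)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = v[mask[i] % NELTS];
}

/* Element-wise "op"; addition stands in for an integer vector operation.  */
static void
vec_add (const int *a, const int *b, int *out)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = a[i] + b[i];
}

/* Original sequence: permute both inputs, then operate.  */
static void
perm_then_add (const int *a, const int *b, const unsigned *mask, int *out)
{
  int c1[NELTS], c2[NELTS];
  vec_perm (a, mask, c1);
  vec_perm (b, mask, c2);
  vec_add (c1, c2, out);
}

/* Optimized sequence: operate once, then permute once.  */
static void
add_then_perm (const int *a, const int *b, const unsigned *mask, int *out)
{
  int c[NELTS];
  vec_add (a, b, c);
  vec_perm (c, mask, out);
}

/* Assert that both sequences produce the same elements.  */
static void
check_same_result (const int *a, const int *b, const unsigned *mask)
{
  int r1[NELTS], r2[NELTS];
  perm_then_add (a, b, mask, r1);
  add_then_perm (a, b, mask, r2);
  for (int i = 0; i < NELTS; i++)
    assert (r1[i] == r2[i]);
}
```

The rewritten form saves one permute. Because the operation is applied element-wise and both permutes use the same mask, the result is the same even when the mask repeats an index, at least for integer operations.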

[PATCH V2] Enable small loop unrolling for O2

2022-11-01 Thread Hongyu Wang via Gcc-patches
Hi, this is the updated patch of https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html, which uses targetm.loop_unroll_adjust as gate to enable small loop unroll. This patch does not change rs6000/s390 since I don't have machine to test them, but I suppose the default behavior is

Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-28 Thread Hongyu Wang via Gcc-patches
> Ugh, that's all quite ugly and unmaintainable, no? Agreed, I have the same feeling. > I'm quite sure that if this works it's not by intention. Doesn't this > also disable > register renaming and web when the user explicitly specifies -funroll-loops? > > Doesn't this change -funroll-loops

Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-26 Thread Hongyu Wang via Gcc-patches
> Does this setting benefit all targets? IIRC, in the past all > benchmarks also enabled -funroll-loops, so it looks to me that > unrolling small loops by default is a good compromise. The idea to unroll small loops can be explained from the x86 micro-architecture. Modern x86 processors have

[PATCH] i386: Enable small loop unrolling for O2

2022-10-25 Thread Hongyu Wang via Gcc-patches
Hi, Inspired by rs6000 and s390 port changes, this patch enables loop unrolling for small loops at O2 by default. The default behavior is to unroll loops with an unknown trip count and fewer than 4 insns once. This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with 1.2% codesize
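
What "unrolling once" means for a tiny loop with an unknown trip count can be shown at the source level. This is a hand-written C illustration with made-up function names; the RTL unroller works on insns and may handle the leftover iteration differently.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one compare-and-branch per element.  */
static long
sum_rolled (const int *p, size_t n)
{
  assert (p != NULL || n == 0);

  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += p[i];
  return s;
}

/* Unrolled once by hand: the body appears twice per iteration, so the
   loop overhead is paid about half as often.  Since n is not known at
   compile time, a final step handles an odd leftover element.  */
static long
sum_unrolled_once (const int *p, size_t n)
{
  assert (p != NULL || n == 0);

  long s = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      s += p[i];
      s += p[i + 1];
    }
  if (i < n)
    s += p[i];
  return s;
}
```

Both functions return the same sum for any n; the second trades a little code size for fewer branches, which matches the size and speed figures quoted in the post in spirit.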

[PATCH] Support Intel AVX-IFMA

2022-10-19 Thread Hongyu Wang via Gcc-patches
Hi, Here is the updated patch that aligns the implementation with AVX-VNNI and corrects some spelling errors in the AVX512IFMA pattern. Bootstrapped/regtested on x86_64-pc-linux-gnu and sde. Ok for trunk? gcc/ * common/config/i386/i386-common.cc (OPTION_MASK_ISA_AVXIFMA_SET,

Re: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Hongyu Wang via Gcc-patches
Sorry, should be between Boris Kolpackov and Dave Korn. Hongyu Wang wrote on Mon, Jun 27, 2022 at 16:29: > > According to the official guide, please sort your last name in > alphabetical order, which means you should put your name between > > Dave Korn > Julia Koval > > Kong, Lingling via Gcc-patches

Re: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Hongyu Wang via Gcc-patches
According to the official guide, please sort your last name in alphabetical order, which means you should put your name between Dave Korn and Julia Koval. Kong, Lingling via Gcc-patches wrote on Mon, Jun 27, 2022 at 16:05: > > Hi, > > I want to add myself in MAINTAINERS for write after approval. > > OK for

Re: [PATCH] i386: Add a constraint for absolute symboilc address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
> -fpic will break compilation with "i" constraint. Ah, yes. But "X" is like no constraint, shouldn't we provide something similar to "S" in aarch64 and riscv? I think it is better to constrain the operand to constant symbols rather than allowing everything. Uros Bizjak wrote on Wed, May 18, 2022 at 18:18: >

Re: [PATCH] i386: Add a constraint for absolute symboilc address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
Oh, I just found that asm ("%p0" :: "i"(addr)); also works on -mcmodel=large in this case, please ignore this patch. Thanks. Uros Bizjak via Gcc-patches wrote on Wed, May 18, 2022 at 17:46: > > On Wed, May 18, 2022 at 9:32 AM Hongyu Wang wrote: > > > > Hi, > > > > This patch adds a constraint "Ws" to allow

[PATCH] i386: Add a constraint for absolute symboilc address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
Hi, This patch adds a constraint "Ws" to allow absolute symbolic address for either function or variable. This also works under -mcmodel=large. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} Ok for master? gcc/ChangeLog: PR target/105576 * config/i386/constraints.md

Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-06 Thread Hongyu Wang via Gcc-patches
> I don't think *_os_support calls should be removed. IIRC, > __builtin_cpu_supports function checks if the feature is supported by > CPU, whereas *_os_supports calls check via xgetbv if OS supports > handling of new registers. avx_os_support is like avx_os_support (void) { unsigned int eax,

Re: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Hongyu Wang via Gcc-patches
> +(define_split > + [(set (reg:CCZ FLAGS_REG) > + (compare:CCZ (unspec:SI > + [(eq:VI1_AVX2 > + (match_operand:VI1_AVX2 0 "vector_operand") > + (match_operand:VI1_AVX2 1 "const0_operand"))] > +

Re: [PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-22 Thread Hongyu Wang via Gcc-patches
> Please add the corresponding intrinsic test in sse-14.c Sorry for forgetting this part. Updated patch. Thanks. Hongtao Liu via Gcc-patches wrote on Fri, Apr 22, 2022 at 16:49: > > On Fri, Apr 22, 2022 at 4:12 PM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > A

[PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-22 Thread Hongyu Wang via Gcc-patches
Hi, Add missing macro under O0 and adjust macro format for scalf intrinsics. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master and backport to GCC 9/10/11? gcc/ChangeLog: PR target/105339 * config/i386/avx512fintrin.h (_mm512_scalef_round_pd): Add

Re: [PATCH] i386: Correct target attribute for crc32 intrinsics

2022-04-15 Thread Hongyu Wang via Gcc-patches
> This test should not be changed, it correctly reports ISA mismatch. It > even passes -mno-crc32. The error message changes from "needs isa option -mcrc32" to "target specific option mismatch" with the #pragma change. I see many of our intrinsic would throw such error, it has been a long term

[PATCH] i386: Correct target attribute for crc32 intrinsics

2022-04-14 Thread Hongyu Wang via Gcc-patches
Hi, Compiling _mm_crc32_u8/16/32/64 intrinsics with -mcrc32 hits a target specific option mismatch. Correct the target pragma to fix it. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master and backport to GCC 11? gcc/ChangeLog: * config/i386/smmintrin.h: Correct target

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
> > > + && TARGET_STV && TARGET_SSE2 && optimize > 1 > > > > + && optimize_function_for_speed_p (cfun)); > > > > > > ... and use it here instead of referencing 'cfun' > > > > > > Richard. >

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
6e9ed > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c > > @@ -0,0 +1,23 @@ > > +/* PR target/105034 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-Os -msse4.1" } */ > > + > > +#define max(a,b) (((a) > (b))? (a) : (b)) > > +

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
(a) : (b)) +#define min(a,b) (((a) < (b))? (a) : (b)) + +int foo(int x) +{ + return max(x,0); +} + +int bar(int x) +{ + return min(x,0); +} + +unsigned int baz(unsigned int x) +{ + return min(x,1); +} + +/* { dg-final { scan-assembler-not "xmm" } } */ -- 2.18.1 Richard Biener via Gc

[PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-13 Thread Hongyu Wang via Gcc-patches
Hi, From the -Os point of view, stv converts scalar registers to vector mode, which introduces extra reg conversion and increases instruction size. Disabling stv under optimize_size avoids such code size increase, with no need to touch ix86_size_cost, which has not been tuned for a long time.

[PATCH] i386: Fix infinite loop under -mrelax-cmpxchg-loop [PR 103069]

2022-04-13 Thread Hongyu Wang via Gcc-patches
Hi, For -mrelax-cmpxchg-loop which relaxes atomic_fetch_ loops, there is a missing set to %eax when compare fails, which would result in an infinite loop in some benchmarks. Add a set to %eax to avoid it. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} Ok for master? gcc/ChangeLog: PR

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches
> > Is it possible to create a test case that gas would throw an error for > > invalid operands? > > You can use -ffix-xmmN to disable XMM0-15. I mean can we create an intrinsic test for this PR that produces xmm16-31? And the -ffix-xmmN is an option for assembler or compiler? I didn't find it in

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches
Is it possible to create a test case that gas would throw an error for invalid operands? H.J. Lu via Gcc-patches wrote on Sat, Mar 26, 2022 at 04:50: > > Since KL instructions have no AVX512 version, replace the "v" register > constraint with the "x" register constraint. > > PR target/105058 >

[PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi, here is the patch with force_reg before lowpart_subreg. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Use masked vmovss to perform same

Re: [PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
are strictly V8HF operands from builtin input. I suppose there should be no chance to input a different size subreg for the expander, otherwise (__v8hf) convert in builtin would fail first. Hongtao Liu via Gcc-patches wrote on Mon, Mar 21, 2022 at 20:53: > > On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wang via

[PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi, For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Use masked vmovss to perform the same operation, which omits the higher bits of the mask. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master?

Re: [PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-20 Thread Hongyu Wang via Gcc-patches
> Hongtao Liu via Gcc-patches wrote on Mon, Mar 21, 2022 at 09:08: > > > > > > On Sat, Mar 19, 2022 at 8:09 AM Hongyu Wang via Gcc-patches > > > wrote: > > > > > > > > Hi, > > > > > > > > For complex scalar intrinsic like _mm_ma

Re: [PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-20 Thread Hongyu Wang via Gcc-patches
Mon 09:08 wrote: > > On Sat, Mar 19, 2022 at 8:09 AM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > For complex scalar intrinsic like _mm_mask_fcmadd_sch, the > > mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. > > >

[PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-18 Thread Hongyu Wang via Gcc-patches
Hi, For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure that it is bound to the lowest byte. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: PR target/104978 * config/i386/sse.md

[PATCH] AVX512FP16: Fix masm=intel output for vfc?(madd|mul)csh [PR 104977]

2022-03-18 Thread Hongyu Wang via Gcc-patches
Hi, This patch fixes a typo in the subst for the scalar complex mask_round operand. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: PR target/104977 * config/i386/sse.md (avx512fp16_fmash_v8hf): Correct round operand for intel

[PATCH] AVX512FP16: Fix vcvt[u]si2sh runtime tests for Solaris

2022-03-01 Thread Hongyu Wang via Gcc-patches
Use standard C type instead of __int64_t which doesn't work on Solaris. Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix. gcc/testsuite/ChangeLog: PR target/104724 * gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Use long long instead of __int64_t.

[PATCH] i386: Fix pr104551 testcase for solaris [PR 104726]

2022-03-01 Thread Hongyu Wang via Gcc-patches
Use the avx2-check mechanism to avoid an illegal instruction on non-AVX2 targets. Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix. gcc/testsuite/ChangeLog: PR target/104726 * gcc.target/i386/pr104551.c: Use avx2-check.h. --- gcc/testsuite/gcc.target/i386/pr104551.c

[PATCH] i386: Fix V8HF vector init under -mno-avx [PR 104664]

2022-02-28 Thread Hongyu Wang via Gcc-patches
Hi, For V8HFmode vector init with HFmode, do not directly emit a V8HF move with a subreg, which may cause reload to assign a general register to the move source. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: PR target/104664 *

[PATCH] AVX512F: Add helper enumeration for ternary logic intrinsics.

2022-02-25 Thread Hongyu Wang via Gcc-patches
Hi, This patch syncs with the LLVM change in https://reviews.llvm.org/D120307 to add an enumeration and truncate the immediate to unsigned char, so users can use ~ on immediates. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * config/i386/avx512fintrin.h
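
To illustrate why truncating the immediate matters, here is a minimal sketch in plain C. The names `TERN_A`, `TERN_B`, `TERN_C`, `ternlog_imm` and `ternlog_bit` are hypothetical stand-ins, not the identifiers added by the patch; the constants are the usual truth-table encodings of the three ternary-logic inputs, and `ternlog_bit` is only a one-bit software model of the lookup.

```c
/* Hypothetical stand-ins for a ternary-logic helper enumeration.
   Each constant is the 8-bit truth table of one input operand.  */
enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

/* ~TERN_A is an int with all high bits set (it is negative), so it
   does not fit an 8-bit immediate until it is truncated.  */
unsigned char ternlog_imm(int expr)
{
    return (unsigned char) expr;
}

/* One-bit model of the lookup: the three input bits select one bit
   of the 8-bit immediate.  */
unsigned ternlog_bit(unsigned char imm, unsigned a, unsigned b, unsigned c)
{
    unsigned index = (a << 2) | (b << 1) | c;
    return (imm >> index) & 1u;
}
```

With the truncation in place, an expression such as `~TERN_A & TERN_B` can be written directly and still yields a valid 8-bit immediate.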

[PATCH] i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR 103069]

2022-02-21 Thread Hongyu Wang via Gcc-patches
Hi, cmpxchg is commonly used in spin loops, and some user code, such as pthread, uses cmpxchg directly as the loop condition, which causes heavy cache bouncing. This patch extends the previous implementation to relax all cmpxchg instructions under -mrelax-cmpxchg-loop with an extra atomic load,

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-15 Thread Hongyu Wang via Gcc-patches
Thanks for the suggestion, here is the updated patch that survived bootstrap/regtest. > Please note reg_mentioned_p in the above condition. This function > returns nonzero if register op0 appears somewhere within op1 and is > critical for the correct operation of your patch. I added

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-14 Thread Hongyu Wang via Gcc-patches
> Are there any technical obstacles to introduce subst to > define_{,insn_and_}split? gccint says: define_subst can be used only in define_insn and define_expand, it cannot be used in other expressions (e.g. in define_insn_and_split). I have no idea how to implement it in the current infrastructure.

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
> > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e.g. tzcnt false dep is solved: > > Actually we have considered such approach before, but we found we need > to break original define_insn to remove the mask/rounding subst, > since define_split

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
> No, the approach is wrong. You have to solve output clearing on RTL > level, please look at how e.g. tzcnt false dep is solved: Actually, we have considered such an approach before, but we found that we need to break the original define_insn to remove the mask/rounding subst, since define_split could not

[PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-12 Thread Hongyu Wang via Gcc-patches
From: wwwhhhyyy Hi, For the GoldenCove micro-architecture, force-insert a zero idiom in the asm template to break the false dependency on the dest register for several insns. The related insns are: VPERM/D/Q/PS/PD, VRANGEPD/PS/SD/SS, VGETMANTSS/SD/SH, VGETMANTPS/PD (mem version only), VPMULLQ, VFMULCSH/PH

[PATCH] i386: Fix wrong codegen for -mrelax-cmpxchg-loop

2021-11-17 Thread Hongyu Wang via Gcc-patches
Hi Uros, the -mrelax-cmpxchg-loop option introduced by PR 103069/r12-5265 would produce an infinite loop. The correct code should be:

    .L84:
        movl    (%rdi), %ecx
        movl    %eax, %edx
        orl     %esi, %edx
        cmpl    %eax, %ecx
        jne     .L82
        lock cmpxchgl %edx,

Re: [PATCH] PR target/103069: Relax cmpxchg loop for x86 target

2021-11-15 Thread Hongyu Wang via Gcc-patches
Thanks for your review, this is the patch I'm going to check in. Uros Bizjak via Gcc-patches wrote on Mon, Nov 15, 2021 at 4:25 PM: > > On Sat, Nov 13, 2021 at 3:34 AM Hongyu Wang wrote: > > > > Hi, > > > > From the CPU's point of view, getting a cache line for writing is more > > expensive than reading.

[PATCH] PR libgomp/103068: Optimize gomp_mutex_lock_slow for x86 target

2021-11-13 Thread Hongyu Wang via Gcc-patches
Hi, From the CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line

[PATCH] PR target/103069: Relax cmpxchg loop for x86 target

2021-11-12 Thread Hongyu Wang via Gcc-patches
Hi, From the CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line
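
The idea of checking with a plain load before attempting the compare and swap can be sketched with C11 atomics. This is only an illustration under assumptions: the function name `fetch_or_relaxed_loop` is hypothetical, the memory orders are one reasonable choice, and the code the compiler emits under -mrelax-cmpxchg-loop is not claimed to match it.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically OR `bits` into *p and return the
   previous value, using a load-before-cmpxchg loop.  */
unsigned fetch_or_relaxed_loop(_Atomic unsigned *p, unsigned bits)
{
    unsigned expected = atomic_load_explicit(p, memory_order_relaxed);

    for (;;) {
        /* Read-only check: a load lets the cache line stay shared,
           whereas a locked cmpxchg requests it for writing.  */
        unsigned current = atomic_load_explicit(p, memory_order_relaxed);
        if (current != expected) {
            /* The value changed under us; refresh and retry without
               issuing the expensive locked instruction.  */
            expected = current;
            continue;
        }

        /* The value looks unchanged, so the compare and swap is
           likely to succeed.  On failure `expected` is updated.  */
        if (atomic_compare_exchange_weak_explicit(
                p, &expected, expected | bits,
                memory_order_seq_cst, memory_order_relaxed))
            return expected;
    }
}
```

A production spin loop would typically also insert a pause hint on the retry path; that is omitted here to keep the sketch portable.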

Re: [PATCH] i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.

2021-11-03 Thread Hongyu Wang via Gcc-patches
> Could you add a testcase for that? Yes, updated patch. Hongtao Liu via Gcc-patches wrote on Thu, Nov 4, 2021 at 10:25 AM: > > On Thu, Nov 4, 2021 at 9:19 AM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > _tile_loadd, _tile_stored, _tile_streamloadd int

[PATCH] i386: Auto vectorize sdot_prod, usdot_prod with VNNI instruction.

2021-11-03 Thread Hongyu Wang via Gcc-patches
Hi, AVX512VNNI/AVXVNNI has vpdpwssd for HImode and vpdpbusd for QImode, so adjust the HImode sdot_prod expander and add a QImode usdot_prod expander to enhance vectorization of dot products. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: *
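
For orientation, the loops below show the scalar shape of the two dot products mentioned above: signed 16-bit by signed 16-bit, and unsigned 8-bit by signed 8-bit, each accumulated into 32 bits. The function names `sdot16` and `usdot8` are made up for this sketch, and whether a given GCC version actually vectorizes them with VNNI instructions depends on the options and target, which is not asserted here.

```c
#include <stdint.h>

/* Signed 16-bit dot product accumulated into 32 bits.  */
int32_t sdot16(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t) a[i] * (int32_t) b[i];
    return sum;
}

/* Unsigned-by-signed 8-bit dot product accumulated into 32 bits.  */
int32_t usdot8(const uint8_t *a, const int8_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t) a[i] * (int32_t) b[i];
    return sum;
}
```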

[PATCH] i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.

2021-11-03 Thread Hongyu Wang via Gcc-patches
Hi, The _tile_loadd, _tile_stored and _tile_streamloadd intrinsics are defined as macros, so their parameters should be wrapped in parentheses to accept expressions. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. OK for master and backport to GCC11 branch? gcc/ChangeLog: *
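
The general pitfall with unparenthesized macro parameters can be shown with a small sketch. The macros `SCALE_UNSAFE` and `SCALE_SAFE` are hypothetical and unrelated to the real AMX-TILE definitions; they only demonstrate how an expression argument is misparsed when the parameter is not wrapped.

```c
/* Hypothetical macros for illustration only; these are not the
   definitions of the AMX-TILE intrinsics.  */
#define SCALE_UNSAFE(x) (x * 2)    /* parameter not parenthesized */
#define SCALE_SAFE(x)   ((x) * 2)  /* parameter parenthesized */

/* Expands to (a + b * 2): precedence splits the argument.  */
int scale_unsafe(int a, int b)
{
    return SCALE_UNSAFE(a + b);
}

/* Expands to ((a + b) * 2): the argument is kept as one expression.  */
int scale_safe(int a, int b)
{
    return SCALE_SAFE(a + b);
}
```

With a = 1 and b = 2, the unsafe form evaluates 1 + 2 * 2 while the safe form evaluates (1 + 2) * 2.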
