[committed] i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

2024-06-11 Thread Uros Bizjak
For TARGET_CMOV targets emit insn sequence involving conditional move.

.SAT_ADD:

        addl    %esi, %edi
        movl    $-1, %eax
        cmovnc  %edi, %eax
        ret

.SAT_SUB:

        subl    %esi, %edi
        movl    $0, %eax
        cmovnc  %edi, %eax
        ret

PR target/112600
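As a rough illustration of what these sequences compute, the two operations can be written in C as a wrapping add or subtract followed by a select on the carry/borrow condition. This is only a sketch of the semantics; the function names `sat_add_select` and `sat_sub_select` are hypothetical and are not taken from the patch, and no claim is made here about the code any particular compiler version emits for them.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned saturating add: the sum wraps on overflow, and a wrapped
   sum is always smaller than either operand, so "sum < x" plays the
   role of the carry flag.  */
unsigned sat_add_select(unsigned x, unsigned y)
{
    unsigned sum = x + y;
    return sum < x ? UINT_MAX : sum;   /* carry -> all ones */
}

/* Unsigned saturating subtract: a borrow occurs exactly when x < y,
   in which case the result is clamped to zero.  */
unsigned sat_sub_select(unsigned x, unsigned y)
{
    unsigned diff = x - y;
    return x < y ? 0u : diff;          /* borrow -> zero */
}

/* Small self-check of the boundary cases.  */
void check_sat_select(void)
{
    assert(sat_add_select(1u, 2u) == 3u);
    assert(sat_add_select(UINT_MAX, 1u) == UINT_MAX);
    assert(sat_sub_select(5u, 3u) == 2u);
    assert(sat_sub_select(1u, 2u) == 0u);
}
```

The select form mirrors the shape of the sequences above: compute the wrapped result, load the saturation constant, then pick one of the two based on the carry.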

[committed] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak
The following testcase:

unsigned sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
        movl    %edi, %edx
        xorl    %eax, %eax
        subl    %esi, %edx
        cmpl    %esi, %edi
        setnb
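The mask idiom in this testcase can be cross-checked against a formulation based on `__builtin_sub_overflow`, which is a GCC/Clang builtin. The sketch below is only an illustration; `sub_sat_mask`, `sub_sat_builtin` and `check_sub_sat` are hypothetical names introduced here, not part of the patch or its testsuite.

```c
#include <assert.h>
#include <limits.h>

/* Mask form, as in the testcase: -(x >= y) is all ones when no
   borrow occurs and zero otherwise.  */
unsigned sub_sat_mask(unsigned x, unsigned y)
{
    unsigned res = x - y;
    res &= -(unsigned)(x >= y);
    return res;
}

/* Overflow-builtin form: the builtin returns true when x - y does
   not fit in the unsigned result, i.e. on borrow.  */
unsigned sub_sat_builtin(unsigned x, unsigned y)
{
    unsigned z;
    return __builtin_sub_overflow(x, y, &z) ? 0u : z;
}

/* Compare both forms on a handful of boundary values.  */
void check_sub_sat(void)
{
    static const unsigned vals[] = { 0u, 1u, 2u, 0x7fffffffu,
                                     0x80000000u, UINT_MAX - 1u, UINT_MAX };
    const unsigned n = sizeof vals / sizeof vals[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(sub_sat_mask(vals[i], vals[j])
                   == sub_sat_builtin(vals[i], vals[j]));
}
```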

Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
On Sat, Jun 8, 2024 at 2:09 PM Gerald Pfeifer wrote: > > On Sat, 8 Jun 2024, Uros Bizjak wrote: > > gcc/ChangeLog: > > > > * config/i386/i386.md (usadd3): New expander. > > (x86_movcc_0_m1_neg): Use SWI mode iterator. > > When you write "comm

[committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
The following testcase:

unsigned add_sat(unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
        addl    %esi, %edi
        jc      .L3
        movl    %edi, %eax
        ret
.L3:
        orl     $-1, %eax
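The same `__builtin_add_overflow` pattern works for narrower types, which makes it practical to verify the saturating behaviour exhaustively. The following is a minimal sketch under that assumption; `add_sat_u8` and `check_add_sat_u8` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit variant of the testcase: the builtin reports overflow when
   the mathematically exact sum does not fit in uint8_t.  */
uint8_t add_sat_u8(uint8_t x, uint8_t y)
{
    uint8_t z;
    return __builtin_add_overflow(x, y, &z) ? UINT8_MAX : z;
}

/* Exhaustively compare against a reference computed in a wider type.  */
void check_add_sat_u8(void)
{
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        for (unsigned y = 0; y <= UINT8_MAX; y++) {
            unsigned wide = x + y;
            unsigned expected = wide > UINT8_MAX ? UINT8_MAX : wide;
            assert(add_sat_u8((uint8_t)x, (uint8_t)y) == expected);
        }
}
```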

Re: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:48 AM Evgeny Karpov wrote: > > This patch extracts the ix86 implementation for expanding a SYMBOL > into its corresponding dllimport, far-address, or refptr symbol. > It will be reused in the aarch64-w64-mingw32 target. > The implementation is copied as is from

Re: [x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle wrote: > > > This patch addresses PR target/115351, which is a code quality regression > on x86 when passing floating point complex numbers. The ABI considers > these arguments to have TImode, requiring interunit moves to place the > FP values (which

[committed] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak
PR middle-end/112600 gcc/testsuite/ChangeLog: * gcc.target/i386/pr112600-2a.c: New test. * gcc.target/i386/pr112600-2b.c: New test. Tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c new

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
T_SUB via __builtin_sub_overflow (and in similar way for saturated add). Uros. > > Pan > > -Original Message- > From: Uros Bizjak > Sent: Wednesday, June 5, 2024 4:46 PM > To: Li, Pan2 > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; > juzhe.zh...@rivai.ai; kito.ch...@gmail.co

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2 wrote: > > > I see. x86 doesn't have scalar saturating instructions, so the scalar > > version indeed can't be converted. > > > I will amend x86 testcases after the vector part of your patch is committed. > > Thanks for the confirmation. Just curious, the

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2 wrote: > > > Is the above testcase correct? You need "(x + y)" as the first term. > > Thanks for comments, should be copy issue here, you can take SAT_SUB (x, y) > => (x - y) & (-(TYPE)(x >= y)) or below template for reference. > > +#define

Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 9:38 AM Li, Pan2 wrote: > > Thanks Richard, will commit after the rebased pass the regression test. > > Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, June 5, 2024 3:19 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai;

Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-05 Thread Uros Bizjak
On Tue, Jun 4, 2024 at 10:10 PM Evgeny Karpov wrote: > > Richard and Uros, could you please review the changes for v2? LGTM for the generic x86 part, OS-specific part (cygming) should also be reviewed by OS port maintainer (CC'd). Thanks, Uros. > Additionally, we have detected an issue with

[committed] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak
PR target/115321 gcc/ChangeLog: * config/i386/i386.md (bswapsi2): Force operand 1 to a register also for !TARGET_BSWAP. gcc/testsuite/ChangeLog: * gcc.target/i386/pr115321.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,m32}. Uros. diff --git

Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt wrote: > > W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for > movdfcc/movsfcc, and could possibly fail cost comparison. Increase > branch cost could hurt performance for other modes, so specially add > some preference for floating point

Re: [PATCH 39/52] i386: New hook implementation ix86_c_mode_for_floating_type

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin wrote: > > This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE > defines in i386 port, and add new port specific hook > implementation ix86_c_mode_for_floating_type. > > gcc/ChangeLog: > > * config/i386/i386.cc

[committed] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak
any_divmod instructions are modelled with invalid RTX: [(set (match_operand:DI 0 "register_operand" "=c") (sign_extend:DI (match_operator:SI 3 "divmod_operator" [(match_operand:DI 1 "register_operand" "a") (match_operand:DI 2

[committed] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak
Introduce *bswaphi2 instruction pattern and enable bswaphi2 expander also for non-movbe targets. The testcase: unsigned short bswap8 (unsigned short val) { return ((val & 0xff00) >> 8) | ((val & 0xff) << 8); } now expands through bswaphi2 named expander. Rewrite bswaphi_lowpart insn pattern
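For reference, the shift-and-mask expression in this testcase is a 16-bit byte swap. The sketch below compares it against `__builtin_bswap16` (a GCC/Clang builtin) for every 16-bit input. It assumes `unsigned short` is 16 bits wide, and `check_bswap16_equivalence` is a hypothetical helper name, not something from the patch.

```c
#include <assert.h>

/* The testcase from the message: swap the two bytes of a 16-bit value.  */
unsigned short bswap8(unsigned short val)
{
    return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

/* Check that the open-coded form matches the builtin on all inputs
   (assumes a 16-bit unsigned short).  */
void check_bswap16_equivalence(void)
{
    for (unsigned v = 0; v <= 0xffffu; v++)
        assert(bswap8((unsigned short)v)
               == __builtin_bswap16((unsigned short)v));
}
```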

[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value between XMM and GPR register sets for SSE4.1 x86_32 targets in order to avoid spilling the value to stack. The load from _Atomic location a improves from:

        movq    a, %xmm0
        movq    %xmm0, (%esp)
        movl    (%esp), %eax

Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt wrote: > > > IMO, there is no need for CONST_INT_P condition, we should also allow > > symbol_ref, label_ref and const (all allowed by > > x86_64_immediate_operand predicate), these all decay to an immediate > > value. > > Changed. > > Bootstrapped and

Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-05-28 Thread Uros Bizjak
On Mon, May 27, 2024 at 10:33 AM MayShao wrote: > > From: mayshao > > Hi all: > This patch enables -march/-mtune=shijidadao, costs and tunings are set > according to the characteristics of the processor. > > Bootstrapped /regtested X86_64. > > Ok for trunk? OK. Thanks, Uros. > BR

Re: [PATCH] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 4:48 AM liuhongt wrote: > > For MEM, rtx_cost iterates each subrtx, and adds up the costs, > so for MEM (reg) and MEM (reg + 4), the former costs 5, > the latter costs 9, it is not accurate for x86. Ideally > address_cost should be used, but it reduce cost too much. > So

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Thu, May 23, 2024 at 7:53 PM Evgeny Karpov wrote: > > > Thursday, May 23, 2024 10:35 AM > Uros Bizjak wrote: > > > Richard Sandiford wrote: > > > > > > > This looks good to me apart from a couple of very minor comments > > > > below, bu

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Thu, May 23, 2024 at 10:35 AM Uros Bizjak wrote: > > On Wed, May 22, 2024 at 4:32 PM Evgeny Karpov > wrote: > > > > Wednesday, May 22, 2024 1:06 PM > > Richard Sandiford wrote: > > > > > This looks good to me apart from a couple of very minor c

Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Wed, May 22, 2024 at 4:32 PM Evgeny Karpov wrote: > > Wednesday, May 22, 2024 1:06 PM > Richard Sandiford wrote: > > > This looks good to me apart from a couple of very minor comments below, but > > please get approval from the x86 maintainers as well. In particular, they > > might > >

Re: [x86_64 PATCH] Correct insn_cost of movabsq.

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 5:15 PM Roger Sayle wrote: > > This single line patch fixes a strange quirk/glitch in i386's rtx_costs, > which considers an instruction loading a 64-bit constant to be significantly > cheaper than loading a 32-bit (or smaller) constant. > > Consider the two functions: >

Re: [PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 10:29 AM Kong, Lingling wrote: > > > I wonder if we can use "define_subst" to conditionally add flags clobber > > for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the > > insn > > w/ and w/o the clobber, so I think it is worth considering this

Re: [PATCH v3] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 11:01 AM Haochen Jiang wrote: > > Hi all, > > This is the v3 patch to fix PR115069. The new testcase has passed. > > Changes in v3: > - Simplify the testcase. > > Changes in v2: > - Add a testcase. > - Change the comment for the early exit. > > Thx, > Haochen > >

Re: [PATCH 2/2] [x86] Adjust rtx_cost for MEM to enable more simplification

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 7:13 AM liuhongt wrote: > > For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or > variants in ix86_vector_duplicate_simode_const. > Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little > bit larger than broadcast. > > Bootstrapped and

Re: [PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 8:16 AM Haochen Jiang wrote: > > Hi all, > > Since vpermq is really slow, we should avoid using it when it is > the only instruction could be used for ix86_expand_vecop_qihi2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? > > Thx, > Haochen > >

[PATCH] i386: Rename sat_plusminus expanders to standard names [PR112600]

2024-05-17 Thread Uros Bizjak
Rename _3 expander to a standard ssadd, usadd, sssub and ussub name to enable corresponding optab expansion. Also add named expander for MMX modes. PR middle-end/112600 gcc/ChangeLog: * config/i386/mmx.md (3): New expander. * config/i386/sse.md (_3): Rename expander to 3.

Re: [PATCH] [x86] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 12:05 PM liuhongt wrote: > > pshufb is available under TARGET_SSSE3, so > ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3. > w/o TARGET_SSSE3, if we set one_operand_p to true, > ix86_expand_vec_perm_const_1 could return false. > > With the patch under

Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling wrote: > > From: Hongyu Wang > > APX NF(no flags) feature implements suppresses the update of status flags for > arithmetic operations. > > For NF add, it is not clear whether NF add can be faster than lea. If so, the > pattern needs to be

Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling wrote: > > From: Hongyu Wang > > APX NF(no flags) feature implements suppresses the update of status flags for > arithmetic operations. > > For NF add, it is not clear whether NF add can be faster than lea. If so, the > pattern needs to be

Re: [PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Uros Bizjak
On Thu, May 9, 2024 at 11:12 AM Levy Hsu wrote: > > Hi All > > We've introduced a new subroutine in ix86_expand_vec_perm_const_1 > to optimize vector shifting for the V16QI type on x86. > This patch uses a three-instruction sequence psrlw, psllw, and por > to handle specific vector shuffle

Re: [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-08 Thread Uros Bizjak
On Wed, May 8, 2024 at 4:44 AM Levy Hsu wrote: > > PR target/107563 > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New > subroutine. > (ix86_expand_vec_perm_const_1): New Entry. > > gcc/testsuite/ChangeLog: > > *

Re: [PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-06 Thread Uros Bizjak
On Mon, May 6, 2024 at 5:20 AM Hongtao Liu wrote: > > CC uros. > > On Mon, May 6, 2024 at 11:03 AM Kong, Lingling > wrote: > > > > Hi, > > (if_then_else:SI (eq (reg:CCZ 17 flags) > > (const_int 0 [0])) > > (reg/v:SI 101 [ e ]) > > (reg:SI 102)) > > The cost is 8 for the rtx, the

Re: [PATCH] [x86] Adjust alternative *k to ?k for avx512 mask in zero_extend patterns

2024-04-28 Thread Uros Bizjak
On Sun, Apr 28, 2024 at 7:47 AM liuhongt wrote: > > So when both source operand and dest operand require avx512 MASK_REGS, RA > can allocate MASK_REGS register instead of GPR to avoid reload it from > GPR to MASK_REGS. > It's similar as what did for logic patterns. > > Bootstrapped and regtested

Re: [PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Uros Bizjak
On Fri, Apr 26, 2024 at 11:03 AM Haochen Jiang wrote: > > Hi all, > > The array index should not be over 8 for v8hi, or it will fail > under -O0 or using -fstack-protector. > > This patch aims to fix that, which is mentioned in PR110621. > > Commit as obvious and backport to GCC13. > > Thx, >

Re: [PATCH] i386: Avoid =, r, r andn double-word alternative for ia32 [PR114810]

2024-04-23 Thread Uros Bizjak
On Tue, Apr 23, 2024 at 5:50 PM Jakub Jelinek wrote: > > Hi! > > As discussed in the PR, on ia32 with its 8 GPRs, where 1 is always fixed > and other 2 often are as well having an alternative which needs 3 > double-word registers is just too much for RA. > The following patch splits that

Re: [PATCH] [testsuite] [i386] add -msse2 to tests that require it

2024-04-17 Thread Uros Bizjak
On Tue, Apr 16, 2024 at 5:52 AM Alexandre Oliva wrote: > > > Without -msse2, an i586-targeting toolchain fails bf16_short_warn.c > because neither type __m128bh nor intrinsic _mm_cvtneps_pbh get > declared. > > Regstrapped on x86_64-linux-gnu. Also tested with gcc-13 on arm-, > aarch64-, x86-

Re: [PATCH] [testsuite] [i386] work around fails with --enable-frame-pointer

2024-04-17 Thread Uros Bizjak
On Tue, Apr 16, 2024 at 5:51 AM Alexandre Oliva wrote: > > > A few x86 tests get unexpected insn counts if the toolchain is > configured with --enable-frame-pointer. Add explicit > -fomit-frame-pointer so that the expected insn sequences are output. > > Regstrapped on x86_64-linux-gnu. Also

Re: Combine patch ping

2024-04-11 Thread Uros Bizjak
On Thu, Apr 11, 2024 at 4:02 PM Segher Boessenkool wrote: > > On Wed, Apr 10, 2024 at 08:32:39PM +0200, Uros Bizjak wrote: > > On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool > > wrote: > > > This is never okay. You cannot commit a patch without approval, *eve

Re: Combine patch ping

2024-04-10 Thread Uros Bizjak
On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool wrote: > > On Sun, Apr 07, 2024 at 08:31:38AM +0200, Uros Bizjak wrote: > > If there are no further comments, I plan to commit the referred patch > > to the mainline on Wednesday. The latest version can be considered an

Re: Combine patch ping

2024-04-07 Thread Uros Bizjak
On Mon, Apr 1, 2024 at 9:28 PM Uros Bizjak wrote: > I'd like to ping the > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html > PR112560 P1 patch. If there are no further comments, I plan to commit the referred patch to the mainline on Wednesday. The latest ve

Re: [PATCH] x86: Use explicit shift count in double-precision shifts

2024-04-06 Thread Uros Bizjak
On Fri, Apr 5, 2024 at 5:56 PM H.J. Lu wrote: > > Don't use implicit shift count in double-precision shifts in AT syntax > since they aren't in Intel SDM. Keep the 's' modifier for backward > compatibility with inline asm statements. > > PR target/114590 > * config/i386/i386.md

Re: [PATCH] x86: Define __APX_F__ for -mapxf

2024-04-04 Thread Uros Bizjak
On Thu, Apr 4, 2024 at 5:08 PM H.J. Lu wrote: > > Define __APX_F__ when APX is enabled. > > gcc/ > > PR target/114587 > * config/i386/i386-c.cc (ix86_target_macros_internal): Define > __APX_F__ when APX is enabled. > > gcc/testsuite/ > > PR target/114587 >

Combine patch ping

2024-04-01 Thread Uros Bizjak
Hello! I'd like to ping the https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html PR112560 P1 patch. Thanks, Uros.

Re: [PATCH] testsuite: Fix up ext-floating{3,12}.C on i686-linux

2024-03-27 Thread Uros Bizjak
On Wed, Mar 27, 2024 at 11:48 AM Jakub Jelinek wrote: > > Hi! > > These tests FAIL for quite a while on i686-linux since July last year, > likely r14-2628 . Since that patch gcc claims _Float16 and __bf16 > support even without -msse2 because some functions could be using > target attribute. >

Re: [PATCH] testsuite: i386: Skip gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c etc. with Solaris as [PR114150]

2024-03-21 Thread Uros Bizjak
On Thu, Mar 21, 2024 at 10:26 AM Rainer Orth wrote: > > Two avx512cd tests FAIL to assemble with the Solaris/x86 assembler: > > FAIL: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c (test for excess errors) > UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c compilation failed > to produce

[PATCH] i386: Unify {general, timode}_scalar_chain::convert_op [PR111822]

2024-03-18 Thread Uros Bizjak
Recent PR111822 fix implemented REG_EH_REGION note copying to a STV converted preload instruction in general_scalar_chain::convert_op. However, the same issue remains in timode_scalar_chain::convert_op. Instead of copying the newly introduced code to timode_scalar_chain::convert_op, the patch

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:51 PM Segher Boessenkool wrote: > > On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote: > > > Can't you just describe the dataflow then, without an unspec? An unspec > > > by definition does some (unspecified) operation on the

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:46 PM Segher Boessenkool wrote: > > On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote: > > On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak wrote: > > > > > (unspec:DI [ > > > > > (reg:C

Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 11:52 AM liuhongt wrote: > > Commit r14-9459-g618e34d56cc38e only handles > general_scalar_chain::convert_op. The patch also handles > timode_scalar_chain::convert_op to avoid potential similar bug. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for

Re: [PATCH] i386: Fix a pasto in ix86_expand_int_sse_cmp [PR114339]

2024-03-15 Thread Uros Bizjak
On Fri, Mar 15, 2024 at 9:50 AM Jakub Jelinek wrote: > > Hi! > > In r13-3803-gfa271afb58 I've added an optimization for LE/LEU/GE/GEU > comparison against CONST_VECTOR. As the comments say: > /* x <= cst can be handled as x < cst + 1 unless there is > wrap around in cst + 1.
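The comparison rewrite quoted in that comment can be illustrated on a single scalar. This is only a sketch of the arithmetic identity, not of the GCC code, which operates element-wise on CONST_VECTORs. The function names are hypothetical, and the handling of the wrap-around case (returning true because `x <= UINT8_MAX` always holds) is a choice made for this example rather than a description of what the compiler does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x <= cst rewritten as x < cst + 1, valid only when cst + 1 does
   not wrap around.  */
bool le_via_lt_u8(uint8_t x, uint8_t cst)
{
    if (cst == UINT8_MAX)
        return true;                 /* cst + 1 would wrap to 0 */
    return x < (uint8_t)(cst + 1);
}

/* Exhaustively confirm the rewrite agrees with the direct comparison.  */
void check_le_via_lt_u8(void)
{
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        for (unsigned c = 0; c <= UINT8_MAX; c++)
            assert(le_via_lt_u8((uint8_t)x, (uint8_t)c) == (x <= c));
}
```

Without the wrap-around guard, `cst == UINT8_MAX` would turn the test into `x < 0`, which is never true for an unsigned value, so the rewritten comparison would give the wrong answer for every `x`.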

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak wrote: > > On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu wrote: > > > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > > > > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > > > > >

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu wrote: > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak wrote: > > > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > > > > > When we split > > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ]) > >

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 2:33 AM liuhongt wrote: > > When we split > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ]) > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct > SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 84 > {*movdi_internal} >

Fwd: [PATCH v3] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-12 Thread Uros Bizjak
Forgot to CC gcc-patches@ ML... sorry for the duplicate... The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 'E' (rtx unspec) in try_combine, at combine.cc:3237 This is 3236 /* Just

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:29 PM Segher Boessenkool wrote: > > On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote: > > On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool > > wrote: > > > > but can be something else, such as the above not

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak wrote: > > On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool > wrote: > > > > On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote: > > > > [snip] > > > > > The part we want to fix deals with the

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool wrote: > > On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote: > > [snip] > > > The part we want to fix deals with the *user* of the CC register. It > > is not true that this is always COMPARISON_P, so EQ, NE,

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:04 PM Uros Bizjak wrote: > The source code that deals with the *user* of the CC register assumes > the former form, so it blindly tries to update the mode of the CC > register inside LT comparison RTX (some other nearby source code even > checks for (cons

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 6:39 PM Segher Boessenkool wrote: > > On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote: > > On Thu, 7 Mar 2024, Uros Bizjak wrote: > > > This is > > > > > > 3236 /* Just replace the CC reg with a new mode.

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 12:11 PM Richard Biener wrote: > > On Thu, 7 Mar 2024, Jakub Jelinek wrote: > > > On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote: > > > > Since you CCed me - looking at the code I wonder why we fatally fail. > > > >

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:37 AM Jakub Jelinek wrote: > > On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote: > > > Since you CCed me - looking at the code I wonder why we fatally fail. > > > The following might also fix the issue and preserve more of th

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:56 AM Richard Biener wrote: > > On Thu, 7 Mar 2024, Uros Bizjak wrote: > > > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: > > > > internal compiler error: RTL check: expected elt 0 type 'e' or 'u', > > hav

[PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
The compiler, configured with --enable-checking=yes,rtl,extra ICEs with: internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 'E' (rtx unspec) in try_combine, at combine.cc:3237 This is 3236 /* Just replace the CC reg with a new mode. */ 3237 SUBST

[committed] i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

2024-03-06 Thread Uros Bizjak
optimize_function_for_size_p predicate is not stable during optab selection, because it also depends on node->count/node->frequency of the current function, which are updated during IPA, so they may change between early opts and late opts. Use optimize_size instead - optimize_size implies

[committed] i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

2024-03-06 Thread Uros Bizjak
Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and use generic code instead. No functional changes. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]: Eliminate common code and use generic code instead. Bootstrapped and regression

Re: [PATCH] i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

2024-03-06 Thread Uros Bizjak
On Wed, Mar 6, 2024 at 9:10 AM Jakub Jelinek wrote: > > Hi! > > When writing the rest_of_handle_insert_vzeroupper workaround to manually > remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that > there is a df_analyze () call right after it and that the problems added > earlier in

Re: [PATCH] i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

2024-03-04 Thread Uros Bizjak
On Mon, Mar 4, 2024 at 9:41 AM Jakub Jelinek wrote: > > On Mon, Mar 04, 2024 at 09:34:30AM +0100, Uros Bizjak wrote: > > > --- gcc/config/i386/i386-expand.cc.jj 2024-03-01 14:56:34.120925989 > > > +0100 > > > +++ gcc/config/i386/i386-expand.cc 2024-03-0

Re: [PATCH] i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

2024-03-04 Thread Uros Bizjak
On Mon, Mar 4, 2024 at 9:25 AM Jakub Jelinek wrote: > > Hi! > > The Intel extended format has the various weird number categories, > pseudo denormals, pseudo infinities, pseudo NaNs and unnormals. > Those are not representable in the GCC real_value and so neither > GIMPLE nor RTX

[committed] alpha: Introduce UMUL_HIGHPART rtx_code [PR113720]

2024-03-03 Thread Uros Bizjak
umuldi3_highpart expander does: if (REG_P (operands[2])) operands[2] = gen_rtx_ZERO_EXTEND (TImode, operands[2]); on register_operand predicate, which also allows SUBREG RTX. So, subregs were emitted without ZERO_EXTEND RTX. But nowadays we have UMUL_HIGHPART that allows us to fix this
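For readers unfamiliar with the operation, an unsigned high-part multiply returns the upper 64 bits of the full 128-bit product of two 64-bit operands. The following portable sketch computes that value from 32-bit halves; it only illustrates the arithmetic, the name `umul_highpart_u64` is hypothetical, and it says nothing about how the alpha backend represents the operation in RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the 128-bit product a * b, using four 32x32->64
   partial products so that no intermediate value overflows.  */
uint64_t umul_highpart_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of everything that lands in bits 32..95 of the product;
       its upper half carries into the high 64 bits.  */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Spot checks on values whose products are easy to derive by hand.  */
void check_umul_highpart_u64(void)
{
    assert(umul_highpart_u64(0, UINT64_MAX) == 0);
    assert(umul_highpart_u64(UINT64_C(1) << 32, UINT64_C(1) << 32) == 1);
    assert(umul_highpart_u64(UINT64_C(1) << 63, 2) == 1);
    assert(umul_highpart_u64(UINT64_MAX, UINT64_MAX)
           == UINT64_C(0xFFFFFFFFFFFFFFFE));
}
```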

[committed] i386: psrlq is not used for PERM [PR113871]

2024-02-27 Thread Uros Bizjak
Also handle V2BF mode. PR target/113871 gcc/ChangeLog: * config/i386/mmx.md (V248FI): Add V2BF mode. (V24FI_32): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr113871-5a.c: New test. * gcc.target/i386/pr113871-5b.c: New test. Bootstrapped and regression tested on

Re: Patch ping^2

2024-02-26 Thread Uros Bizjak
On Mon, Feb 26, 2024 at 10:33 AM Jakub Jelinek wrote: > > Hi! > > I'd like to ping 2 patches: > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645326.html > i386: Enable _BitInt support on ia32 > > all the FAILs mentioned in that mail have been fixed by now. LGTM, based on HJ's advice.

Re: [PATCH v2] x86: Check interrupt instead of noreturn attribute

2024-02-26 Thread Uros Bizjak
On Sun, Feb 25, 2024 at 10:14 PM H.J. Lu wrote: > > ix86_set_func_type checks noreturn attribute to avoid incompatible > attribute error in LTO1 on interrupt functions. Since TREE_THIS_VOLATILE > is set also for _Noreturn without noreturn attribute, check interrupt > attribute for interrupt

Re: [PATCH] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread Uros Bizjak
On Sun, Feb 25, 2024 at 5:01 PM H.J. Lu wrote: > > ix86_set_func_type checks noreturn attribute to avoid incompatible > attribute error in LTO1 on interrupt functions. Since TREE_THIS_VOLATILE > is set also for _Noreturn without noreturn attribute, check interrupt > attribute for interrupt

Re: PING: [PATCH] x86-64: Check R_X86_64_CODE_6_GOTTPOFF support

2024-02-23 Thread Uros Bizjak
On Fri, Feb 23, 2024 at 3:45 AM H.J. Lu wrote: > > On Thu, Feb 22, 2024 at 6:39 PM Hongtao Liu wrote: > > > > On Thu, Feb 22, 2024 at 10:33 PM H.J. Lu wrote: > > > > > > On Sun, Feb 18, 2024 at 8:02 AM H.J. Lu wrote: > > > > > > > > If assembler and linker supports > > > > > > > > add %reg1,

[committed] testsuite: Fix a couple of x86 issues in gcc.dg/vect testsuite

2024-02-14 Thread Uros Bizjak
A compile-time test can use -march=skylake-avx512 for all x86 targets, but a runtime test needs to check avx512f effective target if the instructions can be assembled. The runtime test also needs to check if the target machine supports instruction set we have been compiled for. The testsuite

[committed] i386: psrlq is not used for PERM [PR113871]

2024-02-14 Thread Uros Bizjak
Introduce vec_shl_ and vec_shr_ expanders to improve '*a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);' and '*a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);' shuffles. The generated code improves from: movzwl 6(%rdi), %eax movzwl 4(%rdi), %edx salq

Re: [PATCH v6] x86-64: Find a scratch register for large model profiling

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 5:43 PM H.J. Lu wrote: > > Changes in v6: > > 1. Use ix86_save_reg and accessible_reg_set in > x86_64_select_profile_regnum. > 2. Construct a complete reg name in x86_function_profiler. > > Changes in v5: > > 1. Add pr113689-3.c. > 2. Use %r10 if

Re: [PATCH v5] x86-64: Find a scratch register for large model profiling

2024-02-05 Thread Uros Bizjak
On Fri, Feb 2, 2024 at 11:47 PM H.J. Lu wrote: > > Changes in v5: > > 1. Add pr113689-3.c. > 2. Use %r10 if ix86_profile_before_prologue () return true. > 3. Try a callee-saved register which has been saved on stack in the > prologue. > > Changes in v4: > > 1. Remove pr113689-3.c. > 2. Use

Re: [x86_64 PATCH] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 9:06 AM Uros Bizjak wrote: > > On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle wrote: > > > > > > This patch fixes PR target/113690, an ICE-on-valid regression on x86_64 > > that exhibits with a specific combination of command line options. The

Re: [PATCH] i386: Clear REG_UNUSED and REG_DEAD notes from the IL at the end of vzeroupper pass [PR113059]

2024-02-05 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 9:23 AM Jakub Jelinek wrote: > > Hi! > > The move of the vzeroupper pass from after reload pass to after > postreload_cse helped only partially, CSE-like passes can still invalidate > those notes (especially REG_UNUSED) if they use some earlier register > holding some

Re: [x86_64 PATCH] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle wrote: > > > This patch fixes PR target/113690, an ICE-on-valid regression on x86_64 > that exhibits with a specific combination of command line options. The > cause is that x86's scalar-to-vector pass converts a chain of instructions > from TImode to

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr71321.c on Solaris/x86

2024-02-02 Thread Uros Bizjak
On Fri, Feb 2, 2024 at 9:59 AM Rainer Orth wrote: > > gcc.target/i386/pr71321.c FAILs on 64-bit Solaris/x86 with the native > assembler: > > FAIL: gcc.target/i386/pr71321.c scan-assembler-not lea.*0 > > The problem is that /bin/as doesn't fully support cfi directives, so the > .eh_frame section

[committed] i386: Improve *cmp_doubleword splitter [PR113701]

2024-02-01 Thread Uros Bizjak
The fix for PR70321 introduced a splitter that split a doubleword comparison into a pair of XORs followed by an IOR to set the (zero) flags register. To help the reload, splitter forced SUBREG pieces of double-word input values to a pseudo, but this regressed gcc.target/i386/pr82580.c int f0 (U

Re: [PATCH 1/2] target/113255 - avoid REG_POINTER on a pointer difference

2024-02-01 Thread Uros Bizjak
On Thu, Feb 1, 2024 at 3:18 PM Richard Biener wrote: > > The following avoids re-using a register holding a pointer (and > thus might be REG_POINTER) for the result of a pointer difference > computation. That might confuse heuristics in (broken) RTL alias > analysis which relies on REG_POINTER

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/no-callee-saved-1.c etc. on Solaris/x86

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 1:57 PM Rainer Orth wrote: > > The gcc.target/i386/no-callee-saved-[12].c tests FAIL on Solaris/x86: > > FAIL: gcc.target/i386/no-callee-saved-1.c scan-assembler-not push > FAIL: gcc.target/i386/no-callee-saved-2.c scan-assembler-not push > > In both cases, the test

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr38534-1.c etc. on Solaris/x86

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 2:02 PM Rainer Orth wrote: > > The gcc.target/i386/pr38534-1.c etc. tests FAIL on 32 and 64-bit > Solaris/x86: > > FAIL: gcc.target/i386/pr38534-1.c scan-assembler-not push > FAIL: gcc.target/i386/pr38534-2.c scan-assembler-not push > FAIL: gcc.target/i386/pr38534-3.c

Re: Unreviewed patches

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 3:04 PM Rainer Orth wrote: > > Three patches have remained unreviewed for a week or more: > > c++: Fix g++.dg/ext/attr-section2.C etc. with Solaris/SPARC as > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643434.html > > This one may even be

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr80833-1.c on 32-bit Solaris/x86

2024-01-24 Thread Uros Bizjak
On Wed, Jan 24, 2024 at 10:07 AM Rainer Orth wrote: > > gcc.target/i386/pr80833-1.c FAILs on 32-bit Solaris/x86 since 20220609: > > FAIL: gcc.target/i386/pr80833-1.c scan-assembler pextrd > > Unlike e.g. Linux/i686, 32-bit Solaris/x86 defaults to -mstackrealign, > so this patch overrides that to

Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-20 Thread Uros Bizjak
On Fri, Jan 19, 2024 at 5:50 PM Jeff Law wrote: > > > > On 1/19/24 09:05, Georg-Johann Lay wrote: > > > > > > Am 18.01.24 um 20:54 schrieb Roger Sayle: > >> > >> This patch tweaks RTL expansion of multi-word shifts and rotates to use > >> PLUS rather than IOR for disjunctive operations. During

Re: [PATCH] i386: Add -masm=intel profiling support [PR113122]

2024-01-18 Thread Uros Bizjak
On Thu, Jan 18, 2024 at 8:31 AM Jakub Jelinek wrote: > > Hi! > > x86_function_profiler emits assembly directly into file and only emits > AT&T syntax. The following patch adjusts it to emit MASM syntax > if -masm=intel. > As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects. > >

Re: [PATCH] i386: Add "Ws" constraint for symbolic address/label reference [PR105576]

2024-01-16 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 7:24 PM Fangrui Song wrote: > > Printing the raw symbol is useful in inline asm (e.g. in C++ to get the > mangled name). Similar constraints are available in other targets (e.g. > "S" for aarch64/riscv, "Cs" for m68k). > > There isn't a good way for x86 yet, e.g. "i"

Re: [PATCH] i386: Add "z" constraint for symbolic address/label reference [PR105576]

2024-01-11 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 9:33 AM Fangrui Song wrote: > > On 2024-01-11, Uros Bizjak wrote: > >On Thu, Jan 11, 2024 at 4:44 AM Fangrui Song wrote: > >> > >> Printing the raw symbol is useful in inline asm (e.g. in C++ to get the > >> mangled name). Sim

Re: [PATCH] i386: Add "z" constraint for symbolic address/label reference [PR105576]

2024-01-10 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 4:44 AM Fangrui Song wrote: > > Printing the raw symbol is useful in inline asm (e.g. in C++ to get the > mangled name). Similar constraints are available in other targets (e.g. > "S" for aarch64/riscv, "Cs" for m68k). > > There isn't a good way for x86 yet, e.g. "i"

Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-09 Thread Uros Bizjak
On Tue, Jan 9, 2024 at 11:19 AM Uros Bizjak wrote: > > On Tue, Jan 9, 2024 at 11:06 AM Richard Biener wrote: > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > On Tue, Jan 9, 2024 at 10:44 AM Richard Biener wrote: > > > >

Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-09 Thread Uros Bizjak
On Tue, Jan 9, 2024 at 11:06 AM Richard Biener wrote: > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > On Tue, Jan 9, 2024 at 10:44 AM Richard Biener wrote: > > > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > > >

Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-09 Thread Uros Bizjak
On Tue, Jan 9, 2024 at 10:44 AM Richard Biener wrote: > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > On Tue, Jan 9, 2024 at 9:58 AM Richard Biener wrote: > > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > > >
