date:20210731

[r12-2649 Regression] FAIL: gcc.target/i386/pr78103-2.c scan-assembler \\m(leal|addl)\\M on Linux/x86_64

2021-07-31 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

91425e2adecd00091d7443104ecb367686e88663 is the first bad commit
commit 91425e2adecd00091d7443104ecb367686e88663
Author: Jakub Jelinek 
Date:   Sat Jul 31 09:19:32 2021 +0200

i386: Improve extensions of __builtin_clz and constant - __builtin_clz for 
-mno-lzcnt [PR78103]

caused

FAIL: gcc.target/i386/pr78103-2.c scan-assembler \\m(leal|addl)\\M

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2649/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78103-2.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at skpgkp2 at gmail dot com)
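
For readers following the failure above, here is a minimal C sketch (not taken from the patch or the test) of the arithmetic identities the PR78103 patterns depend on when the input is nonzero: bsr yields the index of the highest set bit, so clz(x) equals 31 ^ bsr(x), and 32 - clz(x) equals bsr(x) + 1. The helper names highest_set_bit and clz_identities_hold are hypothetical, and the code assumes a 32-bit unsigned int and GCC's __builtin_clz. The last identity is presumably why the test scans for leal or addl after the bsr; that reading is an assumption, not something stated in the report.

```c
#include <assert.h>
#include <limits.h>

/* Index of the highest set bit, i.e. what x86 BSR computes for x != 0.
   Written as a portable loop so the sketch does not depend on BSR itself.
   Hypothetical helper, not part of GCC.  */
static unsigned highest_set_bit(unsigned x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Returns 1 if the identities hold for x.  x must be nonzero, because
   __builtin_clz (0) is undefined.  Hypothetical helper.  */
static int clz_identities_hold(unsigned x)
{
    unsigned bits = sizeof(unsigned) * CHAR_BIT;   /* 32 on x86 */
    unsigned bsr = highest_set_bit(x);
    unsigned clz = (unsigned)__builtin_clz(x);

    return clz == ((bits - 1) ^ bsr)      /* clz (x)      == 31 ^ bsr (x) */
        && (bits - 1) - clz == bsr        /* 31 - clz (x) == bsr (x)      */
        && bits - clz == bsr + 1;         /* 32 - clz (x) == bsr (x) + 1  */
}
```

Calling clz_identities_hold on any nonzero value, for example 1 or 0x80000000u, should return 1 under the stated assumptions.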

Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

2021-07-31 Thread H.J. Lu via Gcc-patches

On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote:
> > Please put some space here, e.g.:
> ...
> > Can you just name the relevant insn pattern and use
> >
> > emit_insn (gen_bsr_1)?
>
> Here is the updated patch.  I'll bootstrap/regtest it tonight.
>
> 2021-07-30  Jakub Jelinek  
>
> PR target/78103
> * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
> define_insn patterns.
> (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
> Add combine splitters for constant - clz.
> (clz<mode>2): Use a temporary pseudo for bsr result.
>
> * gcc.target/i386/pr78103-1.c: New test.
> * gcc.target/i386/pr78103-2.c: New test.
> * gcc.target/i386/pr78103-3.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2021-07-28 12:05:56.857977764 +0200
> +++ gcc/config/i386/i386.md 2021-07-30 15:13:49.994946550 +0200
> @@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64"
> (set_attr "znver1_decode" "vector")
> (set_attr "mode" "DI")])
>
> +(define_insn "bsr_rex64_1"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (minus:DI (const_int 63)
> + (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT && TARGET_64BIT"
> +  "bsr{q}\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "alu1")
> +   (set_attr "prefix_0f" "1")
> +   (set_attr "znver1_decode" "vector")
> +   (set_attr "mode" "DI")])
> +
>  (define_insn "bsr"
>[(set (reg:CCZ FLAGS_REG)
> (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm")
> @@ -14775,17 +14787,204 @@ (define_insn "bsr"
> (set_attr "znver1_decode" "vector")
> (set_attr "mode" "SI")])
>
> +(define_insn "bsr_1"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +   (minus:SI (const_int 31)
> + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT"
> +  "bsr{l}\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "alu1")
> +   (set_attr "prefix_0f" "1")
> +   (set_attr "znver1_decode" "vector")
> +   (set_attr "mode" "SI")])
> +
> +(define_insn "bsr_zext_1"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (zero_extend:DI
> + (minus:SI
> +   (const_int 31)
> +   (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm")))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT && TARGET_64BIT"
> +  "bsr{l}\t{%1, %k0|%k0, %1}"
> +  [(set_attr "type" "alu1")
> +   (set_attr "prefix_0f" "1")
> +   (set_attr "znver1_decode" "vector")
> +   (set_attr "mode" "SI")])
> +
> +; As bsr is undefined behavior on zero and for other input
> +; values it is in range 0 to 63, we can optimize away sign-extends.
> +(define_insn_and_split "*bsr_rex64_2"
> +  [(set (match_operand:DI 0 "register_operand")
> +   (xor:DI
> + (sign_extend:DI
> +   (minus:SI
> + (const_int 63)
> + (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand"))
> +0)))
> + (const_int 63)))
> +(clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (reg:CCZ FLAGS_REG)
> +  (compare:CCZ (match_dup 1) (const_int 0)))
> + (set (match_dup 2)
> +  (minus:DI (const_int 63) (clz:DI (match_dup 1))))])
> +   (parallel [(set (match_dup 0)
> +  (zero_extend:DI (xor:SI (match_dup 3) (const_int 63))))
> + (clobber (reg:CC FLAGS_REG))])]
> +{
> +  operands[2] = gen_reg_rtx (DImode);
> +  operands[3] = lowpart_subreg (SImode, operands[2], DImode);
> +})
> +
> +(define_insn_and_split "*bsr_2"
> +  [(set (match_operand:DI 0 "register_operand")
> +   (sign_extend:DI
> + (xor:SI
> +   (minus:SI
> + (const_int 31)
> + (clz:SI (match_operand:SI 1 "nonimmediate_operand")))
> +   (const_int 31))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (reg:CCZ FLAGS_REG)
> +  (compare:CCZ (match_dup 1) (const_int 0)))
> + (set (match_dup 2)
> +  (minus:SI (const_int 31) (clz:SI (match_dup 1))))])
> +   (parallel [(set (match_dup 0)
> +  (zero_extend:DI (xor:SI (match_dup 2) (const_int 31))))
> + (clobber (reg:CC FLAGS_REG))])]
> +  "operands[2] = gen_reg_rtx (SImode);")
> +
> +; Splitters to optimize 64 - __builtin_clzl (x) or 32 - __builtin_clz (x).
> +; Again, as for !TARGET_LZCNT CLZ is UB at zero, CLZ is guaranteed to be
> +; in [0, 63] or [0, 31] range.
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand")
> +   (minus:SI
> + (match_operand:SI 2 "const_int_operand")
> + (xor:SI
> +   (minus:SI

Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

2021-07-31 Thread H.J. Lu via Gcc-patches

On Sat, Jul 31, 2021 at 12:38 PM H.J. Lu  wrote:
>
> On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote:
> > > Please put some space here, e.g.:
> > ...
> > > Can you just name the relevant insn pattern and use
> > >
> > > emit_insn (gen_bsr_1)?
> >
> > Here is the updated patch.  I'll bootstrap/regtest it tonight.
> >
> > 2021-07-30  Jakub Jelinek  
> >
> > PR target/78103
> > * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
> > define_insn patterns.
> > (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
> > Add combine splitters for constant - clz.
> > (clz<mode>2): Use a temporary pseudo for bsr result.
> >
> > * gcc.target/i386/pr78103-1.c: New test.
> > * gcc.target/i386/pr78103-2.c: New test.
> > * gcc.target/i386/pr78103-3.c: New test.
> >
> > --- gcc/config/i386/i386.md.jj  2021-07-28 12:05:56.857977764 +0200
> > +++ gcc/config/i386/i386.md 2021-07-30 15:13:49.994946550 +0200
> > @@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64"
> > (set_attr "znver1_decode" "vector")
> > (set_attr "mode" "DI")])
> >
> > +(define_insn "bsr_rex64_1"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (minus:DI (const_int 63)
> > > + (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "!TARGET_LZCNT && TARGET_64BIT"
> > +  "bsr{q}\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "alu1")
> > +   (set_attr "prefix_0f" "1")
> > +   (set_attr "znver1_decode" "vector")
> > +   (set_attr "mode" "DI")])
> > +
> >  (define_insn "bsr"
> >[(set (reg:CCZ FLAGS_REG)
> > (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm")
> > @@ -14775,17 +14787,204 @@ (define_insn "bsr"
> > (set_attr "znver1_decode" "vector")
> > (set_attr "mode" "SI")])
> >
> > +(define_insn "bsr_1"
> > +  [(set (match_operand:SI 0 "register_operand" "=r")
> > +   (minus:SI (const_int 31)
> > > + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "!TARGET_LZCNT"
> > +  "bsr{l}\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "alu1")
> > +   (set_attr "prefix_0f" "1")
> > +   (set_attr "znver1_decode" "vector")
> > +   (set_attr "mode" "SI")])
> > +
> > +(define_insn "bsr_zext_1"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (zero_extend:DI
> > + (minus:SI
> > +   (const_int 31)
> > > +   (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm")))))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "!TARGET_LZCNT && TARGET_64BIT"
> > +  "bsr{l}\t{%1, %k0|%k0, %1}"
> > +  [(set_attr "type" "alu1")
> > +   (set_attr "prefix_0f" "1")
> > +   (set_attr "znver1_decode" "vector")
> > +   (set_attr "mode" "SI")])
> > +
> > +; As bsr is undefined behavior on zero and for other input
> > +; values it is in range 0 to 63, we can optimize away sign-extends.
> > +(define_insn_and_split "*bsr_rex64_2"
> > +  [(set (match_operand:DI 0 "register_operand")
> > +   (xor:DI
> > + (sign_extend:DI
> > +   (minus:SI
> > + (const_int 63)
> > > + (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand"))
> > +0)))
> > + (const_int 63)))
> > +(clobber (reg:CC FLAGS_REG))]
> > +  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(parallel [(set (reg:CCZ FLAGS_REG)
> > +  (compare:CCZ (match_dup 1) (const_int 0)))
> > + (set (match_dup 2)
> > > +  (minus:DI (const_int 63) (clz:DI (match_dup 1))))])
> > > +   (parallel [(set (match_dup 0)
> > > +  (zero_extend:DI (xor:SI (match_dup 3) (const_int 63))))
> > + (clobber (reg:CC FLAGS_REG))])]
> > +{
> > +  operands[2] = gen_reg_rtx (DImode);
> > +  operands[3] = lowpart_subreg (SImode, operands[2], DImode);
> > +})
> > +
> > +(define_insn_and_split "*bsr_2"
> > +  [(set (match_operand:DI 0 "register_operand")
> > +   (sign_extend:DI
> > + (xor:SI
> > +   (minus:SI
> > + (const_int 31)
> > + (clz:SI (match_operand:SI 1 "nonimmediate_operand")))
> > > +   (const_int 31))))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(parallel [(set (reg:CCZ FLAGS_REG)
> > +  (compare:CCZ (match_dup 1) (const_int 0)))
> > + (set (match_dup 2)
> > > +  (minus:SI (const_int 31) (clz:SI (match_dup 1))))])
> > > +   (parallel [(set (match_dup 0)
> > > +  (zero_extend:DI (xor:SI (match_dup 2) (const_int 31))))
> > + (clobber (reg:CC FLAGS_REG))])]
> > +  "operands[2] = gen_reg_rtx (SImode);")
> > +
> > +; Splitters to optimize 64 - __builtin_clzl (x) or 32 -

PING^1 [PATCH v5] : Add pragma GCC target("general-regs-only")

2021-07-31 Thread H.J. Lu via Gcc-patches

On Sat, Jul 17, 2021 at 6:45 PM H.J. Lu  wrote:
>
> On Thu, Apr 22, 2021 at 7:30 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Apr 22, 2021 at 2:52 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek  wrote:
> > > >
> > > > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches wrote:
> > > > > > > The question is if the pragma GCC target right now behaves incrementally
> > > > > > > or not, whether
> > > > > > > #pragma GCC target("avx2")
> > > > > > > adds -mavx2 to options if it was missing before and nothing otherwise, or if
> > > > > > > it switches other options off.  If it is incremental, we could e.g. try to
> > > > > > > use the second least significant bit of global_options_set.x_* to mean
> > > > > > > this option has been set explicitly by some surrounding #pragma GCC target.
> > > > > > > The normal tests - global_options_set.x_flag_whatever could still work
> > > > > > > fine because they wouldn't care if the option was explicit from anywhere
> > > > > > > (command line or GCC target or target attribute) and just & 2 would mean
> > > > > > > it was explicit from pragma GCC target; though there is the case of
> > > > > > > bitfields... And then the inlining decision could check the & 2 flags to
> > > > > > > see what is required and what is just from command line.
> > > > > > > Or we can have some other pragma GCC that would be like target but would
> > > > > > > have flags that are explicit (and could e.g. be more restricted, to ISA
> > > > > > > options only, and let those use in addition to #pragma GCC target.
> > > > >
> > > > > > I'm still curious as to what you think will break if always-inline does what
> > > > > > it is documented to do.
> > > > >
> > > > > We will silently accept calling intrinsics that must be used only in certain
> > > > > ISA contexts, which will lead to people writing non-portable code.
> > > >
> > > > So -O2 -mno-avx
> > > > #include 
> > > >
> > > > void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > etc. will now be accepted when it shouldn't be.
> > > > clang rejects it like gcc with:
> > > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > >
> > > > Note, if I do:
> > > > #include 
> > > >
> > > > __attribute__((target ("no-sse3"))) void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > and compile
> > > > clang -S -O2 -mavx2 1.c
> > > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > > > then from the error message it seems that unlike GCC, clang remembers
> > > > > the exact target features that are needed for the intrinsics and checks just
> > > > > those.
> > > > > Though, looking at the preprocessed source, seems it uses
> > > > > static __inline __m256 __attribute__((__always_inline__, __nodebug__, __target__("avx"), __min_vector_width__(256)))
> > > > _mm256_sub_ps(__m256 __a, __m256 __b)
> > > > {
> > > >   return (__m256)((__v8sf)__a-(__v8sf)__b);
> > > > }
> > > > and not target pragmas.
> > > >
> > > > Anyway, if we tweak our intrinsic headers so that
> > > > -#ifndef __AVX__
> > > >  #pragma GCC push_options
> > > >  #pragma GCC target("avx")
> > > > -#define __DISABLE_AVX__
> > > > -#endif /* __AVX__ */
> > > >
> > > > ...
> > > > -#ifdef __DISABLE_AVX__
> > > > -#undef __DISABLE_AVX__
> > > >  #pragma GCC pop_options
> > > > -#endif /* __DISABLE_AVX__ */
> > > > and do the opts_set->x_* & 2 stuff on explicit options coming out of
> > > > target/optimize pragmas and attributes, perhaps we don't even need
> > > > to introduce a new attribute and can handle everything magically:
> >
> > Oh, and any such changes will likely interact with Martin's ideas to rework
> > how optimize and target attributes work (aka adding on top of the
> > command-line options).  That is, attribute target will then not be enough
> > to remember the exact set of needed ISA features (as opposed to what
> > clang likely implements?)
> >
> > > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> > > > disallow them for always_inline functions
> > >
> > > There are a lot of intrinsics using extern inline __gnu_inline though...
> > >
> > > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2
> > > > stuff
> > > > This will keep both intrinsics and glibc fortify macros working fine
> > > > in all the needed use

Re: [PATCH] c++: Reject anonymous struct with bases

2021-07-31 Thread Jason Merrill via Gcc-patches

On Fri, Jul 30, 2021 at 3:35 PM Andrew Pinski  wrote:

> On Fri, Jul 30, 2021 at 9:26 AM Jason Merrill via Gcc-patches
>  wrote:
> >
> > In discussion of Jakub's patch for C++20 pointer-interconvertibility, it
> > came up that we allow anonymous structs to have bases, but don't do anything
> > usable with them.  Let's reject it.
> >
> > The comment change is something I noticed while looking for the right place
> > to diagnose this: finish_struct_anon does not actually check for anything
> > invalid, so it shouldn't claim to.
>
> This should fix PR 96636 by rejecting the code.
>

Thanks.

Jason

[pushed] c++: ICE on anon struct with base [PR96636]

2021-07-31 Thread Jason Merrill via Gcc-patches

pinskia pointed out that my recent change to reject anonymous structs with
bases was relevant to this PR.  But we still ICEd after giving that error;
this fixes the ICE.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/96636

gcc/cp/ChangeLog:

* decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING
after error.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct9.C: New test.
---
 gcc/cp/decl.c   | 6 +-
 gcc/testsuite/g++.dg/ext/anon-struct9.C | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/anon-struct9.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index e4be6be1819..6fa6b9adc87 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5094,7 +5094,11 @@ fixup_anonymous_aggr (tree t)
   tree field, type;
 
   if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t)))
-   error_at (location_of (t), "anonymous struct with base classes");
+   {
+ error_at (location_of (t), "anonymous struct with base classes");
+ /* Avoid ICE after error on anon-struct9.C.  */
+ TYPE_NEEDS_CONSTRUCTING (t) = false;
+   }
 
   for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL)
diff --git a/gcc/testsuite/g++.dg/ext/anon-struct9.C b/gcc/testsuite/g++.dg/ext/anon-struct9.C
new file mode 100644
index 00000000000..56759429620
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/anon-struct9.C
@@ -0,0 +1,9 @@
+// PR c++/96636
+// { dg-options "" }
+
+typedef class {
+  class a {};
+  class : virtual a {};// { dg-error "anonymous struct with base" }
+} b;
+void foo(){ b();}
+

base-commit: 4c4249b71de3b15ba1e176ce90a57fb7bc54b917
prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092
-- 
2.27.0

[pushed] c++: pretty-print TYPE_PACK_EXPANSION better

2021-07-31 Thread Jason Merrill via Gcc-patches

gcc/cp/ChangeLog:

* ptree.c (cxx_print_type) [TYPE_PACK_EXPANSION]: Also print
PACK_EXPANSION_PATTERN.
---

Tested x86_64-pc-linux-gnu, applying to trunk.

 gcc/cp/ptree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c
index 33b73fb24b6..7f140f5f06b 100644
--- a/gcc/cp/ptree.c
+++ b/gcc/cp/ptree.c
@@ -171,6 +171,7 @@ cxx_print_type (FILE *file, tree node, int indent)
   return;
 
 case TYPE_PACK_EXPANSION:
+  print_node (file, "pattern", PACK_EXPANSION_PATTERN (node), indent + 4);
   print_node (file, "args", PACK_EXPANSION_EXTRA_ARGS (node), indent + 4);
   return;
 

base-commit: 4c4249b71de3b15ba1e176ce90a57fb7bc54b917
-- 
2.27.0

committed: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-31 Thread Xi Ruoyao via Gcc-patches

On Sat, 2021-07-31 at 02:08 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2021-07-30 at 16:23 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Fri, 2021-07-30 at 09:11 +0100, Richard Sandiford wrote:
> > > Xi Ruoyao  writes:
> > > > Ping again.
> > > > 
> > > > On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> > > > > Commit message shamelessly copied from 1777beb6b129 by jakub:
> > > > > 
> > > > > This function, because it is sometimes called even outside of function
> > > > > bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
> > > > > for that to work, when first referenced, the VAR_DECLs need to appear in a
> > > > > TARGET_EXPR so that during gimplification the var gets the right
> > > > > DECL_CONTEXT and is added to local decls.
> > > > > 
> > > > > Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
> > > > > to 11, 10, and 9?
> > > 
> > > OK for all, thanks.
> > > 
> > > Similar comments to the previous message about the appropriateness
> > > of me reviewing the patch, but like you say, this is doing for MIPS
> > > what we've already had to do for other targets.
> > 
> > Thanks for reviewing.
> > 
> > Will bootstrap and test it again, and commit if there are no
> > regressions.
> 
> Committed to master at 20656544 and releases/gcc-11 at 7db1795a.

Committed to releases/gcc-10 at 613e4ebc and releases/gcc-9 at 79184d8c.

RE: [r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) on Linux/x86_64

2021-07-31 Thread Roger Sayle



[Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c

With -m32, this test case is sensitive to the instruction timings of
the target (for ifcvt to normalize bar() to foo() during the ce1 pass,
prior to the transformations actually being tested here).  Specifying
-march=core2 prevents these failures.  Committed as obvious.

2021-07-31  Roger Sayle  

gcc/testsuite/ChangeLog
* gcc.target/i386/dec-cmov-2.c: Require -march=core2 with -m32.

Roger
--

-----Original Message-----
From: sunil.k.pandey  
Sent: 31 July 2021 08:13
To: gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
ro...@nextmovesoftware.com
Subject: [r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c
scan-assembler-not test(l|q|w) on Linux/x86_64

On Linux/x86_64,

f7bf03cf69ccb7dcfa0320774aa7f3c51344dada is the first bad commit
commit f7bf03cf69ccb7dcfa0320774aa7f3c51344dada
Author: Roger Sayle 
Date:   Fri Jul 30 22:46:32 2021 +0100

Decrement followed by cmov improvements.

caused

FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w)

with GCC configured with

../../gcc/configure
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2640/usr
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
--without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact
me at skpgkp2 at gmail dot com)

New French PO file for 'gcc' (version 11.2.0)

2021-07-31 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-11.2.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] Optimize x ? bswap(x) : 0 in tree-ssa-phiopt

2021-07-31 Thread Roger Sayle


Many thanks again to Jakub Jelinek for a speedy fix for PR 101642.
Interestingly, that test case "bswap16(x) ? : x" also reveals a
missed optimization opportunity.  The resulting "x ? bswap(x) : 0"
can be further simplified to just bswap(x).

Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the
related "x ? popcount(x) : 0", so this patch simply makes that
transformation more general, additionally handling bswap, parity,
ffs and clrsb.  All of the required infrastructure is already
present thanks to Jakub previously adding support for clz/ctz.
To reflect this generalization, the name of the function is changed
from cond_removal_in_popcount_clz_ctz_pattern to the hopefully
equally descriptive cond_removal_in_builtin_zero_pattern.
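
As an illustration only (this is not the new test case), the following C sketch shows why the guard is redundant: each builtin already returns, for a zero argument, the constant that the conditional would select. The function names bswap_with_guard, bswap_plain and builtin_zero_values_ok are hypothetical, and the clrsb check assumes the value at zero is the precision of int minus one, matching the val computed in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The shape the optimization targets: x ? bswap (x) : 0.  */
static uint32_t bswap_with_guard(uint32_t x)
{
    return x ? __builtin_bswap32(x) : 0;
}

/* What it can be simplified to, since bswap (0) is already 0.  */
static uint32_t bswap_plain(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Returns 1 if each builtin yields the expected constant at zero.
   Hypothetical helper for this sketch only.  */
static int builtin_zero_values_ok(void)
{
    int int_bits = (int)(sizeof(int) * CHAR_BIT);

    return __builtin_bswap32(0) == 0
        && __builtin_popcount(0) == 0
        && __builtin_parity(0) == 0
        && __builtin_ffs(0) == 0
        && __builtin_clrsb(0) == int_bits - 1;  /* all bits equal the sign bit */
}
```

Because the guarded and unguarded functions agree for every input, including zero, the branch on x carries no information and can be dropped.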

The following patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures.

Ok for mainline?


2021-07-31  Roger Sayle  

gcc/ChangeLog
* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern):
Renamed from cond_removal_in_popcount_clz_ctz_pattern.
Add support for BSWAP, FFS, PARITY and CLRSB builtins.
(tree_ssa_phiopt_worker): Update call to function above.

gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/phi-opt-25.c: New test case.


Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index c6adbbd..66af902 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -66,9 +66,9 @@ static bool minmax_replacement (basic_block, basic_block,
edge, edge, gphi *, tree, tree);
 static bool spaceship_replacement (basic_block, basic_block,
   edge, edge, gphi *, tree, tree);
-static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block,
- edge, edge, gphi *,
- tree, tree);
+static bool cond_removal_in_builtin_zero_pattern (basic_block, basic_block,
+ edge, edge, gphi *,
+ tree, tree);
 static bool cond_store_replacement (basic_block, basic_block, edge, edge,
hash_set<tree> *);
 static bool cond_if_else_store_replacement (basic_block, basic_block, basic_block);
@@ -350,9 +350,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
   early_p))
cfgchanged = true;
  else if (!early_p
-  && cond_removal_in_popcount_clz_ctz_pattern (bb, bb1, e1,
-   e2, phi, arg0,
-   arg1))
+  && cond_removal_in_builtin_zero_pattern (bb, bb1, e1, e2,
+   phi, arg0, arg1))
cfgchanged = true;
  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
@@ -2466,7 +2465,8 @@ spaceship_replacement (basic_block cond_bb, basic_block middle_bb,
   return true;
 }
 
-/* Convert
+/* Optimize x ? __builtin_fun (x) : C, where C is __builtin_fun (0).
+   Convert
 

if (b_4(D) != 0)
@@ -2498,10 +2498,10 @@ spaceship_replacement (basic_block cond_bb, basic_block middle_bb,
instead of 0 above it uses the value from that macro.  */
 
 static bool
-cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb,
- basic_block middle_bb,
- edge e1, edge e2, gphi *phi,
- tree arg0, tree arg1)
+cond_removal_in_builtin_zero_pattern (basic_block cond_bb,
+ basic_block middle_bb,
+ edge e1, edge e2, gphi *phi,
+ tree arg0, tree arg1)
 {
   gimple *cond;
   gimple_stmt_iterator gsi, gsi_from;
@@ -2549,6 +2549,12 @@ cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb,
   int val = 0;
   switch (cfn)
 {
+case CFN_BUILT_IN_BSWAP16:
+case CFN_BUILT_IN_BSWAP32:
+case CFN_BUILT_IN_BSWAP64:
+case CFN_BUILT_IN_BSWAP128:
+CASE_CFN_FFS:
+CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
   break;
 CASE_CFN_CLZ:
@@ -2577,6 +2583,15 @@ cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb,
}
}
   return false;
+case BUILT_IN_CLRSB:
+  val = TYPE_PRECISION (integer_type_node) - 1;
+  break;
+case BUILT_IN_CLRSBL:
+  val = TYPE_PRECISION (long_integer_type_node) - 1;
+  break;
+case BUILT_IN_CLRSBLL:
+  val = TYPE_PRECISION (long_long_integer_type_node) - 1;
+  break;
 default:
   return false;
 }
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
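
For readers following the thread, here is a minimal sketch (not part of the patch) of the source-level identity the new comment describes: x ? __builtin_fun (x) : C, where C is __builtin_fun (0). It assumes GCC or a compatible compiler that provides these builtins; the branchy_* helpers and check_identities are hypothetical names used only for illustration. CLZ and CTZ are left out because their value at zero is target-dependent.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Branchy forms, as they might appear in user code.  */
static uint32_t branchy_bswap32 (uint32_t x)
{
  return x ? __builtin_bswap32 (x) : 0;
}

static int branchy_popcount (unsigned x)
{
  return x ? __builtin_popcount (x) : 0;
}

static int branchy_parity (unsigned x)
{
  return x ? __builtin_parity (x) : 0;
}

static int branchy_ffs (int x)
{
  return x ? __builtin_ffs (x) : 0;
}

static int branchy_clrsb (int x)
{
  /* __builtin_clrsb (0) is the precision of int minus one.  */
  return x ? __builtin_clrsb (x) : (int) (sizeof (int) * CHAR_BIT) - 1;
}

/* Hypothetical helper: assert that each branchy form agrees with the
   plain builtin call on a handful of sample inputs, including zero.  */
void check_identities (void)
{
  static const int samples[] = { 0, 1, 2, -1, 0x80, 0x12345678,
				 INT_MAX, INT_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int s = samples[i];
      unsigned u = (unsigned) s;
      assert (branchy_bswap32 (u) == __builtin_bswap32 (u));
      assert (branchy_popcount (u) == __builtin_popcount (u));
      assert (branchy_parity (u) == __builtin_parity (u));
      assert (branchy_ffs (s) == __builtin_ffs (s));
      assert (branchy_clrsb (s) == __builtin_clrsb (s));
    }
}
```

Calling check_identities () from a small driver should pass silently. Since each builtin already yields C for a zero argument, the condition adds nothing, which is why it can be dropped.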

[committed] openmp: Handle OpenMP directives in attribute syntax in attribute-declaration

2021-07-31 Thread Jakub Jelinek via Gcc-patches

Hi!

Now that we parse attribute-declaration (outside of functions), the following
patch handles OpenMP directives in its attribute(s).
What still needs to be handled incrementally is diagnosing mismatched
begin/end pairs like
 [[omp::directive (declare target)]];
 int a;
 #pragma omp end declare target
or
 #pragma omp declare target
 int b;
 [[omp::directive (end declare target)]];
and handling declare simd/declare variant on function definitions and
declarations; those need handling in two different spots.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed
to trunk.

2021-07-31  Jakub Jelinek  

* parser.c (cp_parser_declaration): Handle OpenMP directives
in attribute-declaration.

* g++.dg/gomp/attrs-9.C: New test.

--- gcc/cp/parser.c.jj  2021-07-30 14:43:43.049383470 +0200
+++ gcc/cp/parser.c 2021-07-30 19:43:22.464675663 +0200
@@ -14423,6 +14423,25 @@ cp_parser_declaration (cp_parser* parser
 {
   location_t attrs_loc = token1->location;
   tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
+
+  if (std_attrs && (flag_openmp || flag_openmp_simd))
+   {
+ gcc_assert (!parser->lexer->in_omp_attribute_pragma);
+ std_attrs = cp_parser_handle_statement_omp_attributes (parser,
+std_attrs);
+ if (parser->lexer->in_omp_attribute_pragma)
+   {
+ cp_lexer *lexer = parser->lexer;
+ while (parser->lexer->in_omp_attribute_pragma)
+   {
+ gcc_assert (cp_lexer_next_token_is (parser->lexer,
+ CPP_PRAGMA));
+ cp_parser_pragma (parser, pragma_external, NULL);
+   }
+ cp_lexer_destroy (lexer);
+   }
+   }
+
   if (std_attrs != NULL_TREE)
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
--- gcc/testsuite/g++.dg/gomp/attrs-9.C.jj  2021-07-30 19:51:28.977218521 +0200
+++ gcc/testsuite/g++.dg/gomp/attrs-9.C 2021-07-30 19:30:54.421622986 +0200
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++11 } }
+
+[[omp::sequence (directive (requires, atomic_default_mem_order (seq_cst)))]];
+[[omp::directive (declare reduction (plus: int: omp_out += omp_in) initializer (omp_priv = 0))]];
+int a;
+[[omp::directive (declare target (a))]];
+int t;
+[[omp::sequence (omp::directive (threadprivate (t)))]];
+int b, c;
+[[omp::directive (declare target, to (b), link (c))]];
+[[omp::directive (declare target)]];
+[[omp::directive (declare target)]];
+int d;
+[[omp::directive (end declare target)]];
+[[omp::directive (end declare target)]];


Jakub

[r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) on Linux/x86_64

2021-07-31 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

f7bf03cf69ccb7dcfa0320774aa7f3c51344dada is the first bad commit
commit f7bf03cf69ccb7dcfa0320774aa7f3c51344dada
Author: Roger Sayle 
Date:   Fri Jul 30 22:46:32 2021 +0100

Decrement followed by cmov improvements.

caused

FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2640/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)
