Re: [PATCH] Optimize _Float16 usage for non AVX512FP16.
On Mon, Nov 29, 2021 at 8:46 AM liuhongt wrote: > > As discussed in PR, this patch do optimizations: > 1. No memory is needed to move HI/HFmode between GPR and SSE registers > under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o > AVX512FP16. > 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace > ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant > initialization cound be eliminated. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and > x86_64-pc-linux-gnu{-m32\ -march=cadcadelake,\ -march=cascadelake} > Ok for trunk? > > gcc/ChangeLog: > > PR target/102811 > * config/i386/i386.c (inline_secondary_memory_needed): HImode > move between GPR and SSE registers is supported under > TARGET_SSE2 and above. > * config/i386/i386.md (extendhfsf2): Optimize expander. > (truncsfhf2): Ditto. > * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to > align with V8HImode. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr102811-2.c: New test. > * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new > scan-assembler-times. > --- > gcc/config/i386/i386.c| 5 +++-- > gcc/config/i386/i386.md | 18 +++ > gcc/config/i386/sse.md| 2 +- > .../i386/avx512vl-vcvtps2ph-pr102811.c| 2 +- > gcc/testsuite/gcc.target/i386/pr102811-2.c| 22 +++ > 5 files changed, 41 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr102811-2.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 7cf599f57f7..2657e7817ae 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -19437,8 +19437,9 @@ inline_secondary_memory_needed (machine_mode mode, > reg_class_t class1, >if (msize > UNITS_PER_WORD) > return true; > > - /* In addition to SImode moves, AVX512FP16 also enables HImode moves. > */ > - int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? 
HImode : SImode); > + /* In addition to SImode moves, HImode moves are supported for SSE2 > and above, > +Use vmovw with AVX512FP16, or pinsrw/pextrw without AVX512FP16. */ > + int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode); > >if (msize < minsize) > return true; > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 2cb3e727588..070758edb66 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -4617,9 +4617,18 @@ (define_expand "extendhfsf2" >if (!TARGET_AVX512FP16) > { >rtx res = gen_reg_rtx (V4SFmode); > - rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); > + rtx tmp = gen_reg_rtx (V8HFmode); > + rtx zero = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); > > - ix86_expand_vector_set (false, tmp, operands[1], 0); > + if (TARGET_AVX2) > + { > + rtx dup = gen_reg_rtx (V8HFmode); > + emit_move_insn (dup, gen_rtx_VEC_DUPLICATE (V8HFmode, operands[1])); > + emit_move_insn (tmp, gen_rtx_VEC_MERGE (V8HFmode, dup, > + zero, const1_rtx)); > + } > + else > + emit_insn (gen_sse2_pinsrph (tmp, zero, operands[1], const1_rtx)); >emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp))); >emit_move_insn (operands[0], gen_lowpart (SFmode, res)); >DONE; > @@ -4833,9 +4842,10 @@ (define_expand "truncsfhf2" > if (!TARGET_AVX512FP16) > { >rtx res = gen_reg_rtx (V8HFmode); > - rtx tmp = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); > + rtx tmp = gen_reg_rtx (V4SFmode); > + rtx zero = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); > > - ix86_expand_vector_set (false, tmp, operands[1], 0); > + emit_insn (gen_vec_setv4sf_0 (tmp, zero, operands[1])); >emit_insn (gen_vcvtps2ph (gen_lowpart (V8HImode, res), tmp, GEN_INT > (4))); >emit_move_insn (operands[0], gen_lowpart (HFmode, res)); >DONE; > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 5229b23af98..b371b140eb1 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17272,7 +17272,7 @@ (define_mode_iterator PINSR_MODE > (V2DI "TARGET_SSE4_1 
&& TARGET_64BIT")]) > > (define_mode_attr sse2p4_1 > - [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1") > + [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2") > (V4SI "sse4_1") (V2DI "sse4_1")]) > > (define_mode_attr pinsr_evex_isa > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > index dfbfb167953..9a6c432c866 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > @@ -1,6 +1,6 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ >
Re: [PATCH] Fix regression introduced by r12-5536.
On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote: > > There're several failures reported in [1]: > 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" > %vpextrw should be used in output templates. > 2. ICE in get_attr_memory for movhi_internal since some alternatives > are marked as TYPE_SSELOG. > Explicitly set memory_attr for those alternatives. > > Also this patch fixs a typo and some latent bugs which are related to > moving HImode from/to sse register w/o TARGET_AVX512FP16. > > For optimization issues discussed in PR102811, I'll create another patch for > it. > [1] https://gcc.gnu.org/pipermail/gcc-regression/2021-November/075893.html > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and > x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake} > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.c (ix86_secondary_reload): Without > TARGET_SSE4_1, General register is needed to move HImode from > sse register to memory. > * config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of > pextrw in output templates. > * config/i386/i386.md (movhi_internal): Ditto, also fix typo of > MEM_P (operands[1]) and adjust memory/mode/prefix/type > attribute for alternatives related to sse register. OK, but please use sselog1 type instead so you don't need to introduce the memory attribute. Thanks, Uros. > --- > gcc/config/i386/i386.c | 2 +- > gcc/config/i386/i386.md | 44 ++--- > gcc/config/i386/sse.md | 6 +++--- > 3 files changed, 36 insertions(+), 16 deletions(-) > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 3dedf522c42..7cf599f57f7 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -19277,7 +19277,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t > rclass, > } > >/* Require movement to gpr, and then store to memory. 
*/ > - if (mode == HFmode > + if ((mode == HFmode || mode == HImode) >&& !TARGET_SSE4_1 >&& SSE_CLASS_P (rclass) >&& !in_p && MEM_P (x)) > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 68606e57e60..2cb3e727588 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal" > case TYPE_SSELOG: >if (SSE_REG_P (operands[0])) > return MEM_P (operands[1]) > - ? "pinsrw\t{$0, %1, %0|%0, %1, 0}" > - : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}"; > + ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}" > + : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}"; >else > - return MEM_P (operands[1]) > - ? "pextrw\t{$0, %1, %0|%0, %1, 0}" > - : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}"; > + return MEM_P (operands[0]) > + ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}" > + : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}"; > > case TYPE_MSKLOG: >if (operands[1] == const0_rtx) > @@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal" >] >(const_string "*"))) > (set (attr "type") > - (cond [(eq_attr "alternative" "9,10,11,12,13") > + (cond [(eq_attr "alternative" "9,10,12,13") > (if_then_else (match_test "TARGET_AVX512FP16") > (const_string "ssemov") > (const_string "sselog")) > (eq_attr "alternative" "4,5,6,7") > (const_string "mskmov") > + (eq_attr "alternative" "11") > + (const_string "ssemov") > (eq_attr "alternative" "8") > (const_string "msklog") > (match_test "optimize_function_for_size_p (cfun)") > @@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal" > (const_string "imovx") >] >(const_string "imov"))) > +(set (attr "memory") > +(cond [(eq_attr "alternative" "9,10") > + (const_string "none") > + (eq_attr "alternative" "12") > + (const_string "load") > + (eq_attr "alternative" "13") > + (const_string "store") > + ] > + (const_string "*"))) Please use sselog1 type instead, and the memory attribute will be calculated correctly. 
> (set (attr "prefix") > - (if_then_else (eq_attr "alternative" "4,5,6,7,8") > - (const_string "vex") > - (const_string "orig"))) > +(cond [(eq_attr "alternative" "9,10,11,12,13") > + (const_string "maybe_evex") > + (eq_attr "alternative" "4,5,6,7,8") > + (const_string "vex") > + ] > + (const_string "orig"))) > (set (attr "mode") >(cond [(eq_attr "type" "imovx") >(const_string "SI") > +(eq_attr "alternative" "9,10,12,13") > + (if_then_else (match_test "TARGET_AVX512FP16") > +(const_string "HI") > +
[PATCH] Optimize _Float16 usage for non AVX512FP16.
As discussed in the PR, this patch makes the following optimizations:
1. No memory is needed to move HI/HFmode between GPR and SSE registers under TARGET_SSE2 and above; pinsrw/pextrw are used for them without AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant initialization can be eliminated.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
Ok for trunk?

gcc/ChangeLog:

	PR target/102811
	* config/i386/i386.c (inline_secondary_memory_needed): HImode move between GPR and SSE registers is supported under TARGET_SSE2 and above.
	* config/i386/i386.md (extendhfsf2): Optimize expander.
	(truncsfhf2): Ditto.
	* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to align with V8HImode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr102811-2.c: New test.
	* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new scan-assembler-times.
--- gcc/config/i386/i386.c| 5 +++-- gcc/config/i386/i386.md | 18 +++ gcc/config/i386/sse.md| 2 +- .../i386/avx512vl-vcvtps2ph-pr102811.c| 2 +- gcc/testsuite/gcc.target/i386/pr102811-2.c| 22 +++ 5 files changed, 41 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr102811-2.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7cf599f57f7..2657e7817ae 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19437,8 +19437,9 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1, if (msize > UNITS_PER_WORD) return true; - /* In addition to SImode moves, AVX512FP16 also enables HImode moves. */ - int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode); + /* In addition to SImode moves, HImode moves are supported for SSE2 and above, +Use vmovw with AVX512FP16, or pinsrw/pextrw without AVX512FP16. */ + int minsize = GET_MODE_SIZE (TARGET_SSE2 ?
HImode : SImode); if (msize < minsize) return true; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 2cb3e727588..070758edb66 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4617,9 +4617,18 @@ (define_expand "extendhfsf2" if (!TARGET_AVX512FP16) { rtx res = gen_reg_rtx (V4SFmode); - rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); + rtx tmp = gen_reg_rtx (V8HFmode); + rtx zero = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); - ix86_expand_vector_set (false, tmp, operands[1], 0); + if (TARGET_AVX2) + { + rtx dup = gen_reg_rtx (V8HFmode); + emit_move_insn (dup, gen_rtx_VEC_DUPLICATE (V8HFmode, operands[1])); + emit_move_insn (tmp, gen_rtx_VEC_MERGE (V8HFmode, dup, + zero, const1_rtx)); + } + else + emit_insn (gen_sse2_pinsrph (tmp, zero, operands[1], const1_rtx)); emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp))); emit_move_insn (operands[0], gen_lowpart (SFmode, res)); DONE; @@ -4833,9 +4842,10 @@ (define_expand "truncsfhf2" if (!TARGET_AVX512FP16) { rtx res = gen_reg_rtx (V8HFmode); - rtx tmp = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); + rtx tmp = gen_reg_rtx (V4SFmode); + rtx zero = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); - ix86_expand_vector_set (false, tmp, operands[1], 0); + emit_insn (gen_vec_setv4sf_0 (tmp, zero, operands[1])); emit_insn (gen_vcvtps2ph (gen_lowpart (V8HImode, res), tmp, GEN_INT (4))); emit_move_insn (operands[0], gen_lowpart (HFmode, res)); DONE; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 5229b23af98..b371b140eb1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -17272,7 +17272,7 @@ (define_mode_iterator PINSR_MODE (V2DI "TARGET_SSE4_1 && TARGET_64BIT")]) (define_mode_attr sse2p4_1 - [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1") + [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2") (V4SI "sse4_1") (V2DI "sse4_1")]) (define_mode_attr pinsr_evex_isa diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c index dfbfb167953..9a6c432c866 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ -/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 1 } } */ /* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */ /* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */ /* {
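For reference, the non-AVX512FP16 extendhfsf2 expander above places the HFmode value in lane 0 of a zeroed vector and widens it with vcvtph2ps. The following is a minimal C++ sketch, not GCC code, of the per-lane IEEE binary16 to binary32 conversion that instruction performs, which can help check expected values on hosts without F16C. The helper name half_bits_to_float is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen IEEE 754 binary16 bits to a binary32 float,
// i.e. what vcvtph2ps computes for a single lane.
float half_bits_to_float(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;   // 5-bit biased exponent
    std::uint32_t mant = h & 0x3FFu;         // 10-bit mantissa
    std::uint32_t bits;

    if (exp == 0x1Fu) {
        // Infinity or NaN: all-ones exponent, mantissa widened in place.
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        // Normal number: rebias the exponent from 15 to 127.
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        bits = sign;  // signed zero
    } else {
        // Subnormal half: every half subnormal is a normal float, so
        // shift the mantissa until the implicit bit (bit 10) is set.
        std::uint32_t shift = 0;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            ++shift;
        }
        mant &= 0x3FFu;
        bits = sign | ((113u - shift) << 23) | (mant << 13);
    }

    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bit pattern
    return f;
}
```

Widening is always exact, so no rounding logic is needed here; the truncsfhf2 direction (vcvtps2ph) would additionally need round-to-nearest-even handling, which this sketch does not cover.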
[PATCH]middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]
Hi All,

CSE uses equivalence classes to keep track of expressions that all have the same values at the current point in the program.

Normal equivalences through SETs only insert and perform lookups in this set, but equivalences determined from comparisons, e.g.

(insn 46 44 47 7 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 105 [ iD.2893 ])
            (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
        (nil)))

create the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).

This causes a merge to happen between the two equivalence sets denoted by (const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.

The operation happens through merge_equiv_classes; however, this function has an invariant that the classes to be merged do not contain any duplicates. This is because it frees entries before merging.

The given testcase, when using the supplied flags, triggers an ICE due to the equivalence set being

(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void

(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void

This happens because the original INSN being recorded is

(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
        (const_vector:V1SI [
                (const_int 0 [0])
            ])) "cse.c":11:9 1363 {*movv1si_internal}
     (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
        (nil)))

and we end up generating two equivalences. The first one is simply that reg:SI 97 is 0. The second one is that 0 can be extracted from the V1SI, so subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0. This nested subreg gets folded away to just reg:SI 97, and we re-insert the same equivalence.

This patch changes it so that once we figure out the bucket to insert into, we check whether the equivalence set already contains the entry and, if so, just return the existing entry and exit.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no regressions.
Ok for master? Thanks, Tamar gcc/ChangeLog: PR rtl-optimization/103404 * cse.c (insert_with_costs): Check if item exists already before adding a new entry in the equivalence class. gcc/testsuite/ChangeLog: PR rtl-optimization/103404 * gcc.target/i386/pr103404.c: New test. --- inline copy of patch -- diff --git a/gcc/cse.c b/gcc/cse.c index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -1537,6 +1537,17 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash, if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER) add_to_hard_reg_set (_regs_in_table, GET_MODE (x), REGNO (x)); + /* We cannot allow a duplicate to be entered into the equivalence sets + and so we should perform a check before we do any allocations or + change the buckets. */ + if (classp) +{ + struct table_elt *p; + for (p = classp; p; p = p->next_same_value) + if (exp_equiv_p (p->exp, x, 1, false)) + return p; +} + /* Put an element for X into the right hash bucket. 
*/ elt = free_element_chain; diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c new file mode 100644 index ..66f33645301db09503fc0977fd0f061a19e56ea5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103404.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */ + +typedef unsigned __attribute__((__vector_size__ (4))) U; +typedef unsigned __attribute__((__vector_size__ (16))) V; +typedef unsigned __attribute__((__vector_size__ (64))) W; + +int x, y; + +V v; +W w; + +inline +int bar (U a) +{ + a |= x; + W k = +__builtin_shufflevector (v, 5 / a, +2, 4, 0, 2, 4, 1, 0, 1, +1, 2, 1, 3, 0, 4, 4, 0); + w = k; + y = 0; +} + +int +foo () +{ + bar ((U){0x}); + for (unsigned i; i < sizeof (foo);) +; +} +
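The check added to insert_with_costs amounts to scanning the class's next_same_value chain for an equivalent expression before allocating a new element. Below is a simplified, hypothetical C++ model of that idea, not the cse.c code: expressions are plain strings compared with == (standing in for exp_equiv_p), and cost ordering and hash buckets are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for a CSE table element.
struct Elt {
    std::string exp;                 // stands in for the rtx expression
    Elt* next_same_value = nullptr;  // next member of the equivalence class
};

// Insert X into the class headed by CLASSP unless an equivalent entry is
// already present, in which case return that entry and allocate nothing.
Elt* insert_unique(Elt*& classp, const std::string& x) {
    for (Elt* p = classp; p; p = p->next_same_value)
        if (p->exp == x)
            return p;  // duplicate: keep the class free of repeats

    Elt* e = new Elt{x, nullptr};
    if (!classp) {
        classp = e;
        return e;
    }
    // Append at the tail (real CSE orders members by cost instead).
    Elt* tail = classp;
    while (tail->next_same_value)
        tail = tail->next_same_value;
    tail->next_same_value = e;
    return e;
}

std::size_t class_size(const Elt* classp) {
    std::size_t n = 0;
    for (; classp; classp = classp->next_same_value)
        ++n;
    return n;
}

void free_class(Elt* classp) {
    while (classp) {
        Elt* next = classp->next_same_value;
        delete classp;
        classp = next;
    }
}
```

Returning the existing element keeps a second, folded-away copy such as (reg:SI 97) out of the chain, which is the invariant merge_equiv_classes relies on.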
Re: [PATCH] rs6000/test: Add emulated gather test case
On 2021/11/27 12:24 AM, Segher Boessenkool wrote: > Hi! > > On Thu, Nov 25, 2021 at 11:20:57AM +0800, Kewen.Lin wrote: >> This patch is to add a test case similar to the one in i386 >> to add testing coverage for 510.parest_r hotspots. > >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/vect-gather-1.c: New test. > > This is okay for trunk. Thanks! > Thanks Segher! Committed as r12-5569. BR, Kewen
[PATCH] Make the path to etags used in the build system configurable [PR103021]
The attached patch allows users to specify a path to their `etags` executable for use when doing `make tags`, which is meant to close PR other/103021: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103021

I based this patch on this one from upstream automake: https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29

This means that I just supplied variables that the user can override for the tags programs, rather than having the configure scripts actually check for them. I handle etags and ctags separately because the intl subdirectory has separate targets for them.

Tested with `make tags`; the changes I made work successfully, but some of the subdirectories still have broken tags targets, so I had to switch to `make -k tags` partway through. This isn't because of anything I did, though; the `-k` flag is only necessary because of errors that were already there before I touched anything.

Also note that this patch only affects the subdirectories that use handwritten Makefiles; the ones that use automake will not be fixed until the automake version we use is updated to 1.16.4 or newer.

patch-configurable-etags.diff
[PATCH] Fix regression introduced by r12-5536.
There are several failures reported in [1]:
1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
   %vpextrw should be used in output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives are marked as TYPE_SSELOG.
   Explicitly set memory_attr for those alternatives.

Also, this patch fixes a typo and some latent bugs related to moving HImode from/to an SSE register without TARGET_AVX512FP16.

For the optimization issues discussed in PR102811, I'll create another patch.

[1] https://gcc.gnu.org/pipermail/gcc-regression/2021-November/075893.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.c (ix86_secondary_reload): Without TARGET_SSE4_1, a general register is needed to move HImode from an SSE register to memory.
	* config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of pextrw in output templates.
	* config/i386/i386.md (movhi_internal): Ditto, also fix typo of MEM_P (operands[1]) and adjust memory/mode/prefix/type attribute for alternatives related to sse register.
--- gcc/config/i386/i386.c | 2 +- gcc/config/i386/i386.md | 44 ++--- gcc/config/i386/sse.md | 6 +++--- 3 files changed, 36 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3dedf522c42..7cf599f57f7 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19277,7 +19277,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass, } /* Require movement to gpr, and then store to memory.
*/ - if (mode == HFmode + if ((mode == HFmode || mode == HImode) && !TARGET_SSE4_1 && SSE_CLASS_P (rclass) && !in_p && MEM_P (x)) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 68606e57e60..2cb3e727588 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal" case TYPE_SSELOG: if (SSE_REG_P (operands[0])) return MEM_P (operands[1]) - ? "pinsrw\t{$0, %1, %0|%0, %1, 0}" - : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}"; + ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}" + : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}"; else - return MEM_P (operands[1]) - ? "pextrw\t{$0, %1, %0|%0, %1, 0}" - : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}"; + return MEM_P (operands[0]) + ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}" + : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}"; case TYPE_MSKLOG: if (operands[1] == const0_rtx) @@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal" ] (const_string "*"))) (set (attr "type") - (cond [(eq_attr "alternative" "9,10,11,12,13") + (cond [(eq_attr "alternative" "9,10,12,13") (if_then_else (match_test "TARGET_AVX512FP16") (const_string "ssemov") (const_string "sselog")) (eq_attr "alternative" "4,5,6,7") (const_string "mskmov") + (eq_attr "alternative" "11") + (const_string "ssemov") (eq_attr "alternative" "8") (const_string "msklog") (match_test "optimize_function_for_size_p (cfun)") @@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal" (const_string "imovx") ] (const_string "imov"))) +(set (attr "memory") +(cond [(eq_attr "alternative" "9,10") + (const_string "none") + (eq_attr "alternative" "12") + (const_string "load") + (eq_attr "alternative" "13") + (const_string "store") + ] + (const_string "*"))) (set (attr "prefix") - (if_then_else (eq_attr "alternative" "4,5,6,7,8") - (const_string "vex") - (const_string "orig"))) +(cond [(eq_attr "alternative" "9,10,11,12,13") + (const_string "maybe_evex") + (eq_attr "alternative" "4,5,6,7,8") + (const_string "vex") + ] + (const_string "orig"))) (set (attr "mode") (cond 
[(eq_attr "type" "imovx") (const_string "SI") +(eq_attr "alternative" "9,10,12,13") + (if_then_else (match_test "TARGET_AVX512FP16") +(const_string "HI") +(const_string "TI")) (eq_attr "alternative" "11") - (const_string "HF") + (if_then_else (match_test "TARGET_AVX512FP16") +(const_string "HF") +(const_string "SF")) (and (eq_attr "alternative" "1,2") (match_operand:HI 1 "aligned_operand")) (const_string "SI") @@ -3791,9 +3811,9 @@ (define_insn "*movhf_internal" ? "pinsrw\t{$0, %1,
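Without AVX512FP16, the SSE alternatives of movhi_internal move a 16-bit value between a general register and lane 0 of an XMM register with pinsrw/pextrw, with no memory round trip. A small sketch of the same operation using the SSE2 intrinsics _mm_insert_epi16 and _mm_extract_epi16 (x86/x86-64 only; the helper names hi_to_xmm and xmm_to_hi are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// GPR -> XMM: insert a 16-bit value into lane 0 of a zeroed vector (pinsrw).
__m128i hi_to_xmm(std::uint16_t v) {
    return _mm_insert_epi16(_mm_setzero_si128(), v, 0);
}

// XMM -> GPR: extract lane 0, zero-extended by the instruction (pextrw).
std::uint16_t xmm_to_hi(__m128i x) {
    return static_cast<std::uint16_t>(_mm_extract_epi16(x, 0));
}
```

Only the register-to-register forms are shown. The fix in ix86_secondary_reload concerns the store form: without SSE4.1, pextrw cannot write directly to memory, so the value has to go through a general register first.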
Re: [PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE
On Sun, Nov 28, 2021 at 12:25 PM Jeff Law via Gcc-patches wrote:
>
>
>
> On 11/28/2021 10:56 AM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski
> >
> > This just adds a simplification to simplify_vector_constructor for
> > vector of 1 element to be VCE which should reduce memory usage in
> > the compiler and maybe allow for some more optimizations.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > PR tree-optimization/101540
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-forwprop.c (simplify_vector_constructor):
> > Simplify constructor of vector of 1 element to just
> > be a VIEW_CONVERT_EXPR.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/pr101540-1.c: New test.
> So why generate a VCE here if the type conversion is useless? Why not
> just a NOP_EXPR? Is there something special about converting between
> the element type and the outer vector type that requires VCE rather than
> NOP_EXR? Neither an ACK or NAK, just trying to understand it a bit better.

Because right now tree-cfg.c has this check for vector types for NOP_EXPR:

      /* Allow conversions between vectors with the same number of elements,
         provided that the conversion is OK for the element types too.  */
      if (VECTOR_TYPE_P (lhs_type)
          && VECTOR_TYPE_P (rhs1_type)
          && known_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
                       TYPE_VECTOR_SUBPARTS (rhs1_type)))
        {
          lhs_type = TREE_TYPE (lhs_type);
          rhs1_type = TREE_TYPE (rhs1_type);
        }
      else if (VECTOR_TYPE_P (lhs_type) || VECTOR_TYPE_P (rhs1_type))
        {
          error ("invalid vector types in nop conversion");
          debug_generic_expr (lhs_type);
          debug_generic_expr (rhs1_type);
          return true;
        }

We can change this check for NOP_EXPR and vector types, but VCE is still a nop in most cases and is really handled as such. I do wonder, though, whether the rest of the compiler is ready for that change.

Thanks,
Andrew Pinski

>
> Jeff
>
>
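To illustrate why a VIEW_CONVERT_EXPR fits here: for a one-element vector, a CONSTRUCTOR {x} and a bit-for-bit reinterpretation of x produce the same value, while the NOP_EXPR check quoted above rejects any scalar/vector mix. A hedged sketch using the GCC/Clang vector_size extension; build_ctor and build_vce are hypothetical names, and memcpy stands in for the VCE.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang vector extension: a vector(1) of 32-bit int.
typedef int v1si __attribute__((vector_size(sizeof(int))));

// What a CONSTRUCTOR {x} builds.
v1si build_ctor(int x) {
    v1si v = {x};
    return v;
}

// What a VIEW_CONVERT_EXPR expresses: the same bits viewed as another type.
v1si build_vce(int x) {
    static_assert(sizeof(v1si) == sizeof(int), "vector(1) must be scalar-sized");
    v1si v;
    std::memcpy(&v, &x, sizeof v);
    return v;
}
```

Because the sizes match exactly, the reinterpretation is a no-op on the bits, which is why the VCE is "still a nop in most cases".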
Re: [PATCH] Fix PR 19089: Environment variable TMP may yield gcc: abort
On Sun, Nov 28, 2021 at 12:14 PM Jeff Law via Gcc-patches wrote: > > > > On 11/27/2021 7:49 PM, apinski--- via Gcc-patches wrote: > > From: Andrew Pinski > > > > Even though I cannot reproduce the ICE any more, this is still > > a bug. We check already to see if we can access the directory > > but never check to see if the path is actually a directory. > > > > This adds the check and now we reject the file as not usable > > as a tmp directory. > > > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > > > libiberty/ChangeLog: > > > > * make-temp-file.c (try_dir): Check to see if the dir > > is actually a directory. > > --- > > libiberty/make-temp-file.c | 16 +++- > > 1 file changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c > > index 31f87fbcfde..11eb03d12ec 100644 > > --- a/libiberty/make-temp-file.c > > +++ b/libiberty/make-temp-file.c > > @@ -39,6 +39,10 @@ Boston, MA 02110-1301, USA. */ > > #if defined(_WIN32) && !defined(__CYGWIN__) > > #include > > #endif > > +#if HAVE_SYS_STAT_H > > +#include > > +#endif > > + > > > > #ifndef R_OK > > #define R_OK 4 > > @@ -76,7 +80,17 @@ try_dir (const char *dir, const char *base) > > return base; > > if (dir != 0 > > && access (dir, R_OK | W_OK | X_OK) == 0) > > -return dir; > > +{ > > + /* Check to make sure dir is actually a directory. */ > > +#ifdef S_ISDIR > > + struct stat s; > > + if (stat(dir, )) > Formatting nit, missing whitespace between stat and open paren. > > Presumably this doesn't fix the problem in the case where S_ISDIR is not > defined. But it's still an improvement. OK with the nit fixed. Correct, though I don't know of any host where S_ISDIR is not defined. MinGW has them defined, and so does Cygwin. glibc (and every libc on Linux) has them defined; Solaris and AIX have them defined, and so does Mac OS X. MSVC does not define them, but we don't support compiling GCC with MSVC, so that should not be an issue. Thanks, Andrew > > jeff >
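The try_dir change boils down to this: access() only checks permission bits, so a regular file with read, write and execute permission passes it, and a stat()/S_ISDIR test is needed to reject such a path. A minimal POSIX sketch of that check; is_usable_tmp_dir is a hypothetical name, not the libiberty function.

```cpp
#include <cassert>
#include <stdlib.h>     // mkdtemp, mkstemp (POSIX)
#include <sys/stat.h>   // stat, S_ISDIR, chmod
#include <unistd.h>     // access, close, unlink, rmdir

// Return true if DIR can be read, written and searched AND is a directory.
bool is_usable_tmp_dir(const char* dir) {
    if (dir == nullptr || access(dir, R_OK | W_OK | X_OK) != 0)
        return false;
#ifdef S_ISDIR
    struct stat s;
    if (stat(dir, &s) != 0 || !S_ISDIR(s.st_mode))
        return false;  // e.g. an executable regular file passes access()
#endif
    return true;
}
```

As in the patch, the S_ISDIR test is guarded by #ifdef, so on a host without the macro the function falls back to the access()-only behavior.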
Re: [committed 03/12] d: Insert null terminator in obstack buffers
Excerpts from Iain Buclaw's message of November 26, 2021 1:35 pm: > Excerpts from Martin Liška's message of November 25, 2021 3:09 pm: >> On 7/30/21 13:01, Iain Buclaw via Gcc-patches wrote: >>> |Covers cases where functions that handle the extracted strings ignore the >>> explicit length. This isn't something that's known to happen in the current >>> front-end, but the self-hosted front-end has been observed to do this in >>> its conversions between D and C-style strings.| >> >> Can you please cherry pick this for gcc-11 branch as I see nasty output when >> using --verbose: >> >> $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gdc.dg/attr_optimize4.d -c >> --verbose >> ... >> predefs GNU D_Version2 LittleEndian GNU_DWARF2_Exceptions >> GNU_StackGrowsDown GNU_InlineAsm D_LP64 assert D_ModuleInfo D_Exceptions >> D_TypeInfo all X86_64 D_HardFloat Posix linux CRuntime_Glibc >> CppRuntime_Gcc��... >> >> > > Ouch, I'll have a look at gcc-9 and 10 too to see if they are the same. > FYI, the patch applied cleanly to the gcc-11 branch and has been committed. I saw no regressions on x86_64-linux-gnu in either bootstrap or tests. I checked the other branches; however, earlier releases used the dmd front-end's OutBuffer, so they are unaffected. Iain.
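The garbage after CppRuntime_Gcc is the classic symptom of a length-delimited buffer being read as a C string. A hedged illustration, not the D front-end code: a hypothetical GrowBuffer built on std::vector<char> stands in for an obstack, and the fix corresponds to appending an explicit '\0' when finishing the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an obstack: bytes are appended with an
// explicit length, and nothing guarantees a trailing '\0'.
struct GrowBuffer {
    std::vector<char> bytes;

    void append(const char* s, std::size_t len) {
        bytes.insert(bytes.end(), s, s + len);
    }

    // Finish the buffer for consumers that may ignore the explicit length
    // and treat the data as a C string: add the terminator explicitly.
    const char* finish_cstr() {
        bytes.push_back('\0');
        return bytes.data();
    }
};
```

Without the push_back, strlen or printf("%s") would read past the last appended byte into whatever follows in memory.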
[PATCH] Extend usage of user hint in _Hashtable
libstdc++: In _Hashtable, use insertion hint as much as possible.

Make use in unordered containers of the user-provided hint iterator as much as possible.

The hint is now used:
- As a hint for allocation, in order to limit memory fragmentation when the allocator makes use of it.
- For unordered_set/unordered_map, to check whether it matches the key of the element to insert before computing the hash code.
- For unordered_multiset/unordered_multimap, if it is equal to the key of the element to insert, the hash code is taken from the hint so that we can take advantage of the potential hash code cache.

Moreover, _M_count_tr and _M_equal_range_tr now reuse the first matching node's key to check for other matching nodes, to avoid any temporary instantiations.

libstdc++-v3/ChangeLog: * include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Add _NodePtr template parameter. (_ReuseOrAllocNode::operator()): Add __node_ptr parameter. (_AllocNode::operator()): Likewise. (_Insert_base::try_emplace): Adapt to use hint. (_Hash_code_base<>::_M_hash_code(const _Hash_node_value<>&)): New. (_Hashtable_base<>::_M_equals<>(const _Kt&, const _Hash_node_value<>&)): New. (_Hashtable_base<>::_M_equals<>(const _Kt&, __hash_code, const _Hash_node_value<>&)): Adapt, use latter. (_Hashtable_base<>::_M_equals_tr<>(const _Kt&, const _Hash_node_value<>&)): New. (_Hashtable_base<>::_M_equals_tr<>(const _Kt&, __hash_code, const _Hash_node_value<>&)): Adapt, use latter. (_Hashtable_alloc<>::_M_allocate_node(__node_ptr, _Args&&...)): Add __node_ptr parameter. * include/bits/hashtable.h (_Hashtable<>::_Scope_node<>(__hashtable_alloc*, __node_ptr, _Args&&...)): Add __node_ptr parameter. (_Hashtable<>::_M_get_node_hint(size_type, __node_ptr)): New. (_Hashtable<>::_M_emplace_unique(const_iterator, _Args&&...)): New. (_Hashtable<>::_M_emplace_multi(const_iterator, _Args&&...)): New. (_Hashtable<>::_M_emplace()): Adapt to use latter.
(_Hashtable<>::_M_insert_unique(const_iterator, _Kt&&, _Arg&&, const _NodeGenerator&)): (_Hashtable<>::_M_reinsert_node(const_iterator, node_type&&)): Add const_iterator. Add const_iterator parameter. * include/bits/unordered_map.h (unordered_map<>::insert(node_type&&)): Pass cend as hint. (unordered_map<>::insert(const_iterator, node_type&&)): Adapt to use hint. * include/bits/unordered_set.h (unordered_set<>::insert(node_type&&)): Pass cend as hint. (unordered_set<>::insert(const_iterator, node_type&&)): Adapt to use hint. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index 6e2d4c10cfe..5010cefcd77 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -301,9 +301,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Allocate a node and construct an element within it. template - _Scoped_node(__hashtable_alloc* __h, _Args&&... __args) + _Scoped_node(__hashtable_alloc* __h, + __node_ptr __hint, _Args&&... __args) : _M_h(__h), - _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...)) + _M_node(__h->_M_allocate_node(__hint, + std::forward<_Args>(__args)...)) { } // Destroy element and deallocate node. @@ -818,6 +820,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return nullptr; } + // Gets a hint after which a node should be allocated given a bucket. + __node_ptr + _M_get_node_hint(size_type __bkt, __node_ptr __hint = nullptr) const + { + __node_base_ptr __node; + if (__node = _M_buckets[__bkt]) + return __node != &_M_before_begin + ? static_cast<__node_ptr>(__node) : __hint; + + return __hint; + } + // Insert a node at the beginning of a bucket. void _M_insert_bucket_begin(size_type, __node_ptr); @@ -846,26 +860,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template std::pair - _M_emplace(true_type __uks, _Args&&... __args); + _M_emplace_unique(const_iterator, _Args&&... __args); template iterator - _M_emplace(false_type __uks, _Args&&... 
__args) - { return _M_emplace(cend(), __uks, std::forward<_Args>(__args)...); } + _M_emplace_multi(const_iterator, _Args&&... __args); + + template + std::pair + _M_emplace(true_type /*__uks*/, _Args&&... __args) + { return _M_emplace_unique(cend(), std::forward<_Args>(__args)...); } - // Emplace with hint, useless when keys are unique. template iterator - _M_emplace(const_iterator, true_type __uks, _Args&&... __args) - { return _M_emplace(__uks, std::forward<_Args>(__args)...).first; } + _M_emplace(false_type
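To make the hint logic above concrete, here is a minimal C sketch of a toy chained hash set. It is hypothetical and not the libstdc++ _Hashtable code (which is C++ and far more general): it assumes int keys, a fixed bucket count and a made-up multiplicative hash, and it counts hash computations so the effect of the hint is visible. A unique insert whose hint already holds the key returns without hashing; a multi insert whose hint holds an equal key reuses the hint's cached hash code and links the new node right after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 16 /* fixed bucket count, for illustration only */

struct node {
    int key;
    unsigned hash;      /* cached hash code, like a node with a hash cache */
    struct node *next;
};

struct set {
    struct node *buckets[NBUCKETS];
    unsigned hash_calls; /* how many times the hash function ran */
};

static unsigned hash_key(struct set *s, int key)
{
    s->hash_calls++;
    return (unsigned)key * 2654435761u; /* hypothetical multiplicative hash */
}

static struct node *find_in_bucket(const struct set *s, unsigned h, int key)
{
    for (struct node *n = s->buckets[h % NBUCKETS]; n != NULL; n = n->next)
        if (n->hash == h && n->key == key)
            return n;
    return NULL;
}

static struct node *new_node(int key, unsigned h)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    n->hash = h;
    n->next = NULL;
    return n;
}

/* Unique insert: if the hint already holds KEY there is nothing to insert,
   and no hash code is computed at all.  */
static struct node *insert_unique_hint(struct set *s, struct node *hint,
                                       int key, bool *inserted)
{
    if (hint != NULL && hint->key == key) {
        *inserted = false;
        return hint;
    }
    unsigned h = hash_key(s, key);
    struct node *found = find_in_bucket(s, h, key);
    if (found != NULL) {
        *inserted = false;
        return found;
    }
    struct node *n = new_node(key, h);
    n->next = s->buckets[h % NBUCKETS];
    s->buckets[h % NBUCKETS] = n;
    *inserted = true;
    return n;
}

/* Multi insert: an equal hint supplies the cached hash code and a position
   that keeps equivalent keys adjacent.  */
static struct node *insert_multi_hint(struct set *s, struct node *hint, int key)
{
    if (hint != NULL && hint->key == key) {
        struct node *n = new_node(key, hint->hash);
        n->next = hint->next;
        hint->next = n;
        return n;
    }
    unsigned h = hash_key(s, key);
    struct node *n = new_node(key, h);
    n->next = s->buckets[h % NBUCKETS];
    s->buckets[h % NBUCKETS] = n;
    return n;
}
```

Passing NULL as the hint degrades to an ordinary insert, much as the patch passes cend() as the hint when the caller supplies none.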
Re: [PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE
On 11/28/2021 10:56 AM, apinski--- via Gcc-patches wrote: From: Andrew Pinski This just adds a simplification to simplify_vector_constructor for vector of 1 element to be VCE which should reduce memory usage in the compiler and maybe allow for some more optimizations. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/101540 gcc/ChangeLog: * tree-ssa-forwprop.c (simplify_vector_constructor): Simplify constructor of vector of 1 element to just be a VIEW_CONVERT_EXPR. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr101540-1.c: New test. So why generate a VCE here if the type conversion is useless? Why not just a NOP_EXPR? Is there something special about converting between the element type and the outer vector type that requires a VCE rather than a NOP_EXPR? Neither an ACK nor a NAK, just trying to understand it a bit better. Jeff
Re: [PATCH] Fix PR 19089: Environment variable TMP may yield gcc: abort
On 11/27/2021 7:49 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski Even though I cannot reproduce the ICE any more, this is still a bug. We check already to see if we can access the directory but never check to see if the path is actually a directory. This adds the check and now we reject the file as not usable as a tmp directory. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. libiberty/ChangeLog: * make-temp-file.c (try_dir): Check to see if the dir is actually a directory. --- libiberty/make-temp-file.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c index 31f87fbcfde..11eb03d12ec 100644 --- a/libiberty/make-temp-file.c +++ b/libiberty/make-temp-file.c @@ -39,6 +39,10 @@ Boston, MA 02110-1301, USA. */ #if defined(_WIN32) && !defined(__CYGWIN__) #include #endif +#if HAVE_SYS_STAT_H +#include +#endif + #ifndef R_OK #define R_OK 4 @@ -76,7 +80,17 @@ try_dir (const char *dir, const char *base) return base; if (dir != 0 && access (dir, R_OK | W_OK | X_OK) == 0) -return dir; +{ + /* Check to make sure dir is actually a directory. */ +#ifdef S_ISDIR + struct stat s; + if (stat(dir, )) Formatting nit, missing whitespace between stat and open paren. Presumably this doesn't fix the problem in the case where S_ISDIR is not defined. But it's still an improvement. OK with the nit fixed. jeff
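Why the extra check matters: access() reports success for a regular file that has read, write and execute permission, just as it does for a directory. Below is a minimal sketch of the patched logic using a hypothetical try_dir_sketch() modeled on the try_dir() shown in the diff; it is not a copy of libiberty/make-temp-file.c. It assumes a POSIX system, assumes that a non-directory should fall back to NULL ("not usable"), and, like the patch, skips the directory test when S_ISDIR is not defined.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp/mkdtemp in the checks */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return BASE if set, else DIR if it is an accessible
   directory, else NULL.  */
static const char *try_dir_sketch(const char *dir, const char *base)
{
    if (base != NULL)
        return base;
    if (dir != NULL && access(dir, R_OK | W_OK | X_OK) == 0) {
#ifdef S_ISDIR
        struct stat s;
        /* access() also succeeds for regular files; reject those here.  */
        if (stat(dir, &s) != 0 || !S_ISDIR(s.st_mode))
            return NULL;
#endif
        return dir;
    }
    return NULL;
}
```

Without the S_ISDIR test, a TMP variable pointing at an executable regular file would be accepted as a temporary directory.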
Re: [PATCH] Fix PR 62157: distclean in libsanitizer not working
On 11/27/2021 6:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski What is happening is that DIST_SUBDIRS contains the conditional directories, which is wrong, so we need to force DIST_SUBDIRS to be the same as SUBDIRS, as recommended by the automake manual. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Also, make distclean now works inside the libsanitizer directory. libsanitizer/ChangeLog: PR sanitizer/62157 * Makefile.am: Force DIST_SUBDIRS to be SUBDIRS. * Makefile.in: Regenerate. * asan/Makefile.in: Likewise. * hwasan/Makefile.in: Likewise. * interception/Makefile.in: Likewise. * libbacktrace/Makefile.in: Likewise. * lsan/Makefile.in: Likewise. * sanitizer_common/Makefile.in: Likewise. * tsan/Makefile.in: Likewise. * ubsan/Makefile.in: Likewise. OK jeff
Re: [RFC][PATCH] c++/46476 - implement -Wunreachable-code-return
On 11/26/2021 5:18 AM, Richard Biener via Gcc-patches wrote: This implements a subset of -Wunreachable-code: unreachable code after a return stmt. Contrary to the previous attempt at CFG construction time, this implements the bits during GIMPLE lowering, where all GIMPLE return stmts are still in the IL. The lowering phase keeps track of whether stmts can fallthru, which is used to determine if the following stmt is reachable. The implementation only considers labels here. The fallthru flag is transparently extended to allow tracking a reason for non-fallthruness, which is used to mark returns. This patch runs into the same stray return/gcc_unreachable issues as the previous one and thus requires cleanup across the GCC code base, which seems controversial. So I'm putting this on hold unless I receive some OK for cleanup in any way, meaning this isn't going to make stage3. Sorry. Richard. 2021-11-26 Richard Biener PR c++/46476 gcc/cp/ * decl.c (finish_function): Set input_location to BUILTINS_LOCATION around the code building the return 0 for main(). * cp-gimplify.c (genericize_if_stmt): Avoid optimizing if (true) and if (false) when -Wunreachable-code-return is in effect. gcc/ * common.opt (Wunreachable-code): Re-enable. (Wunreachable-code-return): New diagnostic, enabled by -Wextra and -Wunreachable-code. * doc/invoke.texi (Wunreachable-code): Document. (Wunreachable-code-return): Likewise. * gimple-low.c: Include diagnostic.h. (struct cft_reason): New. (lower_data::cannot_fallthru): Make a cft_reason. (lower_stmt): Diagnose unreachable stmts after a return. * Makefile.in (insn-emit.o-warn): Disable -Wunreachable-code-return. gcc/testsuite/ * c-c++-common/Wunreachable-code-return-1.c: New testcase. I wouldn't object to this moving forward. I've already ACK'd the cleanups. Jeff
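A toy model of the fallthru tracking described above, under heavy simplifying assumptions: statements are reduced to a hypothetical three-kind enum and the body is a flat sequence, whereas the real lowering in gimple-low.c walks nested GIMPLE sequences and records a reason for non-fallthruness. The rule is the one from the description: after a return nothing falls through, so the next statement is reported unless it is a label, which a jump could still reach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified statement kinds; GIMPLE has many more.  */
enum stmt_kind { STMT_PLAIN, STMT_RETURN, STMT_LABEL };

/* Returns the index of the first statement that follows a return and is
   not a label, or -1 if every statement can be reached.  */
static long first_unreachable_after_return(const enum stmt_kind *stmts,
                                           size_t n)
{
    int can_fallthru = 1;
    for (size_t i = 0; i < n; i++) {
        if (stmts[i] == STMT_LABEL) {
            can_fallthru = 1; /* a jump may target a label */
            continue;
        }
        if (!can_fallthru)
            return (long)i;
        if (stmts[i] == STMT_RETURN)
            can_fallthru = 0;
    }
    return -1;
}
```

For example, { PLAIN, RETURN, PLAIN } is flagged at index 2, while { PLAIN, RETURN, LABEL, PLAIN } is not flagged.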
Re: [PATCH] x86_64: PR target/100711: Splitters for pandn
On Sun, Nov 28, 2021 at 2:25 PM Roger Sayle wrote: > > > This patch addresses PR target/100711 by introducing define_split > patterns so that not/broadcast/pand may be simplified (by combine) > to broadcast/pandn. This introduces two splitters one for optimizing > pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on > TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI. Each splitter > has its own new testcase. > > I've also confirmed that not/broadcast/pandn is already getting > simplified to broadcast/pand by the middle-end optimizers. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check with no new failures. Ok for mainline? > > > 2021-11-28 Roger Sayle > > gcc/ChangeLog > PR target/100711 > * config/i386/sse.md (define_split): New splitters to simplify > not;vec_duplicate;and as vec_duplicate;andn. > > gcc/testsuite/ChangeLog > PR target/100711 > * gcc.target/i386/pr100711-1.c: New test case. > * gcc.target/i386/pr100711-2.c: New test case. +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI48_128 0 "register_operand") + (and:VI48_128 + (vec_duplicate:VI48_128 +(not: + (match_operand: 1 "register_operand"))) + (match_operand:VI48_128 2 "register_operand")))] You can use "vector_operand" here; the resulting PANDN can handle these. + "TARGET_SSE && can_create_pseudo_p ()" This is a combine splitter, so can_create_pseudo_p () is not needed, because it runs only during the combine phase. FYI, the combine splitter is somewhat different from a normal splitter; the important part of the documentation is that the insn is *not* matched by some define_insn pattern, and the split results in exactly two patterns: The insn combiner phase also splits putative insns.
If three insns are merged into one insn with a complex expression that cannot be matched by some 'define_insn' pattern, the combiner phase attempts to split the complex pattern into two insns that are recognized. Usually it can break the complex pattern into two patterns by splitting out some subexpression. However, in some other cases, such as performing an addition of a large constant in two insns on a RISC machine, the way to split the addition into two insns is machine-dependent. + [(set (match_dup 3) + (vec_duplicate:VI48_128 (match_dup 1))) + (set (match_dup 0) + (and:VI48_128 (not:VI48_128 (match_dup 3)) + (match_dup 2)))] + "operands[3] = gen_reg_rtx (mode);") + +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI124_AVX2 0 "register_operand") + (and:VI124_AVX2 + (vec_duplicate:VI124_AVX2 +(not: + (match_operand: 1 "register_operand"))) + (match_operand:VI124_AVX2 2 "register_operand")))] + "TARGET_AVX2 && can_create_pseudo_p ()" + [(set (match_dup 3) + (vec_duplicate:VI124_AVX2 (match_dup 1))) + (set (match_dup 0) + (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3)) + (match_dup 2)))] + "operands[3] = gen_reg_rtx (mode);") Same here as above. +/* { dg-do compile } */ +/* { dg-options "-O2" } */ Please add -msse2 here; 32-bit targets do not enable SSE by default. Also, please check whether they handle DImode long long at all. Also, please run tests for x86_64 and i386 targets. The testsuite should be run with: make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}" (Optionally, you can use check-gcc instead of check and/or add i386.exp after --target_board.) Uros. +typedef int v4si __attribute__((vector_size (16))); +typedef long long v2di __attribute__((vector_size (16))); + +v4si foo (int a, v4si b) +{ +return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b; +} + +v2di bar (long long a, v2di b) +{ +return (__extension__ (v2di) {~a, ~a}) & b; +} > > Thanks in advance, > Roger -- >
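Both splitters rely on the identity broadcast(~a) & b == ~broadcast(a) & b: inverting the scalar before the broadcast gives the same lanes as inverting every lane after it, which is what lets PANDN absorb the NOT. Here is a small check of that identity with GCC's generic vector extension. The function names are hypothetical, and which instructions are actually emitted depends on the target flags.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Source form: broadcast the inverted scalar, then AND
   (not; broadcast; pand).  */
static v4si not_bcast_and(int a, v4si b)
{
    v4si dup_not = {~a, ~a, ~a, ~a};
    return dup_not & b;
}

/* Split form: broadcast the scalar, then AND-NOT (broadcast; pandn).  */
static v4si bcast_andn(int a, v4si b)
{
    v4si dup = {a, a, a, a};
    return ~dup & b;
}
```

Both functions compute the same value for every input; the splitter only changes where the inversion happens.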
Re: [PATCH] Restore can_be_invalidated_p semantics to before refactoring
On 11/26/2021 12:53 AM, Richard Biener via Gcc-patches wrote: This restores the semantics of can_be_invalidated_p to the original semantics of the function this was split out from tree-ssa-uninit.c. The current semantics only ever look at the first predicate which cannot be correct. Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? Thanks, Richard. 2021-11-26 Richard Biener * gimple-predicate-analysis.cc (can_be_invalidated_p): Restore semantics to the one before the split from tree-ssa-uninit.c. OK. Sorry this got missed in the review of splitting out those bits. jeff
Re: [PATCH] Remove unreachable returns
On 11/25/2021 7:16 AM, Richard Biener via Gcc-patches wrote: This removes unreachable return statements as diagnosed by the -Wunreachable-code patch. Some cases are more obviously an improvement than others - in fact some may get you the idea to replace them with gcc_unreachable () instead, leading to cases of the 'Remove unreachable gcc_unreachable () at the end of functions' patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? Comments? Feel free to approve select cases only. Thanks, Richard. 2021-11-25 Richard Biener * vec.c (qsort_chk): Do not return the void return value from the noreturn qsort_chk_error. * ccmp.c (expand_ccmp_expr_1): Remove unreachable return. * df-scan.c (df_ref_equal_p): Likewise. * dwarf2out.c (is_base_type): Likewise. (add_const_value_attribute): Likewise. * fixed-value.c (fixed_arithmetic): Likewise. * gimple-fold.c (gimple_fold_builtin_fputs): Likewise. * gimple-ssa-strength-reduction.c (stmt_cost): Likewise. * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_expr_op): Likewise. (gcc_expression_from_isl_expression): Likewise. * ipa-fnsummary.c (will_be_nonconstant_expr_predicate): Likewise. * lto-streamer-in.c (lto_input_mode_table): Likewise. gcc/c-family/ * c-opts.c (c_common_post_options): Remove unreachable return. * c-pragma.c (handle_pragma_target): Likewise. (handle_pragma_optimize): Likewise. gcc/c/ * c-typeck.c (c_tree_equal): Remove unreachable return. * c-parser.c (get_matching_symbol): Likewise. libgomp/ * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable return. I'd commit the whole set. jeff
Re: [PATCH] x86_64: Improved V1TImode rotations by non-constant amounts.
On Sun, Nov 28, 2021 at 3:02 PM Roger Sayle wrote:
>
> This patch builds on the recent improvements to TImode rotations (and
> Jakub's fixes to shldq/shrdq patterns). Now that expanding a TImode
> rotation can never fail, it is safe to allow general_operand constraints
> on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
> I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
> vec_extract via V2DImode, which avoids using memory and takes advantage
> of vpextrq on recent hardware.
>
> For the following test case:
>
> typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
> uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
>
> GCC with -O2 -mavx2 would previously generate:
>
> rotr:   vmovdqa %xmm0, -24(%rsp)
>         movq    -16(%rsp), %rdx
>         movl    %edi, %ecx
>         xorl    %esi, %esi
>         movq    -24(%rsp), %rax
>         shrdq   %rdx, %rax
>         shrq    %cl, %rdx
>         testb   $64, %dil
>         cmovne  %rdx, %rax
>         cmovne  %rsi, %rdx
>         negl    %ecx
>         xorl    %edi, %edi
>         andl    $127, %ecx
>         vmovq   %rax, %xmm2
>         movq    -24(%rsp), %rax
>         vpinsrq $1, %rdx, %xmm2, %xmm1
>         movq    -16(%rsp), %rdx
>         shldq   %rax, %rdx
>         salq    %cl, %rax
>         testb   $64, %cl
>         cmovne  %rax, %rdx
>         cmovne  %rdi, %rax
>         vmovq   %rax, %xmm3
>         vpinsrq $1, %rdx, %xmm3, %xmm0
>         vpor    %xmm1, %xmm0, %xmm0
>         ret
>
> with this patch, we now generate:
>
> rotr:   movl    %edi, %ecx
>         vpextrq $1, %xmm0, %rax
>         vmovq   %xmm0, %rdx
>         shrdq   %rax, %rdx
>         vmovq   %xmm0, %rsi
>         shrdq   %rsi, %rax
>         andl    $64, %ecx
>         movq    %rdx, %rsi
>         cmovne  %rax, %rsi
>         cmove   %rax, %rdx
>         vmovq   %rsi, %xmm0
>         vpinsrq $1, %rdx, %xmm0, %xmm0
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
> 2021-11-28  Roger Sayle
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
> conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
> * config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint > on QImode shift amounts from const_int_operand to general_operand. > > gcc/testsuite/ChangeLog > * gcc.target/i386/sse2-v1ti-rotate.c: New test case. OK. Thanks, Uros. > > > Thanks in advance, > Roger > -- >
Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions
On 11/25/2021 6:33 AM, Richard Biener via Gcc-patches wrote: It seems to be a style to place gcc_unreachable () after a switch that handles all cases with every case returning. Those are unreachable (well, yes!), so they will be elided at CFG construction time and the middle-end will place another __builtin_unreachable "after" them to note the path doesn't lead to a return when the function is not declared void. So IMHO those explicit gcc_unreachable () calls serve no purpose; at most they could be replaced by a comment. And with all cases covered, a switch that stops handling a case or stops returning will likely cause some diagnostic to be emitted, which is better than running into an ICE only at runtime. Bootstrapped and tested on x86_64-unknown-linux-gnu - any comments? Thanks, Richard. 2021-11-24 Richard Biener * tree.h (reverse_storage_order_for_component_p): Remove spurious gcc_unreachable. * cfganal.c (dfs_find_deadend): Likewise. * fold-const-call.c (fold_const_logb): Likewise. (fold_const_significand): Likewise. * gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p): Likewise. gcc/c-family/ * c-format.c (check_format_string): Remove spurious gcc_unreachable. They would be a check if someone added a case to the switch that didn't return. But we'd get a return-value warning if that happened. So I don't see that they serve much purpose.
--- gcc/c-family/c-format.c| 2 -- gcc/cfganal.c | 2 -- gcc/fold-const-call.c | 2 -- gcc/gimple-ssa-store-merging.c | 2 -- gcc/tree.h | 2 -- 5 files changed, 10 deletions(-) diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index e735e092043..617fb5ea626 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -296,8 +296,6 @@ check_format_string (const_tree fntype, unsigned HOST_WIDE_INT format_num, *no_add_attrs = true; return false; } - - gcc_unreachable (); } /* Under the control of FLAGS, verify EXPR is a valid constant that diff --git a/gcc/cfganal.c b/gcc/cfganal.c index 0cba612738d..48598e55c01 100644 --- a/gcc/cfganal.c +++ b/gcc/cfganal.c @@ -752,8 +752,6 @@ dfs_find_deadend (basic_block bb) next = e ? e->dest : EDGE_SUCC (bb, 0)->dest; } } - - gcc_unreachable (); } diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c index d6cb9b11a31..c542e780a18 100644 --- a/gcc/fold-const-call.c +++ b/gcc/fold-const-call.c @@ -429,7 +429,6 @@ fold_const_logb (real_value *result, const real_value *arg, } return false; } - gcc_unreachable (); } /* Try to evaluate: @@ -463,7 +462,6 @@ fold_const_significand (real_value *result, const real_value *arg, } return false; } - gcc_unreachable (); } /* Try to evaluate: diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c index e7c90ba8b59..13413ca4cd6 100644 --- a/gcc/gimple-ssa-store-merging.c +++ b/gcc/gimple-ssa-store-merging.c @@ -4861,8 +4861,6 @@ lhs_valid_for_store_merging_p (tree lhs) default: return false; } - - gcc_unreachable (); } /* Return true if the tree RHS is a constant we want to consider diff --git a/gcc/tree.h b/gcc/tree.h index f0e72b55abe..094501bd9b1 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -5110,8 +5110,6 @@ reverse_storage_order_for_component_p (tree t) default: return false; } - - gcc_unreachable (); } /* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
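To illustrate the point with a small, hypothetical function (not one of the functions touched by the patch): when every path through the switch returns, including default, the end of the function cannot be reached, so a trailing gcc_unreachable () adds nothing. If a later edit made one case break instead of returning, control could reach the end of the non-void function and -Wreturn-type would warn at compile time (in C it is enabled by -Wall; in C++ it is on by default).

```c
#include <assert.h>

enum shape { SHAPE_POINT, SHAPE_SEGMENT, SHAPE_TRIANGLE };

/* Every case returns and so does default, so no statement after the
   switch is needed.  */
static int vertex_count(enum shape s)
{
    switch (s) {
    case SHAPE_POINT:
        return 1;
    case SHAPE_SEGMENT:
        return 2;
    case SHAPE_TRIANGLE:
        return 3;
    default:
        return 0;
    }
}
```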
Compare guessed profile frequencies to actual profile feedback in profile dump file
Hi, this patch adds simple code to dump and compare the frequencies of basic blocks read from the profile feedback against the frequencies guessed statically. It dumps basic blocks in order of decreasing feedback frequency, along with the guessed frequencies and histograms. This makes it possible to spot basic blocks in hot regions that the guessed profile considers cold, or vice versa. I am trying to figure out how realistic our profile estimate is compared to the one read from feedback on exchange2 (looking again into PR98782). There, IRA now places spills into hot regions of code, while with the older (and worse) profile it did not. The catch is that the function is very large and has 9 nested loops, so it is hard to figure out how to improve the profile estimate and/or IRA. So here I get:

Basic block 136 guessed freq: 17.548 cummulative: 0.60% feedback freq: 51.848 cummulative: 1.94% cnt: 101811269914
Basic block 137 guessed freq: 15.618 cummulative: 1.14% feedback freq: 46.471 cummulative: 3.69% cnt: 101623431810
Basic block 258 guessed freq: 15.256 cummulative: 1.67% feedback freq: 46.467 cummulative: 5.43% cnt: 101623295779
Basic block 155 guessed freq: 25.458 cummulative: 2.54% feedback freq: 39.772 cummulative: 6.92% cnt: 101389409933
Basic block 98 guessed freq: 9.773 cummulative: 2.88% feedback freq: 36.496 cummulative: 8.29% cnt: 101274937256
Basic block 156 guessed freq: 22.658 cummulative: 3.66% feedback freq: 35.642 cummulative: 9.62% cnt: 101245112573
Basic block 242 guessed freq: 22.296 cummulative: 4.42% feedback freq: 35.638 cummulative: 10.96% cnt: 101244976542
Basic block 99 guessed freq: 8.698 cummulative: 4.72% feedback freq: 32.558 cummulative: 12.18% cnt: 101137388587
Basic block 290 guessed freq: 8.336 cummulative: 5.01% feedback freq: 32.554 cummulative: 13.40% cnt: 101137252556
Basic block 79 guessed freq: 7.975 cummulative: 5.28% feedback freq: 31.622 cummulative: 14.58% cnt: 101104687116
Basic block 80 guessed freq: 7.098 cummulative: 5.53% feedback freq: 28.448 cummulative: 15.65% cnt: 10993797250
Basic block 306 guessed freq: 6.735 cummulative: 5.76% feedback freq: 28.444 cummulative: 16.72% cnt: 10993661219
Basic block 101 guessed freq: 8.807 cummulative: 6.06% feedback freq: 26.463 cummulative: 17.71% cnt: 10924453185
Basic block 276 guessed freq: 6.996 cummulative: 6.30% feedback freq: 26.443 cummulative: 18.70% cnt: 10923773030
Basic block 82 guessed freq: 8.622 cummulative: 6.60% feedback freq: 23.648 cummulative: 19.59% cnt: 10826108200
Basic block 292 guessed freq: 6.449 cummulative: 6.82% feedback freq: 23.624 cummulative: 20.47% cnt: 10825292014
Basic block 117 guessed freq: 12.720 cummulative: 7.26% feedback freq: 23.190 cummulative: 21.34% cnt: 10810135530
Basic block 63 guessed freq: 8.673 cummulative: 7.56% feedback freq: 22.247 cummulative: 22.17% cnt: 10777181279
Basic block 308 guessed freq: 6.139 cummulative: 7.77% feedback freq: 22.220 cummulative: 23.01% cnt: 10776229062
Basic block 120 guessed freq: 9.170 cummulative: 8.09% feedback freq: 21.523 cummulative: 23.81% cnt: 10751896540
Basic block 260 guessed freq: 7.721 cummulative: 8.35% feedback freq: 21.508 cummulative: 24.62% cnt: 10751352416
Basic block 44 guessed freq: 8.949 cummulative: 8.66% feedback freq: 21.257 cummulative: 25.42% cnt: 10742592992
Basic block 324 guessed freq: 6.052 cummulative: 8.87% feedback freq: 21.226 cummulative: 26.21% cnt: 10741504744
Basic block 102 guessed freq: 7.046 cummulative: 9.11% feedback freq: 21.170 cummulative: 27.01% cnt: 10739562548
Basic block 277 guessed freq: 5.597 cummulative: 9.30% feedback freq: 21.155 cummulative: 27.80% cnt: 10739018424
Basic block 123 guessed freq: 20.841 cummulative: 10.02% feedback freq: 20.405 cummulative: 28.56% cnt: 10712829670
Basic block 262 guessed freq: 17.548 cummulative: 10.62% feedback freq: 20.386 cummulative: 29.33% cnt: 10712168948
Basic block 104 guessed freq: 16.013 cummulative: 11.17% feedback freq: 20.014 cummulative: 30.08% cnt: 10699178597
Basic block 278 guessed freq: 12.720 cummulative: 11.61% feedback freq: 19.995 cummulative: 30.83% cnt: 10698517875
Basic block 83 guessed freq: 7.185 cummulative: 11.86% feedback freq: 19.706 cummulative: 31.57% cnt: 10688423500
Basic block 293 guessed freq: 5.374 cummulative: 12.04% feedback freq: 19.687 cummulative: 32.30% cnt: 10687743345
Basic block 64 guessed freq: 7.434 cummulative: 12.30% feedback freq:
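Here is a minimal sketch of the comparison described above. It is written in plain C rather than against GCC's internal CFG API: the struct bb_freq record and the function names are hypothetical, and frequencies are plain doubles. Blocks are sorted by decreasing feedback frequency, and each line shows the running share of the total guessed and feedback frequency covered so far. A block near the top of the list that contributes little to the guessed share is one the static estimate treats as colder than it really is. In the dump above, for example, block 98 has feedback frequency 36.496 but guessed frequency 9.773.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-block record; a real pass would read these from the CFG. */
struct bb_freq {
    int index;
    double guessed;  /* statically estimated frequency */
    double feedback; /* frequency derived from profile feedback */
};

/* qsort comparator: decreasing feedback frequency.  */
static int by_feedback_desc(const void *pa, const void *pb)
{
    const struct bb_freq *a = pa, *b = pb;
    return (a->feedback < b->feedback) - (a->feedback > b->feedback);
}

/* Sorts BBS in place, prints one line per block to OUT (if non-NULL) and
   returns the cumulative guessed share, in percent, of the TOPN blocks
   that are hottest according to feedback.  */
static double compare_profiles(struct bb_freq *bbs, size_t n, size_t topn,
                               FILE *out)
{
    double total_guessed = 0, total_feedback = 0;
    for (size_t i = 0; i < n; i++) {
        total_guessed += bbs[i].guessed;
        total_feedback += bbs[i].feedback;
    }
    assert(total_guessed > 0 && total_feedback > 0 && topn <= n);

    qsort(bbs, n, sizeof *bbs, by_feedback_desc);

    double cum_guessed = 0, cum_feedback = 0, top_share = 0;
    for (size_t i = 0; i < n; i++) {
        cum_guessed += bbs[i].guessed;
        cum_feedback += bbs[i].feedback;
        if (i + 1 == topn)
            top_share = 100.0 * cum_guessed / total_guessed;
        if (out != NULL)
            fprintf(out,
                    "Basic block %d guessed freq: %8.3f cumulative: %6.2f%%"
                    " feedback freq: %8.3f cumulative: %6.2f%%\n",
                    bbs[i].index, bbs[i].guessed,
                    100.0 * cum_guessed / total_guessed, bbs[i].feedback,
                    100.0 * cum_feedback / total_feedback);
    }
    return top_share;
}
```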
Re: [PATCH] [RFC] unreachable returns
On 11/25/2021 6:23 AM, Richard Biener via Gcc-patches wrote: We have quite a number of "default" returns that cannot be reached. One is particularly interesting since it says (see patch below):

  default:
    gcc_unreachable ();
  }
  /* We can get here with --disable-checking. */
  return false;

which suggests that _maybe_ the intention was for a fallthru to "somewhere" (gcc_unreachable () expands to __builtin_unreachable () with --disable-checking) to be caught with a "sane" default return value rather than falling through to the next function or so. BUT - that isn't what actually happens, since the 'return false' is unreachable after CFG construction and will be elided. In fact the IL after CFG construction is exactly the same with and without the spurious return. Now, I wonder if we should, instead of expanding gcc_unreachable to __builtin_unreachable () with --disable-checking, expand it to __builtin_trap () (or remove the --disable-checking variant completely, always retaining assert level checking but maybe making it cheaper in size by using __builtin_trap () or abort ()). Thoughts? That said, I do have a set of changes removing such spurious returns. 2021-11-25 Richard Biener gcc/c/ * c-typeck.c (c_tree_equal): Remove unreachable return. I'd bet if you dig into the history you'll find that the return was added first to make enable-checking happy, then later we added the gcc_unreachable(). I think expanding to __builtin_trap is highly preferable to __builtin_unreachable and it's probably the lowest overhead option. I can also live with removing the --disable-checking variant and instead using something that always halts execution. Once we're always halting execution on that path I have no objection to removing the extraneous return. jeff
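Here is a small sketch of the direction Jeff favors. It uses a hypothetical macro in place of GCC's real gcc_unreachable () definition, which is not reproduced here. Because __builtin_trap () does not return, the compiler knows control cannot leave the default case, so no trailing `return false` is needed to keep -Wreturn-type quiet. Unlike __builtin_unreachable (), reaching that point stops the program instead of being undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a --disable-checking gcc_unreachable ().
   __builtin_unreachable () would let the compiler assume this point is
   never reached (undefined behavior if it is); __builtin_trap () halts.  */
#define sketch_unreachable() __builtin_trap ()

static bool is_even_digit(int d)
{
    switch (d) {
    case 0: case 2: case 4: case 6: case 8:
        return true;
    case 1: case 3: case 5: case 7: case 9:
        return false;
    default:
        sketch_unreachable ();
    }
}
```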
[PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE
From: Andrew Pinski This just adds a simplification to simplify_vector_constructor for vector of 1 element to be VCE which should reduce memory usage in the compiler and maybe allow for some more optimizations. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/101540 gcc/ChangeLog: * tree-ssa-forwprop.c (simplify_vector_constructor): Simplify constructor of vector of 1 element to just be a VIEW_CONVERT_EXPR. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr101540-1.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c | 13 + gcc/tree-ssa-forwprop.c| 13 + 2 files changed, 26 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c new file mode 100644 index 000..73fb342e029 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop1" } */ +/* PR tree-optimization/101540 */ +typedef unsigned char __attribute__((__vector_size__ (1))) W; + +W foo (unsigned char uc) +{ + return (W){uc}; +} +/* The constructor in the above function should be converted into a VCE. */ +/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 1 "forwprop1"} } */ +// {uc_1(D)} +/* { dg-final { scan-tree-dump-times "{uc_\[0-9\]+.D.}" 0 "forwprop1"} } */ diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c index a830bab78ba..94b92d3d0af 100644 --- a/gcc/tree-ssa-forwprop.c +++ b/gcc/tree-ssa-forwprop.c @@ -2392,6 +2392,19 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi) elem_type = TREE_TYPE (type); elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); + /* Special case V1 constructor with the same type to being a VCE. 
*/ + if (nelts == 1 && CONSTRUCTOR_NELTS (op) == 1) +{ + tree op1 = CONSTRUCTOR_ELT (op, 0)->value; + if (useless_type_conversion_p (elem_type, TREE_TYPE (op1))) + { + op1 = build1 (VIEW_CONVERT_EXPR, type, op1); + gimple_assign_set_rhs_from_tree (gsi, op1); + update_stmt (gsi_stmt (*gsi)); + return true; + } +} + orig[0] = NULL; orig[1] = NULL; conv_code = ERROR_MARK; -- 2.17.1
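As a source-level analogy only (GIMPLE's VIEW_CONVERT_EXPR and NOP_EXPR are compiler-internal and have no C spelling), the sketch below shows why the one-element case is a pure reinterpretation. Building the vector element-wise, as the CONSTRUCTOR {uc} does, yields exactly the bytes of the scalar, which is what a bit-for-bit view conversion expresses. The function names are hypothetical. Whether GIMPLE would also accept a NOP_EXPR here is the question Jeff raises, and this sketch does not settle it.

```c
#include <assert.h>
#include <string.h>

/* One-element GCC generic vector, as in the pr101540-1.c testcase.  */
typedef unsigned char W __attribute__((__vector_size__(1)));

/* Element-wise construction, like the CONSTRUCTOR {uc}.  */
static W from_constructor(unsigned char uc)
{
    return (W){uc};
}

/* Bit-for-bit reinterpretation of the scalar as the vector type.  */
static W from_bit_copy(unsigned char uc)
{
    W w;
    _Static_assert(sizeof(W) == sizeof(unsigned char),
                   "a one-element vector has its element's size");
    memcpy(&w, &uc, sizeof w);
    return w;
}
```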
Re: [PATCH 1/4] libgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target
Hi Rasmus, (making progress but not quite there on the stdint business) > On 1 Nov 2021, at 10:34, Rasmus Villemoes wrote: > > Since commit 78e49fb1bc (Introduce vxworks specific crtstuff support), > the generic crtbegin.o/crtend.o have been unnecessary to build. So > remove them from extra_parts. > > This is effectively a revert of commit 9a5b8df70 (libgcc: add > crt{begin,end} for powerpc-wrs-vxworks target). > > libgcc/ > * config.host (powerpc-wrs-vxworks): Do not add crtbegin.o and > crtend.o to extra_parts. Yes, this one is ok, thanks!
[PATCH] x86_64: Improved V1TImode rotations by non-constant amounts.
This patch builds on the recent improvements to TImode rotations (and Jakub's fixes to shldq/shrdq patterns). Now that expanding a TImode rotation can never fail, it is safe to allow general_operand constraints on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns. I've also made an additional tweak to ix86_expand_v1ti_to_ti to use vec_extract via V2DImode, which avoids using memory and takes advantage of vpextrq on recent hardware.

For the following test case:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }

GCC with -O2 -mavx2 would previously generate:

rotr:   vmovdqa %xmm0, -24(%rsp)
        movq    -16(%rsp), %rdx
        movl    %edi, %ecx
        xorl    %esi, %esi
        movq    -24(%rsp), %rax
        shrdq   %rdx, %rax
        shrq    %cl, %rdx
        testb   $64, %dil
        cmovne  %rdx, %rax
        cmovne  %rsi, %rdx
        negl    %ecx
        xorl    %edi, %edi
        andl    $127, %ecx
        vmovq   %rax, %xmm2
        movq    -24(%rsp), %rax
        vpinsrq $1, %rdx, %xmm2, %xmm1
        movq    -16(%rsp), %rdx
        shldq   %rax, %rdx
        salq    %cl, %rax
        testb   $64, %cl
        cmovne  %rax, %rdx
        cmovne  %rdi, %rax
        vmovq   %rax, %xmm3
        vpinsrq $1, %rdx, %xmm3, %xmm0
        vpor    %xmm1, %xmm0, %xmm0
        ret

with this patch, we now generate:

rotr:   movl    %edi, %ecx
        vpextrq $1, %xmm0, %rax
        vmovq   %xmm0, %rdx
        shrdq   %rax, %rdx
        vmovq   %xmm0, %rsi
        shrdq   %rsi, %rax
        andl    $64, %ecx
        movq    %rdx, %rsi
        cmovne  %rax, %rsi
        cmove   %rax, %rdx
        vmovq   %rsi, %xmm0
        vpinsrq $1, %rdx, %xmm0, %xmm0
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline?

2021-11-28  Roger Sayle

gcc/ChangeLog
	* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
	conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
	* config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
	on QImode shift amounts from const_int_operand to general_operand.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse2-v1ti-rotate.c: New test case.
Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 088e6af..1e9734b 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -6162,7 +6162,17 @@ static rtx ix86_expand_v1ti_to_ti (rtx x) { rtx result = gen_reg_rtx (TImode); - emit_move_insn (result, gen_lowpart (TImode, x)); + if (TARGET_SSE2) +{ + rtx temp = gen_reg_rtx (V2DImode); + emit_move_insn (temp, gen_lowpart (V2DImode, x)); + rtx lo = gen_lowpart (DImode, result); + emit_insn (gen_vec_extractv2didi (lo, temp, const0_rtx)); + rtx hi = gen_highpart (DImode, result); + emit_insn (gen_vec_extractv2didi (hi, temp, const1_rtx)); +} + else +emit_move_insn (result, gen_lowpart (TImode, x)); return result; } diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2764a25..459eec9 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -15169,7 +15169,7 @@ [(set (match_operand:V1TI 0 "register_operand") (rotate:V1TI (match_operand:V1TI 1 "register_operand") -(match_operand:QI 2 "const_int_operand")))] +(match_operand:QI 2 "general_operand")))] "TARGET_SSE2 && TARGET_64BIT" { ix86_expand_v1ti_rotate (ROTATE, operands); @@ -15180,7 +15180,7 @@ [(set (match_operand:V1TI 0 "register_operand") (rotatert:V1TI (match_operand:V1TI 1 "register_operand") -(match_operand:QI 2 "const_int_operand")))] +(match_operand:QI 2 "general_operand")))] "TARGET_SSE2 && TARGET_64BIT" { ix86_expand_v1ti_rotate (ROTATERT, operands); diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c new file mode 100644 index 000..b4b2814 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2" } */ +/* { dg-require-effective-target sse2 } */ + +typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); + +uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); } 
+uv1ti rotl(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); } + +/* { dg-final { scan-assembler-not "shrq" } } */ +/* { dg-final { scan-assembler-not "salq" } } */
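For reference, here is a portable C sketch of what the new sequence computes: a right rotation of a 128-bit value held as two 64-bit halves. The halves are swapped when bit 6 of the count is set (the `andl $64` / cmov pair), and both halves are funnel-shifted by the count modulo 64 (the two shrdq). The two steps commute, so the sketch swaps first. The function names are hypothetical, and the code assumes a 64-bit GCC target with unsigned __int128. The reference version masks both shift counts so that a count of 0 stays well defined, because for a scalar unsigned __int128 a shift by 128 would be undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the 128-bit value {*hi:*lo} right by I (taken mod 128).  */
static void rotr128_halves(uint64_t *lo, uint64_t *hi, unsigned i)
{
    uint64_t l = *lo, h = *hi;
    if (i & 64) { /* rotating by 64 just swaps the halves */
        uint64_t t = l;
        l = h;
        h = t;
    }
    unsigned s = i & 63;
    if (s != 0) { /* funnel shift; s == 0 is skipped to avoid a 64-bit shift */
        uint64_t nl = (l >> s) | (h << (64 - s));
        uint64_t nh = (h >> s) | (l << (64 - s));
        l = nl;
        h = nh;
    }
    *lo = l;
    *hi = h;
}

/* Reference rotation on unsigned __int128 with masked shift counts.  */
static unsigned __int128 rotr128_ref(unsigned __int128 x, unsigned i)
{
    return (x >> (i & 127)) | (x << (-i & 127));
}
```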
[PATCH] x86_64: PR target/100711: Splitters for pandn
This patch addresses PR target/100711 by introducing define_split patterns so that not/broadcast/pand may be simplified (by combine) to broadcast/pandn. This introduces two splitters one for optimizing pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI. Each splitter has its own new testcase. I've also confirmed that not/broadcast/pandn is already getting simplified to broadcast/pand by the middle-end optimizers. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2021-11-28 Roger Sayle gcc/ChangeLog PR target/100711 * config/i386/sse.md (define_split): New splitters to simplify not;vec_duplicate;and as vec_duplicate;andn. gcc/testsuite/ChangeLog PR target/100711 * gcc.target/i386/pr100711-1.c: New test case. * gcc.target/i386/pr100711-2.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b109c2a..7147bc1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -16323,6 +16323,38 @@ ] (const_string "")))]) +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI48_128 0 "register_operand") + (and:VI48_128 + (vec_duplicate:VI48_128 + (not: + (match_operand: 1 "register_operand"))) + (match_operand:VI48_128 2 "register_operand")))] + "TARGET_SSE && can_create_pseudo_p ()" + [(set (match_dup 3) + (vec_duplicate:VI48_128 (match_dup 1))) + (set (match_dup 0) + (and:VI48_128 (not:VI48_128 (match_dup 3)) + (match_dup 2)))] + "operands[3] = gen_reg_rtx (mode);") + +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI124_AVX2 0 "register_operand") + (and:VI124_AVX2 + (vec_duplicate:VI124_AVX2 + (not: + (match_operand: 1 "register_operand"))) + (match_operand:VI124_AVX2 2 "register_operand")))] + "TARGET_AVX2 && can_create_pseudo_p ()" + 
+  [(set (match_dup 3)
+        (vec_duplicate:VI124_AVX2 (match_dup 1)))
+   (set (match_dup 0)
+        (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3))
+                        (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (mode);")
+
 (define_insn "*andnot3_mask"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
         (vec_merge:VI48_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-1.c b/gcc/testsuite/gcc.target/i386/pr100711-1.c
new file mode 100644
index 000..81112f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100711-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+v4si foo (int a, v4si b)
+{
+  return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
+}
+
+v2di bar (long long a, v2di b)
+{
+  return (__extension__ (v2di) {~a, ~a}) & b;
+}
+
+/* { dg-final { scan-assembler-times "pandn" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-2.c b/gcc/testsuite/gcc.target/i386/pr100711-2.c
new file mode 100644
index 000..ccaf168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100711-2.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v16qi foo_v16qi (char a, v16qi b)
+{
+  return (__extension__ (v16qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+                                 ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v8hi foo_v8hi (short a, v8hi b)
+{
+  return (__extension__ (v8hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b;
+}
+
+v4si foo_v4si (int a, v4si b)
+{
+  return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
+}
+
+v32qi foo_v32qi (char a, v32qi b)
+{
+  return (__extension__ (v32qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+                                 ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+                                 ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+                                 ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v16hi foo_v16hi (short a, v16hi b)
+{
+  return (__extension__ (v16hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+                                 ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v8si foo_v8si (int a, v8si b)
+{
+  return (__extension__ (v8si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b;
+}
+
+/* { dg-final { scan-assembler-times "vpandn" 6 } } */
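As a minimal sketch (not part of the patch), the identity the two splitters rely on can be checked with GCC's generic vector extension: broadcasting ~a and ANDing with b yields the same lanes as broadcasting a and then applying AND-NOT, which is what pandn computes ((NOT dest) AND src). The helper names below are hypothetical, and only the 128-bit V4SI case from pr100711-1.c is modelled.

```c
#include <assert.h>

/* GCC/Clang generic vector type, as in pr100711-1.c.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Shape matched by the splitter: not; broadcast; and.  */
static v4si not_broadcast_and (int a, v4si b)
{
  v4si na = { ~a, ~a, ~a, ~a };
  return na & b;
}

/* Shape produced by the splitter: broadcast; andn.
   pandn computes (~dest) & src, modelled here as ~va & b.  */
static v4si broadcast_andn (int a, v4si b)
{
  v4si va = { a, a, a, a };
  return ~va & b;
}

/* Check, lane by lane, that both shapes agree for the given inputs.  */
static void assert_shapes_agree (int a, v4si b)
{
  v4si x = not_broadcast_and (a, b);
  v4si y = broadcast_andn (a, b);
  for (int i = 0; i < 4; i++)
    assert (x[i] == y[i]);
}
```

For example, with b = { 1, -1, 0x0f0f0f0f, 0 } and a = 0x0f, broadcast_andn returns { 0, ~0x0f, 0x0f0f0f00, 0 }. The splitter form is preferable because the NOT is folded into pandn/vpandn instead of costing a separate scalar notl before the broadcast.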
Re: [PATCH] d: fix ASAN in option processing
On 11/25/21 14:59, Martin Liška wrote:
> Fixes:
>
> ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0666ca5c at pc 0x00ef094b bp 0x7fff8180 sp 0x7fff8178
> READ of size 4 at 0x0666ca5c thread T0
>     #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
>     #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
>     #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
>     #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
>     #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
>     #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
>     #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
>     #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
>     #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146
>
> for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.
>
> Ready for master?
> Thanks,
> Martin
>
> gcc/d/ChangeLog:
>
>         * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options.
> ---
>  gcc/d/d-attribs.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> index d81b7d122f7..1ec800526f7 100644
> --- a/gcc/d/d-attribs.cc
> +++ b/gcc/d/d-attribs.cc
> @@ -852,7 +852,9 @@ parse_optimize_options (tree args)
>    unsigned j = 1;
>    for (unsigned i = 1; i < decoded_options_count; ++i)
>      {
> -      if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION))
> +      unsigned opt_index = decoded_options[i].opt_index;
> +      if (opt_index >= cl_options_count
> +          && ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
>          {
>            ret = false;
>            warning (OPT_Wattributes,

Sorry, I made a stupid thinko in the patch. Here's a fix that I'm going to install.
Martin

From 7a66c4909fd175ba429f39a3ca30be39ea02ae64 Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Sun, 28 Nov 2021 09:39:40 +0100
Subject: [PATCH] d: fix thinko in optimize attr parsing

gcc/d/ChangeLog:

        * d-attribs.cc (parse_optimize_options): Fix thinko.
---
 gcc/d/d-attribs.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index 1ec800526f7..b79cf96f55c 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -854,7 +854,7 @@ parse_optimize_options (tree args)
     {
       unsigned opt_index = decoded_options[i].opt_index;
       if (opt_index >= cl_options_count
-          && ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
+          || ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
         {
           ret = false;
           warning (OPT_Wattributes,
-- 
2.34.0
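The thinko is a logic slip in the guard: the bounds test has to be joined with ||, so that an out-of-range index is rejected before cl_options[] is read. With &&, every in-range option is silently accepted, and the table is read exactly when the index is out of range. A minimal sketch of the corrected predicate follows; options_table, options_count and FLAG_OPTIMIZATION are hypothetical stand-ins for cl_options, cl_options_count and CL_OPTIMIZATION, not GCC's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CL_OPTIMIZATION.  */
#define FLAG_OPTIMIZATION 0x1u

struct option_entry
{
  unsigned flags;
};

/* Hypothetical stand-in for cl_options[].  */
static const struct option_entry options_table[] = {
  { FLAG_OPTIMIZATION },  /* index 0: an optimization option */
  { 0 },                  /* index 1: not an optimization option */
  { FLAG_OPTIMIZATION },  /* index 2: an optimization option */
};

/* Hypothetical stand-in for cl_options_count.  */
static const unsigned options_count
  = sizeof options_table / sizeof options_table[0];

/* Return true if the option at IDX must be rejected.  Because ||
   short-circuits, options_table[idx] is only read once idx is known
   to be in range.  Using && here (the original thinko) would accept
   every in-range option and read past the end for the others.  */
static bool reject_option (unsigned idx)
{
  return idx >= options_count
         || !(options_table[idx].flags & FLAG_OPTIMIZATION);
}

/* Spot-check the predicate against the table above.  */
static void check_reject_option (void)
{
  assert (!reject_option (0));
  assert (reject_option (1));
  assert (!reject_option (2));
  assert (reject_option (3));  /* out of range: rejected, no table read */
}
```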
Re: LoongArch Port
On Sat, 2021-11-27 at 16:27 +0800, chenglulu wrote:
> The LoongArch architecture (LoongArch) is an Instruction Set
> Architecture (ISA) that has a Reduced Instruction Set Computer (RISC)
> style.
> The documents are on
> https://loongson.github.io/LoongArch-Documentation/README-EN.html
>
> The ELF ABI Documents are on:
> https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
>
> The binutils has been merged into trunk:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307
>
> Note: this GCC port requires the following patch applied to binutils
> to build.
> https://github.com/loongson/binutils-gdb/commit/aacb0bf860f02aa5a7dcb76dd0e392bf871c7586
> (will be submitted to upstream soon)

Native bootstrap succeeds at r12-5560 with the patches applied, the
problematic code thunk mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585586.html
removed, and the IN_LIBGCC2 -> IN_LIBGCC2 || IN_TARGET_LIBS change
mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585589.html
applied.  The test summary is attached.
-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

Native configuration is loongarch64-unknown-linux-gnu

                === gcc tests ===

Running target unix
FAIL: gcc.dg/analyzer/analyzer-verbosity-2a.c (test for excess errors)
FAIL: gcc.dg/analyzer/analyzer-verbosity-3a.c (test for excess errors)
FAIL: gcc.dg/analyzer/edges-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-2.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-paths-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-pr58237.c (test for excess errors)
FAIL: gcc.dg/analyzer/pr99716-1.c (test for excess errors)
FAIL: gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL: gcc.dg/Warray-bounds-48.c pr102706 (test for warnings, line 33)
FAIL: gcc.dg/Warray-bounds-48.c pr102706 (test for warnings, line 133)
FAIL: gcc.dg/Wzero-length-array-bounds-2.c (test for excess errors)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for signed char (test for warnings, line 50)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for short (test for warnings, line 51)
FAIL: gcc.dg/builtin-apply2.c execution test
FAIL: gcc.dg/pr102892-1.c (test for excess errors)
FAIL: gcc.dg/pr44194-1.c scan-rtl-dump dse1 "global deletions = (2|3)"
FAIL: gcc.dg/pr44194-1.c scan-rtl-dump-not final "insn[: ][^\\n]*set (mem(?![^\\n]*scratch)"
FAIL: gcc.dg/stack-usage-1.c scan-stack-usage foo\\t(256|264)\\tstatic
XPASS: gcc.dg/uninit-pred-7_a.c bogus warning (test for bogus messages, line 26)
FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 20)
FAIL: c-c++-common/attr-retain-5.c -Wc++-compat (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c -Wc++-compat (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c -Wc++-compat (test for excess errors)
FAIL: c-c++-common/spec-barrier-1.c -Wc++-compat (test for excess errors)
FAIL: gcc.dg/fixed-point/composite-type.c (test for excess errors)
FAIL: gcc.dg/torture/fp-uint64-convert-double-1.c -O3 -g (internal compiler error)
FAIL: gcc.dg/torture/fp-uint64-convert-double-1.c -O3 -g (test for excess errors)
UNRESOLVED: gcc.dg/torture/fp-uint64-convert-double-1.c -O3 -g compilation failed to produce executable
FAIL: gcc.dg/torture/fp-uint64-convert-double-2.c -O3 -g (internal compiler error)
FAIL: gcc.dg/torture/fp-uint64-convert-double-2.c -O3 -g (test for excess errors)
UNRESOLVED: gcc.dg/torture/fp-uint64-convert-double-2.c -O3 -g compilation failed to produce executable
XPASS: gcc.dg/tree-ssa/20040204-1.c scan-tree-dump-times optimized "link_error" 0
FAIL: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"
FAIL: gcc.dg/tree-ssa/ssa-sink-18.c scan-tree-dump-times sink2 "Sunk statements: 4" 1

                === gcc Summary ===

# of expected passes            130270
# of unexpected failures        29
# of unexpected successes       4
# of expected failures          861
# of unresolved testcases       2
# of unsupported tests          2235
/home/xry111/gcc-test/gcc-12-larch-20211128/build/gcc/xgcc  version 12.0.0 20211127 (experimental) (GCC)

                === gfortran tests ===

Running target unix
FAIL: gfortran.dg/bind_c_array_params_2.f90 -O scan-assembler-times [ \\t][\$,_0-9]*myBindC 1
FAIL: gfortran.dg/pr95690.f90 -O (test for errors, line 6)
FAIL: gfortran.dg/pr95690.f90 -O (test for excess errors)
FAIL: gfortran.dg/reshape_shape_2.f90 -O (internal compiler error)
FAIL: gfortran.dg/reshape_shape_2.f90 -O (test for errors, line 6)
FAIL: gfortran.dg/reshape_shape_2.f90 -O (test for excess errors)
FAIL: gfortran.dg/vector_subscript_1.f90 -O1 execution test
FAIL: gfortran.dg/vector_subs