[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Biener  ---
Fixed, possibly latent.  Note this didn't fix the vector types chosen (bool
patterns ...) but instead hardens against choices that don't work out.

A missed-optimization bug might be filed (ISTR there are some around bool
patterns already).

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f4e92d62dccb96ade753f3a8f49be1b5f61c31f1

commit r14-9666-gf4e92d62dccb96ade753f3a8f49be1b5f61c31f1
Author: Richard Biener 
Date:   Tue Mar 26 09:11:00 2024 +0100

tree-optimization/114471 - ICE with mismatching vector types

The following fixes too lax verification of vector type compatibility
in vectorizable_operation.  When we only have a single vector size then
comparing the number of elements is enough but with SLP we mix those
and thus for operations like BIT_AND_EXPR we need to verify compatible
element types as well.  Allow sign changes for ABSU_EXPR though.

PR tree-optimization/114471
* tree-vect-stmts.cc (vectorizable_operation): Verify operand
types are compatible with the result type.

* gcc.dg/vect/pr114471.c: New testcase.
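
The stricter check described in this commit message can be modelled outside GCC. The sketch below is a simplified illustration, not the code in tree-vect-stmts.cc: `struct vec_type`, `enum op_kind` and `operand_compatible` are hypothetical names, and the real check operates on GCC tree types rather than on a plain struct.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a GCC vector type.  */
struct vec_type {
  unsigned nunits;     /* number of elements */
  unsigned elem_bits;  /* width of one element in bits */
  bool is_unsigned;    /* signedness of the element type */
  bool is_bool;        /* element is a mask/boolean */
};

/* Hypothetical operation kinds; only the two named in the commit.  */
enum op_kind { OP_BIT_AND, OP_ABSU };

/* Return true if an operand of type OP may feed an operation of kind
   KIND whose result has type RES.  Comparing only nunits is the lax
   check; this version also compares the element types.  */
static bool
operand_compatible (enum op_kind kind, struct vec_type res,
                    struct vec_type op)
{
  if (res.nunits != op.nunits)
    return false;
  if (res.elem_bits != op.elem_bits || res.is_bool != op.is_bool)
    return false;
  /* ABSU takes a signed input and yields an unsigned result, so a
     sign change is tolerated for that operation only.  */
  if (kind != OP_ABSU && res.is_unsigned != op.is_unsigned)
    return false;
  return true;
}
```

With made-up types, two 2-element masks of different element width are rejected for OP_BIT_AND even though their element counts match, while a signed-to-unsigned OP_ABSU on equal-width elements is accepted.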

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Richard Biener  ---
Mine.

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

--- Comment #6 from Hongtao Liu  ---
(In reply to Hongtao Liu from comment #5)
> Maybe we should always use kmask under AVX512, currently only >= 128-bits
> vector of vector _Float16 use kmask, < 128 bits vector still use vector mask.
> 
and we need to support vec_cmp/vcond_mask for 64/32/16-bit vectors.
For the testcase, no kmask is used at all, so why doesn't x86-64-v3 issue an
error?

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

Hongtao Liu  changed:

   What|Removed |Added

 CC||liuhongt at gcc dot gnu.org

--- Comment #5 from Hongtao Liu  ---
Maybe we should always use kmask under AVX512, currently only >= 128-bits
vector of vector _Float16 use kmask, < 128 bits vector still use vector mask.

  /* Scalar mask case.  */
  if ((TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
      || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16))
      /* AVX512FP16 only supports vector comparison
         to kmask for _Float16.  */
      || (TARGET_AVX512VL && TARGET_AVX512FP16
          && GET_MODE_INNER (data_mode) == E_HFmode))
    {
      if (elem_size == 4
          || elem_size == 8
          || (TARGET_AVX512BW && (elem_size == 1 || elem_size == 2)))
        return smallest_int_mode_for_size (nunits);
    }

  scalar_int_mode elem_mode
    = smallest_int_mode_for_size (elem_size * BITS_PER_UNIT);

  gcc_assert (elem_size * nunits == vector_size);

  return mode_for_vector (elem_mode, nunits);
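
To see which mask kind the quoted condition picks for a given vector, the logic can be modelled in plain C. This is a sketch under assumptions: `struct isa` and `uses_kmask` are hypothetical names, the boolean fields stand in for GCC's TARGET_* macros, and the rest of the real function is omitted.

```c
#include <stdbool.h>

/* Hypothetical flags standing in for GCC's TARGET_* macros.  */
struct isa {
  bool avx512f, evex512, avx512vl, avx512bw, avx512fp16;
};

/* Return true when a comparison on a vector of VECTOR_SIZE bytes with
   ELEM_SIZE-byte elements would get a scalar (kmask) mask mode under
   the quoted condition; false means the vector-mask path is taken.
   IS_HF says whether the element mode is _Float16.  */
static bool
uses_kmask (struct isa t, unsigned vector_size, unsigned elem_size,
            bool is_hf)
{
  bool scalar_case
    = (t.avx512f && t.evex512 && vector_size == 64)
      || (t.avx512vl && (vector_size == 32 || vector_size == 16))
      || (t.avx512vl && t.avx512fp16 && is_hf);
  if (!scalar_case)
    return false;
  return elem_size == 4 || elem_size == 8
         || (t.avx512bw && (elem_size == 1 || elem_size == 2));
}
```

With every field except avx512fp16 set (roughly -march=x86-64-v4), a 16-byte vector of 2-byte elements gets a kmask, while a 4-byte vector of two shorts falls through to the vector-mask path.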

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||ice-on-valid-code,
   ||needs-bisection
   Last reconfirmed||2024-03-25

--- Comment #4 from Andrew Pinski  ---
A little cleaned up testcase which might show what can and cannot be SLP'ed:
```
float f1, f0, fa[2];
short sa[2];
void quantize(short s0) {
  _Bool ta[2] = {(fa[0] < 0), (fa[1] < 0)};
  _Bool t = ((s0 > 0) & ta[0]);
  short x1 = s0 + t;
  _Bool t1 = ((x1 > 0) & ta[1]);
  sa[0] = x1;
  sa[1] = s0 + t1;
}
```

Confirmed.

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

--- Comment #3 from Sam James  ---
float quantize_x_1, quantize_x_0;
short *quantize_xq;
short quantize_x0;
void quantize() {
  short x1 = quantize_xq[0] =
  quantize_x0 + ((quantize_x0 > 0) & (quantize_x_0 < 0));
  quantize_xq[1] = 1 + ((x1 > 0) & (quantize_x_1 < 0));
}

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

--- Comment #2 from Sam James  ---
Created attachment 57812
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57812&action=edit
spec.i.orig.xz

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471

--- Comment #1 from Sam James  ---
The original failed with:
```
../liblc3-1.0.4/src/spec.c: In function ‘quantize’:
../liblc3-1.0.4/src/spec.c:210:21: error: type mismatch in binary expression
  210 | LC3_HOT static void quantize(enum lc3_dt dt, enum lc3_srate sr,
  | ^~~~
vector(2) 

vector(2) 

vector(2) 

mask_patt_142.112_162 = mask__23.109_158 & mask_patt_141.111_161;
during GIMPLE pass: vect
../liblc3-1.0.4/src/spec.c:210:21: internal compiler error: verify_gimple failed
0x559439cc2712 verify_gimple_in_cfg(function*, bool, bool)
        /usr/src/debug/sys-devel/gcc-14.0.1_pre20240324/gcc-14-20240324/gcc/tree-cfg.cc:5663
0x559439ad9a28 execute_function_todo
        /usr/src/debug/sys-devel/gcc-14.0.1_pre20240324/gcc-14-20240324/gcc/passes.cc:2088
0x559439ad9c8b do_per_function
        /usr/src/debug/sys-devel/gcc-14.0.1_pre20240324/gcc-14-20240324/gcc/passes.cc:1687
0x559439ad9c8b execute_todo
        /usr/src/debug/sys-devel/gcc-14.0.1_pre20240324/gcc-14-20240324/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
ninja: build stopped: subcommand failed.
```

i.e. vect, not slp.

[Bug target/97585] Improve documentation for -march=x86-64 to say MMX, SSE, SSE2 are implied

2023-07-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97585

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #2 from Hongtao.liu  ---
It's now documented in x86psABI

Table 3.1: Micro-Architecture Levels

Level Name   CPU Feature   Example instruction
(baseline)   CMOV          cmov
             CX8           cmpxchg8b
             FPU           fld
             FXSR          fxsave
             MMX           emms
             OSFXSR        fxsave
             SCE           syscall
             SSE           cvtss2si
             SSE2          cvtpi2pd

x86-64-v2    CMPXCHG16B    cmpxchg16b
             LAHF-SAHF     lahf
             POPCNT        popcnt
             SSE3          addsubpd
             SSE4_1        blendpd
             SSE4_2        pcmpestri
             SSSE3         phaddd

x86-64-v3    AVX           vzeroall
             AVX2          vpermd
             BMI1          andn
             BMI2          bzhi
             F16C          vcvtph2ps
             FMA           vfmadd132pd
             LZCNT         lzcnt
             MOVBE         movbe
             OSXSAVE       xgetbv

x86-64-v4    AVX512F       kmovw
             AVX512BW      vdbpsadbw
             AVX512CD      vplzcntd
             AVX512DQ      vpmullq
             AVX512VL      n/a
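
One way to relate the table to a given compilation is to look at the feature macros the compiler predefines. The sketch below is an approximation under assumptions: `compiled_x86_64_level` is a hypothetical helper, it tests only the features that have a well-known predefined macro (so CMPXCHG16B, LAHF-SAHF and OSXSAVE are not checked), and it reports what the translation unit was compiled for, not what the running CPU supports.

```c
/* Return the highest x86-64 micro-architecture level whose tested
   features are all enabled at compile time, or 0 when the target is
   not x86-64.  Only a subset of each level's features is tested.  */
static int
compiled_x86_64_level (void)
{
#if !defined(__x86_64__)
  return 0;
#else
  int level = 1;  /* baseline */
#if defined(__SSE3__) && defined(__SSSE3__) && defined(__SSE4_1__) \
    && defined(__SSE4_2__) && defined(__POPCNT__)
  level = 2;
#if defined(__AVX__) && defined(__AVX2__) && defined(__BMI__) \
    && defined(__BMI2__) && defined(__F16C__) && defined(__FMA__) \
    && defined(__LZCNT__) && defined(__MOVBE__)
  level = 3;
#if defined(__AVX512F__) && defined(__AVX512BW__) \
    && defined(__AVX512CD__) && defined(__AVX512DQ__) \
    && defined(__AVX512VL__)
  level = 4;
#endif
#endif
#endif
  return level;
#endif
}
```

The nesting of the conditionals mirrors the table: a level is reported only when every lower level's tested features are present as well.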

[Bug target/97585] Improve documentation for -march=x86-64 to say MMX, SSE, SSE2 are implied

2023-07-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97585

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-07-10

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/108629] New: 549.fotonik3d_r regresses 15-24% at -O2 -flto -march=x86-64-v3 since r13-1203-g038b077689bb53

2023-02-01 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108629

Bug ID: 108629
   Summary: 549.fotonik3d_r regresses 15-24% at -O2 -flto
-march=x86-64-v3 since r13-1203-g038b077689bb53
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rsandifo at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

When benchmarking trunk revision 99ea0d76116 I noticed a 24%
regression on Zen4 and Zen3 machines and 16% on a Zen2 and an Intel
CascadeLake when running 549.fotonik3d_r from SPEC 2017 FPrate suite
built with options -O2 -g -march=x86-64-v3 -flto=32 compared to the
binary produced by GCC 12.

The number of branches reported by perf stat between gcc 12 and the
aforementioned trunk revision on the Zen3 machine jumped by 90%.

The symbol profile changed from:

  Overhead  Samples  Shared object           Name
  33.23%    40078    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  27.74%    33471    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updateh
  17.50%    21114    fotonik3d_r_peak.gcc12  __material_mod_MOD_mat_updatee
   9.52%    11493    fotonik3d_r_peak.gcc12  __update_mod_MOD_updateh
   9.49%    11445    fotonik3d_r_peak.gcc12  __power_mod_MOD_power_dft
To:

  Overhead  Samples  Shared object           Name
  26.68%    39825    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  22.35%    33368    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updateh
  13.99%    20892    fotonik3d_r_peak.trunk  __material_mod_MOD_mat_updatee
  13.96%    20816    fotonik3d_r_peak.trunk  __power_mod_MOD_power_dft
  11.51%    17164    libgcc_s.so.1           __muldc3
   8.60%    12840    fotonik3d_r_peak.trunk  __update_mod_MOD_updateh


On the Zen3 machine at least, I have bisected this to:

  commit 038b077689bb5310386b04d40a2cea234f01e6aa
  Author: Richard Sandiford 
  Date:   Wed Jun 22 11:27:15 2022 +0100

data-ref: Improve non-loop disambiguation [PR106019]

When dr_may_alias_p is called without a loop context, it tries
to use the tree-affine interface to calculate the difference
between the two addresses and use that difference to check whether
the gap between the accesses is known at compile time.  However, as the
example in the PR shows, this doesn't expand SSA_NAMEs and so can easily
be defeated by things like reassociation.

One fix would have been to use aff_combination_expand to expand the
SSA_NAMEs, but we'd then need some way of maintaining the associated
cache.  This patch instead reuses the innermost_loop_behavior fields
(which exist even when no loop context is provided).

It might still be useful to do the aff_combination_expand thing too,
if an example turns out to need it.

gcc/
PR tree-optimization/106019
* tree-data-ref.cc (dr_may_alias_p): Try using the
innermost_loop_behavior to disambiguate non-loop queries.

gcc/testsuite/
PR tree-optimization/106019
* gcc.dg/vect/bb-slp-pr106019.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug testsuite/106344] New: A few x86_64 tests fail with -march=x86-64-v2

2022-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106344

Bug ID: 106344
   Summary: A few x86_64 tests fail with -march=x86-64-v2
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

The following tests fail when -march=x86-64-v2 is the default:

gcc.dg/vect/bb-slp-57.c
gcc.dg/vect/slp-21.c
gcc.dg/vect/slp-perm-9.c
gcc.target/i386/minmax-9.c
gcc.target/i386/sse2-mmx-21.c
g++.target/i386/pr98218-1.C

Can be reproduced with:
$ make check-gcc RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} vect.exp=bb-slp-57.c'
$ make check-gcc RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} vect.exp=slp-21.c'
$ make check-gcc RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} vect.exp=slp-perm-9.c'
$ make check-gcc RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} i386.exp=minmax-9.c'
$ make check-gcc RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} i386.exp=sse2-mmx-21.c'
$ make check-g++ RUNTESTFLAGS='--target_board=unix\{,-march=x86-64-v2\} i386.exp=pr98218-1.C'

Would there be a way to amend the tests so that they don't fail with
-march=x86-64-v2?

[COMMITED][PATCH] x86: Compile PR target/104441 tests with -march=x86-64

2022-02-09 Thread H.J. Lu via Gcc-patches
Compile PR target/104441 tests with -march=x86-64 to fix test failures
when GCC is configured with --with-arch=native --with-cpu=native.

PR target/104441
* gcc.target/i386/pr104441-1a.c: Compile with -march=x86-64.
* gcc.target/i386/pr104441-1b.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr104441-1a.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr104441-1b.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr104441-1a.c b/gcc/testsuite/gcc.target/i386/pr104441-1a.c
index f4d263205f8..83734f710bd 100644
--- a/gcc/testsuite/gcc.target/i386/pr104441-1a.c
+++ b/gcc/testsuite/gcc.target/i386/pr104441-1a.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mtune=skylake -Wno-attributes" } */
+/* { dg-options "-O3 -march=x86-64 -mtune=skylake -Wno-attributes" } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.target/i386/pr104441-1b.c b/gcc/testsuite/gcc.target/i386/pr104441-1b.c
index 0b8a796d93c..325af044bb8 100644
--- a/gcc/testsuite/gcc.target/i386/pr104441-1b.c
+++ b/gcc/testsuite/gcc.target/i386/pr104441-1b.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -mvzeroupper -Wno-attributes" } */
+/* { dg-options "-O3 -march=x86-64 -mvzeroupper -Wno-attributes" } */
 
 #include "pr104441-1a.c"
 
-- 
2.34.1



[Bug target/97281] Mark -march=x86-64-v[234] binaries

2021-07-31 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97281

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from H.J. Lu  ---
Fixed by r11-5634:

commit 54967b02c192f893e0f23481c865dd8abcb74018
Author: H.J. Lu 
Date:   Mon Nov 9 09:29:23 2020 -0800

x86: Add -mneeded for GNU_PROPERTY_X86_ISA_1_V[234] marker

[Bug target/98274] -march=x86-64-v[234] incompatible with target attribute

2020-12-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98274

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jakub Jelinek  ---
Fixed.

[Bug target/98274] -march=x86-64-v[234] incompatible with target attribute

2020-12-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98274

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:69bd5d473d22157d0737fc20e98eb3347cbd6ab5

commit r11-6041-g69bd5d473d22157d0737fc20e98eb3347cbd6ab5
Author: Jakub Jelinek 
Date:   Tue Dec 15 10:16:08 2020 +0100

i386: Fix up -march=x86-64-v[234] vs. target attribute [PR98274]

The following testcase fails to compile.  The problem is that
when ix86_option_override_internal is called the first time for command
line, it sees -mtune= wasn't present on the command line and so as fallback
sets ix86_tune_string to ix86_arch_string value ("x86-64-v2"), but
ix86_tune_specified is false, so we don't find the tuning in the table
but don't error on it.
When processing the target attribute, ix86_tune_string is what
it was earlier left with, but this time ix86_tune_specified is true and
so we error on it.

The following patch does what is done already e.g. for "x86-64" march,
in particular default the tuning to "generic".

2020-12-15  Jakub Jelinek  

PR target/98274
* config/i386/i386-options.c (ix86_option_override_internal): Set
ix86_tune_string to "generic" even when it wasn't specified and
ix86_arch_string is "x86-64-v2", "x86-64-v3" or "x86-64-v4".
Remove useless {}s around a single statement.

* gcc.target/i386/pr98274.c: New test.
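
The string test that decides when tuning falls back to "generic" can be tried in isolation. The sketch below follows the condition in the patch for this PR but is not the GCC source: `tune_defaults_to_generic` is a hypothetical name, and it takes a plain string instead of reading opts->x_ix86_tune_string.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if TUNE is "x86-64" or one of the feature-only levels
   "x86-64-v2", "x86-64-v3" or "x86-64-v4", i.e. the names for which
   the patch sets the tuning to "generic".  */
static bool
tune_defaults_to_generic (const char *tune)
{
  /* "x86-64" is six characters long; a shorter string fails here, so
     tune[6] is only read when the prefix matched.  */
  if (strncmp (tune, "x86-64", 6) != 0)
    return false;
  return tune[6] == '\0'
         || strcmp (tune + 6, "-v2") == 0
         || strcmp (tune + 6, "-v3") == 0
         || strcmp (tune + 6, "-v4") == 0;
}
```

Matching the suffix exactly, instead of accepting any string that starts with "x86-64", keeps names such as a hypothetical "x86-64-v5" from being silently treated as generic.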

Re: [PATCH] i386: Fix up -march=x86-64-v[234] vs. target attribute [PR98274]

2020-12-15 Thread Uros Bizjak via Gcc-patches
On Tue, Dec 15, 2020 at 10:03 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase fails to compile.  The problem is that
> when ix86_option_override_internal is called the first time for command
> line, it sees -mtune= wasn't present on the command line and so as fallback
> sets ix86_tune_string to ix86_arch_string value ("x86-64-v2"), but
> ix86_tune_specified is false, so we don't find the tuning in the table
> but don't error on it.
> When processing the target attribute, ix86_tune_string is what
> it was earlier left with, but this time ix86_tune_specified is true and
> so we error on it.
>
> The following patch does what is done already e.g. for "x86-64" march,
> in particular default the tuning to "generic".
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-12-15  Jakub Jelinek  
>
> PR target/98274
> * config/i386/i386-options.c (ix86_option_override_internal): Set
> ix86_tune_string to "generic" even when it wasn't specified and
> ix86_arch_string is "x86-64-v2", "x86-64-v3" or "x86-64-v4".
> Remove useless {}s around a single statement.
>
> * gcc.target/i386/pr98274.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386-options.c.jj   2020-12-14 13:31:51.864733294 +0100
> +++ gcc/config/i386/i386-options.c  2020-12-14 14:40:09.795222723 +0100
> @@ -1884,9 +1884,7 @@ ix86_option_override_internal (bool main
>  as -mtune=generic.  With native compilers we won't see the
>  -mtune=native, as it was changed by the driver.  */
>if (!strcmp (opts->x_ix86_tune_string, "native"))
> -   {
> - opts->x_ix86_tune_string = "generic";
> -   }
> +   opts->x_ix86_tune_string = "generic";
>else if (!strcmp (opts->x_ix86_tune_string, "x86-64"))
>  warning (OPT_Wdeprecated,
>  main_args_p
> @@ -1908,10 +1906,12 @@ ix86_option_override_internal (bool main
>
>/* opts->x_ix86_tune_string is set to opts->x_ix86_arch_string
>  or defaulted.  We need to use a sensible tune option.  */
> -  if (!strcmp (opts->x_ix86_tune_string, "x86-64"))
> -   {
> - opts->x_ix86_tune_string = "generic";
> -   }
> +  if (!strncmp (opts->x_ix86_tune_string, "x86-64", 6)
> + && (opts->x_ix86_tune_string[6] == '\0'
> + || (!strcmp (opts->x_ix86_tune_string + 6, "-v2")
> + || !strcmp (opts->x_ix86_tune_string + 6, "-v3")
> + || !strcmp (opts->x_ix86_tune_string + 6, "-v4"))))
> +   opts->x_ix86_tune_string = "generic";
>  }
>
>if (opts->x_ix86_stringop_alg == rep_prefix_8_byte
> --- gcc/testsuite/gcc.target/i386/pr98274.c.jj  2020-12-14 14:44:09.197559567 +0100
> +++ gcc/testsuite/gcc.target/i386/pr98274.c 2020-12-14 14:43:22.406080077 +0100
> @@ -0,0 +1,8 @@
> +/* PR target/98274 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-mabi=sysv -O2 -march=x86-64-v2" } */
> +
> +void __attribute__((target ("avx")))
> +foo (void)
> +{
> +}
>
> Jakub
>


[PATCH] i386: Fix up -march=x86-64-v[234] vs. target attribute [PR98274]

2020-12-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase fails to compile.  The problem is that
when ix86_option_override_internal is called the first time for command
line, it sees -mtune= wasn't present on the command line and so as fallback
sets ix86_tune_string to ix86_arch_string value ("x86-64-v2"), but
ix86_tune_specified is false, so we don't find the tuning in the table
but don't error on it.
When processing the target attribute, ix86_tune_string is what
it was earlier left with, but this time ix86_tune_specified is true and
so we error on it.

The following patch does what is done already e.g. for "x86-64" march,
in particular default the tuning to "generic".

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-15  Jakub Jelinek  

PR target/98274
* config/i386/i386-options.c (ix86_option_override_internal): Set
ix86_tune_string to "generic" even when it wasn't specified and
ix86_arch_string is "x86-64-v2", "x86-64-v3" or "x86-64-v4".
Remove useless {}s around a single statement.

* gcc.target/i386/pr98274.c: New test.

--- gcc/config/i386/i386-options.c.jj   2020-12-14 13:31:51.864733294 +0100
+++ gcc/config/i386/i386-options.c  2020-12-14 14:40:09.795222723 +0100
@@ -1884,9 +1884,7 @@ ix86_option_override_internal (bool main
 as -mtune=generic.  With native compilers we won't see the
 -mtune=native, as it was changed by the driver.  */
   if (!strcmp (opts->x_ix86_tune_string, "native"))
-   {
- opts->x_ix86_tune_string = "generic";
-   }
+   opts->x_ix86_tune_string = "generic";
   else if (!strcmp (opts->x_ix86_tune_string, "x86-64"))
 warning (OPT_Wdeprecated,
 main_args_p
@@ -1908,10 +1906,12 @@ ix86_option_override_internal (bool main
 
   /* opts->x_ix86_tune_string is set to opts->x_ix86_arch_string
 or defaulted.  We need to use a sensible tune option.  */
-  if (!strcmp (opts->x_ix86_tune_string, "x86-64"))
-   {
- opts->x_ix86_tune_string = "generic";
-   }
+  if (!strncmp (opts->x_ix86_tune_string, "x86-64", 6)
+ && (opts->x_ix86_tune_string[6] == '\0'
+ || (!strcmp (opts->x_ix86_tune_string + 6, "-v2")
+ || !strcmp (opts->x_ix86_tune_string + 6, "-v3")
+ || !strcmp (opts->x_ix86_tune_string + 6, "-v4"))))
+   opts->x_ix86_tune_string = "generic";
 }
 
   if (opts->x_ix86_stringop_alg == rep_prefix_8_byte
--- gcc/testsuite/gcc.target/i386/pr98274.c.jj  2020-12-14 14:44:09.197559567 +0100
+++ gcc/testsuite/gcc.target/i386/pr98274.c 2020-12-14 14:43:22.406080077 +0100
@@ -0,0 +1,8 @@
+/* PR target/98274 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-mabi=sysv -O2 -march=x86-64-v2" } */
+
+void __attribute__((target ("avx")))
+foo (void)
+{
+}

Jakub



Re: [PATCH] i386: Make -march=x86-64-v[234] behave more like other -march= options

2020-12-14 Thread H.J. Lu via Gcc-patches
On Mon, Dec 14, 2020 at 7:09 AM Uros Bizjak  wrote:
>
> On Mon, Dec 14, 2020 at 2:13 PM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > If somebody has -march=x86-64-v2 (or -v3 or -v4) in $CFLAGS, $CXXFLAGS etc.,
> > then -m32 or -mabi=ms stops working.
> > What is worse, if one configures gcc --with-arch-32=x86-64-v2 (or -v3 or -v4),
> > then -mabi=ms stops working.
> >
> > I think that is a nightmare user experience.  It is ok that x86-64-v[234]
> > behave slightly different from other -march= options (in that they imply
> > unless overridden -mtune=generic rather than -mtune= equal to the -march
> > argument), but the error when one mixes it with -mabi=ms, or -m32 doesn't
> > improve anything.
> > It is true that the exact option set is only defined in the x86-64 psABI
> > (IMHO that is a mistake too, we should copy that into the GCC documentation
> > like we document it for any other -march= option), but there is no reason
> > why that exact set of CPU features can't be used for other ABIs, it is just
> > a set of CPU features.  If we add micro-architecture levels to the 32-bit
> > ABI (I doubt anyone wants to do that, but just hypothetically), then those
> > micro-architecture levels wouldn't certainly be called x86-64-v* but perhaps
> > i386-v*.
> > In the tests, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can't be expected on -m32
> > not because the CPU feature wouldn't be set, but because the instruction
> > is 64-bit only and 32-bit code doesn't have __int128 etc. support.
> >
> > Ok for trunk if this passes full bootstrap/regtest?
> >
> > 2020-12-14  Jakub Jelinek  
> >
> > * config/i386/i386-options.c (ix86_option_override_internal): Don't
> >     error on -march=x86-64-v[234] with -m32 or -mabi=ms.
> > * config.gcc: Don't reject --with-arch=x86-64-v[234] or
> > --with-arch_32=x86-64-v[234].
> > * doc/invoke.texi (-march=x86-64-v[234]): Document what the option
> > does for other ABIs.
> >
> > * gcc.target/i386/x86-64-v2.c: Don't expect
> > __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> > * gcc.target/i386/x86-64-v2-other.c: New test.
> > * gcc.target/i386/x86-64-v2-msabi.c: New test.
> > * gcc.target/i386/x86-64-v3.c: Fix a comment pasto.  Don't expect
> > __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> > * gcc.target/i386/x86-64-v3-other.c: New test.
> > * gcc.target/i386/x86-64-v3-msabi.c: New test.
> > * gcc.target/i386/x86-64-v4.c: Don't expect
> > __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> > * gcc.target/i386/x86-64-v4-other.c: New test.
> > * gcc.target/i386/x86-64-v4-msabi.c: New test.
>
> LGTM, but please allow some time for HJ to comment.

LGTM too.

Thanks.

> Thanks,
> Uros.
>
> >
> > --- gcc/config/i386/i386-options.c.jj   2020-12-08 15:43:46.641140657 +0100
> > +++ gcc/config/i386/i386-options.c  2020-12-14 13:31:51.864733294 +0100
> > @@ -2084,17 +2084,6 @@ ix86_option_override_internal (bool main
> > return false;
> >   }
> >
> > -   /* The feature-only micro-architecture levels that use
> > -  PTA_NO_TUNE are only defined for the x86-64 psABI.  */
> > -   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
> > -   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
> > -   || opts->x_ix86_abi != SYSV_ABI))
> > - {
> > -   error (G_("%qs architecture level is only defined"
> > - " for the x86-64 psABI"), opts->x_ix86_arch_string);
> > -   return false;
> > - }
> > -
> > ix86_schedule = processor_alias_table[i].schedule;
> > ix86_arch = processor_alias_table[i].processor;
> >
> > --- gcc/config.gcc.jj   2020-12-08 10:36:28.817303511 +0100
> > +++ gcc/config.gcc  2020-12-14 14:00:27.571656138 +0100
> > @@ -4517,10 +4517,8 @@ case "${target}" in
> > case " $x86_64_archs " in
> > *" ${val} "*)
> > # Disallow x86-64-v* for --with-cpu=/--with-tune=
> > -   # or --with-arch= or --with-arch_32=
> > -   # It can be only specified in --with-arch_64=
> > 

Re: [PATCH] i386: Make -march=x86-64-v[234] behave more like other -march= options

2020-12-14 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 14, 2020 at 2:13 PM Jakub Jelinek  wrote:
>
> Hi!
>
> If somebody has -march=x86-64-v2 (or -v3 or -v4) in $CFLAGS, $CXXFLAGS etc.,
> then -m32 or -mabi=ms stops working.
> What is worse, if one configures gcc --with-arch-32=x86-64-v2 (or -v3 or -v4),
> then -mabi=ms stops working.
>
> I think that is a nightmare user experience.  It is ok that x86-64-v[234]
> behave slightly different from other -march= options (in that they imply
> unless overridden -mtune=generic rather than -mtune= equal to the -march
> argument), but the error when one mixes it with -mabi=ms, or -m32 doesn't
> improve anything.
> It is true that the exact option set is only defined in the x86-64 psABI
> (IMHO that is a mistake too, we should copy that into the GCC documentation
> like we document it for any other -march= option), but there is no reason
> why that exact set of CPU features can't be used for other ABIs, it is just
> a set of CPU features.  If we add micro-architecture levels to the 32-bit
> ABI (I doubt anyone wants to do that, but just hypothetically), then those
> micro-architecture levels wouldn't certainly be called x86-64-v* but perhaps
> i386-v*.
> In the tests, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can't be expected on -m32
> not because the CPU feature wouldn't be set, but because the instruction
> is 64-bit only and 32-bit code doesn't have __int128 etc. support.
>
> Ok for trunk if this passes full bootstrap/regtest?
>
> 2020-12-14  Jakub Jelinek  
>
>     * config/i386/i386-options.c (ix86_option_override_internal): Don't
> error on -march=x86-64-v[234] with -m32 or -mabi=ms.
> * config.gcc: Don't reject --with-arch=x86-64-v[234] or
> --with-arch_32=x86-64-v[234].
> * doc/invoke.texi (-march=x86-64-v[234]): Document what the option
> does for other ABIs.
>
> * gcc.target/i386/x86-64-v2.c: Don't expect
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> * gcc.target/i386/x86-64-v2-other.c: New test.
> * gcc.target/i386/x86-64-v2-msabi.c: New test.
> * gcc.target/i386/x86-64-v3.c: Fix a comment pasto.  Don't expect
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> * gcc.target/i386/x86-64-v3-other.c: New test.
> * gcc.target/i386/x86-64-v3-msabi.c: New test.
> * gcc.target/i386/x86-64-v4.c: Don't expect
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
> * gcc.target/i386/x86-64-v4-other.c: New test.
> * gcc.target/i386/x86-64-v4-msabi.c: New test.

LGTM, but please allow some time for HJ to comment.

Thanks,
Uros.
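
The point in the quoted message about __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can be checked from ordinary code, because the macro is predefined by the compiler only when a 16-byte compare-and-swap is usable for the current target. A minimal sketch, with `have_16_byte_cas` and `have_int128` as hypothetical helper names:

```c
#include <stdbool.h>

/* True when the compiler predefines the macro for a native 16-byte
   compare-and-swap in this compilation.  */
static bool
have_16_byte_cas (void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
  return true;
#else
  return false;
#endif
}

/* True when the target provides a 128-bit integer type, which 32-bit
   x86 code does not have.  */
static bool
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return true;
#else
  return false;
#endif
}
```

Compiling this once with -m64 -march=x86-64-v2 and once with -m32 -march=x86-64-v2 is one way to observe the difference the tests had to account for; the exact results depend on the compiler and target in use.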

>
> --- gcc/config/i386/i386-options.c.jj   2020-12-08 15:43:46.641140657 +0100
> +++ gcc/config/i386/i386-options.c  2020-12-14 13:31:51.864733294 +0100
> @@ -2084,17 +2084,6 @@ ix86_option_override_internal (bool main
> return false;
>   }
>
> -   /* The feature-only micro-architecture levels that use
> -  PTA_NO_TUNE are only defined for the x86-64 psABI.  */
> -   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
> -   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
> -   || opts->x_ix86_abi != SYSV_ABI))
> - {
> -   error (G_("%qs architecture level is only defined"
> - " for the x86-64 psABI"), opts->x_ix86_arch_string);
> -   return false;
> - }
> -
> ix86_schedule = processor_alias_table[i].schedule;
> ix86_arch = processor_alias_table[i].processor;
>
> --- gcc/config.gcc.jj   2020-12-08 10:36:28.817303511 +0100
> +++ gcc/config.gcc  2020-12-14 14:00:27.571656138 +0100
> @@ -4517,10 +4517,8 @@ case "${target}" in
> case " $x86_64_archs " in
> *" ${val} "*)
> # Disallow x86-64-v* for --with-cpu=/--with-tune=
> -   # or --with-arch= or --with-arch_32=
> -   # It can be only specified in --with-arch_64=
> case "x$which$val" in
> -   xcpu*x86-64-v*|xtune*x86-64-v*|xarchx86-64-v*|xarch_32x86-64-v*)
> +   xcpu*x86-64-v*|xtune*x86-64-v*)
> echo "Unknown CPU given in --with-$which=$val." 1>&2
> exit 1
> 

[Bug target/98274] -march=x86-64-v[234] incompatible with target attribute

2020-12-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98274

--- Comment #1 from Jakub Jelinek  ---
Created attachment 49760
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49760&action=edit
gcc11-pr98274.patch

Untested fix.

[Bug target/98274] -march=x86-64-v[234] incompatible with target attribute

2020-12-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98274

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-12-14
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

[Bug target/98274] New: -march=x86-64-v[234] incompatible with target attribute

2020-12-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98274

Bug ID: 98274
   Summary: -march=x86-64-v[234] incompatible with target
attribute
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

void __attribute__((target ("avx")))
foo (void)
{
}

fails to compile with -march=x86-64-v2 :
~/src/gcc/obj04/gcc/xgcc -B ~/src/gcc/obj04/gcc/ -S -O2 qauquu.c -march=x86-64-v2
qauquu.c:3:1: error: bad value (‘x86-64-v2’) for ‘target("tune=")’ attribute
3 | {
  | ^
qauquu.c:3:1: note: valid arguments to ‘target("tune=")’ attribute are: nocona
core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i
haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client
icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake
bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel x86-64
eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3
opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1
bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 generic native; did you
mean ‘x86-64’?

[PATCH] i386: Make -march=x86-64-v[234] behave more like other -march= options

2020-12-14 Thread Jakub Jelinek via Gcc-patches
Hi!

If somebody has -march=x86-64-v2 (or -v3 or -v4) in $CFLAGS, $CXXFLAGS etc.,
then -m32 or -mabi=ms stops working.
What is worse, if one configures gcc --with-arch-32=x86-64-v2 (or -v3 or -v4),
then -mabi=ms stops working.

I think that is a nightmare user experience.  It is ok that x86-64-v[234]
behave slightly differently from other -march= options (in that they imply,
unless overridden, -mtune=generic rather than -mtune= equal to the -march
argument), but the error when one mixes it with -mabi=ms or -m32 doesn't
improve anything.
It is true that the exact option set is only defined in the x86-64 psABI
(IMHO that is a mistake too, we should copy that into the GCC documentation
like we document it for any other -march= option), but there is no reason
why that exact set of CPU features can't be used for other ABIs, it is just
a set of CPU features.  If we add micro-architecture levels to the 32-bit
ABI (I doubt anyone wants to do that, but just hypothetically), then those
micro-architecture levels certainly wouldn't be called x86-64-v* but perhaps
i386-v*.
In the tests, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can't be expected on -m32
not because the CPU feature wouldn't be set, but because the instruction
is 64-bit only and 32-bit code doesn't have __int128 etc. support.

Ok for trunk if this passes full bootstrap/regtest?

2020-12-14  Jakub Jelinek  

* config/i386/i386-options.c (ix86_option_override_internal): Don't
error on -march=x86-64-v[234] with -m32 or -mabi=ms.
* config.gcc: Don't reject --with-arch=x86-64-v[234] or
--with-arch_32=x86-64-v[234].
* doc/invoke.texi (-march=x86-64-v[234]): Document what the option
does for other ABIs.

* gcc.target/i386/x86-64-v2.c: Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v2-other.c: New test.
* gcc.target/i386/x86-64-v2-msabi.c: New test.
* gcc.target/i386/x86-64-v3.c: Fix a comment pasto.  Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v3-other.c: New test.
* gcc.target/i386/x86-64-v3-msabi.c: New test.
* gcc.target/i386/x86-64-v4.c: Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v4-other.c: New test.
* gcc.target/i386/x86-64-v4-msabi.c: New test.

--- gcc/config/i386/i386-options.c.jj   2020-12-08 15:43:46.641140657 +0100
+++ gcc/config/i386/i386-options.c  2020-12-14 13:31:51.864733294 +0100
@@ -2084,17 +2084,6 @@ ix86_option_override_internal (bool main
return false;
  }
 
-   /* The feature-only micro-architecture levels that use
-  PTA_NO_TUNE are only defined for the x86-64 psABI.  */
-   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
-   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
-   || opts->x_ix86_abi != SYSV_ABI))
- {
-   error (G_("%qs architecture level is only defined"
- " for the x86-64 psABI"), opts->x_ix86_arch_string);
-   return false;
- }
-
ix86_schedule = processor_alias_table[i].schedule;
ix86_arch = processor_alias_table[i].processor;
 
--- gcc/config.gcc.jj   2020-12-08 10:36:28.817303511 +0100
+++ gcc/config.gcc  2020-12-14 14:00:27.571656138 +0100
@@ -4517,10 +4517,8 @@ case "${target}" in
case " $x86_64_archs " in
*" ${val} "*)
# Disallow x86-64-v* for --with-cpu=/--with-tune=
-   # or --with-arch= or --with-arch_32=
-   # It can be only specified in --with-arch_64=
case "x$which$val" in
-   xcpu*x86-64-v*|xtune*x86-64-v*|xarchx86-64-v*|xarch_32x86-64-v*)
+   xcpu*x86-64-v*|xtune*x86-64-v*)
echo "Unknown CPU given in --with-$which=$val." 1>&2
exit 1
;;
--- gcc/doc/invoke.texi.jj  2020-12-09 23:51:01.284558982 +0100
+++ gcc/doc/invoke.texi 2020-12-14 13:36:45.523458639 +0100
@@ -29778,8 +29778,9 @@ A generic CPU with 64-bit extensions.
 @itemx x86-64-v3
 @itemx x86-64-v4
 These choices for @var{cpu-type} select the corresponding
-micro-architecture level from the x86-64 psABI.  They are only available
-when compiling for an x86-64 target that uses the System V psABI@.
+micro-architecture level from the x86-64 psABI.  On ABIs other than
+the x86-64 psABI they select the same CPU featur

Why does -march=x86-64-v[234] require SysV psABI?

2020-11-08 Thread Tomasz Konojacki
The current development docs say:

>‘x86-64-v2’
>‘x86-64-v3’
>‘x86-64-v4’
>
> These choices for cpu-type select the corresponding
> micro-architecture level from the x86-64 psABI. They are only available
> when compiling for an x86-64 target that uses the System V psABI.
>
> Since these cpu-type values do not have a corresponding -mtune
> setting, using -march with these values enables generic tuning. Specific
> tuning can be enabled using the -mtune=other-cpu-type option with an
> appropriate other-cpu-type value.

Why are those options valid only for the SysV psABI targets? Despite the
fact that SysV psABI specification is the document where those levels
are defined, I don't think there's anything ABI-specific about them.

It seems arbitrary to prevent the other targets (most notably Windows)
from using them.



[Bug other/97585] New: Improve documentation for -march=x86-64 to say MMX, SSE, SSE2 are implied

2020-10-26 Thread max at quendi dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97585

Bug ID: 97585
   Summary: Improve documentation for -march=x86-64 to say MMX,
SSE, SSE2 are implied
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: max at quendi dot de
  Target Milestone: ---

The documentation at https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html is
pretty good when it comes to indicating which instruction set extensions are
supported by which `march` value.

But -march=x86-64 is an exception: It just states "A generic CPU with 64-bit
extensions." but does not make it clear that this implies MMX, SSE, SSE2
(according to the content of gcc/common/config/i386/i386-common.c). This is
only mentioned (as far as I could tell) in one place, indirectly, in the
documentation for -mfpmath where it says:

> For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 
> switches to enable SSE extensions and make this option effective. For the 
> x86-64 compiler, these extensions are enabled by default.

I suggest changing

> A generic CPU with 64-bit extensions.

to something like this, matching the phrasing of other architectures like
pentium4, nocona etc.:

> A generic CPU with 64-bit extensions, MMX, SSE and SSE2 instruction set 
> support.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-10 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #10 from H.J. Lu  ---
Fixed.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #9 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:16664e6e4fb4281be6477c13989740d44c963c77

commit r11-3764-g16664e6e4fb4281be6477c13989740d44c963c77
Author: H.J. Lu 
Date:   Fri Oct 9 06:12:17 2020 -0700

x86-64: Check CMPXCHG16B for x86-64-v[234]

x86-64-v2 includes CMPXCHG16B.  Since -mcx16 enables CMPXCHG16B and
defines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, check it in x86-64-v[234]
tests.

PR target/97250
* gcc.target/i386/x86-64-v2.c: Verify that
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 is defined.
* gcc.target/i386/x86-64-v3.c: Likewise.
* gcc.target/i386/x86-64-v4.c: Likewise.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-09 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #8 from H.J. Lu  ---
(In reply to Florian Weimer from comment #7)
> (In reply to H.J. Lu from comment #6)
> > (In reply to Florian Weimer from comment #5)
> > > (In reply to H.J. Lu from comment #4)
> > > > x86-64-v2 includes CMPXCHG16B, aka CX16. But -mcx16 doesn't define 
> > > > __CX16__.
> > > 
> > > It's called __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, I think. Is this good
> > > enough?
> > 
> > Yes. But a test is missing.
> 
> Fair enough.  Are you going to send a patch?

Sure.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-09 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #7 from Florian Weimer  ---
(In reply to H.J. Lu from comment #6)
> (In reply to Florian Weimer from comment #5)
> > (In reply to H.J. Lu from comment #4)
> > > x86-64-v2 includes CMPXCHG16B, aka CX16. But -mcx16 doesn't define 
> > > __CX16__.
> > 
> > It's called __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, I think. Is this good
> > enough?
> 
> Yes. But a test is missing.

Fair enough.  Are you going to send a patch?

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-09 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #6 from H.J. Lu  ---
(In reply to Florian Weimer from comment #5)
> (In reply to H.J. Lu from comment #4)
> > x86-64-v2 includes CMPXCHG16B, aka CX16. But -mcx16 doesn't define __CX16__.
> 
> It's called __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, I think. Is this good
> enough?

Yes. But a test is missing.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-09 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #5 from Florian Weimer  ---
(In reply to H.J. Lu from comment #4)
> x86-64-v2 includes CMPXCHG16B, aka CX16. But -mcx16 doesn't define __CX16__.

It's called __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, I think. Is this good enough?

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-09 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

H.J. Lu  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED

--- Comment #4 from H.J. Lu  ---
x86-64-v2 includes CMPXCHG16B, aka CX16. But -mcx16 doesn't define __CX16__.

[Bug target/97281] Mark -march=x86-64-v[234] binaries

2020-10-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97281

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2020-10-05

--- Comment #2 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #1)
> I'm not convinced it is a good idea.
> What if only some TUs or even only some functions are compiled that way and
> the app uses cpuid or cpuid based mechanisms to determine whether such code
> can or can't be called?

With glibc-hwcaps change:

https://sourceware.org/pipermail/libc-alpha/2020-October/118184.html

shared libraries under subdirectories compiled with -march=x86-64-v[234]
have no CPUID check. A command-line option to mark these libraries informs
people that these libraries can only run on processors with proper x86-64
ISA level support.

[Bug target/97281] Mark -march=x86-64-v[234] binaries

2020-10-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97281

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
I'm not convinced it is a good idea.
What if only some TUs or even only some functions are compiled that way and the
app uses cpuid or cpuid based mechanisms to determine whether such code can or
can't be called?

[Bug target/97281] New: Mark -march=x86-64-v[234] binaries

2020-10-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97281

Bug ID: 97281
   Summary: Mark -march=x86-64-v[234] binaries
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
Target: x86-64

Add a marker, GNU_PROPERTY_X86_ISA_1_V[234]:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

to x86-64 ELF binaries to indicate the micro-architecture ISA level
required to execute the binary.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-01 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

Florian Weimer  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Florian Weimer  ---
Fixed for GCC 11.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-10-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Florian Weimer :

https://gcc.gnu.org/g:324bec558e95584e8c1997575ae9d75978af59f1

commit r11-3578-g324bec558e95584e8c1997575ae9d75978af59f1
Author: Florian Weimer 
Date:   Thu Oct 1 10:08:24 2020 +0200

PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4
levels for x86-64

These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v3-haswell.c: New test.
* gcc.target/i386/x86-64-v3-skylake.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-09-30 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

--- Comment #1 from Florian Weimer  ---
First patch committed (preparatory only):
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=92e652d8c21bd7e66c

Second patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555174.html

[Bug target/97250] Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-09-30 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

Florian Weimer  changed:

   What|Removed |Added

   Last reconfirmed||2020-09-30
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

[Bug target/97250] New: Implement -march=x86-64-v2, -march=x86-64-v3, -march=x86-64-v4 for x86-64

2020-09-30 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

Bug ID: 97250
   Summary: Implement -march=x86-64-v2, -march=x86-64-v3,
-march=x86-64-v4 for x86-64
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: fw at gcc dot gnu.org
  Reporter: fw at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

x86-64-v2, x86-64-v3, x86-64-v4 have been added to the psABI as new
micro-architecture levels:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c

GCC should provide -march= options so that programmers can easily build
programs and shared objects that are optimized for these levels.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-09-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #20 from Jan Hubicka  ---
> OK, will do, but, at least superficially, our situation seems very similar to
> this one, so I thought it would be better to keep this one going. But, again,
> I'll open the new one as soon as I can make a test case for it, if this is
> your preference.

Yes, please file a new bug report.  There should be one issue per bug
report, with occasional metabugs linking them together.

Honza

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-09-19 Thread vz-gcc at zeitlins dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #19 from Vadim Zeitlin  ---
(In reply to Jan Hubicka from comment #18)
> We need a reproducer to fix bugs.

Yes, of course, I understand this. I just didn't have time to make one yet,
we've literally discovered the issue only today (well, maybe yesterday,
depending on the time zone).

> So if you have actual testcase that
> slow down, it would be great to open separate bug report for that.

OK, will do, but, at least superficially, our situation seems very similar to
this one, so I thought it would be better to keep this one going. But, again,
I'll open the new one as soon as I can make a test case for it, if this is your
preference.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-09-19 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #18 from Jan Hubicka  ---
> I've just subscribed to this bug because we see big slowdowns in our project
> when switching from 8.3 to 10.2 (89% slower in an important use case, 30%
> slowdown more or less across the board), without any other changes. We don't
> have any simple test showing this (yet), but there is definitely something
> very wrong here and I don't think it should be closed.
> 
> FWIW in our case using -O3 doesn't help (it does make the code marginally
> faster, but improvement of <0.01% is not worth 10% higher build time).

We need a reproducer to fix bugs.  So if you have an actual testcase that
slows down, it would be great to open a separate bug report for that.
It is best to have a self-contained testcase; if that is not possible,
at least a perf profile, and we can discuss with you what to do next.

Honza

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-09-19 Thread vz-gcc at zeitlins dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #17 from Vadim Zeitlin  ---
I've just subscribed to this bug because we see big slowdowns in our project
when switching from 8.3 to 10.2 (89% slower in an important use case, 30%
slowdown more or less across the board), without any other changes. We don't
have any simple test showing this (yet), but there is definitely something very
wrong here and I don't think it should be closed.

FWIW in our case using -O3 doesn't help (it does make the code marginally
faster, but improvement of <0.01% is not worth 10% higher build time).

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-09-19 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Jan Hubicka  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #16 from Jan Hubicka  ---
It seems that the benchmarks were flawed. We can reopen if phoronix succeeds in
reproducing them.

Re: [PATCH] x86: Use -march=x86-64/-march=i386 in <cpuid.h>

2020-08-24 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:

> > Speaking of pragmas, these should be added outside cpuid.h, like:
> >
> > #pragma GCC push_options
> > #pragma GCC target("general-regs-only")
> >
> > #include <cpuid.h>
> >
> > void cpuid_check ()
> > ...
> >
> > #pragma GCC pop_options
> >
> > >footnote
> >
> > Nowadays, -march=native is mostly used outside generic target
> > compilations, so for relevant avx512 targets, we still generate spills
> > to mask regs. In future, we can review the setting of the tuning flag
> > for a generic target in the same way as with SSE2 inter-reg moves.
> >
>
> Florian raised an issue that we need to limit <cpuid.h> to the basic ISAs.
> <cpuid.h> should be handled similarly to other intrinsic header files.
> That is, <cpuid.h> should use
>
> #pragma GCC push_options
> #ifdef __x86_64__
> #pragma GCC target("arch=x86-64")
> #else
> #pragma GCC target("arch=i386")
> ...
> #pragma GCC pop_options
>
> Here is a patch.  OK for master?

-ENOPATCH

However, how will this affect inlining? Every single function in
cpuid.h is defined as static __inline, and due to target flags
mismatch, it won't be inlined anymore. These inline functions are used
in some bit testing functions, and to keep them inlined, these should
also use the same options to avoid non-basic ISAs. This is the reason
cpuid.h should be #included after pragma, together with bit testing
functions, as shown above.

Uros.


[PATCH] x86: Use -march=x86-64/-march=i386 in <cpuid.h>

2020-08-24 Thread H.J. Lu via Gcc-patches
On Sun, Aug 23, 2020 at 9:03 AM Uros Bizjak  wrote:
>
> On Sun, Aug 23, 2020 at 5:23 PM H.J. Lu  wrote:
> >
> > On Sun, Aug 23, 2020 at 10:18:28AM +0200, Uros Bizjak wrote:
> > > On Sat, Aug 22, 2020 at 9:09 PM H.J. Lu  wrote:
> > >
> > > > > > Compile CPUID check with "-mno-sse -mfpmath=387" to disable SSE,
> > > > > > AVX and AVX512 during CPUID check to avoid vector and mask
> > > > > > register operations.
> > > > >
> > > > > -mgeneral-regs-only ?
> > > > >
> > > >
> > > > Here is a patch to add target("general-regs-only") function
> > > > attribute and use it for CPUID check.   OK for master if there
> > > > are no regressions?
> > >
> > > Please test it first, then ask for an approval.
> > >
> > > Please submit the general-regs-only part as an independent patch. (I
> > > think this is the option linux should use for compilation).
> > >
> > > OTOH, wrapping CPUID check in a target attribute is a bad idea. We
> > > should disable spills to mask registers for generic targets by either
> > > raising costs of moves between general and mask registers and/or (as
> > > suggested earlier) introducing TARGET_SPILL_TO_MASK_REGS tuning and
> > > use it in secondary_memory_needed to prevent inter register unit
> > > spills.
> > >
> > > So, compiling with -mavx512bw would NOT enable spills by default,
> > > where compiling with -march=skylake-avx512 (or using equivalent
> > > -mtune) would. This is IMO the least surprising approach, and would
> > > avoid changing sources (as you now have to do for several testcases).
> >
> > We have 2 orthogonal issues here:
> >
> > 1. When mask register spill should be enabled.
> > 2. CPUID check should be done with general registers only.
> >
> > As shown in GCC testcases, CPUID check may be done with arbitrary ISAs
> > or -march/-mtune options enabled.  We should either
> >
> > 1. Enable only general registers for CPUID check.  Or
> > 2. Issue an error for CPUID check if non-general registers are used.
>
> We should follow the same approach as with SSE2, where DI/SImode
> spills to XMM registers were effectively disabled for a generic
> target. So, unless the tuning target is also specified, spills to mask
> registers should not be generated. It was my oversight to approve the
> patch that enables spills for a generic target, and without the tuning
> flag, the patch will be reverted.
>
> Now, we have -mgeneral-regs-only functionality in place, so if a
> package wants to enable spills, the correct -mtune (or -march that
> implies -mtune) should be used, and it is expected that the detection
> code is amended with general-regs-only pragmas.
>
> 
> Speaking of pragmas, these should be added outside cpuid.h, like:
>
> #pragma GCC push_options
> #pragma GCC target("general-regs-only")
>
> #include <cpuid.h>
>
> void cpuid_check ()
> ...
>
> #pragma GCC pop_options
>
> >footnote
>
> Nowadays, -march=native is mostly used outside generic target
> compilations, so for relevant avx512 targets, we still generate spills
> to mask regs. In future, we can review the setting of the tuning flag
> for a generic target in the same way as with SSE2 inter-reg moves.
>

Florian raised an issue that we need to limit <cpuid.h> to the basic ISAs.
<cpuid.h> should be handled similarly to other intrinsic header files.
That is, <cpuid.h> should use

#pragma GCC push_options
#ifdef __x86_64__
#pragma GCC target("arch=x86-64")
#else
#pragma GCC target("arch=i386")
...
#pragma GCC pop_options

Here is a patch.  OK for master?

Thanks.

-- 
H.J.


[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-08-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #15 from Jan Hubicka  ---
> I think, this inliner change needs to be reverted. People expect -O2 to
> produce decently optimized binaries, and starting with gcc 10.x it doesn't
> deliver. -O3 traditionally enabled optimizations that may or may not improve
> performance (and historically, sometimes even break code), so most projects
> don't use it.
I wrote a short description of inliner changes to the phoronix
discussion
https://www.phoronix.com/forums/forum/software/programming-compilers/1196789-gcc-benchmarks-at-varying-optimization-levels-with-core-i9-10900k-show-an-unexpected-surprise/page5
comment 44.

The inliner changes were not aimed at making compile time faster at the
cost of slower compiled code. They were intended to reflect modern C++
codebases more closely and to get faster binaries (at -O2 and -O2 -flto)
without regressing in code size.  In fact more inlining happens, and thus we
needed to optimize the inliner code carefully to avoid regressions with LTO.

It was benchmarked on a wide range of benchmarks, including some where
phoronix measured a degradation before the GCC 10 release.

The benchmarks presented do not reproduce and seem odd. 50% on very
simple benchmarks is a bit too much for a change in one optimization.  It
seems more like thermal throttling. Michael promised to re-run the tests
and he is still speaking about that in the last reply from the 31st.

Testcases are greatly welcome.

Honza

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-08-01 Thread david.bolvansky at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #14 from Dávid Bolvanský  ---
Or change -Os to be gcc10 -O2 with less inlining, revert -O2 to gcc9 -O2 and
implement -Oz to create an aggressive “-Os”.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-08-01 Thread andysem at mail dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #13 from andysem at mail dot ru ---
I think, this inliner change needs to be reverted. People expect -O2 to produce
decently optimized binaries, and starting with gcc 10.x it doesn't deliver. -O3
traditionally enabled optimizations that may or may not improve performance
(and historically, sometimes even break code), so most projects don't use it.

If there needs to be an optimization mode that prioritizes compilation speed
then let that be a separate mode, e.g. -O1.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-29 Thread aros at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #12 from Artem S. Tashkinov  ---
Michael has admitted that this might be a CPU-specific regression:

> Been running some more tests today:
> - Tried on a i9-10980XE Cascade Lake and Cascade Lake Xeon systems and did
>   not reproduce...
> - I went back to the i9-10900K and picked just a few of the tests where it
>   was impacted the hardest, but then surprisingly the results were similar
>   that run.

Source:
https://www.phoronix.com/forums/forum/software/programming-compilers/1196789-gcc-benchmarks-at-varying-optimization-levels-with-core-i9-10900k-show-an-unexpected-surprise?p=1197196#post1197196

The plot thickens.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #11 from Jan Hubicka  ---
> 
> Maybe you want to use same GCC version as phoronix used (GCC 10.2)?
OK, I will give it a try, but there are no inliner changes in gcc 10.2
compared to 10.1.

Honza

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread david.bolvansky at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #10 from Dávid Bolvanský  ---
>> Compiler version : GCC10.1.1

Maybe you want to use same GCC version as phoronix used (GCC 10.2)?

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #9 from Jan Hubicka  ---
scimark
GCC 9:
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1062.28
FFT Mflops:   189.17(N=1048576)
SOR Mflops:   947.53(1000 x 1000)
MonteCarlo: Mflops:   710.10
Sparse matmult  Mflops:  1402.08(N=10, nz=100)
LU  Mflops:  2062.49(M=1000, N=1000)

GCC 10:
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1176.22
FFT Mflops:   201.17(N=1048576)
SOR Mflops:   961.33(1000 x 1000)
MonteCarlo: Mflops:   708.62
Sparse matmult  Mflops:  1639.66(N=10, nz=100)
LU  Mflops:  2370.30(M=1000, N=1000)

So again around 10% improvement for gcc10

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #8 from Jan Hubicka  ---
This is the build without the release flags override, as seems to be done by
phoronix:

GCC 9:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 9.3.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices  : 1
x265 [info]: frame threads / pool features   : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt: 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb   : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress: CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:  3, Avg QP:27.57  kb/s: 14018.64  
x265 [info]: frame P:146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 171.30s (3.50 fps), 1273.22 kb/s, Avg QP:33.68
599.58user 1.62system 2:51.33elapsed 350%CPU (0avgtext+0avgdata
416976maxresident)k
225384inputs+0outputs (0major+95380minor)pagefaults 0swaps

GCC 10:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 10.1.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices  : 1
x265 [info]: frame threads / pool features   : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt: 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb   : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress: CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:  3, Avg QP:27.57  kb/s: 14018.64  
x265 [info]: frame P:146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 168.97s (3.55 fps), 1273.22 kb/s, Avg QP:33.68
592.69user 1.89system 2:49.00elapsed 351%CPU (0avgtext+0avgdata
416184maxresident)k
476408inputs+0outputs (1major+95191minor)pagefaults 0swaps

So a small improvement.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #7 from Jan Hubicka  ---
X265
GCC 9:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 9.3.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices  : 1
x265 [info]: frame threads / pool features   : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt: 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb   : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress: CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:  3, Avg QP:27.57  kb/s: 14018.64  
x265 [info]: frame P:146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 279.98s (2.14 fps), 1273.22 kb/s, Avg QP:33.68
1056.04user 1.31system 4:40.01elapsed 377%CPU (0avgtext+0avgdata
432688maxresident)k
0inputs+0outputs (0major+102385minor)pagefaults 0swaps


GCC 10:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 10.1.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices  : 1
x265 [info]: frame threads / pool features   : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt: 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb   : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress: CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:  3, Avg QP:27.57  kb/s: 14018.64  
x265 [info]: frame P:146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 292.63s (2.05 fps), 1273.22 kb/s, Avg QP:33.68
1079.80user 1.76system 4:52.65elapsed 369%CPU (0avgtext+0avgdata
427464maxresident)k
0inputs+0outputs (0major+73644minor)pagefaults 0swaps

So 5% difference instead of 50%. This is a codebase that I would build with
-O3.  Looking at perf reports there is a difference in inlining.

GCC 9:
   8.74%  x265 libx265.so.176   [.] (anonymous namespace)::satd_8x4
   5.67%  x265 libx265.so.176   [.] (anonymous namespace)::filterVertical_sp_c<8>
   4.44%  x265 libx265.so.176   [.] (anonymous namespace)::pixelavg_pp<8, 8>
   4.11%  x265 libx265.so.176   [.] (anonymous namespace)::psyCost_pp<3>
   3.81%  x265 libx265.so.176   [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.33%  x265 libx265.so.176   [.] (anonymous namespace)::sad<8, 8>
   3.29%  x265 libx265.so.176   [.] partialButterfly32

GCC 10:
   9.17%  x265 libx265.so.176   [.] (anonymous namespace)::_sa8d_8x8
   8.70%  x265 libx265.so.176   [.] (anonymous namespace)::satd_8x4
   5.80%  x265 libx265.so.176   [.] (anonymous namespace)::pixelavg_pp<8, 8>
   5.55%  x265 libx265.so.176   [.] (anonymous namespace)::filterVertical_sp_c<8>
   3.90%  x265 libx265.so.176   [.] (anonymous namespace)::sad<8, 8>
   3.71%  x265 libx265.so.176   [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.48%  x265 libx265.so.176   [.] (anonymous namespace)::sad_x4<8, 8>

I built with
cmake ../source/ -DCMAKE_CXX_FLAGS=-O2 

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #6 from Jan Hubicka  ---
Coremark.

GCC 9 run1:
CoreMark Size: 666
Total ticks  : 12310
Total time (secs): 12.31
Iterations/Sec   : 24370.430544
Iterations   : 30
Compiler version : GCC9.3.1 20200406 [revision
6db837a5288ee3ca5ec504fbd5a765817e556ac2]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

GCC 9 run2:
CoreMark Size: 666
Total ticks  : 12471
Total time (secs): 12.471000
Iterations/Sec   : 24055.809478
Iterations   : 30
Compiler version : GCC9.3.1 20200406 [revision
6db837a5288ee3ca5ec504fbd5a765817e556ac2]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt


GCC 10 run1:
CoreMark Size: 666
Total ticks  : 15269
Total time (secs): 15.269000
Iterations/Sec   : 26196.869474
Iterations   : 40
Compiler version : GCC10.1.1 20200507 [revision
dd38686d9c810cecbaa80bb82ed91caaa58ad635]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

GCC 10 run2:
CoreMark Size: 666
Total ticks  : 11770
Total time (secs): 11.77
Iterations/Sec   : 25488.530161
Iterations   : 30
Compiler version : GCC10.1.1 20200507 [revision
dd38686d9c810cecbaa80bb82ed91caaa58ad635]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #5 from Jan Hubicka  ---
OK, I started by checking Himeno, where phoronix reports 4377->2681. On my
notebook (Intel(R) Core(TM) i7-6600U CPU) there may be around a 1-5%
regression that is not inliner related.

GCC 10
 Loop executed for 7445 times
 Gosa : 2.924613e-08 
 MFLOPS measured : 2346.645663  cpu : 50.172505
 Score based on Pentium III 600MHz using Fortran 77: 28.617630

GCC 9
 Loop executed for 8253 times
 Gosa : 9.062229e-09 
 MFLOPS measured : 2454.019320  cpu : 53.184180
 Score based on Pentium III 600MHz using Fortran 77: 29.927065

The internal loops and inlining look almost identical.

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #4 from Jan Hubicka  ---
There were changes to the -O2 inliner.  I have
 - enabled auto-inlining
 - reduced early inlining a bit
 - reduced limits for inlining functions declared inline
The latter two were needed to keep code size under control and did well on
overall -O2 spec and Firefox performance (without FDO; with FDO we indeed had
some performance loss and code size gains, which I plan to revisit).

This should not be visible on linux kernel though since it does always inline.
The linked patch to enable -O3 by default does not make too much sense to me. 

I will see if I can reproduce phoronix benchmarks - indeed those workloads are
not typical -O2 workloads and may be affected by the inline limits.

Honza

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Richard Biener  changed:

           What    |Removed                     |Added

          Component|rtl-optimization            |ipa
            Summary|GCC 10.2: twice as slow for |[10/11 Regression] GCC
                   |-O2 -march=x86-64 vs. GCC   |10.2: twice as slow for -O2
                   |9.3/8.4                     |-march=x86-64 vs. GCC
                   |                            |9.3/8.4
           Keywords|                            |missed-optimization
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
Well, the workloads tested are not -O2 workloads but yes, distros likely still
will use -O2 for them unless the package itself overrides.

But IIRC the main change was that -O2 -fprofile-use no longer uses -O3
inliner settings, the settings for -O2 itself were not changed much?  Honza?

[Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-28 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.3

[Bug rtl-optimization/96337] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-27 Thread aros at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #2 from Artem S. Tashkinov  ---
Looks like even kernel performance is affected:
https://lore.kernel.org/lkml/20200507224530.2993316-1-ja...@zx2c4.com/

That was surely not a change for the better.

[Bug rtl-optimization/96337] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-27 Thread david.bolvansky at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot com

--- Comment #1 from Dávid Bolvanský  ---
Inliner changes?

[Bug rtl-optimization/96337] New: GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4

2020-07-27 Thread aros at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Bug ID: 96337
   Summary: GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC
9.3/8.4
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aros at gmx dot com
  Target Milestone: ---

All the pertinent details are on this page:

https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler&num=4

Options used: -O2

CPU used: Intel Core i9 10900K

I wonder what could have caused such a huge regression. Many Linux distros
compile their code just for -march=x86-64 and could be affected by the bug.

Re: On -march=x86-64

2019-07-14 Thread Allan Sandfeld Jensen
On Thursday, 11 July 2019 20:58:04 CEST Allan Sandfeld Jensen wrote:
> Years ago I discovered Chrome was optimizing with -march=x86-64, and
> knowing it was an undocumented arch that would optimize for K8, I laughed at
> it and just removed that piece of idiocy from our fork of Chromium, so it
> would be faster than upstream. Recently, though, I noticed phoronix now also
> sometimes optimizes with -march=x86-64 instead of using, well, nothing, as
> they should. And checking recent GCC documentation, I noticed that the gcc 8
> and gcc 9 documentation now documents -march=x86-64 and calls it a generic
> 64-bit processor. So it is no longer a laughing matter that people mistake
> it for that.
> 
> I would suggest instead of fixing the documentation to say what
> -march=x86-64 actually does, that we should perhaps change it to do what
> people expect and make it an alias for generic?
> 
Never mind, it was fixed back in 2007 when the generic architecture was
introduced. The arch table is just misleading here.

Best regards
'Allan




On -march=x86-64

2019-07-11 Thread Allan Sandfeld Jensen
Years ago I discovered Chrome was optimizing with -march=x86-64, and knowing
it was an undocumented arch that would optimize for K8, I laughed at it and
just removed that piece of idiocy from our fork of Chromium, so it would be
faster than upstream. Recently, though, I noticed phoronix now also sometimes
optimizes with -march=x86-64 instead of using, well, nothing, as they should.
And checking recent GCC documentation, I noticed that the gcc 8 and gcc 9
documentation now documents -march=x86-64 and calls it a generic 64-bit
processor. So it is no longer a laughing matter that people mistake it for
that.

I would suggest instead of fixing the documentation to say what -march=x86-64 
actually does, that we should perhaps change it to do what people expect and 
make it an alias for generic?

Best regards
'Allan





[Bug target/57112] -march=x86-64 not documented

2018-07-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57112

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||8.2.0
 Resolution|--- |FIXED

--- Comment #4 from Marc Glisse  ---
r258953 | marxin | 2018-03-29 15:02:23 +0200 (Thu, 29 Mar 2018) | 9 lines

Documentation tweaks.

2018-03-29  Martin Liska  

PR lto/84995.
* doc/invoke.texi: Document how LTO works with debug info.
Describe auto-load support of binutils.  Mention 'x86-64'
as valid option value of -march option.

[Bug target/57112] -march=x86-64 not documented

2018-07-31 Thread curlypaul924 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57112

Paul Brannan  changed:

   What|Removed |Added

 CC||curlypaul924 at gmail dot com

--- Comment #3 from Paul Brannan  ---
-march=x86-64 is in the man page for gcc 8.2 (it was not in the man page for
5.4; I'm not sure in which version it first appears).

[Bug target/57112] -march=x86-64 not documented

2015-10-06 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57112

Manuel López-Ibáñez  changed:

   What|Removed |Added

   Last reconfirmed|2013-04-29 00:00:00 |2015-10-6

--- Comment #2 from Manuel López-Ibáñez  ---
I just learned of it by googling! :(

If someone cares to explain it, I will add it to the FAQ (I can't send patches
ATM, sorry)

[Bug target/57112] New: -march=x86-64 not documented

2013-04-29 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57112

             Bug #: 57112
           Summary: -march=x86-64 not documented
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: documentation
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: gli...@gcc.gnu.org
            Target: x86_64-linux-gnu

Hello,

for people who have a gcc that was configured with some non-default --with-arch
but who want to build a generic binary, an option like -march=generic would be
useful. It seems that the right spelling is -march=x86-64, but I couldn't find
that documented in invoke.texi.


[Bug target/57112] -march=x86-64 not documented

2013-04-29 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57112

Jonathan Wakely <redi at gcc dot gnu.org> changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-04-29
 Ever Confirmed|0   |1

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> 2013-04-29 12:59:28 UTC ---
Yes, I only learned about it from reading the sources.