from:"Haochen Jiang via Gcc\-cvs"

[gcc r14-10397] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:74c15cb93b3830fee79f75805329d4299ff4a2f0

commit r14-10397-g74c15cb93b3830fee79f75805329d4299ff4a2f0
Author: Haochen Jiang 
Date:   Tue Jul 9 16:31:02 2024 +0800

i386: Correct AVX10 CPUID emulation

AVX10 Documentaion has specified ecx value as 0 for AVX10 version and
vector size under 0x24 subleaf. Although for ecx=1, the bits are all
reserved for now, we still need to specify ecx as 0 to avoid dirty
value in ecx.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Correct
AVX10 CPUID emulation to specify ecx value.

Diff:
---
 gcc/common/config/i386/cpuinfo.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 017a952a5db0..56427474b7be 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -1014,10 +1014,10 @@ get_available_features (struct __processor_model 
*cpu_model,
}
 }
 
-  /* Get Advanced Features at level 0x24 (eax = 0x24).  */
+  /* Get Advanced Features at level 0x24 (eax = 0x24, ecx = 0).  */
   if (avx10_set && max_cpuid_level >= 0x24)
 {
-  __cpuid (0x24, eax, ebx, ecx, edx);
+  __cpuid_count (0x24, 0, eax, ebx, ecx, edx);
   version = ebx & 0xff;
   if (ebx & bit_AVX10_256)
switch (version)

[gcc r15-1908] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:298a576f00c49b8f4529ea2f87b9943a32743250

commit r15-1908-g298a576f00c49b8f4529ea2f87b9943a32743250
Author: Haochen Jiang 
Date:   Tue Jul 9 16:31:02 2024 +0800

i386: Correct AVX10 CPUID emulation

AVX10 Documentaion has specified ecx value as 0 for AVX10 version and
vector size under 0x24 subleaf. Although for ecx=1, the bits are all
reserved for now, we still need to specify ecx as 0 to avoid dirty
value in ecx.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Correct
AVX10 CPUID emulation to specify ecx value.

Diff:
---
 gcc/common/config/i386/cpuinfo.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 936039725ab6..2ae77d335d24 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -998,10 +998,10 @@ get_available_features (struct __processor_model 
*cpu_model,
}
 }
 
-  /* Get Advanced Features at level 0x24 (eax = 0x24).  */
+  /* Get Advanced Features at level 0x24 (eax = 0x24, ecx = 0).  */
   if (avx10_set && max_cpuid_level >= 0x24)
 {
-  __cpuid (0x24, eax, ebx, ecx, edx);
+  __cpuid_count (0x24, 0, eax, ebx, ecx, edx);
   version = ebx & 0xff;
   if (ebx & bit_AVX10_256)
switch (version)

[gcc r14-10283] testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

2024-06-04 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:e11a42b8c7ac32f8a1e307f99719a0f9c63813e8

commit r14-10283-ge11a42b8c7ac32f8a1e307f99719a0f9c63813e8
Author: Rainer Orth 
Date:   Tue Jun 4 13:33:46 2024 +0200

testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

Two new AVX10.1 tests FAIL on Solaris/x86:

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
FAIL: gcc.target/i386/avx10_1-26.c (test for excess errors)

Excess errors:

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/avx10_1-25.c:6:9: 
error: the call requires 'ifunc', which is not supported by this target

Fixed by requiring ifunc support.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-06-04  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/avx10_1-25.c: Require ifunc support.
* gcc.target/i386/avx10_1-26.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 1 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
index 73f1b724560..5bd2b88fb08 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-25.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-256")))
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
index 514ab57a406..cf8c976e21f 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-26.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-512")))

[gcc r14-10271] Add AVX10.1 target_clones support

2024-06-03 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:97474ba2075dc3c397bbc2861646561dcfd13386

commit r14-10271-g97474ba2075dc3c397bbc2861646561dcfd13386
Author: Haochen Jiang 
Date:   Mon May 20 15:52:32 2024 +0800

Add AVX10.1 target_clones support

Since AVX10 is the first major ISA introduced after AVX-512, we propose
to add target_clones support for it.

Although AVX10.1-256 won't cover 512-bit part of AVX512F, but since
it is only for priority but not for implication, it won't be an issue.

gcc/ChangeLog:

* common/config/i386/i386-common.cc: Change Granite Rapids
series CPU type to P_PROC_AVX10_1_512.
* common/config/i386/i386-cpuinfo.h (enum feature_priority):
Revise comment part. Add P_AVX10_1_256, P_AVX10_1_512,
P_PROC_AVX10_1_512.
* common/config/i386/i386-isas.h: Link to avx10.1-256, avx10.1-512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-25.c: New test.
* gcc.target/i386/avx10_1-26.c: Ditto.

Diff:
---
 gcc/common/config/i386/i386-common.cc  | 4 ++--
 gcc/common/config/i386/i386-cpuinfo.h  | 5 -
 gcc/common/config/i386/i386-isas.h | 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 9 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 9 +
 5 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 77b154663bc..d578918dfb7 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2273,10 +2273,10 @@ const pta processor_alias_table[] =
   {"meteorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
-M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
+M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX10_1_512},
   {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
 PTA_GRANITERAPIDS_D, M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
-P_PROC_AVX512F},
+P_PROC_AVX10_1_512},
   {"arrowlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE), P_PROC_AVX2},
   {"arrowlake-s", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 73131657eab..be52ad2c60d 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -112,7 +112,7 @@ enum processor_subtypes
 /* Priority of i386 features, greater value is higher priority.   This is
used to decide the order in which function dispatch must happen.  For
instance, a version specialized for SSE4.2 should be checked for dispatch
-   before a version for SSE3, as SSE4.2 implies SSE3.  */
+   before a version for SSE3.  */
 enum feature_priority
 {
   P_NONE = 0,
@@ -148,6 +148,9 @@ enum feature_priority
   P_AVX512F,
   P_PROC_AVX512F,
   P_X86_64_V4,
+  P_AVX10_1_256,
+  P_AVX10_1_512,
+  P_PROC_AVX10_1_512,
   P_PROC_DYNAMIC
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h 
b/gcc/common/config/i386/i386-isas.h
index d6deb9a1522..9c2179a3dd8 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -194,6 +194,6 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
   ISA_NAMES_TABLE_ENTRY("usermsr", FEATURE_USER_MSR, P_NONE, "-musermsr")
   ISA_NAMES_TABLE_ENTRY("avx10.1", FEATURE_AVX10_1_256, P_NONE, "-mavx10.1")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_NONE, 
"-mavx10.1-256")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_NONE, 
"-mavx10.1-512")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_AVX10_1_256, 
"-mavx10.1-256")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_AVX10_1_512, 
"-mavx10.1-512")
 ISA_NAMES_TABLE_END
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
new file mode 100644
index 000..73f1b724560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-256")))
+__m256d foo(__m256d a, __m256d b)
+{
+  return a + b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
new file mode 100644
index 000..514ab57a406
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-512")))
+__m512d foo(__m512d a, __m512d b)
+{
+  return a + b;
+}

[gcc r15-983] Add AVX10.1 target_clones support

2024-06-03 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:1f2ca510065a2033bac408eb5a960ef0126f25cc

commit r15-983-g1f2ca510065a2033bac408eb5a960ef0126f25cc
Author: Haochen Jiang 
Date:   Mon May 20 15:52:32 2024 +0800

Add AVX10.1 target_clones support

Since AVX10 is the first major ISA introduced after AVX-512, we propose
to add target_clones support for it.

Although AVX10.1-256 won't cover 512-bit part of AVX512F, but since
it is only for priority but not for implication, it won't be an issue.

gcc/ChangeLog:

* common/config/i386/i386-common.cc: Change Granite Rapids
series CPU type to P_PROC_AVX10_1_512.
* common/config/i386/i386-cpuinfo.h (enum feature_priority):
Revise comment part. Add P_AVX10_1_256, P_AVX10_1_512,
P_PROC_AVX10_1_512.
* common/config/i386/i386-isas.h: Link to avx10.1-256, avx10.1-512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-25.c: New test.
* gcc.target/i386/avx10_1-26.c: Ditto.

Diff:
---
 gcc/common/config/i386/i386-common.cc  | 4 ++--
 gcc/common/config/i386/i386-cpuinfo.h  | 5 -
 gcc/common/config/i386/i386-isas.h | 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 9 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 9 +
 5 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 895e5fa662d..5d9c188c9c7 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2187,10 +2187,10 @@ const pta processor_alias_table[] =
   {"meteorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
-M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
+M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX10_1_512},
   {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
 PTA_GRANITERAPIDS_D, M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
-P_PROC_AVX512F},
+P_PROC_AVX10_1_512},
   {"arrowlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE), P_PROC_AVX2},
   {"arrowlake-s", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 9edad96d4fd..3ec9e005a6a 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -110,7 +110,7 @@ enum processor_subtypes
 /* Priority of i386 features, greater value is higher priority.   This is
used to decide the order in which function dispatch must happen.  For
instance, a version specialized for SSE4.2 should be checked for dispatch
-   before a version for SSE3, as SSE4.2 implies SSE3.  */
+   before a version for SSE3.  */
 enum feature_priority
 {
   P_NONE = 0,
@@ -146,6 +146,9 @@ enum feature_priority
   P_AVX512F,
   P_PROC_AVX512F,
   P_X86_64_V4,
+  P_AVX10_1_256,
+  P_AVX10_1_512,
+  P_PROC_AVX10_1_512,
   P_PROC_DYNAMIC
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h 
b/gcc/common/config/i386/i386-isas.h
index 4b4d4b4af99..2a092f740bb 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -184,6 +184,6 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
   ISA_NAMES_TABLE_ENTRY("usermsr", FEATURE_USER_MSR, P_NONE, "-musermsr")
   ISA_NAMES_TABLE_ENTRY("avx10.1", FEATURE_AVX10_1_256, P_NONE, "-mavx10.1")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_NONE, 
"-mavx10.1-256")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_NONE, 
"-mavx10.1-512")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_AVX10_1_256, 
"-mavx10.1-256")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_AVX10_1_512, 
"-mavx10.1-512")
 ISA_NAMES_TABLE_END
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
new file mode 100644
index 000..73f1b724560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-256")))
+__m256d foo(__m256d a, __m256d b)
+{
+  return a + b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
new file mode 100644
index 000..514ab57a406
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-512")))
+__m512d foo(__m512d a, __m512d b)
+{
+  return a + b;
+}

[gcc r15-888] Align tight loop without considering max skipping bytes.

2024-05-28 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:b644126237a1aa8599f767a5e0bbada1d7286f44

commit r15-888-gb644126237a1aa8599f767a5e0bbada1d7286f44
Author: liuhongt 
Date:   Wed May 29 11:14:26 2024 +0800

Align tight loop without considering max skipping bytes.

When hot loop is small enough to fix into one cacheline, we should align
the loop with ceil_log2 (loop_size) without considering maximum
skipp bytes. It will help code prefetch.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change
gen_pad to gen_max_skip_align.
(ix86_align_loops): New function.
(ix86_reorg): Call ix86_align_loops.
* config/i386/i386.md (pad): Rename to ..
(max_skip_align): .. this, and accept 2 operands for align and
skip.

Diff:
---
 gcc/config/i386/i386.cc | 148 +++-
 gcc/config/i386/i386.md |  10 ++--
 2 files changed, 153 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 85d87b9f778..1a0206ab573 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23146,7 +23146,7 @@ ix86_avoid_jump_mispredicts (void)
  if (dump_file)
fprintf (dump_file, "Padding insn %i by %i bytes!\n",
 INSN_UID (insn), padsize);
-  emit_insn_before (gen_pad (GEN_INT (padsize)), insn);
+ emit_insn_before (gen_max_skip_align (GEN_INT (4), GEN_INT 
(padsize)), insn);
}
 }
 }
@@ -23419,6 +23419,150 @@ ix86_split_stlf_stall_load ()
 }
 }
 
+/* When a hot loop can be fit into one cacheline,
+   force align the loop without considering the max skip.  */
+static void
+ix86_align_loops ()
+{
+  basic_block bb;
+
+  /* Don't do this when we don't know cache line size.  */
+  if (ix86_cost->prefetch_block == 0)
+return;
+
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+  profile_count count_threshold = cfun->cfg->count_max / param_align_threshold;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  rtx_insn *label = BB_HEAD (bb);
+  bool has_fallthru = 0;
+  edge e;
+  edge_iterator ei;
+
+  if (!LABEL_P (label))
+   continue;
+
+  profile_count fallthru_count = profile_count::zero ();
+  profile_count branch_count = profile_count::zero ();
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+   {
+ if (e->flags & EDGE_FALLTHRU)
+   has_fallthru = 1, fallthru_count += e->count ();
+ else
+   branch_count += e->count ();
+   }
+
+  if (!fallthru_count.initialized_p () || !branch_count.initialized_p ())
+   continue;
+
+  if (bb->loop_father
+ && bb->loop_father->latch != EXIT_BLOCK_PTR_FOR_FN (cfun)
+ && (has_fallthru
+ ? (!(single_succ_p (bb)
+  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
+&& optimize_bb_for_speed_p (bb)
+&& branch_count + fallthru_count > count_threshold
+&& (branch_count > fallthru_count * 
param_align_loop_iterations))
+ /* In case there'no fallthru for the loop.
+Nops inserted won't be executed.  */
+ : (branch_count > count_threshold
+|| (bb->count > bb->prev_bb->count * 10
+&& (bb->prev_bb->count
+<= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count / 2)
+   {
+ rtx_insn* insn, *end_insn;
+ HOST_WIDE_INT size = 0;
+ bool padding_p = true;
+ basic_block tbb = bb;
+ unsigned cond_branch_num = 0;
+ bool detect_tight_loop_p = false;
+
+ for (unsigned int i = 0; i != bb->loop_father->num_nodes;
+  i++, tbb = tbb->next_bb)
+   {
+ /* Only handle continuous cfg layout. */
+ if (bb->loop_father != tbb->loop_father)
+   {
+ padding_p = false;
+ break;
+   }
+
+ FOR_BB_INSNS (tbb, insn)
+   {
+ if (!NONDEBUG_INSN_P (insn))
+   continue;
+ size += ix86_min_insn_size (insn);
+
+ /* We don't know size of inline asm.
+Don't align loop for call.  */
+ if (asm_noperands (PATTERN (insn)) >= 0
+ || CALL_P (insn))
+   {
+ size = -1;
+ break;
+   }
+   }
+
+ if (size == -1 || size > ix86_cost->prefetch_block)
+   {
+ padding_p = false;
+ break;
+   }
+
+ FOR_EACH_EDGE (e, ei, tbb->succs)
+   {
+ /* It could be part of the loop.  */
+ if (e->dest == bb)
+   {
+ detect_tight_loop_p = true;
+ break;
+   }
+   }
+
+ if

[gcc r15-887] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

2024-05-28 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:00ed5424b1d4dcccfa187f55205521826794898c

commit r15-887-g00ed5424b1d4dcccfa187f55205521826794898c
Author: Haochen Jiang 
Date:   Wed May 29 11:13:55 2024 +0800

Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

Previously, we use 16:11:8 in generic tune for Intel processors, which
lead to cross cache line issue and result in some random performance
penalty in benchmarks with small loops commit to commit.

After changing to always aligning to 16 bytes, it will somehow solve
the issue.

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (generic_cost): Change from
16:11:8 to 16.

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 65d7d1f7e42..d34b5cc 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -3758,7 +3758,7 @@ struct processor_costs generic_cost = {
   generic_memset,
   COSTS_N_INSNS (4),   /* cond_taken_branch_cost.  */
   COSTS_N_INSNS (2),   /* cond_not_taken_branch_cost.  */
-  "16:11:8",   /* Loop alignment.  */
+  "16",/* Loop alignment.  */
   "16:11:8",   /* Jump alignment.  */
   "0:0:8", /* Label alignment.  */
   "16",/* Func alignment.  */

[gcc r14-10254] Align tight loop without considering max skipping bytes.

2024-05-28 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:b4d4ece0443433cd5c3078cfe03f18429e73b77a

commit r14-10254-gb4d4ece0443433cd5c3078cfe03f18429e73b77a
Author: liuhongt 
Date:   Wed May 29 11:12:51 2024 +0800

Align tight loop without considering max skipping bytes.

When hot loop is small enough to fix into one cacheline, we should align
the loop with ceil_log2 (loop_size) without considering maximum
skipp bytes. It will help code prefetch.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change
gen_pad to gen_max_skip_align.
(ix86_align_loops): New function.
(ix86_reorg): Call ix86_align_loops.
* config/i386/i386.md (pad): Rename to ..
(max_skip_align): .. this, and accept 2 operands for align and
skip.

Diff:
---
 gcc/config/i386/i386.cc | 148 +++-
 gcc/config/i386/i386.md |  10 ++--
 2 files changed, 153 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fbd9b4dac2e..984ba37beeb 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23135,7 +23135,7 @@ ix86_avoid_jump_mispredicts (void)
  if (dump_file)
fprintf (dump_file, "Padding insn %i by %i bytes!\n",
 INSN_UID (insn), padsize);
-  emit_insn_before (gen_pad (GEN_INT (padsize)), insn);
+ emit_insn_before (gen_max_skip_align (GEN_INT (4), GEN_INT 
(padsize)), insn);
}
 }
 }
@@ -23408,6 +23408,150 @@ ix86_split_stlf_stall_load ()
 }
 }
 
+/* When a hot loop can be fit into one cacheline,
+   force align the loop without considering the max skip.  */
+static void
+ix86_align_loops ()
+{
+  basic_block bb;
+
+  /* Don't do this when we don't know cache line size.  */
+  if (ix86_cost->prefetch_block == 0)
+return;
+
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+  profile_count count_threshold = cfun->cfg->count_max / param_align_threshold;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  rtx_insn *label = BB_HEAD (bb);
+  bool has_fallthru = 0;
+  edge e;
+  edge_iterator ei;
+
+  if (!LABEL_P (label))
+   continue;
+
+  profile_count fallthru_count = profile_count::zero ();
+  profile_count branch_count = profile_count::zero ();
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+   {
+ if (e->flags & EDGE_FALLTHRU)
+   has_fallthru = 1, fallthru_count += e->count ();
+ else
+   branch_count += e->count ();
+   }
+
+  if (!fallthru_count.initialized_p () || !branch_count.initialized_p ())
+   continue;
+
+  if (bb->loop_father
+ && bb->loop_father->latch != EXIT_BLOCK_PTR_FOR_FN (cfun)
+ && (has_fallthru
+ ? (!(single_succ_p (bb)
+  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
+&& optimize_bb_for_speed_p (bb)
+&& branch_count + fallthru_count > count_threshold
+&& (branch_count > fallthru_count * 
param_align_loop_iterations))
+ /* In case there'no fallthru for the loop.
+Nops inserted won't be executed.  */
+ : (branch_count > count_threshold
+|| (bb->count > bb->prev_bb->count * 10
+&& (bb->prev_bb->count
+<= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count / 2)
+   {
+ rtx_insn* insn, *end_insn;
+ HOST_WIDE_INT size = 0;
+ bool padding_p = true;
+ basic_block tbb = bb;
+ unsigned cond_branch_num = 0;
+ bool detect_tight_loop_p = false;
+
+ for (unsigned int i = 0; i != bb->loop_father->num_nodes;
+  i++, tbb = tbb->next_bb)
+   {
+ /* Only handle continuous cfg layout. */
+ if (bb->loop_father != tbb->loop_father)
+   {
+ padding_p = false;
+ break;
+   }
+
+ FOR_BB_INSNS (tbb, insn)
+   {
+ if (!NONDEBUG_INSN_P (insn))
+   continue;
+ size += ix86_min_insn_size (insn);
+
+ /* We don't know size of inline asm.
+Don't align loop for call.  */
+ if (asm_noperands (PATTERN (insn)) >= 0
+ || CALL_P (insn))
+   {
+ size = -1;
+ break;
+   }
+   }
+
+ if (size == -1 || size > ix86_cost->prefetch_block)
+   {
+ padding_p = false;
+ break;
+   }
+
+ FOR_EACH_EDGE (e, ei, tbb->succs)
+   {
+ /* It could be part of the loop.  */
+ if (e->dest == bb)
+   {
+ detect_tight_loop_p = true;
+ break;
+   }
+   }
+
+ if

[gcc r14-10253] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

2024-05-28 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:80600352d1282f084900ab444f2d4c83986f2ae5

commit r14-10253-g80600352d1282f084900ab444f2d4c83986f2ae5
Author: Haochen Jiang 
Date:   Wed May 29 11:12:37 2024 +0800

Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

Previously, we use 16:11:8 in generic tune for Intel processors, which
lead to cross cache line issue and result in some random performance
penalty in benchmarks with small loops commit to commit.

After changing to always aligning to 16 bytes, it will somehow solve
the issue.

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (generic_cost): Change from
16:11:8 to 16.

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 65d7d1f7e42..d34b5cc 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -3758,7 +3758,7 @@ struct processor_costs generic_cost = {
   generic_memset,
   COSTS_N_INSNS (4),   /* cond_taken_branch_cost.  */
   COSTS_N_INSNS (2),   /* cond_not_taken_branch_cost.  */
-  "16:11:8",   /* Loop alignment.  */
+  "16",/* Loop alignment.  */
   "16:11:8",   /* Jump alignment.  */
   "0:0:8", /* Label alignment.  */
   "16",/* Func alignment.  */

[gcc r14-10229] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:1ad5c9d524d8fa99773045e75da04ae958012085

commit r14-10229-g1ad5c9d524d8fa99773045e75da04ae958012085
Author: Haochen Jiang 
Date:   Tue May 21 14:10:43 2024 +0800

i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
and fall back to ix86_expand_vecop_qihi.

gcc/ChangeLog:

PR target/115069
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Do not enable the optimization when AVX512BW is not enabled.

gcc/testsuite/ChangeLog:

PR target/115069
* gcc.target/i386/pr115069.c: New.

Diff:
---
 gcc/config/i386/i386-expand.cc   | 7 +++
 gcc/testsuite/gcc.target/i386/pr115069.c | 9 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8bb8f21e686..51efe6fdd7d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23963,6 +23963,13 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;
 
+  /* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the
+ generic permutation to merge the data back into the right place.  This
+ permutation results in VPERMQ, which is slow, so better fall back to
+ ix86_expand_vecop_qihi.  */
+  if (!TARGET_AVX512BW)
+return false;
+
   if ((qimode == V16QImode && !TARGET_AVX2)
   || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
   /* There are no V64HImode instructions.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c 
b/gcc/testsuite/gcc.target/i386/pr115069.c
new file mode 100644
index 000..50a3e033079
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115069.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+/* { dg-final { scan-assembler-not "vpermq" } } */
+
+typedef char v16qi __attribute__((vector_size(16)));
+
+v16qi foo (v16qi a, v16qi b) {
+return a * b;
+}

[gcc r15-764] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:73a167cfa225d5ee7092d41596b9fea1719898ff

commit r15-764-g73a167cfa225d5ee7092d41596b9fea1719898ff
Author: Haochen Jiang 
Date:   Tue May 21 14:10:43 2024 +0800

i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
and fall back to ix86_expand_vecop_qihi.

gcc/ChangeLog:

PR target/115069
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Do not enable the optimization when AVX512BW is not enabled.

gcc/testsuite/ChangeLog:

PR target/115069
* gcc.target/i386/pr115069.c: New.

Diff:
---
 gcc/config/i386/i386-expand.cc   | 7 +++
 gcc/testsuite/gcc.target/i386/pr115069.c | 9 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7142c0a9d77..ec402a78a09 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24188,6 +24188,13 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;
 
+  /* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the
+ generic permutation to merge the data back into the right place.  This
+ permutation results in VPERMQ, which is slow, so better fall back to
+ ix86_expand_vecop_qihi.  */
+  if (!TARGET_AVX512BW)
+return false;
+
   if ((qimode == V16QImode && !TARGET_AVX2)
   || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
   /* There are no V64HImode instructions.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c 
b/gcc/testsuite/gcc.target/i386/pr115069.c
new file mode 100644
index 000..50a3e033079
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115069.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+/* { dg-final { scan-assembler-not "vpermq" } } */
+
+typedef char v16qi __attribute__((vector_size(16)));
+
+v16qi foo (v16qi a, v16qi b) {
+return a * b;
+}

[gcc r13-8652] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:7425436b5382a04f3eb28c7c7912f4d9a1cad0bd

commit r13-8652-g7425436b5382a04f3eb28c7c7912f4d9a1cad0bd
Author: Haochen Jiang 
Date:   Fri Apr 26 16:48:29 2024 +0800

i386: Fix array index overflow in pr105354-2.c

The array index should not be over 8 for v8hi, or it will fail
under -O0 or using -fstack-protector.

gcc/testsuite/ChangeLog:

PR target/110621
* gcc.target/i386/pr105354-2.c: As mentioned.

Diff:
---
 gcc/testsuite/gcc.target/i386/pr105354-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr105354-2.c 
b/gcc/testsuite/gcc.target/i386/pr105354-2.c
index b78b62e1e7e..1c592e84860 100644
--- a/gcc/testsuite/gcc.target/i386/pr105354-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr105354-2.c
@@ -17,7 +17,7 @@ sse2_test (void)
   b.a[i] = i + 16;
   res_ab.a[i] = 0;
   exp_ab.a[i] = -1;
-  if (i <= 8)
+  if (i < 8)
{
  c.a[i] = i;
  d.a[i] = i + 8;

[gcc r14-10137] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:4a2e55b3ada20fe6457466bb687a66c8d03e056e

commit r14-10137-g4a2e55b3ada20fe6457466bb687a66c8d03e056e
Author: Haochen Jiang 
Date:   Fri Apr 26 16:48:29 2024 +0800

i386: Fix array index overflow in pr105354-2.c

The array index should not be over 8 for v8hi, or it will fail
under -O0 or using -fstack-protector.

gcc/testsuite/ChangeLog:

PR target/110621
* gcc.target/i386/pr105354-2.c: As mentioned.

Diff:
---
 gcc/testsuite/gcc.target/i386/pr105354-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr105354-2.c 
b/gcc/testsuite/gcc.target/i386/pr105354-2.c
index b78b62e1e7e..1c592e84860 100644
--- a/gcc/testsuite/gcc.target/i386/pr105354-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr105354-2.c
@@ -17,7 +17,7 @@ sse2_test (void)
   b.a[i] = i + 16;
   res_ab.a[i] = 0;
   exp_ab.a[i] = -1;
-  if (i <= 8)
+  if (i < 8)
{
  c.a[i] = i;
  d.a[i] = i + 8;

[gcc r14-10104] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-24 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:d279c9d89b2f6ce89c1eec0ff4b980e9c5f51fd1

commit r14-10104-gd279c9d89b2f6ce89c1eec0ff4b980e9c5f51fd1
Author: Haochen Jiang 
Date:   Wed Apr 24 10:43:18 2024 +0800

i386: Fix behavior for both using AVX10.1-256 in options and function 
attribute

When we are using -mavx10.1-256 in command line and avx10.1-256 in
target attribute together, zmm should never be generated. But current
GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly
set AVX512. This patch will fix that issue.

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
Check whether AVX512F is explicitly enabled.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-24.c: New test.

Diff:
---
 gcc/config/i386/i386-options.cc| 1 +
 gcc/testsuite/gcc.target/i386/avx10_1-24.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 68a2e1c6910..ac48b5c61c4 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1431,6 +1431,7 @@ ix86_valid_target_attribute_tree (tree fndecl, tree args,
  scenario.  */
   if ((def->x_ix86_isa_flags2 & OPTION_MASK_ISA2_AVX10_1_256)
   && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_AVX512F)
+  && (opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512F)
   && !(def->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512)
   && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
 opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-24.c 
b/gcc/testsuite/gcc.target/i386/avx10_1-24.c
new file mode 100644
index 000..2e93f041760
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-24.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
+/* { dg-final { scan-assembler-not "%zmm" } } */
+
+typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
+
+void __attribute__((target("avx10.1-256"))) callee256(__m512 *a, __m512 *b) { 
*a = *b; }

[gcc r13-8641] i386: Fix Sierra Forest auto dispatch

2024-04-22 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:d80c9df20ed77a26eb71457679dad2b564c5da60

commit r13-8641-gd80c9df20ed77a26eb71457679dad2b564c5da60
Author: Haochen Jiang 
Date:   Mon Apr 22 16:57:36 2024 +0800

i386: Fix Sierra Forest auto dispatch

gcc/ChangeLog:

* common/config/i386/i386-common.cc (processor_alias_table):
Let Sierra Forest map to CPU_TYPE enum.

Diff:
---
 gcc/common/config/i386/i386-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 988805a3aed..a8809889360 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2110,7 +2110,7 @@ const pta processor_alias_table[] =
   {"gracemont", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
-M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
+M_CPU_TYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
   {"grandridge", PROCESSOR_GRANDRIDGE, CPU_HASWELL, PTA_GRANDRIDGE,
 M_CPU_TYPE (INTEL_GRANDRIDGE), P_PROC_AVX2},
   {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,

[gcc r14-10072] i386: Fix Sierra Forest auto dispatch

2024-04-22 Thread Haochen Jiang via Gcc-cvs

https://gcc.gnu.org/g:6b5248d15c6d10325c6cbb92a0e0a9eb04e3f122

commit r14-10072-g6b5248d15c6d10325c6cbb92a0e0a9eb04e3f122
Author: Haochen Jiang 
Date:   Mon Apr 22 16:57:36 2024 +0800

i386: Fix Sierra Forest auto dispatch

gcc/ChangeLog:

* common/config/i386/i386-common.cc (processor_alias_table):
Let Sierra Forest map to CPU_TYPE enum.

Diff:
---
 gcc/common/config/i386/i386-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index f814df8385b..77b154663bc 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2302,7 +2302,7 @@ const pta processor_alias_table[] =
   {"gracemont", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
-M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
+M_CPU_TYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
   {"grandridge", PROCESSOR_GRANDRIDGE, CPU_HASWELL, PTA_GRANDRIDGE,
 M_CPU_TYPE (INTEL_GRANDRIDGE), P_PROC_AVX2},
   {"clearwaterforest", PROCESSOR_CLEARWATERFOREST, CPU_HASWELL,

gcc-wwwdocs branch master updated. 033976162ed4745f7f808f14ba62b1c055e35d16

2024-04-12 Thread Haochen Jiang via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  033976162ed4745f7f808f14ba62b1c055e35d16 (commit)
  from  9e32f911b70a8c2303b9b60679ce337896ccffdd (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 033976162ed4745f7f808f14ba62b1c055e35d16
Author: Haochen Jiang 
Date:   Fri Apr 12 16:34:48 2024 +0800

Uncomment MCore part title

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 14301157..8ac08e9a 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -828,7 +828,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
 
 
-
+MCore
 
   Bitfields are now signed by default per GCC policy.  If you need 
bitfields
 to be unsigned, use -funsigned-bitfields.

---

Summary of changes:
 htdocs/gcc-14/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs

[gcc r14-10397] i386: Correct AVX10 CPUID emulation

[gcc r15-1908] i386: Correct AVX10 CPUID emulation

[gcc r14-10283] testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

[gcc r14-10271] Add AVX10.1 target_clones support

[gcc r15-983] Add AVX10.1 target_clones support

[gcc r15-888] Align tight loop without considering max skipping bytes.

[gcc r15-887] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

[gcc r14-10254] Align tight loop without considering max skipping bytes.

[gcc r14-10253] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

[gcc r14-10229] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

[gcc r15-764] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

[gcc r13-8652] i386: Fix array index overflow in pr105354-2.c

[gcc r14-10137] i386: Fix array index overflow in pr105354-2.c

[gcc r14-10104] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

[gcc r13-8641] i386: Fix Sierra Forest auto dispatch

[gcc r14-10072] i386: Fix Sierra Forest auto dispatch

gcc-wwwdocs branch master updated. 033976162ed4745f7f808f14ba62b1c055e35d16

17 matches

Site Navigation

Mail list logo

Footer information