Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-30 Thread Andrew Pinski
On Wed, Sep 20, 2023 at 6:10 AM Lehua Ding wrote:
>
> This patch adds support for combining cond_len_op and vec_cond into a
> single cond_len_op, as is already done for cond_op.
>
> gcc/ChangeLog:
>
> * gimple-match.h (gimple_match_op::gimple_match_op):
> Add interfaces for more arguments.
> (gimple_match_op::set_op): Add interfaces for more arguments.
> * match.pd: Add support of combining cond_len_op + vec_cond
> ---
>  gcc/gimple-match.h | 72 ++
>  gcc/match.pd   | 39 +
>  2 files changed, 111 insertions(+)
>
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..9892c142285 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -92,6 +92,10 @@ public:
>code_helper, tree, tree, tree, tree, tree);
>gimple_match_op (const gimple_match_cond &,
>code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +  code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +  code_helper, tree, tree, tree, tree, tree, tree, tree, 
> tree);
>
>void set_op (code_helper, tree, unsigned int);
>void set_op (code_helper, tree, tree);
> @@ -100,6 +104,8 @@ public:
>void set_op (code_helper, tree, tree, tree, tree, bool);
>void set_op (code_helper, tree, tree, tree, tree, tree);
>void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>void set_value (tree);
>
>tree op_or_null (unsigned int) const;
> @@ -212,6 +218,39 @@ gimple_match_op::gimple_match_op (const 
> gimple_match_cond &cond_in,
>ops[4] = op4;
>  }
>
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> + code_helper code_in, tree type_in,
> + tree op0, tree op1, tree op2, tree op3,
> + tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}

Hmm, would it make sense to start using variadic templates for these
constructors instead of writing each one out?
We could even add a static_assert to check that the number of
arguments is <= MAX_NUM_OPS, and use std::is_same to check that only
tree types are passed.

Thanks,
Andrew
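
For illustration, the variadic-template idea above could look roughly
like the sketch below. This is not the GCC implementation: tree_node,
match_op_sketch, all_trees and the int code_in parameter are stand-ins
invented here, and MAX_NUM_OPS is assumed to be 7 to match the largest
overload in the patch. Only the use of static_assert and std::is_same
comes from the suggestion itself.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-ins for the real GCC types; hypothetical, for illustration only.
struct tree_node {};
typedef tree_node *tree;

// C++11-friendly check that every type in the pack is exactly `tree`.
template <typename... Ts>
struct all_trees : std::true_type {};

template <typename T, typename... Ts>
struct all_trees<T, Ts...>
  : std::integral_constant<bool, std::is_same<T, tree>::value
                                   && all_trees<Ts...>::value> {};

struct match_op_sketch
{
  // Assumed value; chosen to match the 7-operand overload in the patch.
  static const unsigned int MAX_NUM_OPS = 7;

  // One constructor template instead of one hand-written overload per
  // operand count.  Unlike the hand-written versions, the unused tail
  // of ops[] is value-initialized to null here.
  template <typename... Ts>
  match_op_sketch (int code_in, Ts... op_in)
    : code (code_in), num_ops (sizeof... (Ts)), ops { op_in... }
  {
    static_assert (sizeof... (Ts) <= MAX_NUM_OPS, "too many operands");
    static_assert (all_trees<Ts...>::value, "operands must be trees");
  }

  int code;
  unsigned int num_ops;
  tree ops[MAX_NUM_OPS];
};
```

With this shape, match_op_sketch (code, op0, op1) sets num_ops to 2
without a dedicated two-operand overload, and passing eight operands or
a non-tree argument is rejected at compile time.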

> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> + code_helper code_in, tree type_in,
> + tree op0, tree op1, tree op2, tree op3,
> + tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
> TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
> to set the operands itself.  */
> @@ -299,6 +338,39 @@ gimple_match_op::set_op (code_helper code_in, tree 
> type_in,
>ops[4] = op4;
>  }
>
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +tree op0, tree op1, tree op2, tree op3, tree op4,
> +tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +tree op0, tree op1, tree op2, tree op3, tree op4,
> +tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
> or SSA_NAME.  */
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a37af05f873..75b7e100120 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -103,12 +103,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>IFN_COND_FMIN IFN_COND_FMAX
>IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV
> +  IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XO

Pushed: [PATCH v2] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2023-10-30 Thread Xi Ruoyao
Pushed r14-5030.  The subject and ChangeLog are updated to include the
PR number.  The code change is the same as in v1.

On Mon, 2023-10-30 at 20:44 +0800, chenglulu wrote:
> 
> On 2023/10/30 at 8:26 PM, Xi Ruoyao wrote:
> > On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> > > On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > > > Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > > > when building a cross compiler if the cross assembler is not installed yet.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >     * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> > > >     if not defined yet.
> > > > ---
> > > > 
> > > > Ok for trunk?
> > > I have no problem with this submission, but I don't understand the
> > > circumstances surrounding the error.
> > When developers hack on GCC they sometimes build a cross compiler with
> > no cross assembler, and then HAVE_AS_TLS is simply undefined.  In the
> > future we may also have an assembler without TLS support (for example a
> > tiny assembler for a bare-metal target), and then HAVE_AS_TLS will be
> > undefined too.
> 
> Ok!
> 
> Thanks!
> 
> > 
> > The error message is:
> > 
> > g++ -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
> > -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> > -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
> > -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
> > -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  
> > -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build 
> > -I../../gcc/gcc/../include  -I../../gcc/gcc/../libcpp/include  \
> > -o build/gencondmd.o build/gencondmd.cc
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1
> > 
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
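
For readers following the thread, the kind of fallback described by the
ChangeLog entry ("Define to 0 if not defined yet") can be sketched as
below. This is an illustration only, not the committed hunk; the helper
tls_condition_enabled_p is a made-up stand-in for a machine-description
condition that ends up compiled as C++ (as in build/gencondmd.cc).

```cpp
#include <cassert>

// Fallback in the spirit of the ChangeLog entry: if configure did not
// define HAVE_AS_TLS (for example because no cross assembler was found),
// give it a value so that code mentioning it still compiles.
#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

// Hypothetical stand-in for a pattern condition that refers to the macro.
// Without the fallback above, this would fail with "'HAVE_AS_TLS' was not
// declared in this scope".
inline bool
tls_condition_enabled_p ()
{
  return HAVE_AS_TLS != 0;
}
```

When the macro is already defined by configure, the #ifndef block is a
no-op, so the fallback does not change the normal build.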


[PATCH 3/4] [PATCH 3/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog:

* config/i386/avx512bf16vlintrin.h
(_mm_avx512_castsi128_ps): New.
(_mm256_avx512_castsi256_ps): Ditto.
(_mm_avx512_slli_epi32): Ditto.
(_mm256_avx512_slli_epi32): Ditto.
(_mm_avx512_cvtepi16_epi32): Ditto.
(_mm256_avx512_cvtepi16_epi32): Ditto.
(__attribute__): Change intrin call.
* config/i386/avx512bwintrin.h
(_mm_avx512_set_epi32): New.
(_mm_avx512_set_epi16): Ditto.
(_mm_avx512_set_epi8): Ditto.
(__attribute__): Change intrin call.
* config/i386/avx512fp16intrin.h: Ditto.
* config/i386/avx512fp16vlintrin.h
(_mm_avx512_set1_ps): New.
(_mm256_avx512_set1_ps): Ditto.
(_mm_avx512_and_si128): Ditto.
(_mm256_avx512_and_si256): Ditto.
(__attribute__): Change intrin call.
* config/i386/avx512vlbwintrin.h
(_mm_avx512_set1_epi32): New.
(_mm_avx512_set1_epi16): Ditto.
(_mm_avx512_set1_epi8): Ditto.
(_mm256_avx512_set_epi16): Ditto.
(_mm256_avx512_set_epi8): Ditto.
(_mm256_avx512_set1_epi16): Ditto.
(_mm256_avx512_set1_epi32): Ditto.
(_mm256_avx512_set1_epi8): Ditto.
(_mm_avx512_max_epi16): Ditto.
(_mm_avx512_min_epi16): Ditto.
(_mm_avx512_max_epu16): Ditto.
(_mm_avx512_min_epu16): Ditto.
(_mm_avx512_max_epi8): Ditto.
(_mm_avx512_min_epi8): Ditto.
(_mm_avx512_max_epu8): Ditto.
(_mm_avx512_min_epu8): Ditto.
(_mm256_avx512_max_epi16): Ditto.
(_mm256_avx512_min_epi16): Ditto.
(_mm256_avx512_max_epu16): Ditto.
(_mm256_avx512_min_epu16): Ditto.
(_mm256_avx512_insertf128_ps): Ditto.
(_mm256_avx512_extractf128_pd): Ditto.
(_mm256_avx512_extracti128_si256): Ditto.
(_MM256_AVX512_REDUCE_OPERATOR_BASIC_EPI16): Ditto.
(_MM256_AVX512_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
(_MM256_AVX512_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
(_MM256_AVX512_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
(__attribute__): Change intrin call.
---
 gcc/config/i386/avx512bf16vlintrin.h |  58 -
 gcc/config/i386/avx512bwintrin.h |  26 +++
 gcc/config/i386/avx512fp16intrin.h   |   2 +-
 gcc/config/i386/avx512fp16vlintrin.h |  54 +++--
 gcc/config/i386/avx512vlbwintrin.h   | 338 +++
 5 files changed, 409 insertions(+), 69 deletions(-)

diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 517544c5b89..78c001f55ad 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -45,6 +45,44 @@ typedef __bf16 __m128bh __attribute__ ((__vector_size__ 
(16), __may_alias__));
 
 typedef __bf16 __bfloat16;
 
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_avx512_castsi128_ps(__m128i __A)
+{
+  return (__m128) __A;
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm256_avx512_castsi256_ps (__m256i __A)
+{
+  return (__m256) __A;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_avx512_slli_epi32 (__m128i __A, int __B)
+{
+  return (__m128i)__builtin_ia32_pslldi128 ((__v4si)__A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_avx512_slli_epi32 (__m256i __A, int __B)
+{
+  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_avx512_cvtepi16_epi32 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_pmovsxwd128 ((__v8hi)__X);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_avx512_cvtepi16_epi32 (__m128i __X)
+{
+  return (__m256i) __builtin_ia32_pmovsxwd256 ((__v8hi)__X);
+}
+
 #define _mm256_cvtneps_pbh(A) \
   (__m128bh) __builtin_ia32_cvtneps2bf16_v8sf (A)
 #define _mm_cvtneps_pbh(A) \
@@ -182,23 +220,23 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtpbh_ps (__m128bh __A)
 {
-  return (__m128)_mm_castsi128_ps ((__m128i)_mm_slli_epi32 (
-(__m128i)_mm_cvtepi16_epi32 ((__m128i)__A), 16));
+  return (__m128)_mm_avx512_castsi128_ps ((__m128i)_mm_avx512_slli_epi32 (
+(__m128i)_mm_avx512_cvtepi16_epi32 ((__m128i)__A), 16));
 }
 
 extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvtpbh_ps (__m128bh __A)
 {
-  return (__m256)_mm256_castsi256_ps ((__m256i)_mm256_slli_epi32 (
-(__m256i)_mm256_cvtepi16_epi32 ((__m128i)__A), 16));
+  return (__m256)_mm256_avx512_castsi256_ps ((__m256i)_mm256_avx512_slli_epi32 
(
+(__m256i)_mm256_avx512_cvtepi16_epi32 ((__m128i)__A), 16));
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)

[PATCH 2/4] [PATCH 2/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog:

* config/i386/avx512bf16vlintrin.h: Change intrin call.
* config/i386/avx512fintrin.h
(_mm_avx512_undefined_ps): New.
(_mm_avx512_undefined_pd): Ditto.
(__attribute__): Change intrin call.
* config/i386/avx512vbmivlintrin.h: Ditto.
* config/i386/avx512vlbwintrin.h: Ditto.
* config/i386/avx512vldqintrin.h: Ditto.
* config/i386/avx512vlintrin.h
(_mm_avx512_undefined_si128): New.
(_mm256_avx512_undefined_ps): Ditto.
(_mm256_avx512_undefined_pd): Ditto.
(_mm256_avx512_undefined_si256): Ditto.
(__attribute__): Change intrin call.
---
 gcc/config/i386/avx512bf16vlintrin.h |   2 +-
 gcc/config/i386/avx512fintrin.h  |  24 +-
 gcc/config/i386/avx512vbmivlintrin.h |   8 +-
 gcc/config/i386/avx512vlbwintrin.h   |  12 +--
 gcc/config/i386/avx512vldqintrin.h   |  10 +--
 gcc/config/i386/avx512vlintrin.h | 110 ++-
 6 files changed, 113 insertions(+), 53 deletions(-)

diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 6e8a6a09511..517544c5b89 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -174,7 +174,7 @@ _mm_cvtness_sbh (float __A)
 {
   __v4sf __V = {__A, 0, 0, 0};
   __v8bf __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
-  (__v8bf)_mm_undefined_si128 (), (__mmask8)-1);
+  (__v8bf)_mm_avx512_undefined_si128 (), (__mmask8)-1);
   return __R[0];
 }
 
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 530be29eefa..90a00bec09a 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -59,6 +59,26 @@ typedef enum
when calling AVX512 intrins implemented with these intrins under no-evex512
function attribute.  All AVX512 intrins calling those AVX2 intrins or
before will change their calls to these AVX512 version.  */
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_avx512_undefined_ps (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m128 __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_avx512_undefined_pd (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m128d __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_avx512_setzero_ps (void)
 {
@@ -674,13 +694,13 @@ _mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, 
__m128 __B, const int __R)
 #define _mm_scalef_round_sd(A, B, C)   \
   ((__m128d)   \
__builtin_ia32_scalefsd_mask_round ((A), (B),   \
-  (__v2df) _mm_undefined_pd (),\
+  (__v2df) _mm_avx512_undefined_pd (), 
\
   -1, (C)))
 
 #define _mm_scalef_round_ss(A, B, C)   \
   ((__m128)\
__builtin_ia32_scalefss_mask_round ((A), (B),   \
-  (__v4sf) _mm_undefined_ps (),\
+  (__v4sf) _mm_avx512_undefined_ps (), 
\
   -1, (C)))
 
 #define _mm_mask_scalef_round_sd(W, U, A, B, C)
\
diff --git a/gcc/config/i386/avx512vbmivlintrin.h 
b/gcc/config/i386/avx512vbmivlintrin.h
index 270e9406db5..acec23b742f 100644
--- a/gcc/config/i386/avx512vbmivlintrin.h
+++ b/gcc/config/i386/avx512vbmivlintrin.h
@@ -62,7 +62,7 @@ _mm256_multishift_epi64_epi8 (__m256i __X, __m256i __Y)
   return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X,
  (__v32qi) __Y,
  (__v32qi)
- 
_mm256_undefined_si256 (),
+ 
_mm256_avx512_undefined_si256 (),
  (__mmask32) -1);
 }
 
@@ -94,7 +94,7 @@ _mm_multishift_epi64_epi8 (__m128i __X, __m128i __Y)
   return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X,
  (__v16qi) __Y,
  (__v16qi)
- _mm_undefined_si128 
(),
+ 
_mm_avx512_undefined_si128 (),
  (__mmask16) -

[PATCH 4/4] Push no-evex512 target for 128/256 bit intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog:

PR target/111889
* config/i386/avx512bf16intrin.h: Push no-evex512 target.
* config/i386/avx512bf16vlintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: Ditto.
* config/i386/avx512bwintrin.h: Ditto.
* config/i386/avx512dqintrin.h: Ditto.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/avx512fp16intrin.h: Ditto.
* config/i386/avx512fp16vlintrin.h: Ditto.
* config/i386/avx512ifmavlintrin.h: Ditto.
* config/i386/avx512vbmi2vlintrin.h: Ditto.
* config/i386/avx512vbmivlintrin.h: Ditto.
* config/i386/avx512vlbwintrin.h: Ditto.
* config/i386/avx512vldqintrin.h: Ditto.
* config/i386/avx512vlintrin.h: Ditto.
* config/i386/avx512vnnivlintrin.h: Ditto.
* config/i386/avx512vp2intersectvlintrin.h: Ditto.
* config/i386/avx512vpopcntdqvlintrin.h: Ditto.

gcc/testsuite/ChangeLog:

PR target/111889
* gcc.target/i386/pr111889.c: New test.
---
 gcc/config/i386/avx512bf16intrin.h   |  4 ++--
 gcc/config/i386/avx512bf16vlintrin.h |  4 ++--
 gcc/config/i386/avx512bitalgvlintrin.h   |  4 ++--
 gcc/config/i386/avx512bwintrin.h |  4 ++--
 gcc/config/i386/avx512dqintrin.h |  4 ++--
 gcc/config/i386/avx512fintrin.h  |  4 ++--
 gcc/config/i386/avx512fp16intrin.h   |  4 ++--
 gcc/config/i386/avx512fp16vlintrin.h |  4 ++--
 gcc/config/i386/avx512ifmavlintrin.h |  4 ++--
 gcc/config/i386/avx512vbmi2vlintrin.h|  4 ++--
 gcc/config/i386/avx512vbmivlintrin.h |  4 ++--
 gcc/config/i386/avx512vlbwintrin.h   |  4 ++--
 gcc/config/i386/avx512vldqintrin.h   |  4 ++--
 gcc/config/i386/avx512vlintrin.h |  6 +++---
 gcc/config/i386/avx512vnnivlintrin.h |  4 ++--
 gcc/config/i386/avx512vp2intersectvlintrin.h |  5 +++--
 gcc/config/i386/avx512vpopcntdqvlintrin.h|  5 +++--
 gcc/testsuite/gcc.target/i386/pr111889.c | 10 ++
 18 files changed, 47 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111889.c

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index 94ccbf6389f..5084a8c23ed 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512BF16INTRIN_H_INCLUDED
 #define _AVX512BF16INTRIN_H_INCLUDED
 
-#ifndef __AVX512BF16__
+#if !defined (__AVX512BF16__) || defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512bf16")
+#pragma GCC target("avx512bf16,no-evex512")
 #define __DISABLE_AVX512BF16__
 #endif /* __AVX512BF16__ */
 
diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 78c001f55ad..a389bfe7cec 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512BF16VLINTRIN_H_INCLUDED
 #define _AVX512BF16VLINTRIN_H_INCLUDED
 
-#if !defined(__AVX512VL__) || !defined(__AVX512BF16__)
+#if !defined(__AVX512VL__) || !defined(__AVX512BF16__) || defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512bf16,avx512vl")
+#pragma GCC target("avx512bf16,avx512vl,no-evex512")
 #define __DISABLE_AVX512BF16VL__
 #endif /* __AVX512BF16__ */
 
diff --git a/gcc/config/i386/avx512bitalgvlintrin.h 
b/gcc/config/i386/avx512bitalgvlintrin.h
index 39301625601..327425ef0cb 100644
--- a/gcc/config/i386/avx512bitalgvlintrin.h
+++ b/gcc/config/i386/avx512bitalgvlintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512BITALGVLINTRIN_H_INCLUDED
 #define _AVX512BITALGVLINTRIN_H_INCLUDED
 
-#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__)
+#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__) || defined 
(__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512bitalg,avx512vl")
+#pragma GCC target("avx512bitalg,avx512vl,no-evex512")
 #define __DISABLE_AVX512BITALGVL__
 #endif /* __AVX512BITALGVL__ */
 
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 45a46936aef..d5ce79fd073 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512BWINTRIN_H_INCLUDED
 #define _AVX512BWINTRIN_H_INCLUDED
 
-#ifndef __AVX512BW__
+#if !defined (__AVX512BW__) || defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512bw")
+#pragma GCC target("avx512bw,no-evex512")
 #define __DISABLE_AVX512BW__
 #endif /* __AVX512BW__ */
 
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index fb0aea70280..55a5d9fee9c 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -28,9 +28,9 @@
 #ifndef _AVX512DQINTRIN_H_INCLUDED
 #define _AVX512DQINTRIN_H_INCLUDED
 
-#ifndef __AVX512DQ__
+#if !defined (__AVX512DQ__) || defined (__EVEX512__)
 #pragma GCC push_options
-#pragma GCC target("avx512dq")
+#pragma GCC target("avx512dq,no-evex512")

[PATCH 0/4] Fix no-evex512 function attribute

2023-10-30 Thread Haochen Jiang
Hi all,

These four patches fix the no-evex512 function attribute. The details
of the issue are in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

My proposal for this problem is to also push "no-evex512" when defining
128/256 intrins in AVX512.

Besides, I added some new intrins to support the current AVX512 intrins.
The newly added _mm{,256}_avx512* intrins are duplicates of their
_mm{,256}_* forms from AVX2 or earlier. We need them to prevent a target
option mismatch when AVX512 intrins implemented with those intrins are
called under the no-evex512 function attribute. All AVX512 intrins that
call AVX2 or earlier intrins now call the newly added AVX512 versions.

This solves the problem of using the no-evex512 attribute with AVX512
related intrins. It does not solve the target option mismatch when AVX2
or earlier intrins are called under the no-evex512 function attribute
since, as mentioned in PR111889, that comes from a legacy issue.
Therefore, we do not expect that usage.

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen




Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-30 Thread Lehua Ding

Committed, thanks Jeff.

On 2023/9/28 6:24, Jeff Law wrote:



On 9/20/23 07:09, Lehua Ding wrote:

This patch adds support for combining cond_len_op and vec_cond into a
single cond_len_op, as is already done for cond_op.

gcc/ChangeLog:

* gimple-match.h (gimple_match_op::gimple_match_op):
Add interfaces for more arguments.
(gimple_match_op::set_op): Add interfaces for more arguments.
* match.pd: Add support of combining cond_len_op + vec_cond

OK
jeff



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 2/2] RISC-V: Add assert of the number of vmerge in autovec cond testcases

2023-10-30 Thread Lehua Ding

Committed, thanks Jeff.

On 2023/10/17 11:19, Lehua Ding wrote:

Hi Jeff,

Can you replace riscv_vector with riscv_v?  That way this will still 
work after Joern commits his change to standardize on the riscv_v 
target selector.


OK with that change, no need to wait for a review on V2, just go ahead 
and blast it in.


No problem, I'll tweak it later and submit it. Thanks.



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: Re: [PATCH V5] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-30 Thread Li Xu
Since the following three instances share the class binop, an override
in binop returns the same value for all of them, so I cannot
distinguish between vadd and vfadd.
I think it is difficult to add maybe_require_frm_p
and maybe_require_vxrm_p to function_base.

static CONSTEXPR const binop vadd_obj;
static CONSTEXPR const binop vfadd_obj;
static CONSTEXPR const binop vfadd_frm_obj;


template
class binop : public function_base
{
public:
  bool maybe_require_frm_p () const override { return true; } // also true for vadd
...
}
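
One possible way around the difficulty described above, sketched purely
as an illustration: if the shared class is a template, the answer can be
carried as a template parameter so that each instance still reports its
own value. All names below (function_base_sketch, binop_sketch,
vsmul_sketch, MAY_REQUIRE_FRM) are hypothetical and simplified; this is
not the code in riscv-vector-builtins-bases.cc.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for function_base.
class function_base_sketch
{
public:
  virtual ~function_base_sketch () {}
  // Default answer: no rounding-mode operand may be required.
  virtual bool maybe_require_frm_p () const { return false; }
  virtual bool maybe_require_vxrm_p () const { return false; }
};

// A class template shared by integer and floating-point adds.  The
// template parameter lets each instance answer differently even though
// the override is written only once.
template <bool MAY_REQUIRE_FRM>
class binop_sketch : public function_base_sketch
{
public:
  bool maybe_require_frm_p () const override { return MAY_REQUIRE_FRM; }
};

// A class used by a single intrinsic can simply override the hook.
class vsmul_sketch : public function_base_sketch
{
public:
  bool maybe_require_vxrm_p () const override { return true; }
};
```

Here a binop_sketch<false> object would play the role of vadd_obj and a
binop_sketch<true> object the role of vfadd_obj, so a caller only needs
to ask the base object instead of comparing against each base in turn.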


--



Li Xu



>+static bool
>+maybe_require_frm_p (function_instance &instance)
>+{
>+  return instance.base == bases::vfwredusum
>+   || instance.base == bases::vfwredosum || instance.base == bases::vfadd
>+   || instance.base == bases::vfwsub || instance.base == bases::vfwnmsac
>+   || instance.base == bases::vfwnmacc || instance.base == bases::vfwmul
>+   || instance.base == bases::vfcvt_x || instance.base == bases::vfcvt_f
>+   || instance.base == bases::vfcvt_xu || instance.base == bases::vfwmsac
>+   || instance.base == bases::vfwmacc || instance.base == bases::vfwcvt_x
>+   || instance.base == bases::vfwadd || instance.base == bases::vfsub
>+   || instance.base == bases::vfsqrt || instance.base == bases::vfredusum
>+   || instance.base == bases::vfrsub || instance.base == bases::vfredosum
>+   || instance.base == bases::vfrec7 || instance.base == bases::vfrdiv
>+   || instance.base == bases::vfnmsub || instance.base == bases::vfnmsac
>+   || instance.base == bases::vfnmadd || instance.base == bases::vfnmacc
>+   || instance.base == bases::vfncvt_f || instance.base == bases::vfncvt_x
>+   || instance.base == bases::vfncvt_xu || instance.base == bases::vfmul
>+   || instance.base == bases::vfmsub || instance.base == bases::vfmsac
>+   || instance.base == bases::vfmadd || instance.base == bases::vfmacc
>+   || instance.base == bases::vfdiv || instance.base == bases::vfwcvt_xu;
>+}
>+
>+static bool
>+maybe_require_vxrm_p (function_instance &instance)
>+{
>+  return instance.base == bases::vaadd || instance.base == bases::vaaddu
>+   || instance.base == bases::vasub || instance.base == bases::vasubu
>+   || instance.base == bases::vssrl || instance.base == bases::vssra
>+   || instance.base == bases::vsmul || instance.base == bases::vnclipu
>+   || instance.base == bases::vnclip;
>+}
>
>I am sorry that I was wrong before.
>
>Could we add maybe_require_frm_p and maybe_require_vxrm_p into function_base?
>By default they return FALSE.
>
>In riscv-vector-builtins-bases.cc, set them in each corresponding
>function_base:
>
>For example:
>
>class vsmul : public function_base
>bool maybe_require_vxrm_p () const
>{
>  return true;
>}
>
>The benefit is that you only need to use instance.base.maybe_require_frm_p ()
>or instance.base.maybe_require_vxrm_p ()
>and there is no need to compare them one by one.
>
>Thanks.
>
>
>juzhe.zh...@rivai.ai
>
>From: Li Xu
>Date: 2023-10-31 10:24
>To: gcc-patches
>CC: kito.cheng; palmer; juzhe.zhong; xuli
>Subject: [PATCH V5] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV
>intrinsic
>From: xuli
>
>Update in v5:
>* Split has_vxrm_or_frm_p into maybe_require_frm_p and
>  maybe_require_vxrm_p.
>* Adjust comments.
>
>Update in v4:
>* Remove class function_resolver.
>* Remove function get_non_overloaded_instance.
>* Add overloaded hash traits for non-overloaded intrinsic.
>* All overloaded intrinsics are implemented, and the tests pass.
>
>Update in v3:
>
>* Rewrite comment for overloaded function add.
>* Move get_non_overloaded_instance to function_base.
>
>Update in v2:
>
>* Add get_non_overloaded_instance for function instance.
>* Fix overload check for policy function.
>* Enrich the test cases check.
>
>Original log:
>
>This patch adds the framework to support the RVV overloaded
>intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.
>
>It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN,
>with the steps below.
>
>* Register overloaded functions.
>* Add function_resolver for overloaded function resolving.
>* Add resolve API for function shape with default implementation.
>* Implement HOOK for navigating the overloaded API to non-overloaded API.
>
>gcc/ChangeLog:
>
>    * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New
>function for the hook.
>    (riscv_register_pragmas): Register the hook.
>    * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
>    * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register
>overloaded function.
>    * config/riscv/riscv-vector-builtins.cc (struct
>non_overloaded_registered

Re: [PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

2023-10-30 Thread Lehua Ding

Committed, thanks Juzhe.

On 2023/10/31 11:43, juzhe.zh...@rivai.ai wrote:

LGTM.


juzhe.zh...@rivai.ai

*From:* Lehua Ding
*Date:* 2023-10-31 11:39
*To:* gcc-patches
*CC:* juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
*Subject:* [PATCH] RISC-V: Add the missed combine of [u]int64 ->
_Float16 and vcond
Hi,
This patch splits the INT64 to FP16 conversion into two smaller
conversions (INT64 -> FP32 and FP32 -> FP16) at expand time instead of
delaying the split to the split1 pass. This makes it possible to combine
the FP32 to FP16 conversion with the vcond pattern, so we don't need to
add a separate combine pattern for INT64 to FP16 and vcond.
Consider this code:
   void
   foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
        int64_t *__restrict pred, int n)
   {
     for (int i = 0; i < n; i += 1)
       {
         r[i] = pred[i] ? (_Float16) a[i] : b[i];
       }
   }
Before this patch:
   ...
   vfncvt.f.f.w    v2,v2
   vmerge.vvm  v1,v1,v2,v0
   vse16.v v1,0(a0)
   ...
After this patch:
   ...
   vfncvt.f.f.w    v1,v2,v0.t
   vse16.v v1,0(a0)
   ...
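
As a rough scalar model of why the two sequences above are equivalent,
the sketch below compares "convert every lane, then merge" with a masked
conversion that only overwrites the active lanes. It is an illustration
under assumptions, not part of the patch: float stands in for _Float16
so that it compiles everywhere, and the names half_standin,
convert_then_merge and masked_convert are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// float stands in for _Float16 so the sketch is portable.
typedef float half_standin;

// "Before": convert all lanes, then select per lane (vfncvt + vmerge).
static std::vector<half_standin>
convert_then_merge (const std::vector<int64_t> &a,
                    const std::vector<half_standin> &b,
                    const std::vector<int64_t> &pred)
{
  std::vector<half_standin> r (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      // Two-step conversion: INT64 -> FP32, then FP32 -> "FP16".
      half_standin converted = (half_standin) (float) a[i];
      r[i] = pred[i] ? converted : b[i];
    }
  return r;
}

// "After": masked conversion.  The destination starts out holding the
// else values and only lanes with a set predicate are overwritten.
static std::vector<half_standin>
masked_convert (const std::vector<int64_t> &a,
                const std::vector<half_standin> &b,
                const std::vector<int64_t> &pred)
{
  std::vector<half_standin> r (b);
  for (std::size_t i = 0; i < a.size (); ++i)
    if (pred[i])
      r[i] = (half_standin) (float) a[i];
  return r;
}
```

Both functions assume that a, b and pred have the same length; under
that assumption they return the same result for every input, which is
the property the combine relies on.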
gcc/ChangeLog:
* config/riscv/autovec.md (2):
Change to define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.
---
gcc/config/riscv/autovec.md  | 5 +
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c   | 2 ++
5 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5f49d73be44..bfd45dd76ff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -977,14 +977,11 @@
;; This operation can be performed in the loop vectorizer but
unfortunately
;; not applicable for now. We can remove this pattern after loop
vectorizer
;; is able to take care of INT64 to FP16 conversion.
-(define_insn_and_split "2"
+(define_expand "2"
    [(set (match_operand:  0 "register_operand")
(any_float:
   (match_operand:VWWCONVERTI 1 "register_operand")))]
    "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () &&
!flag_trapping_math"
-  "#"
-  "&& 1"
-  [(const_int 0)]
    {
  rtx single = gen_reg_rtx (mode); /* Get vector SF
mode.  */
diff --git

a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
index f5d3bb4c789..030c8fe33ce 100644
---

a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
+++

b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
@@ -12,4 +12,6 @@
/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
/* { dg-final { scan-assembler
{\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git

a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
index f5d3bb4c789..030c8fe33ce 100644
---

a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
+++

b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
@@ -12,4 +12,6 @@
/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times
{\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
/* { dg-final { scan-assembler
{\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} }

Re: [PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

2023-10-30 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-31 11:39
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and 
vcond
Hi,
 
This patch lets the INT64 to FP16 conversion be split into two smaller
conversions (INT64 -> FP32 and FP32 -> FP16) at expand time instead of
delaying the split to the split1 pass. This change makes it possible to
combine the FP32 to FP16 conversion with the vcond pattern, so we don't
need to add a combine pattern for INT64 to FP16 and vcond.
 
Consider this code:
  void
  foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
       int64_t *__restrict pred, int n)
  {
    for (int i = 0; i < n; i += 1)
      {
        r[i] = pred[i] ? (_Float16) a[i] : b[i];
      }
  }
 
Before this patch:
  ...
  vfncvt.f.f.w    v2,v2
  vmerge.vvm      v1,v1,v2,v0
  vse16.v         v1,0(a0)
  ...
 
After this patch:
  ...
  vfncvt.f.f.w    v1,v2,v0.t
  vse16.v         v1,0(a0)
  ...
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2):
Change to define_expand.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.
 
---
gcc/config/riscv/autovec.md  | 5 +
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c   | 2 ++
.../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c   | 2 ++
5 files changed, 9 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5f49d73be44..bfd45dd76ff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -977,14 +977,11 @@
;; This operation can be performed in the loop vectorizer but unfortunately
;; not applicable for now. We can remove this pattern after loop vectorizer
;; is able to take care of INT64 to FP16 conversion.
-(define_insn_and_split "2"
+(define_expand "2"
   [(set (match_operand:  0 "register_operand")
(any_float:
  (match_operand:VWWCONVERTI 1 "register_operand")))]
   "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && !flag_trapping_math"
-  "#"
-  "&& 1"
-  [(const_int 0)]
   {
 rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
index f5d3bb4c789..030c8fe33ce 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
@@ -12,4 +12,6 @@
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
index f5d3bb4c789..030c8fe33ce 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
@@ -12,4 +12,6 @@
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
index 5ebed2f7fdc..d6298f5351a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
@@ -12,4 +12,6 @@
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
/* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/con

[PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

2023-10-30 Thread Lehua Ding
Hi,

This patch lets the INT64 to FP16 conversion be split into two smaller
conversions (INT64 -> FP32 and FP32 -> FP16) at expand time instead of
delaying the split to the split1 pass. This change makes it possible to
combine the FP32 to FP16 conversion with the vcond pattern, so we don't
need to add a combine pattern for INT64 to FP16 and vcond.

Consider this code:
  void
  foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
       int64_t *__restrict pred, int n)
  {
    for (int i = 0; i < n; i += 1)
      {
        r[i] = pred[i] ? (_Float16) a[i] : b[i];
      }
  }

Before this patch:
  ...
  vfncvt.f.f.w    v2,v2
  vmerge.vvm      v1,v1,v2,v0
  vse16.v         v1,0(a0)
  ...

After this patch:
  ...
  vfncvt.f.f.w    v1,v2,v0.t
  vse16.v         v1,0(a0)
  ...
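
As a rough scalar illustration of why the two sequences are interchangeable, the sketch below models the unfused form (convert every lane, then merge under the mask) and the fused form (convert only the active lanes, leaving inactive lanes at b[i]). It is not taken from the patch: the function names are hypothetical, float stands in for _Float16 so the code builds on any host, and the 64-element scratch buffer is an arbitrary limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unfused form: convert every element, then select under the mask.
   Models an unmasked vfncvt.f.f.w followed by vmerge.vvm.  */
static void
convert_then_merge (float *r, const int64_t *a, const float *b,
                    const int64_t *pred, size_t n)
{
  float tmp[64];              /* Arbitrary scratch size for the sketch.  */
  assert (n <= 64);
  for (size_t i = 0; i < n; i++)
    tmp[i] = (float) a[i];
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? tmp[i] : b[i];
}

/* Fused form: inactive lanes keep b[i], active lanes are converted.
   Models vfncvt.f.f.w with the v0.t mask and a mask-undisturbed
   destination that already holds b.  */
static void
masked_convert (float *r, const int64_t *a, const float *b,
                const int64_t *pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      r[i] = b[i];
      if (pred[i])
        r[i] = (float) a[i];
    }
}
```

Both functions produce the same r for any mask; the fused form simply never materializes the merged-away lanes, which is what the masked vfncvt.f.f.w in the "after" sequence expresses.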

gcc/ChangeLog:

* config/riscv/autovec.md (2):
Change to define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.

---
 gcc/config/riscv/autovec.md  | 5 +
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c   | 2 ++
 5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5f49d73be44..bfd45dd76ff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -977,14 +977,11 @@
 ;; This operation can be performed in the loop vectorizer but unfortunately
 ;; not applicable for now. We can remove this pattern after loop vectorizer
 ;; is able to take care of INT64 to FP16 conversion.
-(define_insn_and_split "2"
+(define_expand "2"
   [(set (match_operand:  0 "register_operand")
(any_float:
  (match_operand:VWWCONVERTI 1 "register_operand")))]
   "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && !flag_trapping_math"
-  "#"
-  "&& 1"
-  [(const_int 0)]
   {
 rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
index f5d3bb4c789..030c8fe33ce 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
index f5d3bb4c789..030c8fe33ce 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
index 5ebed2f7fdc..d6298f5351a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times {\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times {\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
index 097e377

Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
On 2023-10-31 03:16  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_try_cond_zero_arith): handler for condtional zero op
>>  (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc  | 112 +++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 130 ++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, 
>> rtx z, rtx target)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  rtx cond_op0 = XEXP (if_info->cond, 0);
>> +  rtx czero_cond
>> +    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, 
>> const0_rtx);
>> +  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
>> +  rtx set = gen_rtx_SET (target, if_then_else);
>> +
>> +  start_sequence ();
>> +  rtx_insn *insn = emit_insn (set);
>> +
>> +  if (recog_memoized (insn) >= 0)
>> +    {
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +  emit_insn (seq);
>> +
>> +  return target;
>> +    }
>> +
>> +  end_sequence ();
>> +  return NULL_RTX;
>> +}
>So just a few notes to further illustrate why I'm currently looking to
>take the VRULL+Ventana implementation.  The code above would be much
>better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>through the conditional move expander.  So any improvement we make in
>the expander "just work" when called from the if-converter. 
noce_emit_czero is used here to make sure czero insns are emitted.
noce_emit_cmove includes SFB and T-Head movcc, which take precedence
over Zicond on RISC-V if enabled. Unfortunately, we have products with both
SFB and Zicond available and saw such a conflict.
That is also the reason for adding the hook TARGET_HAVE_COND_ZERO
in [PATCH 1/4]: to disallow the inefficient code emitted when SFB is
enabled and Zicond is disabled.

>> +
>>   /* Try only simple constants and registers here.  More complex cases
>>  are handled in noce_try_cmove_arith after noce_try_store_flag_arith
>>  has had a go at it.  */
>> @@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>> return true;
>>   }
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>> +{
>> +  rtx target;
>> +  rtx_insn *seq;
>> +  machine_mode mode = GET_MODE (if_info->x);
>> +  rtx common = NULL_RTX;
>> +  enum rtx_code czero_code = UNKNOWN;
>> +  rtx a = if_info->a;
>> +  rtx b = if_info->b;
>> +  rtx z = NULL_RTX;
>> +  rtx cond = if_info->cond;
>> +
>> +  if (!noce_simple_bbs (if_info))
>> +    return false;
>[ ... ]
>So the internal code we have does a bit of canonicalization before the
>optimizing transformations.  In particular we may be presented with
>
>(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows
>us to pick up more cases.  (b != 0 ? b : a) gets similar handling.
>
>As I mentioned earlier, the VRULL+Ventana code handles wrapping
>extensions & subregs.  Our code also handles if-converting shifts/rotates. 
Cool, looking forward to your submission. Shifts/rotates can be added in
noce_try_cond_zero_arith.
I tried to keep noce_try_cond_zero_arith simple, without introducing SCC and
other stuff, as additional insns will be generated for greater-than-like
comparisons but may not be generated for branch-insn-based SFB.
IMHO, the earlier a noce_try* function appears in noce_process_if_block, the
simpler its optimization scenarios are and the more efficient the generated
code should be; the later functions like noce_try_cmove_arith then handle the
more general cases.
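
To illustrate why shifts would fit the same scheme, here is a small C model. It is only a sketch and not code from this series: czero_eqz models the Zicond czero.eqz instruction, shl_ref and shl_czero are hypothetical names, and the shift amount is assumed to be smaller than the width of unsigned long. Since shifting by zero leaves the value unchanged, zeroing the shift amount under the condition gives the same result as selecting between the shifted and unshifted value.

```c
#include <assert.h>
#include <limits.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static unsigned long
czero_eqz (unsigned long rs1, unsigned long rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* Reference semantics: x = c ? y << z : y.  */
static unsigned long
shl_ref (unsigned long y, unsigned long z, unsigned long c)
{
  return c ? y << z : y;
}

/* Conditional-zero form: zero the shift amount when c is 0, then shift.  */
static unsigned long
shl_czero (unsigned long y, unsigned long z, unsigned long c)
{
  unsigned long t = czero_eqz (z, c);
  return y << t;
}

/* Self-check over a few small inputs; z stays below the type width.  */
static void
check_shl (void)
{
  assert (8 < sizeof (unsigned long) * CHAR_BIT);
  for (unsigned long c = 0; c <= 2; c++)
    for (unsigned long y = 0; y <= 5; y++)
      for (unsigned long z = 0; z <= 8; z++)
        assert (shl_czero (y, z, c) == shl_ref (y, z, c));
}
```

The same identity-element argument is what makes add, IOR and XOR fit; operations such as AND need a different neutral value and are not covered by this sketch.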

BR, 
Fei
>
>Hopefully that explains a bit more why I think cleaning up the
>VRULL+Ventana code is a better choice. 

>
>jeff

Re: [committed] d: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82.

2023-10-30 Thread Iain Buclaw
Excerpts from Rainer Orth's message of October 30, 2023 5:37 pm:
> Hi Iain,
> 
>> This patch merges the D front-end and runtime library with upstream dmd
>> e48bc0987d, and standard library with phobos 2458e8f82.
>>
>> Synchronizing with the v2.106.0-beta.1 release.
>>
>> D front-end changes:
>>
>> - Import dmd v2.106.0-beta.1.
> 
> this patch broke D bootstrap, it seems:
> 
> /vol/gcc/src/hg/master/local/gcc/d/expr.cc: In member function 'virtual void 
> ExprVisitor::visit(NewExp*)':
> /vol/gcc/src/hg/master/local/gcc/d/expr.cc:2361:21: error: unused variable 
> 'tarray' [-Werror=unused-variable]
>  2361 | TypeDArray *tarray = tb->isTypeDArray ();
>   | ^~
> 
> It removed the uses of tarray, but kept the initialization.
> 

Hi Rainer,

Thanks for spotting, I'll fix it up.

Iain.


Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
On 2023-10-31 00:36  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_try_cond_zero_arith): handler for condtional zero op
>>  (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>So the idea here is to improve upon the current code we generate for
>conditional arithmetic.  Right now we support conditional arithmetic
>using zicond, but the sequence is poor.
>
>Basically the if-converter knows how to generate a conditional add, but
>it does so in a way that isn't as efficient as it could be.
>
>In effect ifcvt wants to generate
>
>t = a + b
>res = cond ? t : b
>
>
>We want to change it to
>
>t = cond ? b : 0;
>res = a + t;
>
>The latter sequence expands to more efficient code trivially for risc-v. 
Exactly. 2 fewer insns for the add case below:
long test_ADD_ceqz(long x, long y, long z, long c){
  if (c)
    x = y + z;
  else
    x = y;
  return x;
}

test_ADD_ceqz(before this patch):
  add a2,a1,a2
  czero.eqz a0,a2,a3
  czero.nez a3,a1,a3
  or a0,a3,a0
  ret

test_ADD_ceqz(after this patch):
  czero.eqz a3,a2,a3
  add a0,a1,a3
  ret
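
The two sequences can be checked against each other with a small C model. This is a sketch for illustration only: czero_eqz and czero_nez model the Zicond instructions, and add_select_old / add_select_new are hypothetical names that mirror the "before" and "after" assembly line by line.

```c
#include <assert.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static long
czero_eqz (long rs1, long rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static long
czero_nez (long rs1, long rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* Sequence before the patch: add, two czero insns, or.  */
static long
add_select_old (long y, long z, long c)
{
  long sum = y + z;               /* add       a2,a1,a2 */
  long t = czero_eqz (sum, c);    /* czero.eqz a0,a2,a3 */
  long f = czero_nez (y, c);      /* czero.nez a3,a1,a3 */
  return f | t;                   /* or        a0,a3,a0 */
}

/* Sequence after the patch: zero the addend, then add.  */
static long
add_select_new (long y, long z, long c)
{
  long t = czero_eqz (z, c);      /* czero.eqz a3,a2,a3 */
  return y + t;                   /* add       a0,a1,a3 */
}

/* Self-check against the source-level semantics x = c ? y + z : y.  */
static void
check_add_select (void)
{
  for (long c = -1; c <= 1; c++)
    for (long y = -3; y <= 3; y++)
      for (long z = -3; z <= 3; z++)
        {
          long expect = c ? y + z : y;
          assert (add_select_old (y, z, c) == expect);
          assert (add_select_new (y, z, c) == expect);
        }
}
```

Not counting ret, the old form needs four instructions and the new form two, which matches the saving described above.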
>
>I wandered a bit through the combine dumps to see if it would be easy to
>capture this class of cases.  We never get anything useful, and while I
>can imagine "bridge" patterns that would potentially expose enough RTL
>to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.
>
>So going back to ifcvt...
>
>In the first sequence the addition must wait for both "a" and "b" to be
>available and the conditional move can fire on the next cycle.
>
>In the second sequence the conditional move can fire when just "b" is
>available.  So that gives "a" another cycle to become ready (say if it's
>coming from memory or a multi-cycle operation like multiply).
>
>On the other hand the second sequence does keep "a" live longer.
>
>In the end I strongly suspect neither sequence is significantly better
>than the other.  Meaning I don't think we need to conditionalize using
>condzero arith at all. 
As the case above shows, using condzero arith saves 2 insns.

>
>
>I'll note that subsequent patches add MINUS, IOR, XOR and AND.  It's
>also possible (and important) to handle shifts.  There's a conditional
>shift-by-6 in leela's hot path. 
This series is an initial framework for simple condzero arith. Shifts may come
later as they involve subreg stuff.

>
>Overall this looks a lot like the VRULL code, but just less complete.
>My inclination is to do a cleanup pass on the VRULL code verify it
>handles all the cases in your tests and commit the VRULL implementation
>with your tests. 
I searched and didn't find the VRULL code; could you please provide a link at
your convenience? My colleague Zeng Xiao posted a patch months ago:
https://patchwork.sourceware.org/project/gcc/patch/20230719101156.21771-6-zengx...@eswincomputing.com/
But after fixing several bugs, we realized the previous implementation was
quite complex and came up with this patch series.

>
>I'll do some further poking at this today.  Thanks for re-submitting
>these bits.  Getting this target independent work cleaned up has been on
>my TODO for a while now. 
Thanks for your patience.

BR, 
Fei

>
>jeff

Re: Re: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-30 Thread Li Xu

Already backported to releases/gcc-13.

--
Li Xu

>Ok for gcc 13 but just wait one more week to make sure everything is fine
>as gcc convention :)
>
>Li Xu wrote on Tue, Oct 24, 2023 at 15:49:
>
>> Committed to trunk. Thanks juzhe.
>>
>> --
>> Li Xu
>>
>> >Ok for trunk (You can commit it to the trunk now).
>> >
>> >For GCC-13,  I'd like to wait for kito's comment.
>> >
>> >Thanks.
>> >
>> >juzhe.zh...@rivai.ai
>> >
>> >From: Li Xu
>> >Date: 2023-10-24 15:29
>> >To: gcc-patches
>> >CC: kito.cheng; palmer; juzhe.zhong
>> >Subject: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]
>> >
>> >Calling vget/vset intrinsic without receiving a return value will cause
>> >a crash. Because in this case e.target is null.
>> >This patch should be backported to releases/gcc-13.
>> >
>> >    PR/target 111935
>> >
>> >gcc/ChangeLog:
>> >
>> >    * config/riscv/riscv-vector-builtins-bases.cc: fix bug.
>> >
>> >gcc/testsuite/ChangeLog:
>> >
>> >    * gcc.target/riscv/rvv/base/pr111935.c: New test.
>> >---
>> > .../riscv/riscv-vector-builtins-bases.cc  |  4 +++
>> > .../gcc.target/riscv/rvv/base/pr111935.c  | 26 +++
>> > 2 files changed, 30 insertions(+)
>> > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
>> >
>> >diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>> >index ab12e130907..0b1409a52e0 100644
>> >--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>> >+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>> >@@ -1740,6 +1740,8 @@ public:
>> >
>> >   rtx expand (function_expander &e) const override
>> >   {
>> >+    if (!e.target)
>> >+      return NULL_RTX;
>> >     rtx dest = expand_normal (CALL_EXPR_ARG (e.exp, 0));
>> >     gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (dest)));
>> >     rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
>> >@@ -1777,6 +1779,8 @@ public:
>> >
>> >   rtx expand (function_expander &e) const override
>> >   {
>> >+    if (!e.target)
>> >+      return NULL_RTX;
>> >     rtx src = expand_normal (CALL_EXPR_ARG (e.exp, 0));
>> >     gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (src)));
>> >     rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
>> >diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
>> >new file mode 100644
>> >index 000..0b936d849a1
>> >--- /dev/null
>> >+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
>> >@@ -0,0 +1,26 @@
>> >+/* { dg-do compile } */
>> >+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -Wno-psabi" } */
>> >+
>> >+#include "riscv_vector.h"
>> >+
>> >+inline vuint32m4_t __attribute__((__always_inline__)) transpose_indexes() {
>> >+  static const uint32_t idx_[16] = {0, 4, 8, 12,
>> >+                                    1, 5, 9, 13,
>> >+                                    2, 6, 10, 14,
>> >+                                    3, 7, 11, 15};
>> >+  return __riscv_vle32_v_u32m4(idx_, 16);
>> >+}
>> >+
>> >+void pffft_real_preprocess_4x4(const float *in) {
>> >+  vfloat32m1_t r0=__riscv_vle32_v_f32m1(in,4);
>> >+  vfloat32m4_t tmp = __riscv_vundefined_f32m4();
>> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 0, r0);
>> >+  tmp = __riscv_vset_

Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-30 Thread Kewen.Lin
Hi Carl,

on 2023/10/31 08:08, Carl Love wrote:
> GCC maintainers:
> 
> The following patch adds tests for two of the rs6000 overloaded
> built-ins that do not have tests.  Additionally the GCC documentation file

I just found that actually they have the test coverage, because we have

#define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
#define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
#define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
#define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
#define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)

in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all these
__builtin_bcdcmp* ...

> doc/extend.texi is updated to include the built-in definitions as they
> were missing.

... since we already document __builtin_vec_bcdsub_{eq,gt,lt}, I think
it's still good to supplement the documentation and add the explicit
testing cases.

> 
> The patch has been tested on a Power 10 system with no regressions. 
> Please let me know if this patch is acceptable for mainline.
> 
>  Carl
> 
> ---
> rs6000, Add missing overloaded bcd builtin tests
> 
> The two BCD overloaded built-ins __builtin_bcdsub_ge and __builtin_bcdsub_le
> do not have a corresponding test.  Add tests to existing test file and update
> the documentation with the built-in definitions.

As above, this commit log doesn't describe the actuality well, please update
it with something like:

Currently we have the documentation for __builtin_vec_bcdsub_{eq,gt,lt} but
not for __builtin_bcdsub_[gl]e, this patch is to supplement the descriptions
for them.  Although they are mainly for __builtin_bcdcmp{ge,le}, we already
have some testing coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this patch
adds the corresponding explicit test cases as well.

> 
> gcc/ChangeLog:
>   * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
>   documentation for the builti-ins.
> 
> gcc/testsuite/ChangeLog:
>   * bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins
>   __builtin_bcdsub_ge and __builtin_bcdsub_le).

1) Unexpected ")" at the end.

2) I suppose git gcc-verify would complain about this changelog entry.

Should be starting with:

* gcc.target/powerpc/bcd-3.c (

, no?

OK for trunk with the above comments addressed, thanks!

BR,
Kewen

> ---
>  gcc/doc/extend.texi  |  4 
>  gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index cf0d0c63cce..fa7402813e7 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
>  vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
>  vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
>                                         const int);
> +int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
> +int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
>  int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
>  int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
>  int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
>  int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
>  int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
>  int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
> +int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
> +int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
>  int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
>  int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
>  @end smallexample
> diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> index 7948a0c95e2..9891f4ff08e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> @@ -3,7 +3,7 @@
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
>  /* { dg-final { scan-assembler-times "bcdadd\[.\] " 4 } } */
> -/* { dg-final { scan-assembler-times "bcdsub\[.\] " 4 } } */
> +/* { dg-final { scan-assembler-times "bcdsub\[.\] " 6 } } */
>  /* { dg-final { scan-assembler-not   "bl __builtin"   } } */
>  /* { dg-final { scan-assembler-not   "mtvsr"   } } */
>  /* { dg-final { scan-assembler-not   "mfvsr"   } } */
> @@ -93,6 +93,26 @@ do_sub_gt (vector_128_t a, vector_128_t b, int *p)
>return ret;
>  }
>  
> +vector_128_t
> +do_sub_ge (vector

[PATCH V5] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-30 Thread Li Xu
From: xuli 

Update in v5:
* Split has_vxrm_or_frm_p into maybe_require_frm_p and
  maybe_require_vxrm_p.
* Adjust comments.

Update in v4:
* Remove class function_resolver.
* Remove function get_non_overloaded_instance.
* Add overloaded hash traits for non-overloaded intrinsic.
* All overloaded intrinsics are implemented, and the tests pass.

Update in v3:

* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.

Update in v2:

* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.

Original log:

This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.

It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN,
with the steps below.

* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
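
The last step, navigating from an overloaded call to a non-overloaded intrinsic, can be pictured with a toy lookup in C. This is only a sketch and not the data structure used in the patch: the table, the string encoding of argument types, and the names overload_entry, overload_hash and resolve_overload are all assumptions made for illustration, and a linear scan stands in for a real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One registered mapping from an overloaded call to a concrete
   intrinsic.  The string encoding of argument types is hypothetical.  */
struct overload_entry
{
  const char *overloaded_name;
  const char *arg_types;
  const char *resolved_name;
};

static const struct overload_entry overload_table[] = {
  { "__riscv_vadd", "vint32m1_t,vint32m1_t,size_t", "__riscv_vadd_vv_i32m1" },
  { "__riscv_vadd", "vint32m1_t,int32_t,size_t", "__riscv_vadd_vx_i32m1" },
  { "__riscv_vfadd", "vfloat32m1_t,vfloat32m1_t,size_t",
    "__riscv_vfadd_vv_f32m1" },
};

/* FNV-1a over a string, continuing from hash state H.  */
static uint32_t
fnv1a (uint32_t h, const char *s)
{
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Hash of the overloaded name combined with its argument types.  */
static uint32_t
overload_hash (const char *name, const char *arg_types)
{
  uint32_t h = fnv1a (2166136261u, name);
  h = fnv1a (h, "|");
  return fnv1a (h, arg_types);
}

/* Return the non-overloaded name for NAME called with ARG_TYPES, or
   NULL if nothing is registered.  The hash acts as a cheap filter and
   the string compares confirm the match.  */
static const char *
resolve_overload (const char *name, const char *arg_types)
{
  assert (name != NULL && arg_types != NULL);
  uint32_t want = overload_hash (name, arg_types);
  size_t n = sizeof overload_table / sizeof overload_table[0];
  for (size_t i = 0; i < n; i++)
    {
      const struct overload_entry *e = &overload_table[i];
      if (overload_hash (e->overloaded_name, e->arg_types) == want
          && strcmp (e->overloaded_name, name) == 0
          && strcmp (e->arg_types, arg_types) == 0)
        return e->resolved_name;
    }
  return NULL;
}
```

The point of hashing the name together with the argument types is that one overloaded name such as __riscv_vadd fans out to several concrete intrinsics, and the argument types are what selects among them.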

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New 
function for the hook.
(riscv_register_pragmas): Register the hook.
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register 
overloaded function.
* config/riscv/riscv-vector-builtins.cc (struct 
non_overloaded_registered_function_hasher):
  New hash table.
(function_builder::add_function): Add overloaded arg.
(function_builder::add_unique_function): Map overloaded function to 
non-overloaded function.
(function_builder::add_overloaded_function): New API impl.
(registered_function::overloaded_hash): Calculate hash value.
(maybe_require_frm_p): New function impl.
(maybe_require_vxrm_p): Ditto.
(has_vxrm_or_frm_p): Ditto.
(non_overloaded_registered_function_hasher::hash): Ditto.
(non_overloaded_registered_function_hasher::equal): Ditto.
(handle_pragma_vector): Allocate space for hash table.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h: Add additional parameters to 
add_function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vadd.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vfadd.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vget_vset.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vloxseg2ei16.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vreinterpret.h: New test.

Signed-off-by: Li Xu 
Co-Authored-By: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  36 ++-
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-shapes.cc |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc | 259 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   5 +-
 .../riscv/rvv/base/overloaded_rv32_vadd.c |  12 +
 .../riscv/rvv/base/overloaded_rv32_vfadd.c|  12 +
 .../rvv/base/overloaded_rv32_vget_vset.c  |   7 +
 .../rvv/base/overloaded_rv32_vloxseg2ei16.c   |  11 +
 .../riscv/rvv/base/overloaded_rv32_vmv.c  |  10 +
 .../rvv/base/overloaded_rv32_vreinterpret.c   |  10 +
 .../riscv/rvv/base/overloaded_rv64_vadd.c |  11 +
 .../riscv/rvv/base/overloaded_rv64_vfadd.c|  11 +
 .../rvv/base/overloaded_rv64_vget_vset.c  |   6 +
 .../rvv/base/overloaded_rv64_vloxseg2ei16.c   |  10 +
 .../riscv/rvv/base/overloaded_rv64_vmv.c  |  10 +
 .../rvv/base/overloaded_rv64_vreinterpret.c   |   9 +
 .../riscv/rvv/base/overloaded_vadd.h  |  59 
 .../riscv/rvv/base/overloaded_vfadd.h |  67 +
 .../riscv/rvv/base/overloaded_vget_vset.h |  27 ++
 .../riscv/rvv/base/overloaded_vloxseg2ei16.h  |  39 +++
 .../riscv/rvv/base/overloaded_vmv.h   |  26 ++
 .../riscv/rvv/base/overloaded_vreinterpret.h  |  29 ++
 23 file

Re: [RFC] RISC-V: Support -mcmodel=large.

2023-10-30 Thread Kito Cheng
> Overall it looks pretty good.   Does Andestech have a copyright
> assignment in place?  Or are you contributing under the DCO rule?

Kuan-Lin Chen is from Andestech, and I believe Andestech has signed the
copyright assignment for most GNU toolchain components :)

> https://gcc.gnu.org/dco.html
>
> Jeff


Re: Re: [PATCH] RISC-V: Add vector fmin/fmax expanders.

2023-10-30 Thread juzhe.zh...@rivai.ai
LGTM as long as you add HONOR_SNANS




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-31 03:26
To: Joseph Myers
CC: rdapp.gcc; gcc-patches; palmer; Kito Cheng; jeffreyalaw; 
juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Add vector fmin/fmax expanders.
> Aren't they actually the IEEE 754-2019 operations (with different 
> signaling NaN semantics; C functions such as fmaximum in C23), not the 
> IEEE 754-2008 operations (C functions such as fmax)?  V spec 1.0 says "The 
> vector floating-point vfmin and vfmax instructions have the same behavior 
> as the corresponding scalar floating-point instructions in version 2.2 of 
> the RISC-V F/D/Q extension.".  And version 2.2 of F/D/Q (which is *not* 
> version 2.2 of the instruction set, it's later than that) changed the 
 
Oh, thanks for catching this - I indeed incorrectly assumed this refers to
version 2.2 of the RISC-V spec (which contains F/D, .. of version 2.0).
Too bad, it appeared too convenient.  Then I need to add the same
!HONOR_SNANS to all the expanders as well as the tests.
 
Regards
Robin
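
For readers following the NaN discussion, here is a minimal sketch of the one
case that is not in dispute: C's fmax (the IEEE 754-2008 style operation)
treats a quiet NaN as missing data and returns the other operand. It does not
exercise signaling NaNs, which is where the thread says the two revisions of
the standard differ and why the !HONOR_SNANS guard is needed. The helper name
max_quiet is made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: forwards to std::fmax, which returns the non-NaN
// operand when exactly one argument is a quiet NaN.
double
max_quiet (double a, double b)
{
  return std::fmax (a, b);
}

// Convenience accessor for a quiet NaN of type double.
double
quiet_nan ()
{
  return std::numeric_limits<double>::quiet_NaN ();
}
```

This assumes default floating-point options; flags such as -ffast-math allow
the compiler to assume NaNs never occur.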
 


Re: [PATCH 1/4] RISC-V: Recategorize "prefetch" availabilities

2023-10-30 Thread Kito Cheng
> Unless Kito feels otherwise I would suggest keeping a distinct API
> interface for each case.

Yeah, I think they should have a distinct API.


[PATCH 2/2] RISC-V: Require a extension for testcases with atomic insns

2023-10-30 Thread Patrick O'Neill
Add testsuite infrastructure for the A extension and use it to require the A
extension for dg-do run and add the A extension for non-A dg-do compile.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Add A extension to
dg-options for dg-do compile.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/inline-atomics-2.c: Ditto.
* gcc.target/riscv/inline-atomics-3.c: Require A extension for dg-do
run.
* gcc.target/riscv/inline-atomics-4.c: Ditto.
* gcc.target/riscv/inline-atomics-5.c: Ditto.
* gcc.target/riscv/inline-atomics-6.c: Ditto.
* gcc.target/riscv/inline-atomics-7.c: Ditto.
* gcc.target/riscv/inline-atomics-8.c: Ditto.
* lib/target-supports.exp: Add testing infrastructure to require the A
extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill 
---
This patch relies on the previous one in the series. If applied separately,
amo-table-a-6-store-compat-3.c and amo-table-a-6-load-3.c must be updated to
require the A extension as those testcases check for the optimized fences.
---
 .../riscv/amo-table-a-6-amo-add-1.c   |  1 +
 .../riscv/amo-table-a-6-amo-add-2.c   |  1 +
 .../riscv/amo-table-a-6-amo-add-3.c   |  1 +
 .../riscv/amo-table-a-6-amo-add-4.c   |  1 +
 .../riscv/amo-table-a-6-amo-add-5.c   |  1 +
 .../riscv/amo-table-a-6-compare-exchange-1.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-2.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-3.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-4.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-5.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-6.c  |  1 +
 .../riscv/amo-table-a-6-compare-exchange-7.c  |  1 +
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |  1 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |  1 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |  1 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |  1 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |  1 +
 .../gcc.target/riscv/inline-atomics-2.c   |  3 ++-
 .../gcc.target/riscv/inline-atomics-3.c   |  2 +-
 .../gcc.target/riscv/inline-atomics-4.c   |  2 +-
 .../gcc.target/riscv/inline-atomics-5.c   |  2 +-
 .../gcc.target/riscv/inline-atomics-6.c   |  2 +-
 .../gcc.target/riscv/inline-atomics-7.c   |  2 +-
 .../gcc.target/riscv/inline-atomics-8.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp | 23 +++
 25 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
index 071a33928fe..8ab1a02b40c 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match Table A.6's recommended mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
index d6b2d91db2a..a5a841abdcd 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match Table A.6's recommended mapping.  */
 /* { dg-options "-O3" } */
+/* { dg-add-options riscv_a } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
index 68a69ed8b78..f523821b658 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add

[PATCH 1/2] RISC-V: Let non-atomic targets use optimized amo loads/stores

2023-10-30 Thread Patrick O'Neill
Non-atomic targets are currently prevented from using the optimized fencing for
seq_cst load/seq_cst store. This patch removes that constraint.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md (atomic_load_rvwmo): Remove
TARGET_ATOMIC constraint.
(atomic_store_rvwmo): Ditto.
* config/riscv/sync-ztso.md (atomic_load_ztso): Ditto.
(atomic_store_ztso): Ditto.
* config/riscv/sync.md (atomic_load): Ditto.
(atomic_store): Ditto.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync-rvwmo.md | 4 ++--
 gcc/config/riscv/sync-ztso.md  | 4 ++--
 gcc/config/riscv/sync.md   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index cb641ea9ec3..c35eae15334 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -52,7 +52,7 @@
[(match_operand:GPR 1 "memory_operand" "A")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_LOAD))]
-  "TARGET_ATOMIC && !TARGET_ZTSO"
+  "!TARGET_ZTSO"
   {
 enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 model = memmodel_base (model);
@@ -78,7 +78,7 @@
[(match_operand:GPR 1 "reg_or_0_operand" "rJ")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_STORE))]
-  "TARGET_ATOMIC && !TARGET_ZTSO"
+  "!TARGET_ZTSO"
   {
 enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 model = memmodel_base (model);
diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index 7bb15b7ab8c..6fdfa912a2c 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -46,7 +46,7 @@
[(match_operand:GPR 1 "memory_operand" "A")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_LOAD))]
-  "TARGET_ATOMIC && TARGET_ZTSO"
+  "TARGET_ZTSO"
   {
 enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 model = memmodel_base (model);
@@ -66,7 +66,7 @@
[(match_operand:GPR 1 "reg_or_0_operand" "rJ")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_STORE))]
-  "TARGET_ATOMIC && TARGET_ZTSO"
+  "TARGET_ZTSO"
   {
 enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 model = memmodel_base (model);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6ff3493b5ce..ec9d4b4f59e 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -60,7 +60,7 @@
   [(match_operand:GPR 0 "register_operand")
(match_operand:GPR 1 "memory_operand")
(match_operand:SI 2 "const_int_operand")] ;; model
-  "TARGET_ATOMIC"
+  ""
   {
 if (TARGET_ZTSO)
   emit_insn (gen_atomic_load_ztso (operands[0], operands[1],
@@ -75,7 +75,7 @@
   [(match_operand:GPR 0 "memory_operand")
(match_operand:GPR 1 "reg_or_0_operand")
(match_operand:SI 2 "const_int_operand")] ;; model
-  "TARGET_ATOMIC"
+  ""
   {
 if (TARGET_ZTSO)
   emit_insn (gen_atomic_store_ztso (operands[0], operands[1],
--
2.34.1
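
As a rough illustration of the source-level operations affected, the sketch
below performs a seq_cst store and a seq_cst load on an atomic int. The names
shared_value, publish and observe are invented for this example. Which fences
GCC emits for them depends on -march and on whether Ztso is enabled; nothing
here asserts a particular instruction sequence.

```cpp
#include <atomic>

// Hypothetical shared variable used only for this illustration.
static std::atomic<int> shared_value{0};

// seq_cst store: the kind of operation handled by the atomic_store expander.
void
publish (int v)
{
  shared_value.store (v, std::memory_order_seq_cst);
}

// seq_cst load: the kind of operation handled by the atomic_load expander.
int
observe ()
{
  return shared_value.load (std::memory_order_seq_cst);
}
```

Compiling such a file for a RISC-V target without the A extension and
inspecting the assembly is one way to see the effect of the patch.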



[PATCH] RISC-V: Enable ztso tests on rv32

2023-10-30 Thread Patrick O'Neill
This patch transitions the ztso testcases to use the testsuite infrastructure,
enabling the tests on both rv64 and rv32 targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Ztso extension to
dg-options for dg-do compile.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testing infrastructure to require the
Ztso extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill 
---
 .../riscv/amo-table-ztso-amo-add-1.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-2.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-3.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-4.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-5.c  |  3 ++-
 .../riscv/amo-table-ztso-compare-exchange-1.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-2.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-3.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-4.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-5.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-6.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-7.c |  2 +-
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-2.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-3.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-4.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-5.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-1.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-2.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-3.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-1.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-2.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-3.c |  3 ++-
 .../riscv/amo-table-ztso-subword-amo-add-1.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-2.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-3.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-4.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-5.c  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 23 +++
 29 files changed, 67 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index a88d08eb3f4..65a4351025d 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
-/* { dg-options "-march=rv64id_ztso -mabi=lp64d -O3" } */
+/* { dg-options "-O3" } */
+/* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index ebd240f9dd2..03da6b04de0 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings the Ztso suggested mapping.  */
-/* { dg-options "-march=rv64id_ztso -mabi=lp64d -O3" } */
+/* { dg-options 

[PATCH v2 2/4] Output file checksums in CodeView section

2023-10-30 Thread Mark Harmstone
Outputs the file name and MD5 hash of the main source file into the
CodeView .debug$S section, along with that of any #include'd files.
---
 gcc/dwarf2codeview.cc | 254 ++
 gcc/dwarf2codeview.h  |   1 +
 gcc/dwarf2out.cc  |   5 +
 3 files changed, 260 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index f08f5d55ad7..da8315310b5 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -39,6 +39,257 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CV_SIGNATURE_C13   4
 
+#define DEBUG_S_STRINGTABLE 0xf3
+#define DEBUG_S_FILECHKSMS  0xf4
+
+#define CHKSUM_TYPE_MD5 1
+
+#define HASH_SIZE 16
+
+struct codeview_string
+{
+  codeview_string *next;
+  uint32_t offset;
+  char *string;
+};
+
+struct string_hasher : free_ptr_hash <codeview_string>
+{
+  typedef const char *compare_type;
+
+  static hashval_t hash (const codeview_string *x)
+  {
+return htab_hash_string (x->string);
+  }
+
+  static bool equal (const codeview_string *x, const char *y)
+  {
+return !strcmp (x->string, y);
+  }
+
+  static void mark_empty (codeview_string *x)
+  {
+if (x->string)
+  {
+   free (x->string);
+   x->string = NULL;
+  }
+  }
+
+  static void remove (codeview_string *&x)
+  {
+free (x->string);
+  }
+};
+
+struct codeview_source_file
+{
+  codeview_source_file *next;
+  unsigned int file_num;
+  uint32_t string_offset;
+  char *filename;
+  uint8_t hash[HASH_SIZE];
+};
+
+static codeview_source_file *files, *last_file;
+static unsigned int num_files;
+static uint32_t string_offset = 1;
+static hash_table<string_hasher> *strings_htab;
+static codeview_string *strings, *last_string;
+
+/* Adds string to the string table, returning its offset.  If already present,
+   this returns the offset of the existing string.  */
+
+static uint32_t
+add_string (const char *string)
+{
+  codeview_string **slot;
+  codeview_string *s;
+  size_t len;
+
+  if (!strings_htab)
+strings_htab = new hash_table<string_hasher> (10);
+
+  slot = strings_htab->find_slot_with_hash (string, htab_hash_string (string),
+   INSERT);
+
+  if (*slot)
+return (*slot)->offset;
+
+  s = (codeview_string *) xmalloc (sizeof (codeview_string));
+  len = strlen (string);
+
+  s->next = NULL;
+
+  s->offset = string_offset;
+  string_offset += len + 1;
+
+  s->string = xstrdup (string);
+
+  if (last_string)
+last_string->next = s;
+  else
+strings = s;
+
+  last_string = s;
+
+  *slot = s;
+
+  return s->offset;
+}
+
+/* A new source file has been encountered - record the details and calculate
+   its hash.  */
+
+void
+codeview_start_source_file (const char *filename)
+{
+  codeview_source_file *sf;
+  char *path;
+  uint32_t string_offset;
+  FILE *f;
+
+  path = lrealpath (filename);
+  string_offset = add_string (path);
+  free (path);
+
+  sf = files;
+  while (sf)
+{
+  if (sf->string_offset == string_offset)
+   return;
+
+  sf = sf->next;
+}
+
+  sf = (codeview_source_file *) xmalloc (sizeof (codeview_source_file));
+  sf->next = NULL;
+  sf->file_num = num_files;
+  sf->string_offset = string_offset;
+  sf->filename = xstrdup (filename);
+
+  f = fopen (filename, "r");
+  if (!f)
+internal_error ("could not open %s for reading", filename);
+
+  if (md5_stream (f, sf->hash))
+{
+  fclose (f);
+  internal_error ("md5_stream failed");
+}
+
+  fclose (f);
+
+  if (last_file)
+last_file->next = sf;
+  else
+files = sf;
+
+  last_file = sf;
+  num_files++;
+}
+
+/* Write out the strings table into the .debug$S section.  The linker will
+   parse this, and handle the deduplication and hashing for all the object
+   files.  */
+
+static void
+write_strings_table (void)
+{
+  codeview_string *string;
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, DEBUG_S_STRINGTABLE);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_strings_end - %LLcv_strings_start\n");
+
+  asm_fprintf (asm_out_file, "%LLcv_strings_start:\n");
+
+  /* The first entry is always an empty string.  */
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, 0);
+  putc ('\n', asm_out_file);
+
+  string = strings;
+  while (string)
+{
+  ASM_OUTPUT_ASCII (asm_out_file, string->string,
+   strlen (string->string) + 1);
+
+  string = string->next;
+}
+
+  delete strings_htab;
+
+  asm_fprintf (asm_out_file, "%LLcv_strings_end:\n");
+
+  ASM_OUTPUT_ALIGN (asm_out_file, 2);
+}
+
+/* Write out the file checksums data into the .debug$S section.  */
+
+static void
+write_source_files (void)
+{
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, DEBUG_S_FILECHKSMS);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  asm_fprintf (asm_out_file,
+  "%LLcv_file
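
The offset bookkeeping done by add_string in the patch above can be sketched
outside GCC as follows. This is only an illustration under assumptions: it
replaces GCC's hash_table and the linked list with standard containers, and
the StringTable type and its member names are invented here. It keeps the two
properties the patch relies on: offset 0 is reserved for the leading empty
string, and adding a string twice returns the original offset.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for strings_htab plus the strings/last_string list.
struct StringTable
{
  // Offset 0 holds the empty string, so real strings start at offset 1
  // (mirrors "string_offset = 1" in the patch).
  uint32_t next_offset = 1;
  std::unordered_map<std::string, uint32_t> offsets;
  std::vector<std::string> in_order;  // order in which strings are emitted

  // Return the offset of S, adding it if it is not already present.
  uint32_t
  add (const std::string &s)
  {
    auto it = offsets.find (s);
    if (it != offsets.end ())
      return it->second;

    uint32_t off = next_offset;
    next_offset += static_cast<uint32_t> (s.size ()) + 1;  // +1 for the NUL
    offsets.emplace (s, off);
    in_order.push_back (s);
    return off;
  }
};
```

Because a path always maps to the same offset, codeview_start_source_file can
detect an already-recorded file by comparing string offsets rather than
comparing path strings.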

[PATCH v2 1/4] Support for CodeView debugging format

2023-10-30 Thread Mark Harmstone
This patch and the following add initial support for Microsoft's
CodeView debugging format, as used by MSVC, to mingw targets.

Note that you will need a recent version of binutils for this to be
useful. The best way to view the output is to run Microsoft's
cvdump.exe, found in their microsoft-pdb repo on GitHub, against the
object files.
---
 gcc/Makefile.in   |  2 +
 gcc/config/i386/cygming.h |  2 +
 gcc/dwarf2codeview.cc | 54 +++
 gcc/dwarf2codeview.h  | 30 +++
 gcc/dwarf2out.cc  |  6 +++
 gcc/flag-types.h  |  3 ++
 gcc/flags.h   |  4 ++
 gcc/opts.cc   | 23 ++--
 .../gcc.dg/debug/codeview/codeview-1.c|  6 +++
 .../gcc.dg/debug/codeview/codeview.exp| 48 +
 gcc/toplev.cc |  4 ++
 11 files changed, 177 insertions(+), 5 deletions(-)
 create mode 100644 gcc/dwarf2codeview.cc
 create mode 100644 gcc/dwarf2codeview.h
 create mode 100644 gcc/testsuite/gcc.dg/debug/codeview/codeview-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/codeview/codeview.exp

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 91d6bfbea4d..b260fe12c08 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1430,6 +1430,7 @@ OBJS = \
dumpfile.o \
dwarf2asm.o \
dwarf2cfi.o \
+   dwarf2codeview.o \
dwarf2ctf.o \
dwarf2out.o \
early-remat.o \
@@ -2800,6 +2801,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/dwarf2out.h \
   $(srcdir)/dwarf2asm.cc \
   $(srcdir)/dwarf2cfi.cc \
+  $(srcdir)/dwarf2codeview.cc \
   $(srcdir)/dwarf2ctf.cc \
   $(srcdir)/dwarf2out.cc \
   $(srcdir)/ctfc.h \
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index d539f8d0699..a141462133b 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -20,6 +20,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #define DWARF2_DEBUGGING_INFO 1
 
+#define CODEVIEW_DEBUGGING_INFO 1
+
 #undef PREFERRED_DEBUGGING_TYPE
 #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
new file mode 100644
index 000..f08f5d55ad7
--- /dev/null
+++ b/gcc/dwarf2codeview.cc
@@ -0,0 +1,54 @@
+/* Generate CodeView debugging info from the GCC DWARF.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* See gas/codeview.h in binutils for more about the constants and structs
+   listed below.  References to Microsoft files refer to Microsoft's PDB
+   repository: https://github.com/microsoft/microsoft-pdb.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "output.h"
+#include "errors.h"
+#include "md5.h"
+#include "function.h"
+#include "version.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "dwarf2out.h"
+#include "dwarf2codeview.h"
+
+#ifdef CODEVIEW_DEBUGGING_INFO
+
+#define CV_SIGNATURE_C13   4
+
+/* Finish CodeView debug info emission.  */
+
+void
+codeview_debug_finish (void)
+{
+  targetm.asm_out.named_section (".debug$S", SECTION_DEBUG, NULL);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, CV_SIGNATURE_C13);
+  putc ('\n', asm_out_file);
+}
+
+#endif
diff --git a/gcc/dwarf2codeview.h b/gcc/dwarf2codeview.h
new file mode 100644
index 000..efda148eb49
--- /dev/null
+++ b/gcc/dwarf2codeview.h
@@ -0,0 +1,30 @@
+/* dwarf2codeview.h - DWARF interface for CodeView generation.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifnd

[PATCH v4 4/4] Output S_COMPILE3 symbol in CodeView debug section

2023-10-30 Thread Mark Harmstone
Outputs the S_COMPILE3 symbol in the CodeView .debug$S debug section.
The DEBUG_S_SYMBOLS block added here makes up pretty much everything
that isn't data structures or line numbers; we add the S_COMPILE3 symbol
here to start it off.

This is a descriptive bit, the most interesting part of which is the
version of the compiler used.
---
 gcc/dwarf2codeview.cc | 126 ++
 1 file changed, 126 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9c69ebf8998..db776d79be4 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -39,14 +39,25 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CV_SIGNATURE_C13   4
 
+#define DEBUG_S_SYMBOLS 0xf1
 #define DEBUG_S_LINES  0xf2
 #define DEBUG_S_STRINGTABLE 0xf3
 #define DEBUG_S_FILECHKSMS  0xf4
 
 #define CHKSUM_TYPE_MD5 1
 
+#define S_COMPILE3 0x113c
+
+#define CV_CFL_80386   0x03
+#define CV_CFL_X64 0xD0
+
+#define CV_CFL_C   0x00
+#define CV_CFL_CXX 0x01
+
 #define LINE_LABEL "Lcvline"
 #define END_FUNC_LABEL "Lcvendfunc"
+#define SYMBOL_START_LABEL "Lcvsymstart"
+#define SYMBOL_END_LABEL   "Lcvsymend"
 
 #define HASH_SIZE 16
 
@@ -120,6 +131,7 @@ struct codeview_function
 
 static unsigned int line_label_num;
 static unsigned int func_label_num;
+static unsigned int sym_label_num;
 static codeview_source_file *files, *last_file;
 static unsigned int num_files;
 static uint32_t string_offset = 1;
@@ -592,6 +604,119 @@ codeview_end_epilogue (void)
 }
 }
 
+/* Return the CodeView constant for the selected architecture.  */
+
+static uint16_t
+target_processor (void)
+{
+  if (TARGET_64BIT)
+return CV_CFL_X64;
+  else
+return CV_CFL_80386;
+}
+
+/* Return the CodeView constant for the language being used.  */
+
+static uint32_t
+language_constant (void)
+{
+  const char *language_string = lang_hooks.name;
+
+  if (startswith (language_string, "GNU C++"))
+return CV_CFL_CXX;
+  else if (startswith (language_string, "GNU C"))
+return CV_CFL_C;
+
+  return 0;
+}
+
+/* Write a S_COMPILE3 symbol, which records the details of the compiler
+   being used.  */
+
+static void
+write_compile3_symbol (void)
+{
+  unsigned int label_num = ++sym_label_num;
+
+  static const char compiler_name[] = "GCC ";
+
+  /* This is struct COMPILESYM3 in binutils and Microsoft's cvinfo.h:
+
+ struct COMPILESYM3
+ {
+   uint16_t length;
+   uint16_t type;
+   uint32_t flags;
+   uint16_t machine;
+   uint16_t frontend_major;
+   uint16_t frontend_minor;
+   uint16_t frontend_build;
+   uint16_t frontend_qfe;
+   uint16_t backend_major;
+   uint16_t backend_minor;
+   uint16_t backend_build;
+   uint16_t backend_qfe;
+ } ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file,
+  "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n",
+  label_num, label_num);
+
+  targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, S_COMPILE3);
+  putc ('\n', asm_out_file);
+
+  /* Microsoft has the flags as a bitfield, with the bottom 8 bits being the
+ language constant, and the rest being MSVC-specific stuff.  */
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, language_constant ());
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, target_processor ());
+  putc ('\n', asm_out_file);
+
+  /* Write 8 uint16_ts for the frontend and backend versions.  As with GAS, we
+ zero these, as it's easier to record the version in the compiler
+ string.  */
+  for (unsigned int i = 0; i < 8; i++)
+{
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, 0);
+  putc ('\n', asm_out_file);
+}
+
+  ASM_OUTPUT_ASCII (asm_out_file, compiler_name, sizeof (compiler_name) - 1);
+  ASM_OUTPUT_ASCII (asm_out_file, version_string, strlen (version_string) + 1);
+
+  ASM_OUTPUT_ALIGN (asm_out_file, 2);
+
+  targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num);
+}
+
+/* Write the CodeView symbols into the .debug$S section.  */
+
+static void
+write_codeview_symbols (void)
+{
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, DEBUG_S_SYMBOLS);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_syms_end - %LLcv_syms_start\n");
+
+  asm_fprintf (asm_out_file, "%LLcv_syms_start:\n");
+
+  write_compile3_symbol ();
+
+  asm_fprintf (asm_out_file, "%LLcv_syms_end:\n");
+}
+
 /* Finish CodeView debug info emission.  */
 
 void
@@ -606,6 +731,7 @@ codeview_debug_finish (void)
   write_strings_table ();
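
To make the byte layout written by write_compile3_symbol easier to picture,
here is a standalone sketch that builds the same record in memory. It is an
illustration, not the patch's code: build_compile3 and put_le are invented
names, the record is assumed to start on a 4-byte boundary, and the padding
bytes are assumed to be zero. As in the patch, the length field counts the
bytes that follow it, including the trailing padding.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append VALUE to OUT as SIZE little-endian bytes.
static void
put_le (std::vector<uint8_t> &out, uint32_t value, unsigned size)
{
  for (unsigned i = 0; i < size; i++)
    out.push_back (static_cast<uint8_t> (value >> (8 * i)));
}

// Hypothetical helper: build an S_COMPILE3 record for the given language
// constant, machine constant and compiler version string.
std::vector<uint8_t>
build_compile3 (uint8_t language, uint16_t machine, const std::string &version)
{
  std::vector<uint8_t> body;
  put_le (body, 0x113c, 2);    // S_COMPILE3
  put_le (body, language, 4);  // flags: low 8 bits are the language
  put_le (body, machine, 2);   // e.g. CV_CFL_X64
  for (int i = 0; i < 8; i++)  // frontend and backend versions, all zero
    put_le (body, 0, 2);

  const std::string name = "GCC " + version;
  body.insert (body.end (), name.begin (), name.end ());
  body.push_back (0);          // NUL terminator

  // Pad so that the 2-byte length field plus the body is a multiple of 4.
  while ((2 + body.size ()) % 4 != 0)
    body.push_back (0);

  std::vector<uint8_t> record;
  put_le (record, static_cast<uint32_t> (body.size ()), 2);
  record.insert (record.end (), body.begin (), body.end ());
  return record;
}
```

For example, with the version string "13.2.0" the fixed fields take 24 bytes
and the name takes 11, so the body is padded from 35 to 38 bytes and the whole
record is 40 bytes long.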
  

[PATCH v2 3/4] Output line numbers in CodeView section

2023-10-30 Thread Mark Harmstone
Outputs the DEBUG_S_LINES block in the CodeView .debug$S section, which
maps between line numbers and addresses.

You'll need a fairly recent version of GAS for the .secidx directive to
be recognized.
---
 gcc/dwarf2codeview.cc | 303 ++
 gcc/dwarf2codeview.h  |   3 +
 gcc/dwarf2out.cc  |  15 +++
 gcc/opts.cc   |   2 +-
 4 files changed, 322 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index da8315310b5..9c69ebf8998 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -39,11 +39,15 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CV_SIGNATURE_C13   4
 
+#define DEBUG_S_LINES  0xf2
 #define DEBUG_S_STRINGTABLE 0xf3
 #define DEBUG_S_FILECHKSMS  0xf4
 
 #define CHKSUM_TYPE_MD5 1
 
+#define LINE_LABEL "Lcvline"
+#define END_FUNC_LABEL "Lcvendfunc"
+
 #define HASH_SIZE 16
 
 struct codeview_string
@@ -91,11 +95,128 @@ struct codeview_source_file
   uint8_t hash[HASH_SIZE];
 };
 
+struct codeview_line
+{
+  codeview_line *next;
+  unsigned int line_no;
+  unsigned int label_num;
+};
+
+struct codeview_line_block
+{
+  codeview_line_block *next;
+  uint32_t file_id;
+  unsigned int num_lines;
+  codeview_line *lines, *last_line;
+};
+
+struct codeview_function
+{
+  codeview_function *next;
+  function *func;
+  unsigned int end_label;
+  codeview_line_block *blocks, *last_block;
+};
+
+static unsigned int line_label_num;
+static unsigned int func_label_num;
 static codeview_source_file *files, *last_file;
 static unsigned int num_files;
 static uint32_t string_offset = 1;
 static hash_table<string_hasher> *strings_htab;
 static codeview_string *strings, *last_string;
+static codeview_function *funcs, *last_func;
+static const char* last_filename;
+static uint32_t last_file_id;
+
+/* Record new line number against the current function.  */
+
+void
+codeview_source_line (unsigned int line_no, const char *filename)
+{
+  codeview_line *l;
+  uint32_t file_id = last_file_id;
+  unsigned int label_num = ++line_label_num;
+
+  targetm.asm_out.internal_label (asm_out_file, LINE_LABEL, label_num);
+
+  if (!last_func || last_func->func != cfun)
+{
+  codeview_function *f = (codeview_function *)
+   xmalloc (sizeof (codeview_function));
+
+  f->next = NULL;
+  f->func = cfun;
+  f->end_label = 0;
+  f->blocks = f->last_block = NULL;
+
+  if (!funcs)
+   funcs = f;
+  else
+   last_func->next = f;
+
+  last_func = f;
+}
+
+  if (filename != last_filename)
+{
+  codeview_source_file *sf = files;
+
+  while (sf)
+   {
+ if (!strcmp (sf->filename, filename))
+   {
+ /* 0x18 is the size of the checksum entry for each file.
+0x6 bytes for the header, plus 0x10 bytes for the hash,
+then padded to a multiple of 4.  */
+
+ file_id = sf->file_num * 0x18;
+ last_filename = filename;
+ last_file_id = file_id;
+ break;
+   }
+
+ sf = sf->next;
+   }
+}
+
+  if (!last_func->last_block || last_func->last_block->file_id != file_id)
+{
+  codeview_line_block *b;
+
+  b = (codeview_line_block *) xmalloc (sizeof (codeview_line_block));
+
+  b->next = NULL;
+  b->file_id = file_id;
+  b->num_lines = 0;
+  b->lines = b->last_line = NULL;
+
+  if (!last_func->blocks)
+   last_func->blocks = b;
+  else
+   last_func->last_block->next = b;
+
+  last_func->last_block = b;
+}
+
+  if (last_func->last_block->last_line
+&& last_func->last_block->last_line->line_no == line_no)
+return;
+
+  l = (codeview_line *) xmalloc (sizeof (codeview_line));
+
+  l->next = NULL;
+  l->line_no = line_no;
+  l->label_num = label_num;
+
+  if (!last_func->last_block->lines)
+last_func->last_block->lines = l;
+  else
+last_func->last_block->last_line->next = l;
+
+  last_func->last_block->last_line = l;
+  last_func->last_block->num_lines++;
+}
 
 /* Adds string to the string table, returning its offset.  If already present,
this returns the offset of the existing string.  */
@@ -290,6 +411,187 @@ write_source_files (void)
   asm_fprintf (asm_out_file, "%LLcv_filechksms_end:\n");
 }
 
+/* Write out the line number information for each function into the
+   .debug$S section.  */
+
+static void
+write_line_numbers (void)
+{
+  unsigned int func_num = 0;
+
+  while (funcs)
+{
+  codeview_function *next = funcs->next;
+  unsigned int first_label_num;
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, DEBUG_S_LINES);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_lines%u_end - %LLcv_lines%u_start\n",
+  func_num, func_num);
+
+  asm_fprintf (asm_out_file, "%LLcv_lines%u_start:\n"
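
For readers following the file_id arithmetic in codeview_source_line above, here is a small standalone sketch of it.  It is an illustration only: the helper names (cv_align4, cv_checksum_entry_size, cv_file_id) are made up and are not part of the patch, and it assumes that file_num is zero-based and that every file entry carries a 16-byte MD5 hash, as the comment in the patch describes.

```c
#include <assert.h>
#include <stdint.h>

#define CV_CHKSUM_HEADER_SIZE 6u   /* Header bytes, per the patch comment.  */
#define CV_MD5_HASH_SIZE      16u  /* HASH_SIZE in the patch.  */

/* Round N up to the next multiple of 4.  */
static uint32_t
cv_align4 (uint32_t n)
{
  return (n + 3u) & ~3u;
}

/* Size of one checksum entry: header plus hash, padded to 4 bytes.  */
static uint32_t
cv_checksum_entry_size (void)
{
  return cv_align4 (CV_CHKSUM_HEADER_SIZE + CV_MD5_HASH_SIZE);
}

/* Byte offset of the entry for FILE_NUM (hypothetical helper; assumes
   FILE_NUM is zero-based).  */
static uint32_t
cv_file_id (uint32_t file_num)
{
  uint32_t size = cv_checksum_entry_size ();
  assert (file_num <= UINT32_MAX / size);  /* Guard against overflow.  */
  return file_num * size;
}
```

With these definitions the entry size works out to 0x18, so the third file (file_num 2) would sit at offset 0x30.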

[PATCH v2 0/4] CodeView patches

2023-10-30 Thread Mark Harmstone
Changes from initial version:

* First patch now accepted
* Added #ifdefs to avoid compilation failures on other targets



[PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-30 Thread Carl Love
GCC maintainers:

The following patch adds tests for two of the rs6000 overloaded built-
ins that do not have tests.  Additionally the GCC documentation file
doc/extend.texi is updated to include the built-in definitions as they
were missing.

The patch has been tested on a Power 10 system with no regressions. 
Please let me know if this patch is acceptable for mainline.

 Carl

---
rs6000, Add missing overloaded bcd builtin tests

The two BCD overloaded built-ins __builtin_bcdsub_ge and __builtin_bcdsub_le
do not have a corresponding test.  Add tests to existing test file and update
the documentation with the built-in definitions.

gcc/ChangeLog:
* doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
documentation for the built-ins.

gcc/testsuite/ChangeLog:
* bcd-3.c (do_sub_ge, do_sub_le): Add functions to test builtins
__builtin_bcdsub_ge and __builtin_bcdsub_le.
---
 gcc/doc/extend.texi  |  4 
 gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cf0d0c63cce..fa7402813e7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector 
unsigned char, const int);
 vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
 vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned 
char,
const int);
+int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const 
int);
 int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const 
int);
 int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const 
int);
 int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const 
int);
+int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const 
int);
 int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const 
int);
 @end smallexample
diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c 
b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
index 7948a0c95e2..9891f4ff08e 100644
--- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 /* { dg-final { scan-assembler-times "bcdadd\[.\] " 4 } } */
-/* { dg-final { scan-assembler-times "bcdsub\[.\] " 4 } } */
+/* { dg-final { scan-assembler-times "bcdsub\[.\] " 6 } } */
 /* { dg-final { scan-assembler-not   "bl __builtin"   } } */
 /* { dg-final { scan-assembler-not   "mtvsr" } } */
 /* { dg-final { scan-assembler-not   "mfvsr" } } */
@@ -93,6 +93,26 @@ do_sub_gt (vector_128_t a, vector_128_t b, int *p)
   return ret;
 }
 
+vector_128_t
+do_sub_ge (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_ge (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
+vector_128_t
+do_sub_le (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_le (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
 vector_128_t
 do_sub_ov (vector_128_t a, vector_128_t b, int *p)
 {
-- 
2.37.2




Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-30 Thread Vineet Gupta




On 10/30/23 13:33, Jeff Law wrote:



On 10/29/23 21:21, Vineet Gupta wrote:

RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
    "missing definition(s)" or "multiple definition(s)"
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there are currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in the first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Many existing tests exhibit this issue,
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
With this patch the SEXT.W is eliminated.

Tested with rv64gc with no regressions, I'm relying on Patrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v2:
   - Fix linting issues flagged by pre-commit CI
Changes since v1:
   - Elide sign extension for 32-bit operands only
   - Apply elision for both arguments
---
  gcc/config/riscv/riscv.cc | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d53..269beb3b159b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
 cmp0, cmp1, 0, 0, OPTAB_DIRECT);
  }
  +/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */

+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode))
+    *op = XEXP (*op, 0);
+  else
+    *op = gen_rtx_SIGN_EXTEND (word_mode, *op);
So for the wrapped test (the GET_MODE_SIZE stuff), add parentheses and
indent the "==" clause, i.e.


  && (GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
      == GET_MODE_SIZE (word_mode))


Ok. FWIW I was using the wrong checker: git_check_commit.py vs. 
check_GNU_style.sh




Don't you also need to verify that the subreg was sign extended? The 
PROMOTED_VAR_P just notes that it was promoted, not *how* it was 
promoted.  I think you just need to add a test like this:


  && SUBREG_PROMOTED_SIGNED_P (*op)


Thx for catching this.
The orig test case I used to spot the issue had an unsigned promoted 
subreg but I was convinced it could still be removed (wrong on so many 
counts).
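
To illustrate why the signedness of the promotion matters, here is a small standalone C model.  It is not GCC code and the function names are invented for the example; it simply treats a 64-bit register holding a promoted 32-bit value and checks whether sign-extending the low 32 bits would change the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a sign extension from 32 to 64 bits: take the low 32 bits
   of REG and sign-extend them.  Written without implementation-defined
   conversions.  */
static int64_t
model_sext_w (uint64_t reg)
{
  uint32_t lo = (uint32_t) reg;
  if (lo & 0x80000000u)
    return (int64_t) lo - ((int64_t) 1 << 32);
  return (int64_t) lo;
}

/* A 32-bit value promoted to 64 bits by sign extension.  */
static uint64_t
model_promote_signed (int32_t v)
{
  return (uint64_t) (int64_t) v;
}

/* A 32-bit value promoted to 64 bits by zero extension.  */
static uint64_t
model_promote_unsigned (uint32_t v)
{
  return (uint64_t) v;
}

/* True if an explicit sign extension of REG is redundant, i.e. dropping
   it leaves the register contents unchanged.  */
static bool
model_sext_is_noop (uint64_t reg)
{
  return (uint64_t) model_sext_w (reg) == reg;
}
```

In this model the extension is always redundant for a sign-promoted value, but for a zero-promoted value it is only redundant when bit 31 is clear, which is why the promotion kind has to be checked before peeling off the subreg.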



I don't guess you have data on how this impacts dynamic instruction 
counts on anything significant do you?


No, haven't run it yet. I can fire one though. I doubt if this is as 
significant as the prev one, even if this is the right thing to do.




OK with the formatting nit fixed and adding the additional check to 
ensure the value was sign extended.


Thx. I'll just wait for the SPEC run before pushing this.

-Vineet


Re: [2/3] [aarch64] Add function multiversioning support

2023-10-30 Thread Richard Sandiford
Andrew Carlotti  writes:
> This adds initial support for function multiversion on aarch64 using the
> target_version and target_clones attributes. This mostly follows the
> Beta specification in the ACLE [1], with a few differences that remain to
> be fixed:
>
> - Symbol mangling for target_clones differs from that for target_version
>   and does not match the mangling specified in the ACLE. This
>   inconsistency is also present in i386 and rs6000 mangling.
> - The target_clones attribute does not currently support an implicit
>   "default" version.
> - Unrecognised target names in a target_clones attribute should be
>   ignored (with an optional warning), but currently cause an error to be
>   raised instead.
> - There is no option to disable function multiversioning at compile
>   time.
> - There is no support for function multiversioning in C, since this is
>   not yet enabled in the frontend. On the other hand, this patch
>   happens to enable multiversioning in Ada and D as well, using their
>   existing frontend support.
>
> This patch relies on adding functionality to libgcc, to support:
> - struct { unsigned long long features; } __aarch64_cpu_features;
> - void __init_cpu_features (void);
> - void __init_cpu_features_resolver (unsigned long hwcap,
>const __ifunc_arg_t *arg);
> This support matches the interface currently used in LLVM's compiler-rt,
> and will be implemented in a future patch (which will be merged before
> merging this patch).
>
> This version of the patch incorrectly uses __init_cpu_features in the
> ifunc resolvers, which could lead to invalid library calls at load time.
> I will fix this to use __init_cpu_features_resolver in a future version
> of the patch.
>
> [1] 
> https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
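
As a rough illustration of how a resolver could use the feature bitmask described above, here is a standalone sketch.  The struct mirrors the shape of the __aarch64_cpu_features object quoted from the cover letter, but everything else is hypothetical: the feature bits, the version table and the pick_version helper are invented for the example, and priority is simply table order here, whereas the patch orders versions with aarch64_compare_version_priority.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in with the same shape as the __aarch64_cpu_features
   object described above.  */
struct cpu_features
{
  unsigned long long features;
};

/* Hypothetical feature bits, for illustration only.  */
#define FEAT_A (1ULL << 0)
#define FEAT_B (1ULL << 1)

struct version
{
  unsigned long long required;  /* Features needed; 0 marks the default.  */
  int id;                       /* Stand-in for the function to call.  */
};

/* Sample table, sorted from highest to lowest priority.  */
static const struct version sample_versions[] = {
  { FEAT_A | FEAT_B, 2 },
  { FEAT_A, 1 },
  { 0, 0 },
};

/* Return the id of the first version whose required features are all
   present.  VERSIONS must end with a default entry (required == 0).  */
static int
pick_version (const struct cpu_features *cpu,
              const struct version *versions, size_t n)
{
  assert (n > 0 && versions[n - 1].required == 0);
  for (size_t i = 0; i < n; i++)
    if ((cpu->features & versions[i].required) == versions[i].required)
      return versions[i].id;
  return versions[n - 1].id;  /* Not reached: the default always matches.  */
}

/* Convenience wrapper around the sample table.  */
static int
pick_sample_version (unsigned long long features)
{
  struct cpu_features cpu = { features };
  return pick_version (&cpu, sample_versions,
                       sizeof sample_versions / sizeof sample_versions[0]);
}
```

A CPU reporting only FEAT_B falls through to the default entry in this sketch, since no non-default version has all of its required bits set.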
>
> gcc/ChangeLog:
>
>   * attribs.cc (decl_attributes): Pass attribute name to target
>   hook.
>   * config/aarch64/aarch64.cc
>   (aarch64_process_target_version_attr): New.
>   (aarch64_option_valid_attribute_p): Add check and support for
>   target_version attribute.
>   (enum CPUFeatures): New list of bitmask positions.
>   (aarch64_fmv_feature_data): New.
>   (get_feature_bit): New.
>   (get_feature_mask_for_version): New.
>   (compare_feature_masks): New.
>   (aarch64_compare_version_priority): New.
>   (make_resolver_func): New.
>   (add_condition_to_bb): New.
>   (compare_feature_version_info): New.
>   (dispatch_function_versions): New.
>   (aarch64_generate_version_dispatcher_body): New.
>   (aarch64_get_function_versions_dispatcher): New.
>   (aarch64_common_function_versions): New.
>   (aarch64_mangle_decl_assembler_name): New.
>   (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
>   (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
>   (TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
>   (TARGET_COMPARE_VERSION_PRIORITY): New implementation.
>   (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
>   (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
>   (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.

Nice.  Mostly LGTM, but some comments below.

> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 
> a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6..cc935b502028392ebdc105f940900f01f79196a7
>  100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
>   options to the attribute((target(...))) list.  */
>if (TREE_CODE (*node) == FUNCTION_DECL
>&& current_target_pragma
> -  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
> +  && targetm.target_option.valid_attribute_p (*node,
> +   get_identifier("target"),
> current_target_pragma, 0))
>  {
>tree cur_attr = lookup_attribute ("target", attributes);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 9c3c0e705e2e6ea3b55b4a5f1e7d3360f91eb51d..ca0e2a2507ffdbf99e17b77240504bf2d175b9c0
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19088,11 +19088,70 @@ aarch64_process_target_attr (tree args)
>return true;
>  }
>  
> +/* Parse the tree in ARGS that contains the target_version attribute
> +   information and update the global target options space.  */
> +
> +bool
> +aarch64_process_target_version_attr (tree args)
> +{
> +  if (TREE_CODE (args) == TREE_LIST)
> +{
> +  if (TREE_CHAIN (args))
> + {
> +   error ("attribute %<target_version%> has multiple values");
> +   return false;
> + }
> +  args = TREE_VALUE (args);
> +}
> +
> +  if (!args || TREE_CODE (args) != STRING_CST)
> +{
> +  error ("attribute %<target_version%> argument not a string");
> +  return false;

Re: [PATCH v2] c: don't emit -Wmissing-variable-declarations for register variables [PR110947]

2023-10-30 Thread Hamza Mahfooz

ping

On Fri, Sep 1 2023 at 03:02:41 PM -04:00:00, Hamza Mahfooz 
 wrote:

Resolves:
PR c/110947 - Should -Wmissing-variable-declarations not trigger on
register variables?

gcc/c/ChangeLog:

PR c/110947
* c-decl.cc (start_decl): don't emit
-Wmissing-variable-declarations for DECL_REGISTER VAR_DECLs.

gcc/testsuite/ChangeLog:

PR c/110947
* gcc.dg/pr110947.c: New test.

Signed-off-by: Hamza Mahfooz 
---
Please push this for me if you think it looks good, since I don't have
write access to the repository.

v2: put "target" before the relevant architectures in pr110947.c.
---
 gcc/c/c-decl.cc | 3 ++-
 gcc/testsuite/gcc.dg/pr110947.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110947.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 1f9eb44dbaa..819af6aa050 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5376,7 +5376,8 @@ start_decl (struct c_declarator *declarator, 
struct c_declspecs *declspecs,

 warning (OPT_Wmain, "%q+D is usually a function", decl);

   if (warn_missing_variable_declarations && VAR_P (decl)
-  && !DECL_EXTERNAL (decl) && TREE_PUBLIC (decl) && old_decl == 
NULL_TREE)
+  && !DECL_EXTERNAL (decl) && !DECL_REGISTER (decl) && 
TREE_PUBLIC (decl)

+  && old_decl == NULL_TREE)
 warning_at (DECL_SOURCE_LOCATION (decl), 
OPT_Wmissing_variable_declarations,

"no previous declaration for %qD", decl);

diff --git a/gcc/testsuite/gcc.dg/pr110947.c 
b/gcc/testsuite/gcc.dg/pr110947.c

new file mode 100644
index 000..3c0b8a82ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110947.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-Wmissing-variable-declarations" } */
+
+register unsigned long current_stack_pointer asm("rsp");
--
2.41.0
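
The new condition can be modelled outside the compiler as a plain boolean function.  This is only a sketch: struct decl_model and its fields are invented stand-ins for the VAR_P, DECL_EXTERNAL, DECL_REGISTER and TREE_PUBLIC checks in start_decl, not GCC's tree API.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for the properties start_decl inspects.  */
struct decl_model
{
  bool is_var;        /* VAR_P (decl) */
  bool is_external;   /* DECL_EXTERNAL (decl) */
  bool is_register;   /* DECL_REGISTER (decl) */
  bool is_public;     /* TREE_PUBLIC (decl) */
  bool has_old_decl;  /* old_decl != NULL_TREE */
};

/* Mirrors the patched condition guarding the
   -Wmissing-variable-declarations warning.  */
static bool
warns_missing_variable_declaration (struct decl_model d)
{
  return (d.is_var
          && !d.is_external
          && !d.is_register
          && d.is_public
          && !d.has_old_decl);
}
```

In this model a public, non-extern variable with no previous declaration still warns, while the same declaration marked as a register variable (as in the new test) does not.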






Re: [PATCH 1/4] RISC-V: Recategorize "prefetch" availabilities

2023-10-30 Thread Jeff Law




On 10/23/23 01:22, Tsukasa OI wrote:

From: Tsukasa OI 

Because they are for all prefetch instructions, "prefetch" fits better
than "prefetchi".
But there's a significant difference between the cases.  prefetch.i in
particular fetches into the icache, while prefetch.r and prefetch.w
would fetch into the data cache.


And I strongly suspect from an API standpoint we'll want to distinguish 
each case from the others.


Unless Kito feels otherwise I would suggest keeping a distinct API 
interface for each case.




Jeff


Re: [PATCH 0/2] RISC-V: Define not broken prefetch builtins

2023-10-30 Thread Jeff Law




On 10/22/23 21:55, Tsukasa OI wrote:


What I still don't understand is why we're dealing with a decomposed
address in the builtin, define_expand and/or define_insn.




Sorry, I misunderstood your intent (quite badly) possibly because I was
not familiar with the concept of "predicates" in GCC.
OK.  So you might want to read the machine description part of the GCC 
manual.  It describes operand predicates, operand constraints, insn 
conditions, the difference between define_insn vs define_expand and much 
more.





On 2023/08/29 6:20, Jeff Law wrote:

What I would suggest is making a new predicate that accepts either a
register or a register+offset where the offset fits in a signed 12 bit
immediate.  Use that for operand 0's predicate and I think this will
"just work" and cover all the cases supported by the prefetch.i instruction.


I misunderstood that as "just" adding the offset field to the
instructions and that's the reason I veered off the path so much.  So
instead, I'll answer your original question.

register+offset seems a problem for prefetch instructions because signed
12 bit immediate values need to be also a multiple of 32.  There's no
proper relocation type for this kind and I considered we have "very"
limited cases where making such predicate (as you suggested) will
*efficiently* work.

My opinion is, if we need very fine-grained control with prefetch
instructions, we'd better to use inline assembly.

I'll continue testing the possibilities of register+offset predicate
(including whether it works efficiently) and I'll temporarily withdraw
new built-in functions to focus on major issues before GCC 14:

1.  Remove completely broken __builtin_riscv_zicbop_prefetch_i and
2.  Fix an ICE when __builtin_prefetch is used with some constants.

I'll submit minimized patches only to fix those issues.  They will not
contain "register+offset" you suggested because of the difficulties
above but should be sufficient to fix imminent issues.

We should be able to describe this need quite easily.

Each operand has a predicate which the compiler tests to see if a 
particular RTL expression matches.  Some are very generic like 
"register_operand".  Others are target specific.  If you look in 
predicates.md you'll see a list of the predicates already defined for 
risc-v.  I'm pretty sure none of them will work for this case, but we 
can add a new one easily.


The operand in question is going to be a MEM with restrictions on its 
addressing mode.  Either REG or REG + aligned offset.


(define_predicate "prefetch_memory_operand"
  (match_code "mem")
{
  op = XEXP (op, 0);
  return (REG_P (op)
          || (GET_CODE (op) == PLUS
              && REG_P (XEXP (op, 0))
              && CONST_INT_P (XEXP (op, 1))
              && (INTVAL (XEXP (op, 1)) % 32) == 0));
})

[ Note that we did not declare "op".  It's provided by the generator and 
corresponds to the operand we're testing. ]
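
The offset constraint discussed in this thread can also be written down as a small standalone check.  This is a sketch rather than the predicate itself: prefetch_offset_ok is a made-up name, and unlike the predicate above it also tests the signed 12-bit range mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* True if OFFSET fits in a signed 12-bit immediate and is a multiple
   of 32 (hypothetical helper, for illustration only).  */
static bool
prefetch_offset_ok (long offset)
{
  return offset >= -2048 && offset <= 2047 && offset % 32 == 0;
}
```

Under these assumptions the usable offsets run from -2048 to 2016 in steps of 32; anything else would have to be folded into the base register first.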


So you're going to want a define_expand for the basic prefetch

(define_expand "riscv_prefetch_r_"
  [(unspec_volatile:X [(match_operand:X 0 "memory_operand")]
   UNSPEC_PREFETCH_R)]
  "TARGET_ZICBOP"
{
  if (!prefetch_memory_operand (operands[0], Pmode))
    XEXP (operands[0], 0) = force_reg (Pmode, XEXP (operands[0], 0));
})

The thing to know about a define_expand is that its sole purpose is
RTL generation.  We can use it as a place to adjust operands
(as is done in this case), or emit additional RTL such as we do for
SImode max on rv64 where we have to extend the incoming operands.


In our case we see if the memory address matches 
prefetch_memory_operand, and if not it'll force that address into a new 
register to create a (mem (reg)) object.




(define_insn "*riscv_prefetch_r_"
  [(unspec_volatile:X [(match_operand:X 0 "prefetch_memory_operand")]
   UNSPEC_PREFETCH_R)]
  "TARGET_ZICBOP"
  "prefetch.r\t%0"
  [(set_attr "type" "cbo")])

The define_insn construct maps an RTL template to assembly code with 
provisions for testing operands and such.


Anyway, hopefully that makes things clearer.


Jeff


Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-30 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 10:08 AM Mayshao-oc  wrote:
>
> >On Fri, Oct 27, 2023 at 12:20 PM mayshao  wrote:
> >>
> >> On 2023/10/26 17:34, Uros Bizjak wrote:
> >> > On Wed, Oct 25, 2023 at 8:43 AM mayshao  wrote:
> >> >>
> >> >> Hi all:
> >> >>  This patch enables -march/-mtune=yongfeng, costs and tunings are 
> >> >> set according to the characteristics of the processor. We add a new md 
> >> >> file to describe yongfeng processor.
> >> >>
> >> >>  Bootstrapped /regtested X86_64.
> >> >>
> >> >>  Ok for trunk?
> >> >> BR
> >> >> Mayshao
> >> >> gcc/ChangeLog:
> >> >>
> >> >>  * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize 
> >> >> yongfeng.
> >> >>  * common/config/i386/i386-common.cc: Add yongfeng.
> >> >>  * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): 
> >> >> Add ZHAOXIN_FAM7H_YONGFENG.
> >> >>  * config.gcc: Add yongfeng.
> >> >>  * config/i386/driver-i386.cc (host_detect_local_cpu): Let 
> >> >> -march=native
> >> >>  recognize yongfeng processors.
> >> >>  * config/i386/i386-c.cc (ix86_target_macros_internal): Add 
> >> >> yongfeng.
> >> >>  * config/i386/i386-options.cc (m_YONGFENG): New definition.
> >> >>  (m_ZHAOXIN): Ditto.
> >> >>  * config/i386/i386.h (enum processor_type): Add 
> >> >> PROCESSOR_YONGFENG.
> >> >>  * config/i386/i386.md: Add yongfeng.
> >> >>  * config/i386/lujiazui.md: Fix typo.
> >> >>  * config/i386/x86-tune-costs.h (struct processor_costs): Add 
> >> >> yongfeng costs.
> >> >>  * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add 
> >> >> yongfeng.
> >> >>  (ix86_adjust_cost): Ditto.
> >> >>  * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace 
> >> >> m_LUJIAZUI by m_ZHAOXIN.
> >> >>  (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
> >> >>  (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
> >> >>  (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
> >> >>  (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
> >> >>  (X86_TUNE_MOVX): Ditto.
> >> >>  (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
> >> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
> >> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
> >> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
> >> >>  (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
> >> >>  (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
> >> >>  (X86_TUNE_USE_LEAVE): Ditto.
> >> >>  (X86_TUNE_PUSH_MEMORY): Ditto.
> >> >>  (X86_TUNE_LCP_STALL): Ditto.
> >> >>  (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
> >> >>  (X86_TUNE_OPT_AGU): Ditto.
> >> >>  (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
> >> >>  (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
> >> >>  (X86_TUNE_USE_SAHF): Ditto.
> >> >>  (X86_TUNE_USE_BT): Ditto.
> >> >>  (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
> >> >>  (X86_TUNE_ONE_IF_CONV_INSN): Ditto.
> >> >>  (X86_TUNE_AVOID_MFENCE): Ditto.
> >> >>  (X86_TUNE_EXPAND_ABS): Ditto.
> >> >>  (X86_TUNE_USE_SIMODE_FIOP): Ditto.
> >> >>  (X86_TUNE_USE_FFREEP): Ditto.
> >> >>  (X86_TUNE_EXT_80387_CONSTANTS): Ditto.
> >> >>  (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
> >> >>  (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
> >> >>  (X86_TUNE_SSE_TYPELESS_STORES): Ditto.
> >> >>  (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
> >> >>  (X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
> >> >>  (X86_TUNE_USE_GATHER_4PARTS): Ditto.
> >> >>  (X86_TUNE_USE_GATHER_8PARTS): Ditto.
> >> >>  (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
> >> >>  * doc/extend.texi: Add details about yongfeng.
> >> >>  * doc/invoke.texi: Ditto.
> >> >>  * config/i386/yongfeng.md: New file for describing yongfeng 
> >> >> processor.
> >> >>
> >> >> gcc/testsuite/ChangeLog:
> >> >>
> >> >>  * g++.target/i386/mv32.C: Handle new march.
> >> >>  * gcc.target/i386/funcspec-56.inc: Ditto.
> >> >
> >> > LGTM.
> >> >
> >> > There are a couple of comments that need to be fixed, please see inline.
> >> >
> >> > BTW: A couple of days ago, I added a new tuning flag [1]. I
> >> > considered Zhaoxin cores a modern core, but please review the new
> >> > flag anyway.
> >> >
> >> > [1]
> >> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634280.html
> >> >
> >> > Thanks,
> >> > Uros.
> >> >
> >> Hi Uros:
> >>Thanks for your review. I have fixed the errors that you commented on,
> >> please review the attached patch again.
> >>I have reviewed the new tuning flag [1]. When a write of 64 bits
> >> or less is followed by a read of a smaller size which is fully
> >> contained in the write address range, regardless of alignment,
> >> Zhaoxin processors will do store forwarding.
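
The condition described above amounts to a containment test on byte ranges.  A minimal sketch follows; store_forwards is a hypothetical name, sizes are in bytes, and the model captures nothing about the hardware beyond the rule as stated (including that it only covers reads strictly smaller than the write).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a read of READ_SIZE bytes at READ_ADDR is smaller than,
   and fully contained in, a preceding write of WRITE_SIZE bytes (at most
   8, i.e. 64 bits) at WRITE_ADDR.  Alignment is deliberately ignored.  */
static bool
store_forwards (uint64_t write_addr, unsigned write_size,
                uint64_t read_addr, unsigned read_size)
{
  if (write_size == 0 || write_size > 8 || read_size == 0)
    return false;
  /* The statement above only covers reads strictly smaller than the write.  */
  if (read_size >= write_size)
    return false;
  return (read_addr >= write_addr
          && read_addr - write_addr <= write_size - read_size);
}
```

For example, an 8-byte write at 0x1001 followed by a 4-byte read at 0x1003 satisfies the rule even though neither access is aligned.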
> >
> >The patch is OK.
> >
> >Thanks,
> >

Re: [PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-30 Thread Jeff Law




On 10/27/23 13:04, Robin Dapp wrote:

After working with Sam off-list (thanks) I managed to get hppa to
build.  Initially it looked as if hppa just had a very small number of
instruction patterns so we wouldn't generate all 10 output files.
However, the actual issue (which we will only hit with a low
pattern count) was with counting all the patterns vs only counting
the patterns that will be output.  A wrong pattern count led to
writing of the output files stopping prematurely.

With that corrected, hppa "just works" until I hit linker errors
due to relocations - most likely unrelated:

bin/ld: unwind-dw2-fde-dip_s.o(.data.rel.ro+0): cannot handle
R_PARISC_FPTR64 for __pthread_key_create@@GLIBC_2.34

Attached is v3 that has been bootstrapped and tested on x86 and power10,
aarch64 bootstrap was ok, testsuite is still running.  A riscv build and
testsuite run was successful as well.

Regards
  Robin

 From 248744c328440bff9cc339d2bf622852cbaac343 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 12 Oct 2023 11:23:26 +0200
Subject: [PATCH v3] genemit: Split insn-emit.cc into several partitions.

On riscv insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc).  The available patterns are
written to the given files in a sequential fashion.

Similar to match.pd a configure option --with-insnemit-partitions=num
is introduced that makes the number of partitions configurable.
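
The sequential partitioning, and why the pattern count matters for it, can be illustrated with a few lines of standalone C.  This is not the genemit code: patterns_per_file, partition_for and the choice of ceiling division are assumptions made for the sketch.

```c
#include <assert.h>

/* Number of patterns per output file when TOTAL patterns are spread
   sequentially over NFILES files (ceiling division).  */
static unsigned
patterns_per_file (unsigned total, unsigned nfiles)
{
  assert (nfiles > 0);
  return (total + nfiles - 1) / nfiles;
}

/* Zero-based index of the file that receives the INDEX-th emitted
   pattern, given the TOTAL pattern count used for planning.  */
static unsigned
partition_for (unsigned index, unsigned total, unsigned nfiles)
{
  unsigned per_file = patterns_per_file (total, nfiles);
  assert (per_file > 0 && index < total);
  return index / per_file;
}
```

With 95 patterns and 10 files each file takes 10 patterns and the last pattern lands in file 9.  If the planning count were 200 while only 95 patterns were actually emitted, the last emitted pattern would land in file 4, so the distribution depends on counting exactly the patterns that get written.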

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.
Just one note on testing.  I threw this into my tester which ran through 
its usual set of crosses as well as native emulated builds of alpha, 
hppa, m68k, sh4, sh4eb, riscv, aarch64.  s390x and ppc64le are in 
progress, and have progressed beyond their build phase.   Note that the 
emulated natives other than risc-v are 3-stage bootstrapped.


OK for the trunk.  Thanks for taking care of this.  I guess I'll need to 
time a risc-v bootstrap again.  It's currently using --disable-bootstrap 
at configure time in my tester.



jeff


Re: [RFC] RISC-V: Support -mcmodel=large.

2023-10-30 Thread Jeff Law




On 10/25/23 19:49, KuanLin Chen wrote:

This is a RFC patch for large code model implementation.

gcc/ChangeLog:
* gcc/config/riscv/predicates.md(move_operand): Check SYMBOL_REF
and LABEL_REF type.
(call_insn_operand): Support for CM_Large.
(pcrel_symbol_operand): New.
* gcc/config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add builtin_define
"__riscv_cmodel_large".
* gcc/config/riscv/riscv-opts.h (riscv_code_model): Define CM_LARGE.
* gcc/config/riscv/riscv-protos.h (riscv_symbol_type): Define
SYMBOL_FORCE_TO_MEM.
(riscv_asm_output_pool_epilogue): New.
* gcc/config/riscv/riscv.cc (riscv_classify_symbol) Support CM_LARGE model.
(riscv_symbol_insns) Add SYMBOL_FORCE_TO_MEM.
(riscv_cannot_force_const_mem): Ditto.
(riscv_split_symbol): Ditto.
(riscv_force_address): Check pseudo reg available before force_reg.
(riscv_size_ok_for_small_data_p): Disable in CM_LARGE model.
(riscv_can_use_per_function_literal_pools_p): New.
(riscv_asm_output_pool_epilogue): New. Hook ASM_OUTPUT_POOL_EPILOGUE.
(riscv_output_mi_thunk): Add riscv_in_thunk_func.
(riscv_option_override): Support CM_LARGE model.
(riscv_function_ok_for_sibcall): Disable sibcalls in CM_LARGE model.
* gcc/config/riscv/riscv.h (ASM_OUTPUT_POOL_EPILOGUE): Hook.
* gcc/config/riscv/riscv.md (unspec): Define UNSPEC_FORCE_FOR_MEM.
(*large_load_address): New.
* gcc/config/riscv/riscv.opt (code_model): New.

gcc/testsuite/ChangeLog:

   * gcc/testsuite/gcc.target/riscv/large-model.c: New test.
First, thank you so much for tackling this.  It's one of the many 
missing components that we need to round out the implementation.






0001-RISC-V-Support-mcmodel-large.patch

 From b09ba36220db1dbce3b1934685b1783125b5cb66 Mon Sep 17 00:00:00 2001
From: Kuan-Lin Chen
Date: Sun, 18 Feb 2018 20:19:49 +0800
Subject: [PATCH] RISC-V: Support -mcmodel=large.

---
  gcc/config/riscv/predicates.md   | 23 +-
  gcc/config/riscv/riscv-c.cc  |  4 +
  gcc/config/riscv/riscv-opts.h|  1 +
  gcc/config/riscv/riscv-protos.h  |  4 +-
  gcc/config/riscv/riscv.cc| 77 +++-
  gcc/config/riscv/riscv.h |  2 +
  gcc/config/riscv/riscv.md|  9 +++
  gcc/config/riscv/riscv.opt   |  3 +
  gcc/testsuite/gcc.target/riscv/large-model.c | 11 +++
  9 files changed, 127 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/large-model.c

@@ -312,9 +313,15 @@
  })
  
  (define_predicate "call_insn_operand"

-  (ior (match_operand 0 "absolute_symbolic_operand")
-   (match_operand 0 "plt_symbolic_operand")
-   (match_operand 0 "register_operand")))
+  (match_operand 0 "general_operand")
+{
+  if (riscv_cmodel == CM_LARGE)
+return register_operand (op, mode);
+  else
+return (absolute_symbolic_operand (op, mode) ||
+   plt_symbolic_operand (op, mode) ||
+   register_operand (op, mode));
+})
Formatting nit.  When wrapping a long line, bring the operator down to 
the next line, indented just beyond the open paren.  Like this:


return (absolute_symbolic_operand (op, mode)
        || plt_symbolic_operand (op, mode)
        || register_operand (op, mode));


Also make sure to use a tab when indenting something 8 spaces.  It's an 
annoyance, but it's the standard way things are formatted in GCC.  There 
are some scripts in the contrib subdirectory which can help find 
formatting problems, though I'm not sure they work on .md files.



@@ -1972,7 +1992,19 @@ static rtx
  riscv_force_address (rtx x, machine_mode mode)
  {
if (!riscv_legitimate_address_p (mode, x, false))
-x = force_reg (Pmode, x);
+{
+  if (can_create_pseudo_p ())
+   return force_reg (Pmode, x);
Note that $ra is fixed now.  So if you need a scratch register, you can 
fall back to $ra.


More importantly, what are the circumstances where you can be asked to 
force an address after the register allocation/reloading phase is 
complete?  Or does it happen within the register allocators (the latter 
would be an indicator we need a secondary reload).




@@ -5665,6 +5697,9 @@ riscv_size_ok_for_small_data_p (int size)
  static bool
  riscv_in_small_data_p (const_tree x)
  {
+  if (riscv_cmodel == CM_LARGE)
+return false;
+
if (TREE_CODE (x) == STRING_CST || TREE_CODE (x) == FUNCTION_DECL)
  return false;
How does large code model impact our ability to access small data 
through $gp?  Aren't they independent?



+void
+riscv_asm_output_pool_epilogue (FILE *f, const char *, tree,
+   HOST_WIDE_INT offset)
+{
+  /* When using per-function literal pools, we must ensure that any code
+ section is aligned to the minimal instruction length, lest we get
+ errors from the assembler re "unaligned instructions".  */
+  if ((offset & 3) && riscv_can_use_per_function_literal_pools_p ())
+ASM_OUTPUT_ALIGN (f, 2);
+}
So the comment implies you're aligning the section.  If tha

Re: [RFC PATCH v1] c: Do not warn about external declaration following inline definition

2023-10-30 Thread Joseph Myers
On Mon, 30 Oct 2023, Barnabás Pőcze wrote:

> Hi
> 
> 
> On Monday, October 30, 2023 at 19:01, Joseph Myers wrote:
> 
> > On Sat, 28 Oct 2023, Barnabás Pőcze wrote:
> > 
> > > An external declaration following an inline definition is not redundant
> > > because it forces the compiler to emit an external definition for the 
> > > function.
> > > That is,
> > > 
> > > inline void f(void) { }
> > > [extern] void f(void);
> > > 
> > > should not trigger the
> > > 
> > > redundant redeclaration of ...
> > > 
> > > warning.
> > 
> > 
> > This should add a testcase to the testsuite (that fails before and passes
> > after the front-end change is made).
> 
> I did not want to commit more effort until I have some feedback.
> I will most certainly add a test case if it turns out that the change
> seems reasonable and has a chance of being accepted.

I agree that such a declaration is not redundant, and indeed serves a 
useful purpose, and so it's appropriate to avoid the warning in that case.  
Maybe also edit the documentation in invoke.texi to mention this case as 
not being diagnosed because not redundant.

Hopefully cases such as

inline void f(void) { }
void f(void);
void f(void);

do warn for the final declaration, because that one *is* redundant.  
Similarly, the changes should not affect warnings in the -fgnu89-inline 
case, because then a subsequent extern declaration has no effect on a 
prior inline definition.  Tests should include all these variations.
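
For reference, the variations above can be put side by side in a small C sketch.  This is only an illustration, not the testcase that would go into the testsuite: the function name `twice' is made up, C99 (or later) inline semantics are assumed, and which declaration ends up diagnosed is the expectation described above rather than something this snippet checks.

```c
/* Illustrative sketch only; `twice' is a hypothetical name.
   Assumes C99 (or later) inline semantics, i.e. no -fgnu89-inline.  */

/* Inline definition: on its own it provides no external definition.  */
inline int twice (int x) { return 2 * x; }

/* Not redundant: this declaration turns the definition above into an
   external definition, so other translation units can link to it.  */
extern int twice (int x);

/* Redundant: the external definition has already been requested, so
   this is the declaration that should still be diagnosed.  */
int twice (int x);
```

With -fgnu89-inline the second declaration changes nothing about the earlier definition, which is why that mode is called out separately above.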

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/3] MATCH: Move jump_function_from_stmt support to match.pd

2023-10-30 Thread Andrew Pinski
On Mon, Oct 30, 2023 at 2:29 AM Richard Biener
 wrote:
>
> On Sun, Oct 29, 2023 at 5:41 PM Andrew Pinski  wrote:
> >
> > This moves the value_replacement support for jump_function_from_stmt
> > to match pattern.
> > This allows us to optimize things earlier in phiopt1 rather than waiting
> > to phiopt2. Which means phiopt1 needs to be disabled for vrp03.c testcase.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
>
> Do we need to make sure to only do this after pass_early_object_sizes
> at least?  IIRC early PHI-opt didn't do value-replacement, so maybe
> even after late object-size?  There's PROP_objsz, but no
> function similar to optimize_vectors_before_lowering_p in
> {generic,gimple}-match-head.cc

Let me look into that.
But I suspect any which way we might end up with the same issue as the
problems you found in PR 112266 really.
So I am going to put this patch on the backburner for now (but still
look into this and the fall out from PR 112266 ).
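
For readers following along, the source-level shape the new pattern targets looks roughly like the C sketch below.  The struct and function names are hypothetical and only meant to illustrate "PTR == 0 ? 0 : &PTR->field" when the field sits at offset 0; this is not code from the patch or from the affected testcases.

```c
#include <stddef.h>

/* Hypothetical types and names, for illustration only.  */
struct file_info
{
  const char *name;	/* first member, so its offset is 0 */
  int line;
};

/* Source-level shape of "PTR == 0 ? 0 : &PTR->field".  Since `name'
   sits at offset 0, both arms yield the same address as `p', which is
   why the whole expression can fold to a conversion of `p'.  */
const char **
name_slot_or_null (struct file_info *p)
{
  return p == NULL ? NULL : &p->name;
}
```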

Thanks,
Andrew

>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * match.pd (PTR == 0 ? 0 : &PTR->field): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/vrp03.c: Disable phiopt1.
> > * c-c++-common/analyzer/inlining-3-multiline.c: Likewise.
> > * c-c++-common/analyzer/inlining-3.c: Likewise.
> > * gcc.dg/tree-ssa/phi-opt-value-3.c: New testcase.
> > ---
> >  gcc/match.pd  | 21 ++
> >  .../analyzer/inlining-3-multiline.c   |  5 -
> >  .../c-c++-common/analyzer/inlining-3.c|  3 +++
> >  .../gcc.dg/tree-ssa/phi-opt-value-3.c | 22 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp03.c |  2 +-
> >  5 files changed, 51 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-3.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 22899c51a2f..9bc945ccada 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -4159,6 +4159,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(cond (eq @0 integer_zerop) @1 (op@2 @1 @0))
> > @2))
> >
> > +/* PTR == 0 ? 0 : &PTR->field -> PTR if field offset was 0. */
> > +(simplify
> > + (cond (eq @0 integer_zerop) integer_zerop ADDR_EXPR@1)
> > + (with {
> > +   poly_int64 offset;
> > +   tree res = NULL_TREE;
> > +   tree tem = @1;
> > +   if (TREE_CODE (tem) == SSA_NAME)
> > + if (gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (tem)))
> > +   if (gimple_assign_rhs_code (def) == ADDR_EXPR)
> > + tem = gimple_assign_rhs1 (def);
> > +
> > +   if (TREE_CODE (tem) == ADDR_EXPR)
> > + res = get_addr_base_and_unit_offset (TREE_OPERAND (tem, 0), &offset);
> > +  }
> > +  (if (res
> > +   && TREE_CODE (res) == MEM_REF
> > +   && known_eq (mem_ref_offset (res) + offset, 0)
> > +   && operand_equal_p (TREE_OPERAND (res, 0), @0))
> > +   (convert @0))))
> > +
> >  /* Simplifications of shift and rotates.  */
> >
> >  (for rotate (lrotate rrotate)
> > diff --git a/gcc/testsuite/c-c++-common/analyzer/inlining-3-multiline.c 
> > b/gcc/testsuite/c-c++-common/analyzer/inlining-3-multiline.c
> > index fbd20e949b6..9741b91abee 100644
> > --- a/gcc/testsuite/c-c++-common/analyzer/inlining-3-multiline.c
> > +++ b/gcc/testsuite/c-c++-common/analyzer/inlining-3-multiline.c
> > @@ -3,6 +3,9 @@
> >
> >  /* { dg-additional-options "-O2 -fdiagnostics-show-path-depths" } */
> >  /* { dg-additional-options "-fdiagnostics-path-format=inline-events 
> > -fdiagnostics-show-caret" } */
> > +/* Disable phi-opt1 because get_input_file_name gets optimized to just
> > +   `return inpf;`. */
> > +/* { dg-additional-options "-fdisable-tree-phiopt1" } */
> >
> >  #include "../../gcc.dg/analyzer/analyzer-decls.h"
> >  typedef __SIZE_TYPE__ size_t;
> > @@ -96,4 +99,4 @@ test (const input_file *inpf)
> >  |   (4) ...to here
> >  |   (5) argument 1 ('') NULL where 
> > non-null expected
> >  |
> > -   { dg-end-multiline-output "" { target c++ } } */
> > \ No newline at end of file
> > +   { dg-end-multiline-output "" { target c++ } } */
> > diff --git a/gcc/testsuite/c-c++-common/analyzer/inlining-3.c 
> > b/gcc/testsuite/c-c++-common/analyzer/inlining-3.c
> > index 0345585bed2..2b2b4858d45 100644
> > --- a/gcc/testsuite/c-c++-common/analyzer/inlining-3.c
> > +++ b/gcc/testsuite/c-c++-common/analyzer/inlining-3.c
> > @@ -2,6 +2,9 @@
> > after early inlining.  */
> >
> >  /* { dg-additional-options "-O2 -fdiagnostics-show-path-depths" } */
> > +/* Disable phi-opt1 because get_input_file_name gets optimized to just
> > +   `return inpf;`. */
> > +/* { dg-additional-options "-fdisable-tree-phiopt1" } */
> >
> >  #include "../../gcc.dg/analyzer/analyzer-decls.h"
> >  typedef __SIZE_TYPE__ size_t;
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-3.c
> > new file mode 100644
> > index 000..ad55bd288b9
> > --- /

Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-30 Thread Jeff Law




On 10/29/23 21:21, Vineet Gupta wrote:

RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
"missing definition(s)" or "multiple definition(s)"
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there are currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in the first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Many existing tests can be used to observe this issue,
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc,
where the patch eliminates the SEXT.W.

Tested with rv64gc with no regressions; I'm relying on Patrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v2:
   - Fix linting issues flagged by pre-commit CI
Changes since v1:
   - Elide sign extension for 32-bit operands only
   - Apply elision for both arguments
---
  gcc/config/riscv/riscv.cc | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d53..269beb3b159b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
  }
  
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.

+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+== GET_MODE_SIZE (word_mode))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
So for the wrapped test (the GET_MODE_SIZE stuff), add parentheses and indent 
the "==" clause, i.e.


  && (GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
      == GET_MODE_SIZE (word_mode))

Don't you also need to verify that the subreg was sign extended?  The 
PROMOTED_VAR_P just notes that it was promoted, not *how* it was 
promoted.  I think you just need to add a test like this:


  && SUBREG_PROMOTED_SIGNED_P (*op)
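
To illustrate why the kind of promotion matters, here is a small C model.  It is not GCC code and the helper names are made up: it just mimics a 32-bit value living in a 64-bit register, promoted either by sign extension or by zero extension, and then compared at full width the way a 64-bit compare-and-branch would.

```c
#include <stdint.h>

/* Model of a 32-bit value held in a 64-bit register after promotion.
   Hypothetical helper names; not GCC code.  */
int64_t
promote_signed (int32_t v)
{
  return (int64_t) v;			/* upper 32 bits copy the sign bit */
}

int64_t
promote_unsigned (int32_t v)
{
  return (int64_t) (uint32_t) v;	/* upper 32 bits are zero */
}

/* A full-width signed "less than", as a 64-bit compare-and-branch
   would evaluate it.  */
int
less_than_64 (int64_t a, int64_t b)
{
  return a < b;
}
```

In this model -1 compares below 0 only when both were sign extended; with zero extension the same bit pattern reads as a large positive number, so peeling the subreg would only be safe in the sign-extended case.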


I don't guess you have data on how this impacts dynamic instruction 
counts on anything significant do you?


OK with the formatting nit fixed and adding the additional check to 
ensure the value was sign extended.



jeff


Re: [PATCH] Testsuite, i386: Fix test by passing -march

2023-10-30 Thread FX Coudert
> Well, it can fail on x86_64-linux-gnu too if GCC was configured with
> --with-arch=core2, for example.
> So making it not Darwin-specific in this case would be
> beneficial for all x86_64/i?86 targets.

I pushed it as-is, meaning it will indeed apply to all x86_64/i?86 targets.

FX


Re: [RFC PATCH v1] c: Do not warn about external declaration following inline definition

2023-10-30 Thread Barnabás Pőcze
Hi


On Monday, October 30, 2023 at 19:01, Joseph Myers wrote:

> On Sat, 28 Oct 2023, Barnabás Pőcze wrote:
> 
> > An external declaration following an inline definition is not redundant
> > because it forces the compiler to emit an external definition for the 
> > function.
> > That is,
> > 
> > inline void f(void) { }
> > [extern] void f(void);
> > 
> > should not trigger the
> > 
> > redundant redeclaration of ...
> > 
> > warning.
> 
> 
> This should add a testcase to the testsuite (that fails before and passes
> after the front-end change is made).

I did not want to commit more effort until I have some feedback.
I will most certainly add a test case if it turns out that the change
seems reasonable and has a chance of being accepted.


Regards,
Barnabás Pőcze


[PATCH v5] bpf: Improvements in CO-RE builtins implementation.

2023-10-30 Thread Cupertino Miranda

Hi everyone,

Please find a new version for the review as inline attachment.

Best regards,
Cupertino


Changes from v4:
 - Implemented the TARGET_DELEGITIMIZE_ADDRESS target hook as the proper
 solution to the warning for UNSPEC_CORE_RELOC being
 non-delegitimized.

commit 5b45d225c473827b5ef7001e5b24df74d27953ff
Author: Cupertino Miranda 
Date:   Tue Aug 8 09:22:41 2023 +0100

bpf: Improvements in CO-RE builtins implementation.

This patch moves the processing of the preserve_access_index attribute to
its own independent gimple lowering pass.
This approach is more consistent with the implementation of the CO-RE
builtins when used explicitly in the code.  The attributed type accesses
are now converted early to the __builtin_core_reloc builtin instead of being
kept as an expression in the code throughout the middle-end.
This prevents the compiler from optimizing out or manipulating the expression
using the locally defined type; instead it assumes nothing is known about
this expression, as should be the case for all CO-RE
relocations.

In the process, __builtin_preserve_access_index has also been
improved to generate code for more complex expressions that
require more than one CO-RE relocation.
This turned out to be a requirement, since the bpf-next selftests rely on
loop unrolling to convert an undefined index array access into a
defined one.  Expecting the unroll to happen seemed extreme, and for
that reason GCC still generates correct code in such scenarios, even when the
index access is never predictable or unrolling does not occur.

gcc/ChangeLog:
* config/bpf/bpf-passes.def (pass_lower_bpf_core): Added pass.
* config/bpf/bpf-protos.h: Added prototype for new pass.
* config/bpf/bpf.cc (bpf_delegitimize_address): New function.
* config/bpf/bpf.md (mov_reloc_core): Prefixed
name with '*'.
* config/bpf/core-builtins.cc (cr_builtins): Added access_node to
struct.
(is_attr_preserve_access): Improved check.
(core_field_info): Make use of root_for_core_field_info
function.
(process_field_expr): Adapted to new functions.
(pack_type): Small improvement.
(bpf_handle_plugin_finish_type): Adapted to GTY(()).
(bpf_init_core_builtins): Changed to new function names.
(construct_builtin_core_reloc): Improved implementation.
(bpf_resolve_overloaded_core_builtin): Changed how
__builtin_preserve_access_index is converted.
(compute_field_expr): Corrected implementation. Added
access_node argument.
(bpf_core_get_index): Added valid argument.
(root_for_core_field_info, pack_field_expr)
(core_expr_with_field_expr_plus_base, make_core_safe_access_index)
(replace_core_access_index_comp_expr, maybe_get_base_for_field_expr)
(core_access_clean, core_is_access_index, core_mark_as_access_index)
(make_gimple_core_safe_access_index, execute_lower_bpf_core)
(make_pass_lower_bpf_core): Added functions.
(pass_data_lower_bpf_core): New pass struct.
(pass_lower_bpf_core): New gimple_opt_pass class.
(pack_field_expr_for_preserve_field)
(bpf_replace_core_move_operands): Removed function.
(bpf_enum_value_kind): Added GTY(()).
* config/bpf/core-builtins.h (bpf_field_info_kind, bpf_type_id_kind)
(bpf_type_info_kind, bpf_enum_value_kind): New enum.
* config/bpf/t-bpf: Added pass bpf-passes.def to PASSES_EXTRA.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: New test.
* gcc.target/bpf/core-attr-6.c: New test.
* gcc.target/bpf/core-builtin-1.c: Corrected
* gcc.target/bpf/core-builtin-enumvalue-opt.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-enumvalue.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-exprlist-1.c: New test.
* gcc.target/bpf/core-builtin-exprlist-2.c: New test.
* gcc.target/bpf/core-builtin-exprlist-3.c: New test.
* gcc.target/bpf/core-builtin-exprlist-4.c: New test.
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Extra tests

diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
new file mode 100644
index ..0ec20eac965d
--- /dev/null
+++ b/gcc/config/bpf/bpf-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for eBPF.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Fou

Re: [PATCH] RISC-V: Add vector fmin/fmax expanders.

2023-10-30 Thread Robin Dapp
> Aren't they actually the IEEE 754-2019 operations (with different 
> signaling NaN semantics; C functions such as fmaximum in C23), not the 
> IEEE 754-2008 operations (C functions such as fmax)?  V spec 1.0 says "The 
> vector floating-point vfmin and vfmax instructions have the same behavior 
> as the corresponding scalar floating-point instructions in version 2.2 of 
> the RISC-V F/D/Q extension.".  And version 2.2 of F/D/Q (which is *not* 
> version 2.2 of the instruction set, it's later than that) changed the 

Oh, thanks for catching this - I indeed incorrectly assumed this refers to
version 2.2 of the RISC-V spec (which contains F/D, .. of version 2.0).
Too bad, it appeared too convenient.  Then I need to add the same
!HONOR_SNANS to all the expanders as well as the tests.

Regards
 Robin


Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Jeff Law




On 10/30/23 01:25, Fei Gao wrote:

Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
 (noce_try_cond_zero_arith): handler for conditional zero op
 (noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
  gcc/ifcvt.cc  | 112 +++
  .../gcc.target/riscv/zicond_ifcvt_opt.c   | 130 ++
  2 files changed, 242 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..4f98c1c7bf9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx 
z, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+= gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  start_sequence ();
+  rtx_insn *insn = emit_insn (set);
+
+  if (recog_memoized (insn) >= 0)
+{
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_insn (seq);
+
+  return target;
+}
+
+  end_sequence ();
+  return NULL_RTX;
+}
So just a few notes to further illustrate why I'm currently looking to 
take the VRULL+Ventana implementation.  The code above would be much 
better handled by just calling noce_emit_cmove.  noce_emit_cmove will go 
through the conditional move expander.  So any improvements we make in 
the expander "just work" when called from the if-converter.

+
  /* Try only simple constants and registers here.  More complex cases
 are handled in noce_try_cmove_arith after noce_try_store_flag_arith
 has had a go at it.  */
@@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
return true;
  }
  
+/* Convert x = c ? y + z : y or x = c ? y : y + z. */

+
+static bool
+noce_try_cond_zero_arith (struct noce_if_info *if_info)
+{
+  rtx target;
+  rtx_insn *seq;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx z = NULL_RTX;
+  rtx cond = if_info->cond;
+
+  if (!noce_simple_bbs (if_info))
+return false;

[ ... ]
So the internal code we have does a bit of canonicalization before the 
optimizing transformations.  In particular we may be presented with


(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows 
us to pick up more cases.  (b != 0 ? b : a) gets similar handling.
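
As a rough illustration of the sequences and the canonicalization discussed above, here is a C model.  It is only a sketch: the helper names are made up, registers are modelled as uint64_t, and czero.eqz/czero.nez are modelled as "result is zero when the condition register is zero / non-zero, otherwise the source".  It is not code from either implementation.

```c
#include <stdint.h>

/* Zicond semantics modelled in C (hypothetical helper names).  */
uint64_t czero_eqz (uint64_t rs1, uint64_t rc) { return rc == 0 ? 0 : rs1; }
uint64_t czero_nez (uint64_t rs1, uint64_t rc) { return rc != 0 ? 0 : rs1; }

/* rd = (rc == 0) ? (rs1 + rs2) : rs1
   czero.nez rd, rs2, rc ; add rd, rs1, rd  */
uint64_t
cond_add_if_zero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_nez (rs2, rc);
  return rs1 + rd;
}

/* rd = (rc != 0) ? (rs1 + rs2) : rs1
   czero.eqz rd, rs2, rc ; add rd, rs1, rd  */
uint64_t
cond_add_if_nonzero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_eqz (rs2, rc);
  return rs1 + rd;
}

/* The canonicalization: both forms select the same value.  */
uint64_t select_before (uint64_t a, uint64_t b) { return a == 0 ? b : a; }
uint64_t select_after (uint64_t a, uint64_t b) { return a != 0 ? a : b; }
```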


As I mentioned earlier, the VRULL+Ventana code handles wrapping 
extensions & subregs.  Our code also handles if-converting shifts/rotates.


Hopefully that explains a bit more why I think cleaning up the 
VRULL+Ventana code is a better choice.


jeff


Re: [committed][_GLIBCXX_INLINE_VERSION] Fix constract violation

2023-10-30 Thread Jonathan Wakely
On Mon, 30 Oct 2023, 18:31 François Dumont,  wrote:

>
> On 30/10/2023 14:45, Jonathan Wakely wrote:
> > On Sun, 29 Oct 2023 at 21:11, François Dumont 
> wrote:
> >> This fixes handle_contract_violation under versioned namespace mode.
> >>
> >> Tested under Linux x64 and confirmed to also fix Darwin build.
> >>
> >> libstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation
> >> symbol
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >>   * src/experimental/contract.cc
> >>   [_GLIBCXX_INLINE_VERSION](handle_contract_violation): Provide
> >> symbol
> >>   without version namespace decoration for gcc.
> >> +#if _GLIBCXX_INLINE_VERSION
> >> +// Provide symbol without version namespace decoration for gcc.
> > For the comment in the code, I think this would be better:
> >
> > // The compiler expects the contract_violation class to be in an
> unversioned
> > // namespace, so provide a forwarding function with the expected symbol
> name.
> Sure, I'll update it.
> > Do we want the forwarding function to be a weak symbol? The main
> > handler function is weak because we want users to be able to override
> > it with their own handler. But for this new forwarding function, they
> > can't even declare it (because it has a reserved name that doesn't
> > demangle to a valid type for the versioned namespace build).
> >
> Good point, I see no reason either, so I'll remove it.
>

Thanks, looks good for trunk (and gcc-13 maybe?) with that change.

>


Re: [committed][_GLIBCXX_INLINE_VERSION] Add emul TLS symbol exports

2023-10-30 Thread Jonathan Wakely
On Mon, 30 Oct 2023, 18:07 François Dumont,  wrote:

>
> On 30/10/2023 14:58, Jonathan Wakely wrote:
> > On Sun, 29 Oct 2023 at 21:25, François Dumont 
> wrote:
> >> libstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >>   * config/abi/pre/gnu-versioned-namespace.ver: Add missing emul TLS
> >>   symbols.
> >
> > Please put a comment above the two new lines, the same as in gnu.ver:
> >
> > # targets using emutls
> >
> > OK with that change, thanks.
>
> It's already committed as it was a trivial change limited to the
> versioned namespace special mode for which I'm maintainer.
>
> Can you confirm you want a new commit just to add this comment ?
>

Yes please

>


Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-30 Thread Iain Sandoe
Hi Folks

> On 30 Oct 2023, at 16:31, FX Coudert  wrote:
> 
>> +enable_darwin_at_rpath_$1=no
> 
> I actually don’t understand why this one would have $1 in the name, unlike 
> all other regenerated configure files. What value do we expect for $1 at this 
> point in the file? That’s just plain weird.

I’ve committed the missing hunk - at least that should appease CI.

Agreed, it is weird, (actually, I’ve never quite understood why fixincludes 
wants libtool.m4 given that it is host-side and not building any libraries) ..

Iain



Re: [PATCH] Testsuite, i386: Fix test by passing -march

2023-10-30 Thread Andrew Pinski
On Mon, Oct 30, 2023 at 5:05 AM Iain Sandoe  wrote:
>
>
>
> > On 30 Oct 2023, at 11:53, FX Coudert  wrote:
>
> > The newly introduced test gcc.target/i386/pr111698.c currently fails on 
> > Darwin, where the default arch is core2.
> > Andrew suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112287 to 
> > pass a recent value to -march, and I can confirm that it fixes the 
> > testsuite failure on x86_64-apple-darwin21.
> >
> > OK to push?
>
> Fine from a Darwin perspective,
> we could also make it ...
> dg-additional-options “ -march=sandybridge” { target *-*-darwin* }
> … if that is deemed less invasive.

Well, it can fail on x86_64-linux-gnu too if GCC was configured with
--with-arch=core2, for example.
So making it not Darwin-specific in this case would be
beneficial for all x86_64/i?86 targets.

Thanks,
Andrew

>
> Iain
>
>


Re: [PATCH 4/4] [ifcvt] if convert x=c ? y&z : y by RISC-V Zicond like insns

2023-10-30 Thread Jeff Law




On 10/30/23 01:25, Fei Gao wrote:

Conditional and, if zero
rd = (rc == 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.eqz rtmp, rs1, rc
or rd, rd, rtmp

Conditional and, if non-zero
rd = (rc != 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.nez rtmp, rs1, rc
or rd, rd, rtmp

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_cond_zero_binary_op_supported): add support for and
 (noce_try_cond_zero_arith): adapt for and operation.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for and operation.
Our internal bits also capture all these cases.  The allocation is 
slightly different and occasionally operands are swapped in an 
associative operation, but the net is the same.
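
For reference, the conditional-and sequences quoted above can be checked with a small C model.  This is a sketch under the same kind of assumptions as before: registers are uint64_t, the helper names are hypothetical, and czero.eqz/czero.nez yield zero when the condition register is zero / non-zero and the source otherwise.

```c
#include <stdint.h>

/* Zicond semantics modelled in C (hypothetical helper names).  */
uint64_t czero_eqz (uint64_t rs1, uint64_t rc) { return rc == 0 ? 0 : rs1; }
uint64_t czero_nez (uint64_t rs1, uint64_t rc) { return rc != 0 ? 0 : rs1; }

/* rd = (rc == 0) ? (rs1 & rs2) : rs1
   and rd, rs1, rs2 ; czero.eqz rtmp, rs1, rc ; or rd, rd, rtmp  */
uint64_t
cond_and_if_zero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = rs1 & rs2;
  uint64_t rtmp = czero_eqz (rs1, rc);
  return rd | rtmp;
}

/* rd = (rc != 0) ? (rs1 & rs2) : rs1
   and rd, rs1, rs2 ; czero.nez rtmp, rs1, rc ; or rd, rd, rtmp  */
uint64_t
cond_and_if_nonzero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = rs1 & rs2;
  uint64_t rtmp = czero_nez (rs1, rc);
  return rd | rtmp;
}
```

The trick relies on (rs1 & rs2) | rs1 being rs1, so OR-ing the original operand back in undoes the AND on the path where it should not apply.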


Jeff


Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Jeff Law




On 10/30/23 01:25, Fei Gao wrote:

Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
 (noce_try_cond_zero_arith): handler for conditional zero op
 (noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
Just an intermediate follow-up.  As I expected, the work we have 
internally fixes all the cases included in this patch.  Similarly for 
patch #3.


jeff


Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)

2023-10-30 Thread Andre Vieira (lists)

Hi Richi,

Friendly ping on this. I'm going away for two weeks end of this week, so 
I won't be here for end of stage-1, but I'd still very much like to get 
this done for GCC 14.


I don't know if you had a chance to look at this yet when you reviewed 
the other patches or if you maybe just missed it? A quick tl;dr: this 
moves the TARGET_SIMD_CLONE_ADJUST call to after we've vectorized 
the types in simdclones to avoid having to add the extra target hooks to 
change the types.  This required some moving around of the code that 
constructed the adjustments and the code that constructed the array for 
the return value.


Kind regards,
Andre

On 18/10/2023 15:41, Andre Vieira (lists) wrote:
This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the 
arguments and return types have been transformed into vector types.  It 
also constructs the adjustments and retval modifications after this call, 
allowing targets to alter the types of the arguments and return of the 
clone prior to the modifications to the function definition.


Is this OK?

gcc/ChangeLog:

     * omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out
     code to create return array and don't return new type.
     (simd_clone_adjust_argument_types): Hoist out code that creates
     ipa_param_body_adjustments and don't return them.
     (simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return
     and argument types have been vectorized, create adjustments and
     return array after the hook.
     (expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return
     and argument types have been vectorized.

On 04/10/2023 13:40, Andre Vieira (lists) wrote:



On 04/10/2023 11:41, Richard Biener wrote:

On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:




On 30/08/2023 14:04, Richard Biener wrote:

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

This patch adds a new target hook to enable us to adapt the types of return
and parameters of simd clones.  We use this in two ways, the first one is to
make sure we can create valid SVE types, including the SVE type attribute,
when creating a SVE simd clone, even when the target options do not support
SVE.  We are following the same behaviour seen with x86 that creates simd
clones according to the ABI rules when no simdlen is provided, even if that
simdlen is not supported by the current target options.  Note that this
doesn't mean the simd clone will be used in auto-vectorization.


You are not documenting the bool parameter of the new hook.

What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?


simd_clone_adjust_argument_types is called after that hook, so by the time we
call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not vector.  The
same is true for the return type one.

Also the changes to the types need to be taken into consideration in
'adjustments' I think.


Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
relies on this ordering I think, how about moving the hook invocation
after simd_clone_adjust_argument_types?



But that wouldn't change the 'ipa_param_body_adjustments' for when we 
have a function definition and we need to redo the body.

Richard.

PS: I hope the subject line survived, my email client is having a bit of a
wobble this morning... it's what you get for updating software :(


Re: [committed][_GLIBCXX_INLINE_VERSION] Fix constract violation

2023-10-30 Thread François Dumont



On 30/10/2023 14:45, Jonathan Wakely wrote:

On Sun, 29 Oct 2023 at 21:11, François Dumont  wrote:

This fixes handle_contract_violation under versioned namespace mode.

Tested under Linux x64 and confirmed to also fix Darwin build.

libstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation
symbol

libstdc++-v3/ChangeLog:

  * src/experimental/contract.cc
  [_GLIBCXX_INLINE_VERSION](handle_contract_violation): Provide
symbol
  without version namespace decoration for gcc.
+#if _GLIBCXX_INLINE_VERSION
+// Provide symbol without version namespace decoration for gcc.

For the comment in the code, I think this would be better:

// The compiler expects the contract_violation class to be in an unversioned
// namespace, so provide a forwarding function with the expected symbol name.

Sure, I'll update it.

Do we want the forwarding function to be a weak symbol? The main
handler function is weak because we want users to be able to override
it with their own handler. But for this new forwarding function, they
can't even declare it (because it has a reserved name that doesn't
demangle to a valid type for the versioned namespace build).


Good point, I see no reason either, so I'll remove it.



Re: [PATCH] RISC-V: Add vector fmin/fmax expanders.

2023-10-30 Thread Joseph Myers
On Mon, 30 Oct 2023, Robin Dapp wrote:

> Hi,
> 
> this patch adds expanders for fmin and fmax and the associated
> cond and reduc ones.  As per RISC-V V spec 1.0 vfmin/vfmax are
> IEEE 754-2008 compliant so that should be ok.

Aren't they actually the IEEE 754-2019 operations (with different 
signaling NaN semantics; C functions such as fmaximum in C23), not the 
IEEE 754-2008 operations (C functions such as fmax)?  V spec 1.0 says "The 
vector floating-point vfmin and vfmax instructions have the same behavior 
as the corresponding scalar floating-point instructions in version 2.2 of 
the RISC-V F/D/Q extension.".  And version 2.2 of F/D/Q (which is *not* 
version 2.2 of the instruction set, it's later than that) changed the 
scalar instructions to be the IEEE 754-2019 operations (thus, the 
!HONOR_SNANS checks in the back end).
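
To make the distinction concrete, here is a minimal sketch of the IEEE
754-2019 maximumNumber operation (C23 fmaximum_num) for quiet inputs.  The
helper name maximum_number_sketch is hypothetical, and the sketch does not
model the invalid exception raised for signaling NaNs, which is exactly
where the 2008 and 2019 operations differ.  It only shows the NaN-as-missing
rule and the -0 < +0 ordering, which C's fmax is not required to honour.

```c
#include <math.h>

/* Hypothetical helper, not a GCC or libm function: models IEEE 754-2019
   maximumNumber for quiet inputs only.  Signaling-NaN exception
   behaviour is deliberately not modelled.  */
double
maximum_number_sketch (double x, double y)
{
  if (isnan (x))
    return y;                     /* NaN counts as missing data.  */
  if (isnan (y))
    return x;
  if (x == 0.0 && y == 0.0)
    return signbit (x) ? y : x;   /* -0 orders below +0.  */
  return x > y ? x : y;
}
```

The result is NaN only when both inputs are NaN, and a mixed pair of zeros
always yields +0.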

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [committed][_GLIBCXX_INLINE_VERSION] Add emul TLS symbol exports

2023-10-30 Thread François Dumont



On 30/10/2023 14:58, Jonathan Wakely wrote:

On Sun, 29 Oct 2023 at 21:25, François Dumont  wrote:

libstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols

libstdc++-v3/ChangeLog:

  * config/abi/pre/gnu-versioned-namespace.ver: Add missing emul TLS
  symbols.


Please put a comment above the two new lines, the same as in gnu.ver:

# targets using emutls

OK with that change, thanks.


It's already committed as it was a trivial change limited to the 
versioned namespace special mode for which I'm maintainer.


Can you confirm you want a new commit just to add this comment?




Re: [RFC PATCH v1] c: Do not warn about external declaration following inline definition

2023-10-30 Thread Joseph Myers
On Sat, 28 Oct 2023, Barnabás Pőcze wrote:

> An external declaration following an inline definition is not redundant
> because it forces the compiler to emit an external definition for the 
> function.
> That is,
> 
>   inline void f(void) { }
>   [extern] void f(void);
> 
> should not trigger the
> 
>   redundant redeclaration of ...
> 
> warning.

This should add a testcase to the testsuite (that fails before and passes 
after the front-end change is made).
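
A possible starting point for such a test is sketched below.  It only shows
the construct from the patch description; the dg- directives and the warning
option to enable are left out because they are not given in this thread, and
the int return type and the call_f wrapper are assumptions added so the
effect can be observed.

```c
/* Inline definition: on its own, C99 does not make this an external
   definition of f.  */
inline int
f (void)
{
  return 42;
}

/* Not redundant: this declaration turns the definition above into an
   external definition, so the call below can always be resolved.  */
extern int f (void);

/* Hypothetical wrapper used only to exercise f.  */
int
call_f (void)
{
  return f ();
}
```

Without the extern declaration, a C99 build at -O0 may fail to link because
no external definition of f is emitted.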

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-30 Thread Martin Jambor
Hello,

On Thu, Oct 05 2023, Jan Hubicka wrote:
>> gcc/ChangeLog:
>> 
>> 2023-09-19  Martin Jambor  
>> 
>>  PR ipa/57
>>  * ipa-prop.h (struct ipa_argagg_value): New flag killed.
>>  * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
>>  (update_signature): Mark any IPA-CP aggregate constants at
>>  positions known to be killed as killed.  Move check that there is
>>  clone_info after this pruning.
>>  * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
>>  (ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
>>  (push_agg_values_from_plats): Likewise.
>>  (ipa_push_agg_values_from_jfunc): Likewise.
>>  (estimate_local_effects): Likewise.
>>  (push_agg_values_for_index_from_edge): Likewise.
>>  * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
>>  flag.
>>  (read_ipcp_transformation_info): Likewise.
>>  (ipcp_get_aggregate_const): Update comment, assert that encountered
>>  record does not have killed flag set.
>>  (ipcp_transform_function): Prune all aggregate constants with killed
>>  set.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2023-09-18  Martin Jambor  
>> 
>>  PR ipa/57
>>  * gcc.dg/lto/pr57_0.c: New test.
>>  * gcc.dg/lto/pr57_1.c: Second file of the same new test.
>
>> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
>> index c04f9f44c06..a8fcf159259 100644
>> --- a/gcc/ipa-modref.cc
>> +++ b/gcc/ipa-modref.cc
>> @@ -4065,21 +4065,71 @@ remap_kills (vec  &kills, const 
>> vec  &map)
>>i++;
>>  }
>>  
>> +/* Return true if the V can overlap with KILL.  */
>> +
>> +static bool
>> +ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v,
>> +const modref_access_node &kill)
>> +{
>> +  if (kill.parm_index == v.index)
>> +{
>> +  gcc_assert (kill.parm_offset_known);
>> +  gcc_assert (known_eq (kill.max_size, kill.size));
>> +  poly_int64 repl_size;
>> +  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
>> + &repl_size);
>> +  gcc_assert (ok);
>> +  poly_int64 repl_offset (v.unit_offset);
>> +  repl_offset <<= LOG2_BITS_PER_UNIT;
>> +  poly_int64 combined_offset
>> += (kill.parm_offset << LOG2_BITS_PER_UNIT) + kill.offset;
> parm_offset may be negative which I think will confuse 
> ranges_maybe_overlap_p. 
> I think you need to test for this and if it is negative adjust
> repl_offset instead of kill.offset

After a discussion with Honza about this in person, we came to the
conclusion that the patch works as intended even in presence of negative
parm_offsets (I even have a testcase but I need to enhance IPA-CP a bit
in order for it to be useful also outside a debugger).
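
As a rough illustration of the arithmetic only, here is a scalar model of
the overlap check.  Both function names are hypothetical stand-ins: GCC's
ranges_maybe_overlap_p works on poly_int values, and this sketch assumes
8 bits per unit and plain 64-bit integers.  It shows that a negative
parm_offset just produces a smaller combined bit offset, which the
half-open range comparison handles like any other position.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for ranges_maybe_overlap_p.  Returns true
   if [pos1, pos1 + size1) and [pos2, pos2 + size2) share a bit position.
   Sizes are assumed to be positive and the sums are assumed not to
   overflow.  */
bool
ranges_overlap_sketch (int64_t pos1, int64_t size1,
                       int64_t pos2, int64_t size2)
{
  return pos1 < pos2 + size2 && pos2 < pos1 + size1;
}

/* Models combined_offset from the patch: a byte offset scaled to bits
   plus a bit offset.  Multiplication is used instead of a shift because
   left-shifting a negative value is not portable C.  Assumes 8 bits per
   unit.  */
int64_t
combined_bit_offset (int64_t parm_offset_bytes, int64_t offset_bits)
{
  return parm_offset_bytes * 8 + offset_bits;
}
```

For example, a kill with parm_offset -4 and offset 32 lands at bit 0 and so
overlaps a 32-bit replacement value at unit offset 0, while the same kill
with offset 0 covers bits [-32, 0) and does not.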

>> +  if (ranges_maybe_overlap_p (repl_offset, repl_size,
>> +  combined_offset, kill.size))
>> +return true;
>> +}
>> +  return false;
>> +}
>> +
>>  /* If signature changed, update the summary.  */
>>  
>>  static void
>>  update_signature (struct cgraph_node *node)
>>  {
>> -  clone_info *info = clone_info::get (node);
>> -  if (!info || !info->param_adjustments)
>> -return;
>> -
>>modref_summary *r = optimization_summaries
>>? optimization_summaries->get (node) : NULL;
>>modref_summary_lto *r_lto = summaries_lto
>>? summaries_lto->get (node) : NULL;
>>if (!r && !r_lto)
>>  return;
>> +
>> +  ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node);
> Please add comment on why this is necessary.
>> +  if (ipcp_ts)
>> +{
>> +for (auto &v : ipcp_ts->m_agg_values)
>> +  {
>> +if (!v.by_ref)
>> +  continue;
>> +if (r)
>> +  for (const modref_access_node &kill : r->kills)
>> +if (ipcp_argagg_and_kill_overlap_p (v, kill))
>> +  {
>> +v.killed = true;
>> +break;
>> +  }
>> +if (!v.killed && r_lto)
>> +  for (const modref_access_node &kill : r_lto->kills)
>> +if (ipcp_argagg_and_kill_overlap_p (v, kill))
>> +  {
>> +v.killed = 1;
>  = true?
>> +break;
>> +  }
>> +  }
>> +}
>> +
>> +  clone_info *info = clone_info::get (node);
>> +  if (!info || !info->param_adjustments)
>> +return;
>> +
> OK.

This is what I am about to commit.

Thanks,

Martin



PR 57 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.

This patch fixes that by pruning any constants from the list of IPA-CP
aggregate value constants that it knows the contents of the memory can
be "killed."  Unfortunately, doing so is tricky.  First, IPA-modref
loads override kills and so only stores not loaded are truly not
necessary.  Looking stuff up

[x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-10-30 Thread Roger Sayle

This patch is a follow-up to my previous PR target/110551 patch, this
time to address the additional move after mulx, seen on TARGET_BMI2
architectures (such as -march=haswell).  The complication here is
that the flexible multiple-set mulx instruction is introduced into
RTL after reload, by split2, and therefore can't benefit from register
preferencing.  This results in RTL like the following:

(insn 32 31 17 2 (parallel [
(set (reg:DI 4 si [orig:101 r ] [101])
(mult:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
(set (reg:DI 5 di [ r+8 ])
(umul_highpart:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
]) "pr110551-2.c":8:17 -1
 (nil))

(insn 17 32 9 2 (set (reg:DI 0 ax [107])
(reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
 (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
(nil)))

Here insn 32, the mulx instruction, places its results in si and di,
and then immediately after decides to move di to ax, with di now dead.
This can be trivially cleaned up by a peephole2.  I've added an
additional constraint that the two SET_DESTs can't be the same
register to avoid confusing the middle-end, but this has well-defined
behaviour on x86_64/BMI2, encoding a umul_highpart.

For the new test case, compiled on x86_64 with -O2 -march=haswell:

Before:
mulx64: movabsq $-7046029254386353131, %rdx
        mulx    %rdi, %rsi, %rdi
        movq    %rdi, %rax
        xorq    %rsi, %rax
        ret

After:
mulx64: movabsq $-7046029254386353131, %rdx
        mulx    %rdi, %rsi, %rax
        xorq    %rsi, %rax
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-30  Roger Sayle  

gcc/ChangeLog
PR target/110551
* config/i386/i386.md (*bmi2_umul3_1): Tidy condition
as operands[2] with predicate register_operand must be !MEM_P.
(peephole2): Optimize a mulx followed by a register-to-register
move, to place result in the correct destination if possible.

gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eb4121b..a314f1a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9747,13 +9747,37 @@
  (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
(set (match_operand:DWIH 1 "register_operand" "=r")
(umul_highpart:DWIH (match_dup 2) (match_dup 3)))]
-  "TARGET_BMI2
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_BMI2"
   "mulx\t{%3, %0, %1|%1, %0, %3}"
   [(set_attr "type" "imulx")
(set_attr "prefix" "vex")
(set_attr "mode" "")])
 
+;; Tweak *bmi2_umul3_1 to eliminate following mov.
+(define_peephole2
+  [(parallel [(set (match_operand:DWIH 0 "general_reg_operand")
+  (mult:DWIH (match_operand:DWIH 2 "register_operand")
+ (match_operand:DWIH 3 "nonimmediate_operand")))
+ (set (match_operand:DWIH 1 "general_reg_operand")
+  (umul_highpart:DWIH (match_dup 2) (match_dup 3)))])
+   (set (match_operand:DWIH 4 "general_reg_operand")
+   (match_operand:DWIH 5 "general_reg_operand"))]
+  "TARGET_BMI2
+   && ((REGNO (operands[5]) == REGNO (operands[0])
+&& REGNO (operands[1]) != REGNO (operands[4]))
+   || (REGNO (operands[5]) == REGNO (operands[1])
+  && REGNO (operands[0]) != REGNO (operands[4])))
+   && peep2_reg_dead_p (2, operands[5])"
+  [(parallel [(set (match_dup 0) (mult:DWIH (match_dup 2) (match_dup 3)))
+ (set (match_dup 1)
+  (umul_highpart:DWIH (match_dup 2) (match_dup 3)))])]
+{
+  if (REGNO (operands[5]) == REGNO (operands[0]))
+operands[0] = operands[4];
+  else
+operands[1] = operands[4];
+})
+
 (define_insn "*umul3_1"
   [(set (match_operand: 0 "register_operand" "=r,A")
(mult:
diff --git a/gcc/testsuite/gcc.target/i386/pr110551-2.c 
b/gcc/testsuite/gcc.target/i386/pr110551-2.c
new file mode 100644
index 000..4936adf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110551-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -march=haswell" } */
+
+typedef unsigned long long uint64_t;
+
+uint64_t mulx64(uint64_t x)
+{
+__uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
+return (uint64_t)r ^ (uint64_t)( r >> 64 );
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */


[PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-30 Thread Neal Frager
The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.
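
The difference between the two comparisons can be seen in a small sketch.
The wrapper names old_compare and new_compare are hypothetical and only
mirror the two definitions of the macro; strverscmp is a GNU extension, so
the sketch assumes glibc (or another libc that provides it).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* strverscmp is a GNU extension.  */
#endif
#include <string.h>
#include <strings.h>

/* Repeats the libc declaration; harmless if <string.h> already provided
   it, and keeps the sketch compiling if the feature macro came too late.  */
extern int strverscmp (const char *, const char *);

/* Hypothetical wrapper: the old macro body.  Compares character by
   character, so "v10.0" sorts before "v6.00.a" because '1' < '6'.  */
int
old_compare (const char *va, const char *vb)
{
  return strcasecmp (va, vb);
}

/* Hypothetical wrapper: the new macro body.  Compares digit runs as
   numbers, so 10 sorts after 6.  */
int
new_compare (const char *va, const char *vb)
{
  return strverscmp (va, vb);
}
```

With the old comparison, a feature check of the form "version >= v6.00.a"
would fail for v10.0; with the new one it succeeds.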

Signed-off-by: Neal Frager 
---
V1->V2:
 - No need to create a new microblaze specific version check
   routine as strverscmp is the correct solution.
V2->V3:
 - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
 - Added ChangeLog
V4->V5:
 - Added testsuite ChangeLog
V5->V6:
 - Updated testsuite ChangeLog to include all files
---
 gcc/ChangeLog |  4 
 gcc/config/microblaze/microblaze.cc   |  2 +-
 gcc/testsuite/ChangeLog   | 22 +++
 .../gcc.target/microblaze/isa/bshift.c|  2 +-
 gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
 .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
 .../gcc.target/microblaze/isa/float.c |  2 +-
 .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
 .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
 .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
 .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
 .../gcc.target/microblaze/isa/mulh.c  |  2 +-
 .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
 .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
 .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
 .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
 .../gcc.target/microblaze/microblaze.exp  |  2 +-
 22 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4964796c6a6..7f63f39d4cd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2023-10-30  Neal Frager  
+
+   * config/microblaze/microblaze.cc: Fix mcpu version check.
+
 2023-10-29  Martin Uecker  
 
PR tree-optimization/109334
diff --git a/gcc/config/microblaze/microblaze.cc 
b/gcc/config/microblaze/microblaze.cc
index c9f6c4198cf..60ad55120d2 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,7 +56,7 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
 
 /* Classifies an address.
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5c18129b4ac..2f0fc3275ae 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,25 @@
+2023-10-30  Neal Frager  
+
+   * gcc.target/microblaze/isa/bshift.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/div.c: Ditto.
+   * gcc.target/microblaze/isa/fcmp1.c: Ditto.
+   * gcc.target/microblaze/isa/fcmp2.c: Ditto.
+   * gcc.target/microblaze/isa/fcmp3.c: Ditto.
+   * gcc.target/microblaze/isa/fcmp4.c: Ditto.
+   * gcc.target/microblaze/isa/fcvt.c: Ditto.
+   * gcc.target/microblaze/isa/float.c: Ditto.
+   * gcc.target/microblaze/isa/fsqrt.c: Ditto.
+   * gcc.target/microblaze/isa/mul-bshift-pcmp.c: Ditto.
+   * gcc.target/microblaze/isa/mul-bshift.c: Ditto.
+   * gcc.target/microblaze/isa/mul.c: Ditto.
+   * gcc.target/microblaze/isa/mulh-bshift-pcmp.c: Ditto.
+   * gcc.target/microblaze/isa/mulh.c: Ditto.
+   * gcc.target/microblaze/isa/nofcmp.c: Ditto.
+   * gcc.target/microblaze/isa/nofloat.c: Ditto.
+   * gcc.target/microblaze/isa/pcmp.c: Ditto.
+   * gcc.target/microblaze/isa/vanilla.c: Ditto.
+   * gcc.target/microblaze/microblaze.exp: Ditto.
+
 2023-10-29  Iain Buclaw  
 
PR d/110712
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c 
b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
index 64cf1e2e59e..664586bff9f 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mxl-barrel-shift" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mxl-barrel-shift" } */
 
 volatile int m1, m2, m3;
 volatile unsigned int u1, u2, u3;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/div.c 
b/gcc/testsuite/gcc.target/microblaze/isa/div.c
index 25ee42ce5c8..783e7c0f684 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/div.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/div.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mno-xl-soft-div" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mno-xl-soft-div" } */
 
 volatile int m1, m2, m3;
 volatile long l1, l2;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c 
b/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
index 4041a241391..b6202e168d6 100644
--- a/g

[Committed] RISC-V: Make rv32i_zcmp testcase more robust

2023-10-30 Thread Patrick O'Neill



On 10/30/23 09:55, Jeff Law wrote:



On 10/30/23 10:37, Patrick O'Neill wrote:

GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.

OK
jeff


Committed.

Patrick



Re: [PATCH] RISC-V: Make rv32i_zcmp testcase more robust

2023-10-30 Thread Jeff Law




On 10/30/23 10:37, Patrick O'Neill wrote:

GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.

OK
jeff


Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-30 Thread Patrick O'Neill



On 10/29/23 20:21, Vineet Gupta wrote:

RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
"missing definition(s)" or "multiple definition(s)"
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.
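
For orientation, the kind of source this targets is a comparison on
incoming 32-bit arguments, as in the sketch below.  The function name is
hypothetical, and whether a given compiler build emits or drops a sext.w
for it is an assumption that depends on version and flags; the sketch only
shows the shape of the input.

```c
/* Hypothetical example: a and b arrive as 32-bit function arguments, which
   the RV64 ABI already keeps sign-extended in their 64-bit registers, so
   the comparison below needs no further extension of its operands.  */
int
select_smaller (int a, int b)
{
  if (a < b)
    return a;
  return b;
}
```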

Note there are currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in the first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It eliminates the SEXT.W

Tested with rv64gc with no regressions, I'm relying on Patrick's
pre-commit CI to do the full testing.

Testing on the pre-commit CI has completed.
https://github.com/ewlu/gcc-precommit-ci/issues/499#issuecomment-1784446631

The patch was applied to this baseline:
https://github.com/gcc-mirror/gcc/commit/c6929b085580cf00cbc52b0f5b0afe2b9caa2a22

and no new failures or resolved failures were found when running the 
testsuite.


Tested-by: Patrick O'Neill 

Thanks!
Patrick


gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v2:
   - Fix linting issues flagged by pre-commit CI
Changes since v1:
   - Elide sign extension for 32-bit operands only
   - Apply elision for both arguments
---
  gcc/config/riscv/riscv.cc | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d53..269beb3b159b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
  }
  
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.

+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+== GET_MODE_SIZE (word_mode))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
  /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
  
  static void

@@ -3707,9 +3725,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ riscv_sign_extend_if_not_subreg_prom (op0);
+
  if (*op1 != const0_rtx)
-   *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+   riscv_sign_extend_if_not_subreg_prom (op1);
}
  }
  }


Re: [committed] d: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82.

2023-10-30 Thread Rainer Orth
Hi Iain,

> This patch merges the D front-end and runtime library with upstream dmd
> e48bc0987d, and standard library with phobos 2458e8f82.
>
> Synchronizing with the v2.106.0-beta.1 release.
>
> D front-end changes:
>
> - Import dmd v2.106.0-beta.1.

this patch broke D bootstrap, it seems:

/vol/gcc/src/hg/master/local/gcc/d/expr.cc: In member function 'virtual void 
ExprVisitor::visit(NewExp*)':
/vol/gcc/src/hg/master/local/gcc/d/expr.cc:2361:21: error: unused variable 
'tarray' [-Werror=unused-variable]
 2361 | TypeDArray *tarray = tb->isTypeDArray ();
  | ^~

It removed the uses of tarray, but kept the initialization.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] RISC-V: Make rv32i_zcmp testcase more robust

2023-10-30 Thread Patrick O'Neill
GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.
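
To show what the relaxed pattern accepts, here is a small sketch using
POSIX extended regular expressions.  This is an assumption for illustration
only: the testsuite matches function bodies with Tcl regular expressions,
not regcomp, and the helper name push_line_matches is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if LINE contains a cm.push of ra and
   s0 up to any single-digit s register from s1 to s9, with a stack
   adjustment of -80.  Uses POSIX ERE syntax as a stand-in for the Tcl
   regex engine used by the testsuite.  */
bool
push_line_matches (const char *line)
{
  regex_t re;
  if (regcomp (&re, "cm\\.push[[:space:]]+\\{ra, s0-s[1-9]\\}, -80",
               REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Both s0-s4 and s0-s2 match, so a different register allocation no longer
breaks the test, while a two-digit register such as s10 is still rejected.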

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.

Signed-off-by: Patrick O'Neill 
---
Tested using glibc rv64gc on r14-4980-g2672c60917d.
---
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c 
b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
index ea562b7a233..1e1a8be8705 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
@@ -26,9 +26,9 @@ f2 (void);
 /*
 **test1:
 ** ...
-** cm.push {ra, s0-s4}, -80
+** cm.push {ra, s0-s[1-9]}, -80
 ** ...
-** cm.popret   {ra, s0-s4}, 80
+** cm.popret   {ra, s0-s[1-9]}, 80
 ** ...
 */
 int
@@ -50,9 +50,9 @@ test1 ()
 /*
 **test2_step1_0_size:
 ** ...
-** cm.push {ra, s0-s1}, -64
+** cm.push {ra, s0-s[1-9]}, -64
 ** ...
-** cm.popret   {ra, s0-s1}, 64
+** cm.popret   {ra, s0-s[1-9]}, 64
 ** ...
 */
 int
@@ -70,9 +70,9 @@ test2_step1_0_size ()
 /*
 **test3:
 ** ...
-** cm.push {ra, s0-s4}, -80
+** cm.push {ra, s0-s[1-9]}, -80
 ** ...
-** cm.popret   {ra, s0-s4}, 80
+** cm.popret   {ra, s0-s[1-9]}, 80
 ** ...
 */
 float
-- 
2.34.1



Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Jeff Law




On 10/30/23 01:25, Fei Gao wrote:

Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
 (noce_try_cond_zero_arith): handler for conditional zero op
 (noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
So the idea here is to improve upon the current code we generate for 
conditional arithmetic.  Right now we support conditional arithmetic 
using zicond, but the sequence is poor.


Basically the if-converter knows how to generate a conditional add, but 
it does so in a way that isn't as efficient as it could be.


In effect ifcvt wants to generate

t = a + b
res = cond ? t : a


We want to change it to

t = cond ? b : 0;
res = a + t;

The latter sequence expands to more efficient code trivially for risc-v.
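
The equivalence of the two forms can be checked with a short sketch that
follows the patch description's rd = cond ? (rs1 + rs2) : rs1.  The
function names are hypothetical, and unsigned arithmetic is an assumption
made so that the unconditional addition in the first form is well defined
in C for all inputs.

```c
/* Form 1: add first, then select between the sum and the original
   operand.  */
unsigned long
cond_add_select (unsigned long cond, unsigned long a, unsigned long b)
{
  unsigned long t = a + b;
  return cond ? t : a;
}

/* Form 2: conditionally zero b, then add unconditionally.  The
   conditional-zero step is the part that maps onto a single czero
   instruction.  */
unsigned long
cond_add_czero (unsigned long cond, unsigned long a, unsigned long b)
{
  unsigned long t = cond ? b : 0;
  return a + t;
}
```

In the second form the select depends only on b and the condition, so the
final add is the only operation that waits on a.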

I wandered a bit through the combine dumps to see if it would be easy to 
capture this class of cases.  We never get anything useful, and while I 
can imagine "bridge" patterns that would potentially expose enough RTL 
to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.


So going back to ifcvt...

In the first sequence the addition must wait for both "a" and "b" to be 
available and the conditional move can fire on the next cycle.


In the second sequence the conditional move can fire when just "b" is 
available.  So that gives "a" another cycle to become ready (say if it's 
coming from memory or a multi-cycle operation like multiply).


On the other hand the second sequence does keep "a" live longer.

In the end I strongly suspect neither sequence is significantly better 
than the other.  Meaning I don't think we need to conditionalize using 
condzero arith at all.



I'll note that subsequent patches add MINUS, IOR, XOR and AND.  It's 
also possible (and important) to handle shifts.  There's a conditional 
shift-by-6 in leela's hot path.


Overall this looks a lot like the VRULL code, but just less complete.
My inclination is to do a cleanup pass on the VRULL code, verify it
handles all the cases in your tests and commit the VRULL implementation
with your tests.


I'll do some further poking at this today.  Thanks for re-submitting 
these bits.  Getting this target independent work cleaned up has been on 
my TODO for a while now.


jeff


Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-30 Thread FX Coudert
Hi,

> +enable_darwin_at_rpath_$1=no

I actually don’t understand why this one would have $1 in the name, unlike all 
other regenerated configure files. What value do we expect for $1 at this point 
in the file? That’s just plain weird.

FX

[PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-30 Thread Neal Frager
The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.

Signed-off-by: Neal Frager 
---
V1->V2:
 - No need to create a new microblaze specific version check
   routine as strverscmp is the correct solution.
V2->V3:
 - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
 - Added ChangeLog
V4->V5:
 - Added testsuite ChangeLog
V5->V6:
 - Updated testsuite ChangeLog to include all files
---
 gcc/ChangeLog |  4 
 gcc/config/microblaze/microblaze.cc   |  2 +-
 gcc/testsuite/ChangeLog   | 22 +++
 .../gcc.target/microblaze/isa/bshift.c|  2 +-
 gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
 .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
 .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
 .../gcc.target/microblaze/isa/float.c |  2 +-
 .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
 .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
 .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
 .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
 .../gcc.target/microblaze/isa/mulh.c  |  2 +-
 .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
 .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
 .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
 .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
 .../gcc.target/microblaze/microblaze.exp  |  2 +-
 22 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4964796c6a6..7f63f39d4cd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2023-10-30  Neal Frager  
+
+   * config/microblaze/microblaze.cc: Fix mcpu version check.
+
 2023-10-29  Martin Uecker  
 
PR tree-optimization/109334
diff --git a/gcc/config/microblaze/microblaze.cc 
b/gcc/config/microblaze/microblaze.cc
index c9f6c4198cf..60ad55120d2 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,7 +56,7 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
 
 /* Classifies an address.
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5c18129b4ac..9be4942b61d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,25 @@
+2023-10-30  Neal Frager  
+
+   * gcc.target/microblaze/isa/bshift.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/div.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fcmp1.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fcmp2.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fcmp3.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fcmp4.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fcvt.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/float.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/fsqrt.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/mul-bshift-pcmp.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/mul-bshift.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/mul.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/mulh-bshift-pcmp.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/mulh.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/nofcmp.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/nofloat.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/pcmp.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/isa/vanilla.c: Bump to mcpu=v10.0.
+   * gcc.target/microblaze/microblaze.exp: Bump to mcpu=v10.0.
+
 2023-10-29  Iain Buclaw  
 
PR d/110712
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c 
b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
index 64cf1e2e59e..664586bff9f 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mxl-barrel-shift" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mxl-barrel-shift" } */
 
 volatile int m1, m2, m3;
 volatile unsigned int u1, u2, u3;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/div.c 
b/gcc/testsuite/gcc.target/microblaze/isa/div.c
index 25ee42ce5c8..783e7c0f684 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/div.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/div.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mno-xl-soft-div" } */
+/* { dg-options "-O3 -mcpu=v10

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-30 Thread Iain Sandoe
Hi Martin,

> On 30 Oct 2023, at 16:17, Martin Jambor  wrote:

> On Tue, Aug 15 2023, FX Coudert via Gcc-patches wrote:
>> 
> 
> [...]
> 
>> From e1cf04cadb9fa065fb3f7d6bccf9ed6f1e9e3fc1 Mon Sep 17 00:00:00 2001
>> From: Iain Sandoe 
>> Date: Sun, 28 Mar 2021 14:48:17 +0100
>> Subject: [PATCH 2/4] Darwin: Allow for configuring Darwin to use embedded
>> runpath.
> 
> our buildbot checker found that after this patch, there is an
> uncommitted auto(re)conf generated hunk in fixincludes/configure:
> 
> diff --git a/fixincludes/configure b/fixincludes/configure
> index b9770489adc..1bb547a1724 100755
> --- a/fixincludes/configure
> +++ b/fixincludes/configure
> @@ -3027,6 +3027,7 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please don't use this var.
> # ---
> # _LT_COMPILER_PIC
> 
> +enable_darwin_at_rpath_$1=no
> 
> # _LT_LINKER_SHLIBS([TAGNAME])
> # 
> @@ -3049,7 +3050,6 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please don't use this var.
> # the compiler configuration to `libtool'.
> # _LT_LANG_CXX_CONFIG
> 
> -
> # _LT_SYS_HIDDEN_LIBDEPS([TAGNAME])
> # -
> # Figure out "hidden" library dependencies from verbose
> 
> 
> Can I commit it (with an appropriate ChangeLog message) or do you want
> to take care of it yourself?

Sorry for the omission, I’ll take care of it later today, thanks for spotting 
it.
Iain

> 
> Thanks,
> 
> Martin
> 
> 
>> 
>> Recent Darwin versions place constraints on the use of run paths
>> specified in environment variables.  This breaks some assumptions
>> in the GCC build.
>> 
>> This change allows the user to configure a Darwin build to use
>> '@rpath/libraryname.dylib' in library names and then to add an
>> embedded runpath to executables (and libraries with dependents).
>> 
>> The embedded runpath is added by default unless the user adds
>> '-nodefaultrpaths' to the link line.
>> 
>> For an installed compiler, it means that any executable built with
>> that compiler will reference the runtimes installed with the
>> compiler (equivalent to hard-coding the library path into the name
>> of the library).
>> 
>> During build-time configurations  any "-B" entries will be added to
>> the runpath thus the newly-built libraries will be found by exes.
>> 
>> Since the install name is set in libtool, that decision needs to be
>> available here (but might also cause dependent ones in Makefiles,
>> so we need to export a conditional).
>> 
>> This facility is not available for Darwin 8 or earlier, however the
>> existing environment variable runpath does work there.
>> 
>> We default this on for systems where the external DYLD_LIBRARY_PATH
>> does not work and off for Darwin 8 or earlier.  For systems that can
>> use either method, if the value is unset, we use the default (which
>> is currently DYLD_LIBRARY_PATH).
>> 
>> 
>> 
>> Ada changes:
>> add paths relative to @loader-path
>> 
>> JIT changes:
>> 
>> This patch expects DARWIN_RPATH to be computed and available; which
>> means that we will use @rpath or ${libdir} as the name prefix
>> depending on the system version and the setting of
>> --enable-darwin-at-rpath.  For branches that do not have this
>> available, the value should be set to ${libdir}.
>> 
>> added m2 library changes.
>> 
>> ChangeLog:
>> 
>>  * configure: Regenerate.
>>  * configure.ac: Do not add default runpaths to GCC exes
>>  when we are building -static-libstdc++/-static-libgcc (the
>>  default).
>>  * libtool.m4: Add 'enable-darwin-at-runpath'.  Act  on the
>>  enable flag to alter Darwin libraries to use @rpath names.
>> 
>> fixincludes/ChangeLog:
>> 
>>  * configure: Regenerate.
>> 
>> gcc/ChangeLog:
>> 
>>  * aclocal.m4: Regenerate.
>>  * configure: Regenerate.
>>  * configure.ac: Handle Darwin rpaths.
>>  * config/darwin-driver.cc: Handle Darwin rpaths.
>>  * config/darwin.h: Handle Darwin rpaths.
>>  * config/darwin.opt: Handle Darwin rpaths.
>>  * Makefile.in:  Handle Darwin rpaths.
>> 
>> gcc/ada/ChangeLog:
>> 
>>  * gcc-interface/Makefile.in: Handle Darwin rpaths.
>> 
>> gcc/jit/ChangeLog:
>>  * Make-lang.in: Handle Darwin rpaths.
>> 
>> libatomic/ChangeLog:
>> 
>>  * Makefile.am: Handle Darwin rpaths.
>>  * Makefile.in: Regenerate.
>>  * configure: Regenerate.
>>  * configure.ac: Handle Darwin rpaths.
>> 
>> libbacktrace/ChangeLog:
>> 
>>  * configure: Regenerate.
>>  * configure.ac: Handle Darwin rpaths.
>> 
>> libcc1/ChangeLog:
>> 
>>  * configure: Regenerate.
>> 
>> libffi/ChangeLog:
>> 
>>  * Makefile.am: Handle Darwin rpaths.
>>  * Makefile.in: Regenerate.
>>  * configure: Regenerate.
>> 
>> libgcc/ChangeLog:
>> 
>>  * config/t-slibgcc-darwin: Generate libgcc_s
>>  with an @rpath name.
>>  * config.host: Handle Darwin rpaths.
>> 
>> libgfortran/ChangeLog:
>> 
>>  * Makefile.am: Handle Darwin rpaths.
>>  * Makefile.in: Regenerate

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-30 Thread Martin Jambor
Hello Iain,

On Tue, Aug 15 2023, FX Coudert via Gcc-patches wrote:
>

[...]

> From e1cf04cadb9fa065fb3f7d6bccf9ed6f1e9e3fc1 Mon Sep 17 00:00:00 2001
> From: Iain Sandoe 
> Date: Sun, 28 Mar 2021 14:48:17 +0100
> Subject: [PATCH 2/4] Darwin: Allow for configuring Darwin to use embedded
>  runpath.

our buildbot checker found that after this patch, there is an
uncommitted auto(re)conf generated hunk in fixincludes/configure:

diff --git a/fixincludes/configure b/fixincludes/configure
index b9770489adc..1bb547a1724 100755
--- a/fixincludes/configure
+++ b/fixincludes/configure
@@ -3027,6 +3027,7 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please don't use this var.
 # ---
 # _LT_COMPILER_PIC
 
+enable_darwin_at_rpath_$1=no
 
 # _LT_LINKER_SHLIBS([TAGNAME])
 # 
@@ -3049,7 +3050,6 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please don't use this var.
 # the compiler configuration to `libtool'.
 # _LT_LANG_CXX_CONFIG
 
-
 # _LT_SYS_HIDDEN_LIBDEPS([TAGNAME])
 # -
 # Figure out "hidden" library dependencies from verbose


Can I commit it (with an appropriate ChangeLog message) or do you want
to take care of it yourself?

Thanks,

Martin


>
> Recent Darwin versions place constraints on the use of run paths
> specified in environment variables.  This breaks some assumptions
> in the GCC build.
>
> This change allows the user to configure a Darwin build to use
> '@rpath/libraryname.dylib' in library names and then to add an
> embedded runpath to executables (and libraries with dependents).
>
> The embedded runpath is added by default unless the user adds
> '-nodefaultrpaths' to the link line.
>
> For an installed compiler, it means that any executable built with
> that compiler will reference the runtimes installed with the
> compiler (equivalent to hard-coding the library path into the name
> of the library).
>
> During build-time configurations  any "-B" entries will be added to
> the runpath thus the newly-built libraries will be found by exes.
>
> Since the install name is set in libtool, that decision needs to be
> available here (but might also cause dependent ones in Makefiles,
> so we need to export a conditional).
>
> This facility is not available for Darwin 8 or earlier, however the
> existing environment variable runpath does work there.
>
> We default this on for systems where the external DYLD_LIBRARY_PATH
> does not work and off for Darwin 8 or earlier.  For systems that can
> use either method, if the value is unset, we use the default (which
> is currently DYLD_LIBRARY_PATH).
>
> 
>
> Ada changes:
>  add paths relative to @loader-path
>
> JIT changes:
>
> This patch expects DARWIN_RPATH to be computed and available; which
> means that we will use @rpath or ${libdir} as the name prefix
> depending on the system version and the setting of
> --enable-darwin-at-rpath.  For branches that do not have this
> available, the value should be set to ${libdir}.
>
> added m2 library changes.
>
> ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Do not add default runpaths to GCC exes
>   when we are building -static-libstdc++/-static-libgcc (the
>   default).
>   * libtool.m4: Add 'enable-darwin-at-runpath'.  Act  on the
>   enable flag to alter Darwin libraries to use @rpath names.
>
> fixincludes/ChangeLog:
>
>   * configure: Regenerate.
>
> gcc/ChangeLog:
>
>   * aclocal.m4: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>   * config/darwin-driver.cc: Handle Darwin rpaths.
>   * config/darwin.h: Handle Darwin rpaths.
>   * config/darwin.opt: Handle Darwin rpaths.
>   * Makefile.in:  Handle Darwin rpaths.
>
> gcc/ada/ChangeLog:
>
>   * gcc-interface/Makefile.in: Handle Darwin rpaths.
>
> gcc/jit/ChangeLog:
>   * Make-lang.in: Handle Darwin rpaths.
>
> libatomic/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>
> libbacktrace/ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>
> libcc1/ChangeLog:
>
>   * configure: Regenerate.
>
> libffi/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>
> libgcc/ChangeLog:
>
>   * config/t-slibgcc-darwin: Generate libgcc_s
>   with an @rpath name.
>   * config.host: Handle Darwin rpaths.
>
> libgfortran/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths
>
> libgm2/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * aclocal.m4: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>   * libm2cor/Makefile.

Re: [PATCH v5 1/1] gcc: config: microblaze: fix cpu version check

2023-10-30 Thread Michael Eager

On 10/29/23 23:13, Neal Frager wrote:

The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.
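
To illustrate the difference, here is a minimal sketch that is not part of the patch. The helper names `cmp_lexical` and `cmp_version` are made up for this example, and it assumes glibc, where `strverscmp` is a GNU extension that needs `_GNU_SOURCE` defined before the first include:

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension declared in <string.h> */
#include <string.h>
#include <strings.h>  /* strcasecmp */
#include <assert.h>

/* Hypothetical helpers for illustration only; not taken from microblaze.cc.  */

/* Old behaviour: plain case-insensitive character comparison.  */
static int
cmp_lexical (const char *va, const char *vb)
{
  return strcasecmp (va, vb);
}

/* New behaviour: runs of digits are compared as numbers.  */
static int
cmp_version (const char *va, const char *vb)
{
  return strverscmp (va, vb);
}

/* "v10.0" sorts before "v6.00.a" character by character ('1' < '6'),
   but after it once 10 and 6 are compared as numbers.  */
static void
check_version_compare (void)
{
  assert (cmp_lexical ("v10.0", "v6.00.a") < 0);
  assert (cmp_version ("v10.0", "v6.00.a") > 0);
}
```

Calling `check_version_compare ()` from a test driver should pass silently on a glibc system; other C libraries may not provide `strverscmp` at all.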

Signed-off-by: Neal Frager 
---
V1->V2:
  - No need to create a new microblaze specific version check
routine as strverscmp is the correct solution.
V2->V3:
  - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
  - Added ChangeLog
V4->V5:
  - Added testsuite ChangeLog
---
  gcc/ChangeLog  | 4 
  gcc/config/microblaze/microblaze.cc| 2 +-
  gcc/testsuite/ChangeLog| 4 
  gcc/testsuite/gcc.target/microblaze/isa/bshift.c   | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/div.c  | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fcvt.c | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/float.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/fsqrt.c| 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mul-bshift-pcmp.c  | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mul-bshift.c   | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mul.c  | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mulh-bshift-pcmp.c | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mulh.c | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/nofcmp.c   | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/nofloat.c  | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/pcmp.c | 2 +-
  gcc/testsuite/gcc.target/microblaze/isa/vanilla.c  | 2 +-
  gcc/testsuite/gcc.target/microblaze/microblaze.exp | 2 +-
  22 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4964796c6a6..7f63f39d4cd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2023-10-30  Neal Frager  
+
+   * config/microblaze/microblaze.cc: Fix mcpu version check.
+
  2023-10-29  Martin Uecker  
  
  	PR tree-optimization/109334

diff --git a/gcc/config/microblaze/microblaze.cc b/gcc/config/microblaze/microblaze.cc
index c9f6c4198cf..60ad55120d2 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,7 +56,7 @@
  /* This file should be included last.  */
  #include "target-def.h"
  
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
  
  /* Classifies an address.
  
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5c18129b4ac..1d7abcf2584 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2023-10-30  Neal Frager  
+
+   * gcc.target/microblaze: Bump tests to mcpu=v10.0.


Please look at gcc/testsuite/ChangeLog and follow the standard
practice:  List each file modified or added.

For example:

2023-10-23  Pan Li  

* gcc.target/riscv/rvv/base/binop_vv_constraint-1.c: Remove the
vsetvl asm check from func body.
* gcc.target/riscv/rvv/base/binop_vx_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-10.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-11.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-12.c: Ditto.




+
  2023-10-29  Iain Buclaw  
  
  	PR d/110712

diff --git a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
index 64cf1e2e59e..664586bff9f 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mxl-barrel-shift" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mxl-barrel-shift" } */
  
  volatile int m1, m2, m3;
  volatile unsigned int u1, u2, u3;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/div.c b/gcc/testsuite/gcc.target/microblaze/isa/div.c
index 25ee42ce5c8..783e7c0f684 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/div.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/div.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mno-xl-soft-div" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mno-xl-soft-div" } */
  
  volatile int m1, m2, m3;
  volatile long l1, l2;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c b/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
index 4041a241391..b6202e168d6 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mhard-flo

[PATCH] RISC-V: Add vector fmin/fmax expanders.

2023-10-30 Thread Robin Dapp
Hi,

this patch adds expanders for fmin and fmax and the associated
cond and reduc ones.  As per RISC-V V spec 1.0 vfmin/vfmax are
IEEE 754-2008 compliant so that should be ok.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (3): fmax/fmin
expanders.
(cond_): Ditto.
(cond_len_): Ditto.
(reduc_fmax_scal_): Ditto.
(reduc_fmin_scal_): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add fmin/fmax.
* config/riscv/vector-iterators.md (fmin): New UNSPEC.
(UNSPEC_VFMIN): Ditto.
* config/riscv/vector.md (@pred_): Add
UNSPEC insn patterns.
(@pred__scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Remove
-ffast-math.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh-10.c: New test.
---
 gcc/config/riscv/autovec.md   | 72 +++
 gcc/config/riscv/riscv-v.cc   |  2 +
 gcc/config/riscv/vector-iterators.md  |  8 +++
 gcc/config/riscv/vector.md| 43 +++
 .../riscv/rvv/autovec/binop/fmax-1.c  | 24 +++
 .../riscv/rvv/autovec/binop/fmax_run-1.c  | 47 
 .../riscv/rvv/autovec/binop/fmax_zvfh-1.c | 23 ++
 .../riscv/rvv/autovec/binop/fmax_zvfh_run-1.c | 48 +
 .../riscv/rvv/autovec/binop/fmin-1.c  | 10 +++
 .../riscv/rvv/autovec/binop/fmin_run-1.c  |  5 ++
 .../riscv/rvv/autovec/binop/fmin_zvfh-1.c | 10 +++
 .../riscv/rvv/autovec/binop/fmin_zvfh_run-1.c |  5 ++
 .../riscv/rvv/autovec/cond/cond_fmax-1.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax-2.c  |  3 +-
 .../riscv/rvv/autovec/cond/cond_fmax-3.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax-4.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax_run-1.c  |  3 +-
 .../riscv/rvv/autovec/cond/cond_fmax_run-2.c  |  3 +-
 .../riscv/rvv/autovec/cond/cond_fmax_run-3.c  |  3 +-
 .../riscv/rvv/autovec/co

Re: [ARC PATCH] Improve DImode left shift by a single bit.

2023-10-30 Thread Jeff Law




On 10/30/23 09:27, Roger Sayle wrote:


WRT H8.  Bug filed so we don't lose track of it.  We don't have DImode
operations defined on the H8.  First step would be DImode loads/stores
and basic arithmetic.


The H8's machine description is impressively well organized.
Would it make sense to add a doubleword.md, or should DImode
support be added to each of the individual addsub.md, logical.md,
shiftrotate.md etc..?

No strong opinion :-)  Back when I reorganized this stuff I was just
trying to get it more manageable than a single huge .md file --
especially when I knew the port was going to grow 2x larger with the cc0
conversion.




The fact that register-to-register moves clobber some of the flags bits
must also make reload's task very difficult (impossible?).

The clobbers of CC don't show up until after reload.  At least
they're not supposed to show up until then!  If they do, then it's a bug
and a correctness issue waiting to happen.


Given the lack of registers on the H8 we never felt that supporting 
DImode in the target files was a good idea.  Just supporting a 64bit 
destructive add uses 5 of the 6 (or 7 if no frame pointer is needed) 
available registers.


Jeff


RE: [ARC PATCH] Improve DImode left shift by a single bit.

2023-10-30 Thread Roger Sayle
Hi Jeff,
> From: Jeff Law 
> Sent: 30 October 2023 15:09
> Subject: Re: [ARC PATCH] Improve DImode left shift by a single bit.
> 
> On 10/28/23 07:05, Roger Sayle wrote:
> >
> > This patch improves the code generated for X << 1 (and for X + X) when
> > X is 64-bit DImode, using the same two instruction code sequence used
> > for DImode addition.
> >
> > For the test case:
> >
> > long long foo(long long x) { return x << 1; }
> >
> > GCC -O2 currently generates the following code:
> >
> > foo:lsr r2,r0,31
> >  asl_s   r1,r1,1
> >  asl_s   r0,r0,1
> >  j_s.d   [blink]
> >  or_s    r1,r1,r2
> >
> > and on CPU without a barrel shifter, i.e. -mcpu=em
> >
> > foo:add.f   0,r0,r0
> >  asl_s   r1,r1
> >  rlc r2,0
> >  asl_s   r0,r0
> >  j_s.d   [blink]
> >  or_s    r1,r1,r2
> >
> > with this patch (both with and without a barrel shifter):
> >
> > foo:add.f   r0,r0,r0
> >  j_s.d   [blink]
> >  adc r1,r1,r1
> >
> > [For Jeff Law's benefit a similar optimization is also applicable to
> > H8300H, that could also use a two instruction sequence (plus rts) but
> > currently GCC generates 16 instructions (plus an rts) for foo above.]
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64, with no
> > new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> WRT H8.  Bug filed so we don't lose track of it.  We don't have DImode
> operations defined on the H8.  First step would be DImode loads/stores
> and basic arithmetic.

The H8's machine description is impressively well organized.
Would it make sense to add a doubleword.md, or should DImode
support be added to each of the individual addsub.md, logical.md,
shiftrotate.md etc..?

The fact that register-to-register moves clobber some of the flags bits
must also make reload's task very difficult (impossible?).

Cheers,
Roger
--




Re: [PATCH 1/4] [RISC-V]add hook to control Zicond based ifcvt opt

2023-10-30 Thread Jeff Law




On 10/30/23 01:25, Fei Gao wrote:

TARGET_HAVE_COND_ZERO is added to control ifcvt optimization
for targets with RISC-V Zicond like insns.

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_have_cond_zero): Implement
 TARGET_HAVE_COND_ZERO
 (TARGET_HAVE_COND_ZERO): define RISC-V hook
 * doc/tm.texi: add TARGET_HAVE_COND_ZERO
 * doc/tm.texi.in: add TARGET_HAVE_COND_ZERO
 * target.def: define TARGET_HAVE_COND_ZERO

I'd like to avoid doing something like this.  If we must query the
target for information, I think the better choice would be to construct
a conditional zero insn, then see if it's recognized.  I.e., why create a
new target hook when we can ask the question using existing mechanisms?


jeff


[committed][wwwdocs] Uncomment link to "Porting to GCC 14"

2023-10-30 Thread Jonathan Wakely
Pushed to wwwdocs.

-- >8 --

---
 htdocs/gcc-14/changes.html | 2 --
 1 file changed, 2 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 5611fc4f..e5d3970c 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -17,11 +17,9 @@
 
 This page is a "brief" summary of some of the huge number of improvements
 in GCC 14.
-
 
 
 Note: GCC 14 has not been released yet, so this document is
-- 
2.41.0



[committed][wwwdocs] Add "Porting to GCC 14"

2023-10-30 Thread Jonathan Wakely
Pushed to wwwdocs.

-- >8 --

---
 htdocs/gcc-14/porting_to.html | 50 +++
 1 file changed, 50 insertions(+)
 create mode 100644 htdocs/gcc-14/porting_to.html

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
new file mode 100644
index ..dea9ac80
--- /dev/null
+++ b/htdocs/gcc-14/porting_to.html
@@ -0,0 +1,50 @@
+
+
+
+
+
+Porting to GCC 14
+https://gcc.gnu.org/gcc.css";>
+
+
+
+Porting to GCC 14
+
+
+The GCC 14 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 14. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+C++ language issues
+
+Header dependency changes
+Some C++ Standard Library headers have been changed to no longer include
+other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without
+including the right headers will no longer compile.
+
+
+The following headers are used less widely in libstdc++ and may need to
+be included explicitly when compiling with GCC 14:
+
+
+ 
+  (for std::copy_n, std::lower_bound,
+  std::remove, std::reverse,
+  std::sort etc.)
+
+
+
+
+
+
+
-- 
2.41.0



Re: [ARC PATCH] Improve DImode left shift by a single bit.

2023-10-30 Thread Jeff Law




On 10/28/23 07:05, Roger Sayle wrote:


This patch improves the code generated for X << 1 (and for X + X) when
X is 64-bit DImode, using the same two instruction code sequence used
for DImode addition.

For the test case:

long long foo(long long x) { return x << 1; }

GCC -O2 currently generates the following code:

foo:lsr r2,r0,31
 asl_s   r1,r1,1
 asl_s   r0,r0,1
 j_s.d   [blink]
 or_s    r1,r1,r2

and on CPU without a barrel shifter, i.e. -mcpu=em

foo:add.f   0,r0,r0
 asl_s   r1,r1
 rlc r2,0
 asl_s   r0,r0
 j_s.d   [blink]
 or_s    r1,r1,r2

with this patch (both with and without a barrel shifter):

foo:add.f   r0,r0,r0
 j_s.d   [blink]
 adc r1,r1,r1
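
As a rough illustration of why the add/adc pair is enough, the sketch below models a DImode value as two 32-bit halves in C. It is not taken from the patch: the struct and function names are hypothetical, and the mapping of `lo`/`hi` to r0/r1 simply follows the listing above.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model of a DImode value held in two 32-bit registers.  */
struct di_regs
{
  uint32_t lo;  /* r0 in the listing above */
  uint32_t hi;  /* r1 in the listing above */
};

/* Shift left by one bit using an add that produces a carry, followed by
   an add-with-carry, mirroring the add.f/adc pair.  */
static struct di_regs
shl1_add_adc (struct di_regs x)
{
  struct di_regs r;
  r.lo = x.lo + x.lo;            /* add.f r0,r0,r0 */
  uint32_t carry = r.lo < x.lo;  /* carry-out of the low-word addition */
  r.hi = x.hi + x.hi + carry;    /* adc r1,r1,r1 */
  return r;
}

/* Convenience wrapper so the model can be compared with a plain 64-bit shift.  */
static uint64_t
shl1_model (uint64_t x)
{
  struct di_regs in = { (uint32_t) x, (uint32_t) (x >> 32) };
  struct di_regs out = shl1_add_adc (in);
  return ((uint64_t) out.hi << 32) | out.lo;
}

static void
check_shl1_model (void)
{
  assert (shl1_model (0x0000000080000000ull) == 0x0000000100000000ull);
  assert (shl1_model (0xffffffffffffffffull) == 0xfffffffffffffffeull);
  assert (shl1_model (0x0123456789abcdefull) == (0x0123456789abcdefull << 1));
}
```

The carry test works because the 32-bit sum wraps below the original low word exactly when bit 31 of that word was set.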

[For Jeff Law's benefit a similar optimization is also applicable to
H8300H, that could also use a two instruction sequence (plus rts) but
currently GCC generates 16 instructions (plus an rts) for foo above.]

Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?

WRT H8.  Bug filed so we don't lose track of it.  We don't have DImode
operations defined on the H8.  First step would be DImode loads/stores
and basic arithmetic.


Jeff


[PATCH V2] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-10-30 Thread Victor Do Nascimento
Correct the CV-qualification, which was erroneously applied to the `addr'
pointer itself, so that it applies to the pointer's target instead, as
specified by the ACLE standard.

---

Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
   /*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  ---
  void __plix (/*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.

`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' do prefetch of data and instructions,
respectively, using default values for both cache-level and retention
policies.
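
For comparison, the sketch below uses the existing generic `__builtin_prefetch' with its optional read/write and locality arguments. It is only an illustration and not part of this patch: the function names and the prefetch distance of 16 elements are arbitrary assumptions, and the new ACLE intrinsics are deliberately not used since they require an AArch64 compiler with this patch applied.

```c
#include <stddef.h>
#include <assert.h>

/* Sum an array while hinting upcoming elements into the cache.
   The distance of 16 elements is an arbitrary illustrative choice.  */
static long
sum_with_prefetch (const int *data, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Second argument 0 = prefetch for read; third argument 3 = highest
	 temporal locality.  Both must be compile-time constants.  */
      if (i + 16 < n)
	__builtin_prefetch (&data[i + 16], 0, 3);
      total += data[i];
    }
  return total;
}

static void
check_sum_with_prefetch (void)
{
  int values[64];
  for (int i = 0; i < 64; i++)
    values[i] = i;
  /* 0 + 1 + ... + 63 = 2016.  */
  assert (sum_with_prefetch (values, 64) == 2016);
}
```

A prefetch is only a hint, so the result is the same whether or not the target emits a prefetch instruction for it.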

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin):  Add prefetch expansion.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_pld_pli.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 161 ++
 gcc/config/aarch64/aarch64.md |  12 ++
 gcc/config/aarch64/arm_acle.h |  30 
 .../gcc.target/aarch64/builtin_pldx.c |  90 ++
 4 files changed, 293 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pldx.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..27a4c87b300 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,10 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  AARCH64_PLD,
+  AARCH64_PLDX,
+  AARCH64_PLI,
+  AARCH64_PLIX,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for data and instruction prefetch.  */
+static void
+aarch64_init_prefetch_builtin (void)
+{
+#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)                        \
+  aarch64_builtin_decls[INDEX] =   \
+aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+
+  tree ftype;
+  tree cv_argtype;
+  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
+| TYPE_QUAL_VOLATILE);
+  cv_argtype = build_pointer_type (cv_argtype);
+
+  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, unsigned_type_node,
+   cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
+}
+
 /* Initialize the memory tagging extension (MTE) builtins.  */
 struct
 {
@@ -2019,6 +2051,8 @@ aarch64_general_init_builtins (void)
   aarch64_init_rng_builtins ();
   aarch64_init_data_intrinsics ();
 
+  aarch64_init_prefetch_builtin ();
+
   tree ftype_jcvt
 = build_function_type_list (intSI_type_node, double_type_node, NULL);
   aarch64_builtin_decls[AARCH64_JSCVT]
@@ -2599,6 +2633,127 @@ aarch64_expand_rng_builtin (tree exp, rtx target, int fcode, int ignore)
   return target;
 }
 
+/* Expand a prefetch

Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

2023-10-30 Thread Jeff Law




On 10/29/23 03:16, Roger Sayle wrote:


This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.

struct S { int a : 5; };
unsigned int foo (struct S *p) {
   return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo:ldb_s   r0,[r0]
 asl_s   r0,r0,27
 j_s.d   [blink]
 asr_s   r0,r0,27

What's interesting is that during combine, the middle-end actually
has two shifts by three bits, and a sign-extension from QI to SI.

Trying 8, 9 -> 11:
 8: r158:SI=r157:QI#0<<0x3
   REG_DEAD r157:QI
 9: r159:SI=sign_extend(r158:SI#0)
   REG_DEAD r158:SI
11: r155:SI=r159:SI>>0x3
   REG_DEAD r159:SI
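
The two forms can be compared with a small C sketch. It is not from the patch: the function names are hypothetical, it assumes the 5-bit field sits in the low bits of the loaded byte (as the shift by 27 above suggests), and it relies on GCC's behaviour for narrowing conversions and arithmetic right shifts of negative values.

```c
#include <stdint.h>
#include <assert.h>

/* Wide form: shift left by 27, then arithmetic shift right by 27, in 32 bits.
   Relies on GCC semantics for the unsigned-to-signed conversion and for
   right-shifting a negative value.  */
static int32_t
extract_wide (uint8_t byte)
{
  uint32_t shifted = (uint32_t) byte << 27;
  return (int32_t) shifted >> 27;
}

/* Narrow form: shift left by 3 in 8 bits, sign-extend to 32 bits,
   then arithmetic shift right by 3.  */
static int32_t
extract_narrow (uint8_t byte)
{
  uint8_t shifted = (uint8_t) (byte << 3);
  int32_t extended = (int8_t) shifted;
  return extended >> 3;
}

/* Both forms should agree for every possible byte value.  */
static void
check_extract_forms (void)
{
  for (int b = 0; b < 256; b++)
    assert (extract_wide ((uint8_t) b) == extract_narrow ((uint8_t) b));
}
```

Both functions return the signed value of the low five bits, so the two sequences differ only in how many bit positions have to be shifted.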

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops.  This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.

Same scenario on the H8, though we already had the cost issues under
control: a byte load (effectively a shift by 24), 3-bit shifts and an extension.


Jeff
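
For readers following along, the two sequences discussed above can be compared in plain C. This is only a sketch: the function names are made up for illustration, and it relies on GCC's documented behaviour that narrowing conversions to signed types wrap and that >> on a negative signed value is an arithmetic shift. It checks that "two shifts by 27" and "shift by 3, sign-extend the byte, shift by 3" extract the same signed 5-bit field from the loaded byte.

```c
#include <assert.h>
#include <stdint.h>

/* Barrel-shifter form: asl by 27 followed by asr by 27.
   Assumes GCC semantics for the unsigned->signed conversion and for
   right-shifting a negative value (arithmetic shift).  */
static int32_t field_by_long_shifts (uint8_t byte)
{
  uint32_t up = (uint32_t) byte << 27;   /* move bit 4 up to bit 31 */
  return (int32_t) up >> 27;             /* shift back, copying the sign */
}

/* Form preferred without a barrel shifter: shift left by 3 within the
   byte, sign-extend QI->SI, then shift right by 3.  */
static int32_t field_by_short_shifts (uint8_t byte)
{
  uint8_t up = (uint8_t) (byte << 3);    /* move bit 4 up to bit 7 */
  int32_t ext = (int8_t) up;             /* sign-extend the byte */
  return ext >> 3;                       /* three single-bit shifts suffice */
}
```

Both helpers ignore the upper three bits of the byte, which matches a 5-bit bit-field stored in the low bits. The second form only ever shifts by 3, which is why it is so much cheaper when each shift is a loop or a run of single-bit instructions.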


Re: [PATCH] Testsuite, i386: Mark test as requiring ifunc

2023-10-30 Thread Richard Biener
On Mon, Oct 30, 2023 at 3:43 PM FX Coudert  wrote:
>
> Hi,
>
> The test is currently failing on x86_64-apple-darwin.
> Marking the test as requiring ifunc fixes the issue.
>
> OK to push?

OK.

>
> FX
>


[PATCH] Testsuite, i386: Mark test as requiring ifunc

2023-10-30 Thread FX Coudert
Hi,

The test is currently failing on x86_64-apple-darwin.
Marking the test as requiring ifunc fixes the issue.

OK to push?

FX



0001-Testsuite-i386-Mark-test-as-requiring-ifunc.patch
Description: Binary data


[PATCH] Testsuite, i386: Mark test as requiring dfp

2023-10-30 Thread FX Coudert
Hi,

The test is currently failing on x86_64-apple-darwin with "decimal 
floating-point not supported for this target".
Marking the test as requiring dfp fixes the issue.

OK to push?

FX



0001-Testsuite-i386-Mark-test-as-requiring-dfp.patch
Description: Binary data


Re: [committed][_GLIBCXX_INLINE_VERSION] Add emul TLS symbol exports

2023-10-30 Thread Jonathan Wakely
On Sun, 29 Oct 2023 at 21:25, François Dumont  wrote:
>
> libstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols
>
> libstdc++-v3/ChangeLog:
>
>  * config/abi/pre/gnu-versioned-namespace.ver: Add missing emul TLS
>  symbols.


Please put a comment above the two new lines, the same as in gnu.ver:

# targets using emutls

OK with that change, thanks.



Re: [PATCH] Testsuite, Darwin: Fix trampoline warning

2023-10-30 Thread Iain Sandoe
Hi FX,

> On 30 Oct 2023, at 13:50, FX Coudert  wrote:
> 
> Heap-based trampolines are enabled on darwin20 and later, meaning that no 
> warning is emitted.
> Fixes the test failure on x86_64-apple-darwin21
> 
> OK to push?

OK, thanks
Iain




Re: [ARC PATCH] Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Do you want to say bmsk_s instead of msk_s here:
+/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */

Anyhow, the patch looks good. Proceed with your commit.

Thank you,
Claudiu

On Mon, Oct 30, 2023 at 5:05 AM Jeff Law  wrote:
>
>
>
> On 10/28/23 10:47, Roger Sayle wrote:
> >
> > This patch optimizes PR middle-end/101955 for the ARC backend.  On ARC
> > CPUs with a barrel shifter, using two shifts is (probably) optimal as:
> >
> >  asl_s   r0,r0,31
> >  asr_s   r0,r0,31
> >
> > but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
> >
> >  and r2,r0,1
> >  ror r2,r2
> >  add.f   0,r2,r2
> >  sbc r0,r0,r0
> >
> > with this patch, we now generate the smaller, faster and non-flags
> > clobbering:
> >
> >  bmsk_s  r0,r0,0
> >  neg_s   r0,r0
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64,
> > with no new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> >
> >
> > 2023-10-28  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR middle-end/101955
> >  * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
> >  to convert sign extract of the least significant bit into an
> >  AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
> >
> > gcc/testsuite/ChangeLog
> >  PR middle-end/101955
> >  * gcc.target/arc/pr101955.c: New test case.
> Good catch.  Looking to do something very similar on the H8 based on
> your work here.
>
> On the H8 we can use bld to load a bit from an 8 bit register into the
> C flag.  Then we use subtract with carry to get an 8 bit 0/-1 which we
> can then sign extend to 16 or 32 bits.  That covers bit positions 0..15
> of an SImode input.
>
> For bits 16..31 we can move the high half into the low half, then use the
> bld sequence.
>
> For bit zero the and+neg is the same number of clocks and size as bld
> based sequence.  But it'll simulate faster, so it's special cased.
>
>
> Jeff
>
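
The identity behind this patch, (signed<<31)>>31 == -(signed&1), can be checked in a few lines of C. This is a sketch only, with illustrative function names, and it again assumes GCC's documented wrapping conversion and arithmetic right shift for signed values. The last helper generalises the idea to an arbitrary bit position, roughly in the spirit of the H8 bld/subtract-with-carry sequence described above, without claiming to model that sequence exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Two-shift form: asl by 31, then asr by 31.  */
static int32_t lsb_mask_by_shifts (int32_t x)
{
  uint32_t up = (uint32_t) x << 31;  /* bit 0 becomes the sign bit */
  return (int32_t) up >> 31;         /* arithmetic shift yields 0 or -1 */
}

/* AND-then-negate form: bmsk (keep bit 0) followed by neg.  */
static int32_t lsb_mask_by_and_neg (int32_t x)
{
  return -(x & 1);
}

/* Generalisation: turn bit N of X into 0 or -1.  */
static int32_t bit_mask (int32_t x, unsigned n)
{
  assert (n < 32);
  return -(int32_t) (((uint32_t) x >> n) & 1u);
}
```

The AND/negate form needs no shifter at all and leaves the flags alone, which is the property the patch description highlights.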


[PATCH] Testsuite, Darwin: Fix trampoline warning

2023-10-30 Thread FX Coudert
Heap-based trampolines are enabled on darwin20 and later, meaning that no 
warning is emitted.
Fixes the test failure on x86_64-apple-darwin21

OK to push?
FX



0001-Testsuite-Darwin-Fix-trampoline-warning.patch
Description: Binary data


Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

You have a block of 8 spaces that needs to be replaced by tabs:
gcc/config/arc/arc.cc:5538:0:   if (n < 4)

Please fix the above, and proceed with your commit.

Thank you,
Claudiu

On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle  wrote:
>
>
> This patch overhauls the ARC backend's insn_cost target hook, and makes
> some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
> goal is to allow the backend to indicate that shifts and rotates are
> slow (discouraged) when the CPU doesn't have a barrel shifter. I should
> also acknowledge Richard Sandiford for inspiring the use of set_cost
> in this rewrite of arc_insn_cost; this implementation borrows heavily
> from the target hooks for AArch64 and ARM.
>
> The motivating example is derived from PR rtl-optimization/110717.
>
> struct S { int a : 5; };
> unsigned int foo (struct S *p) {
>   return p->a;
> }
>
> With a barrel shifter, GCC -O2 generates the reasonable:
>
> foo:ldb_s   r0,[r0]
> asl_s   r0,r0,27
> j_s.d   [blink]
> asr_s   r0,r0,27
>
> What's interesting is that during combine, the middle-end actually
> has two shifts by three bits, and a sign-extension from QI to SI.
>
> Trying 8, 9 -> 11:
> 8: r158:SI=r157:QI#0<<0x3
>   REG_DEAD r157:QI
> 9: r159:SI=sign_extend(r158:SI#0)
>   REG_DEAD r158:SI
>11: r155:SI=r159:SI>>0x3
>   REG_DEAD r159:SI
>
> Whilst it's reasonable to simplify this to two shifts by 27 bits when
> the CPU has a barrel shifter, it's actually a significant pessimization
> when these shifts are implemented by loops.  This combination can be
> prevented if the backend provides accurate-ish estimates for insn_cost.
>
>
> Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov lp_count,27
> lp  2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov lp_count,27
> lp  2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
> which contains two loops and requires about ~113 cycles to execute.
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
>
> foo:ldb_s   r0,[r0]
> mov_s   r2,0;3
> add3r0,r2,r0
> sexb_s  r0,r0
> asr_s   r0,r0
> asr_s   r0,r0
> j_s.d   [blink]
> asr_s   r0,r0
>
> which requires only ~6 cycles, for the shorter shifts by 3 and sign
> extension.
>
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
>
> 2023-10-29  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
> Provide reasonable values for SHIFTS and ROTATES by constant
> bit counts depending upon TARGET_BARREL_SHIFTER.
> (arc_insn_cost): Use insn attributes if the instruction is
> recognized.  Avoid calling get_attr_length for type "multi",
> i.e. define_insn_and_split patterns without explicit type.
> Fall-back to set_rtx_cost for single_set and pattern_cost
> otherwise.
> * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
> (BRANCH_COST): Improve/correct definition.
> (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
>
>
> Thanks again,
> Roger
> --
>


Re: [committed][_GLIBCXX_INLINE_VERSION] Fix constract violation

2023-10-30 Thread Jonathan Wakely
On Sun, 29 Oct 2023 at 21:11, François Dumont  wrote:
>
> This fixes handle_contract_violation under versioned namespace mode.
>
> Tested under Linux x64 and confirmed to also fix Darwin build.
>
> libstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation
> symbol
>
> libstdc++-v3/ChangeLog:
>
>  * src/experimental/contract.cc
>  [_GLIBCXX_INLINE_VERSION](handle_contract_violation): Provide
> symbol
>  without version namespace decoration for gcc.

>+#if _GLIBCXX_INLINE_VERSION
>+// Provide symbol without version namespace decoration for gcc.

For the comment in the code, I think this would be better:

// The compiler expects the contract_violation class to be in an unversioned
// namespace, so provide a forwarding function with the expected symbol name.

Do we want the forwarding function to be a weak symbol? The main
handler function is weak because we want users to be able to override
it with their own handler. But for this new forwarding function, they
can't even declare it (because it has a reserved name that doesn't
demangle to a valid type for the versioned namespace build).



[PING][PATCH RFA] PR target/111815: VAX: Only accept the index scaler as the RHS operand to ASHIFT

2023-10-30 Thread Maciej W. Rozycki
On Mon, 16 Oct 2023, Maciej W. Rozycki wrote:

>  The testcase is generic enough I thought it wouldn't hurt to place it in 
> a generic part of the testsuite, where it has been verified to pass with 
> the `powerpc64le-linux-gnu', `riscv64-linux-gnu', and `vax-netbsdelf' 
> targets.  I'm fine to move it to the VAX part of the testsuite though if 
> there's disagreement as to my choice.  Otherwise OK to apply for this 
> part?

 Ping for: 
.

  Maciej


Re: [ARC PATCH] Improved SImode shifts and rotates with -mswap.

2023-10-30 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

+(define_insn "si2_cnt16"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=w")

Please use "register_operand", and "r" constraint.

+(ANY_ROTATE:SI (match_operand:SI 1 "register_operand" "c")

Please use "r" constraint instead of "c".

+   (const_int 16)))]
+  "TARGET_SWAP"
+  "swap\\t%0,%1"

Otherwise, it looks good to me. Please fix the above and proceed with
your commit.

Thank you for your contribution,
Claudiu


Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal



On 30/10/23 5:51 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> On 17/10/23 2:47 pm, Richard Biener wrote:
>> On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal  wrote:
>>>
>>> Hello Richard:
>>>
>>> On 17/10/23 2:03 pm, Richard Biener wrote:
 On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal  
 wrote:
>
> This patch improves code sinking pass to sink statements before call to 
> reduce
> register pressure.
> Review comments are incorporated. Synced and modified with latest trunk 
> sources.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code after function calls.  This 
> increases
> register pressure for callee-saved registers.  The following patch 
> improves
> code sinking by placing the sunk code before calls in the use block or in
> the immediate dominator of the use blocks.

 The patch no longer does what the description above says.
>>> Why you think so. Please let me know.
>>
>> You talk about calls above but the patch doesn't do anything about calls.  
>> You
>> also don't do anything about register pressure, rather the effect of
>> your changes
>> are to move some stmts by a smaller "distance", whatever effect that has.
>>

> 
> I have incorporated the changes in version 11 of the patch.
 More comments below.

> 2023-10-12  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Move statements 
> before
> calls.
> (select_best_block): Add heuristics to select the best blocks in 
> the
> immediate post dominator.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-20.c: New test.
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
>  gcc/tree-ssa-sink.cc| 39 -
>  3 files changed, 56 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index a360c5cdd6e..95298bc8402 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p 
> def_p, bool *debug_stmts)
>
>  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
> tree, return the best basic block between them (inclusive) to place
> -   statements.
>

[PATCH] Assert we don't create recursive DW_AT_abstract_origin

2023-10-30 Thread Richard Biener
We have a support case that shows GCC 7 sometimes creates
DW_TAG_label referring to itself via a DW_AT_abstract_origin
when using LTO.  This for example triggers the sanity check
added below during LTO bootstrap.

Making this check cover more than just DW_AT_abstract_origin
breaks bootstrap on trunk for

  /* GNU extension: Record what type our vtable lives in.  */
  if (TYPE_VFIELD (type))
{
  tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));

  gen_type_die (vtype, context_die);
  add_AT_die_ref (type_die, DW_AT_containing_type,
  lookup_type_die (vtype));

so the check is for now restricted to DW_AT_abstract_origin.

Bootstrapped on x86_64-unknown-linux-gnu, OK?

My workaround for the GCC 7 problem is

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 5590845d2a4..07185a1a0d3 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23030,7 +23031,7 @@ gen_label_die (tree decl, dw_die_ref context_die)
   lbl_die = new_die (DW_TAG_label, context_die, decl);
   equate_decl_number_to_die (decl, lbl_die);
 
-  if (origin != NULL)
+  if (origin != NULL && origin != decl)
add_abstract_origin_attribute (lbl_die, origin);
   else
add_name_and_src_coords_attributes (lbl_die, decl);

that's not needed on trunk because there we don't end up
with LABEL_DECLs with self-DECL_ABSTRACT_ORIGIN (and not DECL_ABSTRACT).

Thanks,
Richard.

* dwarf2out.cc (add_AT_die_ref): Assert we do not add
a self-ref DW_AT_abstract_origin.
---
 gcc/dwarf2out.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 1e0cec66c5e..0070a9e8412 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -4908,6 +4908,7 @@ add_AT_die_ref (dw_die_ref die, enum dwarf_attribute 
attr_kind, dw_die_ref targ_
 {
   dw_attr_node attr;
   gcc_checking_assert (targ_die != NULL);
+  gcc_assert (targ_die != die || attr_kind != DW_AT_abstract_origin);
 
   /* With LTO we can end up trying to reference something we didn't create
  a DIE for.  Avoid crashing later on a NULL referenced DIE.  */
-- 
2.35.3


Re: [wwwdocs] Get newlib via git in simtest-howto.html

2023-10-30 Thread Richard Biener
On Fri, Oct 27, 2023 at 6:39 PM Roger Sayle  wrote:
>
>
> A minor tweak to the documentation, to use git rather than cvs to obtain
> the latest version of newlib.  Ok for mainline?

OK

>
> 2023-10-27  Roger Sayle  
>
> * htdocs/simtest-howto.html: Use git to obtain newlib.
>
> Cheers,
> Roger
> --
>


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2023-10-30 Thread Richard Biener
On Fri, Oct 27, 2023 at 4:28 PM Thomas Schwinge  wrote:
>
> Hi!
>
> Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
> 2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958)
> "Move const_parm trick to generic code"; 'gcc/tree.h':
>
> /* Nonzero if this SSA_NAME is known to point to memory that may not
>be written to.  This is set for default defs of function parameters
>that have a corresponding r or R specification in the functions
>fn spec attribute.  This is used by alias analysis.  */
> #define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \
> SSA_NAME_CHECK (NODE)->base.deprecated_flag
>
> ..., may I ask you to please help review the following patch
> (full-quoted)?
>
> For context: this patch here ("second patch") depends on a first patch:
> 
> "[PATCH, OpenACC 2.7] readonly modifier support in front-ends".  That one
> is still under review/rework; so you're not able to apply this second
> patch here.
>
> In a nutshell: a 'readonly' modifier has been added to the OpenACC
> 'copyin' clause (copy host to device memory, don't copy back at end of
> region):
>
> | If the optional 'readonly' modifier appears, then the implementation may 
> assume that the data
> | referenced by _var-list_ is never written to within the applicable region.
>
> That is, for example (untested):
>
> #pragma acc routine
> void escape(int *);
>
> int x[32] = [...];
> #pragma acc parallel copyin(readonly: x)
> {
>   int a1 = x[3];
>   escape(x);
>   int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here.
>   //x[22] = 0; // Invalid -- but no diagnostic mandated.
> }
>
> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> flag.
>
> The actual optimization then is done in this second patch.  Chung-Lin
> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> I don't have much experience with most of the following generic code, so
> would appreciate a helping hand, whether that conceptually makes sense as
> well as from the implementation point of view:

No, I don't think you can use that flag on non-default-defs, nor
preserve it on copying.  So
it also doesn't nicely extend to DECLs as done by the patch.  We
currently _only_ use it
for incoming parameters.  When used on arbitrary code you can get to for example

ptr1(points-to-readonly-memory) = &p->x;
... access via ptr1 ...
ptr2 = &p->x;
... access via ptr2 ...

where both are in OMP regions that are constrained differently (the constraint is on the
code in the region, _not_ on the actual protection of the pointed-to
data, much like
for the Fortran case).  But now CSE comes along and happily replaces all ptr2
with ptr1 in the second region and ... oops!

So no, re-using SSA_NAME_POINTS_TO_READONLY_MEMORY doesn't look good.

Richard.
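
The hazard described above can be shown with a deliberately naive toy in C. This is not GCC's value numbering and every name in it (toy_def, toy_cse, toy_access_assumed_readonly) is invented for the illustration. It models two pointer definitions computing the same address, only the first of which carries the read-only flag, and a CSE step that redirects the second definition to the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical SSA-like definition: a name, the expression it computes,
   a per-definition "points to read-only memory" flag, and the index of
   the definition that CSE chose to represent it.  */
struct toy_def
{
  const char *name;
  const char *expr;
  bool points_to_readonly;
  int leader;
};

/* Naive CSE: every definition is replaced by the earliest definition
   that computes the same expression.  Flags are not consulted.  */
static void toy_cse (struct toy_def *defs, int n)
{
  for (int i = 0; i < n; i++)
    {
      defs[i].leader = i;
      for (int j = 0; j < i; j++)
        if (strcmp (defs[j].expr, defs[i].expr) == 0)
          {
            defs[i].leader = j;
            break;
          }
    }
}

/* After CSE, an access through DEFS[I] really goes through its leader,
   so it inherits the leader's read-only assumption.  */
static bool toy_access_assumed_readonly (const struct toy_def *defs, int i)
{
  return defs[defs[i].leader].points_to_readonly;
}
```

With ptr1 flagged and ptr2 not, running toy_cse makes accesses via ptr2 look read-only too, even though the second region never promised that. That is the "oops" in the message above, in miniature.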

> On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches 
>  wrote:
> > On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> >> As we discussed earlier, the work for actually linking this to middle-end
> >> points-to analysis is a somewhat non-trivial issue. This first patch allows
> >> the language feature to be used in OpenACC directives first (with no 
> >> effect for now).
> >> The middle-end changes are probably going to be a later patch.
> >
> > This second patch tries to link the readonly modifier to points-to analysis.
> >
> > There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and its support in 
> > the
> > alias oracle routines in tree-ssa-alias.cc, so basically what this patch 
> > does is
> > try to make the variables holding the array section base pointers to have 
> > this
> > flag set.
> >
> > There is another OMP_CLAUSE_MAP_POINTS_TO_READONLY set by front-ends on 
> > the
> > associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
> > Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the 
> > tmp
> > vars carrying these receiver references on the offloaded side. These
> > eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.
>
>
> > This still doesn't always work as expected in terms of optimization:
> > struct pointer fields and Fortran arrays (kind of like C structs) which have
> > several accesses to create the pointer access on the receive/offloaded side,
> > and SRA appears to not work on these sequences, so gets in the way of much
> > redundancy elimination.
>
> I understand correctly that this is left as future work?  Please add the test
> cases you have, XFAILed in some reasonable way.
>
>
> > Currently have one testcase where we can demonstrate 'readonly' can avoid
> > a clobber by function call.
>
> :-)
>
>
> > --- a/gcc/c/c-typeck.cc
> > +++ b/gcc/c/c-typeck.cc
> > @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum 
> > c_omp_region_type ort)
> >   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
> >else

Re: [PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread chenglulu



On 2023/10/30 at 8:26 PM, Xi Ruoyao wrote:

On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:

On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:

Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
building a cross compiler if the cross assembler is not installed yet.

gcc/ChangeLog:

    * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
    if not defined yet.
---

Ok for trunk?

I have no problem with this submission, but I don't understand the
circumstances surrounding the error.

When the developers hack GCC they sometimes build a cross compiler with
no cross assembler, then HAVE_AS_TLS will just be undefined.  And in the
future we may have an assembler w/o TLS support (for example a tiny
assembler for bare-metal target), then HAVE_AS_TLS will be undefined
too.


Ok!

Thanks!



The error message is:

g++ -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. 
-Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include  \
-o build/gencondmd.o build/gencondmd.cc
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
  3655 |   "HAVE_AS_TLS"
   |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
  3655 |   "HAVE_AS_TLS"
   |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
  3655 |   "HAVE_AS_TLS"
   |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
  3655 |   "HAVE_AS_TLS"
   |  ^~~
make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1





Re: [PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread Xi Ruoyao
On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > building a cross compiler if the cross assembler is not installed yet.
> > 
> > gcc/ChangeLog:
> > 
> >     * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> >     if not defined yet.
> > ---
> > 
> > Ok for trunk?
> I have no problem with this submission, but I don't understand the 
> circumstances surrounding the error.

When the developers hack GCC they sometimes build a cross compiler with
no cross assembler, then HAVE_AS_TLS will just be undefined.  And in the
future we may have an assembler w/o TLS support (for example a tiny
assembler for bare-metal target), then HAVE_AS_TLS will be undefined
too.

The error message is:

g++ -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. 
-Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include  \
-o build/gencondmd.o build/gencondmd.cc
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal
Hello Richard:

On 17/10/23 2:47 pm, Richard Biener wrote:
> On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>> On 17/10/23 2:03 pm, Richard Biener wrote:
>>> On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal  
>>> wrote:

 This patch improves code sinking pass to sink statements before call to 
 reduce
 register pressure.
 Review comments are incorporated. Synced and modified with latest trunk 
 sources.

 For example :

 void bar();
 int j;
 void foo(int a, int b, int c, int d, int e, int f)
 {
   int l;
   l = a + b + c + d +e + f;
   if (a != 5)
 {
   bar();
   j = l;
 }
 }

 Code Sinking does the following:

 void bar();
 int j;
 void foo(int a, int b, int c, int d, int e, int f)
 {
   int l;

   if (a != 5)
 {
   l = a + b + c + d +e + f;
   bar();
   j = l;
 }
 }

 Bootstrapped regtested on powerpc64-linux-gnu.

 Thanks & Regards
 Ajit

 tree-ssa-sink: Improve code sinking pass

 Currently, code sinking will sink code after function calls.  This 
 increases
 register pressure for callee-saved registers.  The following patch improves
 code sinking by placing the sunk code before calls in the use block or in
 the immediate dominator of the use blocks.
>>>
>>> The patch no longer does what the description above says.
>> Why you think so. Please let me know.
> 
> You talk about calls above but the patch doesn't do anything about calls.  You
> also don't do anything about register pressure, rather the effect of
> your changes
> are to move some stmts by a smaller "distance", whatever effect that has.
> 
>>>

I have incorporated the changes in version 11 of the patch.
>>> More comments below.
>>>
 2023-10-12  Ajit Kumar Agarwal  

 gcc/ChangeLog:

 PR tree-optimization/81953
 * tree-ssa-sink.cc (statement_sink_location): Move statements 
 before
 calls.
 (select_best_block): Add heuristics to select the best blocks in 
 the
 immediate post dominator.

 gcc/testsuite/ChangeLog:

 PR tree-optimization/81953
 * gcc.dg/tree-ssa/ssa-sink-20.c: New test.
 * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
 ---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
  gcc/tree-ssa-sink.cc| 39 -
  3 files changed, 56 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 new file mode 100644
 index 000..d3b79ca5803
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 @@ -0,0 +1,15 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
 +void bar();
 +int j;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump 
 {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
 new file mode 100644
 index 000..84e7938c54f
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
 @@ -0,0 +1,19 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
 +void bar();
 +int j, x;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  if (b != 3)
 +x = 3;
 +  else
 +x = 5;
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump 
 {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
 diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
 index a360c5cdd6e..95298bc8402 100644
 --- a/gcc/tree-ssa-sink.cc
 +++ b/gcc/tree-ssa-sink.cc
 @@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
 bool *debug_stmts)

  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
 tree, return the best basic block between them (inclusive) to place
 -   statements.
 +   statements. The best basic block should be an immediate dominator of
 +   best basic block if the use stmt is after the call.

 We want the most control dependent block in the shallowest loop nest.

[PATCH v3] VECT: Refine the type size restriction of call vectorizer

2023-10-30 Thread pan2 . li
From: Pan Li 

Update in v3:

* Add func to predicate type size is legal or not for vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectorizable_call has one restriction on the size of the data type:
DF to DI is allowed but SF to DI isn't. You may see the message below
when trying to vectorize a function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

As a result, a standard name pattern like lrintmn2 cannot be used when the
data type sizes differ, such as SF => DI. This patch refines the data type
size check and unblocks standard names like lrintmn2 under certain conditions.

The type size of vectype_out needs to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection. While there is no such restriction when the
vectype_out is somehow a part of the optab query.
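
The rule in the paragraph above can be restated as a small toy predicate in C. This is not the code from the patch, which compares TYPE_SIZE of the two vector types and inspects direct_internal_fn_info; the struct and function names here are invented, and the byte sizes in the usage note assume a 4-byte float and 8-byte double and long.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a vector type: lane count and lane size.  */
struct toy_vectype
{
  unsigned nunits;
  unsigned elt_bytes;
};

/* Total size of the vector in bytes.  */
static unsigned toy_vector_bytes (struct toy_vectype v)
{
  return v.nunits * v.elt_bytes;
}

/* Toy version of the size rule: if the output type takes part in the
   optab query, any size combination is accepted; otherwise the input
   and output vectors must have exactly the same total size.  */
static bool toy_type_size_legal_p (bool out_type_selects_optab,
                                   struct toy_vectype out,
                                   struct toy_vectype in)
{
  if (out_type_selects_optab)
    return true;
  return toy_vector_bytes (in) == toy_vector_bytes (out);
}
```

Under those size assumptions, four floats (16 bytes) converted to four longs (32 bytes) is rejected by the same-size rule and only passes when the output type is part of the optab query, whereas four doubles to four longs passes either way. That is the SF => DI versus DF => DI distinction from the original log.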

The tests below pass with this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
func impl to predicate the type size is legal or not.
(vectorizable_call): Leverage vectorizable_type_size_legal_p.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-stmts.cc | 51 +++---
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..24b3448d961 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
   return IFN_LAST;
 }
 
+/* Return TRUE when the type size is legal for the call vectorizer,
+   or FALSE otherwise.
+   The type sizes of vectype_in and vectype_out should be exactly
+   the same when vectype_out isn't participating in the optab query.
+   There is no restriction on the type size when vectype_out
+   is part of the optab query.
+ */
+static bool
+vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
+   tree vectype_in)
+{
+  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
+
+  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
+return same_size_p;
+
+  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
+
+  if (!difn_info.vectorizable)
+return same_size_p;
+
+  /* According to vectorizable_internal_function, type0/1 < 0 indicates
+     that vectype_out participates in the optab selection, so the type
+     size check can be skipped here.  */
+  if (difn_info.type0 < 0 || difn_info.type1 < 0)
+return true;
+
+  return same_size_p;
+}
 
 static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
  gimple_stmt_iterator *);
@@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
 
   return false;
 }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
- just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
- are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
- by a pack of the two vectors into an SI vector.  We would need
- separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"mismatched vector sizes %T and %T\n",
-vectype_in, vectype_out);
-  return false;
-}
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
@@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
 ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);
 
+  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"mismatched vector sizes %T and %T\n",
+vectype_in, vectype_out);
+  return false;
+}
+
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
 {
-- 
2.34.1



[PATCH V11] : tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal
Hello Richard:

Currently, code sinking sinks code to its use points when they are at
the same loop nesting depth. This patch improves code sinking by placing
the sunk code in an immediate dominator with the same loop nest depth.

Review comments are incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d + e + f;
      bar();
      j = l;
    }
}
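
The block selection behind this can be sketched with a toy dominator
walk. This is a simplified model under stated assumptions, not the
GCC implementation: `toy_bb`, `pick_block`, `allow_same_depth` and
`sink_demo` are hypothetical names, and the profile-count and
memory-operand checks that follow the walk in select_best_block are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block: only the fields the walk needs.  */
struct toy_bb
{
  struct toy_bb *idom;   /* immediate dominator, NULL for the entry */
  int loop_depth;
};

/* Walk from LATE_BB up the dominator tree towards EARLY_BB and pick a
   block.  With ALLOW_SAME_DEPTH zero this models the old strict "<"
   test; non-zero models the "<=" test introduced by the patch.  */
struct toy_bb *
pick_block (struct toy_bb *early_bb, struct toy_bb *late_bb,
            int allow_same_depth)
{
  struct toy_bb *best_bb = late_bb;
  struct toy_bb *temp_bb = late_bb;

  while (temp_bb != NULL && temp_bb != early_bb)
    {
      if (temp_bb->loop_depth < best_bb->loop_depth
          || (allow_same_depth
              && temp_bb->loop_depth == best_bb->loop_depth))
        best_bb = temp_bb;

      /* Step to the immediate dominator.  */
      temp_bb = temp_bb->idom;
    }
  return best_bb;
}

/* Three blocks at the same loop depth: entry dominates the block that
   calls bar(), which dominates the block holding the use "j = l".  */
void
sink_demo (void)
{
  struct toy_bb entry = { NULL, 0 };
  struct toy_bb call_bb = { &entry, 0 };
  struct toy_bb use_bb = { &call_bb, 0 };

  /* Strict test: stay at the use.  */
  assert (pick_block (&entry, &use_bb, 0) == &use_bb);
  /* Relaxed test: move up to the immediate dominator.  */
  assert (pick_block (&entry, &use_bb, 1) == &call_bb);
}
```

With the strict test the statement stays in the block of its use; with
the relaxed test the walk keeps moving up through blocks at the same
loop depth and stops at the last block before EARLY_BB.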

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Currently, code sinking sinks code to its use points when they are at
the same loop nesting depth. This patch improves code sinking by placing
the sunk code in an immediate dominator with the same loop nest depth.

2023-10-30  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements with
same loop nest depth.
(select_best_block): Add heuristics to select the best blocks in the
immediate dominator for same loop nest depth.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++
 gcc/tree-ssa-sink.cc| 21 ++---
 3 files changed, 48 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..0b823b81309 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
tree, return the best basic block between them (inclusive) to place
statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
We want the most control dependent block in the shallowest loop nest.
 
If the resulting block is in a shallower loop nest, then use it.  Else
@@ -201,14 +204,13 @@ select_best_block (basic_block early_bb,
 {
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
-  if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
+  if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
best_bb = temp_bb;
 
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
-
   /* Placing a statement before a setjmp-like function would be invalid
  (it cannot be reevaluated when execution follows an abnormal edge).
  If we selected a block with abnormal predecessors, just punt.  */
@@ -250,7 +252,14 @@ select_best_block (basic_block early_bb,
   /* If result of comparsion is unknown, prefer EARLY_BB.
 Thus use !(...>=..) rather than (...<...)  */
   && !(best_bb->count * 100 >= early_bb->count * threshold))
-return best_bb;
+{
+      /* Avoid sinking to the immediate dominator if the statement to be
+         moved has a memory operand and the same loop nest.  */
+  if (best_bb != late_bb && gimple_vuse (stmt))
+   return late_bb;
+
+  return best_bb;
+}
 
   /* No better block found, so return EARLY_BB, which happens to be the
  statement's or
