Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:38 AM Uros Bizjak  wrote:
>
> On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Wednesday, August 9, 2023 2:33 PM
> > > To: Liu, Hongtao 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy
> > > Bridge.
> > >
> > > On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
> > > >
> > > > > Please rather do it in a more self-descriptive way, as proposed in
> > > > > the attached patch. You won't need a comment then.
> > > > >
> > > >
> > > > Adjusted in V2 patch.
> > > >
> > > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported
> > > > via EAX.
> > > >
> > > > Intel documentation says invalid subleaves return 0. We had been
> > > > relying on that behavior instead of checking the max sublef number.
> > > >
> > > > It appears that some Sandy Bridge CPUs return at least the subleaf 0
> > > > EDX value for subleaf 1. Best guess is that this is a bug in a
> > > > microcode patch since all of the bits we're seeing set in EDX were
> > > > introduced after Sandy Bridge was originally released.
> > > >
> > > > This is causing avxvnniint16 to be incorrectly enabled with
> > > > -march=native on these CPUs.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * common/config/i386/cpuinfo.h (get_available_features): Check
> > > > EAX for valid subleaf before use CPUID.
> > > > ---
> > > >  gcc/common/config/i386/cpuinfo.h | 82
> > > > +---
> > > >  1 file changed, 43 insertions(+), 39 deletions(-)
> > > >
> > > > diff --git a/gcc/common/config/i386/cpuinfo.h
> > > > b/gcc/common/config/i386/cpuinfo.h
> > > > index 30ef0d334ca..9fa4dec2a7e 100644
> > > > --- a/gcc/common/config/i386/cpuinfo.h
> > > > +++ b/gcc/common/config/i386/cpuinfo.h
> > > > @@ -663,6 +663,7 @@ get_available_features (struct __processor_model
> > > *cpu_model,
> > > >unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
> > > >unsigned int eax, ebx;
> > > >unsigned int ext_level;
> > > > +  unsigned int subleaf_level;
> > >
> > > Oh, I failed this in my previous review. This variable should be named
> > > max_subleaf_level, as it represents the maximum supported ECX value.
> > I've committed previous patch ,but not backport yet.
> > Guess I can just commit another patch to change the name?
> > For backport, I'll merge the change together with just 1 commit.
>
> Yes. It is a trivial minor change.

I also think the declaration should go inside (max_cpuid_level >= 7)
block, since it is used only there (and irrelevant outside the block),
but it is your call...

Uros.


Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Wednesday, August 9, 2023 2:33 PM
> > To: Liu, Hongtao 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy
> > Bridge.
> >
> > On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
> > >
> > > > Please rather do it in a more self-descriptive way, as proposed in
> > > > the attached patch. You won't need a comment then.
> > > >
> > >
> > > Adjusted in V2 patch.
> > >
> > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported
> > > via EAX.
> > >
> > > Intel documentation says invalid subleaves return 0. We had been
> > > relying on that behavior instead of checking the max sublef number.
> > >
> > > It appears that some Sandy Bridge CPUs return at least the subleaf 0
> > > EDX value for subleaf 1. Best guess is that this is a bug in a
> > > microcode patch since all of the bits we're seeing set in EDX were
> > > introduced after Sandy Bridge was originally released.
> > >
> > > This is causing avxvnniint16 to be incorrectly enabled with
> > > -march=native on these CPUs.
> > >
> > > gcc/ChangeLog:
> > >
> > > * common/config/i386/cpuinfo.h (get_available_features): Check
> > > EAX for valid subleaf before use CPUID.
> > > ---
> > >  gcc/common/config/i386/cpuinfo.h | 82
> > > +---
> > >  1 file changed, 43 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/gcc/common/config/i386/cpuinfo.h
> > > b/gcc/common/config/i386/cpuinfo.h
> > > index 30ef0d334ca..9fa4dec2a7e 100644
> > > --- a/gcc/common/config/i386/cpuinfo.h
> > > +++ b/gcc/common/config/i386/cpuinfo.h
> > > @@ -663,6 +663,7 @@ get_available_features (struct __processor_model
> > *cpu_model,
> > >unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
> > >unsigned int eax, ebx;
> > >unsigned int ext_level;
> > > +  unsigned int subleaf_level;
> >
> > Oh, I failed this in my previous review. This variable should be named
> > max_subleaf_level, as it represents the maximum supported ECX value.
> I've committed previous patch ,but not backport yet.
> Guess I can just commit another patch to change the name?
> For backport, I'll merge the change together with just 1 commit.

Yes. It is a trivial minor change.

Uros.


RE: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-09 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Uros Bizjak 
> Sent: Wednesday, August 9, 2023 2:33 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy
> Bridge.
> 
> On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
> >
> > > Please rather do it in a more self-descriptive way, as proposed in
> > > the attached patch. You won't need a comment then.
> > >
> >
> > Adjusted in V2 patch.
> >
> > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is supported
> > via EAX.
> >
> > Intel documentation says invalid subleaves return 0. We had been
> > relying on that behavior instead of checking the max sublef number.
> >
> > It appears that some Sandy Bridge CPUs return at least the subleaf 0
> > EDX value for subleaf 1. Best guess is that this is a bug in a
> > microcode patch since all of the bits we're seeing set in EDX were
> > introduced after Sandy Bridge was originally released.
> >
> > This is causing avxvnniint16 to be incorrectly enabled with
> > -march=native on these CPUs.
> >
> > gcc/ChangeLog:
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Check
> > EAX for valid subleaf before use CPUID.
> > ---
> >  gcc/common/config/i386/cpuinfo.h | 82
> > +---
> >  1 file changed, 43 insertions(+), 39 deletions(-)
> >
> > diff --git a/gcc/common/config/i386/cpuinfo.h
> > b/gcc/common/config/i386/cpuinfo.h
> > index 30ef0d334ca..9fa4dec2a7e 100644
> > --- a/gcc/common/config/i386/cpuinfo.h
> > +++ b/gcc/common/config/i386/cpuinfo.h
> > @@ -663,6 +663,7 @@ get_available_features (struct __processor_model
> *cpu_model,
> >unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
> >unsigned int eax, ebx;
> >unsigned int ext_level;
> > +  unsigned int subleaf_level;
> 
> Oh, I failed this in my previous review. This variable should be named
> max_subleaf_level, as it represents the maximum supported ECX value.
I've committed previous patch ,but not backport yet.
Guess I can just commit another patch to change the name?
For backport, I'll merge the change together with just 1 commit.
> 
> Uros.
> 
> >
> >/* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv.  */
> >  #define XCR_XFEATURE_ENABLED_MASK  0x0
> > @@ -762,7 +763,7 @@ get_available_features (struct __processor_model
> *cpu_model,
> >/* Get Advanced Features at level 7 (eax = 7, ecx = 0/1). */
> >if (max_cpuid_level >= 7)
> >  {
> > -  __cpuid_count (7, 0, eax, ebx, ecx, edx);
> > +  __cpuid_count (7, 0, subleaf_level, ebx, ecx, edx);
> >if (ebx & bit_BMI)
> > set_feature (FEATURE_BMI);
> >if (ebx & bit_SGX)
> > @@ -874,45 +875,48 @@ get_available_features (struct
> __processor_model *cpu_model,
> > set_feature (FEATURE_AVX512FP16);
> > }
> >
> > -  __cpuid_count (7, 1, eax, ebx, ecx, edx);
> > -  if (eax & bit_HRESET)
> > -   set_feature (FEATURE_HRESET);
> > -  if (eax & bit_CMPCCXADD)
> > -   set_feature(FEATURE_CMPCCXADD);
> > -  if (edx & bit_PREFETCHI)
> > -   set_feature (FEATURE_PREFETCHI);
> > -  if (eax & bit_RAOINT)
> > -   set_feature (FEATURE_RAOINT);
> > -  if (avx_usable)
> > -   {
> > - if (eax & bit_AVXVNNI)
> > -   set_feature (FEATURE_AVXVNNI);
> > - if (eax & bit_AVXIFMA)
> > -   set_feature (FEATURE_AVXIFMA);
> > - if (edx & bit_AVXVNNIINT8)
> > -   set_feature (FEATURE_AVXVNNIINT8);
> > - if (edx & bit_AVXNECONVERT)
> > -   set_feature (FEATURE_AVXNECONVERT);
> > - if (edx & bit_AVXVNNIINT16)
> > -   set_feature (FEATURE_AVXVNNIINT16);
> > - if (eax & bit_SM3)
> > -   set_feature (FEATURE_SM3);
> > - if (eax & bit_SHA512)
> > -   set_feature (FEATURE_SHA512);
> > - if (eax & bit_SM4)
> > -   set_feature (FEATURE_SM4);
> > -   }
> > -  if (avx512_usable)
> > -   {
> > - if (eax & bit_AVX512BF16)
> > -   set_feature (FEATURE_AVX512BF16);
> > -   }
> > -  if (amx_usable)
> > +  if (subleaf_level >= 1)
> > {
> > - if (eax & bit_AMX_FP16)
> > -   set_feature (FEATURE_AMX_FP16);
> > - 

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
>
> > Please rather do it in a more self-descriptive way, as proposed in the
> > attached patch. You won't need a comment then.
> >
>
> Adjusted in V2 patch.
>
> Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
> supported via EAX.
>
> Intel documentation says invalid subleaves return 0. We had been
> relying on that behavior instead of checking the max sublef number.
>
> It appears that some Sandy Bridge CPUs return at least the subleaf 0
> EDX value for subleaf 1. Best guess is that this is a bug in a
> microcode patch since all of the bits we're seeing set in EDX were
> introduced after Sandy Bridge was originally released.
>
> This is causing avxvnniint16 to be incorrectly enabled with
> -march=native on these CPUs.
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features): Check
> EAX for valid subleaf before use CPUID.
> ---
>  gcc/common/config/i386/cpuinfo.h | 82 +---
>  1 file changed, 43 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 30ef0d334ca..9fa4dec2a7e 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -663,6 +663,7 @@ get_available_features (struct __processor_model 
> *cpu_model,
>unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
>unsigned int eax, ebx;
>unsigned int ext_level;
> +  unsigned int subleaf_level;

Oh, I failed this in my previous review. This variable should be named
max_subleaf_level, as it represents the maximum supported ECX value.

Uros.

>
>/* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv.  */
>  #define XCR_XFEATURE_ENABLED_MASK  0x0
> @@ -762,7 +763,7 @@ get_available_features (struct __processor_model 
> *cpu_model,
>/* Get Advanced Features at level 7 (eax = 7, ecx = 0/1). */
>if (max_cpuid_level >= 7)
>  {
> -  __cpuid_count (7, 0, eax, ebx, ecx, edx);
> +  __cpuid_count (7, 0, subleaf_level, ebx, ecx, edx);
>if (ebx & bit_BMI)
> set_feature (FEATURE_BMI);
>if (ebx & bit_SGX)
> @@ -874,45 +875,48 @@ get_available_features (struct __processor_model 
> *cpu_model,
> set_feature (FEATURE_AVX512FP16);
> }
>
> -  __cpuid_count (7, 1, eax, ebx, ecx, edx);
> -  if (eax & bit_HRESET)
> -   set_feature (FEATURE_HRESET);
> -  if (eax & bit_CMPCCXADD)
> -   set_feature(FEATURE_CMPCCXADD);
> -  if (edx & bit_PREFETCHI)
> -   set_feature (FEATURE_PREFETCHI);
> -  if (eax & bit_RAOINT)
> -   set_feature (FEATURE_RAOINT);
> -  if (avx_usable)
> -   {
> - if (eax & bit_AVXVNNI)
> -   set_feature (FEATURE_AVXVNNI);
> - if (eax & bit_AVXIFMA)
> -   set_feature (FEATURE_AVXIFMA);
> - if (edx & bit_AVXVNNIINT8)
> -   set_feature (FEATURE_AVXVNNIINT8);
> - if (edx & bit_AVXNECONVERT)
> -   set_feature (FEATURE_AVXNECONVERT);
> - if (edx & bit_AVXVNNIINT16)
> -   set_feature (FEATURE_AVXVNNIINT16);
> - if (eax & bit_SM3)
> -   set_feature (FEATURE_SM3);
> - if (eax & bit_SHA512)
> -   set_feature (FEATURE_SHA512);
> - if (eax & bit_SM4)
> -   set_feature (FEATURE_SM4);
> -   }
> -  if (avx512_usable)
> -   {
> - if (eax & bit_AVX512BF16)
> -   set_feature (FEATURE_AVX512BF16);
> -   }
> -  if (amx_usable)
> +  if (subleaf_level >= 1)
> {
> - if (eax & bit_AMX_FP16)
> -   set_feature (FEATURE_AMX_FP16);
> - if (edx & bit_AMX_COMPLEX)
> -   set_feature (FEATURE_AMX_COMPLEX);
> + __cpuid_count (7, 1, eax, ebx, ecx, edx);
> + if (eax & bit_HRESET)
> +   set_feature (FEATURE_HRESET);
> + if (eax & bit_CMPCCXADD)
> +   set_feature(FEATURE_CMPCCXADD);
> + if (edx & bit_PREFETCHI)
> +   set_feature (FEATURE_PREFETCHI);
> + if (eax & bit_RAOINT)
> +   set_feature (FEATURE_RAOINT);
> + if (avx_usable)
> +   {
> + if (eax & bit_AVXVNNI)
> +   set_feature (FEATURE_AVXVNNI);
> + if (eax & bit_AVXIFMA)
> +   set_feature (FEATURE_AVXIFMA);
> + if (edx & bit_AVXVNNIINT8)
> +   set_feature (FEATURE_AVXVNNIINT8);
> + if (edx & bit_AVXNECONVERT)
> +   set_feature (FEATURE_AVXNECONVERT);
> + if (edx & bit_AVXVNNIINT16)
> +   set_feature (FEATURE_AVXVNNIINT16);
> + if (eax & bit_SM3)
> +   set_feature (FEATURE_SM3);
> + if (eax & bit_SHA512)
> +   set_feature (FEATURE_SHA512);
> + if (eax & bit_SM4)
> +   set_feature (FEATURE_SM4);
> +   }
> + if (avx512_usable)
> +   {
> + if (eax & bit_AVX512BF16)
> +  

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt  wrote:
>
> > Please rather do it in a more self-descriptive way, as proposed in the
> > attached patch. You won't need a comment then.
> >
>
> Adjusted in V2 patch.
>
> Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
> supported via EAX.
>
> Intel documentation says invalid subleaves return 0. We had been
> relying on that behavior instead of checking the max sublef number.

Probably a documentation bug, even Wikipedia says about CPUID:

EAX=7, ECX=0: Extended Features

This returns extended feature flags in EBX, ECX, and EDX. Returns the
maximum ECX value for EAX=7 in EAX.

> It appears that some Sandy Bridge CPUs return at least the subleaf 0
> EDX value for subleaf 1. Best guess is that this is a bug in a
> microcode patch since all of the bits we're seeing set in EDX were
> introduced after Sandy Bridge was originally released.
>
> This is causing avxvnniint16 to be incorrectly enabled with
> -march=native on these CPUs.
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features): Check
> EAX for valid subleaf before use CPUID.

OK for mainline and backports.

Thanks,
Uros.

> ---
>  gcc/common/config/i386/cpuinfo.h | 82 +---
>  1 file changed, 43 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 30ef0d334ca..9fa4dec2a7e 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -663,6 +663,7 @@ get_available_features (struct __processor_model 
> *cpu_model,
>unsigned int max_cpuid_level = cpu_model2->__cpu_max_level;
>unsigned int eax, ebx;
>unsigned int ext_level;
> +  unsigned int subleaf_level;
>
>/* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv.  */
>  #define XCR_XFEATURE_ENABLED_MASK  0x0
> @@ -762,7 +763,7 @@ get_available_features (struct __processor_model 
> *cpu_model,
>/* Get Advanced Features at level 7 (eax = 7, ecx = 0/1). */
>if (max_cpuid_level >= 7)
>  {
> -  __cpuid_count (7, 0, eax, ebx, ecx, edx);
> +  __cpuid_count (7, 0, subleaf_level, ebx, ecx, edx);
>if (ebx & bit_BMI)
> set_feature (FEATURE_BMI);
>if (ebx & bit_SGX)
> @@ -874,45 +875,48 @@ get_available_features (struct __processor_model 
> *cpu_model,
> set_feature (FEATURE_AVX512FP16);
> }
>
> -  __cpuid_count (7, 1, eax, ebx, ecx, edx);
> -  if (eax & bit_HRESET)
> -   set_feature (FEATURE_HRESET);
> -  if (eax & bit_CMPCCXADD)
> -   set_feature(FEATURE_CMPCCXADD);
> -  if (edx & bit_PREFETCHI)
> -   set_feature (FEATURE_PREFETCHI);
> -  if (eax & bit_RAOINT)
> -   set_feature (FEATURE_RAOINT);
> -  if (avx_usable)
> -   {
> - if (eax & bit_AVXVNNI)
> -   set_feature (FEATURE_AVXVNNI);
> - if (eax & bit_AVXIFMA)
> -   set_feature (FEATURE_AVXIFMA);
> - if (edx & bit_AVXVNNIINT8)
> -   set_feature (FEATURE_AVXVNNIINT8);
> - if (edx & bit_AVXNECONVERT)
> -   set_feature (FEATURE_AVXNECONVERT);
> - if (edx & bit_AVXVNNIINT16)
> -   set_feature (FEATURE_AVXVNNIINT16);
> - if (eax & bit_SM3)
> -   set_feature (FEATURE_SM3);
> - if (eax & bit_SHA512)
> -   set_feature (FEATURE_SHA512);
> - if (eax & bit_SM4)
> -   set_feature (FEATURE_SM4);
> -   }
> -  if (avx512_usable)
> -   {
> - if (eax & bit_AVX512BF16)
> -   set_feature (FEATURE_AVX512BF16);
> -   }
> -  if (amx_usable)
> +  if (subleaf_level >= 1)
> {
> - if (eax & bit_AMX_FP16)
> -   set_feature (FEATURE_AMX_FP16);
> - if (edx & bit_AMX_COMPLEX)
> -   set_feature (FEATURE_AMX_COMPLEX);
> + __cpuid_count (7, 1, eax, ebx, ecx, edx);
> + if (eax & bit_HRESET)
> +   set_feature (FEATURE_HRESET);
> + if (eax & bit_CMPCCXADD)
> +   set_feature(FEATURE_CMPCCXADD);
> + if (edx & bit_PREFETCHI)
> +   set_feature (FEATURE_PREFETCHI);
> + if (eax & bit_RAOINT)
> +   set_feature (FEATURE_RAOINT);
> + if (avx_usable)
> +   {
> + if (eax & bit_AVXVNNI)
> +   set_feature (FEATURE_AVXVNNI);
> + if (eax & bit_AVXIFMA)
> +   set_feature (FEATURE_AVXIFMA);
> + if (edx & bit_AVXVNNIINT8)
> +   set_feature (FEATURE_AVXVNNIINT8);
> + if (edx & bit_AVXNECONVERT)
> +   set_feature (FEATURE_AVXNECONVERT);
> + if (edx & bit_AVXVNNIINT16)
> +   set_feature (FEATURE_AVXVNNIINT16);
> + if (eax & bit_SM3)
> +   set_feature (FEATURE_SM3);
> + if (eax & bit_SHA512)
> +   set_feature (FEATURE_SHA512);
> + if (eax & bit_SM4)
> +   set_feature (FEATURE_SM4);
> +   }