On Thu, Mar 02, 2017 at 10:03:28AM +0100, Thomas Koenig wrote:
> Am 02.03.2017 um 09:43 schrieb Jakub Jelinek:
> > On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote:
> > > @@ -101,7 +93,7 @@
> > >  `static void
> > >  'matmul_name` ('rtype` * const restrict retarray,
> > >   'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
> > > - int blas_limit, blas_call gemm) __attribute__((__target__("avx2")));
> > > + int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma")));
> > >  static' include(matmul_internal.m4)dnl
> > >  `#endif /* HAVE_AVX2 */
> > > 
> > 
> > I guess the question here is if there are any CPUs that have AVX2 but don't
> > have FMA3.  If there are none, then this is not controversial, if there are
> > some, it depends on how widely they are used compared to ones that have both
> > AVX2 and FMA3.  Going just from our -march= bitsets, it seems if there is
> > PTA_AVX2, then there is also PTA_FMA: haswell, broadwell, skylake, 
> > skylake-avx512, knl,
> > bdver4, znver1, there are CPUs that have just PTA_AVX and not PTA_AVX2 and
> > still have PTA_FMA: bdver2, bdver3 (but that is not relevant to this patch).
> 
> In a previous incantation of the patch, I saw that the compiler
> generated the same floating point code for AVX and AVX2 (which why
> there currently is no AVX2 floating point version).  I could also
> generate an AVX+FMA version for floating point and an AVX2 version
> for integer (if anybody cares about integer matmul).

I think having another avx,fma version is not worth it, avx+fma is far less
common than avx without fma.

> > > @@ -147,7 +141,8 @@
> > >  #endif  /* HAVE_AVX512F */
> > > 
> > >  #ifdef HAVE_AVX2
> > > -           if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
> > > +           if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
> > > +      && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
> > >       {
> > >         matmul_p = matmul_'rtype_code`_avx2;
> > >         goto tailcall;
> > 
> > and this too.
> 
> Will do.

Note I meant obviously the FEATURE_AVX512F related hunk, not this one,
sorry.

        Jakub

Reply via email to