On Tue, Apr 16, 2019 at 11:41 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, Apr 16, 2019 at 8:36 AM Martin Liška <mli...@suse.cz> wrote:
> >
> > On 4/16/19 4:50 PM, H.J. Lu wrote:
> > > On Tue, Apr 16, 2019 at 1:28 AM Martin Liška <mli...@suse.cz> wrote:
> > >>
> > >> On 4/15/19 5:09 PM, H.J. Lu wrote:
> > >>> On Mon, Apr 15, 2019 at 12:26 AM Martin Liška <mli...@suse.cz> wrote:
> > >>>>
> > >>>> On 4/12/19 4:12 PM, H.J. Lu wrote:
> > >>>>> On Fri, Apr 12, 2019 at 4:41 AM Martin Liška <mli...@suse.cz> wrote:
> > >>>>>>
> > >>>>>> On 4/11/19 6:30 PM, H.J. Lu wrote:
> > >>>>>>> On Thu, Apr 11, 2019 at 1:38 AM Martin Liška <mli...@suse.cz> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi.
> > >>>>>>>>
> > >>>>>>>> The patch is adding missing AVX512 ISAs for target and target_clone
> > >>>>>>>> attributes.
> > >>>>>>>>
> > >>>>>>>> Patch can bootstrap on x86_64-linux-gnu and survives regression 
> > >>>>>>>> tests.
> > >>>>>>>>
> > >>>>>>>> Ready to be installed?
> > >>>>>>>> Thanks,
> > >>>>>>>> Martin
> > >>>>>>>>
> > >>>>>>>> gcc/ChangeLog:
> > >>>>>>>>
> > >>>>>>>> 2019-04-10  Martin Liska  <mli...@suse.cz>
> > >>>>>>>>
> > >>>>>>>>         PR target/89929
> > >>>>>>>>         * config/i386/i386.c (get_builtin_code_for_version): Add
> > >>>>>>>>         support for missing AVX512 ISAs.
> > >>>>>>>>
> > >>>>>>>> gcc/testsuite/ChangeLog:
> > >>>>>>>>
> > >>>>>>>> 2019-04-10  Martin Liska  <mli...@suse.cz>
> > >>>>>>>>
> > >>>>>>>>         PR target/89929
> > >>>>>>>>         * g++.target/i386/mv28.C: New test.
> > >>>>>>>>         * gcc.target/i386/mvc14.c: New test.
> > >>>>>>>> ---
> > >>>>>>>>  gcc/config/i386/i386.c                | 34 
> > >>>>>>>> ++++++++++++++++++++++++++-
> > >>>>>>>>  gcc/testsuite/g++.target/i386/mv28.C  | 30 +++++++++++++++++++++++
> > >>>>>>>>  gcc/testsuite/gcc.target/i386/mvc14.c | 16 +++++++++++++
> > >>>>>>>>  3 files changed, 79 insertions(+), 1 deletion(-)
> > >>>>>>>>  create mode 100644 gcc/testsuite/g++.target/i386/mv28.C
> > >>>>>>>>  create mode 100644 gcc/testsuite/gcc.target/i386/mvc14.c
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>> Hi.
> > >>>>>>
> > >>>>>>> Since any ISAs beyond AVX512F may be enabled individually, we
> > >>>>>>> can't simply assign priorities to them.   For GFNI, we can have
> > >>>>>>>
> > >>>>>>> 1. GFNI
> > >>>>>>> 2.  GFNI + AVX
> > >>>>>>> 3.  GFNI + AVX512F
> > >>>>>>> 4. GFNI + AVX512F + AVX512VL
> > >>>>>>
> > >>>>>> Makes sense to me! I'm considering syntax extension where one would 
> > >>>>>> be
> > >>>>>> able to come up with a priority. Eg.
> > >>>>>>
> > >>>>>> __attribute__((target("gfni,avx512bw", priority((3)))))
> > >>>>>>
> > >>>>>> Without that the ISA combinations are probably not comparable in a 
> > >>>>>> reasonable way.
> > >>>>>>
> > >>>>>>>
> > >>>>>>> For this code,  GFNI + AVX512BW is ignored:
> > >>>>>>>
> > >>>>>>> [hjl@gnu-cfl-1 pr89929]$ cat z.ii
> > >>>>>>> __attribute__((target("gfni")))
> > >>>>>>> int foo(int i) {
> > >>>>>>>         return 1;
> > >>>>>>> }
> > >>>>>>> __attribute__((target("gfni,avx512bw")))
> > >>>>>>> int foo(int i) {
> > >>>>>>>         return 4;
> > >>>>>>> }
> > >>>>>>> __attribute__((target("default")))
> > >>>>>>> int foo(int i) {
> > >>>>>>>         return 3;
> > >>>>>>> }
> > >>>>>>> int bar ()
> > >>>>>>> {
> > >>>>>>>         return foo(2);
> > >>>>>>> }
> > >>>>>>
> > >>>>>> For 'target' attribute it works for me:
> > >>>>>>
> > >>>>>> 1) $ cat z.c && ./xg++ -B. z.c -c
> > >>>>>> #include <x86intrin.h>
> > >>>>>> volatile __m512i x1, x2;
> > >>>>>> volatile __mmask64 m64;
> > >>>>>>
> > >>>>>> __attribute__((target("gfni")))
> > >>>>>> int foo(int i) {
> > >>>>>>     x1 = _mm512_gf2p8affineinv_epi64_epi8(x1, x2, 3);
> > >>>>>>     return 1;
> > >>>>>> }
> > >>>>>> __attribute__((target("gfni,avx512bw")))
> > >>>>>> int foo(int i) {
> > >>>>>>     return 4;
> > >>>>>> }
> > >>>>>> __attribute__((target("default")))
> > >>>>>> int foo(int i) {
> > >>>>>>   return 3;
> > >>>>>> }
> > >>>>>> int bar ()
> > >>>>>> {
> > >>>>>>         return foo(2);
> > >>>>>> }
> > >>>>>> In file included from ./include/immintrin.h:117,
> > >>>>>>                  from ./include/x86intrin.h:32,
> > >>>>>>                  from z.c:1:
> > >>>>>> z.c: In function ‘int foo(int)’:
> > >>>>>> z.c:7:10: error: ‘__builtin_ia32_vgf2p8affineinvqb_v64qi’ needs isa 
> > >>>>>> option -m32 -mgfni -mavx512f
> > >>>>>>     7 |     x1 = _mm512_gf2p8affineinv_epi64_epi8(x1, x2, 3);
> > >>>>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >>>>>> z.c:7:10: note: the ABI for passing parameters with 64-byte 
> > >>>>>> alignment has changed in GCC 4.6
> > >>>>>>
> > >>>>>> 2) $ cat z.c && ./xg++ -B. z.c -c
> > >>>>>> #include <x86intrin.h>
> > >>>>>> volatile __m512i x1, x2;
> > >>>>>> volatile __mmask64 m64;
> > >>>>>>
> > >>>>>> __attribute__((target("gfni")))
> > >>>>>> int foo(int i) {
> > >>>>>>     return 1;
> > >>>>>> }
> > >>>>>> __attribute__((target("gfni,avx512bw")))
> > >>>>>> int foo(int i) {
> > >>>>>>     x1 = _mm512_gf2p8affineinv_epi64_epi8(x1, x2, 3);
> > >>>>>>     return 4;
> > >>>>>> }
> > >>>>>> __attribute__((target("default")))
> > >>>>>> int foo(int i) {
> > >>>>>>   return 3;
> > >>>>>> }
> > >>>>>> int bar ()
> > >>>>>> {
> > >>>>>>         return foo(2);
> > >>>>>> }
> > >>>>>>
> > >>>>>> [OK]
> > >>>>>>
> > >>>>>> Btw. is it really correct the '-m32' in: 'needs isa option -m32' ?
> > >>>>>
> > >>>>> It does look odd.
> > >>>>
> > >>>> Then let me take a look at this.
> > >>>>
> > >>>>>
> > >>>>>> Similar applies to target_clone attribute where we'll have to come 
> > >>>>>> up with
> > >>>>>> a syntax that will allow multiple ISA to be combined. Something like:
> > >>>>>>
> > >>>>>> __attribute__((target_clones("gfni+avx512bw")))
> > >>>>>>
> > >>>>>> ? Priorities can be maybe implemented by order?
> > >>>>>>
> > >>>>>
> > >>>>> I am thinking -misa=processor which will enable ISAs for
> > >>>>> processor.  It differs from -march=.  -misa= doesn't set
> > >>>>> -mtune.
> > >>>>>
> > >>>>
> > >>>> Well, isn't that what we currently support, e.g.:
> > >>>>
> > >>>> $ cat mvc11.c && gcc mvc11.c -c
> > >>>> __attribute__((target_clones("arch=sandybridge", "arch=cascadelake", 
> > >>>> "default"))) int
> > >>>> foo (void)
> > >>>> {
> > >>>>   return 0;
> > >>>> }
> > >>>>
> > >>>> int
> > >>>> main ()
> > >>>> {
> > >>>>   foo ();
> > >>>> }
> > >>>>
> > >>>> If so, we can provide a new warning that will tell that for AVX512* on 
> > >>>> should use 'arch=xyz'
> > >>>> instead?
> > >>>>
> > >>>
> > >>> 1. We don't have one option to enable AVX512F and AVX512CD, which are 
> > >>> common to
> > >>> Skylake server and KNL.
> > >>
> > >> Then arch=axy is the solution for that.
> > >>
> > >>> 2. -march= also implies -mtune, which may not be wanted.
> > >>
> > >> Why is that not intended. Using target (or target_clone), I would expect 
> > >> the
> > >> fastest possible version of the function for defined ISA (or ARCH). Then
> > >> implying -mtune is reasonable for me.
> > >>
> > >>> 3. -march=icelake-client enables more than just AVX512XXX.
> > >>
> > >> Yes, but if you take a look at number of AVX512 ISA extensions (18), then
> > >> it looks more easier to me to use arch=xyz as it leads to reasonable 
> > >> number
> > >> of architectures.
> > >>
> > >
> > > Are you proposing to add a new set of -march=XXX to support the family
> > > of processors with AVX512?  Adding -march=XXX is much more invasive
> > > that adding -misa=XXX, which is just a combination of -mavx512XXXs.
> > > Do we really want to do that?
> > >
> >
> > No, I'm saying that I would reject all these:
> >
> > +      {"avx512f", P_AVX512F},
> > +      {"avx512vl", P_AVX512VL},
> > +      {"avx512bw", P_AVX512BW},
> > +      {"avx512dq", P_AVX512DQ},
> > +      {"avx512cd", P_AVX512CD},
> > +      {"avx512er", P_AVX512ER},
> > +      {"avx512pf", P_AVX512PF},
> > +      {"avx512vbmi", P_AVX512VBMI},
> > +      {"avx512ifma", P_AVX512IFMA},
> > +      {"avx5124vnniw", P_AVX5124VNNIW},
> > +      {"avx5124fmaps", P_AVX5124FMAPS},
> > +      {"avx512vpopcntdq", P_AVX512VPOPCNTDQ},
> > +      {"avx512vbmi2", P_AVX512VBMI2},
> > +      {"gfni", P_GFNI},
> > +      {"vpclmulqdq", P_VPCLMULQDQ},
> > +      {"avx512vnni", P_AVX512VNNI},
> > +      {"avx512bitalg", P_AVX512BITALG}
>
> Works for me.
>
> > as arguments to target_clone attribute and instead I would recommend to 
> > users to use
> > target_clones(arch=skylake,arch=skylake-avx512,arch=cannonlake,arch=icelake-client,arch=icelake-server,
> >  ..)
>
> Please check with bug reporter to verify that this solves his problem.
>
> > It's because we don't have a syntax for combination of ISAs for a 
> > target_clones attribute (e.g. avx512f+avx512cd+avx512er)
> > and we don't have a syntax for priorities.
> >
>
> Thanks.
>
> --
> H.J.


Any other comments, I'll merge this to trunk?

-- 
BR,
Hongtao

Reply via email to