RE: Intel AVX10.1 Compiler Design and Support

Jiang, Haochen via Gcc-patches Thu, 10 Aug 2023 08:10:37 -0700

Hi all,

There has been a lot of discussion on arch levels and ABIs, and I really
appreciate that.

For the arch level issue, it might be a little early to discuss it, and it
should not block these patches.

For the ABI issue, the problem comes from GCC and clang/LLVM currently
behaving differently when returning an __m512 value without 512 bit support.
That raises the question of how to unify them, which led to the whole
discussion. However, it is a corner case.

So let's first focus on the option design and its behavior. We could continue
to discuss those two issues after the main behavior is settled.
Richard has raised some concerns about option combinations. Any other concerns?

Thx,
Haochen

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+haochen.jiang=intel....@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao <hongtao....@intel.com>
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - With AVX10, we would like to establish a common, converged vector
>     instruction set across all Intel architectures, including Xeon Server,
>     Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit
>     is optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
>     instructions will be under AVX10 umbrella.
>   - AVX10 will be version-based ISA instead of tons of different CPUIDs like
>     AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
>     (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to take a closer look at the details, please follow the
> links below:
>
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
>
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA:
> Intel Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
>     We will not provide -m[no-]xxx options to enable/disable each single
>     vector feature within one version as we used to. Instead, a simple
>     -m[no-]avx10.x option is used. If the 512 bit version is needed,
>     -mavx10.x-512 is all you need. Also, the maximum vector width should be
>     the same when different versions of AVX10 are used. For example, enabling
>     AVX10.1 with 512 bit vector width while enabling AVX10.2 with only 256 bit
>     vector width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
>     Every feature that shows up in the current version will always show up in
>     future versions.
>   - AVX10 is an independent ISA feature set.
>     Although they share the same instructions and encodings, AVX10 and AVX512
>     are conceptually independent features, which means they are orthogonal.
> 
> Since AVX10 brings several benefits, such as bringing AVX512 features to Atom
> Servers and Clients and replacing the many AVX512 CPUIDs with a single AVX10
> option to enable features, we lean towards adopting AVX10 instead of AVX512
> from now on.
> 
> Based on all of the above, we would like to introduce the following compiler
> options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
>     256 bit vector width to ensure compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512
>     bit vector width. A “-mno-avx10.x-512” option will not be provided, to
>     avoid confusion over whether it disables the 512 bit vector width or
>     AVX10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256
>     bit vector width. Since the vector size is indicated in the option, it
>     will also disable 512 bit vector width. A “-mno-avx10.x-256” option will
>     not be provided, to stay aligned with the 512 bit ones.
>   - -mno-avx10.x: The option will disable all the features introduced in
>     AVX10.x and later (both 256 and 512 bit) and keep the features from
>     versions before AVX10.x if enabled, just like how -mno- options behaved
>     previously.
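To make the proposed semantics concrete, here is a minimal C model of how a sequence of these options could update the enabled AVX10 version and the maximum vector width. It is a hypothetical sketch, not GCC's option handling. avx10_state and the opt_* functions are invented names, and the AVX512 interaction is left out. It also assumes that a plain -mavx10.x keeps a width chosen by an earlier option and falls back to 256 bit only when none was given.

```c
#include <assert.h>

/* Hypothetical model of the proposed options; not GCC code. */
typedef struct {
    int version; /* highest enabled AVX10 version; 0 means AVX10 is off */
    int width;   /* maximum vector width in bits; 0 when AVX10 is off */
} avx10_state;

/* AVX10 is cumulative: enabling AVX10.x enables AVX10.1..AVX10.x, so the
   enabled set is always the prefix 1..version. */
static void enable_version(avx10_state *s, int x) {
    assert(x >= 1);
    if (x > s->version)
        s->version = x;
}

/* -mavx10.x (assumption: keep an earlier width, else default to 256). */
void opt_avx10(avx10_state *s, int x) {
    enable_version(s, x);
    if (s->width == 0)
        s->width = 256;
}

/* -mavx10.x-512 */
void opt_avx10_512(avx10_state *s, int x) {
    enable_version(s, x);
    s->width = 512;
}

/* -mavx10.x-256: the explicit size also disables 512 bit. */
void opt_avx10_256(avx10_state *s, int x) {
    enable_version(s, x);
    s->width = 256;
}

/* -mno-avx10.x: disable AVX10.x and later, keep earlier versions. */
void opt_no_avx10(avx10_state *s, int x) {
    assert(x >= 1);
    if (s->version >= x)
        s->version = x - 1;
    if (s->version == 0)
        s->width = 0; /* AVX10 fully off: no width to track */
}
```

For example, -mavx10.2-512 followed by -mno-avx10.2 leaves AVX10.1 enabled at 512 bit, and a further -mno-avx10.1 turns AVX10 off entirely. Tracking only the highest version is enough because every feature in one version also appears in all later versions.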
> 
> When an option combination indicates different vector sizes (e.g.
> -mavx10.x-512 -mavx10.y-256), we would like to emit a warning, since the
> vector sizes conflict in this scenario. The warning message will also
> indicate that the last mentioned vector size is picked. The ISA set will be
> the highest one.
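The conflict rule can be sketched the same way: scan the command line, keep the highest AVX10 version, let the last explicit vector size win, and warn when two explicit sizes disagree. This is a hypothetical model. resolve_avx10_options, parse_sized and the warning text are illustrative and are not GCC's implementation or diagnostic. Only the -mavx10.x-256 and -mavx10.x-512 spellings are handled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result of resolving AVX10 options; not GCC code. */
typedef struct {
    int version;        /* highest AVX10 version requested */
    int width;          /* last explicitly requested width, 0 if none */
    int width_conflict; /* 1 if two explicit widths disagreed */
} avx10_resolution;

/* Parses "-mavx10.<x>-256" or "-mavx10.<x>-512"; returns 1 on a match. */
static int parse_sized(const char *opt, int *x, int *w) {
    char extra;
    /* A successful third conversion means trailing characters: reject. */
    if (sscanf(opt, "-mavx10.%d-%d%c", x, w, &extra) != 2)
        return 0;
    return *x >= 1 && (*w == 256 || *w == 512);
}

avx10_resolution resolve_avx10_options(const char *const *opts, int n) {
    assert(opts != NULL || n == 0);
    avx10_resolution r = {0, 0, 0};
    for (int i = 0; i < n; i++) {
        int x, w;
        if (!parse_sized(opts[i], &x, &w))
            continue; /* other options are outside this sketch */
        if (r.width != 0 && r.width != w) {
            r.width_conflict = 1;
            fprintf(stderr,
                    "warning: conflicting AVX10 vector sizes, using %d bit from '%s'\n",
                    w, opts[i]);
        }
        r.width = w;       /* the last mentioned vector size is picked */
        if (x > r.version)
            r.version = x; /* the ISA set is the highest one */
    }
    return r;
}
```

With "-O2 -mavx10.2-512 -mavx10.1-256", this returns AVX10.2 with a 256 bit maximum width and reports a conflict. "-mavx10.1-512 -mavx10.2-512" resolves to AVX10.2/512 without a warning.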
> 
> For auto dispatch support, including __builtin_cpu_supports (), function
> multiversioning and function attributes, the behavior will be identical to
> the compiler options, which means we will have avx10.x, avx10.x-256,
> avx10.x-512 and no-avx10.x.
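Since the dispatch spellings mirror the options, a target string could be classified into the same four forms. The sketch below is hypothetical: avx10_kind and classify_avx10 are invented names, not a GCC API. It also rejects no-avx10.x-256 and no-avx10.x-512, because those negative forms are not provided.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of AVX10 target strings; not GCC code. */
typedef enum {
    AVX10_INVALID,
    AVX10_ENABLE,     /* "avx10.x" */
    AVX10_ENABLE_256, /* "avx10.x-256" */
    AVX10_ENABLE_512, /* "avx10.x-512" */
    AVX10_DISABLE     /* "no-avx10.x" */
} avx10_kind;

/* On success, stores x in *version and returns the kind of the string. */
avx10_kind classify_avx10(const char *s, int *version) {
    assert(s != NULL && version != NULL);
    int disable = strncmp(s, "no-", 3) == 0;
    if (disable)
        s += 3;
    /* Require "avx10." followed directly by a digit. */
    if (strncmp(s, "avx10.", 6) != 0 || !isdigit((unsigned char)s[6]))
        return AVX10_INVALID;
    char *rest;
    long x = strtol(s + 6, &rest, 10);
    if (x < 1)
        return AVX10_INVALID;

    avx10_kind kind = AVX10_INVALID;
    if (*rest == '\0')
        kind = disable ? AVX10_DISABLE : AVX10_ENABLE;
    else if (!disable && strcmp(rest, "-256") == 0)
        kind = AVX10_ENABLE_256;
    else if (!disable && strcmp(rest, "-512") == 0)
        kind = AVX10_ENABLE_512;
    /* "no-avx10.x-256" and "no-avx10.x-512" are deliberately rejected. */
    if (kind != AVX10_INVALID)
        *version = (int)x;
    return kind;
}
```

For instance, "avx10.2-512" classifies as AVX10_ENABLE_512 with version 2, while "no-avx10.1-512" and "avx512f" are rejected.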
> 
> As mentioned before, we lean towards adopting AVX10 instead of AVX512 from
> now on. Hence, we don't recommend that users combine the AVX10 and legacy
> AVX512 options, since different users will have different expectations of
> compiler behavior with option combinations like “-m[no-]avx10.1
> -m[no-]avx512f”, and it is hard to tell whether the compiler should enable
> or disable the feature in those scenarios. Furthermore, we don't guarantee
> that the behavior is consistent between GCC and LLVM/ICX.
> 
> Based on our understanding, we propose to maintain the independence between
> the AVX10 and AVX512 switches. Therefore, enabling either of them will turn
> on the feature, whether or not the other one is enabled. We will emit a
> warning when the user enables one feature but disables the other afterwards.
> Some typical examples to help understand this:
>   - -mno-avx512xxx: When handling the option, the compiler will check whether
>     AVX10.1 is disabled. If AVX10.1 is disabled, the option is valid and
>     disables AVX512xxx. If AVX10.1 is not disabled, a warning will be emitted
>     and -mno-avx512xxx will be ignored.
>   - -mno-avx10.1: When handling the option, the compiler will check whether
>     all AVX512 features in Granite Rapids are disabled. If they are all
>     disabled, the option is valid and disables all the features. If not, a
>     warning will be emitted and -mno-avx10.1 will be ignored.
>   - -mno-avx10.x (x >= 2): It is always valid.
> 
> Also, since we maintain the independence between the AVX10 and AVX512
> switches, the option combination “-mavx10.x[-256] -mavx512xxx” will enable
> all the AVX10.x 128/256 bit vector instruction support as well as the 512
> bit vector instruction support for AVX512xxx.
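The independence rules and the examples above can be modeled as two separate switches that never clear each other, with a warning whenever a -mno- option is ignored. This is a hypothetical sketch. isa_state, the isa_* functions and the four-feature AVX512 subset are invented for illustration. The sketch also assumes those four features are all among the Granite Rapids AVX512 features covered by AVX10.1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of the Granite Rapids AVX512 features. */
enum { AVX512F = 1u << 0, AVX512DQ = 1u << 1, AVX512BW = 1u << 2, AVX512VL = 1u << 3 };

typedef struct {
    int avx10_version;    /* 0 means AVX10 is off */
    int avx10_width;      /* 256 or 512 while AVX10 is on */
    unsigned avx512_mask; /* AVX512 features enabled through their own switch */
    int warnings;         /* number of warnings emitted */
} isa_state;

/* -mavx10.x-256 / -mavx10.x-512 */
void isa_avx10_sized(isa_state *s, int x, int width) {
    assert(x >= 1 && (width == 256 || width == 512));
    if (x > s->avx10_version)
        s->avx10_version = x;
    s->avx10_width = width;
}

/* -mavx512xxx: sets only the AVX512 switch. */
void isa_avx512(isa_state *s, unsigned feature) { s->avx512_mask |= feature; }

/* -mno-avx512xxx: ignored with a warning while AVX10.1 is enabled. */
void isa_no_avx512(isa_state *s, unsigned feature) {
    if (s->avx10_version >= 1) {
        fprintf(stderr, "warning: AVX10.1 is enabled, -mno-avx512 option ignored\n");
        s->warnings++;
        return;
    }
    s->avx512_mask &= ~feature;
}

/* -mno-avx10.x: for x == 1, ignored with a warning if any AVX512 feature is on. */
void isa_no_avx10(isa_state *s, int x) {
    assert(x >= 1);
    if (x == 1 && s->avx512_mask != 0) {
        fprintf(stderr, "warning: AVX512 features are enabled, -mno-avx10.1 ignored\n");
        s->warnings++;
        return;
    }
    if (s->avx10_version >= x)
        s->avx10_version = x - 1;
    if (s->avx10_version == 0)
        s->avx10_width = 0;
}

/* The 512 bit form of a feature is available either through its own AVX512
   switch or through AVX10.1/512 (all features in this subset are in AVX10.1). */
int has_512bit(const isa_state *s, unsigned feature) {
    return (s->avx512_mask & feature) != 0 ||
           (s->avx10_version >= 1 && s->avx10_width == 512);
}
```

Under this model, -mavx10.1-256 -mavx512f gives 512 bit AVX512F through the AVX512 switch but no 512 bit AVX512DQ. A later -mno-avx512f or -mno-avx10.1 only produces a warning.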
> 
> The last thing to mention is -march options. We will imply AVX10 features
> for future platforms with AVX10 available, i.e., AVX10/512 for Xeon Servers
> and AVX10/256 for Atom Servers and Clients. We propose to change the current
> -march=graniterapids/graniterapids-d from implying AVX512 features to
> implying AVX10.1/512. No obvious behavior changes will happen for these two
> -march values.
> 
> One minor open question remains after this change: with
> -march=graniterapids -mno-avx512f or -mno-avx512f -march=graniterapids,
> AVX512F will not be disabled, which is a change in behavior. Should we emit a
> warning for that? Our current behavior is not to emit a warning, but I am
> open to changes. However, if we finally choose to emit a warning, I suppose
> it should only happen for Granite Rapids and Granite Rapids D, since for the
> next generation Xeon Server product, users should be aware of the AVX10
> change.
> 
> Of the following nine patches, the first three are the initial support for
> AVX10.1, while the latter six are the AVX10.1 support for AVX512DQ+AVX512VL.
>
> If you have any questions, feel free to ask in this thread. Also, if you are
> working on AVX512 related patterns during AVX10 upstreaming, especially those
> related to constraints, target checks and iterators, please kindly cc me on
> the patches, since there might be some conflicts.
> 
> Thx,
> Haochen
>
