Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-18 Thread Richard Biener
On Fri, Mar 15, 2024 at 5:36 PM Andrew Stubbs  wrote:
>
> On 15/03/2024 13:56, Tobias Burnus wrote:
> > Hi Andrew,
> >
> > Andrew Stubbs wrote:
> >> This is more-or-less what I was planning to do myself, but as I want
> >> to include all the other features that get parametrized in gcn.cc,
> >> gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet.
> >> Unfortunately, I think the gcn.opt and config.gcc will always need
> >> manually updating, but if that's all it'll be an improvement.
> >
> > Well, for .opt see how nvptx does it – it actually generates an .opt file.
> >
> >> I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;
> >
> > I concur – I was initially thinking of reporting the device name
> > ("Unsupported %s") but I then realized that the agent returns a string
> > while only for GCC generated files (→ eflag) the hexcode is used. Thus,
> > I ended up not using it.
> >
> >> Ultimately, I want to replace many of the conditionals like
> >> "TARGET_CDNA2_PLUS" from the code and replace them with feature flags
> >> derived from a def file, or at least a header file. We've acquired too
> >> many places where there are unsearchable conditionals that need
> >> finding and fixing every time a new device comes along.
> > I was thinking of having more flags, but those where the only ones
> > required for the two files.
> >> I had imagined that this .def file would exist in gcc/config/gcn, but
> >> you've placed it in libgomp maybe it makes sense to have multiple
> >> such files if they contain very different data, but I had imagined one
> >> file and I'm not sure that the compiler definitions live in libgomp.
> >
> > There is already:
> >
> > gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"
> >
> > gcc/config/gcn/gcn-run.cc:#include
> > "../../../libgomp/config/gcn/libgomp-gcn.h"
> >
> > gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"
> >
> > gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"
> >
> > But there is also the reverse:
> >
> > libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"
> >
> > libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"
> >
> > lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"
> >
> > If you add more items, it is probably better to have it under
> > gcc/config/gcn/ - and I really prefer a single file for all.
> >
> > * * *
> >
> > Talking about feature sets: This would be a bit like LLVM (see below)
> > but I think they have a bit too much indirections. But I do concur that
> > we need to consolidate the current support – and hopefully make it
> > easier to keep adding more GPU support; we seem to have already covered
> > a larger chunk :-)
> >
> > I also did wonder whether we should support, e.g., running a gfx1100
> > code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively,
> > we could keep the current check which requires an exact match.
>
> We didn't invent that restriction; the runtime won't let you do it. We
> only have the check because the HSA/ROCr error message is not very
> user-friendly.

Note that I heard/read somewhere that they plan to support a "blend" version
that would allow running a kernel on any gfx11xx or gfx106x versions (supposedly
at some runtime cost).  Guess we'll need to watch the LLVM side of things here
(or the ROCm runtime side of it).

> > BTW: I do note that looking at the feature sets in LLVM that all GFX110x
> > GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and
> > FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the
> > FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons,
> > FeatureISAVersion11_Generic only consists of two of those bugs (it
> > doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that
> > consistent. Maybe the workaround has issues elsewhere? If so, a generic
> > -march=gfx11 might be not as useful as one might hope for.
> >
> > * * *
> >
> > If I look at LLVM's
> > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td
> >  ,
> >
> > they first define several features – like 'FeatureUnalignedScratchAccess'.
> >
> > Then they combine them like in:
> >
> > def FeatureISAVersion11_Common ... [FeatureGFX11, ...
> > FeatureAtomicFaddRtnInsts ...
> >
> > And then they use those to map them to feature sets like:
> >
> > def FeatureISAVersion11_0_Common ...
> > listconcat(FeatureISAVersion11_Common.Features,
> >  [FeatureMSAALoadDstSelBug ...
> >
> > And for gfx1103:
> >
> > def FeatureISAVersion11_0_3 : FeatureSet<
> >!listconcat(FeatureISAVersion11_0_Common.Features,
> >  [])>;
> >
> > The mapping to gfx... names then happens in
> > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td
> >  such as:
> >
> > def : ProcessorModel<"gfx1103", GFX11SpeedModel,
> >FeatureISAVersion11_0_3.Features
> >  >;
> >
> > Or for the generic one, i.e.:
> >
> > // [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
> > def : 

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs

On 15/03/2024 13:56, Tobias Burnus wrote:

Hi Andrew,

Andrew Stubbs wrote:
This is more-or-less what I was planning to do myself, but as I want 
to include all the other features that get parametrized in gcn.cc, 
gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. 
Unfortunately, I think the gcn.opt and config.gcc will always need 
manually updating, but if that's all it'll be an improvement.


Well, for .opt see how nvptx does it – it actually generates an .opt file.


I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;


I concur – I was initially thinking of reporting the device name 
("Unsupported %s") but I then realized that the agent returns a string 
while only for GCC generated files (→ eflag) the hexcode is used. Thus, 
I ended up not using it.


Ultimately, I want to replace many of the conditionals like 
"TARGET_CDNA2_PLUS" from the code and replace them with feature flags 
derived from a def file, or at least a header file. We've acquired too 
many places where there are unsearchable conditionals that need 
finding and fixing every time a new device comes along.
I was thinking of having more flags, but those where the only ones 
required for the two files.
I had imagined that this .def file would exist in gcc/config/gcn, but 
you've placed it in libgomp maybe it makes sense to have multiple 
such files if they contain very different data, but I had imagined one 
file and I'm not sure that the compiler definitions live in libgomp.


There is already:

gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"

gcc/config/gcn/gcn-run.cc:#include 
"../../../libgomp/config/gcn/libgomp-gcn.h"


gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"

gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"

But there is also the reverse:

libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"

libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"

lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"

If you add more items, it is probably better to have it under 
gcc/config/gcn/ - and I really prefer a single file for all.


* * *

Talking about feature sets: This would be a bit like LLVM (see below) 
but I think they have a bit too much indirections. But I do concur that 
we need to consolidate the current support – and hopefully make it 
easier to keep adding more GPU support; we seem to have already covered 
a larger chunk :-)


I also did wonder whether we should support, e.g., running a gfx1100 
code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively, 
we could keep the current check which requires an exact match.


We didn't invent that restriction; the runtime won't let you do it. We 
only have the check because the HSA/ROCr error message is not very 
user-friendly.


BTW: I do note that looking at the feature sets in LLVM that all GFX110x 
GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and 
FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the 
FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons, 
FeatureISAVersion11_Generic only consists of two of those bugs (it 
doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that 
consistent. Maybe the workaround has issues elsewhere? If so, a generic 
-march=gfx11 might be not as useful as one might hope for.


* * *

If I look at LLVM's 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td ,


they first define several features – like 'FeatureUnalignedScratchAccess'.

Then they combine them like in:

def FeatureISAVersion11_Common ... [FeatureGFX11, ... 
FeatureAtomicFaddRtnInsts ...


And then they use those to map them to feature sets like:

def FeatureISAVersion11_0_Common ... 
listconcat(FeatureISAVersion11_Common.Features,

     [FeatureMSAALoadDstSelBug ...

And for gfx1103:

def FeatureISAVersion11_0_3 : FeatureSet<
   !listconcat(FeatureISAVersion11_0_Common.Features,
     [])>;

The mapping to gfx... names then happens in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td such as:


def : ProcessorModel<"gfx1103", GFX11SpeedModel,
   FeatureISAVersion11_0_3.Features
 >;

Or for the generic one, i.e.:

// [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
def : ProcessorModel<"gfx11-generic", GFX11SpeedModel,
   FeatureISAVersion11_Generic.Features

LLVM also has some generic flags like the following in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/TargetParser.cpp


     {{"gfx1013"},   {"gfx1013"}, GK_GFX1013, 
FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK|FEATURE_WGP},


I hope that this will give some inspiration – but I assume that at least 
the initial implementation will be much shorter.


Yeah, we can have one macro for each arch, or multiple macros for 
building different tables. First one seems easier but less readable, 
second one will need some thinking about. Probably best to keep it 
simple 

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Tobias Burnus

Hi Andrew,

Andrew Stubbs wrote:
This is more-or-less what I was planning to do myself, but as I want 
to include all the other features that get parametrized in gcn.cc, 
gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. 
Unfortunately, I think the gcn.opt and config.gcc will always need 
manually updating, but if that's all it'll be an improvement.


Well, for .opt see how nvptx does it – it actually generates an .opt file.


I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;


I concur – I was initially thinking of reporting the device name 
("Unsupported %s") but I then realized that the agent returns a string 
while only for GCC generated files (→ eflag) the hexcode is used. Thus, 
I ended up not using it.


Ultimately, I want to replace many of the conditionals like 
"TARGET_CDNA2_PLUS" from the code and replace them with feature flags 
derived from a def file, or at least a header file. We've acquired too 
many places where there are unsearchable conditionals that need 
finding and fixing every time a new device comes along.
I was thinking of having more flags, but those where the only ones 
required for the two files.
I had imagined that this .def file would exist in gcc/config/gcn, but 
you've placed it in libgomp maybe it makes sense to have multiple 
such files if they contain very different data, but I had imagined one 
file and I'm not sure that the compiler definitions live in libgomp.


There is already:

gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"

gcc/config/gcn/gcn-run.cc:#include 
"../../../libgomp/config/gcn/libgomp-gcn.h"


gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"

gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"

But there is also the reverse:

libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"

libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"

lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"

If you add more items, it is probably better to have it under 
gcc/config/gcn/ - and I really prefer a single file for all.


* * *

Talking about feature sets: This would be a bit like LLVM (see below) 
but I think they have a bit too much indirections. But I do concur that 
we need to consolidate the current support – and hopefully make it 
easier to keep adding more GPU support; we seem to have already covered 
a larger chunk :-)


I also did wonder whether we should support, e.g., running a gfx1100 
code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively, 
we could keep the current check which requires an exact match.


BTW: I do note that looking at the feature sets in LLVM that all GFX110x 
GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and 
FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the 
FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons, 
FeatureISAVersion11_Generic only consists of two of those bugs (it 
doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that 
consistent. Maybe the workaround has issues elsewhere? If so, a generic 
-march=gfx11 might be not as useful as one might hope for.


* * *

If I look at LLVM's 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td 
,


they first define several features – like 'FeatureUnalignedScratchAccess'.

Then they combine them like in:

def FeatureISAVersion11_Common ... [FeatureGFX11, ... 
FeatureAtomicFaddRtnInsts ...


And then they use those to map them to feature sets like:

def FeatureISAVersion11_0_Common ... 
listconcat(FeatureISAVersion11_Common.Features,

    [FeatureMSAALoadDstSelBug ...

And for gfx1103:

def FeatureISAVersion11_0_3 : FeatureSet<
  !listconcat(FeatureISAVersion11_0_Common.Features,
    [])>;

The mapping to gfx... names then happens in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td 
such as:


def : ProcessorModel<"gfx1103", GFX11SpeedModel,
  FeatureISAVersion11_0_3.Features
>;

Or for the generic one, i.e.:

// [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
def : ProcessorModel<"gfx11-generic", GFX11SpeedModel,
  FeatureISAVersion11_Generic.Features

LLVM also has some generic flags like the following in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/TargetParser.cpp


    {{"gfx1013"},   {"gfx1013"}, GK_GFX1013, 
FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK|FEATURE_WGP},


I hope that this will give some inspiration – but I assume that at least 
the initial implementation will be much shorter.


Tobias



Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs

On 15/03/2024 12:21, Tobias Burnus wrote:
Given the large number of AMD GPU ISAs and the number of files which 
have to be adapted, I wonder whether it makes sense to consolidate this 
a bit, especially in the light that we may want to support more in the 
future.


Besides using some macros, I also improved the diagnostic if the object 
code couldn't be recognized (shouldn't happen) or if the GPU is 
unsupported (likely; it now prints the GPU string). I was initially 
thinking of resolving the arch encoded in the eflag to a string, but as 
this is about GCC-generated code, it seemed to be unlikely of much use. 
[It should that rare that we might also go back to the static string 
instead of outputting the hex value of the eflag.]


Note: I only modified mkoffload.cc and plugin-gcn.c, but with some 
tweaks it could also be used for other files in gcc/config/gcn/.


If you add a new ISA, you still need to update plugin-gcn.c's 
max_isa_vgprs and the xnack/sram-ecc handling in mkoffload.c's main, but 
that should be all for those two files.


Thoughts?


This is more-or-less what I was planning to do myself, but as I want to 
include all the other features that get parametrized in gcn.cc, gcn.h, 
gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet.  Unfortunately, I 
think the gcn.opt and config.gcc will always need manually updating, but 
if that's all it'll be an improvement.


I don't like the idea of including AMDGPU_ISA_UNSUPPORTED; that list is 
going to be permanently out of date, and even if we maintain it 
fastidiously last year's release isn't going to have the updated list in 
the wild. I think it's not actually active in this patch in any case.


Instead of AMDGPU_ISA, I think "AMDGPU_ELF" makes more sense. The ISA is 
"CDNA2" or "RDNA3", etc., and the compiler needs to know about that.


Ultimately, I want to replace many of the conditionals like 
"TARGET_CDNA2_PLUS" from the code and replace them with feature flags 
derived from a def file, or at least a header file. We've acquired too 
many places where there are unsearchable conditionals that need finding 
and fixing every time a new device comes along.


I had imagined that this .def file would exist in gcc/config/gcn, but 
you've placed it in libgomp maybe it makes sense to have multiple 
such files if they contain very different data, but I had imagined one 
file and I'm not sure that the compiler definitions live in libgomp.



Tobias

PS: I think the patch is fine and builds, but I have not tested it on an 
AMD GPU machine, yet.


PPS: For using for other files, see also in config/nvptx which uses 
nvptx-sm.def to generate several files.




Andrew