[PATCH] D60620: [HIP] Support target id by --offload-arch

Artem Belevich via Phabricator via cfe-commits Wed, 27 May 2020 12:32:23 -0700

tra added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+    SubArchName = SubArchName.substr(0, Pos);
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > Parsing should probably be extracted into a separate function to avoid 
> > > > replicating it all over the place.
> > > > 
> > > > I'd also propose use a different syntax for the properties.
> > > > * use explicit character to separate individual elements. This way 
> > > > splitting the properties becomes independent of what those properties 
> > > > are. If you decide to make properties with values or change their 
> > > > meaning some other way, it would not affect how you compose them.
> > > > * use `name=value` or `name[+-]` for individual properties. This makes 
> > > > it easy to parse individual properties and normalize their names. This 
> > > > makes property map creation independent of the property values.
> > > > 
> > > > Right now `[+-]` serves as both a separator and as the value, which 
> > > > would present problems if you ever need more flexible parametrization 
> > > > of properties. What if a property must be a number or a string. 
> > > > Granted, you can always encode them as a set of bools, but that's 
> > > > rather unpractical. 
> > > > 
> > > > E.g. something like this would work a bit better: 
> > > > `gfx111:foo+:bar=33:buz=string`.
> > > > 
> > > I discussed this with our team.
> > > 
> > > The target id features are not raw GPU properties. They are distilled to 
> > > become a few target features to decide what the compiler should do.
> > > 
> > > Each target feature has 3 values: on, off, and default, which are encoded 
> > > as +feature, -feature, and not present.
> > > 
> > > For runtime, it is simple and clear how to choose device binaries based 
> > > on the target features: it will try exact match, otherwise choose the 
> > > default.
> > > 
> > > For compiler, it is also simple and clear what to do for each target 
> > > feature value, since they corresponding to backend target features.
> > > 
> > > Basically we expect the target id feature to be like flags, not key/value 
> > > pairs. In case we do need key/value pairs, they can still use + as 
> > > delimiter.
> > > 
> > > Another reason we use +/- format is that it is more in line with the 
> > > format of existing clang-offload-bundler id and target triple, which uses 
> > > - as delimiter.
> > > 
> > > Since the target id is an extension of offload arch and users will put it 
> > > into command line, we want to make it short, concise and aesthetically 
> > > appealing, we would avoid using non-alpha-numeric characters in the 
> > > target id features. Target triple components have similar requirements. 
> > > Using : as delimiter seems unnecessary, longer, and more difficult to 
> > > read.
> > > 
> > > Consider the following example
> > > 
> > > 
> > > ```
> > > clang -offload-id gfx908+xnack-sramecc a.hip
> > > 
> > > clang -offload-id gfx908:xnack+:sramecc- a.hip
> > > ```
> > > 
> > > We are more inclined to keep the original format. 
> > You're thinking in terms what's needed by AMDGPU *now*. The scheme you're 
> > proposing is sufficient for your use case and I'm fine with that. I'm 
> > suggesting that you should consider what happens once this change lands.
> > 
> > The functionality you're implementing is exposed to end-users via top-level 
> > clang driver argument. This is visible to users and will be relied on.
> > This will make it hard to change in the future without breaking someone. 
> > It's worth making sure we're not painting ourselves in the corner here.
> > 
> > Also, the functionality may be useful/applicable beyond the scope of amdgpu 
> > and the binary flags will not be sufficient for everyone. The scheme you're 
> > proposing would be somewhat restrictive if I need to pass an integer value 
> > or string. We could use something like `gfx123+foo=456-bar=789` but it 
> > would look rather odd, IMO. 
> > 
> > Granted, none of the above is a showstopper. I guess we could support 
> > multiple formats if it comes to that, but I'd rather not multiply things 
> > later because we didn't think of them earlier.
> > 
> > > Another reason we use +/- format is that it is more in line with the 
> > > format of existing clang-offload-bundler id and target triple, which uses 
> > > - as delimiter.
> > 
> > The point was that commingling field separator and the field value is not 
> > the cleanest approach, IMO. I'd be fine fine with some other character.
> > 
> > > Since the target id is an extension of offload arch and users will put it 
> > > into command line, we want to make it short, concise and aesthetically 
> > > appealing, we would avoid using non-alpha-numeric characters in the 
> > > target id features. Target triple components have similar requirements. 
> > > Using : as delimiter seems unnecessary, longer, and more difficult to 
> > > read.
> > 
> > The current use of `gfxXXX` seems to fit the 'short, concise & 
> > aesthetically pleasing' part of your argument much better than the proposed 
> > scheme.
> > 
> > Is the end user allowed to specify an arbitrary set of the features? Or is 
> > the offload-id set restricted to a smaller number of combinations (i.e. 
> > tied to particular hardware variants). I vaguely recall that in the past 
> > the problem was that AMD needed to create multiple device compilations for 
> > one GPU architecture and that didn't fit in the model used by CUDA 
> > compilation. 
> > 
> > Would it make sense to keep user-visible GPU arch argument as is and map 
> > each known one internally into a set of `offload-id` parameters used to 
> > create driver device-side compilations? For CUDA it will be a pass-through, 
> > for HIP it will translate single user-specified arch into multiple 
> > offload-ids. This would leave AMDGPU free to choose the way internally-used 
> >  offload-id is structured and can change it if/when it's necessary without 
> > worrying about existing users. It also keeps user-visible parameters short. 
> > The translation from gpu-arch to offload-id should be simple enough to 
> > maintain.
> > 
> > 
> After discussion, we decided to adopt the format you proposed. The rationale 
> is that we want target id to be treated as an extended `--offload-arch` 
> option, which means it needs to be able to accept all existing and future 
> CUDA arch names. Using `:` as delimiter should be tolerant enough whereas 
> `+/-` is not.
> 
> Also I will try introducing -offload-target-id for this option.
> 
> The features that can be used in target id are restricted to a few predefined 
> features for each GPU arch, because both compiler and runtime needs to know 
> how to handle them.
> 
> I am not sure if I understand your last question. With the new format we 
> should be able to use any CUDA arch names as target id, therefore we no 
> longer need a map. Also we need to pass each target id as a whole option 
> since we need to use it as an id for the device binary for each device 
> compilation.
> we want target id to be treated as an extended --offload-arch option
> Also I will try introducing -offload-target-id for this option.

Do we need a new option? I think it may be a natural extension of the 
`--offload-arch` where all currently used options will still be parsed 
correctly as an arch without extra features. The tests in the last revision of 
this patch look reasonable: 

```
// ...
// RUN:   -x hip --offload-arch=gfx908 \
// RUN:   --offload-arch=gfx908:sramecc+:xnack+ 
```

Does this mean that HIP will create two compilation passes -- one for `gfx908` 
and one for `gfx908:sramecc+:xnack+` ?
Or does it mean that the first line is ignored if you get a more detailed 
offload arch?

One thing you'll need is a way to normalize the arch+features tuple so we can 
compare them. 

> The features that can be used in target id are restricted to a few predefined 
> features for each GPU arch, because both compiler and runtime needs to know 
> how to handle them.

What I mean -- are users free to speficy any combination of {feature[+-]} and 
would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- 
combinations?
If we only have a limited set of valid combinations, it would make sense to 
give users easy-to-use names.

I.e. if the only valid ids for gfx111  are `gfx111:foo+:bar-` and 
`gfx111:buz+`, we could call them `gfx111a` and `gfx111b` and expand it into 
the right set of features ourselves without relying on the users not to make a 
typo.

> I am not sure if I understand your last question. With the new format we 
> should be able to use any CUDA arch names as target id, therefore we no 
> longer need a map. Also we need to pass each target id as a whole option 
> since we need to use it as an id for the device binary for each device 
> compilation.

What I'm saying is that maybe we should not expose detailrd features to the end 
user directly (or by default). Allow them to use friendly GPU names and 
normalize them internally into an offload ID or a set of IDs.

E.g. right now we specify offload-arch and create one device compilation per 
specified offload arch.
This patch proposed to make offload-arch more nuanced, but otherwise keeps the 
machinery the same.
What I'm suggesting is this:

* Normalize each offload-arch argument into a list of build IDs. For CUDA it 
will just map each arch to a singleton list.
   For AMDGPU, it will expand friendly names into lists of offload-IDs they 
represent, and into singleton with a single normalized offload ID otherwise.
* do similar normalization for `--no-offload-arch`
* concatenate all enabled offload IDs.
* use the list of offload-ids to drive device compilation pass creation.

As far as the end users are concerned, they can keep using whatever 
--offload-arch flags they are using now.
If building with --offload-arch=gfx908 requires actually building two GPU 
objects, it will all be handled transparently by the driver. If they need 
something specific, it's doable with --offload-arch=gfx908:featureA+ which will 
build for that variant only.

Would this fit your use case? If not, what do I miss? Could you give me more 
examples of how do you see offload-id being used?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D60620: [HIP] Support target id by --offload-arch

Reply via email to