gregrodgers added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:116
+  if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4)
+    OffloadKind = OffloadKind + "v4";
   for (const auto &II : Inputs) {
----------------
yaxunl wrote:
> tra wrote:
> > We do not do it for v2/v3. Could you elaborate on what makes v4 special 
> > that it needs its own offload kind? 
> > 
> > Will you need to target different object versions simultaneously?
> > If yes, how? AFAICT, the version specified is currently global and applies 
> > to all sub-compilations.
> > If not, then do we really need to encode the version in the offload target 
> > name?
> Introducing hipv4 is to differentiate with code object version 2 and 3 which 
> are used by HIP applications compiled by older version of clang. ROCm 
> platform is required to keep binary backward compatibility, i.e., old HIP 
> applications built by ROCm 4.0 should run on ROCm 4.1. The bundle ID has 
> different interpretation depending on whether it is version 2/3 or version 4, 
> e.g. 'gfx906' implies xnack and sramecc off with code object v2/3 but implies 
> xnack and sramecc ANY with v4. Since code object version 2/3 uses 'hip', code 
> object version 4 needs to be different, therefore it uses 'hipv4'.
We need to start thinking in terms of the offload requirements of a compiled
image vs the capabilities of a particular active runtime on a particular GPU.
This concept can eliminate the need for a new offload kind. For AMD, we would
add the requirement of code object v4 (cov4) if built for code object v4 or
greater. Such an image can only run on a system with that capability. This
concept works well with the requirements xnack+, xnack-, sramecc+ and sramecc-.
The bundle entry id is the offload-kind, the triple, and the list of image
requirements. The gpu type (offload-arch) is really an image requirement.

In this model, there is no need for an xnack-any requirement. The lack of an
xnack+ or xnack- requirement implies "any", which means the image can run on
any capable machine.

This is a general model that is extensible. To make this work, a runtime must
be able to detect the capabilities for any requirement that could be tagged on
an image. In fact, every requirement of an embedded image must have its
capability detected by the runtime for that offload image to be usable.
However, a system's runtime could have more capabilities than the requirements
of an image. So in the case of xnack, the lack of xnack- or xnack+ will be
acceptable no matter what the xnack capability of the runtime is. If the
compiler driver puts the requirement cov4 in the bundle entry id requirements
field, the runtime will not run that image unless the GPU loader supports v4 or
greater.
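
As a rough illustration of the matching rule described above, here is a minimal
C++ sketch. Everything in it is hypothetical: the OffloadImage struct, the
isRunnable and selectImage helpers, and the use of plain strings such as "cov4"
as tags are assumptions made for this example, not existing clang or runtime
APIs. The rule itself is just a subset test: an image is usable when every one
of its requirements appears in the capabilities detected by the runtime.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one bundle entry: the offload kind, the triple, and
// the list of image requirements (the gpu type is treated as a requirement).
struct OffloadImage {
  std::string Kind;
  std::string Triple;
  std::set<std::string> Requirements;
};

// An image is runnable when its requirements are a subset of the runtime's
// detected capabilities. Extra capabilities on the runtime side are fine.
bool isRunnable(const OffloadImage &Image,
                const std::set<std::string> &Capabilities) {
  // Both ranges come from std::set, so they are sorted as std::includes needs.
  return std::includes(Capabilities.begin(), Capabilities.end(),
                       Image.Requirements.begin(), Image.Requirements.end());
}

// Returns the first runnable image, or nullptr so the caller can fail
// gracefully when no embedded image matches.
const OffloadImage *selectImage(const std::vector<OffloadImage> &Images,
                                const std::set<std::string> &Capabilities) {
  for (const OffloadImage &Image : Images)
    if (isRunnable(Image, Capabilities))
      return &Image;
  return nullptr;
}
```

With this sketch, an image tagged only with a gpu type and cov4 would be
accepted by a runtime reporting either xnack- or xnack+, while a runtime that
does not report cov4 would reject it.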

The clang driver can create the requirement xnack- for code object < 4 on those
GPUs that support either xnack mode. This will ensure the image will
gracefully fail or use an alternative image if the runtime capability is xnack+.

But the cov4 requirement is mostly unrelated to xnack. It is about the
capability of the GPU loader. If the code object version >= 4, then the image
will be tagged with the cov4 requirement. This would prevent an old system that
does not have a newer software stack from running an image with a cov4
requirement.
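
On the driver side, the tagging described above could look roughly like the
following C++ sketch. The imageRequirements helper and its parameters are
hypothetical names chosen for this example, not part of the clang driver, and
the sketch deliberately ignores any explicit xnack or sramecc setting the user
might pass.

```cpp
#include <set>
#include <string>

// Hypothetical helper: builds the requirement list for one image.
// ArchSupportsBothXnackModes is assumed to be known by the caller.
std::set<std::string> imageRequirements(const std::string &Arch,
                                        unsigned CodeObjectVersion,
                                        bool ArchSupportsBothXnackModes) {
  // The gpu type (offload-arch) is itself an image requirement.
  std::set<std::string> Requirements = {Arch};
  if (CodeObjectVersion >= 4) {
    // Requires a GPU loader that understands code object v4 or greater.
    // No xnack tag is added, which means "any".
    Requirements.insert("cov4");
  } else if (ArchSupportsBothXnackModes) {
    // Older code objects are pinned to xnack- on GPUs with both modes.
    Requirements.insert("xnack-");
  }
  return Requirements;
}
```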

This general notion of image requirements and runtime capabilities is
extensible to other offload architectures. Suppose cuda version 12
compilation REQUIRES a cuda version 12 runtime. Old runtimes would never
display the cuv12 capability and would fail to run any image created with the
requirement cuv12.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99235/new/

https://reviews.llvm.org/D99235

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
