llvmbot wrote:
@llvm/pr-subscribers-flang-fir-hlfir

Author: Justin Fargnoli (justinfargnoli)

Changes

Previous commit message:

>Previous commit message:
>
>> Original commit message:
>>
>>>When users explicitly specify a PTX version via -mattr=+ptxNN that's insufficient for their target SM, we now emit a fatal error. Previously, we silently upgraded the PTX version to the minimum required for the target SM.
>>>
>>>When no SM or PTX version is specified, we now use PTX 3.2 (the minimum for the default SM 3.0) instead of PTX 6.0.
>>
>>The following commits should fix the failures that arose when I previously tried to land this commit:
>>
>>https://github.com/llvm/llvm-project/commit/9fc5fd0ad689eed94f65b1d6d10f9c5642935e68 should address the llvm-nvptx*-nvidia-* build failures: https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651
>>
>>https://github.com/llvm/llvm-project/commit/600514a63760c6730e4cd970d2fcead9c5a897b3 should address the MLIR failures
>
>The previous commit was reverted with https://github.com/llvm/llvm-project/commit/d23cb79ba497281de050ef609cb91b91058bf323 because the [mlir-nvidia](https://lab.llvm.org/buildbot/#/builders/138/builds/24797) and [mlir-nvidia-gcc7](https://lab.llvm.org/buildbot/#/builders/116/builds/23929) Buildbots were failing.
>
>Those tests failed because MLIR's default SM was 5.0, which caused NVPTX to target PTX ISA v4.0, which did not support the intrinsics used in the failing tests.
>
>https://github.com/llvm/llvm-project/commit/243f011577193c99358ccc4142b296d4fa80ea11 should address this by bumping MLIR's default SM to 7.5. Now, using MLIR's new default SM, NVPTX targets the PTX ISA v6.3, which supports the intrinsics used in the failing tests.

---

The previous commit was reverted with e9b578a4d77025e18318efedd0f3f3764338d859 [because](https://github.com/llvm/llvm-project/pull/179304#issuecomment-3856301333) the clang driver set the default PTX ISA version to v4.2 when no CUDA installation is found. However, given our patch, we should not set a default; instead, let the LLVM backend select the appropriate PTX ISA version for the target SM.
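For a concrete picture of the user-facing change, here is a minimal sketch in the style of the new llvm/test/CodeGen/NVPTX/ptx-version-validation.ll test. The actual contents of that test are truncated in the diff below, so the RUN lines and CHECK strings here are illustrative assumptions based only on the error format and the minimum-PTX table added in NVPTXSubtarget.cpp, not the test itself:

```llvm
; Hypothetical sketch, not the actual test from this PR: an explicit +ptx60
; request is below the minimum for sm_90 (PTX 7.8 per getMinPTXVersionForSM),
; so llc should now fail with a usage error, while omitting the PTX feature
; lets the backend pick the minimum version for the SM.
; RUN: not llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx60 2>&1 \
; RUN:   | FileCheck %s --check-prefix=ERR
; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 | FileCheck %s --check-prefix=DEFAULT

; ERR: PTX version 6.0 does not support target 'sm_90'
; DEFAULT: .version 7.8

define void @kernel() {
  ret void
}
```

The CHECK strings above only mirror the error text constructed via formatv in NVPTXSubtarget.cpp; the real test's checks may differ.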
---

Patch is 36.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/180116.diff

30 Files Affected:

- (modified) clang/lib/Basic/Targets/NVPTX.cpp (+3-1)
- (modified) clang/lib/Basic/Targets/NVPTX.h (+4-1)
- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+5-3)
- (modified) clang/test/CodeGen/builtins-nvptx-ptx60.cu (+1-1)
- (modified) clang/test/CodeGen/builtins-nvptx.c (+2-2)
- (modified) clang/test/CodeGen/nvptx_attributes.c (+1-1)
- (modified) clang/test/CodeGenCUDA/convergent.cu (+4-4)
- (modified) clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp (+1-1)
- (modified) clang/test/Headers/gpuintrin.c (+2-2)
- (modified) flang/lib/Frontend/CompilerInstance.cpp (+3-12)
- (modified) flang/test/Lower/OpenMP/target_cpu_features.f90 (+1-1)
- (modified) llvm/lib/Target/NVPTX/NVPTX.td (+15-57)
- (modified) llvm/lib/Target/NVPTX/NVPTXSubtarget.cpp (+91-2)
- (modified) llvm/lib/Target/NVPTX/NVPTXSubtarget.h (+7-2)
- (modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll (+4-4)
- (modified) llvm/test/CodeGen/NVPTX/convert-sm100a.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/convert-sm103a.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/f32-ex2.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/fexp2.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/flog2.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/i128.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/nvvm-reflect-arch-O0.ll (+1-1)
- (added) llvm/test/CodeGen/NVPTX/ptx-version-validation.ll (+51)
- (modified) llvm/test/CodeGen/NVPTX/rsqrt.ll (+2-2)
- (modified) llvm/test/CodeGen/NVPTX/sm-version.ll (+1-1)
- (modified) llvm/test/CodeGen/NVPTX/surf-tex.py (+2-2)
- (removed) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm120a.py (-14)
- (modified) mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h (+1-1)
- (modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.td (+1-1)
- (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+3-4)

``````````diff
diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index dec076ac54f41..6526eeff3d718 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -42,7 +42,9 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   assert((TargetPointerWidth == 32 || TargetPointerWidth == 64) &&
          "NVPTX only supports 32- and 64-bit modes.");
 
-  PTXVersion = 32;
+  // PTXVersion is 0 by default, meaning "use the minimum for the SM target".
+  // Only set it if the user explicitly requested a PTX version.
+  PTXVersion = 0;
   for (const StringRef Feature : Opts.FeaturesAsWritten) {
     int PTXV;
     if (!Feature.starts_with("+ptx") ||
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 6338a4f2f9036..9bd0cc36d12b4 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -89,7 +89,10 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
                       const std::vector<std::string> &FeaturesVec) const override {
     if (GPU != OffloadArch::UNUSED)
       Features[OffloadArchToString(GPU)] = true;
-    Features["ptx" + std::to_string(PTXVersion)] = true;
+    // Only add PTX feature if explicitly requested. Otherwise, let the backend
+    // use the minimum required PTX version for the target SM.
+    if (PTXVersion != 0)
+      Features["ptx" + std::to_string(PTXVersion)] = true;
 
     return TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec);
   }
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 840df0b0fd5fc..570b65a493a5d 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -667,7 +667,7 @@ void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
                                    std::vector<StringRef> &Features) {
   if (Args.hasArg(options::OPT_cuda_feature_EQ)) {
     StringRef PtxFeature =
-        Args.getLastArgValue(options::OPT_cuda_feature_EQ, "+ptx42");
+        Args.getLastArgValue(options::OPT_cuda_feature_EQ);
     Features.push_back(Args.MakeArgString(PtxFeature));
     return;
   }
@@ -712,9 +712,11 @@ void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
     PtxFeature = "+ptx86";
     break;
   default:
-    PtxFeature = "+ptx42";
+    // No PTX feature specified; let the backend choose based on the target SM.
+    break;
   }
-  Features.push_back(PtxFeature);
+  if (PtxFeature)
+    Features.push_back(PtxFeature);
 }
 
 /// NVPTX toolchain. Our assembler is ptxas, and our linker is nvlink. This
diff --git a/clang/test/CodeGen/builtins-nvptx-ptx60.cu b/clang/test/CodeGen/builtins-nvptx-ptx60.cu
index 8b2514a183221..04d391a10115c 100644
--- a/clang/test/CodeGen/builtins-nvptx-ptx60.cu
+++ b/clang/test/CodeGen/builtins-nvptx-ptx60.cu
@@ -3,7 +3,7 @@
 // RUN:            -emit-llvm -o - -x cuda %s \
 // RUN:   | FileCheck -check-prefix=CHECK %s
 // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \
-// RUN:            -fcuda-is-device -target-feature +ptx65 \
+// RUN:            -fcuda-is-device -target-feature +ptx70 \
 // RUN:            -emit-llvm -o - -x cuda %s \
 // RUN:   | FileCheck -check-prefix=CHECK %s
 // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -target-cpu sm_80 \
diff --git a/clang/test/CodeGen/builtins-nvptx.c b/clang/test/CodeGen/builtins-nvptx.c
index a739b66042f19..2e1acc0aac259 100644
--- a/clang/test/CodeGen/builtins-nvptx.c
+++ b/clang/test/CodeGen/builtins-nvptx.c
@@ -46,10 +46,10 @@
 // RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_101a -target-feature +ptx86 -DPTX=86 \
 // RUN:            -disable-llvm-optzns -fcuda-is-device -emit-llvm -o - -x cuda %s \
 // RUN:   | FileCheck -check-prefix=CHECK -check-prefix=CHECK_PTX86_SM101a %s
-// RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_120a -target-feature +ptx86 -DPTX=86 \
+// RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_120a -target-feature +ptx87 -DPTX=87 \
 // RUN:            -disable-llvm-optzns -fcuda-is-device -emit-llvm -o - -x cuda %s \
 // RUN:   | FileCheck -check-prefix=CHECK -check-prefix=CHECK_PTX86_SM120a %s
-// RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_103a -target-feature +ptx87 -DPTX=87 \
+// RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_103a -target-feature +ptx88 -DPTX=88 \
 // RUN:            -disable-llvm-optzns -fcuda-is-device -emit-llvm -o - -x cuda %s \
 // RUN:   | FileCheck -check-prefix=CHECK -check-prefix=CHECK_PTX87_SM103a %s
 // RUN: %clang_cc1 -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_100a -target-feature +ptx87 -DPTX=87 \
diff --git a/clang/test/CodeGen/nvptx_attributes.c b/clang/test/CodeGen/nvptx_attributes.c
index 8b9f3a2c18a1d..4695fca51ea53 100644
--- a/clang/test/CodeGen/nvptx_attributes.c
+++ b/clang/test/CodeGen/nvptx_attributes.c
@@ -16,7 +16,7 @@ __attribute__((nvptx_kernel)) void foo(int *ret) {
 }
 
 //.
-// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_61" "target-features"="+ptx32,+sm_61" }
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_61" "target-features"="+sm_61" }
 //.
 // CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
 // CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
diff --git a/clang/test/CodeGenCUDA/convergent.cu b/clang/test/CodeGenCUDA/convergent.cu
index b187f3a8a32d6..87948235f736e 100644
--- a/clang/test/CodeGenCUDA/convergent.cu
+++ b/clang/test/CodeGenCUDA/convergent.cu
@@ -71,10 +71,10 @@ __host__ __device__ void bar() {
 
 
 //.
-// DEVICE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }
-// DEVICE: attributes #[[ATTR1]] = { mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }
-// DEVICE: attributes #[[ATTR2:[0-9]+]] = { convergent nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }
-// DEVICE: attributes #[[ATTR3:[0-9]+]] = { nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }
+// DEVICE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// DEVICE: attributes #[[ATTR1]] = { mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// DEVICE: attributes #[[ATTR2:[0-9]+]] = { convergent nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// DEVICE: attributes #[[ATTR3:[0-9]+]] = { nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // DEVICE: attributes #[[ATTR4]] = { convergent nounwind }
 // DEVICE: attributes #[[ATTR5]] = { convergent nounwind memory(none) }
 // DEVICE: attributes #[[ATTR6]] = { nounwind }
diff --git a/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp b/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
index cd1d4d801951d..67b53f3ae81cf 100644
--- a/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
+++ b/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
@@ -182,7 +182,7 @@ int main() {
 // CHECK-AMDGCN: #[[AMDGCN_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CHECK-AMDGCN: #[[AMDGCN_ATTR1]] = { convergent nounwind }
 //
-// CHECK-NVPTX: #[[NVPTX_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }
+// CHECK-NVPTX: #[[NVPTX_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CHECK-NVPTX: #[[NVPTX_ATTR1]] = { convergent nounwind }
 //
 // CHECK-SPIR: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
diff --git a/clang/test/Headers/gpuintrin.c b/clang/test/Headers/gpuintrin.c
index a9b9889b280ae..565481ab0a971 100644
--- a/clang/test/Headers/gpuintrin.c
+++ b/clang/test/Headers/gpuintrin.c
@@ -5,8 +5,8 @@
 // RUN:   | FileCheck %s --check-prefix=AMDGPU
 // RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
 // RUN:   -internal-isystem %S/../../lib/Headers/ \
-// RUN:   -target-feature +ptx62 \
-// RUN:   -triple nvptx64-nvidia-cuda -emit-llvm %s -o - \
+// RUN:   -triple nvptx64-nvidia-cuda -target-feature +ptx63 \
+// RUN:   -emit-llvm %s -o - \
 // RUN:   | FileCheck %s --check-prefix=NVPTX
 // RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
 // RUN:   -internal-isystem %S/../../lib/Headers/ \
diff --git a/flang/lib/Frontend/CompilerInstance.cpp b/flang/lib/Frontend/CompilerInstance.cpp
index 851cd1f47afd2..5448293584d47 100644
--- a/flang/lib/Frontend/CompilerInstance.cpp
+++ b/flang/lib/Frontend/CompilerInstance.cpp
@@ -288,25 +288,16 @@ getExplicitAndImplicitNVPTXTargetFeatures(clang::DiagnosticsEngine &diags,
                                           const llvm::Triple triple) {
   llvm::StringRef cpu = targetOpts.cpu;
   llvm::StringMap<bool> implicitFeaturesMap;
-  std::string errorMsg;
-  bool ptxVer = false;
 
   // Add target features specified by the user
   for (auto &userFeature : targetOpts.featuresAsWritten) {
     llvm::StringRef userKeyString(llvm::StringRef(userFeature).drop_front(1));
     implicitFeaturesMap[userKeyString.str()] = (userFeature[0] == '+');
-    // Check if the user provided a PTX version
-    if (userKeyString.starts_with("ptx"))
-      ptxVer = true;
   }
 
-  // Set the default PTX version to `ptx61` if none was provided.
-  // TODO: set the default PTX version based on the chip.
-  if (!ptxVer)
-    implicitFeaturesMap["ptx61"] = true;
-
-  // Set the compute capability.
-  implicitFeaturesMap[cpu.str()] = true;
+  // Set the compute capability (only if one was explicitly provided).
+  if (!cpu.empty())
+    implicitFeaturesMap[cpu.str()] = true;
 
   llvm::SmallVector<std::string> featuresVec;
   for (auto &implicitFeatureItem : implicitFeaturesMap) {
diff --git a/flang/test/Lower/OpenMP/target_cpu_features.f90 b/flang/test/Lower/OpenMP/target_cpu_features.f90
index 341cfc7991d43..78f29b23068af 100644
--- a/flang/test/Lower/OpenMP/target_cpu_features.f90
+++ b/flang/test/Lower/OpenMP/target_cpu_features.f90
@@ -16,4 +16,4 @@
 
 !NVPTX: module attributes {
 !NVPTX-SAME: fir.target_cpu = "sm_80"
-!NVPTX-SAME: fir.target_features = #llvm.target_features<["+ptx61", "+sm_80"]>
+!NVPTX-SAME: fir.target_features = #llvm.target_features<["+sm_80"]>
diff --git a/llvm/lib/Target/NVPTX/NVPTX.td b/llvm/lib/Target/NVPTX/NVPTX.td
index d41a43de95098..80491ac4cc1f8 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.td
+++ b/llvm/lib/Target/NVPTX/NVPTX.td
@@ -68,10 +68,11 @@ class FeaturePTX<int version>:
 // represents 'z'), sm_103f, and sm_103 architecture variants. The sm_103 is
 // compatible with sm_103a and sm_103f, and sm_103f is compatible with sm_103a.
 //
-// Encoding := Arch * 10 + 2 (for 'f') + 1 (for 'a')
+// Encoding := Arch * 10 + ArchSuffixOffset
 // Arch := X * 10 + Y
+// ArchSuffixOffset := 0 (base), 2 ('f'), or 3 ('a')
 //
-// For example, sm_103a is encoded as 1033 (103 * 10 + 2 + 1) and sm_103f is
+// For example, sm_103a is encoded as 1033 (103 * 10 + 3) and sm_103f is
 // encoded as 1032 (103 * 10 + 2).
 //
 // This encoding allows simple partial ordering of the architectures.
@@ -80,21 +81,27 @@ class FeaturePTX<int version>:
 // + Compare within the family by comparing FullSMVersion, given both belongs to
 //   the same family.
 // + Detect 'a' variants by checking FullSMVersion & 1.
 
+class Proc<FeatureSM SM>
+    : Processor<SM.Name, NoItineraries, [SM]>;
+
 foreach sm = [20, 21, 30, 32, 35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80,
               86, 87, 88, 89, 90, 100, 101, 103, 110, 120, 121] in {
   // Base SM version (e.g. FullSMVersion for sm_100 is 1000)
   def SM#sm : FeatureSM<""#sm, !mul(sm, 10)>;
+  def : Proc<!cast<FeatureSM>("SM"#sm)>;
 
-  // Family-specific targets which are compatible within same family
-  // (e.g. FullSMVersion for sm_100f is 1002)
-  if !ge(sm, 100) then
+  // Family-specific variants, compatible within same family (e.g. sm_100f = 1002)
+  if !ge(sm, 100) then {
     def SM#sm#f : FeatureSM<""#sm#"f", !add(!mul(sm, 10), 2)>;
+    def : Proc<!cast<FeatureSM>("SM"#sm#"f")>;
+  }
 
-  // Architecture-specific targets which are incompatible across architectures
-  // (e.g. FullSMVersion for sm_100a is 1003)
-  if !ge(sm, 90) then
+  // Architecture-specific variants, incompatible across architectures (e.g. sm_100a = 1003)
+  if !ge(sm, 90) then {
    def SM#sm#a : FeatureSM<""#sm#"a", !add(!mul(sm, 10), 3)>;
+    def : Proc<!cast<FeatureSM>("SM"#sm#"a")>;
+  }
 }
 
 foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65, 70, 71, 72,
@@ -102,55 +109,6 @@ foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65, 70, 71, 72,
                    90] in
   def PTX#version : FeaturePTX<version>;
 
-//===----------------------------------------------------------------------===//
-// NVPTX supported processors.
-//===----------------------------------------------------------------------===//
-
-class Proc<string Name, list<SubtargetFeature> Features>
- : Processor<Name, NoItineraries, Features>;
-
-def : Proc<"sm_20", [SM20, PTX32]>;
-def : Proc<"sm_21", [SM21, PTX32]>;
-def : Proc<"sm_30", [SM30]>;
-def : Proc<"sm_32", [SM32, PTX40]>;
-def : Proc<"sm_35", [SM35, PTX32]>;
-def : Proc<"sm_37", [SM37, PTX41]>;
-def : Proc<"sm_50", [SM50, PTX40]>;
-def : Proc<"sm_52", [SM52, PTX41]>;
-def : Proc<"sm_53", [SM53, PTX42]>;
-def : Proc<"sm_60", [SM60, PTX50]>;
-def : Proc<"sm_61", [SM61, PTX50]>;
-def : Proc<"sm_62", [SM62, PTX50]>;
-def : Proc<"sm_70", [SM70, PTX60]>;
-def : Proc<"sm_72", [SM72, PTX61]>;
-def : Proc<"sm_75", [SM75, PTX63]>;
-def : Proc<"sm_80", [SM80, PTX70]>;
-def : Proc<"sm_86", [SM86, PTX71]>;
-def : Proc<"sm_87", [SM87, PTX74]>;
-def : Proc<"sm_88", [SM88, PTX90]>;
-def : Proc<"sm_89", [SM89, PTX78]>;
-def : Proc<"sm_90", [SM90, PTX78]>;
-def : Proc<"sm_90a", [SM90a, PTX80]>;
-def : Proc<"sm_100", [SM100, PTX86]>;
-def : Proc<"sm_100a", [SM100a, PTX86]>;
-def : Proc<"sm_100f", [SM100f, PTX88]>;
-def : Proc<"sm_101", [SM101, PTX86]>;
-def : Proc<"sm_101a", [SM101a, PTX86]>;
-def : Proc<"sm_101f", [SM101f, PTX88]>;
-def : Proc<"sm_103", [SM103, PTX88]>;
-def : Proc<"sm_103a", [SM103a, PTX88]>;
-def : Proc<"sm_103f", [SM103f, PTX88]>;
-def : Proc<"sm_110", [SM110, PTX90]>;
-def : Proc<"sm_110a", [SM110a, PTX90]>;
-def : Proc<"sm_110f", [SM110f, PTX90]>;
-def : Proc<"sm_120", [SM120, PTX87]>;
-def : Proc<"sm_120a", [SM120a, PTX87]>;
-def : Proc<"sm_120f", [SM120f, PTX88]>;
-def : Proc<"sm_121", [SM121, PTX88]>;
-def : Proc<"sm_121a", [SM121a, PTX88]>;
-def : Proc<"sm_121f", [SM121f, PTX88]>;
-
-
 def Is64Bit : Predicate<"Subtarget->getTargetTriple().getArch() == Triple::nvptx64">;
 def NVPTX64 : HwMode<[Is64Bit]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXSubtarget.cpp b/llvm/lib/Target/NVPTX/NVPTXSubtarget.cpp
index 989be50d45554..bf3c78d3606bf 100644
--- a/llvm/lib/Target/NVPTX/NVPTXSubtarget.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXSubtarget.cpp
@@ -35,9 +35,87 @@ static cl::opt<bool> NoF32x2("nvptx-no-f32x2", cl::Hidden,
                                      "f32x2 instructions and registers."),
                             cl::init(false));
 
+// FullSmVersion encoding helpers: SM * 10 + suffix offset
+// (0 = base, 2 = 'f', 3 = 'a').
+static constexpr unsigned SM(unsigned Version) { return Version * 10; }
+static constexpr unsigned SMF(unsigned Version) { return SM(Version) + 2; }
+static constexpr unsigned SMA(unsigned Version) { return SM(Version) + 3; }
+
 // Pin the vtable to this file.
 void NVPTXSubtarget::anchor() {}
 
+// Returns the minimum PTX version required for a given SM target.
+// This must be kept in sync with the "Supported Targets" column of the
+// "PTX Release History" table in the PTX ISA documentation:
+// https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history
+//
+// Note: LLVM's minimum supported PTX version is 3.2 (see FeaturePTX in
+// NVPTX.td), so older SMs that supported earlier PTX versions instead use 3.2
+// as their effective minimum.
+static unsigned getMinPTXVersionForSM(unsigned FullSmVersion) {
+  switch (FullSmVersion) {
+  case SM(20):
+  case SM(21):
+  case SM(30):
+  case SM(35):
+    return 32;
+  case SM(32):
+  case SM(50):
+    return 40;
+  case SM(37):
+  case SM(52):
+    return 41;
+  case SM(53):
+    return 42;
+  case SM(60):
+  case SM(61):
+  case SM(62):
+    return 50;
+  case SM(70):
+    return 60;
+  case SM(72):
+    return 61;
+  case SM(75):
+    return 63;
+  case SM(80):
+    return 70;
+  case SM(86):
+    return 71;
+  case SM(87):
+    return 74;
+  case SM(89):
+  case SM(90):
+    return 78;
+  case SMA(90):
+    return 80;
+  case SM(100):
+  case SMA(100):
+  case SM(101):
+  case SMA(101):
+    return 86;
+  case SM(120):
+  case SMA(120):
+    return 87;
+  case SMF(100):
+  case SMF(101):
+  case SM(103):
+  case SMF(103):
+  case SMA(103):
+  case SMF(120):
+  case SM(121):
+  case SMF(121):
+  case SMA(121):
+    return 88;
+  case SM(88):
+  case SM(110):
+  case SMF(110):
+  case SMA(110):
+    return 90;
+  default:
+    llvm_unreachable("Unknown SM version");
+  }
+}
+
 NVPTXSubtarget &NVPTXSubtarget::initializeSubtargetDependencies(StringRef CPU,
                                                                 StringRef FS) {
   TargetName = std::string(CPU);
@@ -49,9 +127,20 @@ NVPTXSubtarget &NVPTXSubtarget::initializeSubtargetDependencies(StringRef CPU,
   // sm_90a, which would *not* be a subset of sm_91.
   SmVersion = getSmVersion();
 
-  // Set default to PTX 6.0 (CUDA 9.0)
+  unsigned MinPTX = getMinPTXVersionForSM(FullSmVersion);
+
   if (PTXVersion == 0) {
-    PTXVersion = 60;
+    // User didn't request a specific PTX version; use the minimum for this SM.
+    PTXVersion = MinPTX;
+  } else if (PTXVersion < MinPTX) {
+    // User explicitly requested an insufficient PTX version.
+    reportFatalUsageError(
+        formatv("PTX version {0}.{1} does not support target '{2}'. "
+                "Minimum required PTX version is {3}.{4}. "
+                "Either remove the PTX version to use the default, "
+                "or increase it to at least {3}.{4}.",
+                PTXVersion / 10, PTXVersion % 10, getTargetName(), MinPTX / 10,
+                MinPTX % 10));
   }
 
   return *this;
diff --git a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
index aeface20f07f3..01f1680e9c63d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+++ b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
@@ -29,6 +29,10 @@
 
 namespace llvm {
 
+// FullSmVersion encoding: SM * 10 + ArchSuffixOffset
+// ArchSuffixOffset: 0 (base), 2 ('f'), 3 ('a')
+// e.g. sm_100 -> 1000, sm_100f -> 1002, sm_100a -> 1003
+
 class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
   virtual void anchor();
   std::string TargetName;
@@ -36,8 +40,9 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
   // PTX version x.y is represented as 10*x+y, e.g. 3.1 == 31
   unsigned PTXVersion;
 
-  // Full SM version x.y is represented as 100*x+10*y+feature, e.g. 3.1 == 310
-  //                                                                sm_90a == 901
+  // FullSmVersion encoding: SM * 10 + ArchSuffixOffset
+  // ArchSuffixOffset: 0 (base), 2 ('f'), 3 ('a')
+  // e.g. sm_30 -> 300, sm_90a -> 903, sm_100f -> 1002
   unsigned int FullSmVersion;
 
   // SM version x.y is represented as 10*x+y, e.g. 3.1 == 31. Derived from
diff --git a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
index 9e6beda9b64aa..c115cc546df28 100644
--- a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
+++ b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicas... [truncated]
``````````

https://github.com/llvm/llvm-project/pull/180116

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
