[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-15 Thread Jun Wang via cfe-commits

https://github.com/jwanggit86 created 
https://github.com/llvm/llvm-project/pull/75647

A new function attribute named amdgpu-num-work-groups is added. This attribute 
allows programmers to let the compiler know the number of workgroups to be 
launched and do optimizations based on that information.
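As a rough illustration of the plumbing this patch sets up, here is a minimal C++ model. A clang attribute is lowered to the `amdgpu-num-work-groups` IR string attribute, which the backend then consumes. This is a hypothetical sketch, not clang or LLVM code. `FnAttrs` stands in for an IR function's string attributes, and `emitNumWorkGroups` and `readNumWorkGroups` are illustrative names. The emit side follows the CodeGen hunk in the patch, which emits nothing for 0. The parsing rules on the read side are assumptions.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for an IR function's string attributes.
using FnAttrs = std::map<std::string, std::string>;

// Mirrors the CodeGen hunk: a zero argument means "unspecified",
// so no attribute is emitted in that case.
void emitNumWorkGroups(FnAttrs &F, uint32_t NumWG) {
  if (NumWG != 0)
    F["amdgpu-num-work-groups"] = std::to_string(NumWG);
}

// Assumed backend-side reader: yields a value only for a present,
// purely decimal, nonzero string that fits in 32 bits.
std::optional<uint32_t> readNumWorkGroups(const FnAttrs &F) {
  auto It = F.find("amdgpu-num-work-groups");
  if (It == F.end())
    return std::nullopt;
  const std::string &S = It->second;
  if (S.empty() || S.size() > 10) // UINT32_MAX has 10 decimal digits
    return std::nullopt;
  uint64_t V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    V = V * 10 + static_cast<uint64_t>(C - '0');
  }
  if (V == 0 || V > UINT32_MAX)
    return std::nullopt;
  return static_cast<uint32_t>(V);
}
```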

From bb15eebae9645e5383f26066093c0734ea76442d Mon Sep 17 00:00:00 2001
From: Jun Wang 
Date: Fri, 15 Dec 2023 13:53:54 -0600
Subject: [PATCH] [AMDGPU] Adding the amdgpu-num-work-groups function attribute

A new function attribute named amdgpu-num-work-groups is added.
This attribute allows programmers to let the compiler know the
number of workgroups to be launched and do optimizations based
on that information.
---
 clang/include/clang/Basic/Attr.td |  7 ++
 clang/include/clang/Basic/AttrDocs.td | 23 ++
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  7 ++
 clang/lib/Sema/SemaDeclAttr.cpp   | 13 +++
 ...a-attribute-supported-attributes-list.test |  1 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  4 +
 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp|  6 ++
 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h  |  3 +
 .../Target/AMDGPU/SIMachineFunctionInfo.cpp   |  1 +
 .../lib/Target/AMDGPU/SIMachineFunctionInfo.h |  9 ++
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 15 
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  8 ++
 .../AMDGPU/attr-amdgpu-num-work-groups.ll | 82 +++
 13 files changed, 179 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll

diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 5943583d92773a..605fcbbff027b9 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2011,6 +2011,13 @@ def AMDGPUNumVGPR : InheritableAttr {
   let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
 }
 
+def AMDGPUNumWorkGroups : InheritableAttr {
+  let Spellings = [Clang<"amdgpu_num_work_groups", 0>];
+  let Args = [UnsignedArgument<"NumWorkGroups">];
+  let Documentation = [AMDGPUNumWorkGroupsDocs];
+  let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
+}
+
 def AMDGPUKernelCall : DeclOrTypeAttr {
   let Spellings = [Clang<"amdgpu_kernel">];
   let Documentation = [Undocumented];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 77950ab6d877ea..0bf3ccf367284c 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -2693,6 +2693,29 @@ An error will be given if:
   }];
 }
 
+def AMDGPUNumWorkGroupsDocs : Documentation {
+  let Category = DocCatAMDGPUAttributes;
+  let Content = [{
+The ``amdgpu_num_work_groups`` attribute specifies the number of work groups
+with which the kernel will be dispatched.
+
+Clang supports the
+``__attribute__((amdgpu_num_work_groups()))`` attribute for the
+AMDGPU target. This attribute may be attached to a kernel function definition
+and is an optimization hint.
+
+The attribute's single unsigned argument specifies the number of work groups.
+
+If specified, the AMDGPU target backend might be able to produce better machine
+code.
+
+An error will be given if:
+  - Specified values violate subtarget specifications;
+  - Specified values are not compatible with values provided through other
+attributes.
+  }];
+}
+
 def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> {
   let Content = [{
 Clang supports several different calling conventions, depending on the target
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 03ac6b78598fc8..11a0835f37f4a9 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -356,6 +356,13 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
     if (NumVGPR != 0)
       F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
   }
+
+  if (const auto *Attr = FD->getAttr<AMDGPUNumWorkGroupsAttr>()) {
+    uint32_t NumWG = Attr->getNumWorkGroups();
+
+    if (NumWG != 0)
+      F->addFnAttr("amdgpu-num-work-groups", llvm::utostr(NumWG));
+  }
 }
 
 /// Emits control constants used to change per-architecture behaviour in the
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 5b29b05dee54b3..3737dd256aff02 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -8051,6 +8051,16 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR));
 }
 
+static void handleAMDGPUNumWorkGroupsAttr(Sema &S, Decl *D,
+                                          const ParsedAttr &AL) {
+  uint32_t NumWG = 0;
+  Expr *NumWGExpr = AL.getArgAsExpr(0);
+  if (!checkUInt32Argument(S, AL, NumWGExpr, NumWG))
+    return;
+
+  D->addAttr(::new (S.Context) AMDGPUNumWorkGroupsAttr(S.Context, AL, NumWG));
+}
+
 static void handleX86ForceAlignArgPointerAttr(Sema &S, Decl *D,
   

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-15 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)


Changes

A new function attribute named amdgpu-num-work-groups is added. This attribute 
allows programmers to let the compiler know the number of workgroups to be 
launched and do optimizations based on that information.

---
Full diff: https://github.com/llvm/llvm-project/pull/75647.diff


13 Files Affected:

- (modified) clang/include/clang/Basic/Attr.td (+7) 
- (modified) clang/include/clang/Basic/AttrDocs.td (+23) 
- (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+7) 
- (modified) clang/lib/Sema/SemaDeclAttr.cpp (+13) 
- (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+4) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h (+3) 
- (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+1) 
- (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+9) 
- (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+15) 
- (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+8) 
- (added) llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-work-groups.ll (+82) 



[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-22 Thread Jun Wang via cfe-commits

https://github.com/jwanggit86 closed 
https://github.com/llvm/llvm-project/pull/75647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-09 Thread Matt Arsenault via cfe-commits

arsenm wrote:

ping @krzysz00 

https://github.com/llvm/llvm-project/pull/75647


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-19 Thread Jun Wang via cfe-commits

jwanggit86 wrote:

Two possible optimizations mentioned by the requester are:
"1. This'll let the backend know the maximum size of the workgroup ID, and so 
we can do things like infer nsw or the ability to use a 16-bit add or so on

2. This could be used to optimize global sync stuff in the future
"
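The first point reduces to bounds arithmetic. The sketch below assumes a single dimension, a known work-group size, and full work groups. `IdFacts` and `deriveFacts` are hypothetical names, not LLVM APIs. The sketch only computes the facts that would justify marking the index arithmetic `nsw` or narrowing the workgroup ID to 16 bits. It does not show how the backend would apply them.

```cpp
#include <cstdint>
#include <limits>

// Illustrative facts a backend could derive from a launch-size bound.
struct IdFacts {
  bool WGIdFitsIn16Bits; // every workgroup ID fits in an unsigned 16-bit value
  bool GlobalIdIsNSW;    // WGId * WGSize + LocalId never exceeds INT32_MAX
};

// Assumes NumWorkGroups >= 1 and WorkGroupSize >= 1 in one dimension.
IdFacts deriveFacts(uint32_t NumWorkGroups, uint32_t WorkGroupSize) {
  // With N groups launched, workgroup IDs lie in [0, N - 1].
  uint64_t MaxWGId = uint64_t(NumWorkGroups) - 1;
  uint64_t MaxLocalId = uint64_t(WorkGroupSize) - 1;
  // Computed in 64 bits, so the bound itself cannot overflow. All terms are
  // non-negative, so bounding the sum also bounds the product.
  uint64_t MaxGlobalId = MaxWGId * WorkGroupSize + MaxLocalId;
  return {MaxWGId <= std::numeric_limits<uint16_t>::max(),
          MaxGlobalId <=
              static_cast<uint64_t>(std::numeric_limits<int32_t>::max())};
}
```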

https://github.com/llvm/llvm-project/pull/75647


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-15 Thread Jun Wang via cfe-commits

jwanggit86 wrote:

@krzysz00 So how do you want to proceed?

https://github.com/llvm/llvm-project/pull/75647


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-15 Thread Krzysztof Drewniak via cfe-commits

krzysz00 wrote:

I'd go with Matt's point: close this, and then add metadata for required launch 
grid sizes. Then you can update `AMDGPULowerKernelAttributes` to use said 
metadata.

https://github.com/llvm/llvm-project/pull/75647


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-16 Thread Alexey Bader via cfe-commits

bader wrote:

> How does this attribute relate to `reqd_work_group_size` and related existing 
> attributes?

They seem to be different/"unrelated". Based on the description of the
`amdgpu-num-work-groups` attribute, it provides "number of work-groups", whereas
`reqd_work_group_size` provides "number of work-items in a work-group".
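To make the distinction concrete, the two attributes bound different factors of the same product. This is a hypothetical sketch. `gridSizeInWorkItems` is an illustrative helper, and uniform work groups are assumed. The per-dimension work-group count is also an assumption, since the patch takes a single unsigned value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Dim3 = std::array<uint64_t, 3>;

// Hypothetical helper: with uniform work groups, the launch grid in
// work-items is the per-dimension product of the work-group count
// and the work-group size.
Dim3 gridSizeInWorkItems(const Dim3 &NumWorkGroups, const Dim3 &WorkGroupSize) {
  Dim3 Grid{};
  for (std::size_t I = 0; I < Grid.size(); ++I)
    Grid[I] = NumWorkGroups[I] * WorkGroupSize[I];
  return Grid;
}
```

Neither factor alone determines the grid size, so `reqd_work_group_size` cannot stand in for a work-group count, or vice versa.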

I think this attribute can be useful for other targets. The optimization ideas
described in
https://github.com/llvm/llvm-project/pull/75647#issuecomment-1863352459 seem
to be generic. There is an RFC to unify some existing functionality exposing
"grid" information:
https://discourse.llvm.org/t/proposing-llvm-gpu-intrinsics/75374. This might
fall into a similar category.

https://github.com/llvm/llvm-project/pull/75647


[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-16 Thread Krzysztof Drewniak via cfe-commits

krzysz00 wrote:

Good to know that other targets have that sort of "how many work groups will be 
launched" information. Having that be a min/max (either per dimension or in 
total or both) may be the right approach here, and this could be a good excuse 
for the unification being talked about.

(This isn't anything I, as the initial proposer of the idea, am needing super 
urgently - this was spawned from "hey, I'm up here in MLIR generating kernels 
that always have N workgroups in the grid, can I be cleverer than just sticking 
`!range` metadata on the intrinsics?")
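For comparison, the `!range` approach amounts to encoding an upper bound on the number of launched groups as a half-open range on the workgroup-ID intrinsic's result (for example, `llvm.amdgcn.workgroup.id.x`). The hypothetical helper `workGroupIdRangeMD` below only builds the metadata node's text under that assumption. Attaching the node to calls is not shown.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical helper: text of a !range node for a workgroup ID when at most
// MaxNumWorkGroups groups are launched in that dimension. !range pairs are
// half-open [Lo, Hi), so the bound is [0, MaxNumWorkGroups).
std::optional<std::string> workGroupIdRangeMD(uint32_t MaxNumWorkGroups) {
  // [0, 0) would be an empty range, which !range metadata does not allow.
  if (MaxNumWorkGroups == 0)
    return std::nullopt;
  // Values above INT32_MAX are rejected to keep the printed i32 literal
  // non-negative; a fuller version would handle them.
  if (MaxNumWorkGroups >
      static_cast<uint32_t>(std::numeric_limits<int32_t>::max()))
    return std::nullopt;
  return "!{i32 0, i32 " + std::to_string(MaxNumWorkGroups) + "}";
}
```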

https://github.com/llvm/llvm-project/pull/75647