[clang] [llvm] AMDGPU: Loop over the types for global_load_tr16 pats (NFC) (PR #99551)

2024-07-18 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/99551
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Add back half and bfloat support for global_load_tr16 pats (PR #99540)

2024-07-18 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/99540


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> /build/buildbot/premerge-monolithic-linux/llvm-project/flang/lib/Frontend/CompilerInstance.cpp:226:44:
>  error: too many arguments to function call, expected 3, have 4

Fixed.

https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Fixed https://github.com/llvm/llvm-project/pull/98231
Sorry.

Stas

From: LLVM Continuous Integration ***@***.***>
Date: Tuesday, July 9, 2024 at 14:37
To: llvm/llvm-project ***@***.***>
Cc: Mekhanoshin, Stanislav ***@***.***>, State change ***@***.***>
Subject: Re: [llvm/llvm-project] [AMDGPU] Report error in clang if wave32 is 
requested where unsupported (PR #97633)


LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang 
running on ppc64le-flang-rhel-test while building clang,llvm at step 5 
"build-unified-tree".

Full details are available at: 
https://lab.llvm.org/buildbot/#/builders/157/builds/2114

Here is the relevant piece of the build log for the reference:

Step 5 (build-unified-tree) failure: build (failure)

...

53.159 [18/8/6429] Creating library symlink lib/libLTO.so

53.668 [18/7/6430] Linking CXX executable bin/clang-import-test

53.749 [18/6/6431] Linking CXX executable bin/c-index-test

53.847 [18/5/6432] Linking CXX executable bin/clang-scan-deps

54.324 [18/4/6433] Linking CXX executable bin/clang-repl

54.489 [18/3/6434] Linking CXX executable bin/clang-19

54.503 [17/3/6435] Linking CXX shared library lib/libclang-cpp.so.19.0git

54.504 [16/3/6436] Creating executable symlink bin/clang

54.508 [16/2/6437] Creating library symlink lib/libclang-cpp.so

59.345 [16/1/6438] Building CXX object 
tools/flang/lib/Frontend/CMakeFiles/obj.flangFrontend.dir/CompilerInstance.cpp.o

FAILED: 
tools/flang/lib/Frontend/CMakeFiles/obj.flangFrontend.dir/CompilerInstance.cpp.o

ccache /home/buildbots/llvm-external-buildbots/clang.16.0.1/bin/clang++ 
-DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG 
-D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS 
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/tools/flang/lib/Frontend
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/lib/Frontend
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/tools/flang/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/include
 
-I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include
 -isystem 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/../mlir/include
 -isystem 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/tools/mlir/include
 -isystem 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/tools/clang/include
 -isystem 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/../clang/include
 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden 
-Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers 
-pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color 
-ffunction-sections -fdata-sections -Werror -Wno-deprecated-copy 
-Wno-string-conversion -Wno-ctad-maybe-unsupported 
-Wno-unused-command-line-argument -Wstring-conversion   
-Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  
-fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT 
tools/flang/lib/Frontend/CMakeFiles/obj.flangFrontend.dir/CompilerInstance.cpp.o
 -MF 
tools/flang/lib/Frontend/CMakeFiles/obj.flangFrontend.dir/CompilerInstance.cpp.o.d
 -o 
tools/flang/lib/Frontend/CMakeFiles/obj.flangFrontend.dir/CompilerInstance.cpp.o
 -c 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/lib/Frontend/CompilerInstance.cpp

/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/lib/Frontend/CompilerInstance.cpp:226:44:
 error: too many arguments to function call, expected 3, have 4

   errorMsg)) {

   ^~~~


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/97633

From dc9d1e2039981bb412e68975570d9911511bb880 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 3 Jul 2024 13:12:21 -0700
Subject: [PATCH 1/3] [AMDGPU] Report error in clang if wave32 is requested
 where unsupported

---
 clang/lib/Basic/Targets/AMDGPU.cpp  |  8 ++--
 clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl |  2 ++
 .../test/SemaOpenCL/builtins-amdgcn-error-wave32.cl |  3 +--
 llvm/include/llvm/TargetParser/TargetParser.h   |  3 ++-
 llvm/lib/TargetParser/TargetParser.cpp  | 13 +
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index cc7be64656e5b..ea20acdb930fa 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,
+ IsCombinationError)) {
+Diags.Report(IsCombinationError ? diag::err_invalid_feature_combination
+: diag::err_opt_not_valid_on_target)
+<< ErrorMsg;
 return false;
   }
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
index 7dbf5c3c6cd59..4e2f7f86e8402 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
@@ -1,6 +1,8 @@
 // RUN: not %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 
-target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | FileCheck %s
 // RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx1103 -target-feature 
+wavefrontsize32 -target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | 
FileCheck %s
+// RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx900 -target-feature 
+wavefrontsize32 -o /dev/null %s 2>&1 | FileCheck %s --check-prefix=GFX9
 
 // CHECK: error: invalid feature combination: 'wavefrontsize32' and 
'wavefrontsize64' are mutually exclusive
+// GFX9: error: option 'wavefrontsize32' cannot be specified on this target
 
 kernel void test() {}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
index 52f31c1ff0575..e0e3872b566d9 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
@@ -12,8 +12,7 @@ void test_ballot_wave32(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);  // expected-error 
{{'__builtin_amdgcn_ballot_w32' needs target feature wavefrontsize32}}
 }
 
-// FIXME: Should error for subtargets that don't support wave32
-__attribute__((target("wavefrontsize32")))
+__attribute__((target("wavefrontsize32"))) // gfx9-error@*:* {{option 
'wavefrontsize32' cannot be specified on this target}}
 void test_ballot_wave32_target_attr(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);
 }
diff --git a/llvm/include/llvm/TargetParser/TargetParser.h 
b/llvm/include/llvm/TargetParser/TargetParser.h
index e03d8f6eebfca..858a1fdc01b37 100644
--- a/llvm/include/llvm/TargetParser/TargetParser.h
+++ b/llvm/include/llvm/TargetParser/TargetParser.h
@@ -178,7 +178,8 @@ void fillAMDGPUFeatureMap(StringRef GPU, const Triple ,
 
 /// Inserts wave size feature for given GPU into features map
 bool insertWaveSizeFeature(StringRef GPU, const Triple ,
-   StringMap , std::string );
+   StringMap , std::string ,
+   bool );
 
 } // namespace AMDGPU
 } // namespace llvm
diff --git a/llvm/lib/TargetParser/TargetParser.cpp 
b/llvm/lib/TargetParser/TargetParser.cpp
index 00df92e0aaded..4bcd966183c67 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -618,15 +618,20 @@ static bool isWave32Capable(StringRef GPU, const Triple 
) {
 
 bool AMDGPU::insertWaveSizeFeature(StringRef GPU, const Triple ,
StringMap ,
-   std::string ) {
+   std::string ,
+   bool ) {
   bool IsWave32Capable = isWave32Capable(GPU, T);
   const bool IsNullGPU = GPU.empty();
-  // FIXME: Not diagnosing wavefrontsize32 on wave64 only targets.
-  const bool HaveWave32 =
-  (IsWave32Capable || IsNullGPU) && Features.count("wavefrontsize32");
+  const bool HaveWave32 = Features.count("wavefrontsize32");
   const bool HaveWave64 = Features.count("wavefrontsize64");
   if (HaveWave32 && HaveWave64) {

[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits


@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,

rampitec wrote:

Changed. Do you like it better?

https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-09 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/97633

From dc9d1e2039981bb412e68975570d9911511bb880 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 3 Jul 2024 13:12:21 -0700
Subject: [PATCH 1/2] [AMDGPU] Report error in clang if wave32 is requested
 where unsupported

---
 clang/lib/Basic/Targets/AMDGPU.cpp  |  8 ++--
 clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl |  2 ++
 .../test/SemaOpenCL/builtins-amdgcn-error-wave32.cl |  3 +--
 llvm/include/llvm/TargetParser/TargetParser.h   |  3 ++-
 llvm/lib/TargetParser/TargetParser.cpp  | 13 +
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index cc7be64656e5b..ea20acdb930fa 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,
+ IsCombinationError)) {
+Diags.Report(IsCombinationError ? diag::err_invalid_feature_combination
+: diag::err_opt_not_valid_on_target)
+<< ErrorMsg;
 return false;
   }
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
index 7dbf5c3c6cd59..4e2f7f86e8402 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
@@ -1,6 +1,8 @@
 // RUN: not %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 
-target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | FileCheck %s
 // RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx1103 -target-feature 
+wavefrontsize32 -target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | 
FileCheck %s
+// RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx900 -target-feature 
+wavefrontsize32 -o /dev/null %s 2>&1 | FileCheck %s --check-prefix=GFX9
 
 // CHECK: error: invalid feature combination: 'wavefrontsize32' and 
'wavefrontsize64' are mutually exclusive
+// GFX9: error: option 'wavefrontsize32' cannot be specified on this target
 
 kernel void test() {}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
index 52f31c1ff0575..e0e3872b566d9 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
@@ -12,8 +12,7 @@ void test_ballot_wave32(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);  // expected-error 
{{'__builtin_amdgcn_ballot_w32' needs target feature wavefrontsize32}}
 }
 
-// FIXME: Should error for subtargets that don't support wave32
-__attribute__((target("wavefrontsize32")))
+__attribute__((target("wavefrontsize32"))) // gfx9-error@*:* {{option 
'wavefrontsize32' cannot be specified on this target}}
 void test_ballot_wave32_target_attr(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);
 }
diff --git a/llvm/include/llvm/TargetParser/TargetParser.h 
b/llvm/include/llvm/TargetParser/TargetParser.h
index e03d8f6eebfca..858a1fdc01b37 100644
--- a/llvm/include/llvm/TargetParser/TargetParser.h
+++ b/llvm/include/llvm/TargetParser/TargetParser.h
@@ -178,7 +178,8 @@ void fillAMDGPUFeatureMap(StringRef GPU, const Triple ,
 
 /// Inserts wave size feature for given GPU into features map
 bool insertWaveSizeFeature(StringRef GPU, const Triple ,
-   StringMap , std::string );
+   StringMap , std::string ,
+   bool );
 
 } // namespace AMDGPU
 } // namespace llvm
diff --git a/llvm/lib/TargetParser/TargetParser.cpp 
b/llvm/lib/TargetParser/TargetParser.cpp
index 00df92e0aaded..4bcd966183c67 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -618,15 +618,20 @@ static bool isWave32Capable(StringRef GPU, const Triple 
) {
 
 bool AMDGPU::insertWaveSizeFeature(StringRef GPU, const Triple ,
StringMap ,
-   std::string ) {
+   std::string ,
+   bool ) {
   bool IsWave32Capable = isWave32Capable(GPU, T);
   const bool IsNullGPU = GPU.empty();
-  // FIXME: Not diagnosing wavefrontsize32 on wave64 only targets.
-  const bool HaveWave32 =
-  (IsWave32Capable || IsNullGPU) && Features.count("wavefrontsize32");
+  const bool HaveWave32 = Features.count("wavefrontsize32");
   const bool HaveWave64 = Features.count("wavefrontsize64");
   if (HaveWave32 && HaveWave64) {

[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,

rampitec wrote:

Ugh. Actually, that would require including a clang header in TargetParser.h. 
This is the primary reason the diagnostics were not moved there completely, and 
it is why the TODO comment sits 3 lines above.
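
The layering constraint described above can be sketched as follows. This is a minimal illustration, not the code from the PR: FeatureError, DiagID, and toDiagID are hypothetical names, and the two DiagID enumerators merely stand in for the clang diagnostics named in the patch. The lower-level library reports a neutral error kind, and only the front end translates it into one of its own diagnostic IDs, so the lower layer never needs a front-end header.

```cpp
#include <cassert>

// Hypothetical lower-level layer: knows nothing about front-end diagnostics.
namespace lowlevel {
enum class FeatureError { None, InvalidCombination, UnsupportedOnTarget };
} // namespace lowlevel

// Hypothetical front-end layer: owns the diagnostic IDs.
namespace frontend {
enum class DiagID { ErrInvalidFeatureCombination, ErrOptNotValidOnTarget };

// Translate the neutral error kind into a front-end diagnostic ID.
// Callers must handle FeatureError::None before calling this.
inline DiagID toDiagID(lowlevel::FeatureError E) {
  assert(E != lowlevel::FeatureError::None && "no diagnostic for success");
  return E == lowlevel::FeatureError::InvalidCombination
             ? DiagID::ErrInvalidFeatureCombination
             : DiagID::ErrOptNotValidOnTarget;
}
} // namespace frontend
```

The dependency points one way only: the front end includes the lower-level enum, never the reverse.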

https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-08 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-08 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/97633


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,

rampitec wrote:

I still need to return 2 values: error code and error message. Do you want 
std::optional

https://github.com/llvm/llvm-project/pull/97633
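
One possible shape for returning both values at once is sketched below. It is an assumption for illustration, not the interface the PR settled on, and it does not attempt to reconstruct the truncated suggestion in the message. WaveSizeErrorKind, WaveSizeError, and checkWaveSize are hypothetical names, a plain std::set stands in for the feature map, and the wave32 capability is passed in as a flag rather than derived from a GPU name. The message strings follow the test expectations in the patch.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical error kinds; the names are illustrative only.
enum class WaveSizeErrorKind { InvalidCombination, UnsupportedOnTarget };

// Error kind plus the text to substitute into the diagnostic.
using WaveSizeError = std::pair<WaveSizeErrorKind, std::string>;

// Returns std::nullopt when the requested wave-size features are acceptable,
// otherwise the kind of error together with its message.
inline std::optional<WaveSizeError>
checkWaveSize(bool IsWave32Capable, const std::set<std::string> &Features) {
  const bool HaveWave32 = Features.count("wavefrontsize32") != 0;
  const bool HaveWave64 = Features.count("wavefrontsize64") != 0;
  if (HaveWave32 && HaveWave64)
    return WaveSizeError{WaveSizeErrorKind::InvalidCombination,
                         "'wavefrontsize32' and 'wavefrontsize64' are "
                         "mutually exclusive"};
  if (HaveWave32 && !IsWave32Capable)
    return WaveSizeError{WaveSizeErrorKind::UnsupportedOnTarget,
                         "wavefrontsize32"};
  return std::nullopt;
}
```

A caller would test the optional and, on failure, pick the diagnostic from the first member and stream the second member into it. This removes the need for separate out-parameters for the message and the error kind.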


[clang] [llvm] [AMDGPU] Report error in clang if wave32 is requested where unsupported (PR #97633)

2024-07-03 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/97633

None

From dc9d1e2039981bb412e68975570d9911511bb880 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 3 Jul 2024 13:12:21 -0700
Subject: [PATCH] [AMDGPU] Report error in clang if wave32 is requested where
 unsupported

---
 clang/lib/Basic/Targets/AMDGPU.cpp  |  8 ++--
 clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl |  2 ++
 .../test/SemaOpenCL/builtins-amdgcn-error-wave32.cl |  3 +--
 llvm/include/llvm/TargetParser/TargetParser.h   |  3 ++-
 llvm/lib/TargetParser/TargetParser.cpp  | 13 +
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index cc7be64656e5b2..ea20acdb930fae 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -188,8 +188,12 @@ bool AMDGPUTargetInfo::initFeatureMap(
 
   // TODO: Should move this logic into TargetParser
   std::string ErrorMsg;
-  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg)) {
-Diags.Report(diag::err_invalid_feature_combination) << ErrorMsg;
+  bool IsCombinationError;
+  if (!insertWaveSizeFeature(CPU, getTriple(), Features, ErrorMsg,
+ IsCombinationError)) {
+Diags.Report(IsCombinationError ? diag::err_invalid_feature_combination
+: diag::err_opt_not_valid_on_target)
+<< ErrorMsg;
 return false;
   }
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
index 7dbf5c3c6cd596..4e2f7f86e84022 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features-illegal.cl
@@ -1,6 +1,8 @@
 // RUN: not %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 
-target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | FileCheck %s
 // RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx1103 -target-feature 
+wavefrontsize32 -target-feature +wavefrontsize64 -o /dev/null %s 2>&1 | 
FileCheck %s
+// RUN: not %clang_cc1 -triple amdgcn -target-cpu gfx900 -target-feature 
+wavefrontsize32 -o /dev/null %s 2>&1 | FileCheck %s --check-prefix=GFX9
 
 // CHECK: error: invalid feature combination: 'wavefrontsize32' and 
'wavefrontsize64' are mutually exclusive
+// GFX9: error: option 'wavefrontsize32' cannot be specified on this target
 
 kernel void test() {}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
index 52f31c1ff05759..e0e3872b566d9e 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave32.cl
@@ -12,8 +12,7 @@ void test_ballot_wave32(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);  // expected-error 
{{'__builtin_amdgcn_ballot_w32' needs target feature wavefrontsize32}}
 }
 
-// FIXME: Should error for subtargets that don't support wave32
-__attribute__((target("wavefrontsize32")))
+__attribute__((target("wavefrontsize32"))) // gfx9-error@*:* {{option 
'wavefrontsize32' cannot be specified on this target}}
 void test_ballot_wave32_target_attr(global uint* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w32(a == b);
 }
diff --git a/llvm/include/llvm/TargetParser/TargetParser.h 
b/llvm/include/llvm/TargetParser/TargetParser.h
index e03d8f6eebfca3..858a1fdc01b371 100644
--- a/llvm/include/llvm/TargetParser/TargetParser.h
+++ b/llvm/include/llvm/TargetParser/TargetParser.h
@@ -178,7 +178,8 @@ void fillAMDGPUFeatureMap(StringRef GPU, const Triple ,
 
 /// Inserts wave size feature for given GPU into features map
 bool insertWaveSizeFeature(StringRef GPU, const Triple ,
-   StringMap , std::string );
+   StringMap , std::string ,
+   bool );
 
 } // namespace AMDGPU
 } // namespace llvm
diff --git a/llvm/lib/TargetParser/TargetParser.cpp 
b/llvm/lib/TargetParser/TargetParser.cpp
index 00df92e0aadeda..4bcd966183c678 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -618,15 +618,20 @@ static bool isWave32Capable(StringRef GPU, const Triple 
) {
 
 bool AMDGPU::insertWaveSizeFeature(StringRef GPU, const Triple ,
StringMap ,
-   std::string ) {
+   std::string ,
+   bool ) {
   bool IsWave32Capable = isWave32Capable(GPU, T);
   const bool IsNullGPU = GPU.empty();
-  // FIXME: Not diagnosing wavefrontsize32 on wave64 only targets.
-  const bool HaveWave32 =
-  (IsWave32Capable || IsNullGPU) && Features.count("wavefrontsize32");
+  const bool HaveWave32 = Features.count("wavefrontsize32");
   const bool HaveWave64 = Features.count("wavefrontsize64");
   if (HaveWave32 && 

[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-05 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1534,6 +1534,12 @@ def FeatureISAVersion11_5_1 : FeatureSet<
  FeatureVGPRSingleUseHintInsts,
  Feature1_5xVGPRs])>;
 
+def FeatureISAVersion11_5_2 : FeatureSet<

rampitec wrote:

Then I defer review to Jay.

https://github.com/llvm/llvm-project/pull/94534


[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-05 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1534,6 +1534,12 @@ def FeatureISAVersion11_5_1 : FeatureSet<
  FeatureVGPRSingleUseHintInsts,
  Feature1_5xVGPRs])>;
 
+def FeatureISAVersion11_5_2 : FeatureSet<

rampitec wrote:

I don't know, but if they are, I have a question: why is a new target needed?

https://github.com/llvm/llvm-project/pull/94534


[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-05 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1534,6 +1534,12 @@ def FeatureISAVersion11_5_1 : FeatureSet<
  FeatureVGPRSingleUseHintInsts,
  Feature1_5xVGPRs])>;
 
+def FeatureISAVersion11_5_2 : FeatureSet<

rampitec wrote:

Looks the same as 1150?

https://github.com/llvm/llvm-project/pull/94534


[clang] [llvm] [AMDGPU][Clang] Builtin for GLOBAL_LOAD_LDS on GFX940 (PR #92962)

2024-05-21 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/92962


[clang] [llvm] [AMDGPU] Clang builtin for GLOBAL_LOAD_LDS on GFX940 (PR #92962)

2024-05-21 Thread Stanislav Mekhanoshin via cfe-commits


@@ -2466,23 +2466,20 @@ def int_amdgcn_perm :
 // GFX9 Intrinsics
 
//===--===//
 
-class AMDGPUGlobalLoadLDS : Intrinsic <
-  [],
-  [LLVMQualPointerType<1>, // Base global pointer to load from
-   LLVMQualPointerType<3>, // LDS base pointer to store to
-   llvm_i32_ty,// Data byte size: 1/2/4
-   llvm_i32_ty,// imm offset (applied to both global 
and LDS address)
-   llvm_i32_ty],   // auxiliary data (imm, cachepolicy 
(bit 0 = glc/sc0,
-   //   
bit 1 = slc/sc1,
-   //   
bit 2 = dlc on gfx10/gfx11))
-   //   
bit 4 = scc/nt on gfx90a+))
-   //  gfx12+:
-   //  cachepolicy 
(bits [0-2] = th,
-   //   
bits [3-4] = scope)
-   //  swizzled buffer 
(bit 6 = swz),
-  [IntrWillReturn, NoCapture>, NoCapture>,
-   ImmArg>, ImmArg>, ImmArg>, 
IntrNoCallback, IntrNoFree],
-  "", [SDNPMemOperand]>;
+class AMDGPUGlobalLoadLDS :
+  ClangBuiltin<"__builtin_amdgcn_global_load_lds">,
+  Intrinsic <
+[],
+[LLVMQualPointerType<1>,// Base global pointer to load from
+ LLVMQualPointerType<3>,// LDS base pointer to store to
+ llvm_i32_ty,   // Data byte size: 1/2/4 (/12/16 for 
gfx950)
+ llvm_i32_ty,   // imm offset (applied to both global 
and LDS address)
+ llvm_i32_ty],  // auxiliary data (imm, cachepolicy 
(bit 0 = glc/sc0,
+//   
bit 1 = slc/sc1,
+//   
bit 4 = scc/nt on gfx90a+))

rampitec wrote:

Just sc0, sc1 and scc. It does not exist on gfx90a.
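
As a small illustration of the bit layout under discussion, the following sketch encodes the three cache-policy bits named in this comment. The constant and function names are hypothetical; the bit positions are taken from the comment in the quoted diff (bit 0 = sc0, bit 1 = sc1, bit 4 = scc).

```cpp
#include <cstdint>

// Bit positions as listed in the quoted intrinsic comment.
constexpr std::uint32_t AuxSC0 = 1u << 0;
constexpr std::uint32_t AuxSC1 = 1u << 1;
constexpr std::uint32_t AuxSCC = 1u << 4;

// Pack the three flags into the immediate auxiliary operand.
constexpr std::uint32_t makeAux(bool SC0, bool SC1, bool SCC) {
  return (SC0 ? AuxSC0 : 0u) | (SC1 ? AuxSC1 : 0u) | (SCC ? AuxSCC : 0u);
}

// sc0 and scc together set bits 0 and 4.
static_assert(makeAux(true, false, true) == 0x11u, "sc0 | scc");
```

Because the operand is an immediate, a constexpr helper of this kind keeps the value a compile-time constant.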

https://github.com/llvm/llvm-project/pull/92962


[clang] [llvm] [AMDGPU] Clang builtin for GLOBAL_LOAD_LDS on GFX940 (PR #92962)

2024-05-21 Thread Stanislav Mekhanoshin via cfe-commits


@@ -2466,23 +2466,24 @@ def int_amdgcn_perm :
 // GFX9 Intrinsics
 
//===--===//
 
-class AMDGPUGlobalLoadLDS : Intrinsic <
-  [],
-  [LLVMQualPointerType<1>, // Base global pointer to load from
-   LLVMQualPointerType<3>, // LDS base pointer to store to
-   llvm_i32_ty,// Data byte size: 1/2/4
-   llvm_i32_ty,// imm offset (applied to both global 
and LDS address)
-   llvm_i32_ty],   // auxiliary data (imm, cachepolicy 
(bit 0 = glc/sc0,
-   //   
bit 1 = slc/sc1,
-   //   
bit 2 = dlc on gfx10/gfx11))
-   //   
bit 4 = scc/nt on gfx90a+))
-   //  gfx12+:
-   //  cachepolicy 
(bits [0-2] = th,
-   //   
bits [3-4] = scope)
-   //  swizzled buffer 
(bit 6 = swz),
-  [IntrWillReturn, NoCapture>, NoCapture>,
-   ImmArg>, ImmArg>, ImmArg>, 
IntrNoCallback, IntrNoFree],
-  "", [SDNPMemOperand]>;
+class AMDGPUGlobalLoadLDS :
+  ClangBuiltin<"__builtin_amdgcn_global_load_lds">,
+  Intrinsic <
+[],
+[LLVMQualPointerType<1>,// Base global pointer to load from
+ LLVMQualPointerType<3>,// LDS base pointer to store to
+ llvm_i32_ty,   // Data byte size: 1/2/4 (/12/16 for 
gfx950)
+ llvm_i32_ty,   // imm offset (applied to both global 
and LDS address)
+ llvm_i32_ty],  // auxiliary data (imm, cachepolicy 
(bit 0 = glc/sc0,

rampitec wrote:

Keep description of only sc0, sc1, and scc? It is not supported except on 
gfx940 anyway.

https://github.com/llvm/llvm-project/pull/92962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add Clang builtins for amdgcn s_ttrace intrinsics (PR #88076)

2024-04-11 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/88076


[clang] [llvm] [AMDGPU] Add Clang builtins for amdgcn s_ttrace intrinsics (PR #88076)

2024-04-11 Thread Stanislav Mekhanoshin via cfe-commits


@@ -61,6 +61,8 @@ BUILTIN(__builtin_amdgcn_s_waitcnt, "vIi", "n")
 BUILTIN(__builtin_amdgcn_s_sendmsg, "vIiUi", "n")
 BUILTIN(__builtin_amdgcn_s_sendmsghalt, "vIiUi", "n")
 BUILTIN(__builtin_amdgcn_s_barrier, "v", "n")
+BUILTIN(__builtin_amdgcn_s_ttracedata, "vi", "n")
+BUILTIN(__builtin_amdgcn_s_ttracedata_imm, "vIs", "n")

rampitec wrote:

s_ttracedata_imm is only available since gfx10, so it needs to be a target 
builtin.
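
To illustrate the distinction, here is a minimal sketch of a feature-gated builtin table in the style of the X-macro entries quoted above. It is not clang's implementation: BuiltinInfo, kBuiltins, and isBuiltinAvailable are hypothetical, the sketch handles only a single required feature per entry, and the "gfx10-insts" requirement on the _imm variant is an assumption based on the remark that the instruction exists only since gfx10.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical table entry: an empty RequiredFeature means "always available".
struct BuiltinInfo {
  std::string Name;
  std::string RequiredFeature;
};

// Plain entries carry no feature requirement; target entries name one.
#define BUILTIN(ID, TYPE, ATTRS) {#ID, ""},
#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) {#ID, FEATURE},

inline const std::vector<BuiltinInfo> kBuiltins = {
    BUILTIN(__builtin_amdgcn_s_ttracedata, "vi", "n")
    // Assumed feature string, for illustration only.
    TARGET_BUILTIN(__builtin_amdgcn_s_ttracedata_imm, "vIs", "n", "gfx10-insts")
};

#undef BUILTIN
#undef TARGET_BUILTIN

// Returns true if Name is known and its feature requirement (if any) is met.
inline bool isBuiltinAvailable(const std::string &Name,
                               const std::set<std::string> &TargetFeatures) {
  for (const BuiltinInfo &B : kBuiltins)
    if (B.Name == Name)
      return B.RequiredFeature.empty() ||
             TargetFeatures.count(B.RequiredFeature) != 0;
  return false;
}
```

With a plain entry, the builtin would be accepted on every subtarget; the gated entry lets the front end reject it where the feature is missing.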

https://github.com/llvm/llvm-project/pull/88076


[clang] [llvm] AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (PR #86313)

2024-03-22 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> global_load_re_b64

Typo: global_load_re_b64.

https://github.com/llvm/llvm-project/pull/86313


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> I don't think intrinsics are meant for users. Builtins are the user-facing 
> front. :-)

Depends on who you consider a user. Are folks writing MLIR generators users?

https://github.com/llvm/llvm-project/pull/86202


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> > Do you want to rename intrinsics as well? Because now intrinsic names do 
> > not match builtin names.
> 
> Do we have to match builtins with intrinsics? Renaming intrinsics here means 
> we will have to duplicate the intrinsics.

Is that because of the mangling?

https://github.com/llvm/llvm-project/pull/86202


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits


@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")

rampitec wrote:

There should not be legacy yet.

https://github.com/llvm/llvm-project/pull/86202


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec commented:

Do you want to rename intrinsics as well? Because now intrinsic names do not 
match builtin names.

https://github.com/llvm/llvm-project/pull/86202


[clang] [llvm] AMDGPU: Define a feature for v_dot4_f32_* instructions (PR #84248)

2024-03-06 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.

LGTM, thanks!

https://github.com/llvm/llvm-project/pull/84248


[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : 
SOPK_Pseudo <
   pattern>;
 
 def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> {
+  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {

rampitec wrote:

If it is sign extended, it should work.

https://github.com/llvm/llvm-project/pull/83906


[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : 
SOPK_Pseudo <
   pattern>;
 
 def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> {
+  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {

rampitec wrote:

It is not expected to be negative, the original problem was that we used to 
force users to use negative constants. Now we can accept something like 0xf000 
instead of a negative value.

https://github.com/llvm/llvm-project/pull/83906
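
The point about accepting 0xf000 instead of a negative value can be illustrated with a minimal sketch. It assumes the intrinsic argument is an i32 of which only the low 16 bits are encoded; `encodeSimm16` and `signExtend16` are hypothetical helper names, not LLVM APIs. The values 63489 and -2047 are the ones used by the `test_63489` and `test_minus_2047` tests in the setreg patch.

```cpp
#include <cstdint>

// Hypothetical helper: keep only the 16 bits that end up in the encoding.
constexpr uint16_t encodeSimm16(int32_t Arg) {
  return static_cast<uint16_t>(static_cast<uint32_t>(Arg) & 0xFFFFu);
}

// Hypothetical helper: read the 16-bit field back as a signed value.
constexpr int32_t signExtend16(uint16_t Field) {
  return Field >= 0x8000 ? static_cast<int32_t>(Field) - 0x10000
                         : static_cast<int32_t>(Field);
}

// The positive and the negative spelling select the same 16-bit field.
static_assert(encodeSimm16(63489) == 0xF801, "positive spelling");
static_assert(encodeSimm16(-2047) == 0xF801, "negative spelling");
static_assert(signExtend16(0xF801) == -2047, "sign-extended view");
static_assert(signExtend16(encodeSimm16(0xF000)) == -4096, "0xf000 case");
```

Under these assumptions a caller can write the hardware register field as an unsigned hex constant, and existing code that passes the negative form keeps producing the same encoding.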


[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : 
SOPK_Pseudo <
   pattern>;
 
 def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> {
+  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {

rampitec wrote:

This just reverts my patch https://github.com/llvm/llvm-project/pull/77997 and 
reintroduces the original problem.

https://github.com/llvm/llvm-project/pull/83906


[clang] [llvm] [AMDGPU] Fix operand types for `V_DOT2_F32_BF16` (PR #82044)

2024-02-20 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/82044


[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.

Thanks. There are definitely at least 2 outstanding problems, but it seems 
there are no regressions compared to what we have now. LGTM.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Stanislav Mekhanoshin via cfe-commits


@@ -0,0 +1,8 @@
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -show-encoding %s | FileCheck %s
+
+v_dot2_bf16_bf16 v5, v1, v2, 100.0
+// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 0x42c8 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0xc8,0x42,0x00,0x00]
+
+v_dot2_bf16_bf16 v5, v1, v2, 1.0
+// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 1.0 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xca,0x03]

rampitec wrote:

Wow! Yeah, it's another ticket. Looks like a can of worms.
Add at least 'v_dot2_bf16_bf16 v2, v0, 1.0, v2', this one works.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Stanislav Mekhanoshin via cfe-commits


@@ -0,0 +1,8 @@
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -show-encoding %s | FileCheck %s
+
+v_dot2_bf16_bf16 v5, v1, v2, 100.0
+// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 0x42c8 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0xc8,0x42,0x00,0x00]
+
+v_dot2_bf16_bf16 v5, v1, v2, 1.0
+// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 1.0 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xca,0x03]

rampitec wrote:

Can you add a couple more tests here? The same instruction, but with the 
immediate in place of the v2bf16 operand, so we test that code path too. Both 
an immediate and an inline literal. Here and in disasm.

https://github.com/llvm/llvm-project/pull/80908
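
For readers checking the expected values in the test above, here is a minimal sketch of how a float maps to a bf16 bit pattern. It assumes plain truncation of an IEEE-754 binary32 value to its upper 16 bits, which is exact for the constants used in the test; `bf16BitsFromFloat` is a hypothetical name, not an LLVM function, and a real conversion may round rather than truncate.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: take the upper 16 bits of a binary32 value.
// Exact only when the low 16 mantissa bits are zero, as for 1.0 and 100.0.
inline uint16_t bf16BitsFromFloat(float F) {
  uint32_t Bits;
  static_assert(sizeof(Bits) == sizeof(F), "float is assumed to be 32 bits");
  std::memcpy(&Bits, &F, sizeof(Bits)); // bit copy, no value conversion
  return static_cast<uint16_t>(Bits >> 16);
}
```

With this helper, `bf16BitsFromFloat(100.0f)` gives 0x42C8, the literal in the first CHECK line, and `bf16BitsFromFloat(1.0f)` gives 0x3F80, the pattern that is printed as the inline constant 1.0.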


[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Stanislav Mekhanoshin via cfe-commits


@@ -2652,6 +2652,23 @@ bool isInlinableLiteral32(int32_t Literal, bool 
HasInv2Pi) {
  (Val == 0x3e22f983 && HasInv2Pi);
 }
 
+bool isInlinableLiteralBF16(int16_t Literal, bool HasInv2Pi) {
+  if (!HasInv2Pi)
+    return false;

rampitec wrote:

It does not change the behavior, but generally HasInv2Pi should only matter 
when you compare the value to 0x3E22.

https://github.com/llvm/llvm-project/pull/80908
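
A sketch of the shape suggested in this comment, where `HasInv2Pi` gates only the 1/(2*pi) constant instead of the whole function. This is an illustration and not the code merged in the PR: the name `isInlinableLiteralBF16Sketch` is hypothetical, the bit patterns are the ones listed in `printImmediateBFloat16` from the same PR, and the integer inline range of -16..64 is an assumption about the target.

```cpp
#include <cstdint>

// Hypothetical variant for illustration only.
constexpr bool isInlinableLiteralBF16Sketch(int16_t Literal, bool HasInv2Pi) {
  // Assumed integer inline-constant range.
  if (Literal >= -16 && Literal <= 64)
    return true;

  const uint16_t Val = static_cast<uint16_t>(Literal);
  return Val == 0x3F00 || // 0.5
         Val == 0xBF00 || // -0.5
         Val == 0x3F80 || // 1.0
         Val == 0xBF80 || // -1.0
         Val == 0x4000 || // 2.0
         Val == 0xC000 || // -2.0
         Val == 0x4080 || // 4.0
         Val == 0xC080 || // -4.0
         (Val == 0x3E22 && HasInv2Pi); // 1/(2*pi), gated by the feature
}

static_assert(isInlinableLiteralBF16Sketch(0x3F80, false), "1.0");
static_assert(!isInlinableLiteralBF16Sketch(0x3E22, false), "needs Inv2Pi");
static_assert(isInlinableLiteralBF16Sketch(0x3E22, true), "with Inv2Pi");
```

Whether the early return or the per-constant gate is preferable depends on which subtargets can reach this function; the comment above notes that the behavior is the same either way.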


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-15 Thread Stanislav Mekhanoshin via cfe-commits


@@ -0,0 +1,8 @@
+# RUN: llvm-mc -triple=amdgcn -mcpu=gfx1100 -disassemble -show-encoding < %s | 
FileCheck %s
+# RUN: llvm-mc -triple=amdgcn -mcpu=gfx1200 -disassemble -show-encoding < %s | 
FileCheck %s
+
+# CHECK: v_dot2_bf16_bf16 v5, v1, v2, 0x42c8

rampitec wrote:

Add the encoding to the check lines. Currently it is broken and the encoded 
value differs from the decoded one.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -4185,9 +4185,17 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand 
,
   case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
   case AMDGPU::OPERAND_REG_INLINE_AC_V2FP16:
 return AMDGPU::isInlinableLiteralV2F16(Imm);
+  case AMDGPU::OPERAND_REG_IMM_V2BF16:
+  case AMDGPU::OPERAND_REG_INLINE_C_V2BF16:
+  case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16:
+return AMDGPU::isInlinableLiteralV2BF16(Imm);
+  case AMDGPU::OPERAND_REG_IMM_BF16:
   case AMDGPU::OPERAND_REG_IMM_FP16:
+  case AMDGPU::OPERAND_REG_IMM_BF16_DEFERRED:
   case AMDGPU::OPERAND_REG_IMM_FP16_DEFERRED:
+  case AMDGPU::OPERAND_REG_INLINE_C_BF16:
   case AMDGPU::OPERAND_REG_INLINE_C_FP16:
+  case AMDGPU::OPERAND_REG_INLINE_AC_BF16:

rampitec wrote:

But right in this place you know the actual format. So you can split F16 and 
BF16 code and call different functions.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -2819,11 +2819,11 @@ def int_amdgcn_fdot2_f16_f16 :
 def int_amdgcn_fdot2_bf16_bf16 :
   ClangBuiltin<"__builtin_amdgcn_fdot2_bf16_bf16">,
   DefaultAttrsIntrinsic<
-[llvm_i16_ty],   // %r
+[llvm_bfloat_ty],   // %r

rampitec wrote:

clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl fails. You need to 
insert casts to bf16 while lowering it to make it work.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -4185,9 +4185,17 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand 
,
   case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
   case AMDGPU::OPERAND_REG_INLINE_AC_V2FP16:
 return AMDGPU::isInlinableLiteralV2F16(Imm);
+  case AMDGPU::OPERAND_REG_IMM_V2BF16:
+  case AMDGPU::OPERAND_REG_INLINE_C_V2BF16:
+  case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16:
+return AMDGPU::isInlinableLiteralV2BF16(Imm);
+  case AMDGPU::OPERAND_REG_IMM_BF16:
   case AMDGPU::OPERAND_REG_IMM_FP16:
+  case AMDGPU::OPERAND_REG_IMM_BF16_DEFERRED:
   case AMDGPU::OPERAND_REG_IMM_FP16_DEFERRED:
+  case AMDGPU::OPERAND_REG_INLINE_C_BF16:
   case AMDGPU::OPERAND_REG_INLINE_C_FP16:
+  case AMDGPU::OPERAND_REG_INLINE_AC_BF16:

rampitec wrote:

It seems isInlinableLiteral16() cannot handle bf16?

https://github.com/llvm/llvm-project/pull/80908
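
The concern can be made concrete with a small sketch. It assumes an f16-oriented check that compares against the IEEE half-precision bit patterns of the floating-point inline constants; `isInlinableF16BitsSketch` is a hypothetical name, and integer constants and 1/(2*pi) are left out for brevity.

```cpp
#include <cstdint>

// Hypothetical f16-only table: IEEE half bit patterns of +/-0.5, 1, 2, 4.
constexpr bool isInlinableF16BitsSketch(uint16_t Val) {
  return Val == 0x3800 || Val == 0xB800 || // 0.5, -0.5
         Val == 0x3C00 || Val == 0xBC00 || // 1.0, -1.0
         Val == 0x4000 || Val == 0xC000 || // 2.0, -2.0
         Val == 0x4400 || Val == 0xC400;   // 4.0, -4.0
}

// f16 1.0 is recognized, but the bf16 encoding of 1.0 (0x3F80) is not.
static_assert(isInlinableF16BitsSketch(0x3C00), "f16 1.0");
static_assert(!isInlinableF16BitsSketch(0x3F80), "bf16 1.0 is missed");
// The reverse also holds: 0x4400 is 4.0 as f16 but some other value as bf16.
static_assert(isInlinableF16BitsSketch(0x4400), "f16 4.0");
```

Because the two formats assign different bit patterns to most of these constants, a table built for one format gives wrong answers for the other, which is why the operand type has to select the check.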


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -488,6 +488,49 @@ static bool printImmediateFloat16(uint32_t Imm, const 
MCSubtargetInfo &STI,
   return true;
 }
 
+static bool printImmediateBFloat16(uint32_t Imm, const MCSubtargetInfo &STI,
+                                   raw_ostream &O) {
+  if (Imm == 0x3F80)
+    O << "1.0";
+  else if (Imm == 0xBF80)
+    O << "-1.0";
+  else if (Imm == 0x3F00)
+    O << "0.5";
+  else if (Imm == 0xBF00)
+    O << "-0.5";
+  else if (Imm == 0x4000)
+    O << "2.0";
+  else if (Imm == 0xC000)
+    O << "-2.0";
+  else if (Imm == 0x4080)
+    O << "4.0";
+  else if (Imm == 0xC080)
+    O << "-4.0";
+  else if (Imm == 0x3E22 && STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
+    O << "0.15915494";
+  else
+    return false;
+
+  return true;
+}
+
+void AMDGPUInstPrinter::printImmediateBF16(uint32_t Imm,
+                                           const MCSubtargetInfo &STI,
+                                           raw_ostream &O) {
+  int16_t SImm = static_cast<int16_t>(Imm);
+  if (isInlinableIntLiteral(SImm)) {
+    O << SImm;
+    return;
+  }
+
+  uint16_t HImm = static_cast<uint16_t>(Imm);
+  if (printImmediateBFloat16(HImm, STI, O))
+    return;
+
+  uint64_t Imm16 = static_cast<uint16_t>(Imm);

rampitec wrote:

It's the same as HImm above.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1,8 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | 
FileCheck %s --check-prefixes=GFX11,SDAG-GFX11
-; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < 
%s | FileCheck %s --check-prefixes=GFX11,GISEL-GFX11

rampitec wrote:

Change 'RUN' to 'XUN' and add a comment instead.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -0,0 +1,8 @@
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s

rampitec wrote:

You also need a disasm test for this.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-12 Thread Stanislav Mekhanoshin via cfe-commits


@@ -79,17 +79,17 @@ define amdgpu_ps void @test_llvm_amdgcn_fdot2_bf16_bf16_sis(
 ; GFX11:   ; %bb.0: ; %entry
 ; GFX11-NEXT:v_mov_b32_e32 v2, s1
 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x10001, v2
+; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x3f803f80, v2

rampitec wrote:

Well, this is unrelated to the patch itself. We can use inline 1.0 here, but 
then we must use op_sel_hi to produce it in the high half.

https://github.com/llvm/llvm-project/pull/80908
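
To see why the check line changed from 0x10001 to 0x3f803f80, it helps to split the packed 32-bit literal into its two 16-bit halves. The sketch below uses hypothetical helper names (`lo16`, `hi16`, `isSplat16`) and only assumes that a packed v2bf16 operand keeps element 0 in the low half and element 1 in the high half.

```cpp
#include <cstdint>

// Hypothetical helpers for inspecting a packed 2 x 16-bit literal.
constexpr uint16_t lo16(uint32_t Packed) {
  return static_cast<uint16_t>(Packed & 0xFFFFu);
}
constexpr uint16_t hi16(uint32_t Packed) {
  return static_cast<uint16_t>(Packed >> 16);
}
constexpr bool isSplat16(uint32_t Packed) {
  return lo16(Packed) == hi16(Packed);
}

// Old check line: both halves hold 0x0001.
static_assert(lo16(0x10001) == 0x0001 && hi16(0x10001) == 0x0001, "old");
// New check line: both halves hold 0x3F80, the bf16 pattern for 1.0.
static_assert(lo16(0x3f803f80) == 0x3F80 && hi16(0x3f803f80) == 0x3F80, "new");
static_assert(isSplat16(0x3f803f80), "same value in both halves");
```

Since both halves carry the same inlinable value, the 32-bit literal could in principle be replaced by the inline constant 1.0, provided the high half is also sourced from it, which is the op_sel_hi requirement mentioned in the comment.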


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -4181,13 +4181,20 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand 
,
   case AMDGPU::OPERAND_REG_INLINE_C_V2INT16:
   case AMDGPU::OPERAND_REG_INLINE_AC_V2INT16:
 return AMDGPU::isInlinableLiteralV2I16(Imm);
+  case AMDGPU::OPERAND_REG_IMM_V2BF16:

rampitec wrote:

It does not seem isInlinableLiteralV2F16() can handle bf16.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -79,17 +79,17 @@ define amdgpu_ps void @test_llvm_amdgcn_fdot2_bf16_bf16_sis(
 ; GFX11:   ; %bb.0: ; %entry
 ; GFX11-NEXT:v_mov_b32_e32 v2, s1
 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x10001, v2
+; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x3f803f80, v2

rampitec wrote:

This shall be encoded as inline immediate 1.0.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -521,8 +521,11 @@ void AMDGPUInstPrinter::printImmediateV216(uint32_t Imm, 
uint8_t OpType,
 if (printImmediateFloat32(Imm, STI, O))
   return;
 break;
+  case AMDGPU::OPERAND_REG_IMM_V2BF16:
   case AMDGPU::OPERAND_REG_IMM_V2FP16:
+  case AMDGPU::OPERAND_REG_INLINE_C_V2BF16:
   case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
+  case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16:

rampitec wrote:

It does not seem right, and there are no tests for v2bf16 added. I am not sure, 
though, that we have instructions which can accept this type of operand.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1562,8 +1562,9 @@ bool IRTranslator::translateBitCast(const User ,
 
 bool IRTranslator::translateCast(unsigned Opcode, const User ,
  MachineIRBuilder ) {
-  if (U.getType()->getScalarType()->isBFloatTy() ||
-  U.getOperand(0)->getType()->getScalarType()->isBFloatTy())
+  if (Opcode != TargetOpcode::G_BITCAST &&

rampitec wrote:

This is actually an orthogonal problem. Global ISel is completely broken for 
bf16. Whatever the outcome of supporting bf16 in codegen is, we just need to 
be ready for some gisel tests to fail and need to be disabled.

https://github.com/llvm/llvm-project/pull/80908


[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Stanislav Mekhanoshin via cfe-commits


@@ -0,0 +1,8 @@
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -show-encoding %s | FileCheck %s
+
+v_dot2_bf16_bf16 v5, v1, v2, 100.0
+// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 0x42c8 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0xc8,0x42,0x00,0x00]
+
+v_dot2_bf16_bf16 v5, v1, v2, 1.0
+// v_dot2_bf16_bf16 v5, v1, v2, 0x3f80 ; encoding: 
[0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0x80,0x3f,0x00,0x00]

rampitec wrote:

FYI: this shall be inline literal. I.e:
0xd6672005
0x03ca0501

https://github.com/llvm/llvm-project/pull/80908
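
The difference between the literal and the inline form shows up in the SRC2 field of the second instruction dword. The sketch below decodes that field under two assumptions taken from the public AMD ISA documentation rather than from this thread: in VOP3 the second dword holds SRC0 in bits [8:0], SRC1 in bits [17:9] and SRC2 in bits [26:18], and source operand code 255 means "literal constant follows" while 242 means inline 1.0. The function names are hypothetical.

```cpp
#include <cstdint>

// Assemble a little-endian dword from four encoding bytes.
constexpr uint32_t dwordFromBytes(uint8_t B0, uint8_t B1, uint8_t B2,
                                  uint8_t B3) {
  return static_cast<uint32_t>(B0) | (static_cast<uint32_t>(B1) << 8) |
         (static_cast<uint32_t>(B2) << 16) | (static_cast<uint32_t>(B3) << 24);
}

// Assumed VOP3 source fields in the second dword (9 bits each).
constexpr uint32_t src0(uint32_t D1) { return D1 & 0x1FFu; }
constexpr uint32_t src1(uint32_t D1) { return (D1 >> 9) & 0x1FFu; }
constexpr uint32_t src2(uint32_t D1) { return (D1 >> 18) & 0x1FFu; }

// Bytes 4..7 of the literal form quoted in the test: 0x01,0x05,0xfe,0x03.
static_assert(src2(dwordFromBytes(0x01, 0x05, 0xfe, 0x03)) == 255, "literal");
// Second dword of the inline form from the comment: 0x03ca0501.
static_assert(src2(0x03ca0501) == 242, "inline 1.0");
// Both forms keep v1 and v2 (VGPR n is assumed to be encoded as 256 + n).
static_assert(src0(0x03ca0501) == 257 && src1(0x03ca0501) == 258, "v1, v2");
```

Under these assumptions the inline form needs no trailing literal dword, which is why its encoding is 8 bytes instead of 12.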


[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits


@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint 
*gptr, uint val) {
   res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val, 
__ATOMIC_SEQ_CST, "");
 }
 
+// CHECK-LABEL test_wavefrontsize(
+unsigned test_wavefrontsize() {

rampitec wrote:

Ugh, it's inside the body. Unusual, but the test above is the same.

https://github.com/llvm/llvm-project/pull/80741


[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/80741


[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/80741


[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits


@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint 
*gptr, uint val) {
   res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val, 
__ATOMIC_SEQ_CST, "");
 }
 
+// CHECK-LABEL test_wavefrontsize(
+unsigned test_wavefrontsize() {

rampitec wrote:

Missing check for the test.

https://github.com/llvm/llvm-project/pull/80741


[clang-tools-extra] [llvm] [clang] [AMDGPU] GlobalISel for f8 conversions (PR #80503)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/80503


[llvm] [clang] [clang-tools-extra] [AMDGPU] GlobalISel for f8 conversions (PR #80503)

2024-02-05 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/80503

>From b07f5866aa8acf881fbdb15450ecda4dfc8a68e8 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 2 Feb 2024 14:28:00 -0800
Subject: [PATCH 1/2] [AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8

Opsel bits are swapped. Actual byte select table:

Byte  OPSEL
0 0
1 2
2 1
3 3
---
 llvm/lib/Target/AMDGPU/VOP1Instructions.td  | 6 ++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll | 4 ++--
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll | 8 
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
index 920c220fb2c65..58b67b21e274b 100644
--- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
@@ -668,10 +668,8 @@ class Cvt_F32_F8_Pat_OpSel 
index,
 VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat<
 (f32 (node i32:$src, index)),
 !if (index,
- (inst_e64 !if(index{0},
- !if(index{1}, !or(SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1),
-   SRCMODS.OP_SEL_0),
- !if(index{1}, SRCMODS.OP_SEL_1, 0)),
+ (inst_e64 !or(!if(index{0}, SRCMODS.OP_SEL_1, 0),
+   !if(index{1}, SRCMODS.OP_SEL_0, 0)),
 $src, 0),
  (inst_e32 $src))
 >;
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll
index f49fec60892cd..e21d61036375a 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll
@@ -16,7 +16,7 @@ define amdgpu_cs float @test_cvt_f32_bf8_byte1(i32 %a) {
 ; GFX12:   ; %bb.0:
 ; GFX12-NEXT:v_mov_b32_dpp v0, v0 quad_perm:[0,1,2,3] row_mask:0xf 
bank_mask:0xf bound_ctrl:1
 ; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0]
+; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1]
 ; GFX12-NEXT:; return to shader part epilog
   %tmp0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %a, i32 228, i32 15, i32 15, 
i1 1)
   %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %tmp0, i32 1)
@@ -28,7 +28,7 @@ define amdgpu_cs float @test_cvt_f32_bf8_byte2(i32 %a) {
 ; GFX12:   ; %bb.0:
 ; GFX12-NEXT:v_mov_b32_dpp v0, v0 quad_perm:[0,1,2,3] row_mask:0xf 
bank_mask:0xf bound_ctrl:1
 ; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1]
+; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0]
 ; GFX12-NEXT:; return to shader part epilog
   %tmp0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %a, i32 228, i32 15, i32 15, 
i1 1)
   %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %tmp0, i32 2)
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
index 17b1fcf865e94..f915fa8e6cd1c 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
@@ -45,7 +45,7 @@ define float @test_cvt_f32_bf8_byte1(i32 %a) {
 ; GFX12-NEXT:s_wait_samplecnt 0x0
 ; GFX12-NEXT:s_wait_bvhcnt 0x0
 ; GFX12-NEXT:s_wait_kmcnt 0x0
-; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0]
+; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1]
 ; GFX12-NEXT:s_setpc_b64 s[30:31]
   %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %a, i32 1)
   ret float %ret
@@ -65,7 +65,7 @@ define float @test_cvt_f32_bf8_byte2(i32 %a) {
 ; GFX12-NEXT:s_wait_samplecnt 0x0
 ; GFX12-NEXT:s_wait_bvhcnt 0x0
 ; GFX12-NEXT:s_wait_kmcnt 0x0
-; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1]
+; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0]
 ; GFX12-NEXT:s_setpc_b64 s[30:31]
   %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %a, i32 2)
   ret float %ret
@@ -125,7 +125,7 @@ define float @test_cvt_f32_fp8_byte1(i32 %a) {
 ; GFX12-NEXT:s_wait_samplecnt 0x0
 ; GFX12-NEXT:s_wait_bvhcnt 0x0
 ; GFX12-NEXT:s_wait_kmcnt 0x0
-; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[1,0]
+; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[0,1]
 ; GFX12-NEXT:s_setpc_b64 s[30:31]
   %ret = tail call float @llvm.amdgcn.cvt.f32.fp8(i32 %a, i32 1)
   ret float %ret
@@ -145,7 +145,7 @@ define float @test_cvt_f32_fp8_byte2(i32 %a) {
 ; GFX12-NEXT:s_wait_samplecnt 0x0
 ; GFX12-NEXT:s_wait_bvhcnt 0x0
 ; GFX12-NEXT:s_wait_kmcnt 0x0
-; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[0,1]
+; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[1,0]
 ; GFX12-NEXT:s_setpc_b64 s[30:31]
   %ret = tail call float @llvm.amdgcn.cvt.f32.fp8(i32 %a, i32 2)
   ret float %ret
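
The byte select table from the commit message of patch 1/2 amounts to swapping the two bits of the byte index. A minimal sketch, with the hypothetical name `opSelForByte` and the assumption that the result is the 2-bit op_sel value in which bit i corresponds to the i-th entry of the printed `op_sel:[a,b]` list:

```cpp
// Hypothetical helper: swap the two low bits of the byte index.
constexpr unsigned opSelForByte(unsigned Byte) {
  return ((Byte & 1u) << 1) | ((Byte >> 1) & 1u);
}

// Reproduces the table: Byte 0 -> 0, 1 -> 2, 2 -> 1, 3 -> 3.
static_assert(opSelForByte(0) == 0, "byte 0");
static_assert(opSelForByte(1) == 2, "byte 1 prints as op_sel:[0,1]");
static_assert(opSelForByte(2) == 1, "byte 2 prints as op_sel:[1,0]");
static_assert(opSelForByte(3) == 3, "byte 3");
```

This matches the updated check lines in the patch, where byte 1 now expects `op_sel:[0,1]` and byte 2 expects `op_sel:[1,0]`.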

>From 5f211ec3068988ab397d7234e2fc5a61e074bee8 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 2 Feb 2024 14:35:59 -0800
Subject: [PATCH 2/2] [AMDGPU] GlobalISel for f8 conversions

---
 

[libcxx] [flang] [mlir] [llvm] [compiler-rt] [clang-tools-extra] [openmp] [libc] [lldb] [lld] [clang] AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (PR #79218)

2024-01-23 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/79218


[libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/75974


[libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/74537


[libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> > > lgtm, but can still fix the -O0 thing
> > 
> > 
> > But where do I get TM in the getAnalysisUsage?
> 
> MF.getTarget() (or maybe a pass parameter is necessary?)

There is no MF there of course.

https://github.com/llvm/llvm-project/pull/74537


[llvm] [lldb] [compiler-rt] [clang] [lld] [mlir] [libc] [clang-tools-extra] [flang] [AMDGPU] Reapply 'Sign extend simm16 in setreg intrinsic' (PR #78492)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/78492


[clang-tools-extra] [flang] [lld] [libc] [clang] [llvm] [mlir] [compiler-rt] [lldb] [AMDGPU] Reapply 'Sign extend simm16 in setreg intrinsic' (PR #78492)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/78492

>From 01af6c9d8e80b810bbdec35dee38b1cf5d73cfe0 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 12 Jan 2024 15:07:53 -0800
Subject: [PATCH 1/3] [AMDGPU] Sign extend simm16 in setreg intrinsic

We currently force users to use a negative constant in the
intrinsic call. Changing it to zext would break existing programs,
so just sign extend an argument.
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 11 ++--
 .../CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll| 66 +++
 2 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 46fa3d57a21cb2..5b35d4dcac2e4f 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1117,14 +1117,12 @@ def S_GETREG_B32 : SOPK_Pseudo <
 let Defs = [MODE], Uses = [MODE] in {
 
 // FIXME: Need to truncate immediate to 16-bits.
-class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : SOPK_Pseudo <
+class S_SETREG_B32_Pseudo : SOPK_Pseudo <
   "s_setreg_b32",
   (outs), (ins SReg_32:$sdst, hwreg:$simm16),
-  "$simm16, $sdst",
-  pattern>;
+  "$simm16, $sdst">;
 
-def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {
+def S_SETREG_B32 : S_SETREG_B32_Pseudo {
   // Use custom inserter to optimize some cases to
   // S_DENORM_MODE/S_ROUND_MODE/S_SETREG_B32_mode.
   let usesCustomInserter = 1;
@@ -1160,6 +1158,9 @@ def S_SETREG_IMM32_B32_mode : S_SETREG_IMM32_B32_Pseudo {
 
 } // End Defs = [MODE], Uses = [MODE]
 
+def : GCNPat<(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst),
+ (S_SETREG_B32 $sdst, (as_i16timm $simm16))>;
+
 class SOPK_WAITCNT<string opName, list<dag> pat=[]> :
 SOPK_Pseudo<
 opName,
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
index d2c14f2401fc35..99d80b5dd14b33 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
@@ -1433,6 +1433,72 @@ define amdgpu_kernel void 
@test_setreg_set_4_bits_straddles_round_and_denorm() {
   ret void
 }
 
+define amdgpu_ps void @test_63489(i32 inreg %var.mode) {
+; GFX6-LABEL: test_63489:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:;;#ASMSTART
+; GFX6-NEXT:;;#ASMEND
+; GFX6-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_63489:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:;;#ASMSTART
+; GFX789-NEXT:;;#ASMEND
+; GFX789-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_63489:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:;;#ASMSTART
+; GFX10-NEXT:;;#ASMEND
+; GFX10-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_63489:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 63489, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
+define amdgpu_ps void @test_minus_2047(i32 inreg %var.mode) {
+; GFX6-LABEL: test_minus_2047:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:;;#ASMSTART
+; GFX6-NEXT:;;#ASMEND
+; GFX6-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_minus_2047:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:;;#ASMSTART
+; GFX789-NEXT:;;#ASMEND
+; GFX789-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_minus_2047:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:;;#ASMSTART
+; GFX10-NEXT:;;#ASMEND
+; GFX10-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_minus_2047:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 -2047, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
 ; FIXME: Broken for DAG
 ; define void @test_setreg_roundingmode_var_vgpr(i32 %var.mode) {
 ;   call void @llvm.amdgcn.s.setreg(i32 4097, i32 %var.mode)

>From daeef9d3780bcfc9f48a2bf4fff313f3e5575f6b Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 15 Jan 2024 11:21:05 

[clang] [llvm] [clang-tools-extra] [AMDGPU] Reapply 'Sign extend simm16 in setreg intrinsic' (PR #78492)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/78492

We currently force users to use a negative constant in the intrinsic call. 
Changing it to zext would break existing programs, so just sign extend the argument.
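
The two tests added by the patch, test_63489 and test_minus_2047, expect the same encoding. A minimal standalone sketch of why, assuming the pattern keeps only the low 16 bits of the i32 immediate and reads them as a signed value (the helper name `signExtendSimm16` is hypothetical and is not an LLVM API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: keep the low 16 bits of the
// intrinsic's i32 immediate and reinterpret them as a signed 16-bit value.
constexpr int32_t signExtendSimm16(int32_t Imm) {
  const uint32_t Low = static_cast<uint32_t>(Imm) & 0xFFFFu;
  // Subtract 2^16 when bit 15 is set, so no narrowing conversion is needed.
  return (Low & 0x8000u) ? static_cast<int32_t>(Low) - 0x10000
                         : static_cast<int32_t>(Low);
}

// 63489 is 0xF801, and -2047 has the same low 16 bits.
static_assert(signExtendSimm16(63489) == -2047, "unsigned spelling");
static_assert(signExtendSimm16(-2047) == -2047, "negative spelling");
static_assert(signExtendSimm16(2049) == 2049, "bit 15 clear: unchanged");
```

Under this assumption both spellings of the immediate select the same operand value, so existing programs that pass a negative constant keep working.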

>From 01af6c9d8e80b810bbdec35dee38b1cf5d73cfe0 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 12 Jan 2024 15:07:53 -0800
Subject: [PATCH 1/3] [AMDGPU] Sign extend simm16 in setreg intrinsic

We currently force users to use a negative constant in the
intrinsic call. Changing it to zext would break existing programs,
so just sign extend the argument.
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 11 ++--
 .../CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll| 66 +++
 2 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 46fa3d57a21cb2..5b35d4dcac2e4f 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1117,14 +1117,12 @@ def S_GETREG_B32 : SOPK_Pseudo <
 let Defs = [MODE], Uses = [MODE] in {
 
 // FIXME: Need to truncate immediate to 16-bits.
-class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : SOPK_Pseudo <
+class S_SETREG_B32_Pseudo : SOPK_Pseudo <
   "s_setreg_b32",
   (outs), (ins SReg_32:$sdst, hwreg:$simm16),
-  "$simm16, $sdst",
-  pattern>;
+  "$simm16, $sdst">;
 
-def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {
+def S_SETREG_B32 : S_SETREG_B32_Pseudo {
   // Use custom inserter to optimize some cases to
   // S_DENORM_MODE/S_ROUND_MODE/S_SETREG_B32_mode.
   let usesCustomInserter = 1;
@@ -1160,6 +1158,9 @@ def S_SETREG_IMM32_B32_mode : S_SETREG_IMM32_B32_Pseudo {
 
 } // End Defs = [MODE], Uses = [MODE]
 
+def : GCNPat<(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst),
+ (S_SETREG_B32 $sdst, (as_i16timm $simm16))>;
+
 class SOPK_WAITCNT<string opName, list<dag> pat=[]> :
 SOPK_Pseudo<
 opName,
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
index d2c14f2401fc35..99d80b5dd14b33 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
@@ -1433,6 +1433,72 @@ define amdgpu_kernel void 
@test_setreg_set_4_bits_straddles_round_and_denorm() {
   ret void
 }
 
+define amdgpu_ps void @test_63489(i32 inreg %var.mode) {
+; GFX6-LABEL: test_63489:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:;;#ASMSTART
+; GFX6-NEXT:;;#ASMEND
+; GFX6-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_63489:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:;;#ASMSTART
+; GFX789-NEXT:;;#ASMEND
+; GFX789-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_63489:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:;;#ASMSTART
+; GFX10-NEXT:;;#ASMEND
+; GFX10-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_63489:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 63489, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
+define amdgpu_ps void @test_minus_2047(i32 inreg %var.mode) {
+; GFX6-LABEL: test_minus_2047:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:;;#ASMSTART
+; GFX6-NEXT:;;#ASMEND
+; GFX6-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_minus_2047:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:;;#ASMSTART
+; GFX789-NEXT:;;#ASMEND
+; GFX789-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_minus_2047:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:;;#ASMSTART
+; GFX10-NEXT:;;#ASMEND
+; GFX10-NEXT:s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_minus_2047:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: 
[0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 -2047, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
 ; FIXME: Broken for DAG
 ; define void @test_setreg_roundingmode_var_vgpr(i32 %var.mode) {
 ;   call void @llvm.amdgcn.s.setreg(i32 

[libcxx] [llvm] [lld] [compiler-rt] [clang-tools-extra] [clang] [libc] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> > lgtm, but can still fix the -O0 thing
> 
> But where do I get TM in the getAnalysisUsage?

Found addUsedIfAvailable() which does the trick.

https://github.com/llvm/llvm-project/pull/74537


[libcxx] [llvm] [lld] [compiler-rt] [clang-tools-extra] [clang] [libc] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/13] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location stored
must wait on this counter so that it only reads memory after it is
written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[clang-tools-extra] [lldb] [libc] [libcxx] [clang] [compiler-rt] [lld] [flang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> lgtm, but can still fix the -O0 thing

But where do I get TM in the getAnalysisUsage?

https://github.com/llvm/llvm-project/pull/74537


[clang-tools-extra] [lldb] [libc] [libcxx] [clang] [compiler-rt] [lld] [flang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits


@@ -707,7 +723,40 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 (TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
   // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
   // written can be accessed. A load from LDS to VMEM does not need a wait.
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal
+// in general, but this is so for LDS DMA in practice.
+auto AAI = MemOp->getAAInfo();
+// Alias scope information gives a way to definitely identify an
+// original memory object and practically produced in the module LDS
+// lowering pass. If there is no scope available we will not be able
+// to disambiguate LDS aliasing as after the module lowering all LDS
+// is squashed into a single big object. Do not attempt to use one of

rampitec wrote:

Done

https://github.com/llvm/llvm-project/pull/74537


[libcxx] [compiler-rt] [clang] [clang-tools-extra] [libc] [flang] [lldb] [lld] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/12] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location stored
must wait on this counter so that it only reads memory after it is
written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[libcxx] [flang] [llvm] [libc] [compiler-rt] [clang-tools-extra] [clang] [lld] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-15 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1183,9 +1228,21 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
 // No need to wait before load from VMEM to LDS.
 if (TII->mayWriteLDSThroughDMA(MI))
   continue;
-unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+
 // VM_CNT is only relevant to vgpr or LDS.
-ScoreBrackets.determineWait(VM_CNT, RegNo, Wait);
+unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+bool FoundAliasingStore = false;
+if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {

rampitec wrote:

I have added more comments to explain this. The place that fills the LDS DMA 
slot bails if there is no scope info, so as not to waste the limited tracking 
slots. In that case the generic first slot is still used for such an operation 
(it is always used, regardless of whether we can be more specific about the 
underlying object). Here AA will be unable to disambiguate aliasing if there is 
no scope info, so this condition is simply a shortcut to avoid an expensive 
loop and AA query. I can remove this part of the condition here and nothing 
will change except that it will run slower. Note that not entering this 'if' 
statement will always produce a conservatively correct wait using the first 
generic tracking slot, which always gets a score regardless of our ability to 
track a specific object. The condition guards the relaxation code, which avoids 
the generic and conservative 'wait for everything' part below.
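
The lookup described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the type `ToyScoreboard`, the function `scoreToWaitFor`, and the use of an integer scope id in place of a real alias query are all hypothetical and do not reproduce the pass's data structures.

```cpp
#include <array>
#include <cassert>

// Toy model, not the actual pass: slot 0 is the generic LDS slot and
// slots 1..8 stand for the extra pseudo registers added by the patch.
constexpr unsigned NumSlots = 9;
constexpr unsigned GenericSlot = 0;

struct ToyScoreboard {
  std::array<unsigned, NumSlots> Score{}; // pending score per slot
  std::array<int, NumSlots> Scope{};      // scope id per slot, 0 = unused
};

// Returns the score an LDS load has to wait for. LoadScope == 0 models a
// load without alias scope info; equal scope ids stand in for "may alias".
unsigned scoreToWaitFor(const ToyScoreboard &SB, int LoadScope) {
  if (LoadScope != 0) { // shortcut: skip the loop when nothing can match
    for (unsigned I = 1; I < NumSlots; ++I)
      if (SB.Scope[I] == LoadScope)
        return SB.Score[I]; // relaxed wait on one specific slot
  }
  // Conservative fallback: the generic slot is always scored.
  return SB.Score[GenericSlot];
}
```

In this model, dropping the `LoadScope != 0` check changes no result, because an unused slot never matches a real scope id; it only makes the function scan the slots needlessly, which mirrors the "nothing will change except speed" remark.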

https://github.com/llvm/llvm-project/pull/74537


[flang] [lld] [clang-tools-extra] [llvm] [compiler-rt] [lldb] [clang] [libc] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-15 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/11] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location stored
must wait on this counter so that it only reads memory after it is
written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7eff..2e079404b087faa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7d..50ad22130e939e2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+ 

[clang-tools-extra] [flang] [libc] [lldb] [compiler-rt] [lld] [llvm] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via cfe-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

Right, there is no PSV. I mentioned PSV because you earlier suggested using 
it. As for the real IR value: it is not helpful to compare it. The IR value is 
a GEP, and this GEP is always different, i.e. these values never compare equal. 
The rest of the IR is already gone and unavailable for analysis. Even if it 
were available, this GEP would address the kernel module LDS variable, a single 
huge LDS array, and would be useless again: it would tell you that any LDS 
operation aliases any other. Now, during the module LDS lowering, I am creating 
alias scope info specifically to disambiguate aliasing after the pass has 
squashed all LDS variables.

https://github.com/llvm/llvm-project/pull/74537


[flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via cfe-commits


@@ -1183,9 +1228,21 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
 // No need to wait before load from VMEM to LDS.
 if (TII->mayWriteLDSThroughDMA(MI))
   continue;
-unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+
 // VM_CNT is only relevant to vgpr or LDS.
-ScoreBrackets.determineWait(VM_CNT, RegNo, Wait);
+unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+bool FoundAliasingStore = false;
+if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {

rampitec wrote:

I have reserved just 8 pseudo registers to track it, and I do not want to fill 
them with unrelated stuff. The only way AA will be able to handle this very 
specific situation is if there is scope info; otherwise there is no reason to 
waste a slot and compile time. If I do not enter this 'if', the pass will just 
do the conservatively correct thing and wait for this memory regardless of 
aliasing or the lack of it.
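
A rough sketch of the recording side, i.e. how a limited set of slots might be filled. Everything here is hypothetical and simplified: `ToyLdsDmaTracker` and `recordStore` are illustration names, and an integer scope id replaces the real alias scope metadata.

```cpp
#include <array>
#include <cassert>

// Toy model, not the actual pass: one generic slot plus 8 specific slots.
struct ToyLdsDmaTracker {
  static constexpr unsigned NumSpecificSlots = 8;
  std::array<int, NumSpecificSlots> SlotScope{};      // 0 = free slot
  std::array<unsigned, NumSpecificSlots + 1> Score{}; // [0] is generic

  // Records an LDS DMA store. Scope == 0 models "no alias scope info".
  // Returns the specific slot that was updated, or 0 if only the generic
  // slot was touched.
  unsigned recordStore(int Scope, unsigned CurrScore) {
    Score[0] = CurrScore; // the generic slot always gets a score
    if (Scope == 0)
      return 0; // nothing to disambiguate with: do not spend a slot
    for (unsigned I = 0; I < NumSpecificSlots; ++I) {
      // Slots fill in order, so a free slot is only reached after all
      // occupied ones have been checked for a matching scope.
      if (SlotScope[I] == Scope || SlotScope[I] == 0) {
        SlotScope[I] = Scope;
        Score[I + 1] = CurrScore;
        return I + 1;
      }
    }
    return 0; // all specific slots taken: fall back to generic tracking
  }
};
```

Stores without scope info, or stores that arrive after all eight slots are in use, are still covered by the generic slot, so a later load can always fall back to the conservative wait.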

https://github.com/llvm/llvm-project/pull/74537


[flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via cfe-commits


@@ -130,6 +130,8 @@
 ; GCN-O0-NEXT:MachineDominator Tree Construction
 ; GCN-O0-NEXT:Machine Natural Loop Construction
 ; GCN-O0-NEXT:MachinePostDominator Tree Construction
+; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl)
+; GCN-O0-NEXT:Function Alias Analysis Results

rampitec wrote:

If I just skip the getAnalysis call it does not help, since the analysis is 
requested in getAnalysisUsage. If I do not request it, it is not available at 
any opt level. This is the benefit of the alternative 
https://github.com/llvm/llvm-project/pull/75974: it does not request the full 
analysis.

https://github.com/llvm/llvm-project/pull/74537
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/10] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location stored
must wait on this counter so that it only reads memory after it is
written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[llvm] [flang] [clang] [clang-tools-extra] [compiler-rt] [libc] [lldb] [lld] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-11 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/74537


[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)

2024-01-11 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/77772


[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via cfe-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> The values don't need to be identical, that's the point of the AA query. 
> BasicAA will parse through the offsets

I also think the values don't need to be identical. But comparing them is what 
MachineInstr::mayAlias() does *before* it checks AA: 
https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285

https://github.com/llvm/llvm-project/pull/74537


[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via cfe-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> It looks to me like it does use it if you pass UseTBAA=true. Not sure why 
> this would be a parameter in the first place

I am passing it, but to reach that check it must first go through all the 
Value and offset checks. Using AA is the last thing it does: 
https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285

https://github.com/llvm/llvm-project/pull/74537


[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via cfe-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> PseudoSourceValue::mayAlias is supposed to report aliasing to possible IR 
> values. It looks like it's layered weirdly, and expects you to go through 
> MachineInstr::mayAlias. MachineInstr::mayAlias ought to be using the AA tags, 
> it shouldn't be a fundamental limitation

This is all PSV::mayAlias() does:
```
bool PseudoSourceValue::mayAlias(const MachineFrameInfo *) const {
  return !(isGOT() || isConstantPool() || isJumpTable());
}
```
Not very useful. And even to reach the AA tags check, MachineInstr::mayAlias() 
must go through all the IR value checks first.
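
To make the ordering concern concrete, here is a rough, self-contained model 
of a layered alias check. It is only a sketch: `MemRef`, `AliasOracle` and 
`mayAliasLayered` are invented for illustration and are not LLVM APIs, and 
the real MachineInstr::mayAlias() has more cases than this.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified stand-in for a machine memory operand.
struct MemRef {
  const void *Value;   // underlying IR value, or nullptr if unknown
  std::int64_t Offset; // byte offset from Value
  std::int64_t Size;   // access size in bytes
  int Scope;           // stand-in for alias metadata attached to the access
};

// Hypothetical oracle playing the role of the AA query. Returns true if the
// two accesses may alias.
using AliasOracle = std::function<bool(const MemRef &, const MemRef &)>;

// Returns true if A and B may alias. The value and offset checks run first;
// the oracle is consulted only if none of them gives an answer.
inline bool mayAliasLayered(const MemRef &A, const MemRef &B,
                            const AliasOracle &AA) {
  if (A.Value && A.Value == B.Value) {
    // Same base value: the answer comes from the offsets alone.
    const std::int64_t AEnd = A.Offset + A.Size;
    const std::int64_t BEnd = B.Offset + B.Size;
    return A.Offset < BEnd && B.Offset < AEnd;
  }
  if (!A.Value || !B.Value)
    return true; // Unknown base: conservative answer, metadata is never seen.
  return AA(A, B); // Last resort: ask the oracle.
}
```

In this model an access with a missing Value never reaches the oracle, so 
metadata on it cannot help, which is the limitation discussed above.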

https://github.com/llvm/llvm-project/pull/74537


[libcxx] [flang] [libc] [clang-tools-extra] [lldb] [lld] [compiler-rt] [clang] [llvm] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2024-01-02 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/75974


[lld] [clang] [flang] [clang-tools-extra] [llvm] [lldb] [libc] [compiler-rt] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-02 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/74537


[clang] [lldb] [lld] [flang] [clang-tools-extra] [libcxx] [llvm] [libc] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

This is where I create it: https://reviews.llvm.org/D108315

https://github.com/llvm/llvm-project/pull/75974


[clang] [lldb] [flang] [llvm] [libc] [libcxx] [lld] [clang-tools-extra] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

One thing to note: I create this alias.scope myself in the module LDS 
lowering, so I know exactly what to expect. And because of the module LDS 
lowering, even if an alias scope had been created earlier (which never 
happens, much less for an intrinsic call), it would already be lost, along 
with the memory objects deleted by the lowering. That is the whole point of 
creating alias.scope metadata during the lowering: we put all module LDS into 
a single structure, so no AA will ever disambiguate it without alias scope 
info. In this situation I am the sole creator of the metadata, of the 
instructions carrying it, and of the memory object accessed, as well as the 
consumer of this metadata.

At -O0 there will be no LDS lowering, but there will be no AA either. I do not 
see how this could be exploited in practice.

One other thing to note: !noalias metadata is generated in the very same 
place. I do not care about it because I am really searching for a store into 
this memory, which is a scope.

When I wrote the code that generates this metadata, I had exactly this kind 
of scenario in mind.

https://github.com/llvm/llvm-project/pull/75974


[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/75974

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/12] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
the LDS DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Actually, since I am only using alias scope, I can avoid alias analysis 
altogether and only compare alias scopes. This needs neither an analysis pass 
nor calls to mayAlias, and the code is simpler in general. There is an 
alternative PR if you like it more: https://github.com/llvm/llvm-project/pull/75974
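
As an illustration of what "only compare alias scope" could look like, here 
is a minimal sketch of a slot table keyed by scope. Everything in it is an 
assumption made for the example: `ScopeId` stands in for an alias.scope 
metadata node, and `LdsSlotTable`, `NoScope`, `slotForStore` and 
`slotForLoad` are hypothetical names, not code from either PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using ScopeId = int;            // hypothetical stand-in for alias.scope metadata
constexpr ScopeId NoScope = -1; // the access carries no alias scope info

// Maps alias scopes of LDS DMA stores to tracking slots 1..8.
// Slot 0 is the catch-all slot meaning "some LDS location".
class LdsSlotTable {
public:
  static constexpr std::size_t NumTracked = 8;

  // Slot for an LDS DMA store: a store with a known scope gets its own slot
  // while there is room; anything else falls back to slot 0.
  std::size_t slotForStore(ScopeId Scope) {
    if (Scope == NoScope)
      return 0;
    if (std::size_t Slot = find(Scope))
      return Slot;
    if (Used == NumTracked)
      return 0; // Table is full: fall back to the catch-all slot.
    Scopes[Used] = Scope;
    return ++Used;
  }

  // Slot for an LDS load: only a scope already recorded by a store can be
  // matched; otherwise the load must use slot 0.
  std::size_t slotForLoad(ScopeId Scope) const {
    return Scope == NoScope ? 0 : find(Scope);
  }

private:
  // Returns the 1-based slot of Scope, or 0 if it is not recorded.
  std::size_t find(ScopeId Scope) const {
    for (std::size_t I = 0; I < Used; ++I)
      if (Scopes[I] == Scope)
        return I + 1;
    return 0;
  }

  std::array<ScopeId, NumTracked> Scopes{};
  std::size_t Used = 0;
};
```

The lookup is a plain equality comparison on the scope, so no analysis pass 
is involved; an access without scope info simply lands in slot 0.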

https://github.com/llvm/llvm-project/pull/74537


[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/75974

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
the LDS DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias scope info.

Fixes: SWDEV-433427
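
The effect of the extra pseudo registers can be sketched with a toy 
scoreboard. This is a simplified model under stated assumptions, not the 
SIInsertWaitcnts code: `LdsDmaScoreboard` and its members are hypothetical 
names, the counter is assumed to retire operations in issue order, and every 
DMA write is assumed to mark slot 0 in addition to its own slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of one in-order counter (think vmcnt) plus pseudo-register slots.
class LdsDmaScoreboard {
public:
  // Slot 0 means "some LDS location"; slots 1..8 are distinct LDS objects.
  static constexpr std::size_t NumSlots = 9;

  // Record an LDS DMA write. It always marks slot 0; if the target object is
  // known (Slot != 0) it also marks that object's own slot.
  void recordDma(std::size_t Slot) {
    assert(Slot < NumSlots);
    ++Issued;
    Score[0] = Issued;
    if (Slot != 0)
      Score[Slot] = Issued;
  }

  // Largest counter value that still guarantees the last write to Slot has
  // completed. Returns -1 if no wait is needed.
  int requiredVmcnt(std::size_t Slot) const {
    assert(Slot < NumSlots);
    if (Score[Slot] <= Completed)
      return -1;
    return static_cast<int>(Issued - Score[Slot]);
  }

  // Model of `s_waitcnt vmcnt(N)`: at most N operations stay outstanding.
  void wait(unsigned N) {
    if (Issued > N && Issued - N > Completed)
      Completed = Issued - N;
  }

private:
  unsigned Issued = 0;    // operations issued so far
  unsigned Completed = 0; // operations known to have completed
  std::array<unsigned, NumSlots> Score{}; // last write score per slot
};
```

With two DMA writes issued to slots 1 and 2, a load from slot 1 only needs 
the counter to drop to 1, while a load tracked through slot 0 needs it to 
reach 0. That difference is the relaxation this patch is after.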

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/11] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
the LDS DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include 

[compiler-rt] [llvm] [libc] [libcxx] [lldb] [clang] [lld] [clang-tools-extra] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> > This is still correct, pointer argument cannot alias module global. A 
> > pointer argument to a kernel is an LDS external requested by the host side, 
> > and host cannot see module LDS.
> 
> I.e. that is really the point of the patch: if we are able to definitively 
> identify an LDS object targeted by both load and store we only wait on that 
> store or stores. And the only way to definitively identify the object at this 
> stage is via alias.scope info which we are generating ourselves during module 
> LDS lowering.

I have added a check for the presence of alias scope info just in case we get a 
rogue AA. The testcase with a pointer argument still produces correct code with 
vmcnt(1).

https://github.com/llvm/llvm-project/pull/74537


[clang] [libc] [lldb] [lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/10] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
the LDS DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> This is still correct, pointer argument cannot alias module global. A pointer 
> argument to a kernel is an LDS external requested by the host side, and host 
> cannot see module LDS.

I.e. that is really the point of the patch: if we are able to definitively 
identify an LDS object targeted by both load and store we only wait on that 
store or stores. And the only way to definitively identify the object at this 
stage is via alias.scope info which we are generating ourselves during module 
LDS lowering.

https://github.com/llvm/llvm-project/pull/74537


[clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> Test case:
> 
> ```
> @lds.0 = internal addrspace(3) global [64 x float] poison, align 16
> @lds.1 = internal addrspace(3) global [64 x float] poison, align 16
> 
> declare void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) nocapture, i32 %size, i32 %voffset, i32 %soffset, i32 %offset, 
> i32 %aux)
> 
> define amdgpu_kernel void @f(<4 x i32> %rsrc, i32 %i1, i32 %i2, ptr 
> addrspace(1) %out, ptr addrspace(3) %ptr) {
> main_body:
>   call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) @lds.0, i32 4, i32 0, i32 0, i32 0, i32 0)
>   call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
>   %gep.0 = getelementptr float, ptr addrspace(3) @lds.0, i32 %i1
>   %gep.1 = getelementptr float, ptr addrspace(3) @lds.1, i32 %i2
>   %val.0 = load volatile float, ptr addrspace(3) %gep.0, align 4
>   %val.1 = load volatile float, ptr addrspace(3) %gep.1, align 4
>   %out.gep.1 = getelementptr float, ptr addrspace(1) %out, i32 1
>   store float %val.0, ptr addrspace(1) %out
>   store float %val.1, ptr addrspace(1) %out.gep.1
>   ret void
> }
> ```
> 
> Generates:
> 
> ```
>   s_load_dwordx8 s[4:11], s[0:1], 0x24
>   s_load_dword s2, s[0:1], 0x44
>   s_mov_b32 m0, 0
>   v_mov_b32_e32 v2, 0
>   s_waitcnt lgkmcnt(0)
>   buffer_load_dword off, s[4:7], 0 lds
>   s_mov_b32 m0, s2
>   s_lshl_b32 s0, s8, 2
>   buffer_load_dword off, s[4:7], 0 lds
>   s_lshl_b32 s1, s9, 2
>   v_mov_b32_e32 v0, s0
>   v_mov_b32_e32 v1, s1
>   s_waitcnt vmcnt(1)
>   ds_read_b32 v0, v0
>   s_waitcnt vmcnt(0)
>   ds_read_b32 v1, v1 offset:256
>   s_waitcnt lgkmcnt(0)
>   global_store_dwordx2 v2, v[0:1], s[10:11]
>   s_endpgm
> ```
> 
> The `s_waitcnt vmcnt(1)` seems incorrect, because the second 
> buffer-load-to-lds might clobber `@lds.0`.

This is still correct: a pointer argument cannot alias a module global. A 
pointer argument to a kernel is an external LDS allocation requested by the 
host side, and the host cannot see module LDS.

https://github.com/llvm/llvm-project/pull/74537


[compiler-rt] [libcxx] [flang] [libc] [lldb] [lld] [clang] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> How does this work in a case like this?
> 
> ```
> call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
> @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
> call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
> %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
> %val.3 = load float, ptr addrspace(3) @lds.3, align 4
> ```
> 
> i.e.
> 
> * store to known lds address `@lds.3` (this will use slot 0 and another 
> slot e.g. slot 3?)
> 
> * store to unknown lds address (this will use slot 0?)
> 
> * load from known lds address `@lds.3` (this will use slot 3?)

It does not know the pointer, so it uses the default slot 0 and waits until 
the counter reaches 0. I have to tell anyone interested here: before I wrote 
this code, the pass did not know about the dependency and did not wait for 
anything at all. Everyone was happy.

https://github.com/llvm/llvm-project/pull/74537


[clang] [libcxx] [compiler-rt] [lldb] [libc] [llvm] [lld] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-18 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

All split-off parts have been merged, and this patch has been merged with main. 
Only the waitcount insertion pass changes remain here.

https://github.com/llvm/llvm-project/pull/74537


[clang] [llvm] [libcxx] [lldb] [clang-tools-extra] [libc] [compiler-rt] [flang] [lld] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-18 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it reads memory only after
it has been written. The wait count insertion pass does not track
memory dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, as if the LDS
DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[compiler-rt] [clang-tools-extra] [libcxx] [libc] [clang] [llvm] [flang] [AMDGPU] Produce better memoperand for LDS DMA (PR #75247)

2023-12-18 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/75247


[libc] [llvm] [libcxx] [clang-tools-extra] [clang] [compiler-rt] [flang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)

2023-12-18 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/75249


[libc] [compiler-rt] [clang-tools-extra] [clang] [llvm] [flang] [libcxx] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)

2023-12-14 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

Ping. This one seems obvious to me.

https://github.com/llvm/llvm-project/pull/75249


[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)

2023-12-13 Thread Stanislav Mekhanoshin via cfe-commits


@@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   // underlying address space, even if it was lowered to a different one,
   // e.g. private accesses lowered to use MUBUF instructions on a scratch
   // buffer.
-  if (isDS(MIa)) {
-if (isDS(MIb))
+  if (isDS(MIa) || isLDSDMA(MIa)) {
+if (isDS(MIb) || isLDSDMA(MIb))
   return checkInstOffsetsDoNotOverlap(MIa, MIb);

rampitec wrote:

Just bail early.
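
The effect of bailing early can be shown with a small standalone model. This is a sketch under assumptions, not the SIInstrInfo code: `FakeInstr`, `offsetsDoNotOverlap` and `triviallyDisjoint` are hypothetical stand-ins, both accesses are assumed to share the same base address, and every case other than DS versus DS is conservatively reported as not disjoint.

```cpp
#include <cassert>

// Hypothetical stand-in for the few instruction properties the check reads.
struct FakeInstr {
  bool OrderedMemoryRef = false; // e.g. a volatile or atomic access
  bool IsLDSDMA = false;         // buffer/global load that writes LDS
  bool IsDS = false;             // regular LDS instruction
  int Offset = 0;                // byte offset from a shared base
  int Width = 0;                 // access size in bytes
};

// True when the two byte ranges [Offset, Offset + Width) do not intersect.
bool offsetsDoNotOverlap(const FakeInstr &A, const FakeInstr &B) {
  const FakeInstr &Low = A.Offset <= B.Offset ? A : B;
  const FakeInstr &High = A.Offset <= B.Offset ? B : A;
  return Low.Offset + Low.Width <= High.Offset;
}

// Returns true only when the accesses are known not to overlap.
bool triviallyDisjoint(const FakeInstr &A, const FakeInstr &B) {
  if (A.OrderedMemoryRef || B.OrderedMemoryRef)
    return false;
  // Bail early: an LDS DMA never reaches the offset-based checks below.
  if (A.IsLDSDMA || B.IsLDSDMA)
    return false;
  if (A.IsDS && B.IsDS)
    return offsetsDoNotOverlap(A, B);
  return false; // remaining cases are omitted in this sketch
}
```

With the early return in place, the DS branch no longer needs to mention LDS DMA at all.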

https://github.com/llvm/llvm-project/pull/75249


[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)

2023-12-13 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/75249

>From 82606c4447e8aa8edde90ed420f1c48707967695 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Tue, 12 Dec 2023 13:45:47 -0800
Subject: [PATCH 1/3] [AMDGPU] Fix lack of LDS DMA check in the AA handling

SIInstrInfo::areMemAccessesTriviallyDisjoint does DS offset
checks, but does not account for LDS DMA instructions. Added
these checks. Without them the code falls through and returns true,
which is wrong. As a result mayAlias would always return false
for an LDS DMA and a regular LDS instruction, or for 2 LDS DMA
instructions.

At the moment this is NFCI because we do not use this AA in a
context which may touch LDS DMA instructions. This is also
unreachable now because of the ordered memory ref checks just
above in the function and because LDS DMA is marked as volatile.
This volatile marking is removed in PR #75247, therefore I'd submit
this check before #75247.
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++--
 llvm/lib/Target/AMDGPU/SIInstrInfo.h   | 8 
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index d4e4526795f3b3..c485eb299d52a3 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   // underlying address space, even if it was lowered to a different one,
   // e.g. private accesses lowered to use MUBUF instructions on a scratch
   // buffer.
-  if (isDS(MIa)) {
-if (isDS(MIb))
+  if (isDS(MIa) || isLDSDMA(MIa)) {
+if (isDS(MIb) || isLDSDMA(MIb))
   return checkInstOffsetsDoNotOverlap(MIa, MIb);
 
 return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb);
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index e794d8cf7cc220..97800bda775cda 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -546,6 +546,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return get(Opcode).TSFlags & SIInstrFlags::DS;
   }
 
+  static bool isLDSDMA(const MachineInstr &MI) {
+return isVALU(MI) && (isMUBUF(MI) || isFLAT(MI));
+  }
+
+  bool isLDSDMA(uint16_t Opcode) {
+return isVALU(Opcode) && (isMUBUF(Opcode) || isFLAT(Opcode));
+  }
+
   static bool isGWS(const MachineInstr &MI) {
 return MI.getDesc().TSFlags & SIInstrFlags::GWS;
   }

>From d8d9f3aab2d2fff2911a99d096685e78faf3d917 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 13 Dec 2023 11:42:10 -0800
Subject: [PATCH 2/3] Bail early in areMemAccessesTriviallyDisjoint

---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 57eaefd41b2622..31669764144530 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3651,6 +3651,9 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   if (MIa.hasOrderedMemoryRef() || MIb.hasOrderedMemoryRef())
 return false;
 
+  if (isLDSDMA(MIa) || isLDSDMA(MIb))
+return false;
+
   // TODO: Should we check the address space from the MachineMemOperand? That
   // would allow us to distinguish objects we know don't alias based on the
   // underlying address space, even if it was lowered to a different one,

>From 609be418b81f6ce8c9b323f60636af01f862a994 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 13 Dec 2023 11:45:50 -0800
Subject: [PATCH 3/3] Remove old code

---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 31669764144530..d05d3c6996261f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3659,8 +3659,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   // underlying address space, even if it was lowered to a different one,
   // e.g. private accesses lowered to use MUBUF instructions on a scratch
   // buffer.
-  if (isDS(MIa) || isLDSDMA(MIa)) {
-if (isDS(MIb) || isLDSDMA(MIb))
+  if (isDS(MIa)) {
+if (isDS(MIb))
   return checkInstOffsetsDoNotOverlap(MIa, MIb);
 
 return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb);



[clang-tools-extra] [lldb] [llvm] [libc] [flang] [lld] [compiler-rt] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-13 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it reads memory only after
it has been written. The wait count insertion pass does not track
memory dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, as if the LDS
DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[libc] [flang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [lldb] [clang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-13 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/8] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it reads memory only after
it has been written. The wait count insertion pass does not track
memory dependencies; it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, as if the LDS
DMA writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots for DS.
+  // 

[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)

2023-12-13 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/75249

>From 82606c4447e8aa8edde90ed420f1c48707967695 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Tue, 12 Dec 2023 13:45:47 -0800
Subject: [PATCH 1/2] [AMDGPU] Fix lack of LDS DMA check in the AA handling

SIInstrInfo::areMemAccessesTriviallyDisjoint does DS offset
checks, but does not account for LDS DMA instructions. Added
these checks. Without them the code falls through and returns true,
which is wrong. As a result mayAlias would always return false
for an LDS DMA and a regular LDS instruction, or for 2 LDS DMA
instructions.

At the moment this is NFCI because we do not use this AA in a
context which may touch LDS DMA instructions. This is also
unreachable now because of the ordered memory ref checks just
above in the function and because LDS DMA is marked as volatile.
This volatile marking is removed in PR #75247, therefore I'd submit
this check before #75247.
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++--
 llvm/lib/Target/AMDGPU/SIInstrInfo.h   | 8 
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index d4e4526795f3b3..c485eb299d52a3 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   // underlying address space, even if it was lowered to a different one,
   // e.g. private accesses lowered to use MUBUF instructions on a scratch
   // buffer.
-  if (isDS(MIa)) {
-if (isDS(MIb))
+  if (isDS(MIa) || isLDSDMA(MIa)) {
+if (isDS(MIb) || isLDSDMA(MIb))
   return checkInstOffsetsDoNotOverlap(MIa, MIb);
 
 return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb);
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index e794d8cf7cc220..97800bda775cda 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -546,6 +546,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return get(Opcode).TSFlags & SIInstrFlags::DS;
   }
 
+  static bool isLDSDMA(const MachineInstr &MI) {
+return isVALU(MI) && (isMUBUF(MI) || isFLAT(MI));
+  }
+
+  bool isLDSDMA(uint16_t Opcode) {
+return isVALU(Opcode) && (isMUBUF(Opcode) || isFLAT(Opcode));
+  }
+
   static bool isGWS(const MachineInstr &MI) {
 return MI.getDesc().TSFlags & SIInstrFlags::GWS;
   }

>From d8d9f3aab2d2fff2911a99d096685e78faf3d917 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 13 Dec 2023 11:42:10 -0800
Subject: [PATCH 2/2] Bail early in areMemAccessesTriviallyDisjoint

---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 57eaefd41b2622..31669764144530 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3651,6 +3651,9 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa,
   if (MIa.hasOrderedMemoryRef() || MIb.hasOrderedMemoryRef())
 return false;
 
+  if (isLDSDMA(MIa) || isLDSDMA(MIb))
+return false;
+
   // TODO: Should we check the address space from the MachineMemOperand? That
   // would allow us to distinguish objects we know don't alias based on the
   // underlying address space, even if it was lowered to a different one,


