[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

```
$ rg HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY
libc/utils/gpu/loader/amdgpu/Loader.cpp
521:        HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY),

openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
74:  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1892:  if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
```

The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol. This symbol is expected to be provided by `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not by the third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`. The code in `release/17.x` and `release/18.x` explicitly looks for ROCm's `hsa/hsa_ext_amd.h` first and only tries the LLVM-provided `dynamic_hsa/hsa_ext_amd.h` as a fallback; because of a mistake in `CMakeLists.txt`, that fallback does not work in all cases, since `dynamic_hsa` is not always added to the include directories. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: > We made a change recently that made the dynamic_hsa version the default. The > error you're seeing is from an old HSA, so if you're overriding the default > to use an old library that's probably not worth working around. The error I see does not come from an old HSA: there is no old HSA around at all, and this is an LLVM bug, not something to work around in an old library. There is no `hsa/hsa.h` in the tree, and the default `dynamic_hsa` is not used. The `hsa/hsa.h` file is from ROCm, not from LLVM. Without such a patch, LLVM requires ROCm to be installed and its headers to be in the default include paths for `src/rtl.cpp` to build when `hsa.cpp` is not built. This patch makes LLVM use `dynamic_hsa` when building `src/rtl.cpp`, because that is the default. The patch is needed to build both `release/17.x` and `release/18.x`; the `main` branch changed the code layout, so the patch will not apply there. I assume a full LLVM build does not trigger the build problem because something else includes `dynamic_hsa` and makes it findable by `src/rtl.cpp` by luck. But when building only a subset of LLVM (just what some applications need), `dynamic_hsa` is not added to the include directories even though it is required by `src/rtl.cpp`. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 4076c30 - [libc] more fix
Author: Schrodinger ZHU Yifan Date: 2024-06-13T20:22:21-07:00 New Revision: 4076c3004f09e95d1fcd299452843f99235ff422 URL: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422 DIFF: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422.diff LOG: [libc] more fix Added: Modified: libc/cmake/modules/LLVMLibCTestRules.cmake libc/test/IntegrationTest/CMakeLists.txt libc/test/IntegrationTest/test.cpp libc/test/UnitTest/CMakeLists.txt libc/test/UnitTest/HermeticTestUtils.cpp Removed: diff --git a/libc/cmake/modules/LLVMLibCTestRules.cmake b/libc/cmake/modules/LLVMLibCTestRules.cmake index eb6be91b55e26..c8d7c8a2b1c7c 100644 --- a/libc/cmake/modules/LLVMLibCTestRules.cmake +++ b/libc/cmake/modules/LLVMLibCTestRules.cmake @@ -686,6 +686,15 @@ function(add_libc_hermetic_test test_name) LibcTest.hermetic libc.test.UnitTest.ErrnoSetterMatcher ${fq_deps_list}) + # TODO: currently the dependency chain is broken such that getauxval cannot properly + # propagate to hermetic tests. This is a temporary workaround. + if (LIBC_TARGET_ARCHITECTURE_IS_AARCH64) +target_link_libraries( + ${fq_build_target_name} + PRIVATE +libc.src.sys.auxv.getauxval +) + endif() # Tests on the GPU require an external loader utility to launch the kernel. 
if(TARGET libc.utils.gpu.loader) diff --git a/libc/test/IntegrationTest/CMakeLists.txt b/libc/test/IntegrationTest/CMakeLists.txt index 4f31f10b29f0b..4a999407d48d7 100644 --- a/libc/test/IntegrationTest/CMakeLists.txt +++ b/libc/test/IntegrationTest/CMakeLists.txt @@ -1,3 +1,7 @@ +set(arch_specific_deps) +if(LIBC_TARGET_ARCHITECTURE_IS_AARCH64) + set(arch_specific_deps libc.src.sys.auxv.getauxval) +endif() add_object_library( test SRCS @@ -8,4 +12,5 @@ add_object_library( test.h DEPENDS libc.src.__support.OSUtil.osutil +${arch_specific_deps} ) diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 27e7f29efa0f1..a8b2f2911fd8e 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -6,6 +6,8 @@ // //===--===// +#include "src/__support/common.h" +#include "src/sys/auxv/getauxval.h" #include #include @@ -80,9 +82,11 @@ void *realloc(void *ptr, size_t s) { // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; -// On some platform (aarch64 fedora tested) full build integration test -// objects need to link against libgcc, which may expect a __getauxval -// function. For now, it is fine to provide a weak definition that always -// returns false. -[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } +#ifdef LIBC_TARGET_ARCH_IS_AARCH64 +// Due to historical reasons, libgcc on aarch64 may expect __getauxval to be +// defined. 
See also https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/300635.html +unsigned long __getauxval(unsigned long id) { + return LIBC_NAMESPACE::getauxval(id); +} +#endif } // extern "C" diff --git a/libc/test/UnitTest/CMakeLists.txt b/libc/test/UnitTest/CMakeLists.txt index 302af3044ca3d..4adc2f5c725f7 100644 --- a/libc/test/UnitTest/CMakeLists.txt +++ b/libc/test/UnitTest/CMakeLists.txt @@ -41,7 +41,7 @@ function(add_unittest_framework_library name) target_compile_options(${name}.hermetic PRIVATE ${compile_options}) if(TEST_LIB_DEPENDS) -foreach(dep IN LISTS ${TEST_LIB_DEPENDS}) +foreach(dep IN ITEMS ${TEST_LIB_DEPENDS}) if(TARGET ${dep}.unit) add_dependencies(${name}.unit ${dep}.unit) else() diff --git a/libc/test/UnitTest/HermeticTestUtils.cpp b/libc/test/UnitTest/HermeticTestUtils.cpp index 349c182ff2379..6e815e6c8aab0 100644 --- a/libc/test/UnitTest/HermeticTestUtils.cpp +++ b/libc/test/UnitTest/HermeticTestUtils.cpp @@ -6,6 +6,8 @@ // //===--===// +#include "src/__support/common.h" +#include "src/sys/auxv/getauxval.h" #include #include @@ -19,6 +21,12 @@ void *memmove(void *dst, const void *src, size_t count); void *memset(void *ptr, int value, size_t count); int atexit(void (*func)(void)); +// TODO: It seems that some old test frameworks does not use +// add_libc_hermetic_test properly. Such that they won't get correct linkage +// against the object containing this function. We create a dummy function that +// always returns 0 to indicate a failure. +[[gnu::weak]] unsigned long getauxval(unsigned long id) { return 0; } + } // namespace LIBC_NAMESPACE namespace { @@ -102,6 +110,14 @@ void __cxa_pure_virtual() { // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; +#ifdef LIBC_TARGET_ARCH_IS_AARCH64 +// Due to historical reasons,
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 commented: We made a change recently that made the dynamic_hsa version the default. The error you're seeing is from an old HSA, so if you're overriding the default to use an old library that's probably not worth working around. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it. This operation does not trap and cannot fail, even if the pointer is not validly signed. +``ptrauth_sign_constant`` +^ + +.. code-block:: c + + ptrauth_sign_constant(pointer, key, discriminator) + +Return a signed pointer for a constant address in a manner which guarantees +a non-attackable sequence. + +``pointer`` must be a constant expression of pointer type which evaluates to +a non-null pointer. The result will have the same type as ``discriminator``. + +Calls to this are constant expressions if the discriminator is a null-pointer +constant expression or an integer constant expression. Implementations may +allow other pointer expressions as well. ahmedbougacha wrote: Yeah, I agree today this could simply be "it's always a constant expression"; I'll rewrite it (cc @rjmccall if this looks like anything to you) https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it. This operation does not trap and cannot fail, even if the pointer is not validly signed. +``ptrauth_sign_constant`` +^ + +.. code-block:: c + + ptrauth_sign_constant(pointer, key, discriminator) + +Return a signed pointer for a constant address in a manner which guarantees +a non-attackable sequence. ahmedbougacha wrote: Later additions to this document describe that in depth, you can look for > [clang][docs] Document the ptrauth security model. on my branch https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -58,6 +58,35 @@ void test_string_discriminator(const char *str) { } +void test_sign_constant(int *dp, int (*fp)(int)) { + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY); // expected-error {{too few arguments}} + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, , ); // expected-error {{too many arguments}} + + __builtin_ptrauth_sign_constant(mismatched_type, VALID_DATA_KEY, 0); // expected-error {{signed value must have pointer type; type here is 'struct A'}} + __builtin_ptrauth_sign_constant(, mismatched_type, 0); // expected-error {{passing 'struct A' to parameter of incompatible type 'int'}} + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, mismatched_type); // expected-error {{extra discriminator must have pointer or integer type; type here is 'struct A'}} + + (void) __builtin_ptrauth_sign_constant(NULL, VALID_DATA_KEY, ); // expected-error {{argument to ptrauth_sign_constant must refer to a global variable or function}} ahmedbougacha wrote: We could special-case null pointers, but they're already covered by the diagnostic, which asks for global variables or functions – which NULL isn't. For auth/sign, we don't have that sort of constraint on the pointer: it really is NULL and NULL alone that's special. Now, the more interesting question is whether we should allow null pointers at all here. Since defining these original builtins we have taught the qualifier to have a mode that signs/authenticates null, for some specific use-cases where replacing a signed value with NULL (which is otherwise never signed or authenticated) would bypass signing in a problematic way. We haven't had the chance or need to revisit the builtins to allow sign/auth of NULL, but it's reasonable to add that support in the future. We'd have to consider how to expose that in the builtins, because it's probably still something that's almost always a mistake; more builtins would be an easy solution but maybe not a sophisticated one. 
https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -2061,6 +2071,58 @@ ConstantLValueEmitter::VisitCallExpr(const CallExpr *E) { } } +ConstantLValue +ConstantLValueEmitter::emitPointerAuthSignConstant(const CallExpr *E) { + llvm::Constant *UnsignedPointer = emitPointerAuthPointer(E->getArg(0)); + unsigned Key = emitPointerAuthKey(E->getArg(1)); + llvm::Constant *StorageAddress; + llvm::Constant *OtherDiscriminator; + std::tie(StorageAddress, OtherDiscriminator) = ahmedbougacha wrote: Yeah, this simply predates structured bindings; we can indeed use them now. https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: @pranav-sivaraman try this patch: ```diff diff --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt index 92523c23f68b..92bcd94edb7a 100644 --- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt @@ -56,13 +56,14 @@ include_directories( set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() ``` I haven't tested it, but maybe the mistake is similar. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: The 14.x branch seems to be very old; notably, the file you linked is in the `plugins/` directory, while the files I modify are in the `plugins-nextgen/` directory, and the `plugins/` directory no longer exists. So I strongly doubt this patch is useful for LLVM 14, but your problem probably needs a different but similar solution. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
llvmbot wrote: @llvm/pr-subscribers-llvm-transforms Author: shaw young (shawbyoung) Changes Test Plan: tbd --- Full diff: https://github.com/llvm/llvm-project/pull/95486.diff 2 Files Affected: - (modified) bolt/lib/Profile/StaleProfileMatching.cpp (+29-8) - (modified) llvm/include/llvm/Transforms/Utils/SampleProfileInference.h (-2) ``diff diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp index 6588cf2c0ce66..cbd98f4d4769f 100644 --- a/bolt/lib/Profile/StaleProfileMatching.cpp +++ b/bolt/lib/Profile/StaleProfileMatching.cpp @@ -53,9 +53,9 @@ cl::opt cl::opt MatchedProfileThreshold( "matched-profile-threshold", -cl::desc("Percentage threshold of matched execution counts at which stale " +cl::desc("Percentage threshold of matched basic blocks at which stale " "profile inference is executed."), -cl::init(5), cl::Hidden, cl::cat(BoltOptCategory)); +cl::init(50), cl::Hidden, cl::cat(BoltOptCategory)); cl::opt StaleMatchingMaxFuncSize( "stale-matching-max-func-size", @@ -186,6 +186,17 @@ struct BlendedBlockHash { uint8_t SuccHash{0}; }; +/// A data object containing function matching information. +struct FunctionMatchingData { +public: + /// The number of blocks matched exactly. + uint64_t MatchedExactBlocks{0}; + /// The number of blocks matched loosely. + uint64_t MatchedLooseBlocks{0}; + /// The number of execution counts matched. + uint64_t MatchedExecCounts{0}; +}; + /// The object is used to identify and match basic blocks in a BinaryFunction /// given their hashes computed on a binary built from several revisions behind /// release. 
@@ -400,7 +411,8 @@ createFlowFunction(const BinaryFunction::BasicBlockOrderType ) { void matchWeightsByHashes(BinaryContext , const BinaryFunction::BasicBlockOrderType , const yaml::bolt::BinaryFunctionProfile , - FlowFunction ) { + FlowFunction , + FunctionMatchingData ) { assert(Func.Blocks.size() == BlockOrder.size() + 1); std::vector Blocks; @@ -440,9 +452,11 @@ void matchWeightsByHashes(BinaryContext , if (Matcher.isHighConfidenceMatch(BinHash, YamlHash)) { ++BC.Stats.NumMatchedBlocks; BC.Stats.MatchedSampleCount += YamlBB.ExecCount; -Func.MatchedExecCount += YamlBB.ExecCount; +FuncMatchingData.MatchedExecCounts += YamlBB.ExecCount; +FuncMatchingData.MatchedExactBlocks += 1; LLVM_DEBUG(dbgs() << " exact match\n"); } else { +FuncMatchingData.MatchedLooseBlocks += 1; LLVM_DEBUG(dbgs() << " loose match\n"); } if (YamlBB.NumInstructions == BB->size()) @@ -582,11 +596,14 @@ void preprocessUnreachableBlocks(FlowFunction ) { /// Decide if stale profile matching can be applied for a given function. /// Currently we skip inference for (very) large instances and for instances /// having "unexpected" control flow (e.g., having no sink basic blocks). -bool canApplyInference(const FlowFunction , const yaml::bolt::BinaryFunctionProfile ) { +bool canApplyInference(const FlowFunction , + const yaml::bolt::BinaryFunctionProfile , + const FunctionMatchingData ) { if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize) return false; - if (Func.MatchedExecCount / YamlBF.ExecCount >= opts::MatchedProfileThreshold) + if ((double)FuncMatchingData.MatchedExactBlocks / YamlBF.Blocks.size() >= + opts::MatchedProfileThreshold / 100.0) return false; bool HasExitBlocks = llvm::any_of( @@ -735,18 +752,22 @@ bool YAMLProfileReader::inferStaleProfile( const BinaryFunction::BasicBlockOrderType BlockOrder( BF.getLayout().block_begin(), BF.getLayout().block_end()); + // Create a containter for function matching data. 
+ FunctionMatchingData FuncMatchingData; + // Create a wrapper flow function to use with the profile inference algorithm. FlowFunction Func = createFlowFunction(BlockOrder); // Match as many block/jump counts from the stale profile as possible - matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func); + matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func, + FuncMatchingData); // Adjust the flow function by marking unreachable blocks Unlikely so that // they don't get any counts assigned. preprocessUnreachableBlocks(Func); // Check if profile inference can be applied for the instance. - if (!canApplyInference(Func, YamlBF)) + if (!canApplyInference(Func, YamlBF, FuncMatchingData)) return false; // Apply the profile inference algorithm. diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h index e7971ca1cb428..b4ea1ad840f9d 100644 ---
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung closed https://github.com/llvm/llvm-project/pull/95486 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung created https://github.com/llvm/llvm-project/pull/95486 Test Plan: tbd ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
pranav-sivaraman wrote: This is different from this [file](https://github.com/llvm/llvm-project/blob/release/14.x/openmp/libomptarget/plugins/amdgpu/impl/hsa_api.h), right? I'm trying to fix an issue when building LLVM 14 with newer ROCm releases, where the build fails to find the newer `hsa/hsa.h` headers. Not sure if I need to extend the patch to include these changes as well. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: I first noticed the issue when building the chipStar fork of LLVM 17: https://github.com/CHIP-SPV/llvm-project (branch `chipStar-llvm-17`), but since the code is the same in LLVM 18, it is expected to fail there too. The whole folder disappeared in `main`, so I made this patch target the most recent release branch that still has those files: LLVM 18. It would be good to backport it to LLVM 17 too. I haven't yet checked whether versions older than LLVM 17 are affected. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Thomas Debesse (illwieckz) Changes The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built. This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`. Fixes: ``` error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope ``` --- Full diff: https://github.com/llvm/llvm-project/pull/95484.diff 2 Files Affected: - (modified) openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1) - (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (-10) ``diff diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt index 68ce63467a6c8..42cc560c79112 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt @@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL") set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. 
+include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp index 81634ae1edc49..8cedc72d5f63c 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp @@ -56,18 +56,8 @@ #define BIGENDIAN_CPU #endif -#if defined(__has_include) -#if __has_include("hsa/hsa.h") -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#elif __has_include("hsa.h") #include "hsa.h" #include "hsa_ext_amd.h" -#endif -#else -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#endif namespace llvm { namespace omp { `` https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/illwieckz created https://github.com/llvm/llvm-project/pull/95484 The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built. This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`. Fixes: ``` error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope ``` >From e84e8bdef6d902d51a72eb93f7ca9812f0467c72 Mon Sep 17 00:00:00 2001 From: Thomas Debesse Date: Fri, 14 Jun 2024 00:38:25 +0200 Subject: [PATCH] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The dynamic_hsa/ include directory is required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. It should then always be included or the build will fail if only src/rtl.cpp is built. This also simplifies the way header files from dynamic_hsa/ are included in src/rtl.cpp. Fixes: error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope --- .../libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt | 4 +++- openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp | 10 -- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt index 68ce63467a6c8..42cc560c79112 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt @@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL") set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. 
+include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp index 81634ae1edc49..8cedc72d5f63c 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp @@ -56,18 +56,8 @@ #define BIGENDIAN_CPU #endif -#if defined(__has_include) -#if __has_include("hsa/hsa.h") -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#elif __has_include("hsa.h") #include "hsa.h" #include "hsa_ext_amd.h" -#endif -#else -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#endif namespace llvm { namespace omp { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
https://github.com/ahmedbougacha updated https://github.com/llvm/llvm-project/pull/94394 >From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001 From: Ahmed Bougacha Date: Tue, 4 Jun 2024 12:41:47 -0700 Subject: [PATCH 1/7] [Support] Reformat SipHash.cpp to match libSupport. While there, give it our usual file header and an acknowledgement, and remove the imported README.md.SipHash. --- llvm/lib/Support/README.md.SipHash | 126 -- llvm/lib/Support/SipHash.cpp | 264 ++--- 2 files changed, 129 insertions(+), 261 deletions(-) delete mode 100644 llvm/lib/Support/README.md.SipHash diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash deleted file mode 100644 index 4de3cd1854681..0 --- a/llvm/lib/Support/README.md.SipHash +++ /dev/null @@ -1,126 +0,0 @@ -# SipHash - -[![License: -CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/) - -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) - - -SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages. -This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging. - -SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp) -and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding -DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf). - -SipHash is: - -* *Simpler and faster* on short messages than previous cryptographic -algorithms, such as MACs based on universal hashing. - -* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash). - -* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers. 
- -* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD, -FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto, -Sodium, etc.) and applications (Wireguard, Redis, etc.). - -As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC). -But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3. -SipHash should therefore always be used with a secret key in order to be secure. - - -## Variants - -The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression -rounds, 4 finalization rounds, and returns a 64-bit tag. - -Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version. - -The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs: - -* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on. - -* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key, -and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2 -compression rounds, 4 finalization rounds, and returns a 32-bit tag. - - -## Security - -(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF -security for any function with the same key and output size. - -The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker. - -Security is limited by the key size (128 bits for SipHash), such that -attackers searching 2*s* keys have chance 2*s*−128 of finding -the SipHash key. -Security is also limited by the output size. In particular, when -SipHash is used as a MAC, an attacker who blindly tries 2*s* tags will -succeed with probability 2*s*-*t*, if *t* is that tag's bit size. 
- - -## Research - -* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012) -* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein) -* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson) - - -## Usage - -Running - -```sh - make -``` - -will build tests for - -* SipHash-2-4-64 -* SipHash-2-4-128 -* HalfSipHash-2-4-32 -* HalfSipHash-2-4-64 - - -```C - ./test -``` - -verifies 64 test vectors, and - -```C - ./debug -``` - -does the same and prints intermediate values. - -The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash -with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS` -or `dROUNDS` when compiling. This can be done with `-D` command line arguments -to many compilers such as below. - -```sh -gcc -Wall
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
ahmedbougacha wrote: [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f) shows what I had in mind, let me know what you all think. I added: ``` void getSipHash_2_4_64(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[8]); void getSipHash_2_4_128(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[16]); ``` as the core interfaces, and mimicked the ref. test harness to reuse the same test vectors. If this seems reasonable to y'all I'm happy to extract the vectors.h file from the ref. implementation into the "Import original sources" PR – that's why I kept it open ;) https://github.com/llvm/llvm-project/pull/94394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
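For readers following the thread, the function behind these interfaces is compact enough to sketch end to end. The standalone Python model below (my own naming, not the PR's C++ interface) implements SipHash-2-4-64 as described in the reference sources quoted earlier, and checks it against the first vector from the reference implementation's `vectors.h` (key `00..0f`, empty message):

```python
MASK64 = (1 << 64) - 1

def _rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK64

def _sipround(v0, v1, v2, v3):
    # One SipRound, exactly as in the reference siphash.c.
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3

def siphash_2_4_64(data: bytes, key: bytes) -> int:
    assert len(key) == 16
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v0 = 0x736F6D6570736575 ^ k0
    v1 = 0x646F72616E646F6D ^ k1
    v2 = 0x6C7967656E657261 ^ k0
    v3 = 0x7465646279746573 ^ k1
    # Compression: c = 2 SipRounds per full 8-byte word.
    for i in range(0, len(data) - 7, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        v0, v1, v2, v3 = _sipround(*_sipround(v0, v1, v2, v3))
        v0 ^= m
    # Last (partial) word carries the message length in its top byte.
    left = len(data) % 8
    b = (len(data) & 0xFF) << 56
    if left:
        b |= int.from_bytes(data[len(data) - left:], "little")
    v3 ^= b
    v0, v1, v2, v3 = _sipround(*_sipround(v0, v1, v2, v3))
    v0 ^= b
    # Finalization: d = 4 SipRounds.
    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3

# First reference test vector: key = 00..0f, empty input.
key = bytes(range(16))
assert siphash_2_4_64(b"", key) == 0x726FDB47DD0E0E31
```

Roughly speaking, the 128-bit variant behind `getSipHash_2_4_128` differs only in two extra constant XORs and a second finalization pass over the same state, which is why a shared core with two thin wrappers is a natural interface.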
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: On the other hand, it's a lot easier to handle ugly types down in instruction selection, where you get to play much more fast and loose with types. And there are buffer uses that don't fit into the fat pointer use case where we'd still want them to work. For example, both `struct.ptr.buffer.load.v6f16` and `struct.ptr.buffer.load.v3f32` should be a `buffer_load_dwordx3`, but I'm pretty sure 6 x half isn't a register type. The load and store intrinsics are already overloaded to handle various {8, 16, ..., 128}-bit types, and it seems much cleaner to let it support any type of those lengths. It's just a load/store with somewhat weird indexing semantics, is all. And then, since we're there ... `load i256, ptr addrspace(1) %p` legalizes to multiple instructions, and `{raw,struct}.ptr.buffer.load(ptr addrspace(8) %p, i32 %offset, ...)` should too. It's just a load, after all. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguments to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null<fir::EmboxProcOp>(operation.getDefiningOp())) { clementval wrote: > Does REDUCE work with dummy procedure and procedure pointers? If so it would > be good to add tests for those cases to ensure the pattern matching here > works with them. I'll check if this is supported and add a proper test if it is. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [llvm] release/18.x: [lld] Fix -ObjC load behavior with LTO (#92162) (PR #92478)
https://github.com/AtariDreams reopened https://github.com/llvm/llvm-project/pull/92478 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
llvmbot wrote: @uweigand What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/95463 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/95463 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/95463 Backport 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257 Requested by: @nikic >From 016c200faf4bcf1a531dabd4411a2ec4d0a23068 Mon Sep 17 00:00:00 2001 From: Jonas Paulsson Date: Mon, 15 Apr 2024 16:32:14 +0200 Subject: [PATCH] [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) For the intrinsic s390_vperm, all of the elements are demanded, so use an APInt with the value of '-1' for them (not '1'). Fixes https://github.com/llvm/llvm-project/issues/88397 (cherry picked from commit 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257) --- .../Target/SystemZ/SystemZISelLowering.cpp| 2 +- .../SystemZ/knownbits-intrinsics-binop.ll | 19 +++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp index 5e0b0594b0a42..3a297238c2088 100644 --- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp +++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp @@ -7774,7 +7774,7 @@ static APInt getDemandedSrcElements(SDValue Op, const APInt , break; } case Intrinsic::s390_vperm: - SrcDemE = APInt(NumElts, 1); + SrcDemE = APInt(NumElts, -1); break; default: llvm_unreachable("Unhandled intrinsic."); diff --git a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll index 3bcbbb45581f9..b855d01934782 100644 --- a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll +++ b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll @@ -458,3 +458,22 @@ define <16 x i8> @f30() { i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1> ret <16 x i8> %res } + +; Test VPERM with various constant operands. 
+define i32 @f31() { +; CHECK-LABEL: f31: +; CHECK-LABEL: # %bb.0: +; CHECK-NEXT: larl %r1, .LCPI31_0 +; CHECK-NEXT: vl %v0, 0(%r1), 3 +; CHECK-NEXT: larl %r1, .LCPI31_1 +; CHECK-NEXT: vl %v1, 0(%r1), 3 +; CHECK-NEXT: vperm %v0, %v1, %v1, %v0 +; CHECK-NEXT: vlgvb %r2, %v0, 0 +; CHECK-NEXT: nilf %r2, 7 +; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d +; CHECK-NEXT: br %r14 + %P = tail call <16 x i8> @llvm.s390.vperm(<16 x i8> , <16 x i8> , <16 x i8> ) + %E = extractelement <16 x i8> %P, i64 0 + %res = zext i8 %E to i32 + ret i32 %res +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
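The one-line fix is easier to read with the `APInt` semantics spelled out: the constructor value acts here as a per-element demand mask, so `1` marks only element 0 of the `vperm` sources as demanded, while `-1` sets every bit. A rough Python model of the mask (not the real `APInt` API):

```python
def demanded_elements(num_elts, value):
    # Model of APInt(NumElts, Value) used as a demanded-elements mask:
    # bit i set means source element i is demanded.
    mask = value & ((1 << num_elts) - 1)  # APInt truncates to NumElts bits
    return [bool((mask >> i) & 1) for i in range(num_elts)]

# APInt(NumElts, 1): only element 0 demanded -- the bug for s390_vperm.
assert demanded_elements(16, 1) == [True] + [False] * 15

# APInt(NumElts, -1): every element demanded -- the fix.
assert demanded_elements(16, -1) == [True] * 16
```

With the old mask, known-bits analysis was free to assume elements 1..15 of the sources were irrelevant, which is exactly the miscompile the new `f31` test pins down.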
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
llvmbot wrote: @llvm/pr-subscribers-testing-tools Author: Tom Stellard (tstellar) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95458.diff 2 Files Affected: - (modified) llvm/CMakeLists.txt (+1-1) - (modified) llvm/utils/lit/lit/__init__.py (+1-1) ``diff diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt index 51278943847aa..909a965cd86c8 100644 --- a/llvm/CMakeLists.txt +++ b/llvm/CMakeLists.txt @@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR) set(LLVM_VERSION_MINOR 1) endif() if(NOT DEFINED LLVM_VERSION_PATCH) - set(LLVM_VERSION_PATCH 7) + set(LLVM_VERSION_PATCH 8) endif() if(NOT DEFINED LLVM_VERSION_SUFFIX) set(LLVM_VERSION_SUFFIX) diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py index 5003d78ce5218..800d59492d8ff 100644 --- a/llvm/utils/lit/lit/__init__.py +++ b/llvm/utils/lit/lit/__init__.py @@ -2,7 +2,7 @@ __author__ = "Daniel Dunbar" __email__ = "dan...@minormatter.com" -__versioninfo__ = (18, 1, 7) +__versioninfo__ = (18, 1, 8) __version__ = ".".join(str(v) for v in __versioninfo__) + "dev" __all__ = [] `` https://github.com/llvm/llvm-project/pull/95458 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
https://github.com/tstellar created https://github.com/llvm/llvm-project/pull/95458 None >From 2edf6218b7e74cc76035e4e1efa8166b1c22312d Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Thu, 13 Jun 2024 12:33:39 -0700 Subject: [PATCH] Bump version to 18.1.8 --- llvm/CMakeLists.txt| 2 +- llvm/utils/lit/lit/__init__.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt index 51278943847aa..909a965cd86c8 100644 --- a/llvm/CMakeLists.txt +++ b/llvm/CMakeLists.txt @@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR) set(LLVM_VERSION_MINOR 1) endif() if(NOT DEFINED LLVM_VERSION_PATCH) - set(LLVM_VERSION_PATCH 7) + set(LLVM_VERSION_PATCH 8) endif() if(NOT DEFINED LLVM_VERSION_SUFFIX) set(LLVM_VERSION_SUFFIX) diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py index 5003d78ce5218..800d59492d8ff 100644 --- a/llvm/utils/lit/lit/__init__.py +++ b/llvm/utils/lit/lit/__init__.py @@ -2,7 +2,7 @@ __author__ = "Daniel Dunbar" __email__ = "dan...@minormatter.com" -__versioninfo__ = (18, 1, 7) +__versioninfo__ = (18, 1, 8) __version__ = ".".join(str(v) for v in __versioninfo__) + "dev" __all__ = [] ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN converted_to_draft https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
kbeyls wrote: > Yes, this doesn't have tests by itself because there's no exposed interface. > It's certainly trivial to add one (which would allow using the reference test > vectors). > > I don't have strong arguments either way, but I figured the conservative > option is to force hypothetical users to consider their use more seriously. > One might argue that's not how we usually treat libSupport, so I'm happy to > expose the raw function here. I see some value in being able to test with the reference test vectors to be fully sure that the implementation really implements SipHash. But as I said above, I'm happy with merging this as is. https://github.com/llvm/llvm-project/pull/94394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] 7fe862d - Revert "[hwasan] Add fixed_shadow_base flag (#73980)"
Author: Florian Mayer Date: 2024-06-13T09:55:29-07:00 New Revision: 7fe862d0a1f6dfa67c236f5af32ad15546797404 URL: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404 DIFF: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404.diff LOG: Revert "[hwasan] Add fixed_shadow_base flag (#73980)" This reverts commit ea991a11b2a3d2bfa545adbefb71cd17e8970a43. Added: Modified: compiler-rt/lib/hwasan/hwasan_flags.inc compiler-rt/lib/hwasan/hwasan_linux.cpp Removed: compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c diff --git a/compiler-rt/lib/hwasan/hwasan_flags.inc b/compiler-rt/lib/hwasan/hwasan_flags.inc index 058a0457b9e7f..978fa46b705cb 100644 --- a/compiler-rt/lib/hwasan/hwasan_flags.inc +++ b/compiler-rt/lib/hwasan/hwasan_flags.inc @@ -84,10 +84,3 @@ HWASAN_FLAG(bool, malloc_bisect_dump, false, // are untagged before the call. HWASAN_FLAG(bool, fail_without_syscall_abi, true, "Exit if fail to request relaxed syscall ABI.") - -HWASAN_FLAG( -uptr, fixed_shadow_base, -1, -"If not -1, HWASan will attempt to allocate the shadow at this address, " -"instead of choosing one dynamically." 
-"Tip: this can be combined with the compiler option, " -"-hwasan-mapping-offset, to optimize the instrumentation.") diff --git a/compiler-rt/lib/hwasan/hwasan_linux.cpp b/compiler-rt/lib/hwasan/hwasan_linux.cpp index e6aa60b324fa7..c254670ee2d48 100644 --- a/compiler-rt/lib/hwasan/hwasan_linux.cpp +++ b/compiler-rt/lib/hwasan/hwasan_linux.cpp @@ -106,12 +106,8 @@ static uptr GetHighMemEnd() { } static void InitializeShadowBaseAddress(uptr shadow_size_bytes) { - if (flags()->fixed_shadow_base != (uptr)-1) { -__hwasan_shadow_memory_dynamic_address = flags()->fixed_shadow_base; - } else { -__hwasan_shadow_memory_dynamic_address = -FindDynamicShadowStart(shadow_size_bytes); - } + __hwasan_shadow_memory_dynamic_address = + FindDynamicShadowStart(shadow_size_bytes); } static void MaybeDieIfNoTaggingAbi(const char *message) { diff --git a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c b/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c deleted file mode 100644 index 4ff1d3e64c1d0..0 --- a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c +++ /dev/null @@ -1,76 +0,0 @@ -// Test fixed shadow base functionality. -// -// Default compiler instrumentation works with any shadow base (dynamic or fixed). -// RUN: %clang_hwasan %s -o %t && %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// -// If -hwasan-mapping-offset is set, then the fixed_shadow_base needs to match. 
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 not %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 not %run %t -// -// Note: if fixed_shadow_base is not set, compiler-rt will dynamically choose a -// shadow base, which has a tiny but non-zero probability of matching the -// compiler instrumentation. To avoid test flake, we do not test this case. -// -// Assume 48-bit VMA -// REQUIRES: aarch64-target-arch -// -// REQUIRES: Clang -// -// UNSUPPORTED: android - -#include -#include -#include -#include -#include -#include - -int main() { - __hwasan_enable_allocator_tagging(); - - // We test that the compiler instrumentation is able to access shadow memory - // for many different addresses. If we only test a small number of addresses, - // it might work by chance even if the shadow base does not match between the - // compiler instrumentation and compiler-rt. - void **mmaps[256]; - // 48-bit VMA - for (int i = 0; i < 256; i++) { -unsigned long long addr = (i * (1ULL << 40)); - -void *p = mmap((void *)addr, 4096, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); -// We don't use MAP_FIXED, to avoid overwriting critical memory. -// However, if we don't get allocated the requested address, it -// isn't a useful test. -if ((unsigned long long)p != addr) { - munmap(p, 4096); - mmaps[i] = MAP_FAILED; -} else { - mmaps[i] = p; -} - } - - int failures = 0; - for (int i = 0; i < 256; i++) { -if (mmaps[i] == MAP_FAILED) { - failures++; -} else { - printf("%d %p\n", i, mmaps[i]); - munmap(mmaps[i], 4096); -} - } - -
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
https://github.com/jeanPerier approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) { /// Decide if stale profile matching can be applied for a given function. /// Currently we skip inference for (very) large instances and for instances /// having "unexpected" control flow (e.g., having no sink basic blocks). -bool canApplyInference(const FlowFunction &Func) { +bool canApplyInference(const FlowFunction &Func, + const yaml::bolt::BinaryFunctionProfile &YamlBF) { if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize) return false; + if ((double)Func.MatchedExecCount / YamlBF.ExecCount >= + opts::MatchedProfileThreshold / 100.0) +return false; shawbyoung wrote: I’m leaning towards the block count heuristic now. I think the 1M and 4x1K exec count block case is likely pretty common – I imagine functions with loops would look a lot like this. Having some blocks matched exactly would suggest to me that there would likely be a reasonable amount of similarity between the profiled function and existing function relationally, which block coldness likely doesn’t have an outsized bearing on. I think having a reasonably high threshold for matched blocks would conservatively allow us to drop functions with high discrepancy – I’ll test this on a production binary. https://github.com/llvm/llvm-project/pull/95156 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
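The block-count heuristic being discussed can be sketched as follows; the function name and the 50% default are illustrative placeholders, not values from the PR:

```python
def keep_stale_profile(matched_blocks, total_blocks, threshold_pct=50.0):
    """Keep (and run inference on) a stale profile only when enough of its
    basic blocks were matched exactly against the current binary."""
    if total_blocks == 0:
        return False
    return 100.0 * matched_blocks / total_blocks >= threshold_pct

# 80% of blocks matched exactly: low discrepancy, keep the profile.
assert keep_stale_profile(8, 10) is True

# Only 20% matched: high discrepancy, drop it.
assert keep_stale_profile(2, 10) is False
```

Counting matched blocks rather than matched execution counts sidesteps the skew from one hot loop block (the "1M and 4x1K" case above), since every block contributes equally to the ratio regardless of its temperature.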
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: Yeah, makes sense. ... what prevents a match-bitwidth operator from existing? Context from where I'm standing is that you should be able to `raw.buffer.load/store` any (non-aggregate, let's say, since that could be better handled in `addrspace(7)` handling) type you could `load` or `store`. That is, `raw.ptr.buffer.load.i15` should work (as an i16 load that truncates) as should `raw.ptr.buffer.store.v8f32` (or `raw.ptr.buffer.store.i256`). Sure, the latter are two instructions long, but regular loads can regularize to multiple instructions just fine. My thoughts on how to implement that second behavior were to split the type into legal chunks and add in the offsets, and then merge/bitcast the values back. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguments to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null<fir::EmboxProcOp>(operation.getDefiningOp())) { jeanPerier wrote: Does REDUCE work with dummy procedure and procedure pointers? If so it would be good to add tests for those cases to ensure the pattern matching here works with them. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: I don't think we should be trying to handle the unreasonable illegal types in the intrinsics themselves. Theoretically the intrinsic should correspond to direct support. We would handle the ugly types in the fat pointer lowering in terms of the intrinsics. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: This may be a side effect of a different bug tracking global variables. I think fixing that bug first, and then applying this change if the problem persists is a better idea. Because of this, I'm switching this to a draft for now. Discourse link is https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/66092/23 https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 93e7f14 - Revert "[libc] fix aarch64 linux full build (#95358)"
Author: Schrodinger ZHU Yifan Date: 2024-06-13T07:54:57-07:00 New Revision: 93e7f145bc38c7c47d797e652d891695eb44fcfa URL: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa DIFF: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa.diff LOG: Revert "[libc] fix aarch64 linux full build (#95358)" This reverts commit ca05204f9aa258c5324d5675c7987c7e570168a0. Added: Modified: libc/config/linux/aarch64/entrypoints.txt libc/src/__support/threads/linux/CMakeLists.txt libc/test/IntegrationTest/test.cpp Removed: diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt index 7ce088689b925..db96a80051a8d 100644 --- a/libc/config/linux/aarch64/entrypoints.txt +++ b/libc/config/linux/aarch64/entrypoints.txt @@ -643,12 +643,6 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.pthread.pthread_mutexattr_setrobust libc.src.pthread.pthread_mutexattr_settype libc.src.pthread.pthread_once -libc.src.pthread.pthread_rwlockattr_destroy -libc.src.pthread.pthread_rwlockattr_getkind_np -libc.src.pthread.pthread_rwlockattr_getpshared -libc.src.pthread.pthread_rwlockattr_init -libc.src.pthread.pthread_rwlockattr_setkind_np -libc.src.pthread.pthread_rwlockattr_setpshared libc.src.pthread.pthread_setspecific # sched.h entrypoints @@ -759,7 +753,6 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.unistd._exit libc.src.unistd.environ libc.src.unistd.execv -libc.src.unistd.fork libc.src.unistd.getopt libc.src.unistd.optarg libc.src.unistd.optind diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt index 8e6cd7227b2c8..9bf88ccc84557 100644 --- a/libc/src/__support/threads/linux/CMakeLists.txt +++ b/libc/src/__support/threads/linux/CMakeLists.txt @@ -64,7 +64,6 @@ add_object_library( .futex_utils libc.config.linux.app_h libc.include.sys_syscall -libc.include.fcntl libc.src.errno.errno libc.src.__support.CPP.atomic libc.src.__support.CPP.stringstream diff --git 
a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 27e7f29efa0f1..3bdbe89a3fb62 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -79,10 +79,4 @@ void *realloc(void *ptr, size_t s) { // Integration tests are linked with -nostdlib. BFD linker expects // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; - -// On some platform (aarch64 fedora tested) full build integration test -// objects need to link against libgcc, which may expect a __getauxval -// function. For now, it is fine to provide a weak definition that always -// returns false. -[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } } // extern "C" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 91323a6 - Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"
Author: Schrodinger ZHU Yifan Date: 2024-06-13T08:38:05-07:00 New Revision: 91323a6ea8f32a9fe2cec7051e8a99b87157133e URL: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e DIFF: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e.diff LOG: Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)" This reverts commit 9e5428e6b02c77fb18c4bdf688a216c957fd7a53. Added: Modified: libc/config/linux/aarch64/entrypoints.txt libc/src/__support/threads/linux/CMakeLists.txt libc/test/IntegrationTest/test.cpp Removed: diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt index db96a80051a8d..7ce088689b925 100644 --- a/libc/config/linux/aarch64/entrypoints.txt +++ b/libc/config/linux/aarch64/entrypoints.txt @@ -643,6 +643,12 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.pthread.pthread_mutexattr_setrobust libc.src.pthread.pthread_mutexattr_settype libc.src.pthread.pthread_once +libc.src.pthread.pthread_rwlockattr_destroy +libc.src.pthread.pthread_rwlockattr_getkind_np +libc.src.pthread.pthread_rwlockattr_getpshared +libc.src.pthread.pthread_rwlockattr_init +libc.src.pthread.pthread_rwlockattr_setkind_np +libc.src.pthread.pthread_rwlockattr_setpshared libc.src.pthread.pthread_setspecific # sched.h entrypoints @@ -753,6 +759,7 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.unistd._exit libc.src.unistd.environ libc.src.unistd.execv +libc.src.unistd.fork libc.src.unistd.getopt libc.src.unistd.optarg libc.src.unistd.optind diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt index 9bf88ccc84557..8e6cd7227b2c8 100644 --- a/libc/src/__support/threads/linux/CMakeLists.txt +++ b/libc/src/__support/threads/linux/CMakeLists.txt @@ -64,6 +64,7 @@ add_object_library( .futex_utils libc.config.linux.app_h libc.include.sys_syscall +libc.include.fcntl libc.src.errno.errno libc.src.__support.CPP.atomic 
libc.src.__support.CPP.stringstream diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 3bdbe89a3fb62..27e7f29efa0f1 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -79,4 +79,10 @@ void *realloc(void *ptr, size_t s) { // Integration tests are linked with -nostdlib. BFD linker expects // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; + +// On some platform (aarch64 fedora tested) full build integration test +// objects need to link against libgcc, which may expect a __getauxval +// function. For now, it is fine to provide a weak definition that always +// returns false. +[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } } // extern "C" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: That's what we've traditionally done, and I think we should stop. We currently skip inserting the casts if the type is legal. It introduces extra bitcasts, which have a cost and increase pattern-match complexity, and we have a bunch of patterns that don't bother to look through the casts for a load/store. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: So, general question on this patch series: instead of having separate handling for all the possible register types, wouldn't it be more reasonable to always do loads as `i8`, `i16`, `i32`, `<2 x i32>`, `<3 x i32>`, or `<4 x i32>` and then `bitcast`/`merge_values`/... the results back to their type? Or at least to have that fallback path - if we don't know what a type is, load/store it as its bits? (Then we wouldn't need to, for example, go back and add a `<16 x i8>` case if someone realizes they want that.) https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)
@@ -117,13 +117,44 @@ void test_update_dpp(global int* out, int arg1, int arg2) } // CHECK-LABEL: @test_ds_fadd -// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fadd.f32(ptr addrspace(3) %out, float %src, i32 0, i32 0, i1 false) +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} +// CHECK: atomicrmw volatile fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} + +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src release, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acq_rel, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}} + +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("agent") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("workgroup") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("wavefront") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("singlethread") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} #if !defined(__SPIRV__) void test_ds_faddf(local float *out, float src) { #else -void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) { + void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) { #endif + *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, false); + *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, true); + + // Test all orders. 
+ *out = __builtin_amdgcn_ds_faddf(out, src, 1, 0, false); yxsamliu wrote: better use predefined macros ``` // Define macros for the C11 / C++11 memory orderings Builder.defineMacro("__ATOMIC_RELAXED", "0"); Builder.defineMacro("__ATOMIC_CONSUME", "1"); Builder.defineMacro("__ATOMIC_ACQUIRE", "2"); Builder.defineMacro("__ATOMIC_RELEASE", "3"); Builder.defineMacro("__ATOMIC_ACQ_REL", "4"); Builder.defineMacro("__ATOMIC_SEQ_CST", "5"); // Define macros for the clang atomic scopes. Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0"); Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1"); Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2"); Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3"); Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4"); ``` https://github.com/llvm/llvm-project/pull/95395 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: @fhahn https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, CallBase *CI, Function *F, llvm_unreachable("Unknown function for ARM CallBase upgrade."); } +// These are expected to have have the arguments: cdevadas wrote: ```suggestion // These are expected to have the arguments: ``` https://github.com/llvm/llvm-project/pull/95396 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN updated https://github.com/llvm/llvm-project/pull/95387 >From 432f994b1bc21e4db0778fff9cc1425f788f8168 Mon Sep 17 00:00:00 2001 From: Matthew Nagy Date: Thu, 13 Jun 2024 09:54:04 + Subject: [PATCH] [TySan] Fixed false positive when accessing offset member variables --- compiler-rt/lib/tysan/tysan.cpp | 12 +- compiler-rt/test/tysan/struct-members.c | 31 + 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 compiler-rt/test/tysan/struct-members.c diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp index f627851d049e6..747727e48a152 100644 --- a/compiler-rt/lib/tysan/tysan.cpp +++ b/compiler-rt/lib/tysan/tysan.cpp @@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) { OldTDPtr -= i; OldTD = *OldTDPtr; -if (!isAliasingLegal(td, OldTD)) +tysan_type_descriptor *InternalMember = OldTD; +if (OldTD->Tag == TYSAN_STRUCT_TD) { + for (int j = 0; j < OldTD->Struct.MemberCount; j++) { +if (OldTD->Struct.Members[j].Offset == i) { + InternalMember = OldTD->Struct.Members[j].Type; + break; +} + } +} + +if (!isAliasingLegal(td, InternalMember)) reportError(addr, size, td, OldTD, AccessStr, "accesses part of an existing object", -i, pc, bp, sp); diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c new file mode 100644 index 0..76ea3c431dd7b --- /dev/null +++ b/compiler-rt/test/tysan/struct-members.c @@ -0,0 +1,31 @@ +// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1 +// RUN: FileCheck %s < %t.out + +#include + +struct X { + int a, b, c; +} x; + +static struct X xArray[2]; + +int main() { + x.a = 1; + x.b = 2; + x.c = 3; + + printf("%d %d %d\n", x.a, x.b, x.c); + // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation + + for (size_t i = 0; i < 2; i++) { +xArray[i].a = 1; +xArray[i].b = 1; +xArray[i].c = 1; + } + + struct X *xPtr = (struct X *)&(xArray[0].c); + xPtr->a = 1; + // CHECK: ERROR: TypeSanitizer: 
type-aliasing-violation + // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8) + // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]] +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} arsenm wrote: I'm not a big fan of omitting the braces, especially in tablegen. If we're going to delete the braces the lines should at least be indented https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95379 >From 14695322d92821374dd6599d8f0f76d212e50169 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 456 ++ 3 files changed, 1071 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +;
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95378 >From 1dfcc0961e82bbe656faded0c38e694da0d76c9b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 50e62788c5eac..978d261f5a662 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End 
HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5,
[llvm-branch-commits] [clang] 0e8c9bc - Revert "[clang][NFC] Add a test for CWG2685 (#95206)"
Author: Younan Zhang Date: 2024-06-13T18:53:46+08:00 New Revision: 0e8c9bca863137f14aea2cee0e05d4270b33e0e8 URL: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8 DIFF: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8.diff LOG: Revert "[clang][NFC] Add a test for CWG2685 (#95206)" This reverts commit 3475116e2c37a2c8a69658b36c02871c322da008. Added: Modified: clang/test/CXX/drs/cwg26xx.cpp clang/www/cxx_dr_status.html Removed: diff --git a/clang/test/CXX/drs/cwg26xx.cpp b/clang/test/CXX/drs/cwg26xx.cpp index fee3ef16850bf..2b17c8101438d 100644 --- a/clang/test/CXX/drs/cwg26xx.cpp +++ b/clang/test/CXX/drs/cwg26xx.cpp @@ -225,15 +225,6 @@ void m() { } #if __cplusplus >= 202302L - -namespace cwg2685 { // cwg2685: 17 -template -struct A { - T ar[4]; -}; -A a = { "foo" }; -} - namespace cwg2687 { // cwg2687: 18 struct S{ void f(int); diff --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html index 8c79708f23abd..5e2ab06701703 100755 --- a/clang/www/cxx_dr_status.html +++ b/clang/www/cxx_dr_status.html @@ -15918,7 +15918,7 @@ C++ defect report implementation status https://cplusplus.github.io/CWG/issues/2685.html;>2685 C++23 Aggregate CTAD, string, and brace elision -Clang 17 +Unknown https://cplusplus.github.io/CWG/issues/2686.html;>2686 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
llvmbot wrote: @llvm/pr-subscribers-compiler-rt-sanitizer Author: None (gbMattN) Changes This patch fixes a bug the current TySan implementation has. Currently if you access a member variable other than the first, TySan reports an error. TySan believes you are accessing the struct type with an offset equal to the offset of the member variable you are trying to access. With this patch, the type we are trying to access is amended to the type of the member variable matching the offset we are accessing with. It does this if and only if there is a member at that offset, however, so any incorrect accesses are still caught. This is checked in the struct-members.c test. --- Full diff: https://github.com/llvm/llvm-project/pull/95387.diff 2 Files Affected: - (modified) compiler-rt/lib/tysan/tysan.cpp (+11-1) - (added) compiler-rt/test/tysan/struct-members.c (+32) ``diff diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp index f627851d049e6..747727e48a152 100644 --- a/compiler-rt/lib/tysan/tysan.cpp +++ b/compiler-rt/lib/tysan/tysan.cpp @@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) { OldTDPtr -= i; OldTD = *OldTDPtr; -if (!isAliasingLegal(td, OldTD)) +tysan_type_descriptor *InternalMember = OldTD; +if (OldTD->Tag == TYSAN_STRUCT_TD) { + for (int j = 0; j < OldTD->Struct.MemberCount; j++) { +if (OldTD->Struct.Members[j].Offset == i) { + InternalMember = OldTD->Struct.Members[j].Type; + break; +} + } +} + +if (!isAliasingLegal(td, InternalMember)) reportError(addr, size, td, OldTD, AccessStr, "accesses part of an existing object", -i, pc, bp, sp); diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c new file mode 100644 index 0..8cf6499f78ce6 --- /dev/null +++ b/compiler-rt/test/tysan/struct-members.c @@ -0,0 +1,32 @@ +// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1 +// RUN: FileCheck %s < %t.out + +#include + +struct X { + int a, b, c; +} x; + 
+static struct X xArray[2]; + +int main() { + x.a = 1; + x.b = 2; + x.c = 3; + + printf("%d %d %d\n", x.a, x.b, x.c); + // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation + + for (size_t i = 0; i < 2; i++) { +xArray[i].a = 1; +xArray[i].b = 1; +xArray[i].c = 1; + } + printf("Here\n"); + + struct X *xPtr = (struct X *)&(xArray[0].c); + xPtr->a = 1; + // CHECK: ERROR: TypeSanitizer: type-aliasing-violation + // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8) + // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]] +} `` https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes We should just support these for all register types. --- Full diff: https://github.com/llvm/llvm-project/pull/95378.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+30-42) - (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+9-7) ``diff diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. 
-defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5, [v5i32, v5f32], SGPR_160Regs, TTMP_160Regs>; defm "" : SRegClass<6, [v6i32, v6f32, v3i64, v3f64], SGPR_192Regs, TTMP_192Regs>; defm "" :
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Make sure we test all the address spaces since this support isn't free in gisel. --- Patch is 38.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95379.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+19-12) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll (+596) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.ll (+144) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; GFX10-NEXT:s_waitcnt vmcnt(0) +;
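The DMask handling in the patch above decides how many lanes of an image intrinsic's result are actually read from memory: a dmask of 0 still counts as one lane, otherwise each set bit enables one lane, and the in-memory vector width is clamped accordingly. A minimal Python model of that lane-clamping logic (an illustrative sketch, not the LLVM code itself):

```python
def popcount(x: int) -> int:
    """Number of set bits, mirroring llvm::popcount."""
    return bin(x).count("1")

def mem_lanes(dmask: int, result_elts: int) -> int:
    # A dmask of 0 is still treated as one lane; otherwise each set bit
    # of the intrinsic's dmask enables one lane of the result.
    max_lanes = 1 if dmask == 0 else popcount(dmask)
    # memVTFromLoadIntrData clamps the in-memory vector width to the
    # number of lanes actually loaded.
    return min(max_lanes, result_elts)

print(mem_lanes(0b1011, 4))  # 3: three dmask bits set
print(mem_lanes(0, 4))       # 1: dmask == 0 still loads one lane
```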
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95377.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll (+32-5) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void 
@llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: `` https://github.com/llvm/llvm-project/pull/95377 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
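The GFX7 expansion in the checks above (`v_mul_f32_e32 v0, 1.0, v0` followed by `v_lshrrev_b32_e32 v0, 16, v0`) works because a bf16 value's bit pattern is simply the top 16 bits of the corresponding f32 encoding; `buffer_store_short` then writes the low 16 bits of the register. A small sketch of that bit-level relationship (hypothetical helper, assuming plain truncation as in the emitted code):

```python
import struct

def bf16_bits_from_f32(x: float) -> int:
    # bf16 shares f32's sign and exponent fields, so its bit pattern is
    # the top 16 bits of the IEEE-754 binary32 encoding; shifting right
    # by 16 models the v_lshrrev_b32 step of the GFX7 sequence.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

print(hex(bf16_bits_from_f32(1.0)))   # 0x3f80
print(hex(bf16_bits_from_f32(-2.0)))  # 0xc000
```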
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95378 * **#95379** * **#95378** (this PR) * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95377 * **#95379** * **#95378** * **#95377** (this PR) * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95377 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95379 * **#95379** (this PR) * **#95378** * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95379 Make sure we test all the address spaces since this support isn't free in gisel. >From b05179ed684e289ce31f7aee8b57939c7bf2809c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 144 + 3 files changed, 759 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering &TLI, + const DataLayout &DL, Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext &Ctx = Ty->getContext(); if (auto *VT = dyn_cast<FixedVectorType>(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. 
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering &TLI, + const DataLayout &DL, Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast<StructType>(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits<unsigned>::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits<unsigned>::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, if (RsrcIntr->IsImage) { unsigned DMask = cast<ConstantInt>(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +;
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95378 We should just support these for all register types. >From 46c7f8b4529827204e5273472ea5b642ecb7266e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] 
in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types,
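The `foreach` refactor in this patch replaces the 21 hand-written `defm : MUBUF_LoadIntrinsicPat`/`MUBUF_StoreIntrinsicPat` lines with loops over each register class's type list, so new value types only need to be added to `Reg32Types`/`Reg64Types`/`Reg96Types`/`Reg128Types` once. The effect can be sketched in Python (the type lists and instruction names below are illustrative stand-ins, not the actual TableGen output):

```python
# Illustrative stand-ins for the TableGen type groupings; the real
# Reg32Types/Reg64Types lists live in SIRegisterInfo.td.
REG_TYPE_GROUPS = {
    "BUFFER_LOAD_DWORD":   ["i32", "f32", "v2i16", "v2f16", "p3"],
    "BUFFER_LOAD_DWORDX2": ["i64", "f64", "v2i32", "v2f32", "p0"],
}

def expand_patterns(groups):
    # One load pattern per (instruction, value type) pair, the way the
    # foreach loops instantiate MUBUF_LoadIntrinsicPat for every type
    # in a register class's list.
    return [(inst, vt) for inst, vts in groups.items() for vt in vts]

pats = expand_patterns(REG_TYPE_GROUPS)
print(len(pats))  # 10 patterns generated from 2 groups
```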
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95377 None >From 520d91d73339d8bea65f2e30e2a4d7fd0eb3d92b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 22:54:35 +0200 Subject: [PATCH] AMDGPU: Fix buffer intrinsic store of bfloat --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 4 +- .../llvm.amdgcn.raw.ptr.buffer.store.bf16.ll | 37 --- 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define 
amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits