[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

```
$ rg HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY
libc/utils/gpu/loader/amdgpu/Loader.cpp
521:        HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY),

openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
74:  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1892:  if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
```

The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol. This symbol is expected to be provided by `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not by the third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`. The code in `release/17.x` and `release/18.x` explicitly looks for ROCm's `hsa/hsa_ext_amd.h` first and only tries the LLVM-provided `dynamic_hsa/hsa_ext_amd.h` as a fallback; because of a mistake in `CMakeLists.txt`, that fallback does not work in all cases, since `dynamic_hsa` is not always added to the include directories. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: > We made a change recently that made the dynamic_hsa version the default. The > error you're seeing is from an old HSA, so if you're overriding the default > to use an old library that's probably not worth working around. The error I see does not come from an old HSA: there is no old HSA around at all, and this is an LLVM bug, not something to work around in an old library. There is no `hsa/hsa.h` in the tree, and the default `dynamic_hsa` is not used. The `hsa/hsa.h` file is from ROCm, not from LLVM. Without such a patch, LLVM requires ROCm to be installed and its headers to be in the default include paths for `src/rtl.cpp` to build when `hsa.cpp` is not built. This patch makes LLVM use `dynamic_hsa` when building `src/rtl.cpp`, because that is the default. The patch is needed to build both `release/17.x` and `release/18.x`; the `main` branch changed the code layout, so the patch will not apply there. I assume a full LLVM build does not trigger the build problem because something else includes `dynamic_hsa` and makes it findable by `src/rtl.cpp` by luck. But when building only a subset of LLVM (just what some applications need), `dynamic_hsa` is not added to the include directories even though it is required by `src/rtl.cpp`. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 4076c30 - [libc] more fix
Author: Schrodinger ZHU Yifan Date: 2024-06-13T20:22:21-07:00 New Revision: 4076c3004f09e95d1fcd299452843f99235ff422 URL: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422 DIFF: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422.diff LOG: [libc] more fix Added: Modified: libc/cmake/modules/LLVMLibCTestRules.cmake libc/test/IntegrationTest/CMakeLists.txt libc/test/IntegrationTest/test.cpp libc/test/UnitTest/CMakeLists.txt libc/test/UnitTest/HermeticTestUtils.cpp Removed: diff --git a/libc/cmake/modules/LLVMLibCTestRules.cmake b/libc/cmake/modules/LLVMLibCTestRules.cmake index eb6be91b55e26..c8d7c8a2b1c7c 100644 --- a/libc/cmake/modules/LLVMLibCTestRules.cmake +++ b/libc/cmake/modules/LLVMLibCTestRules.cmake @@ -686,6 +686,15 @@ function(add_libc_hermetic_test test_name) LibcTest.hermetic libc.test.UnitTest.ErrnoSetterMatcher ${fq_deps_list}) + # TODO: currently the dependency chain is broken such that getauxval cannot properly + # propagate to hermetic tests. This is a temporary workaround. + if (LIBC_TARGET_ARCHITECTURE_IS_AARCH64) +target_link_libraries( + ${fq_build_target_name} + PRIVATE +libc.src.sys.auxv.getauxval +) + endif() # Tests on the GPU require an external loader utility to launch the kernel. 
if(TARGET libc.utils.gpu.loader) diff --git a/libc/test/IntegrationTest/CMakeLists.txt b/libc/test/IntegrationTest/CMakeLists.txt index 4f31f10b29f0b..4a999407d48d7 100644 --- a/libc/test/IntegrationTest/CMakeLists.txt +++ b/libc/test/IntegrationTest/CMakeLists.txt @@ -1,3 +1,7 @@ +set(arch_specific_deps) +if(LIBC_TARGET_ARCHITECTURE_IS_AARCH64) + set(arch_specific_deps libc.src.sys.auxv.getauxval) +endif() add_object_library( test SRCS @@ -8,4 +12,5 @@ add_object_library( test.h DEPENDS libc.src.__support.OSUtil.osutil +${arch_specific_deps} ) diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 27e7f29efa0f1..a8b2f2911fd8e 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -6,6 +6,8 @@ // //===--===// +#include "src/__support/common.h" +#include "src/sys/auxv/getauxval.h" #include #include @@ -80,9 +82,11 @@ void *realloc(void *ptr, size_t s) { // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; -// On some platform (aarch64 fedora tested) full build integration test -// objects need to link against libgcc, which may expect a __getauxval -// function. For now, it is fine to provide a weak definition that always -// returns false. -[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } +#ifdef LIBC_TARGET_ARCH_IS_AARCH64 +// Due to historical reasons, libgcc on aarch64 may expect __getauxval to be +// defined. 
See also https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/300635.html +unsigned long __getauxval(unsigned long id) { + return LIBC_NAMESPACE::getauxval(id); +} +#endif } // extern "C" diff --git a/libc/test/UnitTest/CMakeLists.txt b/libc/test/UnitTest/CMakeLists.txt index 302af3044ca3d..4adc2f5c725f7 100644 --- a/libc/test/UnitTest/CMakeLists.txt +++ b/libc/test/UnitTest/CMakeLists.txt @@ -41,7 +41,7 @@ function(add_unittest_framework_library name) target_compile_options(${name}.hermetic PRIVATE ${compile_options}) if(TEST_LIB_DEPENDS) -foreach(dep IN LISTS ${TEST_LIB_DEPENDS}) +foreach(dep IN ITEMS ${TEST_LIB_DEPENDS}) if(TARGET ${dep}.unit) add_dependencies(${name}.unit ${dep}.unit) else() diff --git a/libc/test/UnitTest/HermeticTestUtils.cpp b/libc/test/UnitTest/HermeticTestUtils.cpp index 349c182ff2379..6e815e6c8aab0 100644 --- a/libc/test/UnitTest/HermeticTestUtils.cpp +++ b/libc/test/UnitTest/HermeticTestUtils.cpp @@ -6,6 +6,8 @@ // //===--===// +#include "src/__support/common.h" +#include "src/sys/auxv/getauxval.h" #include #include @@ -19,6 +21,12 @@ void *memmove(void *dst, const void *src, size_t count); void *memset(void *ptr, int value, size_t count); int atexit(void (*func)(void)); +// TODO: It seems that some old test frameworks does not use +// add_libc_hermetic_test properly. Such that they won't get correct linkage +// against the object containing this function. We create a dummy function that +// always returns 0 to indicate a failure. +[[gnu::weak]] unsigned long getauxval(unsigned long id) { return 0; } + } // namespace LIBC_NAMESPACE namespace { @@ -102,6 +110,14 @@ void __cxa_pure_virtual() { // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; +#ifdef LIBC_TARGET_ARCH_IS_AARCH64 +// Due to historical reasons,
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 commented: We made a change recently that made the dynamic_hsa version the default. The error you're seeing is from an old HSA, so if you're overriding the default to use an old library that's probably not worth working around. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it. This operation does not trap and cannot fail, even if the pointer is not validly signed. +``ptrauth_sign_constant`` +^ + +.. code-block:: c + + ptrauth_sign_constant(pointer, key, discriminator) + +Return a signed pointer for a constant address in a manner which guarantees +a non-attackable sequence. + +``pointer`` must be a constant expression of pointer type which evaluates to +a non-null pointer. The result will have the same type as ``discriminator``. + +Calls to this are constant expressions if the discriminator is a null-pointer +constant expression or an integer constant expression. Implementations may +allow other pointer expressions as well. ahmedbougacha wrote: Yeah, I agree today this could simply be "it's always a constant expression"; I'll rewrite it (cc @rjmccall if this looks like anything to you) https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it. This operation does not trap and cannot fail, even if the pointer is not validly signed. +``ptrauth_sign_constant`` +^ + +.. code-block:: c + + ptrauth_sign_constant(pointer, key, discriminator) + +Return a signed pointer for a constant address in a manner which guarantees +a non-attackable sequence. ahmedbougacha wrote: Later additions to this document describe that in depth, you can look for > [clang][docs] Document the ptrauth security model. on my branch https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -58,6 +58,35 @@ void test_string_discriminator(const char *str) { } +void test_sign_constant(int *dp, int (*fp)(int)) { + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY); // expected-error {{too few arguments}} + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, , ); // expected-error {{too many arguments}} + + __builtin_ptrauth_sign_constant(mismatched_type, VALID_DATA_KEY, 0); // expected-error {{signed value must have pointer type; type here is 'struct A'}} + __builtin_ptrauth_sign_constant(, mismatched_type, 0); // expected-error {{passing 'struct A' to parameter of incompatible type 'int'}} + __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, mismatched_type); // expected-error {{extra discriminator must have pointer or integer type; type here is 'struct A'}} + + (void) __builtin_ptrauth_sign_constant(NULL, VALID_DATA_KEY, ); // expected-error {{argument to ptrauth_sign_constant must refer to a global variable or function}} ahmedbougacha wrote: We could special-case null pointers, but they're already covered by the diagnostic, which asks for global variables or functions – which NULL isn't. For auth/sign, we don't have that sort of constraint on the pointer: it really is NULL and NULL alone that's special. Now, the more interesting question is whether we should allow null pointers at all here. Since defining these original builtins we have taught the qualifier to have a mode that signs/authenticates null, for some specific use-cases where replacing a signed value with NULL (which is otherwise never signed or authenticated) would bypass signing in a problematic way. We haven't had the chance or need to revisit the builtins to allow sign/auth of NULL, but it's reasonable to add that support in the future. We'd have to consider how to expose that in the builtins, because it's probably still something that's almost always a mistake; more builtins would be an easy solution but maybe not a sophisticated one. 
https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -2061,6 +2071,58 @@ ConstantLValueEmitter::VisitCallExpr(const CallExpr *E) { } } +ConstantLValue +ConstantLValueEmitter::emitPointerAuthSignConstant(const CallExpr *E) { + llvm::Constant *UnsignedPointer = emitPointerAuthPointer(E->getArg(0)); + unsigned Key = emitPointerAuthKey(E->getArg(1)); + llvm::Constant *StorageAddress; + llvm::Constant *OtherDiscriminator; + std::tie(StorageAddress, OtherDiscriminator) = ahmedbougacha wrote: Yeah, this simply predates structured bindings; we can indeed use them now. https://github.com/llvm/llvm-project/pull/93904 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: @pranav-sivaraman try this patch: ```diff diff --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt index 92523c23f68b..92bcd94edb7a 100644 --- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt @@ -56,13 +56,14 @@ include_directories( set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() ``` I haven't tested it, but maybe the mistake is similar. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: The 14.x branch seems to be very old; notably, the file you linked is in the `plugins/` directory, while the files I modify are in the `plugins-nextgen/` directory, and the `plugins/` directory no longer exists. So I strongly doubt this patch is useful for LLVM 14, but your problem probably needs a different but similar solution. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
llvmbot wrote: @llvm/pr-subscribers-llvm-transforms Author: shaw young (shawbyoung) Changes Test Plan: tbd --- Full diff: https://github.com/llvm/llvm-project/pull/95486.diff 2 Files Affected: - (modified) bolt/lib/Profile/StaleProfileMatching.cpp (+29-8) - (modified) llvm/include/llvm/Transforms/Utils/SampleProfileInference.h (-2) ``diff diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp index 6588cf2c0ce66..cbd98f4d4769f 100644 --- a/bolt/lib/Profile/StaleProfileMatching.cpp +++ b/bolt/lib/Profile/StaleProfileMatching.cpp @@ -53,9 +53,9 @@ cl::opt cl::opt MatchedProfileThreshold( "matched-profile-threshold", -cl::desc("Percentage threshold of matched execution counts at which stale " +cl::desc("Percentage threshold of matched basic blocks at which stale " "profile inference is executed."), -cl::init(5), cl::Hidden, cl::cat(BoltOptCategory)); +cl::init(50), cl::Hidden, cl::cat(BoltOptCategory)); cl::opt StaleMatchingMaxFuncSize( "stale-matching-max-func-size", @@ -186,6 +186,17 @@ struct BlendedBlockHash { uint8_t SuccHash{0}; }; +/// A data object containing function matching information. +struct FunctionMatchingData { +public: + /// The number of blocks matched exactly. + uint64_t MatchedExactBlocks{0}; + /// The number of blocks matched loosely. + uint64_t MatchedLooseBlocks{0}; + /// The number of execution counts matched. + uint64_t MatchedExecCounts{0}; +}; + /// The object is used to identify and match basic blocks in a BinaryFunction /// given their hashes computed on a binary built from several revisions behind /// release. 
@@ -400,7 +411,8 @@ createFlowFunction(const BinaryFunction::BasicBlockOrderType ) { void matchWeightsByHashes(BinaryContext , const BinaryFunction::BasicBlockOrderType , const yaml::bolt::BinaryFunctionProfile , - FlowFunction ) { + FlowFunction , + FunctionMatchingData ) { assert(Func.Blocks.size() == BlockOrder.size() + 1); std::vector Blocks; @@ -440,9 +452,11 @@ void matchWeightsByHashes(BinaryContext , if (Matcher.isHighConfidenceMatch(BinHash, YamlHash)) { ++BC.Stats.NumMatchedBlocks; BC.Stats.MatchedSampleCount += YamlBB.ExecCount; -Func.MatchedExecCount += YamlBB.ExecCount; +FuncMatchingData.MatchedExecCounts += YamlBB.ExecCount; +FuncMatchingData.MatchedExactBlocks += 1; LLVM_DEBUG(dbgs() << " exact match\n"); } else { +FuncMatchingData.MatchedLooseBlocks += 1; LLVM_DEBUG(dbgs() << " loose match\n"); } if (YamlBB.NumInstructions == BB->size()) @@ -582,11 +596,14 @@ void preprocessUnreachableBlocks(FlowFunction ) { /// Decide if stale profile matching can be applied for a given function. /// Currently we skip inference for (very) large instances and for instances /// having "unexpected" control flow (e.g., having no sink basic blocks). -bool canApplyInference(const FlowFunction , const yaml::bolt::BinaryFunctionProfile ) { +bool canApplyInference(const FlowFunction , + const yaml::bolt::BinaryFunctionProfile , + const FunctionMatchingData ) { if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize) return false; - if (Func.MatchedExecCount / YamlBF.ExecCount >= opts::MatchedProfileThreshold) + if ((double)FuncMatchingData.MatchedExactBlocks / YamlBF.Blocks.size() >= + opts::MatchedProfileThreshold / 100.0) return false; bool HasExitBlocks = llvm::any_of( @@ -735,18 +752,22 @@ bool YAMLProfileReader::inferStaleProfile( const BinaryFunction::BasicBlockOrderType BlockOrder( BF.getLayout().block_begin(), BF.getLayout().block_end()); + // Create a containter for function matching data. 
+ FunctionMatchingData FuncMatchingData; + // Create a wrapper flow function to use with the profile inference algorithm. FlowFunction Func = createFlowFunction(BlockOrder); // Match as many block/jump counts from the stale profile as possible - matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func); + matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func, + FuncMatchingData); // Adjust the flow function by marking unreachable blocks Unlikely so that // they don't get any counts assigned. preprocessUnreachableBlocks(Func); // Check if profile inference can be applied for the instance. - if (!canApplyInference(Func, YamlBF)) + if (!canApplyInference(Func, YamlBF, FuncMatchingData)) return false; // Apply the profile inference algorithm. diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h index e7971ca1cb428..b4ea1ad840f9d 100644 ---
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung closed https://github.com/llvm/llvm-project/pull/95486 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung created https://github.com/llvm/llvm-project/pull/95486 Test Plan: tbd ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
pranav-sivaraman wrote: This is different from this [file](https://github.com/llvm/llvm-project/blob/release/14.x/openmp/libomptarget/plugins/amdgpu/impl/hsa_api.h), right? I'm trying to fix an issue when building LLVM 14 with newer ROCm releases, where the build fails to find the newer `hsa/hsa.h` headers. Not sure if I need to extend the patch to include these changes as well. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote: I first noticed the issue when building the chipStar fork of LLVM 17: https://github.com/CHIP-SPV/llvm-project (branch `chipStar-llvm-17`), but since the code is the same in LLVM 18, it is expected to fail there too. The whole folder disappeared in `main`, so I made this patch target the most recent release branch that still has those files: LLVM 18. It would be good to backport it to LLVM 17 too. I haven't yet checked whether versions older than LLVM 17 are affected. https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Thomas Debesse (illwieckz) Changes The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built. This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`. Fixes: ``` error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope ``` --- Full diff: https://github.com/llvm/llvm-project/pull/95484.diff 2 Files Affected: - (modified) openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1) - (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (-10) ``diff diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt index 68ce63467a6c8..42cc560c79112 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt @@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL") set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. 
+include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp index 81634ae1edc49..8cedc72d5f63c 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp @@ -56,18 +56,8 @@ #define BIGENDIAN_CPU #endif -#if defined(__has_include) -#if __has_include("hsa/hsa.h") -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#elif __has_include("hsa.h") #include "hsa.h" #include "hsa_ext_amd.h" -#endif -#else -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#endif namespace llvm { namespace omp { `` https://github.com/llvm/llvm-project/pull/95484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/illwieckz created https://github.com/llvm/llvm-project/pull/95484 The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built. This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`. Fixes: ``` error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope ``` >From e84e8bdef6d902d51a72eb93f7ca9812f0467c72 Mon Sep 17 00:00:00 2001 From: Thomas Debesse Date: Fri, 14 Jun 2024 00:38:25 +0200 Subject: [PATCH] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The dynamic_hsa/ include directory is required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. It should then always be included or the build will fail if only src/rtl.cpp is built. This also simplifies the way header files from dynamic_hsa/ are included in src/rtl.cpp. Fixes: error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope --- .../libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt | 4 +++- openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp | 10 -- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt index 68ce63467a6c8..42cc560c79112 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt @@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL") set(LIBOMPTARGET_DLOPEN_LIBHSA OFF) option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA}) +# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp. 
+include_directories(dynamic_hsa) + if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA) libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa") set(LIBOMPTARGET_EXTRA_SOURCE) set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64) else() libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa") - include_directories(dynamic_hsa) set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp) set(LIBOMPTARGET_DEP_LIBRARIES) endif() diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp index 81634ae1edc49..8cedc72d5f63c 100644 --- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp +++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp @@ -56,18 +56,8 @@ #define BIGENDIAN_CPU #endif -#if defined(__has_include) -#if __has_include("hsa/hsa.h") -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#elif __has_include("hsa.h") #include "hsa.h" #include "hsa_ext_amd.h" -#endif -#else -#include "hsa/hsa.h" -#include "hsa/hsa_ext_amd.h" -#endif namespace llvm { namespace omp { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
https://github.com/ahmedbougacha updated https://github.com/llvm/llvm-project/pull/94394 >From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001 From: Ahmed Bougacha Date: Tue, 4 Jun 2024 12:41:47 -0700 Subject: [PATCH 1/7] [Support] Reformat SipHash.cpp to match libSupport. While there, give it our usual file header and an acknowledgement, and remove the imported README.md.SipHash. --- llvm/lib/Support/README.md.SipHash | 126 -- llvm/lib/Support/SipHash.cpp | 264 ++--- 2 files changed, 129 insertions(+), 261 deletions(-) delete mode 100644 llvm/lib/Support/README.md.SipHash diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash deleted file mode 100644 index 4de3cd1854681..0 --- a/llvm/lib/Support/README.md.SipHash +++ /dev/null @@ -1,126 +0,0 @@ -# SipHash - -[![License: -CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/) - -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) - - -SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages. -This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging. - -SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp) -and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding -DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf). - -SipHash is: - -* *Simpler and faster* on short messages than previous cryptographic -algorithms, such as MACs based on universal hashing. - -* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash). - -* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers. 
- -* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD, -FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto, -Sodium, etc.) and applications (Wireguard, Redis, etc.). - -As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC). -But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3. -SipHash should therefore always be used with a secret key in order to be secure. - - -## Variants - -The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression -rounds, 4 finalization rounds, and returns a 64-bit tag. - -Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version. - -The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs: - -* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on. - -* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key, -and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2 -compression rounds, 4 finalization rounds, and returns a 32-bit tag. - - -## Security - -(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF -security for any function with the same key and output size. - -The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker. - -Security is limited by the key size (128 bits for SipHash), such that -attackers searching 2*s* keys have chance 2*s*−128 of finding -the SipHash key. -Security is also limited by the output size. In particular, when -SipHash is used as a MAC, an attacker who blindly tries 2*s* tags will -succeed with probability 2*s*-*t*, if *t* is that tag's bit size. 
- - -## Research - -* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012) -* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein) -* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson) - - -## Usage - -Running - -```sh - make -``` - -will build tests for - -* SipHash-2-4-64 -* SipHash-2-4-128 -* HalfSipHash-2-4-32 -* HalfSipHash-2-4-64 - - -```C - ./test -``` - -verifies 64 test vectors, and - -```C - ./debug -``` - -does the same and prints intermediate values. - -The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash -with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS` -or `dROUNDS` when compiling. This can be done with `-D` command line arguments -to many compilers such as below. - -```sh -gcc -Wall
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
ahmedbougacha wrote: [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f) shows what I had in mind, let me know what you all think. I added: ``` void getSipHash_2_4_64(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[8]); void getSipHash_2_4_128(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[16]); ``` as the core interfaces, and mimicked the ref. test harness to reuse the same test vectors. If this seems reasonable to y'all I'm happy to extract the vectors.h file from the ref. implementation into the "Import original sources" PR – that's why I kept it open ;) https://github.com/llvm/llvm-project/pull/94394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
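For readers following the thread, the function behind these interfaces is compact enough to sketch end to end. The standalone Python model below (my own naming, not the PR's C++ interface) implements SipHash-2-4-64 as described in the reference sources quoted earlier, and checks it against the first vector from the reference implementation's `vectors.h` (key `00..0f`, empty message):

```python
MASK64 = (1 << 64) - 1

def _rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK64

def _sipround(v0, v1, v2, v3):
    # One SipRound, exactly as in the reference siphash.c.
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3

def siphash_2_4_64(data: bytes, key: bytes) -> int:
    assert len(key) == 16
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v0 = 0x736F6D6570736575 ^ k0
    v1 = 0x646F72616E646F6D ^ k1
    v2 = 0x6C7967656E657261 ^ k0
    v3 = 0x7465646279746573 ^ k1
    # Compression: c = 2 SipRounds per full 8-byte word.
    for i in range(0, len(data) - 7, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        v0, v1, v2, v3 = _sipround(*_sipround(v0, v1, v2, v3))
        v0 ^= m
    # Last (partial) word carries the message length in its top byte.
    left = len(data) % 8
    b = (len(data) & 0xFF) << 56
    if left:
        b |= int.from_bytes(data[len(data) - left:], "little")
    v3 ^= b
    v0, v1, v2, v3 = _sipround(*_sipround(v0, v1, v2, v3))
    v0 ^= b
    # Finalization: d = 4 SipRounds.
    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3

# First reference test vector: key = 00..0f, empty input.
key = bytes(range(16))
assert siphash_2_4_64(b"", key) == 0x726FDB47DD0E0E31
```

Roughly speaking, the 128-bit variant behind `getSipHash_2_4_128` differs only in two extra constant XORs and a second finalization pass over the same state, which is why a shared core with two thin wrappers is a natural interface.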
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: On the other hand, it's a lot easier to handle ugly types down in instruction selection, where you get to play much more fast and loose with types. And there are buffer uses that don't fit into the fat pointer use case where we'd still want them to work. For example, both `struct.ptr.buffer.load.v6f16` and `struct.ptr.buffer.load.v3f32` should be a `buffer_load_dwordx3`, but I'm pretty sure 6 x half isn't a register type. The load and store intrinsics are already overloaded to handle various {8, 16, ..., 128}-bit types, and it seems much cleaner to let it support any type of those lengths. It's just a load/store with somewhat weird indexing semantics, is all. And then, since we're there ... `load i256, ptr addrspace(1) %p` legalizes to multiple instructions, and `{raw,struct}.ptr.buffer.load(ptr addrspace(8) %p, i32 %offset, ...)` should too. It's just a load, after all. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguments to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null<fir::EmboxProcOp>(operation.getDefiningOp())) { clementval wrote: > Does REDUCE work with dummy procedure and procedure pointers? If so it would > be good to add tests for those cases to ensure the pattern matching here > works with them. I'll check if this is supported and add a proper test if it is. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [llvm] release/18.x: [lld] Fix -ObjC load behavior with LTO (#92162) (PR #92478)
https://github.com/AtariDreams reopened https://github.com/llvm/llvm-project/pull/92478 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
llvmbot wrote: @uweigand What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/95463 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/95463 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/95463 Backport 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257 Requested by: @nikic >From 016c200faf4bcf1a531dabd4411a2ec4d0a23068 Mon Sep 17 00:00:00 2001 From: Jonas Paulsson Date: Mon, 15 Apr 2024 16:32:14 +0200 Subject: [PATCH] [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) For the intrinsic s390_vperm, all of the elements are demanded, so use an APInt with the value of '-1' for them (not '1'). Fixes https://github.com/llvm/llvm-project/issues/88397 (cherry picked from commit 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257) --- .../Target/SystemZ/SystemZISelLowering.cpp| 2 +- .../SystemZ/knownbits-intrinsics-binop.ll | 19 +++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp index 5e0b0594b0a42..3a297238c2088 100644 --- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp +++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp @@ -7774,7 +7774,7 @@ static APInt getDemandedSrcElements(SDValue Op, const APInt , break; } case Intrinsic::s390_vperm: - SrcDemE = APInt(NumElts, 1); + SrcDemE = APInt(NumElts, -1); break; default: llvm_unreachable("Unhandled intrinsic."); diff --git a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll index 3bcbbb45581f9..b855d01934782 100644 --- a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll +++ b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll @@ -458,3 +458,22 @@ define <16 x i8> @f30() { i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1> ret <16 x i8> %res } + +; Test VPERM with various constant operands. 
+define i32 @f31() { +; CHECK-LABEL: f31: +; CHECK-LABEL: # %bb.0: +; CHECK-NEXT: larl %r1, .LCPI31_0 +; CHECK-NEXT: vl %v0, 0(%r1), 3 +; CHECK-NEXT: larl %r1, .LCPI31_1 +; CHECK-NEXT: vl %v1, 0(%r1), 3 +; CHECK-NEXT: vperm %v0, %v1, %v1, %v0 +; CHECK-NEXT: vlgvb %r2, %v0, 0 +; CHECK-NEXT: nilf %r2, 7 +; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d +; CHECK-NEXT: br %r14 + %P = tail call <16 x i8> @llvm.s390.vperm(<16 x i8> , <16 x i8> , <16 x i8> ) + %E = extractelement <16 x i8> %P, i64 0 + %res = zext i8 %E to i32 + ret i32 %res +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
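The one-line fix is easier to read with the `APInt` semantics spelled out: the constructor value acts here as a per-element demand mask, so `1` marks only element 0 of the `vperm` sources as demanded, while `-1` sets every bit. A rough Python model of the mask (not the real `APInt` API):

```python
def demanded_elements(num_elts, value):
    # Model of APInt(NumElts, Value) used as a demanded-elements mask:
    # bit i set means source element i is demanded.
    mask = value & ((1 << num_elts) - 1)  # APInt truncates to NumElts bits
    return [bool((mask >> i) & 1) for i in range(num_elts)]

# APInt(NumElts, 1): only element 0 demanded -- the bug for s390_vperm.
assert demanded_elements(16, 1) == [True] + [False] * 15

# APInt(NumElts, -1): every element demanded -- the fix.
assert demanded_elements(16, -1) == [True] * 16
```

With the old mask, known-bits analysis was free to assume elements 1..15 of the sources were irrelevant, which is exactly the miscompile the new `f31` test pins down.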
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
llvmbot wrote: @llvm/pr-subscribers-testing-tools Author: Tom Stellard (tstellar) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95458.diff 2 Files Affected: - (modified) llvm/CMakeLists.txt (+1-1) - (modified) llvm/utils/lit/lit/__init__.py (+1-1) ``diff diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt index 51278943847aa..909a965cd86c8 100644 --- a/llvm/CMakeLists.txt +++ b/llvm/CMakeLists.txt @@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR) set(LLVM_VERSION_MINOR 1) endif() if(NOT DEFINED LLVM_VERSION_PATCH) - set(LLVM_VERSION_PATCH 7) + set(LLVM_VERSION_PATCH 8) endif() if(NOT DEFINED LLVM_VERSION_SUFFIX) set(LLVM_VERSION_SUFFIX) diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py index 5003d78ce5218..800d59492d8ff 100644 --- a/llvm/utils/lit/lit/__init__.py +++ b/llvm/utils/lit/lit/__init__.py @@ -2,7 +2,7 @@ __author__ = "Daniel Dunbar" __email__ = "dan...@minormatter.com" -__versioninfo__ = (18, 1, 7) +__versioninfo__ = (18, 1, 8) __version__ = ".".join(str(v) for v in __versioninfo__) + "dev" __all__ = [] `` https://github.com/llvm/llvm-project/pull/95458 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
https://github.com/tstellar created https://github.com/llvm/llvm-project/pull/95458 None >From 2edf6218b7e74cc76035e4e1efa8166b1c22312d Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Thu, 13 Jun 2024 12:33:39 -0700 Subject: [PATCH] Bump version to 18.1.8 --- llvm/CMakeLists.txt| 2 +- llvm/utils/lit/lit/__init__.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt index 51278943847aa..909a965cd86c8 100644 --- a/llvm/CMakeLists.txt +++ b/llvm/CMakeLists.txt @@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR) set(LLVM_VERSION_MINOR 1) endif() if(NOT DEFINED LLVM_VERSION_PATCH) - set(LLVM_VERSION_PATCH 7) + set(LLVM_VERSION_PATCH 8) endif() if(NOT DEFINED LLVM_VERSION_SUFFIX) set(LLVM_VERSION_SUFFIX) diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py index 5003d78ce5218..800d59492d8ff 100644 --- a/llvm/utils/lit/lit/__init__.py +++ b/llvm/utils/lit/lit/__init__.py @@ -2,7 +2,7 @@ __author__ = "Daniel Dunbar" __email__ = "dan...@minormatter.com" -__versioninfo__ = (18, 1, 7) +__versioninfo__ = (18, 1, 8) __version__ = ".".join(str(v) for v in __versioninfo__) + "dev" __all__ = [] ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN converted_to_draft https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
kbeyls wrote: > Yes, this doesn't have tests by itself because there's no exposed interface. > It's certainly trivial to add one (which would allow using the reference test > vectors). > > I don't have strong arguments either way, but I figured the conservative > option is to force hypothetical users to consider their use more seriously. > One might argue that's not how we usually treat libSupport, so I'm happy to > expose the raw function here. I see some value in being able to test with the reference test vectors to be fully sure that the implementation really implements SipHash. But as I said above, I'm happy with merging this as is. https://github.com/llvm/llvm-project/pull/94394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] 7fe862d - Revert "[hwasan] Add fixed_shadow_base flag (#73980)"
Author: Florian Mayer Date: 2024-06-13T09:55:29-07:00 New Revision: 7fe862d0a1f6dfa67c236f5af32ad15546797404 URL: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404 DIFF: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404.diff LOG: Revert "[hwasan] Add fixed_shadow_base flag (#73980)" This reverts commit ea991a11b2a3d2bfa545adbefb71cd17e8970a43. Added: Modified: compiler-rt/lib/hwasan/hwasan_flags.inc compiler-rt/lib/hwasan/hwasan_linux.cpp Removed: compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c diff --git a/compiler-rt/lib/hwasan/hwasan_flags.inc b/compiler-rt/lib/hwasan/hwasan_flags.inc index 058a0457b9e7f..978fa46b705cb 100644 --- a/compiler-rt/lib/hwasan/hwasan_flags.inc +++ b/compiler-rt/lib/hwasan/hwasan_flags.inc @@ -84,10 +84,3 @@ HWASAN_FLAG(bool, malloc_bisect_dump, false, // are untagged before the call. HWASAN_FLAG(bool, fail_without_syscall_abi, true, "Exit if fail to request relaxed syscall ABI.") - -HWASAN_FLAG( -uptr, fixed_shadow_base, -1, -"If not -1, HWASan will attempt to allocate the shadow at this address, " -"instead of choosing one dynamically." 
-"Tip: this can be combined with the compiler option, " -"-hwasan-mapping-offset, to optimize the instrumentation.") diff --git a/compiler-rt/lib/hwasan/hwasan_linux.cpp b/compiler-rt/lib/hwasan/hwasan_linux.cpp index e6aa60b324fa7..c254670ee2d48 100644 --- a/compiler-rt/lib/hwasan/hwasan_linux.cpp +++ b/compiler-rt/lib/hwasan/hwasan_linux.cpp @@ -106,12 +106,8 @@ static uptr GetHighMemEnd() { } static void InitializeShadowBaseAddress(uptr shadow_size_bytes) { - if (flags()->fixed_shadow_base != (uptr)-1) { -__hwasan_shadow_memory_dynamic_address = flags()->fixed_shadow_base; - } else { -__hwasan_shadow_memory_dynamic_address = -FindDynamicShadowStart(shadow_size_bytes); - } + __hwasan_shadow_memory_dynamic_address = + FindDynamicShadowStart(shadow_size_bytes); } static void MaybeDieIfNoTaggingAbi(const char *message) { diff --git a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c b/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c deleted file mode 100644 index 4ff1d3e64c1d0..0 --- a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c +++ /dev/null @@ -1,76 +0,0 @@ -// Test fixed shadow base functionality. -// -// Default compiler instrumentation works with any shadow base (dynamic or fixed). -// RUN: %clang_hwasan %s -o %t && %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// -// If -hwasan-mapping-offset is set, then the fixed_shadow_base needs to match. 
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 not %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 not %run %t -// -// Note: if fixed_shadow_base is not set, compiler-rt will dynamically choose a -// shadow base, which has a tiny but non-zero probability of matching the -// compiler instrumentation. To avoid test flake, we do not test this case. -// -// Assume 48-bit VMA -// REQUIRES: aarch64-target-arch -// -// REQUIRES: Clang -// -// UNSUPPORTED: android - -#include -#include -#include -#include -#include -#include - -int main() { - __hwasan_enable_allocator_tagging(); - - // We test that the compiler instrumentation is able to access shadow memory - // for many different addresses. If we only test a small number of addresses, - // it might work by chance even if the shadow base does not match between the - // compiler instrumentation and compiler-rt. - void **mmaps[256]; - // 48-bit VMA - for (int i = 0; i < 256; i++) { -unsigned long long addr = (i * (1ULL << 40)); - -void *p = mmap((void *)addr, 4096, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); -// We don't use MAP_FIXED, to avoid overwriting critical memory. -// However, if we don't get allocated the requested address, it -// isn't a useful test. -if ((unsigned long long)p != addr) { - munmap(p, 4096); - mmaps[i] = MAP_FAILED; -} else { - mmaps[i] = p; -} - } - - int failures = 0; - for (int i = 0; i < 256; i++) { -if (mmaps[i] == MAP_FAILED) { - failures++; -} else { - printf("%d %p\n", i, mmaps[i]); - munmap(mmaps[i], 4096); -} - } - -
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
https://github.com/jeanPerier approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) { /// Decide if stale profile matching can be applied for a given function. /// Currently we skip inference for (very) large instances and for instances /// having "unexpected" control flow (e.g., having no sink basic blocks). -bool canApplyInference(const FlowFunction &Func) { +bool canApplyInference(const FlowFunction &Func, + const yaml::bolt::BinaryFunctionProfile &YamlBF) { if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize) return false; + if ((double)Func.MatchedExecCount / YamlBF.ExecCount >= + opts::MatchedProfileThreshold / 100.0) +return false; shawbyoung wrote: I’m leaning towards the block count heuristic now. I think the 1M and 4x1K exec count block case is likely pretty common – I imagine functions with loops would look a lot like this. Having some blocks matched exactly would suggest to me that there would likely be a reasonable amount of similarity between the profiled function and existing function relationally, which block coldness likely doesn’t have an outsized bearing on. I think having a reasonably high threshold for matched blocks would conservatively allow us to drop functions with high discrepancy – I’ll test this on a production binary. https://github.com/llvm/llvm-project/pull/95156 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
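The block-count heuristic being discussed can be sketched as follows; the function name and the 50% default are illustrative placeholders, not values from the PR:

```python
def keep_stale_profile(matched_blocks, total_blocks, threshold_pct=50.0):
    """Keep (and run inference on) a stale profile only when enough of its
    basic blocks were matched exactly against the current binary."""
    if total_blocks == 0:
        return False
    return 100.0 * matched_blocks / total_blocks >= threshold_pct

# 80% of blocks matched exactly: low discrepancy, keep the profile.
assert keep_stale_profile(8, 10) is True

# Only 20% matched: high discrepancy, drop it.
assert keep_stale_profile(2, 10) is False
```

Counting matched blocks rather than matched execution counts sidesteps the skew from one hot loop block (the "1M and 4x1K" case above), since every block contributes equally to the ratio regardless of its temperature.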
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: Yeah, makes sense. ... what prevents a match-bitwidth operator from existing? Context from where I'm standing is that you should be able to `raw.buffer.load/store` any (non-aggregate, let's say, since that could be better handled in `addrspace(7)` handling) type you could `load` or `store`. That is, `raw.ptr.buffer.load.i15` should work (as an i16 load that truncates) as should `raw.ptr.buffer.store.v8f32` (or `raw.ptr.buffer.store.i256`). Sure, the latter are two instructions long, but regular loads can regularize to multiple instructions just fine. My thoughts on how to implement that second behavior were to split the type into legal chunks and add in the offsets, and then merge/bitcast the values back. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguments to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null<fir::EmboxProcOp>(operation.getDefiningOp())) { jeanPerier wrote: Does REDUCE work with dummy procedure and procedure pointers? If so it would be good to add tests for those cases to ensure the pattern matching here works with them. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: I don't think we should be trying to handle the unreasonable illegal types in the intrinsics themselves. Theoretically the intrinsic should correspond to direct support. We would handle the ugly types in the fat pointer lowering in terms of the intrinsics. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: This may be a side effect of a different bug tracking global variables. I think fixing that bug first, and then applying this change if the problem persists is a better idea. Because of this, I'm switching this to a draft for now. Discourse link is https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/66092/23 https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 93e7f14 - Revert "[libc] fix aarch64 linux full build (#95358)"
Author: Schrodinger ZHU Yifan Date: 2024-06-13T07:54:57-07:00 New Revision: 93e7f145bc38c7c47d797e652d891695eb44fcfa URL: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa DIFF: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa.diff LOG: Revert "[libc] fix aarch64 linux full build (#95358)" This reverts commit ca05204f9aa258c5324d5675c7987c7e570168a0. Added: Modified: libc/config/linux/aarch64/entrypoints.txt libc/src/__support/threads/linux/CMakeLists.txt libc/test/IntegrationTest/test.cpp Removed: diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt index 7ce088689b925..db96a80051a8d 100644 --- a/libc/config/linux/aarch64/entrypoints.txt +++ b/libc/config/linux/aarch64/entrypoints.txt @@ -643,12 +643,6 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.pthread.pthread_mutexattr_setrobust libc.src.pthread.pthread_mutexattr_settype libc.src.pthread.pthread_once -libc.src.pthread.pthread_rwlockattr_destroy -libc.src.pthread.pthread_rwlockattr_getkind_np -libc.src.pthread.pthread_rwlockattr_getpshared -libc.src.pthread.pthread_rwlockattr_init -libc.src.pthread.pthread_rwlockattr_setkind_np -libc.src.pthread.pthread_rwlockattr_setpshared libc.src.pthread.pthread_setspecific # sched.h entrypoints @@ -759,7 +753,6 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.unistd._exit libc.src.unistd.environ libc.src.unistd.execv -libc.src.unistd.fork libc.src.unistd.getopt libc.src.unistd.optarg libc.src.unistd.optind diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt index 8e6cd7227b2c8..9bf88ccc84557 100644 --- a/libc/src/__support/threads/linux/CMakeLists.txt +++ b/libc/src/__support/threads/linux/CMakeLists.txt @@ -64,7 +64,6 @@ add_object_library( .futex_utils libc.config.linux.app_h libc.include.sys_syscall -libc.include.fcntl libc.src.errno.errno libc.src.__support.CPP.atomic libc.src.__support.CPP.stringstream diff --git 
a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 27e7f29efa0f1..3bdbe89a3fb62 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -79,10 +79,4 @@ void *realloc(void *ptr, size_t s) { // Integration tests are linked with -nostdlib. BFD linker expects // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; - -// On some platform (aarch64 fedora tested) full build integration test -// objects need to link against libgcc, which may expect a __getauxval -// function. For now, it is fine to provide a weak definition that always -// returns false. -[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } } // extern "C" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc] 91323a6 - Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"
Author: Schrodinger ZHU Yifan Date: 2024-06-13T08:38:05-07:00 New Revision: 91323a6ea8f32a9fe2cec7051e8a99b87157133e URL: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e DIFF: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e.diff LOG: Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)" This reverts commit 9e5428e6b02c77fb18c4bdf688a216c957fd7a53. Added: Modified: libc/config/linux/aarch64/entrypoints.txt libc/src/__support/threads/linux/CMakeLists.txt libc/test/IntegrationTest/test.cpp Removed: diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt index db96a80051a8d..7ce088689b925 100644 --- a/libc/config/linux/aarch64/entrypoints.txt +++ b/libc/config/linux/aarch64/entrypoints.txt @@ -643,6 +643,12 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.pthread.pthread_mutexattr_setrobust libc.src.pthread.pthread_mutexattr_settype libc.src.pthread.pthread_once +libc.src.pthread.pthread_rwlockattr_destroy +libc.src.pthread.pthread_rwlockattr_getkind_np +libc.src.pthread.pthread_rwlockattr_getpshared +libc.src.pthread.pthread_rwlockattr_init +libc.src.pthread.pthread_rwlockattr_setkind_np +libc.src.pthread.pthread_rwlockattr_setpshared libc.src.pthread.pthread_setspecific # sched.h entrypoints @@ -753,6 +759,7 @@ if(LLVM_LIBC_FULL_BUILD) libc.src.unistd._exit libc.src.unistd.environ libc.src.unistd.execv +libc.src.unistd.fork libc.src.unistd.getopt libc.src.unistd.optarg libc.src.unistd.optind diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt index 9bf88ccc84557..8e6cd7227b2c8 100644 --- a/libc/src/__support/threads/linux/CMakeLists.txt +++ b/libc/src/__support/threads/linux/CMakeLists.txt @@ -64,6 +64,7 @@ add_object_library( .futex_utils libc.config.linux.app_h libc.include.sys_syscall +libc.include.fcntl libc.src.errno.errno libc.src.__support.CPP.atomic 
libc.src.__support.CPP.stringstream diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp index 3bdbe89a3fb62..27e7f29efa0f1 100644 --- a/libc/test/IntegrationTest/test.cpp +++ b/libc/test/IntegrationTest/test.cpp @@ -79,4 +79,10 @@ void *realloc(void *ptr, size_t s) { // Integration tests are linked with -nostdlib. BFD linker expects // __dso_handle when -nostdlib is used. void *__dso_handle = nullptr; + +// On some platform (aarch64 fedora tested) full build integration test +// objects need to link against libgcc, which may expect a __getauxval +// function. For now, it is fine to provide a weak definition that always +// returns false. +[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; } } // extern "C" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: That's what we've traditionally done, and I think we should stop. We currently skip inserting the casts if the type is legal. It introduces extra bitcasts, which have a cost and increase pattern-match complexity, and we have a bunch of patterns that don't bother to look through the casts for a load/store. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: So, general question on this patch series: instead of having separate handling for all the possible register types, wouldn't it be more reasonable to always do loads as `i8`, `i16`, `i32`, `<2 x i32>`, `<3 x i32>`, or `<4 x i32>` and then `bitcast`/`merge_values`/... the results back to their type? Or at least to have that fallback path - if we don't know what a type is, load/store it as its bits? (Then we wouldn't need to, for example, go back and add a `<16 x i8>` case if someone realizes they want that.) https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)
@@ -117,13 +117,44 @@ void test_update_dpp(global int* out, int arg1, int arg2) } // CHECK-LABEL: @test_ds_fadd -// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fadd.f32(ptr addrspace(3) %out, float %src, i32 0, i32 0, i1 false) +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} +// CHECK: atomicrmw volatile fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} + +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src release, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acq_rel, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}} + +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("agent") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("workgroup") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("wavefront") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("singlethread") monotonic, align 4{{$}} +// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}} #if !defined(__SPIRV__) void test_ds_faddf(local float *out, float src) { #else -void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) { + void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) { #endif + *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, false); + *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, true); + + // Test all orders. 
+ *out = __builtin_amdgcn_ds_faddf(out, src, 1, 0, false); yxsamliu wrote: better use predefined macros ``` // Define macros for the C11 / C++11 memory orderings Builder.defineMacro("__ATOMIC_RELAXED", "0"); Builder.defineMacro("__ATOMIC_CONSUME", "1"); Builder.defineMacro("__ATOMIC_ACQUIRE", "2"); Builder.defineMacro("__ATOMIC_RELEASE", "3"); Builder.defineMacro("__ATOMIC_ACQ_REL", "4"); Builder.defineMacro("__ATOMIC_SEQ_CST", "5"); // Define macros for the clang atomic scopes. Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0"); Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1"); Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2"); Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3"); Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4"); ``` https://github.com/llvm/llvm-project/pull/95395 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: @fhahn https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, CallBase *CI, Function *F, llvm_unreachable("Unknown function for ARM CallBase upgrade."); } +// These are expected to have have the arguments: cdevadas wrote: ```suggestion // These are expected to have the arguments: ``` https://github.com/llvm/llvm-project/pull/95396 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN updated https://github.com/llvm/llvm-project/pull/95387 >From 432f994b1bc21e4db0778fff9cc1425f788f8168 Mon Sep 17 00:00:00 2001 From: Matthew Nagy Date: Thu, 13 Jun 2024 09:54:04 + Subject: [PATCH] [TySan] Fixed false positive when accessing offset member variables --- compiler-rt/lib/tysan/tysan.cpp | 12 +- compiler-rt/test/tysan/struct-members.c | 31 + 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 compiler-rt/test/tysan/struct-members.c diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp index f627851d049e6..747727e48a152 100644 --- a/compiler-rt/lib/tysan/tysan.cpp +++ b/compiler-rt/lib/tysan/tysan.cpp @@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) { OldTDPtr -= i; OldTD = *OldTDPtr; -if (!isAliasingLegal(td, OldTD)) +tysan_type_descriptor *InternalMember = OldTD; +if (OldTD->Tag == TYSAN_STRUCT_TD) { + for (int j = 0; j < OldTD->Struct.MemberCount; j++) { +if (OldTD->Struct.Members[j].Offset == i) { + InternalMember = OldTD->Struct.Members[j].Type; + break; +} + } +} + +if (!isAliasingLegal(td, InternalMember)) reportError(addr, size, td, OldTD, AccessStr, "accesses part of an existing object", -i, pc, bp, sp); diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c new file mode 100644 index 0..76ea3c431dd7b --- /dev/null +++ b/compiler-rt/test/tysan/struct-members.c @@ -0,0 +1,31 @@ +// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1 +// RUN: FileCheck %s < %t.out + +#include + +struct X { + int a, b, c; +} x; + +static struct X xArray[2]; + +int main() { + x.a = 1; + x.b = 2; + x.c = 3; + + printf("%d %d %d\n", x.a, x.b, x.c); + // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation + + for (size_t i = 0; i < 2; i++) { +xArray[i].a = 1; +xArray[i].b = 1; +xArray[i].c = 1; + } + + struct X *xPtr = (struct X *)&(xArray[0].c); + xPtr->a = 1; + // CHECK: ERROR: TypeSanitizer: 
type-aliasing-violation + // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8) + // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]] +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} arsenm wrote: I'm not a big fan of omitting the braces, especially in tablegen. If we're going to delete the braces the lines should at least be indented https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95379 >From 14695322d92821374dd6599d8f0f76d212e50169 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 456 ++ 3 files changed, 1071 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +;
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95378 >From 1dfcc0961e82bbe656faded0c38e694da0d76c9b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 50e62788c5eac..978d261f5a662 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End 
HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5,
[llvm-branch-commits] [clang] 0e8c9bc - Revert "[clang][NFC] Add a test for CWG2685 (#95206)"
Author: Younan Zhang Date: 2024-06-13T18:53:46+08:00 New Revision: 0e8c9bca863137f14aea2cee0e05d4270b33e0e8 URL: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8 DIFF: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8.diff LOG: Revert "[clang][NFC] Add a test for CWG2685 (#95206)" This reverts commit 3475116e2c37a2c8a69658b36c02871c322da008. Added: Modified: clang/test/CXX/drs/cwg26xx.cpp clang/www/cxx_dr_status.html Removed: diff --git a/clang/test/CXX/drs/cwg26xx.cpp b/clang/test/CXX/drs/cwg26xx.cpp index fee3ef16850bf..2b17c8101438d 100644 --- a/clang/test/CXX/drs/cwg26xx.cpp +++ b/clang/test/CXX/drs/cwg26xx.cpp @@ -225,15 +225,6 @@ void m() { } #if __cplusplus >= 202302L - -namespace cwg2685 { // cwg2685: 17 -template -struct A { - T ar[4]; -}; -A a = { "foo" }; -} - namespace cwg2687 { // cwg2687: 18 struct S{ void f(int); diff --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html index 8c79708f23abd..5e2ab06701703 100755 --- a/clang/www/cxx_dr_status.html +++ b/clang/www/cxx_dr_status.html @@ -15918,7 +15918,7 @@ C++ defect report implementation status https://cplusplus.github.io/CWG/issues/2685.html;>2685 C++23 Aggregate CTAD, string, and brace elision -Clang 17 +Unknown https://cplusplus.github.io/CWG/issues/2686.html;>2686 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
llvmbot wrote: @llvm/pr-subscribers-compiler-rt-sanitizer Author: None (gbMattN) Changes This patch fixes a bug the current TySan implementation has. Currently if you access a member variable other than the first, TySan reports an error. TySan believes you are accessing the struct type with an offset equal to the offset of the member variable you are trying to access. With this patch, the type we are trying to access is amended to the type of the member variable matching the offset we are accessing with. It does this if and only if there is a member at that offset, however, so any incorrect accesses are still caught. This is checked in the struct-members.c test. --- Full diff: https://github.com/llvm/llvm-project/pull/95387.diff 2 Files Affected: - (modified) compiler-rt/lib/tysan/tysan.cpp (+11-1) - (added) compiler-rt/test/tysan/struct-members.c (+32) ``diff diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp index f627851d049e6..747727e48a152 100644 --- a/compiler-rt/lib/tysan/tysan.cpp +++ b/compiler-rt/lib/tysan/tysan.cpp @@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) { OldTDPtr -= i; OldTD = *OldTDPtr; -if (!isAliasingLegal(td, OldTD)) +tysan_type_descriptor *InternalMember = OldTD; +if (OldTD->Tag == TYSAN_STRUCT_TD) { + for (int j = 0; j < OldTD->Struct.MemberCount; j++) { +if (OldTD->Struct.Members[j].Offset == i) { + InternalMember = OldTD->Struct.Members[j].Type; + break; +} + } +} + +if (!isAliasingLegal(td, InternalMember)) reportError(addr, size, td, OldTD, AccessStr, "accesses part of an existing object", -i, pc, bp, sp); diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c new file mode 100644 index 0..8cf6499f78ce6 --- /dev/null +++ b/compiler-rt/test/tysan/struct-members.c @@ -0,0 +1,32 @@ +// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1 +// RUN: FileCheck %s < %t.out + +#include + +struct X { + int a, b, c; +} x; + 
+static struct X xArray[2]; + +int main() { + x.a = 1; + x.b = 2; + x.c = 3; + + printf("%d %d %d\n", x.a, x.b, x.c); + // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation + + for (size_t i = 0; i < 2; i++) { +xArray[i].a = 1; +xArray[i].b = 1; +xArray[i].c = 1; + } + printf("Here\n"); + + struct X *xPtr = (struct X *)&(xArray[0].c); + xPtr->a = 1; + // CHECK: ERROR: TypeSanitizer: type-aliasing-violation + // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8) + // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]] +} `` https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes We should just support these for all register types. --- Full diff: https://github.com/llvm/llvm-project/pull/95378.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+30-42) - (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+9-7) ``diff diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. 
-defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5, [v5i32, v5f32], SGPR_160Regs, TTMP_160Regs>; defm "" : SRegClass<6, [v6i32, v6f32, v3i64, v3f64], SGPR_192Regs, TTMP_192Regs>; defm "" :
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Make sure we test all the address spaces since this support isn't free in gisel. --- Patch is 38.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95379.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+19-12) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll (+596) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.ll (+144) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; GFX10-NEXT:s_waitcnt vmcnt(0) +;
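The DMask handling in the patch above decides how many lanes of an image intrinsic's result are actually read from memory: a dmask of 0 still counts as one lane, otherwise each set bit enables one lane, and the in-memory vector width is clamped accordingly. A minimal Python model of that lane-clamping logic (an illustrative sketch, not the LLVM code itself):

```python
def popcount(x: int) -> int:
    """Number of set bits, mirroring llvm::popcount."""
    return bin(x).count("1")

def mem_lanes(dmask: int, result_elts: int) -> int:
    # A dmask of 0 is still treated as one lane; otherwise each set bit
    # of the intrinsic's dmask enables one lane of the result.
    max_lanes = 1 if dmask == 0 else popcount(dmask)
    # memVTFromLoadIntrData clamps the in-memory vector width to the
    # number of lanes actually loaded.
    return min(max_lanes, result_elts)

print(mem_lanes(0b1011, 4))  # 3: three dmask bits set
print(mem_lanes(0, 4))       # 1: dmask == 0 still loads one lane
```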
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95377.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll (+32-5) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void 
@llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: `` https://github.com/llvm/llvm-project/pull/95377 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
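The GFX7 expansion in the checks above (`v_mul_f32_e32 v0, 1.0, v0` followed by `v_lshrrev_b32_e32 v0, 16, v0`) works because a bf16 value's bit pattern is simply the top 16 bits of the corresponding f32 encoding; `buffer_store_short` then writes the low 16 bits of the register. A small sketch of that bit-level relationship (hypothetical helper, assuming plain truncation as in the emitted code):

```python
import struct

def bf16_bits_from_f32(x: float) -> int:
    # bf16 shares f32's sign and exponent fields, so its bit pattern is
    # the top 16 bits of the IEEE-754 binary32 encoding; shifting right
    # by 16 models the v_lshrrev_b32 step of the GFX7 sequence.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

print(hex(bf16_bits_from_f32(1.0)))   # 0x3f80
print(hex(bf16_bits_from_f32(-2.0)))  # 0xc000
```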
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95378 * **#95379** * **#95378** (this PR) * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95378 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95377 * **#95379** * **#95378** * **#95377** (this PR) * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95377 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95379 * **#95379** (this PR) * **#95378** * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95379 Make sure we test all the address spaces since this support isn't free in gisel. >From b05179ed684e289ce31f7aee8b57939c7bf2809c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 144 + 3 files changed, 759 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering &TLI, + const DataLayout &DL, Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext &Ctx = Ty->getContext(); if (auto *VT = dyn_cast<FixedVectorType>(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. 
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering &TLI, + const DataLayout &DL, Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast<StructType>(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits<unsigned>::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits<unsigned>::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, if (RsrcIntr->IsImage) { unsigned DMask = cast<ConstantInt>(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +;
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95378 We should just support these for all register types. >From 46c7f8b4529827204e5273472ea5b642ecb7266e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] 
in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types,
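The `foreach` refactor in this patch replaces the 21 hand-written `defm : MUBUF_LoadIntrinsicPat`/`MUBUF_StoreIntrinsicPat` lines with loops over each register class's type list, so new value types only need to be added to `Reg32Types`/`Reg64Types`/`Reg96Types`/`Reg128Types` once. The effect can be sketched in Python (the type lists and instruction names below are illustrative stand-ins, not the actual TableGen output):

```python
# Illustrative stand-ins for the TableGen type groupings; the real
# Reg32Types/Reg64Types lists live in SIRegisterInfo.td.
REG_TYPE_GROUPS = {
    "BUFFER_LOAD_DWORD":   ["i32", "f32", "v2i16", "v2f16", "p3"],
    "BUFFER_LOAD_DWORDX2": ["i64", "f64", "v2i32", "v2f32", "p0"],
}

def expand_patterns(groups):
    # One load pattern per (instruction, value type) pair, the way the
    # foreach loops instantiate MUBUF_LoadIntrinsicPat for every type
    # in a register class's list.
    return [(inst, vt) for inst, vts in groups.items() for vt in vts]

pats = expand_patterns(REG_TYPE_GROUPS)
print(len(pats))  # 10 patterns generated from 2 groups
```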
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95377 None >From 520d91d73339d8bea65f2e30e2a4d7fd0eb3d92b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 22:54:35 +0200 Subject: [PATCH] AMDGPU: Fix buffer intrinsic store of bfloat --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 4 +- .../llvm.amdgcn.raw.ptr.buffer.store.bf16.ll | 37 --- 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define 
amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits