[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -372,6 +372,31 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) { // HAS_MAYTRAP: declare float @llvm.experimental.constrained.minnum.f32( // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.minnum.f80( + fmaximum_num(*d,*d); fmaximum_numf(f,f); fmaximum_numl(*l,*l); + +// NO__ERRNO: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]] +// NO__ERRNO: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]] +// NO__ERRNO: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]] +// HAS_MAYTRAP: declare double @llvm.maximumnum.f64( +// HAS_MAYTRAP: declare float @llvm.maximumnum.f32( +// HAS_MAYTRAP: declare x86_fp80 @llvm.maximumnum.f80( + + fminimum_num(*d,*d); fminimum_numf(f,f); fminimum_numl(*l,*l); + +// NO__ERRNO: declare double @llvm.minimumnum.f64(double, double) [[READNONE_INTRINSIC]] +// NO__ERRNO: declare float @llvm.minimumnum.f32(float, float) [[READNONE_INTRINSIC]] +// NO__ERRNO: declare x86_fp80 @llvm.minimumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare double @llvm.minimumnum.f64(double, double) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare float @llvm.minimumnum.f32(float, float) [[READNONE_INTRINSIC]] +// HAS_ERRNO: declare x86_fp80 @llvm.minimumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]] +// HAS_MAYTRAP: declare double @llvm.minimumnum.f64( +// HAS_MAYTRAP: declare float @llvm.minimumnum.f32( arsenm wrote: These checks should be common. 
The attributes of intrinsics are fixed and these don't set errno https://github.com/llvm/llvm-project/pull/96281 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
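A sketch of what "common" checks could look like, using FileCheck's multi-prefix support (the COMMON prefix name and the elided RUN arguments are placeholders, not the test's actual contents; the [[READNONE_INTRINSIC]] capture would also need a COMMON-prefixed definition):

```
// RUN: ... | FileCheck --check-prefixes=COMMON,NO__ERRNO %s
// RUN: ... | FileCheck --check-prefixes=COMMON,HAS_ERRNO %s

// COMMON: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]]
// COMMON: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]]
// COMMON: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
```

Since the intrinsic attributes are fixed regardless of errno mode, one set of CHECK lines suffices for both configurations.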
[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/112041
[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)
@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary( Objf << ObjBuffer; - ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()), + ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()), "-o", Output.getFilename(), - McinFile, "--filetype=obj"}; - const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc")); + "-x", "assembler", + ObjinFile, "-c"}; + const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang")); arsenm wrote: But the toolchain tracked the name of the current clang? Really you want to find the currently running binary. > I don't think it's critical that the clang we invoke here is the amdclang It's critical to find the exact clang that you are running. https://github.com/llvm/llvm-project/pull/112041
[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)
@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary( Objf << ObjBuffer; - ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()), + ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()), "-o", Output.getFilename(), - McinFile, "--filetype=obj"}; - const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc")); + "-x", "assembler", + ObjinFile, "-c"}; + const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang")); arsenm wrote: Because sometimes there's a version suffix (e.g. clang-19), and some distributions add on random prefixes or suffixes (e.g. amdclang) https://github.com/llvm/llvm-project/pull/112041
[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)
@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary( Objf << ObjBuffer; - ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()), + ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()), "-o", Output.getFilename(), - McinFile, "--filetype=obj"}; - const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc")); + "-x", "assembler", + ObjinFile, "-c"}; + const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang")); arsenm wrote: Shouldn't assume the binary name is clang; other places seem to be doing something like this: `TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName()))` https://github.com/llvm/llvm-project/pull/112041
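In-tree, the driver would typically ask for the path of the clang binary that is actually executing (e.g. via the driver's own program path) rather than searching PATH for a hard-coded name. As a standalone illustration of the idea only, here is a Linux-specific sketch; `currentExecutablePath` is a hypothetical helper, not the driver's actual code:

```cpp
#include <climits>
#include <string>
#include <unistd.h>

// Resolve the binary that is actually executing, instead of searching
// PATH for a hard-coded name like "clang" (which on disk may really be
// clang-19, amdclang, or some other distro-specific spelling).
static std::string currentExecutablePath() {
  char buf[PATH_MAX];
  ssize_t n = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
  if (n <= 0)
    return {};
  buf[n] = '\0';
  return std::string(buf, static_cast<size_t>(n));
}
```

LLVM itself abstracts this portably (e.g. `llvm::sys::fs::getMainExecutable`), which is why reusing the existing program-path machinery is preferable to reinventing it.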
[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/112032
[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/112032
[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)
arsenm wrote: * **#112032** 👈 * `main` This stack of pull requests is managed by Graphite (https://stacking.dev). https://github.com/llvm/llvm-project/pull/112032
[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/112032 Attempt to fix sporadic precommit test failures in hip-partial-link.hip The driver really shouldn't be using llvm-mc in the first place though; filed #112031 to fix this. >From 7337759de47b0623f96241927b167e2ed413378d Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 11 Oct 2024 22:20:45 +0400 Subject: [PATCH] clang: Add llvm-mc to CLANG_TEST_DEPS Attempt to fix sporadic precommit test failures in hip-partial-link.hip The driver really shouldn't be using llvm-mc in the first place though; filed #112031 to fix this. --- clang/test/CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/clang/test/CMakeLists.txt b/clang/test/CMakeLists.txt index 2d84b0d73053f6..98829d53db934f 100644 --- a/clang/test/CMakeLists.txt +++ b/clang/test/CMakeLists.txt @@ -127,6 +127,7 @@ if( NOT CLANG_BUILT_STANDALONE ) llvm-dwarfdump llvm-ifs llvm-lto2 +llvm-mc llvm-modextract llvm-nm llvm-objcopy
[clang] [Clang] Add a flag to include GPU startup files (PR #112025)
@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA, Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ","))); } + if (Args.hasArg(options::OPT_gpustartfiles)) { arsenm wrote: Default value would be a toolchain choice, so yes? https://github.com/llvm/llvm-project/pull/112025
[clang] [Clang] Add a flag to include GPU startup files (PR #112025)
@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA, Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ","))); } + if (Args.hasArg(options::OPT_gpustartfiles)) { arsenm wrote: Can we make that a positive/negative pair, like other flags? And without the gpu prefix? https://github.com/llvm/llvm-project/pull/112025
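One common shape for a paired flag (sketch only; the flag spellings, option names, and help text below are hypothetical placeholders, not the PR's actual options):

```
// Hypothetical Options.td sketch -- names are placeholders.
def offload_startfiles : Flag<["-"], "startfiles">,
  HelpText<"Link the offload startup files">;
def no_offload_startfiles : Flag<["-"], "no-startfiles">;
```

The driver side would then query the pair with `Args.hasFlag(positive, negative, /*Default=*/...)`, letting each toolchain supply the default rather than keying on the mere presence of a single flag.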
[clang] [Clang] Add a flag to include GPU startup files (PR #112025)
@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA, Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ","))); } + if (Args.hasArg(options::OPT_gpustartfiles)) { arsenm wrote: Is there prior art for a flag to link crt? (i.e. can we just use that instead of inventing a new -gpu flag) https://github.com/llvm/llvm-project/pull/112025
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
arsenm wrote: The Windows bot passed, which was the important bit. Linux failed on a different test entirely. https://github.com/llvm/llvm-project/pull/111975
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
arsenm wrote: > @arsenm what are you actually trying to fix and what do you expect this to do? Fix the tests not running anywhere except Linux. We should have maximum host test coverage, and this test has no reason to depend on the host. All it needs is an explicit target instead of relying on the default. https://github.com/llvm/llvm-project/pull/111975
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > > The LangRef doesn't need to know why it's undesirable. It's like the n field > `n` field? The following? Yes. It's an optimization hint. https://github.com/llvm/llvm-project/pull/108786
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
@@ -1,21 +1,17 @@ -// REQUIRES: x86-registered-target -// REQUIRES: amdgpu-registered-target -// REQUIRES: system-linux arsenm wrote: This is a pile of workarounds; there's no reason any of these tests should be host-dependent. https://github.com/llvm/llvm-project/pull/111975
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) { Result = RHS; return true; } + + case Builtin::BI__builtin_fmaximum_num: + case Builtin::BI__builtin_fmaximum_numf: + case Builtin::BI__builtin_fmaximum_numl: + case Builtin::BI__builtin_fmaximum_numf16: + case Builtin::BI__builtin_fmaximum_numf128: { +APFloat RHS(0.); +if (!EvaluateFloat(E->getArg(0), Result, Info) || arsenm wrote: This doesn't have tests showing the evaluation, similar to those added for fmin/fmax in ec32386404409b65d21fdf916110c08912335926 https://github.com/llvm/llvm-project/pull/96281 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,12 +1,14 @@ ; This test aims to check ability to support "Arithmetic with Overflow" intrinsics ; in the special case when those intrinsics are being generated by the CodeGenPrepare; -; pass during translations with optimization (note -O3 in llc arguments). +; pass during translations with optimization (note -disable-lsr, to inhibit +; strength reduction pre-empting with a more preferable match for this pattern +; in llc arguments). -; RUN: llc -O3 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O3 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %} +; RUN: llc -O3 -disable-lsr -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s arsenm wrote: The purpose of this test appears to be to demonstrate the net result, so the checks should be updated (rather than disabling LSR to recover the previous output). Some other transform decided something else was better; the test should show what that is. https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote: > Does it mean that the reasoning behind this very PR is not legit? No. This is providing the generic property in the datalayout, used by InstCombine and others as a hint of what to do without directly knowing what the target is. https://github.com/llvm/llvm-project/pull/110695
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/111975
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
arsenm wrote: * **#111976** * **#111975** 👈 * `main` This stack of pull requests is managed by Graphite (https://stacking.dev). https://github.com/llvm/llvm-project/pull/111975
[clang] clang: Fix hipstdpar test relying on default target (PR #111975)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/111975 Use explicit target and stop restricting hosts it can run on. >From d3ec46ab6c4d4d5d740336a9c81c24ed8dc70680 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 11 Oct 2024 14:38:02 +0400 Subject: [PATCH] clang: Fix hipstdpar test relying on default target Use explicit target and stop restricting hosts it can run on. --- clang/test/Driver/hipstdpar.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/clang/test/Driver/hipstdpar.c b/clang/test/Driver/hipstdpar.c index 32e040ef70d754..b759c5fb2084a3 100644 --- a/clang/test/Driver/hipstdpar.c +++ b/clang/test/Driver/hipstdpar.c @@ -1,21 +1,17 @@ -// REQUIRES: x86-registered-target -// REQUIRES: amdgpu-registered-target -// REQUIRES: system-linux -// UNSUPPORTED: target={{.*}}-zos{{.*}} -// XFAIL: target={{.*}}hexagon{{.*}} -// XFAIL: target={{.*}}-scei{{.*}} -// XFAIL: target={{.*}}-sie{{.*}} +// REQUIRES: x86-registered-target, amdgpu-registered-target -// RUN: not %clang -### --hipstdpar --hipstdpar-path=/does/not/exist -nogpulib \ +// RUN: not %clang -### --target=x86_64-unknown-linux-gnu \ +// RUN: --hipstdpar --hipstdpar-path=/does/not/exist -nogpulib\ // RUN: -nogpuinc --compile %s 2>&1 | \ // RUN: FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s -// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \ +// RUN: %clang -### --target=x86_64-unknown-linux-gnu \ +// RUN: --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \ // RUN: --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \ // RUN: --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \ // RUN: -nogpulib -nogpuinc --compile %s 2>&1 | \ // RUN: FileCheck --check-prefix=HIPSTDPAR-COMPILE %s // RUN: touch %t.o -// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s +// RUN: %clang -### --target=x86_64-unknown-linux-gnu --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s // HIPSTDPAR-MISSING-LIB: 
error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path' // HIPSTDPAR-COMPILE: "-x" "hip"
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) { Result = RHS; return true; } + + case Builtin::BI__builtin_fmaximum_num: + case Builtin::BI__builtin_fmaximum_numf: + case Builtin::BI__builtin_fmaximum_numl: + case Builtin::BI__builtin_fmaximum_numf16: + case Builtin::BI__builtin_fmaximum_numf128: { +APFloat RHS(0.); +if (!EvaluateFloat(E->getArg(0), Result, Info) || arsenm wrote: Missing constexpr evaluation tests https://github.com/llvm/llvm-project/pull/96281 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
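These builtins lower to llvm.maximumnum/llvm.minimumnum, which follow IEEE 754-2019 maximumNumber/minimumNumber semantics. As a minimal, self-contained sketch of the behavior such evaluation tests would need to pin down (`fmaximum_num_ref` is a hypothetical reference model written for illustration, not the actual APFloat or constant-evaluator code):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference model of IEEE 754-2019 maximumNumber, the
// semantics llvm.maximumnum follows: a NaN operand is treated as
// missing data (the other operand wins), and +0.0 is considered
// greater than -0.0.
static double fmaximum_num_ref(double a, double b) {
  if (std::isnan(a))
    return b; // NaN loses to any number (both NaN -> NaN)
  if (std::isnan(b))
    return a;
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) ? b : a; // +0.0 beats -0.0
  return a > b ? a : b;
}
```

Constant-evaluation tests for the builtin would assert exactly these corners: that a NaN argument yields the other operand, and that the sign of zero is respected.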
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -475,6 +475,12 @@ SYMBOL(fmaxl, None, ) SYMBOL(fmin, None, ) SYMBOL(fminf, None, ) SYMBOL(fminl, None, ) +SYMBOL(fmaximum_num, None, ) arsenm wrote: Not sure what this is for, but this isn't tested? https://github.com/llvm/llvm-project/pull/96281
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -1295,6 +1295,24 @@ SYMBOL(fminf, None, ) SYMBOL(fminl, std::, ) SYMBOL(fminl, None, ) SYMBOL(fminl, None, ) +SYMBOL(fmaximum_num, std::, ) arsenm wrote: Not sure what this is for, but this isn't tested? https://github.com/llvm/llvm-project/pull/96281
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -372,6 +372,31 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) { // HAS_MAYTRAP: declare float @llvm.experimental.constrained.minnum.f32( // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.minnum.f80( + fmaximum_num(f,f); fmaximum_numf(f,f); fmaximum_numl(f,f); arsenm wrote: Use the right argument types and avoid the implicit casts? https://github.com/llvm/llvm-project/pull/96281
[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)
@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) { Result = RHS; arsenm wrote: Unrelated, but why is the code up here reproducing logic that's already in APFloat? https://github.com/llvm/llvm-project/pull/96281
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -683,6 +706,59 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this, UsedAssumedInformation); } + + // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is + // not to be set. + bool needFlatScratchInit(Attributor &A) { +assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set + +// This is called on each callee; false means callee shouldn't have +// no-flat-scratch-init. +auto CheckForNoFlatScratchInit = [&](Instruction &I) { + const auto &CB = cast(I); + const Function *Callee = CB.getCalledFunction(); + + // Callee == 0 for inline asm or indirect call with known callees. + // In the latter case, updateImpl() already checked the callees and we + // know their FLAT_SCRATCH_INIT bit is set. + // If function has indirect call with unknown callees, the bit is + // already removed in updateImpl() and execution won't reach here. + if (!Callee) +return true; + + return Callee->getIntrinsicID() != + Intrinsic::amdgcn_addrspacecast_nonnull; +}; + +bool UsedAssumedInformation = false; +// If any callee is false (i.e. need FlatScratchInit), +// checkForAllCallLikeInstructions returns false, in which case this +// function returns true. +return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this, + UsedAssumedInformation); + } + + bool constHasASCast(const Constant *C, + SmallPtrSetImpl &Visited) { +if (!Visited.insert(C).second) + return false; + +if (const auto *CE = dyn_cast(C)) + if (CE->getOpcode() == Instruction::AddrSpaceCast && + CE->getOperand(0)->getType()->getPointerAddressSpace() == + AMDGPUAS::PRIVATE_ADDRESS) +return true; + +for (const Use &U : C->operands()) { + const auto *OpC = dyn_cast(U); + if (!OpC || !Visited.insert(OpC).second) +continue; + + if (constHasASCast(OpC, Visited)) +return true; +} +return false; + } arsenm wrote: I do not want to duplicate the same function that already exists for the LDS case. Unify these. 
We should also try to avoid doing this walk over all instructions through all constant expressions twice for the two attributes. https://github.com/llvm/llvm-project/pull/94647
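A toy sketch of the single-traversal shape being asked for, with one visited set answering both attribute questions at once (Node and WalkResult are illustrative stand-ins, not the actual LLVM Constant/Attributor types):

```cpp
#include <unordered_set>
#include <vector>

// Stand-in for a constant-expression DAG node. The two booleans model
// the two properties the attributor cares about (an addrspacecast from
// private, and a reference to an LDS global).
struct Node {
  bool IsPrivateASCast = false;
  bool IsLDSGlobal = false;
  std::vector<const Node *> Operands;
};

struct WalkResult {
  bool HasPrivateASCast = false;
  bool HasLDSRef = false;
};

// One walk, one visited set, both flags computed together.
static void walk(const Node *N, std::unordered_set<const Node *> &Visited,
                 WalkResult &R) {
  if (!Visited.insert(N).second)
    return; // already seen this sub-DAG
  R.HasPrivateASCast |= N->IsPrivateASCast;
  R.HasLDSRef |= N->IsLDSGlobal;
  if (R.HasPrivateASCast && R.HasLDSRef)
    return; // nothing more to learn; stop early
  for (const Node *Op : N->Operands)
    walk(Op, Visited, R);
}
```

Compared with two independent traversals, this visits each constant at most once and can stop as soon as both answers are known.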
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -439,6 +439,26 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { indicatePessimisticFixpoint(); return; } + +SmallPtrSet VisitedConsts; + +for (Instruction &I : instructions(F)) { arsenm wrote: Should use checkForAllInstructions instead of manually looking at all instructions https://github.com/llvm/llvm-project/pull/94647
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > But that will still require defining what an undesirable address space is, right? The LangRef doesn't need to know why it's undesirable. It's like the `n` field. https://github.com/llvm/llvm-project/pull/108786
[clang] [llvm] [mlir] [polly] [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (PR #111752)
https://github.com/arsenm approved this pull request. There are definitely places that would benefit from a getDeclaration. There are several places that have to awkwardly construct the intrinsic name to check getFunction https://github.com/llvm/llvm-project/pull/111752
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. 
+ Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) arsenm wrote: We could do the same thing for amdgpu. We implement addrspacecast with the same operations. This also reminds me, we should have a valid flag on addrspacecast. https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT, setRequiresStructuredCFG(false); } +enum AddressSpace { + Function = storageClassToAddressSpace(SPIRV::StorageClass::Function), + CrossWorkgroup = + storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup), + UniformConstant = + storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant), + Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup), + Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic) +}; + +unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const { + const auto *LD = dyn_cast(V); + if (!LD) +return UINT32_MAX; + + // It must be a load from a pointer to Generic. + assert(V->getType()->isPointerTy() && + V->getType()->getPointerAddressSpace() == AddressSpace::Generic); + + const auto *Ptr = LD->getPointerOperand(); + if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant) +return UINT32_MAX; + // For a loaded from a pointer to UniformConstant, we can infer CrossWorkgroup + // storage, as this could only have been legally initialised with a + // CrossWorkgroup (aka device) constant pointer. + return AddressSpace::CrossWorkgroup; +} + +std::pair +SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const { + using namespace PatternMatch; + + if (auto *II = dyn_cast(V)) { +switch (II->getIntrinsicID()) { +case Intrinsic::amdgcn_is_shared: + return std::pair(II->getArgOperand(0), AddressSpace::Workgroup); +case Intrinsic::amdgcn_is_private: + return std::pair(II->getArgOperand(0), AddressSpace::Function); +default: + break; +} +return std::pair(nullptr, UINT32_MAX); + } + // Check the global pointer predication based on + // (!is_share(p) && !is_private(p)). Note that logic 'and' is commutative and + // the order of 'is_shared' and 'is_private' is not significant. 
+ Value *Ptr; + if (getTargetTriple().getVendor() == Triple::VendorType::AMD && + match( + const_cast(V), + m_c_And(m_Not(m_Intrinsic(m_Value(Ptr))), +m_Not(m_Intrinsic( +m_Deferred(Ptr)) arsenm wrote: If I have skimmed SPIRV correctly, it expects invalid addrspacecasts (OpGenericCastToPtrExplicit) to return null. You could implement the same kind of check by looking for icmp ne (addrspacecast x to y), null https://github.com/llvm/llvm-project/pull/110897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/111579
[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)
arsenm wrote: > Not sure if you still want to keep it for backward compatibility. Definitely not. It's already auto-upgraded in bitcode. https://github.com/llvm/llvm-project/pull/111579
[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/111579
[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)
arsenm wrote: * **#111579** 👈 * `main` This stack of pull requests is managed by Graphite (https://stacking.dev). https://github.com/llvm/llvm-project/pull/111579
[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/111579 This has been replaced with metadata on individual atomicrmw instructions. >From be077b9947546b5d6a87be7c57d44b57ff6efb5f Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 27 Jun 2024 13:46:35 +0200 Subject: [PATCH] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute This has been replaced with metadata on individual atomicrmw instructions. --- clang/lib/CodeGen/Targets/AMDGPU.cpp| 3 --- clang/test/CodeGenCUDA/amdgpu-func-attrs.cu | 22 - clang/test/OpenMP/amdgcn-attributes.cpp | 3 --- 3 files changed, 28 deletions(-) delete mode 100644 clang/test/CodeGenCUDA/amdgpu-func-attrs.cu diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index 37e6af3d4196a8..b852dcffb295c9 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -452,9 +452,6 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes( if (FD) setFunctionDeclAttributes(FD, F, M); - if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics()) -F->addFnAttr("amdgpu-unsafe-fp-atomics", "true"); - if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts) F->addFnAttr("amdgpu-ieee", "false"); } diff --git a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu deleted file mode 100644 index 89add87919c12d..00 --- a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu +++ /dev/null @@ -1,22 +0,0 @@ -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \ -// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \ -// RUN: | FileCheck -check-prefixes=NO-UNSAFE-FP-ATOMICS %s -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \ -// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \ -// RUN: -munsafe-fp-atomics \ -// RUN: | FileCheck -check-prefixes=UNSAFE-FP-ATOMICS %s -// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \ -// RUN: -o - -x hip %s -munsafe-fp-atomics \ -// RUN: | FileCheck -check-prefix=NO-UNSAFE-FP-ATOMICS 
%s - -#include "Inputs/cuda.h" - -__device__ void test() { -// UNSAFE-FP-ATOMICS: define{{.*}} void @_Z4testv() [[ATTR:#[0-9]+]] -} - - -// Make sure this is silently accepted on other targets. -// NO-UNSAFE-FP-ATOMICS-NOT: "amdgpu-unsafe-fp-atomics" - -// UNSAFE-FP-ATOMICS-DAG: attributes [[ATTR]] = {{.*}}"amdgpu-unsafe-fp-atomics"="true" diff --git a/clang/test/OpenMP/amdgcn-attributes.cpp b/clang/test/OpenMP/amdgcn-attributes.cpp index 5ddc34537d12fb..2c9e16a4f5098e 100644 --- a/clang/test/OpenMP/amdgcn-attributes.cpp +++ b/clang/test/OpenMP/amdgcn-attributes.cpp @@ -5,7 +5,6 @@ // RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=CPU,ALL %s // RUN: %clang_cc1 -menable-no-nans -mno-amdgpu-ieee -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=NOIEEE,ALL %s -// RUN: %clang_cc1 -munsafe-fp-atomics -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=UNSAFEATOMIC,ALL %s // expected-no-diagnostics @@ -35,9 +34,7 @@ int callable(int x) { // DEFAULT: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } // CPU: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" } // NOIEEE: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "amdgpu-ieee"="false" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } -// UNSAFEATOMIC: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "amdgpu-unsafe-fp-atomics"="true" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } // DEFAULT: attributes #2 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer
[clang] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)
@@ -647,6 +647,14 @@ class LangOptions : public LangOptionsBase { return ConvergentFunctions; } + /// Return true if atomicrmw operations targeting allocations in private + /// memory are undefined. + bool threadPrivateMemoryAtomicsAreUndefined() const { +// Should be false for OpenMP. +// TODO: Should this be true for SYCL? arsenm wrote: This is now derived from the builtins rather than the language mode https://github.com/llvm/llvm-project/pull/102462 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
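The replacement mechanism the comment refers to — per-instruction metadata rather than a language-mode query — can be sketched in IR. The metadata node id and the address-space range below are illustrative (AMDGPU uses address space 5 for private memory):

```llvm
; Hedged sketch: an atomicrmw annotated so the optimizer may assume the
; pointer is not in the excluded address-space range [5, 6), i.e. not in
; AMDGPU private memory.
%old = atomicrmw fadd ptr %p, float %v seq_cst, align 4, !noalias.addrspace !0

!0 = !{i32 5, i32 6}
```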
[clang] [llvm] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/102462
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,12 +1,14 @@ ; This test aims to check ability to support "Arithmetic with Overflow" intrinsics ; in the special case when those intrinsics are being generated by the CodeGenPrepare; -; pass during translations with optimization (note -O3 in llc arguments). +; pass during translations with optimization (note -disable-lsr, to inhibit +; strength reduction pre-empting with a more preferable match for this pattern +; in llc arguments). -; RUN: llc -O3 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s -; RUN: %if spirv-tools %{ llc -O3 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %} +; RUN: llc -O3 -disable-lsr -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s arsenm wrote: If the intent is to specifically check CodeGenPrepare, this should have an IR->IR test in test/Transforms/CodeGenPrepare. I don't know whether the -disable-lsr output is the best, but based on the name of the test I would expect it to document the actual result, not the result under a special flag. Also, this test shouldn't have been using -O3 (it barely does anything, and -O2 is the default). https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: > WRT eliminating the constrained intrinsics completely, I thought that operand > bundles could only be attached to function calls and not regular > instructions? If I'm wrong, we _still_ have a problem because there are so > many uses of the regular FP instructions that we can't be safe-by-default and > still use those instructions. We'd need to keep some kind of the constrained > intrinsics (or new intrinsics) that give us replacements for the regular FP > instructions. Right, we would need to introduce new llvm.fadd etc. to carry bundles. If there are no bundles, these could fold back to the regular instructions. https://github.com/llvm/llvm-project/pull/109798
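To make the suggestion concrete, a hypothetical replacement intrinsic carrying strict-FP state as operand bundles might look like the sketch below. The intrinsic name and bundle tags are illustrative, not an existing interface:

```llvm
; Hedged sketch: operand bundles can only attach to call sites, so a strict
; fadd would become a call to a (hypothetical) llvm.fadd intrinsic.
%sum = call float @llvm.fadd.f32(float %a, float %b)
           [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"strict") ]

; With no bundles present, the call could fold back to the plain instruction:
%sum2 = fadd float %a, %b
```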
[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -0,0 +1,201 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z16 | FileCheck %s +; +; Tests for 16-bit floating point (half). + +; Incoming half arguments added together and returned. +define half @fun0(half %Op0, half %Op1) { +; CHECK-LABEL: fun0: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:stmg %r13, %r15, 104(%r15) +; CHECK-NEXT:.cfi_offset %r13, -56 +; CHECK-NEXT:.cfi_offset %r14, -48 +; CHECK-NEXT:.cfi_offset %r15, -40 +; CHECK-NEXT:aghi %r15, -168 +; CHECK-NEXT:.cfi_def_cfa_offset 328 +; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill +; CHECK-NEXT:.cfi_offset %f8, -168 +; CHECK-NEXT:vlgvf %r0, %v2, 0 +; CHECK-NEXT:llghr %r2, %r0 +; CHECK-NEXT:vlgvf %r13, %v0, 0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:llghr %r2, %r13 +; CHECK-NEXT:ldr %f8, %f0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:aebr %f0, %f8 +; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT +; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload +; CHECK-NEXT:vlvgf %v0, %r2, 0 +; CHECK-NEXT:lmg %r13, %r15, 272(%r15) +; CHECK-NEXT:br %r14 +entry: + %Res = fadd half %Op0, %Op1 + ret half %Res +} + +; The half values are loaded and stored instead. 
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) { +; CHECK-LABEL: fun1: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:stmg %r12, %r15, 96(%r15) +; CHECK-NEXT:.cfi_offset %r12, -64 +; CHECK-NEXT:.cfi_offset %r13, -56 +; CHECK-NEXT:.cfi_offset %r14, -48 +; CHECK-NEXT:.cfi_offset %r15, -40 +; CHECK-NEXT:aghi %r15, -168 +; CHECK-NEXT:.cfi_def_cfa_offset 328 +; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill +; CHECK-NEXT:.cfi_offset %f8, -168 +; CHECK-NEXT:llgh %r12, 0(%r2) +; CHECK-NEXT:llgh %r2, 0(%r3) +; CHECK-NEXT:lgr %r13, %r4 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:lgr %r2, %r12 +; CHECK-NEXT:ldr %f8, %f0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:aebr %f0, %f8 +; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT +; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload +; CHECK-NEXT:sth %r2, 0(%r13) +; CHECK-NEXT:lmg %r12, %r15, 264(%r15) +; CHECK-NEXT:br %r14 +entry: + %0 = load half, ptr %Op0, align 2 + %1 = load half, ptr %Op1, align 2 + %add = fadd half %0, %1 + store half %add, ptr %Dst, align 2 + ret void +} + +; Test a chain of half operations which should have each operation surrounded +; by conversions to/from fp32 for proper emulation. 
+define half @fun2(half %Op0, half %Op1, half %Op2) { +; CHECK-LABEL: fun2: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:stmg %r12, %r15, 96(%r15) +; CHECK-NEXT:.cfi_offset %r12, -64 +; CHECK-NEXT:.cfi_offset %r13, -56 +; CHECK-NEXT:.cfi_offset %r14, -48 +; CHECK-NEXT:.cfi_offset %r15, -40 +; CHECK-NEXT:aghi %r15, -168 +; CHECK-NEXT:.cfi_def_cfa_offset 328 +; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill +; CHECK-NEXT:.cfi_offset %f8, -168 +; CHECK-NEXT:vlgvf %r0, %v2, 0 +; CHECK-NEXT:llghr %r2, %r0 +; CHECK-NEXT:vlgvf %r13, %v4, 0 +; CHECK-NEXT:vlgvf %r12, %v0, 0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:llghr %r2, %r12 +; CHECK-NEXT:ldr %f8, %f0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:aebr %f0, %f8 +; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT +; CHECK-NEXT:llghr %r2, %r2 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:llghr %r2, %r13 +; CHECK-NEXT:ldr %f8, %f0 +; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT +; CHECK-NEXT:wfasb %f0, %f8, %f0 +; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT +; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload +; CHECK-NEXT:vlvgf %v0, %r2, 0 +; CHECK-NEXT:lmg %r12, %r15, 264(%r15) +; CHECK-NEXT:br %r14 +entry: + %A0 = fadd half %Op0, %Op1 + %Res = fadd half %A0, %Op2 + ret half %Res +} + +; Store an incoming half argument and return a loaded one. +define half @fun3(half %Op0, ptr %Dst, ptr %Src) { +; CHECK-LABEL: fun3: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:vlgvf %r0, %v0, 0 +; CHECK-NEXT:sth %r0, 0(%r2) +; CHECK-NEXT:lh %r0, 0(%r3) +; CHECK-NEXT:vlvgf %v0, %r0, 0 +; CHECK-NEXT:br %r14 +entry: + store half %Op0, ptr %Dst + + %Res = load half, ptr %Src + ret half %Res +} + +; Call a function with half argument and return values. 
+declare half @foo(half) +define void @fun4(ptr %Src, ptr %Dst) { +; CHECK-LABEL: fun4: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:stmg %r13, %r15, 104(%r15) +; CHECK-NEXT:.cfi_offset %r13, -56 +; CHECK-NEXT:.cfi_offset %r14, -48 +; CHECK-NEXT:.cfi_offset %r15, -40 +; CHECK-NEXT:aghi %r15, -160 +; CHECK-NEXT:.cfi_def_cfa_offset 320 +; CHECK-NEXT:lh %r0, 0(%r2) +; CHECK-NEXT:vlvgf %v0, %r0, 0 +; CHEC
[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -784,6 +791,20 @@ bool SystemZTargetLowering::useSoftFloat() const { return Subtarget.hasSoftFloat(); } +MVT SystemZTargetLowering::getRegisterTypeForCallingConv( + LLVMContext &Context, CallingConv::ID CC, + EVT VT) const { + // 128-bit single-element vector types are passed like other vectors, + // not like their element type. + if (VT.isVector() && VT.getSizeInBits() == 128 && + VT.getVectorNumElements() == 1) +return MVT::v16i8; + // Keep f16 so that they can be recognized and handled. + if (VT == MVT::f16) arsenm wrote: I assume this is because it's an illegal type. It would be much nicer if the calling convention code just always worked on the original types to begin with. https://github.com/llvm/llvm-project/pull/109164
[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1597,6 +1618,15 @@ bool SystemZTargetLowering::splitValueIntoRegisterParts( return true; } + // Convert f16 to f32 (Out-arg). + if (PartVT == MVT::f16) { +assert(NumParts == 1 && ""); arsenm wrote: Remove the && "" or make it a meaningful message. https://github.com/llvm/llvm-project/pull/109164
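A minimal sketch of the requested change, with an illustrative assertion message; the helper below is a hypothetical stand-in, not the actual SystemZ code:

```cpp
#include <cassert>

// Hypothetical stand-in for the splitValueIntoRegisterParts() logic: the
// assertion carries a descriptive message instead of an empty string literal.
int splitF16IntoParts(int NumParts) {
  assert(NumParts == 1 && "f16 should be widened to a single f32 part");
  return NumParts;
}
```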
[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -0,0 +1,85 @@ +// RUN: %clang_cc1 -triple s390x-linux-gnu \ +// RUN: -ffloat16-excess-precision=standard -emit-llvm -o - %s \ +// RUN: | FileCheck %s -check-prefix=STANDARD + +// RUN: %clang_cc1 -triple s390x-linux-gnu \ +// RUN: -ffloat16-excess-precision=none -emit-llvm -o - %s \ +// RUN: | FileCheck %s -check-prefix=NONE + +// RUN: %clang_cc1 -triple s390x-linux-gnu \ +// RUN: -ffloat16-excess-precision=fast -emit-llvm -o - %s \ +// RUN: | FileCheck %s -check-prefix=FAST + +_Float16 f(_Float16 a, _Float16 b, _Float16 c, _Float16 d) { +return a * b + c * d; +} + arsenm wrote: Test vector cases. https://github.com/llvm/llvm-project/pull/109164
[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -784,6 +791,20 @@ bool SystemZTargetLowering::useSoftFloat() const { return Subtarget.hasSoftFloat(); } +MVT SystemZTargetLowering::getRegisterTypeForCallingConv( + LLVMContext &Context, CallingConv::ID CC, + EVT VT) const { + // 128-bit single-element vector types are passed like other vectors, + // not like their element type. + if (VT.isVector() && VT.getSizeInBits() == 128 && + VT.getVectorNumElements() == 1) +return MVT::v16i8; arsenm wrote: Seems unrelated? https://github.com/llvm/llvm-project/pull/109164
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote: > Think it would be useful to put that on functions in the wrapper headers that > definitely aren't convergent? E.g. getting a thread id. You could, but it's trivially inferable in those cases anyway. https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
@@ -4106,9 +4106,10 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args, Opts.Blocks = Args.hasArg(OPT_fblocks) || (Opts.OpenCL && Opts.OpenCLVersion == 200); - Opts.ConvergentFunctions = Args.hasArg(OPT_fconvergent_functions) || - Opts.OpenCL || (Opts.CUDA && Opts.CUDAIsDevice) || - Opts.SYCLIsDevice || Opts.HLSL; + Opts.ConvergentFunctions = Args.hasFlag( + OPT_fconvergent_functions, OPT_fno_convergent_functions, + Opts.OpenMPIsTargetDevice || T.isAMDGPU() || T.isNVPTX() || Opts.OpenCL || + Opts.CUDAIsDevice || Opts.SYCLIsDevice || Opts.HLSL); arsenm wrote: Sort all the language checks together, before the target list. We probably should have a hasConvergentOperations() predicate somewhere. https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote: > -fno-convergent-functions to opt-out if you want to test broken behavior. You may legitimately know there are no convergent functions in the TU. We also have the noconvergent source attribute now for this. https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)
@@ -0,0 +1,187 @@ +//===-- amdgpuintrin.h - AMDPGU intrinsic functions ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef __AMDGPUINTRIN_H +#define __AMDGPUINTRIN_H + +#ifndef __AMDGPU__ +#error "This file is intended for AMDGPU targets or offloading to AMDGPU +#endif + +#include +#include + +#if defined(__HIP__) || defined(__CUDA__) +#define _DEFAULT_ATTRS __attribute__((device)) __attribute__((always_inline)) +#else +#define _DEFAULT_ATTRS __attribute__((always_inline)) +#endif + +#pragma omp begin declare target device_type(nohost) +#pragma omp begin declare variant match(device = {arch(amdgcn)}) + +// Type aliases to the address spaces used by the AMDGPU backend. +#define _private __attribute__((opencl_private)) +#define _constant __attribute__((opencl_constant)) +#define _local __attribute__((opencl_local)) +#define _global __attribute__((opencl_global)) + +// Attribute to declare a function as a kernel. +#define _kernel __attribute__((amdgpu_kernel, visibility("protected"))) + +// Returns the number of workgroups in the 'x' dimension of the grid. +_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_x() { + return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x(); +} + +// Returns the number of workgroups in the 'y' dimension of the grid. +_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_y() { + return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y(); +} + +// Returns the number of workgroups in the 'z' dimension of the grid. +_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_z() { + return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z(); +} + +// Returns the total number of workgruops in the grid. 
+_DEFAULT_ATTRS static inline uint64_t _get_num_blocks() {
+  return _get_num_blocks_x() * _get_num_blocks_y() * _get_num_blocks_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the absolute id of the AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_block_id() {
+  return _get_block_id_x() + _get_num_blocks_x() * _get_block_id_y() +
+         _get_num_blocks_x() * _get_num_blocks_y() * _get_block_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workitems in the workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_num_threads() {
+  return _get_num_threads_x() * _get_num_threads_y() * _get_num_threads_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the absolute id of the thread in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_thread_id() {
+  return _get_thread_id_x() + _get_num_threads_x() * _get_thread_id_y() +
+         _get_num_threads_x() * _get_num_threads_y() * _get_thread_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t _get_lane_size() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {

arsenm wrote:

We should really just rip out the convergent source attribute. We should only have noconvergent.
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))))))
+    return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}

arsenm wrote:

This is the fancy stuff that should go into a follow-up patch to add assume support

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {

arsenm wrote:

Move to separate change, not sure this is necessarily valid for spirv

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -178,6 +266,9 @@ void SPIRVPassConfig::addIRPasses() {
     addPass(createSPIRVStructurizerPass());
   }
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+    addPass(createInferAddressSpacesPass(AddressSpace::Generic));

arsenm wrote:

Not sure why this is a pass parameter to InferAddressSpaces, and a TTI hook

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))))))

arsenm wrote:

Shouldn't be looking at the amdgcn intrinsics? Surely spirv has its own operations for this?

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))))))
+    return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}
+
+bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
+                                             unsigned DestAS) const {
+  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
+    return false;
+  return DestAS == AddressSpace::Generic ||
+         DestAS == AddressSpace::CrossWorkgroup;
+}

arsenm wrote:

This is separate, I don't think InferAddressSpaces relies on this

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

arsenm wrote:

You don't need to duplicate all of these tests. You just need some basic samples that the target is implemented; the full set is testing pass mechanics, which can be done on any target.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [mlir] [NFC][TableGen] Change `Record::getSuperClasses` to use const Record* (PR #110845)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110845
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
@@ -1660,7 +1660,7 @@ class Record {
   // this record.
   SmallVector Locs;
   SmallVector ForwardDeclarationLocs;
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:

You have the const_cast on the addition, so this is unnecessary?

https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

The codegen prepare behavior is still backend code to be tested. You can just run codegenprepare as a standalone pass too (usually would have separate llc and opt run lines in such a test)

https://github.com/llvm/llvm-project/pull/110695
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
@@ -0,0 +1,102 @@
+//===-- LLVMTargetMachineC.cpp --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the LLVM-C part of TargetMachine.h that directly
+// depends on the CodeGen library.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm-c/Core.h"
+#include "llvm-c/TargetMachine.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetMachine.h"
+
+using namespace llvm;
+
+static TargetMachine *unwrap(LLVMTargetMachineRef P) {
+  return reinterpret_cast<TargetMachine *>(P);
+}
+
+static Target *unwrap(LLVMTargetRef P) {
+  return reinterpret_cast<Target *>(P);
+}
+
+static LLVMTargetMachineRef wrap(const TargetMachine *P) {
+  return reinterpret_cast<LLVMTargetMachineRef>(const_cast<TargetMachine *>(P));
+}
+
+static LLVMTargetRef wrap(const Target *P) {
+  return reinterpret_cast<LLVMTargetRef>(const_cast<Target *>(P));
+}
+
+static LLVMBool LLVMTargetMachineEmit(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                      raw_pwrite_stream &OS,
+                                      LLVMCodeGenFileType codegen,
+                                      char **ErrorMessage) {
+  TargetMachine *TM = unwrap(T);
+  Module *Mod = unwrap(M);
+
+  legacy::PassManager pass;
+  MachineModuleInfo MMI(static_cast<LLVMTargetMachine *>(TM));
+
+  std::string error;
+
+  Mod->setDataLayout(TM->createDataLayout());
+
+  CodeGenFileType ft;
+  switch (codegen) {
+  case LLVMAssemblyFile:
+    ft = CodeGenFileType::AssemblyFile;
+    break;
+  default:
+    ft = CodeGenFileType::ObjectFile;
+    break;
+  }
+  if (TM->addPassesToEmitFile(pass, MMI, OS, nullptr, ft)) {
+    error = "TargetMachine can't emit a file of this type";
+    *ErrorMessage = strdup(error.c_str());
+    return true;
+  }
+
+  pass.run(*Mod);
+
+  OS.flush();
+  return false;
+}
+
+LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                     const char *Filename,
+                                     LLVMCodeGenFileType codegen,
+                                     char **ErrorMessage) {
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::OF_None);
+  if (EC) {
+    *ErrorMessage = strdup(EC.message().c_str());
+    return true;
+  }
+  bool Result = LLVMTargetMachineEmit(T, M, dest, codegen, ErrorMessage);
+  dest.flush();
+  return Result;
+}
+
+LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T,
+                                             LLVMModuleRef M,
+                                             LLVMCodeGenFileType codegen,
+                                             char **ErrorMessage,
+                                             LLVMMemoryBufferRef *OutMemBuf) {
+  SmallString<0> CodeString;
+  raw_svector_ostream OStream(CodeString);
+  bool Result = LLVMTargetMachineEmit(T, M, OStream, codegen, ErrorMessage);
+
+  StringRef Data = OStream.str();
+  *OutMemBuf =
+      LLVMCreateMemoryBufferWithMemoryRangeCopy(Data.data(), Data.size(), "");
+  return Result;
+}

arsenm wrote:

Missing newline at end of file

https://github.com/llvm/llvm-project/pull/110443
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:

> with the PR pulled in (on top of LLVM's HEAD
> [aadfba9](https://github.com/llvm/llvm-project/commit/aadfba9b2aa107f9cada2fd9bcbe612cbf560650)),
> the compilation command is: `clang++ -cl-std=CL2.0 -emit-llvm -c -x cl -g0 --target=spir -Xclang -finclude-default-header -O2 test.cl` The output LLVM IR after the optimizations is:

You want spirv, not spir

> note bitcast to i128 with the following truncation to i96 - those types aren't part of the datalayout, yet some optimization generated them. So something has to be done with it and changing the datalayout is not enough.

Any pass is allowed to introduce any IR type. This field is a pure optimization hint. It is not required to do anything, and places no restrictions on any pass

> > > This does not mean arbitrary integer bitwidths do not work. The n field is weird, it's more of an optimization hint.
> > And I can imagine that we would want to not only be able to emit 4-bit integers in the frontend, but also allow optimization passes to emit them.

Just because there's an extension doesn't mean it's desirable to use them. On real targets, they'll end up codegenning in wider types anyway

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:

> 1. Usually (or at least AFAIK) optimization passes won't consider datalayout automatically,

The datalayout is a widely used global constant. There's no option of "not considering it"

> Do you plan to go over LLVM passes adding this check?

There's nothing new to do here. This has always existed

> 2. Some existing and future extensions might allow extra bit widths for integers.

This does not mean arbitrary integer bitwidths do not work. The n field is weird, it's more of an optimization hint.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

> Right but it's relying on a non-guaranteed maybe-optimisation firing, as far as I can tell.

The point is to test the optimization does work. The codegen pipeline is a bunch of intertwined IR passes on top of core codegen, and they need to cooperate

https://github.com/llvm/llvm-project/pull/110695
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:

> > I would like to avoid adding additional special properties to AS0, or defining the flat concept.
>
> How can we add a new specification w/o defining it?

By not defining it in terms of flat addressing. Just make it the undesirable address space

https://github.com/llvm/llvm-project/pull/108786
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+           "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
       TT.getOS() == Triple::OSType::AMDHSA)
-    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-         "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+           "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:

AMDGPU sets S32 now, which isn't wrong. But the rest of codegen assumes 16-byte alignment by default

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

That is not the nature of this kind of test

https://github.com/llvm/llvm-project/pull/110695
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110198
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110506
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

This one is testing codegenprepare as part of the normal codegen pipeline, so this one is fine. The other case was a full optimization pipeline + codegen, which are further removed.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

Not sure what the problem is with this test, but it's already covered by another?

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+           "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
       TT.getOS() == Triple::OSType::AMDHSA)
-    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-         "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+           "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:

The stack alignment should be 16 bytes (S128), but that's not mentioned in the description. Do this separately? I'm pretty sure this is wrong for the amdgcn triples too

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
@@ -1122,6 +1122,26 @@ define void @fastMathFlagsForArrayCalls([2 x float] %f, [2 x double] %d1, [2 x <
   ret void
 }
+declare { float, float } @fmf_struct_f32()
+declare { double, double } @fmf_struct_f64()
+declare { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+
+; CHECK-LABEL: fastMathFlagsForStructCalls(
+define void @fastMathFlagsForStructCalls({ float, float } %f, { double, double } %d1, { <4 x double>, <4 x double> } %d2) {
+  %call.fast = call fast { float, float } @fmf_struct_f32()
+  ; CHECK: %call.fast = call fast { float, float } @fmf_struct_f32()
+
+  ; Throw in some other attributes to make sure those stay in the right places.
+
+  %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64()
+  ; CHECK: %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64()
+
+  %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+  ; CHECK: %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+

arsenm wrote:

Can you also add a test with nofpclass attributes on the return / argument? The intent was it would be allowed for the same types as FPMathOperator

https://github.com/llvm/llvm-project/pull/110506
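For the requested `nofpclass` coverage, a hedged sketch of what such test lines might look like (hypothetical content — whether `nofpclass` is accepted on these struct types, and where it attaches, depends on how the PR ends up extending the verifier):

```llvm
; Hypothetical additions: nofpclass on a struct return and on a struct argument.
declare nofpclass(nan) { float, float } @nofpclass_struct_f32()

define void @nofpclassForStructCalls({ float, float } nofpclass(nan inf) %f) {
  %call = call nofpclass(nan) { float, float } @nofpclass_struct_f32()
  ret void
}
```

The names `@nofpclass_struct_f32` and `@nofpclassForStructCalls` are made up for illustration; an actual patch would extend the existing `fmf_struct_*` declarations above.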
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
arsenm wrote:

> * Move the MC emission functions in `TargetMachine` to `LLVMTargetMachine`. With the changes in this PR, we explicitly assume in both `addPassesToEmitFile` and `addPassesToEmitMC` that the `TargetMachine` is an `LLVMTargetMachine`; hence it does not make sense for these functions to be present in the `TargetMachine` interface.

Was this already implicitly assumed? IIRC there was some layering reason why this is the way it was. There were previous attempts to merge these before, which were abandoned:

https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html
https://reviews.llvm.org/D38482
https://reviews.llvm.org/D38489

https://github.com/llvm/llvm-project/pull/110443
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110032
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote:

> With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

The current state already requires you to check this for any library calls. Not sure any wide audit of those ever happened. I don't see a better alternative to cover those, plus the full set of target intrinsics.

https://github.com/llvm/llvm-project/pull/109798
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
@@ -1556,7 +1557,7 @@ class RecordVal {
   bool IsUsed = false;

   /// Reference locations to this record value.
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:

Is this removed in later patches?

https://github.com/llvm/llvm-project/pull/110032
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -273,6 +273,74 @@ void test_builtin_elementwise_min(int i, short s, double d, float4 v, int3 iv, u
   // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}}
 }
+
+void test_builtin_elementwise_maximum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) {
+  i = __builtin_elementwise_maximum(p, d);
+  // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}}
+
+  struct Foo foo = __builtin_elementwise_maximum(d, d);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}}
+
+  i = __builtin_elementwise_maximum(i);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+  i = __builtin_elementwise_maximum();
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 0}}
+
+  i = __builtin_elementwise_maximum(i, i, i);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+  i = __builtin_elementwise_maximum(v, iv);
+  // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+  i = __builtin_elementwise_maximum(uv, iv);
+  // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}}
+
+  d = __builtin_elementwise_maximum(d, f);
+
+  v = __builtin_elementwise_maximum(v, v);
+
+  int A[10];
+  A = __builtin_elementwise_maximum(A, A);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}}
+
+  _Complex float c1, c2;
+  c1 = __builtin_elementwise_maximum(c1, c2);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}}
+}
+
+void test_builtin_elementwise_minimum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) {
+  i = __builtin_elementwise_minimum(p, d);
+  // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}}
+
+  struct Foo foo = __builtin_elementwise_minimum(d, d);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}}
+
+  i = __builtin_elementwise_minimum(i);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 1}}
+
+  i = __builtin_elementwise_minimum();
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 0}}
+
+  i = __builtin_elementwise_minimum(i, i, i);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
+
+  i = __builtin_elementwise_minimum(v, iv);
+  // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+  i = __builtin_elementwise_minimum(uv, iv);
+  // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}}
+
+  d = __builtin_elementwise_minimum(f, d);
+
+  int A[10];
+  A = __builtin_elementwise_minimum(A, A);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}}

arsenm wrote:

The codegen assumes this is only floating point, so the integer part of the message is wrong. Also missing tests using 2 arguments with only integer / vector of integer

https://github.com/llvm/llvm-project/pull/110198
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -706,6 +706,12 @@ Unless specified otherwise operation(±0) = ±0 and operation(±infinity) = ±in
 representable values for the signed/unsigned integer type.
 T __builtin_elementwise_sub_sat(T x, T y): return the difference of x and y, clamped to the range of representable values for the signed/unsigned integer type. (supported element types: integer types)
+T __builtin_elementwise_maximum(T x, T y): return x or y, whichever is larger. If exactly one argument is a NaN, return the other argument. If both arguments are NaNs, (supported element types: integer and floating point types)

arsenm wrote: This doesn't fully explain the semantics, and I'd like to avoid trying to re-explain all the details in every instance of this. Can you just point this to some other description of the semantics? https://github.com/llvm/llvm-project/pull/110198
[clang] [cuda][HIP] `__constant__` should imply constant (PR #110182)
arsenm wrote: If it's not legal for it to be marked as constant, it's also not legal to use constant address space. https://github.com/llvm/llvm-project/pull/110182
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:

> Both in InferAddressSpaces, and in Attributor, you don't really care about whether a flat address-space exists.

Right, this is more of an undesirable address space. Optimizations don't need to know anything about its behavior beyond that.

> In reply to your question above re whether this is a DL or a Target property, I don't have a strong opinion there (it appears @shiltian and @arsenm might).

I don't really like putting this in the DataLayout. My original idea was to move it to TargetMachine, but we want to avoid the dependence on CodeGen. The DataLayout is just the other place we have that defines module-level target information. The simple solution is just to have a switch over the target architecture in Attributor.

> I do believe that this is a necessary bit of query-able information, especially from a Clang, for correctness reasons (more on that below).

I don't think this buys frontends much. Clang still needs to understand the full language address space -> target address space mapping. This would just allow populating one entry generically.

> Ah, this is part of the challenge - we do indeed assume that 0 is flat, but Targets aren't bound by LangRef to use 0 to denote flat (and some, like SPIR / SPIR-V) do not

As I mentioned above, SPIRV can just work its way out of this problem for its IR. SPIR's only reason for existence is bitcode compatibility, so doing anything there will be quite a lot of work which will never realistically happen.

> I'm fine with adding the enforcement in LLVM that AS0 needs to be the flat AS, if a target has it, but the definition of a flat AS still needs to be set.

If we do that, how will SPIR/SPIR-V work?

> This is the most generic wording I can come up with so far. Happy to hear more feedbacks.

I would like to avoid adding additional special properties to AS0, or defining the flat concept.
https://github.com/llvm/llvm-project/pull/108786
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) {
         "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-"
         "v32:32-v48:64-v96:"
         "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-"
-        "G1-ni:7:8:9";
+        "G1-ni:7:8:9-T0";

arsenm wrote: No, but yes. We probably should just define 0 to be the flat address space and take the same numbers as amdgcn. Flat will just be unsupported in codegen (but theoretically someone could go implement software tagged pointers). https://github.com/llvm/llvm-project/pull/108786
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:

> Just to clarify, does this mean any two non-flat address space pointers _cannot_ alias?

This should change nothing about aliasing. The IR assumption is that any address space may alias any other. https://github.com/llvm/llvm-project/pull/108786
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:

> There are targets that use a different integer to denote flat (e.g. see SPIR & SPIR-V). Whilst I know that there are objections to that, the fact remains that they had historical reason (wanted to make legacy OCL convention that the default is private work, and given that IR defaults to 0 this was an easy, if possibly costly, way out;

The SPIRV IR would be better off changing its numbers around like we did in AMDGPU ages ago. The only concern would be bitcode compatibility, but given it's still an "experimental target" that shouldn't be an issue.

> AMDGPU also borks this for legacy OCL reasons, which has been a source of pain).

This is only a broken in-clang hack; the backend IR always uses the correct address space. https://github.com/llvm/llvm-project/pull/108786
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -66,12 +66,12 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   HasFloat16 = true;

   if (TargetPointerWidth == 32)
-    resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
+    resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0");

arsenm wrote: It is https://github.com/llvm/llvm-project/pull/108786
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
https://github.com/arsenm approved this pull request. I think we need more thought about how the ABI for this will work, but we need to start somewhere. https://github.com/llvm/llvm-project/pull/102913
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote:

> If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption. https://github.com/llvm/llvm-project/pull/109798
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    for (Instruction &I : instructions(F)) {
+      if (isa(I) &&

arsenm wrote: Simple example, where the cast is still directly the operand. It could be further nested inside another constant expression. https://github.com/llvm/llvm-project/pull/94647
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    for (Instruction &I : instructions(F)) {
+      if (isa(I) &&

arsenm wrote: 5->3 is an illegal address space cast, but the round-trip cast can fold away. You don't want the cast back to the original address space. https://github.com/llvm/llvm-project/pull/94647
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: Also it's silly that we need to do bitcode autoupgrade of "experimental" intrinsics, but x86 started shipping with strictfp enabled in production before they graduated. We might as well drop the experimental bit then. https://github.com/llvm/llvm-project/pull/109798
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
@@ -357,6 +357,9 @@ class IRBuilderBase {
   void setConstrainedFPCallAttr(CallBase *I) {
     I->addFnAttr(Attribute::StrictFP);
+    MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();

arsenm wrote: It shouldn't be necessary to touch the attributes. The set of intrinsic attributes is fixed (callsite attributes are another thing, but generally should be droppable here). https://github.com/llvm/llvm-project/pull/109798
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -78,15 +78,15 @@ void MCResourceInfo::finalize(MCContext &OutContext) {
 }

 MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
-  return OutContext.getOrCreateSymbol("max_num_vgpr");
+  return OutContext.getOrCreateSymbol("amdgcn.max_num_vgpr");

arsenm wrote: We're usually using amdgpu instead of amdgcn in new fields. https://github.com/llvm/llvm-project/pull/102913
[clang] [clang] Use std::optional::value_or (NFC) (PR #109894)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/109894