[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -372,6 +372,31 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) {
 // HAS_MAYTRAP: declare float @llvm.experimental.constrained.minnum.f32(
 // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.minnum.f80(
 
+  fmaximum_num(*d,*d);   fmaximum_numf(f,f);  fmaximum_numl(*l,*l);
+
+// NO__ERRNO: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
+// HAS_MAYTRAP: declare double @llvm.maximumnum.f64(
+// HAS_MAYTRAP: declare float @llvm.maximumnum.f32(
+// HAS_MAYTRAP: declare x86_fp80 @llvm.maximumnum.f80(
+
+  fminimum_num(*d,*d);   fminimum_numf(f,f);  fminimum_numl(*l,*l);
+
+// NO__ERRNO: declare double @llvm.minimumnum.f64(double, double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.minimumnum.f32(float, float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.minimumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare double @llvm.minimumnum.f64(double, double) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare float @llvm.minimumnum.f32(float, float) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare x86_fp80 @llvm.minimumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
+// HAS_MAYTRAP: declare double @llvm.minimumnum.f64(
+// HAS_MAYTRAP: declare float @llvm.minimumnum.f32(

arsenm wrote:

These checks should be common. The attributes of intrinsics are fixed, and
these intrinsics don't set errno.
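
For illustration, a sketch of what merging could look like (the COMMON prefix
is an assumption; it would need to be added to the existing RUN lines):

```c
// RUN: ... | FileCheck %s --check-prefixes=COMMON,NO__ERRNO
// RUN: ... | FileCheck %s --check-prefixes=COMMON,HAS_ERRNO
// COMMON: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC]]
// COMMON: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC]]
```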

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/112041
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
 
   Objf << ObjBuffer;
 
-  ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()),
+  ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()),
"-o",  Output.getFilename(),
-   McinFile,  "--filetype=obj"};
-  const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc"));
+   "-x",  "assembler",
+   ObjinFile, "-c"};
+  const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang"));

arsenm wrote:

But the toolchain only tracks the name of the current clang? Really, you want
to find the currently running binary.

`I don't think it's critical that the clang we invoke here is the amdclang`

It's critical to find the exact clang that you are running.
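
For reference, the driver already records the path of the binary that is
executing; a minimal sketch of using it (assuming the Compilation `C` is in
scope here, as it is in this helper's signature):

```cpp
// Use the exact clang binary currently running, rather than searching the
// program path for anything merely named "clang".
const char *Clang = Args.MakeArgString(C.getDriver().getClangProgramPath());
```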

https://github.com/llvm/llvm-project/pull/112041
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
 
   Objf << ObjBuffer;
 
-  ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()),
+  ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()),
"-o",  Output.getFilename(),
-   McinFile,  "--filetype=obj"};
-  const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc"));
+   "-x",  "assembler",
+   ObjinFile, "-c"};
+  const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang"));

arsenm wrote:

Because sometimes there's a version suffix (e.g. clang-19), and some
distributions add random prefixes or suffixes (e.g. amdclang).

https://github.com/llvm/llvm-project/pull/112041
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Replace use of `llvm-mc` with `clang` (PR #112041)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -463,10 +463,11 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
 
   Objf << ObjBuffer;
 
-  ArgStringList McArgs{"-triple", Args.MakeArgString(HostTriple.normalize()),
+  ArgStringList McArgs{"-target", Args.MakeArgString(HostTriple.normalize()),
"-o",  Output.getFilename(),
-   McinFile,  "--filetype=obj"};
-  const char *Mc = Args.MakeArgString(TC.GetProgramPath("llvm-mc"));
+   "-x",  "assembler",
+   ObjinFile, "-c"};
+  const char *Mc = Args.MakeArgString(TC.GetProgramPath("clang"));

arsenm wrote:

Shouldn't assume the binary name is clang; other places seem to be doing
something like this:

`TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName()))`



https://github.com/llvm/llvm-project/pull/112041
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm closed 
https://github.com/llvm/llvm-project/pull/112032
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/112032
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

* **#112032** 👈 (this PR)
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/112032
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Add llvm-mc to CLANG_TEST_DEPS (PR #112032)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/112032

Attempt to fix sporadic precommit test failures in
hip-partial-link.hip

The driver really shouldn't be using llvm-mc in the first place
though, filed #112031 to fix this.

From 7337759de47b0623f96241927b167e2ed413378d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 11 Oct 2024 22:20:45 +0400
Subject: [PATCH] clang: Add llvm-mc to CLANG_TEST_DEPS

Attempt to fix sporadic precommit test failures in
hip-partial-link.hip

The driver really shouldn't be using llvm-mc in the first place
though, filed #112031 to fix this.
---
 clang/test/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/clang/test/CMakeLists.txt b/clang/test/CMakeLists.txt
index 2d84b0d73053f6..98829d53db934f 100644
--- a/clang/test/CMakeLists.txt
+++ b/clang/test/CMakeLists.txt
@@ -127,6 +127,7 @@ if( NOT CLANG_BUILT_STANDALONE )
 llvm-dwarfdump
 llvm-ifs
 llvm-lto2
+llvm-mc
 llvm-modextract
 llvm-nm
 llvm-objcopy

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add a flag to include GPU startup files (PR #112025)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
 Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ",")));
   }
 
+  if (Args.hasArg(options::OPT_gpustartfiles)) {

arsenm wrote:

Default value would be a toolchain choice, so yes? 

https://github.com/llvm/llvm-project/pull/112025
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add a flag to include GPU startup files (PR #112025)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
 Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ",")));
   }
 
+  if (Args.hasArg(options::OPT_gpustartfiles)) {

arsenm wrote:

Can we make that have a positive/negative pair, like other flags? And drop the gpu prefix?
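
A sketch of what the driver side of such a pair could look like (the positive
option and the toolchain-default helper are hypothetical):

```cpp
// Hypothetical -startfiles / -nostartfiles pairing; the last flag on the
// command line wins, falling back to the toolchain's default.
if (Args.hasFlag(options::OPT_startfiles, options::OPT_nostartfiles,
                 /*Default=*/getToolChain().shouldLinkStartFiles()))
  CmdArgs.push_back(Args.MakeArgString(getToolChain().GetFilePath("crt1.o")));
```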

https://github.com/llvm/llvm-project/pull/112025
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add a flag to include GPU startup files (PR #112025)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -648,6 +648,15 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
 Args.MakeArgString("-plugin-opt=-mattr=" + llvm::join(Features, ",")));
   }
 
+  if (Args.hasArg(options::OPT_gpustartfiles)) {

arsenm wrote:

Is there prior art for a flag to link the CRT? (i.e., can we just use that
instead of inventing a new -gpu flag?)

https://github.com/llvm/llvm-project/pull/112025
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

Windows bot passed, which was the important bit. Linux failed on a different
test entirely.

https://github.com/llvm/llvm-project/pull/111975
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> @arsenm what are you actually trying to fix and what do you expect this to do?

Fix the tests only running on Linux. We should have maximum host test
coverage, and this test has no reason to depend on the host. All it needs is
an explicit target instead of relying on the default.

https://github.com/llvm/llvm-project/pull/111975
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > The LangRef doesn't need to know why it's undesirable. It's like the n field
> 
> `n` field? The following?
> 

Yes. It's an optimization hint 

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -1,21 +1,17 @@
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-// REQUIRES: system-linux

arsenm wrote:

This is a pile of workarounds; there's no reason any of these tests should be
host-dependent.

https://github.com/llvm/llvm-project/pull/111975
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) {
   Result = RHS;
 return true;
   }
+
+  case Builtin::BI__builtin_fmaximum_num:
+  case Builtin::BI__builtin_fmaximum_numf:
+  case Builtin::BI__builtin_fmaximum_numl:
+  case Builtin::BI__builtin_fmaximum_numf16:
+  case Builtin::BI__builtin_fmaximum_numf128: {
+    APFloat RHS(0.);
+    if (!EvaluateFloat(E->getArg(0), Result, Info) ||

arsenm wrote:

This doesn't have tests showing the evaluation, similar to those added for 
fmin/fmax in ec32386404409b65d21fdf916110c08912335926
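
Something along these lines would cover it (a hypothetical sketch; the key
property is that a NaN operand is ignored and the other operand returned):

```cpp
static_assert(__builtin_fmaximum_num(1.0, 2.0) == 2.0, "");
static_assert(__builtin_fmaximum_num(__builtin_nan(""), 1.0) == 1.0, "");
static_assert(__builtin_fminimum_num(1.0, __builtin_nan("")) == 1.0, "");
```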

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -1,12 +1,14 @@
 ; This test aims to check ability to support "Arithmetic with Overflow" intrinsics
 ; in the special case when those intrinsics are being generated by the CodeGenPrepare;
-; pass during translations with optimization (note -O3 in llc arguments).
+; pass during translations with optimization (note -disable-lsr, to inhibit
+; strength reduction pre-empting with a more preferable match for this pattern
+; in llc arguments).
 
-; RUN: llc -O3 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s
-; RUN: %if spirv-tools %{ llc -O3 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+; RUN: llc -O3 -disable-lsr -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s

arsenm wrote:

The purpose of this test appears to be to demonstrate the net result, which
would mean updating the checks (rather than disabling LSR to recover the
previous output). Some other transform decided something else was better; the
test should show what that is.


https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Does it mean, that the reasoning behind this very PR is not legit?

No. This is providing a generic property in the datalayout, used by
InstCombine and others as a hint of what to do without directly knowing what
the target is.


https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/111975
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits

arsenm wrote:

* **#111976**
* **#111975** 👈 (this PR)
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/111975
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang: Fix hipstdpar test relying on default target (PR #111975)

2024-10-11 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/111975

Use explicit target and stop restricting hosts it can run on.

From d3ec46ab6c4d4d5d740336a9c81c24ed8dc70680 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 11 Oct 2024 14:38:02 +0400
Subject: [PATCH] clang: Fix hipstdpar test relying on default target

Use explicit target and stop restricting hosts it can run on.
---
 clang/test/Driver/hipstdpar.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/clang/test/Driver/hipstdpar.c b/clang/test/Driver/hipstdpar.c
index 32e040ef70d754..b759c5fb2084a3 100644
--- a/clang/test/Driver/hipstdpar.c
+++ b/clang/test/Driver/hipstdpar.c
@@ -1,21 +1,17 @@
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-// REQUIRES: system-linux
-// UNSUPPORTED: target={{.*}}-zos{{.*}}
-// XFAIL: target={{.*}}hexagon{{.*}}
-// XFAIL: target={{.*}}-scei{{.*}}
-// XFAIL: target={{.*}}-sie{{.*}}
+// REQUIRES: x86-registered-target, amdgpu-registered-target
 
-// RUN: not %clang -### --hipstdpar --hipstdpar-path=/does/not/exist -nogpulib \
+// RUN: not %clang -### --target=x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar --hipstdpar-path=/does/not/exist -nogpulib\
 // RUN:   -nogpuinc --compile %s 2>&1 | \
 // RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
-// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
 // RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
 // RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \
 // RUN:   -nogpulib -nogpuinc --compile %s 2>&1 | \
 // RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
 // RUN: touch %t.o
-// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
 
 // HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
 // HIPSTDPAR-COMPILE: "-x" "hip"

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) {
   Result = RHS;
 return true;
   }
+
+  case Builtin::BI__builtin_fmaximum_num:
+  case Builtin::BI__builtin_fmaximum_numf:
+  case Builtin::BI__builtin_fmaximum_numl:
+  case Builtin::BI__builtin_fmaximum_numf16:
+  case Builtin::BI__builtin_fmaximum_numf128: {
+    APFloat RHS(0.);
+    if (!EvaluateFloat(E->getArg(0), Result, Info) ||

arsenm wrote:

Missing constexpr evaluation tests

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -475,6 +475,12 @@ SYMBOL(fmaxl, None, )
 SYMBOL(fmin, None, )
 SYMBOL(fminf, None, )
 SYMBOL(fminl, None, )
+SYMBOL(fmaximum_num, None, )

arsenm wrote:

Not sure what this is for, but it isn't tested?

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -1295,6 +1295,24 @@ SYMBOL(fminf, None, )
 SYMBOL(fminl, std::, )
 SYMBOL(fminl, None, )
 SYMBOL(fminl, None, )
+SYMBOL(fmaximum_num, std::, )

arsenm wrote:

Not sure what this is for, but it isn't tested?

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -372,6 +372,31 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) {
 // HAS_MAYTRAP: declare float @llvm.experimental.constrained.minnum.f32(
 // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.minnum.f80(
 
+  fmaximum_num(f,f);   fmaximum_numf(f,f);  fmaximum_numl(f,f);

arsenm wrote:

Use the right type and avoid the implicit casts?

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clang: Support minimumnum and maximumnum intrinsics (PR #96281)

2024-10-11 Thread Matt Arsenault via cfe-commits


@@ -15314,6 +15314,32 @@ bool FloatExprEvaluator::VisitCallExpr(const CallExpr *E) {
   Result = RHS;

arsenm wrote:

Unrelated, but why is the code up here reproducing logic that's already in APFloat?
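
(A sketch of the reuse being suggested, assuming the APFloat maximumnum helper
that landed alongside these intrinsics:)

```cpp
APFloat RHS(0.);
if (!EvaluateFloat(E->getArg(0), Result, Info) ||
    !EvaluateFloat(E->getArg(1), RHS, Info))
  return false;
// Defer the NaN/signed-zero special cases to APFloat instead of
// reimplementing them here.
Result = maximumnum(Result, RHS);
return true;
```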

https://github.com/llvm/llvm-project/pull/96281
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)

2024-10-10 Thread Matt Arsenault via cfe-commits


@@ -683,6 +706,59 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
+
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);
+      const Function *Callee = CB.getCalledFunction();
+
+      // Callee == 0 for inline asm or indirect call with known callees.
+      // In the latter case, updateImpl() already checked the callees and we
+      // know their FLAT_SCRATCH_INIT bit is set.
+      // If function has indirect call with unknown callees, the bit is
+      // already removed in updateImpl() and execution won't reach here.
+      if (!Callee)
+        return true;
+
+      return Callee->getIntrinsicID() !=
+             Intrinsic::amdgcn_addrspacecast_nonnull;
+    };
+
+    bool UsedAssumedInformation = false;
+    // If any callee is false (i.e. need FlatScratchInit),
+    // checkForAllCallLikeInstructions returns false, in which case this
+    // function returns true.
+    return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this,
+                                              UsedAssumedInformation);
+  }
+
+  bool constHasASCast(const Constant *C,
+                      SmallPtrSetImpl<const Constant *> &Visited) {
+    if (!Visited.insert(C).second)
+      return false;
+
+    if (const auto *CE = dyn_cast<ConstantExpr>(C))
+      if (CE->getOpcode() == Instruction::AddrSpaceCast &&
+          CE->getOperand(0)->getType()->getPointerAddressSpace() ==
+              AMDGPUAS::PRIVATE_ADDRESS)
+        return true;
+
+    for (const Use &U : C->operands()) {
+      const auto *OpC = dyn_cast<Constant>(U);
+      if (!OpC || !Visited.insert(OpC).second)
+        continue;
+
+      if (constHasASCast(OpC, Visited))
+        return true;
+    }
+    return false;
+  }

arsenm wrote:

I do not want to duplicate the same function that already exists for the LDS
case. Unify these.

We should also try to avoid doing this walk over all instructions through all
constant expressions twice for the two attributes.
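
A rough sketch of the unification being asked for (names are hypothetical):
a single constant walk that records every source address space of an
addrspacecast, which both attributes can then query:

```cpp
// Hypothetical shared helper (uses llvm/ADT/SmallSet.h): visit each constant
// once and collect the source address spaces of all addrspacecast constexprs.
static void collectCastSourceAS(const Constant *C,
                                SmallPtrSetImpl<const Constant *> &Visited,
                                SmallSet<unsigned, 4> &SrcAS) {
  if (!Visited.insert(C).second)
    return;
  if (const auto *CE = dyn_cast<ConstantExpr>(C))
    if (CE->getOpcode() == Instruction::AddrSpaceCast)
      SrcAS.insert(CE->getOperand(0)->getType()->getPointerAddressSpace());
  for (const Use &U : C->operands())
    if (const auto *OpC = dyn_cast<Constant>(U))
      collectCastSourceAS(OpC, Visited, SrcAS);
}
// Callers then test SrcAS.count(AMDGPUAS::LOCAL_ADDRESS) or
// SrcAS.count(AMDGPUAS::PRIVATE_ADDRESS) without a second walk.
```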

https://github.com/llvm/llvm-project/pull/94647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)

2024-10-10 Thread Matt Arsenault via cfe-commits


@@ -439,6 +439,26 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    SmallPtrSet<const Constant *, 8> VisitedConsts;
+
+    for (Instruction &I : instructions(F)) {

arsenm wrote:

Should use checkForAllInstructions instead of manually looking at all 
instructions 
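
A minimal sketch of the suggested API (the opcode filter is an assumption; it
depends on which instructions actually need scanning):

```cpp
bool UsedAssumedInformation = false;
auto CheckInst = [&](Instruction &I) {
  // ... the existing per-instruction constant-operand scan ...
  return true; // keep iterating
};
A.checkForAllInstructions(CheckInst, *this,
                          {(unsigned)Instruction::AddrSpaceCast},
                          UsedAssumedInformation);
```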

https://github.com/llvm/llvm-project/pull/94647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-10-10 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> But that will still require to define, what is undesirable address space 
> right?

The LangRef doesn't need to know why it's undesirable. It's like the n field
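
(For reference, the n field just lists the target's "native" integer widths as
an optimization hint, e.g.:)

```llvm
; n32:64 marks 32- and 64-bit integers as native; it is only a hint.
target datalayout = "e-p:64:64-n32:64"
```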

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [polly] [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (PR #111752)

2024-10-10 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.

There are definitely places that would benefit from a getDeclaration. There
are several places that have to awkwardly construct the intrinsic name to
check getFunction.

https://github.com/llvm/llvm-project/pull/111752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-09 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

arsenm wrote:

We could do the same thing for amdgpu; we implement addrspacecast with the
same operations.

This also reminds me: we should have a 'valid' flag on addrspacecast.

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-09 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))

arsenm wrote:

If I have skimmed SPIRV correctly, it expects invalid addrspacecasts 
(OpGenericCastToPtrExplicit) to return null. You could implement the same kind 
of check by looking for icmp ne (addrspacecast x to y), null
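
i.e. roughly this IR shape (illustrative only; the address-space numbers are
assumptions):

```llvm
; A generic->workgroup cast that SPIR-V defines to yield null when invalid,
; so the non-null compare acts as an is_shared(p) predicate.
%ws = addrspacecast ptr addrspace(4) %p to ptr addrspace(3)
%is.workgroup = icmp ne ptr addrspace(3) %ws, null
```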

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)

2024-10-08 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm closed 
https://github.com/llvm/llvm-project/pull/111579
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)

2024-10-08 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Not sure if you still want to keep it for backward compatibility.

Definitely not. It's already handled by bitcode auto-upgrade.

https://github.com/llvm/llvm-project/pull/111579
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)

2024-10-08 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/111579
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)

2024-10-08 Thread Matt Arsenault via cfe-commits

arsenm wrote:

* **#111579** 👈 (this PR)
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/111579
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (PR #111579)

2024-10-08 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/111579

This has been replaced with metadata on individual atomicrmw instructions.

From be077b9947546b5d6a87be7c57d44b57ff6efb5f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 27 Jun 2024 13:46:35 +0200
Subject: [PATCH] clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics
 attribute

This has been replaced with metadata on individual atomicrmw instructions.
---
 clang/lib/CodeGen/Targets/AMDGPU.cpp|  3 ---
 clang/test/CodeGenCUDA/amdgpu-func-attrs.cu | 22 -
 clang/test/OpenMP/amdgcn-attributes.cpp |  3 ---
 3 files changed, 28 deletions(-)
 delete mode 100644 clang/test/CodeGenCUDA/amdgpu-func-attrs.cu

diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 37e6af3d4196a8..b852dcffb295c9 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -452,9 +452,6 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
   if (FD)
 setFunctionDeclAttributes(FD, F, M);
 
-  if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
-F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");
-
   if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
 F->addFnAttr("amdgpu-ieee", "false");
 }
diff --git a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu
deleted file mode 100644
index 89add87919c12d..00
--- a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu
+++ /dev/null
@@ -1,22 +0,0 @@
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
-// RUN: | FileCheck -check-prefixes=NO-UNSAFE-FP-ATOMICS %s
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
-// RUN: -munsafe-fp-atomics \
-// RUN: | FileCheck -check-prefixes=UNSAFE-FP-ATOMICS %s
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
-// RUN: -o - -x hip %s -munsafe-fp-atomics \
-// RUN: | FileCheck -check-prefix=NO-UNSAFE-FP-ATOMICS %s
-
-#include "Inputs/cuda.h"
-
-__device__ void test() {
-// UNSAFE-FP-ATOMICS: define{{.*}} void @_Z4testv() [[ATTR:#[0-9]+]]
-}
-
-
-// Make sure this is silently accepted on other targets.
-// NO-UNSAFE-FP-ATOMICS-NOT: "amdgpu-unsafe-fp-atomics"
-
-// UNSAFE-FP-ATOMICS-DAG: attributes [[ATTR]] = 
{{.*}}"amdgpu-unsafe-fp-atomics"="true"
diff --git a/clang/test/OpenMP/amdgcn-attributes.cpp b/clang/test/OpenMP/amdgcn-attributes.cpp
index 5ddc34537d12fb..2c9e16a4f5098e 100644
--- a/clang/test/OpenMP/amdgcn-attributes.cpp
+++ b/clang/test/OpenMP/amdgcn-attributes.cpp
@@ -5,7 +5,6 @@
 // RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=CPU,ALL %s
 
 // RUN: %clang_cc1 -menable-no-nans -mno-amdgpu-ieee -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=NOIEEE,ALL %s
-// RUN: %clang_cc1 -munsafe-fp-atomics -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=UNSAFEATOMIC,ALL %s
 
 // expected-no-diagnostics
 
@@ -35,9 +34,7 @@ int callable(int x) {
 // DEFAULT: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 // CPU: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
 // NOIEEE: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "amdgpu-ieee"="false" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
-// UNSAFEATOMIC: attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,42" "amdgpu-unsafe-fp-atomics"="true" "kernel" "no-trapping-math"="true" "omp_target_thread_limit"="42" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 
 // DEFAULT: attributes #2 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer

[clang] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)

2024-10-08 Thread Matt Arsenault via cfe-commits


@@ -647,6 +647,14 @@ class LangOptions : public LangOptionsBase {
 return ConvergentFunctions;
   }
 
+  /// Return true if atomicrmw operations targeting allocations in private
+  /// memory are undefined.
+  bool threadPrivateMemoryAtomicsAreUndefined() const {
+// Should be false for OpenMP.
+// TODO: Should this be true for SYCL?

arsenm wrote:

This is now derived from the builtins rather than the language mode

https://github.com/llvm/llvm-project/pull/102462
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (PR #102462)

2024-10-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/102462
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -1,12 +1,14 @@
 ; This test aims to check ability to support "Arithmetic with Overflow" intrinsics
 ; in the special case when those intrinsics are being generated by the CodeGenPrepare;
-; pass during translations with optimization (note -O3 in llc arguments).
+; pass during translations with optimization (note -disable-lsr, to inhibit
+; strength reduction pre-empting with a more preferable match for this pattern
+; in llc arguments).
 
-; RUN: llc -O3 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s
-; RUN: %if spirv-tools %{ llc -O3 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+; RUN: llc -O3 -disable-lsr -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s

arsenm wrote:

If the intent is to specifically check CodeGenPrepare, this should have an
IR->IR test in test/Transforms/CodeGenPrepare.

I don't know whether the -disable-lsr output is the best or not, but based on
the name of the test I would assume it should document the actual result, not
the output under a special flag.

Also, this test shouldn't have been using -O3 (it barely does anything, and
-O2 is the default).


https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)

2024-10-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> WRT eliminating the constrained intrinsics completely, I thought that operand 
> bundles could only be attached to function calls and not regular 
> instructions? If I'm wrong, we _still_ have a problem because there are so 
> many uses of the regular FP instructions that we can't be safe-by-default and 
> still use those instructions. We'd need to keep some kind of the constrained 
> intrinsics (or new intrinsics) that give us replacements for the regular FP 
> instructions.

Right, we would need to introduce new llvm.fadd etc. to carry bundles. If
there are no bundles, these could fold back to the regular instructions.
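
For example (entirely hypothetical intrinsic and bundle names, just to sketch
the idea):

```llvm
; Hypothetical bundle-carrying form; with no bundles present it could fold
; back to a plain 'fadd float %a, %b'.
%sum = call float @llvm.fadd.f32(float %a, float %b)
           [ "fp.round"(metadata !"rte"), "fp.except"(metadata !"strict") ]
```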

https://github.com/llvm/llvm-project/pull/109798
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,201 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z16 | FileCheck %s
+;
+; Tests for 16-bit floating point (half).
+
+; Incoming half arguments added together and returned.
+define half @fun0(half %Op0, half %Op1) {
+; CHECK-LABEL: fun0:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:stmg %r13, %r15, 104(%r15)
+; CHECK-NEXT:.cfi_offset %r13, -56
+; CHECK-NEXT:.cfi_offset %r14, -48
+; CHECK-NEXT:.cfi_offset %r15, -40
+; CHECK-NEXT:aghi %r15, -168
+; CHECK-NEXT:.cfi_def_cfa_offset 328
+; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:.cfi_offset %f8, -168
+; CHECK-NEXT:vlgvf %r0, %v2, 0
+; CHECK-NEXT:llghr %r2, %r0
+; CHECK-NEXT:vlgvf %r13, %v0, 0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:llghr %r2, %r13
+; CHECK-NEXT:ldr %f8, %f0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:aebr %f0, %f8
+; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:vlvgf %v0, %r2, 0
+; CHECK-NEXT:lmg %r13, %r15, 272(%r15)
+; CHECK-NEXT:br %r14
+entry:
+  %Res = fadd half %Op0, %Op1
+  ret half %Res
+}
+
+; The half values are loaded and stored instead.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:.cfi_offset %r12, -64
+; CHECK-NEXT:.cfi_offset %r13, -56
+; CHECK-NEXT:.cfi_offset %r14, -48
+; CHECK-NEXT:.cfi_offset %r15, -40
+; CHECK-NEXT:aghi %r15, -168
+; CHECK-NEXT:.cfi_def_cfa_offset 328
+; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:.cfi_offset %f8, -168
+; CHECK-NEXT:llgh %r12, 0(%r2)
+; CHECK-NEXT:llgh %r2, 0(%r3)
+; CHECK-NEXT:lgr %r13, %r4
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:lgr %r2, %r12
+; CHECK-NEXT:ldr %f8, %f0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:aebr %f0, %f8
+; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:sth %r2, 0(%r13)
+; CHECK-NEXT:lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %1 = load half, ptr %Op1, align 2
+  %add = fadd half %0, %1
+  store half %add, ptr %Dst, align 2
+  ret void
+}
+
+; Test a chain of half operations which should have each operation surrounded
+; by conversions to/from fp32 for proper emulation.
+define half @fun2(half %Op0, half %Op1, half %Op2) {
+; CHECK-LABEL: fun2:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:.cfi_offset %r12, -64
+; CHECK-NEXT:.cfi_offset %r13, -56
+; CHECK-NEXT:.cfi_offset %r14, -48
+; CHECK-NEXT:.cfi_offset %r15, -40
+; CHECK-NEXT:aghi %r15, -168
+; CHECK-NEXT:.cfi_def_cfa_offset 328
+; CHECK-NEXT:std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:.cfi_offset %f8, -168
+; CHECK-NEXT:vlgvf %r0, %v2, 0
+; CHECK-NEXT:llghr %r2, %r0
+; CHECK-NEXT:vlgvf %r13, %v4, 0
+; CHECK-NEXT:vlgvf %r12, %v0, 0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:llghr %r2, %r12
+; CHECK-NEXT:ldr %f8, %f0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:aebr %f0, %f8
+; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:llghr %r2, %r2
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:llghr %r2, %r13
+; CHECK-NEXT:ldr %f8, %f0
+; CHECK-NEXT:brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:wfasb %f0, %f8, %f0
+; CHECK-NEXT:brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:vlvgf %v0, %r2, 0
+; CHECK-NEXT:lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:br %r14
+entry:
+  %A0 = fadd half %Op0, %Op1
+  %Res = fadd half %A0, %Op2
+  ret half %Res
+}
+
+; Store an incoming half argument and return a loaded one.
+define half @fun3(half %Op0, ptr %Dst, ptr %Src) {
+; CHECK-LABEL: fun3:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:vlgvf %r0, %v0, 0
+; CHECK-NEXT:sth %r0, 0(%r2)
+; CHECK-NEXT:lh %r0, 0(%r3)
+; CHECK-NEXT:vlvgf %v0, %r0, 0
+; CHECK-NEXT:br %r14
+entry:
+  store half %Op0, ptr %Dst
+
+  %Res = load half, ptr %Src
+  ret half %Res
+}
+
+; Call a function with half argument and return values.
+declare half @foo(half)
+define void @fun4(ptr %Src, ptr %Dst) {
+; CHECK-LABEL: fun4:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:stmg %r13, %r15, 104(%r15)
+; CHECK-NEXT:.cfi_offset %r13, -56
+; CHECK-NEXT:.cfi_offset %r14, -48
+; CHECK-NEXT:.cfi_offset %r15, -40
+; CHECK-NEXT:aghi %r15, -160
+; CHECK-NEXT:.cfi_def_cfa_offset 320
+; CHECK-NEXT:lh %r0, 0(%r2)
+; CHECK-NEXT:vlvgf %v0, %r0, 0
+; CHEC

[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -784,6 +791,20 @@ bool SystemZTargetLowering::useSoftFloat() const {
   return Subtarget.hasSoftFloat();
 }
 
+MVT SystemZTargetLowering::getRegisterTypeForCallingConv(
+  LLVMContext &Context, CallingConv::ID CC,
+  EVT VT) const {
+  // 128-bit single-element vector types are passed like other vectors,
+  // not like their element type.
+  if (VT.isVector() && VT.getSizeInBits() == 128 &&
+  VT.getVectorNumElements() == 1)
+return MVT::v16i8;
+  // Keep f16 so that they can be recognized and handled.
+  if (VT == MVT::f16)

arsenm wrote:

I assume this is because it's an illegal type. It would be much nicer if
calling convention code just always worked on the original types to begin
with.
https://github.com/llvm/llvm-project/pull/109164
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -1597,6 +1618,15 @@ bool SystemZTargetLowering::splitValueIntoRegisterParts(
 return true;
   }
 
+  // Convert f16 to f32 (Out-arg).
+  if (PartVT == MVT::f16) {
+    assert(NumParts == 1 && "");

arsenm wrote:

Remove && "" or make it a meaningful message 

https://github.com/llvm/llvm-project/pull/109164
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,85 @@
+// RUN: %clang_cc1 -triple s390x-linux-gnu \
+// RUN: -ffloat16-excess-precision=standard -emit-llvm -o - %s \
+// RUN: | FileCheck %s -check-prefix=STANDARD
+
+// RUN: %clang_cc1 -triple s390x-linux-gnu \
+// RUN: -ffloat16-excess-precision=none -emit-llvm -o - %s \
+// RUN: | FileCheck %s -check-prefix=NONE
+
+// RUN: %clang_cc1 -triple s390x-linux-gnu \
+// RUN: -ffloat16-excess-precision=fast -emit-llvm -o - %s \
+// RUN: | FileCheck %s -check-prefix=FAST
+
+_Float16 f(_Float16 a, _Float16 b, _Float16 c, _Float16 d) {
+return a * b + c * d;
+}
+

arsenm wrote:

Test vector cases 

https://github.com/llvm/llvm-project/pull/109164
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

2024-10-07 Thread Matt Arsenault via cfe-commits


@@ -784,6 +791,20 @@ bool SystemZTargetLowering::useSoftFloat() const {
   return Subtarget.hasSoftFloat();
 }
 
+MVT SystemZTargetLowering::getRegisterTypeForCallingConv(
+  LLVMContext &Context, CallingConv::ID CC,
+  EVT VT) const {
+  // 128-bit single-element vector types are passed like other vectors,
+  // not like their element type.
+  if (VT.isVector() && VT.getSizeInBits() == 128 &&
+  VT.getVectorNumElements() == 1)
+return MVT::v16i8;

arsenm wrote:

Seems unrelated? 

https://github.com/llvm/llvm-project/pull/109164
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)

2024-10-04 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Think it would be useful to put that on functions in the wrapper headers that 
> definitely aren't convergent? E.g. getting a thread id.

You could, but it's trivially inferable in those cases anyway 



https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)

2024-10-04 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)

2024-10-03 Thread Matt Arsenault via cfe-commits


@@ -4106,9 +4106,10 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
   Opts.Blocks = Args.hasArg(OPT_fblocks) || (Opts.OpenCL
 && Opts.OpenCLVersion == 200);
 
-  Opts.ConvergentFunctions = Args.hasArg(OPT_fconvergent_functions) ||
- Opts.OpenCL || (Opts.CUDA && Opts.CUDAIsDevice) ||
- Opts.SYCLIsDevice || Opts.HLSL;
+  Opts.ConvergentFunctions = Args.hasFlag(
+  OPT_fconvergent_functions, OPT_fno_convergent_functions,
+  Opts.OpenMPIsTargetDevice || T.isAMDGPU() || T.isNVPTX() || Opts.OpenCL ||
+  Opts.CUDAIsDevice || Opts.SYCLIsDevice || Opts.HLSL);

arsenm wrote:

Sort all the language checks together, before the target list. We probably 
should have a hasConvergentOperations() predicate somewhere 
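
A sketch of what that suggested predicate could look like, using only the flags
already present in the diff above (the helper name comes from this suggestion;
where it should live is an open question):

  static bool hasConvergentOperations(const LangOptions &Opts,
                                      const llvm::Triple &T) {
    // Language modes first, then the target list, as requested above.
    return Opts.OpenCL || Opts.CUDAIsDevice || Opts.SYCLIsDevice ||
           Opts.HLSL || Opts.OpenMPIsTargetDevice || T.isAMDGPU() ||
           T.isNVPTX();
  }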

https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)

2024-10-03 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> -fno-convergent-functions to opt-out if you want to test broken behavior. 

You may legitimately know there are no convergent functions in the TU. We also 
have the noconvergent source attribute now for this 
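
For illustration, the two opt-out granularities might look like this (the
function is hypothetical; only the flag and attribute names come from the
discussion):

  // Whole TU: compile with -fno-convergent-functions when no convergent
  // operations exist in the translation unit.
  // Per function: annotate code known not to communicate across lanes.
  __attribute__((noconvergent)) static unsigned lane_invariant_constant(void) {
    return 42; // no cross-lane operations here
  }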

https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,187 @@
+//===-- amdgpuintrin.h - AMDGPU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device)) __attribute__((always_inline))
+#else
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _private __attribute__((opencl_private))
+#define _constant __attribute__((opencl_constant))
+#define _local __attribute__((opencl_local))
+#define _global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workgroups in the grid.
+_DEFAULT_ATTRS static inline uint64_t _get_num_blocks() {
+  return _get_num_blocks_x() * _get_num_blocks_y() * _get_num_blocks_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the absolute id of the AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_block_id() {
+  return _get_block_id_x() + _get_num_blocks_x() * _get_block_id_y() +
+ _get_num_blocks_x() * _get_num_blocks_y() * _get_block_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workitems in the workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_num_threads() {
+  return _get_num_threads_x() * _get_num_threads_y() * _get_num_threads_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the absolute id of the thread in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_thread_id() {
+  return _get_thread_id_x() + _get_num_threads_x() * _get_thread_id_y() +
+ _get_num_threads_x() * _get_num_threads_y() * _get_thread_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t _get_lane_size() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {

arsenm wrote:

We should really just rip out the convergent source attribute. We should only 
have noconvergent

[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+  storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+  storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+ V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+switch (II->getIntrinsicID()) {
+case Intrinsic::amdgcn_is_shared:
+  return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+case Intrinsic::amdgcn_is_private:
+  return std::pair(II->getArgOperand(0), AddressSpace::Function);
+default:
+  break;
+}
+return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+  match(const_cast<Value *>(V),
+        m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                    m_Deferred(Ptr))))))
+return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}

arsenm wrote:

This is the fancy stuff that should go into a follow up patch to add assume 
support 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+  storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+  storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {

arsenm wrote:

Move to separate change, not sure this is necessarily valid for spirv 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -178,6 +266,9 @@ void SPIRVPassConfig::addIRPasses() {
 addPass(createSPIRVStructurizerPass());
   }
 
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+addPass(createInferAddressSpacesPass(AddressSpace::Generic));

arsenm wrote:

Not sure why this is a pass parameter to InferAddressSpaces, and a TTI hook 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+  storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+  storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+ V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+switch (II->getIntrinsicID()) {
+case Intrinsic::amdgcn_is_shared:
+  return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+case Intrinsic::amdgcn_is_private:
+  return std::pair(II->getArgOperand(0), AddressSpace::Function);
+default:
+  break;
+}
+return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+  match(const_cast<Value *>(V),
+        m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                    m_Deferred(Ptr))))))

arsenm wrote:

Shouldn't be looking at the amdgcn intrinsics? Surely spirv has its own 
operations for this? 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+  storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+  storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+ V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+switch (II->getIntrinsicID()) {
+case Intrinsic::amdgcn_is_shared:
+  return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+case Intrinsic::amdgcn_is_private:
+  return std::pair(II->getArgOperand(0), AddressSpace::Function);
+default:
+  break;
+}
+return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+  match(const_cast<Value *>(V),
+        m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                    m_Deferred(Ptr))))))
+return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}
+
+bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
+ unsigned DestAS) const {
+  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
+return false;
+  return DestAS == AddressSpace::Generic ||
+ DestAS == AddressSpace::CrossWorkgroup;
+}

arsenm wrote:

This is separate, I don't think InferAddressSpaces relies on this 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

arsenm wrote:

You don't need to duplicate all of these tests. You just need some basic 
samples that the target is implemented, the full set is testing pass mechanics 
which can be done on any target 

https://github.com/llvm/llvm-project/pull/110897
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [NFC][TableGen] Change `Record::getSuperClasses` to use const Record* (PR #110845)

2024-10-02 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/110845
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)

2024-10-02 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/110747
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -1660,7 +1660,7 @@ class Record {
   // this record.
   SmallVector Locs;
   SmallVector ForwardDeclarationLocs;
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:

You have the const_cast on the addition, so this is unnecessary? 

https://github.com/llvm/llvm-project/pull/110747
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

The codegen prepare behavior is still backend code to be tested. You can just 
run codegenprepare as a standalone pass too (usually would have separate llc 
and opt run lines in such a test) 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)

2024-10-02 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,102 @@
+//===-- LLVMTargetMachineC.cpp --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This file implements the LLVM-C part of TargetMachine.h that directly
+// depends on the CodeGen library.
+//
+//===--===//
+
+#include "llvm-c/Core.h"
+#include "llvm-c/TargetMachine.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetMachine.h"
+
+using namespace llvm;
+
+static TargetMachine *unwrap(LLVMTargetMachineRef P) {
+  return reinterpret_cast<TargetMachine *>(P);
+}
+
+static Target *unwrap(LLVMTargetRef P) { return reinterpret_cast<Target *>(P); }
+
+static LLVMTargetMachineRef wrap(const TargetMachine *P) {
+  return reinterpret_cast<LLVMTargetMachineRef>(const_cast<TargetMachine *>(P));
+}
+
+static LLVMTargetRef wrap(const Target *P) {
+  return reinterpret_cast<LLVMTargetRef>(const_cast<Target *>(P));
+}
+
+static LLVMBool LLVMTargetMachineEmit(LLVMTargetMachineRef T, LLVMModuleRef M,
+  raw_pwrite_stream &OS,
+  LLVMCodeGenFileType codegen,
+  char **ErrorMessage) {
+  TargetMachine *TM = unwrap(T);
+  Module *Mod = unwrap(M);
+
+  legacy::PassManager pass;
+  MachineModuleInfo MMI(static_cast<LLVMTargetMachine *>(TM));
+
+  std::string error;
+
+  Mod->setDataLayout(TM->createDataLayout());
+
+  CodeGenFileType ft;
+  switch (codegen) {
+  case LLVMAssemblyFile:
+ft = CodeGenFileType::AssemblyFile;
+break;
+  default:
+ft = CodeGenFileType::ObjectFile;
+break;
+  }
+  if (TM->addPassesToEmitFile(pass, MMI, OS, nullptr, ft)) {
+error = "TargetMachine can't emit a file of this type";
+*ErrorMessage = strdup(error.c_str());
+return true;
+  }
+
+  pass.run(*Mod);
+
+  OS.flush();
+  return false;
+}
+
+LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
+ const char *Filename,
+ LLVMCodeGenFileType codegen,
+ char **ErrorMessage) {
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::OF_None);
+  if (EC) {
+*ErrorMessage = strdup(EC.message().c_str());
+return true;
+  }
+  bool Result = LLVMTargetMachineEmit(T, M, dest, codegen, ErrorMessage);
+  dest.flush();
+  return Result;
+}
+
+LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T,
+ LLVMModuleRef M,
+ LLVMCodeGenFileType codegen,
+ char **ErrorMessage,
+ LLVMMemoryBufferRef *OutMemBuf) {
+  SmallString<0> CodeString;
+  raw_svector_ostream OStream(CodeString);
+  bool Result = LLVMTargetMachineEmit(T, M, OStream, codegen, ErrorMessage);
+
+  StringRef Data = OStream.str();
+  *OutMemBuf =
+  LLVMCreateMemoryBufferWithMemoryRangeCopy(Data.data(), Data.size(), "");
+  return Result;
+}

arsenm wrote:

Missing newline at end of file 

https://github.com/llvm/llvm-project/pull/110443
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-02 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> with the PR pulled in (on top of LLVM's HEAD 
> [aadfba9](https://github.com/llvm/llvm-project/commit/aadfba9b2aa107f9cada2fd9bcbe612cbf560650)),
>  the compilation command is: `clang++ -cl-std=CL2.0 -emit-llvm -c -x cl -g0 
> --target=spir -Xclang -finclude-default-header -O2 test.cl` The output LLVM 
> IR after the optimizations is:

You want spirv, not spir 

> note bitcast to i128 with the following truncation to i96 - those types 
> aren't part of the datalayout, yet some optimization generated them. So 
> something has to be done with it and changing the datalayout is not enough.

Any pass is allowed to introduce any IR type. This field is a pure optimization 
hint. It is not required to do anything, and places no restrictions on any pass

> 
> > This does not mean arbitrary integer bitwidths do not work. The n field is 
> > weird, it's more of an optimization hint.
> 
> And I can imagine that we would want to not only be able to emit 4-bit 
> integers in the frontend, but also allow optimization passes to emit them. 

Just because there's an extension doesn't mean it's desirable to use them. On 
real targets, they'll end up codegenning in wider types anyway

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> 1. Usually (or at least AFAIK) optimization passes won't consider datalayout 
> automatically, 

The datalayout is a widely used global constant. There's no option of "not 
considering it"

>  Do you plan to go over LLVM passes adding this check?

There's nothing new to do here. This has always existed

> 2. Some existing and future extensions might allow extra bit widths for 
> integers. 

This does not mean arbitrary integer bitwidths do not work. The n field is 
weird, it's more of an optimization hint.
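
For example, a pass can consult the n field through DataLayout, but nothing
obliges it to (a small sketch; the helper is illustrative only):

  #include "llvm/IR/DataLayout.h"

  // Prefer a native integer width when one exists; this only guides
  // canonicalization -- passes remain free to create any width.
  static unsigned pickIntWidth(const llvm::DataLayout &DL, unsigned Wanted) {
    return DL.isLegalInteger(Wanted)
               ? Wanted
               : DL.getLargestLegalIntTypeSizeInBits();
  }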



https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

> Right but it's relying on a non-guaranteed maybe-optimisation firing, as far 
> as I can tell.

The point is to test the optimization does work. The codegen pipeline is a 
bunch of intertwined IR passes on top of core codegen, and they need to 
cooperate 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-10-01 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > I would like to avoid adding additional special properties to AS0, or 
> > defining the flat concept.
> 
> How can we add a new specification w/o defining it?

By not defining it in terms of flat addressing. Just make it the undesirable 
address space

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-   "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+   "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
   TT.getOS() == Triple::OSType::AMDHSA)
-return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-   "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
- "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+   "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:

AMDGPU sets S32 now, which isn't wrong. But the rest of codegen assumes 16-byte 
alignment by default 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

That is not the nature of this kind of test

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)

2024-10-01 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/110198
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)

2024-10-01 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/110506
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

This one is testing codegenprepare as part of the normal codegen pipeline, so 
this one is fine. The other case was a full optimization pipeline + codegen, 
which are more far removed 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:

Not sure what the problem is with this test, but it's already covered by 
another? 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)

2024-10-01 Thread Matt Arsenault via cfe-commits


@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-   "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+   "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
   TT.getOS() == Triple::OSType::AMDHSA)
-return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-   "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
- "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+   "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:

The stack alignment should be 16 bytes (S128), but that's not mentioned in the 
description. Do this separately? I'm pretty sure this is wrong for the amdgcn 
triples too 

https://github.com/llvm/llvm-project/pull/110695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)

2024-09-30 Thread Matt Arsenault via cfe-commits


@@ -1122,6 +1122,26 @@ define void @fastMathFlagsForArrayCalls([2 x float] %f, [2 x double] %d1, [2 x <
   ret void
 }
 
+declare { float, float } @fmf_struct_f32()
+declare { double, double } @fmf_struct_f64()
+declare { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+
+; CHECK-LABEL: fastMathFlagsForStructCalls(
+define void @fastMathFlagsForStructCalls({ float, float } %f, { double, double } %d1, { <4 x double>, <4 x double> } %d2) {
+  %call.fast = call fast { float, float } @fmf_struct_f32()
+  ; CHECK: %call.fast = call fast { float, float } @fmf_struct_f32()
+
+  ; Throw in some other attributes to make sure those stay in the right places.
+
+  %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64()
+  ; CHECK: %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64()
+
+  %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+  ; CHECK: %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64()
+

arsenm wrote:

Can you also add a test with nofpclass attributes on the return / argument? The 
intent was it would be allowed for the same types as FPMathOperator 

https://github.com/llvm/llvm-project/pull/110506
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)

2024-09-30 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> * Move the MC emission functions in `TargetMachine` to `LLVMTargetMachine`. 
> With the changes in this PR, we explicitly assume in both 
> `addPassesToEmitFile` and `addPassesToEmitMC` that the `TargetMachine` is an 
> `LLVMTargetMachine`; Hence it does not make sense for these functions to be 
> present in the `TargetMachine` interface.

Was this already implicitly assumed? IIRC there was some layering reason why 
this is the way it was. There were previous attempts to merge these before, 
which were abandoned: 

https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html

https://reviews.llvm.org/D38482
https://reviews.llvm.org/D38489

https://github.com/llvm/llvm-project/pull/110443
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)

2024-09-30 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/110032
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)

2024-09-30 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> With the constrained intrinsics the default is safe because optimizations 
> don't recognize the constrained intrinsic and thus don't know how to optimize 
> it. If we instead rely on the strictfp attribute then we'll need possibly 
> thousands of checks for this attribute, we'll need everyone going forward to 
> remember to check for it, and we'll have no way to verify that this rule is 
> being followed.

The current state already requires you to check this for any library calls. Not 
sure any wide audit of those ever happened. I don't see a better alternative to 
cover those, plus the full set of target intrinsics. 


https://github.com/llvm/llvm-project/pull/109798
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)

2024-09-30 Thread Matt Arsenault via cfe-commits


@@ -1556,7 +1557,7 @@ class RecordVal {
   bool IsUsed = false;
 
   /// Reference locations to this record value.
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:

Is this removed in later patches? 

https://github.com/llvm/llvm-project/pull/110032
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)

2024-09-30 Thread Matt Arsenault via cfe-commits


@@ -273,6 +273,74 @@ void test_builtin_elementwise_min(int i, short s, double d, float4 v, int3 iv, u
   // expected-error@-1 {{1st argument must be a vector, integer or floating 
point type (was '_Complex float')}}
 }
 
+void test_builtin_elementwise_maximum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) {
+  i = __builtin_elementwise_maximum(p, d);
+  // expected-error@-1 {{arguments are of different types ('int *' vs 
'double')}}
+
+  struct Foo foo = __builtin_elementwise_maximum(d, d);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of 
incompatible type 'double'}}
+
+  i = __builtin_elementwise_maximum(i);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 
1}}
+
+  i = __builtin_elementwise_maximum();
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 
0}}
+
+  i = __builtin_elementwise_maximum(i, i, i);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 
3}}
+
+  i = __builtin_elementwise_maximum(v, iv);
+  // expected-error@-1 {{arguments are of different types ('float4' (vector of 
4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+  i = __builtin_elementwise_maximum(uv, iv);
+  // expected-error@-1 {{arguments are of different types ('unsigned3' (vector 
of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}}
+
+  d = __builtin_elementwise_maximum(d, f);
+
+  v = __builtin_elementwise_maximum(v, v);
+
+  int A[10];
+  A = __builtin_elementwise_maximum(A, A);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating 
point type (was 'int *')}}
+
+  _Complex float c1, c2;
+  c1 = __builtin_elementwise_maximum(c1, c2);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating 
point type (was '_Complex float')}}
+}
+
+void test_builtin_elementwise_minimum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) {
+  i = __builtin_elementwise_minimum(p, d);
+  // expected-error@-1 {{arguments are of different types ('int *' vs 
'double')}}
+
+  struct Foo foo = __builtin_elementwise_minimum(d, d);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of 
incompatible type 'double'}}
+
+  i = __builtin_elementwise_minimum(i);
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 
1}}
+
+  i = __builtin_elementwise_minimum();
+  // expected-error@-1 {{too few arguments to function call, expected 2, have 
0}}
+
+  i = __builtin_elementwise_minimum(i, i, i);
+  // expected-error@-1 {{too many arguments to function call, expected 2, have 
3}}
+
+  i = __builtin_elementwise_minimum(v, iv);
+  // expected-error@-1 {{arguments are of different types ('float4' (vector of 
4 'float' values) vs 'int3' (vector of 3 'int' values))}}
+
+  i = __builtin_elementwise_minimum(uv, iv);
+  // expected-error@-1 {{arguments are of different types ('unsigned3' (vector 
of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}}
+
+  d = __builtin_elementwise_minimum(f, d);
+
+  int A[10];
+  A = __builtin_elementwise_minimum(A, A);
+  // expected-error@-1 {{1st argument must be a vector, integer or floating 
point type (was 'int *')}}

arsenm wrote:

The codegen assumes this is only floating point, so the integer part of the 
message is wrong. Also missing tests using 2 arguments with only integer / 
vector of integer 
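
A sketch of the missing coverage (whether these forms are accepted or
diagnosed is exactly what such tests would pin down; the typedef is assumed):

  typedef int int4 __attribute__((ext_vector_type(4)));

  void test_integer_only(int i, int4 iv) {
    i = __builtin_elementwise_maximum(i, i);    // two scalar integer args
    iv = __builtin_elementwise_maximum(iv, iv); // two integer-vector args
  }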

https://github.com/llvm/llvm-project/pull/110198
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)

2024-09-26 Thread Matt Arsenault via cfe-commits


@@ -706,6 +706,12 @@ Unless specified otherwise operation(±0) = ±0 and operation(±infinity) = ±in
                                              representable values for the signed/unsigned integer type.
 T __builtin_elementwise_sub_sat(T x, T y)   return the difference of x and y, clamped to the range of     integer types
                                              representable values for the signed/unsigned integer type.
+ T __builtin_elementwise_maximum(T x, T y)  return x or y, whichever is larger. If exactly one argument is   integer and floating point types
+                                            a NaN, return the other argument. If both arguments are NaNs,

arsenm wrote:

This doesn't fully explain the semantics, and I'd like to avoid trying to 
re-explain all the details in every instance of this. Can you just point this 
to some other description of the semantics? 

https://github.com/llvm/llvm-project/pull/110198
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [cuda][[HIP] `__constant__` should imply constant (PR #110182)

2024-09-26 Thread Matt Arsenault via cfe-commits

arsenm wrote:

If it's not legal for it to be marked as constant, it's also not legal to use 
constant address space
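
In other words (a sketch; the address-space number is the AMDGPU convention
and an assumption here):

  __constant__ float kLUT[2] = {0.5f, 2.0f};
  // Expected lowering: @kLUT = addrspace(4) constant [2 x float] ...
  // If the program could legally write kLUT, neither the IR 'constant'
  // marking nor the constant address-space placement would be sound.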

https://github.com/llvm/llvm-project/pull/110182
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-09-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Both in InferAddressSpaces, and in Attributor, you don't really care about 
> whether a flat address-space exists. 

Right, this is more of an undesirable address space. Optimizations don't need 
to know anything about its behavior beyond that.


> In reply to your question above re whether this is a DL or a Target property, 
> I don't have a strong opinion there (it appears @shiltian and @arsenm might). 

I don't really like putting this in the DataLayout. My original idea was to 
move it to TargetMachine, but we want to avoid the dependence on CodeGen. The 
DataLayout is just the other place we have that defines module level target 
information. The simple solution is just have a switch over the target 
architecture in Attributor.

> I do believe that this is a necessary bit of query-able information, 
> especially from a Clang, for correctness reasons (more on that below).

I don't think this buys frontends much. Clang still needs to understand the 
full language address space -> target address space mapping. This would just 
allow populating one entry generically


> Ah, this is part of the challenge - we do indeed assume that 0 is flat, but 
> Targets aren't bound by LangRef to use 0 to denote flat (and some, like SPIR 
> / SPIR-V) do not

As I mentioned above, SPIRV can just work its way out of this problem for its 
IR. SPIR's only reason for existence is bitcode compatibility, so doing 
anything there will be quite a lot of work which will never realistically 
happen. 


> I'm fine with adding the enforcement in LLVM that AS0 needs to be the flat 
> AS, if a target has it, but the definition of a flat AS still needs to be 
> set. If we do that, how will SPIR/SPIR-V work?
> This is the most generic wording I can come up with so far. Happy to hear 
> more feedbacks.

I would like to avoid adding additional special properties to AS0, or defining 
the flat concept. 



https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) {
  
"-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-"
  "v32:32-v48:64-v96:"
  "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-"
- "G1-ni:7:8:9";
+ "G1-ni:7:8:9-T0";

arsenm wrote:

No, but yes. We probably should just define 0 to be the flat address space and 
take the same numbers as amdgcn. Flat will just be unsupported in codegen (but 
theoretically someone could go implement software tagged pointers)

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-09-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Just to clarify, does this mean any two non-flat address space pointers 
> _cannot_ alias?

This should change nothing about aliasing. The IR assumption is any address 
space may alias any other 

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-09-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> There are targets that use a different integer to denote flat (e.g. see SPIR 
> & SPIR-V). Whilst I know that there are objections to that, the fact remains 
> that they had historical reason (wanted to make legacy OCL convention that 
> the default is private work, and given that IR defaults to 0 this was an 
> easy, if possibly costly, way out; 

The SPIRV IR would be better off changing its numbers around like we did in 
AMDGPU ages ago. The only concern would be bitcode compatibility, but given 
it's still an "experimental target" that shouldn't be an issue.

> AMDGPU also borks this for legacy OCL reasons, which has been a source of 
> pain). 

This is only a broken in-clang hack, the backend IR always uses the correct 
address space 

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -66,12 +66,12 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   HasFloat16 = true;
 
   if (TargetPointerWidth == 32)
-resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
+resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0");

arsenm wrote:

It is 

https://github.com/llvm/llvm-project/pull/108786
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-09-25 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.

I think we need more thought about how the ABI for this will work, but we need 
to start somewhere 

https://github.com/llvm/llvm-project/pull/102913
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)

2024-09-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> If we can't keep the constrained semantics and near-100% guarantee that no 
> new exceptions will be introduced then operand bundles are not a replacement 
> for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and 
such calls would then be annotatable with bundles to relax the assumptions. The 
default would always have to be the most conservative assumption 



https://github.com/llvm/llvm-project/pull/109798
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)

2024-09-25 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
   indicatePessimisticFixpoint();
   return;
 }
+
+for (Instruction &I : instructions(F)) {
+  if (isa<AddrSpaceCastInst>(I) &&

arsenm wrote:

Simple example, where the cast is still directly the operand. It could be 
further nested inside another constant expression 

https://github.com/llvm/llvm-project/pull/94647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
   indicatePessimisticFixpoint();
   return;
 }
+
+for (Instruction &I : instructions(F)) {
+  if (isa<AddrSpaceCastInst>(I) &&

arsenm wrote:

5->3 is an illegal address space cast, but the round trip cast can fold away. 
You don't want the cast back to the original address space. 

https://github.com/llvm/llvm-project/pull/94647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)

2024-09-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

Also it's silly that we need to do bitcode autoupgrade of "experimental" 
intrinsics, but x86 started shipping with strictfp enabled in production before 
they graduated. We might as well drop the experimental bit then 

https://github.com/llvm/llvm-project/pull/109798
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -357,6 +357,9 @@ class IRBuilderBase {
 
   void setConstrainedFPCallAttr(CallBase *I) {
 I->addFnAttr(Attribute::StrictFP);
+MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();

arsenm wrote:

It shouldn't be necessary to touch the attributes. The set of intrinsic 
attributes are fixed (callsite attributes are another thing, but generally 
should be droppable here) 

https://github.com/llvm/llvm-project/pull/109798
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-09-25 Thread Matt Arsenault via cfe-commits


@@ -78,15 +78,15 @@ void MCResourceInfo::finalize(MCContext &OutContext) {
 }
 
 MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
-  return OutContext.getOrCreateSymbol("max_num_vgpr");
+  return OutContext.getOrCreateSymbol("amdgcn.max_num_vgpr");

arsenm wrote:

We're usually using amdgpu instead of amdgcn in new fields 

https://github.com/llvm/llvm-project/pull/102913
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Use std::optional::value_or (NFC) (PR #109894)

2024-09-24 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/109894
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >