[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-25 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-24 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-16 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

Makes sense, updated.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

My worry is this test won't get updated whenever that's fixed. How about you 
leave the existing checks, but add an additional "not --crash" run line that 
checks the verifier error? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

this is a preexisting error, and the failure is further down the pipeline. 
(after sreg alloc now i guess), does it make sense to have it as xfail now 
rather then stopping after isel? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

I thought you switched to using the intrinsics such that this would avoid the 
verifier error. This test is going to fail if you are still hitting the error 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

I currently see machine verifier failure which is not related to this patch. An 
 i32 example with trunc here, https://godbolt.org/z/he8asMe77. 
This is also seen with wider type legalizations that we do now, so I cannot 
integrate these with existing tests just yet. am I missing something here ?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering , 
SDNode *N,
   if (ValSize % 32 != 0)
 return SDValue();
 
+  auto unrollLaneOp = [, ](SDNode *N) -> SDValue {
+EVT VT = N->getValueType(0);
+unsigned NE = VT.getVectorNumElements();
+EVT EltVT = VT.getVectorElementType();
+SmallVector Scalars;
+unsigned NumOperands = N->getNumOperands();
+SmallVector Operands(NumOperands);
+SDNode *GL = N->getGluedNode();
+
+if (GL) {
+  // only handle convegrencectrl_glue
+  assert(GL->getOpcode() == ISD::CONVERGENCECTRL_GLUE);
+}

arsenm wrote:

Typo convegrencectrl_glue. Also assert(!GL || ...)

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

We don't need these isel checks for the tokens if the IR verifier is going to 
catch the breakage. This test can jus be merged with the existing intrinsic 
tests 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits

vikramRH wrote:





> That's another option. The only real plus to the intermediate is it's 
> slightly less annoying to write combines for. But there are limited combining 
> opportunities for these

we now legalize to intrinsics directly. The SDAG lowering uses a new helper to 
unroll vector cases while also handling convergence tokens

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Or drop the new nodes altogether and legelaize to intrinsics directly ?

That's another option. The only real plus to the intermediate is it's slightly 
less annoying to write combines for. But there are limited combining 
opportunities for these 



https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

arsenm wrote:

I'd still test all 3, but yes an IR test 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > > > @jayfoad's testcase fails and the same test should be repeated for all 
> > > > 3 intrinsics
> > > 
> > > 
> > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach 
> > > the glue nodes to newly created laneop pieces since they fail at 
> > > selection. #87509 should enable this,
> > 
> > 
> > I am not really comfortable waiting for #87509 to fix convergence tokens in 
> > this expansion. Is it really true that this expansion cannot be fixed 
> > independent of future work on `CONVERGENCE_GLUE`? There is no way to 
> > manually handle the same glue operands??
> 
> I guess one way would be to have custom selection for each of the new node 
> type introduced, but would this be a proper way forward ? (this would be in 
> general for all convergent SDNodes i guess if selection is not made generic)

Or drop the new nodes altogether and legelaize to intrinsics directly ?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > > @jayfoad's testcase fails and the same test should be repeated for all 3 
> > > intrinsics
> > 
> > 
> > added MIR tests for 3 intrinsics. The issue is that Im not able to attach 
> > the glue nodes to newly created laneop pieces since they fail at selection. 
> > #87509 should enable this,
> 
> I am not really comfortable waiting for #87509 to fix convergence tokens in 
> this expansion. Is it really true that this expansion cannot be fixed 
> independent of future work on `CONVERGENCE_GLUE`? There is no way to manually 
> handle the same glue operands??

I guess one way would be to have custom selection for each of the new node type 
introduced, but would this be a proper way forward ? (this would be in general 
for all convergent SDNodes i guess if selection is not made generic)

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Sameer Sahasrabuddhe via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

ssahasra wrote:

All it needs is one new file in `test/CodeGen/AMDGPU` where 64-bit lane ops are 
used with a convergence tokens. Mark that as XFAIL. When the issue is fixed, 
that file can be merged into the existing tests. We don't need to test each of 
the convergence control intrinsics. It's enough to just have a token on a 
readlane.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Sameer Sahasrabuddhe via cfe-commits

ssahasra wrote:

> > @jayfoad's testcase fails and the same test should be repeated for all 3 
> > intrinsics
> 
> added MIR tests for 3 intrinsics. The issue is that Im not able to attach the 
> glue nodes to newly created laneop pieces since they fail at selection. 
> #87509 should enable this,

I am not really comfortable waiting for #87509 to fix convergence tokens in 
this expansion. Is it really true that this expansion cannot be fixed 
independent of future work on `CONVERGENCE_GLUE`? There is no way to manually 
handle the same glue operands??

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

vikramRH wrote:

Okay, I'll update with IR's

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

arsenm wrote:

You should not need to introduce any new machine verifier tests, they are not 
useful. The useful test would be the IR that produces the verifier error, which 
ideally would be fixed 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> @jayfoad's testcase fails and the same test should be repeated for all 3 
> intrinsics

added MIR tests for 3 intrinsics. The issue is that Im not able to attach the 
glue nodes to newly created laneop pieces since they fail at selection. 
https://github.com/llvm/llvm-project/pull/87509 should enable this,

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,19 @@
+; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 
-verify-machineinstrs -o - %s 2>&1 | FileCheck %s

arsenm wrote:

This should also be repeated for all 3 intrinsics 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request.

@jayfoad's testcase fails and the same test should be repeated for all 3 
intrinsics 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,19 @@
+; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 
-verify-machineinstrs -o - %s 2>&1 | FileCheck %s

arsenm wrote:

This is not an IR verifier test, it is a codegen test that fails the machine 
verifier. A machine verifier would go in test/MachineVerifier, and preferably 
would be written in MIR and not rely on codegen.


I thought the point of this test was to show that the selection did not produce 
a machine verifier error after selection, so this is broken 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-03 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> You should add the mentioned convergence-tokens.ll test function

Added the test in a separate test file

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote:

You should add the mentioned convergence-tokens.ll test function 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Vikram Hegde via cfe-commits


@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits

jayfoad wrote:

There is a latent problem to do with convergence. If you add a new test case 
like this:
```diff
diff --git a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll 
b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
index 238f6ab39e83..22995083293d 100644
--- a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
@@ -55,6 +55,21 @@ else:
   ret i32 %p
 }
 
+define i64 @basic_branch_i64(i64 %src, i1 %cond) #0 {
+entry:
+  %t = call token @llvm.experimental.convergence.anchor()
+  %x = add i64 %src, 1
+  br i1 %cond, label %then, label %else
+
+then:
+  %r = call i64 @llvm.amdgcn.readfirstlane.i64(i64 %x) [ 
"convergencectrl"(token %t) ]
+  br label %else
+
+else:
+  %p = phi i64 [%r, %then], [%x, %entry]
+  ret i64 %p
+}
+
 ; CHECK-LABEL: name:basic_loop
 ;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR
 ;   CHECK:  bb.[[#]].loop:
```
Then it will fail with:
```
*** Bad machine code: Cannot mix controlled and uncontrolled convergence in the 
same function. ***
```
This is related to #87509. Since the readlane/readfirstlane/writelane 
intrinsics are IntrConvergent, the corresponding ISD nodes should be marked 
with SDNPInGlue or SDNPOptInGlue. @ssahasra FYI

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Does this need IR autoupgrade?

This type of auto upgrade is free, it just happens 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits

https://github.com/jayfoad commented:

Does this need IR autoupgrade?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits


@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)
+  NODE_NAME_CASE(WRITELANE)

jayfoad wrote:

Add this to `SITargetLowering::isSDNodeSourceOfDivergence`

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits


@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)

jayfoad wrote:

Add these to `AMDGPUTargetLowering::isSDNodeAlwaysUniform`

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper 
,
 
   SmallVector PartialRes;
   unsigned NumParts = Size / 32;
-  MachineInstrBuilder Src0Parts, Src2Parts;
-  Src0Parts = B.buildUnmerge(PartialResTy, Src0);
+  MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), 
Src2Parts;

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits


@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper 
,
 
   SmallVector PartialRes;
   unsigned NumParts = Size / 32;
-  MachineInstrBuilder Src0Parts, Src2Parts;
-  Src0Parts = B.buildUnmerge(PartialResTy, Src0);
+  MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), 
Src2Parts;

arsenm wrote:

Don't also declare Src2Parts on the same line 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
   :ref:`llvm.set.fpenv` Sets the floating point 
environment to the specifies state.
 
+  llvm.amdgcn.readfirstlaneProvides direct access to 
v_readfirstlane_b32. Returns the value in
+   the lowest active lane of 
the input operand. Currently implemented
+   for i16, i32, float, half, 
bf16, <2 x i16>, <2 x half>, <2 x bfloat>,

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits


@@ -1208,7 +1225,7 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
the output.
 
   llvm.amdgcn.sdot2Provides direct access to 
v_dot2_i32_i16 across targets which
-   support such instructions. 
This performs signed dot product
+   upport such instructions. 
This performs signed dot product

arsenm wrote:

Stray typo introduced 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,98 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+// TODO: Fix pointer type handling
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  auto createLaneOp = [, ](Register Src0, Register Src1, Register Src2,
+ LLT VT) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {VT}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Src0 = B.buildAnyExt(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2, S32);
+B.buildTrunc(DstReg, LaneOpDst);
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size % 32 != 0)
+return false;
+
+  LLT PartialResTy = S32;
+  if (Ty.isVector()) {
+LLT EltTy = Ty.getElementType();
+switch (EltTy.getSizeInBits()) {
+case 16:
+  PartialResTy = Ty.changeElementCount(ElementCount::getFixed(2));
+  break;
+case 32:
+  PartialResTy = EltTy;
+  break;
+default:
+  // Handle all other cases via S32 pieces;
+  break;
+}
+  }
+
+  SmallVector PartialRes;
+  unsigned NumParts = Size / 32;
+  MachineInstrBuilder Src0Parts, Src2Parts;
+  Src0Parts = B.buildUnmerge(PartialResTy, Src0);

arsenm wrote:

Fold declare + initialize 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
   :ref:`llvm.set.fpenv` Sets the floating point 
environment to the specifies state.
 
+  llvm.amdgcn.readfirstlaneProvides direct access to 
v_readfirstlane_b32. Returns the value in
+   the lowest active lane of 
the input operand. Currently implemented
+   for i16, i32, float, half, 
bf16, <2 x i16>, <2 x half>, <2 x bfloat>,

arsenm wrote:

bf16->bfloat 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

lgtm with a few nits 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

vikramRH wrote:

Updated

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
   :ref:`llvm.set.fpenv` Sets the floating point 
environment to the specifies state.
 
+  llvm.amdgcn.readfirstlaneProvides direct access to 
v_readfirstlane_b32. Returns the value in
+   the lowest active lane of 
the input operand. Currently 
+   implemented for i16, i32, 
float, half, bf16, v2i16, v2f16 and types 
+   whose sizes are multiples 
of 32-bit.
+
+  llvm.amdgcn.readlane Provides direct access to 
v_readlane_b32. Returns the value in the 
+   specified lane of the first 
input operand. The second operand 
+   specifies the lane to read 
from. Currently implemented
+   for i16, i32, float, half, 
bf16, v2i16, v2f16 and types whose sizes

vikramRH wrote:

Updated

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -5387,6 +5387,124 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+// TODO: Fix pointer type handling
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Src0 = B.buildAnyExt(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+B.buildTrunc(DstReg, LaneOpDst);
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+LLT PartialResTy =
+Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32;

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

vikramRH wrote:

Understood. Thanks !

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

arsenm wrote:

Yes, you need bitcast to get from FP to the 32-bit scalars. you are only trying 
to preserve the element types of the 32-bit legal pieces. i.e. v4f16 -> v2f16, 
v2f16. v4f32 -> f32, f32, f32, f32. v2f64 -> bitcast v4i32 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);
+Src0 = DAG.getBitcast(VecVT, Src0);
+
+if (Src2.getNode())
+  Src2 = DAG.getBitcast(VecVT, Src2);
+
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT);
+SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());
+return DAG.getBitcast(VT, UnrolledLaneOp);

arsenm wrote:

You should not be using TLI.getRegClassFor or getTargetExtractSubreg. The 
GlobalISel equivalent does not use these. Stick to the higher level operations 
unless necessary, which it isn't here 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,124 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+// TODO: Fix pointer type handling
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Src0 = B.buildAnyExt(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+B.buildTrunc(DstReg, LaneOpDst);
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+LLT PartialResTy =
+Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32;

arsenm wrote:

This isn't trying to preserve pointer element types, and will also lose FP 
information in the future. Can you use changeElementCount in the size == 16 
case, and try to maintain the element type for the 32-bit case? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
   :ref:`llvm.set.fpenv` Sets the floating point 
environment to the specifies state.
 
+  llvm.amdgcn.readfirstlaneProvides direct access to 
v_readfirstlane_b32. Returns the value in
+   the lowest active lane of 
the input operand. Currently 
+   implemented for i16, i32, 
float, half, bf16, v2i16, v2f16 and types 
+   whose sizes are multiples 
of 32-bit.
+
+  llvm.amdgcn.readlane Provides direct access to 
v_readlane_b32. Returns the value in the 
+   specified lane of the first 
input operand. The second operand 
+   specifies the lane to read 
from. Currently implemented
+   for i16, i32, float, half, 
bf16, v2i16, v2f16 and types whose sizes

arsenm wrote:

Should probably try to avoid repeating all the same information on all 3 
intrinsics 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

arsenm wrote:

Same comment, but more important than the globalisel case. Try to maintain the 
element type (e.g. v2f32->f32) 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,124 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+// TODO: Fix pointer type handling
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Src0 = B.buildAnyExt(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+B.buildTrunc(DstReg, LaneOpDst);
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {

arsenm wrote:

Early return instead of indenting everything on this.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Matt Arsenault via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
   :ref:`llvm.set.fpenv` Sets the floating point 
environment to the specifies state.
 
+  llvm.amdgcn.readfirstlaneProvides direct access to 
v_readfirstlane_b32. Returns the value in
+   the lowest active lane of 
the input operand. Currently 
+   implemented for i16, i32, 
float, half, bf16, v2i16, v2f16 and types 

arsenm wrote:

This mixes type naming conventions. Probably should stick to the IR convention 
(and say `bfloat`, `<2 x i16>`, `<2 x half>`, `<2 x bfloat>`. Also, pointers 
double, and multiples of the 32-bit vectors should work.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-28 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);
+Src0 = DAG.getBitcast(VecVT, Src0);
+
+if (Src2.getNode())
+  Src2 = DAG.getBitcast(VecVT, Src2);
+
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT);
+SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());
+return DAG.getBitcast(VT, UnrolledLaneOp);

vikramRH wrote:

```suggestion
  MVT LaneOpT =
VT.isVector() && VT.getVectorElementType().getSizeInBits() == 16
? MVT::v2i16
: MVT::i32;
SDValue Src0SubReg, Src2SubReg;
SmallVector LaneOps;
LaneOps.push_back(DAG.getTargetConstant(
TLI.getRegClassFor(VT.getSimpleVT(), N->isDivergent())->getID(), SL,
MVT::i32));
for (unsigned i = 0; i < (ValSize / 32); i++) {
  unsigned SubRegIdx = SIRegisterInfo::getSubRegFromChannel(i);
  Src0SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src0);
  if (Src2)
Src2SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src2);
  LaneOps.push_back(createLaneOp(Src0SubReg, Src1, Src2SubReg, LaneOpT));
  LaneOps.push_back(DAG.getTargetConstant(SubRegIdx, SL, MVT::i32));
}
return SDValue(
DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, SL, VT, LaneOps), 0);
```

@arsenm , @jayfoad , an alternate idea here that is much closer in logic to the 
GIsel implementation and doesn't rely on bitcasts. how does this look ? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-26 Thread Vikram Hegde via cfe-commits


@@ -5456,43 +5444,32 @@ bool 
AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
   if ((Size % 32) == 0) {
 SmallVector PartialRes;
 unsigned NumParts = Size / 32;
-auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits