[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR vikramRH wrote: Makes sense, updated. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR arsenm wrote: My worry is this test won't get updated whenever that's fixed. How about you leave the existing checks, but add an additional "not --crash" run line that checks the verifier error? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR vikramRH wrote: this is a preexisting error, and the failure is further down the pipeline. (after sreg alloc now i guess), does it make sense to have it as xfail now rather then stopping after isel? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR arsenm wrote: I thought you switched to using the intrinsics such that this would avoid the verifier error. This test is going to fail if you are still hitting the error https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR vikramRH wrote: I currently see machine verifier failure which is not related to this patch. An i32 example with trunc here, https://godbolt.org/z/he8asMe77. This is also seen with wider type legalizations that we do now, so I cannot integrate these with existing tests just yet. am I missing something here ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, if (ValSize % 32 != 0) return SDValue(); + auto unrollLaneOp = [, ](SDNode *N) -> SDValue { +EVT VT = N->getValueType(0); +unsigned NE = VT.getVectorNumElements(); +EVT EltVT = VT.getVectorElementType(); +SmallVector Scalars; +unsigned NumOperands = N->getNumOperands(); +SmallVector Operands(NumOperands); +SDNode *GL = N->getGluedNode(); + +if (GL) { + // only handle convegrencectrl_glue + assert(GL->getOpcode() == ISD::CONVERGENCECTRL_GLUE); +} arsenm wrote: Typo convegrencectrl_glue. Also assert(!GL || ...) https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR arsenm wrote: We don't need these isel checks for the tokens if the IR verifier is going to catch the breakage. This test can jus be merged with the existing intrinsic tests https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: > That's another option. The only real plus to the intermediate is it's > slightly less annoying to write combines for. But there are limited combining > opportunities for these we now legalize to intrinsics directly. The SDAG lowering uses a new helper to unroll vector cases while also handling convergence tokens https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
arsenm wrote: > Or drop the new nodes altogether and legelaize to intrinsics directly ? That's another option. The only real plus to the intermediate is it's slightly less annoying to write combines for. But there are limited combining opportunities for these https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: I'd still test all 3, but yes an IR test https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: > > > > @jayfoad's testcase fails and the same test should be repeated for all > > > > 3 intrinsics > > > > > > > > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach > > > the glue nodes to newly created laneop pieces since they fail at > > > selection. #87509 should enable this, > > > > > > I am not really comfortable waiting for #87509 to fix convergence tokens in > > this expansion. Is it really true that this expansion cannot be fixed > > independent of future work on `CONVERGENCE_GLUE`? There is no way to > > manually handle the same glue operands?? > > I guess one way would be to have custom selection for each of the new node > type introduced, but would this be a proper way forward ? (this would be in > general for all convergent SDNodes i guess if selection is not made generic) Or drop the new nodes altogether and legelaize to intrinsics directly ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: > > > @jayfoad's testcase fails and the same test should be repeated for all 3 > > > intrinsics > > > > > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach > > the glue nodes to newly created laneop pieces since they fail at selection. > > #87509 should enable this, > > I am not really comfortable waiting for #87509 to fix convergence tokens in > this expansion. Is it really true that this expansion cannot be fixed > independent of future work on `CONVERGENCE_GLUE`? There is no way to manually > handle the same glue operands?? I guess one way would be to have custom selection for each of the new node type introduced, but would this be a proper way forward ? (this would be in general for all convergent SDNodes i guess if selection is not made generic) https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s ssahasra wrote: All it needs is one new file in `test/CodeGen/AMDGPU` where 64-bit lane ops are used with a convergence tokens. Mark that as XFAIL. When the issue is fixed, that file can be merged into the existing tests. We don't need to test each of the convergence control intrinsics. It's enough to just have a token on a readlane. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
ssahasra wrote: > > @jayfoad's testcase fails and the same test should be repeated for all 3 > > intrinsics > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach the > glue nodes to newly created laneop pieces since they fail at selection. > #87509 should enable this, I am not really comfortable waiting for #87509 to fix convergence tokens in this expansion. Is it really true that this expansion cannot be fixed independent of future work on `CONVERGENCE_GLUE`? There is no way to manually handle the same glue operands?? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s vikramRH wrote: Okay, I'll update with IR's https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: You should not need to introduce any new machine verifier tests, they are not useful. The useful test would be the IR that produces the verifier error, which ideally would be fixed https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: > @jayfoad's testcase fails and the same test should be repeated for all 3 > intrinsics added MIR tests for 3 intrinsics. The issue is that Im not able to attach the glue nodes to newly created laneop pieces since they fail at selection. https://github.com/llvm/llvm-project/pull/87509 should enable this, https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This should also be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm requested changes to this pull request. @jayfoad's testcase fails and the same test should be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This is not an IR verifier test, it is a codegen test that fails the machine verifier. A machine verifier would go in test/MachineVerifier, and preferably would be written in MIR and not rely on codegen. I thought the point of this test was to show that the selection did not produce a machine verifier error after selection, so this is broken https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: > You should add the mentioned convergence-tokens.ll test function Added the test in a separate test file https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
arsenm wrote: You should add the mentioned convergence-tokens.ll test function https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const { NODE_NAME_CASE(LDS) NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD) NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD) + NODE_NAME_CASE(READLANE) + NODE_NAME_CASE(READFIRSTLANE) vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
jayfoad wrote: There is a latent problem to do with convergence. If you add a new test case like this: ```diff diff --git a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll index 238f6ab39e83..22995083293d 100644 --- a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll +++ b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll @@ -55,6 +55,21 @@ else: ret i32 %p } +define i64 @basic_branch_i64(i64 %src, i1 %cond) #0 { +entry: + %t = call token @llvm.experimental.convergence.anchor() + %x = add i64 %src, 1 + br i1 %cond, label %then, label %else + +then: + %r = call i64 @llvm.amdgcn.readfirstlane.i64(i64 %x) [ "convergencectrl"(token %t) ] + br label %else + +else: + %p = phi i64 [%r, %then], [%x, %entry] + ret i64 %p +} + ; CHECK-LABEL: name:basic_loop ; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR ; CHECK: bb.[[#]].loop: ``` Then it will fail with: ``` *** Bad machine code: Cannot mix controlled and uncontrolled convergence in the same function. *** ``` This is related to #87509. Since the readlane/readfirstlane/writelane intrinsics are IntrConvergent, the corresponding ISD nodes should be marked with SDNPInGlue or SDNPOptInGlue. @ssahasra FYI https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
arsenm wrote: > Does this need IR autoupgrade? This type of auto upgrade is free, it just happens https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad commented: Does this need IR autoupgrade? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const { NODE_NAME_CASE(LDS) NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD) NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD) + NODE_NAME_CASE(READLANE) + NODE_NAME_CASE(READFIRSTLANE) + NODE_NAME_CASE(WRITELANE) jayfoad wrote: Add this to `SITargetLowering::isSDNodeSourceOfDivergence` https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const { NODE_NAME_CASE(LDS) NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD) NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD) + NODE_NAME_CASE(READLANE) + NODE_NAME_CASE(READFIRSTLANE) jayfoad wrote: Add these to `AMDGPUTargetLowering::isSDNodeAlwaysUniform` https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , SmallVector PartialRes; unsigned NumParts = Size / 32; - MachineInstrBuilder Src0Parts, Src2Parts; - Src0Parts = B.buildUnmerge(PartialResTy, Src0); + MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), Src2Parts; vikramRH wrote: Done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , SmallVector PartialRes; unsigned NumParts = Size / 32; - MachineInstrBuilder Src0Parts, Src2Parts; - Src0Parts = B.buildUnmerge(PartialResTy, Src0); + MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), Src2Parts; arsenm wrote: Don't also declare Src2Parts on the same line https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently implemented + for i16, i32, float, half, bf16, <2 x i16>, <2 x half>, <2 x bfloat>, vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1208,7 +1225,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics. the output. llvm.amdgcn.sdot2Provides direct access to v_dot2_i32_i16 across targets which - support such instructions. This performs signed dot product + upport such instructions. This performs signed dot product arsenm wrote: Stray typo introduced https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,98 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +// TODO: Fix pointer type handling +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + auto createLaneOp = [, ](Register Src0, Register Src1, Register Src2, + LLT VT) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {VT}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Src0 = B.buildAnyExt(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2, S32); +B.buildTrunc(DstReg, LaneOpDst); + +MI.eraseFromParent(); +return true; + } + + if (Size % 32 != 0) +return false; + + LLT PartialResTy = S32; + if (Ty.isVector()) { +LLT EltTy = Ty.getElementType(); +switch (EltTy.getSizeInBits()) { +case 16: + PartialResTy = Ty.changeElementCount(ElementCount::getFixed(2)); + break; +case 32: + PartialResTy = EltTy; + break; +default: + // Handle all other cases via S32 pieces; + break; +} + } + + SmallVector PartialRes; + unsigned NumParts = Size / 32; + MachineInstrBuilder Src0Parts, Src2Parts; + Src0Parts = B.buildUnmerge(PartialResTy, Src0); arsenm wrote: Fold declare + initialize https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently implemented + for i16, i32, float, half, bf16, <2 x i16>, <2 x half>, <2 x bfloat>, arsenm wrote: bf16->bfloat https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm commented: lgtm with a few nits https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); vikramRH wrote: Updated https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently + implemented for i16, i32, float, half, bf16, v2i16, v2f16 and types + whose sizes are multiples of 32-bit. + + llvm.amdgcn.readlane Provides direct access to v_readlane_b32. Returns the value in the + specified lane of the first input operand. The second operand + specifies the lane to read from. Currently implemented + for i16, i32, float, half, bf16, v2i16, v2f16 and types whose sizes vikramRH wrote: Updated https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,124 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +// TODO: Fix pointer type handling +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Src0 = B.buildAnyExt(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2); +B.buildTrunc(DstReg, LaneOpDst); + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +LLT PartialResTy = +Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32; vikramRH wrote: Done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); vikramRH wrote: Understood. Thanks ! https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); arsenm wrote: Yes, you need bitcast to get from FP to the 32-bit scalars. you are only trying to preserve the element types of the 32-bit legal pieces. i.e. v4f16 -> v2f16, v2f16. v4f32 -> f32, f32, f32, f32. v2f64 -> bitcast v4i32 https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); +Src0 = DAG.getBitcast(VecVT, Src0); + +if (Src2.getNode()) + Src2 = DAG.getBitcast(VecVT, Src2); + +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT); +SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode()); +return DAG.getBitcast(VT, UnrolledLaneOp); arsenm wrote: You should not be using TLI.getRegClassFor or getTargetExtractSubreg. The GlobalISel equivalent does not use these. Stick to the higher level operations unless necessary, which it isn't here https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,124 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +// TODO: Fix pointer type handling +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Src0 = B.buildAnyExt(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2); +B.buildTrunc(DstReg, LaneOpDst); + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +LLT PartialResTy = +Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32; arsenm wrote: This isn't trying to preserve pointer element types, and will also lose FP information in the future. Can you use changeElementCount in the size == 16 case, and try to maintain the element type for the 32-bit case? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently + implemented for i16, i32, float, half, bf16, v2i16, v2f16 and types + whose sizes are multiples of 32-bit. + + llvm.amdgcn.readlane Provides direct access to v_readlane_b32. Returns the value in the + specified lane of the first input operand. The second operand + specifies the lane to read from. Currently implemented + for i16, i32, float, half, bf16, v2i16, v2f16 and types whose sizes arsenm wrote: Should probably try to avoid repeating all the same information on all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); arsenm wrote: Same comment, but more important than the globalisel case. Try to maintain the element type (e.g. v2f32->f32) https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,124 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +// TODO: Fix pointer type handling +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Src0 = B.buildAnyExt(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2); +B.buildTrunc(DstReg, LaneOpDst); + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { arsenm wrote: Early return instead of indenting everything on this. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently + implemented for i16, i32, float, half, bf16, v2i16, v2f16 and types arsenm wrote: This mixes type naming conventions. Probably should stick to the IR convention (and say `bfloat`, `<2 x i16>`, `<2 x half>`, `<2 x bfloat>`. Also, pointers double, and multiples of the 32-bit vectors should work. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); +Src0 = DAG.getBitcast(VecVT, Src0); + +if (Src2.getNode()) + Src2 = DAG.getBitcast(VecVT, Src2); + +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT); +SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode()); +return DAG.getBitcast(VT, UnrolledLaneOp); vikramRH wrote: ```suggestion MVT LaneOpT = VT.isVector() && VT.getVectorElementType().getSizeInBits() == 16 ? MVT::v2i16 : MVT::i32; SDValue Src0SubReg, Src2SubReg; SmallVector LaneOps; LaneOps.push_back(DAG.getTargetConstant( TLI.getRegClassFor(VT.getSimpleVT(), N->isDivergent())->getID(), SL, MVT::i32)); for (unsigned i = 0; i < (ValSize / 32); i++) { unsigned SubRegIdx = SIRegisterInfo::getSubRegFromChannel(i); Src0SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src0); if (Src2) Src2SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src2); LaneOps.push_back(createLaneOp(Src0SubReg, Src1, Src2SubReg, LaneOpT)); LaneOps.push_back(DAG.getTargetConstant(SubRegIdx, SL, MVT::i32)); } return SDValue( DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, SL, VT, LaneOps), 0); ``` @arsenm , @jayfoad , an alternate idea here that is much closer in logic to the GIsel implementation and doesn't rely on bitcasts. how does this look ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5456,43 +5444,32 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , if ((Size % 32) == 0) { SmallVector PartialRes; unsigned NumParts = Size / 32; -auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16; +bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16; vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits