[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-27 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

closing this in favour of https://github.com/llvm/llvm-project/pull/96933 and 
https://github.com/llvm/llvm-project/pull/96934

https://github.com/llvm/llvm-project/pull/96473
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-27 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/96473


[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-26 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Apologies for the commit spam here; Graphite seems like a good option from now on. However, all dependent patches have landed now, and the diff here is up to date.

https://github.com/llvm/llvm-project/pull/96473


[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-26 Thread Vikram Hegde via cfe-commits


@@ -228,10 +228,11 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
 
   // If the value operand is divergent, each lane is contributing a different
   // value to the atomic calculation. We can only optimize divergent values if
-  // we have DPP available on our subtarget, and the atomic operation is 32
-  // bits.
+  // we have DPP available on our subtarget, and the atomic operation is either
+  // 32 or 64 bits.
   if (ValDivergent &&
-  (!ST->hasDPP() || DL->getTypeSizeInBits(I.getType()) != 32)) {
+  (!ST->hasDPP() || (DL->getTypeSizeInBits(I.getType()) != 32 &&
+  DL->getTypeSizeInBits(I.getType()) != 64))) {

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/96473
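The updated guard can be modeled as a small standalone predicate. This is an illustrative sketch only; the real check lives in `AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst` and queries `ST->hasDPP()` and `DL->getTypeSizeInBits(I.getType())`, and the function name below is invented for the example:

```cpp
#include <cassert>

// Illustrative model of the updated guard, not the actual LLVM pass code:
// a divergent value operand can only be optimized when the subtarget has
// DPP and the atomic operates on a 32- or 64-bit type.
bool canOptimizeDivergentValue(bool hasDPP, unsigned typeSizeInBits) {
  if (!hasDPP)
    return false; // no cross-lane DPP operations on this subtarget
  return typeSizeInBits == 32 || typeSizeInBits == 64;
}
```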


[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-25 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-25 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-24 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/96473


[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-23 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH ready_for_review 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-23 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)

2024-06-18 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> Hello @vikramRH, please feel free to commandeer this.

Thanks @yuanfang-chen. Also, clang already rejects expressions like [0] (https://godbolt.org/z/eGcxzGo66), which is also true with constexprs and this PR. What's the specific concern here?

https://github.com/llvm/llvm-project/pull/72607


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-17 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Updated this PR to be in sync with #89217. However, the plan is still to land this only after the changes in #89217 are accepted.

https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-17 Thread Vikram Hegde via cfe-commits


@@ -18479,6 +18479,28 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
 return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));

vikramRH wrote:

added a new helper

https://github.com/llvm/llvm-project/pull/92725


[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)

2024-06-17 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

@yuanfang-chen, @AaronBallman, @shafik, are we still actively looking into this? (I would be willing to commandeer this if it's not high on your priority list.)

https://github.com/llvm/llvm-project/pull/72607


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-16 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

Makes sense, updated.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

This is a preexisting error, and the failure is further down the pipeline (after sreg alloc now, I guess). Does it make sense to mark it as XFAIL now rather than stopping after ISel?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

I currently see a machine verifier failure which is not related to this patch; an i32 example with trunc is here: https://godbolt.org/z/he8asMe77.
This is also seen with the wider type legalizations that we do now, so I cannot integrate these with the existing tests just yet. Am I missing something here?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> That's another option. The only real plus to the intermediate is it's 
> slightly less annoying to write combines for. But there are limited combining 
> opportunities for these

We now legalize to intrinsics directly. The SDAG lowering uses a new helper to unroll vector cases while also handling convergence tokens.

https://github.com/llvm/llvm-project/pull/89217
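The unrolling shape described above can be sketched in plain C++, with a vector of 32-bit integers standing in for SDAG values. Names are illustrative, and the convergence-token plumbing the real helper handles is omitted:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Sketch: a lane op defined only on 32-bit elements is applied to a wider
// vector value by running it once per element and recombining the results.
// This mirrors what unrolling a vector lane op into scalar pieces achieves.
std::vector<uint32_t>
unrollLaneOp(const std::vector<uint32_t> &elements,
             const std::function<uint32_t(uint32_t)> &laneOp32) {
  std::vector<uint32_t> result;
  result.reserve(elements.size());
  for (uint32_t element : elements)
    result.push_back(laneOp32(element)); // one 32-bit lane op per element
  return result;
}
```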


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > > > @jayfoad's testcase fails and the same test should be repeated for all 
> > > > 3 intrinsics
> > > 
> > > 
> > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach 
> > > the glue nodes to newly created laneop pieces since they fail at 
> > > selection. #87509 should enable this,
> > 
> > 
> > I am not really comfortable waiting for #87509 to fix convergence tokens in 
> > this expansion. Is it really true that this expansion cannot be fixed 
> > independent of future work on `CONVERGENCE_GLUE`? There is no way to 
> > manually handle the same glue operands??
> 
> I guess one way would be to have custom selection for each of the new node 
> type introduced, but would this be a proper way forward ? (this would be in 
> general for all convergent SDNodes i guess if selection is not made generic)

Or drop the new nodes altogether and legalize to intrinsics directly?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > > @jayfoad's testcase fails and the same test should be repeated for all 3 
> > > intrinsics
> > 
> > 
> > added MIR tests for 3 intrinsics. The issue is that Im not able to attach 
> > the glue nodes to newly created laneop pieces since they fail at selection. 
> > #87509 should enable this,
> 
> I am not really comfortable waiting for #87509 to fix convergence tokens in 
> this expansion. Is it really true that this expansion cannot be fixed 
> independent of future work on `CONVERGENCE_GLUE`? There is no way to manually 
> handle the same glue operands??

I guess one way would be to have custom selection for each of the new node types introduced, but would this be a proper way forward? (This would apply in general to all convergent SDNodes, I guess, if selection is not made generic.)

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

vikramRH wrote:

Okay, I'll update the tests to use IR.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> @jayfoad's testcase fails and the same test should be repeated for all 3 
> intrinsics

Added MIR tests for the 3 intrinsics. The issue is that I'm not able to attach the glue nodes to the newly created lane-op pieces, since they fail at selection. https://github.com/llvm/llvm-project/pull/87509 should enable this.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-03 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> You should add the mentioned convergence-tokens.ll test function

Added the test in a separate test file

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Vikram Hegde via cfe-commits


@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
 
  SmallVector<Register, 2> PartialRes;
   unsigned NumParts = Size / 32;
-  MachineInstrBuilder Src0Parts, Src2Parts;
-  Src0Parts = B.buildUnmerge(PartialResTy, Src0);
+  MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), 
Src2Parts;

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
  :ref:`llvm.set.fpenv` Sets the floating point environment to the specified state.
 
+  llvm.amdgcn.readfirstlane    Provides direct access to v_readfirstlane_b32. Returns the value in
+                               the lowest active lane of the input operand. Currently implemented
+                               for i16, i32, float, half, bf16, <2 x i16>, <2 x half>, <2 x bfloat>,

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

vikramRH wrote:

Updated

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
 
  :ref:`llvm.set.fpenv` Sets the floating point environment to the specified state.

+  llvm.amdgcn.readfirstlane    Provides direct access to v_readfirstlane_b32. Returns the value in
+                               the lowest active lane of the input operand. Currently
+                               implemented for i16, i32, float, half, bf16, v2i16, v2f16 and types
+                               whose sizes are multiples of 32-bit.
+
+  llvm.amdgcn.readlane         Provides direct access to v_readlane_b32. Returns the value in the
+                               specified lane of the first input operand. The second operand
+                               specifies the lane to read from. Currently implemented
+                               for i16, i32, float, half, bf16, v2i16, v2f16 and types whose sizes

vikramRH wrote:

Updated

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-30 Thread Vikram Hegde via cfe-commits


@@ -5387,6 +5387,124 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+// TODO: Fix pointer type handling
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Src0 = B.buildAnyExt(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+B.buildTrunc(DstReg, LaneOpDst);
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+unsigned NumParts = Size / 32;
+LLT PartialResTy =
+Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32;

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);

vikramRH wrote:

Understood. Thanks!

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-29 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-29 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

1. Added/updated tests for permlanex16, permlane64
2. This needs https://github.com/llvm/llvm-project/pull/89217 to land first so 
that only incremental changes can be reviewed. 

https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-29 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-28 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+bool IsFloat = VT.isFloatingPoint();
+Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+SL, MVT::i32);
+if (Src2.getNode()) {
+  Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+  SL, MVT::i32);
+}
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);
+Src0 = DAG.getBitcast(VecVT, Src0);
+
+if (Src2.getNode())
+  Src2 = DAG.getBitcast(VecVT, Src2);
+
+SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT);
+SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());
+return DAG.getBitcast(VT, UnrolledLaneOp);

vikramRH wrote:

```suggestion
  MVT LaneOpT =
VT.isVector() && VT.getVectorElementType().getSizeInBits() == 16
? MVT::v2i16
: MVT::i32;
SDValue Src0SubReg, Src2SubReg;
SmallVector<SDValue> LaneOps;
LaneOps.push_back(DAG.getTargetConstant(
TLI.getRegClassFor(VT.getSimpleVT(), N->isDivergent())->getID(), SL,
MVT::i32));
for (unsigned i = 0; i < (ValSize / 32); i++) {
  unsigned SubRegIdx = SIRegisterInfo::getSubRegFromChannel(i);
  Src0SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src0);
  if (Src2)
Src2SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src2);
  LaneOps.push_back(createLaneOp(Src0SubReg, Src1, Src2SubReg, LaneOpT));
  LaneOps.push_back(DAG.getTargetConstant(SubRegIdx, SL, MVT::i32));
}
return SDValue(
DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, SL, VT, LaneOps), 0);
```

@arsenm, @jayfoad, here is an alternate idea that is much closer in logic to the GISel implementation and doesn't rely on bitcasts. How does this look?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-27 Thread Vikram Hegde via cfe-commits


@@ -5433,7 +5450,16 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
 ? Src0
 : B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
 Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
-if (Src2.isValid()) {
+
+if (IsPermLane16) {
+  Register Src1Cast =
+  MRI.getType(Src1).isScalar()
+  ? Src1
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);

vikramRH wrote:

Yes, I will take over the changes from https://github.com/llvm/llvm-project/pull/89217 once they are finalized.

https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-27 Thread Vikram Hegde via cfe-commits


@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
 return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+Intrinsic::ID IID;
+IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16
+  ? Intrinsic::amdgcn_permlane16
+  : Intrinsic::amdgcn_permlanex16;
+
+llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
+llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
+llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));

vikramRH wrote:

yes

https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-26 Thread Vikram Hegde via cfe-commits


@@ -5456,43 +5444,32 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
   if ((Size % 32) == 0) {
 SmallVector<Register, 2> PartialRes;
 unsigned NumParts = Size / 32;
-auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > 1. What's the proper way to legalize f16 and bf16 for SDAG case without 
> > bitcasts ? (I would think  "fp_extend -> LaneOp -> Fptrunc" is wrong)
> 
> Bitcast to i16, anyext to i32, laneop, trunc to i16, bitcast to original type.
> 
> Why wouldn't you use bitcasts?

Just a doubt I had from the previous comments; sorry for the noise!

https://github.com/llvm/llvm-project/pull/89217
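The recipe quoted above (bitcast to i16, any-ext to i32, lane op, trunc to i16, bitcast back) can be sketched with plain integers standing in for SDAG nodes. Assumptions: the 16-bit input is already the raw bit pattern of the i16/f16/bf16 value, so the initial and final bitcasts are implicit, and the any-ext is modeled as a zero-ext; the function name is invented for the example:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the <32-bit legalization path: extend the 16-bit payload to
// 32 bits, run the lane op at its only legal width, then truncate back.
// Only the low 16 bits of the lane op's result are meaningful.
uint16_t legalize16BitLaneOp(uint16_t bits, uint32_t (*laneOp32)(uint32_t)) {
  uint32_t extended = bits;           // any-ext (modeled as zero-ext) to 32 bits
  uint32_t wide = laneOp32(extended); // the lane op is only legal at 32 bits
  return static_cast<uint16_t>(wide); // trunc back; high bits are dropped
}
```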


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Updated the GISel legalizer. I still have a couple of questions for the SDAG case, though:
1. What's the proper way to legalize f16 and bf16 for the SDAG case without bitcasts? (I would think "fp_extend -> LaneOp -> Fptrunc" is wrong.)
2. For scalar cases such as i64, f64, i128 .. (i.e. 32-bit multiples), I guess bitcasting to vectors (v2i32, v2f32, v4i32) is unavoidable, since "UnrollVectorOp" wouldn't work otherwise. Any alternate suggestions here?

https://github.com/llvm/llvm-project/pull/89217
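For the multiple-of-32-bit scalar case raised in question 2, the split-and-recombine structure can be sketched like this. This is illustrative only; the real lowering works on v2i32/v4i32 bitcasts or GlobalISel unmerge/merge, not on host integers, and the function name is invented:

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a 64-bit value is split into two 32-bit halves, the 32-bit lane
// op is applied to each half independently, and the halves are merged back.
// This mirrors the unmerge -> per-part op -> merge shape of the GlobalISel
// legalizer path for sizes that are multiples of 32 bits.
uint64_t legalize64BitLaneOp(uint64_t value, uint32_t (*laneOp32)(uint32_t)) {
  uint32_t lo = static_cast<uint32_t>(value);       // bits [31:0]
  uint32_t hi = static_cast<uint32_t>(value >> 32); // bits [63:32]
  uint64_t loRes = laneOp32(lo);
  uint64_t hiRes = laneOp32(hi);
  return (hiRes << 32) | loRes; // merge the processed halves back together
}
```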


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits


@@ -5387,6 +5387,192 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDst);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector<Register> PartialRes;
+unsigned NumParts = Size / 32;
+auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+MachineInstrBuilder Src0Parts;
+
+if (Ty.isPointer()) {
+  auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else if (Ty.isPointerVector()) {
+  LLT IntVecTy = Ty.changeElementType(
+  LLT::scalar(Ty.getElementType().getSizeInBits()));
+  auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else
+  Src0Parts =
+  IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0)
+ .addUse(Src1))
+.getReg(0));
+  }
+  break;
+}
+case Intrinsic::amdgcn_readfirstlane: {
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+ .addUse(Src0)
+ .getReg(0)));
+  }
+
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  Register Src2 = MI.getOperand(4).getReg();
+  MachineInstrBuilder Src2Parts;
+
+  if (Ty.isPointer()) {
+auto PtrToInt = B.buildPtrToInt(S64, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else if (Ty.isPointerVector()) {
+LLT IntVecTy = Ty.changeElementType(
+LLT::scalar(Ty.getElementType().getSizeInBits()));
+auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else
+Src2Parts =
+IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2);

vikramRH wrote:

done
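The pointer path in the diff above (ptrtoint, unmerge into 32-bit pieces, a per-piece lane op, then merge and conceptually inttoptr back) follows the same shape. A hedged Python sketch of the bit-level behaviour, with `lane_op_i32` as a hypothetical per-piece lane op:

```python
def lane_op_i32(v32):
    # Hypothetical stand-in for a per-piece 32-bit lane operation.
    return v32 & 0xFFFFFFFF

def lane_op_pointer(ptr_bits, ptr_size_bits=64):
    # ptrtoint: treat the pointer as an integer of the same width.
    assert ptr_size_bits % 32 == 0
    num_pieces = ptr_size_bits // 32
    # Unmerge into 32-bit pieces, low piece first.
    pieces = [(ptr_bits >> (32 * i)) & 0xFFFFFFFF for i in range(num_pieces)]
    # Lane-op each piece, then merge (and conceptually inttoptr back).
    result = 0
    for i, piece in enumerate(pieces):
        result |= lane_op_i32(piece) << (32 * i)
    return result

print(hex(lane_op_pointer(0xDEADBEEFCAFEF00D)))
```

The same loop covers 32-bit address spaces (one piece) and 64-bit ones (two pieces), which is why the legalizer only needs the generic (Size % 32) == 0 path once pointers are converted to integers.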

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Vikram Hegde via cfe-commits


@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+   SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+SDValue InitBitCast = DAG.getBitcast(IntVT, Src0);
+Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+if (Src2.getNode()) {
+  SDValue Src2Cast = DAG.getBitcast(IntVT, Src2);

vikramRH wrote:

What would be the proper way to legalize f16 and bf16 for the SDAG case without 
bitcasts? (I'm currently thinking "fp_extend -> LaneOp -> Fptrunc", which seems 
wrong.)

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/92725


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-18 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-18 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-18 Thread Vikram Hegde via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Do you think these changes are okay until I figure out the root cause?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits


@@ -342,6 +342,22 @@ def AMDGPUfdot2_impl : SDNode<"AMDGPUISD::FDOT2",
 
 def AMDGPUperm_impl : SDNode<"AMDGPUISD::PERM", AMDGPUDTIntTernaryOp, []>;
 
+def AMDGPUReadfirstlaneOp : SDTypeProfile<1, 1, [
+  SDTCisSameAs<0, 1>
+]>;
+
+def AMDGPUReadlaneOp : SDTypeProfile<1, 2, [
+  SDTCisSameAs<0, 1>, SDTCisInt<2>
+]>;
+
+def AMDGPUDWritelaneOp : SDTypeProfile<1, 3, [
+  SDTCisSameAs<1, 1>, SDTCisInt<2>, SDTCisSameAs<0, 3>,

vikramRH wrote:

Thanks for pointing this out; I missed updating it in the latest version. It's 
updated now; however, the issue is not related to this.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Attaching example match-table snippets for v2i16 and p3 here; these should make 
the scenario a bit clearer.
for v2i16
 ```
GIM_Try, /*On fail goto*//*Label 3499*/ GIMT_Encode4(202699), // Rule ID 2117 //
GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, 
GIMT_Encode2(Intrinsic::amdgcn_writelane),
GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_v2s16,
GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_v2s16,
GIM_RootCheckType, /*Op*/3, /*Type*/GILLT_s32,
GIM_RootCheckType, /*Op*/4, /*Type*/GILLT_v2s16,
GIM_RootCheckRegBankForClass, /*Op*/0, 
/*RC*/GIMT_Encode2(AMDGPU::VGPR_32RegClassID),
// (intrinsic_wo_chain:{ *:[v2i16] } 2863:{ *:[iPTR] }, v2i16:{ 
*:[v2i16] }:$src0, i32:{ *:[i32] }:$src1, v2i16:{ *:[v2i16] }:$src2)  =>  
(V_WRITELANE_B32:{ *:[v2i16] } SCSrc_b32:{ *:[v2i16] }:$src0, SCSrc_b32:{ 
*:[i32] }:$src1, VGPR_32:{ *:[v2i16] }:$src2)
GIR_BuildRootMI, /*Opcode*/GIMT_Encode2(AMDGPU::V_WRITELANE_B32),
```

and for p3,
```
GIM_Try, /*On fail goto*//*Label 3502*/ GIMT_Encode4(202816), // Rule ID 2129 //
GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, 
GIMT_Encode2(Intrinsic::amdgcn_writelane),
GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_s32,
GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_p2s32,
GIM_RootCheckType, /*Op*/3, /*Type*/GILLT_s32,
GIM_RootCheckType, /*Op*/4, /*Type*/GILLT_p2s32,
GIM_RootCheckRegBankForClass, /*Op*/0, 
/*RC*/GIMT_Encode2(AMDGPU::VGPR_32RegClassID),
// (intrinsic_wo_chain:{ *:[i32] } 2863:{ *:[iPTR] }, p2:{ *:[i32] 
}:$src0, i32:{ *:[i32] }:$src1, p2:{ *:[i32] }:$src2)  =>  (V_WRITELANE_B32:{ 
*:[i32] } SCSrc_b32:{ *:[i32] }:$src0, SCSrc_b32:{ *:[i32] }:$src1, VGPR_32:{ 
*:[i32] }:$src2)
GIR_BuildRootMI, /*Opcode*/GIMT_Encode2(AMDGPU::V_WRITELANE_B32),
```

The destination type check for the p3 case is still for "GILLT_s32".



https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Vikram Hegde via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Unfortunately no. I had tried this and a couple of other variations; the issue 
seems to be too specific to GISel pointers.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-15 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-15 Thread Vikram Hegde via cfe-commits


@@ -5387,6 +5387,212 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal

vikramRH wrote:

Also, the issue is only for pointer types; float, v2i16, etc. work just fine.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-15 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-15 Thread Vikram Hegde via cfe-commits


@@ -5387,6 +5387,212 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal

vikramRH wrote:

Done, except for pointers. I currently see an issue where pattern type inference 
somehow deduces the destination type to scalars (instead of, say, LLT p3s32). Not 
currently sure why; any ideas?

https://github.com/llvm/llvm-project/pull/89217


[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)

2024-05-14 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

@yuanfang-chen, any plans to continue with this PR?

https://github.com/llvm/llvm-project/pull/72607


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Added new 32-bit pointer and <8 x i16> tests.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Vikram Hegde via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+B.buildBitcast(DstReg, LaneOpDstReg);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDstReg);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector<Register> PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);

vikramRH wrote:

Done, I hope, as per the expectation; however, I don't understand the plus here.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH deleted 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+B.buildBitcast(DstReg, LaneOpDstReg);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDstReg);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector<Register> PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);

vikramRH wrote:

Do you mean extracting the s16 elements individually and handling them as in the 
(Size < 32) case?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {

vikramRH wrote:

My bad, I will improve the helper.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {

vikramRH wrote:

I removed the helper in the recent commit following @arsenm's suggestion. The only 
reason is readability.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> > add f32 pattern to select read/writelane operations
> 
> Why would you need this? Don't you legalize f32 to i32?

Sorry about this. It's a leftover comment from the initial implementation which 
I should have removed.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Vikram Hegde via cfe-commits


@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+ MachineInstr &MI,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register &Src0, Register &Src1,
+  Register &Src2) -> Register {
+auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+if (Src2.isValid())
+  return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0);
+if (Src1.isValid())
+  return (LaneOpDst.addUse(Src1)).getReg(0);
+return LaneOpDst.getReg(0);
+  };
+
+  Register Src1, Src2, Src0Valid, Src2Valid;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+B.buildBitcast(DstReg, LaneOp);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOp);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0Parts.getReg(i))
+ .addUse(Src1))
+.getReg(0));

vikramRH wrote:

Should this be a separate change that addresses other such instances too?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-05 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-05 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

1. Review comments
2. Improve GIsel lowering
3. Add tests for half, bfloat, float2, ptr, vector of ptr and int
4. Removed gfx700 checks from the writelane test since they caused issues with f16 legalization. Is this required?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

The new commit extends @jayfoad's implementation with GIsel support. Tests for half, floats and some vectors are yet to be added.

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-04-22 Thread Vikram Hegde via cfe-commits


@@ -4822,6 +4822,111 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr &MI,
   return RetBB;
 }
 
+static MachineBasicBlock *lowerPseudoLaneOp(MachineInstr &MI,

vikramRH wrote:

@arsenm, would "PreISelIntrinsicLowering" be a proper place for this ?

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-04-19 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Gentle ping :)

https://github.com/llvm/llvm-project/pull/89217


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-04-19 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Added/updated tests for readfirstlane and writelane ops

https://github.com/llvm/llvm-project/pull/89217


[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)

2024-03-30 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

> I looked at the OpenCL spec for C standard library support and was surprised 
> that 1) it's only talking about C99 so it's unclear what happens for C11 
> (clause 6 says "This document describes the modifications and restrictions to 
> C99 and C11 in OpenCL C" but 6.11 only talks about C99 headers and leaves 
> `iso646.h`, `math.h`, `stdbool.h`, `stddef.h`, (all in C99) as well as 
> `stdalign.h`, `stdatomic.h`, `stdnoreturn.h`, `threads.h`, and `uchar.h` 
> available?), and 2) OpenCL's `printf` is not really the same function as C's 
> `printf` 
> (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#differences-between-opencl-c-and-c99-printf).
> 
> #1 is probably more of an oversight than anything, at least with the C11 
> headers. So maybe this isn't a super slippery slope, but maybe C23 will 
> change that (I can imagine `stdbit.h` being of use in OpenCL for bit-bashing 
> operations). However, the fact that the builtin isn't really `printf` but is 
> `printf`-like makes me think we should make it a separate builtin to avoid 
> surprises (we do diagnostics based on builtin IDs and we have special 
> checking logic that we perhaps should be exempting in some cases).

Understood. Then I propose the following.
1. Currently, Builtin TableGen does not seem to support specifying language address spaces in function prototypes; this needs to be implemented first if it is not already in development.
2. We could have two new macro variants, probably named "OCL_BUILTIN" and "OCL_LIB_BUILTIN", which will take IDs of the form "BI_OCL##". We would also need corresponding TableGen classes (probably named similar to the macros) which can expose such overloaded prototypes when required.

How does this sound?


https://github.com/llvm/llvm-project/pull/86801


[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)

2024-03-28 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Thanks for the comments @AaronBallman. The core issue here is that the current builtin handling design does not allow multiple overloads for the same identifier to coexist (ref. https://github.com/llvm/llvm-project/blob/eacda36c7dd842cb15c0c954eda74b67d0c73814/clang/include/clang/Basic/Builtins.h#L66), unless the builtins are defined in target-specific namespaces, which is what I tried in my original patch. If we want to change this approach, I currently think of a couple of ways at a top level:
1. As you said, we could have OCL-specific LibBuiltin and LangBuiltin TableGen classes (and corresponding macros in Builtins.inc). To make this work they would need new builtin IDs of a different form (say "BI_OCL##"). This is very language-specific.
2. Probably change the current BuiltinInfo structure to allow a vector of possible signatures for an identifier. The builtin type decoder could choose the appropriate signature based on LangOpt. (This wording is vague and could be a separate discussion in itself.)

Either way, changes to the current design are required. printf is the only current use case I know that can benefit from this (since OpenCL v1.2 s6.9.f says other library functions defined in the C standard headers are not available, so 🤷‍♂️). But I guess we could have more use cases in the future. Can this be a separate discussion? This patch would unblock my current work for now.
 


https://github.com/llvm/llvm-project/pull/86801


[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)

2024-03-27 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH ready_for_review 
https://github.com/llvm/llvm-project/pull/86801


[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)

2024-03-27 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

closing this in favour of https://github.com/llvm/llvm-project/pull/86801

https://github.com/llvm/llvm-project/pull/72554


[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)

2024-03-27 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/72554


[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)

2024-03-27 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH created 
https://github.com/llvm/llvm-project/pull/86801

The motivation for this change comes from an ongoing PR (#72556), which enables hostcall-based printf lowering for the AMDGPU target and OpenCL inputs. The OpenCL printf has a different signature than the C printf, the difference being the explicit address space specifier for the format string argument:

  int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)));

This is not considered a builtin because of the type mismatch.

The patch #72556 tried to address this scenario by declaring OpenCL printf essentially as a target-specific printf overload. However, the discussions there resulted in the decision that this should not be target-specific.

The idea in this patch is that the changes are NFC for the current framework (i.e., the semantic checks for printf are preserved); however, the printf declarations are now considered builtins regardless of LangOpt. This would allow me to handle the printf CodeGen without any target-specific hacks.

PS: feel free to add additional reviewers; I'm not aware of others who could comment here.


From a2731d056cee9c4e75d49f8d4fa3325dc532b207 Mon Sep 17 00:00:00 2001
From: Vikram 
Date: Wed, 27 Mar 2024 04:24:10 -0400
Subject: [PATCH] [Clang] Implement custom type checking for printf

---
 clang/include/clang/Basic/Builtins.td |  4 +--
 clang/include/clang/Sema/Sema.h   |  1 +
 clang/lib/AST/Decl.cpp|  6 +++-
 clang/lib/Basic/Builtins.cpp  |  3 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  2 +-
 clang/lib/Sema/SemaChecking.cpp   | 51 +++
 6 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 52c0dd52c28b11..f795c7c42c7b25 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -2816,14 +2816,14 @@ def StrLen : LibBuiltin<"string.h"> {
 // FIXME: This list is incomplete.
 def Printf : LibBuiltin<"stdio.h"> {
   let Spellings = ["printf"];
-  let Attributes = [PrintfFormat<0>];
+  let Attributes = [PrintfFormat<0>, CustomTypeChecking];
   let Prototype = "int(char const*, ...)";
 }
 
 // FIXME: The builtin and library function should have the same signature.
 def BuiltinPrintf : Builtin {
   let Spellings = ["__builtin_printf"];
-  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix];
+  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix, CustomTypeChecking];
   let Prototype = "int(char const* restrict, ...)";
 }
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 5ecd2f9eb2881f..b18b208a75bdf4 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -2245,6 +2245,7 @@ class Sema final {
 
   bool SemaBuiltinVAStart(unsigned BuiltinID, CallExpr *TheCall);
   bool SemaBuiltinVAStartARMMicrosoft(CallExpr *Call);
+  bool SemaBuiltinPrintf(FunctionDecl *FDecl, CallExpr *TheCall);
   bool SemaBuiltinUnorderedCompare(CallExpr *TheCall, unsigned BuiltinID);
  bool SemaBuiltinFPClassification(CallExpr *TheCall, unsigned NumArgs,
                                   unsigned BuiltinID);
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 131f82985e903b..298223f874cda3 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -3629,8 +3629,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   // OpenCL v1.2 s6.9.f - The library functions defined in
   // the C99 standard headers are not available.
   if (Context.getLangOpts().OpenCL &&
-  Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID))
+  Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID)) {
+if (Context.getLangOpts().getOpenCLCompatibleVersion() >= 120 &&
+(BuiltinID == Builtin::BIprintf))
+  return BuiltinID;
 return 0;
+  }
 
   // CUDA does not have device-side standard library. printf and malloc are the
   // only special cases that are supported by device-side runtime.
diff --git a/clang/lib/Basic/Builtins.cpp b/clang/lib/Basic/Builtins.cpp
index 3467847ac1672e..25590ed9299e8b 100644
--- a/clang/lib/Basic/Builtins.cpp
+++ b/clang/lib/Basic/Builtins.cpp
@@ -235,7 +235,8 @@ bool Builtin::Context::performsCallback(unsigned ID,
 
 bool Builtin::Context::canBeRedeclared(unsigned ID) const {
   return ID == Builtin::NotBuiltin || ID == Builtin::BI__va_start ||
- ID == Builtin::BI__builtin_assume_aligned ||
+         ID == Builtin::BI__builtin_assume_aligned || ID == Builtin::BIprintf ||
+         ID == Builtin::BI__builtin_printf ||
  (!hasReferenceArgsOrResult(ID) && !hasCustomTypechecking(ID)) ||
  isInStdNamespace(ID);
 }
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 3cfdb261a0eac0..baed36acb12437 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ 

[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-18 Thread Vikram Hegde via cfe-commits


@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+return BuiltinID;

vikramRH wrote:

I was referring to https://github.com/llvm/llvm-project/blob/3e6db602918435b6a5ac476f63f8b259e7e73af4/clang/lib/AST/Decl.cpp#L3633. This essentially means that even if the frontend attaches the printf builtin ID to the decl (even after custom type checks), this would revert.

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-18 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-18 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-18 Thread Vikram Hegde via cfe-commits


@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+return BuiltinID;

vikramRH wrote:

@arsenm, thanks for the info. CustomTypeChecking is a valid option. I'm not sure why the OpenCL community did not consider this change despite the OpenCL specs specifying the details. I could create a separate patch for this (probably folks from the OCL community would provide further background). Meanwhile, can this go ahead as an AMDGPU-specific workaround for now so that we have the intended feature in place? (The frontend changes here can be reverted with that follow-up patch.)

PS: Also, I see another issue. OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins, and consequently the builtin IDs assigned to them need to be different. If this understanding is correct, moving ahead with using the same builtin ID as std printf is not the right way.

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-15 Thread Vikram Hegde via cfe-commits


@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+return BuiltinID;

vikramRH wrote:

ping @arsenm 

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-14 Thread Vikram Hegde via cfe-commits


@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+return BuiltinID;

vikramRH wrote:

Only other alternative I see currently is to modify Sema (probably 
ActOnFunctionDeclarator) so that we map the ocl printf declaration to C printf 
builtin ID. This would be a really hacky solution and I would prefer this 
implementation.

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-14 Thread Vikram Hegde via cfe-commits


@@ -202,12 +207,20 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
 Args.push_back(Arg);
   }
 
-  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
-  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
+  auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
+  bool isBuffered = (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);
+
+  StringRef FmtStr;
+  if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+if (FmtStr.empty())
+  FmtStr = StringRef("", 1);
+  } else {
+assert(!CGM.getLangOpts().OpenCL &&
+   "OpenCL needs compile time resolvable format string");

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-01 Thread Vikram Hegde via cfe-commits


@@ -202,12 +207,20 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
 Args.push_back(Arg);
   }
 
-  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
-  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
+  auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
+  bool isBuffered = (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);
+
+  StringRef FmtStr;
+  if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+if (FmtStr.empty())
+  FmtStr = StringRef("", 1);

vikramRH wrote:

Not really. This is just to say the format string is not really empty (i.e., size = 0) when the user input is an empty format string (a weird corner case).

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-01 Thread Vikram Hegde via cfe-commits


@@ -2550,6 +2550,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad())
     BuiltinID = mutateLongDoubleBuiltin(BuiltinID);
 
+  // Mutate the printf builtin ID so that we use the same CodeGen path for
+  // HIP and OpenCL with AMDGPU targets.
+  if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
+BuiltinID = Builtin::BIprintf;

vikramRH wrote:

This can be removed if you feel so; we would probably need a new case in the Expr CodeGen.

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-03-01 Thread Vikram Hegde via cfe-commits


@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+return BuiltinID;

vikramRH wrote:

The signatures of C printf and OCL printf differ, and I don't think the generic builtin handling provides a way to register overloaded builtins with "shared" builtin IDs. Do you have any alternate suggestions here?

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-02-21 Thread Vikram Hegde via cfe-commits


@@ -178,17 +181,29 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
   E, this, GetVprintfDeclaration(CGM.getModule()), false);
 }
 
+// Determines if an argument is a string
+static bool isString(const clang::Type *argXTy) {

vikramRH wrote:

I have removed this, addrspace cast is done during arg processing.

https://github.com/llvm/llvm-project/pull/72556


[clang] [llvm] [AMDGPU] Enable OpenCL hostcall printf (WIP) (PR #72556)

2024-02-20 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

The new set of changes includes the following:
1. The iteration over vector elements now happens using the vector size from the format specifier as reference; this is in line with the runtime implementation and helps handle undefined behavior when there is a mismatch.
2. The error flag "-Werror=format-invalid-specifier" has been removed.

https://github.com/llvm/llvm-project/pull/72556


[clang] [Clang] support vector subscript expressions in constant evaluator (WIP) (PR #76379)

2023-12-27 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

Putting this on hold as @yuanfang-chen already has a PR.

https://github.com/llvm/llvm-project/pull/76379


[clang] [Clang] support vector subscript expressions in constant evaluator (WIP) (PR #76379)

2023-12-27 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH closed 
https://github.com/llvm/llvm-project/pull/76379


[clang] [Clang] support vector subscript expressions in constant evaluator (WIP) (PR #76379)

2023-12-26 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH updated 
https://github.com/llvm/llvm-project/pull/76379

From 89c79eea31d1a9ec0656fbf5c4eacf75b2471034 Mon Sep 17 00:00:00 2001
From: Vikram 
Date: Wed, 20 Dec 2023 05:36:40 +
Subject: [PATCH] [Clang] support vector subscript expressions in constant
 evaluator

---
 clang/lib/AST/ExprConstant.cpp   |  61 +-
 clang/test/CodeGenCXX/temporaries.cpp|  12 +-
 clang/test/SemaCXX/constexpr-vectors.cpp | 746 ++-
 3 files changed, 668 insertions(+), 151 deletions(-)

diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index f6aeee1a4e935d..390c5aef477105 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -221,6 +221,12 @@ namespace {
 ArraySize = 2;
 MostDerivedLength = I + 1;
 IsArray = true;
+  } else if (Type->isVectorType()) {
+    const VectorType *CT = Type->castAs<VectorType>();
+Type = CT->getElementType();
+ArraySize = CT->getNumElements();
+MostDerivedLength = I + 1;
+IsArray = true;
   } else if (const FieldDecl *FD = getAsField(Path[I])) {
 Type = FD->getType();
 ArraySize = 0;
@@ -437,6 +443,15 @@ namespace {
   MostDerivedArraySize = 2;
   MostDerivedPathLength = Entries.size();
 }
+/// Update this designator to refer to the given vector component.
+void addVectorUnchecked(const VectorType *VecTy) {
+  Entries.push_back(PathEntry::ArrayIndex(0));
+
+  MostDerivedType = VecTy->getElementType();
+  MostDerivedIsArrayElement = true;
+  MostDerivedArraySize = VecTy->getNumElements();
+  MostDerivedPathLength = Entries.size();
+}
    void diagnoseUnsizedArrayPointerArithmetic(EvalInfo &Info, const Expr *E);
    void diagnosePointerArithmetic(EvalInfo &Info, const Expr *E,
                                   const APSInt &N);
@@ -1732,6 +1747,10 @@ namespace {
   if (checkSubobject(Info, E, Imag ? CSK_Imag : CSK_Real))
 Designator.addComplexUnchecked(EltTy, Imag);
 }
+    void addVector(EvalInfo &Info, const Expr *E, const VectorType *VecTy) {
+  if (checkSubobject(Info, E, CSK_ArrayIndex))
+Designator.addVectorUnchecked(VecTy);
+}
 void clearIsNullPointer() {
   IsNullPtr = false;
 }
@@ -1890,6 +1909,8 @@ static bool EvaluateFixedPointOrInteger(const Expr *E, APFixedPoint &Result,
                                         EvalInfo &Info);
 static bool EvaluateFixedPoint(const Expr *E, APFixedPoint &Result,
                                EvalInfo &Info);
 
+static bool EvaluateVector(const Expr *E, APValue &Result, EvalInfo &Info);
 
//===--===//
 // Misc utilities
 
//===--===//
@@ -3278,6 +3299,19 @@ static bool HandleLValueComplexElement(EvalInfo &Info, const Expr *E,
   return true;
 }
 
+static bool HandeLValueVectorComponent(EvalInfo &Info, const Expr *E,
+                                       LValue &LVal, const VectorType *VecTy,
+                                       APSInt &Adjustment) {
+  LVal.addVector(Info, E, VecTy);
+
+  CharUnits SizeOfComponent;
+  if (!HandleSizeof(Info, E->getExprLoc(), VecTy->getElementType(),
+                    SizeOfComponent))
+    return false;
+  LVal.adjustOffsetAndIndex(Info, E, Adjustment, SizeOfComponent);
+  return true;
+}
+
 /// Try to evaluate the initializer for a variable declaration.
 ///
 /// \param Info   Information about the ongoing evaluation.
@@ -3718,7 +3752,8 @@ findSubobject(EvalInfo &Info, const Expr *E, const CompleteObject &Obj,
 }
 
 // If this is our last pass, check that the final object type is OK.
-    if (I == N || (I == N - 1 && ObjType->isAnyComplexType())) {
+    if (I == N || (I == N - 1 &&
+                   (ObjType->isAnyComplexType() || ObjType->isVectorType()))) {
   // Accesses to volatile objects are prohibited.
      if (ObjType.isVolatileQualified() && isFormalAccess(handler.AccessKind)) {
 if (Info.getLangOpts().CPlusPlus) {
@@ -3823,6 +3858,10 @@ findSubobject(EvalInfo , const Expr *E, const 
CompleteObject ,
 return handler.found(Index ? O->getComplexFloatImag()
: O->getComplexFloatReal(), ObjType);
   }
+    } else if (ObjType->isVectorType()) {
+      // Next subobject is a vector element
+      uint64_t Index = Sub.Entries[I].getAsArrayIndex();
+      O = &O->getVectorElt(Index);
 } else if (const FieldDecl *Field = getAsField(Sub.Entries[I])) {
   if (Field->isMutable() &&
   !Obj.mayAccessMutableMembers(Info, handler.AccessKind)) {
@@ -8756,14 +8795,28 @@ bool LValueExprEvaluator::VisitMemberExpr(const MemberExpr *E) {
 }
 
bool LValueExprEvaluator::VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
-  // FIXME: Deal with vectors as array subscript bases.
-  if (E->getBase()->getType()->isVectorType() ||
-  E->getBase()->getType()->isSveVLSBuiltinType())
+
+  if (E->getBase()->getType()->isSveVLSBuiltinType())
 return Error(E);
 
   

[clang] [Clang] support vector subscript expressions in constant evaluator (WIP) (PR #76379)

2023-12-25 Thread Vikram Hegde via cfe-commits

vikramRH wrote:

It seems there are few crashes with systemZ vectors. Looking into them

https://github.com/llvm/llvm-project/pull/76379


[clang] [Clang] support vector subscript expressions in constant evaluator (WIP) (PR #76379)

2023-12-25 Thread Vikram Hegde via cfe-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/76379

