[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

Should lose the [WIP] in the title 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-23 Thread Matt Arsenault via cfe-commits


@@ -12385,4 +12385,8 @@ def err_acc_reduction_composite_type
 def err_acc_reduction_composite_member_type :Error<
 "OpenACC 'reduction' composite variable must not have non-scalar field">;
 def note_acc_reduction_composite_member_loc : Note<"invalid field is here">;
+
+// AMDGCN builtins diagnostics
+def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
+def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be 1/2/4">;

arsenm wrote:

Not sure what the message phrasing guidelines are here, but it probably should
spell out 1, 2, or 4 rather than using /.

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-23 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,13 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{expression is not an integer constant expression}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/5, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/0, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}

arsenm wrote:

Didn't add a negative value test.

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Introduce target-specific `Sema` components (PR #93179)

2024-05-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93179
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Introduce target-specific `Sema` components (PR #93179)

2024-05-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

Should update the GitHub autolabeler paths for the targets 

https://github.com/llvm/llvm-project/pull/93179
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+SDValue InitBitCast = DAG.getBitcast(IntVT, Src0);
+Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+if (Src2.getNode()) {
+  SDValue Src2Cast = DAG.getBitcast(IntVT, Src2);

arsenm wrote:

Yes, bitcast for the f16/bf16 case to get to the int 
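
A minimal sketch of that sub-32-bit path, assuming the surrounding `DAG` and `SL` from `lowerLaneOp` (illustrative only, not the PR's exact code):

```cpp
// Bitcast f16/bf16 to the same-sized integer type first, then any-extend
// to i32 so the 32-bit lane op can consume it.
SDValue AsInt = DAG.getBitcast(MVT::i16, Src0);              // f16/bf16 -> i16
SDValue Widened = DAG.getAnyExtOrTrunc(AsInt, SL, MVT::i32); // i16 -> i32
```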

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Matt Arsenault via cfe-commits


@@ -5456,43 +5444,32 @@ bool 
AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
   if ((Size % 32) == 0) {
 SmallVector PartialRes;
 unsigned NumParts = Size / 32;
-auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

arsenm wrote:

Better to track this as the LLT to use for the pieces, rather than making it 
this conditional thing. This will simplify improved pointer handling in the 
future 
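
A rough sketch of that suggestion, reusing the surrounding `legalizeLaneOp` names (`Ty`, `S16`, `V2S16`, `S32`, `B`, `Src0`); the exact wiring is an assumption:

```cpp
// Track the piece type itself rather than a bool flag; improved pointer
// handling can later pick a different piece type here without touching users.
LLT PartTy = (Ty.isVector() && Ty.getElementType() == S16) ? V2S16 : S32;
MachineInstrBuilder Src0Parts = B.buildUnmerge(PartTy, Src0);
```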

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] IR: Add module level attribution language-standard (PR #93159)

2024-05-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request.

You cannot encode language standards in this. We should simply have different 
operations that provide the range of semantics and not make the IR modal 

https://github.com/llvm/llvm-project/pull/93159
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{expression is not an integer constant expression}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/5, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}

arsenm wrote:

Test 0, -1, 3, 12, 16? 

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{size must be a constant}} expected-error{{cannot compile this builtin function yet}}

arsenm wrote:

Why is "cannot compile this builtin function yet" emitted here?

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -2537,6 +2537,47 @@ static RValue EmitHipStdParUnsupportedBuiltin(CodeGenFunction *CGF,
   return RValue::get(CGF->Builder.CreateCall(UBF, Args));
 }
 
+static void buildInstrinsicCallArgs(CodeGenFunction &CGF, const CallExpr *E,

arsenm wrote:

Shouldn't need any CGBuiltin changes?

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+SmallVector Args;
+unsigned ICEArguments = 0;
+ASTContext::GetBuiltinTypeError Error;
+getContext().GetBuiltinType(BuiltinID, Error, );
+assert(Error == ASTContext::GE_None && "Should not codegen an error");
+Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+llvm::FunctionType *FTy = F->getFunctionType();
+for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+  Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+  llvm::Type *PTy = FTy->getParamType(i);
+  if (PTy != ArgValue->getType()) {
+if (auto *PtrTy = dyn_cast(PTy)) {
+  if (PtrTy->getAddressSpace() !=
+  ArgValue->getType()->getPointerAddressSpace()) {
+ArgValue = Builder.CreateAddrSpaceCast(
+ArgValue, llvm::PointerType::get(getLLVMContext(),
+ PtrTy->getAddressSpace()));
+  }
+}

arsenm wrote:

> Because the builtin can be used not only in OpenCL, I don't think it would be 
> good to put it in SemaOpenCL.

But your test case is written in OpenCL. You can write a run line for other 
languages if really needed, but for this you don't really need it 

> @yxsamliu Do we have Sema for builtin?

Yes, everything does

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target

arsenm wrote:

Test belongs in SemaOpenCL 

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+SmallVector Args;
+unsigned ICEArguments = 0;
+ASTContext::GetBuiltinTypeError Error;
+getContext().GetBuiltinType(BuiltinID, Error, );
+assert(Error == ASTContext::GE_None && "Should not codegen an error");
+Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+llvm::FunctionType *FTy = F->getFunctionType();
+for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+  Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+  llvm::Type *PTy = FTy->getParamType(i);
+  if (PTy != ArgValue->getType()) {
+if (auto *PtrTy = dyn_cast(PTy)) {
+  if (PtrTy->getAddressSpace() !=
+  ArgValue->getType()->getPointerAddressSpace()) {
+ArgValue = Builder.CreateAddrSpaceCast(
+ArgValue, llvm::PointerType::get(getLLVMContext(),
+ PtrTy->getAddressSpace()));
+  }
+}
+ArgValue = Builder.CreateBitCast(ArgValue, PTy);

arsenm wrote:

Should never have to create a pointer bitcast 
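
With opaque pointers, a same-address-space pointer bitcast is a no-op, so only an address-space mismatch needs a cast. A hedged sketch of the reduced handling (not the PR's code):

```cpp
// Only an addrspacecast can ever be required for pointer arguments.
if (auto *PtrTy = dyn_cast<llvm::PointerType>(PTy)) {
  if (ArgValue->getType()->getPointerAddressSpace() != PtrTy->getAddressSpace())
    ArgValue = Builder.CreateAddrSpaceCast(ArgValue, PtrTy);
}
```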

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+SmallVector Args;
+unsigned ICEArguments = 0;
+ASTContext::GetBuiltinTypeError Error;
+getContext().GetBuiltinType(BuiltinID, Error, );
+assert(Error == ASTContext::GE_None && "Should not codegen an error");
+Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+llvm::FunctionType *FTy = F->getFunctionType();
+for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+  Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+  llvm::Type *PTy = FTy->getParamType(i);
+  if (PTy != ArgValue->getType()) {
+if (auto *PtrTy = dyn_cast(PTy)) {
+  if (PtrTy->getAddressSpace() !=
+  ArgValue->getType()->getPointerAddressSpace()) {
+ArgValue = Builder.CreateAddrSpaceCast(
+ArgValue, llvm::PointerType::get(getLLVMContext(),
+ PtrTy->getAddressSpace()));
+  }
+}

arsenm wrote:

You shouldn't have to adjust the codegen at all to emit a diagnostic 

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+SmallVector Args;
+unsigned ICEArguments = 0;
+ASTContext::GetBuiltinTypeError Error;
+getContext().GetBuiltinType(BuiltinID, Error, );
+assert(Error == ASTContext::GE_None && "Should not codegen an error");
+Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+llvm::FunctionType *FTy = F->getFunctionType();
+for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+  Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+  llvm::Type *PTy = FTy->getParamType(i);
+  if (PTy != ArgValue->getType()) {
+if (auto *PtrTy = dyn_cast(PTy)) {
+  if (PtrTy->getAddressSpace() !=
+  ArgValue->getType()->getPointerAddressSpace()) {
+ArgValue = Builder.CreateAddrSpaceCast(
+ArgValue, llvm::PointerType::get(getLLVMContext(),
+ PtrTy->getAddressSpace()));
+  }
+}
+ArgValue = Builder.CreateBitCast(ArgValue, PTy);
+  }
+  Args.push_back(ArgValue);
+}
+constexpr const int SizeIdx = 2;
+ConstantInt *SizeVal = dyn_cast(Args[SizeIdx]);
+if (!SizeVal) {
+  CGM.Error(E->getExprLoc(), "size must be a constant");

arsenm wrote:

These should be emitted in sema 
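
A hedged sketch of what a Sema-side check could look like, reusing the diagnostics added earlier in this PR; the helper name and how it is wired into the AMDGCN builtin checking path are assumptions:

```cpp
// Returns true on error, following the usual SemaChecking convention.
static bool checkGlobalLoadLdsSize(Sema &SemaRef, CallExpr *TheCall) {
  Expr *SizeArg = TheCall->getArg(2);
  // A non-constant size is diagnosed separately; here we only validate the
  // value once it folds to an integer constant expression.
  std::optional<llvm::APSInt> Size =
      SizeArg->getIntegerConstantExpr(SemaRef.getASTContext());
  if (!Size)
    return false;
  if (*Size == 1 || *Size == 2 || *Size == 4)
    return false;
  SemaRef.Diag(SizeArg->getBeginLoc(),
               diag::err_amdgcn_global_load_lds_size_invalid_value)
      << SizeArg->getSourceRange();
  SemaRef.Diag(SizeArg->getBeginLoc(),
               diag::note_amdgcn_global_load_lds_size_valid_value);
  return true;
}
```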

https://github.com/llvm/llvm-project/pull/93064
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -678,6 +680,49 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
   bool runOnMachineFunction(MachineFunction &MF) override;
 };
 
+static const StringMap<SIAtomicAddrSpace> ASNames = {{
+    {"global", SIAtomicAddrSpace::GLOBAL},
+    {"local", SIAtomicAddrSpace::LDS},
+}};
+
+void diagnoseUnknownMMRAASName(const MachineInstr &MI, StringRef AS) {
+  const MachineFunction *MF = MI.getMF();
+  const Function &Fn = MF->getFunction();
+  std::string Str;
+  raw_string_ostream OS(Str);

arsenm wrote:

SmallString + raw_svector_ostream? 
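
The suggested buffer types would look roughly like this (size illustrative), avoiding the heap allocation of `std::string`:

```cpp
SmallString<128> Str;
raw_svector_ostream OS(Str);
// OS can then build the diagnostic text exactly as before.
```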

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-22 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-22 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Then I guess the MMRA should just have "global" and "local" for now, we can 
> always add more later if needed. What do you think?

Yes, we don't have specific image counters. They are just vmcnt.

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][Clang] Builtin for GLOBAL_LOAD_LDS on GFX940 (PR #92962)

2024-05-22 Thread Matt Arsenault via cfe-commits


@@ -240,6 +240,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3UiiUi", "t", "gfx940-insts")

arsenm wrote:

clang should really be enforcing the valid immediate values for the size 

https://github.com/llvm/llvm-project/pull/92962
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-21 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> I thought image memory = private. It's unclear to me, what AS does OpenCL 
> IMAGE memory map to in our backend? (But otherwise, yes, MMRA should just 
> have the backend names, the mapping of the OpenCL IMAGE to a backend AS 
> should be in the device-lib)

Images are global memory with magical addressing and value interpretation on 
load/store. There's nothing private about them 

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-20 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> @arsenm Should we use `image` or `private`? We could allow both in the 
> frontend, and only use `private` as the canonical MMRA.

I don't understand why image would imply private. I would just keep it as
private throughout.

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -5433,7 +5450,16 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper 
,
 ? Src0
 : B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
 Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
-if (Src2.isValid()) {
+
+if (IsPermLane16) {
+  Register Src1Cast =
+  MRI.getType(Src1).isScalar()
+  ? Src1
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);

arsenm wrote:

Like the other patch, shouldn't need any bitcasts 

https://github.com/llvm/llvm-project/pull/92725
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
 return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+Intrinsic::ID IID;
+IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16
+  ? Intrinsic::amdgcn_permlane16
+  : Intrinsic::amdgcn_permlanex16;
+
+llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
+llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
+llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));

arsenm wrote:

I assume EmitScalarExpr handles the immargs correctly? 

https://github.com/llvm/llvm-project/pull/92725
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
 return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+Intrinsic::ID IID;
+IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16

arsenm wrote:

Combine the declare and define into one statement; it can also sink down to the use.
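
A sketch of that cleanup:

```cpp
// Declare and initialize IID in one statement, at the point of use.
Intrinsic::ID IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16
                        ? Intrinsic::amdgcn_permlane16
                        : Intrinsic::amdgcn_permlanex16;
```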

https://github.com/llvm/llvm-project/pull/92725
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

On this and the previous, can you add a section to AMDGPUUsage for the 
intrinsics and what types they support 

https://github.com/llvm/llvm-project/pull/92725
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-05-20 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92725
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,192 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDst);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

arsenm wrote:

no auto 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,192 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDst);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+MachineInstrBuilder Src0Parts;
+
+if (Ty.isPointer()) {
+  auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else if (Ty.isPointerVector()) {
+  LLT IntVecTy = Ty.changeElementType(
+  LLT::scalar(Ty.getElementType().getSizeInBits()));
+  auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else
+  Src0Parts =
+  IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0)
+ .addUse(Src1))
+.getReg(0));
+  }
+  break;
+}
+case Intrinsic::amdgcn_readfirstlane: {
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+ .addUse(Src0)
+ .getReg(0)));
+  }
+
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  Register Src2 = MI.getOperand(4).getReg();
+  MachineInstrBuilder Src2Parts;
+
+  if (Ty.isPointer()) {
+auto PtrToInt = B.buildPtrToInt(S64, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else if (Ty.isPointerVector()) {
+LLT IntVecTy = Ty.changeElementType(
+LLT::scalar(Ty.getElementType().getSizeInBits()));
+auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else
+Src2Parts =
+IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2);

arsenm wrote:

The point of splitting out the pointer-typed tests was to avoid having to fix
the handling of the pointer-typed selection patterns. You still have the casts
inserted here. You should not need any ptrtoint, inttoptr, or bitcasts in any
of these legalizations.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request.

There should be no need to introduce same-sized value casts, whether bitcast or 
ptrtoint in either legalizer 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [, ](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+// Already legal
+return SDValue();
+  }
+
+  if (ValSize < 32) {
+SDValue InitBitCast = DAG.getBitcast(IntVT, Src0);
+Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+if (Src2.getNode()) {
+  SDValue Src2Cast = DAG.getBitcast(IntVT, Src2);

arsenm wrote:

You should not have any bitcasts anywhere 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,192 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDst);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+MachineInstrBuilder Src0Parts;
+
+if (Ty.isPointer()) {
+  auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else if (Ty.isPointerVector()) {
+  LLT IntVecTy = Ty.changeElementType(
+  LLT::scalar(Ty.getElementType().getSizeInBits()));
+  auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else
+  Src0Parts =
+  IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0)
+ .addUse(Src1))
+.getReg(0));
+  }
+  break;
+}
+case Intrinsic::amdgcn_readfirstlane: {
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+ .addUse(Src0)
+ .getReg(0)));
+  }
+
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  Register Src2 = MI.getOperand(4).getReg();
+  MachineInstrBuilder Src2Parts;
+
+  if (Ty.isPointer()) {
+auto PtrToInt = B.buildPtrToInt(S64, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else if (Ty.isPointerVector()) {
+LLT IntVecTy = Ty.changeElementType(
+LLT::scalar(Ty.getElementType().getSizeInBits()));
+auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2);
+Src2Parts = B.buildUnmerge(S32, PtrToInt);
+  } else
+Src2Parts =
+IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2);
+
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+Src2 = IsS16Vec ? B.buildBitcast(S32, Src2Parts.getReg(i)).getReg(0)
+: Src2Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_writelane, {S32})
+ .addUse(Src0)
+ 

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,192 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+// Already legal
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+
+Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDst);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+MachineInstrBuilder Src0Parts;
+
+if (Ty.isPointer()) {
+  auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else if (Ty.isPointerVector()) {
+  LLT IntVecTy = Ty.changeElementType(
+  LLT::scalar(Ty.getElementType().getSizeInBits()));
+  auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+  Src0Parts = B.buildUnmerge(S32, PtrToInt);
+} else
+  Src0Parts =
+  IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+: Src0Parts.getReg(i);
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0)
+ .addUse(Src1))
+.getReg(0));
+  }
+  break;
+}
+case Intrinsic::amdgcn_readfirstlane: {
+  for (unsigned i = 0; i < NumParts; ++i) {
+Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)

arsenm wrote:

No bitcasts 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-20 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenCL] Fix an infinite loop in building AddrSpaceQualType (PR #92612)

2024-05-18 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,25 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+//RUN: %clang_cc1 %s -emit-llvm -O1 -o - | FileCheck %s

arsenm wrote:

codegen tests need an explicit target 

https://github.com/llvm/llvm-project/pull/92612
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-17 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {

arsenm wrote:

The lambda's `VT` parameter shadows the outer `VT`?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-17 Thread Matt Arsenault via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

I'd rather just leave the pointer cases failing in the global isel case, and 
remove all the cast insertion bits.
You could split out the pointer cases to a separate file and just not run it 
with globalisel. 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Matt Arsenault via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

I think GlobalISelEmitter is just ignoring PtrValueType

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Matt Arsenault via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

Seems like this is just a GlobalISelEmitter bug (and another example of why we
should have a way to write patterns that only care about the bit size).

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Matt Arsenault via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

What is the problem? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Matt Arsenault via cfe-commits


@@ -780,14 +780,22 @@ defm V_SUBREV_U32 : VOP2Inst <"v_subrev_u32", 
VOP_I32_I32_I32_ARITH, null_frag,
 
 // These are special and do not read the exec mask.
 let isConvergent = 1, Uses = [] in {
-def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE,
-  [(set i32:$vdst, (int_amdgcn_readlane i32:$src0, i32:$src1))]>;
+def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE,[]>;
 let IsNeverUniform = 1, Constraints = "$vdst = $vdst_in", 
DisableEncoding="$vdst_in" in {
-def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE,
-  [(set i32:$vdst, (int_amdgcn_writelane i32:$src0, i32:$src1, 
i32:$vdst_in))]>;
+def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE, []>;
 } // End IsNeverUniform, $vdst = $vdst_in, DisableEncoding $vdst_in
 } // End isConvergent = 1
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadlane vt:$src0, i32:$src1)),
+(V_READLANE_B32 VRegOrLdsSrc_32:$src0, SCSrc_b32:$src1)

arsenm wrote:

Same here, supply the type to the result instruction 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-16 Thread Matt Arsenault via cfe-commits


@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, 
untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", 
VOP_READFIRSTLANE,
-   getVOP1Pat.ret, 1> {
+   [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

```suggestion
(vt (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0)))
```

This may fix your pattern type deduction issue 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][amdgpuarch] Correct file names in file header comments (PR #92294)

2024-05-15 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/92294
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [MC] Remove UseAssemblerInfoForParsing (PR #91082)

2024-05-15 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> It's still used:
> 
> ```
> /work/kparzysz/git/llvm.org/mlir/lib/Target/LLVM/ROCDL/Target.cpp: In member 
> function ‘std::optional > 
> mlir::ROCDL::SerializeGPUModuleBase::assembleIsa(llvm::StringRef)’:
> /work/kparzysz/git/llvm.org/mlir/lib/Target/LLVM/ROCDL/Target.cpp:302:15: 
> error:
>  ‘class llvm::MCStreamer’ has no member named ‘setUseAssemblerInfoForParsing’
>   302 |   mcStreamer->setUseAssemblerInfoForParsing(true);
>   |   ^
> ```

But why? I don't know what business MLIR could possibly have touching this, for 
AMDGPU of all things

https://github.com/llvm/llvm-project/pull/91082
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)

2024-05-15 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/92232
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -1586,6 +1586,12 @@ class CodeGenModule : public CodeGenTypeCache {
   void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
  bool IsDtorAttrFunc = false);
 
+  // Return whether structured convergence intrinsics should be generated for
+  // this target.
+  bool shouldEmitConvergenceTokens() const {
+return getTriple().isSPIRVLogical();

arsenm wrote:

This doesn't have anything to do with what the target wants to do with the CFG
lowering. Structurally the IR should not allow uncontrolled convergence, and
that it exists is a wart that needs to exist until everywhere handles
convergence tokens.

https://github.com/llvm/llvm-project/pull/88918
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -6086,6 +6086,68 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue Src0, SDValue Src1, SDValue Src2,
+  MVT VT) -> SDValue {
+return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, 
Src2})
+: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+if (VT == MVT::i32)
+  // Already legal
+  return SDValue();
+Src0 = DAG.getBitcast(IntVT, Src0);

arsenm wrote:

Like the other cases, we should be able to avoid intermediate casting 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -3400,7 +3400,7 @@ def : GCNPat<
 // FIXME: Should also do this for readlane, but tablegen crashes on
 // the ignored src1.
 def : GCNPat<
-  (int_amdgcn_readfirstlane (i32 imm:$src)),
+  (i32 (AMDGPUreadfirstlane (i32 imm:$src))),

arsenm wrote:

We might need to make this fold more sophisticated for other types, but best 
for a follow up patch 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,212 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal

arsenm wrote:

Either add braces or don't put the comment under the if. Also, the size 32 case 
should be treated as directly legal for 32-bit pointers 
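
A sketch of the requested shape; any 32-bit type, including 32-bit pointers, is accepted as already legal:

```cpp
if (Size == 32) {
  // Already legal, for scalars and 32-bit pointers alike.
  return true;
}
```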

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -5387,6 +5387,212 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+  Register Src2) -> Register {
+auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane:
+  return LaneOp.getReg(0);
+case Intrinsic::amdgcn_readlane:
+  return LaneOp.addUse(Src1).getReg(0);
+case Intrinsic::amdgcn_writelane:
+  return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+default:
+  llvm_unreachable("unhandled lane op");
+}
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+auto IsPtr = Ty.isPointer();
+Src0 = IsPtr ? B.buildPtrToInt(S32, Src0).getReg(0)
+ : B.buildBitcast(S32, Src0).getReg(0);

arsenm wrote:

You should not need these casts. Any legal type should be directly accepted 
without these 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+B.buildBitcast(DstReg, LaneOpDstReg);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDstReg);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);

arsenm wrote:

That way you also avoid adding extra intermediate bitcasts 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [flang] [llvm] [mlir] [polly] [test]: fix filecheck annotation typos (PR #91854)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -58,7 +58,7 @@ CHECK-CNT3-NOT: {{^}}this is duplicate
 CHECK-CNT4-COUNT-5: this is duplicate
 CHECK-CNT4-EMPTY:
 
-Many-label:
+Many-LABEL:

arsenm wrote:

I would be careful about touching FileCheck tests. The point might be the wrong 
label 

https://github.com/llvm/llvm-project/pull/91854
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-13 Thread Matt Arsenault via cfe-commits


@@ -4408,6 +4409,42 @@ Target-Specific Extensions
 
 Clang supports some language features conditionally on some targets.
 
+AMDGPU Language Extensions
+--
+
+__builtin_amdgcn_fence
+^^
+
+``__builtin_amdgcn_fence`` emits a fence.
+
+* ``unsigned`` atomic ordering, e.g. ``__ATOMIC_ACQUIRE``
+* ``const char *`` synchronization scope, e.g. ``workgroup``
+* Zero or more ``const char *`` address spaces names.
+
+The address spaces arguments must be string literals with known values, such 
as:
+
+* ``"local"``
+* ``"global"``
+* ``"image"``
+
+If one or more address space name are provided, the code generator will attempt
+to emit potentially faster instructions that only fence those address spaces.
+Emitting such instructions may not always be possible and the compiler is free
+to fence more aggressively.
+
+If no address spaces names are provided, all address spaces are fenced.
+
+.. code-block:: c++
+
+  // Fence all address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");
+
+  // Fence only requested address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")

arsenm wrote:

Not sure we can get away without image

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [flang] [lld] [llvm] [mlir] [polly] [test]: fix filecheck annotation typos (PR #91854)

2024-05-12 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

amdgpu changes lgtm 

https://github.com/llvm/llvm-project/pull/91854
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] Add "sanitized_padded_global" llvm ir attribute to identify sanitizer instrumented globals (PR #68865)

2024-05-09 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > (You can even place `.quad sym[0].hash; .long sym[0].size` in a section 
> > `SHF_LINK_ORDER` linking to the global variable for linker garbage 
> > collection.)
> > The runtime can build a map correlating hashes to sizes, which can be used 
> > to answer variable size queries.
> 
> AMD language runtimes provide queries for the size of device global symbols 
> and functions to copy data to and from device global variables. Currently, 
> the runtime gets the needed information from the ELF symbol sizes in the symbol 
> table. So, in #70166 we have come up with an approach of adding two symbols (at 
> the same offset but with different sizes) for the same global, one symbol 
> which reports actual global size and other symbol which reports instrumented 
> size.

Have you looked into switching to the suggested approach of having a separately 
emitted field? 

https://github.com/llvm/llvm-project/pull/68865
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Matt Arsenault via cfe-commits


@@ -2176,26 +2176,23 @@ def int_amdgcn_wave_reduce_umin : AMDGPUWaveReduce;
 def int_amdgcn_wave_reduce_umax : AMDGPUWaveReduce;
 
 def int_amdgcn_readfirstlane :
-  ClangBuiltin<"__builtin_amdgcn_readfirstlane">,
-  Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
+  Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
 [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, 
IntrNoFree]>;
 
 // The lane argument must be uniform across the currently active threads of the
 // current wave. Otherwise, the result is undefined.
 def int_amdgcn_readlane :
-  ClangBuiltin<"__builtin_amdgcn_readlane">,
-  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
+  Intrinsic<[llvm_any_ty], [LLVMMatchType<0>, llvm_i32_ty],
 [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, 
IntrNoFree]>;
 
 // The value to write and lane select arguments must be uniform across the
 // currently active threads of the current wave. Otherwise, the result is
 // undefined.
 def int_amdgcn_writelane :
-  ClangBuiltin<"__builtin_amdgcn_writelane">,
-  Intrinsic<[llvm_i32_ty], [
-llvm_i32_ty,// uniform value to write: returned by the selected lane
-llvm_i32_ty,// uniform lane select
-llvm_i32_ty // returned by all lanes other than the selected one
+  Intrinsic<[llvm_any_ty], [
+LLVMMatchType<0>,// uniform value to write: returned by the selected 
lane
+llvm_i32_ty,// uniform lane select
+LLVMMatchType<0> // returned by all lanes other than the selected one

arsenm wrote:

Comments are no longer aligned 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+B.buildBitcast(DstReg, LaneOpDstReg);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDstReg);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);

arsenm wrote:

For the multiple of `<2 x s16>`, it's a bit nicer to preserve the 16-bit 
element types 
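Sketch of what that could look like (untested; assumes `<2 x s16>` is itself handled as a legal 32-bit lane-op type, readfirstlane form shown):

  const LLT V2S16 = LLT::fixed_vector(2, 16);
  auto Unmerge = B.buildUnmerge(V2S16, Src0);
  for (unsigned I = 0; I != NumParts; ++I)
    PartialRes.push_back(B.buildIntrinsic(IID, {V2S16})
                             .addUse(Unmerge.getReg(I))
                             .getReg(0));
  // readlane/writelane would additionally add their lane-select/source
  // operands per piece, as in the s32 path.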

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+B.buildBitcast(DstReg, LaneOpDstReg);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);
+}
+}
+
+Register LaneOpDstReg = LaneOpDst.getReg(0);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOpDstReg);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0Parts.getReg(i))
+ .addUse(Src1))
+.getReg(0));
+  break;
+}
+case Intrinsic::amdgcn_readfirstlane: {
+
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+ .addUse(Src0Parts.getReg(i)))
+.getReg(0));
+
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  Register Src2 = MI.getOperand(4).getReg();
+  auto Src2Parts = B.buildUnmerge(S32, Src2);
+
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_writelane, {S32})
+ .addUse(Src0Parts.getReg(i))
+ .addUse(Src1)
+ .addUse(Src2Parts.getReg(i)))
+.getReg(0));
+}
+}
+
+if (Ty.isPointerVector()) {
+  auto MergedVec = B.buildMergeLikeInstr(
+  LLT::vector(ElementCount::getFixed(NumParts), S32), PartialRes);
+  B.buildBitcast(DstReg, MergedVec);

arsenm wrote:

You cannot bitcast from an i32 vector to a pointer. You can merge the i32 
pieces directly into the pointer though 
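i.e. something like this instead of going through an i32 vector plus bitcast (sketch):

  // Merge the s32 pieces straight into the pointer (or pointer-vector)
  // result register.
  B.buildMergeLikeInstr(DstReg, PartialRes);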

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {
+case Intrinsic::amdgcn_readfirstlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+  break;
+}
+case Intrinsic::amdgcn_readlane: {
+  LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+  break;
+}
+case Intrinsic::amdgcn_writelane: {
+  Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+  LaneOpDst = B.buildIntrinsic(IID, {S32})
+  .addUse(Src0Valid)
+  .addUse(Src1)
+  .addUse(Src2Valid);

arsenm wrote:

Missing break 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {

arsenm wrote:

This isn't quite what I meant; I meant that trying to handle these all as if they 
were the same createLaneOp was ugly

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,111 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+kernel void test () {
+
+  int sgpr = 0, vgpr = 0, imm = 0;
+
+  // sgpr constraints
+  __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : );
+
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}" (imm) : );
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exe" (imm) : ); // 
expected-error {{invalid input constraint '{exe' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec" (imm) : ); // 
expected-error {{invalid input constraint '{exec' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}a" (imm) : ); // 
expected-error {{invalid input constraint '{exec}a' in asm}}

arsenm wrote:

Or, maybe, we can use this to finally get them to eliminate the asm in the 
first place 

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,111 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+kernel void test () {
+
+  int sgpr = 0, vgpr = 0, imm = 0;
+
+  // sgpr constraints
+  __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : );

arsenm wrote:

Missing the codegen tests for these? 

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,111 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+kernel void test () {
+
+  int sgpr = 0, vgpr = 0, imm = 0;
+
+  // sgpr constraints
+  __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : );
+
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}" (imm) : );
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exe" (imm) : ); // 
expected-error {{invalid input constraint '{exe' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec" (imm) : ); // 
expected-error {{invalid input constraint '{exec' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}a" (imm) : ); // 
expected-error {{invalid input constraint '{exec}a' in asm}}

arsenm wrote:

If we really have to tolerate asm, I wonder if we can ban physical register 
references through SPIRV (other than for maybe the handful of named special 
registers). That is, allow exec/m0/vcc and disallow any numbered registers. The 
exact register counts move around from target to target and are exposing bonus 
ABI 
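For illustration only (hypothetical split, not what the patch currently does):

  __asm__("s_mov_b32 %0, %1" : "=s"(sgpr) : "{exec}"(imm)); // named special register: OK
  __asm__("s_mov_b32 %0, %1" : "={s1}"(sgpr) : "s"(imm));   // numbered physreg: rejected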

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -2658,21 +2676,102 @@ 
IGroupLPDAGMutation::invertSchedBarrierMask(SchedGroupMask Mask) const {
   return InvertedMask;
 }
 
+void IGroupLPDAGMutation::addSchedGroupBarrierRules() {
+
+  /// Whether or not the instruction has no true data predecessors
+  /// with opcode \p Opc.
+  class NoOpcDataPred : public InstructionRule {
+  protected:
+unsigned Opc;
+
+  public:
+bool apply(const SUnit *SU, const ArrayRef Collection,
+   SmallVectorImpl ) override {
+  return !std::any_of(
+  SU->Preds.begin(), SU->Preds.end(), [this](const SDep ) {
+return Pred.getKind() == SDep::Data &&
+   Pred.getSUnit()->getInstr()->getOpcode() == Opc;
+  });
+}
+
+NoOpcDataPred(unsigned Opc, const SIInstrInfo *TII, unsigned SGID,
+  bool NeedsCache = false)
+: InstructionRule(TII, SGID, NeedsCache), Opc(Opc) {}
+  };
+
+  /// Whether or not the instruction has no write after read predecessors
+  /// with opcode \p Opc.
+  class NoOpcWARPred final : public InstructionRule {
+  protected:
+unsigned Opc;
+
+  public:
+bool apply(const SUnit *SU, const ArrayRef Collection,
+   SmallVectorImpl ) override {
+  return !std::any_of(
+  SU->Preds.begin(), SU->Preds.end(), [this](const SDep ) {
+return Pred.getKind() == SDep::Anti &&
+   Pred.getSUnit()->getInstr()->getOpcode() == Opc;
+  });
+}
+NoOpcWARPred(unsigned Opc, const SIInstrInfo *TII, unsigned SGID,
+ bool NeedsCache = false)
+: InstructionRule(TII, SGID, NeedsCache), Opc(Opc){};
+  };
+
+  SchedGroupBarrierRuleCallBacks = {
+  [](unsigned SGID, const SIInstrInfo *TII) {
+return std::make_shared(AMDGPU::V_CNDMASK_B32_e64, TII,

arsenm wrote:

There's basically no reason to ever use shared_ptr; something is wrong if it's 
necessary over unique_ptr 
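e.g. (sketch; assumes the rule class here is NoOpcWARPred and nothing actually needs shared ownership):

  [](unsigned SGID, const SIInstrInfo *TII) {
    return std::make_unique<NoOpcWARPred>(AMDGPU::V_CNDMASK_B32_e64, TII, SGID,
                                          /*NeedsCache=*/false);
  }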

https://github.com/llvm/llvm-project/pull/85304
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-08 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/85304
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -1284,7 +1284,29 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
|  ``// 5 MFMA``
|  
``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``
 
-  llvm.amdgcn.iglp_opt An **experimental** 
intrinsic for instruction group level parallelism. The intrinsic
+  llvm.amdgcn.sched.group.barrier.rule It has the same behavior as 
sched.group.barrier, except the intrinsic includes a fourth argument:
+
+   - RuleMask : The bitmask of 
rules which are applied to the SchedGroup.
+
+   The RuleMask is handled as 
a 64 bit integer, so 64 rules are encodable with a single mask.
+
+   Users can access the 
intrinsic by specifying the optional fourth argument in sched_group_barrier 
builtin
+
+   |  ``// 1 VMEM read 
invoking rules 1 and 2``
+   |  
``__builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)``
+
+   Currently available rules 
are:
+   - 0x: No rule.
+   - 0x0001: Instructions in 
the SchedGroup must not write to the same register
+ that a previously 
occurring V_CNDMASK_B32_e64 reads from.
+   - 0x0002: Instructions in 
the SchedGroup must not write to the same register
+ that a previously 
occurring V_PERM_B32_e64 reads from.
+   - 0x0004: Instructions in 
the SchedGroup must require data produced by a
+ V_CNDMASK_B32_e64.
+   - 0x0008: Instructions in 
the SchedGroup must require data produced by a
+ V_PERM_B32_e64.
+

arsenm wrote:

These scheduling rules seem way too specific. Especially that it's pointing out 
specific instruction encodings, by the internal pseudoinstruction names 

https://github.com/llvm/llvm-project/pull/85304
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-08 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

I don't understand how anyone is supposed to use this. This is exposing 
extremely specific, random low level details of the scheduling. Users claim 
they want scheduling controls, but what they actually want is the scheduler to 
just do the right thing. We should spend more energy making the scheduler 
sensible by default, instead of creating all of this complexity.

If we're going to have something like this, it needs to have predefined macros 
instead of expecting users to read raw rule-mask values.
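Something along these lines, purely for illustration (the macro names and values here are made up, not part of the patch):

  #define __AMDGCN_SCHED_RULE_NO_WAR_CNDMASK 0x1
  #define __AMDGCN_SCHED_RULE_NO_WAR_PERM    0x2

  __builtin_amdgcn_sched_group_barrier(
      32, 1, 0,
      __AMDGCN_SCHED_RULE_NO_WAR_CNDMASK | __AMDGCN_SCHED_RULE_NO_WAR_PERM);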

https://github.com/llvm/llvm-project/pull/85304
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -247,7 +247,7 @@ Address CodeGen::emitMergePHI(CodeGenFunction , Address 
Addr1,
 
 bool CodeGen::isEmptyField(ASTContext , const FieldDecl *FD,
bool AllowArrays, bool AsIfNoUniqueAddr) {
-  if (FD->isUnnamedBitField())
+  if (FD->isUnnamedBitfield())

arsenm wrote:

Unrelated change pulled in? 

https://github.com/llvm/llvm-project/pull/89007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -157,7 +157,7 @@ llvm::Value 
*CodeGen::emitRoundPointerUpToAlignment(CodeGenFunction ,
   llvm::Value *RoundUp = CGF.Builder.CreateConstInBoundsGEP1_32(
   CGF.Builder.getInt8Ty(), Ptr, Align.getQuantity() - 1);
   return CGF.Builder.CreateIntrinsic(
-  llvm::Intrinsic::ptrmask, {Ptr->getType(), CGF.IntPtrTy},
+llvm::Intrinsic::ptrmask, {Ptr->getType(), CGF.IntPtrTy},

arsenm wrote:

Spurious whitespace change? 

https://github.com/llvm/llvm-project/pull/89007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -24,6 +24,7 @@ MODULE_PASS("amdgpu-lower-ctor-dtor", 
AMDGPUCtorDtorLoweringPass())
 MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
 MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
+MODULE_PASS("expand-variadics", 
ExpandVariadicsPass(ExpandVariadicsMode::Lowering))

arsenm wrote:

Shouldn't need to list this in every target's PassRegistry; the generic one 
should be fine 

https://github.com/llvm/llvm-project/pull/89007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][HIP] Warn when __AMDGCN_WAVEFRONT_SIZE is used in host code (PR #91478)

2024-05-08 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,55 @@
+/*=== __clang_hip_device_macro_guards.h - guards for HIP device macros -===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *===---===
+ */
+
+/*
+ * WARNING: This header is intended to be directly -include'd by
+ * the compiler and is not supposed to be included by users.
+ *
+ */
+
+#ifndef __CLANG_HIP_DEVICE_MACRO_GUARDS_H__
+#define __CLANG_HIP_DEVICE_MACRO_GUARDS_H__
+
+#if __HIP__
+#if !defined(__HIP_DEVICE_COMPILE__)
+// The __AMDGCN_WAVEFRONT_SIZE macros cannot hold meaningful values during host
+// compilation as devices are not initialized when the macros are defined and
+// there may indeed be devices with differing wavefront sizes in the same
+// system. This code issues diagnostics when the macros are used in host code.
+
+#undef __AMDGCN_WAVEFRONT_SIZE
+#undef __AMDGCN_WAVEFRONT_SIZE__
+
+// Reference __hip_device_macro_guard in a way that is legal in preprocessor
+// directives and does not affect the value so that appropriate diagnostics are
+// issued. Function calls, casts, or the comma operator would make the macro
+// illegal for use in preprocessor directives.
+#define __AMDGCN_WAVEFRONT_SIZE (!__hip_device_macro_guard ? 64 : 64)
+#define __AMDGCN_WAVEFRONT_SIZE__ (!__hip_device_macro_guard ? 64 : 64)
+
+// This function is referenced by the macro in device functions during host
+// compilation, it SHOULD NOT cause a diagnostic.
+__attribute__((device)) static constexpr int __hip_device_macro_guard(void) {
+  return -1;
+}
+
+// This function is referenced by the macro in host functions during host
+// compilation, it SHOULD cause a diagnostic.
+__attribute__((
+host, deprecated("The __AMDGCN_WAVEFRONT_SIZE macros do not correspond "
+ "to the device(s) when used in host code and may only "
+ "be used in device code."))) static constexpr int

arsenm wrote:

I thought I saw some junk trying to support pre-C++11 HIP; is that a concern 
here?

Is this macro defined in OpenMP? If so can we do the same thing? 

https://github.com/llvm/llvm-project/pull/91478
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -4408,6 +4409,42 @@ Target-Specific Extensions
 
 Clang supports some language features conditionally on some targets.
 
+AMDGPU Language Extensions
+--
+
+__builtin_amdgcn_fence
+^^
+
+``__builtin_amdgcn_fence`` emits a fence.
+
+* ``unsigned`` atomic ordering, e.g. ``__ATOMIC_ACQUIRE``
+* ``const char *`` synchronization scope, e.g. ``workgroup``
+* Zero or more ``const char *`` address spaces names.
+
+The address spaces arguments must be string literals with known values, such 
as:
+
+* ``"local"``
+* ``"global"``
+* ``"image"``
+
+If one or more address space name are provided, the code generator will attempt
+to emit potentially faster instructions that only fence those address spaces.
+Emitting such instructions may not always be possible and the compiler is free
+to fence more aggressively.
+
+If no address spaces names are provided, all address spaces are fenced.
+
+.. code-block:: c++
+
+  // Fence all address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");
+
+  // Fence only requested address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")

arsenm wrote:

We randomly change between HSA and OpenCL terminology. Maybe we should call 
"local" "groupsegment"? I guess the ISA manuals call it "local data share" 

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -1,22 +1,113 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 4
 // REQUIRES: amdgpu-registered-target
 // RUN: %clang_cc1 %s -emit-llvm -O0 -o - \
-// RUN:   -triple=amdgcn-amd-amdhsa  | opt -S | FileCheck %s
+// RUN:   -triple=amdgcn-amd-amdhsa | FileCheck %s
 
+// CHECK-LABEL: define dso_local void @_Z25test_memory_fence_successv(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst
+// CHECK-NEXT:fence syncscope("agent") acquire
+// CHECK-NEXT:fence seq_cst
+// CHECK-NEXT:fence syncscope("agent") acq_rel
+// CHECK-NEXT:fence syncscope("workgroup") release
+// CHECK-NEXT:ret void
+//
 void test_memory_fence_success() {
-  // CHECK-LABEL: test_memory_fence_success
 
-  // CHECK: fence syncscope("workgroup") seq_cst
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
 
-  // CHECK: fence syncscope("agent") acquire
   __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");
 
-  // CHECK: fence seq_cst
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "");
 
-  // CHECK: fence syncscope("agent") acq_rel
   __builtin_amdgcn_fence(4, "agent");
 
-  // CHECK: fence syncscope("workgroup") release
   __builtin_amdgcn_fence(3, "workgroup");
 }
+
+// CHECK-LABEL: define dso_local void @_Z10test_localv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META3:![0-9]+]]
+// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META3]]
+// CHECK-NEXT:fence seq_cst, !mmra [[META3]]
+// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META3]]
+// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META3]]
+// CHECK-NEXT:ret void
+//
+void test_local() {
+  __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "local");
+
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "local");
+
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "local");
+
+  __builtin_amdgcn_fence(4, "agent", "local");
+
+  __builtin_amdgcn_fence(3, "workgroup", "local");
+}
+
+
+// CHECK-LABEL: define dso_local void @_Z11test_globalv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META4:![0-9]+]]
+// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META4]]
+// CHECK-NEXT:fence seq_cst, !mmra [[META4]]
+// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META4]]
+// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META4]]
+// CHECK-NEXT:ret void
+//
+void test_global() {
+  __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "global");
+
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "global");
+
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "global");
+
+  __builtin_amdgcn_fence(4, "agent", "global");
+
+  __builtin_amdgcn_fence(3, "workgroup", "global");
+}
+
+// CHECK-LABEL: define dso_local void @_Z10test_imagev(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META5:![0-9]+]]
+// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META5]]
+// CHECK-NEXT:fence seq_cst, !mmra [[META5]]
+// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META5]]
+// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META5]]
+// CHECK-NEXT:ret void
+//
+void test_image() {
+  __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image");
+
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "image");
+
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "image");
+
+  __builtin_amdgcn_fence(4, "agent", "image");
+
+  __builtin_amdgcn_fence(3, "workgroup", "image");
+}
+
+// CHECK-LABEL: define dso_local void @_Z10test_mixedv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META6:![0-9]+]]
+// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META7:![0-9]+]]
+// CHECK-NEXT:ret void
+//
+void test_mixed() {
+  __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image", "global");
+  __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image", "local", 
"global");
+}

arsenm wrote:

Maybe test repeated AS name 
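e.g. something like:

  // Repeated address space name; presumably should behave the same as
  // naming it once.
  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "local");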

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [modules] Accept equivalent module caches from different symlink (PR #90925)

2024-05-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/90925

>From 4760ebce0ff7725f4bb75f5107f551d867e4db6d Mon Sep 17 00:00:00 2001
From: Ellis Hoag 
Date: Thu, 2 May 2024 17:47:38 -0700
Subject: [PATCH 1/4] [modules] Accept equivalent module caches from different
 symlink

Use `fs::equivalent()`, which follows symlinks, to check if two module cache 
paths are equivalent. This prevents a PCH error when building from a different 
path that is a symlink of the original.

```
error: PCH was compiled with module cache path 
'/home/foo/blah/ModuleCache/2IBP1TNT8OR8D', but the path is currently 
'/data/users/foo/blah/ModuleCache/2IBP1TNT8OR8D'
1 error generated.
```
---
 clang/lib/Serialization/ASTReader.cpp | 20 +---
 clang/test/Modules/module-symlink.m   | 11 +++
 2 files changed, 20 insertions(+), 11 deletions(-)
 create mode 100644 clang/test/Modules/module-symlink.m

diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 0ef57a3ea804ef..c20ead8b865692 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -839,17 +839,15 @@ static bool checkHeaderSearchOptions(const 
HeaderSearchOptions ,
  DiagnosticsEngine *Diags,
  const LangOptions ,
  const PreprocessorOptions ) {
-  if (LangOpts.Modules) {
-if (SpecificModuleCachePath != ExistingModuleCachePath &&
-!PPOpts.AllowPCHWithDifferentModulesCachePath) {
-  if (Diags)
-Diags->Report(diag::err_pch_modulecache_mismatch)
-  << SpecificModuleCachePath << ExistingModuleCachePath;
-  return true;
-}
-  }
-
-  return false;
+  if (!LangOpts.Modules || PPOpts.AllowPCHWithDifferentModulesCachePath ||
+  SpecificModuleCachePath == ExistingModuleCachePath ||
+  llvm::sys::fs::equivalent(SpecificModuleCachePath,
+ExistingModuleCachePath))
+return false;
+  if (Diags)
+Diags->Report(diag::err_pch_modulecache_mismatch)
+<< SpecificModuleCachePath << ExistingModuleCachePath;
+  return true;
 }
 
 bool PCHValidator::ReadHeaderSearchOptions(const HeaderSearchOptions ,
diff --git a/clang/test/Modules/module-symlink.m 
b/clang/test/Modules/module-symlink.m
new file mode 100644
index 00..be447449a0e81e
--- /dev/null
+++ b/clang/test/Modules/module-symlink.m
@@ -0,0 +1,11 @@
+// RUN: rm -rf %t
+// RUN: %clang_cc1 -fmodules-cache-path=%t/modules -fmodules 
-fimplicit-module-maps -I %S/Inputs -emit-pch -o %t.pch %s -verify
+
+// RUN: ln -s %t/modules %t/modules.symlink
+// RUN: %clang_cc1 -fmodules-cache-path=%t/modules.symlink -fmodules 
-fimplicit-module-maps -I %S/Inputs -include-pch %t.pch %s -verify
+
+// expected-no-diagnostics
+
+@import ignored_macros;
+
+struct Point p;

>From 490eefe98e3dd020ff3e51c7f817ec2b3d3a2663 Mon Sep 17 00:00:00 2001
From: Ellis Hoag 
Date: Fri, 3 May 2024 09:50:11 -0700
Subject: [PATCH 2/4] Require shell to fix windows test

---
 clang/test/Modules/module-symlink.m | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/clang/test/Modules/module-symlink.m 
b/clang/test/Modules/module-symlink.m
index be447449a0e81e..9a69186c5ea28f 100644
--- a/clang/test/Modules/module-symlink.m
+++ b/clang/test/Modules/module-symlink.m
@@ -1,3 +1,5 @@
+// REQUIRES: shell
+
 // RUN: rm -rf %t
 // RUN: %clang_cc1 -fmodules-cache-path=%t/modules -fmodules 
-fimplicit-module-maps -I %S/Inputs -emit-pch -o %t.pch %s -verify
 

>From 6e58177107f854f42d3cdc70e796c425a1797798 Mon Sep 17 00:00:00 2001
From: Ellis Hoag 
Date: Fri, 3 May 2024 10:34:35 -0700
Subject: [PATCH 3/4] Use VFS to check if files are equal

---
 clang/lib/Serialization/ASTReader.cpp | 25 +++
 clang/test/Modules/module-symlink.m   |  1 +
 llvm/include/llvm/Support/VirtualFileSystem.h |  4 +++
 llvm/lib/Support/VirtualFileSystem.cpp| 10 
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index c20ead8b865692..d35c870926f96e 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -833,16 +833,18 @@ bool SimpleASTReaderListener::ReadPreprocessorOptions(
 /// against the header search options in an existing preprocessor.
 ///
 /// \param Diags If non-null, produce diagnostics for any mismatches incurred.
-static bool checkHeaderSearchOptions(const HeaderSearchOptions ,
+static bool checkHeaderSearchOptions(llvm::vfs::FileSystem ,
  StringRef SpecificModuleCachePath,
  StringRef ExistingModuleCachePath,
  DiagnosticsEngine *Diags,
  const LangOptions ,
  const PreprocessorOptions ) {
   if 

[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)

2024-05-07 Thread Matt Arsenault via cfe-commits
Nathan Gauër ,
Nathan Gauër ,
Nathan Gauër ,
Nathan Gauër 
Message-ID:
In-Reply-To: 



@@ -1586,6 +1586,12 @@ class CodeGenModule : public CodeGenTypeCache {
   void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
  bool IsDtorAttrFunc = false);
 
+  // Return whether structured convergence intrinsics should be generated for
+  // this target.
+  bool shouldEmitConvergenceTokens() const {
+return getTriple().isSPIRVLogical();

arsenm wrote:

Should add a TODO that this should just be unconditional in the future 
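i.e. (sketch):

  bool shouldEmitConvergenceTokens() const {
    // TODO: make this unconditional once the other targets handle
    // convergence control tokens.
    return getTriple().isSPIRVLogical();
  }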

https://github.com/llvm/llvm-project/pull/88918
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)

2024-05-07 Thread Matt Arsenault via cfe-commits
Nathan Gauër ,
Nathan Gauër ,
Nathan Gauër ,
Nathan Gauër 
Message-ID:
In-Reply-To: 


https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/88918
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)

2024-05-07 Thread Matt Arsenault via cfe-commits
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= 
Message-ID:
In-Reply-To: 


https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/88918
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)

2024-05-07 Thread Matt Arsenault via cfe-commits
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= ,
Nathan =?utf-8?q?Gauër?= 
Message-ID:
In-Reply-To: 


https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/88918
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -504,3 +508,16 @@ def AMDGPUdiv_fmas : PatFrags<(ops node:$src0, node:$src1, 
node:$src2, node:$vcc
 def AMDGPUperm : PatFrags<(ops node:$src0, node:$src1, node:$src2),
   [(int_amdgcn_perm node:$src0, node:$src1, node:$src2),
(AMDGPUperm_impl node:$src0, node:$src1, node:$src2)]>;
+
+def AMDGPUreadlane : PatFrags<(ops node:$src0, node:$src1),
+  [(int_amdgcn_readlane node:$src0, node:$src1),
+   (AMDGPUreadlane_impl node:$src0, node:$src1)]>;
+
+def AMDGPUreadfirstlane : PatFrags<(ops node:$src),
+  [(int_amdgcn_readfirstlane node:$src),
+   (AMDGPUreadfirstlane_impl node:$src)]>;
+
+def AMDGPUwritelane : PatFrags<(ops node:$src0, node:$src1, node:$src2),
+  [(int_amdgcn_writelane node:$src0, node:$src1, node:$src2),
+   (AMDGPUwritelane_impl node:$src0, node:$src1, node:$src2)]>;
+   

arsenm wrote:

Missing newline at end of file 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register , Register ,

arsenm wrote:

I think this helper is just making things more confusing. You can just handle 
the 3 cases separately with unmerge logic 
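e.g. roughly (sketch, readfirstlane case only; readlane/writelane would get their own similar loops with their extra operands):

  case Intrinsic::amdgcn_readfirstlane: {
    auto Unmerge = B.buildUnmerge(S32, Src0);
    for (unsigned I = 0; I != NumParts; ++I)
      PartialRes.push_back(B.buildIntrinsic(IID, {S32})
                               .addUse(Unmerge.getReg(I))
                               .getReg(0));
    break;
  }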

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register , Register ,
+  Register ) -> Register {
+auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+if (Src2.isValid())
+  return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0);
+if (Src1.isValid())
+  return (LaneOpDst.addUse(Src1)).getReg(0);
+return LaneOpDst.getReg(0);
+  };
+
+  Register Src1, Src2, Src0Valid, Src2Valid;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+B.buildBitcast(DstReg, LaneOp);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOp);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0Parts.getReg(i))
+ .addUse(Src1))
+.getReg(0));

arsenm wrote:

We should really add a buildIntrinsic overload that just takes the array of 
inputs like for other instructions 
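Hypothetical shape (this overload doesn't exist yet; just sketching the idea):

  MachineInstrBuilder buildIntrinsic(Intrinsic::ID ID, ArrayRef<DstOp> Res,
                                     ArrayRef<SrcOp> SrcOps);

which would reduce the loop body to something like:

  PartialRes.push_back(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32},
                                        {Src0Parts.getReg(i), Src1})
                           .getReg(0));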

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -5982,6 +5982,68 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering , SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering , SDNode *N,
+   SelectionDAG ) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue , SDValue , SDValue ,

arsenm wrote:

SDValue should be passed by value 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register , Register ,
+  Register ) -> Register {

arsenm wrote:

Register should be passed by value 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-07 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register , Register ,
+  Register ) -> Register {
+auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+if (Src2.isValid())
+  return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0);
+if (Src1.isValid())
+  return (LaneOpDst.addUse(Src1)).getReg(0);

arsenm wrote:

Extra parentheses around this 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-05-03 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,25 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -disable-O0-optnone 
-emit-llvm \
+// RUN:   %s -o - | opt -S -passes=mem2reg | FileCheck %s
+
+// CHECK-LABEL: define dso_local half @test_convert_from_bf16_to_fp16(
+// CHECK-SAME: bfloat noundef [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[FPEXT:%.*]] = fpext bfloat [[A]] to float
+// CHECK-NEXT:[[FPTRUNC:%.*]] = fptrunc float [[FPEXT]] to half
+// CHECK-NEXT:ret half [[FPTRUNC]]
+//
+_Float16 test_convert_from_bf16_to_fp16(__bf16 a) {
+return (_Float16)a;
+}
+
+// CHECK-LABEL: define dso_local bfloat @test_convert_from_fp16_to_bf16(
+// CHECK-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[FPEXT:%.*]] = fpext half [[A]] to float
+// CHECK-NEXT:[[FPTRUNC:%.*]] = fptrunc float [[FPEXT]] to bfloat
+// CHECK-NEXT:ret bfloat [[FPTRUNC]]
+//
+__bf16 test_convert_from_fp16_to_bf16(_Float16 a) {
+return (__bf16)a;
+}
+

arsenm wrote:

I think these tests need to be additive. The vector behavior seems to be 
different between standard C and the proper vector languages? 

https://github.com/llvm/llvm-project/pull/89051
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-05-03 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> ping Ping Do you have another review comment?

This has now confused me. You should roll back to the case where you only 
changed the scalar behavior. Any vector behavior change should be a separate 
PR, if that is even correct. I would still like to know what the gcc behavior 
is in this case 


https://github.com/llvm/llvm-project/pull/89051
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU (PR #90994)

2024-05-03 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/90994
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,94 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32)
+return true;
+
+  if (Size < 32) {
+auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0);
+auto LaneOpDst =
+B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext);
+if (IID == Intrinsic::amdgcn_readlane ||
+IID == Intrinsic::amdgcn_writelane) {
+  auto Src1 = MI.getOperand(3).getReg();
+  LaneOpDst = LaneOpDst.addUse(Src1);
+  if (IID == Intrinsic::amdgcn_writelane) {
+auto Src2 = MI.getOperand(4).getReg();
+auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+LaneOpDst = LaneOpDst.addUse(Ext2);
+  }
+}
+B.buildTrunc(DstReg, LaneOpDst).getReg(0);

arsenm wrote:

The .getReg(0) does nothing here 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -504,3 +508,15 @@ def AMDGPUdiv_fmas : PatFrags<(ops node:$src0, node:$src1, 
node:$src2, node:$vcc
 def AMDGPUperm : PatFrags<(ops node:$src0, node:$src1, node:$src2),
   [(int_amdgcn_perm node:$src0, node:$src1, node:$src2),
(AMDGPUperm_impl node:$src0, node:$src1, node:$src2)]>;
+
+def AMDGPUreadlane : PatFrags<(ops node:$src0, node:$src1),
+  [(int_amdgcn_readlane node:$src0, node:$src1),
+   (AMDGPUreadlane_impl node:$src0, node:$src1)]>;
+
+def AMDGPUreadfirstlane : PatFrags<(ops node:$src),
+  [(int_amdgcn_readfirstlane node:$src),
+   (AMDGPUreadfirstlane_impl node:$src)]>;
+
+def AMDGPUwritelane : PatFrags<(ops node:$src0, node:$src1, node:$src2),
+  [(int_amdgcn_writelane node:$src0, node:$src1, node:$src2),
+   (AMDGPUwritelane_impl node:$src0, node:$src1, node:$src2)]>;

arsenm wrote:

Missing newline end of file 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,94 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32)
+return true;
+
+  if (Size < 32) {
+auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0);
+auto LaneOpDst =
+B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext);
+if (IID == Intrinsic::amdgcn_readlane ||
+IID == Intrinsic::amdgcn_writelane) {
+  auto Src1 = MI.getOperand(3).getReg();
+  LaneOpDst = LaneOpDst.addUse(Src1);
+  if (IID == Intrinsic::amdgcn_writelane) {
+auto Src2 = MI.getOperand(4).getReg();
+auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+LaneOpDst = LaneOpDst.addUse(Ext2);
+  }
+}
+B.buildTrunc(DstReg, LaneOpDst).getReg(0);
+  } else if ((Size % 32) == 0) {
+SmallVector<Register> Src0Parts, PartialRes;
+unsigned NumParts = Size / 32;
+auto WideReg = MRI.createGenericVirtualRegister(LLT::scalar(NumParts * 
32));
+for (unsigned i = 0; i < NumParts; ++i) {
+  Src0Parts.push_back(MRI.createGenericVirtualRegister(S32));
+}
+
+B.buildUnmerge(Src0Parts, Src0);

arsenm wrote:

buildUnmerge should handle all of this for you if you just pass the scalar type 
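
A hedged sketch of what that could look like (the helper is hypothetical; only
the buildUnmerge call shape matters): passing the s32 result type lets
buildUnmerge create the part registers itself instead of pre-creating them in a
loop.

#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Hypothetical helper: split a wide virtual register into 32-bit pieces.
static SmallVector<Register, 4> splitInto32BitParts(MachineIRBuilder &B,
                                                    Register Src) {
  const LLT S32 = LLT::scalar(32);
  // One G_UNMERGE_VALUES; the builder creates all of the s32 defs for us.
  auto Unmerge = B.buildUnmerge(S32, Src);
  SmallVector<Register, 4> Parts;
  for (unsigned I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
    Parts.push_back(Unmerge.getReg(I));
  return Parts;
}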

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -6091,6 +5982,70 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  auto VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue &Src0, SDValue &Src1, SDValue &Src2,
+                          MVT VT) -> SDValue {
+return (Src2.getNode()

arsenm wrote:

Don't need .getNode for boolean test 
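
To spell out the alternative (illustrative fragment, not the PR's code): SDValue
has an explicit operator bool that checks the underlying node, so the
conditional can test the value directly.

#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Equivalent to comparing Src2.getNode()/Src1.getNode() against nullptr.
static const char *laneOpKind(SDValue Src1, SDValue Src2) {
  return Src2 ? "writelane" : Src1 ? "readlane" : "readfirstlane";
}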

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -5386,6 +5386,94 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32)
+return true;
+
+  if (Size < 32) {
+auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0);
+auto LaneOpDst =
+B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext);
+if (IID == Intrinsic::amdgcn_readlane ||
+IID == Intrinsic::amdgcn_writelane) {
+  auto Src1 = MI.getOperand(3).getReg();
+  LaneOpDst = LaneOpDst.addUse(Src1);
+  if (IID == Intrinsic::amdgcn_writelane) {
+auto Src2 = MI.getOperand(4).getReg();
+auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0);
+LaneOpDst = LaneOpDst.addUse(Ext2);
+  }
+}
+B.buildTrunc(DstReg, LaneOpDst).getReg(0);

arsenm wrote:

Should just early exit at this point 
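
A rough sketch of the control flow being suggested (the eraseFromParent call and
the helper shape are assumptions, not taken from the PR): finish the Size < 32
case and return, so the multi-part handling below does not need to live in an
else-if.

#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Hypothetical outline of the size dispatch with early exits.
static bool legalizeLaneOpSketch(MachineIRBuilder &B, MachineInstr &MI,
                                 Register DstReg,
                                 MachineInstrBuilder LaneOpDst, unsigned Size) {
  if (Size == 32)
    return true;

  if (Size < 32) {
    B.buildTrunc(DstReg, LaneOpDst);
    MI.eraseFromParent();
    return true; // Early exit; only the (Size % 32) == 0 case remains below.
  }

  // Handle (Size % 32) == 0 here without extra nesting.
  return true;
}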

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -6091,6 +5982,70 @@ static SDValue lowerBALLOTIntrinsic(const 
SITargetLowering &TLI, SDNode *N,
   DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  auto VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue &Src0, SDValue &Src1, SDValue &Src2,
+                          MVT VT) -> SDValue {
+return (Src2.getNode()
+? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
+: Src1.getNode()
+? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+: DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2, Src0Valid, Src2Valid;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+  IntrinsicID == Intrinsic::amdgcn_writelane) {
+Src1 = N->getOperand(2);
+if (IntrinsicID == Intrinsic::amdgcn_writelane)
+  Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+if (VT == MVT::i32)
+  // Already legal
+  return SDValue();
+Src0Valid = DAG.getBitcast(IntVT, Src0);
+if (Src2.getNode())
+  Src2Valid = DAG.getBitcast(IntVT, Src2);
+auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, MVT::i32);
+return DAG.getBitcast(VT, LaneOp);
+  }
+
+  if (ValSize < 32) {
+auto InitBitCast = DAG.getBitcast(IntVT, Src0);
+Src0Valid = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+if (Src2.getNode()) {
+  auto Src2Cast = DAG.getBitcast(IntVT, Src2);
+  Src2Valid = DAG.getAnyExtOrTrunc(Src2Cast, SL, MVT::i32);
+}
+auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, MVT::i32);
+auto Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+return DAG.getBitcast(VT, Trunc);
+  }
+
+  if ((ValSize % 32) == 0) {
+MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);
+Src0Valid = DAG.getBitcast(VecVT, Src0);
+
+if (Src2.getNode())
+  Src2Valid = DAG.getBitcast(VecVT, Src2);
+
+auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, VecVT);
+auto UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());

arsenm wrote:

no autos 
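
For reference, the same two lines with the types written out (createLaneOp
stands in for the lambda from the hunk and is treated as a given; this is a
restatement, not new logic):

#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical wrapper, only to show the explicit SDValue types.
template <typename LaneOpBuilderT>
static SDValue buildUnrolledLaneOp(SelectionDAG &DAG,
                                   LaneOpBuilderT createLaneOp,
                                   SDValue Src0Valid, SDValue Src1,
                                   SDValue Src2Valid, MVT VecVT) {
  SDValue LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, VecVT);
  SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());
  return UnrolledLaneOp;
}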

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add OpenCL-specific fence address space masks (PR #78572)

2024-05-02 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> I'm now wondering if adding a new builtin is needed at all, or if it should 
> just be part of the original builtin? It's an additive change.

Maybe?

> 
> Should we also rename the MMRA to `amdgpu-fence-as` (remove OpenCL from the 
> name) ?
> 

I definitely do not want to maintain any language names in anything 


https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [IR] Add getelementptr nusw and nuw flags (PR #90824)

2024-05-02 Thread Matt Arsenault via cfe-commits


@@ -316,3 +316,82 @@ define <2 x i32> @test_trunc_both_reversed_vector(<2 x 
i64> %a) {
   %res = trunc nsw nuw <2 x i64> %a to <2 x i32>
   ret <2 x i32> %res
 }
+
+define ptr @gep_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_inbounds_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nusw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nusw i8, ptr %p, i64 %idx
+  %gep = getelementptr nusw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+; inbounds implies nusw, so the flag is not printed back.
+define ptr @gep_inbounds_nusw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nusw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nusw nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nusw nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_inbounds_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nusw nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nuw_nusw_inbounds(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nuw nusw inbounds i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @const_gep_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_inbounds_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nusw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nusw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nusw (i8, ptr @addr, i64 100)
+}
+
+; inbounds implies nusw, so the flag is not printed back.
+define ptr @const_gep_inbounds_nusw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nusw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nusw nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nusw nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_inbounds_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nusw nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nuw_nusw_inbounds(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nuw nusw inbounds (i8, ptr @addr, i64 100)
+}

arsenm wrote:

Maybe test non-0 AS and vectors?

https://github.com/llvm/llvm-project/pull/90824
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clean up denormal handling with -ffp-model, -ffast-math, etc. (PR #89477)

2024-04-26 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89477
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Clean up denormal handling with -ffp-model, -ffast-math, etc. (PR #89477)

2024-04-26 Thread Matt Arsenault via cfe-commits


@@ -1462,6 +1460,14 @@ floating point semantic models: precise (the default), 
strict, and fast.
   "allow_approximate_fns", "off", "off", "on"
   "allow_reassociation", "off", "off", "on"
 
+The ``-fp-model`` option does not modify the "fdenormal-fp-math" or
+"fdenormal-fp-math-f32" settings, but it does have an impact on whether

arsenm wrote:

IIRC denormal-fp-math-f32 is only a cc1 flag not exposed to end users 

https://github.com/llvm/llvm-project/pull/89477
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Add OpenCL-specific fence address space masks (PR #78572)

2024-04-26 Thread Matt Arsenault via cfe-commits


@@ -18319,6 +18320,26 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
   return nullptr;
 }
 
+void CodeGenFunction::AddAMDGCNAddressSpaceMMRA(llvm::Instruction *Inst,
+llvm::Value *ASMask) {
+  constexpr const char *Tag = "opencl-fence-mem";
+
+  uint64_t Mask = cast<llvm::ConstantInt>(ASMask)->getZExtValue();
+  if (Mask == 0)
+return;
+
+  // 3 bits can be set: local, global, image in that order.
+  LLVMContext &Ctx = Inst->getContext();
+  SmallVector<MMRAMetadata::TagT, 3> MMRAs;
+  if (Mask & (1 << 0))

arsenm wrote:

A space-separated string is a weird interface. I meant something more like 
__builtin_amdgcn_something_fence("somesyncscope", ordering, "addrspace0", 
"addrspace1", ...) 

https://github.com/llvm/llvm-project/pull/78572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >