[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm commented: Should lose the [WIP] in the title https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -12385,4 +12385,8 @@ def err_acc_reduction_composite_type
 def err_acc_reduction_composite_member_type :Error<
   "OpenACC 'reduction' composite variable must not have non-scalar field">;
 def note_acc_reduction_composite_member_loc : Note<"invalid field is here">;
+
+// AMDGCN builtins diagnostics
+def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
+def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be 1/2/4">;

arsenm wrote:

Not sure what the message phrasing guidelines are here, but it should probably spell out "1, 2, or 4" rather than using "/".

https://github.com/llvm/llvm-project/pull/93064
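For illustration, the note with the suggested wording would read as follows (a sketch; only the message string changes, the diagnostic name comes from the diff above):

```
+def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be 1, 2, or 4">;
```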
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -0,0 +1,13 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{expression is not an integer constant expression}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/5, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/0, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}

arsenm wrote:

Didn't add a negative value test

https://github.com/llvm/llvm-project/pull/93064
[clang] [clang] Introduce target-specific `Sema` components (PR #93179)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93179
[clang] [clang] Introduce target-specific `Sema` components (PR #93179)
https://github.com/arsenm commented: Should update the GitHub autolabeler paths for the targets https://github.com/llvm/llvm-project/pull/93179
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
                        DAG.getConstant(0, SL, MVT::i32),
                        DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+    return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
+            : Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+                   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+      IntrinsicID == Intrinsic::amdgcn_writelane) {
+    Src1 = N->getOperand(2);
+    if (IntrinsicID == Intrinsic::amdgcn_writelane)
+      Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+    // Already legal
+    return SDValue();
+  }
+
+  if (ValSize < 32) {
+    SDValue InitBitCast = DAG.getBitcast(IntVT, Src0);
+    Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+    if (Src2.getNode()) {
+      SDValue Src2Cast = DAG.getBitcast(IntVT, Src2);

arsenm wrote:

Yes, bitcast for the f16/bf16 case to get to the int

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5456,43 +5444,32 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
   if ((Size % 32) == 0) {
     SmallVector<Register, 2> PartialRes;
     unsigned NumParts = Size / 32;
-    auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+    bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

arsenm wrote:

Better to track this as the LLT to use for the pieces, rather than making it this conditional thing. This will simplify improved pointer handling in the future.

https://github.com/llvm/llvm-project/pull/89217
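A minimal sketch of the suggested direction, assuming the usual S32/V2S16 LLT constants defined in this file: compute the piece type once and use it everywhere the unmerge/re-merge code currently branches on IsS16Vec.

```cpp
// Piece type used when splitting and rebuilding the value; pointer handling
// can later choose its own piece type here without touching the loops below.
LLT PartTy = (Ty.isVector() && Ty.getElementType() == S16) ? V2S16 : S32;
```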
[clang] [llvm] IR: Add module level attribution language-standard (PR #93159)
https://github.com/arsenm requested changes to this pull request. You cannot encode language standards in this. We should simply have different operations that provide the range of semantics and not make the IR modal. https://github.com/llvm/llvm-project/pull/93159
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{expression is not an integer constant expression}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/5, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}

arsenm wrote:

Test 0, -1, 3, 12, 16?

https://github.com/llvm/llvm-project/pull/93064
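A sketch of the requested cases, following the pattern above (the expected diagnostics assume the error/note wording from this revision):

```
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/0, /*offset=*/0, /*aux=*/0);  // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/-1, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/3, /*offset=*/0, /*aux=*/0);  // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/12, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
+  __builtin_amdgcn_global_load_lds(src, dst, /*size=*/16, /*offset=*/0, /*aux=*/0); // expected-error{{invalid size value}} expected-note {{size must be 1/2/4}}
```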
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target
+
+typedef unsigned int u32;
+
+void test_global_load_lds_unsupported_size(global u32* src, local u32 *dst, u32 size) {
+  __builtin_amdgcn_global_load_lds(src, dst, size, /*offset=*/0, /*aux=*/0); // expected-error{{size must be a constant}} expected-error{{cannot compile this builtin function yet}}

arsenm wrote:

Why is "cannot compile this builtin function yet" here?

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -2537,6 +2537,47 @@ static RValue EmitHipStdParUnsupportedBuiltin(CodeGenFunction *CGF,
   return RValue::get(CGF->Builder.CreateCall(UBF, Args));
 }
 
+static void buildInstrinsicCallArgs(CodeGenFunction &CGF, const CallExpr *E,

arsenm wrote:

Shouldn't need any CGBuiltin changes?

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
     return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+    SmallVector<Value *, 5> Args;
+    unsigned ICEArguments = 0;
+    ASTContext::GetBuiltinTypeError Error;
+    getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
+    assert(Error == ASTContext::GE_None && "Should not codegen an error");
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+    llvm::FunctionType *FTy = F->getFunctionType();
+    for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+      Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+      llvm::Type *PTy = FTy->getParamType(i);
+      if (PTy != ArgValue->getType()) {
+        if (auto *PtrTy = dyn_cast<llvm::PointerType>(PTy)) {
+          if (PtrTy->getAddressSpace() !=
+              ArgValue->getType()->getPointerAddressSpace()) {
+            ArgValue = Builder.CreateAddrSpaceCast(
+                ArgValue, llvm::PointerType::get(getLLVMContext(),
+                                                 PtrTy->getAddressSpace()));
+          }
+        }

arsenm wrote:

> Because the builtin can be used not only in OpenCL, I don't think it would be good to put it in SemaOpenCL.

But your test case is written in OpenCL. You can write a run line for other languages if really needed, but for this you don't really need it.

> @yxsamliu Do we have Sema for builtin?

Yes, everything does.

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx940 -S -verify -o - %s
+// REQUIRES: amdgpu-registered-target

arsenm wrote:

Test belongs in SemaOpenCL

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
     return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+    SmallVector<Value *, 5> Args;
+    unsigned ICEArguments = 0;
+    ASTContext::GetBuiltinTypeError Error;
+    getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
+    assert(Error == ASTContext::GE_None && "Should not codegen an error");
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+    llvm::FunctionType *FTy = F->getFunctionType();
+    for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+      Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+      llvm::Type *PTy = FTy->getParamType(i);
+      if (PTy != ArgValue->getType()) {
+        if (auto *PtrTy = dyn_cast<llvm::PointerType>(PTy)) {
+          if (PtrTy->getAddressSpace() !=
+              ArgValue->getType()->getPointerAddressSpace()) {
+            ArgValue = Builder.CreateAddrSpaceCast(
+                ArgValue, llvm::PointerType::get(getLLVMContext(),
+                                                 PtrTy->getAddressSpace()));
+          }
+        }
+        ArgValue = Builder.CreateBitCast(ArgValue, PTy);

arsenm wrote:

Should never have to create a pointer bitcast

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
     return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+    SmallVector<Value *, 5> Args;
+    unsigned ICEArguments = 0;
+    ASTContext::GetBuiltinTypeError Error;
+    getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
+    assert(Error == ASTContext::GE_None && "Should not codegen an error");
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+    llvm::FunctionType *FTy = F->getFunctionType();
+    for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+      Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+      llvm::Type *PTy = FTy->getParamType(i);
+      if (PTy != ArgValue->getType()) {
+        if (auto *PtrTy = dyn_cast<llvm::PointerType>(PTy)) {
+          if (PtrTy->getAddressSpace() !=
+              ArgValue->getType()->getPointerAddressSpace()) {
+            ArgValue = Builder.CreateAddrSpaceCast(
+                ArgValue, llvm::PointerType::get(getLLVMContext(),
+                                                 PtrTy->getAddressSpace()));
+          }
+        }

arsenm wrote:

You shouldn't have to adjust the codegen at all to emit a diagnostic

https://github.com/llvm/llvm-project/pull/93064
[clang] [llvm] [AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (PR #93064)
@@ -19040,6 +19040,48 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
     return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
+    SmallVector<Value *, 5> Args;
+    unsigned ICEArguments = 0;
+    ASTContext::GetBuiltinTypeError Error;
+    getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
+    assert(Error == ASTContext::GE_None && "Should not codegen an error");
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_global_load_lds);
+    llvm::FunctionType *FTy = F->getFunctionType();
+    for (unsigned i = 0, e = E->getNumArgs(); i != e; ++i) {
+      Value *ArgValue = EmitScalarOrConstFoldImmArg(ICEArguments, i, E);
+      llvm::Type *PTy = FTy->getParamType(i);
+      if (PTy != ArgValue->getType()) {
+        if (auto *PtrTy = dyn_cast<llvm::PointerType>(PTy)) {
+          if (PtrTy->getAddressSpace() !=
+              ArgValue->getType()->getPointerAddressSpace()) {
+            ArgValue = Builder.CreateAddrSpaceCast(
+                ArgValue, llvm::PointerType::get(getLLVMContext(),
+                                                 PtrTy->getAddressSpace()));
+          }
+        }
+        ArgValue = Builder.CreateBitCast(ArgValue, PTy);
+      }
+      Args.push_back(ArgValue);
+    }
+    constexpr const int SizeIdx = 2;
+    ConstantInt *SizeVal = dyn_cast<ConstantInt>(Args[SizeIdx]);
+    if (!SizeVal) {
+      CGM.Error(E->getExprLoc(), "size must be a constant");

arsenm wrote:

These should be emitted in Sema

https://github.com/llvm/llvm-project/pull/93064
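A minimal sketch of the Sema-side check instead, assuming it lives in the AMDGCN builtin checker (e.g. CheckAMDGCNBuiltinFunctionCall) and reuses the diagnostic IDs added earlier in this thread:

```cpp
case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
  constexpr unsigned SizeIdx = 2;
  Expr *ArgExpr = TheCall->getArg(SizeIdx);
  llvm::APSInt Size;
  // Reject non-constant sizes up front.
  if (VerifyIntegerConstantExpression(ArgExpr, &Size).isInvalid())
    return true;
  switch (Size.getSExtValue()) {
  case 1:
  case 2:
  case 4:
    return false; // valid size
  default:
    Diag(ArgExpr->getExprLoc(),
         diag::err_amdgcn_global_load_lds_size_invalid_value)
        << ArgExpr->getSourceRange();
    Diag(ArgExpr->getExprLoc(),
         diag::note_amdgcn_global_load_lds_size_valid_value)
        << ArgExpr->getSourceRange();
    return true;
  }
}
```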
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
@@ -678,6 +680,49 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
   bool runOnMachineFunction(MachineFunction &MF) override;
 };
 
+static const StringMap<SIAtomicAddrSpace> ASNames = {{
+    {"global", SIAtomicAddrSpace::GLOBAL},
+    {"local", SIAtomicAddrSpace::LDS},
+}};
+
+void diagnoseUnknownMMRAASName(const MachineInstr &MI, StringRef AS) {
+  const MachineFunction *MF = MI.getMF();
+  const Function &Fn = MF->getFunction();
+  std::string Str;
+  raw_string_ostream OS(Str);

arsenm wrote:

SmallString + raw_svector_ostream?

https://github.com/llvm/llvm-project/pull/78572
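The suggested change would look like this — a stack-backed buffer instead of a heap-allocated std::string:

```cpp
SmallString<128> Str;
raw_svector_ostream OS(Str);
```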
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/78572
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/78572
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
arsenm wrote:

> Then I guess the MMRA should just have "global" and "local" for now, we can always add more later if needed. What do you think?

Yes, we don't have specific image counters. They are just vmcnt.

https://github.com/llvm/llvm-project/pull/78572
[clang] [llvm] [AMDGPU][Clang] Builtin for GLOBAL_LOAD_LDS on GFX940 (PR #92962)
@@ -240,6 +240,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3UiiUi", "t", "gfx940-insts")

arsenm wrote:

clang should really be enforcing the valid immediate values for the size

https://github.com/llvm/llvm-project/pull/92962
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
arsenm wrote:

> I thought image memory = private. It's unclear to me, what AS does OpenCL IMAGE memory map to in our backend? (But otherwise, yes, MMRA should just have the backend names, the mapping of the OpenCL IMAGE to a backend AS should be in the device-lib)

Images are global memory with magical addressing and value interpretation on load/store. There's nothing private about them.

https://github.com/llvm/llvm-project/pull/78572
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
arsenm wrote:

> @arsenm Should we use `image` or `private`? We could allow both in the frontend, and only use `private` as the canonical MMRA.

I don't understand why image would imply private. I would just keep it as private throughout.

https://github.com/llvm/llvm-project/pull/78572
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -5433,7 +5450,16 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
                 ? Src0
                 : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
     Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
-    if (Src2.isValid()) {
+
+    if (IsPermLane16) {
+      Register Src1Cast =
+          MRI.getType(Src1).isScalar()
+              ? Src1
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);

arsenm wrote:

Like the other patch, shouldn't need any bitcasts

https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
     return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+    Intrinsic::ID IID;
+    IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16
+              ? Intrinsic::amdgcn_permlane16
+              : Intrinsic::amdgcn_permlanex16;
+
+    llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
+    llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
+    llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));

arsenm wrote:

I assume EmitScalarExpr handles the immargs correctly?

https://github.com/llvm/llvm-project/pull/92725
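For reference, a hedged sketch of the pattern other cases in EmitAMDGPUBuiltinExpr use for arguments the builtin declares as immediates (assumes ICEArguments was initialized via getContext().GetBuiltinType, as in those cases):

```cpp
// Folds the argument to a ConstantInt when the builtin marks it as an ICE,
// instead of emitting an arbitrary scalar expression.
llvm::Value *Src2 = EmitScalarOrConstFoldImmArg(ICEArguments, 2, E);
```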
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
     return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+    Intrinsic::ID IID;
+    IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16

arsenm wrote:

combine declare + define, also can sink down to use

https://github.com/llvm/llvm-project/pull/92725
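The combined declaration and initialization, sunk toward its first use, would read:

```cpp
Intrinsic::ID IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16
                        ? Intrinsic::amdgcn_permlane16
                        : Intrinsic::amdgcn_permlanex16;
```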
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/arsenm commented: On this and the previous, can you add a section to AMDGPUUsage for the intrinsics and what types they support? https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,192 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    // Already legal
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+    if (Src2.isValid()) {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+    }
+
+    Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOpDst);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;

arsenm wrote:

no auto

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,192 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    // Already legal
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+    if (Src2.isValid()) {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+    }
+
+    Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOpDst);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+    MachineInstrBuilder Src0Parts;
+
+    if (Ty.isPointer()) {
+      auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else if (Ty.isPointerVector()) {
+      LLT IntVecTy = Ty.changeElementType(
+          LLT::scalar(Ty.getElementType().getSizeInBits()));
+      auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else
+      Src0Parts =
+          IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+    switch (IID) {
+    case Intrinsic::amdgcn_readlane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+                 .addUse(Src0)
+                 .addUse(Src1))
+                .getReg(0));
+      }
+      break;
+    }
+    case Intrinsic::amdgcn_readfirstlane: {
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+                 .addUse(Src0)
+                 .getReg(0)));
+      }
+
+      break;
+    }
+    case Intrinsic::amdgcn_writelane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      Register Src2 = MI.getOperand(4).getReg();
+      MachineInstrBuilder Src2Parts;
+
+      if (Ty.isPointer()) {
+        auto PtrToInt = B.buildPtrToInt(S64, Src2);
+        Src2Parts = B.buildUnmerge(S32, PtrToInt);
+      } else if (Ty.isPointerVector()) {
+        LLT IntVecTy = Ty.changeElementType(
+            LLT::scalar(Ty.getElementType().getSizeInBits()));
+        auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2);
+        Src2Parts = B.buildUnmerge(S32, PtrToInt);
+      } else
+        Src2Parts =
+            IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2);

arsenm wrote:

The point of splitting out the pointer typed tests was to avoid fixing the handling of the pointer typed selection patterns. You still have the casts inserted here. You should not need any ptrtoint, inttoptr, or bitcasts in any of these legalizations.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm requested changes to this pull request. There should be no need to introduce same-sized value casts, whether bitcast or ptrtoint, in either legalizer. https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
                        DAG.getConstant(0, SL, MVT::i32),
                        DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+    return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
+            : Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+                   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+      IntrinsicID == Intrinsic::amdgcn_writelane) {
+    Src1 = N->getOperand(2);
+    if (IntrinsicID == Intrinsic::amdgcn_writelane)
+      Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+    // Already legal
+    return SDValue();
+  }
+
+  if (ValSize < 32) {
+    SDValue InitBitCast = DAG.getBitcast(IntVT, Src0);
+    Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32);
+    if (Src2.getNode()) {
+      SDValue Src2Cast = DAG.getBitcast(IntVT, Src2);

arsenm wrote:

You should not have any bitcasts anywhere

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,192 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    // Already legal
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+    if (Src2.isValid()) {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+    }
+
+    Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOpDst);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+    MachineInstrBuilder Src0Parts;
+
+    if (Ty.isPointer()) {
+      auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else if (Ty.isPointerVector()) {
+      LLT IntVecTy = Ty.changeElementType(
+          LLT::scalar(Ty.getElementType().getSizeInBits()));
+      auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else
+      Src0Parts =
+          IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+    switch (IID) {
+    case Intrinsic::amdgcn_readlane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+                 .addUse(Src0)
+                 .addUse(Src1))
+                .getReg(0));
+      }
+      break;
+    }
+    case Intrinsic::amdgcn_readfirstlane: {
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32})
+                 .addUse(Src0)
+                 .getReg(0)));
+      }
+
+      break;
+    }
+    case Intrinsic::amdgcn_writelane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      Register Src2 = MI.getOperand(4).getReg();
+      MachineInstrBuilder Src2Parts;
+
+      if (Ty.isPointer()) {
+        auto PtrToInt = B.buildPtrToInt(S64, Src2);
+        Src2Parts = B.buildUnmerge(S32, PtrToInt);
+      } else if (Ty.isPointerVector()) {
+        LLT IntVecTy = Ty.changeElementType(
+            LLT::scalar(Ty.getElementType().getSizeInBits()));
+        auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2);
+        Src2Parts = B.buildUnmerge(S32, PtrToInt);
+      } else
+        Src2Parts =
+            IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2);
+
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        Src2 = IsS16Vec ? B.buildBitcast(S32, Src2Parts.getReg(i)).getReg(0)
+                        : Src2Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_writelane, {S32})
+                 .addUse(Src0)
+
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,192 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    // Already legal
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0);
+    if (Src2.isValid()) {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+    }
+
+    Register LaneOpDst = createLaneOp(Src0, Src1, Src2);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOpDst);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16;
+    MachineInstrBuilder Src0Parts;
+
+    if (Ty.isPointer()) {
+      auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else if (Ty.isPointerVector()) {
+      LLT IntVecTy = Ty.changeElementType(
+          LLT::scalar(Ty.getElementType().getSizeInBits()));
+      auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0);
+      Src0Parts = B.buildUnmerge(S32, PtrToInt);
+    } else
+      Src0Parts =
+          IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0);
+
+    switch (IID) {
+    case Intrinsic::amdgcn_readlane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)
+                        : Src0Parts.getReg(i);
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+                 .addUse(Src0)
+                 .addUse(Src1))
+                .getReg(0));
+      }
+      break;
+    }
+    case Intrinsic::amdgcn_readfirstlane: {
+      for (unsigned i = 0; i < NumParts; ++i) {
+        Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0)

arsenm wrote:

No bitcasts

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
[clang] [OpenCL] Fix an infinite loop in builidng AddrSpaceQualType (PR #92612)
@@ -0,0 +1,25 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+//RUN: %clang_cc1 %s -emit-llvm -O1 -o - | FileCheck %s

arsenm wrote:

codegen tests need an explicit target

https://github.com/llvm/llvm-project/pull/92612
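For example, pinning the RUN line to a concrete triple (spir64 is an arbitrary choice here; any registered target works for -emit-llvm):

```
+// RUN: %clang_cc1 -triple spir64 %s -emit-llvm -O1 -o - | FileCheck %s
```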
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
                        DAG.getConstant(0, SL, MVT::i32),
                        DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue Src0, SDValue Src1, SDValue Src2,
+                          MVT VT) -> SDValue {

arsenm wrote:

VT shadow?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat<int_amdgcn_readfirstlane, VOP_READFIRSTLANE>.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

I'd rather just leave the pointer cases failing in the global isel case, and remove all the cast insertion bits. You could split out the pointer cases to a separate file and just not run it with globalisel.

https://github.com/llvm/llvm-project/pull/89217
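A sketch of the suggested test split (file name hypothetical): the pointer-typed cases go into their own file with only a SelectionDAG RUN line.

```
; llvm.amdgcn.readfirstlane.ptr.ll -- pointer-typed cases only
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s
; Intentionally no -global-isel RUN line until the selection patterns handle pointers.
```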
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat<int_amdgcn_readfirstlane, VOP_READFIRSTLANE>.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

I think GlobalISelEmitter is just ignoring PtrValueType

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat<int_amdgcn_readfirstlane, VOP_READFIRSTLANE>.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

Seems like this is just a GlobalISelEmitter bug (and another example of why we should have a way to write patterns that only care about the bit size)

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat<int_amdgcn_readfirstlane, VOP_READFIRSTLANE>.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

What is the problem?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -780,14 +780,22 @@ defm V_SUBREV_U32 : VOP2Inst <"v_subrev_u32", VOP_I32_I32_I32_ARITH, null_frag,
 // These are special and do not read the exec mask.
 let isConvergent = 1, Uses = [] in {
-def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE,
-  [(set i32:$vdst, (int_amdgcn_readlane i32:$src0, i32:$src1))]>;
+def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE, []>;
 
 let IsNeverUniform = 1, Constraints = "$vdst = $vdst_in", DisableEncoding="$vdst_in" in {
-def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE,
-  [(set i32:$vdst, (int_amdgcn_writelane i32:$src0, i32:$src1, i32:$vdst_in))]>;
+def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE, []>;
 } // End IsNeverUniform, $vdst = $vdst_in, DisableEncoding $vdst_in
 } // End isConvergent = 1
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadlane vt:$src0, i32:$src1)),
+    (V_READLANE_B32 VRegOrLdsSrc_32:$src0, SCSrc_b32:$src1)

arsenm wrote:

Same here, supply the type to the result instruction

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat<int_amdgcn_readfirstlane, VOP_READFIRSTLANE>.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

arsenm wrote:

```suggestion
    (vt (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0)))
```

This may fix your pattern type deduction issue.

https://github.com/llvm/llvm-project/pull/89217
[clang] [NFC][amdgpuarch] Correct file names in file header comments (PR #92294)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/92294
[clang] [llvm] [MC] Remove UseAssemblerInfoForParsing (PR #91082)
arsenm wrote:

> It's still used:
>
> ```
> /work/kparzysz/git/llvm.org/mlir/lib/Target/LLVM/ROCDL/Target.cpp: In member function 'std::optional<SmallVector<char, 0>> mlir::ROCDL::SerializeGPUModuleBase::assembleIsa(llvm::StringRef)':
> /work/kparzysz/git/llvm.org/mlir/lib/Target/LLVM/ROCDL/Target.cpp:302:15: error: 'class llvm::MCStreamer' has no member named 'setUseAssemblerInfoForParsing'
>   302 |   mcStreamer->setUseAssemblerInfoForParsing(true);
>       |               ^
> ```

But why? I don't know what business MLIR could possibly have touching this, for AMDGPU of all things.

https://github.com/llvm/llvm-project/pull/91082
[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/92232
[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)
@@ -1586,6 +1586,12 @@ class CodeGenModule : public CodeGenTypeCache {
   void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
                      bool IsDtorAttrFunc = false);
 
+  // Return whether structured convergence intrinsics should be generated for
+  // this target.
+  bool shouldEmitConvergenceTokens() const {
+    return getTriple().isSPIRVLogical();

arsenm wrote:

This doesn't have anything to do with what the target wants to do with the CFG lowering. Structurally the IR should not allow uncontrolled convergence, and that it exists is a wart that needs to exist until everywhere handles convergence tokens.

https://github.com/llvm/llvm-project/pull/88918
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,68 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
                        DAG.getConstant(0, SL, MVT::i32),
                        DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&](SDValue Src0, SDValue Src1, SDValue Src2,
+                          MVT VT) -> SDValue {
+    return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
+            : Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+                   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+      IntrinsicID == Intrinsic::amdgcn_writelane) {
+    Src1 = N->getOperand(2);
+    if (IntrinsicID == Intrinsic::amdgcn_writelane)
+      Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+    if (VT == MVT::i32)
+      // Already legal
+      return SDValue();
+    Src0 = DAG.getBitcast(IntVT, Src0);

arsenm wrote:

Like the other cases, we should be able to avoid intermediate casting

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -3400,7 +3400,7 @@ def : GCNPat<
 // FIXME: Should also do this for readlane, but tablegen crashes on
 // the ignored src1.
 def : GCNPat<
-  (int_amdgcn_readfirstlane (i32 imm:$src)),
+  (i32 (AMDGPUreadfirstlane (i32 imm:$src))),

arsenm wrote:

We might need to make this fold more sophisticated for other types, but that's best for a follow-up patch.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,212 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal

arsenm wrote:

Either add braces or don't put the comment under the if. Also, the size 32 case should be treated as directly legal for 32-bit pointers.

https://github.com/llvm/llvm-project/pull/89217
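Both halves of the comment together would look roughly like this (a sketch — comment moved above the if, and 32-bit pointers treated as directly legal):

```cpp
// Already legal for 32-bit scalars and 32-bit pointers; no casts needed.
if (Ty.isScalar() || Ty.isPointer())
  return true;
```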
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,212 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register Src0, Register Src1,
+                          Register Src2) -> Register {
+    auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane:
+      return LaneOp.getReg(0);
+    case Intrinsic::amdgcn_readlane:
+      return LaneOp.addUse(Src1).getReg(0);
+    case Intrinsic::amdgcn_writelane:
+      return LaneOp.addUse(Src1).addUse(Src2).getReg(0);
+    default:
+      llvm_unreachable("unhandled lane op");
+    }
+  };
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    auto IsPtr = Ty.isPointer();
+    Src0 = IsPtr ? B.buildPtrToInt(S32, Src0).getReg(0)
+                 : B.buildBitcast(S32, Src0).getReg(0);

arsenm wrote:

You should not need these casts. Any legal type should be directly accepted without these.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+    MachineInstrBuilder LaneOpDst;
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane: {
+      LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+      break;
+    }
+    case Intrinsic::amdgcn_readlane: {
+      LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+      break;
+    }
+    case Intrinsic::amdgcn_writelane: {
+      Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+      LaneOpDst = B.buildIntrinsic(IID, {S32})
+                      .addUse(Src0Valid)
+                      .addUse(Src1)
+                      .addUse(Src2Valid);
+    }
+    }
+
+    Register LaneOpDstReg = LaneOpDst.getReg(0);
+    B.buildBitcast(DstReg, LaneOpDstReg);
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+    MachineInstrBuilder LaneOpDst;
+    switch (IID) {
+    case Intrinsic::amdgcn_readfirstlane: {
+      LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid);
+      break;
+    }
+    case Intrinsic::amdgcn_readlane: {
+      LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1);
+      break;
+    }
+    case Intrinsic::amdgcn_writelane: {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+      LaneOpDst = B.buildIntrinsic(IID, {S32})
+                      .addUse(Src0Valid)
+                      .addUse(Src1)
+                      .addUse(Src2Valid);
+    }
+    }
+
+    Register LaneOpDstReg = LaneOpDst.getReg(0);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOpDstReg);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register, 2> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto Src0Parts = B.buildUnmerge(S32, Src0);

arsenm wrote:

You avoid adding extra intermediate bitcasts

https://github.com/llvm/llvm-project/pull/89217
[clang] [clang-tools-extra] [flang] [llvm] [mlir] [polly] [test]: fix filecheck annotation typos (PR #91854)
@@ -58,7 +58,7 @@
 CHECK-CNT3-NOT: {{^}}this is duplicate
 
 CHECK-CNT4-COUNT-5: this is duplicate
 CHECK-CNT4-EMPTY:
 
-Many-label:
+Many-LABEL:

arsenm wrote:

I would be careful about touching FileCheck tests. The point might be the wrong label.

https://github.com/llvm/llvm-project/pull/91854
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
@@ -4408,6 +4409,42 @@ Target-Specific Extensions
 Clang supports some language features conditionally on some targets.
 
+AMDGPU Language Extensions
+--------------------------
+
+__builtin_amdgcn_fence
+^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_fence`` emits a fence.
+
+* ``unsigned`` atomic ordering, e.g. ``__ATOMIC_ACQUIRE``
+* ``const char *`` synchronization scope, e.g. ``workgroup``
+* Zero or more ``const char *`` address spaces names.
+
+The address spaces arguments must be string literals with known values, such as:
+
+* ``"local"``
+* ``"global"``
+* ``"image"``
+
+If one or more address space name are provided, the code generator will attempt
+to emit potentially faster instructions that only fence those address spaces.
+Emitting such instructions may not always be possible and the compiler is free
+to fence more aggressively.
+
+If no address spaces names are provided, all address spaces are fenced.
+
+.. code-block:: c++
+
+  // Fence all address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");
+
+  // Fence only requested address spaces.
+  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")

arsenm wrote:

Not sure we can get away without image

https://github.com/llvm/llvm-project/pull/78572
[clang] [clang-tools-extra] [flang] [lld] [llvm] [mlir] [polly] [test]: fix filecheck annotation typos (PR #91854)
https://github.com/arsenm commented: amdgpu changes lgtm https://github.com/llvm/llvm-project/pull/91854 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ASAN] Add "sanitized_padded_global" llvm ir attribute to identify sanitizer instrumented globals (PR #68865)
arsenm wrote:
> > (You can even place `.quad sym[0].hash; .long sym[0].size` in a section
> > `SHF_LINK_ORDER` linking to the global variable for linker garbage
> > collection.)
> > The runtime can build a map correlating hashes to sizes, which can be used
> > to answer variable size queries.
>
> AMD language runtimes provide queries for the size of device global symbols
> and functions to copy data to and from device global variables. Currently,
> the runtime gets the needed information from the ELF symbol sizes in the
> symbol table. So, in #70166 we have come up with an approach of adding two
> symbols (at the same offset but with different sizes) for the same global:
> one symbol which reports the actual global size and the other which reports
> the instrumented size.

Have you looked into switching to the suggested approach of having a separately emitted field? https://github.com/llvm/llvm-project/pull/68865 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
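A hedged sketch of what that separately emitted field could look like in assembly; the section name, hash scheme, and field layout are invented for illustration (AMDGPU-style `//` comments):

    .section .asan_global_sizes,"ao",@progbits,my_global // "o" => SHF_LINK_ORDER,
                                                         // GC'd together with my_global
    .quad 0x9e3779b97f4a7c15   // hash identifying my_global (assumed scheme)
    .long 128                  // original, uninstrumented size in bytes

The runtime would then look up sizes by hash instead of relying on two overlapping ELF symbols per global.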
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -2176,26 +2176,23 @@
 def int_amdgcn_wave_reduce_umin : AMDGPUWaveReduce;
 def int_amdgcn_wave_reduce_umax : AMDGPUWaveReduce;

 def int_amdgcn_readfirstlane :
-  ClangBuiltin<"__builtin_amdgcn_readfirstlane">,
-  Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
+  Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
             [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

 // The lane argument must be uniform across the currently active threads of the
 // current wave. Otherwise, the result is undefined.
 def int_amdgcn_readlane :
-  ClangBuiltin<"__builtin_amdgcn_readlane">,
-  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
+  Intrinsic<[llvm_any_ty], [LLVMMatchType<0>, llvm_i32_ty],
             [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

 // The value to write and lane select arguments must be uniform across the
 // currently active threads of the current wave. Otherwise, the result is
 // undefined.
 def int_amdgcn_writelane :
-  ClangBuiltin<"__builtin_amdgcn_writelane">,
-  Intrinsic<[llvm_i32_ty], [
-    llvm_i32_ty,      // uniform value to write: returned by the selected lane
-    llvm_i32_ty,      // uniform lane select
-    llvm_i32_ty       // returned by all lanes other than the selected one
+  Intrinsic<[llvm_any_ty], [
+    LLVMMatchType<0>,// uniform value to write: returned by the selected lane
+    llvm_i32_ty,// uniform lane select
+    LLVMMatchType<0> // returned by all lanes other than the selected one
arsenm wrote: Comments are no longer aligned https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
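Realigned, the new operand list would read:

    LLVMMatchType<0>, // uniform value to write: returned by the selected lane
    llvm_i32_ty,      // uniform lane select
    LLVMMatchType<0>  // returned by all lanes other than the selected one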
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +B.buildBitcast(DstReg, LaneOpDstReg); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOpDstReg); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); arsenm wrote: For the multiple of `<2 x s16>`, it's a bit nicer to preserve the 16-bit element types https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +B.buildBitcast(DstReg, LaneOpDstReg); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? 
Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOpDstReg); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); + +switch (IID) { +case Intrinsic::amdgcn_readlane: { + Register Src1 = MI.getOperand(3).getReg(); + for (unsigned i = 0; i < NumParts; ++i) +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}) + .addUse(Src0Parts.getReg(i)) + .addUse(Src1)) +.getReg(0)); + break; +} +case Intrinsic::amdgcn_readfirstlane: { + + for (unsigned i = 0; i < NumParts; ++i) +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32}) + .addUse(Src0Parts.getReg(i))) +.getReg(0)); + + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src1 = MI.getOperand(3).getReg(); + Register Src2 = MI.getOperand(4).getReg(); + auto Src2Parts = B.buildUnmerge(S32, Src2); + + for (unsigned i = 0; i < NumParts; ++i) +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_writelane, {S32}) + .addUse(Src0Parts.getReg(i)) + .addUse(Src1) + .addUse(Src2Parts.getReg(i))) +.getReg(0)); +} +} + +if (Ty.isPointerVector()) { + auto MergedVec = B.buildMergeLikeInstr( + LLT::vector(ElementCount::getFixed(NumParts), S32), PartialRes); + B.buildBitcast(DstReg, MergedVec); arsenm wrote: You cannot bitcast from an i32 vector to a pointer. You can merge the i32 pieces directly into the pointer though https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
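A hedged sketch of the suggested fix, assuming 64-bit pointer elements (two s32 pieces per lane); the names follow the quoted code but the exact shape is a guess, not the author's implementation:

    // Merge each pointer lane directly from its two s32 pieces, then build
    // the vector of pointers; this avoids the illegal i32-vector -> pointer
    // G_BITCAST.
    LLT PtrTy = Ty.getElementType();
    SmallVector<Register, 4> PtrLanes;
    for (unsigned I = 0; I != NumParts / 2; ++I)
      PtrLanes.push_back(
          B.buildMergeLikeInstr(PtrTy, {PartialRes[2 * I], PartialRes[2 * I + 1]})
              .getReg(0));
    B.buildBuildVector(DstReg, PtrLanes);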
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); arsenm wrote: Missing break https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+    MachineInstrBuilder LaneOpDst;
+    switch (IID) {
arsenm wrote: This isn't quite what I meant; I meant that trying to handle all of these as if they were the same createLaneOp was ugly https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)
@@ -0,0 +1,111 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +kernel void test () { + + int sgpr = 0, vgpr = 0, imm = 0; + + // sgpr constraints + __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : ); + + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}" (imm) : ); + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exe" (imm) : ); // expected-error {{invalid input constraint '{exe' in asm}} + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec" (imm) : ); // expected-error {{invalid input constraint '{exec' in asm}} + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}a" (imm) : ); // expected-error {{invalid input constraint '{exec}a' in asm}} arsenm wrote: Or, maybe, we can use this to finally get them to eliminate the asm in the first place https://github.com/llvm/llvm-project/pull/89796 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)
@@ -0,0 +1,111 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable + +kernel void test () { + + int sgpr = 0, vgpr = 0, imm = 0; + + // sgpr constraints + __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : ); arsenm wrote: Missing the codegen tests for these? https://github.com/llvm/llvm-project/pull/89796 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)
@@ -0,0 +1,111 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -verify %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+kernel void test () {
+
+  int sgpr = 0, vgpr = 0, imm = 0;
+
+  // sgpr constraints
+  __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : );
+
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}" (imm) : );
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exe" (imm) : ); // expected-error {{invalid input constraint '{exe' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec" (imm) : ); // expected-error {{invalid input constraint '{exec' in asm}}
+  __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}a" (imm) : ); // expected-error {{invalid input constraint '{exec}a' in asm}}
arsenm wrote: If we really have to tolerate asm, I wonder if we can ban physical register references through SPIRV (other than maybe the handful of named special registers). That is, allow exec/m0/vcc and disallow any numbered registers. The exact register counts move around from target to target and end up exposing extra ABI https://github.com/llvm/llvm-project/pull/89796 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
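One hypothetical shape for such a restriction; the allowlist below is invented for illustration, not taken from the patch:

    // Accept only named special registers in SPIR-V flavoured inline asm
    // constraints; reject numbered physical registers such as {s1} or {v3},
    // whose counts vary between targets.
    static bool isAllowedSPIRVAsmRegister(StringRef Name) {
      return Name == "exec" || Name == "exec_lo" || Name == "exec_hi" ||
             Name == "vcc" || Name == "m0";
    }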
[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)
@@ -2658,21 +2676,102 @@ IGroupLPDAGMutation::invertSchedBarrierMask(SchedGroupMask Mask) const { return InvertedMask; } +void IGroupLPDAGMutation::addSchedGroupBarrierRules() { + + /// Whether or not the instruction has no true data predecessors + /// with opcode \p Opc. + class NoOpcDataPred : public InstructionRule { + protected: +unsigned Opc; + + public: +bool apply(const SUnit *SU, const ArrayRef Collection, + SmallVectorImpl ) override { + return !std::any_of( + SU->Preds.begin(), SU->Preds.end(), [this](const SDep ) { +return Pred.getKind() == SDep::Data && + Pred.getSUnit()->getInstr()->getOpcode() == Opc; + }); +} + +NoOpcDataPred(unsigned Opc, const SIInstrInfo *TII, unsigned SGID, + bool NeedsCache = false) +: InstructionRule(TII, SGID, NeedsCache), Opc(Opc) {} + }; + + /// Whether or not the instruction has no write after read predecessors + /// with opcode \p Opc. + class NoOpcWARPred final : public InstructionRule { + protected: +unsigned Opc; + + public: +bool apply(const SUnit *SU, const ArrayRef Collection, + SmallVectorImpl ) override { + return !std::any_of( + SU->Preds.begin(), SU->Preds.end(), [this](const SDep ) { +return Pred.getKind() == SDep::Anti && + Pred.getSUnit()->getInstr()->getOpcode() == Opc; + }); +} +NoOpcWARPred(unsigned Opc, const SIInstrInfo *TII, unsigned SGID, + bool NeedsCache = false) +: InstructionRule(TII, SGID, NeedsCache), Opc(Opc){}; + }; + + SchedGroupBarrierRuleCallBacks = { + [](unsigned SGID, const SIInstrInfo *TII) { +return std::make_shared(AMDGPU::V_CNDMASK_B32_e64, TII, arsenm wrote: There's basically no reason to ever use shared_ptr, something is wrong if it's necessary over unique_ptr https://github.com/llvm/llvm-project/pull/85304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
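The same callback with unique ownership, roughly (assuming the rule class name from the quoted hunk):

    [](unsigned SGID, const SIInstrInfo *TII) {
      return std::make_unique<NoOpcWARPred>(AMDGPU::V_CNDMASK_B32_e64, TII,
                                            SGID, /*NeedsCache=*/false);
    }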
[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/85304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)
@@ -1284,7 +1284,29 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 | ``// 5 MFMA``
 | ``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``
-  llvm.amdgcn.iglp_opt  An **experimental** intrinsic for instruction group level parallelism. The intrinsic
+  llvm.amdgcn.sched.group.barrier.rule  It has the same behavior as sched.group.barrier, except the intrinsic includes a fourth argument:
+
+  - RuleMask : The bitmask of rules which are applied to the SchedGroup.
+
+  The RuleMask is handled as a 64 bit integer, so 64 rules are encodable with a single mask.
+
+  Users can access the intrinsic by specifying the optional fourth argument in the sched_group_barrier builtin
+
+  | ``// 1 VMEM read invoking rules 1 and 2``
+  | ``__builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)``
+
+  Currently available rules are:
+  - 0x0000: No rule.
+  - 0x0001: Instructions in the SchedGroup must not write to the same register
+    that a previously occurring V_CNDMASK_B32_e64 reads from.
+  - 0x0002: Instructions in the SchedGroup must not write to the same register
+    that a previously occurring V_PERM_B32_e64 reads from.
+  - 0x0004: Instructions in the SchedGroup must require data produced by a
+    V_CNDMASK_B32_e64.
+  - 0x0008: Instructions in the SchedGroup must require data produced by a
+    V_PERM_B32_e64.
+
arsenm wrote: These scheduling rules seem way too specific. Especially that it's pointing out specific instruction encodings, by the internal pseudoinstruction names https://github.com/llvm/llvm-project/pull/85304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)
https://github.com/arsenm commented: I don't understand how anyone is supposed to use this. This is exposing extremely specific, random low level details of the scheduling. Users claim they want scheduling controls, but what they actually want is the scheduler to just do the right thing. We should spend more energy making the scheduler sensible by default, instead of creating all of this complexity. If we're going to have something like this, it needs to have predefined macros instead of expecting users to read magic mask values out of the documentation https://github.com/llvm/llvm-project/pull/85304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
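A hedged sketch of the predefined-macro idea; the macro names here are invented, not part of any existing header:

    #define __AMDGCN_SCHED_RULE_NONE           0x0000
    #define __AMDGCN_SCHED_RULE_NO_CNDMASK_WAR 0x0001
    #define __AMDGCN_SCHED_RULE_NO_PERM_WAR    0x0002

    // 1 VMEM read invoking rules 1 and 2, without bare magic numbers:
    __builtin_amdgcn_sched_group_barrier(
        32, 1, 0,
        __AMDGCN_SCHED_RULE_NO_CNDMASK_WAR | __AMDGCN_SCHED_RULE_NO_PERM_WAR);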
[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)
@@ -247,7 +247,7 @@ Address CodeGen::emitMergePHI(CodeGenFunction , Address Addr1, bool CodeGen::isEmptyField(ASTContext , const FieldDecl *FD, bool AllowArrays, bool AsIfNoUniqueAddr) { - if (FD->isUnnamedBitField()) + if (FD->isUnnamedBitfield()) arsenm wrote: Unrelated change pulled in? https://github.com/llvm/llvm-project/pull/89007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)
@@ -157,7 +157,7 @@ llvm::Value *CodeGen::emitRoundPointerUpToAlignment(CodeGenFunction , llvm::Value *RoundUp = CGF.Builder.CreateConstInBoundsGEP1_32( CGF.Builder.getInt8Ty(), Ptr, Align.getQuantity() - 1); return CGF.Builder.CreateIntrinsic( - llvm::Intrinsic::ptrmask, {Ptr->getType(), CGF.IntPtrTy}, +llvm::Intrinsic::ptrmask, {Ptr->getType(), CGF.IntPtrTy}, arsenm wrote: Spurious whitespace change? https://github.com/llvm/llvm-project/pull/89007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)
@@ -24,6 +24,7 @@
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
 MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
 MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
+MODULE_PASS("expand-variadics", ExpandVariadicsPass(ExpandVariadicsMode::Lowering))
arsenm wrote: Shouldn't need to list this in every target's PassRegistry; the generic one should be fine https://github.com/llvm/llvm-project/pull/89007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
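That is, a single entry in the generic llvm/lib/Passes/PassRegistry.def should suffice (constructor arguments assumed from the quoted target-specific line):

    MODULE_PASS("expand-variadics", ExpandVariadicsPass(ExpandVariadicsMode::Lowering))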
[clang] [Clang][HIP] Warn when __AMDGCN_WAVEFRONT_SIZE is used in host code (PR #91478)
@@ -0,0 +1,55 @@ +/*=== __clang_hip_device_macro_guards.h - guards for HIP device macros -=== + * + * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. + * See https://llvm.org/LICENSE.txt for license information. + * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception + * + *===---=== + */ + +/* + * WARNING: This header is intended to be directly -include'd by + * the compiler and is not supposed to be included by users. + * + */ + +#ifndef __CLANG_HIP_DEVICE_MACRO_GUARDS_H__ +#define __CLANG_HIP_DEVICE_MACRO_GUARDS_H__ + +#if __HIP__ +#if !defined(__HIP_DEVICE_COMPILE__) +// The __AMDGCN_WAVEFRONT_SIZE macros cannot hold meaningful values during host +// compilation as devices are not initialized when the macros are defined and +// there may indeed be devices with differing wavefront sizes in the same +// system. This code issues diagnostics when the macros are used in host code. + +#undef __AMDGCN_WAVEFRONT_SIZE +#undef __AMDGCN_WAVEFRONT_SIZE__ + +// Reference __hip_device_macro_guard in a way that is legal in preprocessor +// directives and does not affect the value so that appropriate diagnostics are +// issued. Function calls, casts, or the comma operator would make the macro +// illegal for use in preprocessor directives. +#define __AMDGCN_WAVEFRONT_SIZE (!__hip_device_macro_guard ? 64 : 64) +#define __AMDGCN_WAVEFRONT_SIZE__ (!__hip_device_macro_guard ? 64 : 64) + +// This function is referenced by the macro in device functions during host +// compilation, it SHOULD NOT cause a diagnostic. +__attribute__((device)) static constexpr int __hip_device_macro_guard(void) { + return -1; +} + +// This function is referenced by the macro in host functions during host +// compilation, it SHOULD cause a diagnostic. +__attribute__(( +host, deprecated("The __AMDGCN_WAVEFRONT_SIZE macros do not correspond " + "to the device(s) when used in host code and may only " + "be used in device code."))) static constexpr int arsenm wrote: I thought I saw some junk trying to support pre-C++11 HIP, is that a concern here? Is this macro defined in OpenMP? If so can we do the same thing? https://github.com/llvm/llvm-project/pull/91478 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
@@ -4408,6 +4409,42 @@ Target-Specific Extensions Clang supports some language features conditionally on some targets. +AMDGPU Language Extensions +-- + +__builtin_amdgcn_fence +^^ + +``__builtin_amdgcn_fence`` emits a fence. + +* ``unsigned`` atomic ordering, e.g. ``__ATOMIC_ACQUIRE`` +* ``const char *`` synchronization scope, e.g. ``workgroup`` +* Zero or more ``const char *`` address spaces names. + +The address spaces arguments must be string literals with known values, such as: + +* ``"local"`` +* ``"global"`` +* ``"image"`` + +If one or more address space name are provided, the code generator will attempt +to emit potentially faster instructions that only fence those address spaces. +Emitting such instructions may not always be possible and the compiler is free +to fence more aggressively. + +If no address spaces names are provided, all address spaces are fenced. + +.. code-block:: c++ + + // Fence all address spaces. + __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup"); + __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent"); + + // Fence only requested address spaces. + __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local") arsenm wrote: We randomly change between HSA and OpenCL terminology. Maybe we should call "local" "groupsegment"? I guess the ISA manuals call it "local data share" https://github.com/llvm/llvm-project/pull/78572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add amdgpu-as MMRA for fences (PR #78572)
@@ -1,22 +1,113 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4 // REQUIRES: amdgpu-registered-target // RUN: %clang_cc1 %s -emit-llvm -O0 -o - \ -// RUN: -triple=amdgcn-amd-amdhsa | opt -S | FileCheck %s +// RUN: -triple=amdgcn-amd-amdhsa | FileCheck %s +// CHECK-LABEL: define dso_local void @_Z25test_memory_fence_successv( +// CHECK-SAME: ) #[[ATTR0:[0-9]+]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:fence syncscope("workgroup") seq_cst +// CHECK-NEXT:fence syncscope("agent") acquire +// CHECK-NEXT:fence seq_cst +// CHECK-NEXT:fence syncscope("agent") acq_rel +// CHECK-NEXT:fence syncscope("workgroup") release +// CHECK-NEXT:ret void +// void test_memory_fence_success() { - // CHECK-LABEL: test_memory_fence_success - // CHECK: fence syncscope("workgroup") seq_cst __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup"); - // CHECK: fence syncscope("agent") acquire __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent"); - // CHECK: fence seq_cst __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, ""); - // CHECK: fence syncscope("agent") acq_rel __builtin_amdgcn_fence(4, "agent"); - // CHECK: fence syncscope("workgroup") release __builtin_amdgcn_fence(3, "workgroup"); } + +// CHECK-LABEL: define dso_local void @_Z10test_localv( +// CHECK-SAME: ) #[[ATTR0]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META3:![0-9]+]] +// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META3]] +// CHECK-NEXT:fence seq_cst, !mmra [[META3]] +// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META3]] +// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META3]] +// CHECK-NEXT:ret void +// +void test_local() { + __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "local"); + + __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "local"); + + __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "local"); + + __builtin_amdgcn_fence(4, "agent", "local"); + + __builtin_amdgcn_fence(3, "workgroup", "local"); +} + + +// CHECK-LABEL: define dso_local void @_Z11test_globalv( +// CHECK-SAME: ) #[[ATTR0]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META4:![0-9]+]] +// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META4]] +// CHECK-NEXT:fence seq_cst, !mmra [[META4]] +// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META4]] +// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META4]] +// CHECK-NEXT:ret void +// +void test_global() { + __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "global"); + + __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "global"); + + __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "global"); + + __builtin_amdgcn_fence(4, "agent", "global"); + + __builtin_amdgcn_fence(3, "workgroup", "global"); +} + +// CHECK-LABEL: define dso_local void @_Z10test_imagev( +// CHECK-SAME: ) #[[ATTR0]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META5:![0-9]+]] +// CHECK-NEXT:fence syncscope("agent") acquire, !mmra [[META5]] +// CHECK-NEXT:fence seq_cst, !mmra [[META5]] +// CHECK-NEXT:fence syncscope("agent") acq_rel, !mmra [[META5]] +// CHECK-NEXT:fence syncscope("workgroup") release, !mmra [[META5]] +// CHECK-NEXT:ret void +// +void test_image() { + __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image"); + + __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent", "image"); + + __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "", "image"); + + __builtin_amdgcn_fence(4, "agent", "image"); + + __builtin_amdgcn_fence(3, 
"workgroup", "image"); +} + +// CHECK-LABEL: define dso_local void @_Z10test_mixedv( +// CHECK-SAME: ) #[[ATTR0]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META6:![0-9]+]] +// CHECK-NEXT:fence syncscope("workgroup") seq_cst, !mmra [[META7:![0-9]+]] +// CHECK-NEXT:ret void +// +void test_mixed() { + __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image", "global"); + __builtin_amdgcn_fence( __ATOMIC_SEQ_CST, "workgroup", "image", "local", "global"); +} arsenm wrote: Maybe test repeated AS name https://github.com/llvm/llvm-project/pull/78572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [modules] Accept equivalent module caches from different symlink (PR #90925)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/90925 >From 4760ebce0ff7725f4bb75f5107f551d867e4db6d Mon Sep 17 00:00:00 2001 From: Ellis Hoag Date: Thu, 2 May 2024 17:47:38 -0700 Subject: [PATCH 1/4] [modules] Accept equivalent module caches from different symlink Use `fs::equivalent()`, which follows symlinks, to check if two module cache paths are equivalent. This prevents a PCH error when building from a different path that is a symlink of the original. ``` error: PCH was compiled with module cache path '/home/foo/blah/ModuleCache/2IBP1TNT8OR8D', but the path is currently '/data/users/foo/blah/ModuleCache/2IBP1TNT8OR8D' 1 error generated. ``` --- clang/lib/Serialization/ASTReader.cpp | 20 +--- clang/test/Modules/module-symlink.m | 11 +++ 2 files changed, 20 insertions(+), 11 deletions(-) create mode 100644 clang/test/Modules/module-symlink.m diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp index 0ef57a3ea804ef..c20ead8b865692 100644 --- a/clang/lib/Serialization/ASTReader.cpp +++ b/clang/lib/Serialization/ASTReader.cpp @@ -839,17 +839,15 @@ static bool checkHeaderSearchOptions(const HeaderSearchOptions , DiagnosticsEngine *Diags, const LangOptions , const PreprocessorOptions ) { - if (LangOpts.Modules) { -if (SpecificModuleCachePath != ExistingModuleCachePath && -!PPOpts.AllowPCHWithDifferentModulesCachePath) { - if (Diags) -Diags->Report(diag::err_pch_modulecache_mismatch) - << SpecificModuleCachePath << ExistingModuleCachePath; - return true; -} - } - - return false; + if (!LangOpts.Modules || PPOpts.AllowPCHWithDifferentModulesCachePath || + SpecificModuleCachePath == ExistingModuleCachePath || + llvm::sys::fs::equivalent(SpecificModuleCachePath, +ExistingModuleCachePath)) +return false; + if (Diags) +Diags->Report(diag::err_pch_modulecache_mismatch) +<< SpecificModuleCachePath << ExistingModuleCachePath; + return true; } bool PCHValidator::ReadHeaderSearchOptions(const HeaderSearchOptions , diff --git a/clang/test/Modules/module-symlink.m b/clang/test/Modules/module-symlink.m new file mode 100644 index 00..be447449a0e81e --- /dev/null +++ b/clang/test/Modules/module-symlink.m @@ -0,0 +1,11 @@ +// RUN: rm -rf %t +// RUN: %clang_cc1 -fmodules-cache-path=%t/modules -fmodules -fimplicit-module-maps -I %S/Inputs -emit-pch -o %t.pch %s -verify + +// RUN: ln -s %t/modules %t/modules.symlink +// RUN: %clang_cc1 -fmodules-cache-path=%t/modules.symlink -fmodules -fimplicit-module-maps -I %S/Inputs -include-pch %t.pch %s -verify + +// expected-no-diagnostics + +@import ignored_macros; + +struct Point p; >From 490eefe98e3dd020ff3e51c7f817ec2b3d3a2663 Mon Sep 17 00:00:00 2001 From: Ellis Hoag Date: Fri, 3 May 2024 09:50:11 -0700 Subject: [PATCH 2/4] Require shell to fix windows test --- clang/test/Modules/module-symlink.m | 2 ++ 1 file changed, 2 insertions(+) diff --git a/clang/test/Modules/module-symlink.m b/clang/test/Modules/module-symlink.m index be447449a0e81e..9a69186c5ea28f 100644 --- a/clang/test/Modules/module-symlink.m +++ b/clang/test/Modules/module-symlink.m @@ -1,3 +1,5 @@ +// REQUIRES: shell + // RUN: rm -rf %t // RUN: %clang_cc1 -fmodules-cache-path=%t/modules -fmodules -fimplicit-module-maps -I %S/Inputs -emit-pch -o %t.pch %s -verify >From 6e58177107f854f42d3cdc70e796c425a1797798 Mon Sep 17 00:00:00 2001 From: Ellis Hoag Date: Fri, 3 May 2024 10:34:35 -0700 Subject: [PATCH 3/4] Use VFS to check if files are equal --- clang/lib/Serialization/ASTReader.cpp | 25 +++ clang/test/Modules/module-symlink.m | 1 + 
llvm/include/llvm/Support/VirtualFileSystem.h | 4 +++ llvm/lib/Support/VirtualFileSystem.cpp| 10 4 files changed, 29 insertions(+), 11 deletions(-) diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp index c20ead8b865692..d35c870926f96e 100644 --- a/clang/lib/Serialization/ASTReader.cpp +++ b/clang/lib/Serialization/ASTReader.cpp @@ -833,16 +833,18 @@ bool SimpleASTReaderListener::ReadPreprocessorOptions( /// against the header search options in an existing preprocessor. /// /// \param Diags If non-null, produce diagnostics for any mismatches incurred. -static bool checkHeaderSearchOptions(const HeaderSearchOptions , +static bool checkHeaderSearchOptions(llvm::vfs::FileSystem , StringRef SpecificModuleCachePath, StringRef ExistingModuleCachePath, DiagnosticsEngine *Diags, const LangOptions , const PreprocessorOptions ) { if
[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)
@@ -1586,6 +1586,12 @@ class CodeGenModule : public CodeGenTypeCache {
   void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
                      bool IsDtorAttrFunc = false);
+  // Return whether structured convergence intrinsics should be generated for
+  // this target.
+  bool shouldEmitConvergenceTokens() const {
+    return getTriple().isSPIRVLogical();
arsenm wrote: Should add a TODO this should just be unconditional in the future https://github.com/llvm/llvm-project/pull/88918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
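With the suggested TODO folded in, the accessor might read:

    // Return whether structured convergence intrinsics should be generated
    // for this target.
    bool shouldEmitConvergenceTokens() const {
      // TODO: Make this unconditional once every target can consume
      // convergence tokens.
      return getTriple().isSPIRVLogical();
    }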
[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/88918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][SPIR-V] Always add convergence intrinsics (PR #88918)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/88918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -504,3 +508,16 @@ def AMDGPUdiv_fmas : PatFrags<(ops node:$src0, node:$src1, node:$src2, node:$vcc def AMDGPUperm : PatFrags<(ops node:$src0, node:$src1, node:$src2), [(int_amdgcn_perm node:$src0, node:$src1, node:$src2), (AMDGPUperm_impl node:$src0, node:$src1, node:$src2)]>; + +def AMDGPUreadlane : PatFrags<(ops node:$src0, node:$src1), + [(int_amdgcn_readlane node:$src0, node:$src1), + (AMDGPUreadlane_impl node:$src0, node:$src1)]>; + +def AMDGPUreadfirstlane : PatFrags<(ops node:$src), + [(int_amdgcn_readfirstlane node:$src), + (AMDGPUreadfirstlane_impl node:$src)]>; + +def AMDGPUwritelane : PatFrags<(ops node:$src0, node:$src1, node:$src2), + [(int_amdgcn_writelane node:$src0, node:$src1, node:$src2), + (AMDGPUwritelane_impl node:$src0, node:$src1, node:$src2)]>; + arsenm wrote: Missing newline at end of file https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register , Register , arsenm wrote: I think this helper is just making things more confusing. You can just handle the 3 cases separately with unmerge logic https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register , Register , + Register ) -> Register { +auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0); +if (Src2.isValid()) + return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0); +if (Src1.isValid()) + return (LaneOpDst.addUse(Src1)).getReg(0); +return LaneOpDst.getReg(0); + }; + + Register Src1, Src2, Src0Valid, Src2Valid; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2Valid = B.buildBitcast(S32, Src2).getReg(0); +Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid); +B.buildBitcast(DstReg, LaneOp); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +if (Src2.isValid()) { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); +} +Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOp); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); + +switch (IID) { +case Intrinsic::amdgcn_readlane: { + Register Src1 = MI.getOperand(3).getReg(); + for (unsigned i = 0; i < NumParts; ++i) +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}) + .addUse(Src0Parts.getReg(i)) + .addUse(Src1)) +.getReg(0)); arsenm wrote: We should really add a buildIntrinsic overload that just takes the array of inputs like for other instructions https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
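A sketch of the overload being wished for; this signature does not exist today and is only illustrative:

    MachineInstrBuilder buildIntrinsic(Intrinsic::ID ID, ArrayRef<DstOp> Results,
                                       ArrayRef<SrcOp> Ops);

    // The loop body above would then shrink to roughly:
    PartialRes.push_back(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32},
                                          {Src0Parts.getReg(i), Src1})
                             .getReg(0));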
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5982,6 +5982,68 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&](SDValue , SDValue , SDValue , arsenm wrote: SDValue should be passed by value https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
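That is, SDValue is a cheap value type, so the lambda parameters should simply be:

    auto createLaneOp = [&](SDValue Src0, SDValue Src1, SDValue Src2,
                            MVT VT) -> SDValue {
      return Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
             : Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
                    : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0});
    };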
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register , Register , + Register ) -> Register { arsenm wrote: Register should be passed by value https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register , Register , + Register ) -> Register { +auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0); +if (Src2.isValid()) + return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0); +if (Src1.isValid()) + return (LaneOpDst.addUse(Src1)).getReg(0); arsenm wrote: Extra parentheses around this https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
@@ -0,0 +1,25 @@ +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -disable-O0-optnone -emit-llvm \ +// RUN: %s -o - | opt -S -passes=mem2reg | FileCheck %s + +// CHECK-LABEL: define dso_local half @test_convert_from_bf16_to_fp16( +// CHECK-SAME: bfloat noundef [[A:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:[[FPEXT:%.*]] = fpext bfloat [[A]] to float +// CHECK-NEXT:[[FPTRUNC:%.*]] = fptrunc float [[FPEXT]] to half +// CHECK-NEXT:ret half [[FPTRUNC]] +// +_Float16 test_convert_from_bf16_to_fp16(__bf16 a) { +return (_Float16)a; +} + +// CHECK-LABEL: define dso_local bfloat @test_convert_from_fp16_to_bf16( +// CHECK-SAME: half noundef [[A:%.*]]) #[[ATTR0]] { +// CHECK-NEXT: entry: +// CHECK-NEXT:[[FPEXT:%.*]] = fpext half [[A]] to float +// CHECK-NEXT:[[FPTRUNC:%.*]] = fptrunc float [[FPEXT]] to bfloat +// CHECK-NEXT:ret bfloat [[FPTRUNC]] +// +__bf16 test_convert_from_fp16_to_bf16(_Float16 a) { +return (__bf16)a; +} + arsenm wrote: I think these tests need to be additive. The vector behavior seems to be different between standard C and the proper vector languages? https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
arsenm wrote:
> ping
> ping

Do you have another review comment? This has now confused me. You should roll back to the case where you only changed the scalar behavior. Any vector behavior change should be a separate PR, if that is even correct. I would still like to know what the gcc behavior is in this case https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU (PR #90994)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/90994 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -5386,6 +5386,94 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) +return true; + + if (Size < 32) { +auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0); +auto LaneOpDst = +B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext); +if (IID == Intrinsic::amdgcn_readlane || +IID == Intrinsic::amdgcn_writelane) { + auto Src1 = MI.getOperand(3).getReg(); + LaneOpDst = LaneOpDst.addUse(Src1); + if (IID == Intrinsic::amdgcn_writelane) { +auto Src2 = MI.getOperand(4).getReg(); +auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); +LaneOpDst = LaneOpDst.addUse(Ext2); + } +} +B.buildTrunc(DstReg, LaneOpDst).getReg(0); arsenm wrote: The .getReg(0) does nothing here https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -504,3 +508,15 @@ def AMDGPUdiv_fmas : PatFrags<(ops node:$src0, node:$src1, node:$src2, node:$vcc def AMDGPUperm : PatFrags<(ops node:$src0, node:$src1, node:$src2), [(int_amdgcn_perm node:$src0, node:$src1, node:$src2), (AMDGPUperm_impl node:$src0, node:$src1, node:$src2)]>; + +def AMDGPUreadlane : PatFrags<(ops node:$src0, node:$src1), + [(int_amdgcn_readlane node:$src0, node:$src1), + (AMDGPUreadlane_impl node:$src0, node:$src1)]>; + +def AMDGPUreadfirstlane : PatFrags<(ops node:$src), + [(int_amdgcn_readfirstlane node:$src), + (AMDGPUreadfirstlane_impl node:$src)]>; + +def AMDGPUwritelane : PatFrags<(ops node:$src0, node:$src1, node:$src2), + [(int_amdgcn_writelane node:$src0, node:$src1, node:$src2), + (AMDGPUwritelane_impl node:$src0, node:$src1, node:$src2)]>; arsenm wrote: Missing newline end of file https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -5386,6 +5386,94 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) +return true; + + if (Size < 32) { +auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0); +auto LaneOpDst = +B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext); +if (IID == Intrinsic::amdgcn_readlane || +IID == Intrinsic::amdgcn_writelane) { + auto Src1 = MI.getOperand(3).getReg(); + LaneOpDst = LaneOpDst.addUse(Src1); + if (IID == Intrinsic::amdgcn_writelane) { +auto Src2 = MI.getOperand(4).getReg(); +auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); +LaneOpDst = LaneOpDst.addUse(Ext2); + } +} +B.buildTrunc(DstReg, LaneOpDst).getReg(0); + } else if ((Size % 32) == 0) { +SmallVector Src0Parts, PartialRes; +unsigned NumParts = Size / 32; +auto WideReg = MRI.createGenericVirtualRegister(LLT::scalar(NumParts * 32)); +for (unsigned i = 0; i < NumParts; ++i) { + Src0Parts.push_back(MRI.createGenericVirtualRegister(S32)); +} + +B.buildUnmerge(Src0Parts, Src0); arsenm wrote: buildUnmerge should handle all of this for you if you just pass the scalar type https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
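For reference, a sketch of the simplification: buildUnmerge with a scalar result type computes the part count and creates the result registers itself, so the manual createGenericVirtualRegister loop (and the unused WideReg) can go away.

    auto Unmerge = B.buildUnmerge(LLT::scalar(32), Src0);
    // One def per 32-bit part; the final operand is the source.
    for (unsigned I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
      Src0Parts.push_back(Unmerge.getReg(I));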
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -6091,6 +5982,70 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + auto VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&](SDValue , SDValue , SDValue , + MVT VT) -> SDValue { +return (Src2.getNode() arsenm wrote: Don't need .getNode for boolean test https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -5386,6 +5386,94 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper , return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper , + MachineInstr , + Intrinsic::ID IID) const { + + MachineIRBuilder = Helper.MIRBuilder; + MachineRegisterInfo = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) +return true; + + if (Size < 32) { +auto Ext = B.buildAnyExt(LLT::scalar(32), Src0).getReg(0); +auto LaneOpDst = +B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}).addUse(Ext); +if (IID == Intrinsic::amdgcn_readlane || +IID == Intrinsic::amdgcn_writelane) { + auto Src1 = MI.getOperand(3).getReg(); + LaneOpDst = LaneOpDst.addUse(Src1); + if (IID == Intrinsic::amdgcn_writelane) { +auto Src2 = MI.getOperand(4).getReg(); +auto Ext2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); +LaneOpDst = LaneOpDst.addUse(Ext2); + } +} +B.buildTrunc(DstReg, LaneOpDst).getReg(0); arsenm wrote: Should just early exit at this point https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -6091,6 +5982,70 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering , SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, + SelectionDAG ) { + auto VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&](SDValue , SDValue , SDValue , + MVT VT) -> SDValue { +return (Src2.getNode() +? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1.getNode() +? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) +: DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2, Src0Valid, Src2Valid; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +if (VT == MVT::i32) + // Already legal + return SDValue(); +Src0Valid = DAG.getBitcast(IntVT, Src0); +if (Src2.getNode()) + Src2Valid = DAG.getBitcast(IntVT, Src2); +auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, MVT::i32); +return DAG.getBitcast(VT, LaneOp); + } + + if (ValSize < 32) { +auto InitBitCast = DAG.getBitcast(IntVT, Src0); +Src0Valid = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32); +if (Src2.getNode()) { + auto Src2Cast = DAG.getBitcast(IntVT, Src2); + Src2Valid = DAG.getAnyExtOrTrunc(Src2Cast, SL, MVT::i32); +} +auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, MVT::i32); +auto Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return DAG.getBitcast(VT, Trunc); + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); +Src0Valid = DAG.getBitcast(VecVT, Src0); + +if (Src2.getNode()) + Src2Valid = DAG.getBitcast(VecVT, Src2); + +auto LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid, VecVT); +auto UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode()); arsenm wrote: no autos https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add OpenCL-specific fence address space masks (PR #78572)
arsenm wrote: > I'm now wondering if adding a new builtin is needed at all, or if it should > just be part of the original builtin? It's an additive change. Maybe? > > Should we also rename the MMRA to `amdgpu-fence-as` (remove OpenCL from the > name) ? > I definitely do not want to maintain any language names in anything https://github.com/llvm/llvm-project/pull/78572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [IR] Add getelementptr nusw and nuw flags (PR #90824)
@@ -316,3 +316,82 @@ define <2 x i32> @test_trunc_both_reversed_vector(<2 x i64> %a) {
   %res = trunc nsw nuw <2 x i64> %a to <2 x i32>
   ret <2 x i32> %res
 }
+
+define ptr @gep_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_inbounds_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nusw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nusw i8, ptr %p, i64 %idx
+  %gep = getelementptr nusw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+; inbounds implies nusw, so the flag is not printed back.
+define ptr @gep_inbounds_nusw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nusw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr nusw nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nusw nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_inbounds_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr inbounds nusw nuw i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @gep_nuw_nusw_inbounds(ptr %p, i64 %idx) {
+; CHECK: %gep = getelementptr inbounds nuw i8, ptr %p, i64 %idx
+  %gep = getelementptr nuw nusw inbounds i8, ptr %p, i64 %idx
+  ret ptr %gep
+}
+
+define ptr @const_gep_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_inbounds_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nusw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nusw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nusw (i8, ptr @addr, i64 100)
+}
+
+; inbounds implies nusw, so the flag is not printed back.
+define ptr @const_gep_inbounds_nusw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nusw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr nusw nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nusw nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_inbounds_nusw_nuw(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr inbounds nusw nuw (i8, ptr @addr, i64 100)
+}
+
+define ptr @const_gep_nuw_nusw_inbounds(ptr %p, i64 %idx) {
+; CHECK: ret ptr getelementptr inbounds nuw (i8, ptr @addr, i64 100)
+  ret ptr getelementptr nuw nusw inbounds (i8, ptr @addr, i64 100)
+}

arsenm wrote:

Maybe test non-0 AS and vectors?

https://github.com/llvm/llvm-project/pull/90824
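[Editor's note: a sketch of what that extra coverage could look like, assuming the flags print the same way as in the scalar tests above; function names and CHECK lines are illustrative, not from the patch.]

define ptr addrspace(1) @gep_nuw_as1(ptr addrspace(1) %p, i64 %idx) {
; CHECK: %gep = getelementptr nuw i8, ptr addrspace(1) %p, i64 %idx
  %gep = getelementptr nuw i8, ptr addrspace(1) %p, i64 %idx
  ret ptr addrspace(1) %gep
}

define <2 x ptr> @gep_nusw_nuw_vec(<2 x ptr> %p, <2 x i64> %idx) {
; CHECK: %gep = getelementptr nusw nuw i8, <2 x ptr> %p, <2 x i64> %idx
  %gep = getelementptr nusw nuw i8, <2 x ptr> %p, <2 x i64> %idx
  ret <2 x ptr> %gep
}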
[clang] Clean up denormal handling with -ffp-model, -ffast-math, etc. (PR #89477)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89477
[clang] Clean up denormal handling with -ffp-model, -ffast-math, etc. (PR #89477)
@@ -1462,6 +1460,14 @@ floating point semantic models: precise (the default), strict, and fast.
   "allow_approximate_fns", "off", "off", "on"
   "allow_reassociation", "off", "off", "on"
 
+The ``-ffp-model`` option does not modify the "fdenormal-fp-math" or
+"fdenormal-fp-math-f32" settings, but it does have an impact on whether

arsenm wrote:

IIRC denormal-fp-math-f32 is only a cc1 flag not exposed to end users

https://github.com/llvm/llvm-project/pull/89477
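[Editor's note: if that recollection is right, an end user who wants to change the f32 denormal mode would have to forward the flag to cc1 themselves, along these lines; the exact option spelling is assumed from the docs text above, and -Xclang forwarding of cc1-only options is unsupported interface.]

clang -O2 -Xclang -fdenormal-fp-math-f32=preserve-sign -c foo.c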
[clang] [llvm] [AMDGPU] Add OpenCL-specific fence address space masks (PR #78572)
@@ -18319,6 +18320,26 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
   return nullptr;
 }
 
+void CodeGenFunction::AddAMDGCNAddressSpaceMMRA(llvm::Instruction *Inst,
+                                                llvm::Value *ASMask) {
+  constexpr const char *Tag = "opencl-fence-mem";
+
+  uint64_t Mask = cast<llvm::ConstantInt>(ASMask)->getZExtValue();
+  if (Mask == 0)
+    return;
+
+  // 3 bits can be set: local, global, image in that order.
+  LLVMContext &Ctx = Inst->getContext();
+  SmallVector<MMRAMetadata::TagT, 3> MMRAs;
+  if (Mask & (1 << 0))

arsenm wrote:

Space separated is weird. I meant more
__builtin_amdgcn_something_fence("somesyncscope", ordering, "addrspace0", "addrspace1", ...)

https://github.com/llvm/llvm-project/pull/78572
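[Editor's note: spelled out as a call, the shape arsenm proposes would look like the line below. The builtin name is his placeholder and does not exist; the scope and address-space strings are illustrative only.]

// Hypothetical builtin per the review comment, not a real API:
// one string argument per address space instead of a single
// space-separated mask string.
__builtin_amdgcn_something_fence("workgroup", __ATOMIC_SEQ_CST, "local", "global");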