pengfei added inline comments.
================
Comment at: llvm/include/llvm/CodeGen/Passes.h:496
+
+  /// The pass transform amx intrinsics to scalar operation if the function has
+  /// optnone attribute or it is O0.
----------------
transforms

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:1
+//===- llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp ----------------------===//
+//
----------------
We usually write the header comment as `//===--- filename - description ---===//`. See `head -n1 llvm/lib/Target/X86/*.cpp`.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:51
+  BasicBlock *Header = BasicBlock::Create(
+      Preheader->getContext(), Name + ".header", Preheader->getParent(), Exit);
+  BasicBlock *Body = BasicBlock::Create(Header->getContext(), Name + ".body",
----------------
Ctx

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:88
+
+template <Intrinsic::ID IntrID,
+          typename = typename std::enable_if<
----------------
Can we just use `template <bool IsLoad>`? I think it can also reduce the branching.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:99
+  Loop *RowLoop = LI.AllocateLoop();
+  Loop *ColLoop = LI.AllocateLoop();
+  RowLoop->addChildLoop(ColLoop);
----------------
I'm not sure about the arithmetic intrinsics, but at least for the load and store intrinsics we can use the LLVM intrinsics `llvm.masked.load/store` to eliminate the inner loop.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:166
+  Value *Vec = nullptr;
+  if (auto *BitCast = dyn_cast<BitCastInst>(Tile))
+    Vec = BitCast->getOperand(0);
----------------
Maybe we can just use `cast` so that a mismatch raises the assertion.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:223
+  Value *VecC, *VecA, *VecB;
+  if (auto *BitCast = dyn_cast<BitCastInst>(Acc))
+    VecC = BitCast->getOperand(0);
----------------
You can use `cast` to check for the failure so that VecA/B/C won't be left uninitialized.
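To illustrate the `cast` suggestion above, here is a minimal standalone sketch of the contract, using simplified stand-in types rather than the real `llvm::Value`/`llvm::BitCastInst` hierarchy: `dyn_cast<>` returns null on a type mismatch (so a guarded assignment can be silently skipped), while `cast<>` asserts, so the result variable can never be read uninitialized.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for llvm::Value / llvm::BitCastInst; the real
// llvm::dyn_cast<> / llvm::cast<> behave analogously on the class hierarchy.
struct Value {
  bool IsBitCast;
  Value *Operand; // the bitcast source operand, when IsBitCast is true
};

// dyn_cast<>-style: returns nullptr on a mismatch, so the caller's result
// variable (VecA/B/C in the patch) may stay uninitialized if the `if` is
// silently skipped.
Value *dynCastBitCastOperand(Value *V) {
  return V->IsBitCast ? V->Operand : nullptr;
}

// cast<>-style: asserts on a mismatch, so the failure is caught at the cast
// itself and the result is never null.
Value *castBitCastOperand(Value *V) {
  assert(V->IsBitCast && "expected a bitcast");
  return V->Operand;
}
```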
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:229
+  // to vector. However with -O0, it doesn't happen.
+  if (auto *BitCast = dyn_cast<BitCastInst>(LHS))
+    VecA = BitCast->getOperand(0);
----------------
ditto

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:231
+    VecA = BitCast->getOperand(0);
+  assert(VecA->getType()->isVectorTy() && "bitcast from non-v256i32 to x86amx");
+  if (auto *BitCast = dyn_cast<BitCastInst>(RHS))
----------------
Should this check that it is v256i32?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:232
+  assert(VecA->getType()->isVectorTy() && "bitcast from non-v256i32 to x86amx");
+  if (auto *BitCast = dyn_cast<BitCastInst>(RHS))
+    VecB = BitCast->getOperand(0);
----------------
ditto

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:288
+  //   %acc = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %131)
+  //   %neweltc = add i32 %elt, %acc
+  //   %NewVecC = insertelement <256 x i32> %vec.c.inner.phi, i32 %neweltc,
----------------
eltc?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:311
+    Value *ResElt = B.CreateAdd(EltC, SubVecR);
+    Value *NewVecC = B.CreateInsertElement(VecCPhi, ResElt, IdxC);
+    Value *NewVecD = B.CreateInsertElement(VecDPhi, ResElt, IdxC);
----------------
Is it necessary to insert ResElt into VecC?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:340
+                IntrID == Intrinsic::x86_tilestored64_internal>::type>
+  bool lowerTileLoadStore(Instruction *TileLoad);
+  bool lowerTileLoad(Instruction *TileLoad);
----------------
TileLoadStore

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:341
+  bool lowerTileLoadStore(Instruction *TileLoad);
+  bool lowerTileLoad(Instruction *TileLoad);
+  bool lowerTileDPBSSD(Instruction *TileDPBSSD);
----------------
Forgot to remove?
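As a minimal sketch of the earlier `template <bool IsLoad>` suggestion: a single bool template parameter could replace the `Intrinsic::ID` plus `std::enable_if` machinery, and the load/store branch folds away at compile time. The function name mirrors the patch, but the signature and the plain-int "tile" below are purely illustrative, not the real IR-based lowering.

```cpp
#include <cassert>

// Hypothetical sketch: one bool template parameter selects between the
// tileloadd and tilestored lowerings at compile time, instead of dispatching
// on an Intrinsic::ID constrained by std::enable_if.
template <bool IsLoad>
int lowerTileLoadStore(int *Ptr, int Val) {
  if (IsLoad)
    return *Ptr; // tileloadd path: read through the pointer
  *Ptr = Val;    // tilestored path: write through the pointer
  return Val;
}
```

Instantiating `lowerTileLoadStore<true>` and `lowerTileLoadStore<false>` then yields the two lowerings with no runtime comparison against an intrinsic ID.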
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:343
+  bool lowerTileDPBSSD(Instruction *TileDPBSSD);
+  bool lowerTileStore(Instruction *TileStore);
+  bool lowerTileZero(Instruction *TileZero);
----------------
ditto

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:387
+
+template <Intrinsic::ID IntrID,
+          typename = typename std::enable_if<
----------------
ditto

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:391
+              IntrID == Intrinsic::x86_tilestored64_internal>::type>
+bool X86LowerAMXIntrinsics::lowerTileLoadStore(Instruction *TileLoad) {
+  Value *M, *N, *Ptr, *Stride, *Tile;
----------------
ditto

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:333
+  TargetMachine *TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
+  if (F.hasFnAttribute(Attribute::OptimizeNone) ||
+      TM->getOptLevel() == CodeGenOpt::None)
----------------
ditto

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-bitcast.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -mtriple=x86_64 -lower-amx-intrinsics %s -S | FileCheck %s
----------------
Better to name it amx-low-intrinsics-no-amx-bitcast.ll.

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-bitcast.ll:13
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_ROWS_BODY:%.*]]
+; CHECK:       tileload.scalarize.rows.body:
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_COLS_HEADER:%.*]]
----------------
It seems the body block is not necessary.

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-bitcast.ll:19
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_COLS_BODY:%.*]]
+; CHECK:       tileload.scalarize.cols.body:
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i16 [[TILELOAD_SCALARIZE_ROWS_IV]] to i64
----------------
Ditto. The label `TILELOAD_SCALARIZE_COLS_BODY` isn't even used.
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-bitcast.ll:31
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_COLS_LATCH]]
+; CHECK:       tileload.scalarize.cols.latch:
+; CHECK-NEXT:    [[TILELOAD_SCALARIZE_COLS_STEP]] = add i16 [[TILELOAD_SCALARIZE_COLS_IV]], 1
----------------
I think cols.latch is not necessary either.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits