[llvm-branch-commits] [llvm] [AArch64] Add new pass after VirtRegRewriter to add implicit-defs (PR #174188)

via llvm-branch-commits Fri, 02 Jan 2026 00:40:27 -0800

llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

<details>
<summary>Changes</summary>

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra 
implicit-def's to instructions that define the low N bits of a GPR/FPR register 
to represent that the top bits are written, because all AArch64 instructions 
that write the low bits of a GPR/FPR also implicitly zero the top bits.

These semantics are originally represented in the MIR using `SUBREG_TO_REG`, 
but during register coalescing this information is lost and when rewriting 
virtual -&gt; physical registers the implicit-defs are not added to represent 
the the top bits are written.

There have been several attempts to fix this in the coalescer (#<!-- 
-->168353), but each iteration has exposed new bugs and the patch had to be 
reverted. Additionally, the concept of adding 'implicit-def' of a virtual 
register during the register allocation process is particularly fragile and 
many places don't expect it (for example in `X86::commuteInstructionImpl` the 
code only looks at specific operands and does not consider implicit-defs. 
Similar in `SplitEditor::addDeadDef` where it traverses operand 'defs' rather 
than 'all_defs').

We want a temporary solution that doesn't impact other targets and is simpler 
and less intrusive than the patch proposed for the register coalescer so that 
we can enable SRLT to make better use of SVE/SME multi-vector instructions 
while we work on a more permanent solution that requires rewriting a large part 
of the AArch64 instructions (32-bit and NEON).

---

Patch is 40.74 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/174188.diff


13 Files Affected:

- (modified) llvm/lib/Target/AArch64/AArch64.h (+2) 
- (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+2-3) 
- (added) llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp (+265) 
- (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+5-2) 
- (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+7-1) 
- (modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+15-1) 
- (modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1) 
- (modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (+1) 
- (modified) llvm/test/CodeGen/AArch64/arm64-addrmode.ll (+40-90) 
- (modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll 
(+5-10) 
- (modified) 
llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (-1) 
- (added) 
llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir 
(+88) 
- (modified) llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir 
(+1-2) 


``````````diff
diff --git a/llvm/lib/Target/AArch64/AArch64.h 
b/llvm/lib/Target/AArch64/AArch64.h
index a8e15c338352a..40983714ddf1d 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -64,6 +64,7 @@ FunctionPass *createAArch64CollectLOHPass();
 FunctionPass *createSMEABIPass();
 FunctionPass *createSMEPeepholeOptPass();
 FunctionPass *createMachineSMEABIPass(CodeGenOptLevel);
+FunctionPass *createAArch64SRLTDefineSuperRegsPass();
 ModulePass *createSVEIntrinsicOptsPass();
 InstructionSelector *
 createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -117,6 +118,7 @@ void initializeLDTLSCleanupPass(PassRegistry&);
 void initializeSMEABIPass(PassRegistry &);
 void initializeSMEPeepholeOptPass(PassRegistry &);
 void initializeMachineSMEABIPass(PassRegistry &);
+void initializeAArch64SRLTDefineSuperRegsPass(PassRegistry &);
 void initializeSVEIntrinsicOptsPass(PassRegistry &);
 void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
 } // end namespace llvm
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index a90cad4f19ef7..b8951adf27c05 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -1383,9 +1383,8 @@ bool AArch64RegisterInfo::shouldCoalesce(
   MachineFunction &MF = *MI->getMF();
   MachineRegisterInfo &MRI = MF.getRegInfo();
 
-  // Coalescing of SUBREG_TO_REG is broken when using subreg liveness tracking,
-  // we must disable it for now.
-  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled())
+  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled() &&
+      !MF.getSubtarget<AArch64Subtarget>().enableSRLTSubregToRegMitigation())
     return false;
 
   if (MI->isCopy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp 
b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
new file mode 100644
index 0000000000000..695155abc5b15
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -0,0 +1,265 @@
+//===- AArch64SRLTDefineSuperRegs.cpp 
-------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds
+// extra implicit-def's to instructions that define the low N bits of
+// a GPR/FPR register to also define the top bits, because all AArch64
+// instructions that write the low bits of a GPR/FPR also implicitly zero
+// the top bits.  For example, 'mov w0, w1' writes zeroes to the top 32-bits of
+// x0, so this pass adds a `implicit-def $x0` after register allocation.
+//
+// These semantics are originally represented in the MIR using `SUBREG_TO_REG`
+// which expresses that the top bits have been defined by the preceding
+// instructions, but during register coalescing this information is lost and in
+// contrast to when SRTL is disabled, when rewriting virtual -> physical
+// registers the implicit-defs are not added to the instruction.
+//
+// There have been several attempts to fix this in the coalescer [1], but each
+// iteration has exposed new bugs and the patch had to be reverted.
+// Additionally, the concept of adding 'implicit-def' of a virtual register is
+// particularly fragile and many places don't expect it (for example in
+// `X86::commuteInstructionImpl` the  code only looks at specific operands and
+// does not consider implicit-defs. Similar in `SplitEditor::addDeadDef` where
+// it traverses operand 'defs' rather than 'all_defs').
+//
+// We want a temporary solution that doesn't impact other targets and is 
simpler
+// and less intrusive than the patch proposed for the register coalescer [1], 
so
+// that we can enable SRLT for AArch64.
+//
+// The approach here is to just add the 'implicit-def' manually after rewriting
+// virtual regs -> phsyical regs. This still means that during the register
+// allocation process the dependences are not accurately represented in the MIR
+// and LiveIntervals, but there are several reasons why we believe this isn't a
+// problem in practice:
+// (A) The register allocator only spills entire virtual registers.
+//     This is additionally guarded by code in
+//     AArch64InstrInfo::storeRegToStackSlot/loadRegFromStackSlot
+//     where it checks if a register matches the expected register class.
+// (B) Rematerialization only happens when the instruction writes the full
+//     register.
+// (C) The high bits of the AArch64 register cannot be written independently.
+// (D) Instructions that write only part of a register always take that same
+//     register as a tied input operand, to indicate it's a merging operation.
+//
+// (A) means that for two virtual registers of regclass GPR32 and GPR64, if the
+// GPR32 register is coalesced into the GPR64 vreg then the full GPR64 would
+// be spilled/filled even if only the low 32-bits would be required for the
+// given liverange. (B) means that the top bits of a GPR64 would never be
+// overwritten by rematerialising a GPR32 sub-register for a given liverange.
+// (C-D) means that we can assume that the MIR as input to the register
+// allocator correctly expresses the instruction behaviour and dependences
+// between values, so unless the register allocator would violate (A) or (B),
+// the MIR is otherwise sound.
+//
+// Alternative approaches have also been considered, such as:
+// (1) Changing the AArch64 instruction definitions to write all bits and
+//     extract the low N bits for the result.
+// (2) Disabling coalescing of SUBREG_TO_REG and using regalloc hints to tell
+//     the register allocator to favour the same register for the input/output.
+// (3) Adding a new coalescer guard node with a tied-operand constraint, such
+//     that when the SUBREG_TO_REG is removed, something still represents that
+//     the top bits are defined. The node would get removed before rewriting
+//     virtregs.
+// (4) Using an explicit INSERT_SUBREG into a zero value and try to optimize
+//     away the INSERT_SUBREG (this is a more explicit variant of (2) and (3))
+// (5) Adding a new MachineOperand flag that represents the top bits would be
+//     defined, but are not read nor undef.
+//
+// (1) would be the best approach but would be a significant effort as it
+// requires rewriting most/all instruction definitions and fixing MIR passes
+// that rely on the current definitions, whereas (2-4) result in sub-optimal
+// code that can't really be avoided because the explicit nodes would stop
+// rematerialization. (5) might be a way to mitigate the
+// fragility of implicit-def's of virtual registers if we want to pursue
+// landing [1], but then we'd rather choose approach (1) to avoid using
+// SUBREG_TO_REG entirely.
+//
+// [1] https://github.com/llvm/llvm-project/pull/168353
+//===----------------------------------------------------------------------===//
+
+#include "AArch64InstrInfo.h"
+#include "AArch64MachineFunctionInfo.h"
+#include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64AddressingModes.h"
+#include "llvm/ADT/BitVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-srlt-define-superregs"
+#define PASS_NAME "AArch64 SRLT Define Super-Regs Pass"
+
+namespace {
+
+struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
+  inline static char ID = 0;
+
+  AArch64SRLTDefineSuperRegs() : MachineFunctionPass(ID) {}
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
+                 const BitVector &QHiRegUnits,
+                 SmallSet<Register, 8> &SuperRegs);
+
+  StringRef getPassName() const override { return PASS_NAME; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    AU.addPreservedID(MachineLoopInfoID);
+    AU.addPreservedID(MachineDominatorsID);
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  MachineFunction *MF = nullptr;
+  const AArch64Subtarget *Subtarget = nullptr;
+  const AArch64RegisterInfo *TRI = nullptr;
+  const AArch64FunctionInfo *AFI = nullptr;
+  const TargetInstrInfo *TII = nullptr;
+  MachineRegisterInfo *MRI = nullptr;
+};
+
+} // end anonymous namespace
+
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
+                false)
+
+SmallVector<MCRegister> SupportedAliasRegs = {
+    AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
+    AArch64::X5,  AArch64::X6,  AArch64::X7,  AArch64::X8,  AArch64::X9,
+    AArch64::X10, AArch64::X11, AArch64::X12, AArch64::X13, AArch64::X14,
+    AArch64::X15, AArch64::X16, AArch64::X17, AArch64::X18, AArch64::X19,
+    AArch64::X20, AArch64::X21, AArch64::X22, AArch64::X23, AArch64::X24,
+    AArch64::X25, AArch64::X26, AArch64::X27, AArch64::X28, AArch64::FP,
+    AArch64::LR,  AArch64::SP,  AArch64::Z0,  AArch64::Z1,  AArch64::Z2,
+    AArch64::Z3,  AArch64::Z4,  AArch64::Z5,  AArch64::Z6,  AArch64::Z7,
+    AArch64::Z8,  AArch64::Z9,  AArch64::Z10, AArch64::Z11, AArch64::Z12,
+    AArch64::Z13, AArch64::Z14, AArch64::Z15, AArch64::Z16, AArch64::Z17,
+    AArch64::Z18, AArch64::Z19, AArch64::Z20, AArch64::Z21, AArch64::Z22,
+    AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27,
+    AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31};
+
+SmallVector<MCRegister> QHiRegs = {
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI};
+
+// Find the widest super-reg for a given reg and adds it to `SuperRegs`.
+// For example:
+//  W0    -> X0
+//  B1    -> Q1 (without SVE)
+//        -> Z1 (with SVE)
+//  W1_W2 -> X1_X2
+//  D0_D1 -> Q0_Q1 (without SVE)
+//        -> Z0_Z1 (with SVE)
+void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
+    Register R, const BitVector &RequiredBaseRegUnits,
+    const BitVector &QHiRegUnits, SmallSet<Register, 8> &SuperRegs) {
+  assert(R.isPhysical() &&
+         "Expected to be run straight after virtregrewriter!");
+
+  BitVector Units(TRI->getNumRegUnits());
+  for (MCRegUnit U : TRI->regunits(R))
+    Units.set((unsigned)U);
+
+  auto IsSuitableSuperReg = [&](Register SR) {
+    for (MCRegUnit U : TRI->regunits(SR)) {
+      // Avoid choosing z1 as super-reg of d1 if SVE is not available.
+      if (QHiRegUnits.test((unsigned)U) &&
+          !Subtarget->isSVEorStreamingSVEAvailable())
+        return false;
+      // We consider reg A as a suitable super-reg of B when any of the reg 
units:
+      // * is shared (Q0 and B0, both have B0 as sub-register)
+      // * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
+      // This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.
+      if (!TRI->isArtificialRegUnit(U) &&
+          (!Units.test((unsigned)U) || 
!RequiredBaseRegUnits.test((unsigned)U)))
+        return false;
+    }
+    return true;
+  };
+
+  Register LargestSuperReg = AArch64::NoRegister;
+  for (Register SR : TRI->superregs(R))
+    if (IsSuitableSuperReg(SR) && (LargestSuperReg == AArch64::NoRegister ||
+                                   TRI->isSuperRegister(LargestSuperReg, SR)))
+      LargestSuperReg = SR;
+
+  if (LargestSuperReg != AArch64::NoRegister)
+    SuperRegs.insert(LargestSuperReg);
+}
+
+bool AArch64SRLTDefineSuperRegs::runOnMachineFunction(MachineFunction &MF) {
+  this->MF = &MF;
+  Subtarget = &MF.getSubtarget<AArch64Subtarget>();
+  TII = Subtarget->getInstrInfo();
+  TRI = Subtarget->getRegisterInfo();
+  MRI = &MF.getRegInfo();
+
+  if (!MRI->subRegLivenessEnabled())
+    return false;
+
+  assert(!MRI->isSSA() && "Expected to be run after breaking down SSA form!");
+
+  BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
+  for (Register R : SupportedAliasRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      RequiredBaseRegUnits.set((unsigned)U);
+
+  BitVector QHiRegUnits(TRI->getNumRegUnits());
+  for (Register R : QHiRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      QHiRegUnits.set((unsigned)U);
+
+  bool Changed = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      // For each partial register write (e.g. w0), also add implicit-def for
+      // top bits of register (x0).
+      SmallSet<Register, 8> SuperRegs;
+      for (const MachineOperand &DefOp : MI.defs())
+        collectWidestSuperReg(DefOp.getReg(), RequiredBaseRegUnits, 
QHiRegUnits,
+                              SuperRegs);
+
+      if (!SuperRegs.size())
+        continue;
+
+      LLVM_DEBUG(dbgs() << "Adding implicit-defs to: " << MI);
+      for (Register R : SuperRegs) {
+        LLVM_DEBUG(dbgs() << "  " << printReg(R, TRI) << "\n");
+        bool IsRenamable = any_of(MI.defs(), [&](const MachineOperand &MO) {
+          return TRI->regsOverlap(MO.getReg(), R) && MO.isRenamable();
+        });
+        MachineOperand DefOp = MachineOperand::CreateReg(
+            R, /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
+            /*isDead=*/false, /*isUndef=*/false, /*isEarlyClobber=*/false,
+            /*SubReg=*/0, /*isDebug=*/false, /*isInternalRead=*/false,
+            /*isRenamable=*/IsRenamable);
+        MI.addOperand(DefOp);
+      }
+      Changed = true;
+    }
+  }
+
+  return Changed;
+}
+
+FunctionPass *llvm::createAArch64SRLTDefineSuperRegsPass() {
+  return new AArch64SRLTDefineSuperRegs();
+}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp 
b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 911eedbe36185..1737a0c1529b4 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -369,7 +369,8 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, 
StringRef CPU,
                                    unsigned MinSVEVectorSizeInBitsOverride,
                                    unsigned MaxSVEVectorSizeInBitsOverride,
                                    bool IsStreaming, bool 
IsStreamingCompatible,
-                                   bool HasMinSize)
+                                   bool HasMinSize,
+                                   bool EnableSRLTSubregToRegMitigation)
     : AArch64GenSubtargetInfo(TT, CPU, TuneCPU, FS),
       ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),
       ReserveXRegisterForRA(AArch64::GPR64commonRegClass.getNumRegs()),
@@ -381,7 +382,9 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, 
StringRef CPU,
               ? std::optional<unsigned>(AArch64StreamingHazardSize)
               : std::nullopt),
       MinSVEVectorSizeInBits(MinSVEVectorSizeInBitsOverride),
-      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride), TargetTriple(TT),
+      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride),
+      EnableSRLTSubregToRegMitigation(EnableSRLTSubregToRegMitigation),
+      TargetTriple(TT),
       InstrInfo(initializeSubtargetDependencies(FS, CPU, TuneCPU, HasMinSize)),
       TLInfo(TM, *this) {
   if (AArch64::isX18ReservedByDefault(TT))
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h 
b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index bd8a2d5234f2d..248e140b3101c 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -88,6 +88,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo 
{
   std::optional<unsigned> StreamingHazardSize;
   unsigned MinSVEVectorSizeInBits;
   unsigned MaxSVEVectorSizeInBits;
+  bool EnableSRLTSubregToRegMitigation;
   unsigned VScaleForTuning = 1;
   TailFoldingOpts DefaultSVETFOpts = TailFoldingOpts::Disabled;
 
@@ -128,7 +129,8 @@ class AArch64Subtarget final : public 
AArch64GenSubtargetInfo {
                    unsigned MinSVEVectorSizeInBitsOverride = 0,
                    unsigned MaxSVEVectorSizeInBitsOverride = 0,
                    bool IsStreaming = false, bool IsStreamingCompatible = 
false,
-                   bool HasMinSize = false);
+                   bool HasMinSize = false,
+                   bool EnableSRLTSubregToRegMitigation = false);
 
 // Getters for SubtargetFeatures defined in tablegen
 #define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER)                    
\
@@ -467,6 +469,10 @@ class AArch64Subtarget final : public 
AArch64GenSubtargetInfo {
   /// add + cnt instructions.
   bool useScalarIncVL() const;
 
+  bool enableSRLTSubregToRegMitigation() const {
+    return EnableSRLTSubregToRegMitigation;
+  }
+
   /// Choose a method of checking LR before performing a tail call.
   AArch64PAuth::AuthCheckMethod
   getAuthenticatedLRCheckMethod(const MachineFunction &MF) const;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp 
b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 1ec5a20cc0ce0..3aba866458830 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -227,6 +227,12 @@ static cl::opt<bool>
                             cl::desc("Enable new lowering for the SME ABI"),
                             cl::init(true), cl::Hidden);
 
+static cl::opt<bool> EnableSRLTSubregToRegMitigation(
+    "aarch64-srlt-mitigate-sr2r",
+    cl::desc("Enable SUBREG_TO_REG mitigation by adding 'implicit-def' for "
+             "super-regs when using Subreg Liveness Tracking"),
+    cl::init(true), cl::Hidden);
+
 extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void
 LLVMInitializeAArch64Target() {
   // Register the target.
@@ -268,6 +274,7 @@ LLVMInitializeAArch64Target() {
   initializeKCFIPass(PR);
   initializeSMEABIPass(PR);
   initializeMachineSMEABIPass(PR);
+  initializeAArch64SRLTDefineSuperRegsPass(PR);
   initializeSMEPeepholeOptPass(PR);
   initializeSVEIntrinsicOptsPass(PR);
   initializeAArch64SpeculationHardeningPass(PR);
@@ -462,7 +469,8 @@ AArch64TargetMachine::getSubtargetImpl(const Function &F) 
const {
     resetTargetOptions(F);
     I = std::make_unique<AArch64Subtarget>(
         TargetTriple, CPU, TuneCPU, FS, *this, isLittle, MinSVEVectorSize,
-        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize);
+        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize,
+        EnableSRLTSubregToRegMitigation);
   }
 
   if (IsStreaming && !I->hasSME())
@@ -550,6 +558,7 @@ class AArch64PassConfig : public TargetPassConfig {
   void addMachineSSAOptimization() override;
   bool addILPOpts() override;
   void addPreRegAlloc() override;
+  void addPostRewrite() override;
   void addPostRegAlloc() override;
   void addPreSched2() override;
   void addPreEmitPass() override;
@@ -815,6 +824,11 @@ void AArch64PassConfig::addPreRegAlloc() {
     addPass(&MachinePipelinerID);
 }
 
+void AArch64PassConfig::addPostRewrite() {
+  if (EnableSRLTSubregToRegMitigation)
+    addPass(createAArch64SRLTDefineSuperRegsPass());
+}
+
 void AArch64PassConfig::addPostRegAlloc() {
   // Remove redundant copy instructions.
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt 
b/llvm/lib/Target/AArch64/CMakeLists.txt
index 3334b3689e03f..2fe554217c1ba 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -92,6 +92,7 @@ add_llvm_target(AArch64CodeGen
   SMEPeepholeOpt.cpp
   SVEIntrinsicOpts.cpp
   MachineSMEABIPass.cpp
+  AArch64SRLTDefineSuperRegs.cpp
   AArch64SIMDInstrOpt.cpp
   AArch64PrologueEpilogue.cpp
 
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll 
b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index e4249fe4fb1c8..d137b8c9ac1e0 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/174188
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AArch64] Add new pass after VirtRegRewriter to add implicit-defs (PR #174188)

Reply via email to