https://github.com/sdesmalen-arm created 
https://github.com/llvm/llvm-project/pull/174188

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra 
implicit-def's to instructions that define the low N bits of a GPR/FPR register 
to represent that the top bits are written, because all AArch64 instructions 
that write the low bits of a GPR/FPR also implicitly zero the top bits.

These semantics are originally represented in the MIR using `SUBREG_TO_REG`, 
but during register coalescing this information is lost, and when rewriting 
virtual -> physical registers the implicit-defs are not added to represent 
that the top bits are written.
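As a sketch of what gets lost (the opcodes exist in the AArch64 backend, but the exact registers and values here are illustrative, not taken from the affected tests):

```
%1:gpr32 = MOVi32imm 1
; SUBREG_TO_REG records that the whole 64-bit value, including the top
; 32 bits (known zero), is defined:
%2:gpr64 = SUBREG_TO_REG 0, %1:gpr32, %subreg.sub_32

; After coalescing and rewriting, only the 32-bit def remains:
;   $w8 = MOVi32imm 1
; With SRLT enabled, nothing marks the top bits of $x8 as defined unless
; an implicit-def is added:
$w8 = MOVi32imm 1, implicit-def $x8
```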

There have been several attempts to fix this in the coalescer (#168353), but 
each iteration has exposed new bugs and the patch had to be reverted. 
Additionally, the concept of adding an 'implicit-def' of a virtual register 
during the register allocation process is particularly fragile and many places 
don't expect it (for example, in `X86::commuteInstructionImpl` the code only 
looks at specific operands and does not consider implicit-defs; similarly, 
`SplitEditor::addDeadDef` traverses operand 'defs' rather than 'all_defs').

We want a temporary solution that doesn't impact other targets and is simpler 
and less intrusive than the patch proposed for the register coalescer, so that 
we can enable SRLT to make better use of SVE/SME multi-vector instructions 
while we work on a more permanent solution that requires rewriting a large part 
of the AArch64 instruction definitions (32-bit and NEON).

From 4cf3b6c29458ebf18e704e00f36622f84b2f54c7 Mon Sep 17 00:00:00 2001
From: Sander de Smalen <[email protected]>
Date: Tue, 23 Dec 2025 15:35:37 +0000
Subject: [PATCH] [AArch64] Add new pass after VirtRegRewriter to add
 implicit-defs

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra
implicit-def's to instructions that define the low N bits of a GPR/FPR
register to represent that the top bits are written, because all AArch64
instructions that write the low bits of a GPR/FPR also implicitly zero the
top bits.

These semantics are originally represented in the MIR using `SUBREG_TO_REG`,
but during register coalescing this information is lost and when rewriting
virtual -> physical registers the implicit-defs are not added to represent
that the top bits are written.

There have been several attempts to fix this in the coalescer (#168353),
but each iteration has exposed new bugs and the patch had to be reverted.
Additionally, the concept of adding an 'implicit-def' of a virtual register
during the register allocation process is particularly fragile and many
places don't expect it (for example, in `X86::commuteInstructionImpl` the
code only looks at specific operands and does not consider implicit-defs;
similarly, `SplitEditor::addDeadDef` traverses operand 'defs' rather than
'all_defs').

We want a temporary solution that doesn't impact other targets and is
simpler and less intrusive than the patch proposed for the register
coalescer so that we can enable SRLT to make better use of SVE/SME
multi-vector instructions while we work on a more permanent solution that
requires rewriting a large part of the AArch64 instructions (32-bit and
NEON).
---
 llvm/lib/Target/AArch64/AArch64.h             |   2 +
 .../Target/AArch64/AArch64RegisterInfo.cpp    |   5 +-
 .../AArch64/AArch64SRLTDefineSuperRegs.cpp    | 265 ++++++++++++++++++
 llvm/lib/Target/AArch64/AArch64Subtarget.cpp  |   7 +-
 llvm/lib/Target/AArch64/AArch64Subtarget.h    |   8 +-
 .../Target/AArch64/AArch64TargetMachine.cpp   |  16 +-
 llvm/lib/Target/AArch64/CMakeLists.txt        |   1 +
 llvm/test/CodeGen/AArch64/O3-pipeline.ll      |   1 +
 llvm/test/CodeGen/AArch64/arm64-addrmode.ll   | 130 +++------
 .../AArch64/preserve_nonecc_varargs_darwin.ll |  15 +-
 ...gister-coalesce-update-subranges-remat.mir |   1 -
 ...iveness-fix-subreg-to-reg-implicit-def.mir |  88 ++++++
 .../subreg_to_reg_coalescing_issue.mir        |   3 +-
 13 files changed, 432 insertions(+), 110 deletions(-)
 create mode 100644 llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
 create mode 100644 llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir

diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index a8e15c338352a..40983714ddf1d 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -64,6 +64,7 @@ FunctionPass *createAArch64CollectLOHPass();
 FunctionPass *createSMEABIPass();
 FunctionPass *createSMEPeepholeOptPass();
 FunctionPass *createMachineSMEABIPass(CodeGenOptLevel);
+FunctionPass *createAArch64SRLTDefineSuperRegsPass();
 ModulePass *createSVEIntrinsicOptsPass();
 InstructionSelector *
 createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -117,6 +118,7 @@ void initializeLDTLSCleanupPass(PassRegistry&);
 void initializeSMEABIPass(PassRegistry &);
 void initializeSMEPeepholeOptPass(PassRegistry &);
 void initializeMachineSMEABIPass(PassRegistry &);
+void initializeAArch64SRLTDefineSuperRegsPass(PassRegistry &);
 void initializeSVEIntrinsicOptsPass(PassRegistry &);
 void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
 } // end namespace llvm
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index a90cad4f19ef7..b8951adf27c05 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -1383,9 +1383,8 @@ bool AArch64RegisterInfo::shouldCoalesce(
   MachineFunction &MF = *MI->getMF();
   MachineRegisterInfo &MRI = MF.getRegInfo();
 
-  // Coalescing of SUBREG_TO_REG is broken when using subreg liveness tracking,
-  // we must disable it for now.
-  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled())
+  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled() &&
+      !MF.getSubtarget<AArch64Subtarget>().enableSRLTSubregToRegMitigation())
     return false;
 
   if (MI->isCopy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
new file mode 100644
index 0000000000000..695155abc5b15
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -0,0 +1,265 @@
+//===- AArch64SRLTDefineSuperRegs.cpp -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds
+// extra implicit-def's to instructions that define the low N bits of
+// a GPR/FPR register to also define the top bits, because all AArch64
+// instructions that write the low bits of a GPR/FPR also implicitly zero
+// the top bits.  For example, 'mov w0, w1' writes zeroes to the top 32 bits of
+// x0, so this pass adds an `implicit-def $x0` after register allocation.
+//
+// These semantics are originally represented in the MIR using `SUBREG_TO_REG`
+// which expresses that the top bits have been defined by the preceding
+// instructions, but during register coalescing this information is lost and in
+// contrast to when SRLT is disabled, when rewriting virtual -> physical
+// registers the implicit-defs are not added to the instruction.
+//
+// There have been several attempts to fix this in the coalescer [1], but each
+// iteration has exposed new bugs and the patch had to be reverted.
+// Additionally, the concept of adding an 'implicit-def' of a virtual register
+// is particularly fragile and many places don't expect it (for example, in
+// `X86::commuteInstructionImpl` the code only looks at specific operands and
+// does not consider implicit-defs; similarly, `SplitEditor::addDeadDef`
+// traverses operand 'defs' rather than 'all_defs').
+//
+// We want a temporary solution that doesn't impact other targets and is simpler
+// and less intrusive than the patch proposed for the register coalescer [1], so
+// that we can enable SRLT for AArch64.
+// that we can enable SRLT for AArch64.
+//
+// The approach here is to just add the 'implicit-def' manually after rewriting
+// virtual regs -> physical regs. This still means that during the register
+// allocation process the dependences are not accurately represented in the MIR
+// and LiveIntervals, but there are several reasons why we believe this isn't a
+// problem in practice:
+// (A) The register allocator only spills entire virtual registers.
+//     This is additionally guarded by code in
+//     AArch64InstrInfo::storeRegToStackSlot/loadRegFromStackSlot
+//     where it checks if a register matches the expected register class.
+// (B) Rematerialization only happens when the instruction writes the full
+//     register.
+// (C) The high bits of the AArch64 register cannot be written independently.
+// (D) Instructions that write only part of a register always take that same
+//     register as a tied input operand, to indicate it's a merging operation.
+//
+// (A) means that for two virtual registers of regclass GPR32 and GPR64, if the
+// GPR32 register is coalesced into the GPR64 vreg then the full GPR64 would
+// be spilled/filled even if only the low 32-bits would be required for the
+// given liverange. (B) means that the top bits of a GPR64 would never be
+// overwritten by rematerialising a GPR32 sub-register for a given liverange.
+// (C-D) means that we can assume that the MIR as input to the register
+// allocator correctly expresses the instruction behaviour and dependences
+// between values, so unless the register allocator would violate (A) or (B),
+// the MIR is otherwise sound.
+//
+// Alternative approaches have also been considered, such as:
+// (1) Changing the AArch64 instruction definitions to write all bits and
+//     extract the low N bits for the result.
+// (2) Disabling coalescing of SUBREG_TO_REG and using regalloc hints to tell
+//     the register allocator to favour the same register for the input/output.
+// (3) Adding a new coalescer guard node with a tied-operand constraint, such
+//     that when the SUBREG_TO_REG is removed, something still represents that
+//     the top bits are defined. The node would get removed before rewriting
+//     virtregs.
+// (4) Using an explicit INSERT_SUBREG into a zero value and try to optimize
+//     away the INSERT_SUBREG (this is a more explicit variant of (2) and (3))
+// (5) Adding a new MachineOperand flag that represents that the top bits
+//     would be defined, but are not read nor undef.
+//
+// (1) would be the best approach but would be a significant effort as it
+// requires rewriting most/all instruction definitions and fixing MIR passes
+// that rely on the current definitions, whereas (2-4) result in sub-optimal
+// code that can't really be avoided because the explicit nodes would stop
+// rematerialization. (5) might be a way to mitigate the
+// fragility of implicit-def's of virtual registers if we want to pursue
+// landing [1], but then we'd rather choose approach (1) to avoid using
+// SUBREG_TO_REG entirely.
+//
+// [1] https://github.com/llvm/llvm-project/pull/168353
+//===----------------------------------------------------------------------===//
+
+#include "AArch64InstrInfo.h"
+#include "AArch64MachineFunctionInfo.h"
+#include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64AddressingModes.h"
+#include "llvm/ADT/BitVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-srlt-define-superregs"
+#define PASS_NAME "AArch64 SRLT Define Super-Regs Pass"
+
+namespace {
+
+struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
+  inline static char ID = 0;
+
+  AArch64SRLTDefineSuperRegs() : MachineFunctionPass(ID) {}
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
+                 const BitVector &QHiRegUnits,
+                 SmallSet<Register, 8> &SuperRegs);
+
+  StringRef getPassName() const override { return PASS_NAME; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    AU.addPreservedID(MachineLoopInfoID);
+    AU.addPreservedID(MachineDominatorsID);
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  MachineFunction *MF = nullptr;
+  const AArch64Subtarget *Subtarget = nullptr;
+  const AArch64RegisterInfo *TRI = nullptr;
+  const AArch64FunctionInfo *AFI = nullptr;
+  const TargetInstrInfo *TII = nullptr;
+  MachineRegisterInfo *MRI = nullptr;
+};
+
+} // end anonymous namespace
+
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
+                false)
+
+SmallVector<MCRegister> SupportedAliasRegs = {
+    AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
+    AArch64::X5,  AArch64::X6,  AArch64::X7,  AArch64::X8,  AArch64::X9,
+    AArch64::X10, AArch64::X11, AArch64::X12, AArch64::X13, AArch64::X14,
+    AArch64::X15, AArch64::X16, AArch64::X17, AArch64::X18, AArch64::X19,
+    AArch64::X20, AArch64::X21, AArch64::X22, AArch64::X23, AArch64::X24,
+    AArch64::X25, AArch64::X26, AArch64::X27, AArch64::X28, AArch64::FP,
+    AArch64::LR,  AArch64::SP,  AArch64::Z0,  AArch64::Z1,  AArch64::Z2,
+    AArch64::Z3,  AArch64::Z4,  AArch64::Z5,  AArch64::Z6,  AArch64::Z7,
+    AArch64::Z8,  AArch64::Z9,  AArch64::Z10, AArch64::Z11, AArch64::Z12,
+    AArch64::Z13, AArch64::Z14, AArch64::Z15, AArch64::Z16, AArch64::Z17,
+    AArch64::Z18, AArch64::Z19, AArch64::Z20, AArch64::Z21, AArch64::Z22,
+    AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27,
+    AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31};
+
+SmallVector<MCRegister> QHiRegs = {
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI};
+
+// Finds the widest super-reg for a given reg and adds it to `SuperRegs`.
+// For example:
+//  W0    -> X0
+//  B1    -> Q1 (without SVE)
+//        -> Z1 (with SVE)
+//  W1_W2 -> X1_X2
+//  D0_D1 -> Q0_Q1 (without SVE)
+//        -> Z0_Z1 (with SVE)
+void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
+    Register R, const BitVector &RequiredBaseRegUnits,
+    const BitVector &QHiRegUnits, SmallSet<Register, 8> &SuperRegs) {
+  assert(R.isPhysical() &&
+         "Expected to be run straight after virtregrewriter!");
+
+  BitVector Units(TRI->getNumRegUnits());
+  for (MCRegUnit U : TRI->regunits(R))
+    Units.set((unsigned)U);
+
+  auto IsSuitableSuperReg = [&](Register SR) {
+    for (MCRegUnit U : TRI->regunits(SR)) {
+      // Avoid choosing z1 as super-reg of d1 if SVE is not available.
+      if (QHiRegUnits.test((unsigned)U) &&
+          !Subtarget->isSVEorStreamingSVEAvailable())
+        return false;
+      // We consider reg A as a suitable super-reg of B when each of the reg units:
+      // * is shared (Q0 and B0, both have B0 as sub-register)
+      // * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
+      // This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.
+      if (!TRI->isArtificialRegUnit(U) &&
+          (!Units.test((unsigned)U) || !RequiredBaseRegUnits.test((unsigned)U)))
+        return false;
+    }
+    return true;
+  };
+
+  Register LargestSuperReg = AArch64::NoRegister;
+  for (Register SR : TRI->superregs(R))
+    if (IsSuitableSuperReg(SR) && (LargestSuperReg == AArch64::NoRegister ||
+                                   TRI->isSuperRegister(LargestSuperReg, SR)))
+      LargestSuperReg = SR;
+
+  if (LargestSuperReg != AArch64::NoRegister)
+    SuperRegs.insert(LargestSuperReg);
+}
+
+bool AArch64SRLTDefineSuperRegs::runOnMachineFunction(MachineFunction &MF) {
+  this->MF = &MF;
+  Subtarget = &MF.getSubtarget<AArch64Subtarget>();
+  TII = Subtarget->getInstrInfo();
+  TRI = Subtarget->getRegisterInfo();
+  MRI = &MF.getRegInfo();
+
+  if (!MRI->subRegLivenessEnabled())
+    return false;
+
+  assert(!MRI->isSSA() && "Expected to be run after breaking down SSA form!");
+
+  BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
+  for (Register R : SupportedAliasRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      RequiredBaseRegUnits.set((unsigned)U);
+
+  BitVector QHiRegUnits(TRI->getNumRegUnits());
+  for (Register R : QHiRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      QHiRegUnits.set((unsigned)U);
+
+  bool Changed = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      // For each partial register write (e.g. w0), also add implicit-def for
+      // top bits of register (x0).
+      SmallSet<Register, 8> SuperRegs;
+      for (const MachineOperand &DefOp : MI.defs())
+        collectWidestSuperReg(DefOp.getReg(), RequiredBaseRegUnits, QHiRegUnits,
+                              SuperRegs);
+
+      if (!SuperRegs.size())
+        continue;
+
+      LLVM_DEBUG(dbgs() << "Adding implicit-defs to: " << MI);
+      for (Register R : SuperRegs) {
+        LLVM_DEBUG(dbgs() << "  " << printReg(R, TRI) << "\n");
+        bool IsRenamable = any_of(MI.defs(), [&](const MachineOperand &MO) {
+          return TRI->regsOverlap(MO.getReg(), R) && MO.isRenamable();
+        });
+        MachineOperand DefOp = MachineOperand::CreateReg(
+            R, /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
+            /*isDead=*/false, /*isUndef=*/false, /*isEarlyClobber=*/false,
+            /*SubReg=*/0, /*isDebug=*/false, /*isInternalRead=*/false,
+            /*isRenamable=*/IsRenamable);
+        MI.addOperand(DefOp);
+      }
+      Changed = true;
+    }
+  }
+
+  return Changed;
+}
+
+FunctionPass *llvm::createAArch64SRLTDefineSuperRegsPass() {
+  return new AArch64SRLTDefineSuperRegs();
+}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 911eedbe36185..1737a0c1529b4 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -369,7 +369,8 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
                                    unsigned MinSVEVectorSizeInBitsOverride,
                                    unsigned MaxSVEVectorSizeInBitsOverride,
                                    bool IsStreaming, bool IsStreamingCompatible,
-                                   bool HasMinSize)
+                                   bool HasMinSize,
+                                   bool EnableSRLTSubregToRegMitigation)
     : AArch64GenSubtargetInfo(TT, CPU, TuneCPU, FS),
       ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),
       ReserveXRegisterForRA(AArch64::GPR64commonRegClass.getNumRegs()),
@@ -381,7 +382,9 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
               ? std::optional<unsigned>(AArch64StreamingHazardSize)
               : std::nullopt),
       MinSVEVectorSizeInBits(MinSVEVectorSizeInBitsOverride),
-      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride), TargetTriple(TT),
+      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride),
+      EnableSRLTSubregToRegMitigation(EnableSRLTSubregToRegMitigation),
+      TargetTriple(TT),
       InstrInfo(initializeSubtargetDependencies(FS, CPU, TuneCPU, HasMinSize)),
       TLInfo(TM, *this) {
   if (AArch64::isX18ReservedByDefault(TT))
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index bd8a2d5234f2d..248e140b3101c 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -88,6 +88,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   std::optional<unsigned> StreamingHazardSize;
   unsigned MinSVEVectorSizeInBits;
   unsigned MaxSVEVectorSizeInBits;
+  bool EnableSRLTSubregToRegMitigation;
   unsigned VScaleForTuning = 1;
   TailFoldingOpts DefaultSVETFOpts = TailFoldingOpts::Disabled;
 
@@ -128,7 +129,8 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
                    unsigned MinSVEVectorSizeInBitsOverride = 0,
                    unsigned MaxSVEVectorSizeInBitsOverride = 0,
                    bool IsStreaming = false, bool IsStreamingCompatible = false,
-                   bool HasMinSize = false);
+                   bool HasMinSize = false,
+                   bool EnableSRLTSubregToRegMitigation = false);
 
 // Getters for SubtargetFeatures defined in tablegen
 #define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER)                    \
@@ -467,6 +469,10 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   /// add + cnt instructions.
   bool useScalarIncVL() const;
 
+  bool enableSRLTSubregToRegMitigation() const {
+    return EnableSRLTSubregToRegMitigation;
+  }
+
   /// Choose a method of checking LR before performing a tail call.
   AArch64PAuth::AuthCheckMethod
   getAuthenticatedLRCheckMethod(const MachineFunction &MF) const;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 1ec5a20cc0ce0..3aba866458830 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -227,6 +227,12 @@ static cl::opt<bool>
                             cl::desc("Enable new lowering for the SME ABI"),
                             cl::init(true), cl::Hidden);
 
+static cl::opt<bool> EnableSRLTSubregToRegMitigation(
+    "aarch64-srlt-mitigate-sr2r",
+    cl::desc("Enable SUBREG_TO_REG mitigation by adding 'implicit-def' for "
+             "super-regs when using Subreg Liveness Tracking"),
+    cl::init(true), cl::Hidden);
+
 extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void
 LLVMInitializeAArch64Target() {
   // Register the target.
@@ -268,6 +274,7 @@ LLVMInitializeAArch64Target() {
   initializeKCFIPass(PR);
   initializeSMEABIPass(PR);
   initializeMachineSMEABIPass(PR);
+  initializeAArch64SRLTDefineSuperRegsPass(PR);
   initializeSMEPeepholeOptPass(PR);
   initializeSVEIntrinsicOptsPass(PR);
   initializeAArch64SpeculationHardeningPass(PR);
@@ -462,7 +469,8 @@ AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
     resetTargetOptions(F);
     I = std::make_unique<AArch64Subtarget>(
         TargetTriple, CPU, TuneCPU, FS, *this, isLittle, MinSVEVectorSize,
-        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize);
+        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize,
+        EnableSRLTSubregToRegMitigation);
   }
 
   if (IsStreaming && !I->hasSME())
@@ -550,6 +558,7 @@ class AArch64PassConfig : public TargetPassConfig {
   void addMachineSSAOptimization() override;
   bool addILPOpts() override;
   void addPreRegAlloc() override;
+  void addPostRewrite() override;
   void addPostRegAlloc() override;
   void addPreSched2() override;
   void addPreEmitPass() override;
@@ -815,6 +824,11 @@ void AArch64PassConfig::addPreRegAlloc() {
     addPass(&MachinePipelinerID);
 }
 
+void AArch64PassConfig::addPostRewrite() {
+  if (EnableSRLTSubregToRegMitigation)
+    addPass(createAArch64SRLTDefineSuperRegsPass());
+}
+
 void AArch64PassConfig::addPostRegAlloc() {
   // Remove redundant copy instructions.
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 3334b3689e03f..2fe554217c1ba 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -92,6 +92,7 @@ add_llvm_target(AArch64CodeGen
   SMEPeepholeOpt.cpp
   SVEIntrinsicOpts.cpp
   MachineSMEABIPass.cpp
+  AArch64SRLTDefineSuperRegs.cpp
   AArch64SIMDInstrOpt.cpp
   AArch64PrologueEpilogue.cpp
 
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index e4249fe4fb1c8..d137b8c9ac1e0 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
@@ -191,6 +191,7 @@
 ; CHECK-NEXT:       Virtual Register Rewriter
 ; CHECK-NEXT:       Register Allocation Pass Scoring
 ; CHECK-NEXT:       Stack Slot Coloring
+; CHECK-NEXT:       AArch64 SRLT Define Super-Regs Pass
 ; CHECK-NEXT:       Machine Copy Propagation Pass
 ; CHECK-NEXT:       Machine Loop Invariant Code Motion
 ; CHECK-NEXT:       AArch64 Redundant Copy Elimination
diff --git a/llvm/test/CodeGen/AArch64/arm64-addrmode.ll b/llvm/test/CodeGen/AArch64/arm64-addrmode.ll
index c8b7035b7c6e3..f8695b62619c0 100644
--- a/llvm/test/CodeGen/AArch64/arm64-addrmode.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-addrmode.ll
@@ -43,7 +43,6 @@ define void @t4(ptr %object) {
 ; CHECK-LABEL: t4:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #32768 // =0x8000
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldr xzr, [x0, x8]
 ; CHECK-NEXT:    ret
   %incdec.ptr = getelementptr inbounds i64, ptr %object, i64 4096
@@ -70,7 +69,6 @@ define void @t6(i64 %a, ptr %object) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    add x8, x1, x0, lsl #3
 ; CHECK-NEXT:    mov w9, #32768 // =0x8000
-; CHECK-NEXT:    // kill: def $x9 killed $w9
 ; CHECK-NEXT:    ldr xzr, [x8, x9]
 ; CHECK-NEXT:    ret
   %tmp1 = getelementptr inbounds i64, ptr %object, i64 %a
@@ -84,7 +82,6 @@ define void @t7(i64 %a) {
 ; CHECK-LABEL: t7:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #65535 // =0xffff
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldr xzr, [x0, x8]
 ; CHECK-NEXT:    ret
   %1 = add i64 %a, 65535   ;0xffff
@@ -134,7 +131,6 @@ define void @t11(i64 %a) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #17767 // =0x4567
 ; CHECK-NEXT:    movk w8, #291, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldr xzr, [x0, x8]
 ; CHECK-NEXT:    ret
   %1 = add i64 %a, 19088743   ;0x1234567
@@ -218,10 +214,8 @@ define void @t17(i64 %a) {
 define i8 @LdOffset_i8(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrb w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrb w0, [x8, #3704]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
   %val = load i8, ptr %arrayidx, align 1
@@ -232,10 +226,8 @@ define i8 @LdOffset_i8(ptr %a)  {
 define i32 @LdOffset_i8_zext32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_zext32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrb w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrb w0, [x8, #3704]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
   %val = load i8, ptr %arrayidx, align 1
@@ -247,10 +239,8 @@ define i32 @LdOffset_i8_zext32(ptr %a)  {
 define i32 @LdOffset_i8_sext32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_sext32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsb w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrsb w0, [x8, #3704]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
   %val = load i8, ptr %arrayidx, align 1
@@ -262,11 +252,8 @@ define i32 @LdOffset_i8_sext32(ptr %a)  {
 define i64 @LdOffset_i8_zext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_zext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrb w8, [x0, x8]
-; CHECK-NEXT:    mov w0, w8
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrb w0, [x8, #3704]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
   %val = load i8, ptr %arrayidx, align 1
@@ -278,10 +265,8 @@ define i64 @LdOffset_i8_zext64(ptr %a)  {
 define i64 @LdOffset_i8_sext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_sext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsb x0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrsb x0, [x8, #3704]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
   %val = load i8, ptr %arrayidx, align 1
@@ -293,10 +278,8 @@ define i64 @LdOffset_i8_sext64(ptr %a)  {
 define i16 @LdOffset_i16(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrh w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrh w0, [x8, #7408]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
   %val = load i16, ptr %arrayidx, align 2
@@ -307,10 +290,8 @@ define i16 @LdOffset_i16(ptr %a)  {
 define i32 @LdOffset_i16_zext32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16_zext32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrh w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrh w0, [x8, #7408]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
   %val = load i16, ptr %arrayidx, align 2
@@ -322,10 +303,8 @@ define i32 @LdOffset_i16_zext32(ptr %a)  {
 define i32 @LdOffset_i16_sext32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16_sext32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsh w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrsh w0, [x8, #7408]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
   %val = load i16, ptr %arrayidx, align 2
@@ -337,11 +316,8 @@ define i32 @LdOffset_i16_sext32(ptr %a)  {
 define i64 @LdOffset_i16_zext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16_zext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrh w8, [x0, x8]
-; CHECK-NEXT:    mov w0, w8
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrh w0, [x8, #7408]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
   %val = load i16, ptr %arrayidx, align 2
@@ -353,10 +329,8 @@ define i64 @LdOffset_i16_zext64(ptr %a)  {
 define i64 @LdOffset_i16_sext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16_sext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsh x0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrsh x0, [x8, #7408]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
   %val = load i16, ptr %arrayidx, align 2
@@ -368,10 +342,8 @@ define i64 @LdOffset_i16_sext64(ptr %a)  {
 define i32 @LdOffset_i32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #31200 // =0x79e0
-; CHECK-NEXT:    movk w8, #63, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr w0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #1012, lsl #12 // =4145152
+; CHECK-NEXT:    ldr w0, [x8, #14816]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i32, ptr %a, i64 1039992
   %val = load i32, ptr %arrayidx, align 4
@@ -382,11 +354,8 @@ define i32 @LdOffset_i32(ptr %a)  {
 define i64 @LdOffset_i32_zext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i32_zext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #31200 // =0x79e0
-; CHECK-NEXT:    movk w8, #63, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr w8, [x0, x8]
-; CHECK-NEXT:    mov w0, w8
+; CHECK-NEXT:    add x8, x0, #1012, lsl #12 // =4145152
+; CHECK-NEXT:    ldr w0, [x8, #14816]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i32, ptr %a, i64 1039992
   %val = load i32, ptr %arrayidx, align 2
@@ -398,10 +367,8 @@ define i64 @LdOffset_i32_zext64(ptr %a)  {
 define i64 @LdOffset_i32_sext64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i32_sext64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #31200 // =0x79e0
-; CHECK-NEXT:    movk w8, #63, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsw x0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #1012, lsl #12 // =4145152
+; CHECK-NEXT:    ldrsw x0, [x8, #14816]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i32, ptr %a, i64 1039992
   %val = load i32, ptr %arrayidx, align 2
@@ -413,10 +380,8 @@ define i64 @LdOffset_i32_sext64(ptr %a)  {
 define i64 @LdOffset_i64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #62400 // =0xf3c0
-; CHECK-NEXT:    movk w8, #126, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr x0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #2024, lsl #12 // =8290304
+; CHECK-NEXT:    ldr x0, [x8, #29632]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i64, ptr %a, i64 1039992
   %val = load i64, ptr %arrayidx, align 4
@@ -427,10 +392,8 @@ define i64 @LdOffset_i64(ptr %a)  {
 define <2 x i32> @LdOffset_v2i32(ptr %a)  {
 ; CHECK-LABEL: LdOffset_v2i32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #62400 // =0xf3c0
-; CHECK-NEXT:    movk w8, #126, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr d0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #2024, lsl #12 // =8290304
+; CHECK-NEXT:    ldr d0, [x8, #29632]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds <2 x i32>, ptr %a, i64 1039992
   %val = load <2 x i32>, ptr %arrayidx, align 4
@@ -441,10 +404,8 @@ define <2 x i32> @LdOffset_v2i32(ptr %a)  {
 define <2 x i64> @LdOffset_v2i64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_v2i64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #59264 // =0xe780
-; CHECK-NEXT:    movk w8, #253, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr q0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #4048, lsl #12 // =16580608
+; CHECK-NEXT:    ldr q0, [x8, #59264]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds <2 x i64>, ptr %a, i64 1039992
   %val = load <2 x i64>, ptr %arrayidx, align 4
@@ -455,10 +416,8 @@ define <2 x i64> @LdOffset_v2i64(ptr %a)  {
 define double @LdOffset_i8_f64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #56952 // =0xde78
-; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsb w8, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #253, lsl #12 // =1036288
+; CHECK-NEXT:    ldrsb w8, [x8, #3704]
 ; CHECK-NEXT:    scvtf d0, w8
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039992
@@ -471,10 +430,8 @@ define double @LdOffset_i8_f64(ptr %a)  {
 define double @LdOffset_i16_f64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i16_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #48368 // =0xbcf0
-; CHECK-NEXT:    movk w8, #31, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldrsh w8, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #506, lsl #12 // =2072576
+; CHECK-NEXT:    ldrsh w8, [x8, #7408]
 ; CHECK-NEXT:    scvtf d0, w8
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i16, ptr %a, i64 1039992
@@ -487,10 +444,8 @@ define double @LdOffset_i16_f64(ptr %a)  {
 define double @LdOffset_i32_f64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i32_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #31200 // =0x79e0
-; CHECK-NEXT:    movk w8, #63, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr s0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #1012, lsl #12 // =4145152
+; CHECK-NEXT:    ldr s0, [x8, #14816]
 ; CHECK-NEXT:    ucvtf d0, d0
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i32, ptr %a, i64 1039992
@@ -503,10 +458,8 @@ define double @LdOffset_i32_f64(ptr %a)  {
 define double @LdOffset_i64_f64(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i64_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #62400 // =0xf3c0
-; CHECK-NEXT:    movk w8, #126, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
-; CHECK-NEXT:    ldr d0, [x0, x8]
+; CHECK-NEXT:    add x8, x0, #2024, lsl #12 // =8290304
+; CHECK-NEXT:    ldr d0, [x8, #29632]
 ; CHECK-NEXT:    scvtf d0, d0
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i64, ptr %a, i64 1039992
@@ -554,7 +507,6 @@ define i32 @LdOffset_i16_odd_offset(ptr nocapture noundef readonly %a)  {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #56953 // =0xde79
 ; CHECK-NEXT:    movk w8, #15, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldrsh w0, [x0, x8]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 1039993
@@ -568,7 +520,6 @@ define i8 @LdOffset_i8_movnwi(ptr %a)  {
 ; CHECK-LABEL: LdOffset_i8_movnwi:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #16777215 // =0xffffff
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldrb w0, [x0, x8]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 16777215
@@ -582,7 +533,6 @@ define i8 @LdOffset_i8_too_large(ptr %a)  {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #1 // =0x1
 ; CHECK-NEXT:    movk w8, #256, lsl #16
-; CHECK-NEXT:    // kill: def $x8 killed $w8
 ; CHECK-NEXT:    ldrb w0, [x0, x8]
 ; CHECK-NEXT:    ret
   %arrayidx = getelementptr inbounds i8, ptr %a, i64 16777217
diff --git a/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll b/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
index f03804d0064fd..2a77d4dd33fe5 100644
--- a/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
+++ b/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
@@ -28,21 +28,15 @@ define i32 @caller() nounwind ssp {
 ; CHECK-NEXT:    mov w8, #10 ; =0xa
 ; CHECK-NEXT:    mov w9, #9 ; =0x9
 ; CHECK-NEXT:    mov w10, #8 ; =0x8
-; CHECK-NEXT:    ; kill: def $x8 killed $w8
-; CHECK-NEXT:    mov w11, #6 ; =0x6
-; CHECK-NEXT:    ; kill: def $x9 killed $w9
-; CHECK-NEXT:    str x8, [sp, #32]
-; CHECK-NEXT:    ; kill: def $x10 killed $w10
-; CHECK-NEXT:    mov w0, #1 ; =0x1
+; CHECK-NEXT:    stp x9, x8, [sp, #24]
 ; CHECK-NEXT:    mov w8, #7 ; =0x7
-; CHECK-NEXT:    stp x10, x9, [sp, #16]
-; CHECK-NEXT:    mov w9, w11
+; CHECK-NEXT:    mov w9, #6 ; =0x6
+; CHECK-NEXT:    mov w0, #1 ; =0x1
 ; CHECK-NEXT:    mov w1, #2 ; =0x2
 ; CHECK-NEXT:    mov w2, #3 ; =0x3
 ; CHECK-NEXT:    mov w3, #4 ; =0x4
 ; CHECK-NEXT:    mov w4, #5 ; =0x5
 ; CHECK-NEXT:    stp d15, d14, [sp, #48] ; 16-byte Folded Spill
-; CHECK-NEXT:    ; kill: def $x8 killed $w8
 ; CHECK-NEXT:    stp d13, d12, [sp, #64] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp d11, d10, [sp, #80] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp d9, d8, [sp, #96] ; 16-byte Folded Spill
@@ -52,7 +46,8 @@ define i32 @caller() nounwind ssp {
 ; CHECK-NEXT:    stp x22, x21, [sp, #160] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp x20, x19, [sp, #176] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp x29, x30, [sp, #192] ; 16-byte Folded Spill
-; CHECK-NEXT:    stp x9, x8, [sp]
+; CHECK-NEXT:    stp x8, x10, [sp, #8]
+; CHECK-NEXT:    str x9, [sp]
 ; CHECK-NEXT:    bl _callee
 ; CHECK-NEXT:    ldp x29, x30, [sp, #192] ; 16-byte Folded Reload
 ; CHECK-NEXT:    ldp x20, x19, [sp, #176] ; 16-byte Folded Reload
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index 68032643bcf4d..08fc47d9480ce 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -1,6 +1,5 @@
 # RUN: llc -mtriple=aarch64 -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
-# XFAIL: *
 # REQUIRES: asserts
 
 # CHECK-DBG: ********** REGISTER COALESCER **********
diff --git a/llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir b/llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir
new file mode 100644
index 0000000000000..82fa4d3c2e3db
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir
@@ -0,0 +1,88 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=aarch64 -run-pass=aarch64-srlt-define-superregs -enable-subreg-liveness -o - %s | FileCheck %s
+--- |
+  target triple = "aarch64"
+
+  define void @test_implicit_def_w1_to_x1() { entry: unreachable }
+  define void @test_implicit_def_d0_to_q0_and_d1_to_q1() { entry: unreachable }
+  define void @test_implicit_def_d0_d1_d2_to_q0_q1_q2() { entry: unreachable }
+  define void @test_implicit_def_d0_d1_d2_to_z0_z1_z2_with_sve() "target-features"="+sve" { entry: unreachable }
+
+---
+name: test_implicit_def_w1_to_x1
+isSSA: false
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x1
+
+    ; CHECK-LABEL: name: test_implicit_def_w1_to_x1
+    ; CHECK: liveins: $x1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $x0 = COPY $x1
+    ; CHECK-NEXT: renamable $w1 = ORRWrr $wzr, renamable $w0, implicit-def renamable $x1
+    ; CHECK-NEXT: RET_ReallyLR implicit $x1, implicit $x0
+    renamable $x0 = COPY $x1
+    renamable $w1 = ORRWrr $wzr, renamable $w0
+    RET_ReallyLR implicit $x1, implicit $x0
+...
+---
+name: test_implicit_def_d0_to_q0_and_d1_to_q1
+isSSA: false
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $d0, $d1, $x1
+
+    ; CHECK-LABEL: name: test_implicit_def_d0_to_q0_and_d1_to_q1
+    ; CHECK: liveins: $d0, $d1, $x1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: early-clobber $x1, renamable $d0, renamable $d1 = LDPDpre renamable $x1, 16, implicit-def renamable $q0, implicit-def renamable $q1 :: (load (s64))
+    ; CHECK-NEXT: STPDi renamable $d0, renamable $d1, renamable $x1, 0 :: (store (s64))
+    ; CHECK-NEXT: RET undef $lr
+    early-clobber $x1, renamable $d0, renamable $d1 = LDPDpre renamable $x1, 16 :: (load (s64))
+    STPDi renamable $d0, renamable $d1, renamable $x1, 0 :: (store (s64))
+    RET undef $lr
+...
+---
+name: test_implicit_def_d0_d1_d2_to_q0_q1_q2
+isSSA: false
+tracksRegLiveness: true
+machineFunctionInfo: {}
+body:             |
+  bb.0.entry:
+    liveins: $x0, $x1, $lr, $fp
+
+    ; CHECK-LABEL: name: test_implicit_def_d0_d1_d2_to_q0_q1_q2
+    ; CHECK: liveins: $x0, $x1, $lr, $fp
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $x0, renamable $d0_d1_d2 = LD3Threev8b_POST killed renamable $x0, $xzr, implicit-def renamable $q0_q1_q2
+    ; CHECK-NEXT: FAKE_USE $x0
+    ; CHECK-NEXT: FAKE_USE $d0_d1_d2
+    ; CHECK-NEXT: RET undef $lr
+    renamable $x0, renamable $d0_d1_d2 = LD3Threev8b_POST killed renamable $x0, $xzr
+    FAKE_USE $x0
+    FAKE_USE $d0_d1_d2
+    RET undef $lr
+...
+---
+name: test_implicit_def_d0_d1_d2_to_z0_z1_z2_with_sve
+isSSA: false
+tracksRegLiveness: true
+machineFunctionInfo: {}
+body:             |
+  bb.0.entry:
+    liveins: $x0, $x1, $lr, $fp
+
+    ; CHECK-LABEL: name: test_implicit_def_d0_d1_d2_to_z0_z1_z2_with_sve
+    ; CHECK: liveins: $x0, $x1, $lr, $fp
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $x0, renamable $d0_d1_d2 = LD3Threev8b_POST killed renamable $x0, $xzr, implicit-def renamable $z0_z1_z2
+    ; CHECK-NEXT: FAKE_USE $x0
+    ; CHECK-NEXT: FAKE_USE $d0_d1_d2
+    ; CHECK-NEXT: RET undef $lr
+    renamable $x0, renamable $d0_d1_d2 = LD3Threev8b_POST killed renamable $x0, $xzr
+    FAKE_USE $x0
+    FAKE_USE $d0_d1_d2
+    RET undef $lr
+...
diff --git a/llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir b/llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir
index 0d472fba05039..b8fa4a2fef901 100644
--- a/llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir
+++ b/llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir
@@ -14,8 +14,7 @@ body: |
     ; SRLT: liveins: $x1
     ; SRLT-NEXT: {{  $}}
     ; SRLT-NEXT: $x0 = ORRXrr $xzr, $x1
-    ; SRLT-NEXT: renamable $w8 = ORRWrr $wzr, renamable $w0
-    ; SRLT-NEXT: $w1 = ORRWrr $wzr, killed $w8, implicit-def $x1
+    ; SRLT-NEXT: renamable $w1 = ORRWrr $wzr, renamable $w0, implicit-def renamable $x1
     ; SRLT-NEXT: RET_ReallyLR implicit $x1, implicit $x0
     ;
     ; NOSRLT-LABEL: name: dont_remove_orr_w

_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
