[llvm-branch-commits] [llvm] [InlineSpiller][AMDGPU] Implement subreg reload during RA spill (PR #175002)

Quentin Colombet via llvm-branch-commits Mon, 26 Jan 2026 14:52:14 -0800

================
@@ -1248,18 +1249,62 @@ void InlineSpiller::spillAroundUses(Register Reg) {
 
     // Create a new virtual register for spill/fill.
     // FIXME: Infer regclass from instruction alone.
-    Register NewVReg = Edit->createFrom(Reg);
+
+    unsigned SubReg = 0;
+    LaneBitmask CoveringLanes = LaneBitmask::getNone();
+    // If the subreg liveness is enabled, identify the subreg use(s) to try
+    // subreg reload. Skip if the instruction also defines the register.
+    // For copy bundles, get the covering lane masks.
+    if (MRI.subRegLivenessEnabled() && !RI.Writes) {
+      for (auto [MI, OpIdx] : Ops) {
+        const MachineOperand &MO = MI->getOperand(OpIdx);
+        assert(MO.isReg() && MO.getReg() == Reg);
+        if (MO.isUse()) {
+          SubReg = MO.getSubReg();
+          if (SubReg)
+            CoveringLanes |= TRI.getSubRegIndexLaneMask(SubReg);
+        }
+      }
+    }
+
+    if (MI.isBundled() && CoveringLanes.any()) {
+      CoveringLanes = LaneBitmask(bit_ceil(CoveringLanes.getAsInteger()) - 1);
+      // Obtain the covering subregister index, including any missing indices
+      // within the identified small range. Although this may be suboptimal due
+      // to gaps in the subregisters that are not part of the copy bundle, it is
+      // beneficial when components outside this range of the original tuple can
+      // be completely skipped from the reload.
+      SubReg = TRI.getSubRegIdxFromLaneMask(CoveringLanes);
+    }
+
+    // If the target doesn't support subreg reload, fallback to restoring the
+    // full tuple.
+    if (SubReg && !TRI.shouldEnableSubRegReload(SubReg))
+      SubReg = 0;
+
+    const TargetRegisterClass *OrigRC = MRI.getRegClass(Reg);
+    const TargetRegisterClass *NewRC =
+        SubReg ? TRI.getSubRegisterClass(OrigRC, SubReg) : nullptr;
----------------
qcolombet wrote:


What I'm saying is: assume we always reload the full tuple, so that we don't have to update the subreg uses here.

The question is: can this be easily folded later on?

To get back to your example.
Instead of generating:
```
; Later, only need sub1 (second 32-bit component)
; Current implementation - restore full.
%reload:Vreg_128 = RESTORE_V128, ofst:0
%val = USE %reload.sub1
```
This patch wants to generate:
```
%reload32:Vreg_32 = RESTORE_V32, ofst:Y
%val = USE %reload32
```

What I'm asking is: could we generate:
```
%reload32:Vreg_32 = RESTORE_V32, ofst:Y
%reload = INSERT_SUBREG undef, %reload32, sub1 <-- At the API boundary assume we fully restore
%val = USE %reload.sub1 <-- this part doesn't need to be updated here
```

What I'm saying here is that a lot of the "complexity" in this patch comes from changing `%reload` into `%reload32`, i.e., changing the register class and the subreg index. I'm wondering if we can do that clean-up later.
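
To make the question concrete, here is a toy sketch of what such a later clean-up could look like. It is not LLVM code: `ToyInst` and `foldInsertSubreg` are hypothetical names, the instruction model is a deliberate simplification that assumes each virtual register is defined exactly once, and it says nothing about whether an existing pass already performs this fold.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct ToyInst {
  std::string Opcode; // "RESTORE", "INSERT_SUBREG" or "USE"
  std::string Def;    // virtual register defined by this instruction
  std::string Src;    // virtual register read by this instruction ("" if none)
  std::string SubIdx; // INSERT_SUBREG: index written; USE: index read ("" if none)
};

// For every `D = INSERT_SUBREG undef, S, idx`, rewrite later reads of `D.idx`
// into plain reads of `S`. The INSERT_SUBREG is dropped only when every later
// read of `D` was rewritten. Assumes each register has a single definition.
std::vector<ToyInst> foldInsertSubreg(std::vector<ToyInst> Insts) {
  std::vector<ToyInst> Out;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    const ToyInst &Ins = Insts[I];
    if (Ins.Opcode != "INSERT_SUBREG") {
      Out.push_back(Ins);
      continue;
    }
    bool AllFolded = true;
    for (std::size_t J = I + 1; J < Insts.size(); ++J) {
      ToyInst &User = Insts[J];
      if (User.Src != Ins.Def)
        continue;
      if (User.SubIdx == Ins.SubIdx) {
        // The user reads exactly the lanes that were inserted.
        User.Src = Ins.Src;
        User.SubIdx.clear();
      } else {
        // Some other part of the tuple is read; keep the INSERT_SUBREG.
        AllFolded = false;
      }
    }
    if (!AllFolded)
      Out.push_back(Ins);
  }
  return Out;
}
```

Fed the three-instruction sequence above (`RESTORE`, `INSERT_SUBREG`, `USE %reload.sub1`), this toy fold leaves the `RESTORE` followed by a `USE` of `%reload32`, which is the shape the patch produces directly.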

I'm trying to get a read on the trade-off between how easy the code is to understand and doing the whole transformation in one shot versus several.
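
The covering-mask step in the quoted hunk is one piece of that complexity. Below is a standalone sketch of just the expression `bit_ceil(CoveringLanes.getAsInteger()) - 1`, with plain integers standing in for `LaneBitmask`. The helper names `bitCeil` and `coveringLanes` are hypothetical, and the bit patterns are illustrative rather than any target's real lane layout.

```cpp
#include <cstdint>

// Smallest power of two that is >= Value; returns 1 for Value == 0.
// Assumes Value <= 2^63 so the shift cannot overflow.
inline std::uint64_t bitCeil(std::uint64_t Value) {
  std::uint64_t Result = 1;
  while (Result < Value)
    Result <<= 1;
  return Result;
}

// Mirrors the quoted expression: round the used-lane mask up to a power of
// two, then subtract one to get a contiguous run of low bits.
// Lanes above the highest used lane are left out; gaps below it are filled.
// For an input that is exactly a power of two, bitCeil returns the input
// itself, so the result contains only the bits below it.
inline std::uint64_t coveringLanes(std::uint64_t UsedLanes) {
  return bitCeil(UsedLanes) - 1;
}
```

For example, a used-lane mask of `0b1100` yields `0b1111`: the two unused low lanes are pulled into the reload, while anything above bit 3 is skipped. A single-bit mask such as `0b0100` yields `0b0011`, which excludes the input bit. Whether such masks can reach this code depends on the target's lane layout, which the sketch does not model.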

https://github.com/llvm/llvm-project/pull/175002
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
