================
@@ -1248,18 +1249,62 @@ void InlineSpiller::spillAroundUses(Register Reg) {
 
     // Create a new virtual register for spill/fill.
     // FIXME: Infer regclass from instruction alone.
-    Register NewVReg = Edit->createFrom(Reg);
+
+    unsigned SubReg = 0;
+    LaneBitmask CoveringLanes = LaneBitmask::getNone();
+    // If subreg liveness is enabled, identify the subreg use(s) to try a
+    // subreg reload. Skip if the instruction also defines the register.
+    // For copy bundles, get the covering lane masks.
+    if (MRI.subRegLivenessEnabled() && !RI.Writes) {
+      for (auto [MI, OpIdx] : Ops) {
+        const MachineOperand &MO = MI->getOperand(OpIdx);
+        assert(MO.isReg() && MO.getReg() == Reg);
+        if (MO.isUse()) {
+          SubReg = MO.getSubReg();
+          if (SubReg)
+            CoveringLanes |= TRI.getSubRegIndexLaneMask(SubReg);
+        }
+      }
+    }
+
+    if (MI.isBundled() && CoveringLanes.any()) {
+      CoveringLanes = LaneBitmask(bit_ceil(CoveringLanes.getAsInteger()) - 1);
+      // Obtain the covering subregister index, including any missing indices
+      // within the identified small range. Although this may be suboptimal due
+      // to gaps in the subregisters that are not part of the copy bundle, it is
+      // beneficial when components outside this range of the original tuple can
+      // be completely skipped from the reload.
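+      // E.g. a bundle using lanes 0b0101 (sub0 and sub2) is widened to mask
+      // 0b0111, so sub1 is reloaded as well, while sub3 is still skipped.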
+      SubReg = TRI.getSubRegIdxFromLaneMask(CoveringLanes);
+    }
+
+    // If the target doesn't support subreg reload, fall back to restoring the
+    // full tuple.
+    if (SubReg && !TRI.shouldEnableSubRegReload(SubReg))
+      SubReg = 0;
+
+    const TargetRegisterClass *OrigRC = MRI.getRegClass(Reg);
+    const TargetRegisterClass *NewRC =
+        SubReg ? TRI.getSubRegisterClass(OrigRC, SubReg) : nullptr;
----------------
cdevadas wrote:

The subreg reload brings two advantages. 
1. Currently, when a tuple is reloaded, the full tuple becomes live at the
reload point, even if only a subset of its components is actually needed. On
targets like AMDGPU, this creates difficulties later, when the reload
pseudo-instruction is expanded into individual reload operations, because the
unused or undefined subregisters still appear live. They often have to be
patched with ad hoc fixups, such as inserting implicit-def or implicit
operands for the unneeded tuple components, to avoid miscompilations. The
subreg reload fixes this broken liveness info for partial uses of tuples
chosen for spilling: it avoids introducing spurious undef subregs and
eliminates the need for such hacky post-RA workarounds (see the MIR sketch
after this list).
2. Trimming the reload down to the relevant subregs also improves the
allocation itself: instead of the full tuple, we ensure RA reloads only the
components that are actually used.
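
To make point 1 concrete, here is a rough AMDGPU-flavored MIR sketch. The
opcodes, register classes, and stack operands are illustrative only, not
taken from an actual test:

```
; Today: the whole 128-bit tuple is restored and becomes live, even though
; only sub0 is read afterwards; sub1-sub3 later need implicit-def style
; fixups when the restore pseudo is expanded.
%5:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, $sgpr32, 0, implicit $exec
%6:vgpr_32 = V_MOV_B32_e32 %5.sub0, implicit $exec

; With the subreg reload, only the component that is actually used is
; restored, so no spurious lanes are live at the reload point.
%7:vgpr_32 = SI_SPILL_V32_RESTORE %stack.0, $sgpr32, 0, implicit $exec
%8:vgpr_32 = V_MOV_B32_e32 %7, implicit $exec
```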

It is not clear to me how RA would see `%reload = INSERT_SUBREG undef, ..`
(the form you suggested); we might lose the two advantages mentioned above.
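
For reference, this is roughly how I read the suggested alternative (again
just a sketch with illustrative opcodes):

```
; The restore itself touches only one lane, but the INSERT_SUBREG still
; defines the full 128-bit tuple, so RA presumably sees the whole value
; live at the reload point.
%0:vgpr_32 = SI_SPILL_V32_RESTORE %stack.0, $sgpr32, 0, implicit $exec
%1:vreg_128 = IMPLICIT_DEF
%reload:vreg_128 = INSERT_SUBREG %1, %0, %subreg.sub0
```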

https://github.com/llvm/llvm-project/pull/175002