On Tue, 18 Oct 2022 08:57:37 PDT (-0700), j...@ventanamicro.com wrote:

Just a couple more comments in-line.

On 10/18/22 09:18, Manolis Tsamis wrote:

+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
+
+static sbitmap
+riscv_get_separate_components (void)
+{
+  HOST_WIDE_INT offset;
+  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
+  bitmap_clear (components);
+
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+      || cfun->machine->interrupt_handler_p)
riscv_use_save_libcall() already checks interrupt_handler_p, so that's
redundant.  That said, I'm not sure riscv_use_save_libcall() is the
right check here as unless I'm missing something we don't have all those
other constraints when shrink-wrapping.

riscv_use_save_libcall returns false when interrupt_handler_p is true, so the
check for interrupt_handler_p in the branch is not redundant in this case.

I encountered some issues when shrink wrapping and libcall was used in the same
function. Thinking that libcall replaces the prologue/epilogue I didn't see a
reason to have both at the same time and hence I opted to disable
shrink wrapping in that case. From my understanding this should be harmless?

I would have expected things to work fine with libcalls, perhaps with
the exception of the save/restore libcalls.  So that needs deeper
investigation.

The save/restore libcalls only support saving/restoring a handful of register configurations (just the saved X registers in the order they're usually saved in by GCC). It should be OK for correctness to over-save registers, but it kind of just un-does the shrink wrapping so not sure it's worth worrying about at that point.

There's also some oddness around the save/restore libcall ABI, it's not the standard function ABI but instead a GCC-internal one. IIRC it just uses the alternate link register (ie, t0 instead of ra) but I may have forgotten something else.

It seems kind of clunky to have two copies of all these loops (and we'll
need a third to make this work with the V stuff), but we've got that
issue elsewhere in the port so I don't think you need to fix it here
(though the V stuff will be there for the v2, so you'll need the third
copy of each loop).

Indeed, I was following the other ports here. Do you think it would be
better to refactor this when the code for the V extension is added?
By taking into account what code will be needed for V, a proper refactored
function could be made to handle all cases.

I think refactoring when V gets added would be fine.  While we could
probably refactor it correctly now (it isn't terribly complex code after
all), but we're more likely to get it right with the least amount of
work if we do it when V is submitted.

Some of the V register blocks are already there, but ya I agree we can just wait. There's going to be a bunch of V-related churn for a bit, juggling those patches is already enough of a headache ;)

Either way, this deserves a test case.  I think it should be possible to
write one by introducing some register pressure around a
shrink-wrappable block that needs a long stack offset and making sure
in-flight registers don't get trashed.

I tried to think of some way to introduce a test like that but couldn't and
I don't see how it would be done. Shrink wrapping only affects saved registers
so there are always available temporaries that are not affected by
shrink wrapping.
(Register pressure should be irrelevant in this case if I understand correctly).
Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping
should be unaffected from long stack offsets. If you see some way to write
a test for that based on what I explained please explain how I could do that.

I think the register pressure was just to ensure that some saves were
needed to trigger an attempt to shrink wrap something.  You'd also need
something to eat stack space (local array which gets referenced as an
asm operand, but where the asm doesn't generate any code perhaps)? 
Whether or not that works depends on stack layout though which I don't
know well enough for riscv.

Sorry for being a bit vague, but it's because I always find it takes a bit of time to write up tests like this. I think something like this might do it, but that almost certainly won't work as-is:

   // Some extern bits to try and trip up the optimizer.
   extern long helper(long *sa, long a, long b, long c, ...);
   extern long glob_array[1024];
// The function takes a bunch of arguments to fill up the A // registers.
   long func(long a, long b, long c, ...) {
long stack_array[1024]; // should be big enough to make SP offsets // take a temporary // Here I'm loading up all the T registers with something that // must be saved to the stack, since those helper calls below may // change the value of glob_array. We probably need to fill the S // registers too?
     long t0 = glob_array[arg + 0];
     long t1 = glob_array[arg + 1];
     long t2 = ...
// This should trigger shrink-wrapping, as the function calls will // need a bunch of register state saved. The idea is to make sure // all the other registers are somehow in flight at this point and // could only be recovered via their saved copies, but due to the // large stack we'll need a temporary to construct the slot // addresses.
     if (a) {
       t0 = helper(stack_array, a, b, ...);
     }
// Go re-use those temporaries, again indirect to avoid // optimization.
     a += glob_array[t0];
     b += glob_array[t1];
     c ...;
     return helper(stack_array, a, b, c, ...);
   }

At that point you should end up with the save/restore code in that if block needing to do something like

   lui  TMP, IMM_HI
   addi TMP, sp, IMM_LO
   sd REG0, 0(t0)
   sd REG1, 8(t0)
   ...

and if one of those registers it's trying to save is TMP then we're in trouble.

Generally looks pretty good though.

Ya, and a useful optimization so thanks for the help ;)



Jeff

Reply via email to