> -----Original Message-----
> From: Segher Boessenkool <seg...@kernel.crashing.org>
> Sent: Wednesday, July 2, 2025 10:22 PM
> To: Cui, Lili <lili....@intel.com>
> Cc: ubiz...@gmail.com; gcc-patches@gcc.gnu.org; Liu, Hongtao
> <hongtao....@intel.com>; richard.guent...@gmail.com; Michael Matz
> <m...@suse.de>
> Subject: Re: [PATCH V3] x86: Enable separate shrink wrapping
> 
> On Wed, Jul 02, 2025 at 01:32:37PM +0000, Cui, Lili wrote:
> > > > +  /* Don't mess with the following registers.  */
> > > > +  if (frame_pointer_needed)
> > > > +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> > >
> > > What is that about?  Isn't that one of the bigger possible wins?
> >
> > Good question!
> 
> I know :-)
> 
> > Initially, I looked at other architectures and disabled the hard frame
> > pointer,
> 
> Like aarch?  Yeah I always wondered why they don't do it.  I decided that that
> is because of their ABI and architecture stuff they can save and restore their
> frame reg (r29) with the same insn as they use for the link reg (r30).  Of
> course they could do code to do tradeoffs there, but apparently they did not
> see the use for that, or perhaps from experience knew what way this would
> fall in the end.
> 

LoongArch/rs6000/RISC-V/AArch64 all disable HARD_FRAME_POINTER_REGNUM as a
separate shrink-wrapping component.

> > but after reconsidering, I realized your point makes sense. If the
> > hard frame pointer were enabled, we would typically emit push %rbp
> > and mov %rsp, %rbp at the start of the prologue, so there is no room for
> > separate shrink wrapping, but if the function itself also uses rbp, there
> > might be room for optimization,
> 
> Yup, when using a frame pointer (hard or otherwise, and a very bad plan
> nowadays, a 1970's thing) you typically get the frame pointer established very
> first thing, anything that touches the frame needs it after all!
> 
> But not all code accesses the frame, many early-out paths do not for
> example.
> 

Yes, we currently shrink-wrap the entire prologue (including the
HARD_FRAME_POINTER setup), which handles some early-return cases. But we can't
separately shrink-wrap HARD_FRAME_POINTER, because it has to record rsp before
rsp is adjusted to point to the bottom of the stack frame. It therefore has to
be set up at the beginning of the prologue, and we have no chance to
shrink-wrap it individually.
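
To make the filtering concrete, here is a toy sketch in plain C. It is not the
GCC code: toy_components, the TOY_* register numbers, and all helper names are
made up for illustration, and a 32-bit mask stands in for the real bitmap. It
only shows the idea of the two lines under discussion: start from the set of
saved registers and drop the frame pointer when frame_pointer_needed is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register numbers for this sketch only; not GCC's values.  */
enum { TOY_BX = 3, TOY_BP = 6, TOY_R12 = 12, TOY_NUM_REGS = 16 };

/* Toy stand-in for the component bitmap: one bit per register.  */
typedef uint32_t toy_components;

static toy_components
toy_set_bit (toy_components set, int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_REGS);
  return set | (UINT32_C (1) << regno);
}

static toy_components
toy_clear_bit (toy_components set, int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_REGS);
  return set & ~(UINT32_C (1) << regno);
}

static bool
toy_has_bit (toy_components set, int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_REGS);
  return (set & (UINT32_C (1) << regno)) != 0;
}

/* Return the registers that may be saved/restored separately.
   When a frame pointer is needed, its save and setup must stay at the
   start of the prologue, so it is removed from the candidate set.  */
static toy_components
toy_get_separate_components (toy_components saved_regs,
                             bool frame_pointer_needed)
{
  toy_components components = saved_regs;

  if (frame_pointer_needed)
    components = toy_clear_bit (components, TOY_BP);

  return components;
}
```

With saved registers {TOY_BP, TOY_R12}, the function keeps only TOY_R12 when
frame_pointer_needed is true, and keeps both when it is false.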

I removed these two lines of code and ran a comparison test, and found that
the binary was unchanged. Unfortunately, I didn't identify any optimization
opportunities, so I think it's better to keep them. I'm not sure whether there
might be any corner-case issues.

Thanks,
Lili.

> > I took out these two lines and ran some tests, and everything seems fine. I
> > will do more testing and try to find a case where the optimization is really
> > made.
> 
> For x86 all insns that access the frame explicitly refer to the (hard) frame
> pointer register I think?  So yeah, then things should just work like that :-)
> 
> Good luck, have fun, don't do cargo-cult,
> 
> 
> Segher
