On Tue, Mar 23, 2021 at 05:00:08PM +0100, Jesper Dangaard Brouer wrote:
> > +   /*
> > +    * If there are no allowed local zones that meets the watermarks then
> > +    * try to allocate a single page and reclaim if necessary.
> > +    */
> > +   if (!zone)
> > +           goto failed;
> > +
> > +   /* Attempt the batch allocation */
> > +   local_irq_save(flags);
> > +   pcp = &this_cpu_ptr(zone->pageset)->pcp;
> > +   pcp_list = &pcp->lists[ac.migratetype];
> > +
> > +   while (allocated < nr_pages) {
> > +           page = __rmqueue_pcplist(zone, ac.migratetype, alloc_flags,
> > +                                                           pcp, pcp_list);
> 
> The function __rmqueue_pcplist() is now used two places, this cause the
> compiler to uninline the static function.
> 

This was expected. It was not something I was particularly happy with
but avoiding it was problematic without major refactoring.

> My tests show you should inline __rmqueue_pcplist().  See patch I'm
> using below signature, which also have some benchmark notes. (Please
> squash it into your patch and drop these notes).
> 

The cycle savings per element are very marginal at just 4 cycles. I
expect the silly stat updates alone are far more costly, but the series
that addresses that is likely to be controversial. As I know the cycle
budget for processing a packet is tight, I've applied the patch but am
keeping it separate to preserve the data in case someone points out that
it is a big function to inline and "fixes" it.

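For anyone following along, the inlining question above can be sketched
in user space roughly as follows. This is only an illustration, not the
kernel code: take_one(), alloc_single() and alloc_batch() are
hypothetical stand-ins for __rmqueue_pcplist() and its two callers, and
the GCC/Clang always_inline attribute is assumed here in place of the
kernel's own annotation.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a small static helper that has two callers.
 * Once a second call site exists the compiler may stop inlining a plain
 * static function on its own; always_inline (a GCC/Clang attribute)
 * forces the body into each caller.
 */
static inline __attribute__((always_inline))
int take_one(int *pool, int *count)
{
	if (*count == 0)
		return -1;		/* pool is empty */
	return pool[--(*count)];	/* pop the last element */
}

/* Call site 1: the single-element path. */
static int alloc_single(int *pool, int *count)
{
	return take_one(pool, count);
}

/* Call site 2: the batch path, looping until nr elements or empty. */
static int alloc_batch(int *pool, int *count, int nr, int *out)
{
	int allocated = 0;

	assert(nr >= 0);
	while (allocated < nr) {
		int v = take_one(pool, count);

		if (v < 0)
			break;
		out[allocated++] = v;
	}
	return allocated;
}
```

Comparing the generated code with and without the attribute (for
example with objdump -d) is one way to check whether the helper was
kept as a separate function.
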
-- 
Mel Gorman
SUSE Labs
