From: Waiman Long
> Sent: 03 May 2024 17:00
> To: David Laight <david.lai...@aculab.com>; 'linux-kernel@vger.kernel.org' <linux-ker...@vger.kernel.org>; 'pet...@infradead.org' <pet...@infradead.org>
> Cc: 'mi...@redhat.com' <mi...@redhat.com>; 'w...@kernel.org' <w...@kernel.org>; 'boqun.f...@gmail.com' <boqun.f...@gmail.com>; 'Linus Torvalds' <torva...@linux-foundation.org>; 'virtualization@lists.linux-foundation.org' <virtualizat...@lists.linux-foundation.org>; 'Zeng Heng' <zenghe...@huawei.com>
> Subject: Re: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().
> 
> 
> On 12/31/23 23:14, Waiman Long wrote:
> >
> > On 12/31/23 16:55, David Laight wrote:
> >> per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number.
> >> This requires the cpu number be 64bit.
> >> However the value in osq_lock() comes from a 32bit xchg() and there
> >> isn't a way of telling gcc the high bits are zero (they are) so
> >> there will always be an instruction to clear the high bits.
> >>
> >> The cpu number is also offset by one (to make the initialiser 0).
> >> It seems to be impossible to get gcc to convert
> >> __per_cpu_offset[cpu_p1 - 1]
> >> into (__per_cpu_offset - 1)[cpu_p1] (transferring the offset to the
> >> address).
> >>
> >> Converting the cpu number to 32bit unsigned prior to the decrement means
> >> that gcc knows the decrement has set the high bits to zero and doesn't
> >> add a register-register move (or cltq) to zero/sign extend the value.
> >>
> >> Not massive but saves two instructions.
> >>
> >> Signed-off-by: David Laight <david.lai...@aculab.com>
> >> ---
> >>   kernel/locking/osq_lock.c | 6 ++----
> >>   1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >> index 35bb99e96697..37a4fa872989 100644
> >> --- a/kernel/locking/osq_lock.c
> >> +++ b/kernel/locking/osq_lock.c
> >> @@ -29,11 +29,9 @@ static inline int encode_cpu(int cpu_nr)
> >>       return cpu_nr + 1;
> >>   }
> >> -static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> >> +static inline struct optimistic_spin_node *decode_cpu(unsigned int encoded_cpu_val)
> >>   {
> >> -    int cpu_nr = encoded_cpu_val - 1;
> >> -
> >> -    return per_cpu_ptr(&osq_node, cpu_nr);
> >> +    return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
> >>   }
> >>     /*
> >
> > You really like micro-optimization.
> >
> > Anyway,
> >
> > Reviewed-by: Waiman Long <long...@redhat.com>
> >
> David,
> 
> Could you respin the series based on the latest upstream code?

Looks like a wet bank holiday weekend.....

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
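
For readers following the quoted patch, here is a minimal user-space sketch of
the two shapes of decode_cpu() it discusses. It is not the kernel code:
demo_offset[], DEMO_NR_CPUS, lookup_signed(), lookup_unsigned() and
check_lookups_agree() are hypothetical stand-ins, with demo_offset[] playing
the role of __per_cpu_offset[]. Both lookups return the same value; the only
difference is the type used for the decrement. On x86-64 a 32-bit operation
zero-extends its result into the 64-bit register, which is why the unsigned
form may let the compiler skip a separate extension before indexing. Whether
that actually saves instructions depends on the compiler and target.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4	/* hypothetical CPU count for the demo */

/* Hypothetical stand-in for the kernel's __per_cpu_offset[] table. */
static const unsigned long demo_offset[DEMO_NR_CPUS] = {
	0x1000, 0x2000, 0x3000, 0x4000
};

/* 0 is reserved for "no CPU", so CPU n is stored as n + 1. */
static inline int encode_cpu(int cpu_nr)
{
	return cpu_nr + 1;
}

/*
 * Old shape: the decrement is done on a signed int, so the index has to
 * be sign-extended to pointer width before the array access.
 */
static inline unsigned long lookup_signed(int encoded_cpu_val)
{
	int cpu_nr = encoded_cpu_val - 1;

	return demo_offset[cpu_nr];
}

/*
 * New shape: the decrement is done on an unsigned int, so the index only
 * needs zero extension, which a 32-bit subtract already provides on x86-64.
 */
static inline unsigned long lookup_unsigned(unsigned int encoded_cpu_val)
{
	return demo_offset[encoded_cpu_val - 1];
}

/* Returns the number of CPUs for which the two shapes disagree. */
static inline int check_lookups_agree(void)
{
	int cpu, mismatches = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		int enc = encode_cpu(cpu);

		assert(enc >= 1);	/* 0 never names a real CPU */
		if (lookup_signed(enc) != lookup_unsigned((unsigned int)enc))
			mismatches++;
	}
	return mismatches;
}
```

Compiling both lookup functions with optimisation and comparing the generated
assembly is one way to see whether the extra extension instruction is present
on a given toolchain.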
