Hi Leonardo,

On 03/27/2020 03:51 PM, Leonardo Bras wrote:
Hello Christophe, thanks for the feedback.

I noticed an error in this patch and sent a v2, that can be seen here:
http://patchwork.ozlabs.org/patch/1262468/

Comments inline::

On Fri, 2020-03-27 at 07:50 +0100, Christophe Leroy wrote:
@@ -142,6 +144,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
                if (likely(__arch_spin_trylock(lock) == 0))
                        break;
                do {
+                       if (unlikely(crash_skip_spinlock))
+                               return;

Complete function for reference:
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
        while (1) {
                if (likely(__arch_spin_trylock(lock) == 0))
                        break;
                do {
                        if (unlikely(crash_skip_spinlock))
                                return;
                        HMT_low();
                        if (is_shared_processor())
                                splpar_spin_yield(lock);
                } while (unlikely(lock->slock != 0));
                HMT_medium();
        }
}

You are adding a test that reads a global var in the middle of a so hot
path ? That must kill performance.

I thought it would, in worst case scenario, increase a maximum delay of
an arch_spin_lock() call 1 spin cycle. Here is what I thought:

- If the lock is already free, it would change nothing,
- Otherwise, the lock will wait.
- Waiting cycle just got bigger.
- Worst case scenario: running one more cycle, given lock->slock can
turn to 0 just after checking.

Could you please point where I failed to see the performance penalty?
(I need to get better at this :) )

You are right that when the lock is free, it changes nothing. However when it is not, it is not just one cycle.

Here is arch_spin_lock() without your patch:

00000440 <my_lock>:
 440:   39 40 00 01     li      r10,1
 444:   7d 20 18 28     lwarx   r9,0,r3
 448:   2c 09 00 00     cmpwi   r9,0
 44c:   40 82 00 10     bne     45c <my_lock+0x1c>
 450:   7d 40 19 2d     stwcx.  r10,0,r3
 454:   40 a2 ff f0     bne     444 <my_lock+0x4>
 458:   4c 00 01 2c     isync
 45c:   2f 89 00 00     cmpwi   cr7,r9,0
 460:   4d be 00 20     bclr+   12,4*cr7+eq
 464:   7c 21 0b 78     mr      r1,r1
 468:   81 23 00 00     lwz     r9,0(r3)
 46c:   2f 89 00 00     cmpwi   cr7,r9,0
 470:   40 be ff f4     bne     cr7,464 <my_lock+0x24>
 474:   7c 42 13 78     mr      r2,r2
 478:   7d 20 18 28     lwarx   r9,0,r3
 47c:   2c 09 00 00     cmpwi   r9,0
 480:   40 82 00 10     bne     490 <my_lock+0x50>
 484:   7d 40 19 2d     stwcx.  r10,0,r3
 488:   40 a2 ff f0     bne     478 <my_lock+0x38>
 48c:   4c 00 01 2c     isync
 490:   2f 89 00 00     cmpwi   cr7,r9,0
 494:   40 be ff d0     bne     cr7,464 <my_lock+0x24>
 498:   4e 80 00 20     blr

Here is arch_spin_lock() with your patch. I enclose with === what comes in addition:

00000440 <my_lock>:
 440:   39 40 00 01     li      r10,1
 444:   7d 20 18 28     lwarx   r9,0,r3
 448:   2c 09 00 00     cmpwi   r9,0
 44c:   40 82 00 10     bne     45c <my_lock+0x1c>
 450:   7d 40 19 2d     stwcx.  r10,0,r3
 454:   40 a2 ff f0     bne     444 <my_lock+0x4>
 458:   4c 00 01 2c     isync
 45c:   2f 89 00 00     cmpwi   cr7,r9,0
 460:   4d be 00 20     bclr+   12,4*cr7+eq
=====================================================
 464:   3d 40 00 00     lis     r10,0
                        466: R_PPC_ADDR16_HA    crash_skip_spinlock
 468:   39 4a 00 00     addi    r10,r10,0
                        46a: R_PPC_ADDR16_LO    crash_skip_spinlock
 46c:   39 00 00 01     li      r8,1
 470:   89 2a 00 00     lbz     r9,0(r10)
 474:   2f 89 00 00     cmpwi   cr7,r9,0
 478:   4c 9e 00 20     bnelr   cr7
=====================================================
 47c:   7c 21 0b 78     mr      r1,r1
 480:   81 23 00 00     lwz     r9,0(r3)
 484:   2f 89 00 00     cmpwi   cr7,r9,0
 488:   40 be ff f4     bne     cr7,47c <my_lock+0x3c>
 48c:   7c 42 13 78     mr      r2,r2
 490:   7d 20 18 28     lwarx   r9,0,r3
 494:   2c 09 00 00     cmpwi   r9,0
 498:   40 82 00 10     bne     4a8 <my_lock+0x68>
 49c:   7d 00 19 2d     stwcx.  r8,0,r3
 4a0:   40 a2 ff f0     bne     490 <my_lock+0x50>
 4a4:   4c 00 01 2c     isync
 4a8:   2f 89 00 00     cmpwi   cr7,r9,0
 4ac:   40 be ff c4     bne     cr7,470 <my_lock+0x30>
 4b0:   4e 80 00 20     blr


Christophe

Reply via email to