Hi,
I am running into a peculiar problem here.
I am getting stack traces from two sources: i) the spin_lock BUG_ON "cpu
recursion" report, and ii) a remote gdb running on my host and attached
to the guest kernel (the one I am debugging).
The spin_lock BUG_ON stack trace says:
BUG: spinlock recursion on CPU#0, qsbench/2169
lock: c6783618, .magic: dead4ead, .owner: qsbench/2169, .owner_cpu: 0
Pid: 2169, comm: qsbench Not tainted 2.6.27.4-sg-kgdb #2
[<c0644bb7>] ? printk+0xf/0x18
[<c05096ae>] spin_bug+0x7c/0x87
[<c0509766>] _raw_spin_lock+0x35/0xfa
[<c0647509>] _spin_lock+0x3d/0x4b
[<c048189e>] grab_swap_token+0x20c/0x246 <===
[<c0477873>] handle_mm_fault+0x35b/0x70a
[<c0649352>] ? do_page_fault+0x2a6/0x722
[<c0649425>] do_page_fault+0x379/0x722
[<c0445d82>] ? __lock_acquire+0x7bd/0x814
[<c0445d82>] ? __lock_acquire+0x7bd/0x814
[<c043f9c4>] ? getnstimeofday+0x3c/0xd6
....
And gdb shows:
#0 0xc0505f1b in delay_tsc (loops=1) at arch/x86/lib/delay.c:85
#1 0xc0505f77 in __udelay (usecs=2394097143) at arch/x86/lib/delay.c:118
#2 0xc6866d98 in ?? ()
#3 0xc05097ba in _raw_spin_lock (lock=0xc6783618) at lib/spinlock_debug.c:116
#4 0xc0647509 in _spin_lock_bh (lock=0xc6783628) at kernel/spinlock.c:113
#5 0xc048189e in dmam_pool_match (dev=<value optimized out>, res=0x2d5,
match_data=0x0) at mm/dmapool.c:457 <===
#6 0x00000001 in ?? ()
If you look at the lines I have marked with arrows, the address
0xc048189e is attributed to a different function in each case.
What is more surprising, System.map says:
c0481692 T grab_swap_token
c04818d8 t dmam_pool_match
(consistent with the spin_lock BUG_ON trace)
But on going through the output of 'objdump -s -l --source', I found
that 'dmam_pool_match' starts at address 0xc0481898
(consistent with the gdb back-trace output).
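Working the offsets out by hand (assuming the BUG_ON report is resolved
against System.map, while gdb uses the same debug info that objdump reads):

    c0481692 (grab_swap_token, System.map) + 0x20c = c048189e
    c048189e >= c0481898 (dmam_pool_match start, per objdump)
    c048189e <  c04818d8 (dmam_pool_match start, per System.map)

so each back-trace appears to be consistent with the symbol table its tool
is reading; the two tables just disagree about where dmam_pool_match starts.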
I am confused about the source of the 'spinlock cpu recursion'. I have
checked the code I have written (the grab_swap_token modifications etc.
in mm/thrash.c), and it does not appear to be the source of the problem.
I haven't touched the function dmam_pool_match(), and it does not appear
in mm/thrash.c.
>>> But now I am getting a "BUG: spinlock recursion on CPU#0" error. What
>>> does this error mean?
>>
>> Yes, it means that the previous lock has not been released yet, and you
>> are now calling a function that acquires the same spinlock again. Before
>> actually acquiring the spinlock, a check is made:
>>
>> lib/spinlock_debug.c:
>>
>> static inline void
>> debug_spin_lock_before(spinlock_t *lock)
>> {
>>         SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
>>         SPIN_BUG_ON(lock->owner == current, lock, "recursion");
>>         SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
>>                     lock, "cpu recursion");
>> }
>>
>> So the "recursion" above was detected because the spinlock is still
>> held on the same CPU while it is being reacquired. Instead of going
>> into a tight spin, the check prints out the stack trace beforehand.
>>
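If I am reading the quoted check correctly, the simplest pattern it is
meant to catch would be something like this minimal sketch (hypothetical
code, not from my tree; the function names are made up):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void helper(void)
{
        spin_lock(&example_lock);       /* second acquisition: self-deadlock */
        /* ... */
        spin_unlock(&example_lock);
}

static void caller(void)
{
        spin_lock(&example_lock);       /* first acquisition */
        helper();                       /* with CONFIG_DEBUG_SPINLOCK,
                                           debug_spin_lock_before() reports
                                           the recursion here instead of
                                           spinning silently */
        spin_unlock(&example_lock);
}

Here a single task re-acquires a lock it already holds, so lock->owner ==
current and the check fires.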
Does it mean that it is illegal for two processes (on the same CPU) to
spin on a lock that is held by someone else (again on the same CPU)?
--
Regards,
Sukanto Ghosh