On Wed, Feb 18, 2009 at 3:58 AM, Sukanto Ghosh <[email protected]> wrote:
> Hi Peter,
>
> Accidentally I came upon this article:
> http://stackframe.blogspot.com/2007/04/debugging-linux-kernels-with.html
>
> With the help of it I could get a stacktrace when the kernel hung.
>
> The problem was a stupid one: I was holding a spin_lock and then I
> called some function that again tries to hold the same lock. Now I
> release the lock before I call that method.
>
> I was working with mm/thrash.c
>
> But now I am getting a "BUG: spinlock recursion on CPU#0" error. What
> does this error mean ?

Yes, it means that the lock taken earlier has still not been released, and
you are now calling a function that tries to acquire the same spinlock
again. Before the spinlock is actually acquired, the spinlock debugging
code makes a check:

lib/spinlock_debug.c:

    static inline void debug_spin_lock_before(spinlock_t *lock)
    {
            SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
            SPIN_BUG_ON(lock->owner == current, lock, "recursion");
            SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
                        lock, "cpu recursion");
    }

The "recursion" check is the one that fired: lock->owner == current, i.e.
the task trying to take the lock is the same task that already holds it.
So rather than just going into a tight spin silently, the kernel prints
the report and the stack trace beforehand.

> Does it mean again it's spinning indefinitely on a spinlock ?
>
> I got the following backtrace:
> #0 0xc0505f1b in delay_tsc (loops=1) at arch/x86/lib/delay.c:85
> #1 0xc0505f77 in __udelay (usecs=3479683981) at arch/x86/lib/delay.c:118
> #2 0xc700ad98 in ?? ()
> #3 0xc05097ba in _raw_spin_lock (lock=0xc6c34998) at lib/spinlock_debug.c:116
> #4 0xc0647509 in _spin_lock_bh (lock=0xc6c349a8) at kernel/spinlock.c:113
> #5 0xc048189e in dmam_pool_match (dev=<value optimized out>, res=0x1cb,
>    match_data=0x0) at mm/dmapool.c:457
> #6 0x00000001 in ?? ()
>
> While writing this mail I had paused my guest OS kernel from gdb (^c)
> for sometime and when I said continue (c) it printed: "Clocksource tsc
> unstable (delta = 838972636559 ns)"
>
> Regards,
> Sukanto Ghosh
>
> On Wed, Feb 18, 2009 at 4:52 AM, Peter Teoh <[email protected]> wrote:
> > Would u like to share WHERE u made the change? WHAT u do could be part of
> > academic exercise, so perhaps u want to keep confidential, but WHERE would
> > be helpful.
> >
> > I am suspecting (very usual for changes to MM codes) that u have done
> > something illegal while holding a open spinlock. So knowing where u insert
> > codes, will help us to understand if this is a problem or not.
> >
> > On Sat, Feb 14, 2009 at 7:52 PM, Sukanto Ghosh <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I have made some changes to the memory management part of the kernel
> >> as an experiment. Now when I boot into that kernel and start some
> >> heavy processes (which cause paging), the kernel hangs. I can't even
> >> type anything.
> >>
> >> I have gone through the 'paper on debugging kernel oops or hang'
> >> (http://mail.nl.linux.org/kernelnewbies/2003-08/msg00347.html)
> >>
> >> In this paper Erik says that to get the stack trace we can type
> >> 'Alt-SysRq-t' which prints the stack trace and when it's not possible
> >> to type anything, then it's best to use serial port + console. he says
> >> the config for lilo would be: console=ttyS0,9600 console=tty0
> >>
> >> As I have grub I am using the following lines:
> >>
> >> default=0
> >> timeout=15
> >> title Fedora (2.6.27.4)
> >> root (hd0,0)
> >> kernel /boot/vmlinuz-2.6.27.4 ro root=/dev/sda1
> >> initrd /boot/initrd-2.6.27.4.img
> >> serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
> >> terminal --dumb --timeout=10 serial console
> >>
> >> CONFIG_MAGIC_SYSRQ was enabled in my config file.
> >>
> >> My test kernel is running inside a Virtual machine (VM) (VMware), with
> >> its serial port 0 redirected to a file.
> >> VM OS: fedora Core 9 with modified kernel 2.6.27.4
> >> Host OS: ubuntu hardy 2.6.24.3
> >>
> >> My problem is I am not getting any kind of output in the file to which
> >> I redirected the serial port of the VM except a bunch of "Press any
> >> key to continue .. " messages.
> >>
> >> should I be providing the 'alt-sysrq-t' input through the serial port,
> >> if so, how ?
> >> can i connect a host terminal to the serial port of the VM.
> >> Vmware gives me three options about the serial port of the Virtual Machine
> >> i) connect it to physical port of the host, ii) connect to a named
> >> pipe and, iii)connect it to a file in the host.
> >>
> >> Please help ...
> >>
> >> --
> >> Regards,
> >> Sukanto Ghosh
> >>
> >> --
> >> To unsubscribe from this list: send an email with
> >> "unsubscribe kernelnewbies" to [email protected]
> >> Please read the FAQ at http://kernelnewbies.org/FAQ
> >>
> >
> > --
> > Regards,
> > Peter Teoh
>

--
Regards,
Peter Teoh
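
P.S. To make the three checks in debug_spin_lock_before() concrete, here is a minimal user-space sketch. It is not kernel code and only an illustration under stated assumptions: every demo_* name, the DEMO_MAGIC value, and the "current task" and CPU number passed in as plain arguments are made up for this example. It keeps only the owner bookkeeping and does not model real locking or contention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Arbitrary marker for this demo; it stands in for the kernel's SPINLOCK_MAGIC. */
#define DEMO_MAGIC 0x5eed10ccu

enum demo_result { DEMO_OK, DEMO_BAD_MAGIC, DEMO_RECURSION, DEMO_CPU_RECURSION };

/* Hypothetical stand-in for the task that "current" points to in the kernel. */
struct demo_task {
    const char *comm;
    int pid;
};

/* Hypothetical lock: only the debug bookkeeping fields, no real locking. */
struct demo_lock {
    unsigned int magic;
    const struct demo_task *owner; /* NULL while unlocked */
    int owner_cpu;                 /* -1 while unlocked */
};

static void demo_lock_init(struct demo_lock *lock)
{
    assert(lock != NULL);
    lock->magic = DEMO_MAGIC;
    lock->owner = NULL;
    lock->owner_cpu = -1;
}

/* The same three checks, in the same order, as debug_spin_lock_before(). */
static enum demo_result demo_check_before(const struct demo_lock *lock,
                                          const struct demo_task *cur, int cpu)
{
    if (lock->magic != DEMO_MAGIC)
        return DEMO_BAD_MAGIC;
    if (lock->owner == cur)
        return DEMO_RECURSION;      /* same task already holds the lock */
    if (lock->owner_cpu == cpu)
        return DEMO_CPU_RECURSION;  /* another task on this CPU holds it */
    return DEMO_OK;
}

/* Run the checks, then record the new owner. On a failed check the demo
 * prints a report and refuses the lock; it never spins. */
static enum demo_result demo_lock_acquire(struct demo_lock *lock,
                                          const struct demo_task *cur, int cpu)
{
    static const char *const msg[] = {
        "", "bad magic", "recursion", "cpu recursion"
    };
    enum demo_result r;

    assert(lock != NULL && cur != NULL);
    r = demo_check_before(lock, cur, cpu);
    if (r != DEMO_OK) {
        printf("BUG: spinlock %s on CPU#%d, %s/%d\n",
               msg[r], cpu, cur->comm, cur->pid);
        return r;
    }
    lock->owner = cur;
    lock->owner_cpu = cpu;
    return DEMO_OK;
}

static void demo_lock_release(struct demo_lock *lock)
{
    assert(lock != NULL);
    lock->owner = NULL;
    lock->owner_cpu = -1;
}
```

Calling demo_lock_acquire() twice with the same task and no demo_lock_release() in between makes the second call return DEMO_RECURSION, which corresponds to the case reported in this thread. A different task on the same CPU number gets DEMO_CPU_RECURSION instead.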
