Re: (reiserfs hang at boot) where is the kernel debugger?
On Tue, 3 Oct 2000 12:32:37 -0300 (BRST),
Rik van Riel <[EMAIL PROTECTED]> wrote:
>On Wed, 4 Oct 2000, Keith Owens wrote:
>> Rik van Riel <[EMAIL PROTECTED]> wrote:
>> >Sysrq-T is broken on x86 ;
>>
>> show_task() calls thread_saved_pc() which is giving bad results.
>> Getting the correct PC for blocked threads is easy,
>> But it does not give you much.  Thread esp and eip are only
>> saved during switch_to(), at which point eip always points to
>> schedule+0x42c.
>
>Yup ;)
>
>So this function will need to look at the call trace and
>give the function that called schedule() ...

Shudder.  I had to do that for kdb and it is as ugly as sin.  See
kdba_prologue and kdb_get_next_ar in the kdb patch, especially the
comments at the start of kdb_get_next_ar.  ix86 back trace has special
cases galore.

This is why an oops dumps so much rubbish in the "call trace" on ix86,
it is just too hard to get a correct call trace so we print anything on
stack that might be a kernel or module address and expect the user to
filter out all the false positives.

Also bear in mind that for running threads you have no idea where the
stack pointer is, esp is not saved in the process table unless the
thread blocks.  So sysrq-T cannot even think about looking at the stack
for running threads on other cpus unless you force them to block first.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
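The "print anything on stack that might be a kernel address" heuristic described above can be modelled in user space roughly as follows. This is only an illustrative sketch, not the kernel's oops code: `struct text_range`, `scan_stack` and the addresses used with it are hypothetical names and values invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds of the kernel text; a real kernel would take
 * these from linker symbols rather than from a struct like this. */
struct text_range {
    unsigned long start;   /* first address inside the text */
    unsigned long end;     /* first address past the text */
};

/* Copy every stack word that falls inside the text range into out[].
 * No attempt is made to check that a word really is a return address,
 * which is exactly why the result can contain false positives.
 * Returns the number of candidates stored (at most out_cap). */
size_t scan_stack(const unsigned long *stack, size_t nwords,
                  struct text_range text,
                  unsigned long *out, size_t out_cap)
{
    size_t found = 0;

    assert(stack != NULL && out != NULL);
    for (size_t i = 0; i < nwords && found < out_cap; i++) {
        if (stack[i] >= text.start && stack[i] < text.end)
            out[found++] = stack[i];
    }
    return found;
}
```

Any stale value left on the stack that happens to look like a text address is reported too, so the reader still has to filter the list by hand.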
Re: (reiserfs hang at boot) where is the kernel debugger?
On Wed, 4 Oct 2000, Keith Owens wrote:
> Rik van Riel <[EMAIL PROTECTED]> wrote:
> >Sysrq-T is broken on x86 ;
>
> show_task() calls thread_saved_pc() which is giving bad results.
> Getting the correct PC for blocked threads is easy,
>
> Index: 0-test9-pre9.3/include/asm-i386/processor.h
> --- 0-test9-pre9.3/include/asm-i386/processor.h Tue, 08 Aug 2000 16:14:08 +1000 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
> +++ 0-test9-pre9.3(w)/include/asm-i386/processor.h Wed, 04 Oct 2000 01:48:32 +1100 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
> @@ -411,7 +411,7 @@ extern void forget_segments(void);
>   * Return saved PC of a blocked thread.
>   */
>  extern inline unsigned long thread_saved_pc(struct thread_struct *t)
>  {
> -       return ((unsigned long *)t->esp)[3];
> +       return (t->eip);
>  }
>
> But it does not give you much.  Thread esp and eip are only
> saved during switch_to(), at which point eip always points to
> schedule+0x42c.

Yup ;)

So this function will need to look at the call trace and
give the function that called schedule() ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/               http://www.surriel.com/
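For illustration, "give the function that called schedule()" could look something like the sketch below if every frame carried a frame pointer. That is an assumption made purely for the example; nothing in this thread says the kernel stack can be walked that way. `struct frame`, `first_caller_outside` and the address range passed in for schedule() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one stack frame when code is built with frame pointers:
 * a link to the caller's frame plus the return address. */
struct frame {
    const struct frame *caller;  /* NULL at the outermost frame */
    uintptr_t ret_addr;          /* address execution returns to */
};

/* Walk the chain and return the first return address that lies
 * outside [sched_start, sched_end), i.e. the first caller that is
 * not schedule() itself. Returns 0 if none is found within
 * max_depth frames. */
uintptr_t first_caller_outside(const struct frame *fp,
                               uintptr_t sched_start, uintptr_t sched_end,
                               unsigned max_depth)
{
    assert(sched_start < sched_end);
    while (fp != NULL && max_depth-- > 0) {
        if (fp->ret_addr < sched_start || fp->ret_addr >= sched_end)
            return fp->ret_addr;
        fp = fp->caller;
    }
    return 0;
}
```

The depth limit is there so that a corrupted chain cannot make the walk loop forever; a real unwinder would also have to validate each frame pointer before dereferencing it.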
Re: (reiserfs hang at boot) where is the kernel debugger?
On Sun, 1 Oct 2000 23:50:17 -0300 (BRST),
Rik van Riel <[EMAIL PROTECTED]> wrote:
>On Sun, 1 Oct 2000, David Ford wrote:
>> During normal operation of the machine, -T shows processes
>> having PCs of 0x and 0x7f00 which strikes me as a
>> bit odd.
>>
>> For e.g. the following:
>>
>> sshd S 7FFF 0 247 88 248 (NOTLB)
>> 121
>> sig: 0 : X
>> bash S 0 248247 263 (NOTLB)
>> sig: 0 0001 : X
>
>Sysrq-T is broken on x86 ;
>
>(very much to my dismay ... this is one of the best
>debugging helps we have^Whad and I could have used
>it quite well)

show_task() calls thread_saved_pc() which is giving bad results.
Getting the correct PC for blocked threads is easy,

Index: 0-test9-pre9.3/include/asm-i386/processor.h
--- 0-test9-pre9.3/include/asm-i386/processor.h Tue, 08 Aug 2000 16:14:08 +1000 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
+++ 0-test9-pre9.3(w)/include/asm-i386/processor.h Wed, 04 Oct 2000 01:48:32 +1100 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
@@ -411,7 +411,7 @@ extern void forget_segments(void);
  * Return saved PC of a blocked thread.
  */
 extern inline unsigned long thread_saved_pc(struct thread_struct *t)
 {
-       return ((unsigned long *)t->esp)[3];
+       return (t->eip);
 }

But it does not give you much.  Thread esp and eip are only saved
during switch_to(), at which point eip always points to schedule+0x42c.
If the task is running on a cpu (the interesting case) then neither
t->esp nor t->eip contain useful values so you cannot get the PC for
running tasks.
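The difference between the two versions of thread_saved_pc() in the patch can be tried out in user space with a small model. This is a sketch only: `struct thread_struct_model` and both helper names are made up for the example and carry just the two fields the patch touches, not the real i386 `struct thread_struct`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch reads. */
struct thread_struct_model {
    uintptr_t esp;  /* stack pointer saved when the thread was switched out */
    uintptr_t eip;  /* instruction pointer saved at the same moment */
};

/* Old behaviour: treat the saved esp as an array of words and
 * return the fourth one (index 3). */
static inline uintptr_t saved_pc_from_stack(const struct thread_struct_model *t)
{
    assert(t != NULL);
    return ((const uintptr_t *)t->esp)[3];
}

/* Patched behaviour: return the saved eip directly. */
static inline uintptr_t saved_pc_from_eip(const struct thread_struct_model *t)
{
    assert(t != NULL);
    return t->eip;
}
```

The old version depends entirely on what happens to sit at that stack slot, while the patched version always yields the address stored at switch time, which per the message above is the same point inside schedule() for every blocked thread.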
Re: (reiserfs hang at boot) where is the kernel debugger?
Rik van Riel wrote:

> Schedule() is the last function in the kernel they
> went into before they got scheduled away ;)
>
> The second last function is the one you're interested
> in ...

Hmm, 'k.

> > PC: schedule()
> > -1: down()
> > -2: down_fail()

> Then I guess something was trying to take the same
> semaphore twice and deadlocked, taking the rest of
> the system with it...

That sounds 'bout right.  Every new process gets stuck at the same location too.

(reiserfs list) Do you have a comment?

> > sshd S 7FFF 0 247 88 248 (NOTLB)
> > 121
> > sig: 0 : X
> > bash S 0 248247 263 (NOTLB)
> > sig: 0 0001 : X
>
> Sysrq-T is broken on x86 ;
>
> (very much to my dismay ... this is one of the best
> debugging helps we have^Whad and I could have used
> it quite well)

Oh that doesn't make me happy.  What's necessary to fix it to get useful
information?

-d

--
"There is a natural aristocracy among men. The grounds of this are virtue and
talents", Thomas Jefferson [1742-1826], 3rd US President
Re: (reiserfs hang at boot) where is the kernel debugger?
On Sun, 1 Oct 2000, David Ford wrote:
> Rik van Riel wrote:
>
> > > How broken is it?  I have a test9-pre7 system that's exhibits an
> > > elusive bug, reiserfs hangs at boot time, and all I need is a
> > > backtrace on the D state processes.
> >
> > Could be a VM bug. ;)
>
> It could, but I strongly doubt it.  We've seen this bug [very]
> infrequently for the last year.

OK, then it's almost certainly not VM related...
(which also means I can't fix it in little time)

> I'm rather hesitant to trust my findings, kdb says all the
> processes are in schedule+nn.  Yes, it can be related but I'm a
> wee bit dubious.

Schedule() is the last function in the kernel they
went into before they got scheduled away ;)

The second last function is the one you're interested
in ...

> I'd post all my trace findings but I really don't want to cross
> type all that right now -- I left my serial cable at the office.
> I'll post the traces tomorrow.  The simple version of them is as
> follows:
>
> PC: schedule()
> -1: down()
> -2: down_fail()
> ...
>
> Some processes have devfs calls following this, some have
> typical kernel init calls etc., but the common factor in all of
> them is they all sit in schedule and they all have the same PC
> location.

Then I guess something was trying to take the same
semaphore twice and deadlocked, taking the rest of
the system with it...

> During normal operation of the machine, -T shows processes
> having PCs of 0x and 0x7f00 which strikes me as a
> bit odd.
>
> For e.g. the following:
>
> sshd S 7FFF 0 247 88 248 (NOTLB)
> 121
> sig: 0 : X
> bash S 0 248247 263 (NOTLB)
> sig: 0 0001 : X

Sysrq-T is broken on x86 ;

(very much to my dismay ... this is one of the best
debugging helps we have^Whad and I could have used
it quite well)

regards,

Rik
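The "take the same semaphore twice" guess can be pictured with a toy counting semaphore. This is a sketch of the idea only, not the kernel's semaphore code and not a diagnosis of the reiserfs hang; `struct sem_model`, `model_try_down` and `model_up` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counting semaphore: count is the number of holders still allowed. */
struct sem_model {
    int count;
};

/* Non-blocking acquire. Returns 1 if the semaphore was taken, 0 if a
 * blocking acquire would have had to sleep at this point. */
int model_try_down(struct sem_model *s)
{
    assert(s != NULL);
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    return 0;
}

/* Release one hold on the semaphore. */
void model_up(struct sem_model *s)
{
    assert(s != NULL);
    s->count++;
}
```

With a count of 1, the first acquire succeeds and the second one fails. In the blocking version that second call would sleep waiting for a release that only the sleeping caller itself could perform, and everything else that later needs the same semaphore would pile up behind it.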
Re: (reiserfs hang at boot) where is the kernel debugger?
Rik van Riel wrote:

> > How broken is it?  I have a test9-pre7 system that's exhibits an
> > elusive bug, reiserfs hangs at boot time, and all I need is a
> > backtrace on the D state processes.
>
> Could be a VM bug. ;)

It could, but I strongly doubt it.  We've seen this bug [very] infrequently for
the last year.  I managed to build a system and trigger it four out of five
times on boot.  Frustratingly after adding kdb, I'm only able to trigger it
once in ~20 boots now.

I'm rather hesitant to trust my findings, kdb says all the processes are in
schedule+nn.  Yes, it can be related but I'm a wee bit dubious.

I'd post all my trace findings but I really don't want to cross type all that
right now -- I left my serial cable at the office.  I'll post the traces
tomorrow.  The simple version of them is as follows:

PC: schedule()
-1: down()
-2: down_fail()
...

Some processes have devfs calls following this, some have typical kernel init
calls etc., but the common factor in all of them is they all sit in schedule
and they all have the same PC location.  SysRq-T shows C111914B which is
outside system.map but again I'm dubious.

During normal operation of the machine, -T shows processes having PCs of
0x and 0x7f00 which strikes me as a bit odd.

For e.g. the following:

sshd S 7FFF 0 247 88 248 (NOTLB)
121
sig: 0 : X
bash S 0 248247 263 (NOTLB)
sig: 0 0001 : X

BTW, if nobody has any qualms, I'm going to poke at this output and tidy it up,
the header is disjointed from the body.

-d