Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-03 Thread Keith Owens

On Tue, 3 Oct 2000 12:32:37 -0300 (BRST), 
Rik van Riel <[EMAIL PROTECTED]> wrote:
>On Wed, 4 Oct 2000, Keith Owens wrote:
>> Rik van Riel <[EMAIL PROTECTED]> wrote:
>> >Sysrq-T is broken on x86 ;
>> 
>> show_task() calls thread_saved_pc() which is giving bad results.
>> Getting the correct PC for blocked threads is easy,
>> But it does not give you much.  Thread esp and eip are only
>> saved during switch_to(), at which point eip always points to
>> schedule+0x42c.
>
>Yup ;)
>
>So this function will need to look at the call trace and
>give the function that called schedule() ...

Shudder.  I had to do that for kdb and it is as ugly as sin.  See
kdba_prologue and kdb_get_next_ar in the kdb patch, especially the
comments at the start of kdb_get_next_ar.  ix86 back trace has special
cases galore.  This is why an oops dumps so much rubbish in the "call
trace" on ix86: it is just too hard to get a correct call trace, so we
print anything on the stack that might be a kernel or module address and
expect the user to filter out all the false positives.

Also bear in mind that for running threads you have no idea where the
stack pointer is: esp is not saved in the process table unless the
thread blocks.  So sysrq-T cannot even think about looking at the stack
for running threads on other cpus unless you force them to block first.
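
The "print anything on the stack that might be a kernel address" approach
described above can be sketched roughly as follows.  This is a userspace
illustration, not the kernel's oops code: struct text_range,
scan_stack_for_text() and the idea of a single contiguous text range are
assumptions made for the example (a real kernel would also have to check
module address ranges).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds of kernel text, for illustration only. */
struct text_range {
    uintptr_t start;   /* inclusive */
    uintptr_t end;     /* exclusive */
};

/*
 * Scan every word of a saved stack and collect the ones that fall inside
 * the text range.  No attempt is made to decide whether a word really is
 * a return address, so stale values left on the stack show up as false
 * positives that the reader has to filter out.
 * Returns the number of candidates stored in out.
 */
static size_t scan_stack_for_text(const uintptr_t *stack, size_t nwords,
                                  struct text_range text,
                                  uintptr_t *out, size_t max_out)
{
    size_t found = 0;

    assert(stack != NULL && out != NULL);
    for (size_t i = 0; i < nwords && found < max_out; i++) {
        if (stack[i] >= text.start && stack[i] < text.end)
            out[found++] = stack[i];
    }
    return found;
}
```

Every word inside the range is reported, whether or not it is a live return
address, which is why the resulting list needs manual filtering.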

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-03 Thread Rik van Riel

On Wed, 4 Oct 2000, Keith Owens wrote:
> Rik van Riel <[EMAIL PROTECTED]> wrote:

> >Sysrq-T is broken on x86 ;
> 
> show_task() calls thread_saved_pc() which is giving bad results.
> Getting the correct PC for blocked threads is easy,
> 
> Index: 0-test9-pre9.3/include/asm-i386/processor.h
> --- 0-test9-pre9.3/include/asm-i386/processor.h Tue, 08 Aug 2000 16:14:08 +1000 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
> +++ 0-test9-pre9.3(w)/include/asm-i386/processor.h Wed, 04 Oct 2000 01:48:32 +1100 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
> @@ -411,7 +411,7 @@ extern void forget_segments(void);
>   * Return saved PC of a blocked thread.
>   */
>  extern inline unsigned long thread_saved_pc(struct thread_struct *t)
>  {
> -   return ((unsigned long *)t->esp)[3];
> +   return (t->eip);
>  }
> 
> But it does not give you much.  Thread esp and eip are only
> saved during switch_to(), at which point eip always points to
> schedule+0x42c.

Yup ;)

So this function will need to look at the call trace and
give the function that called schedule() ...
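
A rough sketch of that idea, assuming the code was built with frame
pointers so that each frame stores the caller's frame pointer followed by
the return address.  It runs on a simulated stack in userspace; FRAME_END,
struct func_range and first_caller_outside() are names invented for the
example, and a kernel built without frame pointers cannot be walked this
way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simulated stack.  Frame links are array indices rather than real
 * addresses.  Assumed layout of each frame:
 *   stack[fp]     = caller's frame pointer (index), or FRAME_END
 *   stack[fp + 1] = return address into the caller
 */
#define FRAME_END UINT32_MAX

/* Address range of the function to skip, e.g. schedule(). */
struct func_range {
    uint32_t start;   /* inclusive */
    uint32_t end;     /* exclusive */
};

/*
 * Follow the frame chain starting at fp and return the first return
 * address that lies outside the skipped function.  Returns 0 if the
 * chain ends, or leaves the array, before such an address is found.
 */
static uint32_t first_caller_outside(const uint32_t *stack, size_t nwords,
                                     uint32_t fp, struct func_range skip)
{
    size_t steps = 0;

    assert(stack != NULL);
    /* The step counter guards against a corrupted, looping chain. */
    while (fp != FRAME_END && (size_t)fp + 1 < nwords && steps++ < nwords) {
        uint32_t ret = stack[fp + 1];

        if (ret < skip.start || ret >= skip.end)
            return ret;
        fp = stack[fp];
    }
    return 0;
}
```

The returned address would still have to be resolved to a symbol name, for
instance against System.map.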

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-03 Thread Keith Owens

On Sun, 1 Oct 2000 23:50:17 -0300 (BRST), 
Rik van Riel <[EMAIL PROTECTED]> wrote:
>On Sun, 1 Oct 2000, David Ford wrote:
>> During normal operation of the machine, -T shows processes
>> having PCs of 0x and 0x7f00 which strikes me as a
>> bit odd.
>> 
>> For e.g. the following:
>> 
>>  sshd  S 7FFF 0   247 88   248  (NOTLB)
>>  121
>> sig: 0   : X
>>  bash  S  0   248247   263  (NOTLB)
>> sig: 0  0001 : X
>
>Sysrq-T is broken on x86 ;
>
>(very much to my dismay ... this is one of the best
>debugging helps we have^Whad and I could have used
>it quite well)

show_task() calls thread_saved_pc() which is giving bad results.
Getting the correct PC for blocked threads is easy,

Index: 0-test9-pre9.3/include/asm-i386/processor.h
--- 0-test9-pre9.3/include/asm-i386/processor.h Tue, 08 Aug 2000 16:14:08 +1000 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
+++ 0-test9-pre9.3(w)/include/asm-i386/processor.h Wed, 04 Oct 2000 01:48:32 +1100 kaos (linux-2.4/P/18_processor. 1.1.1.5 644)
@@ -411,7 +411,7 @@ extern void forget_segments(void);
  * Return saved PC of a blocked thread.
  */
 extern inline unsigned long thread_saved_pc(struct thread_struct *t)
 {
-   return ((unsigned long *)t->esp)[3];
+   return (t->eip);
 }

But it does not give you much.  Thread esp and eip are only saved
during switch_to(), at which point eip always points to schedule+0x42c.
If the task is running on a cpu (the interesting case) then neither
t->esp nor t->eip contain useful values so you cannot get the PC for
running tasks.




Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-01 Thread David Ford

Rik van Riel wrote:

> Schedule() is the last function in the kernel they
> went into before they got scheduled away ;)
>
> The second last function is the one you're interested
> in ...

Hmm, 'k.


> > PC:  schedule()
> > -1:  down()
> > -2:  down_fail()
> Then I guess something was trying to take the same
> semaphore twice and deadlocked, taking the rest of
> the system with it...

That sounds 'bout right.  Every new process gets stuck at the same
location too.  (reiserfs list) Do you have a comment?


> >  sshd  S 7FFF 0   247 88   248  (NOTLB)
> >  121
> > sig: 0   : X
> >  bash  S  0   248247   263  (NOTLB)
> > sig: 0  0001 : X
>
> Sysrq-T is broken on x86 ;
>
> (very much to my dismay ... this is one of the best
> debugging helps we have^Whad and I could have used
> it quite well)

Oh that doesn't make me happy.  What's necessary to fix it to get useful
information?

-d

--
  "There is a natural aristocracy among men. The grounds of this are
  virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




begin:vcard 
n:Ford;David
x-mozilla-html:TRUE
org:<img src="http://www.kalifornia.com/images/paradise.jpg">
adr:;;
version:2.1
email;internet:[EMAIL PROTECTED]
title:Blue Labs Developer
x-mozilla-cpt:;28256
fn:David Ford
end:vcard



Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-01 Thread Rik van Riel

On Sun, 1 Oct 2000, David Ford wrote:
> Rik van Riel wrote:
> 
> > > How broken is it?  I have a test9-pre7 system that exhibits an
> > > elusive bug, reiserfs hangs at boot time, and all I need is a
> > > backtrace on the D state processes.
> >
> > Could be a VM bug. ;)
> 
> It could, but I strongly doubt it.  We've seen this bug [very]
> infrequently for the last year.

OK, then it's almost certainly not VM related...
(which also means I can't fix it in little time)

> I'm rather hesitant to trust my findings, kdb says all the
> processes are in schedule+nn. Yes, it can be related but I'm a
> wee bit dubious.

Schedule() is the last function in the kernel they
went into before they got scheduled away ;)

The second last function is the one you're interested
in ...

> I'd post all my trace findings but I really don't want to cross
> type all that right now -- I left my serial cable at the office.  
> I'll post the traces tomorrow.  The simple version of them is as
> follows:
> 
> PC:  schedule()
> -1:  down()
> -2:  down_fail()
> ...
> 
> Some processes have devfs calls following this, some have
> typical kernel init calls etc., but the common factor in all of
> them is they all sit in schedule and they all have the same PC
> location.

Then I guess something was trying to take the same 
semaphore twice and deadlocked, taking the rest of
the system with it...
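
A toy model of that failure mode, as a userspace sketch only: struct
toy_sem, toy_down_trylock() and toy_up() are invented for the example and
are not the kernel semaphore API.  The try-lock form is used so the example
returns instead of hanging at the point where a real down() would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counting semaphore; count == 1 models a free mutex-style lock. */
struct toy_sem {
    int count;
};

/*
 * Try to take the semaphore.  Returns true on success.  On failure a
 * real down() would put the caller to sleep in schedule() until someone
 * calls up().  If the only holder is the caller itself, nobody ever
 * will, and every later caller queues up behind it.
 */
static bool toy_down_trylock(struct toy_sem *sem)
{
    assert(sem != NULL);
    if (sem->count > 0) {
        sem->count--;
        return true;
    }
    return false;
}

/* Release the semaphore. */
static void toy_up(struct toy_sem *sem)
{
    assert(sem != NULL);
    sem->count++;
}
```

Taking the semaphore twice without an intervening toy_up() fails on the
second attempt, which corresponds to the point where every process in the
traces above sits in schedule().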

> During normal operation of the machine, -T shows processes
> having PCs of 0x and 0x7f00 which strikes me as a
> bit odd.
> 
> For e.g. the following:
> 
>  sshd  S 7FFF 0   247 88   248  (NOTLB)
>  121
> sig: 0   : X
>  bash  S  0   248247   263  (NOTLB)
> sig: 0  0001 : X

Sysrq-T is broken on x86 ;

(very much to my dismay ... this is one of the best
debugging helps we have^Whad and I could have used
it quite well)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/





Re: (reiserfs hang at boot) where is the kernel debugger?

2000-10-01 Thread David Ford

Rik van Riel wrote:

> > How broken is it?  I have a test9-pre7 system that exhibits an
> > elusive bug, reiserfs hangs at boot time, and all I need is a
> > backtrace on the D state processes.
>
> Could be a VM bug. ;)

It could, but I strongly doubt it.  We've seen this bug [very] infrequently
for the last year.  I managed to build a system and trigger it four out of
five times on boot.  Frustratingly after adding kdb, I'm only able to
trigger it once in ~20 boots now.

I'm rather hesitant to trust my findings, kdb says all the processes are in
schedule+nn.  Yes, it can be related but I'm a wee bit dubious.   I'd post
all my trace findings but I really don't want to cross type all that right
now -- I left my serial cable at the office.  I'll post the traces
tomorrow.  The simple version of them is as follows:

PC:  schedule()
-1:  down()
-2:  down_fail()
...

Some processes have devfs calls following this, some have typical kernel
init calls etc., but the common factor in all of them is they all sit in
schedule and they all have the same PC location.  SysRq-T shows C111914B
which is outside system.map but again I'm dubious.  During normal operation
of the machine, -T shows processes having PCs of 0x and 0x7f00
which strikes me as a bit odd.

For e.g. the following:

 sshd  S 7FFF 0   247 88   248  (NOTLB)
 121
sig: 0   : X
 bash  S  0   248247   263  (NOTLB)
sig: 0  0001 : X

BTW, if nobody has any qualms, I'm going to poke at this output and tidy it
up, the header is disjointed from the body.

-d

--
  "There is a natural aristocracy among men. The grounds of this are
  virtue and talents", Thomas Jefferson [1742-1826], 3rd US President






