RE: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops) -- MAIL LOOP?

2000-10-17 Thread Sudhindra Herle

Sorry for this off-topic post, but,

I'm getting this email way too many times. I now have 5 copies of the email
from Alexander Viro (in response to Linus).

Is anyone else facing the same problem?

Is vger messed up again?

Cheers,
-Sudhi.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Roger Larsson

Linus Torvalds wrote:
> 
> On Tue, 17 Oct 2000, Alexander Viro wrote:
> >
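> > > Trace; c014efde <read_inode_bitmap+3e/90>
> > > Trace; c014f240 <load_inode_bitmap+210/230>
> > > Trace; c014f6af <ext2_new_inode+29f/700>
> > > Trace; c021e87e <unix_write_space+2e/50>
> > Huh?
> > > Trace; c01523af <ext2_create+1f/c0>
> >
> > The rest of trace is OK, but WTF is net/unix/*.c code doing here?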
> 
> The traces always (or almost always) have crud in them - it's not a real
> stack-trace, it's just a printout of the stack contents that match
> addresses in the text region. So the unix_write_space thing was probably
> from the previous system call and just hadn't been overwritten.
> 
> Linus
> 


Hmm..
Might this problem be related to...

Tigran's:
> Subject: test10-pre1 BUG at page_alloc.c:221!

Quintela's:
> Subject: I've got the BAD_RANGE BUG in rmqueue!!! (Pre9-4)

Richard Guenther's:
> Subject: [OOPS][BUG] with 2.4.0-test9



Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Linus Torvalds



On Tue, 17 Oct 2000, Alexander Viro wrote:
> 
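> > Trace; c014efde <read_inode_bitmap+3e/90>
> > Trace; c014f240 <load_inode_bitmap+210/230>
> > Trace; c014f6af <ext2_new_inode+29f/700>
> > Trace; c021e87e <unix_write_space+2e/50>
> Huh?
> > Trace; c01523af <ext2_create+1f/c0>
> 
> The rest of trace is OK, but WTF is net/unix/*.c code doing here?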

The traces always (or almost always) have crud in them - it's not a real
stack-trace, it's just a printout of the stack contents that match
addresses in the text region. So the unix_write_space thing was probably
from the previous system call and just hadn't been overwritten.
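
A minimal sketch of the kind of scan described above, written in Python for illustration and not taken from the kernel: every word on the stack that happens to fall inside the text segment is reported, whether or not it is a live return address. The bounds TEXT_START and TEXT_END, the function name, and the sample stack contents are all assumptions made up for this example.

```python
# Hypothetical text-segment bounds. The real values would come from the
# kernel's own start/end-of-text symbols, which are not given in this thread.
TEXT_START = 0xC0100000
TEXT_END = 0xC0260000


def scan_stack_for_text_addresses(stack_words, text_start=TEXT_START, text_end=TEXT_END):
    """Return every stack word that lies inside [text_start, text_end)."""
    return [w for w in stack_words if text_start <= w < text_end]


# A made-up stack: two plausible return addresses, one stale leftover
# (0xC021E87E) from an earlier call, and two words that are plain data.
stack = [0x00000008, 0xC012FA31, 0xC2C650C0, 0xC021E87E, 0xC014EFDE]
trace = scan_stack_for_text_addresses(stack)
print([hex(w) for w in trace])  # ['0xc012fa31', '0xc021e87e', '0xc014efde']
```

The stale word is reported alongside the live ones because the scan has no notion of stack frames; it only checks the address range.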

Linus




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Udo A. Steinberg

Alexander Viro wrote:
> 
> 
> See another posting. More or less the same analysis. I don't see
> where it came from and it smells funny - looks like a loss of ->b_count
> _or_ an active page returned by alloc_page() (to grow_buffers()). I
> wouldn't exclude the latter, BTW, but then I'm still not too familiar with
> Rik's changes to VM, so it's just a nodding to the area I don't grok right
> now.
> 
> > Udo, any idea what you are doing differently than anybody else to see
> > this thing? Any special usage patterns that seem to bring on the trouble?
> 
> BTW, sorry for a stupid question, but... was it the first oops? If it was
> an aftermath of something else...

X wasn't executing any commands anymore, so I switched over to a text console
and dmesg only showed this one oops after the bootup stuff. Your guess that
there might be memory corruption somewhere is probably not too far off, because
I occasionally have a broken ld-cache, in which case several programs suddenly
stop working. Running "ldconfig" immediately fixes that problem.

Whether it's VM-related or broken ram I cannot say, although I'd bet that
it's the former.

-Udo.



Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Alexander Viro



On Tue, 17 Oct 2000, Linus Torvalds wrote:

> and the above is a perfectly fine backtrace, makes tons of sense, looks
> good.

Except the strange beast between ext2_create() and ext2_new_inode().

> HOWEVER. What doesn't make any sense at all is that bread() calls getblk()
> to find the buffer, which in turn certainly makes sure that the buffer it
> tries to read is mapped. In fact, there are two paths to the read: one
> finds the buffer off the hash queue, and the other creates it. The one
> that creates the buffer explicitly marks it BH_Mapped, so the only
> apparent source of problems would be the hash queue.
> 
> Except for the fact that the only thing that adds buffers to the hash
> queue is __insert_into_queues(), and the only thing that calls THAT is
> getblk() itself - again after having marked the buffer mapped.
> 
> In short, the debug trace looks fine, but it also looks completely
> incomprehensible. The only thing that would strike me is
>  - memory corruption
>  - somebody calls "unmap_buffer()" in a buffer that is hashed. Which we
>    used to have as a bug, but we definitely don't do that any more.
>  - we have buffer head list corruption going on.

 - we got a page-bound bh on free_list and called block_flushpage() on
that page. But yes, it definitely counts as a buffer head list corruption.

> Now, I don't see any recent code that has touched anything like this,
> which obviously doesn't mean anything at all. It might be a very old bug
> that just hasn't reared its head before now.
> 
> Al, do you see anything wrong?

See another posting. More or less the same analysis. I don't see
where it came from and it smells funny - looks like a loss of ->b_count
_or_ an active page returned by alloc_page() (to grow_buffers()). I
wouldn't exclude the latter, BTW, but then I'm still not too familiar with
Rik's changes to VM, so it's just a nodding to the area I don't grok right
now.

> Udo, any idea what you are doing differently than anybody else to see
> this thing? Any special usage patterns that seem to bring on the trouble?

BTW, sorry for a stupid question, but... was it the first oops? If it was
an aftermath of something else...




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Alexander Viro



On Tue, 17 Oct 2000, Udo A. Steinberg wrote:

> Kernel bug at ll_rw_blk.c: 713!  

unmapped buffer got to the ll_rw_block()

> Trace; c0184d53 <ll_rw_block+163/1e0>
> Trace; c012fa31 <bread+31/70>

What? <thinking> OK, so we got an unmapped bh hashed at some point.
Either it was inserted into hash while it was unmapped or it had been
hashed and then unmapped. The latter could happen only if
block_flushpage() got a bh associated with page _and_ sitting in the hash.
That, in turn, means insertion of page-bound bh into hash at some
earlier point. IOW, __insert_into_queues() had been called on unmapped or
page-bound bh. That returns us to getblk(). OK, unmapped is out of
question and page-bound means that we got a page-bound bh on a freelist.
Very bad. And I don't believe that it's fs-related. Frankly, I don't see
where such thing might happen.

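> Trace; c014efde <read_inode_bitmap+3e/90>
> Trace; c014f240 <load_inode_bitmap+210/230>
> Trace; c014f6af <ext2_new_inode+29f/700>
> Trace; c021e87e <unix_write_space+2e/50>
Huh?
> Trace; c01523af <ext2_create+1f/c0>

The rest of trace is OK, but WTF is net/unix/*.c code doing here?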




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Linus Torvalds


Hi,

On Tue, 17 Oct 2000, Udo A. Steinberg wrote:
> 
> It seems that we were all wrong in assuming that ext2 was fixed
> wrt. filesystem corruption. test10pre3 once again has the potential
> to eat files (not sure about earlier versions).
> 
> I finally managed to capture an oops (by hand), so bear with me that
> I didn't typo anywhere.
> 
> Find attached the decoded oops:
> 
> Kernel bug at ll_rw_blk.c: 713!  

Ok, so it claims that we're doing IO on an unmapped buffer. So far so
good..

> invalid operand: 
> CPU: 0
> EIP: 0010:[<c0184546>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010282
> eax: 001f ebx: 00cc0008 ecx: c4433500 edx: 0007
> esi: c2c650c0 edi: c02fd160 ebp:  esp: cd28dd90
> ds: 0018 es: 0018 ss: 0018
> Process netscape (pid: 6456, stackpage=cd28d000)
> Stack: c0250fc5 c0251262 02c9 c2c650c0 0008 000c 00cc0008
>d00a  cee143c0 c02fd170 c0300ac0 c02fd178 c02fd170
> 0008 00cc0008  c0183f24 00fe c0184b84
>c02fd160  c2c650c0
> 
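> Call Trace: [<c0250fc5>] [<c0251262>] [<c0183f24>] [<c0184b84>]
> [<c0184d53>] [<c012fa31>] [<c014efde>] [<c014f240>]
> [<c014f6af>] [<c021e87e>] [<c01523af>] [<c0138db7>]
> [<c0138f59>] [<c012d6bb>] [<c012d9d8>] [<c010a9d7>]
> Code: 0f 0b 83 c4 0c 90 0f b7 4e 14 66 89 4c 24 16 0f b6 46 15 8b
> 
> >>EIP; c0184546 <__make_request+a6/630>   <=
> Trace; c0183f24 <blk_get_queue+34/50>
> Trace; c0184b84 <generic_make_request+b4/120>
> Trace; c0184d53 <ll_rw_block+163/1e0>
> Trace; c012fa31 <bread+31/70>
> Trace; c014efde <read_inode_bitmap+3e/90>
> Trace; c014f240 <load_inode_bitmap+210/230>
> Trace; c014f6af <ext2_new_inode+29f/700>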

[ etc ]

and the above is a perfectly fine backtrace, makes tons of sense, looks
good.

HOWEVER. What doesn't make any sense at all is that bread() calls getblk()
to find the buffer, which in turn certainly makes sure that the buffer it
tries to read is mapped. In fact, there are two paths to the read: one
finds the buffer off the hash queue, and the other creates it. The one
that creates the buffer explicitly marks it BH_Mapped, so the only
apparent source of problems would be the hash queue.

Except for the fact that the only thing that adds buffers to the hash
queue is __insert_into_queues(), and the only thing that calls THAT is
getblk() itself - again after having marked the buffer mapped.
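
The argument above can be sketched as a toy model. This is an illustration in Python, not the 2.4 buffer cache code: the class names, the dictionary used as a hash queue, and the boolean standing in for BH_Mapped are all simplifications assumed for the example. It shows only why, if getblk() is the sole entry point, bread() should never see an unmapped buffer unless something else tampers with a hashed one.

```python
class BufferHead:
    """Toy stand-in for a buffer head; only the fields the argument needs."""

    def __init__(self, dev, block):
        self.dev = dev
        self.block = block
        self.mapped = False  # models the BH_Mapped state bit


class BufferCache:
    def __init__(self):
        self.hash_queue = {}  # (dev, block) -> BufferHead

    def insert_into_queues(self, bh):
        # The claimed invariant: a buffer is mapped before it is hashed.
        if not bh.mapped:
            raise AssertionError("unmapped buffer inserted into hash queue")
        self.hash_queue[(bh.dev, bh.block)] = bh

    def getblk(self, dev, block):
        bh = self.hash_queue.get((dev, block))
        if bh is not None:
            return bh                    # path 1: found on the hash queue
        bh = BufferHead(dev, block)      # path 2: create it,
        bh.mapped = True                 # mark it mapped,
        self.insert_into_queues(bh)      # and only then hash it
        return bh

    def bread(self, dev, block):
        bh = self.getblk(dev, block)
        if not bh.mapped:
            # Plays the role of the check that fired in the oops.
            raise RuntimeError("I/O requested on an unmapped buffer")
        return bh


cache = BufferCache()
first = cache.bread(8, 12)
second = cache.bread(8, 12)  # second lookup takes the hash-queue path
print(first is second, first.mapped)

# Simulate the suspected corruption: something unmaps a hashed buffer.
first.mapped = False
try:
    cache.bread(8, 12)
    outcome = "no error"
except RuntimeError as exc:
    outcome = str(exc)
print(outcome)
```

In this model the failure can only be reached by modifying a buffer behind getblk()'s back, which is the same conclusion the message draws.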

In short, the debug trace looks fine, but it also looks completely
incomprehensible. The only thing that would strike me is
 - memory corruption
 - somebody calls "unmap_buffer()" in a buffer that is hashed. Which we
   used to have as a bug, but we definitely don't do that any more.
 - we have buffer head list corruption going on.

Now, I don't see any recent code that has touched anything like this,
which obviously doesn't mean anything at all. It might be a very old bug
that just hasn't reared its head before now.

Al, do you see anything wrong?

Udo, any idea what you are doing differently than anybody else to see
this thing? Any special usage patterns that seem to bring on the trouble?

Linus




[BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Udo A. Steinberg


Hi Linus & Alexander

It seems that we were all wrong in assuming that ext2 was fixed
wrt. filesystem corruption. test10pre3 once again has the potential
to eat files (not sure about earlier versions).

I finally managed to capture an oops (by hand), so bear with me that
I didn't typo anywhere.

Find attached the decoded oops:


Kernel bug at ll_rw_blk.c: 713!  


ksymoops 2.3.4 on i686 2.4.0-test10.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0-test10/ (default)
 -m /boot/System.map-2.4.0-test10 (specified)
 
invalid operand: 
CPU: 0
EIP: 0010:[<c0184546>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 001f ebx: 00cc0008 ecx: c4433500 edx: 0007
esi: c2c650c0 edi: c02fd160 ebp:  esp: cd28dd90
ds: 0018 es: 0018 ss: 0018
Process netscape (pid: 6456, stackpage=cd28d000)
Stack: c0250fc5 c0251262 02c9 c2c650c0 0008 000c 00cc0008
   d00a  cee143c0 c02fd170 c0300ac0 c02fd178 c02fd170
    0008 00cc0008  c0183f24 00fe c0184b84
   c02fd160  c2c650c0

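Call Trace: [<c0250fc5>] [<c0251262>] [<c0183f24>] [<c0184b84>]
[<c0184d53>] [<c012fa31>] [<c014efde>] [<c014f240>]
[<c014f6af>] [<c021e87e>] [<c01523af>] [<c0138db7>]
[<c0138f59>] [<c012d6bb>] [<c012d9d8>] [<c010a9d7>]
Code: 0f 0b 83 c4 0c 90 0f b7 4e 14 66 89 4c 24 16 0f b6 46 15 8b

>>EIP; c0184546 <__make_request+a6/630>   <=
Trace; c0250fc5 <tvecs+17b9d/1b898>
Trace; c0251262 <tvecs+17e3a/1b898>
Trace; c0183f24 <blk_get_queue+34/50>
Trace; c0184b84 <generic_make_request+b4/120>
Trace; c0184d53 <ll_rw_block+163/1e0>
Trace; c012fa31 <bread+31/70>
Trace; c014efde <read_inode_bitmap+3e/90>
Trace; c014f240 <load_inode_bitmap+210/230>
Trace; c014f6af <ext2_new_inode+29f/700>
Trace; c021e87e <unix_write_space+2e/50>
Trace; c01523af <ext2_create+1f/c0>
Trace; c0138db7 <vfs_create+a7/e0>
Trace; c0138f59 <open_namei+169/620>
Trace; c012d6bb <filp_open+3b/60>
Trace; c012d9d8 <sys_open+38/c0>
Trace; c010a9d7 <system_call+33/38>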
Code;  c0184546 <__make_request+a6/630>
 <_EIP>:
Code;  c0184546 <__make_request+a6/630>   <=
   0:   0f 0b             ud2a      <=
Code;  c0184548 <__make_request+a8/630>
   2:   83 c4 0c          addl   $0xc,%esp
Code;  c018454b <__make_request+ab/630>
   5:   90                nop
Code;  c018454c <__make_request+ac/630>
   6:   0f b7 4e 14       movzwl 0x14(%esi),%ecx
Code;  c0184550 <__make_request+b0/630>
   a:   66 89 4c 24 16    movw   %cx,0x16(%esp,1)
Code;  c0184555 <__make_request+b5/630>
   f:   0f b6 46 15       movzbl 0x15(%esi),%eax
Code;  c0184559 <__make_request+b9/630>
  13:   8b 00             movl   (%eax),%eax



[BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Udo A. Steinberg


Hi Linus & Alexander

It seems that we were all wrong in assuming that ext2 was fixed
wrt. filesystem corruption. test10pre3 once again has the potential
to eat files (not sure about earlier versions).

I finally managed to capture an oops (by hand), so bear with me that
I didn't typo anywhere.

Find attached the decoded oops:


Kernel bug at ll_rw_blk.c: 713!  


ksymoops 2.3.4 on i686 2.4.0-test10.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0-test10/ (default)
 -m /boot/System.map-2.4.0-test10 (specified)
 
invalid operand: 
CPU: 0
EIP: 0010:[<c0184546>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 001f ebx: 00cc0008 ecx: c4433500 edx: 0007
esi: c2c650c0 edi: c02fd160 ebp:  esp: cd28dd90
ds: 0018 es: 0018 ss: 0018
Process netscape (pid: 6456, stackpage=cd28d000)
Stack: c0250fc5 c0251262 02c9 c2c650c0 0008 000c 00cc0008
   d00a  cee143c0 c02fd170 c0300ac0 c02fd178 c02fd170
    0008 00cc0008  c0183f24 00fe c0184b84
   c02fd160  c2c650c0

Call Trace: [<c0250fc5>] [<c0251262>] [<c0183f24>] [<c0184b84>]
[<c0184d53>] [<c012fa31>] [<c014efde>] [<c014f240>]
[<c014f6af>] [<c021e87e>] [<c01523af>] [<c0138db7>]
[<c0138f59>] [<c012d6bb>] [<c012d9d8>] [<c010a9d7>]
Code: 0f 0b 83 c4 0c 90 0f b7 4e 14 66 89 4c 24 16 0f b6 46 15 8b

>>EIP; c0184546 <__make_request+a6/630>   <=
Trace; c0250fc5 <tvecs+17b9d/1b898>
Trace; c0251262 <tvecs+17e3a/1b898>
Trace; c0183f24 <blk_get_queue+34/50>
Trace; c0184b84 <generic_make_request+b4/120>
Trace; c0184d53 <ll_rw_block+163/1e0>
Trace; c012fa31 <bread+31/70>
Trace; c014efde <read_inode_bitmap+3e/90>
Trace; c014f240 <load_inode_bitmap+210/230>
Trace; c014f6af <ext2_new_inode+29f/700>
Trace; c021e87e <unix_write_space+2e/50>
Trace; c01523af <ext2_create+1f/c0>
Trace; c0138db7 <vfs_create+a7/e0>
Trace; c0138f59 <open_namei+169/620>
Trace; c012d6bb <filp_open+3b/60>
Trace; c012d9d8 <sys_open+38/c0>
Trace; c010a9d7 <system_call+33/38>
Code;  c0184546 <__make_request+a6/630>
 <_EIP>:
Code;  c0184546 <__make_request+a6/630>   <=
   0:   0f 0b             ud2a      <=
Code;  c0184548 <__make_request+a8/630>
   2:   83 c4 0c          addl   $0xc,%esp
Code;  c018454b <__make_request+ab/630>
   5:   90                nop
Code;  c018454c <__make_request+ac/630>
   6:   0f b7 4e 14       movzwl 0x14(%esi),%ecx
Code;  c0184550 <__make_request+b0/630>
   a:   66 89 4c 24 16    movw   %cx,0x16(%esp,1)
Code;  c0184555 <__make_request+b5/630>
   f:   0f b6 46 15       movzbl 0x15(%esi),%eax
Code;  c0184559 <__make_request+b9/630>
  13:   8b 00             movl   (%eax),%eax
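
For readers unfamiliar with the ksymoops notation, each decoded entry above has the form name+offset/size, with all three numbers in hex. The small Python sketch below (the helper name symbol_range is made up for this example) recovers where a function starts and ends from one such entry. It assumes the usual reading of the notation, where offset is the distance from the start of the function and size is its total length.

```python
def symbol_range(addr_hex, ref):
    """Turn an address plus 'name+offset/size' into (name, start, end)."""
    name, rest = ref.rsplit("+", 1)
    off_hex, size_hex = rest.split("/")
    start = int(addr_hex, 16) - int(off_hex, 16)
    return name, start, start + int(size_hex, 16)


# The faulting instruction reported in the oops.
name, start, end = symbol_range("c0184546", "__make_request+a6/630")
print(name, hex(start), hex(end))  # __make_request 0xc01844a0 0xc0184ad0

# A caller further up the same trace.
next_name, next_start, _ = symbol_range("c0184b84", "generic_make_request+b4/120")
print(next_name, hex(next_start))  # generic_make_request 0xc0184ad0
```

Under that reading, generic_make_request begins exactly where __make_request ends, which is a quick consistency check on a hand-copied oops.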



Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Linus Torvalds


Hi,

On Tue, 17 Oct 2000, Udo A. Steinberg wrote:
 
> It seems that we were all wrong in assuming that ext2 was fixed
> wrt. filesystem corruption. test10pre3 once again has the potential
> to eat files (not sure about earlier versions).
> 
> I finally managed to capture an oops (by hand), so bear with me that
> I didn't typo anywhere.
> 
> Find attached the decoded oops:
> 
> Kernel bug at ll_rw_blk.c: 713!

Ok, so it claims that we're doing IO on an unmapped buffer. So far so
good..

> invalid operand: 
> CPU: 0
> EIP: 0010:[<c0184546>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010282
> eax: 001f ebx: 00cc0008 ecx: c4433500 edx: 0007
> esi: c2c650c0 edi: c02fd160 ebp:  esp: cd28dd90
> ds: 0018 es: 0018 ss: 0018
> Process netscape (pid: 6456, stackpage=cd28d000)
> Stack: c0250fc5 c0251262 02c9 c2c650c0 0008 000c 00cc0008
>        d00a  cee143c0 c02fd170 c0300ac0 c02fd178 c02fd170
>        0008 00cc0008  c0183f24 00fe c0184b84
>        c02fd160  c2c650c0
> 
> Call Trace: [<c0250fc5>] [<c0251262>] [<c0183f24>] [<c0184b84>]
> [<c0184d53>] [<c012fa31>] [<c014efde>] [<c014f240>]
> [<c014f6af>] [<c021e87e>] [<c01523af>] [<c0138db7>]
> [<c0138f59>] [<c012d6bb>] [<c012d9d8>] [<c010a9d7>]
> Code: 0f 0b 83 c4 0c 90 0f b7 4e 14 66 89 4c 24 16 0f b6 46 15 8b
> 
> >>EIP; c0184546 <__make_request+a6/630>   <=
> Trace; c0183f24 <blk_get_queue+34/50>
> Trace; c0184b84 <generic_make_request+b4/120>
> Trace; c0184d53 <ll_rw_block+163/1e0>
> Trace; c012fa31 <bread+31/70>
> Trace; c014efde <read_inode_bitmap+3e/90>
> Trace; c014f240 <load_inode_bitmap+210/230>
> Trace; c014f6af <ext2_new_inode+29f/700>

[ etc ]

and the above is a perfectly fine backtrace, makes tons of sense, looks
good.

HOWEVER. What doesn't make any sense at all is that bread() calls getblk()
to find the buffer, which in turn certainly makes sure that the buffer it
tries to read is mapped. In fact, there are two paths to the read: one
finds the buffer off the hash queue, and the other creates it. The one
that creates the buffer explicitly marks it BH_Mapped, so the only
apparent source of problems would be the hash queue.

Except for the fact that the only thing that adds buffers to the hash
queue is __insert_into_queues(), and the only thing that calls THAT is
getblk() itself - again after having marked the buffer mapped.

In short, the debug trace looks fine, but it also looks completely
incomprehensible. The only thing that would strike me is
 - memory corruption
 - somebody calls "unmap_buffer()" in a buffer that is hashed. Which we
   used to have as a bug, but we definitely don't do that any more.
 - we have buffer head list corruption going on.

Now, I don't see any recent code that has touched anything like this,
which obviously doesn't mean anything at all. It might be a very old bug
that just hasn't reared its head before now.

Al, do you see anything wrong?

Udo, any idea what you are doing differently than anybody else to see
this thing? Any special usage patterns that seem to bring on the trouble?

Linus




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Alexander Viro



On Tue, 17 Oct 2000, Udo A. Steinberg wrote:

> Kernel bug at ll_rw_blk.c: 713!

unmapped buffer got to the ll_rw_block()

> Trace; c0184d53 <ll_rw_block+163/1e0>
> Trace; c012fa31 <bread+31/70>

What? <thinking> OK, so we got an unmapped bh hashed at some point.
Either it was inserted into hash while it was unmapped or it had been
hashed and then unmapped. The latter could happen only if
block_flushpage() got a bh associated with page _and_ sitting in the hash.
That, in turn, means insertion of page-bound bh into hash at some
earlier point. IOW, __insert_into_queues() had been called on unmapped or
page-bound bh. That returns us to getblk(). OK, unmapped is out of
question and page-bound means that we got a page-bound bh on a freelist.
Very bad. And I don't believe that it's fs-related. Frankly, I don't see
where such thing might happen.

> Trace; c014efde <read_inode_bitmap+3e/90>
> Trace; c014f240 <load_inode_bitmap+210/230>
> Trace; c014f6af <ext2_new_inode+29f/700>
> Trace; c021e87e <unix_write_space+2e/50>
Huh?
> Trace; c01523af <ext2_create+1f/c0>

The rest of trace is OK, but WTF is net/unix/*.c code doing here?




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Linus Torvalds



On Tue, 17 Oct 2000, Alexander Viro wrote:
 
> > Trace; c014efde <read_inode_bitmap+3e/90>
> > Trace; c014f240 <load_inode_bitmap+210/230>
> > Trace; c014f6af <ext2_new_inode+29f/700>
> > Trace; c021e87e <unix_write_space+2e/50>
> Huh?
> > Trace; c01523af <ext2_create+1f/c0>
> 
> The rest of trace is OK, but WTF is net/unix/*.c code doing here?

The traces always (or almost always) have crud in them - it's not a real
stack-trace, it's just a printout of the stack contents that match
addresses in the text region. So the unix_write_space thing was probably
from the previous system call and just hadn't been overwritten.

Linus




Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Roger Larsson

Linus Torvalds wrote:
 
> On Tue, 17 Oct 2000, Alexander Viro wrote:
> >
> > > Trace; c014efde <read_inode_bitmap+3e/90>
> > > Trace; c014f240 <load_inode_bitmap+210/230>
> > > Trace; c014f6af <ext2_new_inode+29f/700>
> > > Trace; c021e87e <unix_write_space+2e/50>
> > Huh?
> > > Trace; c01523af <ext2_create+1f/c0>
> >
> > The rest of trace is OK, but WTF is net/unix/*.c code doing here?
> 
> The traces always (or almost always) have crud in them - it's not a real
> stack-trace, it's just a printout of the stack contents that match
> addresses in the text region. So the unix_write_space thing was probably
> from the previous system call and just hadn't been overwritten.
> 
> Linus
 


Hmm..
Might this problem be related to...

Tigran's:
> Subject: test10-pre1 BUG at page_alloc.c:221!

Quintela's:
> Subject: I've got the BAD_RANGE BUG in rmqueue!!! (Pre9-4)

Richard Guenther's:
> Subject: [OOPS][BUG] with 2.4.0-test9


