Re: XFS or Kernel Problem / Bug

2007-01-30 Thread Stefan Priebe - FH

Hi!

Any news?

Stefan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS or Kernel Problem / Bug

2007-01-25 Thread Stefan Priebe - FH

Hi!

OK - I rechecked everything. We have 22 servers with the DFI PM-12 
mainboard with VIA chipset.


But only the 5 oldest of them (from before 2004-01-20; we bought all of 
them within a span of 10 months) have this problem.


So I think it is a mixture of a software and a hardware problem. Perhaps 
DFI changed something on the mainboard (e.g. a new revision), or there was 
a new BIOS version on it.


But something must also have changed in the kernel.

> OK, can you post configs for one that works and one that doesn't?
You mean the kernel .configs?

> And which C compiler(s) do you use? The same for all, I hope...
On all 32-bit machines:

gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--enable-__cxa_atexit --with-system-zlib --enable-nls 
--without-included-gettext --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux

Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)

Stefan




Re: XFS or Kernel Problem / Bug

2007-01-25 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> > What is different about these servers?
> All 300 machines are mostly different. We have Dual Opteron, single P4
> with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many
> more... different mainboards etc.
>
> The only thing I found out is that all these servers (where the
> problem exists) are using a DFI PM-12 mainboard with a VIA chipset.
Any others with VIA chipsets?
>
> > Are you building different kernels for them, or is it just different
> > drivers loaded?
> No, every machine builds its own kernel.
>
OK, can you post configs for one that works and one that doesn't?

And which C compiler(s) do you use? The same for all, I hope...
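
Once both configs are posted, comparing them by hand is tedious. The following is only a rough sketch of how the differences could be listed, assuming the usual .config layout ("CONFIG_X=value" lines and "# CONFIG_X is not set" comments). The function names and the sample option values are made up for illustration and do not come from the machines in this thread.

```python
import re

# Lines that set an option, e.g. "CONFIG_SMP=y" or "CONFIG_HZ=250".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# Lines that record a disabled option, e.g. "# CONFIG_SMP is not set".
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return a dict mapping option name to value ("n" for unset options)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(good_text, bad_text):
    """Return {option: (value_in_good, value_in_bad)} for differing options.

    An option missing from one file is reported with None on that side.
    """
    good, bad = parse_config(good_text), parse_config(bad_text)
    return {
        name: (good.get(name), bad.get(name))
        for name in sorted(set(good) | set(bad))
        if good.get(name) != bad.get(name)
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real servers.
    working = "CONFIG_SMP=y\n# CONFIG_HIGHMEM4G is not set\nCONFIG_XFS_FS=y\n"
    failing = "CONFIG_SMP=y\nCONFIG_HIGHMEM4G=y\nCONFIG_XFS_FS=m\n"
    for name, (a, b) in diff_configs(working, failing).items():
        print(f"{name}: working={a} failing={b}")
```

In practice the two texts would be read from the .config of a working machine and of a failing one. Options that differ only because of unrelated hardware will show up too, so the list is a starting point rather than an answer.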



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi Chuck,
   hi Eric,

since you both asked me nearly the same thing, I will answer you both in one mail.


> What is different about these servers?
All 300 machines are mostly different. We have Dual Opteron, single P4 
with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many more... 
different mainboards etc.


The only thing I found out is that all these servers (where the problem 
exists) are using a DFI PM-12 mainboard with a VIA chipset.


> Are you building different kernels for them, or is it just different
> drivers loaded?
No, every machine builds its own kernel.

Stefan



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> Hi!
>
> Hmm, are you sure? I mean, I have this problem on 5 servers, all with
> the same mainboard. I cannot believe that all 5 servers have a
> hardware problem that starts on the same day.
>
> The other thing is that they all work fine with 2.6.16.x and all
> earlier kernels. Some of them have been running 2.6.x for
> two years without any problem...
>
OK it's probably not hardware, but a bit is flipped somehow.

What is different about these servers?

Are you building different kernels for them, or is it just different
drivers loaded?




Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Hmm, are you sure? I mean, I have this problem on 5 servers, all with 
the same mainboard. I cannot believe that all 5 servers have a hardware 
problem that starts on the same day.


The other thing is that they all work fine with 2.6.16.x and all earlier 
kernels. Some of them have been running 2.6.x for two years 
without any problem...


Stefan




Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> Sorry, that is not possible because it is a production machine.
>
> But I've captured the error and the files from another machine;
> perhaps this helps.
>
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 0288
>  printing eip:
> c0142ff7
> *pde = 
> Oops:  [#1]
> SMP 
> Modules linked in: iptable_filter ip_tables x_tables
> CPU:0
> EIP:0060:[c0142ff7]Not tainted VLI
> EFLAGS: 00010246   (2.6.18.6 #1) 
> EIP is at generic_file_buffered_write+0x390/0x6cf
> eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
> esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
> ds: 007b   es: 007b   ss: 0068
> Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
> Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
> 0010 
>    080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
>  
>     01ec dd04beac 00d420b1   dd04bd80
> 45b1fa67 
> Call Trace:
>  [c036d793] sock_def_readable+0x7f/0x81
>  [c017a03a] file_update_time+0xad/0xcb
>  [c0232015] xfs_iunlock+0x55/0x9f
>  [c0262eeb] xfs_write+0xa74/0xc61
>  [c036a253] sock_aio_read+0x95/0x99
>  [c025d9fb] xfs_file_aio_write+0x8f/0xa0
>  [c015fb94] do_sync_write+0xc9/0x10f
>  [c0133ad6] autoremove_wake_function+0x0/0x57
>  [c015f3d5] generic_file_llseek+0x95/0xbc
>  [c015facb] do_sync_write+0x0/0x10f
>  [c015fc80] vfs_write+0xa6/0x179
>  [c015fe24] sys_write+0x51/0x80
>  [c0102d3f] syscall_call+0x7/0xb
> Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db
> 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b>
> 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 
> EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP
> 0068:dd04bd18
>
> Files:
> http://server113-han.de-nserver.de/filemap.s
> http://server113-han.de-nserver.de/filemap.o
>
You seem to have some kind of hardware/memory problem.

Disassembly of the failing instruction from the oops:

 8b 7c 24 28          mov    0x28(%esp),%edi
 8b 85 9c 00 00 00    mov    0x9c(%ebp),%eax   <=

Dump of the object code:

   8b 7c 24 28          mov    0x28(%esp),%edi
   8b 87 9c 00 00 00    mov    0x9c(%edi),%eax

Looks like a bit is flipped.
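
To make the comparison concrete, here is a small sketch that decodes the second byte of each instruction (the ModRM byte, 0x85 in the oops versus 0x87 in the object file) and lists which bits differ. It also checks the fault address against the register dump, on the assumption that the archived dump lost its leading zeros, so that "ebp: 01ec" stands for 0x000001ec and "address 0288" for 0x00000288. The helper names are invented for this illustration.

```python
# x86 32-bit register numbering used by the ModRM "reg" and "rm" fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte):
    """Split a ModRM byte into (mod, reg name, rm name).

    Note: with mod != 3, rm == 4 means "SIB byte follows" rather than esp;
    that case does not occur in the two bytes examined here.
    """
    mod = byte >> 6
    reg = (byte >> 3) & 7
    rm = byte & 7
    return mod, REG32[reg], REG32[rm]


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(8) if (diff >> i) & 1]


OOPS_MODRM = 0x85    # from the oops:  8b 85 9c 00 00 00
OBJECT_MODRM = 0x87  # from filemap.o: 8b 87 9c 00 00 00

print(decode_modrm(OOPS_MODRM))
print(decode_modrm(OBJECT_MODRM))
print(flipped_bits(OOPS_MODRM, OBJECT_MODRM))

# Assumption: the register dump lost its leading zeros, so ebp = 0x000001ec.
EBP = 0x1EC
print(hex(EBP + 0x9C))
```

The two bytes differ only in bit 1, which changes the base register from edi to ebp. With ebp holding 0x1ec, the load targets 0x1ec + 0x9c = 0x288, which is consistent with the address reported in the oops.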



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Sorry, that is not possible because it is a production machine.

But I've captured the error and the files from another machine; perhaps 
this helps.


BUG: unable to handle kernel NULL pointer dereference at virtual address 0288

 printing eip:
c0142ff7
*pde = 
Oops:  [#1]
SMP 
Modules linked in: iptable_filter ip_tables x_tables
CPU:0
EIP:0060:[c0142ff7]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at generic_file_buffered_write+0x390/0x6cf
eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
ds: 007b   es: 007b   ss: 0068
Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0 
0010 
   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc 
 
    01ec dd04beac 00d420b1   dd04bd80 
45b1fa67 

Call Trace:
 [c036d793] sock_def_readable+0x7f/0x81
 [c017a03a] file_update_time+0xad/0xcb
 [c0232015] xfs_iunlock+0x55/0x9f
 [c0262eeb] xfs_write+0xa74/0xc61
 [c036a253] sock_aio_read+0x95/0x99
 [c025d9fb] xfs_file_aio_write+0x8f/0xa0
 [c015fb94] do_sync_write+0xc9/0x10f
 [c0133ad6] autoremove_wake_function+0x0/0x57
 [c015f3d5] generic_file_llseek+0x95/0xbc
 [c015facb] do_sync_write+0x0/0x10f
 [c015fc80] vfs_write+0xa6/0x179
 [c015fe24] sys_write+0x51/0x80
 [c0102d3f] syscall_call+0x7/0xb
Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85 
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 
EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP 0068:dd04bd18


Files:
http://server113-han.de-nserver.de/filemap.s
http://server113-han.de-nserver.de/filemap.o

Stefan



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> It could be that the options are now different, because my first try
> was to change the kernel options; when that did not help, I switched
> back to 2.6.16.37.
>
> Any idea what I can do?
>
> Chuck Ebbert wrote:
>> That doesn't match your oops at all.  Did you use a different compiler
>> and/or different kernel build options?
>>
>>
>
If you don't know what changed you can try different options until the
filemap.s is the same.  You should see

movl   156(%ebp),%eax
testb   $16, 48(%eax)


in generic_file_buffered_write.  And you need to regenerate filemap.s
manually each time.


(Did you test the kernel that you posted these pieces from? If you can
get it to oops the same way, just post that instead.)
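
Checking each regenerated filemap.s by eye gets old quickly. Below is a rough sketch of how the check could be automated. It assumes GCC-style assembly output in which the function starts at a "generic_file_buffered_write:" label and ends at its ".size" directive, and it treats the immediate as "$16" in AT&T syntax. The numbers line up with the oops code bytes: 156 = 0x9c, 48 = 0x30, 16 = 0x10 (9c 00 00 00 f6 40 30 10). The helper names are hypothetical.

```python
import re

# The instruction pair that should appear back to back in the function.
EXPECTED = ["movl 156(%ebp),%eax", "testb $16, 48(%eax)"]


def normalize(line):
    """Remove all whitespace so spacing and tab differences do not matter."""
    return re.sub(r"\s+", "", line)


def function_body(asm_text, name):
    """Return the lines between the 'name:' label and its '.size' directive."""
    body, inside = [], False
    for raw in asm_text.splitlines():
        line = raw.strip()
        if line == name + ":":
            inside = True
            continue
        if inside:
            if line.startswith(".size") and name in line:
                break
            body.append(line)
    return body


def has_sequence(asm_text, name, expected=EXPECTED):
    """True if the expected instructions occur consecutively inside 'name'."""
    body = [normalize(line) for line in function_body(asm_text, name)]
    want = [normalize(line) for line in expected]
    return any(
        body[i:i + len(want)] == want
        for i in range(len(body) - len(want) + 1)
    )


if __name__ == "__main__":
    # Tiny made-up listing standing in for a real filemap.s.
    sample = (
        "generic_file_buffered_write:\n"
        "\tmovl\t156(%ebp), %eax\n"
        "\ttestb\t$16, 48(%eax)\n"
        "\t.size\tgeneric_file_buffered_write, .-generic_file_buffered_write\n"
    )
    print(has_sequence(sample, "generic_file_buffered_write"))
```

For a real run the text would come from the freshly generated mm/filemap.s. A build whose compiler picked a different base register (for example %edi) will not match, which is exactly the mismatch being hunted here.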



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

It could be that the options are now different, because my first try 
was to change the kernel options; when that did not help, I switched back 
to 2.6.16.37.


Any idea what I can do?

Stefan



Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> Hi!
>
> I'll do whatever you like :-) if we can find the bug.
>
> So here are the files (2.6.18.6):
> http://server055.de-nserver.de/filemap.o
> http://server055.de-nserver.de/filemap.s
>
>>
>> If you can, post the file mm/filemap.o from your build directory to some
>> website.
>> And do 'make mm/filemap.s' and post that file too.
>>
>
That doesn't match your oops at all.  Did you use a different compiler
and/or different kernel build options?




Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Sorry that is not possible - cause it is a production machine.

But i've catched the error and the files from another machine - perhaps 
this helps.


BUG: unable to handle kernel NULL pointer dereference at virtual 
address 0288

 printing eip:
c0142ff7
*pde = 
Oops:  [#1]
SMP 
Modules linked in: iptable_filter ip_tables x_tables
CPU:0
EIP:0060:[c0142ff7]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at generic_file_buffered_write+0x390/0x6cf
eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
ds: 007b   es: 007b   ss: 0068
Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0 
0010 
   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc 
 
    01ec dd04beac 00d420b1   dd04bd80 
45b1fa67 

Call Trace:
 [c036d793] sock_def_readable+0x7f/0x81
 [c017a03a] file_update_time+0xad/0xcb
 [c0232015] xfs_iunlock+0x55/0x9f
 [c0262eeb] xfs_write+0xa74/0xc61
 [c036a253] sock_aio_read+0x95/0x99
 [c025d9fb] xfs_file_aio_write+0x8f/0xa0
 [c015fb94] do_sync_write+0xc9/0x10f
 [c0133ad6] autoremove_wake_function+0x0/0x57
 [c015f3d5] generic_file_llseek+0x95/0xbc
 [c015facb] do_sync_write+0x0/0x10f
 [c015fc80] vfs_write+0xa6/0x179
 [c015fe24] sys_write+0x51/0x80
 [c0102d3f] syscall_call+0x7/0xb
Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 8b 85 
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 
EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP 
0068:dd04bd18


Files:
http://server113-han.de-nserver.de/filemap.s
http://server113-han.de-nserver.de/filemap.o

Stefan

Chuck Ebbert schrieb:

Stefan Priebe - FH wrote:

It could be, that the options are now different - cause i my first try
was to change the kernel options - if that did not help i switched
back to 2.6.16.37.

Any idea what i can do?

Chuck Ebbert schrieb:

That doesn't match your oops at all.  Did you use a different compiler
and/or
different kernel build options?



If you don't know what changed you can try different options until the
filemap.s
is the same.  You should see

movl   156(%ebp),%eax
testb   16, 48(%eax)


in generic_file_buffered_write.  And you need to regenerate filemap.s
manually
each time.


(Did you test the kernel that you posted these pieces from? If you can
get it to oops
the same way, just post that instead.)



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
 Sorry that is not possible - cause it is a production machine.

 But i've catched the error and the files from another machine -
 perhaps this helps.

 BUG: unable to handle kernel NULL pointer dereference at virtual
 address 0288
  printing eip:
 c0142ff7
 *pde = 
 Oops:  [#1]
 SMP 
 Modules linked in: iptable_filter ip_tables x_tables
 CPU:0
 EIP:0060:[c0142ff7]Not tainted VLI
 EFLAGS: 00010246   (2.6.18.6 #1) 
 EIP is at generic_file_buffered_write+0x390/0x6cf
 eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
 esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
 ds: 007b   es: 007b   ss: 0068
 Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
 Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
 0010 
080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
  
 01ec dd04beac 00d420b1   dd04bd80
 45b1fa67 
 Call Trace:
  [c036d793] sock_def_readable+0x7f/0x81
  [c017a03a] file_update_time+0xad/0xcb
  [c0232015] xfs_iunlock+0x55/0x9f
  [c0262eeb] xfs_write+0xa74/0xc61
  [c036a253] sock_aio_read+0x95/0x99
  [c025d9fb] xfs_file_aio_write+0x8f/0xa0
  [c015fb94] do_sync_write+0xc9/0x10f
  [c0133ad6] autoremove_wake_function+0x0/0x57
  [c015f3d5] generic_file_llseek+0x95/0xbc
  [c015facb] do_sync_write+0x0/0x10f
  [c015fc80] vfs_write+0xa6/0x179
  [c015fe24] sys_write+0x51/0x80
  [c0102d3f] syscall_call+0x7/0xb
 Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db
 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 8b
 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 
 EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP
 0068:dd04bd18

 Files:
 http://server113-han.de-nserver.de/filemap.s
 http://server113-han.de-nserver.de/filemap.o

You seem to have some kind of hardware/memory problem.

Disassembly of the failing instruction from the oops:

 8b 7c 24 28   mov0x28(%esp),%edi
 8b 85 9c 00 00 00 mov0x9c(%ebp),%eax   =

Dump of the object code:

   8b 7c 24 28 mov0x28(%esp),%edi
   8b 87 9c 00 00 00   mov0x9c(%edi),%eax

Looks like a bit is flipped.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Hmm, are you sure? I have this problem on 5 servers, all with 
the same mainboard. I cannot believe that all 5 servers developed a 
hardware problem on the same day.


The other thing is that they all work fine with 2.6.16.x and all 
earlier kernels. Some of them have been running 2.6.x for two years 
without any problem...


Stefan


Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


Sorry, that is not possible because it is a production machine.

But I've captured the error and the files from another machine;
perhaps this helps.

BUG: unable to handle kernel NULL pointer dereference at virtual
address 0288
 printing eip:
c0142ff7
*pde = 
Oops:  [#1]
SMP 
Modules linked in: iptable_filter ip_tables x_tables
CPU:0
EIP:0060:[c0142ff7]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at generic_file_buffered_write+0x390/0x6cf
eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
ds: 007b   es: 007b   ss: 0068
Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
0010 
   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
 
    01ec dd04beac 00d420b1   dd04bd80
45b1fa67 
Call Trace:
 [c036d793] sock_def_readable+0x7f/0x81
 [c017a03a] file_update_time+0xad/0xcb
 [c0232015] xfs_iunlock+0x55/0x9f
 [c0262eeb] xfs_write+0xa74/0xc61
 [c036a253] sock_aio_read+0x95/0x99
 [c025d9fb] xfs_file_aio_write+0x8f/0xa0
 [c015fb94] do_sync_write+0xc9/0x10f
 [c0133ad6] autoremove_wake_function+0x0/0x57
 [c015f3d5] generic_file_llseek+0x95/0xbc
 [c015facb] do_sync_write+0x0/0x10f
 [c015fc80] vfs_write+0xa6/0x179
 [c015fe24] sys_write+0x51/0x80
 [c0102d3f] syscall_call+0x7/0xb
Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db
0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 8b
85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 
EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP
0068:dd04bd18

Files:
http://server113-han.de-nserver.de/filemap.s
http://server113-han.de-nserver.de/filemap.o



You seem to have some kind of hardware/memory problem.

Disassembly of the failing instruction from the oops:

 8b 7c 24 28          mov    0x28(%esp),%edi
 8b 85 9c 00 00 00    mov    0x9c(%ebp),%eax   <=

Dump of the object code:

 8b 7c 24 28          mov    0x28(%esp),%edi
 8b 87 9c 00 00 00    mov    0x9c(%edi),%eax

Looks like a bit is flipped.





Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> Hi!
>
> Hmm, are you sure? I have this problem on 5 servers, all with
> the same mainboard. I cannot believe that all 5 servers developed a
> hardware problem on the same day.
>
> The other thing is that they all work fine with 2.6.16.x and all
> earlier kernels. Some of them have been running 2.6.x for
> two years without any problem...

OK it's probably not hardware, but a bit is flipped somehow.

What is different about these servers?

Are you building different kernels for them, or is it just different
drivers loaded?




Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi Chuck,
   hi Eric,

since you both asked me nearly the same thing, I will answer you both in one mail.


> What is different about these servers?
Most of the 300 machines differ from one another. We have Dual Opteron, 
single P4 with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many 
more, with different mainboards etc.


The only thing I found out is that all the servers where the problem 
exists are using a DFI PM-12 mainboard with a VIA chipset.


> Are you building different kernels for them, or is it just different
> drivers loaded?
No, every machine builds its own kernel.

Stefan

Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


Hi!

Hmm, are you sure? I have this problem on 5 servers, all with
the same mainboard. I cannot believe that all 5 servers developed a
hardware problem on the same day.

The other thing is that they all work fine with 2.6.16.x and all
earlier kernels. Some of them have been running 2.6.x for
two years without any problem...



OK it's probably not hardware, but a bit is flipped somehow.

What is different about these servers?

Are you building different kernels for them, or is it just different
drivers loaded?






Re: XFS or Kernel Problem / Bug

2007-01-23 Thread Stefan Priebe - FH

Hi!

I'll do whatever you like :-) if it helps us find the bug.

So here are the files (2.6.18.6):
http://server055.de-nserver.de/filemap.o
http://server055.de-nserver.de/filemap.s

Stefan


Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


I have 3 servers which work wonderfully with 2.6.16.X (also tested the
latest 2.6.16.37)

but with 2.6.18.6 I get these errors:

"general protection fault:  [#1]"
"Modules linked in:"
"CPU:0"
"EIP:0060:[<c01c8fd2>] Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
"eax:    ebx: fffe0007   ecx: 0071a4cd   edx: "
"esi:    edi:    ebp: 0015   esp: ce35f8f0"
"ds:    es: 007b   ss: 0068"
"Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
"Stack: 0232  0233    000c
 "
"   0007  eca90250 eca90278 0001 eca90200 
03c3 "
"    010003c3 ffc0 ce35fa58 ce35fa58 0001 
 "
"Call Trace:"
" [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
" [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
" [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
" [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
" [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
" [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
" [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
" [<c020a71a>] xfs_bmap+0x1a/0x1e"
" [<c020471e>] __xfs_get_blocks+0x5d/0x195"


Without the "Code:" line it's hard to tell what happened...



and sometimes this one:

"BUG: unable to handle kernel NULL pointer dereference at virtual
address 0288"
" printing eip:"
"c0142ff7"
"*pde = "
"Oops:  [#1]"
"SMP "
"Modules linked in: iptable_filter ip_tables x_tables"
"CPU:0"
"EIP:0060:[<c0142ff7>] Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at generic_file_buffered_write+0x390/0x6cf"
"eax:    ebx: 01ec   ecx: ea029a40   edx: 8002"
"esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18"
"ds: 007b   es: 007b   ss: 0068"
"Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
"Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
0010 "
"   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
 "
"    01ec dd04beac 00d420b1   dd04bd80
45b1fa67 "
"Call Trace:"
" [<c036d793>] sock_def_readable+0x7f/0x81"
" [<c017a03a>] file_update_time+0xad/0xcb"
" [<c0232015>] xfs_iunlock+0x55/0x9f"
" [<c0262eeb>] xfs_write+0xa74/0xc61"
" [<c036a253>] sock_aio_read+0x95/0x99"
" [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
" [<c015fb94>] do_sync_write+0xc9/0x10f"
" [<c0133ad6>] autoremove_wake_function+0x0/0x57"
" [<c015f3d5>] generic_file_llseek+0x95/0xbc"
" [<c015facb>] do_sync_write+0x0/0x10f"
" [<c015fc80>] vfs_write+0xa6/0x179"
" [<c015fe24>] sys_write+0x51/0x80"
" [<c0102d3f>] syscall_call+0x7/0xb"

"Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "

"EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
0068:dd04bd18"



Well that's strange. It's here in mm/filemap.c line 2201:

    /*
     * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
     */
    if (likely(status >= 0)) {
        if (unlikely((file->f_flags & O_SYNC) ||
                     IS_SYNC(inode))) {   <===
            if (!a_ops->writepage || !is_sync_kiocb(iocb))
                status = generic_osync_inode(inode, mapping,
                                OSYNC_METADATA|OSYNC_DATA);
        }
    }

ebp holds the value of 'inode' and it's obviously wrong (it's also the same
as 'written', which is in ebx.) So when it tries to read inode->i_sb, it
dies.

If you can, post the file mm/filemap.o from your build directory to some
website.
And do 'make mm/filemap.s' and post that file too.





Re: XFS or Kernel Problem / Bug

2007-01-23 Thread Chuck Ebbert
Stefan Priebe - FH wrote:
> I have 3 servers which work wonderfully with 2.6.16.X (also tested the
> latest 2.6.16.37)
>
> but with 2.6.18.6 I get these errors:
>
> "general protection fault:  [#1]"
> "Modules linked in:"
> "CPU:0"
> "EIP:0060:[<c01c8fd2>] Not tainted VLI"
> "EFLAGS: 00010246   (2.6.18.6 #1) "
> "EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
> "eax:    ebx: fffe0007   ecx: 0071a4cd   edx: "
> "esi:    edi:    ebp: 0015   esp: ce35f8f0"
> "ds:    es: 007b   ss: 0068"
> "Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
> "Stack: 0232  0233    000c
>  "
> "   0007  eca90250 eca90278 0001 eca90200 
> 03c3 "
> "    010003c3 ffc0 ce35fa58 ce35fa58 0001 
>  "
> "Call Trace:"
> " [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
> " [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
> " [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
> " [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
> " [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
> " [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
> " [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
> " [<c020a71a>] xfs_bmap+0x1a/0x1e"
> " [<c020471e>] __xfs_get_blocks+0x5d/0x195"
Without the "Code:" line it's hard to tell what happened...
>
>
> and sometimes this one:
>
> "BUG: unable to handle kernel NULL pointer dereference at virtual
> address 0288"
> " printing eip:"
> "c0142ff7"
> "*pde = "
> "Oops:  [#1]"
> "SMP "
> "Modules linked in: iptable_filter ip_tables x_tables"
> "CPU:0"
> "EIP:0060:[<c0142ff7>] Not tainted VLI"
> "EFLAGS: 00010246   (2.6.18.6 #1) "
> "EIP is at generic_file_buffered_write+0x390/0x6cf"
> "eax:    ebx: 01ec   ecx: ea029a40   edx: 8002"
> "esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18"
> "ds: 007b   es: 007b   ss: 0068"
> "Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
> "Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
> 0010 "
> "   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
>  "
> "    01ec dd04beac 00d420b1   dd04bd80
> 45b1fa67 "
> "Call Trace:"
> " [<c036d793>] sock_def_readable+0x7f/0x81"
> " [<c017a03a>] file_update_time+0xad/0xcb"
> " [<c0232015>] xfs_iunlock+0x55/0x9f"
> " [<c0262eeb>] xfs_write+0xa74/0xc61"
> " [<c036a253>] sock_aio_read+0x95/0x99"
> " [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
> " [<c015fb94>] do_sync_write+0xc9/0x10f"
> " [<c0133ad6>] autoremove_wake_function+0x0/0x57"
> " [<c015f3d5>] generic_file_llseek+0x95/0xbc"
> " [<c015facb>] do_sync_write+0x0/0x10f"
> " [<c015fc80>] vfs_write+0xa6/0x179"
> " [<c015fe24>] sys_write+0x51/0x80"
> " [<c0102d3f>] syscall_call+0x7/0xb"
>
> "Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
> 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
> 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
>
> "EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
> 0068:dd04bd18"
>
Well that's strange. It's here in mm/filemap.c line 2201:

    /*
     * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
     */
    if (likely(status >= 0)) {
        if (unlikely((file->f_flags & O_SYNC) ||
                     IS_SYNC(inode))) {   <===
            if (!a_ops->writepage || !is_sync_kiocb(iocb))
                status = generic_osync_inode(inode, mapping,
                                OSYNC_METADATA|OSYNC_DATA);
        }
    }

ebp holds the value of 'inode' and it's obviously wrong (it's also the same
as 'written', which is in ebx.) So when it tries to read inode->i_sb, it
dies.
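
The numbers in the oops are consistent with that reading. As a rough check (a sketch only: it takes the register values exactly as printed in the trace above, which appear to have lost their leading zeros in this archive, and decodes the bytes starting at the marked `<8b>` using the standard x86 ModRM layout), the faulting instruction is a load through %ebp, and adding its displacement to ebp gives the reported fault address:

```python
import struct

# Bytes from the "Code:" line, starting at the byte marked <8b> (the faulting EIP).
insn = bytes.fromhex("8b 85 9c 00 00 00")

opcode, modrm = insn[0], insn[1]
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

# Standard 32-bit general-purpose register numbering.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

assert opcode == 0x8B  # MOV r32, r/m32
assert mod == 0b10     # memory operand: base register + 32-bit displacement

# The displacement is stored little-endian in the four bytes after ModRM.
(disp,) = struct.unpack("<i", insn[2:6])

ebp = 0x01EC  # "ebp: 01ec" as printed in the oops
ebx = 0x01EC  # "ebx: 01ec", which the message identifies as 'written'

fault_addr = ebp + disp
print(f"mov {disp:#x}(%{REGS[rm]}),%{REGS[reg]} reads address {fault_addr:#x}")
```

With ebp holding 0x1ec instead of a valid inode pointer, the load at offset 0x9c touches 0x288, matching the "virtual address 0288" in the BUG line.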

If you can, post the file mm/filemap.o from your build directory to some
website.
And do 'make mm/filemap.s' and post that file too.



Re: XFS or Kernel Problem / Bug

2007-01-23 Thread Stefan Priebe - FH

Hi!

I can give you an idea of the workload :-) I have the same problem on a 
nearly idle server. It runs only a few cron jobs (the normal Debian system 
crons).


The load on this system was never higher than 0.01 over the last 3 days, 
and this morning it crashed with the same error.


I have not tested 2.6.19.x because it has some problems with the SATA AHCI 
driver, which we need. But I can manually update just this system to 
2.6.19.x and wait a few days.


There were no other messages in the log.

Cheers,
   Stefan

David Chinner wrote:

On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:

Hi!

The update of the IDE layer was in 2.6.19. I don't think it is a 
hardware bug, because all 5 of these machines have run fine for a few 
years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last 
week and all machines began to crash periodically. On Friday last week 
we downgraded them all to 2.6.16.37 and all 5 machines run fine again. 
So I don't believe it is a hardware problem. Do you really think that 
could be?


I was thinking more of a driver change that is being triggered on
that particular hardware. FWIW, did you test 2.6.19?

I really need a better idea of the workload these servers are running
and, ideally, a reproducible test case to track something like
this down. At the moment I have no idea what is going on and no
real information on which to even base a guess.

Were there any other messages in the log?

On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:

Hi!

I have another idea... could it be a barrier problem? Barriers have been 
enabled by default from 2.6.17 on...


You could try turning it off. If it does fix the problem, then I'd be
pointing once again at hardware ;)

Cheers,

Dave.




Re: XFS or Kernel Problem / Bug

2007-01-22 Thread David Chinner
On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:
> Hi!
> 
> The update of the IDE layer was in 2.6.19. I don't think it is a 
> hardware bug, because all 5 of these machines have run fine for a few 
> years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last 
> week and all machines began to crash periodically. On Friday last week 
> we downgraded them all to 2.6.16.37 and all 5 machines run fine again. 
> So I don't believe it is a hardware problem. Do you really think that 
> could be?

I was thinking more of a driver change that is being triggered on
that particular hardware. FWIW, did you test 2.6.19?

I really need a better idea of the workload these servers are running
and, ideally, a reproducible test case to track something like
this down. At the moment I have no idea what is going on and no
real information on which to even base a guess.

Were there any other messages in the log?

On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:
> Hi!
> 
> I have another idea... could it be a barrier problem? Barriers have been 
> enabled by default from 2.6.17 on...

You could try turning it off. If it does fix the problem, then I'd be
pointing once again at hardware ;)
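
For anyone wanting to try this, here is a minimal sketch, assuming the XFS `nobarrier` mount option described in the kernel's XFS documentation of that era. The device and mount point are hypothetical placeholders, and the helper (a made-up name) only builds the command line; it does not run it.

```python
def nobarrier_mount_cmd(device, mountpoint):
    """Build (but do not run) a mount command that disables XFS write barriers."""
    return ["mount", "-t", "xfs", "-o", "nobarrier", device, mountpoint]


# Hypothetical device and mount point, for illustration only.
cmd = nobarrier_mount_cmd("/dev/sda3", "/srv")
print(" ".join(cmd))  # mount -t xfs -o nobarrier /dev/sda3 /srv
```

The filesystem would have to be unmounted first for this form of the command; the equivalent persistent setting would be adding `nobarrier` to the options column of the matching /etc/fstab entry.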

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

I have another idea... could it be a barrier problem? Barriers have been 
enabled by default from 2.6.17 on...


Stefan

David Chinner wrote:

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment, and 5 of them are "old" Intel 
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
happens on THESE machines.


Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.





Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

The update of the IDE layer was in 2.6.19. I don't think it is a 
hardware bug, because all 5 of these machines have run fine for a few 
years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last 
week and all machines began to crash periodically. On Friday last week 
we downgraded them all to 2.6.16.37 and all 5 machines run fine again. 
So I don't believe it is a hardware problem. Do you really think that 
could be?


Stefan

David Chinner wrote:

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment, and 5 of them are "old" Intel 
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
happens on THESE machines.


Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.




Re: XFS or Kernel Problem / Bug

2007-01-22 Thread David Chinner
On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
> Hi!
> 
> I'm not sure, but perhaps it isn't an XFS bug.
> 
> Here is what I found out:
> 
> We have about 300 servers at the moment, and 5 of them are "old" Intel 
> Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
> happens on THESE machines.

Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

I've another idea... could it be, that it is a barrier problem? Since 
barriers are enabled by default from 2.6.17 on ...


Stefan

David Chinner wrote:

> On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
>
> > Hi!
> >
> > I'm not sure, but perhaps it isn't an XFS bug.
> >
> > Here is what I found out:
> >
> > We have about 300 servers at the moment, and 5 of them are old Intel
> > Pentium 4 machines with a DFI PM-12 mainboard with VIA chipset. It only
> > happens on THESE machines.
>
> Hmmm - that points more to a hardware problem than a software problem;
> crashes in generic_file_buffered_write() are relatively uncommon, and
> to have them all isolated to a specific type of hardware is suspicious.
>
> Wasn't there a major update of the IDE layer in 2.6.18? Or was that
> 2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
> boxes to rule out dodgy memory?
>
> Cheers,
>
> Dave.





Re: XFS or Kernel Problem / Bug

2007-01-22 Thread David Chinner
On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:
> Hi!
> 
> The update of the IDE layer was in 2.6.19. I don't think it is a 
> hardware bug, because all 5 of these machines have run fine for a few 
> years with 2.6.16.X and before. We switched to 2.6.18.6 on Monday last 
> week and all machines began to crash periodically. On Friday last week 
> we downgraded them all to 2.6.16.37 and all 5 machines run fine again. 
> So I don't believe it is a hardware problem. Do you really think it 
> could be?

I was thinking more of a driver change that is being triggered on
that particular hardware. FWIW, did you test 2.6.19?

I really need a better idea of the workload these servers are running
and, ideally, a reproducible test case to track something like
this down. At the moment I have no idea what is going on and no
real information on which to even base a guess.

Were there any other messages in the log?

On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:
> Hi!
> 
> I have another idea... could it be a barrier problem? Barriers have 
> been enabled by default since 2.6.17 ...

You could try turning it off. If it does fix the problem, then I'd be
pointing once again at hardware ;)
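
XFS documents a `nobarrier` mount option for switching write barriers off. As a rough sketch only, the Python below reports which XFS filesystems list `nobarrier` in text formatted like /proc/mounts, so that a test machine can be checked before and after the change. It assumes the running kernel shows the option in /proc/mounts; the function name and the sample mount table are hypothetical and not taken from the affected servers.

```python
def xfs_barrier_status(mounts_text):
    """Map each XFS mount point to True unless 'nobarrier' is among its options."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        status[fields[1]] = "nobarrier" not in options
    return status


# Hypothetical sample in /proc/mounts format (an assumption for illustration).
sample = """\
/dev/root / xfs rw,nobarrier 0 0
/dev/sdb1 /data xfs rw 0 0
proc /proc proc rw 0 0
"""

for mount_point, barriers_on in xfs_barrier_status(sample).items():
    print(mount_point, "barriers on" if barriers_on else "nobarrier")
```

On a real machine the same function could be fed the contents of /proc/mounts instead of the sample string.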

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: XFS or Kernel Problem / Bug

2007-01-21 Thread Stefan Priebe - FH

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment, and 5 of them are "old" Intel 
Pentium 4 machines with a DFI PM-12 mainboard with VIA chipset. It only 
happens on THESE machines. Other P4 machines with a Tyan mainboard or a 
Gigabyte mainboard are not affected. All 300 machines run the same 
Debian 3.0 with a self-built kernel. Some of these 5 use a 3ware 
controller and some of them the onboard controller. All systems are 
using IDE.


But I cannot say what is happening on these machines at the time of failure. 
Sometimes these servers crashed after just a few minutes. Sometimes 
they ran for about 2-3 days... I've now downgraded all servers to 2.6.16.37, 
because they are production machines... but I have one machine where we 
can test, if you need something.


Here is the output (the machine is running 2.6.16.37 at the moment):
xfs_growfs -n /

meta-data=/dev/root              isize=256    agcount=16, agsize=603855 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=9661680, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=4717, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
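
For readers who want to sanity-check the geometry above, here is a small Python sketch that turns the reported numbers into sizes. The values are copied from the xfs_growfs output; the function name and dictionary keys are made up for illustration.

```python
def fs_summary(bsize, blocks, agcount, agsize, log_blocks):
    """Derive sizes from the fields printed by xfs_growfs -n."""
    return {
        "ag_blocks_total": agcount * agsize,       # blocks covered by all AGs
        "data_bytes": blocks * bsize,              # data section size in bytes
        "data_gib": blocks * bsize / 2**30,        # same, in GiB
        "log_mib": log_blocks * bsize / 2**20,     # internal log size in MiB
    }


# Values copied from the output above.
summary = fs_summary(bsize=4096, blocks=9661680, agcount=16,
                     agsize=603855, log_blocks=4717)

print(summary["ag_blocks_total"] == 9661680)
print(round(summary["data_gib"], 2), "GiB data section")
print(round(summary["log_mib"], 2), "MiB internal log")
```

Here the 16 allocation groups of 603855 blocks account for exactly the 9661680 data blocks, giving a root filesystem of roughly 36.9 GiB with an internal log of about 18.4 MiB.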

Stefan

David Chinner wrote:

> On Sun, Jan 21, 2007 at 01:30:15PM +0100, Stefan Priebe - FH wrote:
>
> > Hello!
> >
> > I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
> > latest 2.6.16.37),
> >
> > but with 2.6.18.6 I get these errors:
>
> [ EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b ]
> [ EIP is at generic_file_buffered_write+0x390/0x6cf ]
>
> Do you have a reproducible test case for these? If not,
> do you have any idea what is going on in the system at the time
> of the failure?
>
> Can you describe the storage subsystem you are using and post the
> output of xfs_growfs -n <mntpt> on the filesystem that is causing
> problems?
>
> Cheers,
>
> Dave.




Re: XFS or Kernel Problem / Bug

2007-01-21 Thread David Chinner
On Sun, Jan 21, 2007 at 01:30:15PM +0100, Stefan Priebe - FH wrote:
> Hello!
> 
> I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
> latest 2.6.16.37),
> 
> but with 2.6.18.6 I get these errors:

[ EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b ]
[ EIP is at generic_file_buffered_write+0x390/0x6cf ]

Do you have a reproducible test case for these? If not,
do you have any idea what is going on in the system at the time
of the failure?

Can you describe the storage subsystem you are using and post the
output of xfs_growfs -n <mntpt> on the filesystem that is causing
problems?
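
No reproducer was posted in this thread. Purely as a sketch of what a starting point for a test case might look like, the Python below issues buffered writes at scattered offsets so that many of them land in file holes. The choice of workload is an assumption based only on the two function names in the oopses (a delayed-allocation hole path and the generic buffered write path); it is not known to trigger the crash, and the function name, file names, and sizes are hypothetical.

```python
import os
import random
import tempfile


def sparse_write_workload(directory, files=4, writes_per_file=200, seed=0):
    """Write random-sized chunks at random 4 KiB-aligned offsets, leaving holes."""
    rng = random.Random(seed)
    paths = []
    for i in range(files):
        path = os.path.join(directory, f"stress-{i}.dat")
        with open(path, "wb") as f:
            for _ in range(writes_per_file):
                # Seek somewhere within the first 64 MiB, then do a buffered write.
                f.seek(rng.randrange(0, 16384) * 4096)
                f.write(os.urandom(rng.randrange(1, 8192)))
        paths.append(path)
    return paths


# Demo run in a temporary directory; for a real test, pass a directory
# that lives on the XFS filesystem under suspicion.
with tempfile.TemporaryDirectory() as scratch:
    created = sparse_write_workload(scratch, files=2, writes_per_file=20)
    print(len(created), "files written")
```

Running several copies in parallel, or in a loop over days, would be closer to the reported failure pattern, but that remains guesswork until the real workload is described.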

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


XFS or Kernel Problem / Bug

2007-01-21 Thread Stefan Priebe - FH

Hello!

I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
latest 2.6.16.37),

but with 2.6.18.6 I get these errors:

general protection fault:  [#1]
Modules linked in:
CPU:0
EIP:0060:[c01c8fd2]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b
eax:    ebx: fffe0007   ecx: 0071a4cd   edx: 
esi:    edi:    ebp: 0015   esp: ce35f8f0
ds:    es: 007b   ss: 0068
Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)
Stack: 0232  0233    000c
 
   0007  eca90250 eca90278 0001 eca90200 
03c3 
    010003c3 ffc0 ce35fa58 ce35fa58 0001 
 
Call Trace:
 [c01b6c58] xfs_trans_dqresv+0x3f9/0x405
 [c01c6485] xfs_bmap_add_extent+0x163/0x377
 [c01cd2c3] xfs_bmapi+0xa4e/0x1109
 [c01ebbe3] xfs_iomap_write_delay+0x233/0x2fa
 [c01eaa31] xfs_imap_to_bmap+0x29/0x1d6
 [c01eae1a] xfs_iomap+0x23c/0x3e1
 [c01eaebe] xfs_iomap+0x2e0/0x3e1
 [c020a71a] xfs_bmap+0x1a/0x1e
 [c020471e] __xfs_get_blocks+0x5d/0x195


and sometimes this one:

BUG: unable to handle kernel NULL pointer dereference at virtual
address 0288
 printing eip:
c0142ff7
*pde = 
Oops:  [#1]
SMP 
Modules linked in: iptable_filter ip_tables x_tables
CPU:0
EIP:0060:[c0142ff7]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at generic_file_buffered_write+0x390/0x6cf
eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
ds: 007b   es: 007b   ss: 0068
Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
0010 
   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
 
    01ec dd04beac 00d420b1   dd04bd80
45b1fa67 
Call Trace:
 [c036d793] sock_def_readable+0x7f/0x81
 [c017a03a] file_update_time+0xad/0xcb
 [c0232015] xfs_iunlock+0x55/0x9f
 [c0262eeb] xfs_write+0xa74/0xc61
 [c036a253] sock_aio_read+0x95/0x99
 [c025d9fb] xfs_file_aio_write+0x8f/0xa0
 [c015fb94] do_sync_write+0xc9/0x10f
 [c0133ad6] autoremove_wake_function+0x0/0x57
 [c015f3d5] generic_file_llseek+0x95/0xbc
 [c015facb] do_sync_write+0x0/0x10f
 [c015fc80] vfs_write+0xa6/0x179
 [c015fe24] sys_write+0x51/0x80
 [c0102d3f] syscall_call+0x7/0xb

Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 8b 85
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 

EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP
0068:dd04bd18
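
In an oops line of the form "EIP is at symbol+0xOFFSET/0xSIZE", the first hex number is the offset of the faulting instruction from the start of the function and the second is the size of the function. When collecting traces from several machines it can help to pull these out mechanically; the following Python is a minimal sketch of that, with a hypothetical helper name and a sample made of the two lines above.

```python
import re

# Matches lines such as "EIP is at some_function+0x58d/0x59b".
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def faulting_locations(log_text):
    """Return (symbol, offset, size) for every 'EIP is at' line in log_text."""
    return [
        (m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16))
        for m in EIP_RE.finditer(log_text)
    ]


sample = """\
EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b
EIP is at generic_file_buffered_write+0x390/0x6cf
"""

for symbol, offset, size in faulting_locations(sample):
    print(f"{symbol}: offset {offset:#x} in a function of {size:#x} bytes")
```

Counting how often each symbol and offset appears across the five machines would show whether the crashes always hit the same two places.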

Stefan


