Re: 2.4.1-pre8 losing pages

2001-01-28 Thread Andre Hedrick

On Sun, 28 Jan 2001, Peter Horton wrote:

> Okay, scratch that. It does still happen when there's no swap, but for
> some reason it happens a lot less often. Looks like it's timing related;
> it only fails when using 7200rpm drives, not older 5400rpm ones (even
> though they too are using UDMA33). I've ruled out the filing system, the
> IDE controller, the drives and the RAM, so that leaves the kernel or the
> CPU - I'll try and beg/borrow/steal another CPU and try that. I can
> compile kernels / run X whilst the test is running without a problem so it
> looks like it's the bulk write that's the problem.

Peter, did the scratch-test series pass or fail?
Did it report any bit failures on the check?

Cheers,

Andre Hedrick
Linux ATA Development

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.1-pre8 losing pages

2001-01-28 Thread Peter Horton

On Fri, Jan 26, 2001 at 07:48:05PM +, Russell King wrote:
> Peter Horton writes:
> > The corruption is dependent on having a swapped on swap partition. If I
> > "swapoff" the corruption goes away, but it comes back when I "swapon"
> > again. I feel this is a kernel bug, but as I'm the only person out here who's
> > seeing it I'm at a loss ...
> 
> The reason I ask is that on an ARM box running plain 2.4.0 with swap
> enabled I get rather a lot of SEGVs.  Turn swap off, and I don't see
> any.
> 
> It sounds like it may be related.
> 

Okay, scratch that. It does still happen when there's no swap, but for
some reason it happens a lot less often. Looks like it's timing related;
it only fails when using 7200rpm drives, not older 5400rpm ones (even
though they too are using UDMA33). I've ruled out the filing system, the
IDE controller, the drives and the RAM, so that leaves the kernel or the
CPU - I'll try and beg/borrow/steal another CPU and try that. I can
compile kernels / run X whilst the test is running without a problem so it
looks like it's the bulk write that's the problem.

P.



Re: 2.4.1-pre8 losing pages

2001-01-26 Thread Russell King

Peter Horton writes:
> The corruption is dependent on having a swapped on swap partition. If I
> "swapoff" the corruption goes away, but it comes back when I "swapon"
> again. I feel this is a kernel bug, but as I'm the only person out here who's
> seeing it I'm at a loss ...

What compiler are you using?

The reason I ask is that on an ARM box running plain 2.4.0 with swap
enabled I get rather a lot of SEGVs.  Turn swap off, and I don't see
any.

It sounds like it may be related.

--
Russell King ([EMAIL PROTECTED])      The developer of ARM Linux
 http://www.arm.linux.org.uk/personal/aboutme.html




Re: 2.4.1-pre8 losing pages

2001-01-26 Thread Peter Horton

On Fri, Jan 26, 2001 at 09:24:12AM +, Peter Horton wrote:
> On Fri, Jan 26, 2001 at 03:20:33AM +0100, Xuan Baldauf wrote:
> > 
> > Peter Horton wrote:
> > 
> > > I'm experiencing repeatable corruption whilst writing large volumes of
> > > data to disk. Kernel version is 2.4.1-pre8, on an 850MHz AMD Athlon on an
> > > ASUS A7V (VIA KT133 chipset) motherboard 128M RAM (tested with 'memtest86'
> > > for 10 hours).
> > >
> > 
> 
> ... this is the kinda output I get on most runs :-
> 
>    Linux mole-rat 2.4.1-pre10 #1 Fri Jan 26 08:48:55 GMT 2001 i686 unknown
>    ...
>    aa6a64589748321899bab2b66f71427f  testt
>    aa6a64589748321899bab2b66f71427f  testu
>    aa6a64589748321899bab2b66f71427f  testv
>    9dde1bed276e32a1f9af98c87ab05978  testw
>    aa6a64589748321899bab2b66f71427f  testx
>    aa6a64589748321899bab2b66f71427f  testy
>    aa6a64589748321899bab2b66f71427f  testz
>    mole-rat:~# cmp testw testx
>    testw testx differ: char 110862337, line 433772
>    mole-rat:~# cmp -i $(( 110862336 + 4096 )) testw testx
>    mole-rat:~# echo $(( 110862336 % 4096 ))
>    0
> 
> > 
> > I cannot reproduce your behaviour in 2.4.1-pre9.
> > 
> 

The corruption is dependent on having a swapped on swap partition. If I
"swapoff" the corruption goes away, but it comes back when I "swapon"
again. I feel this is a kernel bug, but as I'm the only person out here who's
seeing it I'm at a loss ...
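A minimal sketch of how the swap on/off comparison could be scripted, assuming GNU coreutils and a bash-compatible shell. `count_bad_copies` and `copy_trial` are hypothetical helpers, not something posted to the thread. Each trial writes a random reference file, copies it several times, and prints how many copies have an MD5 different from the reference, mirroring the `dd`/`cp`/`md5sum` test used in this thread.

```shell
#!/bin/bash
# Sketch (hypothetical helpers, not from the thread): repeat the copy test
# and report how many copies came out corrupt, so that runs with swap on
# and swap off can be compared. Assumes GNU coreutils.

# Print how many of the files given after $1 have an MD5 different from $1.
count_bad_copies() {
    local ref=$1 want f bad=0
    shift
    want=$(md5sum < "$ref")
    for f in "$@"; do
        [ "$(md5sum < "$f")" = "$want" ] || bad=$(( bad + 1 ))
    done
    echo "$bad"
}

# One trial in directory $1: write $2 MiB from /dev/urandom, copy it $3
# times, print the number of corrupt copies, then remove the files.
copy_trial() {
    local dir=$1 mib=$2 copies=$3 i=1
    dd if=/dev/urandom of="$dir/test" bs=1024k count="$mib" 2>/dev/null
    while [ "$i" -le "$copies" ]; do
        cp "$dir/test" "$dir/test$i"
        i=$(( i + 1 ))
    done
    count_bad_copies "$dir/test" "$dir"/test[0-9]*
    rm -f "$dir/test" "$dir"/test[0-9]*
}
```

As root, a comparison might then look like `swapoff -a; copy_trial /mnt/test 128 26; swapon -a; copy_trial /mnt/test 128 26`, where `/mnt/test` is a placeholder for a directory on the affected volume. Because the failure is intermittent, several trials per setting would be needed before reading much into the counts.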

P.



Re: 2.4.1-pre8 losing pages

2001-01-26 Thread Peter Horton

On Fri, Jan 26, 2001 at 03:20:33AM +0100, Xuan Baldauf wrote:
> 
> Peter Horton wrote:
> 
> > I'm experiencing repeatable corruption whilst writing large volumes of
> > data to disk. Kernel version is 2.4.1-pre8, on an 850MHz AMD Athlon on an
> > ASUS A7V (VIA KT133 chipset) motherboard 128M RAM (tested with 'memtest86'
> > for 10 hours).
> >
> 
> So what output does the following bash script produce?
> 

Well this is the script I've been testing with ...

   #!/bin/bash -x
   set -e
   uname -a
   rm -f test test[a-z]
   dd if=/dev/urandom of=test bs=1024k count=128
   for I in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
   cp test test$I
   done
   md5sum test*

... this is the kinda output I get on most runs :-

   Linux mole-rat 2.4.1-pre10 #1 Fri Jan 26 08:48:55 GMT 2001 i686 unknown
   ...
   aa6a64589748321899bab2b66f71427f  testt
   aa6a64589748321899bab2b66f71427f  testu
   aa6a64589748321899bab2b66f71427f  testv
   9dde1bed276e32a1f9af98c87ab05978  testw
   aa6a64589748321899bab2b66f71427f  testx
   aa6a64589748321899bab2b66f71427f  testy
   aa6a64589748321899bab2b66f71427f  testz
   mole-rat:~# cmp testw testx
   testw testx differ: char 110862337, line 433772
   mole-rat:~# cmp -i $(( 110862336 + 4096 )) testw testx
   mole-rat:~# echo $(( 110862336 % 4096 ))
   0
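The arithmetic above checks that the first differing byte starts a page: `cmp` reports 1-based positions, so char 110862337 is byte offset 110862336, which is exactly the start of 4096-byte page 27066 (27066 * 4096 = 110862336). A small helper can list every differing 4 KiB page in one pass instead of probing with `cmp -i`. This is a sketch assuming GNU `cmp`, `awk` and `uniq`; `diff_pages` is a hypothetical name.

```shell
#!/bin/bash
# Sketch (hypothetical helper, not from the thread): list the 0-based
# 4 KiB page indices at which two equal-length files differ.
# Optional third argument overrides the page size (default 4096).
diff_pages() {
    # cmp -l prints "<1-based byte offset> <octal byte 1> <octal byte 2>"
    # for every differing byte; awk maps each offset to its page index and
    # uniq collapses runs, since cmp reports offsets in ascending order.
    cmp -l "$1" "$2" | awk -v p="${3:-4096}" '{ print int(($1 - 1) / p) }' | uniq
}
```

Given the `cmp` results above (first difference at offset 110862336, none from the following page onward), `diff_pages testw testx` would be expected to print the single index 27066.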

> 
> I cannot reproduce your behaviour in 2.4.1-pre9.
> 

No, I can't find anybody else who can either. Maybe I've got a dodgy CPU
:-(

P.



Re: 2.4.1-pre8 losing pages

2001-01-25 Thread Xuan Baldauf



Peter Horton wrote:

> I'm experiencing repeatable corruption whilst writing large volumes of
> data to disk. Kernel version is 2.4.1-pre8, on an 850MHz AMD Athlon on an
> ASUS A7V (VIA KT133 chipset) motherboard 128M RAM (tested with 'memtest86'
> for 10 hours).
>
> First, I realised that the fsck was noticing small corruptions on my ext2
> volume. My first suspect was the much-discussed VIA IDE controller. As a
> test I created a 128M file from "urandom" and copied it to twenty-six
> files. When I MD5 the files one or two of them are usually corrupt. The
> damage usually occurs in the 24th copy (though not always). Inspecting
> the files shows a single 4K block (aligned on a 4K boundary) that is
> completely different from what it should be. The kernel logs no errors
> whilst writing the corrupt files.
>
> I've repeated the test on the other on-board IDE controller (Promise), a
> different hard disk, and on reiserfs. I see the corruption in all cases.
>
> I tried building the kernel for "Pentium-Classic", and I tried a few older
> kernels (2.4.0-test5 and 2.4.0-test12), still bad (all kernels built with
> GCC 2.95.2 - Debian potato).
>
> I really could do with some help as to where to look next :-). I did try and
> come up with a test to see whether bad data is written or whether the
> damaged piece is just not written, but if I alter the testing procedure
> too much the problem seems to go away. It seems to just lose a single page
> under one very specific circumstance.

So what output does the following bash script produce?

#!/bin/bash
uname -a
dd if=/dev/urandom of=test0 bs=1024k count=128
I=1
while test $I -lt 32; do
  echo $I
  cp test0 test$I
  I="$(($I+1))"
done
md5sum test*

I cannot reproduce your behaviour in 2.4.1-pre9.

Xuân.

>
>
> P.
>
> ( configs attached )
>



2.4.1-pre8 losing pages

2001-01-25 Thread Peter Horton

I'm experiencing repeatable corruption whilst writing large volumes of
data to disk. Kernel version is 2.4.1-pre8, on an 850MHz AMD Athlon on an
ASUS A7V (VIA KT133 chipset) motherboard 128M RAM (tested with 'memtest86'
for 10 hours).

First, I realised that the fsck was noticing small corruptions on my ext2
volume. My first suspect was the much-discussed VIA IDE controller. As a
test I created a 128M file from "urandom" and copied it to twenty-six
files. When I MD5 the files one or two of them are usually corrupt. The
damage usually occurs in the 24th copy (though not always). Inspecting
the files shows a single 4K block (aligned on a 4K boundary) that is
completely different from what it should be. The kernel logs no errors
whilst writing the corrupt files.

I've repeated the test on the other on-board IDE controller (Promise), a
different hard disk, and on reiserfs. I see the corruption in all cases.

I tried building the kernel for "Pentium-Classic", and I tried a few older
kernels (2.4.0-test5 and 2.4.0-test12), still bad (all kernels built with
GCC 2.95.2 - Debian potato).

I really could do with some help as to where to look next :-). I did try and
come up with a test to see whether bad data is written or whether the
damaged piece is just not written, but if I alter the testing procedure
too much the problem seems to go away. It seems to just lose a single page
under one very specific circumstance.
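One way to probe whether bad data is written or the page is simply not written, without altering how the copies are made, is to inspect the damaged page after the fact. A sketch, assuming GNU `cmp` (`-s`, `-n`, `-i SKIP1:SKIP2`) and a page index already located, for example from `cmp` output; `page_eq` and `describe_page` are hypothetical helpers. It reports whether the page is intact, zero-filled, a duplicate of some other page of the reference, or data found nowhere in the reference. These are only hints: an unwritten ext2 block holds whatever was previously on disk, which need not be zeros.

```shell
#!/bin/bash
# Sketch (hypothetical helpers, not from the thread): report what ended up
# in 4 KiB page IDX of a suspect copy compared with the reference file.
# Assumes GNU cmp (-s, -n, -i SKIP1:SKIP2).
PAGE=4096

# True if PAGE bytes at offset $2 of file $1 match PAGE bytes at offset $4 of file $3.
page_eq() {
    cmp -s -n "$PAGE" -i "$2:$4" "$1" "$3"
}

# Usage: describe_page REFERENCE SUSPECT PAGE_INDEX
describe_page() {
    local ref=$1 bad=$2 idx=$3
    local off=$(( idx * PAGE ))
    local npages=$(( $(wc -c < "$ref") / PAGE ))
    local j=0

    if page_eq "$ref" "$off" "$bad" "$off"; then
        echo "page $idx: intact"; return
    fi
    if page_eq "$bad" "$off" /dev/zero 0; then
        echo "page $idx: all zeros"; return
    fi
    # Search the reference for the same 4 KiB of data; a hit suggests the
    # page came from the wrong offset rather than being unrelated data.
    while [ "$j" -lt "$npages" ]; do
        if page_eq "$bad" "$off" "$ref" $(( j * PAGE )); then
            echo "page $idx: copy of reference page $j"; return
        fi
        j=$(( j + 1 ))
    done
    echo "page $idx: data not found anywhere in the reference"
}
```

The linear search runs one `cmp` per page, so it is slow on a 128M file, but it needs nothing beyond standard tools. For a run like the one reported in this thread, `describe_page testx testw 27066` would classify the bad page of `testw` against a good copy.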

P.

( configs attached )
