Re: 2.4.5 data corruption
> Sometimes it takes either the kernel tree or our website some time to get
> in 'sync' with the latest driver version. The latest driver version is
> 1.02.00.007.
>
> There may be DAC960 like /proc support at some point for GUI haters.

Publishing enough info to let people write a GPL non gui management tool
would be a win in itself

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.5 data corruption
On 19 Jun 2001, at 5:00, Stefan Traby wrote:

> On Thu, Jun 14, 2001 at 07:20:06PM +0100, Alan Cox wrote:
> > > Folks, I believe I have a reproducible test case which corrupts
> > > data in 2.4.5.
> >
> > 2.4.5 has an out of date 3ware driver that is short
>
> > + 1.02.00.007 - Fix possible null pointer dereferences in
> > +               tw_ioctl().
> > +               Remove check for invalid done function pointer
> > +               from tw_scsi_queue().
>
> hehe, this one keeps the 3dmd from running here at all.

Saw that one here too.

[...]

> (like DAC); I guess that many people would love to get rid
> of the - sorry - fucking closed sourced and totally broken 3dmd
> which makes an extremely nice product totally useless (you can't
> trust it; not only because it's closed source, it simply doesn't
> work (except that it wastes memory, that works fine. tested.))
>
> --

3dmd does have a lot of problems, but I thought it was just me. I only
made it work once in a machine, and not very well. Last week I installed
the latest version in another of my machines and after half an hour
wrestling with it - trying to make it change passwords and ask for one,
among other things - I gave up.

> ciao -
> Stefan

Pedro
Re: 2.4.5 data corruption
> Well, I do not understand how the driver is distributed.
> The actual 3ware stuff won't compile on 2.4.x, and the stuff in kernel
> is always different from 3ware releases.

The stuff in the -ac tree is directly from 3ware
Re: 2.4.5 data corruption
On Thu, Jun 14, 2001 at 07:20:06PM +0100, Alan Cox wrote:
> > Folks, I believe I have a reproducible test case which corrupts data in
> > 2.4.5.
>
> 2.4.5 has an out of date 3ware driver that is short

> + 1.02.00.007 - Fix possible null pointer dereferences in tw_ioctl().
> +               Remove check for invalid done function pointer from
> +               tw_scsi_queue().

hehe, this one keeps the 3dmd from running here at all.

> That might be a first thing to check

Well, I do not understand how the driver is distributed.
The actual 3ware stuff won't compile on 2.4.x, and the stuff in kernel
is always different from 3ware releases.

I use two 8-port cards (8 disks each) and I see different but fatal
problems on both systems.

Is anyone here using an actual firmware and raid-5?
Does it work up to some level on 6800?

Anyway, a useful proc-interface would be really cool
(like DAC); I guess that many people would love to get rid
of the - sorry - fucking closed sourced and totally broken 3dmd
which makes an extremely nice product totally useless (you can't
trust it; not only because it's closed source, it simply doesn't
work (except that it wastes memory, that works fine. tested.))

--

  ciao -
    Stefan

" destroy-your-data-by-3dmd-no-need-for-hammer-anymore CNAME www.3ware.com. "
Re: 2.4.5 data corruption
On Fri, Jun 15, 2001 at 11:54:20PM +0400, Eugene Crosser wrote:
> In article <[EMAIL PROTECTED]>,
>         Alan Cox <[EMAIL PROTECTED]> writes:
> >> any problems since 2.4.5 was published, they seem to have surfaced
> >> immediately after I created a rather big file capturing video with
> >> broadcast2000 (video card is bt848). Filesystem is ext2.
> >
> > That's something I've seen reported elsewhere. The high bandwidth
> > capture card stuff seems to show up problems. It could be drivers,
> > could be hardware. On my AMD 751 pre release board I see that problem
> > but on the 751 production board I don't
>
> You must be right, today I created another big file with the same program
> but without doing capture and the filesystem was intact. OTOH,
> Russell Leighton reports corruption when creating a file with dd...

For what it is worth, after having three failures in a row, now it isn't
happening. My test case is/was my nightly backup. If it happens again,
I'll save the corrupted data so we can do more digging. I'm kicking myself
for not having done it the first time around.
--
---
Larry McVoy    lm at bitmover.com    http://www.bitmover.com/lm
Re: 2.4.5 data corruption
Nuther anecdote:

I was creating a big swapfile on ext2 (because 2.4.5 needs too much swap)
with dd (SCSI disk on Sym53c8-something controller) and corrupted the
partition. THEN fsck would cause the kernel to panic.

I thought I had some bad hw ... the box sits on my office floor awaiting
resurrection.

Eugene Crosser wrote:

> In article <[EMAIL PROTECTED]>,
>         Alan Cox <[EMAIL PROTECTED]> writes:
> >> Folks, I believe I have a reproducible test case which corrupts data in
> >> 2.4.5.
> >
> > 2.4.5 has an out of date 3ware driver that is short
>
> These days I observed massive FS corruption on vanilla 2.4.5,
> SCSI disk on Sym53c8-something controller (UW). I did not notice
> any problems since 2.4.5 was published, they seem to have surfaced
> immediately after I created a rather big file capturing video with
> broadcast2000 (video card is bt848). Filesystem is ext2.
>
> Eugene

--
---
Russell Leighton    [EMAIL PROTECTED]
---
Re: 2.4.5 data corruption
> any problems since 2.4.5 was published, they seem to have surfaced
> immediately after I created a rather big file capturing video with
> broadcast2000 (video card is bt848). Filesystem is ext2.

That's something I've seen reported elsewhere. The high bandwidth capture
card stuff seems to show up problems. It could be drivers, could be
hardware. On my AMD 751 pre release board I see that problem but on the
751 production board I don't
Re: 2.4.5 data corruption
On Tuesday, June 12, 2001 01:17:49 PM -0700 Larry McVoy <[EMAIL PROTECTED]> wrote:

> Folks, I believe I have a reproducible test case which corrupts data in
> 2.4.5.
>
> We do nightly, weekly, and monthly backups by copying our entire /home
> partition on the company file server:
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/hda1             1.9G  1.7G  123M  93% /
> /dev/hda6             1.9G  437M  1.4G  23% /tmp

What flavor of IDE controller? Where is swap?

-chris
Re: 2.4.5 data corruption
On Tue, Jun 12, 2001 at 01:17:49PM -0700, Larry McVoy wrote:
> Folks, I believe I have a reproducible test case which corrupts data in
> 2.4.5.

Why don't you send the test case to the list? I would love to try it out
and it would be a good addition to LTP.

--
Nate Straz                              [EMAIL PROTECTED]
sgi, inc                                http://www.sgi.com/
Linux Test Project                      http://ltp.sf.net/
2.4.5 data corruption
Folks, I believe I have a reproducible test case which corrupts data in
2.4.5.

We do nightly, weekly, and monthly backups by copying our entire /home
partition on the company file server:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             1.9G  1.7G  123M  93% /
/dev/hda6             1.9G  437M  1.4G  23% /tmp
/dev/sda1              37G   26G   11G  71% /home
/dev/sdc1              37G   26G   11G  70% /weekly
/dev/sdd1              37G   24G   13G  65% /monthly
/dev/sdb1              37G   26G   11G  71% /nightly

The sd? drives are actually ide drives on a 3ware escalade controller.
I have reason to believe the drives are good: before I installed them I
scrubbed them with varying data patterns and verified that I got back
what I put there. All tested cleanly overnight.

I recently added an integrity check to our backups - the integrity checker
writes out the path, the gzip adler32 checksum, the size, and the mtime of
each file. Each time I do a backup, the backup scripts look for the
integrity listing in the other partitions and compare all files with the
same path, size, and modtime.

This morning I had a pile of errors after things having gone smoothly for
the last few weeks. I suspected that I had screwed something up, looked
over the backup scripts, simplified them down to a simple cpio, and tried
again. Another pile of errors, different set of files. In both cases, the
newly created files were corrupted; the ones on the live /home partition
as well as the /weekly & /monthly partitions all compared cleanly.

I rebooted into 2.2.19, tried again, no errors.

I was running 2.4.5, no patches. I power cycled the machine between each
reboot, went through the bios memory check, and also went through my own
memory check; memory does not seem to be an issue.

I think I can reproduce this, it takes a reboot and about 2 hours. I made
it happen twice with 2.4.5, the first try on 2.2.19 did not work.

The data corruption looks like *extra* bytes added at the beginning of
files. I only looked at a few, if we go down the path of debugging this
I'll save them all next time. The extra byte counts were small, in one
case there was the letter "1" added to the start of the file, other than
that it was identical. That's really weird, as a file system guy, I'd
expect to see blocks of data not small chunks of data. Very strange.

One thing I haven't done is to rule out the 3ware controller. I tend to
doubt it is the problem but who knows. There were no kernel messages
complaining about anything during the backup, so the kernel doesn't seem
to know there is a problem.

So, does anyone recognize these symptoms? Does anyone care?
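For anyone who wants to run a similar check on their own backups, here is a
minimal sketch of the kind of integrity listing described above. It is not
the actual backup script, which was never posted. The names `Entry`,
`adler32_of`, `build_listing` and `find_mismatches` are hypothetical, and
keeping the listing in memory (rather than writing it to a file on each
partition) is an assumption made to keep the example short.

```python
import os
import zlib
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """One record of the listing: path, Adler-32 checksum, size, mtime."""
    path: str      # path relative to the root of the tree
    adler32: int
    size: int
    mtime: int


def adler32_of(path: str, bufsize: int = 1 << 20) -> int:
    """Adler-32 of a file's contents, read in chunks."""
    value = 1  # Adler-32 starts from 1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def build_listing(root: str) -> Dict[str, Entry]:
    """Walk a tree and record (path, checksum, size, mtime) per regular file."""
    listing: Dict[str, Entry] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Assumption: symlinks and special files are skipped.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            listing[rel] = Entry(rel, adler32_of(full), st.st_size,
                                 int(st.st_mtime))
    return listing


def find_mismatches(a: Dict[str, Entry], b: Dict[str, Entry]) -> List[str]:
    """Paths present in both listings with equal size and mtime but
    different checksums, i.e. files that should be identical and are not."""
    bad = []
    for path, ea in a.items():
        eb = b.get(path)
        if eb is None:
            continue
        if ea.size == eb.size and ea.mtime == eb.mtime \
                and ea.adler32 != eb.adler32:
            bad.append(path)
    return sorted(bad)
```

With the mount points from the df output above, a comparison could look like
`find_mismatches(build_listing("/home"), build_listing("/nightly"))`. Files
that differ in size or mtime are deliberately ignored, since they may simply
have changed between backups.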