Re: 2.4 ate my filesystem on rw-mount, summary
Ok, folks, it's time for a summary. Since my last post, I've had time to
experiment a bit more, and I've also had some private communication with
Vojtech.

First, I would like to say that you do need quite a bit of bad luck (or
hardware) to have the same problems I did. Linux 2.4, VIA and IDE work
very well for most users. But I really recommend making a backup of all
your vital data before installing 2.4 and enabling DMA with IDE disks.
(And, yes, I did this. Honest! :-) )

Problem log
===========

1. Installed RedHat 7
2. Built 2.4.0 with VIA driver and DMA by default (well, in 2.4.0, the
   VIA driver will always use DMA by default, whether you want it to or
   not.)
3. Rebooted -> 2.4.0
4. The computer froze on the "remounting root read-write" message.
5. Powercycle
6. Rebooted -> 2.2.16-22
7. Got a corrupt disk, missing files, moved files, incorrect file contents
8. Goto 1

So, why did this happen?

Problem one
===========

This one really makes me upset, because had it not been for this one, it
would have been so much easier to find the cause of the problem. It is
also so easy to fix.

The problem is that RedHat disables all kernel messages during boot,
except for panics. In my not so very humble opinion, kernel error
messages, and possibly also warning messages, should of course be shown.
It can easily be fixed by editing /etc/sysconfig/init.

The error messages that were hidden by RH7 were a couple of CRC error
messages, and then an endless stream of "Busy" and "Drive not ready for
command" errors. More on this later.

Problem two
===========

The computer in question has problems with UDMA(33), otherwise I would not
have gotten CRC errors, and everything would have been fine. Why I get
CRC errors, one can so far only speculate, especially since I am able to
use UDMA(66) with another drive, on the same controller, without much
trouble.

One theory is that the PCI bus clock may be too fast, and the drive cannot
keep up. To check this, I plan to measure the PCI clock. Quick
measurements with a not too great oscilloscope seem to indicate a clock
speed of around 33.3-33.4 MHz, so it may actually be out of spec, but not
by much.

Another theory is that the CRC errors are caused by bad cables,
connectors, or motherboard, but the fact that I can use UDMA(66) on the
same controller seems to contradict this. But OTOH I have learnt not to
underestimate the amazing amount of trouble a bad cable can cause.

Possible work-arounds include an "idebus=40" kernel option, or using
hdparm to configure the drive and kernel for UDMA(22).

Problem three
=============

The drive that gave me these problems is a SAMSUNG VG34323A, and the
problem with this drive is that it does not seem to recover from CRC
errors. Once I get my first CRC error, the drive becomes permanently busy,
until I power cycle.

Problem four
============

<speculation>
I do not know exactly what Linux is doing when remounting a partition
read-write, but it does seem to update some very sensitive sectors, and
when the write fails, a lot of very vital data is destroyed. It is perhaps
questionable whether the destruction of a couple of files would be much
better than the destruction of /dev, but I think it is.
</speculation>

Lesson
======

Be very careful when enabling DMA on a Linux machine, especially on cheap
hardware. It is not enough to test DMA on a read-only partition first,
since writing is a completely different story.

...and probably some more things that I either forgot, or are too painful
to remember...

/Tobias

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
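
Note on the hdparm work-around mentioned above: the thread later uses
"hdparm -X65" to select udma1. The sketch below shows how that number is
formed, assuming the transfer-mode numbering documented for hdparm's -X
option (8+n for PIO mode n, 32+n for multiword DMA mode n, 64+n for
UltraDMA mode n). The helper name hdparm_xfer_code is hypothetical and
only illustrates the arithmetic; it does not touch any drive.

```python
# Hypothetical helper: map a mode name as printed by "hdparm -i"
# (e.g. "udma1") to the numeric argument expected by "hdparm -X".
# Assumed numbering: PIO = 8+n, multiword DMA = 32+n, UltraDMA = 64+n.
XFER_BASE = {"pio": 8, "mdma": 32, "udma": 64}


def hdparm_xfer_code(mode: str) -> int:
    for prefix, base in XFER_BASE.items():
        suffix = mode[len(prefix):]
        if mode.startswith(prefix) and suffix.isdigit():
            return base + int(suffix)
    raise ValueError(f"unsupported transfer mode name: {mode!r}")


if __name__ == "__main__":
    for name in ("udma0", "udma1", "udma2"):
        print(f"{name}: hdparm -X{hdparm_xfer_code(name)} <device>")
```

So forcing the drive down from udma2 to udma1 corresponds to -X65, and
udma2 itself would be -X66.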
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, Jan 14, 2001 at 06:59:57PM +0100, Tobias Ringstrom wrote:

> I should also add that the 3.11 driver seems to make things better, but
> not yet perfect. My intuition tells me that I get CRC errors much sooner
> with 2.1e than with 3.11.
>
> Have the timings changed from 2.1e to 3.11, and would it be easy to modify
> 3.11 to get extra safe/paranoid, but less high performance, timings?

If you use 'idebus=40' or 'idebus=50', the driver will add an extra margin
to the timings, trying to compensate for the 40 or 50 MHz PCI bus it will
be tricked into thinking it's working with. This could add a data point,
yes.

> Some extra data:
> * B seems to work in 2 with udma2
> * A seems to work in 2 with udma1, but not with udma2.

UDMA1 is 22.2 MB/sec, UDMA2 is 33.3. UDMA0 is 16.6.

Could you (if you didn't already) send me the lspci -vvxxx after the -X65
(UDMA1) command, together with the one before? That also could tell
something.

> I wouldn't say it's rock solid, and I would not trust my data to any of
> these combinations, but at least it does not break immediately (i.e. for
> less than 1 GB written).

Actually, the CRC messages are safe and only mean a data transfer is
retried. That is, only if it doesn't fail every time. They happen on many
boards and drives using UDMA even under normal correct operation :(

> The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
> to 3.11 helps, but it is still very bad.
>
> I'd really like to be more precise, but there are too many combinations
> to try them all, and sometimes it fails right away, and sometimes
> after several hundred megabytes.

If 'fails after several hundred megabytes' only means a single CRC error
which is recovered from correctly, then that actually means 'working and
probably would work perfectly with a shorter cable'.

--
Vojtech Pavlik
SuSE Labs
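
Note on the numbers above: the 16.6 / 22.2 / 33.3 MB/sec figures and the
effect of 'idebus=40' can be reproduced with a little arithmetic. The
sketch below is a simplified model and not the VIA driver's actual code.
It assumes that each 16-bit UDMA word is timed in whole PCI clocks, that
the real bus runs at about 33.3 MHz, and that the driver rounds a nominal
60 ns udma2 cycle up to a whole number of the clock periods it believes
it has. The function names are made up for illustration.

```python
import math

BYTES_PER_CYCLE = 2      # the IDE data bus is 16 bits wide
REAL_PCI_MHZ = 33.3      # assumed real PCI clock


def udma_rate_mb_s(clocks_per_cycle: int, pci_mhz: float = REAL_PCI_MHZ) -> float:
    """Throughput if one 16-bit word is moved every clocks_per_cycle PCI clocks."""
    return BYTES_PER_CYCLE * pci_mhz / clocks_per_cycle


def clocks_for(target_ns: float, assumed_mhz: float) -> int:
    """Whole clocks needed to cover target_ns at the clock the driver assumes."""
    # The tiny epsilon guards against float noise when the ratio is exact.
    return math.ceil(target_ns * assumed_mhz / 1000.0 - 1e-9)


def real_cycle_ns(target_ns: float, assumed_mhz: float) -> float:
    """Cycle time actually produced on the real (33.3 MHz) bus."""
    return clocks_for(target_ns, assumed_mhz) * 1000.0 / REAL_PCI_MHZ


if __name__ == "__main__":
    for mode, clocks in (("udma0", 4), ("udma1", 3), ("udma2", 2)):
        print(f"{mode}: {clocks} clocks, {udma_rate_mb_s(clocks):.1f} MB/sec")
    for idebus in (33.3, 40, 50):
        print(f"idebus={idebus}: udma2 cycle becomes "
              f"{real_cycle_ns(60, idebus):.0f} ns on the real bus")
```

Under these assumptions, telling the driver the bus runs at 40 or 50 MHz
makes it spend three clocks instead of two on a udma2 cycle, which on the
real bus is about the same timing as udma1.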
Re: 2.4 ate my filesystem on rw-mount, getting closer
I should also add that the 3.11 driver seems to make things better, but
not yet perfect. My intuition tells me that I get CRC errors much sooner
with 2.1e than with 3.11.

Have the timings changed from 2.1e to 3.11, and would it be easy to modify
3.11 to get extra safe/paranoid, but less high performance, timings?

Some extra data:
* B seems to work in 2 with udma2
* A seems to work in 2 with udma1, but not with udma2.

I wouldn't say it's rock solid, and I would not trust my data to any of
these combinations, but at least it does not break immediately (i.e. for
less than 1 GB written).

The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
to 3.11 helps, but it is still very bad.

I'd really like to be more precise, but there are too many combinations
to try them all, and sometimes it fails right away, and sometimes
after several hundred megabytes.

/Tobias
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, 14 Jan 2001, Vojtech Pavlik wrote:

> > > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > > both on the Promise and on the 686a? But doesn't work on the 686a in
> > > your other board?
> >
> > Yes, on both the Promise and on the 686a. But the device revisions are
> > different. The machine that does NOT work:
> >
> > 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> >
> > The machine that works:
> >
> > 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> > 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> >
> > The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> > Pentium-III.

Of course it isn't. The vt82c686 that does not work is a 450 MHz K-6, not
a PIII.

> > > > no matter what cable I use. When I get this, the machine does not recover
> > > > most of the time, and I have to reset or power cycle.
> > >
> > > It should be able to recover in a couple (up to 10) minutes ...
> >
> > Who waits 10 minutes for a timeout? Can it be lowered?
>
> It's not a 10 minute timeout, it's a shorter timeout retried many times.
> Not my code, though - this is generic PCI IDE code, and is a huge mess.

What I get is a number of "Busy" and "Drive is not ready for command"
errors for different sectors.

> > Expect another mail with the data you requested within a couple of hours.
>
> Thanks a lot.

Ok, it took a bit longer than that, mostly because my wife and I had
unexpected (but very welcome) guests at home. It is Sunday, after all...

I have attached a tar file with "lspci -vvxxx" and "hdinfo -i" for machine
1 and 2 to this mail, but first some comments.

I will be talking about three machines:

1) 450 MHz K-6 on an AOpen MX59 PRO II motherboard
2) 800 MHz PIII on an unknown cheap/crappy motherboard.
3) 1 GHz Athlon on an ASUS A7V motherboard.

and the following drives:

A) SAMSUNG VG34323A,
   sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2
B) ST38421A,
   mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4

Machine 3 is the machine at home, and it does not have problems with any
disks I have tried so far, and seems very stable, both with ATA100 and
ATA66.

I verified that what is happening when RH7 tries to remount / read-write
is that I get the infamous CRC errors. It does not seem to recover from
this state. At least I did not wait that long.

I do not think that the RH7 kernel 2.2.16-22 uses udma2 at any time, and
that may be why it works.

Disk B does NOT work with DMA enabled with machine 1 or 2. It works
better than disk A, but it does still fail after some time. The
combination 1B was the most stable, and only failed once.

When using disk B, the computer has managed to recover from the CRC error
condition every time, as opposed to disk A which never recovers. (Busy)

Using hdparm -X65 (udma1) makes disk A work with 2.4 in machine 2. What
is the difference between udma1 and udma2?

Now I'm almost completely lost.

Hope this helps. Let me know if you want me to try something else.
/Tobias

/dev/hde:

 Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
 CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
        Subsystem: Asustek Computer, Inc.: Unknown device 8033
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
        Latency: 0
        Region 0: Memory at e000 (32-bit, prefetchable) [size=128M]
        Capabilities: [a0] AGP version 2.0
                Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=none
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 6b b4 4f 81 10 10 80 00 08 10 10 10 10 10
60: 03 ff 00 b0 e6 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 01
80: 0f 40 00 00 80 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 17 02 00 1f 00 00 00 00 6e 02 14 00
b0: 61 ec 80 e5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, Jan 14, 2001 at 09:45:09AM +0100, Tobias Ringstrom wrote:

> On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> > On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
> >
> > > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > > motherboard), and there are no problems to be found with DMA enabled.
> > > Streaming 10 MB/s without glitches.
> >
> > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > both on the Promise and on the 686a? But doesn't work on the 686a in
> > your other board?
>
> Yes, on both the Promise and on the 686a. But the device revisions are
> different. The machine that does NOT work:
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
>
> The machine that works:
>
> 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
>
> The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> Pentium-III.
>
> > > no matter what cable I use. When I get this, the machine does not recover
> > > most of the time, and I have to reset or power cycle.
> >
> > It should be able to recover in a couple (up to 10) minutes ...
>
> Who waits 10 minutes for a timeout? Can it be lowered?

It's not a 10 minute timeout, it's a shorter timeout retried many times.
Not my code, though - this is generic PCI IDE code, and is a huge mess.

> Expect another mail with the data you requested within a couple of hours.

Thanks a lot.

--
Vojtech Pavlik
SuSE Labs
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, 14 Jan 2001, Vojtech Pavlik wrote:

> On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
>
> > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > motherboard), and there are no problems to be found with DMA enabled.
> > Streaming 10 MB/s without glitches.
>
> So the drive *did* work on the vt82c686a in the A7V board? You tested it
> both on the Promise and on the 686a? But doesn't work on the 686a in
> your other board?

Yes, on both the Promise and on the 686a. But the device revisions are
different. The machine that does NOT work:

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

The machine that works:

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

The one that works is a 1 GHz Athlon, and the other is an 800 MHz
Pentium-III.

> > no matter what cable I use. When I get this, the machine does not recover
> > most of the time, and I have to reset or power cycle.
>
> It should be able to recover in a couple (up to 10) minutes ...

Who waits 10 minutes for a timeout? Can it be lowered?

Expect another mail with the data you requested within a couple of hours.

/Tobias
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:

> I have now tried the SAMSUNG VG34323A disk with two other controllers at
> home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> motherboard), and there are no problems to be found with DMA enabled.
> Streaming 10 MB/s without glitches.

So the drive *did* work on the vt82c686a in the A7V board? You tested it
both on the Promise and on the 686a? But doesn't work on the 686a in
your other board?

> However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
> this machine [1] (at work, using the VIA IDE driver version 3.11)
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
>
> or this machine [2] (at work, using the VIA IDE driver version 2.1e)
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

What's the manufacturer/model of these boards? Just for the record ...
What's the PCI bus speed? Or memory speed?

> I get exactly the following errors on both machines
>
> hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
>
> no matter what cable I use. When I get this, the machine does not recover
> most of the time, and I have to reset or power cycle.

It should be able to recover in a couple (up to 10) minutes ...

> This disk works
> flawlessly on two other IDE controllers, so I do not think that the disk
> is completely broken. It must be either these chipsets or the driver in
> combination with this disk. Note that I _can_ use another UDMA66 disk
> _with_ DMA enabled on both machine [1] and [2] above without problems.
> Also, 2.2.16-22 seems to work with DMA enabled on machine [1]. I have not
> tried 2.2.16-22 with DMA enabled on machine [2].
>
> The problem I reported at first, hence the nasty subject, was a hang and a
> nasty fs corruption when RH7 tried to remount the root fs read-write. I
> examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
> discovered, to my great disgust, that the stupid thing disables the dmesg
> output on the console very early in the script. It is thus entirely
> possible that I do get the above mentioned errors when the computer seems
> to hang, and my fs gets corrupted. I will fix the script tomorrow to see
> if my assumption is correct.
>
> SUMMARY: I have a disk that with DMA enabled gives me CRC errors on two
> machines, but not on two others, independent of the cable. Neither of the
> troublesome machines recovers from these errors. Linux 2.2.16-22 from
> RedHat works fine with DMA enabled on machine [1], [2] is unknown.
>
> I hope this makes things a lot clearer.

Yes, indeed it's much clearer now. Now to fix the bug, or at least be able
to track it closer, I'll need 'lspci -vvxxx' of the IDE pci device in the
following cases:

1) SAMSUNG VG34323A on VT82C596b/cf with RH 2.2.16-22 and DMA (working)
2) SAMSUNG VG34323A on VT82C686a/ce with RH 2.2.16-22 and DMA (working)
3) SAMSUNG VG34323A on VT82C596b/cf with 2.4.0+via3.11 and DMA,
   (doesn't work, so fs readonly)
4) SAMSUNG VG34323A on VT82C686a/ce with 2.4.0+via3.11 and DMA,
   (doesn't work, so fs readonly)
5) The other drive on VT82C596b/cf with 2.4.0+via3.11 and DMA (working)
6) The other drive on VT82C686a/ce with 2.4.0+via3.11 and DMA (working)

With these data I should be able to find out what's different between the
working and not working setups ...

My current theory: In UDMA, when reading, the drive provides the clock.
The IDE controller thus can read everything OK. When writing, the
controller provides the clock and for some reason the Samsung can't keep
up with the setting the driver selects for it. The question is why, and
why the driver selects the incorrect (or just too tight?) value.

--
Vojtech Pavlik
SuSE Labs
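
Note on the quoted dma_intr lines: the names in braces are just the set
bits of the drive's status and error registers. The sketch below decodes
the two values from the log, assuming the conventional ATA register bit
layout and labels chosen to match the kernel messages quoted above. It is
an illustration only, not the kernel's own code, and the function names
are made up.

```python
# Assumed ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ABORT = 0x04   # command aborted, shown as "DriveStatusError"
ICRC = 0x80    # interface CRC error when ABORT is also set


def decode_status(value: int) -> list[str]:
    """Names of all bits set in the status register value."""
    return [name for bit, name in STATUS_BITS if value & bit]


def decode_error(value: int) -> list[str]:
    """Decode only the two error bits that matter in this thread."""
    names = []
    if value & ABORT:
        names.append("DriveStatusError")
    if value & ICRC:
        # Bit 7 is read as a CRC error only together with an abort;
        # without it the same bit traditionally meant a bad sector.
        names.append("BadCRC" if value & ABORT else "BadSector")
    return names


if __name__ == "__main__":
    print("status=0x51", decode_status(0x51))
    print("error=0x84", decode_error(0x84))
```

Read this way, status=0x51 says the drive is ready and flags an error, and
error=0x84 says the command was aborted because of a CRC failure on the
interface, which points at the transfer between controller and drive
rather than at the platters.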
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> > > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > > both on the Promise and on the 686a? But doesn't work on the 686a in
> > > your other board?
> >
> > Yes, on both the Promise and on the 686a. But the device revisions are
> > different.
> >
> > The machine that does NOT work:
> >
> > 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> >
> > The machine that works:
> >
> > 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> > 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> >
> > The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> > Pentium-III.

Of course it isn't. The vt82c686 that does not work is a 450 MHz K-6, not
a PIII.

> > > > no matter what cable I use. When I get this, the machine does not
> > > > recover most of the time, and I have to reset or power cycle.
> > >
> > > It should be able to recover in a couple (up to 10) minutes ...
> >
> > Who waits 10 minutes for a timeout? Can it be lowered?
>
> It's not a 10 minute timeout, it's a shorter timeout retried many times.
> Not my code, though - this is generic PCI IDE code, and is a huge mess.

What I get is a number of "Busy" and "Drive is not ready for command"
errors for different sectors.

> > Expect another mail with the data you requested within a couple of hours.
>
> Thanks a lot.

Ok, it took a bit longer than that, mostly because my wife and I had
unexpected (but very welcome) guests at home. It is Sunday, after all...

I have attached a tar file with "lspci -vvxxx" and "hdinfo -i" for machine
1 and 2 to this mail, but first some comments.

I will be talking about three machines:

1) 450 MHz K-6 on an AOpen MX59 PRO II motherboard
2) 800 MHz PIII on an unknown cheap/crappy motherboard.
3) 1 GHz Athlon on an ASUS A7V motherboard.

and the following drives:

A) SAMSUNG VG34323A, sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2
B) ST38421A, mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4

Machine 3 is the machine at home, and it does not have problems with any
disks I have tried so far, and seems very stable, both with ATA100 and
ATA66.

I verified that what is happening when RH7 tries to remount / read-write
is that I get the infamous CRC errors. It does not seem to recover from
this state. At least I did not wait that long. I do not think that the RH7
kernel 2.2.16-22 uses udma2 at any time, and that may be why it works.

Disk B does NOT work with DMA enabled with machine 1 or 2. It works better
than disk A, but it does still fail after some time. The combination 1B
was the most stable, and only failed once.

When using disk B, the computer has managed to recover from the CRC error
condition every time, as opposed to disk A which never recovers. (Busy)

Using hdparm -X65 (udma1) makes disk A work with 2.4 in machine 2. What is
the difference between udma1 and udma2?

Now I'm almost completely lost. Hope this helps. Let me know if you want
me to try something else.

/Tobias

/dev/hde:

 Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
 CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
        Subsystem: Asustek Computer, Inc.: Unknown device 8033
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
        Latency: 0
        Region 0: Memory at e000 (32-bit, prefetchable) [size=128M]
        Capabilities: [a0] AGP version 2.0
                Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=none
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 6b b4 4f 81 10 10 80 00 08 10 10 10 10 10
60: 03 ff 00 b0 e6 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 01
80: 0f 40 00 00 80 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 17 02 00 1f 00 00 00 00 6e
Re: 2.4 ate my filesystem on rw-mount, getting closer
I should also add that the 3.11 driver seems to make things better, but
not yet perfect. My intuition tells me that I get CRC errors much sooner
with 2.1e than with 3.11. Have the timings changed from 2.1e to 3.11, and
would it be easy to modify 3.11 to get extra safe/paranoid, but
lower-performance, timings?

Some extra data:

* B seems to work in 2 with udma2
* A seems to work in 2 with udma1, but not with udma2.

I wouldn't say it's rock solid, and I would not trust my data to any of
these combinations, but at least it does not break immediately (i.e. for
less than 1 GB written).

The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
to 3.11 helps, but it is still very bad.

I'd really like to be more precise, but there are too many combinations to
try them all, and sometimes it fails right away, and sometimes after
several hundred megabytes.

/Tobias
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, Jan 14, 2001 at 06:59:57PM +0100, Tobias Ringstrom wrote:
> I should also add that the 3.11 driver seems to make things better, but
> not yet perfect. My intuition tells me that I get CRC errors much sooner
> with 2.1e than with 3.11. Have the timings changed from 2.1e to 3.11, and
> would it be easy to modify 3.11 to get extra safe/paranoid, but
> lower-performance, timings?

If you use 'idebus=40' or 'idebus=50', the driver will add an extra margin
to the timings, trying to compensate for the 40 or 50 MHz PCI bus it will
be tricked into thinking it's working with. This could add a data point,
yes.

> Some extra data:
>
> * B seems to work in 2 with udma2
> * A seems to work in 2 with udma1, but not with udma2.

UDMA1 is 22.2 MB/sec, UDMA2 is 33.3. UDMA0 is 16.6.

Could you (if you didn't already) send me the lspci -vvxxx after the -X65
(UDMA1) command, together with the one before? That also could tell
something.

> I wouldn't say it's rock solid, and I would not trust my data to any of
> these combinations, but at least it does not break immediately (i.e. for
> less than 1 GB written).

Actually, the CRC messages are safe and only mean a data transfer is
retried. That is, only if it doesn't fail every time. They happen on many
boards and drives using UDMA even under normal correct operation :(

> The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
> to 3.11 helps, but it is still very bad.
>
> I'd really like to be more precise, but there are too many combinations to
> try them all, and sometimes it fails right away, and sometimes after
> several hundred megabytes.

If 'fails after several hundred megabytes' only means a single CRC error
which is recovered from correctly, then that actually means 'working and
probably would work perfectly with a shorter cable'.

--
Vojtech Pavlik
SuSE Labs
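For anyone puzzling over where 16.6, 22.2 and 33.3 MB/s come from: they fall out of simple arithmetic if one assumes a 16-bit IDE data path moving one word (2 bytes) per UDMA cycle, with cycle times of 120, 90 and 60 ns (whole multiples of a 30 ns PCI clock). The sketch below is only that arithmetic, plus a deliberately simplified model of what idebus=40 or idebus=50 might do to the programmed timing. The function names and the rounding rule are assumptions made for illustration, not code taken from the VIA driver.

```python
BYTES_PER_CYCLE = 2      # assumption: 16-bit IDE bus, one word per UDMA cycle
REAL_CLOCK_NS = 30       # assumption: the real PCI clock is 33 MHz (30 ns period)


def rate_mb_per_s(cycle_ns):
    """Transfer rate in MB/s for a given cycle time in nanoseconds."""
    return BYTES_PER_CYCLE * 1000.0 / cycle_ns


def real_cycle_ns(target_ns, assumed_mhz):
    """Hypothetical model of the idebus= margin (not the driver's algorithm).

    The driver is assumed to pick the smallest whole number of bus clocks
    that covers target_ns at the clock it *believes* it has. The hardware
    then counts those clocks at the real 30 ns period.
    """
    clocks = (target_ns * assumed_mhz + 999) // 1000   # integer ceiling
    return clocks * REAL_CLOCK_NS


if __name__ == "__main__":
    for name, cycle in (("UDMA0", 120), ("UDMA1", 90), ("UDMA2", 60)):
        print(name, cycle, "ns ->", round(rate_mb_per_s(cycle), 1), "MB/s")
    for idebus in (33, 40, 50):
        cycle = real_cycle_ns(60, idebus)
        print("idebus=%d: UDMA2 request runs at %d ns" % (idebus, cycle))
```

Under this model, telling the driver the bus is faster than it really is stretches a 60 ns UDMA2 request to 90 ns on the wire, which is the same cycle time as UDMA1. That would fit the observation that disk A behaves at udma1 but not at udma2, though the thread does not confirm it.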
Re: 2.4 ate my filesystem on rw-mount, getting closer
I have now tried the SAMSUNG VG34323A disk with two other controllers at
home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
motherboard), and there are no problems to be found with DMA enabled.
Streaming 10 MB/s without glitches.

However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
this machine [1] (at work, using the VIA IDE driver version 3.11)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

or this machine [2] (at work, using the VIA IDE driver version 2.1e)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

I get exactly the following errors on both machines

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }

no matter what cable I use. When I get this, the machine does not recover
most of the time, and I have to reset or power cycle.

This disc works flawlessly on two other IDE controllers, so I do not think
that the disk is completely broken. It must be either these chipsets or
the driver in combination with this disk.

Note that I _can_ use another UDMA66 disk _with_ DMA enabled on both
machine [1] and [2] above without problems.

Also, 2.2.16-22 seems to work with DMA enabled on machine [1]. I have not
tried 2.2.16-22 with DMA enabled on machine [2].

The problem I reported at first, hence the nasty subject, was a hang and a
nasty fs corruption when RH7 tried to remount the root fs read-write. I
examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
discovered, to my great disgust, that the stupid thing disables the dmesg
output on the console very early in the script. It is thus entirely
possible that I do get the above mentioned errors when the computer seems
to hang, and my fs gets corrupted. I will fix the script tomorrow to see
if my assumption is correct.

SUMMARY: I have a disk that with DMA enabled gives me CRC errors on two
machines, but not on two others, independent of the cable. Neither of the
troublesome machines recovers from these errors. Linux 2.2.16-22 from
RedHat works fine with DMA enabled on machine [1]; [2] is unknown.

I hope this makes things a lot clearer.

/Tobias
Re: 2.4 ate my filesystem on rw-mount
On Sat, 13 Jan 2001, Vojtech Pavlik wrote:
> On Sat, Jan 13, 2001 at 09:12:27AM +0100, Tobias Ringstrom wrote:
> > > 2) What's in /proc/ide/via?
> >
> > It's not there since I disabled the VIA driver.
>
> Ok. Could you send me this file when you boot with fs r-o?

Ok, but this is with the wrong disc. With the bad disc, drive0 looks
exactly like drive2, i.e. normal UDMA(33). Sorry about that.

----------VIA BusMastering IDE Configuration----------------
Driver Version:                     2.1e
South Bridge:                       VIA vt82c686a rev 0x1b
Command register:                   0x7
Latency timer:                      32
PCI clock:                          33MHz
Master Read  Cycle IRDY:            0ws
Master Write Cycle IRDY:            0ws
FIFO Output Data 1/2 Clock Advance: off
BM IDE Status Register Read Retry:  on
Max DRDY Pulse Width:               No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush:           on                  on
End Sect. FIFO flush:          on                  on
Prefetch Buffer:               on                  on
Post Write Buffer:             on                  on
FIFO size:                      8                   8
Threshold Prim.:              1/2                 1/2
Bytes Per Sector:             512                 512
Both channels togth:          yes                 yes
-------------------drive0----drive1----drive2----drive3-----
BMDMA enabled:        yes       yes       yes       yes
Transfer Mode:       UDMA   DMA/PIO      UDMA   DMA/PIO
Address Setup:       30ns     120ns      30ns     120ns
Active Pulse:        90ns     330ns      90ns     330ns
Recovery Time:       30ns     270ns      30ns     270ns
Cycle Time:          30ns     600ns      60ns     600ns
Transfer Rate:   66.0MB/s   3.3MB/s  33.0MB/s   3.3MB/s

> > > 4) If you mount your filesystem read-only, does it read garbage?
> >
> > Now here's a strange part, or possibly a crucial clue. When I booted a
> > 2.4.0 kernel (from floppy using the excellent syslinux) with "ro
> > init=/bin/sh", I could access the filesystem just fine. I could even
> > remount the root filesystem rw, and there were no problems. But I did not
> > write anything to the disk, since I was convinced that the problem was
> > gone (this was the second try). After this I rebooted with
> > ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh,
> > booted up the RH7 2.2.16 kernel, and fsck was run with no errors.
>
> So far no problem. Rebooting with c-a-d with fs r-o is OK.
>
> > Now I
> > thought all was well, rebooted from floppy again, but without the init=
> > part, and poof, it hung.
>
> Where? It could be a different reason than IDE setup ...

Don't think so. It happens on the "Remounting root read-write".

> > More interesting may be that I had to turn the computer off and on again
> > to get BIOS to find the hard drive. Repeated long reset button presses
> > did not help. It is possible that it hung during BIOS hd detection - I
> > wish I could remember.
>
> I fear this isn't much of a clue, sorry.

The clue is that the VIA driver messed up either the chipset or the drive
quite a lot, but maybe that is already obvious.

> > I suspect that I could have hung the drive with init=/bin/sh if I would
> > have done some reading and writing to the device, besides ls.
>
> Please try it. Best mke2fs your swap partition and try reading & writing
> to that. You can mkswap it back after you finish.

After more testing, I think I have isolated the problem to this disk, or
at least this disk with this controller. With another (UDMA66) disk, there
are no problems. Details at the end.

> > I think I can spend some more time today trying it out some more.
>
> Please do. 'lspci -vvxxx' data for the case without a driver, with 2.4.0
> driver and with 3.11 driver would help me find the problem.

Ok, I'll do that later.

> Make sure you *don't* have any hdparm -d1 or hdparm -X66 or similar
> stuff in your init scripts.

I'm sure I don't. This happens with a clean fresh RH7 installation.

> > I will
> > also try your 3.11 driver, which seems to be an enormous cleanup.
>
> the 2.1e driver is an enormous cleanup of the original driver from the
> 2.2 kernels. the 3.11 is an enormous cleanup of 2.1e, yes.

I have not had a chance to try the 3.11 driver yet.

Now for the new details. When writing to the disk with DMA enabled, I get
the following errors, in two different machines. Both are VIA IDE
machines. It is NOT a cable error. I have tried with several cables.
Possibly a connector or soldering problem. I'll try the disk in more
machines and get back with more info. I have to run now.

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }

/Tobias
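The two dma_intr lines are just the drive's status and error registers printed as hex plus a list of the bits that are set. A minimal sketch of that decoding follows, assuming the conventional ATA register bit layout. Only the five names that appear in the messages above are taken from this thread. The remaining labels, and the helper name decode, are illustrative assumptions.

```python
# (mask, label) pairs, highest bit first.
# Assumption: conventional ATA status/error register layout. Labels other
# than DriveReady, SeekComplete, Error, DriveStatusError and BadCRC do not
# appear in the thread and are only illustrative.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x80, "BadCRC"),            # interface CRC error on a UDMA transfer
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),  # command aborted
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the labels of all bits set in value, highest bit first."""
    return [label for mask, label in table if value & mask]


if __name__ == "__main__":
    print("status=0x51", decode(0x51, STATUS_BITS))
    print("error=0x84 ", decode(0x84, ERROR_BITS))
```

Read this way, 0x84 says the drive aborted the command because the data failed its CRC check on the cable side of the transfer, which points at signalling or timing rather than at the platters.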
Re: 2.4 ate my filesystem on rw-mount
On Sat, Jan 13, 2001 at 09:12:27AM +0100, Tobias Ringstrom wrote:
> > Wow. Ok, I'm maintaining the 2.4.0 VIA driver, so I'd like to know more
> > about this:
> >
> > 1) What's the ISA bridge revision?
>
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8501 (rev 02)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8501
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> 00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 0e)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20)
> 00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 [Apollo Super AC97/Audio] (rev 21)
> 00:0a.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine 10/100] (rev 06)
> 01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i7 (rev 5b)

Ok, your IDE chip is a vt82c686a/ce.

> > 2) What's in /proc/ide/via?
>
> It's not there since I disabled the VIA driver.

Ok. Could you send me this file when you boot with fs r-o?

> > 3) What says hdparm -i on your devices?
>
> /dev/hda:
>
>  Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
>  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
>  RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
>  BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
>  CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
>  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes: pio0 pio1 pio2 pio3 pio4
>  DMA modes: sdma0 sdma1 sdma2 *mdma0 mdma1 mdma2 udma0 udma1 *udma2

Looks good, too. A UDMA33 drive.

> > 4) If you mount your filesystem read-only, does it read garbage?
>
> Now here's a strange part, or possibly a crucial clue. When I booted a
> 2.4.0 kernel (from floppy using the excellent syslinux) with "ro
> init=/bin/sh", I could access the filesystem just fine. I could even
> remount the root filesystem rw, and there were no problems. But I did not
> write anything to the disk, since I was convinced that the problem was
> gone (this was the second try). After this I rebooted with
> ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh,
> booted up the RH7 2.2.16 kernel, and fsck was run with no errors.

So far no problem. Rebooting with c-a-d with fs r-o is OK.

> Now I
> thought all was well, rebooted from floppy again, but without the init=
> part, and poof, it hung.

Where? It could be a different reason than IDE setup ...

> More interesting may be that I had to turn the computer off and on again
> to get BIOS to find the hard drive. Repeated long reset button presses
> did not help. It is possible that it hung during BIOS hd detection - I
> wish I could remember.

I fear this isn't much of a clue, sorry.

> I suspect that I could have hung the drive with init=/bin/sh if I would
> have done some reading and writing to the device, besides ls.

Please try it. Best mke2fs your swap partition and try reading & writing
to that. You can mkswap it back after you finish.

> I think I can spend some more time today trying it out some more.

Please do. 'lspci -vvxxx' data for the case without a driver, with 2.4.0
driver and with 3.11 driver would help me find the problem.

Make sure you *don't* have any hdparm -d1 or hdparm -X66 or similar
stuff in your init scripts.

> I will
> also try your 3.11 driver, which seems to be an enormous cleanup.

The 2.1e driver is an enormous cleanup of the original driver from the
2.2 kernels. The 3.11 is an enormous cleanup of 2.1e, yes.

> Btw, do
> you have a home page for the VIA driver? A CVS perhaps? If not, please
> consider using sourceforge or something similar.

No, not yet, but working on that.

--
Vojtech Pavlik
SuSE Labs
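The -X66 mentioned above (and the -X65 used elsewhere in this thread) is not arbitrary: the number passed to hdparm -X is a base value for the transfer class plus the mode number, so 64 + 2 selects udma2 and 64 + 1 selects udma1. A small sketch of that mapping, using the mode names that hdparm -i prints. The helper name hdparm_x_value is made up for illustration.

```python
# Base values for each transfer class in the hdparm -X numbering
# (the same encoding as the ATA "set transfer mode" subcommand).
X_BASE = {"pio": 8, "sdma": 16, "mdma": 32, "udma": 64}


def hdparm_x_value(mode):
    """Map a mode name such as 'udma2' to the number given to hdparm -X."""
    prefix = mode.rstrip("0123456789")
    digits = mode[len(prefix):]
    if prefix not in X_BASE or not digits:
        raise ValueError("unrecognised mode name: %r" % mode)
    return X_BASE[prefix] + int(digits)


if __name__ == "__main__":
    for name in ("mdma2", "udma1", "udma2", "udma4"):
        print("%-6s -> hdparm -X%d" % (name, hdparm_x_value(name)))
```

So an init script containing hdparm -X66 would force udma2 regardless of what the driver negotiated, which is why such lines need to be ruled out before blaming the driver.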
Re: 2.4 ate my filesystem on rw-mount
On Fri, 12 Jan 2001, Vojtech Pavlik wrote:
> Wow. Ok, I'm maintaining the 2.4.0 VIA driver, so I'd like to know more
> about this:
>
> 1) What's the ISA bridge revision?

00:00.0 Host bridge: VIA Technologies, Inc. VT8501 (rev 02)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8501
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 0e)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 [Apollo Super AC97/Audio] (rev 21)
00:0a.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine 10/100] (rev 06)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i7 (rev 5b)

> 2) What's in /proc/ide/via?

It's not there since I disabled the VIA driver.

> 3) What says hdparm -i on your devices?

/dev/hda:

 Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
 CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 *mdma0 mdma1 mdma2 udma0 udma1 *udma2

> 4) If you mount your filesystem read-only, does it read garbage?

Now here's a strange part, or possibly a crucial clue. When I booted a
2.4.0 kernel (from floppy using the excellent syslinux) with "ro
init=/bin/sh", I could access the filesystem just fine. I could even
remount the root filesystem rw, and there were no problems. But I did not
write anything to the disk, since I was convinced that the problem was
gone (this was the second try). After this I rebooted with
ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh,
booted up the RH7 2.2.16 kernel, and fsck was run with no errors. Now I
thought all was well, rebooted from floppy again, but without the init=
part, and poof, it hung.

More interesting may be that I had to turn the computer off and on again
to get BIOS to find the hard drive. Repeated long reset button presses
did not help. It is possible that it hung during BIOS hd detection - I
wish I could remember.

I suspect that I could have hung the drive with init=/bin/sh if I would
have done some reading and writing to the device, besides ls.

I think I can spend some more time today trying it out some more. I will
also try your 3.11 driver, which seems to be an enormous cleanup. Btw, do
you have a home page for the VIA driver? A CVS perhaps? If not, please
consider using sourceforge or something similar.

/Tobias
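A side note on the hdparm -i output quoted in this thread: the negative CurSects=-531627904 looks alarming but is most likely a display artifact rather than a sign of a damaged drive. The arithmetic below checks two things using only numbers from that output: the CHS geometry multiplies out to exactly LBAsects, and the printed CurSects is what you get if the two 16-bit halves of that same sector count are swapped and the result is shown as a signed 32-bit integer. The word-swap explanation is a guess that happens to fit the numbers, and nobody in the thread confirms it. The helper name swap_words is made up.

```python
# Values copied from the hdparm -i output in this thread.
CYLINDERS, HEADS, SECTORS = 14896, 9, 63   # RawCHS / CurCHS
LBA_SECTS = 8446032                        # LBAsects
CUR_SECTS_PRINTED = -531627904             # CurSects as printed


def swap_words(value):
    """Swap the 16-bit halves of a 32-bit value; return it as signed."""
    u = value & 0xFFFFFFFF
    swapped = ((u & 0xFFFF) << 16) | (u >> 16)
    if swapped & 0x80000000:
        swapped -= 1 << 32
    return swapped


if __name__ == "__main__":
    chs_total = CYLINDERS * HEADS * SECTORS
    print("C*H*S            =", chs_total)
    print("matches LBAsects =", chs_total == LBA_SECTS)
    print("word-swapped     =", swap_words(LBA_SECTS))
    print("matches CurSects =", swap_words(LBA_SECTS) == CUR_SECTS_PRINTED)
```

If the guess is right, the drive's reported geometry is self-consistent and this field can be ignored while chasing the CRC errors.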
Re: 2.4 ate my filesystem on rw-mount
On Fri, 12 Jan 2001, Vojtech Pavlik wrote: Wow. Ok, I'm maintaining the 2.4.0 VIA driver, so I'd like to know more about this: 1) What's the ISA bridge revision? 00:00.0 Host bridge: VIA Technologies, Inc. VT8501 (rev 02) 00:01.0 PCI bridge: VIA Technologies, Inc. VT8501 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b) 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06) 00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 0e) 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20) 00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 [Apollo Super AC97/Audio] (rev 21) 00:0a.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine 10/100] (rev 06) 01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i7 (rev 5b) 2) What's in /proc/ide/via? It's not there since I disabled the VIA driver. 3) What says hdparm -i on your devices? /dev/hda: Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8 Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs } RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: sdma0 sdma1 sdma2 *mdma0 mdma1 mdma2 udma0 udma1 *udma2 4) If you mount your filesystem read-only, does it read garbage? Now here's a strange part, or possibly a crusial clue. When I booted a 2.4.0 kernel (from floppy using the excellent syslinux) with "ro init=/bin/sh", I could access the filesystem just fine. I could even remount the root filesystem rw, and there were no problems. But I did not write anything to the disk, since I was convinced that the problem was gone (this was the second try). 
After this I rebooted with ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh, booted up the RH7 2.2.16 kernel, and fsck was run with no errors. Now I though all was well, rebooted from floppy again, but without the init= part, and poof, it hang. More interesting may be that I had to turn the computer off and on again to get BIOS to find the hard drive. Repeated long reset button presses did not help. It is possible that it hung during BIOS hd detection - I wish I could remember. I suspect that I could have hung the drive with init=/bin/sh if I would have done some reading and writing to the device, besides ls. I think I can spend some more time today trying it out some more. I will also try your 3.11 driver, which seems to be an enormous cleanup. Btw, do you have a home page for the VIA driver? A CVS perhaps? If not, please consider using sourceforge or something similar. /Tobias - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4 ate my filesystem on rw-mount
On Sat, Jan 13, 2001 at 09:12:27AM +0100, Tobias Ringstrom wrote: Wow. Ok, I'm maintaining the 2.4.0 VIA driver, so I'd like to know more about this: 1) What's the ISA bridge revision? 00:00.0 Host bridge: VIA Technologies, Inc. VT8501 (rev 02) 00:01.0 PCI bridge: VIA Technologies, Inc. VT8501 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b) 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06) 00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 0e) 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20) 00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 [Apollo Super AC97/Audio] (rev 21) 00:0a.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine 10/100] (rev 06) 01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i7 (rev 5b) Ok, your IDE chip is a vt82c686a/ce. 2) What's in /proc/ide/via? It's not there since I disabled the VIA driver. Ok. Could you send me this file when you boot with fs r-o? 3) What says hdparm -i on your devices? /dev/hda: Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8 Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs } RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: sdma0 sdma1 sdma2 *mdma0 mdma1 mdma2 udma0 udma1 *udma2 Looks good, too. An UDMA33 drive. 4) If you mount your filesystem read-only, does it read garbage? Now here's a strange part, or possibly a crusial clue. When I booted a 2.4.0 kernel (from floppy using the excellent syslinux) with "ro init=/bin/sh", I could access the filesystem just fine. I could even remount the root filesystem rw, and there were no problems. 
But I did not write anything to the disk, since I was convinced that the problem was gone (this was the second try). After this I rebooted with ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh, booted up the RH7 2.2.16 kernel, and fsck was run with no errors. So far no problem. Rebooting with c-a-d with fs r-o is OK. Now I though all was well, rebooted from floppy again, but without the init= part, and poof, it hang. Where? It could be a different reason than IDE setup ... More interesting may be that I had to turn the computer off and on again to get BIOS to find the hard drive. Repeated long reset button presses did not help. It is possible that it hung during BIOS hd detection - I wish I could remember. I fear this isn't much of a clue, sorry. I suspect that I could have hung the drive with init=/bin/sh if I would have done some reading and writing to the device, besides ls. Please try it. Best mke2fs your swap partition and try reading writing to that. You can mkswap it back after you finish. I think I can spend some more time today trying it out some more. Please do. 'lspci -vvxxx' data for the case without a driver, with 2.4.0 driver and with 3.11 driver would help me find the problem. Make sure you *don't* have any hdparm -d1 or hdparm -X66 or similar stuff in your init scripts. I will also try your 3.11 driver, which seems to be an enormous cleanup. the 2.1e driver is an enormous cleanup of the original driver from the 2.2 kernels. the 3.11 is an enormous cleanup of 2.1e, yes. Btw, do you have a home page for the VIA driver? A CVS perhaps? If not, please consider using sourceforge or something similar. No, not yet, but working on that. -- Vojtech Pavlik SuSE Labs - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4 ate my filesystem on rw-mount
On Sat, 13 Jan 2001, Vojtech Pavlik wrote: On Sat, Jan 13, 2001 at 09:12:27AM +0100, Tobias Ringstrom wrote: 2) What's in /proc/ide/via? It's not there since I disabled the VIA driver. Ok. Could you send me this file when you boot with fs r-o? Ok, but this is with the wrong disc. Withe the bad disc, drive0 looks exacly like drive2, i.e. normal UDMA(33). Sorry about that. --VIA BusMastering IDE Configuration Driver Version: 2.1e South Bridge: VIA vt82c686a rev 0x1b Command register: 0x7 Latency timer: 32 PCI clock: 33MHz Master Read Cycle IRDY:0ws Master Write Cycle IRDY:0ws FIFO Output Data 1/2 Clock Advance: off BM IDE Status Register Read Retry: on Max DRDY Pulse Width: No limit ---Primary IDE---Secondary IDE-- Read DMA FIFO flush: on on End Sect. FIFO flush: on on Prefetch Buffer: on on Post Write Buffer: on on FIFO size: 8 8 Threshold Prim.: 1/2 1/2 Bytes Per Sector: 512 512 Both channels togth: yes yes ---drive0drive1drive2drive3- BMDMA enabled:yes yes yes yes Transfer Mode: UDMA DMA/PIO UDMA DMA/PIO Address Setup: 30ns 120ns 30ns 120ns Active Pulse:90ns 330ns 90ns 330ns Recovery Time: 30ns 270ns 30ns 270ns Cycle Time: 30ns 600ns 60ns 600ns Transfer Rate: 66.0MB/s 3.3MB/s 33.0MB/s 3.3MB/s 4) If you mount your filesystem read-only, does it read garbage? Now here's a strange part, or possibly a crusial clue. When I booted a 2.4.0 kernel (from floppy using the excellent syslinux) with "ro init=/bin/sh", I could access the filesystem just fine. I could even remount the root filesystem rw, and there were no problems. But I did not write anything to the disk, since I was convinced that the problem was gone (this was the second try). After this I rebooted with ctrl-alt-delete, forgetting how bad an idea that is with init=/bin/sh, booted up the RH7 2.2.16 kernel, and fsck was run with no errors. So far no problem. Rebooting with c-a-d with fs r-o is OK. Now I though all was well, rebooted from floppy again, but without the init= part, and poof, it hang. Where? 
It could be a different reason than IDE setup ... Don't think so. It happens on the "Remounting root read-write". More interesting may be that I had to turn the computer off and on again to get BIOS to find the hard drive. Repeated long reset button presses did not help. It is possible that it hung during BIOS hd detection - I wish I could remember. I fear this isn't much of a clue, sorry. The clue is that the VIA driver messed up either the chipset or the drive quite a lot, but maybe that is already obvious. I suspect that I could have hung the drive with init=/bin/sh if I would have done some reading and writing to the device, besides ls. Please try it. Best mke2fs your swap partition and try reading writing to that. You can mkswap it back after you finish. After more testing, I think I have isolated the problem to this disk, or at least this disk with this controller. With another (UDMA66) disk, there are no problems. Details at the end. I think I can spend some more time today trying it out some more. Please do. 'lspci -vvxxx' data for the case without a driver, with 2.4.0 driver and with 3.11 driver would help me find the problem. Ok, I'll do that later. Make sure you *don't* have any hdparm -d1 or hdparm -X66 or similar stuff in your init scripts. I'm sure I don't. This happens with a clean fresh RH7 installation. I will also try your 3.11 driver, which seems to be an enormous cleanup. the 2.1e driver is an enormous cleanup of the original driver from the 2.2 kernels. the 3.11 is an enormous cleanup of 2.1e, yes. I have not had a chance to try the 3.11 driver yet. Now for the new details. When writing to the disk with DMA enabled, I get the following errors, in two different machines. Both are VIA IDE machines. I is NOT a cable error. I have tries with several cables. Possibly a connector or soldering problem. I'll try the disk in more machines an get back with more info. I have to run now. 
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } hdc: dma_intr: error=0x84 { DriveStatusError BadCRC } /Tobias - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4 ate my filesystem on rw-mount, getting closer
I have now tried the SAMSUNG VG34323A disk with two other controllers at home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V motherboard), and there are no problems to be found with DMA enabled. It streams 10 MB/s without glitches.

However, writing to the SAMSUNG VG34323A disk with DMA enabled on either this machine [1] (at work, using the VIA IDE driver version 3.11)

  00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
  00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

or this machine [2] (at work, using the VIA IDE driver version 2.1e)

  00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
  00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

I get exactly the following errors on both machines

  hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
  hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }

no matter what cable I use. When I get this, the machine does not recover most of the time, and I have to reset or power cycle.

This disk works flawlessly on two other IDE controllers, so I do not think that the disk is completely broken. It must be either these chipsets or the driver in combination with this disk. Note that I _can_ use another UDMA66 disk _with_ DMA enabled on both machine [1] and [2] above without problems.

Also, 2.2.16-22 seems to work with DMA enabled on machine [1]. I have not tried 2.2.16-22 with DMA enabled on machine [2].

The problem I reported at first, hence the nasty subject, was a hang and a nasty fs corruption when RH7 tried to remount the root fs read-write. I examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and discovered, to my great disgust, that the stupid thing disables the dmesg output on the console very early in the script. It is thus entirely possible that I do get the above-mentioned errors when the computer seems to hang and my fs gets corrupted. I will fix the script tomorrow to see if my assumption is correct.

SUMMARY: I have a disk that, with DMA enabled, gives me CRC errors on two machines, but not on two other controllers, independent of the cable. Neither of the troublesome machines recovers from these errors. Linux 2.2.16-22 from RedHat works fine with DMA enabled on machine [1]; [2] is unknown.

I hope this makes things a lot clearer.

/Tobias

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
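For readers who want to see how the two hex values in the dma_intr messages above map to the names in braces, here is a minimal sketch in Python. The bit tables are written from the ATA status and error register layout and use the labels seen in the log; they are an illustration, not a copy of the driver source, and the function name decode is made up for this example.

```python
# Sketch: decode IDE/ATA status and error register values into flag names.
# The tables are assumptions based on the ATA register layout; the labels
# mimic the ones shown in the dma_intr log lines.

STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Listed in the order the log prints them, not in bit order.
ERROR_BITS = [
    (0x04, "DriveStatusError"),    # command aborted
    (0x80, "BadCRC"),              # interface CRC error on Ultra DMA transfers
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all flags in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
    print(decode(0x84, ERROR_BITS))   # ['DriveStatusError', 'BadCRC']
```

Read this way, status=0x51 says the drive is ready but flagged an error, and error=0x84 says the command was aborted because of a CRC error on the interface, which points at the transfer between controller and drive rather than at the platters.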
Re: 2.4 ate my filesystem on rw-mount
On Fri, Jan 12, 2001 at 12:23:21PM -0500, Martin Laberge wrote:

> > > This is on a 450 MHz AMD-K6 with the following IDE controller:
> > > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> >
> > There are several people who have reported that the 2.4.0 VIA IDE driver
> > trashes hard disks like that. The 2.2 one also did this sometimes but only
> > with specific chipset versions and if you have dma autotune on (that's why
> > currently 2.2 refuses to do tuning on VP3)
>
> I had exactly the same problem with my K6-350 and IDE VT82C586a
> on a kernel 2.2.16. I just ran hdparm to enable DMA and poof,
> lost all data; a reinstall from scratch was necessary.

Is this problem still present with 2.4.0? Well, you don't need to kill your data to test this - make sure the kernel is mounting the filesystems read only in the test. DMA will probably be enabled automatically for your drives.

--
Vojtech Pavlik
SuSE Labs
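One way to confirm that nothing is mounted writable before running such a test is to look at the mount table. The following is a minimal sketch in Python, assuming the usual /proc/mounts line format (device, mount point, type, options, ...). The helper name writable_disk_mounts and the sample text are invented for illustration; on a real system the text would be read from /proc/mounts.

```python
# Sketch: list disk-backed filesystems that are currently mounted read-write.
# Assumes /proc/mounts-style lines: device mountpoint fstype options dump pass


def writable_disk_mounts(mounts_text):
    """Return mount points of /dev/* devices whose options include 'rw'."""
    writable = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fs_type, options = fields[:4]
        if not device.startswith("/dev/"):
            continue  # ignore virtual filesystems such as proc
        if "rw" in options.split(","):
            writable.append(mount_point)
    return writable


# Hypothetical sample; replace with open("/proc/mounts").read() on a real box.
SAMPLE = """\
/dev/hda1 / ext2 ro 0 0
/dev/hdc1 /data ext2 rw 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    print(writable_disk_mounts(SAMPLE))  # ['/data']
```

An empty result means no disk filesystem is mounted read-write, so a DMA read test should not be able to write corrupted data back to disk.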
Re: 2.4 ate my filesystem on rw-mount
On Fri, Jan 12, 2001 at 10:15:45AM +0100, Tobias Ringstrom wrote:

> I've never seen anything like it before, which I'm happy for. The system
> had been running a standard RedHat 7 kernel for days without any problems,
> but who wants to run a 2.2 kernel? I compiled 2.4.0 for it, rebooted, and
> blam! The RedHat init scripts got to the "remounting root read-write"
> point, and just froze solid.
>
> Rebooting into RH7 failed, because inittab could not be found. In fact
> the filesystem was completely messed up, with /dev empty, lots of device
> nodes in /etc, and files missing all over the place. I had to reinstall
> RH7 from scratch.
>
> I do not understand how this could happen while remounting root rw.
> Is the filesystem really that unstable?
>
> Am I right in suspecting DMA, which was enabled at the time? Any other
> ideas? Is it a known problem?
>
> This is on a 450 MHz AMD-K6 with the following IDE controller:
>
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
>
> [I know this is not a very good trouble report, but it will have to do for
> the time being. I hope to do more testing at a later time.]
>
> /Tobias
>
> PS. This is _not_ the same system that I reported IDE busy errors for.

Wow. Ok, I'm maintaining the 2.4.0 VIA driver, so I'd like to know more about this:

1) What's the ISA bridge revision?
2) What's in /proc/ide/via?
3) What does hdparm -i say on your devices?
4) If you mount your filesystem read-only, does it read garbage?

Thanks.

--
Vojtech Pavlik
SuSE Labs
Re: 2.4 ate my filesystem on rw-mount
Alan Cox wrote:

> > This is on a 450 MHz AMD-K6 with the following IDE controller:
> > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
>
> There are several people who have reported that the 2.4.0 VIA IDE driver
> trashes hard disks like that. The 2.2 one also did this sometimes but only
> with specific chipset versions and if you have dma autotune on (that's why
> currently 2.2 refuses to do tuning on VP3)

I had exactly the same problem with my K6-350 and IDE VT82C586a on a kernel 2.2.16. I just ran hdparm to enable DMA and poof, lost all data; a reinstall from scratch was necessary.

Martin Laberge
[EMAIL PROTECTED]
Re: 2.4 ate my filesystem on rw-mount
> This is on a 450 MHz AMD-K6 with the following IDE controller:
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

There are several people who have reported that the 2.4.0 VIA IDE driver trashes hard disks like that. The 2.2 one also did this sometimes but only with specific chipset versions and if you have dma autotune on (that's why currently 2.2 refuses to do tuning on VP3)
2.4 ate my filesystem on rw-mount
I've never seen anything like it before, which I'm happy for. The system had been running a standard RedHat 7 kernel for days without any problems, but who wants to run a 2.2 kernel? I compiled 2.4.0 for it, rebooted, and blam! The RedHat init scripts got to the "remounting root read-write" point, and just froze solid.

Rebooting into RH7 failed, because inittab could not be found. In fact the filesystem was completely messed up, with /dev empty, lots of device nodes in /etc, and files missing all over the place. I had to reinstall RH7 from scratch.

I do not understand how this could happen while remounting root rw. Is the filesystem really that unstable?

Am I right in suspecting DMA, which was enabled at the time? Any other ideas? Is it a known problem?

This is on a 450 MHz AMD-K6 with the following IDE controller:

  00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

[I know this is not a very good trouble report, but it will have to do for the time being. I hope to do more testing at a later time.]

/Tobias

PS. This is _not_ the same system that I reported IDE busy errors for.