Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Vojtech Pavlik

On Sun, Jan 14, 2001 at 06:59:57PM +0100, Tobias Ringstrom wrote:
> 
> I should also add that the 3.11 driver seems to make things better, but
> not yet perfect.  My intuition tells me that I get CRC errors much sooner
> with 2.1e than with 3.11.
> 
> Have the timings changed from 2.1e to 3.11, and would it be easy to modify
> 3.11 to use extra-safe/paranoid, lower-performance timings?

If you use 'idebus=40' or 'idebus=50', the driver will add extra margin
to the timings: it will be tricked into thinking it is working with a 40
or 50 MHz PCI bus, and will compensate for that.
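Here is a rough model of why a larger idebus= value buys margin. It is a minimal sketch that assumes the driver turns each nanosecond timing into a whole number of PCI clocks, rounding up at the bus speed it has been told about. The function names and the 120 ns example are illustrative assumptions, not code from the VIA driver.

```python
# Hypothetical model of converting a nanosecond timing into PCI clocks.

def clocks_for(t_ns: int, assumed_mhz: int) -> int:
    """Whole clocks needed to cover t_ns at the bus speed the driver assumes (rounded up)."""
    return (t_ns * assumed_mhz + 999) // 1000  # integer ceiling of t_ns * MHz / 1000


def real_ns(clocks: int, actual_mhz: float = 33.0) -> float:
    """How long those clocks really last on the physical bus."""
    return clocks * 1000 / actual_mhz


if __name__ == "__main__":
    for idebus in (33, 40, 50):
        c = clocks_for(120, idebus)
        print(f"idebus={idebus}: {c} clocks -> {real_ns(c):.1f} ns on a 33 MHz bus")
```

Under these assumptions, a nominal 120 ns timing becomes 4 clocks (about 121 ns) at idebus=33. At idebus=50 it becomes 6 clocks, about 182 ns on the real 33 MHz bus.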

This could add a data point, yes.

> Some extra data:
> * B seems to work in 2 with udma2
> * A seems to work in 2 with udma1, but not with udma2.

UDMA0 is 16.7 MB/sec, UDMA1 is 25.0 and UDMA2 is 33.3.
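These nominal figures follow from the minimum UDMA two-cycle times in the ATA/ATAPI specification: 240, 160 and 120 ns for modes 0, 1 and 2. One 16-bit word moves on each strobe edge, so 4 bytes move per cycle, and mode 1 works out to 25.0 MB/sec. A small sketch of the arithmetic:

```python
# Minimum UDMA two-cycle times in ns, per the ATA/ATAPI specification (modes 0..4).
UDMA_CYCLE_NS = {0: 240, 1: 160, 2: 120, 3: 90, 4: 60}

BYTES_PER_CYCLE = 4  # one 16-bit word on each strobe edge


def udma_mb_per_s(mode: int) -> float:
    """Peak burst rate in MB/s (10**6 bytes per second) for a UDMA mode."""
    return BYTES_PER_CYCLE * 1000 / UDMA_CYCLE_NS[mode]


if __name__ == "__main__":
    for mode in sorted(UDMA_CYCLE_NS):
        print(f"UDMA{mode}: {udma_mb_per_s(mode):.1f} MB/s")
```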

Could you (if you didn't already) send me the lspci -vvxxx output after the
-X65 (UDMA1) command, together with the one from before it? That could also
tell us something.

> I wouldn't say it's rock solid, and I would not trust my data to any of
> these combinations, but at least it does not break immediately (i.e. for less
> than 1 GB written).

Actually, the CRC messages are harmless and only mean that a data transfer
was retried, as long as it doesn't fail every time. They happen on many
boards and drives using UDMA, even in normal, correct operation :(
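The retry behaviour can be modelled in a few lines. This is a minimal sketch that assumes a hypothetical `transfer` callable and a `CRCError` exception standing in for a BadCRC status. The retry limit is illustrative, not the IDE layer's actual value.

```python
from typing import Callable, Tuple


class CRCError(Exception):
    """Hypothetical stand-in for a transfer that ended with BadCRC."""


def transfer_with_retry(transfer: Callable[[], bytes], max_retries: int = 3) -> Tuple[bytes, int]:
    """Call `transfer`, repeating it after each CRC error.

    Returns the data plus the number of CRC errors seen; re-raises the last
    CRCError once more than `max_retries` retries have failed.
    """
    errors = 0
    while True:
        try:
            return transfer(), errors
        except CRCError:
            errors += 1
            if errors > max_retries:
                raise


def flaky(fail_times: int) -> Callable[[], bytes]:
    """Build a fake transfer that fails `fail_times` times, then succeeds."""
    state = {"left": fail_times}

    def run() -> bytes:
        if state["left"] > 0:
            state["left"] -= 1
            raise CRCError("BadCRC")
        return b"sector data"

    return run


if __name__ == "__main__":
    print(transfer_with_retry(flaky(1)))  # one logged CRC error, data still arrives
```

A single logged error followed by success corresponds to the "harmless" case. Only a transfer that fails on every attempt surfaces as a real error.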

> The worst combination is 2.4.0 with VIA 2.1e and A in 1.  Going from 2.1e
> to 3.11 helps, but it is still very bad.
> 
> I'd really like to be more precise, but there are too many combinations
> to try them all, and sometimes it fails right away, and sometimes
> after several hundred megabytes.

If 'fails after several hundred megabytes' only means a single CRC error
that is recovered from correctly, then it actually means 'working, and
would probably work perfectly with a shorter cable'.

-- 
Vojtech Pavlik
SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Tobias Ringstrom

I should also add that the 3.11 driver seems to make things better, but
not yet perfect.  My intuition tells me that I get CRC errors much sooner
with 2.1e than with 3.11.

Have the timings changed from 2.1e to 3.11, and would it be easy to modify
3.11 to use extra-safe/paranoid, lower-performance timings?

Some extra data:
* B seems to work in 2 with udma2
* A seems to work in 2 with udma1, but not with udma2.

I wouldn't say it's rock solid, and I would not trust my data to any of
these combinations, but at least it does not break immediately (i.e. for less
than 1 GB written).

The worst combination is 2.4.0 with VIA 2.1e and A in 1.  Going from 2.1e
to 3.11 helps, but it is still very bad.

I'd really like to be more precise, but there are too many combinations
to try them all, and sometimes it fails right away, and sometimes
after several hundred megabytes.

/Tobias




Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Tobias Ringstrom

On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> > > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > > both on the Promise and on the 686a? But doesn't work on the 686a in
> > > your other board?
> >
> > Yes, on both the Promise and on the 686a.  But the device revisions are
> > different.  The machine that does NOT work:
> >
> > 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> >
> > The machine that works:
> >
> > 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> > 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> >
> > The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> > Pentium-III.

Of course it isn't.  The vt82c686 that does not work is a 450 MHz K-6, not
a PIII.

> > > > no matter what cable I use.  When I get this, the machine does not recover
> > > > most of the time, and I have to reset or power cycle.
> > >
> > > It should be able to recover in a couple (up to 10) minutes ...
> >
> > Who waits 10 minutes for a timeout?  Can it be lowered?
>
> It's not a 10 minute timeout, it's a shorter timeout retried many times.
> Not my code, though - this is generic PCI IDE code, and is a huge mess.

What I get is a number of Busy and "Drive is not ready for command" errors
for different sectors.

> > Expect another mail with the data you requested within a couple of hours.
>
> Thanks a lot.

OK, it took a bit longer than that, mostly because my wife and I had
unexpected (but very welcome) guests at home.  It is Sunday, after all...

I have attached a tar file with "lspci -vvxxx" and "hdinfo -i" output for
machines 1 and 2 to this mail, but first some comments.

I will be talking about three machines:

1) 450 MHz K-6 on an AOpen MX59 PRO II motherboard
2) 800 MHz PIII on an unknown cheap/crappy motherboard.
3) 1 GHz Athlon on an ASUS A7V motherboard.

and the following drives:

A) SAMSUNG VG34323A, sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2
B) ST38421A, mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4

Machine 3 is the machine at home. It has had no problems with any disk I
have tried so far, and seems very stable with both ATA100 and ATA66.

I verified that what happens when RH7 tries to remount / read-write is
that I get the infamous CRC errors.  It does not seem to recover from this
state, at least not within the time I waited.

I do not think that the RH7 kernel 2.2.16-22 uses udma2 at any time, and
that may be why it works.

Disk B does NOT work with DMA enabled on machine 1 or 2.  It works
better than disk A, but it still fails after some time.  The
combination 1B was the most stable, and only failed once.

When using disk B, the computer has managed to recover from the CRC error
condition every time, as opposed to disk A, which never recovers (Busy).

Using hdparm -X65 (udma1) makes disk A work with 2.4 in machine 2.  What
is the difference between udma1 and udma2?
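The number given to hdparm -X is the ATA SET FEATURES transfer-mode value. It is 8+n for PIO mode n (with flow control), 16+n for single-word DMA, 32+n for multiword DMA and 64+n for UDMA. So -X65 selects udma1 and -X66 selects udma2. Below is a small sketch of that encoding. The helper names are hypothetical, and the special values 0 and 1 (default PIO) are deliberately not handled.

```python
# Transfer-mode values used by the ATA SET FEATURES "set transfer mode"
# subcommand; the numeric form of hdparm -X uses the same encoding.
MODE_BASE = {"pio": 8, "sdma": 16, "mdma": 32, "udma": 64}


def xfer_value(mode: str) -> int:
    """Turn a name such as 'udma1' into its numeric value (65)."""
    for prefix, base in MODE_BASE.items():
        if mode.startswith(prefix):
            n = int(mode[len(prefix):])
            if not 0 <= n <= 7:  # the mode number lives in the low 3 bits
                raise ValueError(f"mode number out of range: {mode!r}")
            return base + n
    raise ValueError(f"unknown transfer mode: {mode!r}")


def xfer_name(value: int) -> str:
    """Inverse of xfer_value for the four families above."""
    for prefix, base in MODE_BASE.items():
        if base <= value < base + 8:
            return f"{prefix}{value - base}"
    raise ValueError(f"no named mode for value {value}")


if __name__ == "__main__":
    print(xfer_value("udma1"), xfer_value("udma2"))  # 65 66
```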

Now I'm almost completely lost.  Hope this helps.  Let me know if you want
me to try something else.

/Tobias




/dev/hde:

 Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
 CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2 


00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
	Subsystem: Asustek Computer, Inc.: Unknown device 8033
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
	Latency: 0
	Region 0: Memory at e000 (32-bit, prefetchable) [size=128M]
	Capabilities: [a0] AGP version 2.0
		Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=none
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 6b b4 4f 81 10 10 80 00 08 10 10 10 10 10
60: 03 ff 00 b0 e6 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 01
80: 0f 40 00 00 80 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 17 02 00 1f 00 00 00 00 6e 02 14 00
b0: 61 ec 80 e5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Vojtech Pavlik

On Sun, Jan 14, 2001 at 09:45:09AM +0100, Tobias Ringstrom wrote:
> On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> > On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
> >
> > > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > > motherboard), and there are no problems to be found with DMA enabled.
> > > Streaming 10 MB/s without glitches.
> >
> > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > both on the Promise and on the 686a? But doesn't work on the 686a in
> > your other board?
> 
> Yes, on both the Promise and on the 686a.  But the device revisions are
> different.  The machine that does NOT work:
> 
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> 
> The machine that works:
> 
> 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> 
> The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> Pentium-III.
> 
> > > no matter what cable I use.  When I get this, the machine does not recover
> > > most of the time, and I have to reset or power cycle.
> >
> > It should be able to recover in a couple (up to 10) minutes ...
> 
> Who waits 10 minutes for a timeout?  Can it be lowered?

It's not a 10 minute timeout, it's a shorter timeout retried many times.
Not my code, though - this is generic PCI IDE code, and is a huge mess.

> Expect another mail with the data you requested within a couple of hours.

Thanks a lot.

-- 
Vojtech Pavlik
SuSE Labs



Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Tobias Ringstrom

On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
>
> > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > motherboard), and there are no problems to be found with DMA enabled.
> > Streaming 10 MB/s without glitches.
>
> So the drive *did* work on the vt82c686a in the A7V board? You tested it
> both on the Promise and on the 686a? But doesn't work on the 686a in
> your other board?

Yes, on both the Promise and on the 686a.  But the device revisions are
different.  The machine that does NOT work:

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

The machine that works:

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

The one that works is a 1 GHz Athlon, and the other is an 800 MHz
Pentium-III.

> > no matter what cable I use.  When I get this, the machine does not recover
> > most of the time, and I have to reset or power cycle.
>
> It should be able to recover in a couple (up to 10) minutes ...

Who waits 10 minutes for a timeout?  Can it be lowered?

Expect another mail with the data you requested within a couple of hours.

/Tobias




Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-14 Vojtech Pavlik

On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:

> I have now tried the SAMSUNG VG34323A disk with two other controllers at
> home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> motherboard), and there are no problems to be found with DMA enabled.
> Streaming 10 MB/s without glitches.

So the drive *did* work on the vt82c686a in the A7V board? You tested it
both on the Promise and on the 686a? But doesn't work on the 686a in
your other board?

> However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
> this machine [1] (at work, using the VIA IDE driver version 3.11)
> 
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> 
> or this machine [2] (at work, using the VIA IDE driver version 2.1e)
> 
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

What's the manufacturer/model of these boards? Just for the record ...
What's the PCI bus speed? Or memory speed?

> I get exactly the following errors on both machines
> 
> hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
> 
> no matter what cable I use.  When I get this, the machine does not recover
> most of the time, and I have to reset or power cycle.

It should be able to recover in a couple (up to 10) minutes ...

> This disc works
> flawlessly on two other IDE controllers, so I do not think that the disk
> is completely broken. It must be either these chipsets or the driver in
> combination with this disk.  Note that I _can_ use another UDMA66 disk
> _with_ DMA enabled on both machine [1] and [2] above without problems.
> Also, 2.2.16-22 seems to work with DMA enabled on machine [1].  I have not
> tried 2.2.16-22 with DMA enabled on machine [2].
> 
> The problem I reported at first, hence the nasty subject, was a hang and a
> nasty fs corruption when RH7 tried to remount the root fs read-write.  I
> examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
> discovered, to my great disgust, that the stupid thing disables the dmesg
> output on the console very early in the script.  It is thus entirely
> possible that I do get the above mentioned errors when the computer seems
> to hang, and my fs gets corrupted.  I will fix the script tomorrow to see
> if my assumption is correct.
> 
> SUMMARY:  I have a disk that, with DMA enabled, gives me CRC errors on two
> machines but not on two others, independent of the cable.  Neither of the
> troubling machines recovers from these errors.  Linux 2.2.16-22 from RedHat
> works fine with DMA enabled on machine [1]; [2] is unknown.
> 
> I hope this makes things a lot clearer.

Yes, indeed it's much clearer now. To fix the bug, or at least to be able
to track it down more closely, I'll need 'lspci -vvxxx' of the IDE PCI
device in the following cases:

1) SAMSUNG VG34323A on VT82C596b/cf with RH 2.2.16-22 and DMA (working)
2) SAMSUNG VG34323A on VT82C686a/ce with RH 2.2.16-22 and DMA (working)
3) SAMSUNG VG34323A on VT82C596b/cf with 2.4.0+via3.11 and DMA,
(doesn't work, so fs readonly)
4) SAMSUNG VG34323A on VT82C686a/ce with 2.4.0+via3.11 and DMA,
(doesn't work, so fs readonly)
5) The other drive on VT82C596b/cf with 2.4.0+via3.11 and DMA (working)
6) The other drive on VT82C686a/ce with 2.4.0+via3.11 and DMA (working)

With these data I should be able to find out what's different between
the working and not working setups ...
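Finding what differs between the working and failing setups mostly means lining up the raw config-space bytes from each lspci -vvxxx dump. This is a minimal sketch that assumes each dump was saved as plain text. The helper names are hypothetical, and the regular expression only targets the `NN: xx xx ...` hex lines, skipping the decoded -vv lines.

```python
import re
from typing import Dict, List, Tuple

# One hex line of "lspci -xxx" output: an offset, a colon, then space-separated bytes.
HEX_LINE = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)$")


def parse_config(dump: str) -> Dict[int, int]:
    """Map config-space offset -> byte value, ignoring the decoded -vv lines."""
    space: Dict[int, int] = {}
    for line in dump.splitlines():
        m = HEX_LINE.match(line.strip())
        if not m:
            continue
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            space[base + i] = int(byte, 16)
    return space


def diff_config(a: str, b: str) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every offset where the dumps differ."""
    ca, cb = parse_config(a), parse_config(b)
    common = sorted(ca.keys() & cb.keys())
    return [(off, ca[off], cb[off]) for off in common if ca[off] != cb[off]]
```

For example, `diff_config(open("working.txt").read(), open("failing.txt").read())` lists the differing offsets (the file names are placeholders). Offsets from 0x40 upward are the chipset-specific registers. Which of them hold the IDE timings is not stated in this thread.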



My current theory: in UDMA, when reading, the drive provides the clock,
so the IDE controller can read everything OK. When writing, the
controller provides the clock, and for some reason the Samsung can't keep
up with the setting the driver selects for it. The questions are why, and
why the driver selects an incorrect (or just too tight?) value.

-- 
Vojtech Pavlik
SuSE Labs



Re: 2.4 ate my filesystem on rw-mount, getting closer

2001-01-13 Thread Tobias Ringstrom

I have now tried the SAMSUNG VG34323A disk with two other controllers at
home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
motherboard), and there are no problems to be found with DMA enabled.
It streams 10 MB/s without glitches.

However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
this machine [1] (at work, using the VIA IDE driver version 3.11)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

or this machine [2] (at work, using the VIA IDE driver version 2.1e)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

I get exactly the following errors on both machines, no matter what
cable I use:

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }

When I get this, the machine does not recover most of the time, and I
have to reset or power cycle it.  This disk works flawlessly on two other
IDE controllers, so I do not think that the disk is completely broken.
It must be either these chipsets, or the driver in combination with this
disk.  Note that I _can_ use another UDMA66 disk _with_ DMA enabled on
both machines [1] and [2] above without problems.  Also, 2.2.16-22 seems
to work with DMA enabled on machine [1].  I have not tried 2.2.16-22 with
DMA enabled on machine [2].
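The two log lines can be decoded bit by bit. Status 0x51 is DRDY (0x40), DSC (0x10) and ERR (0x01); error 0x84 is ABRT (0x04) plus bit 7 (0x80), which during an Ultra DMA transfer signals an interface CRC error. The sketch below mimics the kernel's wording but is not the kernel's own decoding code; reporting bit 7 as BadCRC only when ABRT is also set is an assumption modeled on the messages above.

```python
# ATA status register bits (BSY = 0x80 is handled separately).
STATUS_BITS = [
    (0x40, "DriveReady"),      # DRDY
    (0x20, "DeviceFault"),     # DF
    (0x10, "SeekComplete"),    # DSC
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR
    (0x02, "Index"),           # IDX
    (0x01, "Error"),           # ERR: details are in the error register
]


def decode_status(stat: int) -> list[str]:
    if stat & 0x80:  # BSY: the remaining bits are not valid
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if stat & bit]


def decode_error(err: int) -> list[str]:
    names = []
    if err & 0x04:  # ABRT: command aborted
        names.append("DriveStatusError")
    if err & 0x80:  # bit 7: interface CRC (assumed when ABRT set) or bad block
        names.append("BadCRC" if err & 0x04 else "BadSector")
    if err & 0x40:
        names.append("UncorrectableError")
    if err & 0x10:
        names.append("SectorIdNotFound")
    if err & 0x02:
        names.append("TrackZeroNotFound")
    if err & 0x01:
        names.append("AddrMarkNotFound")
    return names


print("status=0x51 {", " ".join(decode_status(0x51)), "}")
print("error=0x84 {", " ".join(decode_error(0x84)), "}")
```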

The problem I reported at first, hence the nasty subject, was a hang and a
nasty fs corruption when RH7 tried to remount the root fs read-write.  I
examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
discovered, to my great disgust, that the stupid thing disables kernel
message (dmesg) output on the console very early in the script.  It is
thus entirely possible that I do get the above-mentioned errors when the
computer seems to hang and my fs gets corrupted.  I will fix the script
tomorrow to see if my assumption is correct.
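Whether those messages reach the console at all depends on the console log level: a message is printed there only if its level is numerically lower than console_loglevel, the first of the four numbers in /proc/sys/kernel/printk, and a command such as `dmesg -n 1` lowers that to 1. The sketch below assumes the IDE errors are logged at the default message level of 4 (KERN_WARNING) and that "7 4 1 7" is the stock setting; both values and the helper names are assumptions, not taken from the driver or from rc.sysinit.

```python
LEVEL_NAMES = ("console_loglevel", "default_message_loglevel",
               "minimum_console_loglevel", "default_console_loglevel")
KERN_WARNING = 4  # assumed level of the hdc: dma_intr messages


def parse_printk_levels(text: str) -> dict[str, int]:
    """Parse the four integers found in /proc/sys/kernel/printk."""
    return dict(zip(LEVEL_NAMES, (int(v) for v in text.split())))


def reaches_console(msg_level: int, console_loglevel: int) -> bool:
    """A message goes to the console only if it is more urgent
    (numerically lower) than the console log level."""
    return msg_level < console_loglevel


for setting in ("7\t4\t1\t7", "1\t4\t1\t7"):  # stock vs. after `dmesg -n 1`
    level = parse_printk_levels(setting)["console_loglevel"]
    print(f"console_loglevel={level}: IDE errors on console? "
          f"{reaches_console(KERN_WARNING, level)}")
```

On a running system, `parse_printk_levels(open("/proc/sys/kernel/printk").read())` shows the current setting.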

SUMMARY:  I have a disk that, with DMA enabled, gives me CRC errors on two
machines but not on two others, regardless of the cable.  Neither of the
troubling machines recovers from these errors.  Linux 2.2.16-22 from
RedHat works fine with DMA enabled on machine [1]; [2] is untested.

I hope this makes things a lot clearer.

/Tobias
