Re: PDC20265, disk corruption and NMI watchdog...

2001-01-28 Thread Petr Vandrovec

On Mon, Jan 28, 2001 at 17:36 Andre Hedrick wrote:
> 
> Everything but a kernel version :-(

Sorry. The corruption has been happening since I got this motherboard
just before Christmas, starting somewhere around 2.4.0-test13-pre2. It
was not as massive then as it was today with 2.4.0-ac10 (before, the
system just died; today it damaged hdh in addition). I got
corruption/lockups with other versions too (test13-pre3, test13-pre7,
2.4.0, 2.4.0-ac3, 2.4.0-ac8, 2.4.0-ac9), but as this one (ac10) runs
the NMI watchdog on UP, I finally found what is wrong instead of getting
a silent death. That is why I finally wrote this letter: so you can
either __sti() or use mdelay() in ide_delay_50ms, and others will not
suffer a silent death instead of an IDE reset...

Then I can test whether the Promise recovers from this. I doubt it, as
it looks like some data for /dev/hde landed on /dev/hdh (impossible,
I know...).

> On Mon, 29 Jan 2001, Petr Vandrovec wrote:
> 
> >   why on Earth does ide_delay_50ms use jiffies instead of mdelay(50) ?!
> > It is invoked with interrupts disabled, which triggers the NMI watchdog
> > on my system and leads to a complete crash of the system.

For now I have set both disks to PIO4 and it looks stable...
But there is a visible difference in speed between PIO4 and UDMA5 ;-)
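
For reference, the mode numbers passed to hdparm -X follow a simple
offset scheme (8 + n for PIO mode n, 32 + n for multiword DMA mode n,
64 + n for UltraDMA mode n), so the '-X 65' quoted in this thread is
UDMA1 and PIO4 would be 12. A small sketch in C; the macro and helper
names are made up for illustration and are not part of hdparm or the
kernel:

```c
#include <stddef.h>

/* Base offsets of the hdparm -X transfer-mode numbering. */
#define XFER_BASE_PIO    8u   /*  8 + n : PIO mode n           */
#define XFER_BASE_SWDMA 16u   /* 16 + n : single-word DMA n    */
#define XFER_BASE_MWDMA 32u   /* 32 + n : multiword DMA mode n */
#define XFER_BASE_UDMA  64u   /* 64 + n : UltraDMA mode n      */

/*
 * Hypothetical helper: split an -X value into a class name and a mode
 * number. The low three bits carry the mode, the upper bits the class.
 * Returns NULL for values that do not belong to one of the four classes
 * (for example 0 and 1, which hdparm treats specially).
 */
static const char *xfer_decode(unsigned value, int *mode)
{
    *mode = (int)(value & 0x07u);
    switch (value >> 3) {
    case 1:  return "PIO";
    case 2:  return "SWDMA";
    case 4:  return "MWDMA";
    case 8:  return "UDMA";
    default: return NULL;
    }
}
```

Calling xfer_decode(65, &mode) yields the class "UDMA" with mode 1, and
XFER_BASE_PIO + 4 gives the value that selects PIO4.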

I have no idea whether the TOSHIBA can do UDMA CRCs, but it works fine
at work, where I connected it to an i440BX, and since November I have
been connecting it to a VIA694X/686A. Both in UDMA2 mode without trouble.
Best regards,
Petr Vandrovec
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PDC20265, disk corruption and NMI watchdog...

2001-01-28 Thread Andre Hedrick


Everything but a kernel version :-(

On Mon, 29 Jan 2001, Petr Vandrovec wrote:

> Short story:
> 
> Hi Andre,
>   why on Earth does ide_delay_50ms use jiffies instead of mdelay(50) ?!
> It is invoked with interrupts disabled, which triggers the NMI watchdog
> on my system and leads to a complete crash of the system.
> 
> Long story:
> 
> At home I have an Asus A7V motherboard with a 1G Athlon, and onboard
> PDC20265 and VIA KT133. Only the CDROM is connected to the VIA (as I was
> not able to get any ATAPI device working with the Promise under Linux).
> On the primary master of the PDC (hde) there is an IBM-DTLA-307045,
> happily running in UDMA5 mode. As secondary slave (hdh) there is a
> removable hdd, TOSHIBA MK6409MAV - I use this hdd to transport data
> between work, home and grandparents.
> 
> Every weekend I bring Debian packages home on this hdd, as downloading
> a couple of MBs each weekend is not acceptable on a dialup connection.
> But data are never copied correctly from one hdd (hdh) to the other
> (hde); hdh runs in UDMA2, hde in UDMA5. There are always 4 different
> bytes in a couple of files. The corrupted bytes always start on a word
> boundary, but only sometimes on a dword boundary. Data are never
> moved, they are just random bytes... It looks like the
> problem is with the removable HDD (source), not with the UDMA5 destination.
> 
> So I decided to run 'hdparm -d1 -X 65 /dev/hdh' to switch it to UDMA1.
> I was rewarded with (4 times):
> 
> hde: dma_intr: bad DMA status
> hde: dma_intr: status = 0x50 { DriveReady SeekComplete }
> 
> and
> 
> hde: DMA disabled
> NMI watchdog detected lockup on CPU0, registers:
> ...
> The dump says that ide_delay_50ms was invoked with interrupts disabled
> in the swapper task, through pdc202xx_reset -> do_reset1 -> ide_do_reset ->
> ide_error -> ide_dma_intr -> ide_intr -> handle_IRQ_event -> do_IRQ ->
> (interrupt) -> default_idle -> cpu_idle.
> 
> I have no idea why it complained about hde when I hdparm-ed hdh...
> So I rebooted. After reboot fsck of hde* passed OK, but on hdh* it
> was not able to find the debian tree: the directory tree was cut into
> about 7 parts, which were reconnected to /lost+found. Probably something
> took a deep look at some inodes, as fsck found about 1000 errors. That is
> too many for a filesystem which could be modified only by atime changes...
> 
> So my questions are:
> Is it OK to use the pdc202xx driver at all? It does not complain about
>   UDMA CRC errors, but data are (always) corrupted.
> Should I go back to the UDMA66 VIA instead of the UDMA100 Promise?
> Should I rejumper my removable HDD to be master, and not slave?
>   Thanks,
>   Petr Vandrovec
>   [EMAIL PROTECTED]  

Andre Hedrick
Linux ATA Development




PDC20265, disk corruption and NMI watchdog...

2001-01-28 Thread Petr Vandrovec

Short story:

Hi Andre,
  why on Earth does ide_delay_50ms use jiffies instead of mdelay(50) ?!
It is invoked with interrupts disabled, which triggers the NMI watchdog
on my system and leads to a complete crash of the system.
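
To illustrate the point: on a uniprocessor box with interrupts disabled
the timer interrupt cannot run, so jiffies stops advancing and a loop
that waits for jiffies to change never exits, while a calibrated busy
loop in the style of mdelay() only burns CPU time and still finishes.
The following is a toy userspace model in C, not the kernel code;
toy_cpu, CYCLES_PER_TICK and the watchdog budget are all invented for
illustration:

```c
#include <stdbool.h>

/* Assumed ratio: simulated CPU cycles per timer tick. */
#define CYCLES_PER_TICK 1000UL

/* Toy CPU: jiffies is advanced only by the (simulated) timer interrupt. */
struct toy_cpu {
    bool irqs_enabled;
    unsigned long jiffies;
    unsigned long cycles;
};

/* Execute one cycle; deliver a timer tick only if interrupts are enabled. */
static void toy_step(struct toy_cpu *cpu)
{
    cpu->cycles++;
    if (cpu->irqs_enabled && cpu->cycles % CYCLES_PER_TICK == 0)
        cpu->jiffies++;
}

/*
 * Wait by polling jiffies. Returns false if the simulated watchdog
 * budget (in cycles) runs out before the deadline is reached.
 */
static bool wait_on_jiffies(struct toy_cpu *cpu, unsigned long ticks,
                            unsigned long watchdog_cycles)
{
    unsigned long deadline = cpu->jiffies + ticks;
    unsigned long start = cpu->cycles;

    while (cpu->jiffies < deadline) {
        if (cpu->cycles - start >= watchdog_cycles)
            return false;           /* the watchdog would fire here */
        toy_step(cpu);
    }
    return true;
}

/* Wait by counting cycles, independent of whether interrupts are enabled. */
static bool wait_on_cycles(struct toy_cpu *cpu, unsigned long ticks,
                           unsigned long watchdog_cycles)
{
    unsigned long target = ticks * CYCLES_PER_TICK;

    for (unsigned long i = 0; i < target; i++) {
        if (i >= watchdog_cycles)
            return false;
        toy_step(cpu);
    }
    return true;
}
```

With irqs_enabled set to false, wait_on_jiffies(&cpu, 5, 500000) gives
up when the simulated watchdog budget runs out, whereas
wait_on_cycles(&cpu, 5, 500000) returns normally.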

Long story:

At home I have an Asus A7V motherboard with a 1G Athlon, and onboard
PDC20265 and VIA KT133. Only the CDROM is connected to the VIA (as I was
not able to get any ATAPI device working with the Promise under Linux).
On the primary master of the PDC (hde) there is an IBM-DTLA-307045,
happily running in UDMA5 mode. As secondary slave (hdh) there is a
removable hdd, TOSHIBA MK6409MAV - I use this hdd to transport data
between work, home and grandparents.

Every weekend I bring Debian packages home on this hdd, as downloading
a couple of MBs each weekend is not acceptable on a dialup connection.
But data are never copied correctly from one hdd (hdh) to the other
(hde); hdh runs in UDMA2, hde in UDMA5. There are always 4 different
bytes in a couple of files. The corrupted bytes always start on a word
boundary, but only sometimes on a dword boundary. Data are never
moved, they are just random bytes... It looks like the
problem is with the removable HDD (source), not with the UDMA5 destination.
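
A check of this kind (where do the differing bytes start, and on which
alignment) could be sketched in C roughly as follows. It assumes the
x86 convention of a 2-byte word and a 4-byte dword; diff_stats and
diff_buffers are hypothetical names, and the two buffers stand for the
source and destination copies of a file read into memory:

```c
#include <stddef.h>

/* Summary of the differences between two equally sized buffers. */
struct diff_stats {
    size_t bytes;               /* total number of differing bytes      */
    size_t runs;                /* number of contiguous differing runs  */
    size_t runs_word_aligned;   /* runs starting at an offset % 2 == 0  */
    size_t runs_dword_aligned;  /* runs starting at an offset % 4 == 0  */
};

static struct diff_stats diff_buffers(const unsigned char *a,
                                      const unsigned char *b, size_t len)
{
    struct diff_stats s = { 0, 0, 0, 0 };
    int in_run = 0;

    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            s.bytes++;
            if (!in_run) {
                /* First byte of a new run: record its alignment. */
                in_run = 1;
                s.runs++;
                if (i % 2 == 0)
                    s.runs_word_aligned++;
                if (i % 4 == 0)
                    s.runs_dword_aligned++;
            }
        } else {
            in_run = 0;
        }
    }
    return s;
}
```

Note that a random replacement byte can happen to equal the original
one, in which case a single corrupted region shows up as fewer
differing bytes or as two shorter runs.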

So I decided to run 'hdparm -d1 -X 65 /dev/hdh' to switch it to UDMA1.
I was rewarded with (4 times):

hde: dma_intr: bad DMA status
hde: dma_intr: status = 0x50 { DriveReady SeekComplete }

and

hde: DMA disabled
NMI watchdog detected lockup on CPU0, registers:
...
The dump says that ide_delay_50ms was invoked with interrupts disabled
in the swapper task, through pdc202xx_reset -> do_reset1 -> ide_do_reset ->
ide_error -> ide_dma_intr -> ide_intr -> handle_IRQ_event -> do_IRQ ->
(interrupt) -> default_idle -> cpu_idle.
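
As a side note on the log, the status byte 0x50 is just two bits of the
ATA status register, 0x40 and 0x10, which is what the
"{ DriveReady SeekComplete }" part spells out; the Busy and Error bits
are both clear. A minimal decoder sketch in C: the bit positions are
those of the ATA status register, the names for the bits that do not
appear in the log above are my own labels, and decode_ata_status is a
hypothetical helper, not a kernel function:

```c
#include <stddef.h>

/* ATA status register bits, highest bit first. */
static const struct {
    unsigned char mask;
    const char *name;
} ata_status_bits[] = {
    { 0x80, "Busy" },
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/*
 * Store the names of all bits set in 'status' into 'names' (at most
 * 'max' entries) and return how many were stored.
 */
static size_t decode_ata_status(unsigned char status,
                                const char **names, size_t max)
{
    size_t n = 0;

    for (size_t i = 0;
         i < sizeof ata_status_bits / sizeof ata_status_bits[0]; i++) {
        if ((status & ata_status_bits[i].mask) && n < max)
            names[n++] = ata_status_bits[i].name;
    }
    return n;
}
```

For 0x50 this returns two names, DriveReady and SeekComplete, matching
the log line.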

I have no idea why it complained about hde when I hdparm-ed hdh...
So I rebooted. After reboot fsck of hde* passed OK, but on hdh* it
was not able to find the debian tree: the directory tree was cut into
about 7 parts, which were reconnected to /lost+found. Probably something
took a deep look at some inodes, as fsck found about 1000 errors. That is
too many for a filesystem which could be modified only by atime changes...

So my questions are:
Is it OK to use the pdc202xx driver at all? It does not complain about
  UDMA CRC errors, but data are (always) corrupted.
Should I go back to the UDMA66 VIA instead of the UDMA100 Promise?
Should I rejumper my removable HDD to be master, and not slave?
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]  




