Re: PDC20265, disk corruption and NMI watchdog...
On Mon, Jan 28, 2001 at 17:36 Andre Hedrick wrote:
>
> Everything but a kernel version :-(

Sorry. The corruption has been happening since I got this motherboard just
before Christmas - around 2.4.0-test13-pre2. It was never as massive as
today with 2.4.0-ac10 (before, the system just died; today it damaged hdh in
addition). I got corruption/lockups with other versions too (test13-pre3,
test13-pre7, 2.4.0, 2.4.0-ac3, 2.4.0-ac8, 2.4.0-ac9), but as this one (ac10)
runs the NMI watchdog on UP, I finally found what is wrong instead of getting
a silent death. That is why I finally wrote this letter - so you can either
__sti() or mdelay() in ide_delay_50ms, and others will not suffer from silent
death instead of an IDE reset... Then I can test whether the Promise recovers
from this - I doubt it, as it looks like some data for /dev/hde landed on
/dev/hdh (impossible, I know...).

> On Mon, 29 Jan 2001, Petr Vandrovec wrote:
>
> > why on Earth ide_delay_50ms uses jiffies instead of mdelay(50) ?!
> > It is invoked with interrupts disabled, causing NMI watchdog detected
> > on my system, leading to complete crash of system.

For now I have set both disks to PIO4 and it looks stable... But there is
a visible difference in speed between PIO4 and UDMA5 ;-)

I have no idea whether the TOSHIBA can do UDMA CRCs, but it works fine at
work, where I connected it to an i440BX, and since November I have connected
it to a VIA694X/686A. Both in UDMA2 mode without trouble.

Best regards,
Petr Vandrovec
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
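The hang described in this message can be illustrated with a toy user-space model. This is only a sketch, not kernel code: `SimulatedCPU`, `jiffies_delay_50ms` and `calibrated_delay_50ms` are hypothetical names, the deadline formula is an assumption, and the model simply pretends that one loop iteration equals one timer tick. The point it tries to show is that a wait keyed to the jiffies counter cannot finish when the timer interrupt is masked, while an mdelay-style delay that only counts loop iterations does not depend on interrupts at all.

```python
class SimulatedCPU:
    """Toy model: the jiffies counter only advances while interrupts are on."""

    def __init__(self, hz=100):
        self.hz = hz              # timer ticks per second (assumed value)
        self.jiffies = 0          # tick counter, bumped by the timer interrupt
        self.irqs_enabled = True

    def spin_once(self):
        # One busy-loop iteration. In this model a timer tick is delivered
        # on every iteration, but only if interrupts are enabled.
        if self.irqs_enabled:
            self.jiffies += 1


def jiffies_delay_50ms(cpu, max_spins=10_000):
    """Busy-wait until jiffies passes a deadline roughly 50 ms away.

    Returns True if the wait finished, False if we gave up after max_spins.
    The cap exists only so this sketch terminates; a real busy loop with
    interrupts disabled would spin until something like the NMI watchdog
    notices the lockup.
    """
    deadline = cpu.jiffies + cpu.hz // 20 + 1   # assumed formula: ~50 ms of ticks
    for _ in range(max_spins):
        if cpu.jiffies >= deadline:
            return True
        cpu.spin_once()
    return False


def calibrated_delay_50ms(cpu, loops_per_ms=1000):
    """mdelay-style delay: spin a fixed number of iterations.

    It never looks at jiffies, so it finishes whether or not interrupts are
    enabled. loops_per_ms is a made-up calibration constant.
    """
    for _ in range(50 * loops_per_ms):
        pass
    return True


if __name__ == "__main__":
    cpu = SimulatedCPU()
    print("IRQs on,  jiffies wait finished:", jiffies_delay_50ms(cpu))
    cpu.irqs_enabled = False
    print("IRQs off, jiffies wait finished:", jiffies_delay_50ms(cpu))
    print("IRQs off, calibrated delay finished:", calibrated_delay_50ms(cpu))
```

In this model the two fixes proposed above map to either re-enabling interrupts before the wait (so the counter moves again) or replacing the wait with the counter-independent delay.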
Re: PDC20265, disk corruption and NMI watchdog...
Everything but a kernel version :-(

On Mon, 29 Jan 2001, Petr Vandrovec wrote:
> Short story:
>
> Hi Andre,
> why on Earth ide_delay_50ms uses jiffies instead of mdelay(50) ?!
> It is invoked with interrupts disabled, causing NMI watchdog detected
> on my system, leading to complete crash of system.
> [...]

Andre Hedrick
Linux ATA Development
PDC20265, disk corruption and NMI watchdog...
Short story:

Hi Andre,
why on Earth does ide_delay_50ms use jiffies instead of mdelay(50) ?!
It is invoked with interrupts disabled, which triggers "NMI watchdog
detected lockup" on my system and leads to a complete crash.

Long story:

At home I have an Asus A7V motherboard with a 1G Athlon, and onboard PDC20265
and VIA KT133. Only the CDROM is connected to the VIA (as I was not able to
get any ATAPI device working with the Promise under Linux). On the primary
master of the PDC (hde) there is an IBM-DTLA-307045, happily running in UDMA5
mode. As secondary slave (hdh) there is a removable hdd, a TOSHIBA MK6409MAV -
I use this hdd to transport data between work, home and grandparents.

Every weekend I bring Debian packages home on this hdd, as downloading
a couple of MBs each weekend is not acceptable over a dialup connection.
But the data are never copied OK from one hdd (hdh) to the other (hde) -
hdh runs in UDMA2, hde in UDMA5. There are always 4 different bytes in
a couple of files - the corrupted bytes always start on a word boundary;
sometimes that is also a dword boundary, sometimes not. The data are never
shifted, they are just random bytes... It looks like the problem is with
the removable HDD (source), not with the UDMA5 destination.

So I decided to run 'hdparm -d1 -X 65 /dev/hdh' to switch it to UDMA1.
I was rewarded with (4 times):

    hde: dma_intr: bad DMA status
    hde: dma_intr: status = 0x50 { DriveReady SeekComplete }

and

    hde: DMA disabled
    NMI watchdog detected lockup on CPU0, registers:
    ...

The dump says that ide_delay_50ms was invoked with interrupts disabled
in the swapper task, through pdc202xx_reset -> do_reset1 -> ide_do_reset ->
ide_error -> ide_dma_intr -> ide_intr -> handle_IRQ_event -> do_IRQ ->
(interrupt) -> default_idle -> cpu_idle.

I have no idea why it complained about hde when I hdparm-ed hdh...

So I rebooted - and after the reboot fsck of hde* passed OK, but on hdh* it
was not able to find the debian tree - the directory tree was cut into about
7 parts, which were reconnected to /lost+found. Probably someone took a deep
look at some inodes, as fsck found about 1000 errors - that is too many for
a filesystem which could have been modified only by atime changes...

So my questions are:
Is it OK to use the pdc202xx driver at all? It does not complain about UDMA
CRC errors, but data are (always) corrupted.
Should I go back to the UDMA66 VIA instead of the UDMA100 Promise?
Should I rejumper my removable HDD to be master, and not slave?

Thanks,
Petr Vandrovec
[EMAIL PROTECTED]
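The corruption pattern reported in this message (short runs of wrong bytes at fixed offsets, with the rest of the data unshifted) is the kind of thing a byte-wise comparison of source and copy can characterize. The following is a minimal sketch of such a comparison; `diff_runs` and `alignment` are hypothetical helper names, and "word" and "dword" are assumed to mean 2-byte and 4-byte boundaries. Note that a corrupted byte which happens to equal the original would split one run into two, so run lengths are a lower bound.

```python
def diff_runs(a: bytes, b: bytes):
    """Return (offset, length) for each maximal run of differing bytes.

    Only the common prefix of the two buffers is compared; a length
    mismatch should be checked separately by the caller.
    """
    runs = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i            # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:            # run extends to the end of the buffer
        runs.append((start, n - start))
    return runs


def alignment(offset: int) -> str:
    """Classify an offset as dword (4-byte), word (2-byte) or unaligned."""
    if offset % 4 == 0:
        return "dword"
    if offset % 2 == 0:
        return "word"
    return "unaligned"


if __name__ == "__main__":
    good = bytes(range(16))
    bad = bytearray(good)
    bad[6:10] = b"\xff\xff\xff\xff"  # simulate 4 corrupted bytes at offset 6
    for offset, length in diff_runs(good, bytes(bad)):
        print(f"offset {offset}: {length} bytes differ ({alignment(offset)}-aligned)")
```

For real files the same functions could be applied chunk by chunk to the source on hdh and the copy on hde, adding the chunk's file offset to each reported run.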