Re: PROBLEM: sata_sil24 lockups under heavy i/o
Mark Wagner wrote:
> The sil24-connected sata drives are external and connected to their own
> power supply.
>
> I've replaced the sil24-based card with a Promise SATA300 TX4 controller
> card and everything seems to work now.

Hmmm... sil24 fares well with four ports occupied. Weird. Care to give
it another shot? Maybe pci bus contact was bad or something.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PROBLEM: sata_sil24 lockups under heavy i/o
On Sun, Jan 07, 2007 at 03:27:03PM +0900, Tejun Heo wrote:
> Mark Wagner wrote:
> [--snip--]
> >NETDEV WATCHDOG: eth0: transmit timed out
> >eth0: transmit timed out, tx_status 00 status e000.
> [--snip--]
> >hda: DMA timeout error
> >hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> >ide: failed opcode was: unknown
> >hda: status timeout: status=0xd0 { Busy }
> >ide: failed opcode was: unknown
> >hdb: DMA disabled
> >hda: no DRQ after issuing MULTWRITE_EXT
> >ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> >ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> >ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata4: hard resetting port
> >ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> >ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata2: hard resetting port
> >ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
> >ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata1.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> [--snip--]
> >i2c_adapter i2c-0: Transaction error!
> >i2c_adapter i2c-0: Transaction error!
> >i2c_adapter i2c-0: Transaction error!
>
> It seems like your system is falling apart. Timeouts are occurring
> everywhere. Either IRQ routing went wrong or your powersupply is not
> providing enough power. Adding two more disks to sil24 doesn't change
> anything about IRQ routing. If the system functioned okay w/ two disks
> attached to sil24, give your system a better power supply or rewire
> power cables such that each power lane is more equally loaded.

The sil24-connected sata drives are external and connected to their own
power supply.

I've replaced the sil24-based card with a Promise SATA300 TX4 controller
card and everything seems to work now.

Thanks,
Mark

--
Mark Wagner [EMAIL PROTECTED]
Re: PROBLEM: sata_sil24 lockups under heavy i/o
Hello,

Mark Wagner wrote:
[--snip--]
>NETDEV WATCHDOG: eth0: transmit timed out
>eth0: transmit timed out, tx_status 00 status e000.
[--snip--]
>hda: DMA timeout error
>hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
>ide: failed opcode was: unknown
>hda: status timeout: status=0xd0 { Busy }
>ide: failed opcode was: unknown
>hdb: DMA disabled
>hda: no DRQ after issuing MULTWRITE_EXT
>ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
>ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
>ata4: hard resetting port
>ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
>ata2: hard resetting port
>ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
>ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
>ata1.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
[--snip--]
>i2c_adapter i2c-0: Transaction error!
>i2c_adapter i2c-0: Transaction error!
>i2c_adapter i2c-0: Transaction error!

It seems like your system is falling apart. Timeouts are occurring
everywhere. Either IRQ routing went wrong or your powersupply is not
providing enough power. Adding two more disks to sil24 doesn't change
anything about IRQ routing. If the system functioned okay w/ two disks
attached to sil24, give your system a better power supply or rewire
power cables such that each power lane is more equally loaded.

--
tejun
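As a side note for readers, the "timeouts are occurring everywhere" observation can be checked mechanically by tallying timeout messages per device. The following is only a rough sketch and not something posted in the thread: the function name `timeouts_by_device`, the two regular expressions, and the choice of which device prefixes to recognise (`ethN`, `hdX`, `ataN`) are assumptions made for illustration. The sample text is copied from the log lines quoted above.

```python
import re
from collections import Counter

# Log lines copied from the excerpt quoted in this thread.
LOG = """\
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
hdb: DMA disabled
hda: no DRQ after issuing MULTWRITE_EXT
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: hard resetting port
ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
i2c_adapter i2c-0: Transaction error!
"""

# Hypothetical patterns: a device prefix such as "eth0:", "hda:" or "ata3.00:",
# and the words "timeout" / "timed out" in any letter case.
DEVICE = re.compile(r"\b(eth\d+|hd[a-z]|ata\d+)(?:\.\d+)?:")
TIMEOUT = re.compile(r"time(?:d )?out", re.IGNORECASE)


def timeouts_by_device(log_text):
    """Count log lines that mention a timeout, grouped by device prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        if not TIMEOUT.search(line):
            continue  # not a timeout message
        match = DEVICE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, count in sorted(timeouts_by_device(LOG).items()):
        print(device, count)
```

If the tally shows timeouts on the NIC, the onboard IDE controller and every SATA port at once, as it does for this excerpt, that supports looking for a system-wide cause (interrupts, power) rather than a single driver.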
Re: PROBLEM: sata_sil24 lockups under heavy i/o
Mark Wagner wrote:
> [1.] One line summary of the problem:
>
> sata_sil24 lockups under heavy i/o
>
> [2.] Full description of the problem/report:
>
> I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
> well with two disks attached. Once I attached 2 additional disks (for
> a total of 4) and started heavy i/o (extending a software raid5 device)
> the system began locking up for a few minutes at a time. After the
> system recovers the disk transfer speed is reduced from UDMA/100 to
> UDMA/66 or UDMA/44.

I don't think this is anything to do with the sata_sil24 driver.
Something really weird seems to be going on with interrupts on this
machine:

> /proc/interrupts
>            CPU0
>   0:   20507744    XT-PIC-XT        timer
>   1:        262    XT-PIC-XT        i8042
>   2:          0    XT-PIC-XT        cascade
>   5:     962175    XT-PIC-XT        sym53c8xx, uhci_hcd:usb1, uhci_hcd:usb2
>   7:       3678    XT-PIC-XT        parport0
>   8:          2    XT-PIC-XT        rtc
>  10:    9153035    XT-PIC-XT        ide2, eth0
>  11:         30    XT-PIC-XT        sym53c8xx
>  12:    1026266    XT-PIC-XT        libata
>  14:     840214    XT-PIC-XT        ide0
>  15:     569928    XT-PIC-XT        ide1
> NMI:      11264
> LOC:   20506755
> ERR:        234
> MIS:          0
>
> Output of dmesg:
> ...
> Local APIC disabled by BIOS -- reenabling.
> Found and enabled local APIC!

Hmm, you might want to try not forcing the local APIC enabled by
removing the lapic option from the kernel command line. Don't know if it
could be related though.

> VP_IDE: IDE controller at PCI slot :00:04.1
> VP_IDE: chipset revision 6
> VP_IDE: not 100% native mode: will probe irqs later
> VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci:00:04.1
> ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
> ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
> Probing IDE interface ide0...
> input: AT Translated Set 2 keyboard as /class/input/input0
> hda: MAXTOR STM3160812A, ATA DISK drive
> hda: IRQ probe failed (0xfef8)
> hdb: WDC WD1600JB-00EVA0, ATA DISK drive
> hdb: IRQ probe failed (0xfef8)
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Probing IDE interface ide1...
> hdc: IC35L120AVVA07-0, ATA DISK drive
> hdc: IRQ probe failed (0xbef8)
> hdd: WDC WD1200JB-00GVA0, ATA DISK drive
> hdd: IRQ probe failed (0xbef8)

These "IRQ probe failed" errors don't seem right at all. Then:

> NETDEV WATCHDOG: eth0: transmit timed out
> eth0: transmit timed out, tx_status 00 status e000.
> diagnostics: net 0cd8 media 8880 dma 00a0 fifo
> Flags; bus-master 1, dirty 203390(14) current 203406(14)
> Transmit list 01bea840 vs. c1beaac0.

So eth0's not happy.

> ata2.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
> ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata2.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata2.00: tag 2 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata2: hard resetting port

The SATA card is getting timeouts. And from your other mail:

> hda: DMA timeout error
> hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: status timeout: status=0xd0 { Busy }
> ide: failed opcode was: unknown
> hda: no DRQ after issuing MULTWRITE_EXT

Your onboard IDE controller is also timing out. Sounds to me like some
kind of general IRQ problem, though you'd have to be losing interrupts
on IRQ 10, 12 and 14, which seems pretty extreme. Maybe it's a hardware
problem; are you sure you have enough power to run this many drives in
the box?

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
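For readers following the IRQ angle, here is a small sketch (not from the original posters) that lists which interrupt lines in a `/proc/interrupts` dump are shared by more than one device. The function name `shared_irqs` is made up for this example, and the parser assumes the single-CPU, whitespace-separated layout of the counts reported in this thread; the sample text below re-spaces those values, since the archive ran the columns together.

```python
# Values taken from the /proc/interrupts output quoted in this thread,
# with column spacing restored (the spacing itself is an assumption).
SAMPLE = """\
           CPU0
  0:   20507744    XT-PIC-XT        timer
  1:        262    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  5:     962175    XT-PIC-XT        sym53c8xx, uhci_hcd:usb1, uhci_hcd:usb2
  7:       3678    XT-PIC-XT        parport0
  8:          2    XT-PIC-XT        rtc
 10:    9153035    XT-PIC-XT        ide2, eth0
 11:         30    XT-PIC-XT        sym53c8xx
 12:    1026266    XT-PIC-XT        libata
 14:     840214    XT-PIC-XT        ide0
 15:     569928    XT-PIC-XT        ide1
NMI:      11264
LOC:   20506755
ERR:        234
MIS:          0
"""


def shared_irqs(interrupts_text):
    """Return {irq: [device, ...]} for IRQ lines listing more than one device.

    Assumes one CPU column: "<irq>: <count> <controller> <devices...>".
    """
    shared = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # header row or NMI/LOC/ERR/MIS summary rows
        label = fields[0]
        if not (label.endswith(":") and label[:-1].isdigit()):
            continue  # keep only numbered IRQ rows
        # Everything after the controller name is a comma-separated device list.
        devices = [name.strip() for name in " ".join(fields[3:]).split(",")]
        if len(devices) > 1:
            shared[int(label[:-1])] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(irq, devices)
```

On the sample above this reports IRQ 5 (sym53c8xx and both UHCI controllers) and IRQ 10 (ide2 and eth0). Sharing by itself is normal on a PIC-routed PCI system, so this only narrows down where to look.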
Re: PROBLEM: sata_sil24 lockups under heavy i/o
On Wed, Jan 03, 2007 at 09:30:24AM -0800, Mark Wagner wrote:
> [1.] One line summary of the problem:
>
> sata_sil24 lockups under heavy i/o
>
> [2.] Full description of the problem/report:
>
> I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
> well with two disks attached. Once I attached 2 additional disks (for
> a total of 4) and started heavy i/o (extending a software raid5 device)
> the system began locking up for a few minutes at a time. After the
> system recovers the disk transfer speed is reduced from UDMA/100 to
> UDMA/66 or UDMA/44.

Last night I performed a simultaneous dd on the 4 drives on the Sil3124
card like so:

dd if=/dev/sda of=/dev/null
dd if=/dev/sdb of=/dev/null
dd if=/dev/sdc of=/dev/null
dd if=/dev/sdd of=/dev/null

Three times the system temporarily locked up and then lowered the speeds
of the drives. They are currently at PIO4. What might be causing this?

Here is the dmesg from when the problem occurred:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cd8 media 8880 dma 00a0 fifo
Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
Transmit list 01beaa20 vs. c1bea5c0.
0: @c1bea200 length 85ea status 0c0005ea
1: @c1bea2a0 length 85ea status 0c0005ea
2: @c1bea340 length 85ea status 0c0005ea
3: @c1bea3e0 length 85ea status 0c0005ea
4: @c1bea480 length 85ea status 8c0005ea
5: @c1bea520 length 85ea status 8c0005ea
6: @c1bea5c0 length 85ea status 0c0105ea
7: @c1bea660 length 85ea status 0c0105ea
8: @c1bea700 length 85ea status 0c0105ea
9: @c1bea7a0 length 85ea status 0c0105ea
10: @c1bea840 length 85ea status 0c0105ea
11: @c1bea8e0 length 85ea status 0c0105ea
12: @c1bea980 length 85ea status 0c0105ea
13: @c1beaa20 length 85ea status 0c0005ea
14: @c1beaac0 length 85ea status 0c0005ea
15: @c1beab60 length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cd8 media 8880 dma 00a0 fifo 8000
Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
Transmit list 01bea5c0 vs. c1bea5c0.
0: @c1bea200 length 85ea status 0c0005ea
1: @c1bea2a0 length 85ea status 0c0005ea
2: @c1bea340 length 85ea status 0c0005ea
3: @c1bea3e0 length 85ea status 0c0005ea
4: @c1bea480 length 85ea status 8c0005ea
5: @c1bea520 length 85ea status 8c0005ea
6: @c1bea5c0 length 85ea status 0c0105ea
7: @c1bea660 length 85ea status 0c0105ea
8: @c1bea700 length 85ea status 0c0105ea
9: @c1bea7a0 length 85ea status 0c0105ea
10: @c1bea840 length 85ea status 0c0105ea
11: @c1bea8e0 length 85ea status 0c0105ea
12: @c1bea980 length 85ea status 0c0105ea
13: @c1beaa20 length 85ea status 0c0005ea
14: @c1beaac0 length 85ea status 0c0005ea
15: @c1beab60 length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
hda: dma_timer_expiry: dma status == 0x61
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
diagnostics: net 0cd8 media 8880 dma 00a0 fifo 8000
Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
Transmit list 01bea5c0 vs. c1bea5c0.
0: @c1bea200 length 85ea status 0c0005ea
1: @c1bea2a0 length 85ea status 0c0005ea
2: @c1bea340 length 85ea status 0c0005ea
3: @c1bea3e0 length 85ea status 0c0005ea
4: @c1bea480 length 85ea status 8c0005ea
5: @c1bea520 length 85ea status 8c0005ea
6: @c1bea5c0 length 85ea status 0c0105ea
7: @c1bea660 length 85ea status 0c0105ea
8: @c1bea700 length 85ea status 0c0105ea
9: @c1bea7a0 length 85ea status 0c0105ea
10: @c1bea840 length 85ea status 0c0105ea
11: @c1bea8e0 length 85ea status 0c0105ea
12: @c1bea980 length 85ea status 0c0105ea
13: @c1beaa20 length 85ea status 0c0005ea
14: @c1beaac0 length 85ea status 0c0005ea
15: @c1beab60 length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: DMA disabled
hda: no DRQ after issuing MULTWRITE_EXT
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: hard resetting port
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2: hard resetting port
ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40
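For anyone wanting to repeat the simultaneous-read test from the message above in a scripted form, the following is a minimal sketch and not what was actually run (the report used four plain `dd` commands). It reads several paths to end-of-file in parallel threads and reports the bytes read from each; the function names `read_all` and `read_concurrently` and the 1 MiB chunk size are assumptions for illustration.

```python
import threading


def read_all(path, results, chunk_size=1024 * 1024):
    """Read `path` to end-of-file, discarding the data, and record the byte count."""
    total = 0
    with open(path, "rb", buffering=0) as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    results[path] = total


def read_concurrently(paths):
    """Start one reader thread per path, wait for all, and return {path: bytes_read}."""
    results = {}
    threads = [
        threading.Thread(target=read_all, args=(path, results)) for path in paths
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `read_concurrently(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"])` would mirror the four device names used in the report; reading block devices normally requires root, and a reader thread that hits an I/O error simply leaves its path out of the result in this sketch.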
PROBLEM: sata_sil24 lockups under heavy i/o
[1.] One line summary of the problem:

sata_sil24 lockups under heavy i/o

[2.] Full description of the problem/report:

I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
well with two disks attached. Once I attached 2 additional disks (for
a total of 4) and started heavy i/o (extending a software raid5 device)
the system began locking up for a few minutes at a time. After the
system recovers the disk transfer speed is reduced from UDMA/100 to
UDMA/66 or UDMA/44.

[3.] Keywords (i.e., modules, networking, kernel):

libata sata_sil24

[4.] Kernel version (from /proc/version):

Linux version 2.6.19-gentoo-r2 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #1 Tue Dec 19 22:55:21 PST 2006

[5.] Most recent kernel version which did not have the bug:

Unknown.

[8.1.] Software (add the output of the ver_linux script here)

Linux cthulhu 2.6.19-gentoo-r2 #1 Tue Dec 19 22:55:21 PST 2006 i686 AMD Athlon(tm) Processor AuthenticAMD GNU/Linux

Gnu C                  4.1.1
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
Linux C Library        > libc.2.4
Dynamic linker (ldd)   2.4
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.7
udev                   103
Modules Loaded         w83781d hwmon_vid lp usbhid 8250_pnp 8250 serial_core parport_pc pcspkr parport uhci_hcd via686a i2c_isa usbcore i2c_viapro i2c_core

[8.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 4
cpu MHz         : 1410.226
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2822.62

[8.3.] Module information (from /proc/modules):

w83781d 28008 1 - Live 0xfa367000
hwmon_vid 2240 1 w83781d, Live 0xf887d000
lp 8452 0 - Live 0xfa24c000
usbhid 32288 1 - Live 0xfa257000
8250_pnp 8704 0 - Live 0xf883b000
8250 17252 1 8250_pnp, Live 0xf8851000
serial_core 14976 1 8250, Live 0xf884c000
parport_pc 28644 1 - Live 0xfa202000
pcspkr 2240 0 - Live 0xf883f000
parport 30600 2 lp,parport_pc, Live 0xf8872000
uhci_hcd 16776 0 - Live 0xf8822000
via686a 13320 0 - Live 0xf8841000
i2c_isa 3584 2 w83781d,via686a, Live 0xf8839000
usbcore 99524 4 usbhid,uhci_hcd, Live 0xf8858000
i2c_viapro 6932 0 - Live 0xf882d000
i2c_core 15952 4 w83781d,via686a,i2c_isa,i2c_viapro, Live 0xf8828000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
  03c0-03df : vesafb
03f6-03f6 : ide0
03f8-03ff : serial
0778-077a : parport0
0cf8-0cff : PCI conf1
7800-783f : :00:11.0
  7800-7807 : ide2
  7808-780f : ide3
  7810-783f : PDC20265
8000-8003 : :00:11.0
8400-8407 : :00:11.0
8800-8803 : :00:11.0
  8802-8802 : ide2
9000-9007 : :00:11.0
  9000-9007 : ide2
9400-947f : :00:0b.0
9800-980f : :00:0a.0
  9800-980f : sata_sil24
a000-a0ff : :00:09.1
  a000-a0ff : sym53c8xx
a400-a4ff : :00:09.0
  a400-a4ff : sym53c8xx
d000-d01f : :00:04.3
  d000-d01f : uhci_hcd
d400-d41f : :00:04.2
  d400-d41f : uhci_hcd
d800-d80f : :00:04.1
  d800-d807 : ide0
  d808-d80f : ide1
e200-e27f : :00:04.4
e400-e47f : pnp 00:12
e800-e80f : :00:04.4
  e800-e807 : vt596_smbus

-0009efff : System RAM
0009f000-0009 : reserved
000a-000b : Video RAM area
000c-000cf3ff : Video ROM
000d-000d27ff : Adapter ROM
000d4000-000d47ff : Adapter ROM
000d8000-000dbfff : Adapter ROM
000f-000f : System ROM
0010-3ffebfff : System RAM
  0010-002efefe : Kernel code
  002efeff-003acf43 : Kernel data
3ffec000-3ffeefff : ACPI Tables
3ffef000-3fffefff : reserved
3000-3fff : ACPI Non-volatile Storage
5000-5007 : :00:0a.0
5008-5009 : :00:0b.0
500a-500a : :00:09.0
500b-500b : :00:09.1
500c-500c : :00:11.0
d200-d201 : :00:11.0
d280-d280007f : :00:0b.0
d300-d3007fff : :00:0a.0
  d300-d3007fff : sata_sil24
d380-d380007f : :00:0a.0
  d380-d380007f : sata_sil24
d400-d4000fff : :00:09.1
  d400-d4000fff : sym53c8xx
d480-d48000ff : :00:09.1
  d480-d48000ff : sym53c8xx
d500-d5000fff : :00:09.0
  d500-d5000fff : sym53c8xx
d580-d58000ff : :00:09.0
  d580-d58000ff