Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-12 Thread Tejun Heo
Mark Wagner wrote:
> The sil24-connected SATA drives are external and connected to their own
> power supply.
> 
> I've replaced the sil24-based card with a Promise SATA300 TX4 controller
> card, and everything seems to work now.

Hmmm... sil24 fares well with all four ports occupied.  Weird.  Care to give
it another shot?  Maybe the PCI bus contact was bad or something.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-12 Thread Mark Wagner
On Sun, Jan 07, 2007 at 03:27:03PM +0900, Tejun Heo wrote:

> Mark Wagner wrote:
> [--snip--]
> >NETDEV WATCHDOG: eth0: transmit timed out
> >eth0: transmit timed out, tx_status 00 status e000.
> [--snip--]
> >hda: DMA timeout error
> >hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> >ide: failed opcode was: unknown
> >hda: status timeout: status=0xd0 { Busy }
> >ide: failed opcode was: unknown
> >hdb: DMA disabled
> >hda: no DRQ after issuing MULTWRITE_EXT
> >ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> >ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> >ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata4: hard resetting port
> >ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> >ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata2: hard resetting port
> >ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
> >ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata1.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
> [--snip--]
> >i2c_adapter i2c-0: Transaction error!
> >i2c_adapter i2c-0: Transaction error!
> >i2c_adapter i2c-0: Transaction error!
> 
> It seems like your system is falling apart.  Timeouts are occurring 
> everywhere.  Either IRQ routing went wrong or your power supply is not 
> providing enough power.  Adding two more disks to sil24 doesn't change 
> anything about IRQ routing.  If the system functioned okay with two disks 
> attached to sil24, give your system a better power supply or rewire 
> power cables so that each power lane is more evenly loaded.

The sil24-connected SATA drives are external and connected to their own
power supply.

I've replaced the sil24-based card with a Promise SATA300 TX4 controller
card, and everything seems to work now.

Thanks,

Mark

-- 
Mark Wagner [EMAIL PROTECTED]


Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-07 Thread Tejun Heo

Hello,

Mark Wagner wrote:
[--snip--]

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.

[--snip--]

hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: DMA disabled
hda: no DRQ after issuing MULTWRITE_EXT
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: hard resetting port
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2: hard resetting port
ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
ata1.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)

[--snip--]

i2c_adapter i2c-0: Transaction error!
i2c_adapter i2c-0: Transaction error!
i2c_adapter i2c-0: Transaction error!


It seems like your system is falling apart.  Timeouts are occurring 
everywhere.  Either IRQ routing went wrong or your power supply is not 
providing enough power.  Adding two more disks to sil24 doesn't change 
anything about IRQ routing.  If the system functioned okay with two disks 
attached to sil24, give your system a better power supply or rewire 
power cables so that each power lane is more evenly loaded.


--
tejun



Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-04 Thread Robert Hancock

Mark Wagner wrote:

[1.] One line summary of the problem:

sata_sil24 lockups under heavy i/o

[2.] Full description of the problem/report:

I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
well with two disks attached. Once I attached 2 additional disks (for
a total of 4) and started heavy i/o (extending a software raid5 device),
the system began locking up for a few minutes at a time. After the
system recovers, the disk transfer speed is reduced from UDMA/100 to
UDMA/66 or UDMA/44.


I don't think this has anything to do with the sata_sil24 driver. 
Something really weird seems to be going on with interrupts on this machine:



/proc/interrupts

           CPU0
  0:   20507744    XT-PIC-XT        timer
  1:        262    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  5:     962175    XT-PIC-XT        sym53c8xx, uhci_hcd:usb1, uhci_hcd:usb2
  7:       3678    XT-PIC-XT        parport0
  8:          2    XT-PIC-XT        rtc
 10:    9153035    XT-PIC-XT        ide2, eth0
 11:         30    XT-PIC-XT        sym53c8xx
 12:    1026266    XT-PIC-XT        libata
 14:     840214    XT-PIC-XT        ide0
 15:     569928    XT-PIC-XT        ide1
NMI:      11264
LOC:   20506755
ERR:        234
MIS:          0
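Two of the lines above carry more than one handler: IRQ 5 (sym53c8xx and both UHCI controllers) and IRQ 10 (ide2 and eth0). A rough way to list shared lines from /proc/interrupts is sketched below. It assumes the 2.6-era layout, where the chip column is a single token such as XT-PIC-XT. The function name `shared_irqs` is made up for illustration.

```bash
#!/usr/bin/env bash
# shared_irqs (hypothetical helper): print each numbered IRQ line that has
# more than one handler attached. Reads the files given as arguments, or
# stdin if there are none. Assumes a single-token chip column (2.6-era).
shared_irqs() {
    awk '
        # The header line lists one CPUn column per processor.
        FNR == 1 { ncpu = NF; next }
        $1 ~ /^[0-9]+:$/ {
            irq = $1; sub(/:$/, "", irq)
            # Handlers start after the IRQ number, the per-CPU counts and the chip.
            devs = ""
            for (i = ncpu + 3; i <= NF; i++) {
                sep = (devs == "") ? "" : " "
                devs = devs sep $i
            }
            # Multiple handlers on one line are printed comma-separated.
            if (devs ~ /,/) print irq ": " devs
        }
    ' "$@"
}
```

Run it as `shared_irqs /proc/interrupts`. Sharing a line is legitimate for PCI devices. The output only narrows down which devices could be affected if interrupts on one line go missing.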

Output of dmesg:



...


Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!


Hmm, you might want to try not forcing the local APIC enabled, by 
removing the lapic option from the kernel command line. I don't know if it 
could be related, though.
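You can check /proc/cmdline to confirm whether the kernel was actually booted with `lapic`. This is a minimal sketch. `has_boot_param` is a hypothetical name, and it only matches the parameter as a whole word, so `nolapic` does not count as `lapic`.

```bash
#!/usr/bin/env bash
# has_boot_param (hypothetical helper): succeed if the given parameter
# appears as a whole word on the kernel command line. The optional second
# argument points it at a saved copy instead of /proc/cmdline.
has_boot_param() {
    local param=$1 cmdline=${2:-/proc/cmdline}
    # -w: "nolapic" or "lapic_timer" will not satisfy a search for "lapic".
    grep -qw -- "$param" "$cmdline"
}
```

For example: `has_boot_param lapic && echo "booted with lapic"`.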



VP_IDE: IDE controller at PCI slot 0000:00:04.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:04.1
ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
input: AT Translated Set 2 keyboard as /class/input/input0
hda: MAXTOR STM3160812A, ATA DISK drive
hda: IRQ probe failed (0xfef8)
hdb: WDC WD1600JB-00EVA0, ATA DISK drive
hdb: IRQ probe failed (0xfef8)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: IC35L120AVVA07-0, ATA DISK drive
hdc: IRQ probe failed (0xbef8)
hdd: WDC WD1200JB-00GVA0, ATA DISK drive
hdd: IRQ probe failed (0xbef8)


These "IRQ probe failed" errors don't seem right at all.

Then:


NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
  diagnostics: net 0cd8 media 8880 dma 00a0 fifo 
  Flags; bus-master 1, dirty 203390(14) current 203406(14)
  Transmit list 01bea840 vs. c1beaac0.


So eth0's not happy.


ata2.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: tag 2 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2: hard resetting port
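The libata exception lines can be read field by field:

- `SAct` is a bitmask of the NCQ tags that were outstanding. Here 0x7 means tags 0, 1 and 2, which matches the three tag lines.
- `cmd 0x60` is the ATA READ FPDMA QUEUED opcode.
- `Emask 0x4` is the timeout bit, which is why each line ends in `(timeout)`.

The sketch below decodes the two masks. The helper names are hypothetical. The bit names are assumed to follow `enum ata_completion_errors` in the 2.6-era `include/linux/libata.h`, so check them against the tree you are running.

```bash
#!/usr/bin/env bash
# sact_tags (hypothetical): list the NCQ tags set in an SAct mask, e.g. 0x7.
sact_tags() {
    local mask=$(( $1 )) tag=0 out=""
    while (( mask != 0 )); do
        if (( mask & 1 )); then
            out+="${out:+ }$tag"
        fi
        mask=$(( mask >> 1 ))
        tag=$(( tag + 1 ))
    done
    echo "$out"
}

# emask_names (hypothetical): name the bits set in a libata Emask value.
# Bit order assumed from enum ata_completion_errors (AC_ERR_DEV = bit 0, ...).
emask_names() {
    local mask=$(( $1 )) bit=0 out="" name
    local names=(DEV HSM TIMEOUT MEDIA ATA_BUS HOST_BUS SYSTEM INVALID OTHER)
    for name in "${names[@]}"; do
        if (( mask & (1 << bit) )); then
            out+="${out:+ }AC_ERR_$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
}
```

For example, `sact_tags 0x7` prints `0 1 2` and `emask_names 0x4` prints `AC_ERR_TIMEOUT`.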


The SATA card is getting timeouts. And from your other mail:

> hda: DMA timeout error
> hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: status timeout: status=0xd0 { Busy }
> ide: failed opcode was: unknown
> hda: no DRQ after issuing MULTWRITE_EXT

Your onboard IDE controller is also timing out. It sounds to me like some 
kind of general IRQ problem, though you'd have to be losing interrupts 
on IRQs 10, 12 and 14, which seems pretty extreme. Maybe it's a hardware 
problem; are you sure you have enough power to run this many drives in 
the box?
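You can test the lost-interrupt theory during a stall by taking two snapshots of /proc/interrupts and comparing the per-IRQ counters. If a line's count stops advancing while its devices are busy, that line is a suspect. This is a minimal sketch. It assumes the single-CPU layout shown above, with one count column, and `irq_deltas` is a hypothetical name.

```bash
#!/usr/bin/env bash
# irq_deltas (hypothetical): given two saved copies of /proc/interrupts,
# print "IRQ increase" for every numbered IRQ present in both.
# Assumes one CPU, i.e. the count is in the second column.
irq_deltas() {
    awk '
        $1 ~ /^[0-9]+:$/ {
            irq = $1; sub(/:$/, "", irq)
            # Remember the counts from the first snapshot.
            if (FILENAME == ARGV[1]) { before[irq] = $2; next }
            if (irq in before) print irq, $2 - before[irq]
        }
    ' "$1" "$2"
}
```

For example, while the dd load is running, save `/proc/interrupts` to two files ten seconds apart. Then pass both files to `irq_deltas`.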


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-04 Thread Mark Wagner
On Wed, Jan 03, 2007 at 09:30:24AM -0800, Mark Wagner wrote:

> [1.] One line summary of the problem:
> 
> sata_sil24 lockups under heavy i/o
> 
> [2.] Full description of the problem/report:
> 
> I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
> well with two disks attached. Once I attached 2 additional disks (for
> a total of 4) and started heavy i/o (extending a software raid5 device),
> the system began locking up for a few minutes at a time. After the
> system recovers, the disk transfer speed is reduced from UDMA/100 to
> UDMA/66 or UDMA/44.

Last night I ran a simultaneous dd on all 4 drives attached to the Sil3124
card, like so:

dd if=/dev/sda of=/dev/null
dd if=/dev/sdb of=/dev/null
dd if=/dev/sdc of=/dev/null
dd if=/dev/sdd of=/dev/null
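If the four commands are typed one per line into a single shell, they run one after another. To get the simultaneous load described here from one shell, put each reader in the background and wait for all of them. This is a sketch. `parallel_read` is a hypothetical name, and `bs=1M` is an added assumption; the commands above use dd's default block size.

```bash
#!/usr/bin/env bash
# parallel_read (hypothetical helper): read several block devices (or files)
# at the same time, like the four simultaneous dd runs described above.
parallel_read() {
    local pids=() dev pid rc=0
    for dev in "$@"; do
        # bs=1M is an assumption; the original commands used dd's default.
        dd if="$dev" of=/dev/null bs=1M 2>/dev/null &
        pids+=("$!")
    done
    # Wait for every reader and remember whether any of them failed.
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

Run it as root: `parallel_read /dev/sda /dev/sdb /dev/sdc /dev/sdd`. It returns non-zero if any reader fails.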

Three times, the system temporarily locked up and then lowered the
transfer speed of the drives. They are now running at PIO4. What
might be causing this?

Here is the dmesg from when the problem occurred:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
  diagnostics: net 0cd8 media 8880 dma 00a0 fifo 
  Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
  Transmit list 01beaa20 vs. c1bea5c0.
  0: @c1bea200  length 85ea status 0c0005ea
  1: @c1bea2a0  length 85ea status 0c0005ea
  2: @c1bea340  length 85ea status 0c0005ea
  3: @c1bea3e0  length 85ea status 0c0005ea
  4: @c1bea480  length 85ea status 8c0005ea
  5: @c1bea520  length 85ea status 8c0005ea
  6: @c1bea5c0  length 85ea status 0c0105ea
  7: @c1bea660  length 85ea status 0c0105ea
  8: @c1bea700  length 85ea status 0c0105ea
  9: @c1bea7a0  length 85ea status 0c0105ea
  10: @c1bea840  length 85ea status 0c0105ea
  11: @c1bea8e0  length 85ea status 0c0105ea
  12: @c1bea980  length 85ea status 0c0105ea
  13: @c1beaa20  length 85ea status 0c0005ea
  14: @c1beaac0  length 85ea status 0c0005ea
  15: @c1beab60  length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
  diagnostics: net 0cd8 media 8880 dma 00a0 fifo 8000
  Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
  Transmit list 01bea5c0 vs. c1bea5c0.
  0: @c1bea200  length 85ea status 0c0005ea
  1: @c1bea2a0  length 85ea status 0c0005ea
  2: @c1bea340  length 85ea status 0c0005ea
  3: @c1bea3e0  length 85ea status 0c0005ea
  4: @c1bea480  length 85ea status 8c0005ea
  5: @c1bea520  length 85ea status 8c0005ea
  6: @c1bea5c0  length 85ea status 0c0105ea
  7: @c1bea660  length 85ea status 0c0105ea
  8: @c1bea700  length 85ea status 0c0105ea
  9: @c1bea7a0  length 85ea status 0c0105ea
  10: @c1bea840  length 85ea status 0c0105ea
  11: @c1bea8e0  length 85ea status 0c0105ea
  12: @c1bea980  length 85ea status 0c0105ea
  13: @c1beaa20  length 85ea status 0c0005ea
  14: @c1beaac0  length 85ea status 0c0005ea
  15: @c1beab60  length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
hda: dma_timer_expiry: dma status == 0x61
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e000.
  diagnostics: net 0cd8 media 8880 dma 00a0 fifo 8000
  Flags; bus-master 1, dirty 13609990(6) current 13610006(6)
  Transmit list 01bea5c0 vs. c1bea5c0.
  0: @c1bea200  length 85ea status 0c0005ea
  1: @c1bea2a0  length 85ea status 0c0005ea
  2: @c1bea340  length 85ea status 0c0005ea
  3: @c1bea3e0  length 85ea status 0c0005ea
  4: @c1bea480  length 85ea status 8c0005ea
  5: @c1bea520  length 85ea status 8c0005ea
  6: @c1bea5c0  length 85ea status 0c0105ea
  7: @c1bea660  length 85ea status 0c0105ea
  8: @c1bea700  length 85ea status 0c0105ea
  9: @c1bea7a0  length 85ea status 0c0105ea
  10: @c1bea840  length 85ea status 0c0105ea
  11: @c1bea8e0  length 85ea status 0c0105ea
  12: @c1bea980  length 85ea status 0c0105ea
  13: @c1beaa20  length 85ea status 0c0005ea
  14: @c1beaac0  length 85ea status 0c0005ea
  15: @c1beab60  length 85ea status 0c0005ea
eth0: Resetting the Tx ring pointer.
hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: DMA disabled
hda: no DRQ after issuing MULTWRITE_EXT
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata4.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: hard resetting port
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2: hard resetting port
ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen

PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-03 Thread Mark Wagner
[1.] One line summary of the problem:

sata_sil24 lockups under heavy i/o

[2.] Full description of the problem/report:

I have a PCI-based sata_sil24 card. It has 4 ports. It was functioning
well with two disks attached. Once I attached 2 additional disks (for
a total of 4) and started heavy i/o (extending a software raid5 device),
the system began locking up for a few minutes at a time. After the
system recovers, the disk transfer speed is reduced from UDMA/100 to
UDMA/66 or UDMA/44.
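Each time libata (re)configures a device, it logs the negotiated mode in lines of the form `ata1.00: configured for UDMA/100`. That means you can pull the most recent mode for each device out of dmesg output. This is a sketch. `current_xfer_modes` is a hypothetical name, and it only sees what is still in the kernel ring buffer.

```bash
#!/usr/bin/env bash
# current_xfer_modes (hypothetical): from dmesg-style text (files or stdin),
# print the last "configured for" mode reported for each libata device.
current_xfer_modes() {
    awk '
        / configured for / {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                    dev = $i; sub(/:$/, "", dev)
                    mode[dev] = $NF   # later lines overwrite earlier ones
                }
            }
        }
        END { for (d in mode) print d, mode[d] }
    ' "$@" | sort
}
```

For example, `dmesg | current_xfer_modes` should list each libata device with its last reported mode.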

[3.] Keywords (i.e., modules, networking, kernel):

libata sata_sil24

[4.] Kernel version (from /proc/version):

Linux version 2.6.19-gentoo-r2 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo
4.1.1-r3)) #1 Tue Dec 19 22:55:21 PST 2006

[5.] Most recent kernel version which did not have the bug:

Unknown.

[8.1.] Software (add the output of the ver_linux script here)

Linux cthulhu 2.6.19-gentoo-r2 #1 Tue Dec 19 22:55:21 PST 2006 i686 AMD
Athlon(tm) Processor AuthenticAMD GNU/Linux

Gnu C                  4.1.1
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
Linux C Library        > libc.2.4
Dynamic linker (ldd)   2.4
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.7
udev                   103
Modules Loaded         w83781d hwmon_vid lp usbhid 8250_pnp 8250
serial_core parport_pc pcspkr parport uhci_hcd via686a i2c_isa usbcore
i2c_viapro i2c_core

[8.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 4
cpu MHz         : 1410.226
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2822.62

[8.3.] Module information (from /proc/modules):

w83781d 28008 1 - Live 0xfa367000
hwmon_vid 2240 1 w83781d, Live 0xf887d000
lp 8452 0 - Live 0xfa24c000
usbhid 32288 1 - Live 0xfa257000
8250_pnp 8704 0 - Live 0xf883b000
8250 17252 1 8250_pnp, Live 0xf8851000
serial_core 14976 1 8250, Live 0xf884c000
parport_pc 28644 1 - Live 0xfa202000
pcspkr 2240 0 - Live 0xf883f000
parport 30600 2 lp,parport_pc, Live 0xf8872000
uhci_hcd 16776 0 - Live 0xf8822000
via686a 13320 0 - Live 0xf8841000
i2c_isa 3584 2 w83781d,via686a, Live 0xf8839000
usbcore 99524 4 usbhid,uhci_hcd, Live 0xf8858000
i2c_viapro 6932 0 - Live 0xf882d000
i2c_core 15952 4 w83781d,via686a,i2c_isa,i2c_viapro, Live 0xf8828000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
  03c0-03df : vesafb
03f6-03f6 : ide0
03f8-03ff : serial
0778-077a : parport0
0cf8-0cff : PCI conf1
7800-783f : 0000:00:11.0
  7800-7807 : ide2
  7808-780f : ide3
  7810-783f : PDC20265
8000-8003 : 0000:00:11.0
8400-8407 : 0000:00:11.0
8800-8803 : 0000:00:11.0
  8802-8802 : ide2
9000-9007 : 0000:00:11.0
  9000-9007 : ide2
9400-947f : 0000:00:0b.0
9800-980f : 0000:00:0a.0
  9800-980f : sata_sil24
a000-a0ff : 0000:00:09.1
  a000-a0ff : sym53c8xx
a400-a4ff : 0000:00:09.0
  a400-a4ff : sym53c8xx
d000-d01f : 0000:00:04.3
  d000-d01f : uhci_hcd
d400-d41f : 0000:00:04.2
  d400-d41f : uhci_hcd
d800-d80f : 0000:00:04.1
  d800-d807 : ide0
  d808-d80f : ide1
e200-e27f : 0000:00:04.4
e400-e47f : pnp 00:12
e800-e80f : 0000:00:04.4
  e800-e807 : vt596_smbus

-0009efff : System RAM
0009f000-0009 : reserved
000a-000b : Video RAM area
000c-000cf3ff : Video ROM
000d-000d27ff : Adapter ROM
000d4000-000d47ff : Adapter ROM
000d8000-000dbfff : Adapter ROM
000f-000f : System ROM
0010-3ffebfff : System RAM
  0010-002efefe : Kernel code
  002efeff-003acf43 : Kernel data
3ffec000-3ffeefff : ACPI Tables
3ffef000-3fffefff : reserved
3000-3fff : ACPI Non-volatile Storage
5000-5007 : :00:0a.0
5008-5009 : :00:0b.0
500a-500a : :00:09.0
500b-500b : :00:09.1
500c-500c : :00:11.0
d200-d201 : :00:11.0
d280-d280007f : :00:0b.0
d300-d3007fff : :00:0a.0
  d300-d3007fff : sata_sil24
d380-d380007f : :00:0a.0
  d380-d380007f : sata_sil24
d400-d4000fff : :00:09.1
  d400-d4000fff : sym53c8xx
d480-d48000ff : :00:09.1
  d480-d48000ff : sym53c8xx
d500-d5000fff : :00:09.0
  d500-d5000fff : sym53c8xx
d580-d58000ff : :00:09.0
  d580-d58000ff
