I have further characterized the error. It looks like, at least during the softraid rebuild process, most DMA commands are sent to the PCI card and then complete via an IRQ callback before the next command is sent. However, the problem I see here sometimes occurs when:
- Command for drive 1 is sent to the PCI card via DMA (sata_promise.c:pdc_packet_start)
- Command for drive 2 is sent to the PCI card via DMA before the previous command completes
- Command for drive 1 completes (sata_promise.c:pdc_host_intr)

Often the command for drive 2 will now time out.

I have also seen this same scenario complete successfully, either with a second IRQ just for the drive 2 command, or sometimes with a single IRQ that completes both commands.

I have a workaround using a semaphore that strictly serializes all commands (lock it in pdc_packet_start, unlock it in pdc_host_intr), so that no commands run concurrently, but this appears to have a large performance impact. At least it allows my softraid device to finish syncing to 100%.

I'm looking for other solutions, or a clue as to the actual cause of the error. My current theory is that if the second command is sent to the PCI card via DMA too soon, it may be overlooked, so some rate-limiting may be useful, if I can figure out how to implement it.

Any comments or suggestions here would be greatly appreciated, thanks!

-- 
Jim Ramsay
"Me fail English? That's unpossible!"
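For illustration, the serializing workaround described above can be modeled in userspace roughly as follows. This is a simplified sketch under stated assumptions, not the actual driver change: struct model_host, model_host_init, model_packet_start and model_host_intr are hypothetical names that only stand in for the semaphore, pdc_packet_start and pdc_host_intr, and the model returns false where the real workaround would presumably block on the semaphore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in sata_promise.c. */
struct model_host {
    int sem_count;     /* binary semaphore: 1 = free, 0 = a command is in flight */
    int in_flight;     /* commands currently handed to the simulated card */
    int max_in_flight; /* high-water mark, shows whether commands ever overlapped */
};

static void model_host_init(struct model_host *h)
{
    h->sem_count = 1;
    h->in_flight = 0;
    h->max_in_flight = 0;
}

/*
 * Stand-in for pdc_packet_start: take the semaphore, then issue the command.
 * Assumption: the real workaround would sleep here until the semaphore is
 * free; this model returns false instead so the caller can observe that the
 * command was held back.
 */
static bool model_packet_start(struct model_host *h)
{
    if (h->sem_count == 0)
        return false;           /* another command is still outstanding */
    h->sem_count = 0;           /* "lock" */
    h->in_flight++;
    if (h->in_flight > h->max_in_flight)
        h->max_in_flight = h->in_flight;
    return true;
}

/* Stand-in for pdc_host_intr: the command completed, release the semaphore. */
static void model_host_intr(struct model_host *h)
{
    assert(h->in_flight == 1);  /* serialization means exactly one is pending */
    h->in_flight--;
    h->sem_count = 1;           /* "unlock" */
}
```

In this model, a second call to model_packet_start made before model_host_intr runs is refused, so the drive 2 command can never be handed to the card while the drive 1 command is outstanding. That is also why throughput drops: at most one command is in flight across both drives at any time.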
