Re: [PATCH #upstream-fixes] libata: kill spurious NCQ completion detection

2007-12-07 Thread Mark Lord

Tejun Heo wrote:

Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI, receiving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.

For example, if an interrupt occurs while the interrupt handler is running
and the running handler handles the event the new IRQ indicated, then
after the IRQ handler finishes it will be executed again because the IRQ
pending bit was set by the new interrupt, but there won't be anything
to process.

...

Great job tracking that one down, Tejun!

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH #upstream-fixes] libata: kill spurious NCQ completion detection

2007-12-07 Thread Jeff Garzik

Tejun Heo wrote:

Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI, receiving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.

For example, if an interrupt occurs while the interrupt handler is running
and the running handler handles the event the new IRQ indicated, then
after the IRQ handler finishes it will be executed again because the IRQ
pending bit was set by the new interrupt, but there won't be anything
to process.

Please read the following message for more information.

  http://article.gmane.org/gmane.linux.ide/26012

This patch...

* Removes all spurious IRQ whining from ahci.  Spurious NCQ completion
  detection was completely wrong.  Spurious D2H Register FIS taught us
  that some early drives send spurious D2H Register FIS with I bit set
  while NCQ commands are in progress, but no recent drive does that,
  and even the ones which show such behavior can do NCQ fine.

* Kills all NCQ blacklist entries which were added because of spurious
  NCQ completions.  I tracked down each commit and verified that all
  removed entries were actually added because of spurious completions.

  WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
  not only had spurious NCQ completions but also is slow on sequential
  data transfers if NCQ is enabled.

  Maxtor 7V300F0 was added by 0e3dbc01d53940fe10e5a5cfec15ede3e929c918
  from Alan Cox.  Searching the mailing list, I can only find evidence
  that the drive had trouble with spurious completions.
  This entry needs to be verified and removed if it doesn't have other
  NCQ related problems.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
Cc: Alan Cox [EMAIL PROTECTED]
---
Alan, can you please check why 7V300F0 was added?  Thanks a lot.

 drivers/ata/ahci.c        |   74 +-
 drivers/ata/libata-core.c |   18 ---
 2 files changed, 4 insertions(+), 88 deletions(-)


applied




[PATCH #upstream-fixes] libata: kill spurious NCQ completion detection

2007-12-06 Thread Tejun Heo
Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI, receiving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.

For example, if an interrupt occurs while the interrupt handler is running
and the running handler handles the event the new IRQ indicated, then
after the IRQ handler finishes it will be executed again because the IRQ
pending bit was set by the new interrupt, but there won't be anything
to process.
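
The sequence described above can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not the ahci code:
struct fake_port, raise_event(), handle_irq() and run_demo() are
hypothetical names, and a single latched "pending" flag stands in for the
interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one status register plus one latched pending bit. */
struct fake_port {
	unsigned int irq_stat;	/* events not yet processed */
	bool irq_pending;	/* latched when the device raises an IRQ */
};

struct demo_result {
	int first_run;		/* events handled by the first handler run */
	int second_run;		/* events handled by the re-run */
	bool reran;		/* was the handler executed a second time? */
};

/* Device side: post an event and latch the interrupt. */
static void raise_event(struct fake_port *p, unsigned int bit)
{
	p->irq_stat |= bit;
	p->irq_pending = true;
}

/*
 * Handler side: returns how many events it processed.  A non-zero
 * late_event models an event that arrives after the handler has been
 * entered but before it reads the status register.
 */
static int handle_irq(struct fake_port *p, unsigned int late_event)
{
	unsigned int status;
	int handled = 0;

	assert(p != NULL);
	p->irq_pending = false;			/* pending bit consumed on entry */
	if (late_event)
		raise_event(p, late_event);	/* latches pending again */

	status = p->irq_stat;			/* read status ... */
	p->irq_stat &= ~status;			/* ... and clear what was read */

	while (status) {			/* "process" each set bit */
		status &= status - 1;
		handled++;
	}
	return handled;
}

struct demo_result run_demo(void)
{
	struct fake_port port = { 0, false };
	struct demo_result r = { 0, 0, false };

	raise_event(&port, 0x1);
	r.first_run = handle_irq(&port, 0x2);	/* also picks up the late event */
	if (port.irq_pending) {			/* still latched, so run again */
		r.reran = true;
		r.second_run = handle_irq(&port, 0);
	}
	return r;
}
```

In this model the first run processes both events, and the second run is
entered only because the pending flag was latched again; it finds nothing
to do, which is the expected and harmless spurious interrupt.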

Please read the following message for more information.

  http://article.gmane.org/gmane.linux.ide/26012

This patch...

* Removes all spurious IRQ whining from ahci.  Spurious NCQ completion
  detection was completely wrong.  Spurious D2H Register FIS taught us
  that some early drives send spurious D2H Register FIS with I bit set
  while NCQ commands are in progress, but no recent drive does that,
  and even the ones which show such behavior can do NCQ fine.

* Kills all NCQ blacklist entries which were added because of spurious
  NCQ completions.  I tracked down each commit and verified that all
  removed entries were actually added because of spurious completions.

  WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
  not only had spurious NCQ completions but also is slow on sequential
  data transfers if NCQ is enabled.

  Maxtor 7V300F0 was added by 0e3dbc01d53940fe10e5a5cfec15ede3e929c918
  from Alan Cox.  Searching the mailing list, I can only find evidence
  that the drive had trouble with spurious completions.
  This entry needs to be verified and removed if it doesn't have other
  NCQ related problems.
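
For readers unfamiliar with how a model-based NCQ blacklist works, here is
a minimal sketch.  It assumes a simple exact-match table; the struct
ncq_quirk_entry type, the ncq_quirks[] table and model_disables_ncq() are
hypothetical names for illustration and are not the libata data
structures, which this message does not show.  The two model strings are
the ones named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: a model string and a note on why NCQ is disabled. */
struct ncq_quirk_entry {
	const char *model;
	const char *reason;
};

/* Illustrative table holding only the two drives discussed above. */
static const struct ncq_quirk_entry ncq_quirks[] = {
	{ "WD740ADFD-00NLR1", "slow sequential transfers with NCQ" },
	{ "Maxtor 7V300F0",   "reason still to be verified" },
};

/* Returns true if the given model string has an entry in the table. */
bool model_disables_ncq(const char *model)
{
	size_t i;

	assert(model != NULL);
	for (i = 0; i < sizeof(ncq_quirks) / sizeof(ncq_quirks[0]); i++)
		if (strcmp(ncq_quirks[i].model, model) == 0)
			return true;
	return false;
}
```

With a table like this, dropping an entry that was added only because of
spurious completions simply means deleting its row, after which NCQ is
used for that model again.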

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
Cc: Alan Cox [EMAIL PROTECTED]
---
Alan, can you please check why 7V300F0 was added?  Thanks a lot.

 drivers/ata/ahci.c        |   74 +-
 drivers/ata/libata-core.c |   18 ---
 2 files changed, 4 insertions(+), 88 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 4688dbf..7ef497a 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1638,7 +1638,7 @@ static void ahci_port_intr(struct ata_port *ap)
 	struct ahci_host_priv *hpriv = ap->host->private_data;
 	int resetting = !!(ap->pflags & ATA_PFLAG_RESETTING);
 	u32 status, qc_active;
-	int rc, known_irq = 0;
+	int rc;
 
 	status = readl(port_mmio + PORT_IRQ_STAT);
 	writel(status, port_mmio + PORT_IRQ_STAT);
@@ -1696,80 +1696,12 @@ static void ahci_port_intr(struct ata_port *ap)
 
 	rc = ata_qc_complete_multiple(ap, qc_active, NULL);
 
-	/* If resetting, spurious or invalid completions are expected,
-	 * return unconditionally.
-	 */
-	if (resetting)
-		return;
-
-	if (rc > 0)
-		return;
-	if (rc < 0) {
+	/* while resetting, invalid completions are expected */
+	if (unlikely(rc < 0 && !resetting)) {
 		ehi->err_mask |= AC_ERR_HSM;
 		ehi->action |= ATA_EH_SOFTRESET;
 		ata_port_freeze(ap);
-		return;
-	}
-
-	/* hmmm... a spurious interrupt */
-
-	/* if !NCQ, ignore.  No modern ATA device has broken HSM
-	 * implementation for non-NCQ commands.
-	 */
-	if (!ap->link.sactive)
-		return;
-
-	if (status & PORT_IRQ_D2H_REG_FIS) {
-		if (!pp->ncq_saw_d2h)
-			ata_port_printk(ap, KERN_INFO,
-				"D2H reg with I during NCQ, "
-				"this message won't be printed again\n");
-		pp->ncq_saw_d2h = 1;
-		known_irq = 1;
-	}
-
-	if (status & PORT_IRQ_DMAS_FIS) {
-		if (!pp->ncq_saw_dmas)
-			ata_port_printk(ap, KERN_INFO,
-				"DMAS FIS during NCQ, "
-				"this message won't be printed again\n");
-		pp->ncq_saw_dmas = 1;
-		known_irq = 1;
 	}
-
-	if (status & PORT_IRQ_SDB_FIS) {
-		const __le32 *f = pp->rx_fis + RX_FIS_SDB;
-
-		if (le32_to_cpu(f[1])) {
-			/* SDB FIS containing spurious completions
-			 * might be dangerous, whine and fail commands
-			 * with HSM violation.  EH will turn off NCQ
-			 * after several such failures.
-			 */
-			ata_ehi_push_desc(ehi,
-				"spurious completions during NCQ "
-				"issue=0x%x SAct=0x%x FIS=%08x:%08x",
-				readl(port_mmio + PORT_CMD_ISSUE),
-				readl(port_mmio + PORT_SCR_ACT),
-				le32_to_cpu(f[0]), le32_to_cpu(f[1]));
-			ehi->err_mask |= AC_ERR_HSM;
-			ehi->action |= ATA_EH_SOFTRESET;
-			ata_port_freeze(ap);
-