Re: Disabling ADMA? (was Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61)

2007-07-07 Thread Robert Hancock

Jeff Garzik wrote:

Kuan Luo wrote:

@@ -1714,3 +2761,6 @@ module_init(nv_init);
 module_exit(nv_exit);
 module_param_named(adma, adma_enabled, bool, 0444);
 MODULE_PARM_DESC(adma, Enable use of ADMA (Default: true));
+module_param_named(ncq, ncq_enabled, bool, 0444);
+MODULE_PARM_DESC(ncq, Enable use of NCQ (Default: false));


After looking through sata_nv bug reports, I am leaning towards 
disabling ADMA by default, and wanted to solicit comments.


While admittedly not knowing the root cause, it seems like every current 
outstanding sata_nv bug report that remains after switching out hardware 
can be solved by setting module option 'adma' to zero.  That's my first 
suggestion upon any bugzilla sata_nv bug, and it usually works.  You can 
look through bugs assigned to or CC'd to [EMAIL PROTECTED] (kernel.org 
bugs) [EMAIL PROTECTED] (redhat.com bugs) for examples.


Do you have some specific bug numbers? Other than this one:

http://bugzilla.kernel.org/show_bug.cgi?id=8421

That one is hotplug-related, and it would seem only affects specific 
boards as it works fine for me. Likely NVIDIA input would be needed on 
that one to get much farther.


I didn't find any other sata_nv ADMA-related bugs in the kernel.org or 
RH Bugzillas.


Most of the reports I've gotten recently have been in one of these 
categories:


-broken NCQ implementation on the drive (usually shows up as commands 
timing out with CPB resp_flags of 2)

-cabling or power problems (often showing errors in the SError register)

There have been a few reports yet of problems with commands getting 
issued and seeing to go nowhere. These can be difficult to diagnose. 
However, in some cases there's evidence leaning toward this being a 
drive issue as well, ex: this thread:


http://groups.google.ca/group/linux.kernel/browse_frm/thread/2ebf329e3f6470eb/0d26dff9a30d29e9?lnk=gst

Here there's a report of this drive model doing this on multiple 
controllers.


Keep in mind, in the case where a broken NCQ implementation on the drive 
is the cause, disabling ADMA (and hence NCQ) will appear to fix the 
problem when the proper course of action would be to blacklist NCQ on 
that drive globally.


Given the relative scarcity of problem reports (at least those I've 
seen), and the fact that this is not obscure hardware (they must have 
made millions of these nForce4 boards, and I think some may even still 
be in production), I'm reluctant to say we should be switching ADMA off 
by default.




I still need to review the SWNCQ patch in detail, but I presume it is 
possible to still use SWNCQ without ADMA?


Yes, they're independent.



On a side note, I would rather default SWNCQ to 'on'.



-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-05-18 Thread Kuan Luo
Thanks for your comment, see the explaination inline.
We'll apply your advice in later patch.

-Original Message-
From: Robert Hancock [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 18, 2007 9:48 AM
To: Peer Chen
Cc: [EMAIL PROTECTED]; Jeff Garzik;
[EMAIL PROTECTED]; Kuan Luo; linux-ide@vger.kernel.org
Subject: Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for
MCP51/MCP55/MCP61

Peer Chen wrote:
  Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA
 controller.
 
 This patch base on sata_nv.c file from kernel 2.6.22-rc1
 
 See attachment for the patch.
 
 Signed-off-by: Kuan Luo [EMAIL PROTECTED]
 Signed-off-by: Peer Chen [EMAIL PROTECTED]

Good to finally see this come out. I've pasted the code below (indented)

in order to make some comments:

--- linux-2.6.22-rc1/drivers/ata/sata_nv.c.orig 2007-05-17 
14:48:26.0 -0400
+++ linux-2.6.22-rc1/drivers/ata/sata_nv.c  2007-05-17 
17:07:28.0 -0400
@@ -46,6 +46,8 @@
 #include linux/device.h
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
+#include scsi/scsi.h
+#include scsi/scsi_cmnd.h
 #include linux/libata.h

 #define DRV_NAME   sata_nv
@@ -169,6 +171,36 @@ enum {
NV_ADMA_PORT_REGISTER_MODE  = (1  0),
NV_ADMA_ATAPI_SETUP_COMPLETE= (1  1),

+   /* MCP55 reg offset */
+   NV_CTL_MCP55= 0x400,
+   NV_INT_STATUS_MCP55 = 0x440,
+   NV_INT_ENABLE_MCP55 = 0x444,
+   NV_NCQ_REG_MCP55= 0x448,
+   NV_CH1_SACTIVE_MCP55= 0x0C,
+   
+   /* MCP55 */
+   NV_INT_ALL_MCP55= 0x,
+   NV_INT_PORT_SHIFT_MCP55 = 16,   /* each port
occupies 16 bits */
+   NV_INT_MASK_MCP55   = NV_INT_ALL_MCP55 
0xfffd,
+   
+   /* NCQ ENABLE BITS*/
+   NV_CTL_PRI_SWNCQ= 0x02,
+   NV_CTL_SEC_SWNCQ= 0x04,
+   
+   /* MCP55 status bits*/
+   NV_INT_DEV_MCP55= 0x01,
+   NV_INT_PM_MCP55 = 0x02,
+   NV_INT_ADDED_MCP55  = 0x04,
+   NV_INT_REMOVED_MCP55= 0x08,
+   
+   NV_INT_BACKOUT_MCP55= 0x10,
+   NV_INT_SDBFIS_MCP55 = 0x20,
+   NV_INT_DHREGFIS_MCP55   = 0x40,
+   NV_INT_DMASETUP_MCP55   = 0x80,
+   
+   NV_INT_HOTPLUG_MCP55= (NV_INT_ADDED_MCP55 |
+
NV_INT_REMOVED_MCP55),
+
 };

 /* ADMA Physical Region Descriptor - one SG segment */
@@ -264,13 +296,118 @@ static void nv_adma_host_stop(struct ata
 static void nv_adma_post_internal_cmd(struct ata_queued_cmd
*qc);
 static void nv_adma_tf_read(struct ata_port *ap, struct
ata_taskfile *tf);

+static void ncq_error_handler(struct ata_port *ap);
+static void nv_mcp55_thaw(struct ata_port *ap);
+static void nv_mcp55_freeze(struct ata_port *ap);
+static void ncq_host_init(struct ata_host *host);
+static void nv_bmdma_stop(struct ata_port *ap);
+static int nv_std_qc_defer(struct ata_port *ap);
+static int  nv_port_start(struct ata_port *ap);
+static void nv_port_stop(struct ata_port *ap);
+static void ncq_clear(struct ata_port *ap);
+static void nv_qc_prep(struct ata_queued_cmd *qc);
+static void nv_fill_sg(struct ata_queued_cmd *qc);
+static void ncq_sactive_start (struct ata_queued_cmd *qc);
+static u32 ncq_sactive_value (struct ata_port *ap);
+static unsigned int nv_qc_issue_prot(struct ata_queued_cmd
*qc);
+static u32 ncq_tag_value(struct ata_port *ap);
+static int nv_ncqintr_sdbfis(struct ata_port *ap);
+static int nv_ncqintr_dmasetupfis(struct ata_port *ap);
+static void ncq_clear_singlefis(struct ata_port *ap, u32 val);
+static u32 ncq_ownfisintr_value (struct ata_port *ap);
+void ncq_hotplug(struct ata_port *ap, u32 fis);
+static irqreturn_t nv_mcp55_interrupt(int irq, void
*dev_instance);
+static int ncq_interrupt(struct ata_port *ap, u32 fis);
+static int nv_scsi_queuecmd(struct scsi_cmnd *cmd,
+   void (*done)(struct scsi_cmnd *));

These functions should use mcp51 or swncq or something in the name 
instead of ncq, since the latter implies it may be related to ADMA as 
well.
[kuan]:
I will use a better name for these function.
+
+#undef NCQ_DEBUG
+#undef NCQ_VERBOSE_DEBUG
+#ifdef NCQ_DEBUG
+#define NPRINTK(fmt, args...) printk(KERN_ERR %s:  fmt, 
__FUNCTION__, 

Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-05-18 Thread Robert Hancock

Kuan Luo wrote:

Thanks for your comment, see the explaination inline.
We'll apply your advice in later patch.


...

Please don't duplicate this code in the driver, this is part of libata 
core in libata-scsi.c. Add an export for these functions if you need to 
use them in the driver.

[kuan]: These calls are declared in static type . I can't export them
and don't want to
modify libata code.


If you really need these functions then they should be made non-static 
and exported. Duplicating the code is not a solution.


I'm not so sure you actually need all that, though. I suspect you can 
likely handle the deferring of commands if you detect an FPDMA data 
phase inside the qc_issue function only (like you already do in some 
cases) instead of having to mess with deferring them at the SCSI layer.


I'm still puzzling out how this stuff all works, but it looks like this 
code makes you stop sending new commands if:


-the port is in the FPDMA Data Phase (DMA Setup FIS received but the 
transfer is not complete yet) - I assume the hardware doesn't handle 
this itself, which seems rather unique
-we previously deferred a command inside of qc_issue because we were in 
the FPDMA data phase
-we previously saw dhfis_flags not equal to qc_active, or we got a 
BACKOUT interrupt (whatever exactly that means), both of which set some 
value in the back_byte
[kuan]: 
-If we got BACKOUT interrupt, it means that a command just sent by

driver
backed out.The driver should resend the command.So new commands should
be defered.
-If dhfis_flags != qc_active, it indicates that the last command doesn't
generate a device to host register FIS .
After sending some commands, I found that the last command sometimes has
this problem 
but previous commands are normal.In this case, we need resend the last

command.
Both cases set back_byte. 


The case where the command didn't generate a D2H FIS should likely be 
investigated further, otherwise we don't necessarily know that this 
workaround will work in all cases?


This code seems a bit odd. Isn't this tossing out a bunch of potential 
error status, etc?

[kuan]:
If there are commands in queue, the driver can  send  a new command 
only after receiving dhfis intr of previous command and before receiving

any dmasetup fis intr.
In this place,  i do the last check before sending the command.


But the D2H FIS can contain an error indication, correct? If that 
happens here it won't detect this. In this situation error handling 
should be triggered.



+   done_mask = pp-qc_active ^ sactive;
+   if (unlikely(done_mask  sactive)) {
+   ata_port_printk(ap, KERN_ERR, illegal qc_active
transition 
+   (%08x-%08x)\n, ap-qc_active,
sactive);
+   return -EINVAL;
+   }   

Shouldn't this trigger error handling if it happens instead of just 
printing an error?

[kuan]:
I think the error handling can be triggered by timeout. In fact, 
this case should seldom happen.


There have been reports of some drives with bad NCQ implementations that 
return completion status for commands that were not issued. If we detect 
this case we should raise an HSM violation which will disable NCQ on 
this drive if it happens repeatedly. See the code in ahci.c in 
ahci_host_intr.


This comment still applies:


Additional/general comments:

Think you need some code to handle suspend and resume (re-enable SATA 
MMIO space, etc.)


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-05-17 Thread Robert Hancock

Peer Chen wrote:

 Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA
controller.

This patch base on sata_nv.c file from kernel 2.6.22-rc1

See attachment for the patch.

Signed-off-by: Kuan Luo [EMAIL PROTECTED]
Signed-off-by: Peer Chen [EMAIL PROTECTED]


Good to finally see this come out. I've pasted the code below (indented) 
in order to make some comments:


	--- linux-2.6.22-rc1/drivers/ata/sata_nv.c.orig	2007-05-17 
14:48:26.0 -0400
	+++ linux-2.6.22-rc1/drivers/ata/sata_nv.c	2007-05-17 
17:07:28.0 -0400

@@ -46,6 +46,8 @@
 #include linux/device.h
 #include scsi/scsi_host.h
 #include scsi/scsi_device.h
+#include scsi/scsi.h
+#include scsi/scsi_cmnd.h
 #include linux/libata.h

 #define DRV_NAME   sata_nv
@@ -169,6 +171,36 @@ enum {
NV_ADMA_PORT_REGISTER_MODE  = (1  0),
NV_ADMA_ATAPI_SETUP_COMPLETE= (1  1),

+   /* MCP55 reg offset */
+   NV_CTL_MCP55= 0x400,
+   NV_INT_STATUS_MCP55 = 0x440,
+   NV_INT_ENABLE_MCP55 = 0x444,
+   NV_NCQ_REG_MCP55= 0x448,
+   NV_CH1_SACTIVE_MCP55= 0x0C,
+   
+   /* MCP55 */
+   NV_INT_ALL_MCP55= 0x,
+   NV_INT_PORT_SHIFT_MCP55 = 16,   /* each port occupies 
16 bits */
+   NV_INT_MASK_MCP55   = NV_INT_ALL_MCP55  0xfffd,
+   
+   /* NCQ ENABLE BITS*/
+   NV_CTL_PRI_SWNCQ= 0x02,
+   NV_CTL_SEC_SWNCQ= 0x04,
+   
+   /* MCP55 status bits*/
+   NV_INT_DEV_MCP55= 0x01,
+   NV_INT_PM_MCP55 = 0x02,
+   NV_INT_ADDED_MCP55  = 0x04,
+   NV_INT_REMOVED_MCP55= 0x08,
+   
+   NV_INT_BACKOUT_MCP55= 0x10,
+   NV_INT_SDBFIS_MCP55 = 0x20,
+   NV_INT_DHREGFIS_MCP55   = 0x40,
+   NV_INT_DMASETUP_MCP55   = 0x80,
+   
+   NV_INT_HOTPLUG_MCP55= (NV_INT_ADDED_MCP55 |
+   NV_INT_REMOVED_MCP55),
+
 };

 /* ADMA Physical Region Descriptor - one SG segment */
@@ -264,13 +296,118 @@ static void nv_adma_host_stop(struct ata
 static void nv_adma_post_internal_cmd(struct ata_queued_cmd *qc);
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile 
*tf);

+static void ncq_error_handler(struct ata_port *ap);
+static void nv_mcp55_thaw(struct ata_port *ap);
+static void nv_mcp55_freeze(struct ata_port *ap);
+static void ncq_host_init(struct ata_host *host);
+static void nv_bmdma_stop(struct ata_port *ap);
+static int nv_std_qc_defer(struct ata_port *ap);
+static int  nv_port_start(struct ata_port *ap);
+static void nv_port_stop(struct ata_port *ap);
+static void ncq_clear(struct ata_port *ap);
+static void nv_qc_prep(struct ata_queued_cmd *qc);
+static void nv_fill_sg(struct ata_queued_cmd *qc);
+static void ncq_sactive_start (struct ata_queued_cmd *qc);
+static u32 ncq_sactive_value (struct ata_port *ap);
+static unsigned int nv_qc_issue_prot(struct ata_queued_cmd *qc);
+static u32 ncq_tag_value(struct ata_port *ap);
+static int nv_ncqintr_sdbfis(struct ata_port *ap);
+static int nv_ncqintr_dmasetupfis(struct ata_port *ap);
+static void ncq_clear_singlefis(struct ata_port *ap, u32 val);
+static u32 ncq_ownfisintr_value (struct ata_port *ap);
+void ncq_hotplug(struct ata_port *ap, u32 fis);
+static irqreturn_t nv_mcp55_interrupt(int irq, void *dev_instance);
+static int ncq_interrupt(struct ata_port *ap, u32 fis);
+static int nv_scsi_queuecmd(struct scsi_cmnd *cmd,
+   void (*done)(struct scsi_cmnd *));

These functions should use mcp51 or swncq or something in the name 
instead of ncq, since the latter implies it may be related to ADMA as 
well.


+
+#undef NCQ_DEBUG
+#undef NCQ_VERBOSE_DEBUG
+#ifdef NCQ_DEBUG
	+#define NPRINTK(fmt, args...) printk(KERN_ERR %s:  fmt, 
__FUNCTION__, ## args)

+#ifdef NCQ_VERBOSE_DEBUG
	+#define NVPRINTK(fmt, args...) printk(KERN_ERR %s:  fmt, 
__FUNCTION__, ## args)

+#else
+#define NVPRINTK(fmt, args...) do { } while(0)
+#endif /* NCQ_VERBOSE_DEBUG */
+#else
+#define NPRINTK(fmt, args...) do { } while(0)
+#define NVPRINTK(fmt, args...) do { } while(0)
+#endif

We don't need these private helper macros, just use