Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
Ric Wheeler wrote:
>
> Mark Lord wrote:
>
>> Eric D. Mudama wrote:
>>
>>> Actually, it's possibly worse, since each failure in libata will
>>> generate 3-4 retries.  With existing ATA error recovery in the
>>> drives, that's about 3 seconds per retry on average, or 12 seconds
>>> per failure.  Multiply that by the number of blocks past the error to
>>> complete the request..
>>
>> It really beats the alternative of a forced reboot
>> due to, say, superblock I/O failing because it happened
>> to get merged with an unrelated I/O which then failed..
>> Etc..
>>
>> Definitely an improvement.
>>
>> The number of retries is an entirely separate issue.
>> If we really care about it, then we should fix SD_MAX_RETRIES.
>>
>> The current value of 5 is *way* too high.  It should be zero or one.
>>
>> Cheers
>>
> I think that drives retry enough, we should leave retry at zero for
> normal (non-removable) drives. Should this be a policy we can set like
> we do with NCQ queue depth via /sys ?

The transport might also want a say. I see ABORTED COMMAND
errors often enough with SAS (e.g. due to expander congestion)
to warrant at least one retry (which works in my testing).
SATA disks behind SAS infrastructure would also be
susceptible to the same "random" failures.

Transport Layer Retries (TLR) in SAS should remove this class
of transport errors but only SAS tape drives support TLR as
far as I know.

Doug Gilbert

> We need to be able to layer things like MD on top of normal drive errors
> in a way that will produce a system that provides reasonable response
> time despite any possible IO error on a single component.  Another case
> that we end up doing on a regular basis is drive recovery.  Errors need
> to be limited in scope to just the impacted area and dispatched up to
> the application layer as quickly as we can so that you don't spend days
> watching a copy of huge drive (think 750GB or more) ;-)
>
> ric

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote:
> Mark Lord wrote:
> > The number of retries is an entirely separate issue.
> > If we really care about it, then we should fix SD_MAX_RETRIES.
> >
> > The current value of 5 is *way* too high.  It should be zero or one.
> >
> > Cheers
> >
> I think that drives retry enough, we should leave retry at zero for
> normal (non-removable) drives. Should this be a policy we can set like
> we do with NCQ queue depth via /sys ?

I don't disagree that it should be settable.  However, retries occur for
other reasons than failures inside the device.  The most standard ones
are unit attentions generated because of other activity (target reset
etc).  The key to the problem is retrying only operations that are
genuinely retryable, which the mid-layer doesn't do such a good job on.

> We need to be able to layer things like MD on top of normal drive errors
> in a way that will produce a system that provides reasonable response
> time despite any possible IO error on a single component.  Another case
> that we end up doing on a regular basis is drive recovery.  Errors need
> to be limited in scope to just the impacted area and dispatched up to
> the application layer as quickly as we can so that you don't spend days
> watching a copy of huge drive (think 750GB or more) ;-)

For the MD case, this is what REQ_FAILFAST is for.

James

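The distinction James draws, retrying only what is genuinely retryable, can be written down as a small decision function. The sketch below is an assumed policy for illustration, not what the mid-layer does: `should_retry`, its parameters and the `SK_` names are hypothetical, and `failfast` only loosely models a request marked REQ_FAILFAST. The numeric sense key values are the standard SCSI ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard SCSI sense key values (subset). */
enum sense_key {
	SK_NO_SENSE        = 0x0,
	SK_RECOVERED_ERROR = 0x1,
	SK_NOT_READY       = 0x2,
	SK_MEDIUM_ERROR    = 0x3,
	SK_HARDWARE_ERROR  = 0x4,
	SK_ILLEGAL_REQUEST = 0x5,
	SK_UNIT_ATTENTION  = 0x6,
	SK_ABORTED_COMMAND = 0xb,
};

/*
 * Hypothetical policy, not the kernel's: decide whether a failed
 * command should be reissued.  `failfast` models a request whose
 * submitter asked for errors to be reported without retries.
 */
bool should_retry(enum sense_key key, bool failfast,
		  int retries_done, int max_retries)
{
	assert(retries_done >= 0 && max_retries >= 0);

	if (failfast || retries_done >= max_retries)
		return false;

	switch (key) {
	case SK_UNIT_ATTENTION:		/* e.g. after a target reset */
	case SK_ABORTED_COMMAND:	/* e.g. transient transport trouble */
		return true;
	default:			/* the drive already retried internally */
		return false;
	}
}
```

Under this assumed policy a MEDIUM ERROR is never reissued, while a unit attention or an aborted command gets up to `max_retries` further attempts unless the request is fail-fast.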
Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
Mark Lord wrote:
> Eric D. Mudama wrote:
>> Actually, it's possibly worse, since each failure in libata will
>> generate 3-4 retries.  With existing ATA error recovery in the
>> drives, that's about 3 seconds per retry on average, or 12 seconds
>> per failure.  Multiply that by the number of blocks past the error to
>> complete the request..
>
> It really beats the alternative of a forced reboot
> due to, say, superblock I/O failing because it happened
> to get merged with an unrelated I/O which then failed..
> Etc..
>
> Definitely an improvement.
>
> The number of retries is an entirely separate issue.
> If we really care about it, then we should fix SD_MAX_RETRIES.
>
> The current value of 5 is *way* too high.  It should be zero or one.
>
> Cheers

I think that drives retry enough; we should leave retry at zero for
normal (non-removable) drives.  Should this be a policy we can set via
/sys, like we do with NCQ queue depth?

We need to be able to layer things like MD on top of normal drive errors
in a way that will produce a system that provides reasonable response
time despite any possible IO error on a single component.  Another case
that we end up doing on a regular basis is drive recovery.  Errors need
to be limited in scope to just the impacted area and dispatched up to
the application layer as quickly as we can so that you don't spend days
watching a copy of a huge drive (think 750GB or more) ;-)

ric

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
James Bottomley wrote:
> First off, please send SCSI patches to the SCSI list:

Fixed already, thanks!

>> This patch fixes the behaviour to be similar to what we had originally.
>>
>> When a bad sector is encounted, SCSI will now work around it again,
>> failing *only* the bad sector itself.
>
> Erm, but the corollary is that if we get a large read failure because of
> a bad track, you're going to try and chunk up it a sector at a time

That's better than the huge data-loss scenario that we currently have
for single-sector errors.  MUCH better.

> forcing an individual error for each sector is going to annoy some
> people ... particularly removable medium ones which return this error if
> the medium isn't present ... Are you sure this is really what we want to
> do?

No, for removed-medium everything just fails right away.
This patch is *only* for media errors, not any other failures.

Cheers

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
First off, please send SCSI patches to the SCSI list:

On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote:
> In ancient kernels, the SCSI disk code used to continue after
> encountering a MEDIUM_ERROR.  It would "complete" the good
> sectors before the error, fail the bad sector/block, and then
> continue with the rest of the request.
>
> Kernels since about 2.6.16 or so have been broken in this regard.
> They "complete" the good sectors before the error,
> and then fail the entire remaining portions of the request.

What was the commit that introduced the change? ... I have a vague
memory of it being deliberate.

> This is very risky behaviour, as a request is often a merge
> of several bios, and just because one application hits a bad sector
> is no reason to pretend that (for example) an adjacent directly lookup also
> failed.
>
> This patch fixes the behaviour to be similar to what we had originally.
>
> When a bad sector is encounted, SCSI will now work around it again,
> failing *only* the bad sector itself.

Erm, but the corollary is that if we get a large read failure because of
a bad track, you're going to try and chunk up it a sector at a time
forcing an individual error for each sector is going to annoy some
people ... particularly removable medium ones which return this error if
the medium isn't present ... Are you sure this is really what we want to
do?

> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> ---
> diff -u --recursive --new-file --exclude-from=linux_17//Documentation/dontdiff old/drivers/scsi/scsi_lib.c linux/drivers/scsi/scsi_lib.c
> --- old/drivers/scsi/scsi_lib.c 2007-01-30 13:58:05.0 -0500
> +++ linux/drivers/scsi/scsi_lib.c 2007-01-30 18:30:01.0 -0500
> @@ -865,6 +865,12 @@
>  	 */
>  	if (sense_valid && !sense_deferred) {
>  		switch (sshdr.sense_key) {
> +		case MEDIUM_ERROR:
> +			// Bad sector.  Fail it, and then continue the rest of the request:
> +			if (scsi_end_request(cmd, 0, cmd->device->sector_size, 1) == NULL) {

The sense key may have come with additional information I think we want
to parse that (if it exists) rather than just blindly failing the first
sector of the request.

> +				cmd->retries = 0;	// go around again..
> +				return;
> +			}

This would drop through to the UNIT_ATTENTION case if scsi_end_request()
fails ... I don't think that's correct.

>  		case UNIT_ATTENTION:
>  			if (cmd->device->removable) {
>  				/* Detected disc change.  Set a bit

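The fall-through James describes can be seen in a standalone toy model of the switch. This is a sketch, not kernel code: `handle_sense`, the `enum action` values and the `add_break` flag are hypothetical names, and `end_request_returned_null` simply stands in for the outcome of the `scsi_end_request(...) == NULL` test in the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum { MEDIUM_ERROR = 0x3, UNIT_ATTENTION = 0x6 };

enum action { ACT_CONTINUE_REQUEST, ACT_MEDIA_CHANGED, ACT_FAIL_REQUEST };

/*
 * Toy model of the patched switch.  With add_break == false it mirrors
 * the posted patch: when the MEDIUM_ERROR branch does not return,
 * control runs on into the UNIT_ATTENTION case.
 */
enum action handle_sense(int sense_key, bool end_request_returned_null,
			 bool removable, bool add_break)
{
	/* The toy only models these two sense keys. */
	assert(sense_key == MEDIUM_ERROR || sense_key == UNIT_ATTENTION);

	switch (sense_key) {
	case MEDIUM_ERROR:
		if (end_request_returned_null)
			return ACT_CONTINUE_REQUEST;
		if (add_break)
			break;
		/* fall through - as in the posted patch */
	case UNIT_ATTENTION:
		if (removable)
			return ACT_MEDIA_CHANGED;
		break;
	}
	return ACT_FAIL_REQUEST;
}
```

In this model, a medium error on a removable device that does not take the early return is handled as a media change unless the `break` is added, which is the behaviour the review questions.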
Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
Eric D. Mudama wrote:
> Actually, it's possibly worse, since each failure in libata will
> generate 3-4 retries.  With existing ATA error recovery in the
> drives, that's about 3 seconds per retry on average, or 12 seconds
> per failure.  Multiply that by the number of blocks past the error to
> complete the request..

It really beats the alternative of a forced reboot
due to, say, superblock I/O failing because it happened
to get merged with an unrelated I/O which then failed..
Etc..

Definitely an improvement.

The number of retries is an entirely separate issue.
If we really care about it, then we should fix SD_MAX_RETRIES.

The current value of 5 is *way* too high.  It should be zero or one.

Cheers

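Eric's estimate is plain multiplication, and a small helper makes the scale concrete. The sketch below is only an illustration: the function name and the 256-sector tail used in the note are assumptions, while the 3-4 retries and roughly 3 seconds per retry come from his message. It models the worst case, where every remaining sector fails.

```c
#include <assert.h>

/*
 * Worst-case seconds spent on the tail of a request when every
 * remaining sector fails and each failure is retried.
 */
unsigned long worst_case_seconds(unsigned int retries_per_failure,
				 unsigned int seconds_per_retry,
				 unsigned long sectors_remaining)
{
	assert(seconds_per_retry > 0);

	return (unsigned long)retries_per_failure * seconds_per_retry *
	       sectors_remaining;
}
```

With 4 retries at 3 seconds each, one failing sector costs 12 seconds; a hypothetical 256-sector tail in which every sector fails would take 3072 seconds, a little over 51 minutes.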
Re: AIC7xxx on 2.6.18
NOTE: I am not on the linux-scsi list, keep me in CC.

Andrew Morton wrote:
> On Tue, 30 Jan 2007 07:18:20 -0500
> Wakko Warner <[EMAIL PROTECTED]> wrote:
> > Andrew Morton wrote:
> > > Yes, getting the oops traces will help, thanks.  And confirmation on a
> > > more recent kernel would be good.
> >
> > I tested with a 2.6.20-rc6 kernel and the MAC 39160 card.  There was no oops
> > and I was able to access the 2 disks.  This was on a different PC though.
> > I'll try it again on the original PC.
>
> Thanks.

The PC was a completely different PC when I tried it that time.  This time,
I tried it on a similar PC (same motherboard model, but not the exact same
machine).  I had no problems with 2.6.18.  I looked a little closer and
noticed that the original machine was actually overclocked.  I did the same
to the machine that works and it is now not working.

So the problem with the mac card seems to be the overclocking.  I had
completely forgotten about it since it was a test machine anyway.

So this just leaves the problem I've experienced on the machine with the PC
u160 and the u/uw dual card.

> > Should I try 2.6.19 as well?
>
> There's not a lot of point in doing so.  If/when we come up with a
> 2.6.20-rc6 fix we'll know whether it is applicable to 2.6.19.x.

I'll try 2.6.19 on the machine with the 2 scsi cards with the option roms
disabled.  I'd rather not run a -rc kernel on this machine.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???

[PATCH] RESEND scsi_lib.c: continue after MEDIUM_ERROR
Fixed for 80-columns, and copying linux-scsi this time.

In ancient kernels, the SCSI disk code used to continue after
encountering a MEDIUM_ERROR.  It would "complete" the good
sectors before the error, fail the bad sector/block, and then
continue with the rest of the request.

Kernels since about 2.6.16 or so have been broken in this regard.
They "complete" the good sectors before the error,
and then fail the entire remaining portion of the request.

This is very risky behaviour, as a request is often a merge
of several bios, and just because one application hits a bad sector
is no reason to pretend that (for example) an adjacent directory lookup
also failed.

This patch fixes the behaviour to be similar to what we had originally.

When a bad sector is encountered, SCSI will now work around it again,
failing *only* the bad sector itself.

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---
--- old/drivers/scsi/scsi_lib.c 2007-01-30 20:06:15.0 -0500
+++ linux/drivers/scsi/scsi_lib.c 2007-01-30 20:06:59.0 -0500
@@ -865,6 +865,13 @@
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+			/* Bad sector.  Fail it, and continue on with the rest */
+			if (scsi_end_request(cmd, 0,
+					cmd->device->sector_size, 1) == NULL) {
+				cmd->retries = 0;	/* go around again.. */
+				return;
+			}
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
 				/* Detected disc change.  Set a bit

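The difference between the two behaviours described in this changelog can be modelled outside the kernel. The sketch below is a userspace illustration under assumed names (`complete_fail_remainder`, `complete_skip_bad`, `enum sector_result`); it treats a request as an array of sectors flagged good or bad and is not the SCSI completion path itself.

```c
#include <assert.h>
#include <stddef.h>

enum sector_result { SECTOR_OK, SECTOR_FAILED };

/*
 * Behaviour described for kernels since about 2.6.16: complete the
 * good sectors before the first error, then fail everything after it.
 */
void complete_fail_remainder(const int *is_bad, enum sector_result *out,
			     size_t n)
{
	size_t i = 0;

	assert(is_bad != NULL && out != NULL);
	while (i < n && !is_bad[i])
		out[i++] = SECTOR_OK;
	while (i < n)
		out[i++] = SECTOR_FAILED;
}

/*
 * Behaviour the patch aims for: fail only the bad sectors and carry
 * on with the rest of the request.
 */
void complete_skip_bad(const int *is_bad, enum sector_result *out,
		       size_t n)
{
	size_t i;

	assert(is_bad != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = is_bad[i] ? SECTOR_FAILED : SECTOR_OK;
}
```

For a five-sector request with only the third sector bad, the first function fails sectors three to five, while the second fails just the third; the sectors after it may well belong to an unrelated bio that was merged into the same request.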
[PATCH 1/1] scsi: Fix lost EH commands
If an EH command times out today, the LLDD's abort handler will
be called to abort the command. It is assumed that this completes
successfully, which can result in the command getting completed
later resulting in an oops. Improve the current implementation
by escalating all the way to host reset if necessary in order
to clean up the EH command.

Signed-off-by: Brian King <[EMAIL PROTECTED]>
---

 linux-2.6-bjking1/drivers/scsi/scsi_error.c | 239 ++--
 1 files changed, 123 insertions(+), 116 deletions(-)

diff -puN drivers/scsi/scsi_error.c~scsi_fix_eh_lost_cmds drivers/scsi/scsi_error.c
--- linux-2.6/drivers/scsi/scsi_error.c~scsi_fix_eh_lost_cmds 2007-01-12 15:42:11.0 -0600
+++ linux-2.6-bjking1/drivers/scsi/scsi_error.c 2007-01-12 15:42:11.0 -0600
@@ -453,6 +453,128 @@ static void scsi_eh_done(struct scsi_cmn
 }
 
 /**
+ * scsi_try_host_reset - ask host adapter to reset itself
+ * @scmd:	SCSI cmd to send hsot reset.
+ **/
+static int scsi_try_host_reset(struct scsi_cmnd *scmd)
+{
+	unsigned long flags;
+	int rtn;
+
+	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Host RST\n",
+					  __FUNCTION__));
+
+	if (!scmd->device->host->hostt->eh_host_reset_handler)
+		return FAILED;
+
+	rtn = scmd->device->host->hostt->eh_host_reset_handler(scmd);
+
+	if (rtn == SUCCESS) {
+		if (!scmd->device->host->hostt->skip_settle_delay)
+			ssleep(HOST_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
+		scsi_report_bus_reset(scmd->device->host,
+				      scmd_channel(scmd));
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
+	}
+
+	return rtn;
+}
+
+/**
+ * scsi_try_bus_reset - ask host to perform a bus reset
+ * @scmd:	SCSI cmd to send bus reset.
+ **/
+static int scsi_try_bus_reset(struct scsi_cmnd *scmd)
+{
+	unsigned long flags;
+	int rtn;
+
+	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Snd Bus RST\n",
+					  __FUNCTION__));
+
+	if (!scmd->device->host->hostt->eh_bus_reset_handler)
+		return FAILED;
+
+	rtn = scmd->device->host->hostt->eh_bus_reset_handler(scmd);
+
+	if (rtn == SUCCESS) {
+		if (!scmd->device->host->hostt->skip_settle_delay)
+			ssleep(BUS_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
+		scsi_report_bus_reset(scmd->device->host,
+				      scmd_channel(scmd));
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
+	}
+
+	return rtn;
+}
+
+/**
+ * scsi_try_bus_device_reset - Ask host to perform a BDR on a dev
+ * @scmd:	SCSI cmd used to send BDR
+ *
+ * Notes:
+ *    There is no timeout for this operation.  if this operation is
+ *    unreliable for a given host, then the host itself needs to put a
+ *    timer on it, and set the host back to a consistent state prior to
+ *    returning.
+ **/
+static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
+{
+	int rtn;
+
+	if (!scmd->device->host->hostt->eh_device_reset_handler)
+		return FAILED;
+
+	rtn = scmd->device->host->hostt->eh_device_reset_handler(scmd);
+	if (rtn == SUCCESS) {
+		scmd->device->was_reset = 1;
+		scmd->device->expecting_cc_ua = 1;
+	}
+
+	return rtn;
+}
+
+static int __scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
+{
+	if (!scmd->device->host->hostt->eh_abort_handler)
+		return FAILED;
+
+	return scmd->device->host->hostt->eh_abort_handler(scmd);
+}
+
+/**
+ * scsi_try_to_abort_cmd - Ask host to abort a running command.
+ * @scmd:	SCSI cmd to abort from Lower Level.
+ *
+ * Notes:
+ *    This function will not return until the user's completion function
+ *    has been called.  there is no timeout on this operation.  if the
+ *    author of the low-level driver wishes this operation to be timed,
+ *    they can provide this facility themselves.  helper functions in
+ *    scsi_error.c can be supplied to make this easier to do.
+ **/
+static int scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
+{
+	/*
+	 * scsi_done was called just after the command timed out and before
+	 * we had a chance to process it. (db)
+	 */
+	if (scmd->serial_number == 0)
+		return SUCCESS;
+	return __scsi_try_to_abort_cmd(scmd);
+}
+
+static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
+{
+	if (__scsi_try_to_abort_cmd(scmd) != SUCCESS)
+		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
+			if (scsi_try_bus_reset(scmd) != SUCCESS)
+				scsi_try_host_reset(scmd);
+}
+
+/**
  * scsi_send_eh_cmnd - submit a scsi comma

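The nested ifs in scsi_abort_eh_cmnd() form an escalation ladder: try the mildest recovery first and move to a harsher one only when it fails. A table-driven version of the same idea is sketched below as standalone C. It is an illustration under assumed names (`escalate`, `eh_step_fn`, and the two stand-in handlers), not a proposed change to the patch.

```c
#include <assert.h>
#include <stddef.h>

enum eh_result { EH_SUCCESS, EH_FAILED };

typedef enum eh_result (*eh_step_fn)(void *ctx);

/*
 * Run recovery steps in order (in the patch: abort, device reset, bus
 * reset, host reset) and stop at the first one that succeeds.  A
 * missing handler counts as a failed step, as it does in the patch.
 * Returns the index of the step that succeeded, or -1 if none did.
 */
int escalate(const eh_step_fn *steps, size_t nsteps, void *ctx)
{
	size_t i;

	assert(steps != NULL || nsteps == 0);
	for (i = 0; i < nsteps; i++) {
		if (steps[i] == NULL)
			continue;
		if (steps[i](ctx) == EH_SUCCESS)
			return (int)i;
	}
	return -1;
}

/* Stand-in handlers, for illustration only. */
enum eh_result eh_fails(void *ctx)
{
	(void)ctx;
	return EH_FAILED;
}

enum eh_result eh_succeeds(void *ctx)
{
	(void)ctx;
	return EH_SUCCESS;
}
```

With a ladder of `{ eh_fails, NULL, eh_succeeds, eh_succeeds }`, the call stops at index 2 and the last, most disruptive step is never invoked.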
[PATCH] scsi: Update Aic94xx SAS/SATA Linux open source device driver for new sequence firmware.
Subject: [PATCH] scsi: Update Aic94xx SAS/SATA Linux open source device
driver for new sequence firmware.

Contribution:
  Ed Chim <[EMAIL PROTECTED]>
  Gilbert Wu <[EMAIL PROTECTED]>

Change Log:
  1. Use dword instead of qword to display the value of Connection State
     register for debug purpose.
  2. There are some registers location of AIC94xx chip has been changed
     according to the new V28 firmware. The patch has redefined the
     register location and provided initialization.
  3. The new sequencer firmware v28 for Aic94xx SAS/SATA Linux open
     source device driver can be downloaded from
     http://www.adaptec.com/NR/exeres/35B611BC-9789-4B5B-82C6-85A2CCA8A46A.htm

Patch: apply to scsi-misc-2.6.git development tree

Signed-off-by: Gilbert Wu <[EMAIL PROTECTED]>

diff -urN a/drivers/scsi/aic94xx/aic94xx_dump.c b/drivers/scsi/aic94xx/aic94xx_dump.c
--- a/drivers/scsi/aic94xx/aic94xx_dump.c 2007-01-29 10:20:44.0 -0800
+++ b/drivers/scsi/aic94xx/aic94xx_dump.c 2007-01-29 10:31:44.0 -0800
@@ -556,7 +556,7 @@
 	PRINT_LMIP_word(asd_ha, lseq, Q_TGTXFR_TAIL);
 	PRINT_LMIP_byte(asd_ha, lseq, LINK_NUMBER);
 	PRINT_LMIP_byte(asd_ha, lseq, SCRATCH_FLAGS);
-	PRINT_LMIP_qword(asd_ha, lseq, CONNECTION_STATE);
+	PRINT_LMIP_dword(asd_ha, lseq, CONNECTION_STATE);
 	PRINT_LMIP_word(asd_ha, lseq, CONCTL);
 	PRINT_LMIP_byte(asd_ha, lseq, CONSTAT);
 	PRINT_LMIP_byte(asd_ha, lseq, CONNECTION_MODES);
diff -urN a/drivers/scsi/aic94xx/aic94xx_reg_def.h b/drivers/scsi/aic94xx/aic94xx_reg_def.h
--- a/drivers/scsi/aic94xx/aic94xx_reg_def.h 2007-01-29 10:21:14.0 -0800
+++ b/drivers/scsi/aic94xx/aic94xx_reg_def.h 2007-01-29 10:35:54.0 -0800
@@ -2226,9 +2226,10 @@
 #define LmSEQ_SAS_RESET_MODE(LinkNum)	(LmSCRATCH(LinkNum) + 0x0074)
 #define LmSEQ_LINK_RESET_RETRY_COUNT(LinkNum) (LmSCRATCH(LinkNum) + 0x0075)
 #define LmSEQ_NUM_LINK_RESET_RETRIES(LinkNum) (LmSCRATCH(LinkNum) + 0x0076)
-#define LmSEQ_OOB_INT_ENABLES(LinkNum)	(LmSCRATCH(LinkNum) + 0x007A)
+#define LmSEQ_OOB_INT_ENABLES(LinkNum)	(LmSCRATCH(LinkNum) + 0x0078)
+#define LmSEQ_NOTIFY_TIMER_DOWN_COUNT(LinkNum)	(LmSCRATCH(LinkNum) + 0x007A)
 #define LmSEQ_NOTIFY_TIMER_TIMEOUT(LinkNum)	(LmSCRATCH(LinkNum) + 0x007C)
-#define LmSEQ_NOTIFY_TIMER_DOWN_COUNT(LinkNum)	(LmSCRATCH(LinkNum) + 0x007E)
+#define LmSEQ_NOTIFY_TIMER_INITIAL_COUNT(LinkNum)	(LmSCRATCH(LinkNum) + 0x007E)
 
 /* Mode dependent scratch page 1, mode 0 and mode 1 */
 #define LmSEQ_SG_LIST_PTR_ADDR0(LinkNum)	(LmSCRATCH(LinkNum) + 0x0020)
diff -urN a/drivers/scsi/aic94xx/aic94xx_seq.c b/drivers/scsi/aic94xx/aic94xx_seq.c
--- a/drivers/scsi/aic94xx/aic94xx_seq.c 2007-01-29 10:21:28.0 -0800
+++ b/drivers/scsi/aic94xx/aic94xx_seq.c 2007-01-29 10:42:55.0 -0800
@@ -810,6 +810,8 @@
 	/* No delay for the first NOTIFY to be sent to the attached target. */
 	asd_write_reg_word(asd_ha, LmSEQ_NOTIFY_TIMER_DOWN_COUNT(lseq),
 			   ASD_NOTIFY_DOWN_COUNT);
+	asd_write_reg_word(asd_ha, LmSEQ_NOTIFY_TIMER_INITIAL_COUNT(lseq),
+			   ASD_NOTIFY_DOWN_COUNT);
 
 	/* LSEQ Mode dependent, mode 0 and 1, page 1 setup. */
 	for (i = 0; i < 2; i++) {

SAS illegal topologies [was Re: [PATCH 1/4 v2] libsas: Don't BUG when connecting two expanders via wide port]
Darrick J. Wong wrote:
> libsas: Don't BUG when connecting two expanders via wide port
>
> When a device is connected to an expander, the discovery process goes through
> sas_ex_discover_dev to figure out what's attached to the phy.  If it is the
> case that the phy being discovered happens to be the second phy of a wide link
> to an expander, that discover_dev function will incorrectly call
> sas_ex_discover_expander, which creates another sas_port and tries to attach
> the other sas_phys to the new port, thus triggering a BUG.  The correct thing
> to do is to check the other ex_phys of the expander to see if there's a
> sas_port for this sas_phy, and attach the sas_phy to the existing sas_port.
>
> This is easily triggered if one enables the phys of a wide port between
> expanders one by one.
>
> This second version of the patch fixes a small regression in the case where
> all the phys show up at once and we accidentally try to attach to a port
> that hasn't been created yet.

Darrick,
Okay.

Now I'm wondering what the discovery algorithm in libsas
does if it finds truly illegal connections between
expanders. The spec defines what is illegal but says it
is vendor specific what will be done.

One approach is to use the SMP PHY CONTROL function to
disable the phy (or the phys at both ends of the illegal
link). The next trick is how to tell the user who just
connected a cable between expanders that "you can't do
that!". Tools like my smp_discover could alert a user to
a disabled phy but without turning it back on (and causing
the libsas discovery algorithm another headache) my SMP
utilities don't know what it is connected to.

Another question is which link to disable. Imagine three
expanders interconnected with 3 links which is illegal.
Breaking any one link makes it legal, but which one to break?
Last seen, or perhaps the link which has the largest SAS
address sum ...

Doug Gilbert

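One of the tie-breakers floated above, disabling the link with the largest SAS address sum, is easy to state precisely. The sketch below is a hypothetical illustration of that single idea: `struct sas_link`, `pick_link_to_disable` and the addresses in the usage note are invented for the example, and nothing here reflects what libsas actually does. SAS addresses are 64 bits wide, so the sum is compared as a 65-bit value to keep wrap-around from changing the ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sas_link {
	uint64_t addr_a;	/* SAS address of the expander at one end */
	uint64_t addr_b;	/* SAS address of the expander at the other end */
};

/*
 * Return the index of the link whose two endpoint addresses have the
 * largest sum.  Each sum is treated as a carry bit plus the low 64
 * bits.  On a tie the earliest link in the array is kept.
 */
size_t pick_link_to_disable(const struct sas_link *links, size_t n)
{
	size_t best = 0, i;

	assert(links != NULL && n > 0);
	for (i = 1; i < n; i++) {
		uint64_t lo_i = links[i].addr_a + links[i].addr_b;
		uint64_t lo_b = links[best].addr_a + links[best].addr_b;
		/* An unsigned add wrapped if the result is below an operand. */
		int carry_i = lo_i < links[i].addr_a;
		int carry_b = lo_b < links[best].addr_a;

		if (carry_i > carry_b || (carry_i == carry_b && lo_i > lo_b))
			best = i;
	}
	return best;
}
```

For three expanders with made-up addresses ending in 1, 2 and 3, linked pairwise, the rule picks the link between the expanders ending in 2 and 3. The choice is deterministic, so every initiator that sees the same loop would break it in the same place.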
[PATCH] aacraid: Add kernel command line parameter parsing
One shortcoming of the driver relationship with the kernel is that there
is no standard means of having the insmod parameters associated with a
driver also parsed and set from the kernel parameter line.

The enclosed patch is a proposal for the aacraid driver to pick up the
kernel parameter line, parse it, and then adjust the insmod parameters.
The format of the kernel parameter line is

	aacraid=:[,:]...

There may be a better way of providing this service via the kernel
without any modifications from the driver, since all the characteristics
of the insmod parameters are exported by the MODULE_PARM_* hints.  Would
such mods be in insmod/modprobe and not in the kernel or driver?

Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>

---

Sincerely -- Mark Salyzyn
Illegitimi Non Carborundum

aacraid_command_line.patch
Description: aacraid_command_line.patch

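The parsing step can be sketched in userspace C. Everything in it is an assumption: the format line above lost its placeholders in the archive, so the sketch guesses at comma-separated name:value pairs with integer values, and `parse_param_list`, `param_setter`, `struct seen` and `record_param` are hypothetical names that do not come from the attached patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*param_setter)(const char *name, long value, void *ctx);

/*
 * Parse "name:value[,name:value]..." in place and call set() for each
 * pair.  The pair format is an assumption, see the text above.  The
 * buffer is modified.  Returns the number of pairs handled, or -1 on
 * a malformed pair.
 */
int parse_param_list(char *buf, param_setter set, void *ctx)
{
	int count = 0;
	char *p = buf;

	assert(buf != NULL && set != NULL);
	while (*p != '\0') {
		char *comma = strchr(p, ',');
		char *colon, *end;
		long value;

		if (comma != NULL)
			*comma = '\0';
		colon = strchr(p, ':');
		if (colon == NULL || colon == p || colon[1] == '\0')
			return -1;
		*colon = '\0';
		value = strtol(colon + 1, &end, 0);
		if (*end != '\0')
			return -1;
		set(p, value, ctx);
		count++;
		if (comma == NULL)
			break;
		p = comma + 1;
	}
	return count;
}

/* Example setter that just records what it was given. */
struct seen {
	int n;
	long last;
	char last_name[32];
};

void record_param(const char *name, long value, void *ctx)
{
	struct seen *s = ctx;

	s->n++;
	s->last = value;
	strncpy(s->last_name, name, sizeof(s->last_name) - 1);
	s->last_name[sizeof(s->last_name) - 1] = '\0';
}
```

A real setter would look the name up in the driver's parameter table and store the value there; doing that generically from the exported parameter descriptions is the open question the message raises.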
Re: 2.6.20-rc6-mm1
Hi Andrew,

Looks good for NTFS, thanks!  The only thing is that I think we already
have a variable "unsigned long flags" in the function
ntfs_end_buffer_async_read(), so that could be used instead of redefining
it more locally in the if statements.

Could you send the patch to Linus?  Feel free to add my Acked-by or
Signed-off-by "Anton Altaparmakov <[EMAIL PROTECTED]>" line if you wish
(I am not bothered either way)...

Thanks a lot for fixing it!

Best regards,

	Anton

On Tue, 30 Jan 2007, Andrew Morton wrote:
> On Mon, 29 Jan 2007 23:27:27 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > On Sun, 28 Jan 2007 11:25:42 +0100
> > Jiri Slaby <[EMAIL PROTECTED]> wrote:
> >
> > > Andrew Morton napsal(a):
> > > > Temporarily at
> > > >
> > > >   http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
> > >
> > > I'm still seeing this during bootup:
> > > BUG: at /home/l/latest/xxx/arch/i386/mm/highmem.c:52 kmap_atomic()
> > >  [] show_trace_log_lvl+0x1a/0x30
> > >  [] show_trace+0x12/0x14
> > >  [] dump_stack+0x16/0x18
> > >  [] kmap_atomic+0x16c/0x20e
> > >  [] ntfs_end_buffer_async_read+0x18e/0x2ed
> > >  [] end_bio_bh_io_sync+0x26/0x3f
> > >  [] bio_endio+0x37/0x62
> > >  [] __end_that_request_first+0x224/0x444
> > >  [] end_that_request_chunk+0x8/0xa
> > >  [] scsi_end_request+0x1f/0xc7
> > >  [] scsi_io_completion+0x7b/0x33a
> > >  [] sd_rw_intr+0x23/0x1ab
> > >  [] scsi_finish_command+0x42/0x47
> > >  [] scsi_softirq_done+0x64/0xcf
> > >  [] blk_done_softirq+0x54/0x62
> > >  [] __do_softirq+0x75/0xde
> > >  [] do_softirq+0x3b/0x3d
> > >  [] irq_exit+0x3b/0x3d
> > >  [] do_IRQ+0x51/0x8d
> > >  [] common_interrupt+0x23/0x28
> > >  [] cpu_idle+0x80/0xc3
> > >  [] rest_init+0x23/0x36
> > >  [] start_kernel+0x3a5/0x43c
> > >  [<>] 0x0
> > >  ===
> > >
> > > I.e. KM_BIO_SRC_IRQ through softirq path.
> > >
> >
> > argh.
> >
> > ntfs_end_buffer_async_read() doesn't know whether it will be called from
> > hardirq or from softirq context: it depends upon the underlying driver.
> >
> > In this case, if the CPU running ntfs_end_buffer_async_read() is
> > interrupted by IO completion against a different disk controller and that
> > completion handler uses KM_BIO_SRC_IRQ (as it is allowed to do), it will
> > trash ntfs_end_buffer_async_read()'s atomic kmap and unpleasing things will
> > ensue.
> >
> > I guess a suitable fix here is to protect that kmap with
> > local_irq_save/restore.
> >
> > I wonder where else we have that bug?
>
> Actually, this isn't related to softirq-vs-hardirq.  Most interrupt
> handlers are interruptible, so the rule is simply that KM_BIO_SRC_IRQ must
> always be taken under local_irq_disable().
>
> A quick scan indicates that the following files might be buggy in this
> regard:
>
> 	drivers/mmc/wbsd.c
> 	drivers/mmc/at91_mci.c
> 	drivers/mmc/sdhci.c
> 	drivers/scsi/scsi_lib.c when called from stex.c
> 	fs/ntfs/aops.c
>
> Happily, KM_BIO_DST_IRQ has no users and can presumably be removed.
>
>
> Fixes for stex and ntfs follow.
>
>
> From: Andrew Morton <[EMAIL PROTECTED]>
>
> The KM_BIO_SRC_IRQ kmap slot requires local irq protection.
>
> Cc: James Bottomley <[EMAIL PROTECTED]>
> Cc: Ed Lin <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  drivers/scsi/stex.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff -puN drivers/scsi/stex.c~stex-kmap_atomic-atomicity-fix drivers/scsi/stex.c
> --- a/drivers/scsi/stex.c~stex-kmap_atomic-atomicity-fix
> +++ a/drivers/scsi/stex.c
> @@ -459,15 +459,19 @@ static void stex_internal_copy(struct sc
>  		*count = cmd->request_bufflen;
>  	lcount = *count;
>  	while (lcount) {
> +		unsigned long flags = flags;	/* Suppress uninit warning */
> +
>  		len = lcount;
>  		s = (void *)src;
>  		if (cmd->use_sg) {
>  			size_t offset = *count - lcount;
>  			s += offset;
> +			local_irq_save(flags);
>  			base = scsi_kmap_atomic_sg(cmd->request_buffer,
>  				sg_count, &offset, &len);
>  			if (base == NULL) {
>  				*count -= lcount;
> +				local_irq_restore(flags);
>  				return;
>  			}
>  			d = base + offset;
> @@ -480,8 +484,10 @@ static void stex_internal_copy(struct sc
>  			memcpy(s, d, len);
>  
>  		lcount -= len;
> -		if (cmd->use_sg)
> +		if (cmd->use_sg) {
>  			scsi_kunmap_atomic_sg(base);
> +			local_irq_restore(flags);
> +		}
>  	}
>  }
>  
> _
>
>
>
> From: Andrew Morton <[EMAIL PROTECTED]>
>
> The KM_BIO_SRC_IRQ kmap slot requires local irq protection.
>
> Cc: Anton Altaparmakov <[EMAIL PROTECTED]>
> Signed-off-by: An

[RFC: 2.6.16 patch] add the areca driver
I'd like to add the areca driver to 2.6.16 - it seems straightforward
and doesn't touch other code.

Below are the commits I picked from Linus' tree, and the complete patch
is attached.

Is there any reason I miss why this driver might not work in 2.6.16?

TIA
Adrian


Commit:     f6013cc7f40d9b191a6b879a1941871b54552a81
Author:     James Bottomley <[EMAIL PROTECTED]> Sun, 28 Jan 2007 00:54:39 +0100

    [SCSI] arcmsr: fix up sysfs values

    The sysfs files in arcmsr are non-standard in that they aren't simple
    filename value pairs, the values actually contain preceeding text which
    would have to be parsed.  The idea of sysfs files is that the file name
    is the description and the contents is a simple value.

    Fix up arcmsr to conform to this standard.

    Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
    Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Commit:     e43c51964140ae3b11b320fae451f47ecb7763d4
Author:     Andrew Morton <[EMAIL PROTECTED]> Sun, 28 Jan 2007 00:53:31 +0100

    [SCSI] areca sysfs fix

    Remove sysfs_remove_bin_file() return-value checking from the areca
    driver.  There's nothing a driver can do if sysfs file removal fails,
    so we'll soon be changing sysfs_remove_bin_file() to internally print
    a diagnostic and to return void.

    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Commit:     144d09c6b0f3638ba03f9994a01aa0136b86918c
Author:     Erich Chen <[EMAIL PROTECTED]> Sun, 28 Jan 2007 00:52:30 +0100

    [SCSI] arcmsr: initial driver, version 1.20.00.13

    arcmsr is a driver for the Areca Raid controller, a host based RAID
    subsystem that speaks SCSI at the firmware level.

    This patch is quite a clean up over the initial submission with
    contributions from:

    Randy Dunlap <[EMAIL PROTECTED]>
    Christoph Hellwig <[EMAIL PROTECTED]>
    Matthew Wilcox <[EMAIL PROTECTED]>
    Adrian Bunk <[EMAIL PROTECTED]>

    Signed-off-by: Erich Chen <[EMAIL PROTECTED]>
    Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

patch-areca.gz
Description: Binary data

Re: 2.6.20-rc6-mm1
Andrew Morton napsal(a):
> On Mon, 29 Jan 2007 23:27:27 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
>
>> On Sun, 28 Jan 2007 11:25:42 +0100
>> Jiri Slaby <[EMAIL PROTECTED]> wrote:
>>
>>> Andrew Morton napsal(a):
>>>> Temporarily at
>>>>
>>>>   http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>>>
>>> I'm still seeing this during bootup:
>>> BUG: at /home/l/latest/xxx/arch/i386/mm/highmem.c:52 kmap_atomic()
>>>  [] show_trace_log_lvl+0x1a/0x30
>>>  [] show_trace+0x12/0x14
>>>  [] dump_stack+0x16/0x18
>>>  [] kmap_atomic+0x16c/0x20e
>>>  [] ntfs_end_buffer_async_read+0x18e/0x2ed
>>>  [] end_bio_bh_io_sync+0x26/0x3f
>>>  [] bio_endio+0x37/0x62
>>>  [] __end_that_request_first+0x224/0x444
>>>  [] end_that_request_chunk+0x8/0xa
>>>  [] scsi_end_request+0x1f/0xc7
>>>  [] scsi_io_completion+0x7b/0x33a
>>>  [] sd_rw_intr+0x23/0x1ab
>>>  [] scsi_finish_command+0x42/0x47
>>>  [] scsi_softirq_done+0x64/0xcf
>>>  [] blk_done_softirq+0x54/0x62
>>>  [] __do_softirq+0x75/0xde
>>>  [] do_softirq+0x3b/0x3d
>>>  [] irq_exit+0x3b/0x3d
>>>  [] do_IRQ+0x51/0x8d
>>>  [] common_interrupt+0x23/0x28
>>>  [] cpu_idle+0x80/0xc3
>>>  [] rest_init+0x23/0x36
>>>  [] start_kernel+0x3a5/0x43c
>>>  [<>] 0x0
>>>  ===
>>>
>>> I.e. KM_BIO_SRC_IRQ through softirq path.
[...]
> Actually, this isn't related to softirq-vs-hardirq.  Most interrupt

I meant that hardirq path was fixed (by adding KM_BIO_SRC_IRQ to kmap_atomic
"type !=" test in arch/i386/mm/highmem.c) and softirq was not yet.

> handlers are interruptible, so the rule is simply that KM_BIO_SRC_IRQ must
> always be taken under local_irq_disable().
>
> A quick scan indicates that the following files might be buggy in this
> regard:
>
> 	drivers/mmc/wbsd.c
> 	drivers/mmc/at91_mci.c
> 	drivers/mmc/sdhci.c
> 	drivers/scsi/scsi_lib.c when called from stex.c
> 	fs/ntfs/aops.c
>
> Happily, KM_BIO_DST_IRQ has no users and can presumably be removed.
>
>
> Fixes for stex and ntfs follow.

Clean boot now.

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: 2.6.20-rc6-mm1
Andrew Morton wrote: > > A quick scan indicates that the following files might be buggy in this > regard: > > drivers/mmc/wbsd.c > drivers/mmc/sdhci.c These are probably even buggier than that. They really should be using page_address(); it seems that kmap_atomic() gives the same result when not using highmem (which they are careful to avoid). I'll put on the paper bag and whip up a patch. Rgds Pierre
[PATCH 1/4 v2] libsas: Don't BUG when connecting two expanders via wide port
libsas: Don't BUG when connecting two expanders via wide port When a device is connected to an expander, the discovery process goes through sas_ex_discover_dev to figure out what's attached to the phy. If it is the case that the phy being discovered happens to be the second phy of a wide link to an expander, that discover_dev function will incorrectly call sas_ex_discover_expander, which creates another sas_port and tries to attach the other sas_phys to the new port, thus triggering a BUG. The correct thing to do is to check the other ex_phys of the expander to see if there's a sas_port for this sas_phy, and attach the sas_phy to the existing sas_port. This is easily triggered if one enables the phys of a wide port between expanders one by one. This second version of the patch fixes a small regression in the case where all the phys show up at once and we accidentally try to attach to a port that hasn't been created yet. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_expander.c | 30 ++ 1 files changed, 30 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 114e26c..2f3b8e1 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -736,6 +736,29 @@ static struct domain_device *sas_ex_disc return NULL; } +/* See if this phy is part of a wide port */ +static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id) +{ + struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id]; + int i; + + for (i = 0; i < parent->ex_dev.num_phys; i++) { + struct ex_phy *ephy = &parent->ex_dev.ex_phy[i]; + + if (ephy == phy) + continue; + + if (!memcmp(phy->attached_sas_addr, ephy->attached_sas_addr, + SAS_ADDR_SIZE) && ephy->port) { + sas_port_add_phy(ephy->port, phy->phy); + phy->phy_state = PHY_DEVICE_DISCOVERED; + return 0; + } + } + + return -ENODEV; +} + static struct domain_device *sas_ex_discover_expander( struct domain_device *parent, int 
phy_id) { @@ -868,6 +891,13 @@ static int sas_ex_discover_dev(struct do return res; } + res = sas_ex_join_wide_port(dev, phy_id); + if (!res) { + SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n", + phy_id, SAS_ADDR(ex_phy->attached_sas_addr)); + return res; + } + switch (ex_phy->attached_dev_type) { case SAS_END_DEV: child = sas_ex_discover_end_dev(dev, phy_id);
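For readers who want to try the lookup logic outside the kernel, here is a minimal user-space sketch of the same idea: scan the sibling phys, and if one with the same attached address already belongs to a port, join that port instead of creating a new one. The `toy_port` and `toy_ex_phy` types and the `toy_join_wide_port` name are hypothetical stand-ins, not the libsas structures, and the port bookkeeping is reduced to a counter.

```c
#include <stddef.h>
#include <string.h>

#define SAS_ADDR_SIZE 8

/* Hypothetical, simplified stand-ins for the libsas structures. */
struct toy_port {
	int id;
	int num_phys;	/* how many phys have joined this port */
};

struct toy_ex_phy {
	unsigned char attached_sas_addr[SAS_ADDR_SIZE];
	struct toy_port *port;	/* NULL until the phy has joined a port */
};

/*
 * Try to attach phy `phy_id` to a port that a sibling phy with the same
 * attached address already belongs to. Returns 0 on success, -1 if no
 * such port exists yet (the caller would then run normal discovery).
 */
static int toy_join_wide_port(struct toy_ex_phy *phys, int num_phys,
			      int phy_id)
{
	struct toy_ex_phy *phy = &phys[phy_id];
	int i;

	for (i = 0; i < num_phys; i++) {
		struct toy_ex_phy *ephy = &phys[i];

		if (ephy == phy)
			continue;

		/* Same remote address and a port already created: join it. */
		if (!memcmp(phy->attached_sas_addr, ephy->attached_sas_addr,
			    SAS_ADDR_SIZE) && ephy->port) {
			phy->port = ephy->port;
			phy->port->num_phys++;
			return 0;
		}
	}

	return -1;
}
```

The `ephy->port` check is what models the v2 fix described above: a sibling with a matching address but no port yet is skipped, so the caller falls through to ordinary discovery.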
Re: 2.6.20-rc6-mm1
On Mon, 29 Jan 2007 23:27:27 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Sun, 28 Jan 2007 11:25:42 +0100 > Jiri Slaby <[EMAIL PROTECTED]> wrote: > > > Andrew Morton napsal(a): > > > Temporarily at > > > > > > http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/ > > > > I'm still seeing this during bootup: > > BUG: at /home/l/latest/xxx/arch/i386/mm/highmem.c:52 kmap_atomic() > > [] show_trace_log_lvl+0x1a/0x30 > > [] show_trace+0x12/0x14 > > [] dump_stack+0x16/0x18 > > [] kmap_atomic+0x16c/0x20e > > [] ntfs_end_buffer_async_read+0x18e/0x2ed > > [] end_bio_bh_io_sync+0x26/0x3f > > [] bio_endio+0x37/0x62 > > [] __end_that_request_first+0x224/0x444 > > [] end_that_request_chunk+0x8/0xa > > [] scsi_end_request+0x1f/0xc7 > > [] scsi_io_completion+0x7b/0x33a > > [] sd_rw_intr+0x23/0x1ab > > [] scsi_finish_command+0x42/0x47 > > [] scsi_softirq_done+0x64/0xcf > > [] blk_done_softirq+0x54/0x62 > > [] __do_softirq+0x75/0xde > > [] do_softirq+0x3b/0x3d > > [] irq_exit+0x3b/0x3d > > [] do_IRQ+0x51/0x8d > > [] common_interrupt+0x23/0x28 > > [] cpu_idle+0x80/0xc3 > > [] rest_init+0x23/0x36 > > [] start_kernel+0x3a5/0x43c > > [<>] 0x0 > > === > > > > I.e. KM_BIO_SRC_IRQ through softirq path. > > > > argh. > > ntfs_end_buffer_async_read() doesn't know whether it will be called from > hardirq or from softirq context: it depends upon the underlying driver. > > In this case, if the CPU running ntfs_end_buffer_async_read() is > interrupted by IO completion against a different disk controller and that > completion handler uses KM_BIO_SRC_IRQ (as it is allowed to do), it will > trash ntfs_end_buffer_async_read()'s atomic kmap and unpleasing things will > ensue. > > I guess a suitable fix here is to protect that kmap with > local_irq_save/restore. > > I wonder where else we have that bug? Actually, this isn't related to softirq-vs-hardirq. Most interrupt handlers are interruptible, so the rule is simply that KM_BIO_SRC_IRQ must always be taken under local_irq_disable(). 
A quick scan indicates that the following files might be buggy in this regard: drivers/mmc/wbsd.c drivers/mmc/at91_mci.c drivers/mmc/sdhci.c drivers/scsi/scsi_lib.c when called from stex.c fs/ntfs/aops.c Happily, KM_BIO_DST_IRQ has no users and can presumably be removed. Fixes for stex and ntfs follow. From: Andrew Morton <[EMAIL PROTECTED]> The KM_BIO_SRC_IRQ kmap slot requires local irq protection. Cc: James Bottomley <[EMAIL PROTECTED]> Cc: Ed Lin <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- drivers/scsi/stex.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff -puN drivers/scsi/stex.c~stex-kmap_atomic-atomicity-fix drivers/scsi/stex.c --- a/drivers/scsi/stex.c~stex-kmap_atomic-atomicity-fix +++ a/drivers/scsi/stex.c @@ -459,15 +459,19 @@ static void stex_internal_copy(struct sc *count = cmd->request_bufflen; lcount = *count; while (lcount) { + unsigned long flags = flags;/* Suppress uninit warning */ + len = lcount; s = (void *)src; if (cmd->use_sg) { size_t offset = *count - lcount; s += offset; + local_irq_save(flags); base = scsi_kmap_atomic_sg(cmd->request_buffer, sg_count, &offset, &len); if (base == NULL) { *count -= lcount; + local_irq_restore(flags); return; } d = base + offset; @@ -480,8 +484,10 @@ static void stex_internal_copy(struct sc memcpy(s, d, len); lcount -= len; - if (cmd->use_sg) + if (cmd->use_sg) { scsi_kunmap_atomic_sg(base); + local_irq_restore(flags); + } } } _ From: Andrew Morton <[EMAIL PROTECTED]> The KM_BIO_SRC_IRQ kmap slot requires local irq protection. 
Cc: Anton Altaparmakov <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/ntfs/aops.c |6 ++ 1 file changed, 6 insertions(+) diff -puN fs/ntfs/aops.c~ntfs-kmap_atomic-atomicity-fix fs/ntfs/aops.c --- a/fs/ntfs/aops.c~ntfs-kmap_atomic-atomicity-fix +++ a/fs/ntfs/aops.c @@ -88,14 +88,17 @@ static void ntfs_end_buffer_async_read(s if (unlikely(file_ofs + bh->b_size > init_size)) { u8 *kaddr; int ofs; + unsigned long flags; ofs = 0; if (file_ofs < init_size) ofs = init_size - file_ofs; + local_irq_save(flags); kaddr = kmap_atomic(page, KM_BIO_SRC_IRQ);
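The failure mode these two patches guard against can be modelled in user space. The sketch below is a toy, not kernel code: a single integer stands in for the fixed KM_BIO_SRC_IRQ mapping slot, a boolean stands in for the CPU's interrupt-enable state, and all `toy_*` names are hypothetical. It only illustrates why the mapping has to be taken with interrupts disabled.

```c
#include <stdbool.h>

#define TOY_IRQ_PAGE 99		/* page id the "interrupt handler" maps */

static int toy_slot;			/* page currently mapped in the slot */
static bool toy_irqs_enabled = true;	/* models the CPU interrupt flag */

/* Map a page: the returned "address" is always the same fixed slot. */
static int *toy_kmap(int page)
{
	toy_slot = page;
	return &toy_slot;
}

/* A handler for some other controller that reuses the same slot. */
static void toy_irq_handler(void)
{
	toy_kmap(TOY_IRQ_PAGE);
}

/* Deliver a pending interrupt only if interrupts are enabled. */
static void toy_maybe_interrupt(void)
{
	if (toy_irqs_enabled)
		toy_irq_handler();
}

/*
 * Map `page`, let an interrupt arrive mid-operation, and return the page
 * id actually visible through the mapping. With `protect` set, interrupts
 * are masked around the mapping (the local_irq_save/restore analogue).
 */
static int toy_copy(int page, bool protect)
{
	bool saved = toy_irqs_enabled;
	int *addr;
	int seen;

	if (protect)
		toy_irqs_enabled = false;

	addr = toy_kmap(page);
	toy_maybe_interrupt();
	seen = *addr;

	toy_irqs_enabled = saved;
	return seen;
}
```

Calling `toy_copy(7, false)` shows the slot being overwritten by the handler, while `toy_copy(7, true)` sees its own page; the saved state is restored on the way out, which is the same pairing the stex patch has to maintain on its early-return path.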
Re: AIC7xxx on 2.6.18
On Tue, 30 Jan 2007 07:18:20 -0500 Wakko Warner <[EMAIL PROTECTED]> wrote: > NOTE: I am not on the linux-scsi list, keep me in CC. > > Andrew Morton wrote: > > On Sun, 28 Jan 2007 14:46:20 -0500 > > Wakko Warner <[EMAIL PROTECTED]> wrote: > > > > > I have 2 machine that oops with these cards. > > > > > > 1) The bios has the option to enable/disable option roms on individual PCI > > > slots. I have an AHA-39160 and an AHA-2940U/UW (dual channel). If I > > > disable option roms, the driver oopses when accessing the 2nd card. > > > > > > I can get the oops if really needed as I don't like rebooting this > > > machine. > > > > > > 2) I have an AHA-39160 with Apple/Mac firmware. When attempting to use it > > > on a PC, the driver oopses presumably because the card wasn't initialized > > > or > > > something. I realize this is probably not a supported configuration, but > > > I > > > don't believe that it should be oopsing. > > > > > > I can get the oops for this one if it'll help. > > > > Yes, getting the oops traces will help, thanks. And confirmation on a more > > recent kernel would be good. > > I tested with a 2.6.20-rc6 kernel and the MAC 39160 card. There was no oops > and I was able to access the 2 disks. This was on a different PC though. > I'll try it again on the original PC. Thanks. > Should I try 2.6.19 as well? There's not a lot of point in doing so. If/when we come up with a 2.6.20-rc6 fix we'll know whether it is applicable to 2.6.19.x.
Re: Fw: SAS1068 PCI-X Fusion-MPT SAS 1000:0055
Hi, Also seen on a NEC server: a 1068 chip with a jumper used to switch the chip's PCI ID and its BIOS: - PCI ID = 0054 => 'MPT Fusion' BIOS - PCI ID = 0055 => 'MegaRAID' BIOS I believe I submitted this unusual chip ID to the pciid DB some months ago... More important: there's a driver for this chip when it is used in 'MegaRAID' mode (the standard 'mptsas' driver may be used for MPT Fusion mode). This driver is named 'megasr' and is available (binaries only) from several server vendors (Intel/Supermicro/Hitachi...) for standard distros (RH, Suse). It seems that this driver is provided by LSI (according to modinfo)... regards -- Fred Moore, Eric wrote: > On Friday, January 26, 2007 12:53 PM, Jun'ichi Nomura wrote: >> Hi, >> >>> I have new NEC server with SAS1068 PCI-X Fusion-MPT SAS >>> pciid: 1000:0055 >>> mptsas form 2.6.20-rc5 don't recognize it ;( >>> >>> I see that driver support only 1000:0054 and 1000:0058 devices. >> It might be that the device has software RAID feature and changes >> device ID based on setup. (1000:0055 when software RAID is enabled >> and 1000:0054 or something for normal SAS) >> >> If so, there is a chance you can disable the software RAID >> via BIOS setup utility. >> >> Thanks, >> -- >> Jun'ichi Nomura, NEC Corporation of America >> > > You probably want to talk to the megaraid folks and see > if they have a driver for that. > > I didn't submit a device id of 0055 to sourceforge. > > The only 1068 ids that are claimed by mptsas are 0054 and 0058 > which are the pcix and pcie solutions. I notice that 0055 is > listed in repository, but it was not me that submitted that. 
> http://pci-ids.ucw.cz/iii/?i=1000 > > Eric Moore
[PATCH] sym53c500_cs: remove bogus call to free_dma()
What DMA for 16bit pcmcia card, anyway? We never do request_dma() there and ->dma_channel never changes since initialization to -1. IOW, that call is dead code. Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- drivers/scsi/pcmcia/sym53c500_cs.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/pcmcia/sym53c500_cs.c b/drivers/scsi/pcmcia/sym53c500_cs.c index 9fb0ea5..5b458d2 100644 --- a/drivers/scsi/pcmcia/sym53c500_cs.c +++ b/drivers/scsi/pcmcia/sym53c500_cs.c @@ -545,8 +545,6 @@ SYM53C500_release(struct pcmcia_device *link) */ if (shost->irq) free_irq(shost->irq, shost); - if (shost->dma_channel != 0xff) - free_dma(shost->dma_channel); if (shost->io_port && shost->n_io_port) release_region(shost->io_port, shost->n_io_port); -- 1.5.0-rc2.GIT
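To see why the removed test can never fire, here is a small sketch under one loudly stated assumption: that the host's `dma_channel` field is an 8-bit unsigned value, so an initial value of -1 is stored as 0xff. The `toy_host` struct and `toy_has_dma` helper are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one host field that matters here. */
struct toy_host {
	unsigned char dma_channel;	/* ASSUMED to be 8-bit unsigned */
};

/* Mirrors the removed test: true only if a real channel was recorded. */
static bool toy_has_dma(const struct toy_host *h)
{
	return h->dma_channel != 0xff;
}
```

With the field left at its initial value the helper returns false, so a `free_dma()` call guarded by it is unreachable unless some code path stores a real channel number, and according to the patch description none does in this driver.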
Re: AIC7xxx on 2.6.18
NOTE: I am not on the linux-scsi list, keep me in CC. Andrew Morton wrote: > On Sun, 28 Jan 2007 14:46:20 -0500 > Wakko Warner <[EMAIL PROTECTED]> wrote: > > > I have 2 machine that oops with these cards. > > > > 1) The bios has the option to enable/disable option roms on individual PCI > > slots. I have an AHA-39160 and an AHA-2940U/UW (dual channel). If I > > disable option roms, the driver oopses when accessing the 2nd card. > > > > I can get the oops if really needed as I don't like rebooting this machine. > > > > 2) I have an AHA-39160 with Apple/Mac firmware. When attempting to use it > > on a PC, the driver oopses presumably because the card wasn't initialized or > > something. I realize this is probably not a supported configuration, but I > > don't believe that it should be oopsing. > > > > I can get the oops for this one if it'll help. > > Yes, getting the oops traces will help, thanks. And confirmation on a more > recent kernel would be good. I tested with a 2.6.20-rc6 kernel and the MAC 39160 card. There was no oops and I was able to access the 2 disks. This was on a different PC though. I'll try it again on the original PC. Should I try 2.6.19 as well? -- Lab tests show that use of micro$oft causes cancer in lab animals Got Gas???
[PATCH 12/12] sas_ata: Make this a module separate from libsas
Break out sas_ata as a free-standing module that provides a SATA Translation Layer (SATL) for libsas. This patch requires the libsas SATL registration patch; the changes to sas_ata itself are rather minor. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/Makefile |5 +++-- drivers/scsi/libsas/sas_ata.c | 37 ++--- 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/libsas/Makefile b/drivers/scsi/libsas/Makefile index 6383eb5..5e95902 100644 --- a/drivers/scsi/libsas/Makefile +++ b/drivers/scsi/libsas/Makefile @@ -33,5 +33,6 @@ libsas-y += sas_init.o \ sas_dump.o \ sas_discover.o \ sas_expander.o \ - sas_scsi_host.o \ - sas_ata.o + sas_scsi_host.o + +obj-$(CONFIG_SCSI_SAS_SATL) += sas_ata.o diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 1b7221c..f75fa59 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -404,8 +404,8 @@ static struct ata_port_info sata_port_in .port_ops = &sas_sata_ops }; -int sas_ata_init_host_and_port(struct domain_device *found_dev, - struct scsi_target *starget) +static int sas_ata_init_host_and_port(struct domain_device *found_dev, + struct scsi_target *starget) { struct Scsi_Host *shost = dev_to_shost(&starget->dev); struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost); @@ -431,7 +431,7 @@ int sas_ata_init_host_and_port(struct do return 0; } -void sas_ata_task_abort(struct sas_task *task) +static void sas_ata_task_abort(struct sas_task *task) { struct ata_queued_cmd *qc = task->uldd_task; struct completion *waiting; @@ -450,3 +450,34 @@ void sas_ata_task_abort(struct sas_task waiting = qc->private_data; complete(waiting); } + +/* Module initialization */ +static struct satl_operations sas_ata_ops = { + .owner = THIS_MODULE, + .init_target= sas_ata_init_host_and_port, + .queuecommand = ata_sas_queuecmd, + .ioctl = ata_scsi_ioctl, + .configure_port = ata_sas_slave_configure, + .deactivate_port= ata_port_disable, + .destroy_port = 
ata_sas_port_destroy, + .init_port = ata_sas_port_init, + .task_abort = sas_ata_task_abort +}; + +static int __init sas_ata_init(void) +{ + return sas_register_satl(&sas_ata_ops); +} + +static void __exit sas_ata_exit(void) +{ + sas_unregister_satl(&sas_ata_ops); +} + +module_init(sas_ata_init); +module_exit(sas_ata_exit); + +MODULE_AUTHOR("Darrick Wong <[EMAIL PROTECTED]>"); +MODULE_DESCRIPTION("libata SATL for SAS"); +MODULE_LICENSE("GPL v2"); +MODULE_VERSION("1.0");
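The register/unregister pattern this module relies on can be sketched in user space as a single-provider operations table. Everything below is hypothetical: the `toy_*` names, the error codes chosen, and the one-provider policy are assumptions made for illustration, and the locking and module reference counting that the libsas side needs are deliberately left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down operations table for a translation layer. */
struct toy_satl_ops {
	const char *name;
	int (*init_target)(int target_id);
};

static struct toy_satl_ops *toy_satl;	/* at most one provider at a time */

/* Provider side: install the table; refuse a second provider. */
static int toy_register_satl(struct toy_satl_ops *ops)
{
	if (!ops || !ops->init_target)
		return -EINVAL;
	if (toy_satl)
		return -EBUSY;
	toy_satl = ops;
	return 0;
}

/* Provider side: remove the table if it is the one installed. */
static void toy_unregister_satl(struct toy_satl_ops *ops)
{
	if (toy_satl == ops)
		toy_satl = NULL;
}

/* Core side: dispatch through the table, or fail if none is loaded. */
static int toy_target_alloc_sata(int target_id)
{
	if (!toy_satl)
		return -ENODEV;
	return toy_satl->init_target(target_id);
}

/* Example provider callback: accepts any non-negative target id. */
static int toy_init_target(int target_id)
{
	return target_id >= 0 ? 0 : -EINVAL;
}
```

The point of the indirection is that the core never names the provider's functions directly, which is what allows the translation layer to live in a separately loadable module.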
[PATCH 11/12] libsas: Provide a generic SATL registration function
Decouple libsas and sas_ata so that the latter can be provided as a plug-in module for the former. Any module wishing to provide SATL services registers itself with libsas; when SATA devices are discovered, libsas will module_get/put as necessary to ensure that the module cannot go away accidentally. At this time, we cannot start a SAS HBA without a SATL, load a SATL later, and then rerun device discovery; that may be addressed in a later patch. A phy reset will do the job quite nicely. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/Kconfig | 11 +++ drivers/scsi/libsas/sas_discover.c |6 -- drivers/scsi/libsas/sas_scsi_host.c | 137 --- include/scsi/libsas.h | 30 +--- include/scsi/sas_ata.h | 38 +- 5 files changed, 176 insertions(+), 46 deletions(-) diff --git a/drivers/scsi/libsas/Kconfig b/drivers/scsi/libsas/Kconfig index b64e391..9c06eec 100644 --- a/drivers/scsi/libsas/Kconfig +++ b/drivers/scsi/libsas/Kconfig @@ -24,12 +24,21 @@ # config SCSI_SAS_LIBSAS tristate "SAS Domain Transport Attributes" - depends on SCSI && ATA + depends on SCSI select SCSI_SAS_ATTRS help This provides transport specific helpers for SAS drivers which use the domain device construct (like the aic94xxx). +config SCSI_SAS_SATL + tristate "Serial ATA Translation Layer (SATL) on SAS controllers" + depends on SCSI_SAS_LIBSAS && ATA + default y + help + This provides an ATA translation layer between libsas and + libata to load SATA devices that are connected to SAS + controllers. + config SCSI_SAS_LIBSAS_DEBUG bool "Compile the SAS Domain Transport Attributes in debug mode" default y diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c index a18c0f6..56cc8da 100644 --- a/drivers/scsi/libsas/sas_discover.c +++ b/drivers/scsi/libsas/sas_discover.c @@ -476,12 +476,6 @@ cont1: if (!dev->parent) sas_sata_propagate_sas_addr(dev); - /* XXX Hint: register this SATA device with SATL. 
- When this returns, dev->sata_dev->lu is alive and - present. - sas_satl_register_dev(dev); - */ - sas_fill_in_rphy(dev, dev->rphy); return 0; diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index a30c0b7..073b6a7 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -44,6 +44,10 @@ #include /* -- SCSI Host glue -- */ + +static DEFINE_SPINLOCK(satl_ops_lock); +static struct satl_operations *satl_ops; + static void sas_scsi_task_done(struct sas_task *task) { struct task_status_struct *ts = &task->task_status; @@ -213,8 +217,8 @@ int sas_queuecommand(struct scsi_cmnd *c unsigned long flags; spin_lock_irqsave(dev->sata_dev.ap->lock, flags); - res = ata_sas_queuecmd(cmd, scsi_done, - dev->sata_dev.ap); + res = satl_ops->queuecommand(cmd, scsi_done, +dev->sata_dev.ap); spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags); goto out; } @@ -663,8 +667,9 @@ int sas_ioctl(struct scsi_device *sdev, { struct domain_device *dev = sdev_to_domain_dev(sdev); - if (dev_is_sata(dev)) - return ata_scsi_ioctl(sdev, cmd, arg); + if (dev_is_sata(dev)) { + return satl_ops->ioctl(sdev, cmd, arg); + } return -EINVAL; } @@ -705,6 +710,29 @@ static inline struct domain_device *sas_ return sas_find_dev_by_rphy(rphy); } +static int sas_target_alloc_sata(struct domain_device *dev, +struct scsi_target *starget) +{ + int res = -ENODEV; + + /* Do we have a SATL available? 
*/ + if (!get_satl()) + goto satl_found; + + request_module("sas_ata"); + if (!get_satl()) + goto satl_found; + + SAS_DPRINTK("sas_ata not loaded, ignoring SATA devices\n"); + goto no_satl; + +satl_found: + res = satl_ops->init_target(dev, starget); + +no_satl: + return res; +} + int sas_target_alloc(struct scsi_target *starget) { struct domain_device *found_dev = sas_find_target(starget); @@ -714,7 +742,7 @@ int sas_target_alloc(struct scsi_target return -ENODEV; if (dev_is_sata(found_dev)) { - res = sas_ata_init_host_and_port(found_dev, starget); + res = sas_target_alloc_sata(found_dev, starget); if (res) return res; } @@ -734,7 +762,7 @@ int sas_slave_configure(struct scsi_devi BUG_ON(dev->r
[PATCH 10/12] sas_ata: Implement sas_task_abort for ATA devices
ATA devices need special handling for sas_task_abort. If the ATA command came from SCSI, then we merely need to tell SCSI to abort the scsi_cmnd. However, internal commands require a bit more work--we need to fill the qc with the appropriate error status and complete the command, and eventually post_internal will issue the actual ABORT TASK. --- drivers/scsi/libsas/sas_ata.c | 47 +-- drivers/scsi/libsas/sas_internal.h |3 ++ drivers/scsi/libsas/sas_scsi_host.c |8 -- include/scsi/sas_ata.h |2 + 4 files changed, 54 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 8111222..1b7221c 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -30,6 +30,8 @@ #include #include #include #include "../scsi_sas_internal.h" +#include "../scsi_transport_api.h" +#include static enum ata_completion_errors sas_to_ata_err(struct task_status_struct *ts) { @@ -91,6 +93,7 @@ static void sas_ata_task_done(struct sas struct domain_device *dev; struct task_status_struct *stat = &task->task_status; struct ata_task_resp *resp = (struct ata_task_resp *)stat->buf; + struct sas_ha_struct *sas_ha; enum ata_completion_errors ac; unsigned long flags; @@ -98,6 +101,7 @@ static void sas_ata_task_done(struct sas goto qc_already_gone; dev = qc->ap->private_data; + sas_ha = dev->port->ha; spin_lock_irqsave(dev->sata_dev.ap->lock, flags); if (stat->stat == SAS_PROTO_RESPONSE || stat->stat == SAM_GOOD) { @@ -124,6 +128,20 @@ static void sas_ata_task_done(struct sas ata_qc_complete(qc); spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags); + /* +* If the sas_task has an ata qc, a scsi_cmnd and the aborted +* flag is set, then we must have come in via the libsas EH +* functions. When we exit this function, we need to put the +* scsi_cmnd on the list of finished errors. 
The ata_qc_complete +* call cleans up the libata side of things but we're protected +* from the scsi_cmnd going away because the scsi_cmnd is owned +* by the EH, making libata's call to scsi_done a NOP. +*/ + spin_lock_irqsave(&task->task_state_lock, flags); + if (qc->scsicmd && task->task_state_flags & SAS_TASK_STATE_ABORTED) + scsi_eh_finish_cmd(qc->scsicmd, &sas_ha->eh_done_q); + spin_unlock_irqrestore(&task->task_state_lock, flags); + qc_already_gone: list_del_init(&task->list); sas_free_task(task); @@ -259,15 +277,18 @@ static void sas_ata_post_internal(struct * ought to abort the task. */ struct sas_task *task = qc->lldd_task; - struct domain_device *dev = qc->ap->private_data; + unsigned long flags; qc->lldd_task = NULL; if (task) { + /* Should this be a AT(API) device reset? */ + spin_lock_irqsave(&task->task_state_lock, flags); + task->task_state_flags |= SAS_TASK_NEED_DEV_RESET; + spin_unlock_irqrestore(&task->task_state_lock, flags); + task->uldd_task = NULL; __sas_task_abort(task); } - - sas_phy_reset(dev->port->phy, 1); } } @@ -409,3 +430,23 @@ int sas_ata_init_host_and_port(struct do return 0; } + +void sas_ata_task_abort(struct sas_task *task) +{ + struct ata_queued_cmd *qc = task->uldd_task; + struct completion *waiting; + + /* Bounce SCSI-initiated commands to the SCSI EH */ + if (qc->scsicmd) { + scsi_req_abort_cmd(qc->scsicmd); + scsi_schedule_eh(qc->scsicmd->device->host); + return; + } + + /* Internal command, fake a timeout and complete. */ + qc->flags &= ~ATA_QCFLAG_ACTIVE; + qc->flags |= ATA_QCFLAG_FAILED; + qc->err_mask |= AC_ERR_TIMEOUT; + waiting = qc->private_data; + complete(waiting); +} diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h index a78638d..2b8213b 100644 --- a/drivers/scsi/libsas/sas_internal.h +++ b/drivers/scsi/libsas/sas_internal.h @@ -39,6 +39,9 @@ #else #define SAS_DPRINTK(fmt, ...) 
#endif +#define TO_SAS_TASK(_scsi_cmd) ((void *)(_scsi_cmd)->host_scribble) +#define ASSIGN_SAS_TASK(_sc, _t) do { (_sc)->host_scribble = (void *) _t; } while (0) + void sas_scsi_recover_host(struct Scsi_Host *shost); int sas_show_class(enum sas_class class, char *buf); diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index 5b0c471..a30c0b7 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -44,9 +44,6 @@ #include /*
[PATCH 08/12] libsas: Unknown STP devices should be reported to libata as unknown.
When libsas encounters a STP device whose protocol isn't recognized (i.e. not ATA or ATAPI), we should set the ata_device's class to ATA_DEV_UNKNOWN instead of ATA_DEV_ATA. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_ata.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 20f3a5e..7ebda69 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -232,7 +232,7 @@ static void sas_ata_phy_reset(struct ata SAS_DPRINTK("%s: Unknown SATA command set: %d.\n", __FUNCTION__, dev->sata_dev.command_set); - ap->device[0].class = ATA_DEV_ATA; + ap->device[0].class = ATA_DEV_UNKNOWN; break; }
[PATCH 09/12] sas_ata: Assign sas_task to scsi_cmnd to enable EH for ATA devices
The SATL should connect the scsi_cmnd to the sas_task (despite the presence of libata) so that requests to abort scsi_cmnds headed to the ATA device can be processed by the EH and aborted correctly. The abort status should still be propagated from sas -> ata -> scsi. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_ata.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 7ebda69..8111222 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -119,6 +119,8 @@ static void sas_ata_task_done(struct sas } qc->lldd_task = NULL; + if (qc->scsicmd) + ASSIGN_SAS_TASK(qc->scsicmd, NULL); ata_qc_complete(qc); spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags); @@ -184,6 +186,9 @@ static unsigned int sas_ata_qc_issue(str break; } + if (qc->scsicmd) + ASSIGN_SAS_TASK(qc->scsicmd, task); + if (sas_ha->lldd_max_execute_num < 2) res = i->dft->lldd_execute_task(task, 1, GFP_ATOMIC); else @@ -193,6 +198,8 @@ static unsigned int sas_ata_qc_issue(str if (res) { SAS_DPRINTK("lldd_execute_task returned: %d\n", res); + if (qc->scsicmd) + ASSIGN_SAS_TASK(qc->scsicmd, NULL); sas_free_task(task); return AC_ERR_SYSTEM; }
[PATCH 07/12] libsas: Accept SAM_GOOD for ATAPI devices in sas_ata_task_done
A sas_task sent to an ATAPI device returns SAM_GOOD if successful. Therefore, we should treat this the same way we treat ATA commands that succeed. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_ata.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 2bb619e..20f3a5e 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -100,7 +100,7 @@ static void sas_ata_task_done(struct sas dev = qc->ap->private_data; spin_lock_irqsave(dev->sata_dev.ap->lock, flags); - if (stat->stat == SAS_PROTO_RESPONSE) { + if (stat->stat == SAS_PROTO_RESPONSE || stat->stat == SAM_GOOD) { ata_tf_from_fis(resp->ending_fis, &dev->sata_dev.tf); qc->err_mask |= ac_err_mask(dev->sata_dev.tf.command); dev->sata_dev.sstatus = resp->sstatus;
[PATCH 05/12] sas_ata: Don't copy aic94xx's sactive to ata_port
Since the aic94xx sequencer assigns its own NCQ tags to ATA commands, it no longer makes any sense to copy the sactive field in the STP response to ata_port->sactive, as that will confuse libata. Also, libata seems to be capable of managing sactive on its own. The attached patch gets rid of one of the causes of the BUG messages in ata_qc_new, and seems to work without problems on an IBM x206m. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_ata.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index c8af884..16c3e5a 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -106,7 +106,6 @@ static void sas_ata_task_done(struct sas dev->sata_dev.sstatus = resp->sstatus; dev->sata_dev.serror = resp->serror; dev->sata_dev.scontrol = resp->scontrol; - dev->sata_dev.ap->sactive = resp->sactive; } else if (stat->stat != SAM_STAT_GOOD) { ac = sas_to_ata_err(stat); if (ac) {
[PATCH 06/12] sas_ata: Implement SATA PHY control
This patch requires "libsas: Add a sysfs knob to enable/disable a phy" to be applied. It hooks the SControl write function to provide basic SATA phy control for phy enable/disable and speed limits. Power management is still broken, though it is unclear that libata actually uses those SControl bits anyway. Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_ata.c | 42 ++- drivers/scsi/libsas/sas_scsi_host.c |1 + 2 files changed, 42 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 16c3e5a..2bb619e 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -270,6 +270,46 @@ static void sas_ata_tf_read(struct ata_p memcpy(tf, &dev->sata_dev.tf, sizeof (*tf)); } +static void sas_ata_scontrol_write(struct domain_device *dev, u32 val) +{ + u32 tmp = dev->sata_dev.scontrol; + struct sas_phy *phy = dev->port->phy; + + val &= 0x0FF; /* only set max spd and dev ctrl */ + val |= 0x300; /* disallow host pm */ + val |= tmp & 0xF000; /* preserve upper bits */ + + /* disable phy */ + if ((val & 0x4) && !(tmp & 0x4)) + sas_phy_enable(phy, 0); + + /* enable phy */ + if (!(val & 0x4) && (tmp & 0x4)) + sas_phy_enable(phy, 1); + + /* reset phy */ + if ((val & 0x1) && !(tmp & 0x1)) + sas_phy_reset(phy, 0); + + /* speed limit */ + if ((val & 0xF0) != (tmp & 0xF0)) { + struct sas_phy_linkrates rates = {0}; + + switch ((val & 0xF0) >> 4) { + case 0: + case 2: + rates.maximum_linkrate = SAS_LINK_RATE_3_0_GBPS; + break; + case 1: + rates.maximum_linkrate = SAS_LINK_RATE_1_5_GBPS; + break; + } + sas_set_phy_speed(phy, &rates); + } + + dev->sata_dev.scontrol = val; +} + static void sas_ata_scr_write(struct ata_port *ap, unsigned int sc_reg_in, u32 val) { @@ -281,7 +321,7 @@ static void sas_ata_scr_write(struct ata dev->sata_dev.sstatus = val; break; case SCR_CONTROL: - dev->sata_dev.scontrol = val; + sas_ata_scontrol_write(dev, val); break; case SCR_ERROR: dev->sata_dev.serror = val; 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index fee9c10..5b0c471 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1040,3 +1040,4 @@ EXPORT_SYMBOL_GPL(sas_eh_device_reset_ha
 EXPORT_SYMBOL_GPL(sas_slave_alloc);
 EXPORT_SYMBOL_GPL(sas_target_destroy);
 EXPORT_SYMBOL_GPL(sas_ioctl);
+EXPORT_SYMBOL_GPL(sas_set_phy_speed);
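
For anyone reading the masks in sas_ata_scontrol_write, here is a minimal
userspace sketch of how the SControl value breaks down, assuming the usual
SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The names
scontrol_fields, scontrol_decode, scontrol_sanitize and scontrol_demo are
made up for illustration and are not part of libsas or libata. Note that the
patch tests individual DET bits (0x4, 0x1), while this sketch decodes the
whole field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the SControl fields touched by the patch. */
struct scontrol_fields {
	unsigned int det;	/* 1 = reset request, 4 = disable the phy */
	unsigned int spd;	/* 0 = no limit, 1 = 1.5 Gbps, 2 = 3.0 Gbps */
	unsigned int ipm;	/* 3 = partial and slumber both disallowed */
};

/* Split a raw SControl value into its three low fields. */
static struct scontrol_fields scontrol_decode(uint32_t val)
{
	struct scontrol_fields f;

	f.det = val & 0xF;
	f.spd = (val >> 4) & 0xF;
	f.ipm = (val >> 8) & 0xF;
	return f;
}

/*
 * Same masking as the first three statements of sas_ata_scontrol_write:
 * keep only SPD and DET from the caller, force IPM to 3, and carry over
 * bits 15:12 from the previously stored value.
 */
static uint32_t scontrol_sanitize(uint32_t val, uint32_t old)
{
	val &= 0x0FF;
	val |= 0x300;
	val |= old & 0xF000;
	return val;
}

/* Usage example: request "disable phy" with a 1.5 Gbps limit. */
void scontrol_demo(void)
{
	uint32_t val = scontrol_sanitize(0x014, 0x0);
	struct scontrol_fields f = scontrol_decode(val);

	assert(val == 0x314);
	assert(f.det == 4 && f.spd == 1 && f.ipm == 3);
}
```

The point of the sanitize step is that a caller can never clear the IPM
bits through this path, which matches the "disallow host pm" comment in the
patch.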
[PATCH 04/12] sas_ata: ata_post_internal should abort the sas_task
This patch adds a new field, lldd_task, to ata_queued_cmd so that libata
users such as libsas can associate some data with a qc. The particular
ambition with this patch is to associate a sas_task with a qc; that way, if
libata decides to time out a command, we can come back (in
sas_ata_post_internal) and abort the sas task.

One question remains: Is it necessary to reset the phy on error, or will the
libata error handler take care of it? (Assuming that one is written, of
course.) This patch, as it is today, works well enough to clean things up
when an ATA device probe attempt fails halfway through the probe, though I'm
not sure this is always the right thing to do.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_ata.c |   30 +++---
 include/linux/libata.h        |    1 +
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 46e1dbe..c8af884 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -88,12 +88,17 @@ static enum ata_completion_errors sas_to
 static void sas_ata_task_done(struct sas_task *task)
 {
 	struct ata_queued_cmd *qc = task->uldd_task;
-	struct domain_device *dev = qc->ap->private_data;
+	struct domain_device *dev;
 	struct task_status_struct *stat = &task->task_status;
 	struct ata_task_resp *resp = (struct ata_task_resp *)stat->buf;
 	enum ata_completion_errors ac;
 	unsigned long flags;
 
+	if (!qc)
+		goto qc_already_gone;
+
+	dev = qc->ap->private_data;
+
 	spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
 	if (stat->stat == SAS_PROTO_RESPONSE) {
 		ata_tf_from_fis(resp->ending_fis, &dev->sata_dev.tf);
@@ -114,9 +119,11 @@ static void sas_ata_task_done(struct sas
 		}
 	}
 
+	qc->lldd_task = NULL;
 	ata_qc_complete(qc);
 	spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
 
+qc_already_gone:
 	list_del_init(&task->list);
 	sas_free_task(task);
 }
@@ -166,6 +173,7 @@ static unsigned int sas_ata_qc_issue(str
 	task->scatter = qc->__sg;
 	task->ata_task.retry_count = 1;
 	task->task_state_flags = SAS_TASK_STATE_PENDING;
+	qc->lldd_task = task;
 
 	switch (qc->tf.protocol) {
 	case ATA_PROT_NCQ:
@@ -237,8 +245,24 @@ static void sas_ata_post_internal(struct
 	if (qc->flags & ATA_QCFLAG_FAILED)
 		qc->err_mask |= AC_ERR_OTHER;
 
-	if (qc->err_mask)
-		SAS_DPRINTK("%s: Failure; reset phy!\n", __FUNCTION__);
+	if (qc->err_mask) {
+		/*
+		 * Find the sas_task and kill it.  By this point,
+		 * libata has decided to kill the qc, so we needn't
+		 * bother with sas_ata_task_done.  But we still
+		 * ought to abort the task.
+		 */
+		struct sas_task *task = qc->lldd_task;
+		struct domain_device *dev = qc->ap->private_data;
+
+		qc->lldd_task = NULL;
+		if (task) {
+			task->uldd_task = NULL;
+			__sas_task_abort(task);
+		}
+
+		sas_phy_reset(dev->port->phy, 1);
+	}
 }
 
 static void sas_ata_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 22aa69e..fe98957 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -452,6 +452,7 @@ struct ata_queued_cmd {
 	ata_qc_cb_t		complete_fn;
 
 	void			*private_data;
+	void			*lldd_task;
 };
 
 struct ata_port_stats {
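
To make the qc/sas_task cross-linking easier to reason about, here is a
small standalone sketch of the pattern, not the libsas code itself. The
types cmd and task and the functions link_cmd_task, cmd_timed_out,
task_done and link_demo are hypothetical stand-ins for ata_queued_cmd,
sas_task, sas_ata_qc_issue, sas_ata_post_internal and sas_ata_task_done.
Locking is left out entirely, so this only shows the pointer handling.

```c
#include <assert.h>
#include <stddef.h>

struct task;

/* Stand-in for ata_queued_cmd: points down at the lower-layer task. */
struct cmd {
	struct task *lldd_task;
	int completed;
};

/* Stand-in for sas_task: points back up at the command. */
struct task {
	struct cmd *uldd_task;
	int aborted;
};

/* Issue path: link both directions. */
static void link_cmd_task(struct cmd *c, struct task *t)
{
	c->lldd_task = t;
	t->uldd_task = c;
}

/* Timeout path: the upper layer gave up, so unlink and abort the task. */
static void cmd_timed_out(struct cmd *c)
{
	struct task *t = c->lldd_task;

	c->lldd_task = NULL;
	if (t) {
		t->uldd_task = NULL;
		t->aborted = 1;
	}
}

/* Completion path: skip the command if the timeout path already ran. */
static void task_done(struct task *t)
{
	struct cmd *c = t->uldd_task;

	if (!c)
		return;		/* command already gone */
	c->lldd_task = NULL;
	c->completed = 1;
}

/* Usage example: a late completion after a timeout must not touch c. */
void link_demo(void)
{
	struct cmd c = { NULL, 0 };
	struct task t = { NULL, 0 };

	link_cmd_task(&c, &t);
	cmd_timed_out(&c);
	task_done(&t);
	assert(t.aborted == 1 && c.completed == 0);
}
```

Whichever side finishes first clears both pointers, so the other side sees
NULL and backs off instead of completing a command that no longer exists.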
[PATCH 03/12] sas_ata: sas_ata_qc_issue should return AC_ERR_*
The sas_ata_qc_issue function was incorrectly written to return error codes
such as -ENOMEM. Since libata ORs the return value into qc->err_mask, it is
necessary to make my code return one of the AC_ERR_ codes instead. For now,
use AC_ERR_SYSTEM because an error here means that the OS couldn't send the
command to the controller. If anybody has a suggestion for a better AC_ERR_
code to use, please suggest it.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_ata.c |   10 --
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0bb1a14..46e1dbe 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -123,7 +123,7 @@ static void sas_ata_task_done(struct sas
 
 static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
 {
-	int res = -ENOMEM;
+	int res;
 	struct sas_task *task;
 	struct domain_device *dev = qc->ap->private_data;
 	struct sas_ha_struct *sas_ha = dev->port->ha;
@@ -135,7 +135,7 @@ static unsigned int sas_ata_qc_issue(str
 
 	task = sas_alloc_task(GFP_ATOMIC);
 	if (!task)
-		goto out;
+		return AC_ERR_SYSTEM;
 	task->dev = dev;
 	task->task_proto = SAS_PROTOCOL_STP;
 	task->task_done = sas_ata_task_done;
@@ -187,12 +187,10 @@ static unsigned int sas_ata_qc_issue(str
 		SAS_DPRINTK("lldd_execute_task returned: %d\n", res);
 
 		sas_free_task(task);
-		if (res == -SAS_QUEUE_FULL)
-			return -ENOMEM;
+		return AC_ERR_SYSTEM;
 	}
 
-out:
-	return res;
+	return 0;
 }
 
 static u8 sas_ata_check_status(struct ata_port *ap)
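
A quick userspace illustration of why a negative errno is a bad return
value here, assuming only what the description above says (the return value
is ORed into an error bitmask). EXAMPLE_AC_ERR_SYSTEM, merge_err_mask,
bits_set and err_mask_demo are names invented for this sketch, and the bit
position chosen for EXAMPLE_AC_ERR_SYSTEM is illustrative rather than
copied from libata.h.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative single-bit error flag; the real value lives in libata.h. */
#define EXAMPLE_AC_ERR_SYSTEM	(1u << 6)

/* Models the caller ORing the issue function's return into its mask. */
static unsigned int merge_err_mask(unsigned int err_mask,
				   unsigned int issue_ret)
{
	return err_mask | issue_ret;
}

/* Count the bits set in a mask. */
static int bits_set(unsigned int v)
{
	int n = 0;

	while (v) {
		n += v & 1u;
		v >>= 1;
	}
	return n;
}

/* Usage example: compare the old and new return conventions. */
void err_mask_demo(void)
{
	/* Old behaviour: -ENOMEM converted to unsigned sets many bits. */
	unsigned int bad = merge_err_mask(0, (unsigned int)-ENOMEM);
	/* New behaviour: exactly one well-defined flag is set. */
	unsigned int good = merge_err_mask(0, EXAMPLE_AC_ERR_SYSTEM);

	assert(bits_set(bad) > 1);
	assert(bits_set(good) == 1);
}
```

With the old convention the mask ends up claiming many unrelated error
classes at once, so any code that later inspects individual flags is
misled.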
[PATCH 01/12] sas_ata: Require CONFIG_ATA in Kconfig
Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/Kconfig b/drivers/scsi/libsas/Kconfig
index aafdc92..b64e391 100644
--- a/drivers/scsi/libsas/Kconfig
+++ b/drivers/scsi/libsas/Kconfig
@@ -24,7 +24,7 @@
 #
 config SCSI_SAS_LIBSAS
 	tristate "SAS Domain Transport Attributes"
-	depends on SCSI
+	depends on SCSI && ATA
 	select SCSI_SAS_ATTRS
 	help
 	  This provides transport specific helpers for SAS drivers which
[PATCH 02/12] sas_ata: Satisfy libata qc function locking requirements
ata_qc_complete and ata_sas_queuecmd require that the port lock be held when
they are called. sas_ata doesn't do this, leading to BUG messages about
newly allocated qc tags already being in use. This patch fixes the locking,
which should clean up the rest of those messages. So far I've tested this
against an IBM x206m with two SATA disks with no BUG messages and no other
signs of things going wrong, and the machine finally passed the pounder
stress test.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_ata.c       |    4
 drivers/scsi/libsas/sas_scsi_host.c |    4
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index de42b5b..0bb1a14 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -92,7 +92,9 @@ static void sas_ata_task_done(struct sas
 	struct task_status_struct *stat = &task->task_status;
 	struct ata_task_resp *resp = (struct ata_task_resp *)stat->buf;
 	enum ata_completion_errors ac;
+	unsigned long flags;
 
+	spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
 	if (stat->stat == SAS_PROTO_RESPONSE) {
 		ata_tf_from_fis(resp->ending_fis, &dev->sata_dev.tf);
 		qc->err_mask |= ac_err_mask(dev->sata_dev.tf.command);
@@ -113,6 +115,8 @@ static void sas_ata_task_done(struct sas
 	}
 
 	ata_qc_complete(qc);
+	spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
+
 	list_del_init(&task->list);
 	sas_free_task(task);
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 2cd478a..fee9c10 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -213,8 +213,12 @@ int sas_queuecommand(struct scsi_cmnd *c
 		struct sas_task *task;
 
 		if (dev_is_sata(dev)) {
+			unsigned long flags;
+
+			spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
 			res = ata_sas_queuecmd(cmd, scsi_done,
 					       dev->sata_dev.ap);
+			spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
 			goto out;
 		}
 
[PATCH 00/12] Roll-up of sas_ata patches
Hi all,

This is a roll-up of all of my ATA related uncommitted patches against
libsas and aic94xx to date. Per James Bottomley's request, I'm pushing these
patches out for further review in aic94xx-sas. The big changes in this patch
set are a lot of bug and locking fixes, the conversion of the EH routines to
interact with the SAS EH strategy routines, and of course the separation of
the SATL code into a separate module.

These patches should apply cleanly in numerical order against 2.6.20-rc6 +
scsi_misc + scsi-rc-fixes + aic94xx-sas. They've been fairly well tested on
a bunch of SATA disks in an x206m, though the ATAPI support is not so well
tested. However, I have run these patches under other loads for a while.

Hopefully these patches are ready for more widespread testing in scsi-misc,
and thank you for any comments or feedback that you provide. (Apologies for
any stgit mail misconfiguration on my part.)

--D