Re: 2.6.24-rc4-mm1: hostbyte=0x01 driverbyte=0x00 (now bisected)
Alexey Dobriyan wrote: >> git-scsi-misc.patch > > Apologies for not looking into the problem earlier. See > http://marc.info/?t=11962802235&r=1&w=2 > "2.6.24-rc3-mm2: Result: hostbyte=0x01 driverbyte=0x00\nend_request: I/O > error" > for previous installment. > > I've bisected it to the following patch in git-scsi-misc branch. > Revert on top of 2.6.24-rc4-mm1 also helps. > > commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 > Author: Hannes Reinecke <[EMAIL PROTECTED]> > Date: Tue Nov 6 09:23:40 2007 +0100 > > [SCSI] Do not requeue requests if REQ_FAILFAST is set > > Any requests with the REQ_FAILFAST flag set should not be requeued > to the requeust queue, but rather terminated directly. > Otherwise the multipath failover will stall until the command > timeout triggers. > > Signed-off-by: Hannes Reinecke <[EMAIL PROTECTED]> > Signed-off-by: James Bottomley <[EMAIL PROTECTED]> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 0f44bdb..0da0dd0 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1286,6 +1286,11 @@ int scsi_prep_state_check(struct scsi_device *sdev, > struct request *req) >*/ > if (!(req->cmd_flags & REQ_PREEMPT)) > ret = BLKPREP_DEFER; > + /* > + * Return failfast requests immediately > + */ > + if (req->cmd_flags & REQ_FAILFAST) > + ret = BLKPREP_KILL; > break; > default: > /* > @@ -1414,6 +1419,17 @@ static inline int scsi_host_queue_ready(struct > request_queue *q, > return 1; > } > > +static void __scsi_kill_request(struct request *req) > +{ > + struct scsi_cmnd *cmd = req->special; > + struct scsi_device *sdev = cmd->device; > + > + cmd->result = DID_NO_CONNECT << 16; > + atomic_inc(&cmd->device->iorequest_cnt); > + sdev->device_busy--; > + __scsi_done(cmd); > +} > + > /* > * Kill a request for a dead device > */ > @@ -1527,8 +1543,16 @@ static void scsi_request_fn(struct request_queue *q) >* accept it. >*/ > req = elv_next_request(q); > - if (!req || !scsi_dev_queue_ready(q, sdev)) > + if (!req) > + break; > + > + if (!scsi_dev_queue_ready(q, sdev)) { > + if (req->cmd_flags & REQ_FAILFAST) { > + scsi_kill_request(req, q); > + continue; > + } > break; > + } > > if (unlikely(!scsi_device_online(sdev))) { > sdev_printk(KERN_ERR, sdev, > @@ -1609,8 +1633,12 @@ static void scsi_request_fn(struct request_queue *q) >* later time. >*/ > spin_lock_irq(q->queue_lock); > - blk_requeue_request(q, req); > - sdev->device_busy--; > + if (unlikely(req->cmd_flags & REQ_FAILFAST)) > + __scsi_kill_request(req); > + else { > + blk_requeue_request(q, req); > + sdev->device_busy--; > + } > if(sdev->device_busy == 0) > blk_plug_device(q); > out: Yeah, sorry. That patch was bad. Please use the attached one instead. Andrew, can you replace them? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage [EMAIL PROTECTED] +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 13e7e09..9ec1566 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1284,13 +1284,15 @@ int scsi_prep_state_check(struct scsi_device *sdev, struct request *req) /* * If the devices is blocked we defer normal commands. */ - if (!(req->cmd_flags & REQ_PREEMPT)) - ret = BLKPREP_DEFER; - /* -* Return failfast requests immediately -*/ - if (req->cmd_flags & REQ_FAILFAST) - ret = BLKPREP_KILL; + if (!(req->cmd_flags & REQ_PREEMPT)) { + /* +* Return failfast requests immediately +*/ + if (req->cmd_flags & REQ_FAILFAST) + ret = BLKPREP_KILL; + else + ret = BLKPREP_DEFER; + } break; default: /*
Re: aic7-triple x build fixed in kbuild.git
Sam Ravnborg wrote: > akpm reported that: > make mrproper > make allmodconfig > make drivers/scsi/aic7xxx/ > > were broken in recent kbuild.git - and as a logical consequence > removed kbuild.git from his -mm lineup. > > I have now fixed it (stupid bug I made) and pushed out > a fresh kbuild.git tree. > > As previously announced I am not very active in kbuild land the next > couple of weeks so I may be slow if there is any comments. > > The fixed commit is included below. > > Sam > > commit 63b7c4e77e9204407ee8a5d5243ba9c502550252 > Author: Vegard Nossum <[EMAIL PROTECTED]> > Date: Fri Oct 26 13:31:13 2007 +0200 > > aic7(3*x): fix firmware build > > This patch adds the proper $(obj) and $(src) prefixes to dependency > rules in aic7xxx makefile. Without this patch, there is a remote > possibility that parallel make with a different output directory can > fail. > > Also changed the deprecated EXTRA_CFLAGS construct to ccflags-y syntax. > > Fixed up patch to survive "make drivers/scsi/ -j" > with BUILD_FIRMWARE enable. /Sam > > Signed-off-by: Vegard Nossum <[EMAIL PROTECTED]> > Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]> > Cc: linux-scsi@vger.kernel.org > > diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile > index 9a6ce19..e4f70c5 100644 > --- a/drivers/scsi/aic7xxx/Makefile > +++ b/drivers/scsi/aic7xxx/Makefile > @@ -33,11 +33,10 @@ aic79xx-y += > aic79xx_osm.o\ > aic79xx_proc.o \ > aic79xx_osm_pci.o > > -EXTRA_CFLAGS += -Idrivers/scsi > +ccflags-y += -Idrivers/scsi > ifdef WARNINGS_BECOME_ERRORS > -EXTRA_CFLAGS += -Werror > +ccflags-y += -Werror > endif > -#EXTRA_CFLAGS += -g > > # Files generated that shall be removed upon make clean > clean-files := aic7xxx_seq.h aic7xxx_reg.h aic7xxx_reg_print.c > @@ -46,53 +45,45 @@ clean-files += aic79xx_seq.h aic79xx_reg.h > aic79xx_reg_print.c > # Dependencies for generated files need to be listed explicitly > > $(obj)/aic7xxx_core.o: $(obj)/aic7xxx_seq.h > +$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_reg.h > $(obj)/aic79xx_core.o: $(obj)/aic79xx_seq.h > -$(obj)/aic79xx_reg_print.c: $(src)/aic79xx_reg_print.c_shipped > -$(obj)/aic7xxx_reg_print.c: $(src)/aic7xxx_reg_print.c_shipped > +$(obj)/aic79xx_core.o: $(obj)/aic79xx_reg.h > > -$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_reg.h > -$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_reg.h > +$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h > +$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h > > -aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_seq.h \ > -$(obj)/aic7xxx_reg.h > +aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_reg.h > aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) += > $(obj)/aic7xxx_reg_print.c > > aicasm-7xxx-opts-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) := \ > -p $(obj)/aic7xxx_reg_print.c -i aic7xxx_osm.h > > ifeq ($(CONFIG_AIC7XXX_BUILD_FIRMWARE),y) > -# Create a dependency chain in generated files > -# to avoid concurrent invocations of the single > -# rule that builds them all. > -aic7xxx_seq.h: aic7xxx_reg.h > -ifeq ($(CONFIG_AIC7XXX_REG_PRETTY_PRINT),y) > -aic7xxx_reg.h: aic7xxx_reg_print.c > -endif > -$(aic7xxx-gen-y): $(src)/aic7xxx.seq $(src)/aic7xxx.reg $(obj)/aicasm/aicasm > +$(obj)/aic7xxx_seq.h: $(src)/aic7xxx.seq $(src)/aic7xxx.reg > $(obj)/aicasm/aicasm > $(obj)/aicasm/aicasm -I$(src) -r $(obj)/aic7xxx_reg.h \ > $(aicasm-7xxx-opts-y) -o $(obj)/aic7xxx_seq.h \ > $(src)/aic7xxx.seq > + > +$(aic7xxx-gen-y): $(obj)/aic7xxx_seq.h > +else > +$(obj)/aic7xxx_reg_print.c: $(src)/aic7xxx_reg_print.c_shipped > endif > > -aic79xx-gen-$(CONFIG_AIC79XX_BUILD_FIRMWARE) := $(obj)/aic79xx_seq.h \ > -$(obj)/aic79xx_reg.h > +aic79xx-gen-$(CONFIG_AIC79XX_BUILD_FIRMWARE) := $(obj)/aic79xx_reg.h > aic79xx-gen-$(CONFIG_AIC79XX_REG_PRETTY_PRINT) += > $(obj)/aic79xx_reg_print.c > > aicasm-79xx-opts-$(CONFIG_AIC79XX_REG_PRETTY_PRINT) := \ > -p $(obj)/aic79xx_reg_print.c -i aic79xx_osm.h > > ifeq ($(CONFIG_AIC79XX_BUILD_FIRMWARE),y) > -# Create a dependency chain in generated files > -# to avoid concurrent invocations of the single > -# rule that builds them all. > -aic79xx_seq.h: aic79xx_reg.h > -ifeq ($(CONFIG_AIC79XX_REG_PRETTY_PRINT),y) > -aic79xx_reg.h: aic79xx_reg_print.c > -endif > -$(aic79xx-gen-y): $(src)/aic79xx.seq $(src)/aic79xx.reg $(obj)/aicasm/aicasm > +$(obj)/aic79xx_seq.h: $(src)/aic79xx.seq $(src)/aic79xx.reg > $(obj)/aicasm/aicasm > $(obj)/aicasm/aicasm -I$(src) -r $(obj)/aic79xx_reg.h \ > $(aicasm-79xx-opts-y) -o $(obj)/aic79xx_seq.h \ >
Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)
On Thu, 06 Dec 2007 14:49:37 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > drivers/scsi/dpt_i2o.c | 132 ++- > > > drivers/scsi/dpti.h|9 ++ > > > 2 files changed, 68 insertions(+), 73 deletions(-) > > > > I've done the following: > > > > -untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in > > order > > to verify that the driver is still broken: checked, the box still won't > > boot. > > > > -patched the just compiled kernel source with your patch, "make dist-clean" > > (by means of "make-kpkg clean") and recompile: box boots fine. > > > > I've put the captured console logs to > > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine > > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched > > ... and the kernelconfig (which shouldn't matter) to > > http://w.sysiphus.de/dpt_i2o/kernelconfig.2624-rc4 > > Thanks for testing. So reverting Matthew's hotplug patch fixes the > problem though I have no idea how the patch leads to this. Seems that > nobody has any clue on that. We need to revert that patch for the > moment. OK, thanks. Let's leave it a couple of days for people to register objections, have bright ideas, etc. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)
On Wed, 5 Dec 2007 11:14:41 +0100 Anders Henke <[EMAIL PROTECTED]> wrote: > On Tue, 4 Dec 2007 Andrew Morton wrote: > > On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> > > wrote: > > > > > On Tue, 4 Dec 2007 17:11:55 -0800 > > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > > > On Wed, 05 Dec 2007 10:04:03 +0900 > > > > FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > > > > > > > On Tue, 4 Dec 2007 16:57:38 -0800 > > > > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > On Thu, 29 Nov 2007 13:31:50 +0100 > > > > > > Anders Henke <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > On November 28 2007, Anders Henke wrote: > > > > > > > > As "everything is reported as being zero" is quite odd an Jan > > > > > > > > took a > > > > > > > > guess that it might be block-layer or driver-related, I've > > > > > > > > assumed > > > > > > > > that the driver is responsible for this; just out of the > > > > > > > > curiousity, > > > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by > > > > > > > > copying > > > > > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ > > > > > > > > into a > > > > > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for > > > > > > > > me. > > > > > > > > > > > > > > > > I haven't yet fine-tested from which kernel release on the > > > > > > > > dpt_i2o driver > > > > > > > > behaves like this and spews out zeroed blocks when trying to > > > > > > > > mount > > > > > > > > the rootfs. Maybe this is just some timing issue. > > > > > > > > > > > > > > I've started the fine-tests and can say so far that dpt_i2o from > > > > > > > 2.6.22 is still fine. Test is simple: > > > > > > > > > > > > > > [EMAIL PROTECTED]:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r > > > > > > > dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/ > > > > > > > > > > > > > > ... recompile the kernel, reboot: works. > > > > > > > > > > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two > > > > > > > different > > > > > > > patch sets: > > > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1 > > > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3 > > > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10. > > > > > > > > > > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel, > > > > > > > the "zero blocks"-symptom show up, so it's the "lucky" situation > > > > > > > that the smallest patch actually seams to be the broken one. > > > > > > > > > > > > > > According to the 2.6.23-rc1 short-form changelog, there is > > > > > > > one major edit on the dpt_i2o driver: > > > > > > > > > > > > > > FUJITA Tomonori > > > > > > > > > > > > > > [SCSI] dpt_i2o: convert to use the data buffer accessors > > > > > > > > > > > > > > Stephen Rothwell > > > > > > > dpt_i2o depends on virt_to_bus > > > > > > > > > > > > > > Fujita, would you please take a look at this? > > > > > > > > > > > > He won't have seen this. cc's added. > > > > > > > > > > > > > I think that something's broken in there, leading to the dpt_i2o > > > > > > > sending out blocks of zeroes right after initialization, at least > > > > > > > on > > > > > > > some specific controllers (in this case, Adaptec 2010S on Intel > > > > > > > SE7501WV2S-based boxes). > > > > > > > > > > > > > > I don't have insight kernel driver development knowledge, so I'm > > > > > > > quite out of help right now. Nevertheless, I'll add the diff > > > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o: > > > > > > > > > > > > > > > > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes > > > > > > the data > > > > > > corruption problems? > > > > > > > > > > Anders said that my patch is fine and seems that Matthew's hotplug > > > > > conversion patch leads to the problem: > > > > > > > > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2 > > > > > > > > Oh. Jan broke message threading :( > > > > > > > > So it's been nearly a week and nothing has happened? Do we revert that > > > > change? > > > > > > SCSI people really want this conversion... > > > > > > Matthew, did you have a chance to look at it? > > > > It seems pretty improbably that a change of that nature could cause data > > corruption. Anders, are you able to determine whether the revert (against > > current Linus mainline or 2.6.24-rc4) fixes things? Because it would be > > very strange... > > > > This is a grave bug. It's really quite urgent... > > > > Thanks. > > > > drivers/scsi/dpt_i2o.c | 132 ++- > > drivers/scsi/dpti.h|9 ++ > > 2 files changed, 68 insertions(+), 73 deletions(-) > > I've done the following: > > -untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in order > to verify that the driver is still broken: checked, the box still won't > boot. > > -patched the
Re: [PATCH] esp_scsi: Fix reset cleanup spinlock recursion
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]> Date: Wed, 5 Dec 2007 16:10:54 + (GMT) > The esp_reset_cleanup() function is called with the host lock held and > invokes starget_for_each_device() which wants to take it too. Here is a > fix along the lines of shost_for_each_device()/__shost_for_each_device() > adding a __starget_for_each_device() counterpart which assumes the lock > has already been taken. > > Eventually, I think the driver should get modified so that more work is > done as a softirq rather than in the interrupt context, but for now it > fixes a bug that causes the spinlock debugger to fire. > > While at it, it fixes a small number of cosmetic problems with > starget_for_each_device() too. > > Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> Acked-by: David S. Miller <[EMAIL PROTECTED]> Thanks for finding and fixing this bug. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)
Hi Boaz, On Tue, 04 Dec 2007 15:39:12 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote: > On Sat, Dec 01 2007 at 1:35 +0200, Kiyoshi Ueda <[EMAIL PROTECTED]> wrote: > > This patch converts bidi of scsi mid-layer to use blk_end_request(). > > > > rq->next_rq represents a pair of bidi requests. > > (There are no other use of 'next_rq' of struct request.) > > For both requests in the pair, end_that_request_chunk() should be > > called before end_that_request_last() is called for one of them. > > Since the calls to end_that_request_first()/chunk() and > > end_that_request_last() are packaged into blk_end_request(), > > the handling of next_rq completion has to be moved into > > blk_end_request(), too. > > > > Bidi sets its specific value to rq->data_len before the request is > > completed so that upper-layer can read it. > > This setting must be between end_that_request_chunk() and > > end_that_request_last(), because rq->data_len may be used > > in end_that_request_chunk() by blk_trace and so on. > > To satisfy the requirement, use blk_end_request_callback() which > > is added in PATCH 25 only for the tricky drivers. > > > > If bidi didn't reuse rq->data_len and added new members to request > > for the specific value, it could set before end_that_request_chunk() > > and use the standard blk_end_request() like below. > > > > void scsi_end_bidi_request(struct scsi_cmnd *cmd) > > { > > struct request *req = cmd->request; > > > > rq->resid = scsi_out(cmd)->resid; > > rq->next_rq->resid = scsi_in(cmd)->resid; > > > > if (blk_end_request(req, 1, req->data_len)) > > BUG(); > > > > scsi_release_buffers(cmd); > > scsi_next_command(cmd); > > } ... snip ... > > rq->data_len = scsi_out(cmd)->resid is Not Just a problem of bidi > it is a General problem of scsi residual handling, and user code. > > Even today before any bidi. at scsi_lib.c at scsi_io_completion() > we do req->data_len = scsi_get_resid(cmd); > ( or: req->data_len = cmd->resid; depends which version you look) > And then call scsi_end_request() which calls __end_that_request_first/last > So it is assumed even today that req->data_len is not touched by > __end_that_request_first/last unless __end_that_request_first returned > that there is more work to do and the command is resubmitted in which > case the resid information is discarded. > > So if the regular resid handling is acceptable - Set req->data_len > before the call to __end_that_request_first/last, or blk_end_request() > in your case, then here goes your second client of the _callback and > it can be removed. > But if it is found that req->data_len is touched and the resid information > gets lost, than it should be fixed for the common uni-io case, by - for > example > - pass resid to the blk_end_request() function. > (So in any way the _callback can go) Thank you for the explanation of scsi's rq->data_len usage. I see that scsi usually uses rq->data_len for cmd->resid. I have investigated the possibility of setting data_len before the call to blk_end_request. But no matter whether data_len is touched or not, we need a callback for bidi. So I would like to go with the current patch. I explained the reason and some details below. As far as I can see, rq->data_len is just referenced by blk_add_trace_rq() in __end_that_request_first(), not modified. And I don't change any logic around there in the block-layer. So there shouldn't be any critical problem for scsi residual handing. (although I'm not sure that scsi expectes cmd->resid to be traced by blk_trace.) Anyway, I see that it is no critical problem for bidi to set cmd->resid to rq->data_len before blk_end_request() call. But if I do that, blk_end_request() can't get the next_rq's size to complete in its code below. > + /* Bidi request must be completed as a whole */ > + if (blk_bidi_rq(rq) && > + __end_that_request_first(rq->next_rq, uptodate, > + blk_rq_bytes(rq->next_rq))) > + return 1; So I will have to move next_rq completion to bidi and use _callback() anyway like the following. - static int dummy_cb(struct request *rq) { return 1; } void scsi_end_bidi_request(struct scsi_cmnd *cmd) { struct request *req = cmd->request; unsigned int dlen = req->data_len; unsigned int next_dlen = req->next_rq->data_len; req->data_len = scsi_out(cmd)->resid; req->next_rq->data_len = scsi_in(cmd)->resid; /* Complete only DATA of next_rq using _callback and dummy function */ if (!blk_end_request_callback(req->next_rq, 1, next_dlen, dummy_cb)) BUG(); if (blk_end_request(req, 1, dlen)) BUG(); scsi_release_buffers(cmd); scsi_next_command(cmd); } - I prefer
2.6.24-rc4-mm1: hostbyte=0x01 driverbyte=0x00 (now bisected)
> git-scsi-misc.patch Apologies for not looking into the problem earlier. See http://marc.info/?t=11962802235&r=1&w=2 "2.6.24-rc3-mm2: Result: hostbyte=0x01 driverbyte=0x00\nend_request: I/O error" for previous installment. I've bisected it to the following patch in git-scsi-misc branch. Revert on top of 2.6.24-rc4-mm1 also helps. commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 Author: Hannes Reinecke <[EMAIL PROTECTED]> Date: Tue Nov 6 09:23:40 2007 +0100 [SCSI] Do not requeue requests if REQ_FAILFAST is set Any requests with the REQ_FAILFAST flag set should not be requeued to the requeust queue, but rather terminated directly. Otherwise the multipath failover will stall until the command timeout triggers. Signed-off-by: Hannes Reinecke <[EMAIL PROTECTED]> Signed-off-by: James Bottomley <[EMAIL PROTECTED]> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 0f44bdb..0da0dd0 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1286,6 +1286,11 @@ int scsi_prep_state_check(struct scsi_device *sdev, struct request *req) */ if (!(req->cmd_flags & REQ_PREEMPT)) ret = BLKPREP_DEFER; + /* +* Return failfast requests immediately +*/ + if (req->cmd_flags & REQ_FAILFAST) + ret = BLKPREP_KILL; break; default: /* @@ -1414,6 +1419,17 @@ static inline int scsi_host_queue_ready(struct request_queue *q, return 1; } +static void __scsi_kill_request(struct request *req) +{ + struct scsi_cmnd *cmd = req->special; + struct scsi_device *sdev = cmd->device; + + cmd->result = DID_NO_CONNECT << 16; + atomic_inc(&cmd->device->iorequest_cnt); + sdev->device_busy--; + __scsi_done(cmd); +} + /* * Kill a request for a dead device */ @@ -1527,8 +1543,16 @@ static void scsi_request_fn(struct request_queue *q) * accept it. */ req = elv_next_request(q); - if (!req || !scsi_dev_queue_ready(q, sdev)) + if (!req) + break; + + if (!scsi_dev_queue_ready(q, sdev)) { + if (req->cmd_flags & REQ_FAILFAST) { + scsi_kill_request(req, q); + continue; + } break; + } if (unlikely(!scsi_device_online(sdev))) { sdev_printk(KERN_ERR, sdev, @@ -1609,8 +1633,12 @@ static void scsi_request_fn(struct request_queue *q) * later time. */ spin_lock_irq(q->queue_lock); - blk_requeue_request(q, req); - sdev->device_busy--; + if (unlikely(req->cmd_flags & REQ_FAILFAST)) + __scsi_kill_request(req); + else { + blk_requeue_request(q, req); + sdev->device_busy--; + } if(sdev->device_busy == 0) blk_plug_device(q); out: - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stex driver - kernel panics
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Rolf Eike Beer wrote: > > gdb stex.o > l *stex_mu_intr+0x10e > > That will give you the exact line that failed > > Greeting, > > Eike Well I get this: raidology:~# gdb stex.o ... (gdb) l *stex_mu_intr+0x10e No symbol table is loaded. Use the "file" command. I think its due to no kernel debugging info. I'll recompile with "CONFIG_DEBUG_KERNEL" set and see what happens. Won't the recompile make the previous procedure addresses invalid? cya Sean -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHVy2NdnDJ8yyauXcRAuNGAJ9qIRhBcHEEK6tn7HhUq9pnaA/ovQCcDLGY eLSL9wCctTNBa9MfjSYEphw= =nUpC -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote: > On Wed, 05 Dec 2007 11:36:39 -0500 > Lee Schermerhorn <[EMAIL PROTECTED]> wrote: > > > As reported here: > > > > http://marc.info/?l=linux-scsi&m=119645761124683&w=4 > > > > against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA > > platform under 24-rc4-mm1 with async scsi scan enabled. I'm still > > seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. > > > > I can boot by disabling async scan. However, I've also noticed some > > disks attached via one of the "mpt" adapters ["scsi8" in console long in > > message linked above] going "off-line" during stress tests. This was > > under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1. > > > > Is ther any way of tricking you into > http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt? > > Obvious culprits to start with would be git-scsi-misc and maybe > scsi-early-detection-of-medium-not-present-updated.patch. But there are > only 20-odd scsi patches in there. I'll try to get to it tomorrow am. Lee > - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stex driver - kernel panics
Sean Rendell wrote: > Hi all, > > I was wondering if someone knows how to debug this. I have a HP xw4300 PC > with 2G RAM and 3GHz P4. Included is a Promise EX8350 RAID card with 4 > drives configured as RAID5. While doing some lengthy disk exercises (ie cp > a 200G directory of files) the kernel usually panics. > > This is the 2.6.23 kernel patched to 2.6.24-rc4. The only module is the > stex driver, all other drivers are in the kernel. > > Can anyone debug this? (I am not a software guy) Do you need more info? > > cya > Sean > > > (copied by hand from the screen) > raidology:/home# mkdir foo > raidology:/home# cp -a * foo/ > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 40 3c a7 17 00 00 08 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 06 85 cc 17 00 01 00 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 31 1c 66 27 00 00 08 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4f 1f 87 f7 00 00 08 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4a 07 d1 ef 00 00 08 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 30 00 00 01 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 3f 00 00 08 00 > stex(:11:0e.0): aborting command > sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 ba 77 b7 00 00 08 00 > BUG: unable to handle kernel NULL pointer dereference at virtual address > printing eip: *pde = > Oops: [#1] SMP > Modules linked in: stex > > Pid: 0, comm: swapper Not tainted (2.6.24-rc4loca-RAID5-02 #1) > EIP: 0060:[<>] EFLAGS: 00010046 CPU:0 > EIP is at run_init_process+0x3fefff40/0x20 > EAX: f7760900 EBX: f7548d78 ECX: f7760900 EDX: > ESI: f7760900 EDI: f7f6eaa0 EBP: f7f6ea70 ESP: c04ebeec > DS: 007b ES: 007b FS: 00d8 GS: SS:0068 > Process swapper (pid: 0, ti=c04ea000 task=c04b53a0 task.ti=c04ea000) > Stack: f887c4be c04ebf24 c04ebf1c c04ebf2c f8d9e000 f7f6eaa0 0001 > 00ced0bd f7f6ea70 f8d9e000 0246 0001 f887ceb2 f7670e80 > 0010 c013dff5 c04df580 0010 c04df5b0 c013f7d4 > 0800 Call Trace: > [] stex_mu_intr+0x10e/0x4f0 [stex] > [] stex_intr+0x42/0x70 [stex] > [] handle_IRQ_event+0x25/0x60 > [] handle_fasteoi_irq+0x74/0xf0 > [] do_IRQ+0x3b/0x80 > [] smp_apic_timer_interrupt+0x57/0x90 > [] common_interrupt+0x23/0x28 > [] mwait_idle_with_hints+0x46/0x60 > [] cpu_idle+0x65/0x80 > [] start_kernel+0x2af/0x2f0 > [] unknown_bootoption+0x0/0x1f0 > === > Code: Bad EIP value. > EIP: [<>] run_init_process=0x3fefff40/0x20 SS:ESP 0068:c04ebeec > Kernel panic - not syncing: Fatal exception in interrupt gdb stex.o l *stex_mu_intr+0x10e That will give you the exact line that failed Greeting, Eike signature.asc Description: This is a digitally signed message part.
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn <[EMAIL PROTECTED]> wrote: > As reported here: > > http://marc.info/?l=linux-scsi&m=119645761124683&w=4 > > against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA > platform under 24-rc4-mm1 with async scsi scan enabled. I'm still > seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. > > I can boot by disabling async scan. However, I've also noticed some > disks attached via one of the "mpt" adapters ["scsi8" in console long in > message linked above] going "off-line" during stress tests. This was > under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1. > Is ther any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt? Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
stex driver - kernel panics
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi all, I was wondering if someone knows how to debug this. I have a HP xw4300 PC with 2G RAM and 3GHz P4. Included is a Promise EX8350 RAID card with 4 drives configured as RAID5. While doing some lengthy disk exercises (ie cp a 200G directory of files) the kernel usually panics. This is the 2.6.23 kernel patched to 2.6.24-rc4. The only module is the stex driver, all other drivers are in the kernel. Can anyone debug this? (I am not a software guy) Do you need more info? cya Sean (copied by hand from the screen) raidology:/home# mkdir foo raidology:/home# cp -a * foo/ stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 40 3c a7 17 00 00 08 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 06 85 cc 17 00 01 00 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 31 1c 66 27 00 00 08 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4f 1f 87 f7 00 00 08 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4a 07 d1 ef 00 00 08 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 30 00 00 01 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 3f 00 00 08 00 stex(:11:0e.0): aborting command sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 ba 77 b7 00 00 08 00 BUG: unable to handle kernel NULL pointer dereference at virtual address printing eip: *pde = Oops: [#1] SMP Modules linked in: stex Pid: 0, comm: swapper Not tainted (2.6.24-rc4loca-RAID5-02 #1) EIP: 0060:[<>] EFLAGS: 00010046 CPU:0 EIP is at run_init_process+0x3fefff40/0x20 EAX: f7760900 EBX: f7548d78 ECX: f7760900 EDX: ESI: f7760900 EDI: f7f6eaa0 EBP: f7f6ea70 ESP: c04ebeec DS: 007b ES: 007b FS: 00d8 GS: SS:0068 Process swapper (pid: 0, ti=c04ea000 task=c04b53a0 task.ti=c04ea000) Stack: f887c4be c04ebf24 c04ebf1c c04ebf2c f8d9e000 f7f6eaa0 0001 00ced0bd f7f6ea70 f8d9e000 0246 0001 f887ceb2 f7670e80 0010 c013dff5 c04df580 0010 c04df5b0 c013f7d4 0800 Call Trace: [] stex_mu_intr+0x10e/0x4f0 [stex] [] stex_intr+0x42/0x70 [stex] [] handle_IRQ_event+0x25/0x60 [] handle_fasteoi_irq+0x74/0xf0 [] do_IRQ+0x3b/0x80 [] smp_apic_timer_interrupt+0x57/0x90 [] common_interrupt+0x23/0x28 [] mwait_idle_with_hints+0x46/0x60 [] cpu_idle+0x65/0x80 [] start_kernel+0x2af/0x2f0 [] unknown_bootoption+0x0/0x1f0 === Code: Bad EIP value. EIP: [<>] run_init_process=0x3fefff40/0x20 SS:ESP 0068:c04ebeec Kernel panic - not syncing: Fatal exception in interrupt Other info: raidology:/# uname -a Linux raidology 2.6.24-rc4loca-RAID5-02 #1 SMP Tue Dec 4 13:01:10 EST 2007 i686 GNU/Linux raidology:/# lspci 00:00.0 Host bridge: Intel Corporation 955X Express Memory Controller Hub 00:01.0 PCI bridge: Intel Corporation 955X Express PCI Express Root Port 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) 00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01) 00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01) 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01) 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01) 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01) 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) 00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01) 01:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1) 05:09.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02) 05:0a.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02) 10:00.0 PCI bridge: Intel Corporation 80333 Segment-A PCI Express-to-PCI Express Bridge 10:00.2 PCI bridge: Intel Corporation 80333 Segment-B PCI Express-to-PCI Express Bridge 11:0e.0 RAID bus controller: Promise Technology, Inc. Unknown device 8350 3f:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 01) raidology:/# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model
Another use for block-layer timeouts
On Wed, Nov 07, 2007 at 02:22:57PM +0100, Jens Axboe wrote: > On Tue, Nov 06 2007, [EMAIL PROTECTED] wrote: > > gdth driver is modified NOT to use scp->eh_timeout. Now, it has > > eh_timed_out (gdth_timed_out) to handle command timeouts for locked > > I/O's. Have not tested as I don't have needed hardware! Patch is > > against 2.6.23-mm1. > > I updated the timeout patch to current kernel and fixed some fallout. I > included your gdth patch. > > Completely untested, patch is below. It's also in the #timeout branch of > the block git repo, to keep track of it. Hi Jens, Kristen, Sarah and I were talking over lunch today, and discovered that they each have a problem that I think can be solved by using the block layer timeout functionality. One problem is a software-controlled drive activity LED. The other is wanting to save power by suspending the link to an idle USB Storage device. What these both have in common is the desire to find out when the queue is empty (with a bit of hysteresis to avoid flickering the LED off for too long, or powering down the USB link when there's a command about to arrive). I think this could be done with the timer you're adding here, by adding a driver callback to be called once the timer fires and there's no commands in the queue. Alternatively, we could add another timer. While each driver could have its own timer to detect these cases, if we have two drivers that want this information, chances are there are others that could also make use of it. What do you think? Shall I try knocking up a patch to (ab)use the timer in this way? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blk request timeout minor fixes...
Jens Axboe [EMAIL PROTECTED] wrote: > ->timeout isn't used very much, so should not be too much work to > console the two. > > If you see any specific obstacles, do list them! I believe it is used in blk_add_timer() which may be called: 1) First time sending the command to LLDs No problem. It would be set to correct value at that time. 2) called from blk_rq_timed_out() in response to BLK_EH_RESET_TIMER Without the actual timeout value, there is no way to res-set timer correctly. 3) Retry Same #2. The other way is to remove the deadline field. That poses timing out issues. Thanks, Malahal. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blk request timeout minor fixes...
On Wed, Dec 05 2007, [EMAIL PROTECTED] wrote: > Jens Axboe [EMAIL PROTECTED] wrote: > > > This one fixes the 4-byte hole. Thank you very much. > > > > That's all fine, but now we have two fields doing essentially the same > > thing. Care to cleanup the ->timeout usage so we can get by with using > > just that field? > > I added another field because I couldn't find a way to use just one > field without breaking something! I will gladly clean it up if you > find a way to fix it. ->timeout isn't used very much, so should not be too much work to console the two. If you see any specific obstacles, do list them! -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blk request timeout minor fixes...
Jens Axboe [EMAIL PROTECTED] wrote: > > This one fixes the 4-byte hole. Thank you very much. > > That's all fine, but now we have two fields doing essentially the same > thing. Care to cleanup the ->timeout usage so we can get by with using > just that field? I added another field because I couldn't find a way to use just one field without breaking something! I will gladly clean it up if you find a way to fix it. Thanks, Malahal. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linux-usb-users] Read errors on Flash Drive Transcend TS1GJF2A
On Wed, 5 Dec 2007, Boaz Harrosh wrote: > Hi Alen > > Yes, I have not inspected sr.c very carefully, you are absolutely right. > Could you submit a unified patch for sd, sr and scsi.c I have hit this > bug 2 in my error injection tests. I was doing sg_chaining tests and now > with the possibly very large requests the problem gets worse. At the time, > I could not identify the exact problem, thanks I'd like to keep the sd and sr patches separate. The sd patch has already been tested and is known to fix an existing problem. The sr patch below is new and it has not been tested. All I know is that it compiles. Alan Stern Index: usb-2.6/drivers/scsi/scsi.c === --- usb-2.6.orig/drivers/scsi/scsi.c +++ usb-2.6/drivers/scsi/scsi.c @@ -700,6 +700,8 @@ void scsi_finish_command(struct scsi_cmn good_bytes = cmd->request_bufflen; if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) { + if (cmd->resid > 0 && cmd->resid < good_bytes) + good_bytes -= cmd->resid; drv = scsi_cmd_to_driver(cmd); if (drv->done) good_bytes = drv->done(cmd); Index: usb-2.6/drivers/scsi/sr.c === --- usb-2.6.orig/drivers/scsi/sr.c +++ usb-2.6/drivers/scsi/sr.c @@ -218,15 +218,29 @@ static int sr_media_change(struct cdrom_ static int sr_done(struct scsi_cmnd *SCpnt) { int result = SCpnt->result; - int this_count = SCpnt->request_bufflen; - int good_bytes = (result == 0 ? this_count : 0); - int block_sectors = 0; - long error_sector; + unsigned int xfer_size = SCpnt->request_bufflen; + unsigned int good_bytes = (result == 0 ? xfer_size : 0); + unsigned long start_lba = SCpnt->request->sector; + u64 bad_lba; + unsigned long num_blocks; + unsigned long bad_sector; struct scsi_cd *cd = scsi_cd(SCpnt->request->rq_disk); + struct scsi_sense_hdr sshdr; + int sense_valid = 0; + int sense_deferred = 0; + int info_valid; #ifdef DEBUG printk("sr.c done: %x\n", result); #endif + if (result) { + sense_valid = scsi_command_normalize_sense(SCpnt, &sshdr); + if (sense_valid) + sense_deferred = scsi_sense_is_deferred(&sshdr); + } + if (driver_byte(result) != DRIVER_SENSE && + (!sense_valid || sense_deferred)) + goto out; /* * Handle MEDIUM ERRORs or VOLUME OVERFLOWs that indicate partial @@ -234,60 +248,83 @@ static int sr_done(struct scsi_cmnd *SCp * care is taken to avoid unnecessary additional work such as * memcpy's that could be avoided. */ - if (driver_byte(result) != 0 && /* An error occurred */ - (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */ - switch (SCpnt->sense_buffer[2]) { - case MEDIUM_ERROR: - case VOLUME_OVERFLOW: - case ILLEGAL_REQUEST: - if (!(SCpnt->sense_buffer[0] & 0x90)) - break; - error_sector = (SCpnt->sense_buffer[3] << 24) | - (SCpnt->sense_buffer[4] << 16) | - (SCpnt->sense_buffer[5] << 8) | - SCpnt->sense_buffer[6]; - if (SCpnt->request->bio != NULL) - block_sectors = - bio_sectors(SCpnt->request->bio); - if (block_sectors < 4) - block_sectors = 4; - if (cd->device->sector_size == 2048) - error_sector <<= 2; - error_sector &= ~(block_sectors - 1); - good_bytes = (error_sector - SCpnt->request->sector) << 9; - if (good_bytes < 0 || good_bytes >= this_count) - good_bytes = 0; - /* -* The SCSI specification allows for the value -* returned by READ CAPACITY to be up to 75 2K -* sectors past the last readable block. -* Therefore, if we hit a medium error within the -* last 75 2K sectors, we decrease the saved size -* value. -*/ - if (error_sector < get_capacity(cd->disk) && - cd->capacity - error_sector < 4 * 75) - set_capacity(cd->disk, error_sector); + switch (sshdr.sense_key) { + case HARDWARE_ERROR: + case MEDIUM_ERROR: + case VOLUME_OVERFLOW: + case ILLEGAL_REQUEST: + if (!blk_fs_request(SCpnt->reques
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
Current progress: 11 revisions left to test. The current partial "git bisect log" is available per Ingo's suggestion on bugzilla. http://bugzilla.kernel.org/show_bug.cgi?id=9457 -- Bob Tracy | "They couldn't hit an elephant at this dist- " [EMAIL PROTECTED] | - Last words of Union General John Sedgwick, | Battle of Spotsylvania Court House, U.S. Civil War - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan. However, I've also noticed some disks attached via one of the "mpt" adapters ["scsi8" in console long in message linked above] going "off-line" during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1. Regards, Lee - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [OpenFCoE PATCH] If expecting pre-T11 frames, a T11 frame caused a data fault.
Christoph Hellwig wrote: > Just wondering: > >>From my quick reading of the code these pre-T11 frames are basically > a different frame-level protocol. Given that T11 has standardized > on a different one what's the rationale for supporting the old frames? There is still some use in lab environments of the old frame format, and it would greatly help if we could keep it working for now. Eventually, in a year or so, it could be removed completely. It seems to be a small amount of code and in a loadable module besides. Joe - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] ibmvscsi: retry on H_DROPPED during send_crq
Currently the vscsi client driver responds to the case where H_SEND_CRQ returns H_DROPPED by returning DID_ERROR. If the server CRQ is full, either from mismanaging the request_limit or problems on the server, we should return SCSI_MLQUEUE_HOST_BUSY instead. The places where we are calling send_srp_login are not checking the return code. We could get H_DROPPED or H_CLOSED and in that case we should reset and retry. Signed-off-by: Robert Jennings <[EMAIL PROTECTED]> --- drivers/scsi/ibmvscsi/ibmvscsi.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) Index: b/drivers/scsi/ibmvscsi/ibmvscsi.c === --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -629,14 +629,15 @@ static int ibmvscsi_send_srp_event(struc list_del(&evt_struct->list); del_timer(&evt_struct->timer); - /* If send_crq returns H_CLOSED, return SCSI_MLQUEUE_HOST_BUSY. -* Firmware will send a CRQ with a transport event (0xFF) to -* tell this client what has happened to the transport. This -* will be handled in ibmvscsi_handle_crq() + /* If send_crq returns H_CLOSED or H_DROPPED, return +* SCSI_MLQUEUE_HOST_BUSY. Firmware will send a CRQ with +* a transport event (0xFF) to tell this client what has +* happened to the transport. This will be handled in +* ibmvscsi_handle_crq(). */ - if (rc == H_CLOSED) { + if (rc == H_CLOSED || rc == H_DROPPED) { dev_warn(hostdata->dev, "send warning. " -"Receive queue closed, will retry.\n"); +"Receive queue unavailable, will retry.\n"); goto send_busy; } dev_err(hostdata->dev, "send error %d\n", rc); @@ -1270,7 +1271,8 @@ void ibmvscsi_handle_crq(struct viosrp_c if ((rc = ibmvscsi_ops->send_crq(hostdata, 0xC002LL, 0)) == 0) { /* Now login */ - send_srp_login(hostdata); + if (send_srp_login(hostdata)) + ibmvscsi_reset_host(hostdata); } else { dev_err(hostdata->dev, "Unable to send init rsp. rc=%ld\n", rc); } @@ -1280,7 +1282,8 @@ void ibmvscsi_handle_crq(struct viosrp_c dev_info(hostdata->dev, "partner initialization complete\n"); /* Now login */ - send_srp_login(hostdata); + if (send_srp_login(hostdata)) + ibmvscsi_reset_host(hostdata); break; default: dev_err(hostdata->dev, "unknown crq message type: %d\n", crq->format); - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] esp_scsi: Fix reset cleanup spinlock recursion
The esp_reset_cleanup() function is called with the host lock held and invokes starget_for_each_device() which wants to take it too. Here is a fix along the lines of shost_for_each_device()/__shost_for_each_device() adding a __starget_for_each_device() counterpart which assumes the lock has already been taken. Eventually, I think the driver should get modified so that more work is done as a softirq rather than in the interrupt context, but for now it fixes a bug that causes the spinlock debugger to fire. While at it, it fixes a small number of cosmetic problems with starget_for_each_device() too. Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> --- Checked with checkpatch.pl and at the run time. Please apply. Maciej patch-mips-2.6.23-rc5-20070904-esp_scsi-reset-3 diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/esp_scsi.c linux-mips-2.6.23-rc5-20070904/drivers/scsi/esp_scsi.c --- linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/esp_scsi.c 2007-10-23 14:07:38.0 + +++ linux-mips-2.6.23-rc5-20070904/drivers/scsi/esp_scsi.c 2007-11-26 01:01:10.0 + @@ -2028,8 +2028,8 @@ static void esp_reset_cleanup(struct esp tp->flags |= ESP_TGT_CHECK_NEGO; if (tp->starget) - starget_for_each_device(tp->starget, NULL, - esp_clear_hold); + __starget_for_each_device(tp->starget, NULL, + esp_clear_hold); } esp->flags &= ~ESP_FLAG_RESETTING; } diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/scsi.c linux-mips-2.6.23-rc5-20070904/drivers/scsi/scsi.c --- linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/scsi.c2007-09-04 04:55:44.0 + +++ linux-mips-2.6.23-rc5-20070904/drivers/scsi/scsi.c 2007-12-05 16:00:40.0 + @@ -886,11 +886,11 @@ EXPORT_SYMBOL(__scsi_iterate_devices); * starget_for_each_device - helper to walk all devices of a target * @starget: target whose devices we want to iterate over. * - * This traverses over each devices of @shost. The devices have + * This traverses over each device of @starget. The devices have * a reference that must be released by scsi_host_put when breaking * out of the loop. */ -void starget_for_each_device(struct scsi_target *starget, void * data, +void starget_for_each_device(struct scsi_target *starget, void *data, void (*fn)(struct scsi_device *, void *)) { struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); @@ -905,6 +905,33 @@ void starget_for_each_device(struct scsi EXPORT_SYMBOL(starget_for_each_device); /** + * __starget_for_each_device - helper to walk all devices of a target + * (UNLOCKED) + * @starget: target whose devices we want to iterate over. + * + * This traverses over each device of @starget. It does _not_ + * take a reference on the scsi_device, so the whole loop must be + * protected by shost->host_lock. + * + * Note: The only reason why drivers would want to use this is because + * they need to access the device list in irq context. Otherwise you + * really want to use starget_for_each_device instead. + **/ +void __starget_for_each_device(struct scsi_target *starget, void *data, + void (*fn)(struct scsi_device *, void *)) +{ + struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); + struct scsi_device *sdev; + + __shost_for_each_device(sdev, shost) { + if ((sdev->channel == starget->channel) && + (sdev->id == starget->id)) + fn(sdev, data); + } +} +EXPORT_SYMBOL(__starget_for_each_device); + +/** * __scsi_device_lookup_by_target - find a device given the target (UNLOCKED) * @starget: SCSI target pointer * @lun: SCSI Logical Unit Number diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/include/scsi/scsi_device.h linux-mips-2.6.23-rc5-20070904/include/scsi/scsi_device.h --- linux-mips-2.6.23-rc5-20070904.macro/include/scsi/scsi_device.h 2007-09-04 04:56:21.0 + +++ linux-mips-2.6.23-rc5-20070904/include/scsi/scsi_device.h 2007-11-26 01:00:57.0 + @@ -222,6 +222,9 @@ extern struct scsi_device *__scsi_device uint); extern void starget_for_each_device(struct scsi_target *, void *, void (*fn)(struct scsi_device *, void *)); +extern void __starget_for_each_device(struct scsi_target *, void *, + void (*fn)(struct scsi_device *, +void *)); /* only exposed to implement shost_for_each_device */ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *, - To unsubscribe from this list: send the lin
Re: [patch] SCSI: early detection of medium not present, updated
On Sun, 2007-12-02 at 15:02 -0500, Alan Stern wrote: > On Sun, 2 Dec 2007, James Bottomley wrote: > > > Actually, the night train is a good place to do coding. I know this > > compiles, but could someone check it out? It's the form I think we > > should do, since the ASC/ASCQ codes for esoteric conditions which we > > might eventually check for are different for CDs and removable discs. > > This part is puzzling: > > > + /* try to eat the UNIT_ATTENTION if there are enough retries */ > > + do { > > + result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL, 0, sshdr, > > + timeout, retries); > > + } while ((driver_byte(result) & DRIVER_SENSE) && > > +sshdr && sshdr->sense_key == UNIT_ATTENTION && > > +--retries); > > For one thing, you've using retries to control both an outer loop and > an inner loop -- meaning the total number of attempts could be on the > order of retries**2. Sort of; however, there should only really be one cc/ua before the TUR goes through. Also, TUR produces almost no conditions that would be retryable in scsi_decide_disposition() and the scsi_execute sends it down with REQ_FAILFAST anyway, so it's really not used much. > For another, why do you want to swallow a UNIT_ATTENTION? Shouldn't > that be up to the code calling scsi_test_unit_ready()? No, that's a bug in the old code. UA doesn't necessarily mean there's a media change. It could be asserted for a whole host of reasons. Until we get proper UA signalling, we need to resend the TUR so we get the correct media status and discard the UA (Even if the device is asserting UA for media change reasons, it will still give the correct sense code response to the following TUR). James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [OpenFCoE PATCH] If expecting pre-T11 frames, a T11 frame caused a data fault.
Just wondering: >From my quick reading of the code these pre-T11 frames are basically a different frame-level protocol. Given that T11 has standardized on a different one what's the rationale for supporting the old frames? - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
need help to suspend a device
Hi, I need some help with device suspending or something similar to this. Problem description: We have many many Infortrend sata-to-scsi-raid systems here, connected to several server systems with LSI2230 HBAs. Now the problem is, that these Infortrend boxes sometimes suffer from some kind of hick-up, sometimes due to the reboot of second server connected to the very same Infortrend box and sometimes just for no obvious reason. As far I can presently see it, the best would be to suspend any i/o, commands, etc to the troublesome box for about 30s to 60s and just wait and then to repeat the last command. However, presently the scsi-layer/driver will immediately begin a domain validation and then do a scsi-reset. Unfortunately this will cause even more trouble. Sometimes this works, in more cases it will end in an i/o error and in most cases these domain validations and scsi-resets will go into an endless loop and eventually the Infortrend raids will be so confused that their scsi-controller will crash. I'm already looking for some hours through the scsi and mpt functions, but so far didn't find a good way to put commands to a scsi-device to sleep for some time. Any hints and suggestions are highly appreciated. Thanks in advance, Bernd -- Bernd Schubert Q-Leap Networks GmbH - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)
On Tue, 4 Dec 2007 Andrew Morton wrote: > On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > On Tue, 4 Dec 2007 17:11:55 -0800 > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > On Wed, 05 Dec 2007 10:04:03 +0900 > > > FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > > > > > On Tue, 4 Dec 2007 16:57:38 -0800 > > > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > > > > > On Thu, 29 Nov 2007 13:31:50 +0100 > > > > > Anders Henke <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > On November 28 2007, Anders Henke wrote: > > > > > > > As "everything is reported as being zero" is quite odd an Jan > > > > > > > took a > > > > > > > guess that it might be block-layer or driver-related, I've assumed > > > > > > > that the driver is responsible for this; just out of the > > > > > > > curiousity, > > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by > > > > > > > copying > > > > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ > > > > > > > into a > > > > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for > > > > > > > me. > > > > > > > > > > > > > > I haven't yet fine-tested from which kernel release on the > > > > > > > dpt_i2o driver > > > > > > > behaves like this and spews out zeroed blocks when trying to mount > > > > > > > the rootfs. Maybe this is just some timing issue. > > > > > > > > > > > > I've started the fine-tests and can say so far that dpt_i2o from > > > > > > 2.6.22 is still fine. Test is simple: > > > > > > > > > > > > [EMAIL PROTECTED]:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r > > > > > > dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/ > > > > > > > > > > > > ... recompile the kernel, reboot: works. > > > > > > > > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two > > > > > > different > > > > > > patch sets: > > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1 > > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3 > > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10. > > > > > > > > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel, > > > > > > the "zero blocks"-symptom show up, so it's the "lucky" situation > > > > > > that the smallest patch actually seams to be the broken one. > > > > > > > > > > > > According to the 2.6.23-rc1 short-form changelog, there is > > > > > > one major edit on the dpt_i2o driver: > > > > > > > > > > > > FUJITA Tomonori > > > > > > > > > > > > [SCSI] dpt_i2o: convert to use the data buffer accessors > > > > > > > > > > > > Stephen Rothwell > > > > > > dpt_i2o depends on virt_to_bus > > > > > > > > > > > > Fujita, would you please take a look at this? > > > > > > > > > > He won't have seen this. cc's added. > > > > > > > > > > > I think that something's broken in there, leading to the dpt_i2o > > > > > > sending out blocks of zeroes right after initialization, at least on > > > > > > some specific controllers (in this case, Adaptec 2010S on Intel > > > > > > SE7501WV2S-based boxes). > > > > > > > > > > > > I don't have insight kernel driver development knowledge, so I'm > > > > > > quite out of help right now. Nevertheless, I'll add the diff > > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o: > > > > > > > > > > > > > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes > > > > > the data > > > > > corruption problems? > > > > > > > > Anders said that my patch is fine and seems that Matthew's hotplug > > > > conversion patch leads to the problem: > > > > > > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2 > > > > > > Oh. Jan broke message threading :( > > > > > > So it's been nearly a week and nothing has happened? Do we revert that > > > change? > > > > SCSI people really want this conversion... > > > > Matthew, did you have a chance to look at it? > > It seems pretty improbably that a change of that nature could cause data > corruption. Anders, are you able to determine whether the revert (against > current Linus mainline or 2.6.24-rc4) fixes things? Because it would be > very strange... > > This is a grave bug. It's really quite urgent... > > Thanks. > > drivers/scsi/dpt_i2o.c | 132 ++- > drivers/scsi/dpti.h|9 ++ > 2 files changed, 68 insertions(+), 73 deletions(-) I've done the following: -untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in order to verify that the driver is still broken: checked, the box still won't boot. -patched the just compiled kernel source with your patch, "make dist-clean" (by means of "make-kpkg clean") and recompile: box boots fine. I've put the captured console logs to http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched ... and the kernelconfig (which shouldn't matter) to http://w.sysiphus.de/dpt_i2o
Re: [PATCH 01/28] blk_end_request: add new request completion interface (take 3)
On Tue, Dec 04 2007, Kiyoshi Ueda wrote: > Hi Boaz and Jens, > > On Tue, 04 Dec 2007 15:56:32 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote: > > > +/** > > > + * blk_end_request - Helper function for drivers to complete the request. > > > + * @rq: the request being processed > > > + * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error > > > + * @nr_bytes: number of bytes to complete > > > + * > > > + * Description: > > > + * Ends I/O on a number of bytes attached to @rq. > > > + * If @rq has leftover, sets it up for the next range of segments. > > > + * > > > + * Return: > > > + * 0 - we are done with this request > > > + * 1 - still buffers pending for this request > > > + **/ > > > +int blk_end_request(struct request *rq, int uptodate, int nr_bytes) > > > > I always hated that uptodate boolean with possible negative error value. > > I guess it was done for backward compatibility of then users of > > end_that_request_first(). But since you are introducing a new API then > > this is not the case. Just have regular status int where 0 means ALL_OK > > and negative value means error code. > > Just my $0.02. > > Thank you for the comment. > I think it's quite reasonable. > By doing that, we don't need end_io_error() anymore. > > > Jens, > What do you think? I agree, the uptodate usage right now is horrible. > If you agree with the interface change above, I would prefer to > separate the patch-set from blk_end_request patch-set like below: > o blk_end_request: remove end_that_request_* > o change interface of 'uptodate' in blk_end_request to 'error' > It makes the purpose of blk_end_request patch-set clear > (and also, each patch of device drivers could be smaller). > But, it doubles your merging work. So if you would like to get > the changes at once, I'll merge them into blk_end_request patch-set. Twice the merging is not an issue for me. > As for the patch inclusion, do you push the driver changes to Linus > all at once? Or should I ask each maintainer to take the patch? Lets just try to get as many maintainer acks as possible, since the patches need to go in together. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linux-usb-users] Read errors on Flash Drive Transcend TS1GJF2A
On Tue, Dec 04 2007 at 20:03 +0200, Alan Stern <[EMAIL PROTECTED]> wrote: > On Tue, 4 Dec 2007, Boaz Harrosh wrote: > >> perhaps below hunk should be added to your patch. > > It looks like a good idea. > >> Was it decided when this data corruption bugfix is >> merged. > > I don't know -- I haven't heard anything back from James. > >> Also in sr.c. It does the range check but it might >> enjoy the resid handling as well. > > I think the range checking in sr.c is completely wrong. The code > doesn't check carefully to see that the error sector lies within the > range of sectors being accessed; there's a possibility of overflow if > the capacity is larger than 2**31 bytes. Also this line in particular > makes no sense: > > error_sector &= ~(block_sectors - 1); > > Like you said, using the residue value too wouldn't hurt. > > Furthermore the check for the Valid bit is done wrongly: > > if (!(SCpnt->sense_buffer[0] & 0x90)) > > This will never be true because of the earlier test: > > if (driver_byte(result) != 0 && /* An error occurred */ > (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */ > > It probably should test against 0x80 instead of 0x90. > > Alan Stern > Hi Alen Yes, I have not inspected sr.c very carefully, you are absolutely right. Could you submit a unified patch for sd, sr and scsi.c I have hit this bug 2 in my error injection tests. I was doing sg_chaining tests and now with the possibly very large requests the problem gets worse. At the time, I could not identify the exact problem, thanks Boaz - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html