Re: 2.6.24-rc4-mm1: hostbyte=0x01 driverbyte=0x00 (now bisected)

2007-12-05 Thread Hannes Reinecke
Alexey Dobriyan wrote:
>>  git-scsi-misc.patch
> 
> Apologies for not looking into the problem earlier. See
> http://marc.info/?t=11962802235&r=1&w=2
> "2.6.24-rc3-mm2: Result: hostbyte=0x01 driverbyte=0x00\nend_request: I/O 
> error"
> for previous installment.
> 
> I've bisected it to the following patch in git-scsi-misc branch.
> Revert on top of 2.6.24-rc4-mm1 also helps.
> 
> commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
> Author: Hannes Reinecke <[EMAIL PROTECTED]>
> Date:   Tue Nov 6 09:23:40 2007 +0100
> 
> [SCSI] Do not requeue requests if REQ_FAILFAST is set
> 
> Any requests with the REQ_FAILFAST flag set should not be requeued
> to the requeust queue, but rather terminated directly.
> Otherwise the multipath failover will stall until the command
> timeout triggers.
> 
> Signed-off-by: Hannes Reinecke <[EMAIL PROTECTED]>
> Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0f44bdb..0da0dd0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1286,6 +1286,11 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
> struct request *req)
>*/
>   if (!(req->cmd_flags & REQ_PREEMPT))
>   ret = BLKPREP_DEFER;
> + /*
> +  * Return failfast requests immediately
> +  */
> + if (req->cmd_flags & REQ_FAILFAST)
> + ret = BLKPREP_KILL;
>   break;
>   default:
>   /*
> @@ -1414,6 +1419,17 @@ static inline int scsi_host_queue_ready(struct 
> request_queue *q,
>   return 1;
>  }
>  
> +static void __scsi_kill_request(struct request *req)
> +{
> + struct scsi_cmnd *cmd = req->special;
> + struct scsi_device *sdev = cmd->device;
> +
> + cmd->result = DID_NO_CONNECT << 16;
> + atomic_inc(&cmd->device->iorequest_cnt);
> + sdev->device_busy--;
> + __scsi_done(cmd);
> +}
> +
>  /*
>   * Kill a request for a dead device
>   */
> @@ -1527,8 +1543,16 @@ static void scsi_request_fn(struct request_queue *q)
>* accept it.
>*/
>   req = elv_next_request(q);
> - if (!req || !scsi_dev_queue_ready(q, sdev))
> + if (!req)
> + break;
> +
> + if (!scsi_dev_queue_ready(q, sdev)) {
> + if (req->cmd_flags & REQ_FAILFAST) {
> + scsi_kill_request(req, q);
> + continue;
> + }
>   break;
> + }
>  
>   if (unlikely(!scsi_device_online(sdev))) {
>   sdev_printk(KERN_ERR, sdev,
> @@ -1609,8 +1633,12 @@ static void scsi_request_fn(struct request_queue *q)
>* later time.
>*/
>   spin_lock_irq(q->queue_lock);
> - blk_requeue_request(q, req);
> - sdev->device_busy--;
> + if (unlikely(req->cmd_flags & REQ_FAILFAST))
> + __scsi_kill_request(req);
> + else {
> + blk_requeue_request(q, req);
> + sdev->device_busy--;
> + }
>   if(sdev->device_busy == 0)
>   blk_plug_device(q);
>   out:
Yeah, sorry. That patch was bad. Please use the attached one instead.
Andrew, can you replace them?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..9ec1566 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1284,13 +1284,15 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
struct request *req)
/*
 * If the devices is blocked we defer normal commands.
 */
-   if (!(req->cmd_flags & REQ_PREEMPT))
-   ret = BLKPREP_DEFER;
-   /*
-* Return failfast requests immediately
-*/
-   if (req->cmd_flags & REQ_FAILFAST)
-   ret = BLKPREP_KILL;
+   if (!(req->cmd_flags & REQ_PREEMPT)) {
+   /*
+* Return failfast requests immediately
+*/
+   if (req->cmd_flags & REQ_FAILFAST)
+   ret = BLKPREP_KILL;
+   else
+   ret = BLKPREP_DEFER;
+   }
break;
default:
/*


Re: aic7-triple x build fixed in kbuild.git

2007-12-05 Thread Hannes Reinecke
Sam Ravnborg wrote:
> akpm reported that:
> make mrproper
> make allmodconfig
> make drivers/scsi/aic7xxx/
> 
> were broken in recent kbuild.git - and as a logical consequence
> removed kbuild.git from his -mm lineup.
> 
> I have now fixed it (stupid bug I made) and pushed out
> a fresh kbuild.git tree.
> 
> As previously announced I am not very active in kbuild land the next
> couple of weeks so I may be slow if there is any comments.
> 
> The fixed commit is included below.
> 
>   Sam
> 
> commit 63b7c4e77e9204407ee8a5d5243ba9c502550252
> Author: Vegard Nossum <[EMAIL PROTECTED]>
> Date:   Fri Oct 26 13:31:13 2007 +0200
> 
> aic7(3*x): fix firmware build
> 
> This patch adds the proper $(obj) and $(src) prefixes to dependency
> rules in aic7xxx makefile. Without this patch, there is a remote
> possibility that parallel make with a different output directory can
> fail.
> 
> Also changed the deprecated EXTRA_CFLAGS construct to ccflags-y syntax.
> 
> Fixed up patch to survive "make drivers/scsi/ -j"
> with BUILD_FIRMWARE enable. /Sam
> 
> Signed-off-by: Vegard Nossum <[EMAIL PROTECTED]>
> Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
> Cc: linux-scsi@vger.kernel.org
> 
> diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile
> index 9a6ce19..e4f70c5 100644
> --- a/drivers/scsi/aic7xxx/Makefile
> +++ b/drivers/scsi/aic7xxx/Makefile
> @@ -33,11 +33,10 @@ aic79xx-y += 
> aic79xx_osm.o\
>  aic79xx_proc.o   \
>  aic79xx_osm_pci.o
>  
> -EXTRA_CFLAGS += -Idrivers/scsi
> +ccflags-y += -Idrivers/scsi
>  ifdef WARNINGS_BECOME_ERRORS
> -EXTRA_CFLAGS += -Werror
> +ccflags-y += -Werror
>  endif
> -#EXTRA_CFLAGS += -g
>  
>  # Files generated that shall be removed upon make clean
>  clean-files := aic7xxx_seq.h aic7xxx_reg.h aic7xxx_reg_print.c
> @@ -46,53 +45,45 @@ clean-files += aic79xx_seq.h aic79xx_reg.h 
> aic79xx_reg_print.c
>  # Dependencies for generated files need to be listed explicitly
>  
>  $(obj)/aic7xxx_core.o: $(obj)/aic7xxx_seq.h
> +$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_reg.h
>  $(obj)/aic79xx_core.o: $(obj)/aic79xx_seq.h
> -$(obj)/aic79xx_reg_print.c: $(src)/aic79xx_reg_print.c_shipped
> -$(obj)/aic7xxx_reg_print.c: $(src)/aic7xxx_reg_print.c_shipped
> +$(obj)/aic79xx_core.o: $(obj)/aic79xx_reg.h
>  
> -$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_reg.h
> -$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_reg.h
> +$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h
> +$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h
>  
> -aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_seq.h \
> -$(obj)/aic7xxx_reg.h
> +aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_reg.h
>  aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT)   += 
> $(obj)/aic7xxx_reg_print.c
>  
>  aicasm-7xxx-opts-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) := \
>   -p $(obj)/aic7xxx_reg_print.c -i aic7xxx_osm.h
>  
>  ifeq ($(CONFIG_AIC7XXX_BUILD_FIRMWARE),y)
> -# Create a dependency chain in generated files
> -# to avoid concurrent invocations of the single
> -# rule that builds them all.
> -aic7xxx_seq.h: aic7xxx_reg.h
> -ifeq ($(CONFIG_AIC7XXX_REG_PRETTY_PRINT),y)
> -aic7xxx_reg.h: aic7xxx_reg_print.c
> -endif
> -$(aic7xxx-gen-y): $(src)/aic7xxx.seq $(src)/aic7xxx.reg $(obj)/aicasm/aicasm
> +$(obj)/aic7xxx_seq.h: $(src)/aic7xxx.seq $(src)/aic7xxx.reg 
> $(obj)/aicasm/aicasm
>   $(obj)/aicasm/aicasm -I$(src) -r $(obj)/aic7xxx_reg.h \
> $(aicasm-7xxx-opts-y) -o $(obj)/aic7xxx_seq.h \
> $(src)/aic7xxx.seq
> +
> +$(aic7xxx-gen-y): $(obj)/aic7xxx_seq.h
> +else
> +$(obj)/aic7xxx_reg_print.c: $(src)/aic7xxx_reg_print.c_shipped
>  endif
>  
> -aic79xx-gen-$(CONFIG_AIC79XX_BUILD_FIRMWARE) := $(obj)/aic79xx_seq.h \
> -$(obj)/aic79xx_reg.h
> +aic79xx-gen-$(CONFIG_AIC79XX_BUILD_FIRMWARE) := $(obj)/aic79xx_reg.h
>  aic79xx-gen-$(CONFIG_AIC79XX_REG_PRETTY_PRINT)   += 
> $(obj)/aic79xx_reg_print.c
>  
>  aicasm-79xx-opts-$(CONFIG_AIC79XX_REG_PRETTY_PRINT) := \
>   -p $(obj)/aic79xx_reg_print.c -i aic79xx_osm.h
>  
>  ifeq ($(CONFIG_AIC79XX_BUILD_FIRMWARE),y)
> -# Create a dependency chain in generated files
> -# to avoid concurrent invocations of the single
> -# rule that builds them all.
> -aic79xx_seq.h: aic79xx_reg.h
> -ifeq ($(CONFIG_AIC79XX_REG_PRETTY_PRINT),y)
> -aic79xx_reg.h: aic79xx_reg_print.c
> -endif
> -$(aic79xx-gen-y): $(src)/aic79xx.seq $(src)/aic79xx.reg $(obj)/aicasm/aicasm
> +$(obj)/aic79xx_seq.h: $(src)/aic79xx.seq $(src)/aic79xx.reg 
> $(obj)/aicasm/aicasm
>   $(obj)/aicasm/aicasm -I$(src) -r $(obj)/aic79xx_reg.h \
> $(aicasm-79xx-opts-y) -o $(obj)/aic79xx_seq.h \
>  

Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

2007-12-05 Thread Andrew Morton
On Thu, 06 Dec 2007 14:49:37 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote:

> > >  drivers/scsi/dpt_i2o.c |  132 ++-
> > >  drivers/scsi/dpti.h|9 ++
> > >  2 files changed, 68 insertions(+), 73 deletions(-)
> > 
> > I've done the following:
> > 
> > -untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in 
> > order
> >  to verify that the driver is still broken: checked, the box still won't 
> >  boot.
> > 
> > -patched the just compiled kernel source with your patch, "make dist-clean"
> >  (by means of "make-kpkg clean") and recompile: box boots fine.
> >
> > I've put the captured console logs to
> > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine
> > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched
> > ... and the kernelconfig (which shouldn't matter) to
> > http://w.sysiphus.de/dpt_i2o/kernelconfig.2624-rc4
> 
> Thanks for testing. So reverting Matthew's hotplug patch fixes the
> problem though I have no idea how the patch leads to this. Seems that
> nobody has any clue on that. We need to revert that patch for the
> moment.

OK, thanks.  Let's leave it a couple of days for people to register objections,
have bright ideas, etc.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

2007-12-05 Thread FUJITA Tomonori
On Wed, 5 Dec 2007 11:14:41 +0100
Anders Henke <[EMAIL PROTECTED]> wrote:

> On Tue, 4 Dec 2007 Andrew Morton wrote:
> > On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > > On Tue, 4 Dec 2007 17:11:55 -0800
> > > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Wed, 05 Dec 2007 10:04:03 +0900
> > > > FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Tue, 4 Dec 2007 16:57:38 -0800
> > > > > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > > > > Anders Henke <[EMAIL PROTECTED]> wrote:
> > > > > > 
> > > > > > > On November 28 2007, Anders Henke wrote:
> > > > > > > > As "everything is reported as being zero" is quite odd an Jan 
> > > > > > > > took a
> > > > > > > > guess that it might be block-layer or driver-related, I've 
> > > > > > > > assumed
> > > > > > > > that the driver is responsible for this; just out of the 
> > > > > > > > curiousity, 
> > > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by 
> > > > > > > > copying 
> > > > > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ 
> > > > > > > > into a 
> > > > > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for 
> > > > > > > > me.
> > > > > > > > 
> > > > > > > > I haven't yet fine-tested from which kernel release on the 
> > > > > > > > dpt_i2o driver 
> > > > > > > > behaves like this and spews out zeroed blocks when trying to 
> > > > > > > > mount
> > > > > > > > the rootfs. Maybe this is just some timing issue.
> > > > > > > 
> > > > > > > I've started the fine-tests and can say so far that dpt_i2o from 
> > > > > > > 2.6.22 is still fine. Test is simple:
> > > > > > > 
> > > > > > > [EMAIL PROTECTED]:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r 
> > > > > > > dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > > > > > 
> > > > > > > ... recompile the kernel, reboot: works.
> > > > > > > 
> > > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two 
> > > > > > > different
> > > > > > > patch sets:
> > > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> > > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > > > > > 
> > > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> > > > > > > the "zero blocks"-symptom show up, so it's the "lucky" situation
> > > > > > > that the smallest patch actually seams to be the broken one.
> > > > > > > 
> > > > > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > > > > one major edit on the dpt_i2o driver:
> > > > > > > 
> > > > > > > FUJITA Tomonori 
> > > > > > > 
> > > > > > >   [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > > > > > 
> > > > > > > Stephen Rothwell 
> > > > > > >   dpt_i2o depends on virt_to_bus
> > > > > > > 
> > > > > > > Fujita, would you please take a look at this?
> > > > > > 
> > > > > > He won't have seen this.  cc's added.
> > > > > > 
> > > > > > > I think that something's broken in there, leading to the dpt_i2o 
> > > > > > > sending out blocks of zeroes right after initialization, at least 
> > > > > > > on
> > > > > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > > > > SE7501WV2S-based boxes).
> > > > > > > 
> > > > > > > I don't have insight kernel driver development knowledge, so I'm
> > > > > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > > > > > 
> > > > > > 
> > > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes 
> > > > > > the data
> > > > > > corruption problems?
> > > > > 
> > > > > Anders said that my patch is fine and seems that Matthew's hotplug
> > > > > conversion patch leads to the problem:
> > > > > 
> > > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
> > > > 
> > > > Oh.  Jan broke message threading :(
> > > > 
> > > > So it's been nearly a week and nothing has happened?  Do we revert that
> > > > change?
> > > 
> > > SCSI people really want this conversion...
> > > 
> > > Matthew, did you have a chance to look at it?
> > 
> > It seems pretty improbably that a change of that nature could cause data
> > corruption.  Anders, are you able to determine whether the revert (against
> > current Linus mainline or 2.6.24-rc4) fixes things?  Because it would be
> > very strange...
> > 
> > This is a grave bug.  It's really quite urgent...
> > 
> > Thanks.
> > 
> >  drivers/scsi/dpt_i2o.c |  132 ++-
> >  drivers/scsi/dpti.h|9 ++
> >  2 files changed, 68 insertions(+), 73 deletions(-)
> 
> I've done the following:
> 
> -untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in order
>  to verify that the driver is still broken: checked, the box still won't 
>  boot.
> 
> -patched the

Re: [PATCH] esp_scsi: Fix reset cleanup spinlock recursion

2007-12-05 Thread David Miller
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 16:10:54 + (GMT)

>  The esp_reset_cleanup() function is called with the host lock held and 
> invokes starget_for_each_device() which wants to take it too.  Here is a 
> fix along the lines of shost_for_each_device()/__shost_for_each_device() 
> adding a __starget_for_each_device() counterpart which assumes the lock 
> has already been taken.
> 
>  Eventually, I think the driver should get modified so that more work is 
> done as a softirq rather than in the interrupt context, but for now it 
> fixes a bug that causes the spinlock debugger to fire.
> 
>  While at it, it fixes a small number of cosmetic problems with 
> starget_for_each_device() too.
> 
> Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>

Acked-by: David S. Miller <[EMAIL PROTECTED]>

Thanks for finding and fixing this bug.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)

2007-12-05 Thread Kiyoshi Ueda
Hi Boaz,

On Tue, 04 Dec 2007 15:39:12 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 01 2007 at 1:35 +0200, Kiyoshi Ueda <[EMAIL PROTECTED]> wrote:
> > This patch converts bidi of scsi mid-layer to use blk_end_request().
> > 
> > rq->next_rq represents a pair of bidi requests.
> > (There are no other use of 'next_rq' of struct request.)
> > For both requests in the pair, end_that_request_chunk() should be
> > called before end_that_request_last() is called for one of them.
> > Since the calls to end_that_request_first()/chunk() and
> > end_that_request_last() are packaged into blk_end_request(),
> > the handling of next_rq completion has to be moved into
> > blk_end_request(), too.
> > 
> > Bidi sets its specific value to rq->data_len before the request is
> > completed so that upper-layer can read it.
> > This setting must be between end_that_request_chunk() and
> > end_that_request_last(), because rq->data_len may be used
> > in end_that_request_chunk() by blk_trace and so on.
> > To satisfy the requirement, use blk_end_request_callback() which
> > is added in PATCH 25 only for the tricky drivers.
> > 
> > If bidi didn't reuse rq->data_len and added new members to request
> > for the specific value, it could set before end_that_request_chunk()
> > and use the standard blk_end_request() like below.
> > 
> > void scsi_end_bidi_request(struct scsi_cmnd *cmd)
> > {
> > struct request *req = cmd->request;
> > 
> > rq->resid = scsi_out(cmd)->resid;
> > rq->next_rq->resid = scsi_in(cmd)->resid;
> > 
> > if (blk_end_request(req, 1, req->data_len))
> > BUG();
> > 
> > scsi_release_buffers(cmd);
> > scsi_next_command(cmd);
> > }
...
snip
...
>
> rq->data_len = scsi_out(cmd)->resid is Not Just a problem of bidi
> it is a General problem of scsi residual handling, and user code.
> 
> Even today before any bidi. at scsi_lib.c at scsi_io_completion()
> we do req->data_len = scsi_get_resid(cmd);
> ( or: req->data_len = cmd->resid; depends which version you look)
> And then call scsi_end_request() which calls __end_that_request_first/last
> So it is assumed even today that req->data_len is not touched by
> __end_that_request_first/last unless __end_that_request_first returned
> that there is more work to do and the command is resubmitted in which
> case the resid information is discarded.
> 
> So if the regular resid handling is acceptable - Set req->data_len
> before the call to __end_that_request_first/last, or blk_end_request()
> in your case, then here goes your second client of the _callback and
> it can be removed.
> But if it is found that req->data_len is touched and the resid information
> gets lost, than it should be fixed for the common uni-io case, by - for 
> example
> - pass resid to the blk_end_request() function.
> (So in any way the _callback can go)

Thank you for the explanation of scsi's rq->data_len usage.
I see that scsi usually uses rq->data_len for cmd->resid.

I have investigated the possibility of setting data_len before
the call to blk_end_request.
But no matter whether data_len is touched or not, we need a callback
for bidi.  So I would like to go with the current patch.

I explained the reason and some details below.


As far as I can see, rq->data_len is just referenced
by blk_add_trace_rq() in __end_that_request_first(), not modified.
And I don't change any logic around there in the block-layer.
So there shouldn't be any critical problem for scsi residual handing.
(although I'm not sure that scsi expectes cmd->resid to be traced
 by blk_trace.)

Anyway, I see that it is no critical problem for bidi to set cmd->resid
to rq->data_len before blk_end_request() call.
But if I do that, blk_end_request() can't get the next_rq's size
to complete in its code below.

> + /* Bidi request must be completed as a whole */
> + if (blk_bidi_rq(rq) &&
> + __end_that_request_first(rq->next_rq, uptodate,
> +  blk_rq_bytes(rq->next_rq)))
> + return 1;

So I will have to move next_rq completion to bidi and use _callback()
anyway like the following.
-
static int dummy_cb(struct request *rq)
{
return 1;
}

void scsi_end_bidi_request(struct scsi_cmnd *cmd)
{
struct request *req = cmd->request;
unsigned int dlen = req->data_len;
unsigned int next_dlen = req->next_rq->data_len;
 
req->data_len = scsi_out(cmd)->resid;
req->next_rq->data_len = scsi_in(cmd)->resid;
 
/* Complete only DATA of next_rq using _callback and dummy function */
if (!blk_end_request_callback(req->next_rq, 1, next_dlen, dummy_cb))
BUG();
 
if (blk_end_request(req, 1, dlen))
BUG();

scsi_release_buffers(cmd);
scsi_next_command(cmd);
}
-

I prefer

2.6.24-rc4-mm1: hostbyte=0x01 driverbyte=0x00 (now bisected)

2007-12-05 Thread Alexey Dobriyan
>  git-scsi-misc.patch

Apologies for not looking into the problem earlier. See
http://marc.info/?t=11962802235&r=1&w=2
"2.6.24-rc3-mm2: Result: hostbyte=0x01 driverbyte=0x00\nend_request: I/O error"
for previous installment.

I've bisected it to the following patch in git-scsi-misc branch.
Revert on top of 2.6.24-rc4-mm1 also helps.

commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
Author: Hannes Reinecke <[EMAIL PROTECTED]>
Date:   Tue Nov 6 09:23:40 2007 +0100

[SCSI] Do not requeue requests if REQ_FAILFAST is set

Any requests with the REQ_FAILFAST flag set should not be requeued
to the requeust queue, but rather terminated directly.
Otherwise the multipath failover will stall until the command
timeout triggers.

Signed-off-by: Hannes Reinecke <[EMAIL PROTECTED]>
Signed-off-by: James Bottomley <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0f44bdb..0da0dd0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1286,6 +1286,11 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
struct request *req)
 */
if (!(req->cmd_flags & REQ_PREEMPT))
ret = BLKPREP_DEFER;
+   /*
+* Return failfast requests immediately
+*/
+   if (req->cmd_flags & REQ_FAILFAST)
+   ret = BLKPREP_KILL;
break;
default:
/*
@@ -1414,6 +1419,17 @@ static inline int scsi_host_queue_ready(struct 
request_queue *q,
return 1;
 }
 
+static void __scsi_kill_request(struct request *req)
+{
+   struct scsi_cmnd *cmd = req->special;
+   struct scsi_device *sdev = cmd->device;
+
+   cmd->result = DID_NO_CONNECT << 16;
+   atomic_inc(&cmd->device->iorequest_cnt);
+   sdev->device_busy--;
+   __scsi_done(cmd);
+}
+
 /*
  * Kill a request for a dead device
  */
@@ -1527,8 +1543,16 @@ static void scsi_request_fn(struct request_queue *q)
 * accept it.
 */
req = elv_next_request(q);
-   if (!req || !scsi_dev_queue_ready(q, sdev))
+   if (!req)
+   break;
+
+   if (!scsi_dev_queue_ready(q, sdev)) {
+   if (req->cmd_flags & REQ_FAILFAST) {
+   scsi_kill_request(req, q);
+   continue;
+   }
break;
+   }
 
if (unlikely(!scsi_device_online(sdev))) {
sdev_printk(KERN_ERR, sdev,
@@ -1609,8 +1633,12 @@ static void scsi_request_fn(struct request_queue *q)
 * later time.
 */
spin_lock_irq(q->queue_lock);
-   blk_requeue_request(q, req);
-   sdev->device_busy--;
+   if (unlikely(req->cmd_flags & REQ_FAILFAST))
+   __scsi_kill_request(req);
+   else {
+   blk_requeue_request(q, req);
+   sdev->device_busy--;
+   }
if(sdev->device_busy == 0)
blk_plug_device(q);
  out:
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stex driver - kernel panics

2007-12-05 Thread Sean Rendell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Rolf Eike Beer wrote:
> 
> gdb stex.o
> l *stex_mu_intr+0x10e
> 
> That will give you the exact line that failed
> 
> Greeting,
> 
> Eike



Well I get this:

raidology:~# gdb stex.o
...
(gdb) l *stex_mu_intr+0x10e
No symbol table is loaded.  Use the "file" command.


I think its due to no kernel debugging info.  I'll recompile with 
"CONFIG_DEBUG_KERNEL" set and see what happens.

Won't the recompile make the previous procedure addresses invalid?

cya
Sean

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHVy2NdnDJ8yyauXcRAuNGAJ9qIRhBcHEEK6tn7HhUq9pnaA/ovQCcDLGY
eLSL9wCctTNBa9MfjSYEphw=
=nUpC
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Lee Schermerhorn
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> On Wed, 05 Dec 2007 11:36:39 -0500
> Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> 
> > As reported here:
> > 
> > http://marc.info/?l=linux-scsi&m=119645761124683&w=4
> > 
> > against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
> > platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
> > seeing the message  "mptspi: ioc#: mpt_config failed" when it hangs. 
> > 
> > I can boot by disabling async scan.  However, I've also noticed some
> > disks attached via one of the "mpt" adapters ["scsi8" in console long in
> > message linked above] going "off-line" during stress tests.  This was
> > under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
> > 
> 
> Is ther any way of tricking you into
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
> 
> Obvious culprits to start with would be git-scsi-misc and maybe
> scsi-early-detection-of-medium-not-present-updated.patch.  But there are
> only 20-odd scsi patches in there.

I'll try to get to it tomorrow am.  

Lee
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stex driver - kernel panics

2007-12-05 Thread Rolf Eike Beer
Sean Rendell wrote:
> Hi all,
>
> I was wondering if someone knows how to debug this.  I have a HP xw4300 PC
> with 2G RAM and 3GHz P4.  Included is a Promise EX8350 RAID card with 4
> drives configured as RAID5.  While doing some lengthy disk exercises (ie cp
> a 200G directory of files) the kernel usually panics.
>
> This is the 2.6.23 kernel patched to 2.6.24-rc4.  The only module is the
> stex driver, all other drivers are in the kernel.
>
> Can anyone debug this?  (I am not a software guy)  Do you need more info?
>
> cya
> Sean
>
>
> (copied by hand from the screen)
> raidology:/home# mkdir foo
> raidology:/home# cp -a * foo/
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 40 3c a7 17 00 00 08 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 06 85 cc 17 00 01 00 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 31 1c 66 27 00 00 08 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4f 1f 87 f7 00 00 08 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4a 07 d1 ef 00 00 08 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 30 00 00 01 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 3f 00 00 08 00
> stex(:11:0e.0): aborting command
> sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 ba 77 b7 00 00 08 00
> BUG: unable to handle kernel NULL pointer dereference at virtual address
>  printing eip:  *pde = 
> Oops:  [#1] SMP
> Modules linked in: stex
>
> Pid: 0, comm: swapper Not tainted (2.6.24-rc4loca-RAID5-02 #1)
> EIP: 0060:[<>] EFLAGS: 00010046 CPU:0
> EIP is at run_init_process+0x3fefff40/0x20
> EAX: f7760900 EBX: f7548d78 ECX: f7760900 EDX: 
> ESI: f7760900 EDI: f7f6eaa0 EBP: f7f6ea70 ESP: c04ebeec
>  DS: 007b ES: 007b FS: 00d8 GS:  SS:0068
> Process swapper (pid: 0, ti=c04ea000 task=c04b53a0 task.ti=c04ea000)
> Stack: f887c4be c04ebf24 c04ebf1c c04ebf2c f8d9e000 f7f6eaa0 0001
> 00ced0bd f7f6ea70 f8d9e000 0246 0001 f887ceb2 f7670e80 
>  0010 c013dff5 c04df580 0010  c04df5b0 c013f7d4
> 0800 Call Trace:
>  [] stex_mu_intr+0x10e/0x4f0 [stex]
>  [] stex_intr+0x42/0x70 [stex]
>  [] handle_IRQ_event+0x25/0x60
>  [] handle_fasteoi_irq+0x74/0xf0
>  [] do_IRQ+0x3b/0x80
>  [] smp_apic_timer_interrupt+0x57/0x90
>  [] common_interrupt+0x23/0x28
>  [] mwait_idle_with_hints+0x46/0x60
>  [] cpu_idle+0x65/0x80
>  [] start_kernel+0x2af/0x2f0
>  [] unknown_bootoption+0x0/0x1f0
>  ===
> Code:  Bad EIP value.
> EIP: [<>] run_init_process=0x3fefff40/0x20 SS:ESP 0068:c04ebeec
> Kernel panic - not syncing: Fatal exception in interrupt

gdb stex.o
l *stex_mu_intr+0x10e

That will give you the exact line that failed

Greeting,

Eike


signature.asc
Description: This is a digitally signed message part.


Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Andrew Morton
On Wed, 05 Dec 2007 11:36:39 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> As reported here:
> 
>   http://marc.info/?l=linux-scsi&m=119645761124683&w=4
> 
> against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
> platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
> seeing the message  "mptspi: ioc#: mpt_config failed" when it hangs. 
> 
> I can boot by disabling async scan.  However, I've also noticed some
> disks attached via one of the "mpt" adapters ["scsi8" in console long in
> message linked above] going "off-line" during stress tests.  This was
> under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
> 

Is ther any way of tricking you into
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?

Obvious culprits to start with would be git-scsi-misc and maybe
scsi-early-detection-of-medium-not-present-updated.patch.  But there are
only 20-odd scsi patches in there.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


stex driver - kernel panics

2007-12-05 Thread Sean Rendell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,

I was wondering if someone knows how to debug this.  I have a HP xw4300 PC with 
2G RAM and 3GHz P4.  Included is a
Promise EX8350 RAID card with 4 drives configured as RAID5.  While doing some 
lengthy disk exercises (ie cp a 200G
directory of files) the kernel usually panics.

This is the 2.6.23 kernel patched to 2.6.24-rc4.  The only module is the stex 
driver, all other drivers are in the kernel.

Can anyone debug this?  (I am not a software guy)  Do you need more info?

cya
Sean


(copied by hand from the screen)
raidology:/home# mkdir foo
raidology:/home# cp -a * foo/
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 40 3c a7 17 00 00 08 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 06 85 cc 17 00 01 00 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 31 1c 66 27 00 00 08 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4f 1f 87 f7 00 00 08 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 4a 07 d1 ef 00 00 08 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 30 00 00 01 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 18 8e 33 3f 00 00 08 00
stex(:11:0e.0): aborting command
sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 ba 77 b7 00 00 08 00
BUG: unable to handle kernel NULL pointer dereference at virtual address 

printing eip:  *pde = 
Oops:  [#1] SMP
Modules linked in: stex

Pid: 0, comm: swapper Not tainted (2.6.24-rc4loca-RAID5-02 #1)
EIP: 0060:[<>] EFLAGS: 00010046 CPU:0
EIP is at run_init_process+0x3fefff40/0x20
EAX: f7760900 EBX: f7548d78 ECX: f7760900 EDX: 
ESI: f7760900 EDI: f7f6eaa0 EBP: f7f6ea70 ESP: c04ebeec
 DS: 007b ES: 007b FS: 00d8 GS:  SS:0068
Process swapper (pid: 0, ti=c04ea000 task=c04b53a0 task.ti=c04ea000)
Stack: f887c4be c04ebf24 c04ebf1c c04ebf2c f8d9e000 f7f6eaa0 0001 00ced0bd
   f7f6ea70 f8d9e000 0246 0001 f887ceb2 f7670e80  
   0010 c013dff5 c04df580 0010  c04df5b0 c013f7d4 0800
Call Trace:
 [] stex_mu_intr+0x10e/0x4f0 [stex]
 [] stex_intr+0x42/0x70 [stex]
 [] handle_IRQ_event+0x25/0x60
 [] handle_fasteoi_irq+0x74/0xf0
 [] do_IRQ+0x3b/0x80
 [] smp_apic_timer_interrupt+0x57/0x90
 [] common_interrupt+0x23/0x28
 [] mwait_idle_with_hints+0x46/0x60
 [] cpu_idle+0x65/0x80
 [] start_kernel+0x2af/0x2f0
 [] unknown_bootoption+0x0/0x1f0
 ===
Code:  Bad EIP value.
EIP: [<>] run_init_process=0x3fefff40/0x20 SS:ESP 0068:c04ebeec
Kernel panic - not syncing: Fatal exception in interrupt



Other info:


raidology:/# uname -a
Linux raidology 2.6.24-rc4loca-RAID5-02 #1 SMP Tue Dec 4 13:01:10 EST 2007 i686 
GNU/Linux


raidology:/# lspci
00:00.0 Host bridge: Intel Corporation 955X Express Memory Controller Hub
00:01.0 PCI bridge: Intel Corporation 955X Express PCI Express Root Port
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition 
Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
(rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express 
Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express 
Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 
01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 
01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 
01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 
01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface 
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller 
(rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA 
Storage Controller AHCI (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] 
(rev a1)
05:09.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
05:0a.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
10:00.0 PCI bridge: Intel Corporation 80333 Segment-A PCI Express-to-PCI 
Express Bridge
10:00.2 PCI bridge: Intel Corporation 80333 Segment-B PCI Express-to-PCI 
Express Bridge
11:0e.0 RAID bus controller: Promise Technology, Inc. Unknown device 8350
3f:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit 
Ethernet PCI Express (rev 01)



raidology:/# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model 

Another use for block-layer timeouts

2007-12-05 Thread Matthew Wilcox
On Wed, Nov 07, 2007 at 02:22:57PM +0100, Jens Axboe wrote:
> On Tue, Nov 06 2007, [EMAIL PROTECTED] wrote:
> > gdth driver is modified NOT to use scp->eh_timeout. Now, it has
> > eh_timed_out (gdth_timed_out) to handle command timeouts for locked
> > I/O's. Have not tested as I don't have needed hardware! Patch is
> > against 2.6.23-mm1.
> 
> I updated the timeout patch to current kernel and fixed some fallout. I
> included your gdth patch.
> 
> Completely untested, patch is below. It's also in the #timeout branch of
> the block git repo, to keep track of it.

Hi Jens,

Kristen, Sarah and I were talking over lunch today, and discovered that
they each have a problem that I think can be solved by using the block
layer timeout functionality.  One problem is a software-controlled drive
activity LED.  The other is wanting to save power by suspending the link
to an idle USB Storage device.

What these both have in common is the desire to find out when the queue
is empty (with a bit of hysteresis to avoid flickering the LED off for
too long, or powering down the USB link when there's a command about
to arrive).

I think this could be done with the timer you're adding here, by adding
a driver callback to be called once the timer fires and there's no
commands in the queue.  Alternatively, we could add another timer.
While each driver could have its own timer to detect these cases, if
we have two drivers that want this information, chances are there are
others that could also make use of it.

What do you think?  Shall I try knocking up a patch to (ab)use the timer
in this way?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blk request timeout minor fixes...

2007-12-05 Thread malahal
Jens Axboe [EMAIL PROTECTED] wrote:
> ->timeout isn't used very much, so should not be too much work to
> console the two.
> 
> If you see any specific obstacles, do list them!

I believe it is used in blk_add_timer() which may be called:
1) First time sending the command to LLDs
   No problem. It would be set to correct value at that time.
2) called from blk_rq_timed_out() in response to BLK_EH_RESET_TIMER
   Without the actual timeout value, there is no way to res-set timer
   correctly.
3) Retry
   Same #2.

The other way is to remove the deadline field. That poses timing out
issues.

Thanks, Malahal.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blk request timeout minor fixes...

2007-12-05 Thread Jens Axboe
On Wed, Dec 05 2007, [EMAIL PROTECTED] wrote:
> Jens Axboe [EMAIL PROTECTED] wrote:
> > > This one fixes the 4-byte hole. Thank you very much.
> > 
> > That's all fine, but now we have two fields doing essentially the same
> > thing. Care to cleanup the ->timeout usage so we can get by with using
> > just that field?
> 
> I added another field because I couldn't find a way to use just one
> field without breaking something! I will gladly clean it up if you
> find a way to fix it.

->timeout isn't used very much, so should not be too much work to
console the two.

If you see any specific obstacles, do list them!

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blk request timeout minor fixes...

2007-12-05 Thread malahal
Jens Axboe [EMAIL PROTECTED] wrote:
> > This one fixes the 4-byte hole. Thank you very much.
> 
> That's all fine, but now we have two fields doing essentially the same
> thing. Care to cleanup the ->timeout usage so we can get by with using
> just that field?

I added another field because I couldn't find a way to use just one
field without breaking something! I will gladly clean it up if you
find a way to fix it.

Thanks, Malahal.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Linux-usb-users] Read errors on Flash Drive Transcend TS1GJF2A

2007-12-05 Thread Alan Stern
On Wed, 5 Dec 2007, Boaz Harrosh wrote:

> Hi Alen
> 
> Yes, I have not inspected sr.c very carefully, you are absolutely right.
> Could you submit a unified patch for sd, sr and scsi.c I have hit this 
> bug 2 in my error injection tests. I was doing sg_chaining tests and now
> with the possibly very large requests the problem gets worse. At the time,
> I could not identify the exact problem, thanks

I'd like to keep the sd and sr patches separate.  The sd patch has 
already been tested and is known to fix an existing problem.

The sr patch below is new and it has not been tested.  All I know is 
that it compiles.

Alan Stern


Index: usb-2.6/drivers/scsi/scsi.c
===
--- usb-2.6.orig/drivers/scsi/scsi.c
+++ usb-2.6/drivers/scsi/scsi.c
@@ -700,6 +700,8 @@ void scsi_finish_command(struct scsi_cmn
 
good_bytes = cmd->request_bufflen;
 if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
+   if (cmd->resid > 0 && cmd->resid < good_bytes)
+   good_bytes -= cmd->resid;
drv = scsi_cmd_to_driver(cmd);
if (drv->done)
good_bytes = drv->done(cmd);
Index: usb-2.6/drivers/scsi/sr.c
===
--- usb-2.6.orig/drivers/scsi/sr.c
+++ usb-2.6/drivers/scsi/sr.c
@@ -218,15 +218,29 @@ static int sr_media_change(struct cdrom_
 static int sr_done(struct scsi_cmnd *SCpnt)
 {
int result = SCpnt->result;
-   int this_count = SCpnt->request_bufflen;
-   int good_bytes = (result == 0 ? this_count : 0);
-   int block_sectors = 0;
-   long error_sector;
+   unsigned int xfer_size = SCpnt->request_bufflen;
+   unsigned int good_bytes = (result == 0 ? xfer_size : 0);
+   unsigned long start_lba = SCpnt->request->sector;
+   u64 bad_lba;
+   unsigned long num_blocks;
+   unsigned long bad_sector;
struct scsi_cd *cd = scsi_cd(SCpnt->request->rq_disk);
+   struct scsi_sense_hdr sshdr;
+   int sense_valid = 0;
+   int sense_deferred = 0;
+   int info_valid;
 
 #ifdef DEBUG
printk("sr.c done: %x\n", result);
 #endif
+   if (result) {
+   sense_valid = scsi_command_normalize_sense(SCpnt, &sshdr);
+   if (sense_valid)
+   sense_deferred = scsi_sense_is_deferred(&sshdr);
+   }
+   if (driver_byte(result) != DRIVER_SENSE &&
+   (!sense_valid || sense_deferred))
+   goto out;
 
/*
 * Handle MEDIUM ERRORs or VOLUME OVERFLOWs that indicate partial
@@ -234,60 +248,83 @@ static int sr_done(struct scsi_cmnd *SCp
 * care is taken to avoid unnecessary additional work such as
 * memcpy's that could be avoided.
 */
-   if (driver_byte(result) != 0 && /* An error occurred */
-   (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */
-   switch (SCpnt->sense_buffer[2]) {
-   case MEDIUM_ERROR:
-   case VOLUME_OVERFLOW:
-   case ILLEGAL_REQUEST:
-   if (!(SCpnt->sense_buffer[0] & 0x90))
-   break;
-   error_sector = (SCpnt->sense_buffer[3] << 24) |
-   (SCpnt->sense_buffer[4] << 16) |
-   (SCpnt->sense_buffer[5] << 8) |
-   SCpnt->sense_buffer[6];
-   if (SCpnt->request->bio != NULL)
-   block_sectors =
-   bio_sectors(SCpnt->request->bio);
-   if (block_sectors < 4)
-   block_sectors = 4;
-   if (cd->device->sector_size == 2048)
-   error_sector <<= 2;
-   error_sector &= ~(block_sectors - 1);
-   good_bytes = (error_sector - SCpnt->request->sector) << 
9;
-   if (good_bytes < 0 || good_bytes >= this_count)
-   good_bytes = 0;
-   /*
-* The SCSI specification allows for the value
-* returned by READ CAPACITY to be up to 75 2K
-* sectors past the last readable block.
-* Therefore, if we hit a medium error within the
-* last 75 2K sectors, we decrease the saved size
-* value.
-*/
-   if (error_sector < get_capacity(cd->disk) &&
-   cd->capacity - error_sector < 4 * 75)
-   set_capacity(cd->disk, error_sector);
+   switch (sshdr.sense_key) {
+   case HARDWARE_ERROR:
+   case MEDIUM_ERROR:
+   case VOLUME_OVERFLOW:
+   case ILLEGAL_REQUEST:
+   if (!blk_fs_request(SCpnt->reques

Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-05 Thread Bob Tracy
Current progress: 11 revisions left to test.  The current partial
"git bisect log" is available per Ingo's suggestion on bugzilla.

http://bugzilla.kernel.org/show_bug.cgi?id=9457

-- 

Bob Tracy  |  "They couldn't hit an elephant at this dist- "
[EMAIL PROTECTED]   |   - Last words of Union General John Sedgwick,
   |  Battle of Spotsylvania Court House, U.S. Civil War

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Lee Schermerhorn
As reported here:

http://marc.info/?l=linux-scsi&m=119645761124683&w=4

against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
seeing the message  "mptspi: ioc#: mpt_config failed" when it hangs. 

I can boot by disabling async scan.  However, I've also noticed some
disks attached via one of the "mpt" adapters ["scsi8" in console long in
message linked above] going "off-line" during stress tests.  This was
under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.

Regards,
Lee


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [OpenFCoE PATCH] If expecting pre-T11 frames, a T11 frame caused a data fault.

2007-12-05 Thread Joe Eykholt
Christoph Hellwig wrote:
> Just wondering:
> 
>>From my quick reading of the code these pre-T11 frames are basically
> a different frame-level protocol.  Given that T11 has standardized
> on a different one what's the rationale for supporting the old frames?

There is still some use in lab environments of the old frame format, and
it would greatly help if we could keep it working for now.  Eventually,
in a year or so, it could be removed completely.  It seems to be a small
amount of code and in a loadable module besides.

Joe
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] ibmvscsi: retry on H_DROPPED during send_crq

2007-12-05 Thread Robert Jennings
Currently the vscsi client driver responds to the case where H_SEND_CRQ
returns H_DROPPED by returning DID_ERROR.  If the server CRQ is full,
either from mismanaging the request_limit or problems on the server,
we should return SCSI_MLQUEUE_HOST_BUSY instead.

The places where we are calling send_srp_login are not checking the
return code.  We could get H_DROPPED or H_CLOSED and in that case we
should reset and retry.

Signed-off-by: Robert Jennings <[EMAIL PROTECTED]>

---
 drivers/scsi/ibmvscsi/ibmvscsi.c |   19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

Index: b/drivers/scsi/ibmvscsi/ibmvscsi.c
===
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -629,14 +629,15 @@ static int ibmvscsi_send_srp_event(struc
list_del(&evt_struct->list);
del_timer(&evt_struct->timer);
 
-   /* If send_crq returns H_CLOSED, return SCSI_MLQUEUE_HOST_BUSY.
-* Firmware will send a CRQ with a transport event (0xFF) to
-* tell this client what has happened to the transport.  This
-* will be handled in ibmvscsi_handle_crq()
+   /* If send_crq returns H_CLOSED or H_DROPPED, return
+* SCSI_MLQUEUE_HOST_BUSY.  Firmware will send a CRQ with
+* a transport event (0xFF) to tell this client what has
+* happened to the transport.  This  will be handled in
+* ibmvscsi_handle_crq().
 */
-   if (rc == H_CLOSED) {
+   if (rc == H_CLOSED || rc == H_DROPPED) {
dev_warn(hostdata->dev, "send warning. "
-"Receive queue closed, will retry.\n");
+"Receive queue unavailable, will retry.\n");
goto send_busy;
}
dev_err(hostdata->dev, "send error %d\n", rc);
@@ -1270,7 +1271,8 @@ void ibmvscsi_handle_crq(struct viosrp_c
if ((rc = ibmvscsi_ops->send_crq(hostdata,
 0xC002LL, 
0)) == 0) {
/* Now login */
-   send_srp_login(hostdata);
+   if (send_srp_login(hostdata))
+   ibmvscsi_reset_host(hostdata);
} else {
dev_err(hostdata->dev, "Unable to send init 
rsp. rc=%ld\n", rc);
}
@@ -1280,7 +1282,8 @@ void ibmvscsi_handle_crq(struct viosrp_c
dev_info(hostdata->dev, "partner initialization 
complete\n");
 
/* Now login */
-   send_srp_login(hostdata);
+   if (send_srp_login(hostdata))
+   ibmvscsi_reset_host(hostdata);
break;
default:
dev_err(hostdata->dev, "unknown crq message type: 
%d\n", crq->format);
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] esp_scsi: Fix reset cleanup spinlock recursion

2007-12-05 Thread Maciej W. Rozycki
 The esp_reset_cleanup() function is called with the host lock held and 
invokes starget_for_each_device() which wants to take it too.  Here is a 
fix along the lines of shost_for_each_device()/__shost_for_each_device() 
adding a __starget_for_each_device() counterpart which assumes the lock 
has already been taken.

 Eventually, I think the driver should get modified so that more work is 
done as a softirq rather than in the interrupt context, but for now it 
fixes a bug that causes the spinlock debugger to fire.

 While at it, it fixes a small number of cosmetic problems with 
starget_for_each_device() too.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
---
 Checked with checkpatch.pl and at the run time.  Please apply.

  Maciej

patch-mips-2.6.23-rc5-20070904-esp_scsi-reset-3
diff -up --recursive --new-file 
linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/esp_scsi.c 
linux-mips-2.6.23-rc5-20070904/drivers/scsi/esp_scsi.c
--- linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/esp_scsi.c
2007-10-23 14:07:38.0 +
+++ linux-mips-2.6.23-rc5-20070904/drivers/scsi/esp_scsi.c  2007-11-26 
01:01:10.0 +
@@ -2028,8 +2028,8 @@ static void esp_reset_cleanup(struct esp
tp->flags |= ESP_TGT_CHECK_NEGO;
 
if (tp->starget)
-   starget_for_each_device(tp->starget, NULL,
-   esp_clear_hold);
+   __starget_for_each_device(tp->starget, NULL,
+ esp_clear_hold);
}
esp->flags &= ~ESP_FLAG_RESETTING;
 }
diff -up --recursive --new-file 
linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/scsi.c 
linux-mips-2.6.23-rc5-20070904/drivers/scsi/scsi.c
--- linux-mips-2.6.23-rc5-20070904.macro/drivers/scsi/scsi.c2007-09-04 
04:55:44.0 +
+++ linux-mips-2.6.23-rc5-20070904/drivers/scsi/scsi.c  2007-12-05 
16:00:40.0 +
@@ -886,11 +886,11 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
  * starget_for_each_device  -  helper to walk all devices of a target
  * @starget:   target whose devices we want to iterate over.
  *
- * This traverses over each devices of @shost.  The devices have
+ * This traverses over each device of @starget.  The devices have
  * a reference that must be released by scsi_host_put when breaking
  * out of the loop.
  */
-void starget_for_each_device(struct scsi_target *starget, void * data,
+void starget_for_each_device(struct scsi_target *starget, void *data,
 void (*fn)(struct scsi_device *, void *))
 {
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -905,6 +905,33 @@ void starget_for_each_device(struct scsi
 EXPORT_SYMBOL(starget_for_each_device);
 
 /**
+ * __starget_for_each_device  -  helper to walk all devices of a target
+ *  (UNLOCKED)
+ * @starget:   target whose devices we want to iterate over.
+ *
+ * This traverses over each device of @starget.  It does _not_
+ * take a reference on the scsi_device, so the whole loop must be
+ * protected by shost->host_lock.
+ *
+ * Note:  The only reason why drivers would want to use this is because
+ * they need to access the device list in irq context.  Otherwise you
+ * really want to use starget_for_each_device instead.
+ **/
+void __starget_for_each_device(struct scsi_target *starget, void *data,
+  void (*fn)(struct scsi_device *, void *))
+{
+   struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+   struct scsi_device *sdev;
+
+   __shost_for_each_device(sdev, shost) {
+   if ((sdev->channel == starget->channel) &&
+   (sdev->id == starget->id))
+   fn(sdev, data);
+   }
+}
+EXPORT_SYMBOL(__starget_for_each_device);
+
+/**
  * __scsi_device_lookup_by_target - find a device given the target (UNLOCKED)
  * @starget:   SCSI target pointer
  * @lun:   SCSI Logical Unit Number
diff -up --recursive --new-file 
linux-mips-2.6.23-rc5-20070904.macro/include/scsi/scsi_device.h 
linux-mips-2.6.23-rc5-20070904/include/scsi/scsi_device.h
--- linux-mips-2.6.23-rc5-20070904.macro/include/scsi/scsi_device.h 
2007-09-04 04:56:21.0 +
+++ linux-mips-2.6.23-rc5-20070904/include/scsi/scsi_device.h   2007-11-26 
01:00:57.0 +
@@ -222,6 +222,9 @@ extern struct scsi_device *__scsi_device
  uint);
 extern void starget_for_each_device(struct scsi_target *, void *,
 void (*fn)(struct scsi_device *, void *));
+extern void __starget_for_each_device(struct scsi_target *, void *,
+ void (*fn)(struct scsi_device *,
+void *));
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-
To unsubscribe from this list: send the lin

Re: [patch] SCSI: early detection of medium not present, updated

2007-12-05 Thread James Bottomley

On Sun, 2007-12-02 at 15:02 -0500, Alan Stern wrote:
> On Sun, 2 Dec 2007, James Bottomley wrote:
> 
> > Actually, the night train is a good place to do coding.  I know this
> > compiles, but could someone check it out?  It's the form I think we
> > should do, since the ASC/ASCQ codes for esoteric conditions which we
> > might eventually check for are different for CDs and removable discs.
> 
> This part is puzzling:
> 
> > +   /* try to eat the UNIT_ATTENTION if there are enough retries */
> > +   do {
> > +   result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL, 0, sshdr,
> > + timeout, retries);
> > +   } while ((driver_byte(result) & DRIVER_SENSE) &&
> > +sshdr && sshdr->sense_key == UNIT_ATTENTION &&
> > +--retries);
> 
> For one thing, you've using retries to control both an outer loop and 
> an inner loop -- meaning the total number of attempts could be on the 
> order of retries**2.

Sort of; however, there should only really be one cc/ua before the TUR
goes through.  Also, TUR produces almost no conditions that would be
retryable in scsi_decide_disposition() and the scsi_execute sends it
down with REQ_FAILFAST anyway, so it's really not used much.

> For another, why do you want to swallow a UNIT_ATTENTION?  Shouldn't 
> that be up to the code calling scsi_test_unit_ready()?

No, that's a bug in the old code.  UA doesn't necessarily mean there's a
media change.  It could be asserted for a whole host of reasons.  Until
we get proper UA signalling, we need to resend the TUR so we get the
correct media status and discard the UA (Even if the device is asserting
UA for media change reasons, it will still give the correct sense code
response to the following TUR).

James

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [OpenFCoE PATCH] If expecting pre-T11 frames, a T11 frame caused a data fault.

2007-12-05 Thread Christoph Hellwig
Just wondering:

>From my quick reading of the code these pre-T11 frames are basically
a different frame-level protocol.  Given that T11 has standardized
on a different one what's the rationale for supporting the old frames?

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


need help to suspend a device

2007-12-05 Thread Bernd Schubert
Hi,

I need some help with device suspending or something similar to this.

Problem description: We have many many Infortrend sata-to-scsi-raid systems 
here, connected to several server systems with LSI2230 HBAs.
Now the problem is, that these Infortrend boxes sometimes suffer from some 
kind of hick-up, sometimes due to the reboot of second server connected to 
the very same Infortrend box and sometimes just for no obvious reason. 

As far I can presently see it, the best would be to suspend any i/o, commands, 
etc to the troublesome box for about 30s to 60s and just wait and then to 
repeat the last command.
However, presently the scsi-layer/driver will immediately begin a domain 
validation and then do a scsi-reset. Unfortunately this will cause even more 
trouble. Sometimes this works, in more cases it will end in an i/o error and 
in most cases these domain validations and scsi-resets will go into an 
endless loop and eventually the Infortrend raids will be so confused that 
their scsi-controller will crash.

I'm already looking for some hours through the scsi and mpt functions, but so 
far didn't find a good way to put commands to a scsi-device to sleep for some 
time.

Any hints and suggestions are highly appreciated.


Thanks in advance,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

2007-12-05 Thread Anders Henke
On Tue, 4 Dec 2007 Andrew Morton wrote:
> On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 4 Dec 2007 17:11:55 -0800
> > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, 05 Dec 2007 10:04:03 +0900
> > > FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Tue, 4 Dec 2007 16:57:38 -0800
> > > > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > > > Anders Henke <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On November 28 2007, Anders Henke wrote:
> > > > > > > As "everything is reported as being zero" is quite odd an Jan 
> > > > > > > took a
> > > > > > > guess that it might be block-layer or driver-related, I've assumed
> > > > > > > that the driver is responsible for this; just out of the 
> > > > > > > curiousity, 
> > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by 
> > > > > > > copying 
> > > > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ 
> > > > > > > into a 
> > > > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for 
> > > > > > > me.
> > > > > > > 
> > > > > > > I haven't yet fine-tested from which kernel release on the 
> > > > > > > dpt_i2o driver 
> > > > > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > > > > the rootfs. Maybe this is just some timing issue.
> > > > > > 
> > > > > > I've started the fine-tests and can say so far that dpt_i2o from 
> > > > > > 2.6.22 is still fine. Test is simple:
> > > > > > 
> > > > > > [EMAIL PROTECTED]:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r 
> > > > > > dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > > > > 
> > > > > > ... recompile the kernel, reboot: works.
> > > > > > 
> > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two 
> > > > > > different
> > > > > > patch sets:
> > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > > > > 
> > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> > > > > > the "zero blocks"-symptom show up, so it's the "lucky" situation
> > > > > > that the smallest patch actually seams to be the broken one.
> > > > > > 
> > > > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > > > one major edit on the dpt_i2o driver:
> > > > > > 
> > > > > > FUJITA Tomonori 
> > > > > > 
> > > > > >   [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > > > > 
> > > > > > Stephen Rothwell 
> > > > > >   dpt_i2o depends on virt_to_bus
> > > > > > 
> > > > > > Fujita, would you please take a look at this?
> > > > > 
> > > > > He won't have seen this.  cc's added.
> > > > > 
> > > > > > I think that something's broken in there, leading to the dpt_i2o 
> > > > > > sending out blocks of zeroes right after initialization, at least on
> > > > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > > > SE7501WV2S-based boxes).
> > > > > > 
> > > > > > I don't have insight kernel driver development knowledge, so I'm
> > > > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > > > > 
> > > > > 
> > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes 
> > > > > the data
> > > > > corruption problems?
> > > > 
> > > > Anders said that my patch is fine and seems that Matthew's hotplug
> > > > conversion patch leads to the problem:
> > > > 
> > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
> > > 
> > > Oh.  Jan broke message threading :(
> > > 
> > > So it's been nearly a week and nothing has happened?  Do we revert that
> > > change?
> > 
> > SCSI people really want this conversion...
> > 
> > Matthew, did you have a chance to look at it?
> 
> It seems pretty improbably that a change of that nature could cause data
> corruption.  Anders, are you able to determine whether the revert (against
> current Linus mainline or 2.6.24-rc4) fixes things?  Because it would be
> very strange...
> 
> This is a grave bug.  It's really quite urgent...
> 
> Thanks.
> 
>  drivers/scsi/dpt_i2o.c |  132 ++-
>  drivers/scsi/dpti.h|9 ++
>  2 files changed, 68 insertions(+), 73 deletions(-)

I've done the following:

-untared a clean 2.6.24-rc4 and compiled it with my 2.6.23.1-settings in order
 to verify that the driver is still broken: checked, the box still won't 
 boot.

-patched the just compiled kernel source with your patch, "make dist-clean"
 (by means of "make-kpkg clean") and recompile: box boots fine.

I've put the captured console logs to
http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine
http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched
... and the kernelconfig (which shouldn't matter) to
http://w.sysiphus.de/dpt_i2o

Re: [PATCH 01/28] blk_end_request: add new request completion interface (take 3)

2007-12-05 Thread Jens Axboe
On Tue, Dec 04 2007, Kiyoshi Ueda wrote:
> Hi Boaz and Jens,
> 
> On Tue, 04 Dec 2007 15:56:32 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote:
> > > +/**
> > > + * blk_end_request - Helper function for drivers to complete the request.
> > > + * @rq:   the request being processed
> > > + * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
> > > + * @nr_bytes: number of bytes to complete
> > > + *
> > > + * Description:
> > > + * Ends I/O on a number of bytes attached to @rq.
> > > + * If @rq has leftover, sets it up for the next range of segments.
> > > + *
> > > + * Return:
> > > + * 0 - we are done with this request
> > > + * 1 - still buffers pending for this request
> > > + **/
> > > +int blk_end_request(struct request *rq, int uptodate, int nr_bytes)
> > 
> > I always hated that uptodate boolean with possible negative error value.
> > I guess it was done for backward compatibility of then users of 
> > end_that_request_first(). But since you are introducing a new API then
> > this is not the case. Just have regular status int where 0 means ALL_OK
> > and negative value means error code. 
> > Just my $0.02.
> 
> Thank you for the comment.
> I think it's quite reasonable.
> By doing that, we don't need end_io_error() anymore.
> 
> 
> Jens,
> What do you think?

I agree, the uptodate usage right now is horrible.

> If you agree with the interface change above, I would prefer to
> separate the patch-set from blk_end_request patch-set like below:
> o blk_end_request: remove end_that_request_*
> o change interface of 'uptodate' in blk_end_request to 'error'
> It makes the purpose of blk_end_request patch-set clear
> (and also, each patch of device drivers could be smaller).
> But, it doubles your merging work.  So if you would like to get
> the changes at once, I'll merge them into blk_end_request patch-set.

Twice the merging is not an issue for me.

> As for the patch inclusion, do you push the driver changes to Linus
> all at once?  Or should I ask each maintainer to take the patch?

Lets just try to get as many maintainer acks as possible, since the
patches need to go in together.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Linux-usb-users] Read errors on Flash Drive Transcend TS1GJF2A

2007-12-05 Thread Boaz Harrosh
On Tue, Dec 04 2007 at 20:03 +0200, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Tue, 4 Dec 2007, Boaz Harrosh wrote:
> 
>> perhaps below hunk should be added to your patch.
> 
> It looks like a good idea.
> 
>> Was it decided when this data corruption bugfix is
>> merged.
> 
> I don't know -- I haven't heard anything back from James.
> 
>> Also in sr.c. It does the range check but it might
>> enjoy the resid handling as well.
> 
> I think the range checking in sr.c is completely wrong.  The code
> doesn't check carefully to see that the error sector lies within the
> range of sectors being accessed; there's a possibility of overflow if
> the capacity is larger than 2**31 bytes.  Also this line in particular
> makes no sense:
> 
>   error_sector &= ~(block_sectors - 1);
> 
> Like you said, using the residue value too wouldn't hurt.
> 
> Furthermore the check for the Valid bit is done wrongly:
> 
>   if (!(SCpnt->sense_buffer[0] & 0x90))
> 
> This will never be true because of the earlier test:
> 
> if (driver_byte(result) != 0 && /* An error occurred */
> (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */
> 
> It probably should test against 0x80 instead of 0x90.
> 
> Alan Stern
> 
Hi Alen

Yes, I have not inspected sr.c very carefully, you are absolutely right.
Could you submit a unified patch for sd, sr and scsi.c I have hit this 
bug 2 in my error injection tests. I was doing sg_chaining tests and now
with the possibly very large requests the problem gets worse. At the time,
I could not identify the exact problem, thanks

Boaz
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html