Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Cornelia Huck
On Mon, 2 Apr 2007 04:59:43 +0900,
Tejun Heo <[EMAIL PROTECTED]> wrote:

> Okay, here's preliminary patch.  I've tested it very lightly so it's
> likely to contain bugs and definitely needs to be splitted.

Cool. However, there's something fishy there (not sure whether it's in
your patch or a latent bug in the ccw bus code that just has been
uncovered):

[ cut here ]
kernel BUG at mm/slab.c:607!
illegal operation: 0001 [#1]
CPU:1Not tainted
Process kslowcrw (pid: 36, task: 01a80cc0, ksp:
01a87ce0) Krnl PSW : 040400018000 000b21b0
(kfree+0x144/0x154) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0
EA:3 Krnl GPRS:   009836d8
03fffd22ffe0 000b20aa 00396ba8 01a6d100
00986c88 070001a879d8 00555428 0344a5b0
01a879c8 0344a500 00390ca8 000b20aa
01a879c8 Krnl Code: 000b21a2: e392c0480024
stg %r9,72(%r2,%r12) 000b21a8: a7f4ffc9   brc
15,1000b213a 000b21ac: a7f40001   brc 15,b21ae
  >000b21b0: a7f4ff9d   brc 15,1000b20ea
   000b21b4: e3303014   lg  %r3,16(%r3)
   000b21ba: a7f4ff90   brc 15,1000b20da
   000b21be: 0707   bcr 0,%r7
   000b21c0: eb8ff0580024   stmg
%r8,%r15,88(%r15) Call Trace:
([<000b20aa>] kfree+0x3e/0x154)
 [<001166f2>] release_sysfs_dirent+0x3e/0xf4
 [<00114914>] sysfs_hash_and_remove+0x150/0x154
 [<00118aec>] remove_files+0x48/0x68
 [<00118b84>] sysfs_remove_group+0x78/0xe8
 [<001cce3c>] device_remove_groups+0x48/0x68
 [<001cd4c0>] device_remove_attrs+0x3c/0x7c
 [<001cd70e>] device_del+0x20e/0x3a8
 [<001cd8d2>] device_unregister+0x2a/0x44
 [<0022fb2c>] css_sch_device_unregister+0x3c/0x54
 [<00230124>] css_evaluate_subchannel+0x2f0/0x400
 [<0023041a>] css_trigger_slow_path+0xda/0x154
 [<00054c9a>] run_workqueue+0x136/0x1fc
 [<00054e3a>] worker_thread+0xda/0x170
 [<0005b05a>] kthread+0x122/0x15c
 [<00019912>] kernel_thread_starter+0x6/0xc
 [<0001990c>] kernel_thread_starter+0x0/0xc

This happens when I detach a device (which causes it to be
unregistered). I'll look at it when I'm fully operational.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Greg KH
On Mon, Apr 02, 2007 at 04:59:43AM +0900, Tejun Heo wrote:
> 
> Dependencies between sysfs/kobject objects are clearer now and I think
> I got the locking and referencing correct.  This patch immediately
> fixes 'sysfs attr grabbing the wrong kobject and module' problem -
> sysfs and module lifetimes are unrelated now.  We can drop
> half-working attr->owner.
> 
> * A sysfs node no longer hold reference to its kobject.  It just
>   attaches itself to the kobject on creation and detaches on deletion.
> 
> * For a dir node, sysfs_dirent no longer directly points to kobject.
>   It points to sysfs_dir which contains pointer to kobject and a rwsem
>   to protect it.
> 
> * An open file doesn't hold a reference to kobject.  It holds a
>   reference to sysfs_dirent.  kobject pointer is verified and
>   show/store are performed with rwsem read-locked.  Deletion
>   disconnects the sysfs from its kobject while the rwsem is
>   write-locked.  This mechanism replaces buffer orphaning and kobject
>   validation during open.

Ah, very nice.

> * attr ops is determined on sysfs node creation not on open.  This is
>   a step toward making kobject opaque in sysfs.
> 
> * bin files are handled similarly but mmap makes the open file hold
>   reference to the kobject till it's closed.  Any better ideas?

This is probably needed, and might be acceptable, and I think we can
live with it as there is only a very small number of sysfs files that
fall into this category.

> * symlink is reimplemented in terms of sysfs_dirents instead of
>   kobjects.  sysfs_dirent->s_parent is added and each sysfs_dirent
>   holds reference to its parent.  Currently walking up the tree
>   requires read locking and unlocking sysfs_dir at each level.  This
>   can be removed if name is added to sysfs_dirent.
> 
> * As kobject can be disconnected anytime and sysfs code still needs to
>   look follow dentry link in kobject, kobject->dentry is protected by
>   dcache_lock.  Once kobject becomes opaque to sysfs, this hack can go
>   away.  All in all, making kobject completely opaque in sysfs isn't
>   too far away after this patch although it would require mass code
>   update.

What would need to be updated after this?

> What do you think about this approach?  If it's acceptable, I'll test
> further and split the patch into logical steps to get it reviewed
> better.

At first glance, I think it looks fine, but Maneesh is the one who
understands this code better than anyone, so I would like to get his
opinion on it.

I think you should start splitting it all up into steps, which will help
us review it a whole lot easier, as overall, I think this is a very
worthy goal.

thanks so much for doing this work,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Maneesh Soni
On Mon, Apr 02, 2007 at 04:59:43AM +0900, Tejun Heo wrote:
> Hello, James, Greg.
> 
> On Fri, Mar 30, 2007 at 01:19:34PM -0500, James Bottomley wrote:
> > That's sort of what I was reaching for too ... it just looks to me that
> > all the sysfs glue is in kobject, so they make a good candidate for the
> > pure sysfs objects.  Whatever we do, there has to be breakable cross
> > linking between the driver's tree and the sysfs representation ... of
> > course, nasty things like ksets get in the way of considering the
> > objects as separate, sigh.
> 
> The dependency between kobject and sysfs doesn't seem to be that
> tight.  I managed to break them such that sysfs can disconnect from
> its kobject on deletion _without_ changing sysfs/kobject API.
> 
> > > This way the sysfs reference counting can be put completely out of
> > > picture without impacting the rest of code.  Handling symlink and
> > > suicidal attributes might need some extra attention but I think they can
> > > be done.
> > 
> > True ... but you'll also have to implement something within sysfs to do
> > refcounting ... it needs to know how long to keep its nodes around.
> 
> sysfs_dirent already had reference counter and plays that role quite
> nicely.
> 
> Okay, here's preliminary patch.  I've tested it very lightly so it's
> likely to contain bugs and definitely needs to be splitted.
> 
> Dependencies between sysfs/kobject objects are clearer now and I think
> I got the locking and referencing correct.  This patch immediately
> fixes 'sysfs attr grabbing the wrong kobject and module' problem -
> sysfs and module lifetimes are unrelated now.  We can drop
> half-working attr->owner.
> 
> * A sysfs node no longer hold reference to its kobject.  It just
>   attaches itself to the kobject on creation and detaches on deletion.
> 

Earlier only sysfs leaf nodes held ref to kobject that is also 
only during open/close for regular files and in create/follow/destroy
symlinks.

> * For a dir node, sysfs_dirent no longer directly points to kobject.
>   It points to sysfs_dir which contains pointer to kobject and a rwsem
>   to protect it.
> 
> * An open file doesn't hold a reference to kobject.  It holds a
>   reference to sysfs_dirent.  kobject pointer is verified and
>   show/store are performed with rwsem read-locked.  Deletion
>   disconnects the sysfs from its kobject while the rwsem is
>   write-locked.  This mechanism replaces buffer orphaning and kobject
>   validation during open.
> 
> * attr ops is determined on sysfs node creation not on open.  This is
>   a step toward making kobject opaque in sysfs.
> 
> * bin files are handled similarly but mmap makes the open file hold
>   reference to the kobject till it's closed.  Any better ideas?
> > * symlink is reimplemented in terms of sysfs_dirents instead of
>   kobjects.  sysfs_dirent->s_parent is added and each sysfs_dirent
>   holds reference to its parent.  Currently walking up the tree
>   requires read locking and unlocking sysfs_dir at each level.  This
>   can be removed if name is added to sysfs_dirent.
> 
> * As kobject can be disconnected anytime and sysfs code still needs to
>   look follow dentry link in kobject, kobject->dentry is protected by
>   dcache_lock.  Once kobject becomes opaque to sysfs, this hack can go
>   away.  All in all, making kobject completely opaque in sysfs isn't
>   too far away after this patch although it would require mass code
>   update.
> 
> What do you think about this approach?  If it's acceptable, I'll test
> further and split the patch into logical steps to get it reviewed
> better.
> 

It does complicate a lot in sysfs code. The code is already quite fragile
considering the races (akpm is facing a couple of them, and which I am 
still working on) from VFS/dentry/inode angle. The previous rework of
sysfs was to make dentries/inodes corresponding to sysfs leaf nodes
reclaimable, i.e. looking at sysfs from VFS angle and IIUC you are 
looking at it from the driver layer to solve races between module/driver-
layer/sysfs, which is a valid purpose.

Following are the few rules in short to take care
1. directory dentries are pinned
2. regular files and symlink dentries are reclaimable
3. sysfs_dirent tree is protected using the parent inode's (ie directory 
   dentry's inode) i_mutex
4. sysfs rename is protected using a semaphore, sysfs_rename_sem also.
5. sysfs_dirent starts with refcount of 1 at the time of creation, refcount
   is incremented/decremented when a dentry is attached or detached. So, the
   refcount at the max can be 2. 
6. symlink creation pins the target kobject, so that the corresponding dentry
   is also pinned and removal releases the kobject.

There were few more additions like sysfs poll and shadow directory support,
for which respective authors can review and I can review the VFS related
changes but please split the patches as you like but one possible way
could be as chunks dealing with changes related to

1. data structure changes
2

Re: SMP pass through interface via bsg

2007-04-02 Thread FUJITA Tomonori
From: James Bottomley <[EMAIL PROTECTED]>
Subject: Re: SMP pass through interface via bsg
Date: Sun, 01 Apr 2007 22:57:42 -0500

> On Mon, 2007-04-02 at 11:49 +0900, FUJITA Tomonori wrote:
> > I started to work on SMP pass through via bsg as a non-SCSI command
> > experimentation. I asked Doug to try the patch but it seems to take
> > some time to make Doug's hardware work with the mainline aic94xxx
> > driver.
> 
> If you post the code, others who also have the hardware can try it
> too ...

I've attached a patch against Jens' bsg tree and uploaed a patch
against the sgv4-tools tree:

http://zaal.org/bsg/smp-test.diff

The sgv4-tools tree is at:

http://git.kernel.org/?p=linux/kernel/git/tomo/sgv4-tools.git;a=summary


If everything works:

gazania:/sys/class/bsg# ls
end_device-0:0  end_device-0:1  expander-0:0  sas_host0  sda  sdb

gazania:/home/fujita/sgv4-tools# ./smp_test /sys/class/bsg/expander-0\:0


smp_test works like Doug's smp_rep_manufacturer.

Note that `smp_test /sys/class/bsg/expander-0\:0/` doesn't work.

I've not tested these patches at all since I don't have proper
hardware. So please don't be surprised if the kernel crashes.


> > How should a user-space tool (like Doug's smp_utils) send a request
> > and get the response via bsg?
> 
> Basic frame in/frame out ... this is why we need the block layer
> bidirectional support

Yeah. Currently, I use a dirty hack to use sense buffer for bidi.


> > -- each SAS object (host, device, expander, etc) has the own bsg
> > device
> 
> I think so; probably attached via the transport class.

I see. I changed my patch to use this approach.


> > A user-space tool opens a bsg device for a sas object that it wants to
> > send a request to.
> > 
> > -- one bsg device handles multiple SAS objects
> > 
> > Some options: one bsg device in the whole system; one bsg device per
> > SAS driver; one bsg device per SAS host; etc.
> > 
> > With this approach, we need to invent a header incluing routing
> > information.
> > 
> > My hacky patch creates one bsg device per SAS host and invents:
> > 
> > struct smp_passthrough_hdr {
> > __u64 sas_address;
> > __u8 phy_identifier;
> > };
> 
> I'd really rather not go this route unless the one device per object
> approach becomes untenable.

OK. Probably, Dougs has some comments on this. I chose this because
Dougs prefers it and at the workshop Christoph seemed to be neutral in
two approaches.

Anyway, as I said, the attached patch uses "one bsg device per one SAS
object" approach.


> > a user-space tool sends this header instead of scb via bsg and put a
> > SMP request at dout_xfer and a buffer response at din_xfer (note that
> > now I do bidi transfer in a hacky way).
> > 
> > 
> > - SAS implementation details
> > 
> > As I said, the patch is too hacky. I just tried to implement a
> > smp_portal alternative by using bsg.
> > 
> > The patch adds a hook into sas transport class. sas_host_setup calls
> > bsg_register_queue. Then, the request_fn calls smp_execute_task to
> > send a smp request and get the response. It doesn't look good to link
> > the sas transport class with libsas. In addition, the mpt driver
> > handles smp request/response in a very different way.
> > 
> > Any suggestion to bind SMP pass through via bsg to aic94xx and mpt
> > cleanly?
> 
> bind in the transport class, not the driver ...

Yeah. The problem is that requests go to libsas for aic94xx while requests
directly go to the mpt driver.

I added a hook for SMP to struct sas_function_template.

diff --git a/block/bsg.c b/block/bsg.c
index a333c93..6c7802f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -230,8 +230,8 @@ static int blk_fill_sgv4_hdr_rq(request_
if (copy_from_user(rq->cmd, (void *)(unsigned long)hdr->request,
   hdr->request_len))
return -EFAULT;
-   if (blk_verify_command(rq->cmd, has_write_perm))
-   return -EPERM;
+/* if (blk_verify_command(rq->cmd, has_write_perm)) */
+/* return -EPERM; */
 
/*
 * fill in request structure
@@ -263,7 +263,7 @@ bsg_validate_sgv4_hdr(request_queue_t *q
return -EIO;
 
/* not supported currently */
-   if (hdr->protocol || hdr->subprotocol)
+   if (hdr->protocol)
return -EINVAL;
 
/*
@@ -273,7 +273,7 @@ bsg_validate_sgv4_hdr(request_queue_t *q
return 0;
 
/* not supported currently */
-   if (hdr->dout_xfer_len && hdr->din_xfer_len)
+   if (!hdr->subprotocol && hdr->dout_xfer_len && hdr->din_xfer_len)
return -EINVAL;
 
*rw = hdr->dout_xfer_len ? WRITE : READ;
@@ -312,6 +312,12 @@ bsg_map_hdr(struct bsg_device *bd, struc
return ERR_PTR(ret);
}
 
+   if (hdr->subprotocol) {
+   rq->sense_len = hdr->dout_xfer_len;
+   rq->sense = (void*)(unsigned long)hdr->dout_xferp;
+   hdr->dout_xfer_len = 0;
+   }
+
if (hdr-

Re: [PATCH] Add support for asynchronous scans to libata

2007-04-02 Thread James Smart

Matt,

Have your resolved the unload race conditions yet?

We'd like to update lpfc for the async scans, but our testing gets blocked
very quickly by the bugs. The bugs are not necessarily specific to lpfc
or to FC.

Stack traces are below. Simple ismod/rmmod loop can trigger them

-- james s



First one is an rmmod while lpfc is active in it's scan_start function.
I think we're in a sleep waiting for the board to be ready when the driver
is yanked out from underneath us.

Second one is an rmmod while do_scsi_scan_host is in a loop calling
scan_finished.

Third one is an rmmod immediately after completion of do_scan_async.


STACK TRACE 1
-

BUG: unable to handle kernel paging request at virtual address f8941836
 printing eip:
f8941836
*pde = 37e59067
Oops:  [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror 
dm_mod video sbs i2c_e6
 printing eip:
f8941836
*pde = 37e59067
 chipreg pcspkr tg3 scsi_transport_fc mptspi mptscsih mptbase 
scsi_transport_spi sd_mod scsi_mod exd
CPU:2
EIP:0060:[]Not tainted VLI
EFLAGS: 00010246   (2.6.20-rc7-jimp #1)
EIP is at 0xf8941836
eax:    ebx:    ecx: c190   edx: 0286
esi: f7ab6000   edi: f887158d   ebp: f7ab62e8   esp: ecec9f30
ds: 007b   es: 007b   ss: 0068
Process scsi_scan_6 (pid: 15927, ti=ecec9000 task=f2309000 task.ti=ecec9000)
Stack: c04a2183 f65b6cc4 f8963c80 f8963c80 f71e38d0 ffef c04a1cef 8180
   0008 f6e4aa00  0046  f7ab62e8 f7ab6000 f887158d
   f75fa3e0 f7ab62e8 f7ab6000 f887158d f75fa3e0 f894d6dc f89593e4 f7ab6000
Call Trace:
 [] sysfs_dirent_exist+0x20/0x5f
 [] sysfs_add_file+0x66/0x70
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] do_scsi_scan_host+0x30/0x8b [scsi_mod]
 [] do_scan_async+0x1d/0x126 [scsi_mod]
 [] complete+0x39/0x48
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] kthread+0xb0/0xd8
 [] kthread+0x0/0xd8
 [] kernel_thread_helper+0x7/0x10
 ===
Code:  Bad EIP value.
EIP: [] 0xf8941836 SS:ESP 0068:ecec9f30
 <0>Oops:  [#2]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror 
dm_mod video sbs i2c_ed
CPU:3
EIP:0060:[]Not tainted VLI
EFLAGS: 00010246   (2.6.20-rc7-jimp #1)
EIP is at 0xf8941836
eax:    ebx:    ecx: c1902000   edx: 0286
esi: f74ff000   edi: f887158d   ebp: f74ff2e8   esp: ec3c1f30
ds: 007b   es: 007b   ss: 0068
Process scsi_scan_5 (pid: 15919, ti=ec3c1000 task=f7b92000 task.ti=ec3c1000)
Stack: c04a2183 ece3821c f8963c80 f8963c80 f6b3942c ffef c04a1cef 8180
   0008 f6efea00  0046 0002 f74ff2e8 f74ff000 f887158d
   f7897de0 f74ff2e8 f74ff000 f887158d f7897de0 f894d6dc f89593e4 f74ff000
Call Trace:
 [] sysfs_dirent_exist+0x20/0x5f
 [] sysfs_add_file+0x66/0x70
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] do_scsi_scan_host+0x30/0x8b [scsi_mod]
 [] do_scan_async+0x1d/0x126 [scsi_mod]
 [] do_scan_async+0x0/0x126 [scsi_mod]
 [] kthread+0xb0/0xd8
 [] kthread+0x0/0xd8
 [] kernel_thread_helper+0x7/0x10
 ===

STACK TRACE 2
-

BUG: unable to handle kernel NULL pointer dereference at virtual address 

 printing eip:
c0548435
*pde = 
Oops:  [#3]
SMP
Modules linked in: lpfc autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 
dm_mirror dm_mod video sbs d
CPU:3
EIP:0060:[]Not tainted VLI
EFLAGS: 00010202   (2.6.20-rc7-jimp #1)
EIP is at make_class_name+0x27/0x7d
eax:    ebx:    ecx:    edx: 000b
esi: f887c29a   edi:    ebp:    esp: edd27ec4
ds: 007b   es: 007b   ss: 0068
Process fc_wq_7 (pid: 16050, ti=edd27000 task=f68b4550 task.ti=edd27000)
Stack: f65d35f8 f8889e20 f65d35f8 f65d35f0 f8889da0 c0548616  f65d35f0
   f69f0030 f6fdc000  c054869c f65d3400 f88720f8 f65d3400 f69f0030
   f8872152 f65d3400 f69f f88721c9 f6fdc014 f6d1d848 f8872233 f8872247
Call Trace:
 [] class_device_del+0x8c/0x10a
 [] class_device_unregister+0x8/0x10
 [] __scsi_remove_device+0x1d/0x60 [scsi_mod]
 [] scsi_remove_device+0x17/0x20 [scsi_mod]
 [] __scsi_remove_target+0x6e/0xa1 [scsi_mod]
 [] __remove_child+0x0/0x18 [scsi_mod]
 [] __remove_child+0x14/0x18 [scsi_mod]
 [] device_for_each_child+0x1a/0x3c
 [] scsi_remove_target+0x2e/0x37 [scsi_mod]
 [] fc_rport_final_delete+0x51/0x9a [scsi_transport_fc]
 [] run_workqueue+0x85/0x125
 [] do_sigaction+0x10f/0x149
 [] fc_rport_final_delete+0x0/0x9a [scsi_transport_fc]
 [] worker_thread+0xf9/0x124
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x124
 [] kthread+0xb0/0xd8
 [] kthread+0x0/0xd8
 [] kernel_thread_helper+0x7/0x10
 ===
Code: 5b 04 5b c3 55 31 ed 57 89 c7 56 89 c6 53 83 cb ff 83 ec 04 89 d9 89 14 
24 89 e8 f2 ae f7 d1

STACK TRACE 3
-

BUG: unable to handle kernel NULL pointer dereference at virtual address 

 printing eip:

*pde =

Re: [PATCH 1/1] scsi: Add EH Start Unit retry

2007-04-02 Thread Brian King
James Bottomley wrote:
> On Thu, 2007-03-29 at 15:25 -0500, Brian King wrote:
> This is pretty much an open coded for loop ... how about just doing a
> for loop as attached below?

Fine by me. I was simply coding it up to look like scsi_eh_tur.

Thanks,

Brian

> 
> The two advantages to this are:
> 
>  1. it's cleaner and more readable
>  2. the compiler knows how to optimize it better
> 
> James
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 7a1a1bb..28a266c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -932,10 +932,12 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd)
>   static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
> 
>   if (scmd->device->allow_restart) {
> - int rtn;
> + int i, rtn = NEEDS_RETRY;
> +
> + for (i = 0; rtn == NEEDS_RETRY && i < 2; i++)
> + rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
> + START_UNIT_TIMEOUT, 0);
> 
> - rtn = scsi_send_eh_cmnd(scmd, stu_command, 6,
> - START_UNIT_TIMEOUT, 0);
>   if (rtn == SUCCESS)
>   return 0;
>   }
> 
> 


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread James Smart



James Bottomley wrote:

-- each SAS object (host, device, expander, etc) has the own bsg
device


I think so; probably attached via the transport class.


FYI - I understand the idea of a bsg device per object, but really, for
something that is used rarely, it's a bunch of overhead. Objects, data
structures, etc - more udev/kobject mgmt  I believe I prefer the
approach of a shared distribution point - e.g. one bsg device at the
transport globally, or perhaps one at the host (actually the outbound
port aka host/channel) supporting the transport - followed by headers
in the messages that direct flow after that. This kinda follows the
model we have today for I/O - w/ queuecommand for the host, and
addressing in the SCSI command.

Additionally, I've always had some concern that we had to create an
object for everything in the SAN (every phy!), and have that view replicated
per host (for multi-initiator/multi-path SANs). I always believed there
was some sets of things that you would want to talk to that just doesn't
justify a new object (for example - do we start talking to process associators
in FC ?).  Another reason to move toward a transport-specific addressing
header.

My other concern with using bsg and the i/o path for transport management
functions is they compete with i/o for things like the can_queue values.
Should they ? Should they have higher priority ?


I'd really rather not go this route unless the one device per object
approach becomes untenable.


Understood, but building things until they topple is not a great idea
as there will be back-ward compatibility issues w/ user-space/sysfs and
the tools built around it. If you start with the shared distribution
point, you can always support both (eventually) if its a good idea.
Harder to do that in the reverse if it's toppling.


The patch adds a hook into sas transport class. sas_host_setup calls
bsg_register_queue. Then, the request_fn calls smp_execute_task to
send a smp request and get the response. It doesn't look good to link
the sas transport class with libsas. In addition, the mpt driver
handles smp request/response in a very different way.

Any suggestion to bind SMP pass through via bsg to aic94xx and mpt
cleanly?


bind in the transport class, not the driver ...


Agree - the trick for libsas is to get an interface into the driver that
both drivers can support.

-- james s
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Block layer: separate out queue-oriented ioctls Re: [PATCH] Block layer: separate out queue-oriented ioctls

2007-04-02 Thread Alan Stern
On Mon, 2 Apr 2007, Joerg Schilling wrote:

> Hi,
> 
> this is a repost as I like to know the current state of the problem...
> 
> The USB DMA size problem is known to exist on Linux since February 2004.
> I am still in hope that there will be a fix soon.

Me too.  I submitted the most recent version of the patch (labelled as857) 
over a month ago and have received essentially no feedback on it.

Douglas, James...  What's the story?

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Block layer: separate out queue-oriented ioctls Re: [PATCH] Block layer: separate out queue-oriented ioctls

2007-04-02 Thread Jens Axboe
On Mon, Apr 02 2007, Alan Stern wrote:
> On Mon, 2 Apr 2007, Joerg Schilling wrote:
> 
> > Hi,
> > 
> > this is a repost as I like to know the current state of the problem...
> > 
> > The USB DMA size problem is known to exist on Linux since February 2004.
> > I am still in hope that there will be a fix soon.
> 
> Me too.  I submitted the most recent version of the patch (labelled as857) 
> over a month ago and have received essentially no feedback on it.
> 
> Douglas, James...  What's the story?

FWIW, it had my ack, I think we were just waiting for Doug to ack the sg
bits.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Block layer: separate out queue-oriented ioctls Re: [PATCH] Block layer: separate out queue-oriented ioctls

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 16:23 +0200, Jens Axboe wrote:
> FWIW, it had my ack, I think we were just waiting for Doug to ack the sg
> bits.

And there's really nothing I can do (well, except write the thing) since
the changes are not in any SCSI pieces I maintain directly ... they're
block and sg.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Cornelia Huck
On Mon, 2 Apr 2007 11:20:48 +0200,
Cornelia Huck <[EMAIL PROTECTED]> wrote:

> Cool. However, there's something fishy there (not sure whether it's in
> your patch or a latent bug in the ccw bus code that just has been
> uncovered):

Similar bug when loading/unloading a module that creates a driver
attribute. The winner seems to be kfree(sd->s_element) in
release_sysfs_dirent() (in case of an attribute, it will point to the
attribute structure, which is usually statically created)...
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread Douglas Gilbert
James Smart wrote:
> 
> 
> James Bottomley wrote:
>>> -- each SAS object (host, device, expander, etc) has the own bsg
>>> device
>>
>> I think so; probably attached via the transport class.
> 
> FYI - I understand the idea of a bsg device per object, but really, for
> something that is used rarely, it's a bunch of overhead. Objects, data
> structures, etc - more udev/kobject mgmt  I believe I prefer the
> approach of a shared distribution point - e.g. one bsg device at the
> transport globally, or perhaps one at the host (actually the outbound
> port aka host/channel) supporting the transport - followed by headers
> in the messages that direct flow after that. This kinda follows the
> model we have today for I/O - w/ queuecommand for the host, and
> addressing in the SCSI command.

James,
I fully agree.

> Additionally, I've always had some concern that we had to create an
> object for everything in the SAN (every phy!), and have that view
> replicated
> per host (for multi-initiator/multi-path SANs). I always believed there
> was some sets of things that you would want to talk to that just doesn't
> justify a new object (for example - do we start talking to process
> associators
> in FC ?).  Another reason to move toward a transport-specific addressing
> header.

Yes, seldom used things like well known logical units
and virtual SMP targets (there is one on every MPT Fusion
SAS LBA) that don't make the cut in the "devices for everything"
model become invisible to Linux users. It is exactly these
type of things that specialized user space programs use
a pass-through interface for.

So if the kernel can't find a use for it, then you, the
owner of the hardware, won't be able to use it either.
Hard to describe that approach as open software.

> My other concern with using bsg and the i/o path for transport management
> functions is they compete with i/o for things like the can_queue values.
> Should they ? Should they have higher priority ?

sg v4 adds priority control mechanisms but there still
remains possibilities for conflict, some of which may
cause problems.

I can see that a state based driver like st may want
to stop a pass-through getting to a logical unit most
of the time (and mechanisms could be added). However
even st may want to use a pass through to the transport
to reset the target ("hard reset") if it can't get the
LU RESET task management function to work.

>> I'd really rather not go this route unless the one device per object
>> approach becomes untenable.
> 
> Understood, but building things until they topple is not a great idea
> as there will be back-ward compatibility issues w/ user-space/sysfs and
> the tools built around it. If you start with the shared distribution
> point, you can always support both (eventually) if its a good idea.
> Harder to do that in the reverse if it's toppling.

We are talking about the SAS Management Protocol (SMP)
in this thread and in SAS-1 and SAS-1.1 discovery is done
by every SAS initiator, for every ripple in the topology.
In large topologies this approach can cause a "SMP storm"
that can temporarily drop SAS bandwidth to SCSI-1 figures.
Today discovery is done in the LLD or firmware (Adaptec and
LSI respectively) so they can magically make devices appear.
The approach in SAS-2 is to devolve SAS discovery to
expanders and use more efficient SMP functions. Current
generation SAS HBAs (and some LLDs) will need to alter
or stop their current SAS discovery techniques. The
user space may need to get involved, for zoning and
associated security.

Only allowing the SMP pass-through to talk to devices
that the kernel thinks are SAS expanders has some
shortcomings:
  - how can user space SAS topology discovery be done?
  - what about SMP targets that are not on expanders
  - disabling the phy that connects an expander
to the SAS domain is problematic when the
file descriptor you are using notionally represents
that expander.


Note: discovery of a SAS topology is a different process from
finding logical units within SCSI targets. In the context of
SAS, the latter process can stay in the kernel and can be
done for each SSP target found, preferably after the SAS
topology has been fully discovered.

>>> The patch adds a hook into sas transport class. sas_host_setup calls
>>> bsg_register_queue. Then, the request_fn calls smp_execute_task to
>>> send a smp request and get the response. It doesn't look good to link
>>> the sas transport class with libsas. In addition, the mpt driver
>>> handles smp request/response in a very different way.
>>>
>>> Any suggestion to bind SMP pass through via bsg to aic94xx and mpt
>>> cleanly?
>>
>> bind in the transport class, not the driver ...
> 
> Agree - the trick for libsas is to get an interface into the driver that
> both drivers can support.

For LSI MPT Fusion it should be almost trivial to
map the  (Tomo's "hacky" approach")
through to LSI's ioc_num and SMP pass-through structures.

The aic94xx must have a 

Re: SMP pass through interface via bsg

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 21:43 +0900, FUJITA Tomonori wrote:
> From: James Bottomley <[EMAIL PROTECTED]>
> > If you post the code, others who also have the hardware can try it
> > too ...
> 
> I've attached a patch against Jens' bsg tree and uploaed a patch
> against the sgv4-tools tree:
> 
> http://zaal.org/bsg/smp-test.diff
> 
> The sgv4-tools tree is at:
> 
> http://git.kernel.org/?p=linux/kernel/git/tomo/sgv4-tools.git;a=summary
> 
> 
> If everything works:
> 
> gazania:/sys/class/bsg# ls
> end_device-0:0  end_device-0:1  expander-0:0  sas_host0  sda  sdb
> 
> gazania:/home/fujita/sgv4-tools# ./smp_test /sys/class/bsg/expander-0\:0
> 
> 
> smp_test works like Doug's smp_rep_manufacturer.
> 
> Note that `smp_test /sys/class/bsg/expander-0\:0/` doesn't work.

Well, I get a lot further than Doug with aic94xx, but it still doesn't
quite work.  This is the console output:

sas_non_host_smp_fn 193
sas_smp_fn 172: 8 0804d000 4
BUG: sleeping function called from invalid context at mm/slab.c:3032
in_atomic():0, irqs_disabled():1
no locks held by smp_test/3102.
irq event stamp: 1234
hardirqs last  enabled at (1233): [] 
debug_check_no_locks_freed+0xa5/0x1c0
hardirqs last disabled at (1234): [] _spin_lock_irq+0xf/0x40
softirqs last  enabled at (0): [] copy_process+0x2dd/0x11c0
softirqs last disabled at (0): [<>] 0x0
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x20
 [] dump_stack+0x16/0x20
 [] __might_sleep+0xb9/0xd0
 [] __kmalloc_track_caller+0xd8/0x110
 [] __kzalloc+0x19/0x50
 [] sas_smp_request+0x3f/0x130 [libsas]
 [] sas_smp_fn+0x8f/0xd0 [scsi_transport_sas]
 [] sas_non_host_smp_fn+0x60/0x70 [scsi_transport_sas]
 [] elv_insert+0x91/0x160
 [] __elv_add_request+0x5d/0xb0
 [] blk_execute_rq_nowait+0x52/0xa0
 [] blk_execute_rq+0x95/0xe0
 [] bsg_ioctl+0x133/0x1b0
 [] do_ioctl+0x6b/0x80
 [] vfs_ioctl+0x57/0x2a0
 [] sys_ioctl+0x39/0x60
 [] syscall_call+0x7/0xb
 ===
sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82
sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82
sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82

The first huge dump is just complaining about a kzalloc GFP_KERNEL with irqs 
off in sas_smp_request.

I'll debug why the execute_task is failing when I get more time today.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-04-02 Thread Brian King
Ed Lin wrote:
> @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
>   u32 data;
>   int result = SUCCESS;
>   unsigned long flags;
> +
> + printk(KERN_INFO DRV_NAME
> + "(%s): aborting command\n", pci_name(hba->pdev));
> + scsi_print_command(cmd);
> +

scmd_printk is probably what you want here rather than just a printk.

scmd_printk(KERN_ERR, cmd, "aborting command\n");


> @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
>   unsigned long before;
>   hba = (struct st_hba *) &cmd->device->host->hostdata[0];
> 
> + printk(KERN_INFO DRV_NAME
> + "(%s): resetting host\n", pci_name(hba->pdev));
> + scsi_print_command(cmd);
> +

Same here.

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread FUJITA Tomonori
From: James Bottomley <[EMAIL PROTECTED]>
Subject: Re: SMP pass through interface via bsg
Date: Mon, 02 Apr 2007 11:02:01 -0500

> On Mon, 2007-04-02 at 21:43 +0900, FUJITA Tomonori wrote:
> > From: James Bottomley <[EMAIL PROTECTED]>
> > > If you post the code, others who also have the hardware can try it
> > > too ...
> > 
> > I've attached a patch against Jens' bsg tree and uploaed a patch
> > against the sgv4-tools tree:
> > 
> > http://zaal.org/bsg/smp-test.diff
> > 
> > The sgv4-tools tree is at:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/tomo/sgv4-tools.git;a=summary
> > 
> > 
> > If everything works:
> > 
> > gazania:/sys/class/bsg# ls
> > end_device-0:0  end_device-0:1  expander-0:0  sas_host0  sda  sdb
> > 
> > gazania:/home/fujita/sgv4-tools# ./smp_test /sys/class/bsg/expander-0\:0
> > 
> > 
> > smp_test works like Doug's smp_rep_manufacturer.
> > 
> > Note that `smp_test /sys/class/bsg/expander-0\:0/` doesn't work.
> 
> Well, I get a lot further than Doug with aic94xx, but it still doesn't
> quite work.  This is the console output:

Thanks.

>  ===
> sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82
> sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82
> sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x82
> 
> The first huge dump is just complaining about a kzalloc GFP_KERNEL
> with irqs off in sas_smp_request.

Probably, this is due to a silly bug in sas_smp_fn (please replace
spin_unlock with spin_unclok_irq).


> I'll debug why the execute_task is failing when I get more time today.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread FUJITA Tomonori
From: FUJITA Tomonori <[EMAIL PROTECTED]>
Subject: Re: SMP pass through interface via bsg
Date: Tue, 03 Apr 2007 01:27:47 +0900

> From: James Bottomley <[EMAIL PROTECTED]>
> Subject: Re: SMP pass through interface via bsg
> Date: Mon, 02 Apr 2007 11:02:01 -0500
> 
> > On Mon, 2007-04-02 at 21:43 +0900, FUJITA Tomonori wrote:
> > > From: James Bottomley <[EMAIL PROTECTED]>
> > > > If you post the code, others who also have the hardware can try it
> > > > too ...
> > > 
> > > I've attached a patch against Jens' bsg tree and uploaed a patch
> > > against the sgv4-tools tree:
> > > 
> > > http://zaal.org/bsg/smp-test.diff
> > > 
> > > The sgv4-tools tree is at:
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/tomo/sgv4-tools.git;a=summary
> > > 
> > > 
> > > If everything works:
> > > 
> > > gazania:/sys/class/bsg# ls
> > > end_device-0:0  end_device-0:1  expander-0:0  sas_host0  sda  sdb
> > > 
> > > gazania:/home/fujita/sgv4-tools# ./smp_test /sys/class/bsg/expander-0\:0
> > > 
> > > 
> > > smp_test works like Doug's smp_rep_manufacturer.
> > > 
> > > Note that `smp_test /sys/class/bsg/expander-0\:0/` doesn't work.
> > 
> > Well, I get a lot further than Doug with aic94xx, but it still doesn't
> > quite work.  This is the console output:
> 
> Thanks.
> 
> >  ===
> > sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 
> > 0x82
> > sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 
> > 0x82
> > sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 
> > 0x82
> > 
> > The first huge dump is just complaining about a kzalloc GFP_KERNEL
> > with irqs off in sas_smp_request.
> 
> Probably, this is due to a silly bug in sas_smp_fn (please replace
> spin_unlock with spin_unclok_irq).
> 
> 
> > I'll debug why the execute_task is failing when I get more time today.
> 
> Thanks.

OK. I found another bug in smp_test tool (sends bogus response buffer
len to kernel). I've uploaded a new patch:

http://zaal.org/bsg/smp-test2.diff

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread James Bottomley
On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
> OK. I found another bug in smp_test tool (sends bogus response buffer
> len to kernel). I've uploaded a new patch:
> 
> http://zaal.org/bsg/smp-test2.diff

That sort of works; you have a final bug in that the manufacturer info
response frame is 64 bytes, not 128 bytes, but with that corrected
everything goes through and I get the result back:

hobholes:~# /home/jejb/git/sgv4-tools/smp_test /sys/class/bsg/expander-2\:0
  SAS-1.1 format: 0
  vendor identification: LSILOGIC
  product identification: SASx12 A.0  
  product revision level: 

So we can class this one as a success ... 

Thanks!

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 09:45 -0400, James Smart wrote:
> 
> James Bottomley wrote:
> >> -- each SAS object (host, device, expander, etc) has the own bsg
> >> device
> > 
> > I think so; probably attached via the transport class.
> 
> FYI - I understand the idea of a bsg device per object, but really, for
> something that is used rarely, it's a bunch of overhead.

It's some overhead, yes ... although rarely do the control elements
(switches, expanders, etc.) outnumber the actual end devices
significantly, so it's only a fractional increase in most configurations

>  Objects, data
> structures, etc - more udev/kobject mgmt 

This I'm prepared to trade for usability ... also, in our current sysfs
layout, we already have most of these objects.  Fundamentally, it's
easier just to use something that already exists in your hierarchy to do
the mapping, so /sys/class/bsg/expander-2:0 in tomo's example.

However, I realise this could be hidden by the tools as well in the
single tap, so it's a weak argument.

>  I believe I prefer the
> approach of a shared distribution point - e.g. one bsg device at the
> transport globally, or perhaps one at the host (actually the outbound
> port aka host/channel) supporting the transport - followed by headers
> in the messages that direct flow after that. This kinda follows the
> model we have today for I/O - w/ queuecommand for the host, and
> addressing in the SCSI command.

> Additionally, I've always had some concern that we had to create an
> object for everything in the SAN (every phy!), and have that view replicated
> per host (for multi-initiator/multi-path SANs). I always believed there
> was some sets of things that you would want to talk to that just doesn't
> justify a new object (for example - do we start talking to process associators
> in FC ?).  Another reason to move toward a transport-specific addressing
> header.

Well, that's a generic problem with sysfs and the device model.  We can
shortcircuit it by removing irrelevant objects ... so we don't actually
see phys in the FC tree, do we?  but for relevant objects, yes we do
show and allocate them.

> My other concern with using bsg and the i/o path for transport management
> functions is they compete with i/o for things like the can_queue values.
> Should they ? Should they have higher priority ?
> 
> > I'd really rather not go this route unless the one device per object
> > approach becomes untenable.
> 
> Understood, but building things until they topple is not a great idea
> as there will be back-ward compatibility issues w/ user-space/sysfs and
> the tools built around it. If you start with the shared distribution
> point, you can always support both (eventually) if its a good idea.
> Harder to do that in the reverse if it's toppling.

True ... but the one tap per relevant object approach won't explode over
what we currently have (i.e. if it's going to fall over, it will anyway
regardless of this issue ...)

> >> The patch adds a hook into sas transport class. sas_host_setup calls
> >> bsg_register_queue. Then, the request_fn calls smp_execute_task to
> >> send a smp request and get the response. It doesn't look good to link
> >> the sas transport class with libsas. In addition, the mpt driver
> >> handles smp request/response in a very different way.
> >>
> >> Any suggestion to bind SMP pass through via bsg to aic94xx and mpt
> >> cleanly?
> > 
> > bind in the transport class, not the driver ...
> 
> Agree - the trick for libsas is to get an interface into the driver that
> both drivers can support.
> 
> -- james s

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin


> -Original Message-
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 2:27 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, Mar 30, 2007 at 03:21:33PM -0700, Ed Lin wrote:
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> I don't get why you can't export id as targer and lun as lun for
> the !st_shasta types.  Could you explain in detail what the problem
> with that approach would be?
> 
>

Of course I can do that. That will result in 1 target and 128 lun
for st_yosemite and 128 target and 1 lun for st_vsc. That seems
a little weird and I am afraid it will be turned down. Also
I can keep a same mapping for the console in the original code.

If you think it's ok, that's really better, because it makes the
hot path a bit faster. Also because of the CONFIG_SCSI_MULTI_LUN
option, I have to map lun to channel otherwise many entities
will disappear when that option is not selected. Plus I have to
reserve a slot for the RAID console, so the final mapping may be:

channel:0~7, id:0~16(st_shasta, channel 0,id 16 is reserved for console)
channel:0~127, id:0~1(st_yosemite, channel 0,id 1 is reserved for
console)
channel:0, id:0~128(st_vsc, channel 0,id 128 is reserved for console)

I don't know whether this is acceptable. 
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: aic94xx driver woes

2007-04-02 Thread James Bottomley
On Sun, 2007-04-01 at 16:29 -0400, Douglas Gilbert wrote:
> ...
> sas: phy3 added to port0, phy_mask:0x8
> sas: DOING DISCOVERY on port 0, pid:2110
> aic94xx: scb:0x80 timed out

This might be the problem.

I see this periodically when a phy goes out to lunch on my system ...
with me, it always seems to be phy0 of a port containing phy0-4 ... so
phy1-3 still function to get messages.

Can you try sending a link reset to phy3?

It should be something like

echo 1 > /sys/class/sas_phy/phy-X:3/link_reset

and see if it just produces 

aic94xx: scb:0x80 timed out

Again?

Thanks,

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread FUJITA Tomonori
From: James Bottomley <[EMAIL PROTECTED]>
Subject: Re: SMP pass through interface via bsg
Date: Mon, 02 Apr 2007 12:01:41 -0500

> On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
> > OK. I found another bug in smp_test tool (sends bogus response buffer
> > len to kernel). I've uploaded a new patch:
> > 
> > http://zaal.org/bsg/smp-test2.diff
> 
> That sort of works; you have a final bug in that the manufacturer info
> response frame is 64 bytes, not 128 bytes, but with that corrected
> everything goes through and I get the result back:
> 
> hobholes:~# /home/jejb/git/sgv4-tools/smp_test /sys/class/bsg/expander-2\:0
>   SAS-1.1 format: 0
>   vendor identification: LSILOGIC
>   product identification: SASx12 A.0  
>   product revision level: 
> 
> So we can class this one as a success ... 
>
> Thanks!

Great! Thanks. I'll try to finish the mpt driver's hook
sometime. Finally, We have a bsg user (though it also needs proper
bidi support).

Jens, what remains to be done before bsg is merged into mainline?
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin


> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 7:22 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > The internal id/lun mapping of st_vsc and st_vsc1 
> controllers is different
> > from st_shasta. The original driver code can only  map 
> first 16 'entities'
> > for st_vsc and st_vsc1 while there are actually 128 available.
> > 
> > Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
> > no harm because inquiries beyond boundary are discarded by firmware.
> > 
> > The correct internal mapping should be:
> > id:0~15, lun:0~7 (st_shasta)
> > id:0, lun:0~127 (st_yosemite)
> > id:0~127, lun:0 (st_vsc and st_vsc1)
> > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
> with a maximun
> > 'entity' number of 128. The RAID console only interfaces to 
> scsi mid layer
> > and is always mapped at channel:0, id:16, lun:0.
> 
> I'm with Christoph here ... if we're going to break the backwards
> compatibility of the mappings (which your code does) then we 
> could just
> dump channel and use the SCSI id and lun directly.
> 
> Understanding this code is predicated on this quirky definition in
> stex_queuecommand:
> 
>   id = cmd->device->id;
>   lun = cmd->device->channel; /* firmware lun issue work around */
>^^^
> 
> > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
> >  
> > req = stex_alloc_req(hba);
> >  
> > -   if (hba->cardtype == st_yosemite) {
> > -   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
> 
> This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
> takes the next channel.
> 
> > -   req->target = 0;
> > -   } else {
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> These both look to be wrong.  You're taking the channel as the lowest
> common denominator, so your first target is on channel 1 id 
> 0, your next
> on channel 2, id 0 and so on.  That's really going to mess with the
> ordering (which will be user visible) is that really what you want?
> 

Well the firmware doesn't care about the scanning order. You can jump
first on 0,8,16... and then go back to 1,9,17... These are consistent
to the user if the code is that way(may break previous impression
though).
Anyway this is a simulation and I can make whatever necessary
adjustment.


 
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin


> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 11:46 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> console device
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > After reset completed, the scsi error handler sends out START_STOP
> > and TEST_UNIT_READY to the device. For 'normal' devices these
> > commands will be handled by firmware. However, because the RAID
> > console only interfaces to scsi mid layer, the firmware 
> will not process
> > these commands for it. This will make the console to be 
> offlined right
> > after reset. Add the handling in driver to fix this problem.
> 
> I don't see how this explanation can be correct.  The error 
> handler only
> sends a START_STOP command if sdev->allow_restart is one, 
> which you have
> to set in the slave_configure routines (which stex doesn't).  
> If you're
> seeing a START_STOP in the eh path, there's something else wrong.
> TEST_UNIT_READY, certainly ... it's part of a restart check.
> 

I just saw the routine name scsi_eh_try_stu, and didn't notice the
allow_restart (partly because I thought it was not harmful...).
But the TEST_UNIT_READY must stay.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread Jens Axboe
On Tue, Apr 03 2007, FUJITA Tomonori wrote:
> From: James Bottomley <[EMAIL PROTECTED]>
> Subject: Re: SMP pass through interface via bsg
> Date: Mon, 02 Apr 2007 12:01:41 -0500
> 
> > On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
> > > OK. I found another bug in smp_test tool (sends bogus response buffer
> > > len to kernel). I've uploaded a new patch:
> > > 
> > > http://zaal.org/bsg/smp-test2.diff
> > 
> > That sort of works; you have a final bug in that the manufacturer info
> > response frame is 64 bytes, not 128 bytes, but with that corrected
> > everything goes through and I get the result back:
> > 
> > hobholes:~# /home/jejb/git/sgv4-tools/smp_test /sys/class/bsg/expander-2\:0
> >   SAS-1.1 format: 0
> >   vendor identification: LSILOGIC
> >   product identification: SASx12 A.0  
> >   product revision level: 
> > 
> > So we can class this one as a success ... 
> >
> > Thanks!
> 
> Great! Thanks. I'll try to finish the mpt driver's hook
> sometime. Finally, We have a bsg user (though it also needs proper
> bidi support).
> 
> Jens, what remains to be done before bsg is merged into mainline?

Well the bi-dir stuff and sg v4 design were the two bits that needed to
get done before pushing bsg made sense, so we are getting there...
Probably a 2.6.23 target, leaving the bidi bits a revision cycle to get
sorted out.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-04-02 Thread Ed Lin


> -Original Message-
> From: Brian King [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 9:05 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
> Subject: Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update
> 
> 
> Ed Lin wrote:
> > @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
> > u32 data;
> > int result = SUCCESS;
> > unsigned long flags;
> > +
> > +   printk(KERN_INFO DRV_NAME
> > +   "(%s): aborting command\n", pci_name(hba->pdev));
> > +   scsi_print_command(cmd);
> > +
> 
> scmd_printk is probably what you want here rather than just a printk.
> 
> scmd_printk(KERN_ERR, cmd, "aborting command\n");
> 
> 
> > @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
> > unsigned long before;
> > hba = (struct st_hba *) &cmd->device->host->hostdata[0];
> > 
> > +   printk(KERN_INFO DRV_NAME
> > +   "(%s): resetting host\n", pci_name(hba->pdev));
> > +   scsi_print_command(cmd);
> > +
> 
> Same here.

Well it's just because printk with pci_name stuff is used across
the driver, and I didn't want to break the rule... 
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
> I just saw the routine name scsi_eh_try_stu, and didn't notice the
> allow_restart (partly because I thought it was not harmful...).
> But the TEST_UNIT_READY must stay.

Sure ... I was just checking since your change log implied you'd seen
the problem from the error handler ... however, we can add it ...
there's a possibility of getting spin up on init from sd anyway.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread Mike Christie
Jens Axboe wrote:
> On Tue, Apr 03 2007, FUJITA Tomonori wrote:
>> From: James Bottomley <[EMAIL PROTECTED]>
>> Subject: Re: SMP pass through interface via bsg
>> Date: Mon, 02 Apr 2007 12:01:41 -0500
>>
>>> On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
 OK. I found another bug in smp_test tool (sends bogus response buffer
 len to kernel). I've uploaded a new patch:

 http://zaal.org/bsg/smp-test2.diff
>>> That sort of works; you have a final bug in that the manufacturer info
>>> response frame is 64 bytes, not 128 bytes, but with that corrected
>>> everything goes through and I get the result back:
>>>
>>> hobholes:~# /home/jejb/git/sgv4-tools/smp_test /sys/class/bsg/expander-2\:0
>>>   SAS-1.1 format: 0
>>>   vendor identification: LSILOGIC
>>>   product identification: SASx12 A.0  
>>>   product revision level: 
>>>
>>> So we can class this one as a success ... 
>>>
>>> Thanks!
>> Great! Thanks. I'll try to finish the mpt driver's hook
>> sometime. Finally, We have a bsg user (though it also needs proper
>> bidi support).
>>
>> Jens, what remains to be done before bsg is merged into mainline?
> 
> Well the bi-dir stuff and sg v4 design were the two bits that needed to
> get done before pushing bsg made sense, so we are getting there...
> Probably a 2.6.23 target, leaving the bidi bits a revision cycle to get
> sorted out.
> 

Could we get the bidi parts in without having to do the other sg clean
ups (my patches to merge the blk_rq_map* with the scsi ULD (sg, etc,
etc) equivalents or do the clean ups have to be done first?
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Salyzyn, Mark
Duane discovered in the scsi-misc-2.6 code that the reset handler could
be called without the sync command handler set up resulting in a panic.
Judith discovered this issue within minutes and has recently reported
it. Here is a fix.

IMHO, this needs to be applied immediately regardless of the status of
the kexec patch as this issue is present in the scsi-misc-2.6 driver for
all existing init-time recovery actions. This patch in principal would
not be different w/o the kexec patch.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patches.

This attached patch is against current scsi-misc-2.6
 
Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>

---

Sincerely -- Mark Salyzyn


aacraid_kexec_2.patch
Description: aacraid_kexec_2.patch


Re: SMP pass through interface via bsg

2007-04-02 Thread Jens Axboe
On Mon, Apr 02 2007, Mike Christie wrote:
> Jens Axboe wrote:
> > On Tue, Apr 03 2007, FUJITA Tomonori wrote:
> >> From: James Bottomley <[EMAIL PROTECTED]>
> >> Subject: Re: SMP pass through interface via bsg
> >> Date: Mon, 02 Apr 2007 12:01:41 -0500
> >>
> >>> On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
>  OK. I found another bug in smp_test tool (sends bogus response buffer
>  len to kernel). I've uploaded a new patch:
> 
>  http://zaal.org/bsg/smp-test2.diff
> >>> That sort of works; you have a final bug in that the manufacturer info
> >>> response frame is 64 bytes, not 128 bytes, but with that corrected
> >>> everything goes through and I get the result back:
> >>>
> >>> hobholes:~# /home/jejb/git/sgv4-tools/smp_test 
> >>> /sys/class/bsg/expander-2\:0
> >>>   SAS-1.1 format: 0
> >>>   vendor identification: LSILOGIC
> >>>   product identification: SASx12 A.0  
> >>>   product revision level: 
> >>>
> >>> So we can class this one as a success ... 
> >>>
> >>> Thanks!
> >> Great! Thanks. I'll try to finish the mpt driver's hook
> >> sometime. Finally, We have a bsg user (though it also needs proper
> >> bidi support).
> >>
> >> Jens, what remains to be done before bsg is merged into mainline?
> > 
> > Well the bi-dir stuff and sg v4 design were the two bits that needed to
> > get done before pushing bsg made sense, so we are getting there...
> > Probably a 2.6.23 target, leaving the bidi bits a revision cycle to get
> > sorted out.
> > 
> 
> Could we get the bidi parts in without having to do the other sg clean
> ups (my patches to merge the blk_rq_map* with the scsi ULD (sg, etc,
> etc) equivalents or do the clean ups have to be done first?

IMO the sg cleanups are an orthogonal effort.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SMP pass through interface via bsg

2007-04-02 Thread FUJITA Tomonori
From: Jens Axboe <[EMAIL PROTECTED]>
Subject: Re: SMP pass through interface via bsg
Date: Mon, 2 Apr 2007 20:12:38 +0200

> On Tue, Apr 03 2007, FUJITA Tomonori wrote:
> > From: James Bottomley <[EMAIL PROTECTED]>
> > Subject: Re: SMP pass through interface via bsg
> > Date: Mon, 02 Apr 2007 12:01:41 -0500
> > 
> > > On Tue, 2007-04-03 at 01:43 +0900, FUJITA Tomonori wrote:
> > > > OK. I found another bug in smp_test tool (sends bogus response buffer
> > > > len to kernel). I've uploaded a new patch:
> > > > 
> > > > http://zaal.org/bsg/smp-test2.diff
> > > 
> > > That sort of works; you have a final bug in that the manufacturer info
> > > response frame is 64 bytes, not 128 bytes, but with that corrected
> > > everything goes through and I get the result back:
> > > 
> > > hobholes:~# /home/jejb/git/sgv4-tools/smp_test 
> > > /sys/class/bsg/expander-2\:0
> > >   SAS-1.1 format: 0
> > >   vendor identification: LSILOGIC
> > >   product identification: SASx12 A.0  
> > >   product revision level: 
> > > 
> > > So we can class this one as a success ... 
> > >
> > > Thanks!
> > 
> > Great! Thanks. I'll try to finish the mpt driver's hook
> > sometime. Finally, We have a bsg user (though it also needs proper
> > bidi support).
> > 
> > Jens, what remains to be done before bsg is merged into mainline?
> 
> Well the bi-dir stuff and sg v4 design were the two bits that needed to
> get done before pushing bsg made sense, so we are getting there...
> Probably a 2.6.23 target, leaving the bidi bits a revision cycle to get
> sorted out.

sg v4 design issues refer to:

- iovec support

- an alternative to the read/write interface

- scsi command tag (possibly larger than 64 bits)

What else?
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] scsi: Add EH Start Unit retry

2007-04-02 Thread Brian King
Douglas Gilbert wrote:
> Brian King wrote:
>> Currently, the scsi error handler will issue a START_UNIT
>> command if the drive indicates it needs its motor started
>> and the allow_restart flag is set in the scsi_device. If,
>> after the scsi error handler invokes a host adapter reset
>> due to error recovery, a device is in a unit attention
>> state AND also needs a START_UNIT, that device will be placed
>> offline. The disk array devices on an ipr RAID adapter
>> will do exactly this when in a dual initiator configuration.
>> This patch adds a single retry to the EH initiated
>> START_UNIT.
> 
> I have no objection to this patch. Just seems a pity
> that SCSI devices go to the trouble of sending unit
> attentions while OSes just throw them away.

I agree. The reason the ipr adapter firmware added this UA
in this configuration was to support SCSI 1 reservations and
communicate to the host that any reservation previously
held to the disk array is now lost since the adapter was reset.

> Perhaps the scsi_device sysfs directory could have entries
> like:
>   last_ua_asc
>   last_ua_ascq
>   last_ua_timestamp
> where code could place the asc/ascq codes and a timestamp
> then continue doing a retry.
> Could we get a log entry, hotplug event?

If we did have a way to communicate UA's to userspace like this
it seems like it would allow usage of SCSI 1 reservations
in this config and make it easier for an out of band tool to
manage these reservations. I wonder if it would be cleaner if
UAs could simply be sent up as netlink events / uevents, so they
could contain all the information needed in one packet, rather
than having to read sysfs attributes to figure out what happened.

> Logical units may queue unit attentions (sam4r10.pdf
> section 5.8.7) so it is possible that one retry may
> not be enough. With my suggestion above, only the last
> one would persist for a reasonable time.

Yep. I've already ran into that with dual ported SAS devices.
While one retry is sufficient for the ipr disk array devices I am
trying to fix this for, I have no objection to increasing it.
Maybe its just a case of increasing it later if it ends up
being an issue.

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Luben Tuikov
--- Tejun Heo <[EMAIL PROTECTED]> wrote:
> Cornelia Huck wrote:
> > On Sat, 31 Mar 2007 00:08:19 +0900,
> > Tejun Heo <[EMAIL PROTECTED]> wrote:
> > 
> >> (3) make sure all existing kobjects are released by module exit function.
> >>
> >> For example, let's say there is a hypothetical disk device /dev/dk0
> >> driven by a hypothetical driver mydrv.  /dev/dk0 is represented like the
> >> following in the sysfs tree.
> >>
> >> /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1}
> >>
> >> Owner of both attrs myknob0 and myknob1 is mydrv and opening either
> >> increases the reference counts of dk0 and mydrv and closing does the
> >> opposite.
> >>
> >> * When there is no opener of either knob and the /dev/dk0 isn't used by
> >> anyone.  Reference count of dk0 is 1, mydrv 0.
> > 
> > Hm, but as long as dk0 is registered, it can be looked up and someone
> > could get a reference on it.
> 
> Yeah, exactly.  That's why any getting any kobject reference backed by a
> module must be accompanied by try_module_get().

Tejun,

This has always been the case.

By design, the kobject infrastructure is one such that allows for
different behavior implementations.  You could use it to accomplish
what you want and describe in your RFD, or you could use it to implement
what we have today or you could use it to implement completely different
(refcount) infrastructure.

Luben


> 
> int mydrv_get_dk(struct dk *dk)
> {
>   rc = try_module_get(mydrv);
>   if (rc)
>   return rc;
>   kobject_get(&dk->kobj);
>   return 0;
> }
> 
> >> * User issues rmmod mydrv.  As mydrv's reference count is zero, unload
> >> proceeds and mydrv's exit function is called.
> >>
> >> * mydrv's exit function looks like the following.
> >>
> >>   mydrv_exit()
> >>   {
> >>sysfs_remove_file(dk0, myknob0);
> >>sysfs_remove_file(dk1, myknob1);
> >>device_del(dk0);
> >>deinit controller;
> >>release all resources;
> >>   }
> >>
> >> The device_del(dk0) drops dk0's reference count to zero and its
> >> ->release is invoked immediately.
> > 
> > And here is the problem if someone else still has a reference. The
> > module will be unloaded, but ->release will not be called until the
> > "someone else" gives up the reference...
> 
> Exactly, in that case, module reference count must not be zero.  You and
> I are saying the same thing.  Why are we running in circle?
> 
> -- 
> tejun
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Fix scsi_send_eh_cmnd scatterlist handling

2007-04-02 Thread David Miller

This fixes a regression caused by commit:

2dc611de5a3fd955cd0298c50691d4c05046db97

The sense buffer code in scsi_send_eh_cmnd was changed to use
alloc_page() and a scatter list, but the sense data copy was not
updated to match so what we actually get in the sense buffer is total
grabage starting with the kernel address of the struct page we got.
Basically the stack frame of scsi_send_eh_cmd() is what ends up
in the sense buffer.

Depending upon how pointers look on a given platform, you can
end up getting sr_ioctl.c errors when you mount a cdrom.  If
the CDROM gives a check condition for GPCMD_GET_CONFIGURATION issued
by drivers/cdrom/cdrom.c:cdrom_mmc_profile(), sr_ioctl will
spit out this error message in sr_do_ioctl() with the way pointers
are on sparc64:

default:
printk(KERN_ERR "%s: CDROM (ioctl) error, command: ", 
cd->cdi.name);
__scsi_print_command(cgc->cmd);
scsi_print_sense_hdr("sr", &sshdr);
err = -EIO;

This is the error Tom Callaway reported in:

http://marc.info/?l=linux-sparc&m=117407453208101&w=2

Anyways, fix this by using page_address(sgl.page) which is OK
because we know this is low-mem due to GFP_ATOMIC.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b8edcf5..918bb60 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -716,7 +716,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
 */
if (copy_sense) {
if (!SCSI_SENSE_VALID(scmd)) {
-   memcpy(scmd->sense_buffer, scmd->request_buffer,
+   memcpy(scmd->sense_buffer, page_address(sgl.page),
   sizeof(scmd->sense_buffer));
}
__free_page(sgl.page);
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Fix scsi_send_eh_cmnd scatterlist handling

2007-04-02 Thread Christoph Hellwig
On Mon, Apr 02, 2007 at 12:24:39PM -0700, David Miller wrote:
> 
> This fixes a regression caused by commit:
> 
> 2dc611de5a3fd955cd0298c50691d4c05046db97
> 
> The sense buffer code in scsi_send_eh_cmnd was changed to use
> alloc_page() and a scatter list, but the sense data copy was not
> updated to match so what we actually get in the sense buffer is total
> grabage starting with the kernel address of the struct page we got.
> Basically the stack frame of scsi_send_eh_cmd() is what ends up
> in the sense buffer.
> 
> Depending upon how pointers look on a given platform, you can
> end up getting sr_ioctl.c errors when you mount a cdrom.  If
> the CDROM gives a check condition for GPCMD_GET_CONFIGURATION issued
> by drivers/cdrom/cdrom.c:cdrom_mmc_profile(), sr_ioctl will
> spit out this error message in sr_do_ioctl() with the way pointers
> are on sparc64:
> 
>   default:
>   printk(KERN_ERR "%s: CDROM (ioctl) error, command: ", 
> cd->cdi.name);
>   __scsi_print_command(cgc->cmd);
>   scsi_print_sense_hdr("sr", &sshdr);
>   err = -EIO;
> 
> This is the error Tom Callaway reported in:
> 
> http://marc.info/?l=linux-sparc&m=117407453208101&w=2
> 
> Anyways, fix this by using page_address(sgl.page) which is OK
> because we know this is low-mem due to GFP_ATOMIC.

The patch looks correct to me and thanks for spotting this stupid
bug!
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Luben Tuikov
--- James Bottomley <[EMAIL PROTECTED]> wrote:

> I'd favour trying to separate kobject and struct device for this ...
> move all the sysfs stuff into kobject and device only stuff into struct
^^^

Currently the kobject implementation is pure and well-defined.  It is
a good implementation [kobject], and I'd hate to see it lost into being
convoluted with/into another model.

Currently the infrastructure layers are well defined:
   kobject ->  (A layer with objects, their behavor and implementation)
  device ->   (--"--)
  sysfs.   (--"--)
This isn't that bad of an infrastructure.

It is this well defined layering, i.e. objects, their behavior and
implementation, that allows different (better/worse) infrastructures
to be built on top of it.

It is this well-defined layering which will allow what Tejun wants
to be implemented.

> device ... but that would get us into disentangling the ksets, which, on
> balance, isn't going to be fun ...

 Luben

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Judith Lebzelter
On Mon, Apr 02, 2007 at 02:34:36PM -0400, Salyzyn, Mark wrote:
> Duane discovered in the scsi-misc-2.6 code that the reset handler could
> be called without the sync command handler set up resulting in a panic.
> Judith discovered this issue within minutes and has recently reported
> it. Here is a fix.

Mark,

I applied this patch and ran a kexec test again and I still 
got a panic:

Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)^M
ko module^M
ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 (level, low) -> IRQ 
3^M
Unable to handle kernel NULL pointer dereference at  RIP: ^M
 [<>]^M
PGD 4791067 PUD 473c067 PMD 0 ^M
Oops: 0010 [1] ^M
CPU 0 ^M
Modules linked in: aacraid^M
Pid: 977, comm: insmod Not tainted 2.6.21-rc3-kdump #1^M
RIP: 0010:[<>]  [<>]^M
RSP: :81000474dbf0  EFLAGS: 00010246^M
RAX: c201 RBX: 810004fe4cd8 RCX: 5b540e96^M
RDX: c201 RSI: 81000443cf40 RDI: 810004fe4cd8^M
RBP: fffee138 R08: 81001c20 R09: 8143593e^M
R10: 810004c537a0 R11:  R12: 81000474dc7c^M
R13:  R14:  R15: ^M
FS:  0057b850(0063) GS:814d6000() knlGS:^M
CS:  0010 DS:  ES:  CR0: 8005003b^M
CR2:  CR3: 04745000 CR4: 06e0^M
Process insmod (pid: 977, threadinfo 81000474c000, task 81000443cf40)^M
Stack:  88008e82 3e00fc1f  810004fe4cd8^M
 810004fe4800  8800a6dd 0032^M
 88008c3b   81000474dc7c^M
Call Trace:^M
 [] :aacraid:rx_sync_cmd+0x15c/0x16a^M
 [] :aacraid:aac_rx_restart_adapter+0x7e/0x169^M
 [] :aacraid:_aac_rx_init+0x7b/0x2fc^M
 [] :aacraid:aac_probe_one+0x1a2/0x457^M
 [] pci_device_probe+0x4c/0x75^M
 [] really_probe+0xc4/0x148^M
 [] __driver_attach+0x6d/0xab^M
 [] __driver_attach+0x0/0xab^M
 [] __driver_attach+0x0/0xab^M
 [] bus_for_each_dev+0x43/0x6e^M
 [] bus_add_driver+0x6b/0x18d^M
 [] __pci_register_driver+0x72/0xa7^M
 [] :aacraid:aac_init+0x3a/0x75^M
 [] sys_init_module+0x1195/0x12e6^M
 [] system_call+0x7e/0x83^M
^M
^M
Code:  Bad RIP value.^M
RIP  [<>]^M
 RSP ^M
CR2: ^M

There is an extra line in the call trace for the 'rx_sync_cmd'.

Judith

> 
> IMHO, this needs to be applied immediately regardless of the status of
> the kexec patch as this issue is present in the scsi-misc-2.6 driver for
> all existing init-time recovery actions. This patch in principal would
> not be different w/o the kexec patch.
> 
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patches.
> 
> This attached patch is against current scsi-misc-2.6
>  
> Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>
> 
> ---
> 
> Sincerely -- Mark Salyzyn


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Salyzyn, Mark
Judith, another layer on this onion also discovered by Duane, the
interrupt enable handler also needed to be set ... The interrupt enable
was called from within the synchronous command handler.

My private email with the fix was sent a whole 5 minutes ahead of yours
;-> Here is the jist of it for the observers:

/* Failure to reset here is an option ... */
dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
+   dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
dev->OIMR = status = rx_readb (dev, MUnit.OIMR);

Yes, the disable interrupt method patched into the enable int platform
function. Later init code will reset it accordingly.

Sincerely -- Mark Salyzyn


> -Original Message-
> From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 3:44 PM
> To: Salyzyn, Mark
> Cc: Judith Lebzelter; James Bottomley; 
> linux-scsi@vger.kernel.org; Duane Cox
> Subject: Re: [PATCH] aacraid: Panics during init time reset 
> (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> during 'insmod' for kexec test)
> 
> 
> On Mon, Apr 02, 2007 at 02:34:36PM -0400, Salyzyn, Mark wrote:
> > Duane discovered in the scsi-misc-2.6 code that the reset 
> handler could
> > be called without the sync command handler set up resulting 
> in a panic.
> > Judith discovered this issue within minutes and has 
> recently reported
> > it. Here is a fix.
> 
> Mark,
> 
> I applied this patch and ran a kexec test again and I still 
> got a panic:
> 
> Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)^M
> ko module^M
> ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 
> (level, low) -> IRQ 3^M
> Unable to handle kernel NULL pointer dereference at 
>  RIP: ^M
>  [<>]^M
> PGD 4791067 PUD 473c067 PMD 0 ^M
> Oops: 0010 [1] ^M
> CPU 0 ^M
> Modules linked in: aacraid^M
> Pid: 977, comm: insmod Not tainted 2.6.21-rc3-kdump #1^M
> RIP: 0010:[<>]  [<>]^M
> RSP: :81000474dbf0  EFLAGS: 00010246^M
> RAX: c201 RBX: 810004fe4cd8 RCX: 5b540e96^M
> RDX: c201 RSI: 81000443cf40 RDI: 810004fe4cd8^M
> RBP: fffee138 R08: 81001c20 R09: 8143593e^M
> R10: 810004c537a0 R11:  R12: 81000474dc7c^M
> R13:  R14:  R15: ^M
> FS:  0057b850(0063) GS:814d6000() 
> knlGS:^M
> CS:  0010 DS:  ES:  CR0: 8005003b^M
> CR2:  CR3: 04745000 CR4: 06e0^M
> Process insmod (pid: 977, threadinfo 81000474c000, task 
> 81000443cf40)^M
> Stack:  88008e82 3e00fc1f  
> 810004fe4cd8^M
>  810004fe4800  8800a6dd 0032^M
>  88008c3b   81000474dc7c^M
> Call Trace:^M
>  [] :aacraid:rx_sync_cmd+0x15c/0x16a^M
>  [] :aacraid:aac_rx_restart_adapter+0x7e/0x169^M
>  [] :aacraid:_aac_rx_init+0x7b/0x2fc^M
>  [] :aacraid:aac_probe_one+0x1a2/0x457^M
>  [] pci_device_probe+0x4c/0x75^M
>  [] really_probe+0xc4/0x148^M
>  [] __driver_attach+0x6d/0xab^M
>  [] __driver_attach+0x0/0xab^M
>  [] __driver_attach+0x0/0xab^M
>  [] bus_for_each_dev+0x43/0x6e^M
>  [] bus_add_driver+0x6b/0x18d^M
>  [] __pci_register_driver+0x72/0xa7^M
>  [] :aacraid:aac_init+0x3a/0x75^M
>  [] sys_init_module+0x1195/0x12e6^M
>  [] system_call+0x7e/0x83^M
> ^M
> ^M
> Code:  Bad RIP value.^M
> RIP  [<>]^M
>  RSP ^M
> CR2: ^M
> 
> There is an extra line in the call trace for the 'rx_sync_cmd'.
> 
> Judith
> 
> > 
> > IMHO, this needs to be applied immediately regardless of 
> the status of
> > the kexec patch as this issue is present in the 
> scsi-misc-2.6 driver for
> > all existing init-time recovery actions. This patch in 
> principal would
> > not be different w/o the kexec patch.
> > 
> > ObligatoryDisclaimer: Please accept my condolences 
> regarding Outlook's
> > handling of patches.
> > 
> > This attached patch is against current scsi-misc-2.6
> >  
> > Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > Sincerely -- Mark Salyzyn
> 
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Judith Lebzelter
On Mon, Apr 02, 2007 at 03:56:50PM -0400, Salyzyn, Mark wrote:
> Judith, another layer on this onion also discovered by Duane, the
> interrupt enable handler also needed to be set ... The interrupt enable
> was called from within the synchronous command handler.
> 
> My private email with the fix was sent a whole 5 minutes ahead of yours
> ;-> Here is the jist of it for the observers:

I saw it the second I hit 'submit'.

> 
> /* Failure to reset here is an option ... */
> dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> +   dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
> dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> 
> Yes, the disable interrupt method patched into the enable int platform
> function. Later init code will reset it accordingly.
> 
> Sincerely -- Mark Salyzyn

Yes, that works now.  I will run a series to check if the other IRQ interrupt 
problem is resolved.

Judith

> 
> 
> > -Original Message-
> > From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> > Sent: Monday, April 02, 2007 3:44 PM
> > To: Salyzyn, Mark
> > Cc: Judith Lebzelter; James Bottomley; 
> > linux-scsi@vger.kernel.org; Duane Cox
> > Subject: Re: [PATCH] aacraid: Panics during init time reset 
> > (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> > during 'insmod' for kexec test)
> > 
> > 
> > On Mon, Apr 02, 2007 at 02:34:36PM -0400, Salyzyn, Mark wrote:
> > > Duane discovered in the scsi-misc-2.6 code that the reset 
> > handler could
> > > be called without the sync command handler set up resulting 
> > in a panic.
> > > Judith discovered this issue within minutes and has 
> > recently reported
> > > it. Here is a fix.
> > 
> > Mark,
> > 
> > I applied this patch and ran a kexec test again and I still 
> > got a panic:
> > 
> > Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)^M
> > ko module^M
> > ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 
> > (level, low) -> IRQ 3^M
> > Unable to handle kernel NULL pointer dereference at 
> >  RIP: ^M
> >  [<>]^M
> > PGD 4791067 PUD 473c067 PMD 0 ^M
> > Oops: 0010 [1] ^M
> > CPU 0 ^M
> > Modules linked in: aacraid^M
> > Pid: 977, comm: insmod Not tainted 2.6.21-rc3-kdump #1^M
> > RIP: 0010:[<>]  [<>]^M
> > RSP: :81000474dbf0  EFLAGS: 00010246^M
> > RAX: c201 RBX: 810004fe4cd8 RCX: 5b540e96^M
> > RDX: c201 RSI: 81000443cf40 RDI: 810004fe4cd8^M
> > RBP: fffee138 R08: 81001c20 R09: 8143593e^M
> > R10: 810004c537a0 R11:  R12: 81000474dc7c^M
> > R13:  R14:  R15: ^M
> > FS:  0057b850(0063) GS:814d6000() 
> > knlGS:^M
> > CS:  0010 DS:  ES:  CR0: 8005003b^M
> > CR2:  CR3: 04745000 CR4: 06e0^M
> > Process insmod (pid: 977, threadinfo 81000474c000, task 
> > 81000443cf40)^M
> > Stack:  88008e82 3e00fc1f  
> > 810004fe4cd8^M
> >  810004fe4800  8800a6dd 0032^M
> >  88008c3b   81000474dc7c^M
> > Call Trace:^M
> >  [] :aacraid:rx_sync_cmd+0x15c/0x16a^M
> >  [] :aacraid:aac_rx_restart_adapter+0x7e/0x169^M
> >  [] :aacraid:_aac_rx_init+0x7b/0x2fc^M
> >  [] :aacraid:aac_probe_one+0x1a2/0x457^M
> >  [] pci_device_probe+0x4c/0x75^M
> >  [] really_probe+0xc4/0x148^M
> >  [] __driver_attach+0x6d/0xab^M
> >  [] __driver_attach+0x0/0xab^M
> >  [] __driver_attach+0x0/0xab^M
> >  [] bus_for_each_dev+0x43/0x6e^M
> >  [] bus_add_driver+0x6b/0x18d^M
> >  [] __pci_register_driver+0x72/0xa7^M
> >  [] :aacraid:aac_init+0x3a/0x75^M
> >  [] sys_init_module+0x1195/0x12e6^M
> >  [] system_call+0x7e/0x83^M
> > ^M
> > ^M
> > Code:  Bad RIP value.^M
> > RIP  [<>]^M
> >  RSP ^M
> > CR2: ^M
> > 
> > There is an extra line in the call trace for the 'rx_sync_cmd'.
> > 
> > Judith
> > 
> > > 
> > > IMHO, this needs to be applied immediately regardless of 
> > the status of
> > > the kexec patch as this issue is present in the 
> > scsi-misc-2.6 driver for
> > > all existing init-time recovery actions. This patch in 
> > principal would
> > > not be different w/o the kexec patch.
> > > 
> > > ObligatoryDisclaimer: Please accept my condolences 
> > regarding Outlook's
> > > handling of patches.
> > > 
> > > This attached patch is against current scsi-misc-2.6
> > >  
> > > Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>
> > > 
> > > ---
> > > 
> > > Sincerely -- Mark Salyzyn
> > 
> > 
> > 
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Salyzyn, Mark
Cool! Look forward to your results.

Sorry for being so snitty sounding with the 5 minute comment, I was just
admiring the pace of activity!

Sincerely -- Mark Salyzyn

> -Original Message-
> From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 4:17 PM
> To: Salyzyn, Mark
> Cc: Judith Lebzelter; James Bottomley; 
> linux-scsi@vger.kernel.org; Duane Cox
> Subject: Re: [PATCH] aacraid: Panics during init time reset 
> (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> during 'insmod' for kexec test)
> 
> 
> On Mon, Apr 02, 2007 at 03:56:50PM -0400, Salyzyn, Mark wrote:
> > Judith, another layer on this onion also discovered by Duane, the
> > interrupt enable handler also needed to be set ... The 
> interrupt enable
> > was called from within the synchronous command handler.
> > 
> > My private email with the fix was sent a whole 5 minutes 
> ahead of yours
> > ;-> Here is the jist of it for the observers:
> 
> I saw it the second I hit 'submit'.
> 
> > 
> > /* Failure to reset here is an option ... */
> > dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> > +   dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
> > dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> > 
> > Yes, the disable interrupt method patched into the enable 
> int platform
> > function. Later init code will reset it accordingly.
> > 
> > Sincerely -- Mark Salyzyn
> 
> Yes, that works now.  I will run a series to check if the 
> other IRQ interrupt 
> problem is resolved.
> 
> Judith
> 
> > 
> > 
> > > -Original Message-
> > > From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> > > Sent: Monday, April 02, 2007 3:44 PM
> > > To: Salyzyn, Mark
> > > Cc: Judith Lebzelter; James Bottomley; 
> > > linux-scsi@vger.kernel.org; Duane Cox
> > > Subject: Re: [PATCH] aacraid: Panics during init time reset 
> > > (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> > > during 'insmod' for kexec test)
> > > 
> > > 
> > > On Mon, Apr 02, 2007 at 02:34:36PM -0400, Salyzyn, Mark wrote:
> > > > Duane discovered in the scsi-misc-2.6 code that the reset 
> > > handler could
> > > > be called without the sync command handler set up resulting 
> > > in a panic.
> > > > Judith discovered this issue within minutes and has 
> > > recently reported
> > > > it. Here is a fix.
> > > 
> > > Mark,
> > > 
> > > I applied this patch and ran a kexec test again and I still 
> > > got a panic:
> > > 
> > > Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)^M
> > > ko module^M
> > > ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 
> > > (level, low) -> IRQ 3^M
> > > Unable to handle kernel NULL pointer dereference at 
> > >  RIP: ^M
> > >  [<>]^M
> > > PGD 4791067 PUD 473c067 PMD 0 ^M
> > > Oops: 0010 [1] ^M
> > > CPU 0 ^M
> > > Modules linked in: aacraid^M
> > > Pid: 977, comm: insmod Not tainted 2.6.21-rc3-kdump #1^M
> > > RIP: 0010:[<>]  [<>]^M
> > > RSP: :81000474dbf0  EFLAGS: 00010246^M
> > > RAX: c201 RBX: 810004fe4cd8 RCX: 
> 5b540e96^M
> > > RDX: c201 RSI: 81000443cf40 RDI: 
> 810004fe4cd8^M
> > > RBP: fffee138 R08: 81001c20 R09: 
> 8143593e^M
> > > R10: 810004c537a0 R11:  R12: 
> 81000474dc7c^M
> > > R13:  R14:  R15: 
> ^M
> > > FS:  0057b850(0063) GS:814d6000() 
> > > knlGS:^M
> > > CS:  0010 DS:  ES:  CR0: 8005003b^M
> > > CR2:  CR3: 04745000 CR4: 
> 06e0^M
> > > Process insmod (pid: 977, threadinfo 81000474c000, task 
> > > 81000443cf40)^M
> > > Stack:  88008e82 3e00fc1f  
> > > 810004fe4cd8^M
> > >  810004fe4800  8800a6dd 
> 0032^M
> > >  88008c3b   
> 81000474dc7c^M
> > > Call Trace:^M
> > >  [] :aacraid:rx_sync_cmd+0x15c/0x16a^M
> > >  [] :aacraid:aac_rx_restart_adapter+0x7e/0x169^M
> > >  [] :aacraid:_aac_rx_init+0x7b/0x2fc^M
> > >  [] :aacraid:aac_probe_one+0x1a2/0x457^M
> > >  [] pci_device_probe+0x4c/0x75^M
> > >  [] really_probe+0xc4/0x148^M
> > >  [] __driver_attach+0x6d/0xab^M
> > >  [] __driver_attach+0x0/0xab^M
> > >  [] __driver_attach+0x0/0xab^M
> > >  [] bus_for_each_dev+0x43/0x6e^M
> > >  [] bus_add_driver+0x6b/0x18d^M
> > >  [] __pci_register_driver+0x72/0xa7^M
> > >  [] :aacraid:aac_init+0x3a/0x75^M
> > >  [] sys_init_module+0x1195/0x12e6^M
> > >  [] system_call+0x7e/0x83^M
> > > ^M
> > > ^M
> > > Code:  Bad RIP value.^M
> > > RIP  [<>]^M
> > >  RSP ^M
> > > CR2: ^M
> > > 
> > > There is an extra line in the call trace for the 'rx_sync_cmd'.
> > > 
> > > Judith
> > > 
> > > > 
> > > > IMHO, this needs to be applied immediately regardless of 
> > > the status of
> > > > the kexec patch as this iss

Re: [PATCH] aacraid: Panics during init time reset (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test)

2007-04-02 Thread Judith Lebzelter
On Mon, Apr 02, 2007 at 04:29:35PM -0400, Salyzyn, Mark wrote:
> Cool! Look forward to your results.
> 
> Sorry for being so snitty sounding with the 5 minute comment, I was just
> admiring the pace of activity!

No problem.  I thought it was funny.:)

Judith
> 
> Sincerely -- Mark Salyzyn
> 
> > -Original Message-
> > From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> > Sent: Monday, April 02, 2007 4:17 PM
> > To: Salyzyn, Mark
> > Cc: Judith Lebzelter; James Bottomley; 
> > linux-scsi@vger.kernel.org; Duane Cox
> > Subject: Re: [PATCH] aacraid: Panics during init time reset 
> > (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> > during 'insmod' for kexec test)
> > 
> > 
> > On Mon, Apr 02, 2007 at 03:56:50PM -0400, Salyzyn, Mark wrote:
> > > Judith, another layer on this onion also discovered by Duane, the
> > > interrupt enable handler also needed to be set ... The 
> > interrupt enable
> > > was called from within the synchronous command handler.
> > > 
> > > My private email with the fix was sent a whole 5 minutes 
> > ahead of yours
> > > ;-> Here is the jist of it for the observers:
> > 
> > I saw it the second I hit 'submit'.
> > 
> > > 
> > > /* Failure to reset here is an option ... */
> > > dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> > > +   dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
> > > dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> > > 
> > > Yes, the disable interrupt method patched into the enable 
> > int platform
> > > function. Later init code will reset it accordingly.
> > > 
> > > Sincerely -- Mark Salyzyn
> > 
> > Yes, that works now.  I will run a series to check if the 
> > other IRQ interrupt 
> > problem is resolved.
> > 
> > Judith
> > 
> > > 
> > > 
> > > > -Original Message-
> > > > From: Judith Lebzelter [mailto:[EMAIL PROTECTED] 
> > > > Sent: Monday, April 02, 2007 3:44 PM
> > > > To: Salyzyn, Mark
> > > > Cc: Judith Lebzelter; James Bottomley; 
> > > > linux-scsi@vger.kernel.org; Duane Cox
> > > > Subject: Re: [PATCH] aacraid: Panics during init time reset 
> > > > (Was: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver 
> > > > during 'insmod' for kexec test)
> > > > 
> > > > 
> > > > On Mon, Apr 02, 2007 at 02:34:36PM -0400, Salyzyn, Mark wrote:
> > > > > Duane discovered in the scsi-misc-2.6 code that the reset 
> > > > handler could
> > > > > be called without the sync command handler set up resulting 
> > > > in a panic.
> > > > > Judith discovered this issue within minutes and has 
> > > > recently reported
> > > > > it. Here is a fix.
> > > > 
> > > > Mark,
> > > > 
> > > > I applied this patch and ran a kexec test again and I still 
> > > > got a panic:
> > > > 
> > > > Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)^M
> > > > ko module^M
> > > > ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 
> > > > (level, low) -> IRQ 3^M
> > > > Unable to handle kernel NULL pointer dereference at 
> > > >  RIP: ^M
> > > >  [<>]^M
> > > > PGD 4791067 PUD 473c067 PMD 0 ^M
> > > > Oops: 0010 [1] ^M
> > > > CPU 0 ^M
> > > > Modules linked in: aacraid^M
> > > > Pid: 977, comm: insmod Not tainted 2.6.21-rc3-kdump #1^M
> > > > RIP: 0010:[<>]  [<>]^M
> > > > RSP: :81000474dbf0  EFLAGS: 00010246^M
> > > > RAX: c201 RBX: 810004fe4cd8 RCX: 
> > 5b540e96^M
> > > > RDX: c201 RSI: 81000443cf40 RDI: 
> > 810004fe4cd8^M
> > > > RBP: fffee138 R08: 81001c20 R09: 
> > 8143593e^M
> > > > R10: 810004c537a0 R11:  R12: 
> > 81000474dc7c^M
> > > > R13:  R14:  R15: 
> > ^M
> > > > FS:  0057b850(0063) GS:814d6000() 
> > > > knlGS:^M
> > > > CS:  0010 DS:  ES:  CR0: 8005003b^M
> > > > CR2:  CR3: 04745000 CR4: 
> > 06e0^M
> > > > Process insmod (pid: 977, threadinfo 81000474c000, task 
> > > > 81000443cf40)^M
> > > > Stack:  88008e82 3e00fc1f  
> > > > 810004fe4cd8^M
> > > >  810004fe4800  8800a6dd 
> > 0032^M
> > > >  88008c3b   
> > 81000474dc7c^M
> > > > Call Trace:^M
> > > >  [] :aacraid:rx_sync_cmd+0x15c/0x16a^M
> > > >  [] :aacraid:aac_rx_restart_adapter+0x7e/0x169^M
> > > >  [] :aacraid:_aac_rx_init+0x7b/0x2fc^M
> > > >  [] :aacraid:aac_probe_one+0x1a2/0x457^M
> > > >  [] pci_device_probe+0x4c/0x75^M
> > > >  [] really_probe+0xc4/0x148^M
> > > >  [] __driver_attach+0x6d/0xab^M
> > > >  [] __driver_attach+0x0/0xab^M
> > > >  [] __driver_attach+0x0/0xab^M
> > > >  [] bus_for_each_dev+0x43/0x6e^M
> > > >  [] bus_add_driver+0x6b/0x18d^M
> > > >  [] __pci_register_driver+0x72/0xa7^M
> > > >  [] :aacraid:aac_init+0x3a/0x75^M
> > > >  [] sys_init_module+0x1195/0x12e6^M
> > > >  [] system_call+0x7e/

[patch 1/1] scsi: Disable short inquiry log by default

2007-04-02 Thread brking

If a scsi device reports less than 36 bytes of standard inquiry
data, scsi core logs a KERN_INFO printk indicating this. It was
observed that this results in lots of clutter in the log on
systems with devices that respond to a SCSI Inquiry with PQ=3 or
PQ=1 with less than 36 bytes of inquiry data, such as ibmvscsi.
Disable this log by default.

Signed-off-by: Brian King <[EMAIL PROTECTED]>
---

 drivers/scsi/scsi_scan.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/scsi/scsi_scan.c~scsi_scan_short_inq_log 
drivers/scsi/scsi_scan.c
--- linux-2.6/drivers/scsi/scsi_scan.c~scsi_scan_short_inq_log  2007-04-02 
15:10:46.0 -0500
+++ linux-2.6-bjking1/drivers/scsi/scsi_scan.c  2007-04-02 15:13:39.0 
-0500
@@ -662,8 +662,8 @@ static int scsi_probe_lun(struct scsi_de
 * strings.
 */
if (sdev->inquiry_len < 36) {
-   printk(KERN_INFO "scsi scan: INQUIRY result too short (%d),"
-   " using 36\n", sdev->inquiry_len);
+   SCSI_LOG_SCAN_BUS(1, printk(KERN_INFO "scsi scan: INQUIRY 
result too"
+   " short (%d), using 36\n", 
sdev->inquiry_len));
sdev->inquiry_len = 36;
}
 
_
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] scsi: Disable short inquiry log by default

2007-04-02 Thread Matthew Wilcox
On Mon, Apr 02, 2007 at 04:20:22PM -0500, [EMAIL PROTECTED] wrote:
>   if (sdev->inquiry_len < 36) {
> - printk(KERN_INFO "scsi scan: INQUIRY result too short (%d),"
> - " using 36\n", sdev->inquiry_len);
> + SCSI_LOG_SCAN_BUS(1, printk(KERN_INFO "scsi scan: INQUIRY 
> result too"
> + " short (%d), using 36\n", 
> sdev->inquiry_len));
>   sdev->inquiry_len = 36;

The wrapping's a bit awkward.  How about:
+   SCSI_LOG_SCAN_BUS(1, printk(KERN_INFO "scsi scan: INQUIRY "
+   "result too short (%d), using 36\n",
+   sdev->inquiry_len);
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] scsi: Disable short inquiry log by default

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 16:20 -0500, [EMAIL PROTECTED] wrote:
> If a scsi device reports less than 36 bytes of standard inquiry
> data, scsi core logs a KERN_INFO printk indicating this. It was
> observed that this results in lots of clutter in the log on
> systems with devices that respond to a SCSI Inquiry with PQ=3 or
> PQ=1 with less than 36 bytes of inquiry data, such as ibmvscsi.
> Disable this log by default.

It shouldn't be doing this ... the standards are pretty clear, say SPC-3
section 6.4.2

"The standard INQUIRY data shall contain at least 36 bytes"

Horrible things happen when this rule is violated (as it is by some
badly constructed devices), so I'd really like to leave the print in so
we know what we're debugging.

Can't you fix ibmvscsi to be standards compliant?

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin


> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 7:22 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > The internal id/lun mapping of st_vsc and st_vsc1 
> controllers is different
> > from st_shasta. The original driver code can only  map 
> first 16 'entities'
> > for st_vsc and st_vsc1 while there are actually 128 available.
> > 
> > Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
> > no harm because inquiries beyond boundary are discarded by firmware.
> > 
> > The correct internal mapping should be:
> > id:0~15, lun:0~7 (st_shasta)
> > id:0, lun:0~127 (st_yosemite)
> > id:0~127, lun:0 (st_vsc and st_vsc1)
> > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
> with a maximun
> > 'entity' number of 128. The RAID console only interfaces to 
> scsi mid layer
> > and is always mapped at channel:0, id:16, lun:0.
> 
> I'm with Christoph here ... if we're going to break the backwards
> compatibility of the mappings (which your code does) then we 
> could just
> dump channel and use the SCSI id and lun directly.
> 
> Understanding this code is predicated on this quirky definition in
> stex_queuecommand:
> 
>   id = cmd->device->id;
>   lun = cmd->device->channel; /* firmware lun issue work around */
>^^^
> 
> > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
> >  
> > req = stex_alloc_req(hba);
> >  
> > -   if (hba->cardtype == st_yosemite) {
> > -   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
> 
> This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
> takes the next channel.
> 
> > -   req->target = 0;
> > -   } else {
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> These both look to be wrong.  You're taking the channel as the lowest
> common denominator, so your first target is on channel 1 id 
> 0, your next
> on channel 2, id 0 and so on.  That's really going to mess with the
> ordering (which will be user visible) is that really what you want?
> 

How about the attached one? 


s1
Description: s1


RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin


> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 11:28 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> console device
> 
> 
> On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
> > I just saw the routine name scsi_eh_try_stu, and didn't notice the
> > allow_restart (partly because I thought it was not harmful...).
> > But the TEST_UNIT_READY must stay.
> 
> Sure ... I was just checking since your change log implied you'd seen
> the problem from the error handler ... however, we can add it ...
> there's a possibility of getting spin up on init from sd anyway.
> 

You make the decision. But after reconsideration, I think it's better
to remove unused code. It also needs change since the patch about
id mapping is modified in another mail.

How about the attachment here?


s3
Description: s3


Re: aic94xx driver woes

2007-04-02 Thread Douglas Gilbert
James Bottomley wrote:
> On Sun, 2007-04-01 at 16:29 -0400, Douglas Gilbert wrote:
>> ...
>> sas: phy3 added to port0, phy_mask:0x8
>> sas: DOING DISCOVERY on port 0, pid:2110
>> aic94xx: scb:0x80 timed out
> 
> This might be the problem.
> 
> I see this periodically when a phy goes out to lunch on my system ...
> with me, it always seems to be phy0 of a port containing phy0-4 ... so
> phy1-3 still function to get messages.
> 
> Can you try sending a link reset to phy3?
> 
> It should be something like
> 
> echo 1 > /sys/class/sas_phy/phy-X:3/link_reset
> 
> and see if it just produces 
> 
> aic94xx: scb:0x80 timed out

Yes it does.

> Again?

It is repeatable.

Also when I connect to phy 0 it works (both direct
connect and expander). However phys 1 and 2 react
like phy 3 shown above.

Doug Gilbert

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: aic94xx driver woes

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 19:36 -0400, Douglas Gilbert wrote:
> > echo 1 > /sys/class/sas_phy/phy-X:3/link_reset
> > 
> > and see if it just produces 
> > 
> > aic94xx: scb:0x80 timed out
> 
> Yes it does.
> 
> > Again?
> 
> It is repeatable.
> 
> Also when I connect to phy 0 it works (both direct
> connect and expander). However phys 1 and 2 react
> like phy 3 shown above.

OK, well, I know what it is, I just don't know how to fix it.

On certain error conditions, whatever controls the phy SCB processing
seems to freeze to a particular phy.  With me, it's externally induced
(an expander->expander->satapi configuration).  I can recover my system
by powering off and on the expander setup.

Whatever this condition is, it blows away all error recovery, since
they're also done via means of ascbs, so any error recovery ascbs also
get stuck until they timeout.  Someone with the specs needs to look and
see if there's a way we can kick the phy queue (or whatever queue it's
stuck in)---I'm assuming it's a phy queue because I can get packets out
via other phys, just nothing via the stuck one.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-04-02 Thread Tejun Heo
Cornelia Huck wrote:
> On Mon, 2 Apr 2007 11:20:48 +0200,
> Cornelia Huck <[EMAIL PROTECTED]> wrote:
> 
>> Cool. However, there's something fishy there (not sure whether it's in
>> your patch or a latent bug in the ccw bus code that just has been
>> uncovered):
> 
> Similar bug when loading/unloading a module that creates a driver
> attribute. The winner seems to be kfree(sd->s_element) in
> release_sysfs_dirent() (in case of an attribute, it will point to the
> attribute structure, which is usually statically created)...

Thanks for finding it out.  I was suspecting that last minute change.
The code should be

if (dir node)
kfree(s_element)
else if (symlink node)
do things and kfree()

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] scsi: Add EH Start Unit retry

2007-04-02 Thread James Bottomley
On Mon, 2007-04-02 at 13:57 -0500, Brian King wrote:
> I agree. The reason the ipr adapter firmware added this UA
> in this configuration was to support SCSI 1 reservations and
> communicate to the host that any reservation previously
> held to the disk array is now lost since the adapter was reset.

For that particular case, scsi_report_bus_reset() was the original
design concept ... it's asynchronous when the problem occurs ... if you
have to wait until UA, you don't necessarily find there's a problem
until much later.

James



-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html