[PATCH v2] drivers: scsi: scsi_lib.c: use SG_POOL instead of SP

2013-12-03 Thread Chen Gang
The macro name SP is common enough to conflict with other definitions, so
use the more readable, conflict-free name SG_POOL instead (and recommend
that others avoid defining SP as well).

The related warning (with allmodconfig for hexagon):

CC [M]  drivers/scsi/scsi_lib.o
  drivers/scsi/scsi_lib.c:46:0: warning: "SP" redefined [enabled by default]
  arch/hexagon/include/uapi/asm/registers.h:9:0: note: this is the location of the previous definition


Signed-off-by: Chen Gang 
---
 drivers/scsi/scsi_lib.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7bd7f0d..19967fa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -43,28 +43,28 @@ struct scsi_host_sg_pool {
mempool_t   *pool;
 };
 
-#define SP(x) { x, "sgpool-" __stringify(x) }
+#define SG_POOL(x) { x, "sgpool-" __stringify(x) }
 #if (SCSI_MAX_SG_SEGMENTS < 32)
 #error SCSI_MAX_SG_SEGMENTS is too small (must be 32 or greater)
 #endif
 static struct scsi_host_sg_pool scsi_sg_pools[] = {
-   SP(8),
-   SP(16),
+   SG_POOL(8),
+   SG_POOL(16),
 #if (SCSI_MAX_SG_SEGMENTS > 32)
-   SP(32),
+   SG_POOL(32),
 #if (SCSI_MAX_SG_SEGMENTS > 64)
-   SP(64),
+   SG_POOL(64),
 #if (SCSI_MAX_SG_SEGMENTS > 128)
-   SP(128),
+   SG_POOL(128),
 #if (SCSI_MAX_SG_SEGMENTS > 256)
 #error SCSI_MAX_SG_SEGMENTS is too large (256 MAX)
 #endif
 #endif
 #endif
 #endif
-   SP(SCSI_MAX_SG_SEGMENTS)
+   SG_POOL(SCSI_MAX_SG_SEGMENTS)
 };
-#undef SP
+#undef SG_POOL
 
 struct kmem_cache *scsi_sdb_cache;
 
-- 
1.7.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to use SES driver to send SES commands?

2013-12-03 Thread Newtech Tan
Hi Douglas

Thanks!

I am using a MIPS CPU on an embedded system. After cross-compiling the
sg3_utils package, I will try to use sg_senddiag to send SES commands.

> I'm not sure what you expect from the ses driver
> in the kernel. If it likes an SES device (and it
> doesn't like some of them) then you will get a
> small, but useful, set of "knobs" to twiddle and
> read in sysfs. As far as I can see the ses driver does
> not create any device nodes (as found in /dev). This
> means there is no way to extend it by using the SG_IO
> ioctl to send SCSI commands to the enclosure via
> the ses driver. And there is no need since the sg and
> bsg drivers already give you that capability.
> 
> Your question is vague, if by "send SES commands" you
> are referring to SCSI commands then again my advice is
> to look at what sg_senddiag and sg_ses do. If that is not
> what you want then write something yourself using the
> SG_IO ioctl on a bsg or sg device node.
> 
> Doug Gilbert
> 
> 
> On 13-12-02 02:12 AM, Newtech Tan wrote:
> > Hi Douglas
> >
> >   Thanks for your kind reply.
> >
> >   sg_senddiag is the sg_ses utility. But I want to use the SES
> > driver (linux/driver/scsi/ses.c), do you have any advice?
> >
> >
> >> On 13-11-29 05:07 AM, Newtech Tan wrote:
> >>> Hi friends
> >>>
> >>>   I subscribed to the mailing list just now. Would you please give me
> >>> some help?
> >>>
> >>>   Who can tell me how to use the SES driver (linux/driver/scsi/ses.c) to
> >>> send SES commands (SEND DIAGNOSTIC, RECEIVE DIAGNOSTIC RESULTS) in my Linux
> >>> program?
> >>> (I don't want to use sg_ses.)
> >>
> >> Then look at the source for sg_senddiag (sg_senddiag.c).
> >> Your SES device is either /dev/sg2 or /dev/bsg/4:0:0:0
> >>
> >>> The following is my system info. /dev/sg2 is my SES device.
> >>>
> >>> [root@tan-sl dev]# lsscsi -g
> >>> [0:0:0:0]cd/dvd  HL-DT-ST DVD+-RW GSA-H53N B104  /dev/sr0   /dev/sg0
> >>> [3:0:0:0]diskATA  TOSHIBA MK8061GS ME0A  /dev/sda   /dev/sg1
> >>> [4:0:0:0]enclosu LSI  SAS616x  0502  - /dev/sg2
> >>>
> >>> [root@tan-sl enclosure]# pwd
> >>> /sys/class/enclosure
> >>> [root@tan-sl enclosure]# ls -l
> >>> total 0
> >>> lrwxrwxrwx 1 root root 0 Nov 29 14:51 4:0:0:0 -> 
> >>> ../../devices/pci:00/:00:1c.0/:02:00.0/host4/port-4:0/expan
> >>> der-4:0/port-4:0:0/end_device-4:0:0/target4:0:0/4:0:0:0/enclosure/4:0:0:0
> >>>
> >>> In Linux, I can't find the SES device. Under /sys/class/enclosure, the
> >>> folder 4:0:0:0 exists.
> >>>
> >>> I will appreciate your help.

Hello, this is Tan.


Thank you in advance; best regards.



Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread James Bottomley
On Tue, 2013-12-03 at 11:46 -0600, Alireza Haghdoost wrote:
> On Tue, Dec 3, 2013 at 7:25 AM, James Bottomley
>  wrote:
> > Well, no, we could have used Ordered instead of Simple tags ... that
> > would preserve submission order according to spec.  This wouldn't really
> > work for SATA because NCQ only has simple tags.
> 
> Thanks a lot, James, for your comments. Is it possible to configure TCQ
> to use Ordered tags instead of Simple tags? I understand that NCQ
> does not support Ordered tags, but I think it would be nice to keep
> this functionality as an option for other SCSI targets like qla2xxx.
> I can see the discussion about tag ordering in the mailing list.
> However, I am not sure whether it is functional right now.

It's set in the scsi_populate_tag() inline function (scsi_tcq.h).
That's currently hard-coded to simple tags.

> > The point is that our
> > granular unit of ordering is between two barriers, which is way above
> > the request/tag level so we didn't bother to enforce tag ordering.
> 
> Does a barrier force flush all in_flight SCSI commands ?

A flush barrier does, yes ... that's the predominant implementation.

>  Based on my
> understanding, if we put a barrier between multiple requests, it won't
> return until TCQ processes all in-flight SCSI commands. That means we
> cannot keep a fixed load on TCQ, and it would certainly reduce the
> throughput of our application.

Yes, that's what we see in filesystems with barriers enabled.  It's the
price we pay for integrity.

> > However, handling
> > strict ordering in the face of requeuing events like QUEUE FULL or BUSY
> > is hard so we didn't bother.
> 
> We have a piece of code to monitor in_flight requests and avoid
> QUEUE_FULL events. However, would you please let us know what causes
> BUSY events? Does it mean the SCSI target is busy processing
> other requests within the same host machine?

BUSY is a catch-all status.  It's different from QUEUE FULL because
queue tracking algorithms use QUEUE FULL to determine the optimal number
of in-flight commands (we can do that in the mid-layer today with the
queue full tracking code).  BUSY means the command needs retrying
because of some other condition on the initiator that isn't connected
with the task queues, so it isn't counted against the queue full status
tracking for reducing command flows.  It's most often returned by
multi-initiator devices in the presence of management (or even
statistics type) conditions, or because of scheduling or caching issues.

James




[PATCH] scsi_transport_sas: move bsg destructor into sas_rphy_remove

2013-12-03 Thread Joe Lawrence
The recent change in sysfs, bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309
"sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric
rphy device creation/deletion sequence in scsi_transport_sas:

  modprobe mpt2sas
sas_rphy_add
  device_add A   rphy->dev
  device_add B   sas_device transport class
  device_add C   sas_end_device transport class
  device_add D   bsg class

  rmmod mpt2sas
sas_rphy_delete
  sas_rphy_remove
device_del B
device_del C
device_del A
  sysfs_remove_group recursive sysfs dir removal
  sas_rphy_free
device_del D warning

  where device A is the parent of B, C, and D.

When sas_rphy_free tries to unregister the bsg request queue (device D
above), the ensuing sysfs cleanup discovers that its sysfs group has
already been removed and emits a warning, "sysfs group... not found for
kobject 'end_device-X:0'".

Since bsg creation is a side effect of sas_rphy_add, move its
complementary removal call into sas_rphy_remove. This imposes the
following tear-down order for the devices above: D, B, C, A.

Note the sas_device and sas_end_device transport class devices (B and C
above) are both created and destroyed via the list-match traversal in
attribute_container_device_trigger, so the order in which they are
handled is fixed. This is fine as long as they are deleted before their
parent device.

Signed-off-by: Joe Lawrence 
Cc: "James E.J. Bottomley" 
---
 drivers/scsi/scsi_transport_sas.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 1b681427dde0..c341f855fadc 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1621,8 +1621,6 @@ void sas_rphy_free(struct sas_rphy *rphy)
list_del(&rphy->list);
mutex_unlock(&sas_host->lock);
 
-   sas_bsg_remove(shost, rphy);
-
transport_destroy_device(dev);
 
put_device(dev);
@@ -1681,6 +1679,7 @@ sas_rphy_remove(struct sas_rphy *rphy)
}
 
sas_rphy_unlink(rphy);
+   sas_bsg_remove(NULL, rphy);
transport_remove_device(dev);
device_del(dev);
 }
-- 
1.8.3.1



Re: sysfs group not found for kobject on mpt2sas unload

2013-12-03 Thread Joe Lawrence
On Mon, 25 Nov 2013 11:23:39 -0500
Joe Lawrence  wrote:

> On Wed, 20 Nov 2013 14:08:40 -0500
> Joe Lawrence  wrote:
> 
> > Starting in 3.12, when loading and unloading the mpt2sas driver, I see
> > the following warning:
> > 
> > [ cut here ]
> > WARNING: CPU: 20 PID: 19096 at fs/sysfs/group.c:214 
> > sysfs_remove_group+0xc6/0xd0()
> > sysfs group 81ca2f40 not found for kobject 'end_device-30:0'
> > Modules linked in: mpt2sas(-) 
> > stap_edcc1781e2697fc53c3d320bc2530218_19063(OF) ebtable_nat osst 
> > nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat 
> > nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
> > iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 
> > nf_defrag_ipv4 xt_conntrack nf_conntrack bonding ebtable_filter ebtables 
> > ip6table_filter ip6_tables ixgbe igb x86_pkg_temp_thermal ptp coretemp 
> > pps_core joydev mdio crc32_pclmul crc32c_intel raid_class pcspkr 
> > ghash_clmulni_intel scsi_transport_sas ipmi_si dca ipmi_msghandler ntb 
> > uinput dm_round_robin sd_mod qla2xxx syscopyarea sysfillrect sysimgblt 
> > i2c_algo_bit drm_kms_helper ttm drm scsi_transport_fc scsi_tgt usb_storage 
> > i2c_core dm_multipath [last unloaded: 
> > stap_2da929a187c82c607a23237c27bf2d06_18803]
> > CPU: 20 PID: 19096 Comm: rmmod Tainted: GF   W  O 3.12.0+ #4
> > Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.2:52 
> > 04/09/2013
> >  0009 88083ad3bb88 8165265e 88083ad3bbd0
> >  88083ad3bbc0 8105514d  81ca2f40
> >  880851f46ef8 88007a73cf38 88083fb8c6e8 88083ad3bc20
> > Call Trace:
> >  [] dump_stack+0x4d/0x66
> >  [] warn_slowpath_common+0x7d/0xa0
> >  [] warn_slowpath_fmt+0x4c/0x50
> >  [] ? sysfs_get_dirent_ns+0x4e/0x70
> >  [] sysfs_remove_group+0xc6/0xd0
> >  [] dpm_sysfs_remove+0x43/0x50
> >  [] device_del+0x45/0x1c0
> >  [] device_unregister+0x1e/0x60
> >  [] bsg_unregister_queue+0x5e/0xa0
> >  [] sas_rphy_free+0x7a/0xb0 [scsi_transport_sas]
> >  [] sas_port_delete+0x35/0x160 [scsi_transport_sas]
> >  [] ? sysfs_remove_link+0x23/0x30
> >  [] mpt2sas_transport_port_remove+0x19a/0x1e0 [mpt2sas]
> >  [] _scsih_remove_device+0xb0/0x100 [mpt2sas]
> >  [] 
> > mpt2sas_device_remove_by_sas_address.part.54+0x59/0x80 [mpt2sas]
> >  [] _scsih_remove+0xf9/0x210 [mpt2sas]
> >  [] pci_device_remove+0x3b/0xb0
> >  [] __device_release_driver+0x7f/0xf0
> >  [] driver_detach+0xc0/0xd0
> >  [] bus_remove_driver+0x55/0xd0
> >  [] driver_unregister+0x2c/0x50
> >  [] pci_unregister_driver+0x23/0x80
> >  [] _scsih_exit+0x25/0x912 [mpt2sas]
> >  [] SyS_delete_module+0x16d/0x2d0
> >  [] ? do_page_fault+0xe/0x10
> >  [] system_call_fastpath+0x16/0x1b
> > ---[ end trace b4eef98870c871fd ]---
> > 
> > Instrumenting the module loading/unloading cycle with systemtap, it
> > reports the following sequence of events:
> > 
> > modprobe
> >   device_add(A)
> >   device_add(B child of A)
> >   device_add(C child of A)
> >   device_add(D child of A)
> > 
> > rmmod
> >   device_del(B child of A)
> >   device_del(C child of A)
> >   device_del(A)
> >   device_del(D child of A) << WARNING
> > 
> > The same sequence of device_add/del events occur in 3.11, but without
> > the warning.  Git bisect shows bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309
> > "sysfs: make __sysfs_remove_dir() recursive" as the first bad commit.
> > 
> 
> FWIW, I applied Mika's patch that Tejun posted the other day, "sysfs:
> handle duplicate removal attempts in sysfs_remove_group()" [1] and the
> warning goes away on mpt2sas unload.
> 
> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/tj/misc.git/commit/?h=review-sysfs-fixes&id=a69cc96d8c434c6cb64847f37caa890af705fc5c
> 
> -- Joe

Hi James,

(Correction, the sysfs change introducing the warning is 3.13-rc1+, not
3.12.)

Also, it seems that in 3.13-rc2, the patch from Mika that I had
referenced was reverted and that bug handled separately.

  81440e7 Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"
  54d7114 sysfs: handle duplicate removal attempts in sysfs_remove_group()

So in 3.13-rc2, the warning persists when loading and unloading the
mpt2sas driver.

I looked at the code again, and I'm not sure why the sas_bsg_remove call
was placed in sas_rphy_free and not sas_rphy_remove.

The sas_rphy_remove function is tasked with undoing the actions of
sas_rphy_add ... so callers of sas_rphy_free are expecting that
everything sas_rphy_add had set up (if it was even called) has been
cleaned up.

Since bsg components are initialized by sas_rphy_add, does it make sense
for their destructor to live in sas_rphy_remove?

I've tested that change against a few mpt2sas driver load and unload
cycles without incident or sysfs warning.  If the placement of the
sas_bsg_remove call is important where it is, then a more substantial
change may be required to delete devices in an order that appeases
sysfs.

Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread Alireza Haghdoost
On Tue, Dec 3, 2013 at 7:25 AM, James Bottomley
 wrote:
> Well, no, we could have used Ordered instead of Simple tags ... that
> would preserve submission order according to spec.  This wouldn't really
> work for SATA because NCQ only has simple tags.

Thanks a lot, James, for your comments. Is it possible to configure TCQ
to use Ordered tags instead of Simple tags? I understand that NCQ
does not support Ordered tags, but I think it would be nice to keep
this functionality as an option for other SCSI targets like qla2xxx.
I can see the discussion about tag ordering in the mailing list.
However, I am not sure whether it is functional right now.


> The point is that our
> granular unit of ordering is between two barriers, which is way above
> the request/tag level so we didn't bother to enforce tag ordering.

Does a barrier force a flush of all in-flight SCSI commands? Based on my
understanding, if we put a barrier between multiple requests, it won't
return until TCQ processes all in-flight SCSI commands. That means we
cannot keep a fixed load on TCQ, and it would certainly reduce the
throughput of our application.

> However, handling
> strict ordering in the face of requeuing events like QUEUE FULL or BUSY
> is hard so we didn't bother.

We have a piece of code to monitor in_flight requests and avoid
QUEUE_FULL events. However, would you please let us know what causes
BUSY events? Does it mean the SCSI target is busy processing
other requests within the same host machine?


Re: "swiotlb buffer is full" with 3.13-rc1+ but not 3.4.

2013-12-03 Thread James Bottomley
On Tue, 2013-12-03 at 12:33 -0500, Konrad Rzeszutek Wilk wrote:
> On Sat, Nov 30, 2013 at 03:48:44PM -0500, James Bottomley wrote:
> > On Sat, 2013-11-30 at 13:56 -0500, Konrad Rzeszutek Wilk wrote:
> > > My theory is that the SWIOTLB is not full - it is just that the request
> > > is for a compound page that is more than 512kB. Please note that the
> > > highest "chunk" of buffer SWIOTLB can deal with is 512kB.
> > > 
> > > And that of course raises the question - why would it try to
> > > bounce buffer it. In Xen the answer is simple - the sg chunks cross page
> > > boundaries, which means that they are not physically contiguous - so we
> > > have to use the bounce buffer. It would be better if the sg list
> > > provided a large list of 4KB pages instead of compound pages, as that
> > > could help in avoiding the bounce buffer.
> > > 
> > > But I digress - this is a theory - I don't know whether the SCSI layer
> > > does any coalescing of the sg list - and if so, whether there is any
> > > easy knob to tell it not to do it.
> > 
> > Well, SCSI doesn't, but block does.  It's actually an efficiency thing
> > since most firmware descriptor formats cope with multiple pages and the
> > more descriptors you have for a transaction, the more work the on-board
> > processor on the HBA has to do.  If you have an emulated HBA, like
> > virtio, you could turn off physical coalescing by setting the
> > use_clustering flag to DISABLE_CLUSTERING.  But you can't do that for a
> > real card.  I assume the problem here is that the host is passing the
> > card directly to the guest and the guest clusters based on its idea of
> > guest pages which don't map to contiguous physical pages?
> 
> Kind of. Except that in this case the guest does know that it can't map
> them contiguously - and resorts to using the bounce buffer so that it
> can provide a nice chunk of contiguous area. This is detected by
> the SWIOTLB layer and also the block layer to discourage coalescing
> there.
> 
> But since SCSI is all about sg list I think it gets tangled up here:
> 
> 537 for_each_sg(sgl, sg, nelems, i) {
> 538         phys_addr_t paddr = sg_phys(sg);
> 539         dma_addr_t dev_addr = xen_phys_to_bus(paddr);
> 540 
> 541         if (swiotlb_force ||
> 542             !dma_capable(hwdev, dev_addr, sg->length) ||
> 543             range_straddles_page_boundary(paddr, sg->length)) {
> 544                 phys_addr_t map = swiotlb_tbl_map_single(hwdev,
> 545                                                          start_dma_addr,
> 546                                                          sg_phys(sg),
> 547                                                          sg->length,
> 548                                                          dir);
> 
> So it is either not capable of reaching that physical address (so DMA
> mask, but I doubt it - this is LSI which can do 64bit).

Right, so no bouncing.

>  Or the pages
> straddle. They can straddle by, well, being offset at odd locations, or
> by being compound pages.

All modern filesystems have 4k+ block sizes, so no offsets at all.  For
DIO you can get offsets at the beginning and end of the transfer, but
they will be offsets within the page, so the problem can only be
clustering (physical merging).

> But why would they in the first place - and so many of them - considering
> the flow of those printks Ian is seeing.

Probably because compaction and our allocators are designed to give out
physically contiguous pages, which work their way back into the block
layer in order.  On a lot of I/O workloads, we see 30%+ physical
merging.

> James,
> The SCSI layer wouldn't do any funny business here, right - no reordering
> of bios? That is all left to the block layer, right?

We don't see bios ... they're top of block.  SCSI sees requests but the
block layer does all our request and sg list manipulation for us.

James




Re: "swiotlb buffer is full" with 3.13-rc1+ but not 3.4.

2013-12-03 Thread Konrad Rzeszutek Wilk
On Sat, Nov 30, 2013 at 03:48:44PM -0500, James Bottomley wrote:
> On Sat, 2013-11-30 at 13:56 -0500, Konrad Rzeszutek Wilk wrote:
> > My theory is that the SWIOTLB is not full - it is just that the request
> > is for a compound page that is more than 512kB. Please note that the
> > highest "chunk" of buffer SWIOTLB can deal with is 512kB.
> > 
> > And that of course raises the question - why would it try to
> > bounce buffer it. In Xen the answer is simple - the sg chunks cross page
> > boundaries, which means that they are not physically contiguous - so we
> > have to use the bounce buffer. It would be better if the sg list
> > provided a large list of 4KB pages instead of compound pages, as that
> > could help in avoiding the bounce buffer.
> > 
> > But I digress - this is a theory - I don't know whether the SCSI layer
> > does any coalescing of the sg list - and if so, whether there is any
> > easy knob to tell it not to do it.
> 
> Well, SCSI doesn't, but block does.  It's actually an efficiency thing
> since most firmware descriptor formats cope with multiple pages and the
> more descriptors you have for a transaction, the more work the on-board
> processor on the HBA has to do.  If you have an emulated HBA, like
> virtio, you could turn off physical coalescing by setting the
> use_clustering flag to DISABLE_CLUSTERING.  But you can't do that for a
> real card.  I assume the problem here is that the host is passing the
> card directly to the guest and the guest clusters based on its idea of
> guest pages which don't map to contiguous physical pages?

Kind of. Except that in this case the guest does know that it can't map
them contiguously - and resorts to using the bounce buffer so that it
can provide a nice chunk of contiguous area. This is detected by
the SWIOTLB layer and also the block layer to discourage coalescing
there.

But since SCSI is all about sg list I think it gets tangled up here:

537 for_each_sg(sgl, sg, nelems, i) {
538         phys_addr_t paddr = sg_phys(sg);
539         dma_addr_t dev_addr = xen_phys_to_bus(paddr);
540 
541         if (swiotlb_force ||
542             !dma_capable(hwdev, dev_addr, sg->length) ||
543             range_straddles_page_boundary(paddr, sg->length)) {
544                 phys_addr_t map = swiotlb_tbl_map_single(hwdev,
545                                                          start_dma_addr,
546                                                          sg_phys(sg),
547                                                          sg->length,
548                                                          dir);


So it is either not capable of reaching that physical address (so DMA
mask, but I doubt it - this is LSI which can do 64bit). Or the pages
straddle. They can straddle by, well, being offset at odd locations, or
by being compound pages.

But why would they in the first place - and so many of them - considering
the flow of those printks Ian is seeing.

James,
The SCSI layer wouldn't do any funny business here, right - no reordering
of bios? That is all left to the block layer, right?


> 
> The way you tell how many physically contiguous pages block is willing
> to merge is by looking at /sys/block//queue/max_segment_size if
> that's 4k then it won't merge, if it's greater than 4k, then it will.

Ah, good idea. Ian, anything there?
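James's check can be scripted; sdb below is a placeholder device name, and the fallback value is purely illustrative for machines where the attribute is absent:

```shell
# Read the block layer's maximum merged-segment size for a device.
# If it is 4k the layer won't merge physically contiguous pages;
# anything larger means merging (clustering) is in effect.
f=/sys/block/sdb/queue/max_segment_size
if [ -r "$f" ]; then
  sz=$(cat "$f")
else
  sz=65536   # sample value for illustration when the file is absent
fi
if [ "$sz" -gt 4096 ]; then
  echo "merging enabled (max_segment_size=$sz)"
else
  echo "merging disabled (max_segment_size=$sz)"
fi
```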
> 
> I'm not quite sure what to do ... you can't turn of clustering globally
> in the guest because the virtio drivers use it to reduce ring descriptor
> pressure, what you probably want is some way to flag a pass through
> device.
> 
> James
> 
> 


Re: qla2xxx: automatically rescan removed luns.

2013-12-03 Thread Benjamin ESTRABAUD

On 28/11/13 13:29, Hannes Reinecke wrote:

On 11/26/2013 05:26 PM, Douglas Gilbert wrote:

On 13-11-26 11:06 AM, Benjamin ESTRABAUD wrote:

[ .. ]

"rescan-scsi-bus.sh" did detect new LUNs, but apparently not removed
ones. However, I need to test it on a system with a compatible bash shell,
as I wasn't able to run the script without errors.


Did you try the rescan-scsi-bus.sh from sg3_utils v 1.37 or
earlier? The reason I ask is that a fair amount of work
was done on the rescan-scsi-bus.sh found in version 1.37
including syncing with Kurt Garloff's version 1.57 plus
patches from Hannes Reinecke and Sean Stewart.


Plus you need to call it with '-r', otherwise it won't remove any
stale LUNs. I'm sure it's documented somewhere ...


Hi Doug and Hannes,

I did indeed try, but I was using an older version without specifying
"-r", so it didn't remove anything.


I looked at the latest version's "remove" code (at around line 445 of
rescan-scsi-bus.sh) and it seems to delete disks using the scsi_device
sysfs "delete" attribute (which is what I'm using right now).


I was unable to tell how the script detects whether a drive is gone or
has been replaced with another backend storage at the same LUN, which in
fact turns out to be the thing I'm most interested in (since I can now
delete stale LUNs).


Thanks in advance for your help!


Cheers,

Hannes



Regards,
Ben.


Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread Alireza Haghdoost
On Tue, Dec 3, 2013 at 6:09 AM, Bart Van Assche  wrote:
> I think libaio can reorder commands before these reach the SCSI core.

Thanks for your comments. I think libaio sits above the block layer.
Therefore, since we observed that ordering is maintained in the block
layer, I don't think libaio reorders IO requests within an IO context.


Re: [PATCH/RESEND v2 0/2] Hard disk resume time optimization

2013-12-03 Thread Douglas Gilbert

On 13-12-03 05:25 PM, Todd E Brandt wrote:

Hi James, can you give me some feedback on this patch set? It includes
changes based on your feedback to v1.

The essential issue behind hard disks' lengthy resume time is the ata port
driver blocking until the ATA port hardware is finished coming online. So
the kernel isn't really doing anything during all those seconds that the
disks are resuming, it's just blocking until the hardware says it's ready
to accept commands. Applying this patch set allows SATA disks to resume
asynchronously without holding up system resume, thus allowing the UI to
come online much more quickly. There may be a short period after resume
where the disks are still spinning up in the background, but the user
shouldn't notice since the OS can function with the data left in RAM.

The patch set has two parts which apply to ata_port_resume and sd_resume
respectively. Both are required to achieve any real performance benefit,
but they will still function independently without a performance hit.

ata_port_resume patch (1/2):

On resume, the ATA port driver currently waits until the AHCI controller
finishes executing the port wakeup command. This patch changes the
ata_port_resume callback to issue the wakeup and then return immediately,
thus allowing the next device in the pm queue to resume. Any commands
issued to the AHCI hardware during the wakeup will be queued up and
executed once the port is physically online. Thus no information is lost.

sd_resume patch (2/2):

On resume, the SD driver currently waits until the block driver finishes
executing a disk start command with blk_execute_rq. This patch changes
the sd_resume callback to use blk_execute_rq_nowait instead, which allows
it to return immediately, thus allowing the next device in the pm queue
to resume. The return value of blk_execute_rq_nowait is handled in the
background by sd_resume_complete. Any commands issued to the scsi disk
during the startup will be queued up and executed once the disk is online.
Thus no information is lost.


There was some fuzziness in the SCSI drafts as to how a disk
would react to medium access commands (e.g. READ) when it
was transitioning from stopped state (plus other lower powered
states) to active. There are two options:
  a) hold the medium access command in the target until
 it becomes active, then act on it, or
  b) send back a NOT READY sense key with LOGICAL UNIT IS IN
 PROCESS OF BECOMING READY immediately.

This has recently been resolved with the CFF STOPPED field in
the Power condition mode page. "Recently" is the operative word
so you should expect to see next to no support for the CFF
STOPPED field now. Thus SCSI disks can take their pick between
the above two options. Does your patch cope with that?


Also the START STOP UNIT SCSI command has an IMMED bit. It
makes more sense to send this command with the IMMED bit set
and wait for it to complete since that should be fast. You
may receive an unexpected UNIT ATTENTION or a transport error.
Those two imply your command has not done, or cannot do, what has
been requested.

Doug Gilbert





[PATCH] hpsa: increase the probability of a reported success after a device reset

2013-12-03 Thread Tomas Henzl
rc is set in the loop, and it isn't set back to zero anywhere;
this patch fixes that.

Signed-off-by: Tomas Henzl 
---
 drivers/scsi/hpsa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index fb5a898..e46b609 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -2379,7 +2379,7 @@ static int hpsa_register_scsi(struct ctlr_info *h)
 static int wait_for_device_to_become_ready(struct ctlr_info *h,
unsigned char lunaddr[])
 {
-   int rc = 0;
+   int rc;
int count = 0;
int waittime = 1; /* seconds */
struct CommandList *c;
@@ -2399,6 +2399,7 @@ static int wait_for_device_to_become_ready(struct ctlr_info *h,
 */
msleep(1000 * waittime);
count++;
+   rc = 0; /* device ready. */
 
/* Increase wait time with each try, up to a point. */
if (waittime < HPSA_MAX_WAIT_INTERVAL_SECS)
-- 
1.8.3.1




Re: [PATCH/RESEND v2 1/2] Hard disk resume time optimization, asynchronous ata_port_resume

2013-12-03 Thread One Thousand Gnomes
> thus allowing the UI to come online sooner. There may be a short period after 
> resume where the disks are still spinning up in the background, but the user 
> shouldn't notice since the OS can function with the data left in RAM.

I wonder how many marginal power supplies this will find 8)

I still think it's the right thing to do. In the SCSI world a bit more
caution was needed however.

Alan


[PATCH/RESEND v2 0/2] Hard disk resume time optimization

2013-12-03 Thread Todd E Brandt
Hi James, can you give me some feedback on this patch set? It includes
changes based on your feedback to v1.  

The essential issue behind hard disks' lengthy resume time is the ata port
driver blocking until the ATA port hardware is finished coming online. So
the kernel isn't really doing anything during all those seconds that the
disks are resuming, it's just blocking until the hardware says it's ready
to accept commands. Applying this patch set allows SATA disks to resume
asynchronously without holding up system resume, thus allowing the UI to
come online much more quickly. There may be a short period after resume
where the disks are still spinning up in the background, but the user
shouldn't notice since the OS can function with the data left in RAM.

The patch set has two parts which apply to ata_port_resume and sd_resume
respectively. Both are required to achieve any real performance benefit,
but they will still function independantly without a performance hit.

ata_port_resume patch (1/2):

On resume, the ATA port driver currently waits until the AHCI controller
finishes executing the port wakeup command. This patch changes the
ata_port_resume callback to issue the wakeup and then return immediately,
thus allowing the next device in the pm queue to resume. Any commands
issued to the AHCI hardware during the wakeup will be queued up and
executed once the port is physically online. Thus no information is lost.

sd_resume patch (2/2):

On resume, the SD driver currently waits until the block driver finishes
executing a disk start command with blk_execute_rq. This patch changes
the sd_resume callback to use blk_execute_rq_nowait instead, which allows
it to return immediately, thus allowing the next device in the pm queue
to resume. The return value of blk_execute_rq_nowait is handled in the
background by sd_resume_complete. Any commands issued to the scsi disk
during the startup will be queued up and executed once the disk is online.
Thus no information is lost.



[PATCH/RESEND v2 2/2] Hard disk resume time optimization, asynchronous sd_resume

2013-12-03 Thread Todd E Brandt
On resume, the SD driver currently waits until the block driver finishes 
executing a disk start command with blk_execute_rq. This patch changes the 
sd_resume callback to use blk_execute_rq_nowait instead, which allows it to 
return immediately, thus allowing the next device in the pm queue to resume. 
The return value of blk_execute_rq_nowait is handled in the background by 
sd_resume_complete. Any commands issued to the scsi disk during the startup 
will be queued up and executed once the disk is online. Thus no information 
is lost, and although the wait time itself isn't removed, it doesn't hold up 
the rest of the system.

In combination with the ata_port_resume patch, this patch greatly reduces S3 
system resume time on systems with SATA drives. This is accomplished by 
removing the drive spinup time from the system resume delay. Applying these 
two patches allows SATA disks to resume asynchronously without holding up 
system resume; thus allowing the UI to come online sooner. There may be a 
short period after resume where the disks are still spinning up in the 
background, but the user shouldn't notice since the OS can function with the 
data left in RAM.

This patch applies to all three resume callbacks: resume, restore, and 
runtime-resume. There is only a performance benefit for resume, but for 
simplicity both restore and runtime-resume use the same code path.

Signed-off-by: Todd Brandt 
Signed-off-by: Arjan van de Ven 

 drivers/scsi/sd.c | 70 +++---
 1 file changed, 67 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e6c4bff..eed8ea2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3166,18 +3166,82 @@ static int sd_suspend_runtime(struct device *dev)
return sd_suspend_common(dev, false);
 }
 
+static void sd_resume_complete(struct request *rq, int error)
+{
+   struct scsi_sense_hdr sshdr;
+   struct scsi_disk *sdkp = rq->end_io_data;
+   char *sense = rq->sense;
+
+   if (error) {
+   sd_printk(KERN_WARNING, sdkp, "START FAILED\n");
+   sd_print_result(sdkp, error);
+   if (sense && (driver_byte(error) & DRIVER_SENSE)) {
+   scsi_normalize_sense(sense,
+   SCSI_SENSE_BUFFERSIZE, &sshdr);
+   sd_print_sense_hdr(sdkp, &sshdr);
+   }
+   } else {
+   sd_printk(KERN_NOTICE, sdkp, "START SUCCESS\n");
+   }
+
+   kfree(sense);
+   rq->sense = NULL;
+   rq->end_io_data = NULL;
+   __blk_put_request(rq->q, rq);
+   scsi_disk_put(sdkp);
+}
+
 static int sd_resume(struct device *dev)
 {
+   unsigned char cmd[6] = { START_STOP };
struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
+   struct request *req;
+   char *sense = NULL;
int ret = 0;
 
if (!sdkp->device->manage_start_stop)
-   goto done;
+   goto error;
 
sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-   ret = sd_start_stop_device(sdkp, 1);
 
-done:
+   cmd[4] |= 1;
+
+   if (sdkp->device->start_stop_pwr_cond)
+   cmd[4] |= 1 << 4;   /* Active or Standby */
+
+   if (!scsi_device_online(sdkp->device)) {
+   ret = -ENODEV;
+   goto error;
+   }
+
+   req = blk_get_request(sdkp->device->request_queue, 0, __GFP_WAIT);
+   if (!req) {
+   ret = -ENOMEM;
+   goto error;
+   }
+
+   sense = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_NOIO);
+   if (!sense) {
+   ret = -ENOMEM;
+   goto error_sense;
+   }
+
+   req->cmd_len = COMMAND_SIZE(cmd[0]);
+   memcpy(req->cmd, cmd, req->cmd_len);
+   req->sense = sense;
+   req->sense_len = 0;
+   req->retries = SD_MAX_RETRIES;
+   req->timeout = SD_TIMEOUT;
+   req->cmd_type = REQ_TYPE_BLOCK_PC;
+   req->cmd_flags |= REQ_PM | REQ_QUIET | REQ_PREEMPT;
+
+   req->end_io_data = sdkp;
+   blk_execute_rq_nowait(req->q, NULL, req, 1, sd_resume_complete);
+   return 0;
+
+ error_sense:
+   __blk_put_request(req->q, req);
+ error:
scsi_disk_put(sdkp);
return ret;
 }




[PATCH/RESEND v2 1/2] Hard disk resume time optimization, asynchronous ata_port_resume

2013-12-03 Thread Todd E Brandt
On resume, the ATA port driver currently waits until the AHCI controller 
finishes executing the port wakeup command. This patch changes the 
ata_port_resume callback to issue the wakeup and then return immediately, 
thus allowing the next device in the pm queue to resume. Any commands issued 
to the AHCI hardware during the wakeup will be queued up and executed once 
the port is physically online. Thus no information is lost, and although 
the wait time itself isn't removed, it doesn't hold up the rest of the system.

In combination with the sd_resume patch, this patch greatly reduces S3 system 
resume time on systems with SATA drives. This is accomplished by removing the 
drive spinup time from the system resume delay. Applying these two patches 
allows SATA disks to resume asynchronously without holding up system resume; 
thus allowing the UI to come online sooner. There may be a short period after 
resume where the disks are still spinning up in the background, but the user 
shouldn't notice since the OS can function with the data left in RAM.

This patch only changes the behavior of the resume callback, not restore, 
thaw, or runtime-resume. This is because thaw and restore are used after a 
suspend-to-disk, which means that an image needs to be read from swap and 
reloaded into RAM. The swap disk will always need to be fully restored/thawed 
in order for resume to continue.

Signed-off-by: Todd Brandt 
Signed-off-by: Arjan van de Ven 

 drivers/ata/libata-core.c | 48 
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 75b9367..4819b93 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5312,7 +5312,7 @@ bool ata_link_offline(struct ata_link *link)
 #ifdef CONFIG_PM
 static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
   unsigned int action, unsigned int ehi_flags,
-  int *async)
+  bool async, int *async_result)
 {
struct ata_link *link;
unsigned long flags;
@@ -5322,8 +5322,8 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 * progress.  Wait for PM_PENDING to clear.
 */
if (ap->pflags & ATA_PFLAG_PM_PENDING) {
-   if (async) {
-   *async = -EAGAIN;
+   if (async && async_result) {
+   *async_result = -EAGAIN;
return 0;
}
ata_port_wait_eh(ap);
@@ -5335,7 +5335,7 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 
ap->pm_mesg = mesg;
if (async)
-   ap->pm_result = async;
+   ap->pm_result = async_result;
else
ap->pm_result = &rc;
 
@@ -5358,7 +5358,8 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
return rc;
 }
 
-static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int *async)
+static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg,
+bool async, int *async_result)
 {
/*
 * On some hardware, device fails to respond after spun down
@@ -5370,14 +5371,14 @@ static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int
 */
unsigned int ehi_flags = ATA_EHI_QUIET | ATA_EHI_NO_AUTOPSY |
 ATA_EHI_NO_RECOVERY;
-   return ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
+   return ata_port_request_pm(ap, mesg, 0, ehi_flags, async, async_result);
 }
 
 static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
 {
struct ata_port *ap = to_ata_port(dev);
 
-   return __ata_port_suspend_common(ap, mesg, NULL);
+   return __ata_port_suspend_common(ap, mesg, false, NULL);
 }
 
 static int ata_port_suspend(struct device *dev)
@@ -5402,27 +5403,42 @@ static int ata_port_poweroff(struct device *dev)
 }
 
 static int __ata_port_resume_common(struct ata_port *ap, pm_message_t mesg,
-   int *async)
+   bool async, int *async_result)
 {
int rc;
 
rc = ata_port_request_pm(ap, mesg, ATA_EH_RESET,
-   ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async);
+   ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async, async_result);
return rc;
 }
 
-static int ata_port_resume_common(struct device *dev, pm_message_t mesg)
+static int ata_port_resume_common(struct device *dev, pm_message_t mesg,
+ bool async)
 {
struct ata_port *ap = to_ata_port(dev);
 
-   return __ata_port_resume_common(ap, mesg, NULL);
+   return __ata_port_resume_common(ap, mesg, async, NULL);
+}
+
+static int ata_port_resume_async(struct device *dev)
+{
+   int rc;
+
+   

[PATCH] scsi: Fix crash on out of memory with MAC SCSI

2013-12-03 Thread Alan
From: Alan 

Missing check on scsi_register

Signed-off-by: Alan Cox 
---
 drivers/scsi/mac_scsi.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/mac_scsi.c b/drivers/scsi/mac_scsi.c
index 8580757..f5cdc68 100644
--- a/drivers/scsi/mac_scsi.c
+++ b/drivers/scsi/mac_scsi.c
@@ -260,6 +260,8 @@ int __init macscsi_detect(struct scsi_host_template * tpnt)
 /* Once we support multiple 5380s (e.g. DuoDock) we'll do
something different here */
 instance = scsi_register (tpnt, sizeof(struct NCR5380_hostdata));
+if (instance == NULL)
+   return 0;
 
 if (macintosh_config->ident == MAC_MODEL_IIFX) {
mac_scsi_regp  = via1+0x8000;



Re: [PATCH 7/8] bfa: Fix crash when symb name set for offline vport

2013-12-03 Thread James Bottomley
On Tue, 2013-12-03 at 07:38 -0700, Vijaya Mohan Guvva wrote:
> > On Tue, 2013-11-19 at 23:05 -0800, vmo...@brocade.com wrote:
> > > From: Vijaya Mohan Guvva 
> > >
> > > This patch fixes a crash when tried setting symbolic name for an
> > > offline vport through sysfs. Crash is due to uninitialized pointer
> > > lport->ns, which gets initialized only on linkup (port online).
> > 
> > This looks like a separable patch that should go through the fixes tree to
> > stable, is that right?
> > 
> > James
> > 
> 
> Hi James,
> Yes, this is a bug fix and should go through the fixes tree to stable tree.
> So, should I separate this patch from the original patch and submit it 
> separately?

No, that's OK this time around, I can cut this out of the set.  Next
time, having separate bug fixes and updates allows this to happen
without me having to ask.

James




RE: [PATCH 7/8] bfa: Fix crash when symb name set for offline vport

2013-12-03 Thread Vijaya Mohan Guvva
> On Tue, 2013-11-19 at 23:05 -0800, vmo...@brocade.com wrote:
> > From: Vijaya Mohan Guvva 
> >
> > This patch fixes a crash when tried setting symbolic name for an
> > offline vport through sysfs. Crash is due to uninitialized pointer
> > lport->ns, which gets initialized only on linkup (port online).
> 
> This looks like a separable patch that should go through the fixes tree to
> stable, is that right?
> 
> James
> 

Hi James,
Yes, this is a bug fix and should go through the fixes tree to stable tree.
So, should I separate this patch from the original patch and submit it 
separately?

Thanks,
Vijay


Re: [PATCH 7/8] bfa: Fix crash when symb name set for offline vport

2013-12-03 Thread James Bottomley
On Tue, 2013-11-19 at 23:05 -0800, vmo...@brocade.com wrote:
> From: Vijaya Mohan Guvva 
> 
> This patch fixes a crash when tried setting symbolic name for an offline
> vport through sysfs. Crash is due to uninitialized pointer lport->ns,
> which gets initialized only on linkup (port online).

This looks like a separable patch that should go through the fixes tree
to stable, is that right?

James




Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread James Bottomley
On Tue, 2013-12-03 at 13:26 +0100, Bart Van Assche wrote:
> On 12/03/13 00:38, James Bottomley wrote:
> > Well this would be because we don't guarantee order at any granularity
> > below barriers.  We won't reorder across barriers but below them we can
> > reorder the commands and, of course, we use simple tags for queuing
> > which entitles the underlying storage hardware to reorder within its
> > internal queue.  Previously, when everything was single threaded issue,
> > you mostly got FIFO behaviour because reorder really only occurred on
> > error or busy, but I would imagine that's changing now with multiqueue.
> 
> Reordering SCSI commands was fine as long as hard disks were the only
> supported storage medium. This is because most hard disk controllers do
> not perform writes in the order these writes are submitted to their
> controller.

Well, no, we could have used Ordered instead of Simple tags ... that
would preserve submission order according to spec.  This wouldn't really
work for SATA because NCQ only has simple tags.  The point is that our
granular unit of ordering is between two barriers, which is way above
the request/tag level so we didn't bother to enforce tag ordering.  We
discussed it over the course of several years, because strict ordering
would have relieved us of the need to do barriers.  However, handling
strict ordering in the face of requeuing events like QUEUE FULL or BUSY
is hard so we didn't bother.

James

>  However, with several SSD models it is possible to tell the
> controller to preserve write order. Furthermore, the optimizations that
> are possible by using atomic writes are only safe if it is guaranteed
> that none of the layers between the application and the SCSI target
> changes the order in which an application submitted these atomic writes.
> In other words, although it was safe in the past to reorder the writes
> submitted between two successive barriers such reordering would
> eliminate several of the benefits of atomic writes. A quote from the
> draft SCSI atomics specification
> (http://www.t10.org/cgi-bin/ac.pl?t=d&f=13-064r7.pdf):
> 
> Atomic writes may:
>   a) increase write endurance
> A) reducing writes increases the life of a flash-based SSD
>   b) increase performance
> A) reducing writes results in fewer system calls, fewer I/Os over
>the SCSI transport protocol, and fewer interrupts
>   c) improve reliability for non-journaled data
>   d) simplify applications
> A) reduce or eliminate journaling
> B) keep applications from managing atomicity
> 
> 
> Bart.
> 





Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread Bart Van Assche
On 12/02/13 23:12, Alireza Haghdoost wrote:
> Note that we are using kernel 3.11 and libaio to perform async IO.

I think libaio can reorder commands before these reach the SCSI core.

Bart.



Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing

2013-12-03 Thread Bart Van Assche
On 12/03/13 00:38, James Bottomley wrote:
> Well this would be because we don't guarantee order at any granularity
> below barriers.  We won't reorder across barriers but below them we can
> reorder the commands and, of course, we use simple tags for queuing
> which entitles the underlying storage hardware to reorder within its
> internal queue.  Previously, when everything was single threaded issue,
> you mostly got FIFO behaviour because reorder really only occurred on
> error or busy, but I would imagine that's changing now with multiqueue.

Reordering SCSI commands was fine as long as hard disks were the only
supported storage medium. This is because most hard disk controllers do
not perform writes in the order these writes are submitted to their
controller. However, with several SSD models it is possible to tell the
controller to preserve write order. Furthermore, the optimizations that
are possible by using atomic writes are only safe if it is guaranteed
that none of the layers between the application and the SCSI target
changes the order in which an application submitted these atomic writes.
In other words, although it was safe in the past to reorder the writes
submitted between two successive barriers such reordering would
eliminate several of the benefits of atomic writes. A quote from the
draft SCSI atomics specification
(http://www.t10.org/cgi-bin/ac.pl?t=d&f=13-064r7.pdf):

Atomic writes may:
  a) increase write endurance
A) reducing writes increases the life of a flash-based SSD
  b) increase performance
A) reducing writes results in fewer system calls, fewer I/Os over
   the SCSI transport protocol, and fewer interrupts
  c) improve reliability for non-journaled data
  d) simplify applications
A) reduce or eliminate journaling
B) keep applications from managing atomicity


Bart.



Re: [PATCH] drivers: scsi: scsi_lib.c: add prefix "SCSILIB_" to macro "SP"

2013-12-03 Thread Chen Gang
On 12/03/2013 05:32 AM, rkuo wrote:
> On Mon, Dec 02, 2013 at 06:14:33PM +0800, Chen Gang wrote:
>> If one issue occurs, normally, both sides need improvement.
>>
>> For our issue:
>>
>>  - need try to keep uapi no touch ("arch/hexagon/uapi/asm/registers.h").
>>
>>  - improving our module is much easier than improving hexagon.
>>
>>  - for 'SP', it is really short enough to look like a register name.
>>SG_POOL seems more suitable for our 'sgpool' related operations.
>>
>>
>> It would be better to improve hexagon too, but that is not quite easy
>> (we may have to bear it), since it is uapi :-(
> 
> I can't speak for SCSI, but that define in Hexagon isn't really used outside
> of that file anyways (the two SP macros themselves).  I'm also pretty sure
> the userspace isn't currently using it anyways.  Seeing as it's not really
> buying us anything, I'm fine with just removing it and continuing to review
> the other defines.  I'll make that change and test it locally.
> 

OK, thanks.

> 
> Thanks,
> Richard Kuo
> 
> 


-- 
Chen Gang