Re: Performance of SCST versus STGT

2008-01-20 Thread Bart Van Assche
On Jan 18, 2008 1:08 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:

 [ ... ]
 So, seems I understood your slides correctly: the more valuable data for
 our SCST SRP vs STGT iSER comparison should be on page 26 for 1 command
 read (~480MB/s, i.e. ~60% from Bart's result on the equivalent hardware).

At least in my tests SCST performed significantly better than STGT.
These tests were performed with the currently available
implementations of SCST and STGT. Which performance improvements are
possible for these projects (e.g. zero-copying), and by how much is it
expected that these performance improvements will increase throughput
and will decrease latency ?

Bart.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: usb-storage-psc1350-v4.patch (was Linux scsi / usb-mass-storage and HP printer cardreader bug + fix)

2008-01-20 Thread Hans de Goede

James Bottomley wrote:

On Mon, 2008-01-14 at 20:27 +0100, Hans de Goede wrote:

James Bottomley wrote:

Plus what is the rq-nr_sectors  sdp-sector_size /
512 test supposed to be doing?  that being true is supposed to be a
guarantee of the block layer (and if something goes wrong there's a
check for this lower down).


It first is was just:
rq-nr_sectors  1

Then I changed it to also do the right thing for 1024 and larger sector disks. 
The whole purpose is to not read the last sector in a larger then one sector 
read, so the amount of sectors gets reduced by one if (block + rq-nr_sectors 
== get_capacity(disk)) but we do want still want to be able to read the last 
sector by itself, so we must not reduce the no sectors to be read by one if it 
is already one.

Yes, I know that.  What I mean is the block subsystem sends reads and
writes down in increments of the sector size.  Checking if the current
number of pending sectors is greater than the current block size is
checking that guarantee.  The current code already has a check in it to
see if this guarantee is observed ... you don't need to check it again.

I'm not checking for the guarantee, I'm checking that the read  1 
realsize_sector, before decrementing the number of _512_bytes_sectors by 
realsize_sector, this check must be there, as after reading all but the last 
sector, another request will be send with 1 realsize_sector size, for which
(block + rq-nr_sectors) == get_capacity(disk) will still hold true, in this 
case this_count must not be lowered, otherwise this_count will become 0 in this 
case and the last sector will never get read.


I hope that clarifies why that code is there, if not can you tell how you would 
want the code to look by just giving the full if statement as it should be, I 
think we are somehow misunderstanding each other.


Oh, right; it's  rather than = ... sorry, yes that's fine.



Ok,

I got swamped @work this week so it took a while, but I've made a new seperate 
(scsi-sd only) patch with the cleanups as discussed.


I'm sending this (to the full recipient list) in a new top level post titled:
PATCH: scsi-sd-last-sector-bug-flag.patch

Regards,

Hans
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


PATCH: scsi-sd-last-sector-bug-flag.patch

2008-01-20 Thread Hans de Goede

Hi all,

This patch adds a new scsi_device flag for devices which contain a bug where
the device crashes when the last sector is read in a larger then 1 sector read.

This is for example the case with sdcards in the HP PSC1350 printer cardreader
and in the HP PSC1610 printer cardreader.

Signed-off-by: Hans de Goede [EMAIL PROTECTED]

Regards,

Hans
This patch adds a new scsi_device flag for devices which contain a bug where
the device crashes when the last sector is read in a larger then 1 sector read.

This is for example the case with sdcards in the HP PSC1350 printer cardreader
and in the HP PSC1610 printer cardreader.

Signed-off-by: Hans de Goede [EMAIL PROTECTED]
diff -up vanilla-2.6.24-rc7/include/scsi/scsi_device.h.psc1350 vanilla-2.6.24-rc7/include/scsi/scsi_device.h
--- vanilla-2.6.24-rc7/include/scsi/scsi_device.h.psc1350	2008-01-11 19:40:31.0 +0100
+++ vanilla-2.6.24-rc7/include/scsi/scsi_device.h	2008-01-11 19:40:48.0 +0100
@@ -142,6 +142,7 @@ struct scsi_device {
 	unsigned fix_capacity:1;	/* READ_CAPACITY is too high by 1 */
 	unsigned guess_capacity:1;	/* READ_CAPACITY might be too high by 1 */
 	unsigned retry_hwerror:1;	/* Retry HARDWARE_ERROR */
+	unsigned last_sector_bug:1;	/* Always read last sector in a 1 sector read */
 
 	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
 	struct list_head event_list;	/* asserted events */
diff -up vanilla-2.6.24-rc7/drivers/scsi/sd.c.psc1350 vanilla-2.6.24-rc7/drivers/scsi/sd.c
--- vanilla-2.6.24-rc7/drivers/scsi/sd.c.psc1350	2008-01-11 19:55:43.0 +0100
+++ vanilla-2.6.24-rc7/drivers/scsi/sd.c	2008-01-20 10:49:17.0 +0100
@@ -395,6 +395,16 @@ static int sd_prep_fn(struct request_que
 		goto out;
 	}
 
+	/* 
+	 * Some devices (some sdcards for one) don't like it if the last sector
+	 * gets read in a larger then 1 sector read.
+	 */
+	if (unlikely(sdp-last_sector_bug  
+	rq-nr_sectors  sdp-sector_size / 512 
+ 	block + this_count == get_capacity(disk))) {
+		this_count -= sdp-sector_size / 512;
+	}
+
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, block=%llu\n,
 	(unsigned long long)block));
 


PATCH: usb-storage-set-last-sector-bug-flag.patch

2008-01-20 Thread Hans de Goede

Hi all,

This patch sets the last_sector_bug flag to 1 for all USB disks. This is
needed to makes the cardreader on various HP multifunction printers work.

Since the performance impact is negible we set this flag for all USB disks to
avoid an unusual_devs.h nightmare.

Note that this patch depends on:
PATCH: scsi-sd-last-sector-bug-flag.patch

Which actually adds this flag to the scsi subsystem.

Signed-off-by: Hans de Goede [EMAIL PROTECTED]

Regards,

Hans
This patch sets the last_sector_bug flag to 1 for all USB disks. This is
needed to makes the cardreader on various HP multifunction printers work.

Since the performance impact is negible we set this flag for all USB disks to
avoid an unusual_devs.h nightmare.

Note that this patch depends on:
PATCH: scsi-sd-last-sector-bug-flag.patch

Which actually adds this flag to the scsi subsystem.

Signed-off-by: Hans de Goede [EMAIL PROTECTED]
--- vanilla-2.6.24-rc7/drivers/usb/storage/scsiglue.c.psc1350	2008-01-11 19:40:31.0 +0100
+++ vanilla-2.6.24-rc7/drivers/usb/storage/scsiglue.c	2008-01-20 11:18:38.0 +0100
@@ -187,6 +187,10 @@ static int slave_configure(struct scsi_d
 		 * automatically, requiring a START-STOP UNIT command. */
 		sdev-allow_restart = 1;
 
+		/* Some USB cardreaders have trouble reading an sdcard's last
+		 * sector in a larger then 1 sector read, since the performance
+		 * impact is negible we set this flag for all USB disks */
+		sdev-last_sector_bug = 1;
 	} else {
 
 		/* Non-disk-type devices don't need to blacklist any pages


Re: [PATCH v3] use dynamically allocated sense buffer

2008-01-20 Thread Boaz Harrosh
On Wed, Jan 16 2008 at 6:32 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote:
 This is the third version of:
 
 http://marc.info/?l=linux-scsim=120038907123706w=2
 
 The changes from the second version are:
 
 - Fixed the memory leak bug that Boaz pointed out.
 
 shots-backup_sense_buffer has gone. One sense buffer is allocated per
 host and it's always attached to the scsi_cmnd linked with freelist
 (backup command).
 
 - Calling scsi_setup_command_sense_buffer in scsi_host_alloc instead
 of scsi_add_host.
 
 The first version tries to allocates as many buffers as
 shost-can_queue so scsi_setup_command_sense_buffer is called in
 scsi_add_host. But, it's not the case any more, so this calls
 scsi_setup_command_sense_buffer in scsi_host_alloc like
 scsi_setup_command_freelist.
 
 
 I did the same tests against this patch (though I knew there were not
 the performnace differences):
 
 dynamic sense buf (slub) | 483.5 MB/s  IOPS 123772.7/s
 
 For convenience, here are the previous results:
 
 scsi-misc (slub) | 486.9 MB/s  IOPS 124652.9/s
 dynamic sense buf (slub) | 483.2 MB/s  IOPS 123704.1/s
 
 
 I put the results and the kernel configuration:
 
 http://www.kernel.org/pub/linux/kernel/people/tomo/sense/
 
 
 =
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: [PATCH] use dynamically allocated sense buffer
 
 This removes static array sense_buffer in scsi_cmnd and uses
 dynamically allocated sense_buffer (with GFP_DMA).
 
 The reason for doing this is that some architectures need cacheline
 aligned buffer for DMA:
 
 http://lkml.org/lkml/2007/11/19/2
 
 The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer
 to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's
 necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves
 these issues.
 
 __scsi_get_command allocates sense_buffer via kmem_cache_alloc and
 attaches it to a scsi_cmnd so everything just work as before.
 
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  drivers/scsi/hosts.c |9 ++-
  drivers/scsi/scsi.c  |   61 -
  drivers/scsi/scsi_priv.h |2 +
  include/scsi/scsi_cmnd.h |2 +-
  4 files changed, 70 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
 index 9a10b43..f5d3fbb 100644
 --- a/drivers/scsi/hosts.c
 +++ b/drivers/scsi/hosts.c
 @@ -268,6 +268,7 @@ static void scsi_host_dev_release(struct device *dev)
   }
  
   scsi_destroy_command_freelist(shost);
 + scsi_destroy_command_sense_buffer(shost);
   if (shost-bqt)
   blk_free_tags(shost-bqt);
  
 @@ -372,10 +373,14 @@ struct Scsi_Host *scsi_host_alloc(struct 
 scsi_host_template *sht, int privsize)
   else
   shost-dma_boundary = 0x;
  
 - rval = scsi_setup_command_freelist(shost);
 + rval = scsi_setup_command_sense_buffer(shost);
   if (rval)
   goto fail_kfree;
  
 + rval = scsi_setup_command_freelist(shost);
 + if (rval)
 + goto fail_destroy_sense;
 +
   device_initialize(shost-shost_gendev);
   snprintf(shost-shost_gendev.bus_id, BUS_ID_SIZE, host%d,
   shost-host_no);
 @@ -399,6 +404,8 @@ struct Scsi_Host *scsi_host_alloc(struct 
 scsi_host_template *sht, int privsize)
  
   fail_destroy_freelist:
   scsi_destroy_command_freelist(shost);
 + fail_destroy_sense:
 + scsi_destroy_command_sense_buffer(shost);
   fail_kfree:
   kfree(shost);
   return NULL;
 diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
 index 54ff611..0a4a5b8 100644
 --- a/drivers/scsi/scsi.c
 +++ b/drivers/scsi/scsi.c
 @@ -161,6 +161,9 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
  
  static DEFINE_MUTEX(host_cmd_pool_mutex);
  
 +static struct kmem_cache *sense_buffer_slab;
 +static int sense_buffer_slab_users;
 +
  /**
   * __scsi_get_command - Allocate a struct scsi_cmnd
   * @shost: host to transmit command
 @@ -172,6 +175,7 @@ static DEFINE_MUTEX(host_cmd_pool_mutex);
  struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask)
  {
   struct scsi_cmnd *cmd;
 + unsigned char *buf;
  
   cmd = kmem_cache_alloc(shost-cmd_pool-slab,
   gfp_mask | shost-cmd_pool-gfp_mask);
 @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host 
 *shost, gfp_t gfp_mask)
   list_del_init(cmd-list);
   }
   spin_unlock_irqrestore(shost-free_list_lock, flags);
 +
 + if (cmd) {
 + buf = cmd-sense_buffer;
 + memset(cmd, 0, sizeof(*cmd));
 + cmd-sense_buffer = buf;
 + }
 + } else {
 + buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask);
 + if (likely(buf)) {
 + memset(cmd, 0, sizeof(*cmd));
 + cmd-sense_buffer = buf;
 + } else {
 + 

Re: [PATCH v3] use dynamically allocated sense buffer

2008-01-20 Thread Matthew Wilcox
On Wed, Jan 16, 2008 at 01:32:17PM +0900, FUJITA Tomonori wrote:
 This removes static array sense_buffer in scsi_cmnd and uses
 dynamically allocated sense_buffer (with GFP_DMA).
 
 The reason for doing this is that some architectures need cacheline
 aligned buffer for DMA:
 
 http://lkml.org/lkml/2007/11/19/2
 
 The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer
 to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's
 necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves
 these issues.
 
 __scsi_get_command allocates sense_buffer via kmem_cache_alloc and
 attaches it to a scsi_cmnd so everything just work as before.

I think this is fine for the moment.

Longer-term, I want to allow low-level drivers to allocate the
sense_buffer themselves so they can DMA directly into it (ie grown-up dma
mapping, rather than this quaint x86 __GFP_DMA).  This patch doesn't get
us any closer to that, but it doesn't get us further away from it either.

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: scsi-sd-last-sector-bug-flag.patch

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 11:12 +0100, Hans de Goede wrote:
 Hi all,
 
 This patch adds a new scsi_device flag for devices which contain a bug where
 the device crashes when the last sector is read in a larger then 1 sector 
 read.
 
 This is for example the case with sdcards in the HP PSC1350 printer cardreader
 and in the HP PSC1610 printer cardreader.
 
 Signed-off-by: Hans de Goede [EMAIL PROTECTED]

This could have done with running through checkpatch.pl:

ERROR: trailing whitespace
#28: FILE: drivers/scsi/sd.c:398:
+^I/* $

ERROR: trailing whitespace
#32: FILE: drivers/scsi/sd.c:402:
+^Iif (unlikely(sdp-last_sector_bug  $

WARNING: braces {} are not necessary for single statement blocks
#32: FILE: drivers/scsi/sd.c:402:
+   if (unlikely(sdp-last_sector_bug  
+   rq-nr_sectors  sdp-sector_size / 512 
+   block + this_count == get_capacity(disk))) {
+   this_count -= sdp-sector_size / 512;
+   }

ERROR: use tabs not spaces
#34: FILE: drivers/scsi/sd.c:404:
+ ^Iblock + this_count == get_capacity(disk))) {$

WARNING: line over 80 characters
#49: FILE: include/scsi/scsi_device.h:142:
+   unsigned last_sector_bug:1; /* Always read last sector in a
1 sector read */

total: 3 errors, 2 warnings, 23 lines checked

I've fixed all of these up.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] use dynamically allocated sense buffer

2008-01-20 Thread James Bottomley

On Wed, 2008-01-16 at 13:32 +0900, FUJITA Tomonori wrote:
 This is the third version of:
 
 http://marc.info/?l=linux-scsim=120038907123706w=2
[...]
 diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
 index 54ff611..0a4a5b8 100644
 --- a/drivers/scsi/scsi.c
 +++ b/drivers/scsi/scsi.c
 @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host 
 *shost, gfp_t gfp_mask)
   list_del_init(cmd-list);
   }
   spin_unlock_irqrestore(shost-free_list_lock, flags);
 +
 + if (cmd) {
 + buf = cmd-sense_buffer;
 + memset(cmd, 0, sizeof(*cmd));
 + cmd-sense_buffer = buf;
 + }
 + } else {
 + buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask);

This is going to cause the enterprise some angst because ZONE_DMA can be
very small, and the enterprise requrements for commands in flight very
large.  I also think its unnecessary if we know the host isn't requiring
ISA DMA.  How about the below to fix this, it's based on the existing
infrastructure for solving the very same problem.

James

---

From e7ffbd81684779f5eb41d66d5f499b82af40e12b Mon Sep 17 00:00:00 2001
From: James Bottomley [EMAIL PROTECTED]
Date: Sun, 20 Jan 2008 09:28:24 -0600
Subject: [SCSI] don't use __GFP_DMA for sense buffers if not required

Only hosts which actually have ISA DMA requirements need sense buffers
coming out of ZONE_DMA, so only use the __GFP_DMA flag for that case
to avoid allocating this scarce resource if it's not necessary.

Signed-off-by: James Bottomley [EMAIL PROTECTED]
---
 drivers/scsi/hosts.c |9 +
 drivers/scsi/scsi.c  |  106 +++--
 drivers/scsi/scsi_priv.h |2 -
 3 files changed, 46 insertions(+), 71 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f5d3fbb..9a10b43 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -268,7 +268,6 @@ static void scsi_host_dev_release(struct device *dev)
}
 
scsi_destroy_command_freelist(shost);
-   scsi_destroy_command_sense_buffer(shost);
if (shost-bqt)
blk_free_tags(shost-bqt);
 
@@ -373,13 +372,9 @@ struct Scsi_Host *scsi_host_alloc(struct 
scsi_host_template *sht, int privsize)
else
shost-dma_boundary = 0x;
 
-   rval = scsi_setup_command_sense_buffer(shost);
-   if (rval)
-   goto fail_kfree;
-
rval = scsi_setup_command_freelist(shost);
if (rval)
-   goto fail_destroy_sense;
+   goto fail_kfree;
 
device_initialize(shost-shost_gendev);
snprintf(shost-shost_gendev.bus_id, BUS_ID_SIZE, host%d,
@@ -404,8 +399,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
 
  fail_destroy_freelist:
scsi_destroy_command_freelist(shost);
- fail_destroy_sense:
-   scsi_destroy_command_sense_buffer(shost);
  fail_kfree:
kfree(shost);
return NULL;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 0a4a5b8..045e69d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -141,29 +141,30 @@ const char * scsi_device_type(unsigned type)
 EXPORT_SYMBOL(scsi_device_type);
 
 struct scsi_host_cmd_pool {
-   struct kmem_cache   *slab;
-   unsigned intusers;
-   char*name;
-   unsigned intslab_flags;
-   gfp_t   gfp_mask;
+   struct kmem_cache   *cmd_slab;
+   struct kmem_cache   *sense_slab;
+   unsigned intusers;
+   char*cmd_name;
+   char*sense_name;
+   unsigned intslab_flags;
+   gfp_t   gfp_mask;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
-   .name   = scsi_cmd_cache,
+   .cmd_name   = scsi_cmd_cache,
+   .sense_name = scsi_sense_cache,
.slab_flags = SLAB_HWCACHE_ALIGN,
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
-   .name   = scsi_cmd_cache(DMA),
+   .cmd_name   = scsi_cmd_cache(DMA),
+   .sense_name = scsi_sense_cache(DMA),
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
.gfp_mask   = __GFP_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
 
-static struct kmem_cache *sense_buffer_slab;
-static int sense_buffer_slab_users;
-
 /**
  * __scsi_get_command - Allocate a struct scsi_cmnd
  * @shost: host to transmit command
@@ -177,8 +178,8 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host 
*shost, gfp_t gfp_mask)
struct scsi_cmnd *cmd;
unsigned char *buf;
 
-   cmd = kmem_cache_alloc(shost-cmd_pool-slab,
-   gfp_mask | shost-cmd_pool-gfp_mask);
+   cmd = kmem_cache_alloc(shost-cmd_pool-cmd_slab,
+  gfp_mask | shost-cmd_pool-gfp_mask);
 
if 

Re: [SCSI] scsi.h: add macro for enclosure bit of inquiry data

2008-01-20 Thread Stefan Richter
James Bottomley wrote:
 The macro tells us whether the device is (or contains) an enclosure device.
...
 +static inline int scsi_device_enclosure(struct scsi_device *sdev)
 +{
 + return sdev-inquiry[6]  (16);
 +}

Perhaps call it scsi_device_is_enclosure() to better reflect the nature
of this function.

Or if it is an accessor to inquiry data to you, maybe call it
scsi_device_inquiry_encserv() or
scsi_device_inquiry_enclosure_services() or
sdev_to_inquiry_encserv() or
sdev_to_inquiry_enclosure_services().

Alas neither of this fits with the existing similar functions in
scsi_device.h which don't have expressive names.
-- 
Stefan Richter
-=-==--- ---= =-=--
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Boaz Harrosh
On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote:
 this patch depends on the sg branch of the block tree
 
 James
 
 ---
 From: James Bottomley [EMAIL PROTECTED]
 Date: Tue, 15 Jan 2008 11:11:46 -0600
 Subject: remove use_sg_chaining
 
 With the sg table code, every SCSI driver is now either chain capable
 or broken, so there's no need to have a check in the host template.
 
 Also tidy up the code by moving the scatterlist size defines into the
 SCSI includes and permit the last entry of the scatterlist pools not
 to be a power of two.
 ---

I have a theoretical problem that BUGed me from the beginning.

Could it happen that a memory critical IO, (that is needed to free
memory), be collected into an sg-chained large IO, and the allocation 
of the multiple sg-pool-allocations fail, thous dead locking on
out-of-memory? Is there a mechanism in place that will split large IO's 
into smaller chunks in the event of out-of-memory condition in prep_fn?

Is it possible to call blk_rq_map_sg() with less then what is present
at request to only map the starting portion?

Boaz
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
 On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote:
  this patch depends on the sg branch of the block tree
  
  James
  
  ---
  From: James Bottomley [EMAIL PROTECTED]
  Date: Tue, 15 Jan 2008 11:11:46 -0600
  Subject: remove use_sg_chaining
  
  With the sg table code, every SCSI driver is now either chain capable
  or broken, so there's no need to have a check in the host template.
  
  Also tidy up the code by moving the scatterlist size defines into the
  SCSI includes and permit the last entry of the scatterlist pools not
  to be a power of two.
  ---
 
 I have a theoretical problem that BUGed me from the beginning.
 
 Could it happen that a memory critical IO, (that is needed to free
 memory), be collected into an sg-chained large IO, and the allocation 
 of the multiple sg-pool-allocations fail, thous dead locking on
 out-of-memory? Is there a mechanism in place that will split large IO's 
 into smaller chunks in the event of out-of-memory condition in prep_fn?
 
 Is it possible to call blk_rq_map_sg() with less then what is present
 at request to only map the starting portion?

Obviously, that's why I was worrying about mempool size and default
blocks a while ago.

However, the deadlock only occurs if the device is swap or backing a
filesystem with memory mapped files.  The use cases for this are really
tapes and other entities that need huge buffers.  That's why we're
keeping the system sector size at 1024 unless you alter it through sysfs
(here gun, there foot ...)

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Boaz Harrosh
On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote:
 On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
 On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
 wrote:
 this patch depends on the sg branch of the block tree

 James

 ---
 From: James Bottomley [EMAIL PROTECTED]
 Date: Tue, 15 Jan 2008 11:11:46 -0600
 Subject: remove use_sg_chaining

 With the sg table code, every SCSI driver is now either chain capable
 or broken, so there's no need to have a check in the host template.

 Also tidy up the code by moving the scatterlist size defines into the
 SCSI includes and permit the last entry of the scatterlist pools not
 to be a power of two.
 ---
 I have a theoretical problem that BUGed me from the beginning.

 Could it happen that a memory critical IO, (that is needed to free
 memory), be collected into an sg-chained large IO, and the allocation 
 of the multiple sg-pool-allocations fail, thous dead locking on
 out-of-memory? Is there a mechanism in place that will split large IO's 
 into smaller chunks in the event of out-of-memory condition in prep_fn?

 Is it possible to call blk_rq_map_sg() with less then what is present
 at request to only map the starting portion?
 
 Obviously, that's why I was worrying about mempool size and default
 blocks a while ago.
 
 However, the deadlock only occurs if the device is swap or backing a
 filesystem with memory mapped files.  The use cases for this are really
 tapes and other entities that need huge buffers.  That's why we're
 keeping the system sector size at 1024 unless you alter it through sysfs
 (here gun, there foot ...)
 
 James
 

OK Thanks for confirming my concern, In modern life with devices like
iSCSI that have ~0 as it's max_sector, swapping over that should be considered
and configured carefully. Once with pNFS over blocks/objects it should be 
addressed.
Perhaps with a FAIL_FAST semantics for users like pNFS to split up the requests 
if they
fail with out-of-memory.

Thanks
Boaz
 
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Boaz Harrosh
On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
 On Sun, Jan 20 2008, James Bottomley wrote:
 On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
 On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
 wrote:
 this patch depends on the sg branch of the block tree

 James

 ---
 From: James Bottomley [EMAIL PROTECTED]
 Date: Tue, 15 Jan 2008 11:11:46 -0600
 Subject: remove use_sg_chaining

 With the sg table code, every SCSI driver is now either chain capable
 or broken, so there's no need to have a check in the host template.

 Also tidy up the code by moving the scatterlist size defines into the
 SCSI includes and permit the last entry of the scatterlist pools not
 to be a power of two.
 ---
 I have a theoretical problem that BUGed me from the beginning.

 Could it happen that a memory critical IO, (that is needed to free
 memory), be collected into an sg-chained large IO, and the allocation 
 of the multiple sg-pool-allocations fail, thous dead locking on
 out-of-memory? Is there a mechanism in place that will split large IO's 
 into smaller chunks in the event of out-of-memory condition in prep_fn?

 Is it possible to call blk_rq_map_sg() with less then what is present
 at request to only map the starting portion?
 Obviously, that's why I was worrying about mempool size and default
 blocks a while ago.

 However, the deadlock only occurs if the device is swap or backing a
 filesystem with memory mapped files.  The use cases for this are really
 tapes and other entities that need huge buffers.  That's why we're
 keeping the system sector size at 1024 unless you alter it through sysfs
 (here gun, there foot ...)
 
 Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
 smaller than nr_phys_segments and just ensure that the request is
 continued nicely through the normal 'request if residual' logic.
 
Thats a grate Idea. I will Q it on my todo list. Thanks

Boaz

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Jens Axboe
On Sun, Jan 20 2008, Jens Axboe wrote:
 On Sun, Jan 20 2008, Boaz Harrosh wrote:
  On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
   On Sun, Jan 20 2008, James Bottomley wrote:
   On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
   On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
   wrote:
   this patch depends on the sg branch of the block tree
  
   James
  
   ---
   From: James Bottomley [EMAIL PROTECTED]
   Date: Tue, 15 Jan 2008 11:11:46 -0600
   Subject: remove use_sg_chaining
  
   With the sg table code, every SCSI driver is now either chain capable
   or broken, so there's no need to have a check in the host template.
  
   Also tidy up the code by moving the scatterlist size defines into the
   SCSI includes and permit the last entry of the scatterlist pools not
   to be a power of two.
   ---
   I have a theoretical problem that BUGed me from the beginning.
  
   Could it happen that a memory critical IO, (that is needed to free
   memory), be collected into an sg-chained large IO, and the allocation 
   of the multiple sg-pool-allocations fail, thous dead locking on
   out-of-memory? Is there a mechanism in place that will split large IO's 
   into smaller chunks in the event of out-of-memory condition in prep_fn?
  
   Is it possible to call blk_rq_map_sg() with less then what is present
   at request to only map the starting portion?
   Obviously, that's why I was worrying about mempool size and default
   blocks a while ago.
  
   However, the deadlock only occurs if the device is swap or backing a
   filesystem with memory mapped files.  The use cases for this are really
   tapes and other entities that need huge buffers.  That's why we're
   keeping the system sector size at 1024 unless you alter it through sysfs
   (here gun, there foot ...)
   
   Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
   smaller than nr_phys_segments and just ensure that the request is
   continued nicely through the normal 'request if residual' logic.
   
  Thats a grate Idea. I will Q it on my todo list. Thanks
 
 ok good, thanks :-)

btw, the above is full of typos, my apologies. it should read requeue
if residual, but I guess you already guessed as much.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Jens Axboe
On Sun, Jan 20 2008, James Bottomley wrote:
 
 On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
  On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
  wrote:
   this patch depends on the sg branch of the block tree
   
   James
   
   ---
   From: James Bottomley [EMAIL PROTECTED]
   Date: Tue, 15 Jan 2008 11:11:46 -0600
   Subject: remove use_sg_chaining
   
   With the sg table code, every SCSI driver is now either chain capable
   or broken, so there's no need to have a check in the host template.
   
   Also tidy up the code by moving the scatterlist size defines into the
   SCSI includes and permit the last entry of the scatterlist pools not
   to be a power of two.
   ---
  
  I have a theoretical problem that BUGed me from the beginning.
  
  Could it happen that a memory critical IO, (that is needed to free
  memory), be collected into an sg-chained large IO, and the allocation 
  of the multiple sg-pool-allocations fail, thous dead locking on
  out-of-memory? Is there a mechanism in place that will split large IO's 
  into smaller chunks in the event of out-of-memory condition in prep_fn?
  
  Is it possible to call blk_rq_map_sg() with less then what is present
  at request to only map the starting portion?
 
 Obviously, that's why I was worrying about mempool size and default
 blocks a while ago.
 
 However, the deadlock only occurs if the device is swap or backing a
 filesystem with memory mapped files.  The use cases for this are really
 tapes and other entities that need huge buffers.  That's why we're
 keeping the system sector size at 1024 unless you alter it through sysfs
 (here gun, there foot ...)

Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
smaller than nr_phys_segments and just ensure that the request is
continued nicely through the normal 'request if residual' logic.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Jens Axboe
On Sun, Jan 20 2008, Boaz Harrosh wrote:
 On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote:
  On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
  On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
  wrote:
  this patch depends on the sg branch of the block tree
 
  James
 
  ---
  From: James Bottomley [EMAIL PROTECTED]
  Date: Tue, 15 Jan 2008 11:11:46 -0600
  Subject: remove use_sg_chaining
 
  With the sg table code, every SCSI driver is now either chain capable
  or broken, so there's no need to have a check in the host template.
 
  Also tidy up the code by moving the scatterlist size defines into the
  SCSI includes and permit the last entry of the scatterlist pools not
  to be a power of two.
  ---
  I have a theoretical problem that BUGed me from the beginning.
 
  Could it happen that a memory critical IO, (that is needed to free
  memory), be collected into an sg-chained large IO, and the allocation 
  of the multiple sg-pool-allocations fail, thous dead locking on
  out-of-memory? Is there a mechanism in place that will split large IO's 
  into smaller chunks in the event of out-of-memory condition in prep_fn?
 
  Is it possible to call blk_rq_map_sg() with less then what is present
  at request to only map the starting portion?
  
  Obviously, that's why I was worrying about mempool size and default
  blocks a while ago.
  
  However, the deadlock only occurs if the device is swap or backing a
  filesystem with memory mapped files.  The use cases for this are really
  tapes and other entities that need huge buffers.  That's why we're
  keeping the system sector size at 1024 unless you alter it through sysfs
  (here gun, there foot ...)
  
  James
  
 
 OK Thanks for confirming my concern, In modern life with devices like
 iSCSI that have ~0 as it's max_sector, swapping over that should be
 considered and configured carefully. Once with pNFS over
 blocks/objects it should be addressed.  Perhaps with a FAIL_FAST
 semantics for users like pNFS to split up the requests if they fail
 with out-of-memory.

I'll have to disagree again, you can't expect users to know these sorts
of things (sorry your system deadlocked, you should have known not to
increase max_sectors_kb for something you swap on). Especially when
handling it correctly in scsi_init_io() is a few lines of change.

No excuse for not doing this correctly. At least for blk_fs_request()
requests, for blk_pc_request() failing is the only option.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread Jens Axboe
On Sun, Jan 20 2008, Boaz Harrosh wrote:
 On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
  On Sun, Jan 20 2008, James Bottomley wrote:
  On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
  On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
  wrote:
  this patch depends on the sg branch of the block tree
 
  James
 
  ---
  From: James Bottomley [EMAIL PROTECTED]
  Date: Tue, 15 Jan 2008 11:11:46 -0600
  Subject: remove use_sg_chaining
 
  With the sg table code, every SCSI driver is now either chain capable
  or broken, so there's no need to have a check in the host template.
 
  Also tidy up the code by moving the scatterlist size defines into the
  SCSI includes and permit the last entry of the scatterlist pools not
  to be a power of two.
  ---
  I have a theoretical problem that BUGed me from the beginning.
 
  Could it happen that a memory critical IO, (that is needed to free
  memory), be collected into an sg-chained large IO, and the allocation 
  of the multiple sg-pool-allocations fail, thous dead locking on
  out-of-memory? Is there a mechanism in place that will split large IO's 
  into smaller chunks in the event of out-of-memory condition in prep_fn?
 
  Is it possible to call blk_rq_map_sg() with less then what is present
  at request to only map the starting portion?
  Obviously, that's why I was worrying about mempool size and default
  blocks a while ago.
 
  However, the deadlock only occurs if the device is swap or backing a
  filesystem with memory mapped files.  The use cases for this are really
  tapes and other entities that need huge buffers.  That's why we're
  keeping the system sector size at 1024 unless you alter it through sysfs
  (here gun, there foot ...)
  
  Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
  smaller than nr_phys_segments and just ensure that the request is
  continued nicely through the normal 'request if residual' logic.
  
 Thats a grate Idea. I will Q it on my todo list. Thanks

ok good, thanks :-)

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 21:54 +0200, Boaz Harrosh wrote:
 On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote:
  On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
  On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
  wrote:
  this patch depends on the sg branch of the block tree
 
  James
 
  ---
  From: James Bottomley [EMAIL PROTECTED]
  Date: Tue, 15 Jan 2008 11:11:46 -0600
  Subject: remove use_sg_chaining
 
  With the sg table code, every SCSI driver is now either chain capable
  or broken, so there's no need to have a check in the host template.
 
  Also tidy up the code by moving the scatterlist size defines into the
  SCSI includes and permit the last entry of the scatterlist pools not
  to be a power of two.
  ---
  I have a theoretical problem that BUGed me from the beginning.
 
  Could it happen that a memory critical IO, (that is needed to free
  memory), be collected into an sg-chained large IO, and the allocation 
  of the multiple sg-pool-allocations fail, thous dead locking on
  out-of-memory? Is there a mechanism in place that will split large IO's 
  into smaller chunks in the event of out-of-memory condition in prep_fn?
 
  Is it possible to call blk_rq_map_sg() with less then what is present
  at request to only map the starting portion?
  
  Obviously, that's why I was worrying about mempool size and default
  blocks a while ago.
  
  However, the deadlock only occurs if the device is swap or backing a
  filesystem with memory mapped files.  The use cases for this are really
  tapes and other entities that need huge buffers.  That's why we're
  keeping the system sector size at 1024 unless you alter it through sysfs
  (here gun, there foot ...)
  
  James
  
 
 OK Thanks for confirming my concern, In modern life with devices like
 iSCSI that have ~0 as it's max_sector, swapping over that should be considered
 and configured carefully. Once with pNFS over blocks/objects it should be 
 addressed.
 Perhaps with a FAIL_FAST semantics for users like pNFS to split up the 
 requests if they
 fail with out-of-memory.

Well, swap over networked backed devices is an order of magnitude worse
of a problem.

However, the block layer doesn't let you set max_sectors over 1024; even
when iscsi requests ~0 it gets 1024 but the user is allowed to raise
this via sysfs. (That's the difference between max_sectors [currently
operating parameter] and max_hw_sectors [passed in maximum])

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: scsi-sd-last-sector-bug-flag.patch

2008-01-20 Thread Greg KH
On Sun, Jan 20, 2008 at 11:12:26AM +0100, Hans de Goede wrote:
 Hi all,

 This patch adds a new scsi_device flag for devices which contain a bug 
 where
 the device crashes when the last sector is read in a larger then 1 sector 
 read.

 This is for example the case with sdcards in the HP PSC1350 printer 
 cardreader
 and in the HP PSC1610 printer cardreader.

Wait, we already handle this in the usb-storage driver, why are you
putting this in the scsi core now?

confused,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: usb-storage-set-last-sector-bug-flag.patch

2008-01-20 Thread Greg KH
On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote:
 Hi all,

 This patch sets the last_sector_bug flag to 1 for all USB disks. This is
 needed to makes the cardreader on various HP multifunction printers work.

 Since the performance impact is negible we set this flag for all USB disks 
 to avoid an unusual_devs.h nightmare.

Oh great, now my working just fine USB devices, which happen to have
data in the last sector, suddenly stop working.

That's not acceptable :(

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: usb-storage-set-last-sector-bug-flag.patch

2008-01-20 Thread Hans de Goede

James Bottomley wrote:

On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote:

On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote:

Hi all,

This patch sets the last_sector_bug flag to 1 for all USB disks. This is
needed to makes the cardreader on various HP multifunction printers work.

Since the performance impact is negible we set this flag for all USB disks 
to avoid an unusual_devs.h nightmare.

Oh great, now my working just fine USB devices, which happen to have
data in the last sector, suddenly stop working.

That's not acceptable :(


I don't see how this will happen, might you not be confusing this change
(which allows access to the last sector, just insists that it be
accessed by a single sector read) with US_FL_FIX_CAPACITY which is for
devices that report having one more sectors than they actually have and
therefore adjusts the access limits down by one?



Let me try to explain some more, the scsi-sd patch, which goes hand in hand 
with this adds a last_sector_bug flag, which is indeed a different flag form 
the fix_capacity flag. Both deal with with what are (in case of the 
last_sector_bug flag probably) off by one bugs.


The fix_capacity flag is for devices which report their last sector is N (with 
sectors numbered from 0 - N) but in reality  / they mean they have N sectors, 
so their last sector really is N - 1.


The last_sector_bug flag is for a bug (sofar only seen in HP multifunction 
printers with cardreader when using an sdcard) where the reader will cease to 
function (returns error codes in response to each and every command) after one 
has attempted to read the last sector in a read larger then 1 sector. To be 
clear an example lets say an example disk has 256 sectors, numbered 0 - 255.


Then the following reads will all cause the reader to go into borked mode:
16 sectors starting at 240
8 sectors starting at 248
2 sectors starting at 254

Yet the last sector can still be read without problems the following read:
1 sector starting at 255

So what the scsi-sd part of these 2 patches does it adds a last_sector_bug 
flag, which when set will cause the layer to split a read like this:

16 sectors starting at 240

Into:
15 sectors starting at 240
1 sector starting at 255


Since reading the last sector is a rare occurence (but one which does happen 
every time when determining the partition table, triggering the bug on every 
card insert). and since there are a lot of different HP printer models ( and 
cheap usb card readers are notoriously buggy so other cardreaders might be 
affected too), Matthew Dharm (the usb-storage maintainer) thought it best to 
enable this for all devices.


Regards,

Hans
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: scsi-sd-last-sector-bug-flag.patch

2008-01-20 Thread Hans de Goede

Greg KH wrote:

On Sun, Jan 20, 2008 at 11:12:26AM +0100, Hans de Goede wrote:

Hi all,

This patch adds a new scsi_device flag for devices which contain a bug 
where
the device crashes when the last sector is read in a larger then 1 sector 
read.


This is for example the case with sdcards in the HP PSC1350 printer 
cardreader

and in the HP PSC1610 printer cardreader.


Wait, we already handle this in the usb-storage driver, why are you
putting this in the scsi core now?



No we don't, I've send patches for this to the usb-storage driver but they were 
refused as they modified scsi commands in flight, the consensus was that this 
should be done in the scsi layer, hence this patch.



confused,


I've noticed, esp. with regards to your second mail to which I will reply next. 
There has been a lot of discussion about this ending in this (perfectly fine) 
patch, which is much better then my original hack if I may add that.


Regards,

Hans
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 21:01 +0100, Jens Axboe wrote:
 On Sun, Jan 20 2008, Jens Axboe wrote:
  On Sun, Jan 20 2008, Boaz Harrosh wrote:
   On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
On Sun, Jan 20 2008, James Bottomley wrote:
On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL 
PROTECTED] wrote:
this patch depends on the sg branch of the block tree
   
James
   
---
From: James Bottomley [EMAIL PROTECTED]
Date: Tue, 15 Jan 2008 11:11:46 -0600
Subject: remove use_sg_chaining
   
With the sg table code, every SCSI driver is now either chain capable
or broken, so there's no need to have a check in the host template.
   
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
---
I have a theoretical problem that BUGed me from the beginning.
   
Could it happen that a memory critical IO, (that is needed to free
memory), be collected into an sg-chained large IO, and the allocation 
of the multiple sg-pool-allocations fail, thous dead locking on
out-of-memory? Is there a mechanism in place that will split large 
IO's 
into smaller chunks in the event of out-of-memory condition in 
prep_fn?
   
Is it possible to call blk_rq_map_sg() with less then what is present
at request to only map the starting portion?
Obviously, that's why I was worrying about mempool size and default
blocks a while ago.
   
However, the deadlock only occurs if the device is swap or backing a
filesystem with memory mapped files.  The use cases for this are really
tapes and other entities that need huge buffers.  That's why we're
keeping the system sector size at 1024 unless you alter it through 
sysfs
(here gun, there foot ...)

Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
smaller than nr_phys_segments and just ensure that the request is
continued nicely through the normal 'request if residual' logic.

   Thats a grate Idea. I will Q it on my todo list. Thanks
  
  ok good, thanks :-)
 
 btw, the above is full of typos, my apologies. it should read requeue
 if residual, but I guess you already guessed as much.

Something like ...

It looks to me like it would make sense to have something like a
BLKPREP_SGALLOCFAIL return so the block layer can do this for us ...
Alternatively, we'll have to find a way of adjusting the sector count as
it goes into the ULD prep functions.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PATCH: usb-storage-set-last-sector-bug-flag.patch

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote:
 On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote:
  Hi all,
 
  This patch sets the last_sector_bug flag to 1 for all USB disks. This is
  needed to makes the cardreader on various HP multifunction printers work.
 
  Since the performance impact is negible we set this flag for all USB disks 
  to avoid an unusual_devs.h nightmare.
 
 Oh great, now my working just fine USB devices, which happen to have
 data in the last sector, suddenly stop working.
 
 That's not acceptable :(

I don't see how this will happen, might you not be confusing this change
(which allows access to the last sector, just insists that it be
accessed by a single sector read) with US_FL_FIX_CAPACITY which is for
devices that report having one more sectors than they actually have and
therefore adjusts the access limits down by one?

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] PATCH: usb-storage-set-last-sector-bug-flag.patch

2008-01-20 Thread Guillaume Bedot

Le dimanche 20 janvier 2008 à 15:03 -0600, James Bottomley a écrit :
 On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote:
  On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote:
   Hi all,
  
   This patch sets the last_sector_bug flag to 1 for all USB disks. This is
   needed to makes the cardreader on various HP multifunction printers work.
  
   Since the performance impact is negible we set this flag for all USB 
   disks 
   to avoid an unusual_devs.h nightmare.
  
  Oh great, now my working just fine USB devices, which happen to have
  data in the last sector, suddenly stop working.
  
  That's not acceptable :(
 
 I don't see how this will happen, might you not be confusing this change
 (which allows access to the last sector, just insists that it be
 accessed by a single sector read) with US_FL_FIX_CAPACITY which is for
 devices that report having one more sectors than they actually have and
 therefore adjusts the access limits down by one?

Well, i was the one suggesting more than 2 devices might be impacted
(i'm absolutely not sure about this, but it could be).
It seems caused by a common error when using 0 as a base index.

I don't like loosing performance for broken devices, but i'm not against
keeping that patch for all devices.

1) As the last sectors are read when the card is inserted, it results in
not working at all devices :
Users may think they are just not supported, and won't report any bug.
See how Hans worked hard to find 3 other cases in fedora, ubuntu, etc
forums !

2) It should not break in the other cases, IIUC, it just splits the read
in two.

3) It's just about the last sector, so any issue should only be greater
timing when using realtime perhaps ?

Whatever, this patch, or an other form of it, is needed (because of
1) ).
If it is a default, an option or dedicated to a limited set of devices
must be chosen.

I hope you will soon find this solution.

Best regards,

Guillaume B.





-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SCSI] scsi.h: add macro for enclosure bit of inquiry data

2008-01-20 Thread James Bottomley

On Sun, 2008-01-20 at 18:44 +0100, Stefan Richter wrote:
 James Bottomley wrote:
  The macro tells us whether the device is (or contains) an enclosure device.
 ...
  +static inline int scsi_device_enclosure(struct scsi_device *sdev)
  +{
  +   return sdev-inquiry[6]  (16);
  +}
 
 Perhaps call it scsi_device_is_enclosure() to better reflect the nature
 of this function.
 
 Or if it is an accessor to inquiry data to you, maybe call it
 scsi_device_inquiry_encserv() or
 scsi_device_inquiry_enclosure_services() or
 sdev_to_inquiry_encserv() or
 sdev_to_inquiry_enclosure_services().
 
 Alas neither of this fits with the existing similar functions in
 scsi_device.h which don't have expressive names.

Right .. that's the problem.  Being potentially clearer in naming but at
odds with what's currently in the file starts to add to the confusion
about the other names in there.  I chose the name primarily because it's
the same form as all the others.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Marvell 6440 SAS/SATA driver (draft)

2008-01-20 Thread Ke Wei
Hello James:


   I read your comments.

.target_alloc   = sas_target_alloc,
.slave_configure= sas_slave_configure,
.slave_destroy  = sas_slave_destroy,
.change_queue_depth = sas_change_queue_depth,
.change_queue_type  = sas_change_queue_type,
.bios_param = sas_bios_param,
  - .can_queue  = 1,
  + .can_queue  = 30,

 I don't think you want to do this.  It starts sending multiple commands
 at once.  The design use of libsas is to start off at 1 and then up the
 limit in slave configure once we know what type of device we're dealing
 with.  The current sas_slave_configure has a limitation in that the
 depth is hard coded to 32, so if you need it smaller, we'll have to find
 a way of allowing this?  Is 30 your max queue depth?

Yes , I may not do this. But I need to set
sas_ha_struct.lldd_queue_size to suitable value.
Because sas_queue sends multiple commands according to lldd_queue_size
before calling lldd_execute_task function which supports queue depth ,
30 is suitable.


.cmd_per_lun= 1,
.this_id= -1,
  - .sg_tablesize   = SG_ALL,
  - .max_sectors= SCSI_DEFAULT_MAX_SECTORS,
  - .use_clustering = ENABLE_CLUSTERING,
  + .sg_tablesize   = 32,

 32 looks a little small.  My reading of the code is that the PRD table
 has to fit with the command header, OAF area and status area into an 8k
 segment of memory, so at 16 bytes per PRD entry, it looks like a page
 worth at least (256) isn't unreasonable.  You should probably have some
 type of macro for this though.

  + .max_sectors= (128*1024)9,

Yes , It's demo code . And I need to test device to find a good value
according to performance and reliability.


  + .use_clustering = DISABLE_CLUSTERING,

 I think this is wrong ... there should be no modern device that disables
 clustering (otherwise they fall over badly on iommu platforms).

.eh_device_reset_handler= sas_eh_device_reset_handler,
.eh_bus_reset_handler   = sas_eh_bus_reset_handler,
  - .slave_alloc= sas_slave_alloc,

 Please don't remove this ... it's a standard callback, and it's required
 for the day you support SATA.

You are right. I forgot to recover these codes when I debuged.
And Driver has support for SATA devices . I will commit the patches in
this weeks.

Thank you for your help.

Ke Wei.

On Jan 19, 2008 2:53 AM, James Bottomley
[EMAIL PROTECTED] wrote:
 On Thu, 2008-01-10 at 14:53 +0800, Ke Wei wrote:
  The 88SE6440 driver :
 
  The driver is based on bare code from Jeff Garzik. And it can work
  under linux kernel 2.6.23.
  By far, Can discover and find SAS HDD, but SATA is currently
  unsupported. Command queue depth can be above 1.
  Most error handling, and some phy handling code is notably missing.
 
 
  contains the following updates:
 
  --- mvsas_orig.c  2007-12-06 19:21:32.0 -0500
  +++ mvsas.c   2008-01-09 04:53:14.0 -0500
 [...]
  +#define MVS_BIT(x) (1L  (x))
  +
  +#define PORT_TYPE_SATAMVS_BIT(0)
  +#define PORT_TYPE_SAS MVS_BIT(1)
  +
  +#define READ_PORT_CONFIG_DATA(i) \
  + ((i3)?mr32(P4_CFG_DATA+(i-4)*8):mr32(P0_CFG_DATA+i*8))
  +#define WRITE_PORT_CONFIG_DATA(i,tmp) \
  + {if(i3)mw32(P4_CFG_DATA+(i-4)*8,tmp);else mw32(P0_CFG_DATA+i*8,tmp);}
  +#define WRITE_PORT_CONFIG_ADDR(i,tmp) \
  + {if(i3)mw32(P4_CFG_ADDR+(i-4)*8,tmp);else mw32(P0_CFG_ADDR+i*8,tmp);}
  +
  +#define READ_PORT_PHY_CONTROL(i) \
  +   ((i3)?mr32(P4_SER_CTLSTAT+(i-4)*4):mr32(P0_SER_CTLSTAT+i*4))
  +#define WRITE_PORT_PHY_CONTROL(i,tmp) \
  +   {if(i3)mw32(P4_SER_CTLSTAT+(i-4)*4,tmp);else
  mw32(P0_SER_CTLSTAT+i*4,tmp);}

 This is an example of where you mailer has broken a line (which causes
 the patch not to apply).


  +
  +#define READ_PORT_VSR_DATA(i) \
  +   ((i3)?mr32(P4_VSR_DATA+(i-4)*8):mr32(P0_VSR_DATA+i*8))
  +#define WRITE_PORT_VSR_DATA(i,tmp) \
  +   {if(i3)mw32(P4_VSR_DATA+(i-4)*8,tmp);else mw32(P0_VSR_DATA+i*8,tmp);}
  +#define WRITE_PORT_VSR_ADDR(i,tmp) \
  +   {if(i3)mw32(P4_VSR_ADDR+(i-4)*8,tmp);else mw32(P0_VSR_ADDR+i*8,tmp);}
  +
  +#define READ_PORT_IRQ_STAT(i) \
  +   ((i3)?mr32(P4_INT_STAT+(i-4)*8):mr32(P0_INT_STAT+i*8))
  +#define WRITE_PORT_IRQ_STAT(i,tmp) \
  +   {if(i3)mw32(P4_INT_STAT+(i-4)*8,tmp);else mw32(P0_INT_STAT+i*8,tmp);}
  +#define READ_PORT_IRQ_MASK(i) \
  +   ((i3)?mr32(P4_INT_MASK+(i-4)*8):mr32(P0_INT_MASK+i*8))
  +#define WRITE_PORT_IRQ_MASK(i,tmp) \
  +   {if(i3)mw32(P4_INT_MASK+(i-4)*8,tmp);else mw32(P0_INT_MASK+i*8,tmp);}
  +
   /* driver compile-time configuration */
   enum driver_configuration {
  - MVS_TX_RING_SZ  = 1024, /* TX ring size (12-bit) */
  + MVS_TX_RING_SZ  = 512,  /* TX ring size (12-bit) */
MVS_RX_RING_SZ  = 1024, /* RX ring size 

Re: [PATCH v3] use dynamically allocated sense buffer

2008-01-20 Thread FUJITA Tomonori
On Sun, 20 Jan 2008 09:40:11 -0700
Matthew Wilcox [EMAIL PROTECTED] wrote:

 On Wed, Jan 16, 2008 at 01:32:17PM +0900, FUJITA Tomonori wrote:
  This removes static array sense_buffer in scsi_cmnd and uses
  dynamically allocated sense_buffer (with GFP_DMA).
  
  The reason for doing this is that some architectures need cacheline
  aligned buffer for DMA:
  
  http://lkml.org/lkml/2007/11/19/2
  
  The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer
  to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's
  necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves
  these issues.
  
  __scsi_get_command allocates sense_buffer via kmem_cache_alloc and
  attaches it to a scsi_cmnd so everything just work as before.
 
 I think this is fine for the moment.
 
 Longer-term, I want to allow low-level drivers to allocate the
 sense_buffer themselves so they can DMA directly into it (ie grown-up dma
 mapping, rather than this quaint x86 __GFP_DMA).  This patch doesn't get

Yeah, I think that the approach is one of candidates.

If we go with it, I think that the major issue is that LLDs don't know
when they can reclaim sense_buffer from scsi-ml; scsi-ml uses
sense_buffer after scmd-scsi_done.


 us any closer to that, but it doesn't get us further away from it either.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] use dynamically allocated sense buffer

2008-01-20 Thread FUJITA Tomonori
On Sun, 20 Jan 2008 10:36:56 -0600
James Bottomley [EMAIL PROTECTED] wrote:

 
 On Wed, 2008-01-16 at 13:32 +0900, FUJITA Tomonori wrote:
  This is the third version of:
  
  http://marc.info/?l=linux-scsim=120038907123706w=2
 [...]
  diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
  index 54ff611..0a4a5b8 100644
  --- a/drivers/scsi/scsi.c
  +++ b/drivers/scsi/scsi.c
  @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host 
  *shost, gfp_t gfp_mask)
  list_del_init(cmd-list);
  }
  spin_unlock_irqrestore(shost-free_list_lock, flags);
  +
  +   if (cmd) {
  +   buf = cmd-sense_buffer;
  +   memset(cmd, 0, sizeof(*cmd));
  +   cmd-sense_buffer = buf;
  +   }
  +   } else {
  +   buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask);
 
 This is going to cause the enterprise some angst because ZONE_DMA can be
 very small, and the enterprise requrements for commands in flight very
 large.  I also think its unnecessary if we know the host isn't requiring
 ISA DMA.

Yes, I should have done properly.


  How about the below to fix this, it's based on the existing
 infrastructure for solving the very same problem.

Looks nice. Integrating sense_buffer_pool into struct
scsi_host_cmd_pool looks much better.


 James
 
 ---
 
 From e7ffbd81684779f5eb41d66d5f499b82af40e12b Mon Sep 17 00:00:00 2001
 From: James Bottomley [EMAIL PROTECTED]
 Date: Sun, 20 Jan 2008 09:28:24 -0600
 Subject: [SCSI] don't use __GFP_DMA for sense buffers if not required
 
 Only hosts which actually have ISA DMA requirements need sense buffers
 coming out of ZONE_DMA, so only use the __GFP_DMA flag for that case
 to avoid allocating this scarce resource if it's not necessary.
 
 Signed-off-by: James Bottomley [EMAIL PROTECTED]
 ---
  drivers/scsi/hosts.c |9 +
  drivers/scsi/scsi.c  |  106 +++--
  drivers/scsi/scsi_priv.h |2 -
  3 files changed, 46 insertions(+), 71 deletions(-)
 

(snip)

 @@ -310,7 +313,6 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
  {
   struct scsi_host_cmd_pool *pool;
   struct scsi_cmnd *cmd;
 - unsigned char *sense_buffer;
  
   spin_lock_init(shost-free_list_lock);
   INIT_LIST_HEAD(shost-free_list);
 @@ -322,10 +324,13 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
   mutex_lock(host_cmd_pool_mutex);
   pool = (shost-unchecked_isa_dma ? scsi_cmd_dma_pool : scsi_cmd_pool);
   if (!pool-users) {
 - pool-slab = kmem_cache_create(pool-name,
 - sizeof(struct scsi_cmnd), 0,
 - pool-slab_flags, NULL);
 - if (!pool-slab)
 + pool-cmd_slab = kmem_cache_create(pool-cmd_name,
 +sizeof(struct scsi_cmnd), 0,
 +pool-slab_flags, NULL);
 + pool-sense_slab = kmem_cache_create(pool-sense_name,
 +  SCSI_SENSE_BUFFERSIZE, 0,
 +  pool-slab_flags, NULL);
 + if (!pool-cmd_slab || !pool-sense_slab)
   goto fail;


If one of the above allocations fails, the allocated slab leaks?


diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 045e69d..1a9fba6 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -327,11 +327,16 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
pool-cmd_slab = kmem_cache_create(pool-cmd_name,
   sizeof(struct scsi_cmnd), 0,
   pool-slab_flags, NULL);
+   if (!pool-cmd_slab)
+   goto fail;
+
pool-sense_slab = kmem_cache_create(pool-sense_name,
 SCSI_SENSE_BUFFERSIZE, 0,
 pool-slab_flags, NULL);
-   if (!pool-cmd_slab || !pool-sense_slab)
+   if (!pool-sense_slab) {
+   kmem_cache_destroy(pool-cmd_slab);
goto fail;
+   }
}
 
pool-users++;
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-20 Thread FUJITA Tomonori
On Sun, 20 Jan 2008 21:54:21 +0200
Boaz Harrosh [EMAIL PROTECTED] wrote:

 On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote:
  On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
  On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] 
  wrote:
  this patch depends on the sg branch of the block tree
 
  James
 
  ---
  From: James Bottomley [EMAIL PROTECTED]
  Date: Tue, 15 Jan 2008 11:11:46 -0600
  Subject: remove use_sg_chaining
 
  With the sg table code, every SCSI driver is now either chain capable
  or broken, so there's no need to have a check in the host template.
 
  Also tidy up the code by moving the scatterlist size defines into the
  SCSI includes and permit the last entry of the scatterlist pools not
  to be a power of two.
  ---
  I have a theoretical problem that BUGed me from the beginning.
 
  Could it happen that a memory critical IO, (that is needed to free
  memory), be collected into an sg-chained large IO, and the allocation 
  of the multiple sg-pool-allocations fail, thous dead locking on
  out-of-memory? Is there a mechanism in place that will split large IO's 
  into smaller chunks in the event of out-of-memory condition in prep_fn?
 
  Is it possible to call blk_rq_map_sg() with less then what is present
  at request to only map the starting portion?
  
  Obviously, that's why I was worrying about mempool size and default
  blocks a while ago.
  
  However, the deadlock only occurs if the device is swap or backing a
  filesystem with memory mapped files.  The use cases for this are really
  tapes and other entities that need huge buffers.  That's why we're
  keeping the system sector size at 1024 unless you alter it through sysfs
  (here gun, there foot ...)
  
  James
  
 
 OK Thanks for confirming my concern, In modern life with devices like
 iSCSI that have ~0 as it's max_sector, swapping over that should be considered
 and configured carefully. Once with pNFS over blocks/objects it should be 
 addressed.
 Perhaps with a FAIL_FAST semantics for users like pNFS to split up the 
 requests if they
 fail with out-of-memory.

As James pointed out, with networked backed device, the things are
much more complicated (I have no idea when such configuration will be
possible).

http://kerneltrap.org/Linux/Swap_Over_NFS
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html