Re: Performance of SCST versus STGT
On Jan 18, 2008 1:08 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] wrote: [ ... ] So, seems I understood your slides correctly: the more valuable data for our SCST SRP vs STGT iSER comparison should be on page 26 for 1 command read (~480MB/s, i.e. ~60% from Bart's result on the equivalent hardware). At least in my tests SCST performed significantly better than STGT. These tests were performed with the currently available implementations of SCST and STGT. Which performance improvements are possible for these projects (e.g. zero-copying), and by how much is it expected that these performance improvements will increase throughput and will decrease latency ? Bart. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: usb-storage-psc1350-v4.patch (was Linux scsi / usb-mass-storage and HP printer cardreader bug + fix)
James Bottomley wrote: On Mon, 2008-01-14 at 20:27 +0100, Hans de Goede wrote: James Bottomley wrote: Plus what is the rq-nr_sectors sdp-sector_size / 512 test supposed to be doing? that being true is supposed to be a guarantee of the block layer (and if something goes wrong there's a check for this lower down). It first is was just: rq-nr_sectors 1 Then I changed it to also do the right thing for 1024 and larger sector disks. The whole purpose is to not read the last sector in a larger then one sector read, so the amount of sectors gets reduced by one if (block + rq-nr_sectors == get_capacity(disk)) but we do want still want to be able to read the last sector by itself, so we must not reduce the no sectors to be read by one if it is already one. Yes, I know that. What I mean is the block subsystem sends reads and writes down in increments of the sector size. Checking if the current number of pending sectors is greater than the current block size is checking that guarantee. The current code already has a check in it to see if this guarantee is observed ... you don't need to check it again. I'm not checking for the guarantee, I'm checking that the read 1 realsize_sector, before decrementing the number of _512_bytes_sectors by realsize_sector, this check must be there, as after reading all but the last sector, another request will be send with 1 realsize_sector size, for which (block + rq-nr_sectors) == get_capacity(disk) will still hold true, in this case this_count must not be lowered, otherwise this_count will become 0 in this case and the last sector will never get read. I hope that clarifies why that code is there, if not can you tell how you would want the code to look by just giving the full if statement as it should be, I think we are somehow misunderstanding each other. Oh, right; it's rather than = ... sorry, yes that's fine. Ok, I got swamped @work this week so it took a while, but I've made a new seperate (scsi-sd only) patch with the cleanups as discussed. I'm sending this (to the full recipient list) in a new top level post titled: PATCH: scsi-sd-last-sector-bug-flag.patch Regards, Hans - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATCH: scsi-sd-last-sector-bug-flag.patch
Hi all, This patch adds a new scsi_device flag for devices which contain a bug where the device crashes when the last sector is read in a larger then 1 sector read. This is for example the case with sdcards in the HP PSC1350 printer cardreader and in the HP PSC1610 printer cardreader. Signed-off-by: Hans de Goede [EMAIL PROTECTED] Regards, Hans This patch adds a new scsi_device flag for devices which contain a bug where the device crashes when the last sector is read in a larger then 1 sector read. This is for example the case with sdcards in the HP PSC1350 printer cardreader and in the HP PSC1610 printer cardreader. Signed-off-by: Hans de Goede [EMAIL PROTECTED] diff -up vanilla-2.6.24-rc7/include/scsi/scsi_device.h.psc1350 vanilla-2.6.24-rc7/include/scsi/scsi_device.h --- vanilla-2.6.24-rc7/include/scsi/scsi_device.h.psc1350 2008-01-11 19:40:31.0 +0100 +++ vanilla-2.6.24-rc7/include/scsi/scsi_device.h 2008-01-11 19:40:48.0 +0100 @@ -142,6 +142,7 @@ struct scsi_device { unsigned fix_capacity:1; /* READ_CAPACITY is too high by 1 */ unsigned guess_capacity:1; /* READ_CAPACITY might be too high by 1 */ unsigned retry_hwerror:1; /* Retry HARDWARE_ERROR */ + unsigned last_sector_bug:1; /* Always read last sector in a 1 sector read */ DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */ struct list_head event_list; /* asserted events */ diff -up vanilla-2.6.24-rc7/drivers/scsi/sd.c.psc1350 vanilla-2.6.24-rc7/drivers/scsi/sd.c --- vanilla-2.6.24-rc7/drivers/scsi/sd.c.psc1350 2008-01-11 19:55:43.0 +0100 +++ vanilla-2.6.24-rc7/drivers/scsi/sd.c 2008-01-20 10:49:17.0 +0100 @@ -395,6 +395,16 @@ static int sd_prep_fn(struct request_que goto out; } + /* + * Some devices (some sdcards for one) don't like it if the last sector + * gets read in a larger then 1 sector read. + */ + if (unlikely(sdp-last_sector_bug + rq-nr_sectors sdp-sector_size / 512 + block + this_count == get_capacity(disk))) { + this_count -= sdp-sector_size / 512; + } + SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, block=%llu\n, (unsigned long long)block));
PATCH: usb-storage-set-last-sector-bug-flag.patch
Hi all, This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Note that this patch depends on: PATCH: scsi-sd-last-sector-bug-flag.patch Which actually adds this flag to the scsi subsystem. Signed-off-by: Hans de Goede [EMAIL PROTECTED] Regards, Hans This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Note that this patch depends on: PATCH: scsi-sd-last-sector-bug-flag.patch Which actually adds this flag to the scsi subsystem. Signed-off-by: Hans de Goede [EMAIL PROTECTED] --- vanilla-2.6.24-rc7/drivers/usb/storage/scsiglue.c.psc1350 2008-01-11 19:40:31.0 +0100 +++ vanilla-2.6.24-rc7/drivers/usb/storage/scsiglue.c 2008-01-20 11:18:38.0 +0100 @@ -187,6 +187,10 @@ static int slave_configure(struct scsi_d * automatically, requiring a START-STOP UNIT command. */ sdev-allow_restart = 1; + /* Some USB cardreaders have trouble reading an sdcard's last + * sector in a larger then 1 sector read, since the performance + * impact is negible we set this flag for all USB disks */ + sdev-last_sector_bug = 1; } else { /* Non-disk-type devices don't need to blacklist any pages
Re: [PATCH v3] use dynamically allocated sense buffer
On Wed, Jan 16 2008 at 6:32 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote: This is the third version of: http://marc.info/?l=linux-scsim=120038907123706w=2 The changes from the second version are: - Fixed the memory leak bug that Boaz pointed out. shots-backup_sense_buffer has gone. One sense buffer is allocated per host and it's always attached to the scsi_cmnd linked with freelist (backup command). - Calling scsi_setup_command_sense_buffer in scsi_host_alloc instead of scsi_add_host. The first version tries to allocates as many buffers as shost-can_queue so scsi_setup_command_sense_buffer is called in scsi_add_host. But, it's not the case any more, so this calls scsi_setup_command_sense_buffer in scsi_host_alloc like scsi_setup_command_freelist. I did the same tests against this patch (though I knew there were not the performnace differences): dynamic sense buf (slub) | 483.5 MB/s IOPS 123772.7/s For convenience, here are the previous results: scsi-misc (slub) | 486.9 MB/s IOPS 124652.9/s dynamic sense buf (slub) | 483.2 MB/s IOPS 123704.1/s I put the results and the kernel configuration: http://www.kernel.org/pub/linux/kernel/people/tomo/sense/ = From: FUJITA Tomonori [EMAIL PROTECTED] Subject: [PATCH] use dynamically allocated sense buffer This removes static array sense_buffer in scsi_cmnd and uses dynamically allocated sense_buffer (with GFP_DMA). The reason for doing this is that some architectures need cacheline aligned buffer for DMA: http://lkml.org/lkml/2007/11/19/2 The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves these issues. __scsi_get_command allocates sense_buffer via kmem_cache_alloc and attaches it to a scsi_cmnd so everything just work as before. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] --- drivers/scsi/hosts.c |9 ++- drivers/scsi/scsi.c | 61 - drivers/scsi/scsi_priv.h |2 + include/scsi/scsi_cmnd.h |2 +- 4 files changed, 70 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 9a10b43..f5d3fbb 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -268,6 +268,7 @@ static void scsi_host_dev_release(struct device *dev) } scsi_destroy_command_freelist(shost); + scsi_destroy_command_sense_buffer(shost); if (shost-bqt) blk_free_tags(shost-bqt); @@ -372,10 +373,14 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) else shost-dma_boundary = 0x; - rval = scsi_setup_command_freelist(shost); + rval = scsi_setup_command_sense_buffer(shost); if (rval) goto fail_kfree; + rval = scsi_setup_command_freelist(shost); + if (rval) + goto fail_destroy_sense; + device_initialize(shost-shost_gendev); snprintf(shost-shost_gendev.bus_id, BUS_ID_SIZE, host%d, shost-host_no); @@ -399,6 +404,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) fail_destroy_freelist: scsi_destroy_command_freelist(shost); + fail_destroy_sense: + scsi_destroy_command_sense_buffer(shost); fail_kfree: kfree(shost); return NULL; diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 54ff611..0a4a5b8 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -161,6 +161,9 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = { static DEFINE_MUTEX(host_cmd_pool_mutex); +static struct kmem_cache *sense_buffer_slab; +static int sense_buffer_slab_users; + /** * __scsi_get_command - Allocate a struct scsi_cmnd * @shost: host to transmit command @@ -172,6 +175,7 @@ static DEFINE_MUTEX(host_cmd_pool_mutex); struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask) { struct scsi_cmnd *cmd; + unsigned char *buf; cmd = kmem_cache_alloc(shost-cmd_pool-slab, gfp_mask | shost-cmd_pool-gfp_mask); @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask) list_del_init(cmd-list); } spin_unlock_irqrestore(shost-free_list_lock, flags); + + if (cmd) { + buf = cmd-sense_buffer; + memset(cmd, 0, sizeof(*cmd)); + cmd-sense_buffer = buf; + } + } else { + buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask); + if (likely(buf)) { + memset(cmd, 0, sizeof(*cmd)); + cmd-sense_buffer = buf; + } else { +
Re: [PATCH v3] use dynamically allocated sense buffer
On Wed, Jan 16, 2008 at 01:32:17PM +0900, FUJITA Tomonori wrote: This removes static array sense_buffer in scsi_cmnd and uses dynamically allocated sense_buffer (with GFP_DMA). The reason for doing this is that some architectures need cacheline aligned buffer for DMA: http://lkml.org/lkml/2007/11/19/2 The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves these issues. __scsi_get_command allocates sense_buffer via kmem_cache_alloc and attaches it to a scsi_cmnd so everything just work as before. I think this is fine for the moment. Longer-term, I want to allow low-level drivers to allocate the sense_buffer themselves so they can DMA directly into it (ie grown-up dma mapping, rather than this quaint x86 __GFP_DMA). This patch doesn't get us any closer to that, but it doesn't get us further away from it either. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: scsi-sd-last-sector-bug-flag.patch
On Sun, 2008-01-20 at 11:12 +0100, Hans de Goede wrote: Hi all, This patch adds a new scsi_device flag for devices which contain a bug where the device crashes when the last sector is read in a larger then 1 sector read. This is for example the case with sdcards in the HP PSC1350 printer cardreader and in the HP PSC1610 printer cardreader. Signed-off-by: Hans de Goede [EMAIL PROTECTED] This could have done with running through checkpatch.pl: ERROR: trailing whitespace #28: FILE: drivers/scsi/sd.c:398: +^I/* $ ERROR: trailing whitespace #32: FILE: drivers/scsi/sd.c:402: +^Iif (unlikely(sdp-last_sector_bug $ WARNING: braces {} are not necessary for single statement blocks #32: FILE: drivers/scsi/sd.c:402: + if (unlikely(sdp-last_sector_bug + rq-nr_sectors sdp-sector_size / 512 + block + this_count == get_capacity(disk))) { + this_count -= sdp-sector_size / 512; + } ERROR: use tabs not spaces #34: FILE: drivers/scsi/sd.c:404: + ^Iblock + this_count == get_capacity(disk))) {$ WARNING: line over 80 characters #49: FILE: include/scsi/scsi_device.h:142: + unsigned last_sector_bug:1; /* Always read last sector in a 1 sector read */ total: 3 errors, 2 warnings, 23 lines checked I've fixed all of these up. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] use dynamically allocated sense buffer
On Wed, 2008-01-16 at 13:32 +0900, FUJITA Tomonori wrote: This is the third version of: http://marc.info/?l=linux-scsim=120038907123706w=2 [...] diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 54ff611..0a4a5b8 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask) list_del_init(cmd-list); } spin_unlock_irqrestore(shost-free_list_lock, flags); + + if (cmd) { + buf = cmd-sense_buffer; + memset(cmd, 0, sizeof(*cmd)); + cmd-sense_buffer = buf; + } + } else { + buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask); This is going to cause the enterprise some angst because ZONE_DMA can be very small, and the enterprise requrements for commands in flight very large. I also think its unnecessary if we know the host isn't requiring ISA DMA. How about the below to fix this, it's based on the existing infrastructure for solving the very same problem. James --- From e7ffbd81684779f5eb41d66d5f499b82af40e12b Mon Sep 17 00:00:00 2001 From: James Bottomley [EMAIL PROTECTED] Date: Sun, 20 Jan 2008 09:28:24 -0600 Subject: [SCSI] don't use __GFP_DMA for sense buffers if not required Only hosts which actually have ISA DMA requirements need sense buffers coming out of ZONE_DMA, so only use the __GFP_DMA flag for that case to avoid allocating this scarce resource if it's not necessary. Signed-off-by: James Bottomley [EMAIL PROTECTED] --- drivers/scsi/hosts.c |9 + drivers/scsi/scsi.c | 106 +++-- drivers/scsi/scsi_priv.h |2 - 3 files changed, 46 insertions(+), 71 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index f5d3fbb..9a10b43 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -268,7 +268,6 @@ static void scsi_host_dev_release(struct device *dev) } scsi_destroy_command_freelist(shost); - scsi_destroy_command_sense_buffer(shost); if (shost-bqt) blk_free_tags(shost-bqt); @@ -373,13 +372,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) else shost-dma_boundary = 0x; - rval = scsi_setup_command_sense_buffer(shost); - if (rval) - goto fail_kfree; - rval = scsi_setup_command_freelist(shost); if (rval) - goto fail_destroy_sense; + goto fail_kfree; device_initialize(shost-shost_gendev); snprintf(shost-shost_gendev.bus_id, BUS_ID_SIZE, host%d, @@ -404,8 +399,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) fail_destroy_freelist: scsi_destroy_command_freelist(shost); - fail_destroy_sense: - scsi_destroy_command_sense_buffer(shost); fail_kfree: kfree(shost); return NULL; diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 0a4a5b8..045e69d 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -141,29 +141,30 @@ const char * scsi_device_type(unsigned type) EXPORT_SYMBOL(scsi_device_type); struct scsi_host_cmd_pool { - struct kmem_cache *slab; - unsigned intusers; - char*name; - unsigned intslab_flags; - gfp_t gfp_mask; + struct kmem_cache *cmd_slab; + struct kmem_cache *sense_slab; + unsigned intusers; + char*cmd_name; + char*sense_name; + unsigned intslab_flags; + gfp_t gfp_mask; }; static struct scsi_host_cmd_pool scsi_cmd_pool = { - .name = scsi_cmd_cache, + .cmd_name = scsi_cmd_cache, + .sense_name = scsi_sense_cache, .slab_flags = SLAB_HWCACHE_ALIGN, }; static struct scsi_host_cmd_pool scsi_cmd_dma_pool = { - .name = scsi_cmd_cache(DMA), + .cmd_name = scsi_cmd_cache(DMA), + .sense_name = scsi_sense_cache(DMA), .slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA, .gfp_mask = __GFP_DMA, }; static DEFINE_MUTEX(host_cmd_pool_mutex); -static struct kmem_cache *sense_buffer_slab; -static int sense_buffer_slab_users; - /** * __scsi_get_command - Allocate a struct scsi_cmnd * @shost: host to transmit command @@ -177,8 +178,8 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask) struct scsi_cmnd *cmd; unsigned char *buf; - cmd = kmem_cache_alloc(shost-cmd_pool-slab, - gfp_mask | shost-cmd_pool-gfp_mask); + cmd = kmem_cache_alloc(shost-cmd_pool-cmd_slab, + gfp_mask | shost-cmd_pool-gfp_mask); if
Re: [SCSI] scsi.h: add macro for enclosure bit of inquiry data
James Bottomley wrote: The macro tells us whether the device is (or contains) an enclosure device. ... +static inline int scsi_device_enclosure(struct scsi_device *sdev) +{ + return sdev-inquiry[6] (16); +} Perhaps call it scsi_device_is_enclosure() to better reflect the nature of this function. Or if it is an accessor to inquiry data to you, maybe call it scsi_device_inquiry_encserv() or scsi_device_inquiry_enclosure_services() or sdev_to_inquiry_encserv() or sdev_to_inquiry_enclosure_services(). Alas neither of this fits with the existing similar functions in scsi_device.h which don't have expressive names. -- Stefan Richter -=-==--- ---= =-=-- http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) James OK Thanks for confirming my concern, In modern life with devices like iSCSI that have ~0 as it's max_sector, swapping over that should be considered and configured carefully. Once with pNFS over blocks/objects it should be addressed. Perhaps with a FAIL_FAST semantics for users like pNFS to split up the requests if they fail with out-of-memory. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. Thats a grate Idea. I will Q it on my todo list. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008, Jens Axboe wrote: On Sun, Jan 20 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. Thats a grate Idea. I will Q it on my todo list. Thanks ok good, thanks :-) btw, the above is full of typos, my apologies. it should read requeue if residual, but I guess you already guessed as much. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) James OK Thanks for confirming my concern, In modern life with devices like iSCSI that have ~0 as it's max_sector, swapping over that should be considered and configured carefully. Once with pNFS over blocks/objects it should be addressed. Perhaps with a FAIL_FAST semantics for users like pNFS to split up the requests if they fail with out-of-memory. I'll have to disagree again, you can't expect users to know these sorts of things (sorry your system deadlocked, you should have known not to increase max_sectors_kb for something you swap on). Especially when handling it correctly in scsi_init_io() is a few lines of change. No excuse for not doing this correctly. At least for blk_fs_request() requests, for blk_pc_request() failing is the only option. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, Jan 20 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. Thats a grate Idea. I will Q it on my todo list. Thanks ok good, thanks :-) -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, 2008-01-20 at 21:54 +0200, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) James OK Thanks for confirming my concern, In modern life with devices like iSCSI that have ~0 as it's max_sector, swapping over that should be considered and configured carefully. Once with pNFS over blocks/objects it should be addressed. Perhaps with a FAIL_FAST semantics for users like pNFS to split up the requests if they fail with out-of-memory. Well, swap over networked backed devices is an order of magnitude worse of a problem. However, the block layer doesn't let you set max_sectors over 1024; even when iscsi requests ~0 it gets 1024 but the user is allowed to raise this via sysfs. (That's the difference between max_sectors [currently operating parameter] and max_hw_sectors [passed in maximum]) James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: scsi-sd-last-sector-bug-flag.patch
On Sun, Jan 20, 2008 at 11:12:26AM +0100, Hans de Goede wrote: Hi all, This patch adds a new scsi_device flag for devices which contain a bug where the device crashes when the last sector is read in a larger then 1 sector read. This is for example the case with sdcards in the HP PSC1350 printer cardreader and in the HP PSC1610 printer cardreader. Wait, we already handle this in the usb-storage driver, why are you putting this in the scsi core now? confused, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: usb-storage-set-last-sector-bug-flag.patch
On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote: Hi all, This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Oh great, now my working just fine USB devices, which happen to have data in the last sector, suddenly stop working. That's not acceptable :( greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: usb-storage-set-last-sector-bug-flag.patch
James Bottomley wrote: On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote: On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote: Hi all, This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Oh great, now my working just fine USB devices, which happen to have data in the last sector, suddenly stop working. That's not acceptable :( I don't see how this will happen, might you not be confusing this change (which allows access to the last sector, just insists that it be accessed by a single sector read) with US_FL_FIX_CAPACITY which is for devices that report having one more sectors than they actually have and therefore adjusts the access limits down by one? Let me try to explain some more, the scsi-sd patch, which goes hand in hand with this adds a last_sector_bug flag, which is indeed a different flag form the fix_capacity flag. Both deal with with what are (in case of the last_sector_bug flag probably) off by one bugs. The fix_capacity flag is for devices which report their last sector is N (with sectors numbered from 0 - N) but in reality / they mean they have N sectors, so their last sector really is N - 1. The last_sector_bug flag is for a bug (sofar only seen in HP multifunction printers with cardreader when using an sdcard) where the reader will cease to function (returns error codes in response to each and every command) after one has attempted to read the last sector in a read larger then 1 sector. To be clear an example lets say an example disk has 256 sectors, numbered 0 - 255. Then the following reads will all cause the reader to go into borked mode: 16 sectors starting at 240 8 sectors starting at 248 2 sectors starting at 254 Yet the last sector can still be read without problems the following read: 1 sector starting at 255 So what the scsi-sd part of these 2 patches does it adds a last_sector_bug flag, which when set will cause the layer to split a read like this: 16 sectors starting at 240 Into: 15 sectors starting at 240 1 sector starting at 255 Since reading the last sector is a rare occurence (but one which does happen every time when determining the partition table, triggering the bug on every card insert). and since there are a lot of different HP printer models ( and cheap usb card readers are notoriously buggy so other cardreaders might be affected too), Matthew Dharm (the usb-storage maintainer) thought it best to enable this for all devices. Regards, Hans - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: scsi-sd-last-sector-bug-flag.patch
Greg KH wrote: On Sun, Jan 20, 2008 at 11:12:26AM +0100, Hans de Goede wrote: Hi all, This patch adds a new scsi_device flag for devices which contain a bug where the device crashes when the last sector is read in a larger then 1 sector read. This is for example the case with sdcards in the HP PSC1350 printer cardreader and in the HP PSC1610 printer cardreader. Wait, we already handle this in the usb-storage driver, why are you putting this in the scsi core now? No we don't, I've send patches for this to the usb-storage driver but they were refused as they modified scsi commands in flight, the consensus was that this should be done in the scsi layer, hence this patch. confused, I've noticed, esp. with regards to your second mail to which I will reply next. There has been a lot of discussion about this ending in this (perfectly fine) patch, which is much better then my original hack if I may add that. Regards, Hans - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, 2008-01-20 at 21:01 +0100, Jens Axboe wrote: On Sun, Jan 20 2008, Jens Axboe wrote: On Sun, Jan 20 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. Thats a grate Idea. I will Q it on my todo list. Thanks ok good, thanks :-) btw, the above is full of typos, my apologies. it should read requeue if residual, but I guess you already guessed as much. Something like ... It looks to me like it would make sense to have something like a BLKPREP_SGALLOCFAIL return so the block layer can do this for us ... Alternatively, we'll have to find a way of adjusting the sector count as it goes into the ULD prep functions. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH: usb-storage-set-last-sector-bug-flag.patch
On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote: On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote: Hi all, This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Oh great, now my working just fine USB devices, which happen to have data in the last sector, suddenly stop working. That's not acceptable :( I don't see how this will happen, might you not be confusing this change (which allows access to the last sector, just insists that it be accessed by a single sector read) with US_FL_FIX_CAPACITY which is for devices that report having one more sectors than they actually have and therefore adjusts the access limits down by one? James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] PATCH: usb-storage-set-last-sector-bug-flag.patch
Le dimanche 20 janvier 2008 à 15:03 -0600, James Bottomley a écrit : On Sun, 2008-01-20 at 12:56 -0800, Greg KH wrote: On Sun, Jan 20, 2008 at 11:27:29AM +0100, Hans de Goede wrote: Hi all, This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Oh great, now my working just fine USB devices, which happen to have data in the last sector, suddenly stop working. That's not acceptable :( I don't see how this will happen, might you not be confusing this change (which allows access to the last sector, just insists that it be accessed by a single sector read) with US_FL_FIX_CAPACITY which is for devices that report having one more sectors than they actually have and therefore adjusts the access limits down by one? Well, i was the one suggesting more than 2 devices might be impacted (i'm absolutely not sure about this, but it could be). It seems caused by a common error when using 0 as a base index. I don't like loosing performance for broken devices, but i'm not against keeping that patch for all devices. 1) As the last sectors are read when the card is inserted, it results in not working at all devices : Users may think they are just not supported, and won't report any bug. See how Hans worked hard to find 3 other cases in fedora, ubuntu, etc forums ! 2) It should not break in the other cases, IIUC, it just splits the read in two. 3) It's just about the last sector, so any issue should only be greater timing when using realtime perhaps ? Whatever, this patch, or an other form of it, is needed (because of 1) ). If it is a default, an option or dedicated to a limited set of devices must be chosen. I hope you will soon find this solution. Best regards, Guillaume B. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SCSI] scsi.h: add macro for enclosure bit of inquiry data
On Sun, 2008-01-20 at 18:44 +0100, Stefan Richter wrote: James Bottomley wrote: The macro tells us whether the device is (or contains) an enclosure device. ... +static inline int scsi_device_enclosure(struct scsi_device *sdev) +{ + return sdev-inquiry[6] (16); +} Perhaps call it scsi_device_is_enclosure() to better reflect the nature of this function. Or if it is an accessor to inquiry data to you, maybe call it scsi_device_inquiry_encserv() or scsi_device_inquiry_enclosure_services() or sdev_to_inquiry_encserv() or sdev_to_inquiry_enclosure_services(). Alas neither of this fits with the existing similar functions in scsi_device.h which don't have expressive names. Right .. that's the problem. Being potentially clearer in naming but at odds with what's currently in the file starts to add to the confusion about the other names in there. I chose the name primarily because it's the same form as all the others. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Marvell 6440 SAS/SATA driver (draft)
Hello James: I read your comments. .target_alloc = sas_target_alloc, .slave_configure= sas_slave_configure, .slave_destroy = sas_slave_destroy, .change_queue_depth = sas_change_queue_depth, .change_queue_type = sas_change_queue_type, .bios_param = sas_bios_param, - .can_queue = 1, + .can_queue = 30, I don't think you want to do this. It starts sending multiple commands at once. The design use of libsas is to start off at 1 and then up the limit in slave configure once we know what type of device we're dealing with. The current sas_slave_configure has a limitation in that the depth is hard coded to 32, so if you need it smaller, we'll have to find a way of allowing this? Is 30 your max queue depth? Yes , I may not do this. But I need to set sas_ha_struct.lldd_queue_size to suitable value. Because sas_queue sends multiple commands according to lldd_queue_size before calling lldd_execute_task function which supports queue depth , 30 is suitable. .cmd_per_lun= 1, .this_id= -1, - .sg_tablesize = SG_ALL, - .max_sectors= SCSI_DEFAULT_MAX_SECTORS, - .use_clustering = ENABLE_CLUSTERING, + .sg_tablesize = 32, 32 looks a little small. My reading of the code is that the PRD table has to fit with the command header, OAF area and status area into an 8k segment of memory, so at 16 bytes per PRD entry, it looks like a page worth at least (256) isn't unreasonable. You should probably have some type of macro for this though. + .max_sectors= (128*1024)9, Yes , It's demo code . And I need to test device to find a good value according to performance and reliability. + .use_clustering = DISABLE_CLUSTERING, I think this is wrong ... there should be no modern device that disables clustering (otherwise they fall over badly on iommu platforms). .eh_device_reset_handler= sas_eh_device_reset_handler, .eh_bus_reset_handler = sas_eh_bus_reset_handler, - .slave_alloc= sas_slave_alloc, Please don't remove this ... it's a standard callback, and it's required for the day you support SATA. You are right. I forgot to recover these codes when I debuged. And Driver has support for SATA devices . I will commit the patches in this weeks. Thank you for your help. Ke Wei. On Jan 19, 2008 2:53 AM, James Bottomley [EMAIL PROTECTED] wrote: On Thu, 2008-01-10 at 14:53 +0800, Ke Wei wrote: The 88SE6440 driver : The driver is based on bare code from Jeff Garzik. And it can work under linux kernel 2.6.23. By far, Can discover and find SAS HDD, but SATA is currently unsupported. Command queue depth can be above 1. Most error handling, and some phy handling code is notably missing. contains the following updates: --- mvsas_orig.c 2007-12-06 19:21:32.0 -0500 +++ mvsas.c 2008-01-09 04:53:14.0 -0500 [...] +#define MVS_BIT(x) (1L (x)) + +#define PORT_TYPE_SATAMVS_BIT(0) +#define PORT_TYPE_SAS MVS_BIT(1) + +#define READ_PORT_CONFIG_DATA(i) \ + ((i3)?mr32(P4_CFG_DATA+(i-4)*8):mr32(P0_CFG_DATA+i*8)) +#define WRITE_PORT_CONFIG_DATA(i,tmp) \ + {if(i3)mw32(P4_CFG_DATA+(i-4)*8,tmp);else mw32(P0_CFG_DATA+i*8,tmp);} +#define WRITE_PORT_CONFIG_ADDR(i,tmp) \ + {if(i3)mw32(P4_CFG_ADDR+(i-4)*8,tmp);else mw32(P0_CFG_ADDR+i*8,tmp);} + +#define READ_PORT_PHY_CONTROL(i) \ + ((i3)?mr32(P4_SER_CTLSTAT+(i-4)*4):mr32(P0_SER_CTLSTAT+i*4)) +#define WRITE_PORT_PHY_CONTROL(i,tmp) \ + {if(i3)mw32(P4_SER_CTLSTAT+(i-4)*4,tmp);else mw32(P0_SER_CTLSTAT+i*4,tmp);} This is an example of where you mailer has broken a line (which causes the patch not to apply). + +#define READ_PORT_VSR_DATA(i) \ + ((i3)?mr32(P4_VSR_DATA+(i-4)*8):mr32(P0_VSR_DATA+i*8)) +#define WRITE_PORT_VSR_DATA(i,tmp) \ + {if(i3)mw32(P4_VSR_DATA+(i-4)*8,tmp);else mw32(P0_VSR_DATA+i*8,tmp);} +#define WRITE_PORT_VSR_ADDR(i,tmp) \ + {if(i3)mw32(P4_VSR_ADDR+(i-4)*8,tmp);else mw32(P0_VSR_ADDR+i*8,tmp);} + +#define READ_PORT_IRQ_STAT(i) \ + ((i3)?mr32(P4_INT_STAT+(i-4)*8):mr32(P0_INT_STAT+i*8)) +#define WRITE_PORT_IRQ_STAT(i,tmp) \ + {if(i3)mw32(P4_INT_STAT+(i-4)*8,tmp);else mw32(P0_INT_STAT+i*8,tmp);} +#define READ_PORT_IRQ_MASK(i) \ + ((i3)?mr32(P4_INT_MASK+(i-4)*8):mr32(P0_INT_MASK+i*8)) +#define WRITE_PORT_IRQ_MASK(i,tmp) \ + {if(i3)mw32(P4_INT_MASK+(i-4)*8,tmp);else mw32(P0_INT_MASK+i*8,tmp);} + /* driver compile-time configuration */ enum driver_configuration { - MVS_TX_RING_SZ = 1024, /* TX ring size (12-bit) */ + MVS_TX_RING_SZ = 512, /* TX ring size (12-bit) */ MVS_RX_RING_SZ = 1024, /* RX ring size
Re: [PATCH v3] use dynamically allocated sense buffer
On Sun, 20 Jan 2008 09:40:11 -0700 Matthew Wilcox [EMAIL PROTECTED] wrote: On Wed, Jan 16, 2008 at 01:32:17PM +0900, FUJITA Tomonori wrote: This removes static array sense_buffer in scsi_cmnd and uses dynamically allocated sense_buffer (with GFP_DMA). The reason for doing this is that some architectures need cacheline aligned buffer for DMA: http://lkml.org/lkml/2007/11/19/2 The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves these issues. __scsi_get_command allocates sense_buffer via kmem_cache_alloc and attaches it to a scsi_cmnd so everything just work as before. I think this is fine for the moment. Longer-term, I want to allow low-level drivers to allocate the sense_buffer themselves so they can DMA directly into it (ie grown-up dma mapping, rather than this quaint x86 __GFP_DMA). This patch doesn't get Yeah, I think that the approach is one of candidates. If we go with it, I think that the major issue is that LLDs don't know when they can reclaim sense_buffer from scsi-ml; scsi-ml uses sense_buffer after scmd-scsi_done. us any closer to that, but it doesn't get us further away from it either. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] use dynamically allocated sense buffer
On Sun, 20 Jan 2008 10:36:56 -0600 James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-01-16 at 13:32 +0900, FUJITA Tomonori wrote: This is the third version of: http://marc.info/?l=linux-scsim=120038907123706w=2 [...] diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 54ff611..0a4a5b8 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -186,6 +190,21 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask) list_del_init(cmd-list); } spin_unlock_irqrestore(shost-free_list_lock, flags); + + if (cmd) { + buf = cmd-sense_buffer; + memset(cmd, 0, sizeof(*cmd)); + cmd-sense_buffer = buf; + } + } else { + buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask); This is going to cause the enterprise some angst because ZONE_DMA can be very small, and the enterprise requrements for commands in flight very large. I also think its unnecessary if we know the host isn't requiring ISA DMA. Yes, I should have done properly. How about the below to fix this, it's based on the existing infrastructure for solving the very same problem. Looks nice. Integrating sense_buffer_pool into struct scsi_host_cmd_pool looks much better. James --- From e7ffbd81684779f5eb41d66d5f499b82af40e12b Mon Sep 17 00:00:00 2001 From: James Bottomley [EMAIL PROTECTED] Date: Sun, 20 Jan 2008 09:28:24 -0600 Subject: [SCSI] don't use __GFP_DMA for sense buffers if not required Only hosts which actually have ISA DMA requirements need sense buffers coming out of ZONE_DMA, so only use the __GFP_DMA flag for that case to avoid allocating this scarce resource if it's not necessary. Signed-off-by: James Bottomley [EMAIL PROTECTED] --- drivers/scsi/hosts.c |9 + drivers/scsi/scsi.c | 106 +++-- drivers/scsi/scsi_priv.h |2 - 3 files changed, 46 insertions(+), 71 deletions(-) (snip) @@ -310,7 +313,6 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost) { struct scsi_host_cmd_pool *pool; struct scsi_cmnd *cmd; - unsigned char *sense_buffer; spin_lock_init(shost-free_list_lock); INIT_LIST_HEAD(shost-free_list); @@ -322,10 +324,13 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost) mutex_lock(host_cmd_pool_mutex); pool = (shost-unchecked_isa_dma ? scsi_cmd_dma_pool : scsi_cmd_pool); if (!pool-users) { - pool-slab = kmem_cache_create(pool-name, - sizeof(struct scsi_cmnd), 0, - pool-slab_flags, NULL); - if (!pool-slab) + pool-cmd_slab = kmem_cache_create(pool-cmd_name, +sizeof(struct scsi_cmnd), 0, +pool-slab_flags, NULL); + pool-sense_slab = kmem_cache_create(pool-sense_name, + SCSI_SENSE_BUFFERSIZE, 0, + pool-slab_flags, NULL); + if (!pool-cmd_slab || !pool-sense_slab) goto fail; If one of the above allocations fails, the allocated slab leaks? diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 045e69d..1a9fba6 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -327,11 +327,16 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost) pool-cmd_slab = kmem_cache_create(pool-cmd_name, sizeof(struct scsi_cmnd), 0, pool-slab_flags, NULL); + if (!pool-cmd_slab) + goto fail; + pool-sense_slab = kmem_cache_create(pool-sense_name, SCSI_SENSE_BUFFERSIZE, 0, pool-slab_flags, NULL); - if (!pool-cmd_slab || !pool-sense_slab) + if (!pool-sense_slab) { + kmem_cache_destroy(pool-cmd_slab); goto fail; + } } pool-users++; - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Sun, 20 Jan 2008 21:54:21 +0200 Boaz Harrosh [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008 at 21:24 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) James OK Thanks for confirming my concern, In modern life with devices like iSCSI that have ~0 as it's max_sector, swapping over that should be considered and configured carefully. Once with pNFS over blocks/objects it should be addressed. Perhaps with a FAIL_FAST semantics for users like pNFS to split up the requests if they fail with out-of-memory. As James pointed out, with networked backed device, the things are much more complicated (I have no idea when such configuration will be possible). http://kerneltrap.org/Linux/Swap_Over_NFS - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html