Re: module: fix module_refcount() return when running in a module exit routine

2015-01-23 Thread Rusty Russell
James Bottomley  writes:
> On Fri, 2015-01-23 at 05:17 -0800, Christoph Hellwig wrote:
>> On Fri, Jan 23, 2015 at 01:24:15PM +1030, Rusty Russell wrote:
>> > The correct fix is to turn try_module_get() into __module_get(), and
>> > always do the module_put().
>> 
>> Is this really safe?  __module_get sais it needs a non-zero refcount
>> to start with, but scsi_device_get is the only thing every incrementing
>> the refcount on the module pointer in the scsi host template, so
>> exactly that case can happen easily.  There's not assert actually
>> hardcoding the assumption, so I'm not sure if requirement the comment
>> really just is nice to have or a strong requirement.

Yes, as James says, my patch makes it no worse.

The "someone else grabs a reference" can be fixed in two ways.  James'
alternate path as per below, or by waiting in the exit function.  The
latter is kind of annoying, which is why we do the whole atomic
deregistration for modules...

Cheers,
Rusty.

> The comment was just documenting the old status quo: the
> try_module_get() was expected to fail if called within the host module
> remove path.  If you look at the flow, we use the refcounts on the
> actual scsi device to measure.  If they fail we know we have a problem.
> The module stuff is really only slaved to our master refcount on the
> device.  It's purpose is to prevent an inopportune rmmod.  Our default
> operating state is zero references on everything, so the user can just
> do rmmod ... obviously if the device is open or mounted then we hold
> both the device and the module.
>
> To that point, Rusty's patch just keeps the status quo in the new
> module_refcount() environment, so it's the quick bandaid.
>
> I think the use case you're worrying about is what happens if someone
> tries to use a device after module removal begins executing but before
> the device has been deleted (say by opening it)?  We'll exit the device
> removal routines and then kill the module, because after the module code
> gets to ->exit(), nothing re-checks the module refcount, so the host
> module will get free'd while we're still using the device.
>
> The fix for this seems to be to differentiate between special uses of
> scsi_get_device, which are allowed to get the device in the module exit
> routines and ordinary uses which aren't.  Something like this? (the
> patch isn't complete, but you get the idea).
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 08c90a7..31ba254 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -965,6 +965,15 @@ int scsi_report_opcode(struct scsi_device *sdev, 
> unsigned char *buffer,
>  }
>  EXPORT_SYMBOL(scsi_report_opcode);
>  
> +static int __scsi_device_get_common(struct scsi_device *sdev)
> +{
> + if (sdev->sdev_state == SDEV_DEL)
> + return -ENXIO;
> + if (!get_device(&sdev->sdev_gendev))
> + return -ENXIO;
> + return 0;
> +}
> +
>  /**
>   * scsi_device_get  -  get an additional reference to a scsi_device
>   * @sdev:device to get a reference to
> @@ -975,19 +984,45 @@ EXPORT_SYMBOL(scsi_report_opcode);
>   */
>  int scsi_device_get(struct scsi_device *sdev)
>  {
> - if (sdev->sdev_state == SDEV_DEL)
> - return -ENXIO;
> - if (!get_device(&sdev->sdev_gendev))
> - return -ENXIO;
> - /* We can fail this if we're doing SCSI operations
> -  * from module exit (like cache flush) */
> - try_module_get(sdev->host->hostt->module);
> + int ret;
>  
> - return 0;
> + ret = __scsi_device_get_common(sdev);
> + if (ret)
> + return ret;
> +
> + ret = try_module_get(sdev->host->hostt->module);
> +
> + if (ret)
> + put_device(&sdev->sdev_gendev);
> +
> + return ret;
>  }
>  EXPORT_SYMBOL(scsi_device_get);
>  
>  /**
> + * scsi_device_get_in_module_exit() - get an additional reference to a 
> scsi_device
> + * @sdev:device to get a reference to
> + *
> + * Functions identically to scsi_device_get() except that it unconditionally
> + * gets the module reference.  This allows it to be called from module exit
> + * routines where scsi_device_get() will fail.  This routine is still paired
> + * with scsi_device_put().
> + */
> +int scsi_device_get_in_module_exit(struct scsi_device *sdev)
> +{
> + int ret;
> +
> + ret = __scsi_device_get_common(sdev);
> + if (ret)
> + return ret;
> +
> + __module_get(sdev->host->hostt->module);
> +
> +   

Re: module: fix module_refcount() return when running in a module exit routine

2015-01-22 Thread Rusty Russell
James Bottomley  writes:
> On Thu, 2015-01-22 at 08:50 -0800, Christoph Hellwig wrote:
>> We'll also still need to change scsi_device_put to deal with
>> a negative refcount..
>
> I don't believe so ... we never call module_refcount() now, and the
> actual module ref count never goes negative.  That's the point of
> Rusty's change.  In the unload routines, the actual module refcount is
> zero.  __module_get() will increment this and the final module_put()
> will decrement it without triggering a warning.
>
> In the old code, we expected try_module_get() to fail but then used
> module_refcount() to detect the failure and not do a put.

Yep, I've put that patch in my fixes queue now.

Thanks,
Rusty.

From: Rusty Russell 
Subject: scsi: always increment reference count

James reported:
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.

The correct fix is to turn try_module_get() into __module_get(), and
always do the module_put().

Acked-by: James Bottomley 
Signed-off-by: Rusty Russell 

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e02885451425..9b3829931f40 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -986,9 +986,9 @@ int scsi_device_get(struct scsi_device *sdev)
return -ENXIO;
if (!get_device(&sdev->sdev_gendev))
return -ENXIO;
-   /* We can fail this if we're doing SCSI operations
+   /* We can fail try_module_get if we're doing SCSI operations
 * from module exit (like cache flush) */
-   try_module_get(sdev->host->hostt->module);
+   __module_get(sdev->host->hostt->module);
 
return 0;
 }
@@ -1004,14 +1004,7 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-#ifdef CONFIG_MODULE_UNLOAD
-   struct module *module = sdev->host->hostt->module;
-
-   /* The module refcount will be zero if scsi_device_get()
-* was called from a module removal routine */
-   if (module && module_refcount(module) != 0)
-   module_put(module);
-#endif
+   module_put(sdev->host->hostt->module);
put_device(&sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_device_put);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: module: fix module_refcount() return when running in a module exit routine

2015-01-21 Thread Rusty Russell
James Bottomley  writes:
> Actually, I don't think this is enough.  Some Australian once came up
> with a guide to APIs, and lectured on it at length, one of which was
> that the name should be the obvious use and it is unexpected that a
> refcount would go negative.  I think we could raise it from -6 on the
> API scale to +3 if we add some documentation like below.

Thanks, applied!

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: module: fix module_refcount() return when running in a module exit routine

2015-01-19 Thread Rusty Russell
James Bottomley  writes:
> On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
>> Masami Hiramatsu  writes:
>> > (2015/01/19 1:55), James Bottomley wrote:
>> >> From: James Bottomley 
>> >> 
>> >> After e513cc1 module: Remove stop_machine from module unloading,
>> >> module_refcount() is returning (unsigned long)-1 when called from within
>> >> a routine that runs in module_exit.  This is confusing the scsi device
>> >> put code which is coded to detect a module_refcount() of zero for
>> >> running within a module exit routine and not try to do another
>> >> module_put.  The fix is to restore the original behaviour of
>> >> module_refcount() and return zero if we're running inside an exit
>> >> routine.
>> >> 
>> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
>> >> Reported-by: Bart Van Assche 
>> >> Signed-off-by: James Bottomley 
>> >> 
>> >
>> > Yes, this should be fixed as you said, since it must return "unsigned 
>> > long" value.
>> 
>> But there are only three non-module callers:
>> 
>> drivers/scsi/scsi.c:1012:if (module && module_refcount(module) != 0)
>> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359: 
>> LINVRNT(module_refcount(key->lct_owner) > 0);
>> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
>> kernel/debug/kdb/kdb_main.c:2026:kdb_printf("%4ld ", 
>> module_refcount(mod));
>> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
>> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
>> kernel/module.c:859: seq_printf(m, " %lu ", module_refcount(mod));
>> kernel/module.c:911: return sprintf(buffer, "%lu\n", 
>> module_refcount(mk->mod));
>> 
>> The first one I think should be eliminated, and the second one is simply
>> an assertion before calling module_put() (which should probably be
>> eliminated).  The others are just printing information.
>
> If you really want to insist on module_reference() returning -1 when the
> module is in it's exit phase, OK, but in that case, I think it should
> return a signed value, not an unsigned one.

Sure; I just didn't want to paper over the problem here.  And I'm not
sure we want to lose information, eg. in kgdb we're presumably looking
at it because something went wrong...

Thanks,
Rusty.

Subject: module: make module_refcount() a signed integer.

James Bottomley points out that it will be -1 during unload.  It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel 
Signed-off-by: Rusty Russell 

diff --git a/include/linux/module.h b/include/linux/module.h
index ebfb0e153c6a..b653d7c0a05a 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long 
code)
 #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
 
 #ifdef CONFIG_MODULE_UNLOAD
-unsigned long module_refcount(struct module *mod);
+int module_refcount(struct module *mod);
 void __symbol_put(const char *symbol);
 #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
 void symbol_put_addr(void *addr);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index f191bddf64b8..7b40c5f07dce 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
kdb_printf("%-20s%8u  0x%p ", mod->name,
   mod->core_size, (void *)mod);
 #ifdef CONFIG_MODULE_UNLOAD
-   kdb_printf("%4ld ", module_refcount(mod));
+   kdb_printf("%4d ", module_refcount(mod));
 #endif
if (mod->state == MODULE_STATE_GOING)
kdb_printf(" (Unloading)");
diff --git a/kernel/module.c b/kernel/module.c
index 3965511ae133..2387c98347c1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, 
int *forced)
return 0;
 }
 
-unsigned long module_refcount(struct module *mod)
+int module_refcount(struct module *mod)
 {
-   return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
+   return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
 }
 EXPORT_SYMBOL(module_refcount);
 
@@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, 
struct module *mod)
struct module_use *use;
int printed_something = 0;
 
-   seq_printf(m, " %lu ", 

Re: module: fix module_refcount() return when running in a module exit routine

2015-01-18 Thread Rusty Russell
Masami Hiramatsu  writes:
> (2015/01/19 1:55), James Bottomley wrote:
>> From: James Bottomley 
>> 
>> After e513cc1 module: Remove stop_machine from module unloading,
>> module_refcount() is returning (unsigned long)-1 when called from within
>> a routine that runs in module_exit.  This is confusing the scsi device
>> put code which is coded to detect a module_refcount() of zero for
>> running within a module exit routine and not try to do another
>> module_put.  The fix is to restore the original behaviour of
>> module_refcount() and return zero if we're running inside an exit
>> routine.
>> 
>> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
>> Reported-by: Bart Van Assche 
>> Signed-off-by: James Bottomley 
>> 
>
> Yes, this should be fixed as you said, since it must return "unsigned long" 
> value.

But there are only three non-module callers:

drivers/scsi/scsi.c:1012:   if (module && module_refcount(module) != 0)
drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:
LINVRNT(module_refcount(key->lct_owner) > 0);
include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
kernel/debug/kdb/kdb_main.c:2026:   kdb_printf("%4ld ", 
module_refcount(mod));
kernel/module.c:775:unsigned long module_refcount(struct module *mod)
kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
kernel/module.c:859:seq_printf(m, " %lu ", module_refcount(mod));
kernel/module.c:911:return sprintf(buffer, "%lu\n", 
module_refcount(mk->mod));

The first one I think should be eliminated, and the second one is simply
an assertion before calling module_put() (which should probably be
eliminated).  The others are just printing information.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: module: fix module_refcount() return when running in a module exit routine

2015-01-18 Thread Rusty Russell
James Bottomley  writes:
> From: James Bottomley 
>
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.

I think this code used to be wrong, if someone used rmmod --wait.
Fortunately nobody ever did that, so we removed the feature in 2013 and
your code was OK again :)

But I think the correct fix is the same: turn try_module_get() into
__module_get():

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e02885451425..9b3829931f40 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -986,9 +986,9 @@ int scsi_device_get(struct scsi_device *sdev)
return -ENXIO;
if (!get_device(&sdev->sdev_gendev))
return -ENXIO;
-   /* We can fail this if we're doing SCSI operations
+   /* We can fail try_module_get if we're doing SCSI operations
 * from module exit (like cache flush) */
-   try_module_get(sdev->host->hostt->module);
+   __module_get(sdev->host->hostt->module);
 
return 0;
 }
@@ -1004,14 +1004,7 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-#ifdef CONFIG_MODULE_UNLOAD
-   struct module *module = sdev->host->hostt->module;
-
-   /* The module refcount will be zero if scsi_device_get()
-* was called from a module removal routine */
-   if (module && module_refcount(module) != 0)
-   module_put(module);
-#endif
+   module_put(sdev->host->hostt->module);
put_device(&sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_device_put);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 04/25] virtio: defer config changed notifications

2014-10-13 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> Defer config changed notifications that arrive during
> probe/scan/freeze/restore.
>
> This will allow drivers to set DRIVER_OK earlier, without worrying about
> racing with config change interrupts.
>
> This change will also benefit old hypervisors (before 2009)
> that send interrupts without checking DRIVER_OK: previously,
> the callback could race with driver-specific initialization.
>
> This will also help simplify drivers.

But AFAICT you never *read* dev->config_changed.

You unconditionally trigger a config_changed event in
virtio_config_enable().  That's a bit weird, but probably OK.

How about the following change (on top of your patch).  I
think the renaming is clearer, and note the added if() test in
virtio_config_enable().

If you approve, I'll fold it in.

Cheers,
Rusty.

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 2536701b098b..df598dd8c5c8 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device 
*dev)
struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
 
if (!dev->config_enabled)
-   dev->config_changed = true;
+   dev->config_change_pending = true;
else if (drv && drv->config_changed)
drv->config_changed(dev);
 }
@@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev)
 {
spin_lock_irq(&dev->config_lock);
dev->config_enabled = true;
-   __virtio_config_changed(dev);
-   dev->config_changed = false;
+   if (dev->config_change_pending)
+   __virtio_config_changed(dev);
+   dev->config_change_pending = false;
spin_unlock_irq(&dev->config_lock);
 }
 
@@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev)
 
spin_lock_init(&dev->config_lock);
dev->config_enabled = false;
-   dev->config_changed = false;
+   dev->config_change_pending = false;
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5636b119dc25..65261a7244fc 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * @index: unique position on the virtio bus
  * @failed: saved value for CONFIG_S_FAILED bit (for restore)
  * @config_enabled: configuration change reporting enabled
- * @config_changed: configuration change reported while disabled
+ * @config_change_pending: configuration change reported while disabled
  * @config_lock: protects configuration change reporting
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
@@ -94,7 +94,7 @@ struct virtio_device {
int index;
bool failed;
bool config_enabled;
-   bool config_changed;
+   bool config_change_pending;
spinlock_t config_lock;
struct device dev;
struct virtio_device_id id;
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 10/25] virtio: add API to enable VQs early

2014-10-12 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> virtio spec 0.9.X requires DRIVER_OK to be set before
> VQs are used, but some drivers use VQs before probe
> function returns.
> Since DRIVER_OK is set after probe, this violates the spec.
>
> Even though under virtio 1.0 transitional devices support this
> behaviour, we want to make it possible for those early callers to become
> spec compliant and eventually support non-transitional devices.
>
> Add API for drivers to call before using VQs.
>
> Sets DRIVER_OK internally.
>
> Signed-off-by: Michael S. Tsirkin 
> Reviewed-by: Cornelia Huck 

OK, this all looks good, but I think we should do two things:

1) rename virtio_enable_vqs_early() to virtio_device_ready().
2) Add a BUG_ON in the virtio_ring code to make sure device is ready
   before any ops are called.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] scsi: use 64-bit value for 'max_luns'

2014-07-08 Thread Rusty Russell
Christoph Hellwig  writes:
> On Tue, Jun 03, 2014 at 10:58:56AM +0200, Hannes Reinecke wrote:
>> Now that we're using 64-bit LUNs internally we need to increase
>> the size of max_luns to 64 bits, too.
>> 
>> Signed-off-by: Hannes Reinecke 
>> Reviewed-by: Christoph Hellwig 
>> Reviewed-by: Ewan Milne 
>
> I just noticed that this has changes to the module param code.
> These should be split into a separate patch and be ACKed by the modules
> maintainer.  I'd still love to take the change through the SCSI tree to
> be able to get this into 3.17 easily.
>
> moduleparam changes below:
>
>> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
>> index 204a677..21e2ba6 100644
>> --- a/include/linux/moduleparam.h
>> +++ b/include/linux/moduleparam.h
>> @@ -381,6 +381,11 @@ extern int param_set_ulong(const char *val, const 
>> struct kernel_param *kp);
>>  extern int param_get_ulong(char *buffer, const struct kernel_param *kp);
>>  #define param_check_ulong(name, p) __param_check(name, p, unsigned long)
>>  
>> +extern struct kernel_param_ops param_ops_ullong;
>> +extern int param_set_ullong(const char *val, const struct kernel_param *kp);
>> +extern int param_get_ullong(char *buffer, const struct kernel_param *kp);
>> +#define param_check_ullong(name, p) __param_check(name, p, unsigned long 
>> long)
>> +
>>  extern struct kernel_param_ops param_ops_charp;
>>  extern int param_set_charp(const char *val, const struct kernel_param *kp);
>>  extern int param_get_charp(char *buffer, const struct kernel_param *kp);
>
>> diff --git a/kernel/params.c b/kernel/params.c
>> index b00142e..2b2a9dd 100644
>> --- a/kernel/params.c
>> +++ b/kernel/params.c
>> @@ -253,6 +253,7 @@ STANDARD_PARAM_DEF(int, int, "%i", kstrtoint);
>>  STANDARD_PARAM_DEF(uint, unsigned int, "%u", kstrtouint);
>>  STANDARD_PARAM_DEF(long, long, "%li", kstrtol);
>>  STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", kstrtoul);
>> +STANDARD_PARAM_DEF(ullong, unsigned long long, "%llu", kstrtoull);

Thanks Christoph!

Acked-by: Rusty Russell 

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2 1/6] virtio-scsi.h: Add virtio_scsi_cmd_req_pi + VIRTIO_SCSI_F_T10_PI bits

2014-05-22 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, May 22, 2014 at 02:26:17AM +, Nicholas A. Bellinger wrote:
>> From: Nicholas Bellinger 
>> 
>> This patch adds a virtio_scsi_cmd_req_pi header as recommened by
>> Paolo that contains do_pi_niov + di_pi_niov elements used for
>> signaling when protection information buffers are expected to
>> preceed the data buffers.
>
> Cc rusty.
> Rusty could you please Ack merging this through Nicholas' tree
> together with the vhost changes?

Acked-by: Rusty Russell 

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock.

2014-05-19 Thread Rusty Russell
This triggers every time we do a SCSI abort:

virtscsi_tmf -> virtscsi_kick_cmd (grab lock and call) -> virtscsi_add_cmd
-> virtqueue_add_sgs (GFP_NOIO)

Logs look like this:
 sd 0:0:0:0: [sda] abort
 BUG: sleeping function called from invalid context at mm/slub.c:966
 in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: kworker/u2:0
 3 locks held by kworker/u2:0/6:
  #0:  ("scsi_tmf_%d"shost->host_no){..}, at: [] 
process_one_work+0xe0/0x3d0
  #1:  ((&(&cmd->abort_work)->work)){..}, at: [] 
process_one_work+0xe0/0x3d0
  #2:  (&(&virtscsi_vq->vq_lock)->rlock){..}, at: [] 
virtscsi_kick_cmd+0x18/0x1b0
 CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 3.15.0-rc5+ #110
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org 04/01/2014
 Workqueue: scsi_tmf_0 scmd_eh_abort_handler

Signed-off-by: Rusty Russell 

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index db3b494e5926..62757afd93bb 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -433,11 +433,10 @@ static void virtscsi_event_done(struct virtqueue *vq)
  * @cmd: command structure
  * @req_size   : size of the request buffer
  * @resp_size  : size of the response buffer
- * @gfp: flags to use for memory allocations
  */
 static int virtscsi_add_cmd(struct virtqueue *vq,
struct virtio_scsi_cmd *cmd,
-   size_t req_size, size_t resp_size, gfp_t gfp)
+   size_t req_size, size_t resp_size)
 {
struct scsi_cmnd *sc = cmd->sc;
struct scatterlist *sgs[4], req, resp;
@@ -469,19 +468,19 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
if (in)
sgs[out_num + in_num++] = in->sgl;
 
-   return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp);
+   return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
 }
 
 static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 struct virtio_scsi_cmd *cmd,
-size_t req_size, size_t resp_size, gfp_t gfp)
+size_t req_size, size_t resp_size)
 {
unsigned long flags;
int err;
bool needs_kick = false;
 
spin_lock_irqsave(&vq->vq_lock, flags);
-   err = virtscsi_add_cmd(vq->vq, cmd, req_size, resp_size, gfp);
+   err = virtscsi_add_cmd(vq->vq, cmd, req_size, resp_size);
if (!err)
needs_kick = virtqueue_kick_prepare(vq->vq);
 
@@ -530,8 +529,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
 
if (virtscsi_kick_cmd(req_vq, cmd,
- sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
- GFP_ATOMIC) == 0)
+ sizeof cmd->req.cmd, sizeof cmd->resp.cmd) == 0)
ret = 0;
else
mempool_free(cmd, virtscsi_cmd_pool);
@@ -596,8 +594,7 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct 
virtio_scsi_cmd *cmd)
 
cmd->comp = ∁
if (virtscsi_kick_cmd(&vscsi->ctrl_vq, cmd,
- sizeof cmd->req.tmf, sizeof cmd->resp.tmf,
- GFP_NOIO) < 0)
+ sizeof cmd->req.tmf, sizeof cmd->resp.tmf) < 0)
goto out;
 
wait_for_completion(&comp);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2014-01-14 Thread Rusty Russell
Jason Wang  writes:
> From: Asias He 
>
> vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
> unregistered. We will have a use-after-free usage when the notifier
> callback is called after virtscsi_freeze.
>
> Fixes: 285e71ea6f3583a85e27cb2b9a7d8c35d4c0d558
> ("virtio-scsi: reset virtqueue affinity when doing cpu hotplug")

Sorry, I missed this in my earlier sweep (I was only skimming emails
during the break).

Applied,
Rusty.

> Cc: sta...@vger.kernel.org
> Signed-off-by: Asias He 
> Reviewed-by: Paolo Bonzini 
> Signed-off-by: Jason Wang 
> ---
> Changes from V1:
> - Add "Fixes" line
> - CC stable
> ---
>  drivers/scsi/virtio_scsi.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index c3173dc..16bfd50 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -956,6 +956,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
>  #ifdef CONFIG_PM_SLEEP
>  static int virtscsi_freeze(struct virtio_device *vdev)
>  {
> + struct Scsi_Host *sh = virtio_scsi_host(vdev);
> + struct virtio_scsi *vscsi = shost_priv(sh);
> +
> + unregister_hotcpu_notifier(&vscsi->nb);
>   virtscsi_remove_vqs(vdev);
>   return 0;
>  }
> @@ -964,8 +968,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
>  {
>   struct Scsi_Host *sh = virtio_scsi_host(vdev);
>   struct virtio_scsi *vscsi = shost_priv(sh);
> + int err;
> +
> + err = virtscsi_init(vdev, vscsi);
> + if (err)
> + return err;
> +
> + err = register_hotcpu_notifier(&vscsi->nb);
> + if (err)
> + vdev->config->del_vqs(vdev);
>  
> - return virtscsi_init(vdev, vscsi);
> + return err;
>  }
>  #endif
>  
> -- 
> 1.8.3.2
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-12-16 Thread Rusty Russell
Jason Wang  writes:
> On 10/28/2013 04:01 PM, Asias He wrote:
>> vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
>> unregistered. We will have a use-after-free usage when the notifier
>> callback is called after virtscsi_freeze.
>>
>> Signed-off-by: Asias He 

Please include a Fixes: line, especially if you want the CC: stable.

Thanks,
Rusty.


>> ---
>>  drivers/scsi/virtio_scsi.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 74b88ef..b26f1a5 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
>>  #ifdef CONFIG_PM
>>  static int virtscsi_freeze(struct virtio_device *vdev)
>>  {
>> +struct Scsi_Host *sh = virtio_scsi_host(vdev);
>> +struct virtio_scsi *vscsi = shost_priv(sh);
>> +
>> +unregister_hotcpu_notifier(&vscsi->nb);
>>  virtscsi_remove_vqs(vdev);
>>  return 0;
>>  }
>> @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
>>  {
>>  struct Scsi_Host *sh = virtio_scsi_host(vdev);
>>  struct virtio_scsi *vscsi = shost_priv(sh);
>> +int err;
>> +
>> +err = virtscsi_init(vdev, vscsi);
>> +if (err)
>> +return err;
>> +
>> +err = register_hotcpu_notifier(&vscsi->nb);
>> +if (err)
>> +vdev->config->del_vqs(vdev);
>>  
>> -return virtscsi_init(vdev, vscsi);
>> +return err;
>>  }
>>  #endif
>>  
>
> Ping. Rusty, could you please review and apply this patch?
>
> Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Fix virtqueue affinity setup

2013-07-31 Thread Rusty Russell
Asias He  writes:
> vscsi->num_queues counts the number of request virtqueue which does not
> include the control and event virtqueue. It is wrong to subtract
> VIRTIO_SCSI_VQ_BASE from vscsi->num_queues.
>
> This patch fixes the following panic.

Applied.

Thanks,
Rusty.

>
> (qemu) device_del scsi0
>
>  BUG: unable to handle kernel NULL pointer dereference at 0020
>  IP: [] __virtscsi_set_affinity+0x6f/0x120
>  PGD 0
>  Oops:  [#1] SMP
>  Modules linked in:
>  CPU: 0 PID: 659 Comm: kworker/0:1 Not tainted 3.11.0-rc2+ #1172
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>  Workqueue: kacpi_hotplug _handle_hotplug_event_func
>  task: 88007bee1cc0 ti: 88007bfe4000 task.ti: 88007bfe4000
>  RIP: 0010:[]  [] 
> __virtscsi_set_affinity+0x6f/0x120
>  RSP: 0018:88007bfe5a38  EFLAGS: 00010202
>  RAX: 0010 RBX: 880077fd0d28 RCX: 0050
>  RDX:  RSI: 0246 RDI: 
>  RBP: 88007bfe5a58 R08: 880077f6ff00 R09: 0001
>  R10: 8143e673 R11: 0001 R12: 0001
>  R13: 880077fd0800 R14:  R15: 88007bf489b0
>  FS:  () GS:88007ea0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 8005003b
>  CR2: 0020 CR3: 79f8b000 CR4: 06f0
>  Stack:
>   880077fd0d28  880077fd0800 0008
>   88007bfe5a78 8179b37d 88007bccc800 88007bccc800
>   88007bfe5a98 8179b3b6 88007bccc800 880077fd0d28
>  Call Trace:
>   [] virtscsi_set_affinity+0x2d/0x40
>   [] virtscsi_remove_vqs+0x26/0x50
>   [] virtscsi_remove+0x82/0xa0
>   [] virtio_dev_remove+0x22/0x70
>   [] __device_release_driver+0x69/0xd0
>   [] device_release_driver+0x2d/0x40
>   [] bus_remove_device+0x116/0x150
>   [] device_del+0x126/0x1e0
>   [] device_unregister+0x16/0x30
>   [] unregister_virtio_device+0x19/0x30
>   [] virtio_pci_remove+0x36/0x80
>   [] pci_device_remove+0x37/0x70
>   [] __device_release_driver+0x69/0xd0
>   [] device_release_driver+0x2d/0x40
>   [] bus_remove_device+0x116/0x150
>   [] device_del+0x126/0x1e0
>   [] pci_stop_bus_device+0x9c/0xb0
>   [] pci_stop_and_remove_bus_device+0x16/0x30
>   [] acpiphp_disable_slot+0x8e/0x150
>   [] hotplug_event_func+0xba/0x1a0
>   [] ? acpi_os_release_object+0xe/0x12
>   [] _handle_hotplug_event_func+0x31/0x70
>   [] process_one_work+0x183/0x500
>   [] worker_thread+0x122/0x400
>   [] ? manage_workers+0x2d0/0x2d0
>   [] kthread+0xce/0xe0
>   [] ? kthread_freezable_should_stop+0x70/0x70
>   [] ret_from_fork+0x7c/0xb0
>   [] ? kthread_freezable_should_stop+0x70/0x70
>  Code: 01 00 00 00 74 59 45 31 e4 83 bb c8 01 00 00 02 74 46 66 2e 0f 1f 84 
> 00 00 00 00 00 49 63 c4 48 c1 e0 04 48 8b bc 0
> 3 10 02 00 00 <48> 8b 47 20 48 8b 80 d0 01 00 00 48 8b 40 50 48 85 c0 74 07 be
>  RIP  [] __virtscsi_set_affinity+0x6f/0x120
>   RSP 
>  CR2: 0020
>  ---[ end trace 99679331a3775f48 ]---
>
> CC: sta...@vger.kernel.org
> Signed-off-by: Asias He 
> ---
>  drivers/scsi/virtio_scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 2168258..74b88ef 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -751,7 +751,7 @@ static void __virtscsi_set_affinity(struct virtio_scsi 
> *vscsi, bool affinity)
>  
>   vscsi->affinity_hint_set = true;
>   } else {
> - for (i = 0; i < vscsi->num_queues - VIRTIO_SCSI_VQ_BASE; i++)
> + for (i = 0; i < vscsi->num_queues; i++)
>   virtqueue_set_affinity(vscsi->req_vqs[i].vq, -1);
>  
>   vscsi->affinity_hint_set = false;
> -- 
> 1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V7 0/5] virtio-scsi multiqueue

2013-04-08 Thread Rusty Russell
Asias He  writes:
> On Sat, Apr 06, 2013 at 09:40:13AM +0100, James Bottomley wrote:
>> Well, I haven't had time to look at anything other than the patch I
>> commented on.  I'm happy with your fix, so you can add my acked by to
>> that one.  Since it's going through the virtio tree, don't wait for me,
>> put it in and I'll make you fix up anything I find later that I'm
>> unhappy with.
>
> So, Rusty, could you pick this up in your virtio-next tree?

Done!

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] virtio-scsi: reset virtqueue affinity when doing cpu hotplug

2013-01-15 Thread Rusty Russell
Wanlong Gao  writes:
> Add hot cpu notifier to reset the request virtqueue affinity
> when doing cpu hotplug.

You need to be careful to get_online_cpus() and put_online_cpus() here,
so CPUs can't go up and down in the middle of operations.

In particular, get_online_cpus()/put_online_cpus() around calls to
virtscsi_set_affinity() (except within notifiers).

> +static int virtscsi_cpu_callback(struct notifier_block *nfb,
> +  unsigned long action, void *hcpu)
> +{
> + struct virtio_scsi *vscsi = container_of(nfb, struct virtio_scsi, nb);
> + switch(action) {
> + case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> + case CPU_DEAD:
> + case CPU_DEAD_FROZEN:
> + virtscsi_set_affinity(vscsi, true);
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}

You probably want to remove affinities *before* the CPU goes down, and
restore it after the CPU comes up.

So, how about:

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
case CPU_DOWN_FAILED:
virtscsi_set_affinity(vscsi, true, -1);
break;
case CPU_DOWN_PREPARE:
virtscsi_set_affinity(vscsi, true, (unsigned long)hcpu);
break;
}

The extra argument to virtscsi_set_affinity() is to tell it to ignore a
cpu which seems online (because it's going down).

>  static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
>struct virtqueue *vq)
>  {
> @@ -888,6 +909,13 @@ static int __devinit virtscsi_probe(struct virtio_device 
> *vdev)
>   if (err)
>   goto virtscsi_init_failed;
>  
> + vscsi->nb.notifier_call = &virtscsi_cpu_callback;
> + err = register_hotcpu_notifier(&vscsi->nb);
> + if (err) {
> + pr_err("virtio_scsi: registering cpu notifier failed\n");
> + goto scsi_add_host_failed;
> + }
> +

You have to clean this up if scsi_add_host() fails, I think.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-07 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 07/01/2013 01:02, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>>> Il 02/01/2013 06:03, Rusty Russell ha scritto:
>>>> Paolo Bonzini  writes:
>>>>> The virtqueue_add_buf function has two limitations:
>>>>>
>>>>> 1) it requires the caller to provide all the buffers in a single call;
>>>>>
>>>>> 2) it does not support chained scatterlists: the buffers must be
>>>>> provided as an array of struct scatterlist;
>>>>
>>>> Chained scatterlists are a horrible interface, but that doesn't mean we
>>>> shouldn't support them if there's a need.
>>>>
>>>> I think I once even had a patch which passed two chained sgs, rather
>>>> than a combo sg and two length numbers.  It's very old, but I've pasted
>>>> it below.
>>>>
>>>> Duplicating the implementation by having another interface is pretty
>>>> nasty; I think I'd prefer the chained scatterlists, if that's optimal
>>>> for you.
>>>
>>> Unfortunately, that cannot work because not all architectures support
>>> chained scatterlists.
>> 
>> WHAT?  I can't figure out what an arch needs to do to support this?
>
> It needs to use the iterator functions in its DMA driver.

But we don't care for virtio.

>> All archs we care about support them, though, so I think we can ignore
>> this issue for now.
>
> Kind of... In principle all QEMU-supported arches can use virtio, and
> the speedup can be quite useful.  And there is no Kconfig symbol for SG
> chains that I can use to disable virtio-scsi on unsupported arches. :/

Well, we #error if it's not supported.  Then the lazy architectures can
fix it.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-06 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 02/01/2013 06:03, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>>> The virtqueue_add_buf function has two limitations:
>>>
>>> 1) it requires the caller to provide all the buffers in a single call;
>>>
>>> 2) it does not support chained scatterlists: the buffers must be
>>> provided as an array of struct scatterlist;
>> 
>> Chained scatterlists are a horrible interface, but that doesn't mean we
>> shouldn't support them if there's a need.
>> 
>> I think I once even had a patch which passed two chained sgs, rather
>> than a combo sg and two length numbers.  It's very old, but I've pasted
>> it below.
>> 
>> Duplicating the implementation by having another interface is pretty
>> nasty; I think I'd prefer the chained scatterlists, if that's optimal
>> for you.
>
> Unfortunately, that cannot work because not all architectures support
> chained scatterlists.

WHAT?  I can't figure out what an arch needs to do to support this?  Why
is it an option for archs?  Why is sg_chain() even defined for
non-ARCH_HAS_SG_CHAIN?

Jens, help!!

All archs we care about support them, though, so I think we can ignore
this issue for now.

> (Also, as you mention chained scatterlists are horrible.  They'd happen
> to work for virtio-scsi, but not for virtio-blk where the response
> status is part of the footer, not the header).

We lost that debate 5 years ago, so we hack around it as needed.  We can
add helpers to append if we need.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-06 Thread Rusty Russell
Wanlong Gao  writes:
> On 01/02/2013 01:03 PM, Rusty Russell wrote:
>> Paolo Bonzini  writes:
>>> The virtqueue_add_buf function has two limitations:
>>>
>>> 1) it requires the caller to provide all the buffers in a single call;
>>>
>>> 2) it does not support chained scatterlists: the buffers must be
>>> provided as an array of struct scatterlist;
>> 
>> Chained scatterlists are a horrible interface, but that doesn't mean we
>> shouldn't support them if there's a need.
>> 
>> I think I once even had a patch which passed two chained sgs, rather
>> than a combo sg and two length numbers.  It's very old, but I've pasted
>> it below.
>> 
>> Duplicating the implementation by having another interface is pretty
>> nasty; I think I'd prefer the chained scatterlists, if that's optimal
>> for you.
>
> I rebased against virtio-next and use it in virtio-scsi, and tested it with 4 
> targets
> virtio-scsi devices and host cpu idle=poll. Saw a little performance 
> regression here.

Sure, but now you should be able to eliminate virtscsi_map_sgl(), right?
You should be able to use scsi_out(sc) and scsi_in(sc) directly, which
is what Paulo wanted to do...

Right Paulo?

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-01 Thread Rusty Russell
Paolo Bonzini  writes:
> The virtqueue_add_buf function has two limitations:
>
> 1) it requires the caller to provide all the buffers in a single call;
>
> 2) it does not support chained scatterlists: the buffers must be
> provided as an array of struct scatterlist;

Chained scatterlists are a horrible interface, but that doesn't mean we
shouldn't support them if there's a need.

I think I once even had a patch which passed two chained sgs, rather
than a combo sg and two length numbers.  It's very old, but I've pasted
it below.

Duplicating the implementation by having another interface is pretty
nasty; I think I'd prefer the chained scatterlists, if that's optimal
for you.

Cheers,
Rusty.

From: Rusty Russell 
Subject: virtio: use chained scatterlists.

Rather than handing a scatterlist[] and out and in numbers to
virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained
scatterlists are.  I'll just note that this doesn't make things
simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them
in our device struct and reuse them.  But in many cases we don't want
to pay the cost of sg_init_table() as we don't know how many elements
we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after
we call virtqueue_add_buf(), which we do in virtio_net for the xmit
path where it's easy and we want to be optimal.  Elsewhere we
implement a helper to unset the end markers after we've filled the
array.

Signed-off-by: Rusty Russell 
---
 drivers/block/virtio_blk.c  |   37 +-
 drivers/char/hw_random/virtio-rng.c |2 -
 drivers/char/virtio_console.c   |6 +--
 drivers/net/virtio_net.c|   67 ++---
 drivers/virtio/virtio_balloon.c |6 +--
 drivers/virtio/virtio_ring.c|   71 ++--
 include/linux/virtio.h  |5 +-
 net/9p/trans_virtio.c   |   38 +--
 8 files changed, 151 insertions(+), 81 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,6 +100,14 @@ static void blk_done(struct virtqueue *v
spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }
 
+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+   unsigned int i;
+
+   for (i = 0; i < num; i++)
+   sg[i].page_link &= ~0x02;
+}
+
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
   struct request *req)
 {
@@ -140,6 +148,7 @@ static bool do_req(struct request_queue 
}
}
 
+   /* We layout out scatterlist in a single array, out then in. */
sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
 
/*
@@ -151,17 +160,8 @@ static bool do_req(struct request_queue 
if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
 
+   /* This marks the end of the sg list at vblk->sg[out]. */
num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-   if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-   sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, 
SCSI_SENSE_BUFFERSIZE);
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-  sizeof(vbr->in_hdr));
-   }
-
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-  sizeof(vbr->status));
-
if (num) {
if (rq_data_dir(vbr->req) == WRITE) {
vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
@@ -172,7 +172,22 @@ static bool do_req(struct request_queue 
}
}
 
-   if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
+   if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
+   sg_set_buf(&vblk->sg[out + in++], vbr->req->sense,
+  SCSI_SENSE_BUFFERSIZE);
+   sg_set_buf(&vblk->sg[out + in++], &vbr->in_hdr,
+  sizeof(vbr->in_hdr));
+   }
+
+   sg_set_buf(&vblk->sg[out + in++], &vbr->status,
+  sizeof(vbr->status));
+
+   sg_unset_end_markers(vblk->sg, out+in);
+   sg_mark_end(&vblk->sg[out-1]);
+   sg_mark_end(&vblk->sg[out+in-1]);
+
+   if (virtqueue_add_buf(vblk->vq, vblk->sg, vblk->sg+out, vbr, GFP_ATOMIC)
+   < 0) {
mempool_free(vbr, vblk->pool);
 

Re: [PATCH 1/5] virtio-ring: move queue_index to vring_virtqueue

2012-09-05 Thread Rusty Russell
Paolo Bonzini  writes:

> From: Jason Wang 
>
> Instead of storing the queue index in transport-specific virtio structs,
> this patch moves them to vring_virtqueue and introduces an helper to get
> the value.  This lets drivers simplify their management and tracing of
> virtqueues.
>
> Signed-off-by: Jason Wang 
> Signed-off-by: Paolo Bonzini 

Sorry for the delay, I was at Kernel Summit and am only now actually
reading (vs skimming) my backlog.

Putting it in vring_virtqueue rather than virtqueue seems weird, though.
But I've applied as-is, we can clean up that later if we want (probably
by merging the two structures, I'll have to think harder on that).

Acked-by: Rusty Russell 

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] virtio: introduce an API to set affinity for a virtqueue

2012-09-05 Thread Rusty Russell
Paolo Bonzini  writes:

> From: Jason Wang 
>
> Sometimes, virtio device need to configure irq affinity hint to maximize the
> performance. Instead of just exposing the irq of a virtqueue, this patch
> introduce an API to set the affinity for a virtqueue.
>
> The api is best-effort, the affinity hint may not be set as expected due to
> platform support, irq sharing or irq type. Currently, only pci method were
> implemented and we set the affinity according to:
>
> - if device uses INTX, we just ignore the request
> - if device has per vq vector, we force the affinity hint
> - if the virtqueues share MSI, make the affinity OR over all affinities
>   requested
>
> Signed-off-by: Jason Wang 
> Signed-off-by: Paolo Bonzini 

Applied, thanks.

Acked-by: Rusty Russell 

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-29 Thread Rusty Russell
On Fri, 27 Jul 2012 10:11:26 +0200, Paolo Bonzini  wrote:
> Il 27/07/2012 08:27, Rusty Russell ha scritto:
> >> > +int virtqueue_add_buf_sg(struct virtqueue *_vq,
> >> > + struct scatterlist *sg_out,
> >> > + unsigned int out,
> >> > + struct scatterlist *sg_in,
> >> > + unsigned int in,
> >> > + void *data,
> >> > + gfp_t gfp)
> > The point of chained scatterlists is they're self-terminated, so the
> > in & out counts should be calculated.
> > 
> > Counting them is not *that* bad, since we're about to read them all
> > anyway.
> > 
> > (Yes, the chained scatterlist stuff is complete crack, but I lost that
> > debate years ago.)
> > 
> > Here's my variant.  Networking, console and block seem OK, at least
> > (ie. it booted!).
> 
> I hate the for loops, even though we're about indeed to read all the
> scatterlists anyway... all they do is lengthen critical sections.

You're preaching to the choir: I agree.  But even more, I hate the
passing of a number and a scatterlist: it makes it doubly ambigious
whether we should use the number or the terminator.  And I really hate
bad APIs, even more than a bit of icache loss.

> Also, being the first user of chained scatterlist doesn't exactly give
> me warm fuzzies.

We're far from the first user: they've been in the kernel for well over
7 years.  They were introduced for the block layer, but they tended to
ignore private uses of scatterlists like this one.

> I think it's simpler if we provide an API to add individual buffers to
> the virtqueue, so that you can do multiple virtqueue_add_buf_more
> (whatever) before kicking the virtqueue.  The idea is that I can still
> use indirect buffers for the scatterlists that come from the block layer
> or from an skb, but I will use direct buffers for the request/response
> descriptors.  The direct buffers are always a small number (usually 2),
> so you can balance the effect by making the virtqueue bigger.  And for
> small reads and writes, you save a kmalloc on a very hot path.

This is why I hate chained scatterlists: there's no sane way to tell the
difference between passing a simple array and passing a chain.  We're
mugging the type system.

I think the right way of doing this is a flag.  We could abuse gfp_t,
but that's super-ugly.  Perhaps we should create our own
VQ_ATOMIC/VQ_KERNEL/VQ_DIRECT enum?

Or we could do what we previously suggested, and try to do some
intelligent size heuristic.  I've posted a patch from 3 years ago below.
 
> (BTW, scatterlists will have separate entries for each page; we do not
> need this in virtio buffers.  Collapsing physically-adjacent entries
> will speed up QEMU and will also help avoiding indirect buffers).

Yes, we should do this.  But note that this means an iteration, so we
might as well combine the loops :)

Cheers,
Rusty.

FIXME: remove printk
virtio: use indirect buffers based on demand (heuristic)

virtio_ring uses a ring buffer of descriptors: indirect support allows
a single descriptor to refer to a table of descriptors.  This saves
space in the ring, but requires a kmalloc/kfree.

Rather than try to figure out what the right threshold at which to use
indirect buffers, we drop the threshold dynamically when the ring is
under stress.

Note: to stress this, I reduced the ring size to 32 in lguest, and a
1G send reduced the threshold to 9.

Note2: I moved the BUG_ON()s above the indirect test, where they belong
(indirect falls thru on OOM, so the constraints still apply).
---
 drivers/virtio/virtio_ring.c |   61 ---
 1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,6 +63,8 @@ struct vring_virtqueue
 
/* Host supports indirect buffers */
bool indirect;
+   /* Threshold before we go indirect. */
+   unsigned int indirect_threshold;
 
/* Number of free buffers */
unsigned int num_free;
@@ -139,6 +141,32 @@ static int vring_add_indirect(struct vri
return head;
 }
 
+static void adjust_threshold(struct vring_virtqueue *vq,
+unsigned int out, unsigned int in)
+{
+   /* There are really two species of virtqueue, and it matters here.
+* If there are no output parts, it's a "normally full" receive queue,
+* otherwise it's a "normally empty" send queue. */
+   if (out) {
+   /* Leave threshold unless we're full. */
+   if (out + in <

Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-27 Thread Rusty Russell
On Thu, 26 Jul 2012 15:05:39 +0200, Paolo Bonzini  wrote:
> Il 26/07/2012 09:58, Paolo Bonzini ha scritto:
> > 
> >> > Please CC me on the "convert to sg copy-less" patches, It looks 
> >> > interesting
> > Sure.
> 
> Well, here is the gist of it (note it won't apply on any public tree,
> hence no SoB yet).  It should be split in multiple changesets and you
> can make more simplifications on top of it, because
> virtio_scsi_target_state is not anymore variable-sized, but that's
> secondary.

ISTR starting on such a patch years ago, but the primitives to
manipulate a chained sg_list were nonexistent, so I dropped it,
waiting for it to be fully-baked or replaced.  That hasn't happened:

> + /* Remove scatterlist terminator, we will tack more items soon.  */
> + vblk->sg[num + out - 1].page_link &= ~0x2;

I hate this interface:

> +int virtqueue_add_buf_sg(struct virtqueue *_vq,
> +  struct scatterlist *sg_out,
> +  unsigned int out,
> +  struct scatterlist *sg_in,
> +  unsigned int in,
> +  void *data,
> +  gfp_t gfp)

The point of chained scatterlists is they're self-terminated, so the
in & out counts should be calculated.

Counting them is not *that* bad, since we're about to read them all
anyway.

(Yes, the chained scatterlist stuff is complete crack, but I lost that
debate years ago.)

Here's my variant.  Networking, console and block seem OK, at least
(ie. it booted!).

From: Rusty Russell 
Subject: virtio: use chained scatterlists.

Rather than handing a scatterlist[] and out and in numbers to
virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained
scatterlists are.  I'll just note that this doesn't make things
simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them
in our device struct and reuse them.  But in many cases we don't want
to pay the cost of sg_init_table() as we don't know how many elements
we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after
we call virtqueue_add_buf(), which we do in virtio_net for the xmit
path where it's easy and we want to be optimal.  Elsewhere we
implement a helper to unset the end markers after we've filled the
array.

Signed-off-by: Rusty Russell 
---
 drivers/block/virtio_blk.c  |   37 +-
 drivers/char/hw_random/virtio-rng.c |2 -
 drivers/char/virtio_console.c   |6 +--
 drivers/net/virtio_net.c|   67 ++---
 drivers/virtio/virtio_balloon.c |6 +--
 drivers/virtio/virtio_ring.c|   71 ++--
 include/linux/virtio.h  |5 +-
 net/9p/trans_virtio.c   |   46 +--
 8 files changed, 159 insertions(+), 81 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,6 +100,14 @@ static void blk_done(struct virtqueue *v
spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }
 
+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+   unsigned int i;
+
+   for (i = 0; i < num; i++)
+   sg[i].page_link &= ~0x02;
+}
+
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
   struct request *req)
 {
@@ -140,6 +148,7 @@ static bool do_req(struct request_queue 
}
}
 
+   /* We layout out scatterlist in a single array, out then in. */
sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
 
/*
@@ -151,17 +160,8 @@ static bool do_req(struct request_queue 
if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
 
+   /* This marks the end of the sg list at vblk->sg[out]. */
num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-   if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-   sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, 
SCSI_SENSE_BUFFERSIZE);
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-  sizeof(vbr->in_hdr));
-   }
-
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-  sizeof(vbr->status));
-
if (num) {
if (rq_data_dir(vbr->req) == WRITE) {
vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
@@ -172,7 +172,22 @@ static bool do_req(st

Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays

2008-01-09 Thread Rusty Russell
On Thursday 10 January 2008 09:10:37 James Bottomley wrote:
> On Tue, 2008-01-08 at 11:39 +1100, Rusty Russell wrote:
> > On Tuesday 08 January 2008 02:48:23 James Bottomley wrote:
> > > We're always open to new APIs (or more powerful and expanded old ones).
> > > The way we've been doing the sg_chain conversion is to slide API layers
> > > into the drivers so sg_chain becomes a simple API flip when we turn it
> > > on.  Unfortunately sg_ring doesn't quite fit nicely into this.
> >
> > Hi James,
> >
> >Well, it didn't touch any drivers.  The only ones which needed
> > altering were those which wanted to use large scatter-gather lists.  You
> > think of the subtlety of sg-chain conversion as a feature; it's a bug :)
>
> No, I think of the side effect of hiding scatterlist manipulations
> inside accessors as a feature because it insulates the drivers from the
> details of the implementation.

I completely disagree.  Abstractions add complexity, and that costs.  They 
steal information from their users, and as they build up like sediment they 
make debugging and understanding harder.

For example, an array is simple and well understood.   An abstraction would 
just buy confusion and YA thing to learn.

When a single array was no longer sufficient, I figured a linked-list of 
arrays was the simplest replacement.  Easy to understand and accepted 
practice (tho rings are a bit exotic).  Because the implementation is the 
interface, authors can see what is legal.

More concretely, you're already regarding your abstractions as too expensive, 
which is why sg_chain() doesn't handle chained sgs.

Result: you've got a complex implementation and a complex interface with a 
complex abstraction.

> > sg_chains suck for manipulation, and AFAICT that's inherent.  Here, take
> > a look at the sg_ring conversion of scsi_alloc_sgtable and
> > scsi_free_sgtable and you can see why I'm unhappy with the sg_chain code:
> [...]
>
> > Hope that clarifies,
>
> Actually, not really.  If I want to continue the competition, I can just
> point out that your sg_ring code is bigger than those corresponding
> segments are after the sg_table conversion of scsi_lib.c ...
>
> However, this is pointless.

No, it's exactly the point.  These details *matter*.  The implementation 
*matters*.  sg_table moves this code out of scsi_lib (good!), but still 
demonstrates how much of a PITA they are to manipulate.

As for being able to make arbitrary changes in future without hitting drivers: 
this is the Kernel ABI pipe dream writ small.

OK, I give in with -ETIMEDOUT.  I'll go away now and do something 
productive :)

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays

2008-01-07 Thread Rusty Russell
On Tuesday 08 January 2008 02:48:23 James Bottomley wrote:
> We're always open to new APIs (or more powerful and expanded old ones).
> The way we've been doing the sg_chain conversion is to slide API layers
> into the drivers so sg_chain becomes a simple API flip when we turn it
> on.  Unfortunately sg_ring doesn't quite fit nicely into this.

Hi James,

   Well, it didn't touch any drivers.  The only ones which needed altering 
were those which wanted to use large scatter-gather lists.  You think of the 
subtlety of sg-chain conversion as a feature; it's a bug :)

> > > The other thing I note is that the problem you're claiming to solve
> > > with sg_ring (the ability to add extra scatterlists to the front or the
> > > back of an existing one) is already solved with sg_chain, so the only
> > > real advantage of sg_ring was that it contains explicit counts, which
> > > sg_table (in -mm) also introduces.
> >
> > I just converted virtio using latest Linus for fair comparison
>
> Erm, but that's not really a fair comparison; you need the sg_table code
> in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
>
> branch sg as well.

Actually, I think it's a wash.  Now callers need to set up an sg_table as 
well.  It will save the count_sg() calls though.

> > , and it's still
> > pretty ugly.  sg_ring needs more work to de-scsi it.  sg_table is almost
> > sg_ring except it retains all the chaining warts.
> >
> > But we hit the same problems:
> >
> > 1) sg_chain loses information.  The clever chain packaging makes reading
> > easy, but manipulation is severely limited.  You can append to your own
> > chains by padding, but not someone elses.  This works for SCSI, but what
> > about the rest of us?  And don't even think of joining mapped chains: it
> > will almost work.
>
> OK, but realistically some of your criticisms are way outside of the
> design intent.  Scatterlists (and now sg_chains) are the way the block
> subsystem hands pages to its underlying block devices.

James, scatterlists are used for more than the block subsystem.  I know you're 
designing for that, but we can do better.

Because a single sg_ring is trivially convertable to and from a 
scatterlist *, the compatibility story is nice.  You can implement a 
subsystem (say, the block layer) with sg_ring, and still hand out struct 
scatterlist arrays for backwards compat: old code won't ask for v. long 
scatterlist arrays anyway.

Now we have sg_chaining, we can actually convert complex sg_rings into sg 
chains when someone asks, as my most recent patch does.  I think that's one 
abstraction too many, but it provides a transition path.

> There have never until now been any requirements to join already
> dma_map_sg() converted scatterlists ... that would wreak havoc with the
> way we reuse the list plus damage the idempotence of map/unmap.  What is
> the use case you have for this?

(This was, admittedly, a made-up example).

> > 2) sg_chain's end marker is only for reading non-dma elements, not for
> > mapped chains, nor for writing into chains.  Plus it's a duplicate of the
> > num arg, which is still passed around for efficiency.  (virtio had to
> > implement count_sg() because I didn't want those two redundant num args).
> >
> > 3) By overloading struct scatterlist, it's invisible what code has been
> > converted to chains, and which hasn't.  Too clever by half!
>
> No it's not ... that's the whole point.  Code not yet converted to use
> the API accessors is by definition unable to use chaining.  Code
> converted to use the accessors by design doesn't care (and so is
> "converted to chains").

But you've overloaded the type: what's the difference between:
/* Before conversion */
int foo(struct scatterlist *, int);
And:
/* After conversion */
int foo(struct scatterlist *, int);

You have to wade through the implementation to see the difference: does this 
function take a "chained scatterlist" or an "array scatterlist"?

Then you add insult to injury by implementing sg_chain() *which doesn't handle 
chained scatterlists!*.

> > sg_ring would not have required any change to existing drivers, just
> > those that want to use long sg lists.  And then it's damn obvious which
> > are which.
>
> Which, by definition, would have been pretty much all of them.

But it would have started with none of them, and it would have been done over 
time.  Eventually we might have had a flag day to remove raw scatterlist 
arrays.

> > 4) sg_chain missed a chance to combine all the relevent information (max,
> > num, num_dma and the sg array). sg_table comes close, but you still can't
> > join them (no max information, and even if there were, what if it's
> > full?). Unlike sg_ring, it's unlikely to reduce bugs.
>
> sg_table is sg_chain ... they're incremental extensions of the same body
> of work.

Yes.

> The only such example of a driver like that I know is the
> crypto API and now your virtio.  Once we

Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays

2008-01-07 Thread Rusty Russell
On Monday 07 January 2008 17:37:41 Tejun Heo wrote:
> Rusty Russell wrote:
> > Hi Tejun,
> >
> >Nice try!  Even ignoring the ugliness of undoing such an operation if
> > the caller doesn't expect you to mangle their chains, consider a
> > one-element sg array. :(
>
> Heh heh, that can be dealt with by skipping the first chain if the first
> chain is empty after chaining.  Please take a look at
> ata_sg_setup_extra() in the following.
>
> http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=blob;f=driver
>s/ata/libata-core.c;h=32dde5bbc75ed53e89ac17040da2cd0621a37161;hb=c8847e473a
>4a2844244784226eb362be10d52ce9
>
> That said, yeah, it's seriously ugly.  Restoring the original sg is ugly
> too.  I definitely agree that we need some improvements here.

Erk, that's beyond ugly, into actual evil.

To make this general you need to find the last N 1-element chains (but SCSI 
doesn't do this of course).  Oh the horror...

I'd be remiss if I didn't mention that the sg_ring ata patches were 
straightforward, and indescribably beautiful if compared to this!

Thanks,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays

2008-01-06 Thread Rusty Russell
On Monday 07 January 2008 16:01:40 Tejun Heo wrote:
> > But we hit the same problems:
> >
> > 1) sg_chain loses information.  The clever chain packaging makes reading
> > easy, but manipulation is severely limited.  You can append to your own
> > chains by padding, but not someone elses.  This works for SCSI, but what
> > about the rest of us?  And don't even think of joining mapped chains: it
> > will almost work.
>
> You can append by allocating one more element on the chain to be
> appended and moving the last element of the first chain to it while
> using the last element for chaining.

Hi Tejun,

   Nice try!  Even ignoring the ugliness of undoing such an operation if the 
caller doesn't expect you to mangle their chains, consider a one-element sg 
array. :(

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays

2008-01-06 Thread Rusty Russell
On Sunday 06 January 2008 02:31:12 James Bottomley wrote:
> On Wed, 2007-12-19 at 17:31 +1100, Rusty Russell wrote:
> > This patch series is the start of my attempt to simplify and make
> > explicit the chained scatterlist logic.
> >
> > It's not complete: my SATA box boots and seems happy, but all the other
> > users of SCSI need to be updated and checked.  But I've gotten far enough
> > to believe it's worth persuing.
>
> Sorry for the delay in looking at this, I was busy with Holidays and
> things.

Thankyou for your consideration.

> When I compare sg_ring with the current sg_chain (and later sg_table)
> implementations, I'm actually struck by how similar they are.

I agree, they're solving the same problem.  It is possible that the pain of 
change is no longer worthwhile, but I hate to see us give up on this.  We're 
adding complexity without making it harder to misuse.

> The other thing I note is that the problem you're claiming to solve with
> sg_ring (the ability to add extra scatterlists to the front or the back
> of an existing one) is already solved with sg_chain, so the only real
> advantage of sg_ring was that it contains explicit counts, which
> sg_table (in -mm) also introduces.

I just converted virtio using latest Linus for fair comparison, and it's still 
pretty ugly.  sg_ring needs more work to de-scsi it.  sg_table is almost 
sg_ring except it retains all the chaining warts.

But we hit the same problems:

1) sg_chain loses information.  The clever chain packaging makes reading easy, 
but manipulation is severely limited.  You can append to your own chains by 
padding, but not someone elses.  This works for SCSI, but what about the rest 
of us?  And don't even think of joining mapped chains: it will almost work.

2) sg_chain's end marker is only for reading non-dma elements, not for mapped 
chains, nor for writing into chains.  Plus it's a duplicate of the num arg, 
which is still passed around for efficiency.  (virtio had to implement 
count_sg() because I didn't want those two redundant num args).

3) By overloading struct scatterlist, it's invisible what code has been 
converted to chains, and which hasn't.  Too clever by half!

Look at sg_chain(): it claims to join two scatterlists, but it doesn't.  It 
assumes that prv is an array, not a chain.  Because you've overloaded an 
existing type, this happens everywhere.  Try passing skb_to_sgvec a chained 
skb.

sg_ring would not have required any change to existing drivers, just those 
that want to use long sg lists.  And then it's damn obvious which are which.

4) sg_chain missed a chance to combine all the relevent information (max, num, 
num_dma and the sg array). sg_table comes close, but you still can't join 
them (no max information, and even if there were, what if it's full?).  
Unlike sg_ring, it's unlikely to reduce bugs.

5) (A little moot now) sg_ring didn't require arch changes.

> The other differences are that sg_ring only allows adding at the front
> or back of an existing sg_ring, it doesn't allow splicing at any point
> like sg_chain does, so I'd say it's less functional (not that I actually
> want anyone ever to do this, of course ...)

Well it's just as possible, but you might have to copy more elements (with sg 
chaining you need only copy 1 sg element to make room for the chain elem).  
Agreed that it's a little out there...

> The final point is that sg_ring requires a two level traversal:  ring
> list then scatterlist, whereas sg_chain only requires a single level
> traversal.  I grant that we can abstract out the traversal into
> something that would make users think they're only doing a single level,
> but I don't see what the extra level really buys us.

We hide the real complexity from users and it makes it less safe for 
non-trivial cases.

Hence the introduction of YA datastructure: sg_table.  This is getting close: 
just hang the array off it and you'll have sg_ring and no requirement for 
dynamic alloc all the time.  And once you have a header, no need for chaining 
tricks...

> The only thing missing from sg_chain perhaps is an accessor function
> that does the splicing, which I can easily construct if you want to try
> it out in virtio.

I don't need that (I prepend and append), but it'd be nice if sg_next took a 
const struct scatterlist *.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Don't blatt first element of prv in sg_chain()

2008-01-06 Thread Rusty Russell
I realize that sg chaining is a ploy to make the rest of the kernel
devs feel the pain of the SCSI subsystem.  But this was a little
unsubtle.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r b3aec596b841 include/linux/scatterlist.h
--- a/include/linux/scatterlist.h   Mon Jan 07 12:43:56 2008 +1100
+++ b/include/linux/scatterlist.h   Mon Jan 07 15:01:51 2008 +1100
@@ -188,8 +188,8 @@ static inline void sg_chain(struct scatt
/*
 * offset and length are unused for chain entry.  Clear them.
 */
-   prv->offset = 0;
-   prv->length = 0;
+   prv[prv_nents - 1].offset = 0;
+   prv[prv_nents - 1].length = 0;
 
/*
 * Set lowest bit to indicate a link pointer, and make sure to clear
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] scsi: Convert everyone to scsi_sglist and scsi_sg_count

2008-01-03 Thread Rusty Russell
On Thursday 03 January 2008 20:26:13 Boaz Harrosh wrote:
> On Thu, Jan 03 2008 at 10:50 +0200, Rusty Russell <[EMAIL PROTECTED]> 
wrote:
> > This patch simply converts direct uses of ->use_sg and ->request_buffer
> > to use the wrapper macros.  This removes the assumption that the sg list
> > is overloaded on request_buffer, and that there's an explicit use_sg
> > field.
>
> All of these drivers are properly converted in current scsi-misc +
> scsi-pending. If you are really serious about changing scsi-layer you
> better work ontop of scsi git trees.

Hi Boaz,

Well, I wouldn't say I'm serious about SCSI 8) but I am delighted to see 
this being done.  Have grabbed scsi-pending from git.kernel.org, thanks for 
the hint.

> Also you can inspect -mm tree it has the scsi_data_buffer patches that does
> 4/5 what you want.

   Remember, these scsi patches are a side-effect of trying to get my own 
sg-using code sane.  So this is exactly what I *don't* want: another "works 
for scsi" solution :(

I'll see where we can go from here...
Thanks,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] scsi: convert core to sg_ring

2008-01-03 Thread Rusty Russell
A 'struct sg_ring *sg' element is added to struct scsi_cmd, and the
use_sg and __use_sg fields are removed.  If this field is null,
request_buffer point to the data (as was previously indicated by
use_sg == 0).

To ease transition, scsi_sglist() is now a backwards-compat routine
which chains the sg_ring entries so they can be iterated using
sg_next().  Once all the drivers which actually want to use chained
sgs are converted to sg_ring, the remaining drivers can revert to
using simple sg arrays.

The backwards-compat conversion is placed under an (always-on) config
option (CONFIG_SCSI_SG_CHAIN).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/scsi/Kconfig|6 +
 drivers/scsi/scsi_error.c   |   15 +--
 drivers/scsi/scsi_lib.c |  219 +---
 drivers/scsi/scsi_lib_dma.c |   27 ++---
 drivers/scsi/scsi_tgt_lib.c |   17 +--
 include/scsi/scsi_cmnd.h|   38 +--
 include/scsi/scsi_eh.h  |6 -
 7 files changed, 170 insertions(+), 158 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -29,6 +29,12 @@ config SCSI
 
  However, do not compile this as a module if your root file system
  (the one containing the directory /) is located on a SCSI device.
+
+config SCSI_SG_CHAIN
+   bool
+   default y
+   ---help---
+ Support sg chaining for older SCSI drivers.
 
 config SCSI_DMA
bool
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -617,20 +617,21 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
ses->cmd_len = scmd->cmd_len;
memcpy(ses->cmnd, scmd->cmnd, sizeof(scmd->cmnd));
ses->data_direction = scmd->sc_data_direction;
+   ses->sg = scmd->sg;
ses->bufflen = scmd->request_bufflen;
ses->buffer = scmd->request_buffer;
-   ses->use_sg = scmd->use_sg;
ses->resid = scmd->resid;
ses->result = scmd->result;
 
if (sense_bytes) {
scmd->request_bufflen = min_t(unsigned,
   sizeof(scmd->sense_buffer), sense_bytes);
-   sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
-  scmd->request_bufflen);
-   scmd->request_buffer = &ses->sense_sgl;
+
+   sg_ring_single(&ses->sense_sg.ring, scmd->sense_buffer,
+  scmd->request_bufflen);
+   scmd->sg = &ses->sense_sg.ring;
+   scmd->request_buffer = NULL;
scmd->sc_data_direction = DMA_FROM_DEVICE;
-   scmd->use_sg = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->request_bufflen;
@@ -639,7 +640,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
scmd->request_buffer = NULL;
scmd->request_bufflen = 0;
scmd->sc_data_direction = DMA_NONE;
-   scmd->use_sg = 0;
+   scmd->sg = NULL;
if (cmnd) {
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
memcpy(scmd->cmnd, cmnd, cmnd_size);
@@ -678,7 +679,7 @@ void scsi_eh_restore_cmnd(struct scsi_cm
scmd->sc_data_direction = ses->data_direction;
scmd->request_bufflen = ses->bufflen;
scmd->request_buffer = ses->buffer;
-   scmd->use_sg = ses->use_sg;
+   scmd->sg = ses->sg;
scmd->resid = ses->resid;
scmd->result = ses->result;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -34,12 +34,20 @@
 #define SG_MEMPOOL_NR  ARRAY_SIZE(scsi_sg_pools)
 #define SG_MEMPOOL_SIZE2
 
+/* If we support old-style SG chaining, we need an extra sg element to 
overwrite
+ * with the chain's next pointer. */
+#ifdef CONFIG_SCSI_SG_CHAIN
+#define SCSI_SG_PAD 1
+#else
+#define SCSI_SG_PAD 0
+#endif
+
 /*
  * The maximum number of SG segments that we will put inside a scatterlist
  * (unless chaining is used). Should ideally fit inside a single page, to
  * avoid a higher order allocation.
  */
-#define SCSI_MAX_SG_SEGMENTS   128
+#define SCSI_MAX_SG_SEGMENTS   (128 - SCSI_SG_PAD)
 
 struct scsi_host_sg_pool {
size_t  size;
@@ -707,7 +715,7 @@ static inline unsigned int scsi_sgtable_
 {
unsigned int index;
 
-   switch (nents) {
+   switch (nents + SCSI_SG_PAD) {
case 1 ... 8:
index =

[PATCH 2/3] usb_storage: usb_stor_bulk_transfer_sg cleanup

2008-01-03 Thread Rusty Russell
usb_stor_bulk_transfer_sg() assumes buf is a scatterlist array if
use_sg is non-NULL.  Change it to an explicit sg arg, instead, to
allow the callers to change to scsi_sglist().

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 09247461cfda drivers/usb/storage/freecom.c
--- a/drivers/usb/storage/freecom.c Thu Jan 03 19:30:25 2008 +1100
+++ b/drivers/usb/storage/freecom.c Thu Jan 03 19:51:09 2008 +1100
@@ -133,7 +133,7 @@ freecom_readdata (struct scsi_cmnd *srb,
/* Now transfer all of our blocks. */
US_DEBUGP("Start of read\n");
result = usb_stor_bulk_transfer_sg(us, ipipe, srb->request_buffer,
-   count, srb->use_sg, &srb->resid);
+   count, scsi_sglist(srb), scsi_sg_count(srb), 
&srb->resid);
US_DEBUGP("freecom_readdata done!\n");
 
if (result > USB_STOR_XFER_SHORT)
@@ -167,7 +167,7 @@ freecom_writedata (struct scsi_cmnd *srb
/* Now transfer all of our blocks. */
US_DEBUGP("Start of write\n");
result = usb_stor_bulk_transfer_sg(us, opipe, srb->request_buffer,
-   count, srb->use_sg, &srb->resid);
+   count, scsi_sglist(srb), scsi_sg_count(srb), 
&srb->resid);
 
US_DEBUGP("freecom_writedata done!\n");
if (result > USB_STOR_XFER_SHORT)
diff -r 09247461cfda drivers/usb/storage/sddr09.c
--- a/drivers/usb/storage/sddr09.c  Thu Jan 03 19:30:25 2008 +1100
+++ b/drivers/usb/storage/sddr09.c  Thu Jan 03 19:51:09 2008 +1100
@@ -355,7 +355,7 @@ static int
 static int
 sddr09_readX(struct us_data *us, int x, unsigned long fromaddress,
 int nr_of_pages, int bulklen, unsigned char *buf,
-int use_sg) {
+struct scatterlist *sg, int use_sg) {
 
unsigned char *command = us->iobuf;
int result;
@@ -382,7 +382,7 @@ sddr09_readX(struct us_data *us, int x, 
}
 
result = usb_stor_bulk_transfer_sg(us, us->recv_bulk_pipe,
-  buf, bulklen, use_sg, NULL);
+  buf, bulklen, sg, use_sg, NULL);
 
if (result != USB_STOR_XFER_GOOD) {
US_DEBUGP("Result for bulk_transfer in sddr09_read2%d %d\n",
@@ -403,12 +403,13 @@ sddr09_readX(struct us_data *us, int x, 
  */
 static int
 sddr09_read20(struct us_data *us, unsigned long fromaddress,
- int nr_of_pages, int pageshift, unsigned char *buf, int use_sg) {
+ int nr_of_pages, int pageshift, unsigned char *buf,
+ struct scatterlist *sg, int use_sg) {
int bulklen = nr_of_pages << pageshift;
 
/* The last 8 bits of fromaddress are ignored. */
return sddr09_readX(us, 0, fromaddress, nr_of_pages, bulklen,
-   buf, use_sg);
+   buf, sg, use_sg);
 }
 
 /*
@@ -426,11 +427,12 @@ sddr09_read20(struct us_data *us, unsign
  */
 static int
 sddr09_read21(struct us_data *us, unsigned long fromaddress,
- int count, int controlshift, unsigned char *buf, int use_sg) {
+ int count, int controlshift, unsigned char *buf,
+ struct scatterlist *sg, int use_sg) {
 
int bulklen = (count << controlshift);
return sddr09_readX(us, 1, fromaddress, count, bulklen,
-   buf, use_sg);
+   buf, sg, use_sg);
 }
 
 /*
@@ -444,13 +446,14 @@ sddr09_read21(struct us_data *us, unsign
  */
 static int
 sddr09_read22(struct us_data *us, unsigned long fromaddress,
- int nr_of_pages, int pageshift, unsigned char *buf, int use_sg) {
+ int nr_of_pages, int pageshift, unsigned char *buf,
+ struct scatterlist *sg, int use_sg) {
 
int bulklen = (nr_of_pages << pageshift) + (nr_of_pages << 
CONTROL_SHIFT);
US_DEBUGP("sddr09_read22: reading %d pages, %d bytes\n",
  nr_of_pages, bulklen);
return sddr09_readX(us, 2, fromaddress, nr_of_pages, bulklen,
-   buf, use_sg);
+   buf, sg, use_sg);
 }
 
 #if 0
@@ -538,7 +541,8 @@ static int
 static int
 sddr09_writeX(struct us_data *us,
  unsigned long Waddress, unsigned long Eaddress,
- int nr_of_pages, int bulklen, unsigned char *buf, int use_sg) {
+ int nr_of_pages, int bulklen, unsigned char *buf,
+ struct scatterlist *sg, int use_sg) {
 
unsigned char *command = us->iobuf;
int result;
@@ -568,7 +572,7 @@ sddr09_writeX(struct us_data *us,
}
 
result = usb_stor_bulk_transfer_sg(us, us->send_bulk_pipe,
-  buf, bulklen, use_sg, NULL);
+  buf, bulklen, sg, use_sg, NULL);
 
if (result != USB_STOR_XFER_GOOD) {
US_DEBU

[PATCH 1/3] scsi: Convert everyone to scsi_sglist and scsi_sg_count

2008-01-03 Thread Rusty Russell
This patch simply converts direct uses of ->use_sg and ->request_buffer to
use the wrapper macros.  This removes the assumption that the sg list is
overloaded on request_buffer, and that there's an explicit use_sg field.

The ->request_buffer assumption is explicit in scsi_debug.c's paranoid
checking, so that code had to be shuffled a little.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/scsi/NCR5380.c   |6 +++---
 drivers/scsi/NCR53C9x.c  |6 +++---
 drivers/scsi/aha1542.c   |   14 +++---
 drivers/scsi/atari_NCR5380.c |2 +-
 drivers/scsi/atp870u.c   |   22 +++---
 drivers/scsi/eata_pio.c  |6 +++---
 drivers/scsi/fd_mcs.c|6 +++---
 drivers/scsi/imm.c   |5 ++---
 drivers/scsi/in2000.c|6 +++---
 drivers/scsi/libsrp.c|   12 ++--
 drivers/scsi/pcmcia/nsp_cs.c |8 
 drivers/scsi/ppa.c   |4 ++--
 drivers/scsi/qlogicpti.c |   12 ++--
 drivers/scsi/scsi_debug.c|   14 +++---
 drivers/scsi/seagate.c   |4 ++--
 drivers/scsi/sr.c|6 +++---
 drivers/scsi/sun3_NCR5380.c  |6 +++---
 drivers/scsi/sun3x_esp.c |4 ++--
 drivers/scsi/wd33c93.c   |6 +++---
 21 files changed, 77 insertions(+), 76 deletions(-)

diff -r 297d045c5da1 drivers/scsi/NCR5380.c
--- a/drivers/scsi/NCR5380.cWed Jan 02 17:18:59 2008 +1100
+++ b/drivers/scsi/NCR5380.cWed Jan 02 18:00:10 2008 +1100
@@ -295,9 +295,9 @@ static __inline__ void initialize_SCp(Sc
 * various queues are valid.
 */
 
-   if (cmd->use_sg) {
-   cmd->SCp.buffer = (struct scatterlist *) cmd->request_buffer;
-   cmd->SCp.buffers_residual = cmd->use_sg - 1;
+   if (scsi_sg_count(cmd)) {
+   cmd->SCp.buffer = scsi_sglist(cmd);
+   cmd->SCp.buffers_residual = scsi_sg_count(cmd) - 1;
cmd->SCp.ptr = sg_virt(cmd->SCp.buffer);
cmd->SCp.this_residual = cmd->SCp.buffer->length;
} else {
diff -r 297d045c5da1 drivers/scsi/NCR53C9x.c
--- a/drivers/scsi/NCR53C9x.c   Wed Jan 02 17:18:59 2008 +1100
+++ b/drivers/scsi/NCR53C9x.c   Wed Jan 02 18:00:10 2008 +1100
@@ -910,7 +910,7 @@ EXPORT_SYMBOL(esp_proc_info);
 
 static void esp_get_dmabufs(struct NCR_ESP *esp, Scsi_Cmnd *sp)
 {
-   if(sp->use_sg == 0) {
+   if(scsi_sg_count(sp) == 0) {
sp->SCp.this_residual = sp->request_bufflen;
sp->SCp.buffer = (struct scatterlist *) sp->request_buffer;
sp->SCp.buffers_residual = 0;
@@ -920,8 +920,8 @@ static void esp_get_dmabufs(struct NCR_E
sp->SCp.ptr =
(char *) virt_to_phys(sp->request_buffer);
} else {
-   sp->SCp.buffer = (struct scatterlist *) sp->request_buffer;
-   sp->SCp.buffers_residual = sp->use_sg - 1;
+   sp->SCp.buffer = scsi_sglist(sp->request_buffer);
+   sp->SCp.buffers_residual = scsi_sg_count(sp) - 1;
sp->SCp.this_residual = sp->SCp.buffer->length;
if (esp->dma_mmu_get_scsi_sgl)
esp->dma_mmu_get_scsi_sgl(esp, sp);
diff -r 297d045c5da1 drivers/scsi/aha1542.c
--- a/drivers/scsi/aha1542.cWed Jan 02 17:18:59 2008 +1100
+++ b/drivers/scsi/aha1542.cWed Jan 02 18:00:10 2008 +1100
@@ -689,7 +689,7 @@ static int aha1542_queuecommand(Scsi_Cmn
 
memcpy(ccb[mbo].cdb, cmd, ccb[mbo].cdblen);
 
-   if (SCpnt->use_sg) {
+   if (scsi_sg_count(SCpnt)) {
struct scatterlist *sg;
struct chain *cptr;
 #ifdef DEBUG
@@ -704,12 +704,12 @@ static int aha1542_queuecommand(Scsi_Cmn
HOSTDATA(SCpnt->device->host)->SCint[mbo] = NULL;
return SCSI_MLQUEUE_HOST_BUSY;
}
-   scsi_for_each_sg(SCpnt, sg, SCpnt->use_sg, i) {
-   if (sg->length == 0 || SCpnt->use_sg > 16 ||
+   scsi_for_each_sg(SCpnt, sg, scsi_sg_count(SCpnt), i) {
+   if (sg->length == 0 || scsi_sg_count(SCpnt) > 16 ||
(((int) sg->offset) & 1) || (sg->length & 1)) {
unsigned char *ptr;
-   printk(KERN_CRIT "Bad segment list supplied to 
aha1542.c (%d, %d)\n", SCpnt->use_sg, i);
-   scsi_for_each_sg(SCpnt, sg, SCpnt->use_sg, i) {
+   printk(KERN_CRIT "Bad segment list supplied to 
aha1542.c (%d, %d)\n", scsi_sg_count(SCpnt), i);
+   scsi_for_each_sg(SCpnt, sg, 
scsi_sg_count(SCpnt), i) {
printk(KERN_CR

[PATCH 0/2] sg_ring: Gentler scsi merge

2008-01-02 Thread Rusty Russell
OK, after wading through many scsi drivers, I decided to change tack and try 
to provide a transition path.  This is in two stages:

1) These two patches.  sg_ring used underneath, but if any driver asks for 
scsi_sglist() they get a 2.6.24-style chained sg.  No other patches are 
necessary.

2) Once all chained-sg-needing scsi drivers change to use cmd->sg (ie. 
sg_ring) directly, and the chained sg patches can be reverted.  scsi_sglist() 
and scsi_sg_count() then become:

/* You should only use these if you never need chained sgs */
static inline struct scatterlist *scsi_sglist(struct scsi_cmd *cmd)
{
BUG_ON(!list_empty(&cmd->sg->list));
return &cmd->sg->sg[0];
}

static unsigned int scsi_sg_count(struct scsi_cmd *cmd)
{
if (!cmd->sg)
return 0;
BUG_ON(!list_empty(&cmd->sg->list));
return cmd->sg->num;
}

Thanks,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] sg_ring for scsi

2007-12-27 Thread Rusty Russell
On Thursday 27 December 2007 13:09:27 FUJITA Tomonori wrote:
> On Wed, 26 Dec 2007 11:27:40 +1100
>
> Rusty Russell <[EMAIL PROTECTED]> wrote:
> > There are many signs through the code that it needs a great deal of
> > work: what is the purpose of sg_break? Why does the code check if
> > kmap_atomic fails?
>
> sg_break is a workaround for the hardware restrictions, I think.

Well, scb->breakup does that, can't see what scb->sg_break is for...  May have 
to wade through git history to understand it.

> > ===
> > Convert the ips SCSI driver to sg_ring.
> >
> > Slightly non-trivial conversion, will need testing.
>
> As I said, I don't think that converting SCSI drivers to sg_ring (with
> lots of non-trivial work) provides any benefits. Surely, sg_ring
> enables us to modify sg lists but SCSI drivers don't need the
> feature. What SCSI drivers needs is just a efficient way to get the
> next sg entry (they use 'sg++' in the past and sg_next now).

Sure, iteration over a two-level structure is a little more tricky, but it's 
pretty trivial for most drivers.

Indeed, I think that it's probably better to have an explicitly different 
interface for sg_ring, and keep the current interface for small sg arrays.  
That would provide a nicer changeover.

I'll prototype something and see what it's like...
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] sg_ring: convert core ATA code to sg_ring.

2007-12-26 Thread Rusty Russell
On Wednesday 26 December 2007 19:36:36 Tejun Heo wrote:
> It would be better to build upon sg chaining as we already have it.  I
> think it can be made much easier with a bit more safe guards,
> generalization and some helpers.

Hi Tejun,

I did this work to replace sg chaining.  The current chaining code gives 
us longer sgs, and (if used carefully) the ability to append in some 
circumstances.  But it's clumsy, and we can do better.  This is my attempt.

I understand that people are experiencing whiplash from the idea of 
another change just after the sg chaining changes which have already gone in.  

At the very least, I think I'll have to come up with a better transition 
plan for SCSI drivers, rather than trying to convert them all at once... In 
theory, sg chaining can be done through the sg ring arrays, but that's pretty 
horrible.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] sg_ring for scsi

2007-12-25 Thread Rusty Russell
On Friday 21 December 2007 15:37:46 FUJITA Tomonori wrote:
> Some scsi drivers like ips access to sglist in a tricky way.

Indeed.  I fail to see how this code works, in fact:

drivers/scsi/ips.c:ips_queue() line 1101

if (ips_is_passthru(SC)) {

ips_copp_wait_item_t *scratch;

/* A Reset IOCTL is only sent by the boot CD in extreme cases.  
 */
/* There can never be any system activity ( network or disk ), 
but check */
/* anyway just as a good practice.  
 */
pt = (ips_passthru_t *) scsi_sglist(SC);
if ((pt->CoppCP.cmd.reset.op_code == IPS_CMD_RESET_CHANNEL) &&
(pt->CoppCP.cmd.reset.adapter_flag == 1)) {

Casting the scsi_sglist(SC) (which ips_is_passthrough treated as a
scatterlist) to an ips_passthrough_t seems completely bogus.  It looks like
it wants to access the region mapped by the scatterlist.

There are many signs through the code that it needs a great deal of work:
what is the purpose of sg_break?  Why does the code check if kmap_atomic
fails?

===
Convert the ips SCSI driver to sg_ring.

Slightly non-trivial conversion, will need testing.

The first issue is the standard one with "scsi_for_each_sg"
conversion: scsi_for_each_sg will always start i at 0 and increment it
each time around; "sg_ring_for_each" will start i at 0 but it will
return to zero for each new sg_ring entry; you need a separate counter
if you really want a simple increment.

Various flaws in the driver have been maintained.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 63176a8a6ce3 drivers/scsi/ips.c
--- a/drivers/scsi/ips.cMon Dec 24 17:40:08 2007 +1100
+++ b/drivers/scsi/ips.cWed Dec 26 11:20:52 2007 +1100
@@ -1105,7 +1105,7 @@ static int ips_queue(struct scsi_cmnd *S
/* A Reset IOCTL is only sent by the boot CD in extreme cases.  
 */
/* There can never be any system activity ( network or disk ), 
but check */
/* anyway just as a good practice.  
 */
-   pt = (ips_passthru_t *) scsi_sglist(SC);
+   pt = (ips_passthru_t *) SC->sg;
if ((pt->CoppCP.cmd.reset.op_code == IPS_CMD_RESET_CHANNEL) &&
(pt->CoppCP.cmd.reset.adapter_flag == 1)) {
if (ha->scb_activelist.count != 0) {
@@ -1508,8 +1508,8 @@ static int ips_is_passthru(struct scsi_c
if ((SC->cmnd[0] == IPS_IOCTL_COMMAND) &&
(SC->device->channel == 0) &&
(SC->device->id == IPS_ADAPTER_ID) &&
-   (SC->device->lun == 0) && scsi_sglist(SC)) {
-struct scatterlist *sg = scsi_sglist(SC);
+   (SC->device->lun == 0) && SC->sg) {
+struct scatterlist *sg = &SC->sg->sg[0];
 char  *buffer;
 
 /* kmap_atomic() ensures addressability of the user buffer.*/
@@ -1575,12 +1575,12 @@ ips_make_passthru(ips_ha_t *ha, struct s
ips_passthru_t *pt;
int length = 0;
int i, ret;
-struct scatterlist *sg = scsi_sglist(SC);
+struct sg_ring *sg;
 
METHOD_TRACE("ips_make_passthru", 1);
 
-scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i)
-length += sg[i].length;
+   sg_ring_for_each(SC->sg, sg, i)
+length += sg->sg[i].length;
 
if (length < sizeof (ips_passthru_t)) {
/* wrong size */
@@ -2005,7 +2005,7 @@ ips_cleanup_passthru(ips_ha_t * ha, ips_
 
METHOD_TRACE("ips_cleanup_passthru", 1);
 
-   if ((!scb) || (!scb->scsi_cmd) || (!scsi_sglist(scb->scsi_cmd))) {
+   if ((!scb) || (!scb->scsi_cmd) || (!scb->scsi_cmd->sg)) {
DEBUG_VAR(1, "(%s%d) couldn't cleanup after passthru",
  ips_name, ha->host_num);
 
@@ -2758,16 +2758,18 @@ ips_next(ips_ha_t * ha, int intr)
 scb->sg_count = scsi_dma_map(SC);
 BUG_ON(scb->sg_count < 0);
if (scb->sg_count) {
-   struct scatterlist *sg;
-   int i;
+   struct sg_ring *sg;
+   int i, idx;
 
scb->flags |= IPS_SCB_MAP_SG;
 
-scsi_for_each_sg(SC, sg, scb->sg_count, i) {
+   idx = 0;
+   sg_ring_for_each(SC->sg, sg, i) {
if (ips_fill_scb_sg_single
-   (ha, sg_dma_address(sg), scb, i,
-sg_dma_len(sg)) < 0)
-   break

Re: [PATCH 0/5] sg_ring for scsi

2007-12-20 Thread Rusty Russell
On Friday 21 December 2007 13:28:34 FUJITA Tomonori wrote:
> I'm not sure about chaining the headers (as your sg_ring and
> scsi_sgtable do) would simplify LLDs. Have you looked at ips or
> qla1280?

Not yet, am working my way through the drivers, but I don't expect it will be 
a simplification to the normal SCSI LLDs.  Most of them are mere consumers of
sgs...

I'm not a SCSI person: I'm patching SCSI because I have to to get my own 
sg-using code clean :)

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] dma_map_sg_ring() helper

2007-12-20 Thread Rusty Russell
On Friday 21 December 2007 11:40:00 David Miller wrote:
> From: Rusty Russell <[EMAIL PROTECTED]>
> Date: Fri, 21 Dec 2007 11:35:12 +1100
>
> > On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> > > We need to pass the whole sg entries to the IOMMUs at a time.
> >
> > Hi Fujita,
> >
> > OK, it's certainly possible to have an arch override.  For which
> > architecture is this BTW?
>
> SPARC64, POWERPC, maybe IA-64 etc.
>
> Basically any platform that potentially does virtual
> remamping and thus linearization.

Fujita said "need" which confused me.  I already said it should be handed
down as an optimization; I was curious what I had broken :)

> I think it should always be provided, the new APIs give
> less information to the implementation and that's a step
> backwards.

Absolutely.  In fact, I think the sg_ring header would be made safer if it
had the "dma_num" in it as well: it's more explicit and less surprising to
the caller than mangling sg->num.

How are these two patches then?

===
Introduce sg_ring: a ring of scatterlist arrays.

This patch introduces 'struct sg_ring', a layer on top of scatterlist
arrays.  It meshes nicely with routines which expect a simple array of
'struct scatterlist' because it is easy to break down the ring into
its constituent arrays.

The sg_ring header also encodes the maximum number of entries, useful
for routines which populate an sg.  We need never hand around a number
of elements any more.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 include/linux/sg_ring.h |   74 
 1 files changed, 74 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/sgring.h

diff --git a/include/linux/sg_ring.h b/include/linux/sg_ring.h
new file mode 100644
--- /dev/null
+++ b/include/linux/sg_ring.h
@@ -0,0 +1,128 @@
+#ifndef _LINUX_SG_RING_H
+#define _LINUX_SG_RING_H
+#include 
+
+/**
+ * struct sg_ring - a ring of scatterlists
+ * @list: the list_head chaining them together
+ * @num: the number of valid sg entries
+ * @dma_num: the number of valid sg entries after dma mapping
+ * @max: the maximum number of sg entries (size of the sg array).
+ * @sg: the array of scatterlist entries.
+ *
+ * This provides a convenient encapsulation of one or more scatter gather
+ * arrays.  dma_map_sg_ring() (and friends) set @dma_num: some architectures
+ * coalesce sg entries, to this will be < num.
+ */
+struct sg_ring
+{
+   struct list_head list;
+   unsigned int num, dma_num, max;
+   struct scatterlist sg[0];
+};
+
+/* This helper declares an sg ring on the stack or in a struct. */
+#define DECLARE_SG_RING(name, max) \
+   struct {\
+   struct sg_ring ring;\
+   struct scatterlist sg[max]; \
+   } name
+
+/**
+ * sg_ring_init - initialize a scatterlist ring.
+ * @sg: the sg_ring.
+ * @max: the size of the trailing sg array.
+ *
+ * After initialization sg is alone in the ring.
+ */
+static inline void sg_ring_init(struct sg_ring *sg, unsigned int max)
+{
+#ifdef CONFIG_DEBUG_SG
+   unsigned int i;
+   for (i = 0; i < max; i++)
+   sg->sg[i].sg_magic = SG_MAGIC;
+   sg->num = 0x;
+   sg->dma_num = 0x;
+#endif
+   INIT_LIST_HEAD(&sg->list);
+   sg->max = max;
+   /* FIXME: This is to clear the page bits. */
+   sg_init_table(sg->sg, sg->max);
+}
+
+/**
+ * sg_ring_single - initialize a one-element scatterlist ring.
+ * @sg: the sg_ring.
+ * @buf: the pointer to the buffer.
+ * @buflen: the length of the buffer.
+ *
+ * Does sg_ring_init and also sets up first (and only) sg element.
+ */
+static inline void sg_ring_single(struct sg_ring *sg,
+ const void *buf,
+ unsigned int buflen)
+{
+   sg_ring_init(sg, 1);
+   sg->num = 1;
+   sg_init_one(&sg->sg[0], buf, buflen);
+}
+
+/**
+ * sg_ring_next - next array in a scatterlist ring.
+ * @sg: the sg_ring.
+ * @head: the sg_ring head.
+ *
+ * This will return NULL once @sg has looped back around to @head.
+ */
+static inline struct sg_ring *sg_ring_next(const struct sg_ring *sg,
+  const struct sg_ring *head)
+{
+   sg = list_first_entry(&sg->list, struct sg_ring, list);
+   if (sg == head)
+   sg = NULL;
+   return (struct sg_ring *)sg;
+}
+
+/* Helper for writing for loops. */
+static inline struct sg_ring *sg_ring_iter(const struct sg_ring *head,
+  const struct sg_ring *sg,
+  unsigned int *i)
+{
+   (*i)++;
+   /* While loop lets us skip any zero-entry sg_ring arrays */
+ 

Re: [PATCH 2/5] dma_map_sg_ring() helper

2007-12-20 Thread Rusty Russell
On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> We need to pass the whole sg entries to the IOMMUs at a time.

Hi Fujita,

OK, it's certainly possible to have an arch override.  For which 
architecture is this BTW?

Thanks,
Rusty.


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] sg_ring for scsi

2007-12-20 Thread Rusty Russell
On Thursday 20 December 2007 18:58:07 David Miller wrote:
> From: Rusty Russell <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 18:53:48 +1100
>
> > Manipulating the magic chains is horrible; it looks simple to the
> > places which simply want to iterate through it, but it's awful for
> > code which wants to create them.
>
> I'm not saying complexity is inherent in this stuff, but
> assuming that it is the complexity should live as far away
> from the minions (the iterators in this case).  Therefore,
> the creators is the right spot for the hard stuff.

In this case, the main benefit of the sg chaining was that the conversion of 
most scsi drivers was easy (basically sg++ -> sg = sg_next(sg)).  The 
conversion to sg_ring is more complex, but the end result is not 
significantly more complex.

However, the cost to code which manipulates sg chains was significant: I tried 
using them in virtio and it was too ugly to live (so that doesn't support sg 
chaining).  If this was the best we could do, that'd be fine.

But, as demonstrated, there are real benefits of having an explicit header:

1) It removes the chain-end/explicit count ambiguity (see 
http://lkml.org/lkml/2007/10/25/209 & thread)
2) It allows others to manipulate sg chains, which couldn't be done before
   (eg. the ATA code which wants to append a padding element).
3) I can now hand you an sg ring for you to fill: sg chains can't do that.

In short, sg_ring is generally useful primitive.  sg chains are a clever hack 
for scsi_lib to create, and everyone else to read.

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] dma_map_sg_ring() helper

2007-12-20 Thread Rusty Russell
On Thursday 20 December 2007 18:42:44 David Miller wrote:
> From: FUJITA Tomonori <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 16:06:31 +0900
>
> > On Thu, 20 Dec 2007 16:49:30 +1100
> >
> > Rusty Russell <[EMAIL PROTECTED]> wrote:
> > > +/**
> > > + * dma_map_sg_ring - Map an entire sg ring
> > > + * @dev: Device to free noncoherent memory for
> > > + * @sg: The sg_ring
> > > + * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
> > > + *
> > > + * This returns -ENOMEM if mapping fails.  It's not clear that telling
> > > you + * it failed is useful though.
> > > + */
> > > +int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
> > > +  enum dma_data_direction direction)
> > > +{
> > > + struct sg_ring *i;
> > > + unsigned int num;
> > > +
> > > + for (i = sg; i; i = sg_ring_next(i, sg)) {
> > > + BUG_ON(i->num > i->max);
> > > + num = dma_map_sg(dev, i->sg, i->num, direction);
> > > + if (num == 0 && i->num != 0)
> > > + goto unmap;
> > > + }
> > > + return 0;
> >
> > I don't think that this works for IOMMUs that could merge sg entries.
>
> Right, it won't work at all.
>
> The caller has to be told how many DMA entries it really did use to
> compose the mapping, and there has to be a way to properly iterate
> over them.  The assumption that the IOMMU will map the SG entries
> 1-to-1 is invalid.

Good catch.  Indeed, what's missing is one line:

    i->num = num;

Of course, an arch-specific version of this could merge between sg_rings,
too, but that's an optimization.

Thanks,
Rusty.

dma_map_sg_ring() helper

Obvious counterpart to dma_map_sg.  Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/base/dma-mapping.c  |   13 +
 include/linux/dma-mapping.h |4 
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 
 /*
  * Managed DMA API
@@ -162,6 +163,60 @@ void dmam_free_noncoherent(struct device
 }
 EXPORT_SYMBOL(dmam_free_noncoherent);
 
+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails.  It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+   unsigned int num;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   num = dma_map_sg(dev, i->sg, i->num, direction);
+   if (num == 0 && i->num != 0)
+   goto unmap;
+   i->num = num;
+   }
+   return 0;
+
+unmap:
+   while (sg) {
+   dma_unmap_sg(dev, sg->sg, sg->num, direction);
+   sg = sg_ring_next(sg, i);
+   }
+   return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   dma_unmap_sg(dev, i->sg, i->num, direction);
+   }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+
 #ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
 
 static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,12 @@ dma_mark_declared_memory_occupied(struct
 }
 #endif
 
+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+
 /*
  * Managed DMA API
  */

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] sg_ring for scsi

2007-12-19 Thread Rusty Russell
On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
> On Thu, 20 Dec 2007 16:45:18 +1100
>
> Rusty Russell <[EMAIL PROTECTED]> wrote:
> > OK, some fixes since last time, as I wade through more SCSI drivers. 
> > Some drivers use "use_sg" as a flag to know whether the request_buffer is
> > a scatterlist: I don't need the counter, but I still need the flag, so I
> > fixed that in a more intuitive way (an explicit ->sg pointer in the cmd).
>
> use_sg and the request_buffer will be removed shortly.
>
> http://marc.info/?l=linux-scsi&m=119754650614813&w=2

Thanks!  Is there a git tree somewhere with these changes?

> I think that we tried the similar idea before, scsi_sgtable, but we
> seem to settle in the current simple approach.

Yes, a scsi-specific solution is a bad idea: other people use sg.  
Manipulating the magic chains is horrible; it looks simple to the places 
which simply want to iterate through it, but it's awful for code which wants 
to create them.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Strange code in initio.c?

2007-12-19 Thread Rusty Russell
Hi Alan,

Was looking through initio.c to convert it to sg_ring, and noticed this 
code:

initio_build_scb() around 2616:
sg = &cblk->sglist[0];
scsi_for_each_sg(cmnd, sglist, cblk->sglen, i) {
sg->data = cpu_to_le32((u32)sg_dma_address(sglist));
total_len += sg->len = 
cpu_to_le32((u32)sg_dma_len(sglist));
}

Note how sg is never incremented... seems wrong to me.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] sg_ring: Convert scsi_debug

2007-12-19 Thread Rusty Russell
Douglas Gilbert pointed out that converting scsi_debug would be a good
demonstration of the conversion required for other SCSI devices.

Details of the changes:
1) ->use_sg is replaced by ->sg, which if non-NULL, contains the sg_ring.
2) ->request_buffer can be NULL (it's only relevent if ->sg isn't set)
3) sg_ring_for_each is no longer required, just iterate directly over ->sg.
4) The iterator updates a struct sg_ring (sg) and an index (k), so previous
   references to sg become &sg->sg[k] (the k'th element within the sg_ring sg).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r c5fe2cab1d48 drivers/scsi/scsi_debug.c
--- a/drivers/scsi/scsi_debug.c Thu Dec 20 13:12:43 2007 +1100
+++ b/drivers/scsi/scsi_debug.c Thu Dec 20 13:39:24 2007 +1100
@@ -601,16 +601,16 @@ static int fill_from_dev_buffer(struct s
int k, req_len, act_len, len, active;
void * kaddr;
void * kaddr_off;
-   struct scatterlist * sg;
+   struct sg_ring * sg;
 
if (0 == scp->request_bufflen)
return 0;
-   if (NULL == scp->request_buffer)
-   return (DID_ERROR << 16);
if (! ((scp->sc_data_direction == DMA_BIDIRECTIONAL) ||
  (scp->sc_data_direction == DMA_FROM_DEVICE)))
return (DID_ERROR << 16);
-   if (0 == scp->use_sg) {
+   if (NULL == scp->sg) {
+   if (NULL == scp->request_buffer)
+   return (DID_ERROR << 16);
req_len = scp->request_bufflen;
act_len = (req_len < arr_len) ? req_len : arr_len;
memcpy(scp->request_buffer, arr, act_len);
@@ -622,14 +622,14 @@ static int fill_from_dev_buffer(struct s
}
active = 1;
req_len = act_len = 0;
-   scsi_for_each_sg(scp, sg, scp->use_sg, k) {
+   sg_ring_for_each(scp->sg, sg, k) {
if (active) {
kaddr = (unsigned char *)
-   kmap_atomic(sg_page(sg), KM_USER0);
+   kmap_atomic(sg_page(&sg->sg[k]), KM_USER0);
if (NULL == kaddr)
return (DID_ERROR << 16);
-   kaddr_off = (unsigned char *)kaddr + sg->offset;
-   len = sg->length;
+   kaddr_off = (unsigned char *)kaddr + sg->sg[k].offset;
+   len = sg->sg[k].length;
if ((req_len + len) > arr_len) {
active = 0;
len = arr_len - req_len;
@@ -638,7 +638,7 @@ static int fill_from_dev_buffer(struct s
kunmap_atomic(kaddr, KM_USER0);
act_len += len;
}
-   req_len += sg->length;
+   req_len += sg->sg[k].length;
}
if (scp->resid)
scp->resid -= act_len;
@@ -654,29 +654,29 @@ static int fetch_to_dev_buffer(struct sc
int k, req_len, len, fin;
void * kaddr;
void * kaddr_off;
-   struct scatterlist * sg;
+   struct sg_ring * sg;
 
if (0 == scp->request_bufflen)
return 0;
-   if (NULL == scp->request_buffer)
-   return -1;
if (! ((scp->sc_data_direction == DMA_BIDIRECTIONAL) ||
  (scp->sc_data_direction == DMA_TO_DEVICE)))
return -1;
-   if (0 == scp->use_sg) {
+   if (NULL == scp->sg) {
+   if (NULL == scp->request_buffer)
+   return -1;
req_len = scp->request_bufflen;
len = (req_len < max_arr_len) ? req_len : max_arr_len;
memcpy(arr, scp->request_buffer, len);
return len;
}
-   sg = scsi_sglist(scp);
req_len = fin = 0;
-   for (k = 0; k < scp->use_sg; ++k, sg = sg_next(sg)) {
-   kaddr = (unsigned char *)kmap_atomic(sg_page(sg), KM_USER0);
+   sg_ring_for_each(scp->sg, sg, k) {
+   kaddr = (unsigned char *)kmap_atomic(sg_page(&sg->sg[k]),
+KM_USER0);
if (NULL == kaddr)
return -1;
-   kaddr_off = (unsigned char *)kaddr + sg->offset;
-   len = sg->length;
+   kaddr_off = (unsigned char *)kaddr + sg->sg[k].offset;
+   len = sg->sg[k].length;
if ((req_len + len) > max_arr_len) {
len = max_arr_len - req_len;
fin = 1;
@@ -685,7 +685,7 @@ static int fetch_to_dev_buffer(struct sc
kunmap_atomic(kaddr, KM_USER0);
if (fin)
return req_len + len;
-   req_len += sg->length;
+   req_l

[PATCH 4/5] sg_ring: Convert core scsi code to sg_ring.

2007-12-19 Thread Rusty Russell
(Update: explicit sg member to replace use_sg indicating it's a scatter gather)

Details:
1) The use_sg (and __use_sg) fields are removed: sg_rings contain their own
   count, and it's confusing to have two.
2) use_sg used to do double duty: if non-zero, it meant that the request_buffer
   actually pointed to a scatterlist.  Replace this with an explicit sg member:
   if NULL, then request_buffer contains a pointer to the raw data.
3) The scsi_sg_count(), scsi_sglist() and scsi_bufflen() wrappers are removed.
   scsi_sg_count() is no longer necessary, scsi_sglist() is now cmd->sg, and
   scsi_bufflen should just be cmd->request_bufflen if you need it.

If nothing else, the simplification of this logic shows why I prefer
sg_ring over scatterlist chaining.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -617,20 +617,21 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
ses->cmd_len = scmd->cmd_len;
memcpy(ses->cmnd, scmd->cmnd, sizeof(scmd->cmnd));
ses->data_direction = scmd->sc_data_direction;
+   ses->sg = scmd->sg;
ses->bufflen = scmd->request_bufflen;
ses->buffer = scmd->request_buffer;
-   ses->use_sg = scmd->use_sg;
ses->resid = scmd->resid;
ses->result = scmd->result;
 
if (sense_bytes) {
scmd->request_bufflen = min_t(unsigned,
   sizeof(scmd->sense_buffer), sense_bytes);
-   sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
-  scmd->request_bufflen);
-   scmd->request_buffer = &ses->sense_sgl;
+
+   sg_ring_single(&ses->sense_sg.ring, scmd->sense_buffer,
+  scmd->request_bufflen);
+   scmd->sg = &ses->sense_sg.ring;
+   scmd->request_buffer = NULL;
scmd->sc_data_direction = DMA_FROM_DEVICE;
-   scmd->use_sg = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->request_bufflen;
@@ -639,7 +640,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
scmd->request_buffer = NULL;
scmd->request_bufflen = 0;
scmd->sc_data_direction = DMA_NONE;
-   scmd->use_sg = 0;
+   scmd->sg = NULL;
if (cmnd) {
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
memcpy(scmd->cmnd, cmnd, cmnd_size);
@@ -678,7 +679,7 @@ void scsi_eh_restore_cmnd(struct scsi_cm
scmd->sc_data_direction = ses->data_direction;
scmd->request_bufflen = ses->bufflen;
scmd->request_buffer = ses->buffer;
-   scmd->use_sg = ses->use_sg;
+   scmd->sg = ses->sg;
scmd->resid = ses->resid;
scmd->result = ses->result;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -737,21 +737,41 @@ static inline unsigned int scsi_sgtable_
return index;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static void free_sgring(struct sg_ring *head)
 {
struct scsi_host_sg_pool *sgp;
-   struct scatterlist *sgl, *prev, *ret;
+   struct sg_ring *sg, *n;
+
+   /* Free any other entries in the ring. */
+   list_for_each_entry_safe(sg, n, &head->list, list) {
+   list_del(&sg->list);
+   sgp = scsi_sg_pools + scsi_sgtable_index(sg->max);
+   mempool_free(sg, sgp->pool);
+   }
+
+   /* Now free the head of the ring. */
+   BUG_ON(!list_empty(&head->list));
+
+   sgp = scsi_sg_pools + scsi_sgtable_index(head->max);
+   mempool_free(head, sgp->pool);
+}
+
+struct sg_ring *scsi_alloc_sgring(struct scsi_cmnd *cmd, unsigned int num,
+ gfp_t gfp_mask)
+{
+   struct scsi_host_sg_pool *sgp;
+   struct sg_ring *sg, *ret;
unsigned int index;
int this, left;
 
-   BUG_ON(!cmd->use_sg);
+   BUG_ON(!num);
 
-   left = cmd->use_sg;
-   ret = prev = NULL;
+   left = num;
+   ret = NULL;
do {
this = left;
if (this > SCSI_MAX_SG_SEGMENTS) {
-   this = SCSI_MAX_SG_SEGMENTS - 1;
+   this = SCSI_MAX_SG_SEGMENTS;
index = SG_MEMPOOL_NR - 1;
} else
   

[PATCH 3/5] blk_rq_map_sg_ring as a counterpart to blk_rq_map_sg.

2007-12-19 Thread Rusty Russell
Obvious counterpart to blk_rq_map_sg.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * for max sense size
@@ -1364,6 +1365,68 @@ new_segment:
 
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+/**
+ * blk_rq_map_sg_ring - map a request to a scatterlist ring.
+ * @q: the request queue this request applies to.
+ * @rq: the request to map
+ * @sg: the sg_ring to populate.
+ *
+ * There must be enough elements in the sg_ring(s) to map the request.
+ */
+void blk_rq_map_sg_ring(struct request_queue *q, struct request *rq,
+   struct sg_ring *sg)
+{
+   struct bio_vec *bvec, *bvprv;
+   struct req_iterator iter;
+   int i, cluster;
+   struct sg_ring *head = sg;
+   struct scatterlist *sgprv;
+
+   i = 0;
+   cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+
+   /*
+* for each bio in rq
+*/
+   bvprv = NULL;
+   sgprv = NULL;
+   rq_for_each_segment(bvec, rq, iter) {
+   int nbytes = bvec->bv_len;
+
+   if (bvprv && cluster) {
+   if (sgprv->length + nbytes > q->max_segment_size)
+   goto new_segment;
+
+   if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+   goto new_segment;
+   if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+   goto new_segment;
+
+   sgprv->length += nbytes;
+   } else {
+new_segment:
+   sg_set_page(sg->sg + i, bvec->bv_page, nbytes,
+   bvec->bv_offset);
+   sgprv = sg->sg + i;
+   if (++i == sg->max) {
+   sg->num = i;
+   sg = sg_ring_next(sg, head);
+   i = 0;
+   }
+   }
+   bvprv = bvec;
+   } /* segments in rq */
+
+   /* If we were still working on an sg_ring, set the number and
+* clear any following sg_rings. */
+   if (sg) {
+   sg->num = i;
+   for (sg = sg_ring_next(sg,head); sg; sg = sg_ring_next(sg,head))
+   sg->num = 0;
+   }
+}
+EXPORT_SYMBOL(blk_rq_map_sg_ring);
+
 /*
  * the standard queue merge functions, can be overridden with device
  * specific ones if so desired
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -777,6 +777,8 @@ extern void blk_ordered_complete_seq(str
 extern void blk_ordered_complete_seq(struct request_queue *, unsigned, int);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+struct sg_ring;
+extern void blk_rq_map_sg_ring(struct request_queue *, struct request *, 
struct sg_ring *);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern void generic_unplug_device(struct request_queue *);
 extern void __generic_unplug_device(struct request_queue *);
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] dma_map_sg_ring() helper

2007-12-19 Thread Rusty Russell
Obvious counterpart to dma_map_sg.  Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/base/dma-mapping.c  |   13 +
 include/linux/dma-mapping.h |4 
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 
 /*
  * Managed DMA API
@@ -162,6 +163,59 @@ void dmam_free_noncoherent(struct device
 }
 EXPORT_SYMBOL(dmam_free_noncoherent);
 
+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails.  It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+   unsigned int num;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   num = dma_map_sg(dev, i->sg, i->num, direction);
+   if (num == 0 && i->num != 0)
+   goto unmap;
+   }
+   return 0;
+
+unmap:
+   while (sg) {
+   dma_unmap_sg(dev, sg->sg, sg->num, direction);
+   sg = sg_ring_next(sg, i);
+   }
+   return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   dma_unmap_sg(dev, i->sg, i->num, direction);
+   }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+
 #ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
 
 static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,12 @@ dma_mark_declared_memory_occupied(struct
 }
 #endif
 
+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+
 /*
  * Managed DMA API
  */
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] sg_ring: sg_ring.h

2007-12-19 Thread Rusty Russell
(Updates since last time: const-safe iterators and sg_ring_num helper for
counting scatterlist entries across entire ring).

This patch introduces 'struct sg_ring', a layer on top of scatterlist
arrays.  It meshes nicely with routines which expect a simple array of
'struct scatterlist' because it is easy to break down the ring into
its constituent arrays.

The sg_ring header also encodes the maximum number of entries, useful
for routines which populate an sg.  We need never hand around a number
of elements any more.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 include/linux/sg_ring.h |   74 
 1 files changed, 74 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/sgring.h

diff --git a/include/linux/sg_ring.h b/include/linux/sg_ring.h
new file mode 100644
--- /dev/null
+++ b/include/linux/sg_ring.h
@@ -0,0 +1,124 @@
+#ifndef _LINUX_SG_RING_H
+#define _LINUX_SG_RING_H
+#include 
+
+/**
+ * struct sg_ring - a ring of scatterlists
+ * @list: the list_head chaining them together
+ * @num: the number of valid sg entries
+ * @max: the maximum number of sg entries (size of the sg array).
+ * @sg: the array of scatterlist entries.
+ *
+ * This provides a convenient encapsulation of one or more scatter gather
+ * arrays.
+ */
+struct sg_ring
+{
+   struct list_head list;
+   unsigned int num, max;
+   struct scatterlist sg[0];
+};
+
+/* This helper declares an sg ring on the stack or in a struct. */
+#define DECLARE_SG_RING(name, max) \
+   struct {\
+   struct sg_ring ring;\
+   struct scatterlist sg[max]; \
+   } name
+
+/**
+ * sg_ring_init - initialize a scatterlist ring.
+ * @sg: the sg_ring.
+ * @max: the size of the trailing sg array.
+ *
+ * After initialization sg is alone in the ring.
+ */
+static inline void sg_ring_init(struct sg_ring *sg, unsigned int max)
+{
+#ifdef CONFIG_DEBUG_SG
+   unsigned int i;
+   for (i = 0; i < max; i++)
+   sg->sg[i].sg_magic = SG_MAGIC;
+#endif
+   INIT_LIST_HEAD(&sg->list);
+   sg->max = max;
+   /* FIXME: This is to clear the page bits. */
+   sg_init_table(sg->sg, sg->max);
+}
+
+/**
+ * sg_ring_single - initialize a one-element scatterlist ring.
+ * @sg: the sg_ring.
+ * @buf: the pointer to the buffer.
+ * @buflen: the length of the buffer.
+ *
+ * Does sg_ring_init and also sets up first (and only) sg element.
+ */
+static inline void sg_ring_single(struct sg_ring *sg,
+ const void *buf,
+ unsigned int buflen)
+{
+   sg_ring_init(sg, 1);
+   sg->num = 1;
+   sg_init_one(&sg->sg[0], buf, buflen);
+}
+
+/**
+ * sg_ring_next - next array in a scatterlist ring.
+ * @sg: the sg_ring.
+ * @head: the sg_ring head.
+ *
+ * This will return NULL once @sg has looped back around to @head.
+ */
+static inline struct sg_ring *sg_ring_next(const struct sg_ring *sg,
+  const struct sg_ring *head)
+{
+   sg = list_first_entry(&sg->list, struct sg_ring, list);
+   if (sg == head)
+   sg = NULL;
+   return (struct sg_ring *)sg;
+}
+
+/* Helper for writing for loops. */
+static inline struct sg_ring *sg_ring_iter(const struct sg_ring *head,
+  const struct sg_ring *sg,
+  unsigned int *i)
+{
+   (*i)++;
+   /* While loop lets us skip any zero-entry sg_ring arrays */
+   while (*i == sg->num) {
+   *i = 0;
+   sg = sg_ring_next(sg, head);
+   if (!sg)
+   break;
+   }
+   return (struct sg_ring *)sg;
+}
+
+/**
+ * sg_ring_for_each - iterate through an entire sg_ring ring
+ * @head: the head of the sg_ring.
+ * @sg: the sg_ring iterator.
+ * @i: an (unsigned) integer which refers to sg->sg[i].
+ *
+ * The current scatterlist element is sg->sg[i].
+ */
+#define sg_ring_for_each(head, sg, i) \
+   for (sg = head, i = 0; sg; sg = sg_ring_iter(head, sg, &i))
+
+/**
+ * sg_ring_num - how many struct scatterlists are used in this sg_ring.
+ * @head: the sg_ring
+ *
+ * Simple helper function to add up the number of scatterlists.
+ */
+static inline unsigned sg_ring_num(const struct sg_ring *head)
+{
+   unsigned int num = 0, i;
+   const struct sg_ring *sg;
+
+   sg_ring_for_each(head, sg, i)
+   num += sg->num;
+   return num;
+}
+#endif /* _LINUX_SG_RING_H */
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] sg_ring for scsi

2007-12-19 Thread Rusty Russell
OK, some fixes since last time, as I wade through more SCSI drivers.  Some 
drivers use "use_sg" as a flag to know whether the request_buffer is a 
scatterlist: I don't need the counter, but I still need the flag, so I fixed 
that in a more intuitive way (an explicit ->sg pointer in the cmd).

Also, I've updated and tested scsi_debug, after Douglas's excellent 
suggestion.

(I just found out about struct scsi_pointer, so I'm off to update that now, 
too).

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] sg_ring: convert core ATA code to sg_ring.

2007-12-18 Thread Rusty Russell
ATA relies so heavily on scsi that it needs to be converted at the
same time.

ATA adds padding to scatterlists in scsi commands, but because there was
no good way of appending to those scatterlists, it had to use boutique
iterators to make sure the padding is included.  With sg_ring, ATA can
simply append an sg_ring entry with the padding, and normal iterators
can be used.

I renamed qc->cursg to qc->cur_sg to catch all the users: they should
now be referring to 'qc->cur_sg[qc->cursg_i]' wherever they were using
'qc->cursg'.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r a867138da3e0 drivers/ata/ahci.c
--- a/drivers/ata/ahci.cWed Dec 19 16:48:15 2007 +1100
+++ b/drivers/ata/ahci.cWed Dec 19 17:27:09 2007 +1100
@@ -1479,9 +1479,9 @@ static void ahci_tf_read(struct ata_port
 
 static unsigned int ahci_fill_sg(struct ata_queued_cmd *qc, void *cmd_tbl)
 {
-   struct scatterlist *sg;
struct ahci_sg *ahci_sg;
-   unsigned int n_sg = 0;
+   struct sg_ring *sg;
+   unsigned int i, n_sg = 0;
 
VPRINTK("ENTER\n");
 
@@ -1489,9 +1489,9 @@ static unsigned int ahci_fill_sg(struct 
 * Next, the S/G list.
 */
ahci_sg = cmd_tbl + AHCI_CMD_TBL_HDR_SZ;
-   ata_for_each_sg(sg, qc) {
-   dma_addr_t addr = sg_dma_address(sg);
-   u32 sg_len = sg_dma_len(sg);
+   sg_ring_for_each(qc->sg, sg, i) {
+   dma_addr_t addr = sg_dma_address(&sg->sg[i]);
+   u32 sg_len = sg_dma_len(&sg->sg[i]);
 
ahci_sg->addr = cpu_to_le32(addr & 0x);
ahci_sg->addr_hi = cpu_to_le32((addr >> 16) >> 16);
diff -r a867138da3e0 drivers/ata/libata-core.c
--- a/drivers/ata/libata-core.c Wed Dec 19 16:48:15 2007 +1100
+++ b/drivers/ata/libata-core.c Wed Dec 19 17:27:09 2007 +1100
@@ -1596,8 +1596,8 @@ static void ata_qc_complete_internal(str
  */
 unsigned ata_exec_internal_sg(struct ata_device *dev,
  struct ata_taskfile *tf, const u8 *cdb,
- int dma_dir, struct scatterlist *sgl,
- unsigned int n_elem, unsigned long timeout)
+ int dma_dir, struct sg_ring *sg,
+ unsigned long timeout)
 {
struct ata_link *link = dev->link;
struct ata_port *ap = link->ap;
@@ -1657,7 +1657,7 @@ unsigned ata_exec_internal_sg(struct ata
qc->flags |= ATA_QCFLAG_RESULT_TF;
qc->dma_dir = dma_dir;
if (dma_dir != DMA_NONE)
-   ata_sg_init(qc, sgl, n_elem);
+   ata_sg_init(qc, sg);
 
qc->private_data = &wait;
qc->complete_fn = ata_qc_complete_internal;
@@ -1770,18 +1770,16 @@ unsigned ata_exec_internal(struct ata_de
   int dma_dir, void *buf, unsigned int buflen,
   unsigned long timeout)
 {
-   struct scatterlist *psg = NULL, sg;
-   unsigned int n_elem = 0;
+   struct sg_ring *psg = NULL;
+   DECLARE_SG_RING(sg, 1);
 
if (dma_dir != DMA_NONE) {
WARN_ON(!buf);
-   sg_init_one(&sg, buf, buflen);
-   psg = &sg;
-   n_elem++;
-   }
-
-   return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, n_elem,
-   timeout);
+   sg_ring_single(&sg.ring, buf, buflen);
+   psg = &sg.ring;
+   }
+
+   return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, timeout);
 }
 
 /**
@@ -4438,6 +4436,36 @@ static unsigned int ata_dev_init_params(
return err_mask;
 }
 
+static struct sg_ring *sg_ring_prev(struct sg_ring *sg)
+{
+   return list_entry(sg->list.prev, struct sg_ring, list);
+}
+
+/* To be paranoid, we handle empty scatterlist ring entries here. */
+static struct sg_ring *sg_find_end(struct sg_ring *sg, unsigned int *num)
+{
+   struct sg_ring *i;
+
+   for (i = sg_ring_prev(sg); i->num == 0; i = sg_ring_prev(i)) {
+   /* sg_ring must not be entirely empty. */
+   BUG_ON(i == sg);
+   }
+   *num = i->num - 1;
+   return i;
+}
+
+static void add_padding_to_tail(struct sg_ring *sg, unsigned int padding)
+{
+   struct sg_ring *tail = sg_ring_prev(sg);
+
+   /* It's possible that the entire sg element got moved to padding. */
+   if (tail->num == 0) {
+   BUG_ON(tail->max == 0);
+   tail->num++;
+   }
+   tail->sg[tail->num - 1].length += padding;
+}
+
 /**
  * ata_sg_clean - Unmap DMA memory associated with command
  * @qc: Command containing DMA memory to be released
@@ -4450,37 +4478,39 @@ void ata_sg_clean(struct ata_queued_cmd 
 void ata_sg_clean(struct ata_queued_cmd *qc)
 {
struct ata_port *ap = qc->ap;
-   struct scatterl

[PATCH 6/7] sg_ring: libata simplification

2007-12-18 Thread Rusty Russell
[This patch has been obsoleted: Tejun has a more complete one, but
it's not in mainline yet, so this serves to make the next patch
apply].

libata separates the single buffer case from the scatterlist case
internally.  It's not clear that this is necessary.

Remove the ATA_QCFLAG_SINGLE flag, and buf_virt pointer, and always
initialize qc->nbytes in ata_sg_init().

It's possible that the ATA_QCFLAG_SG and ATA_QCFLAG_DMAMAP flags could
be entirely removed, and we could use whether qc->__sg is NULL or not.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 8b1075c7ad47 drivers/ata/libata-core.c
--- a/drivers/ata/libata-core.c Tue Nov 13 21:00:47 2007 +1100
+++ b/drivers/ata/libata-core.c Wed Nov 14 15:31:07 2007 +1100
@@ -1648,16 +1648,8 @@ unsigned ata_exec_internal_sg(struct ata
memcpy(qc->cdb, cdb, ATAPI_CDB_LEN);
qc->flags |= ATA_QCFLAG_RESULT_TF;
qc->dma_dir = dma_dir;
-   if (dma_dir != DMA_NONE) {
-   unsigned int i, buflen = 0;
-   struct scatterlist *sg;
-
-   for_each_sg(sgl, sg, n_elem, i)
-   buflen += sg->length;
-
+   if (dma_dir != DMA_NONE)
ata_sg_init(qc, sgl, n_elem);
-   qc->nbytes = buflen;
-   }
 
qc->private_data = &wait;
qc->complete_fn = ata_qc_complete_internal;
@@ -4543,9 +4535,6 @@ void ata_sg_clean(struct ata_queued_cmd 
WARN_ON(!(qc->flags & ATA_QCFLAG_DMAMAP));
WARN_ON(sg == NULL);
 
-   if (qc->flags & ATA_QCFLAG_SINGLE)
-   WARN_ON(qc->n_elem > 1);
-
VPRINTK("unmapping %u sg elements\n", qc->n_elem);
 
/* if we padded the buffer out to 32-bit bound, and data
@@ -4566,16 +4555,6 @@ void ata_sg_clean(struct ata_queued_cmd 
memcpy(addr + psg->offset, pad_buf, qc->pad_len);
kunmap_atomic(addr, KM_IRQ0);
}
-   } else {
-   if (qc->n_elem)
-   dma_unmap_single(ap->dev,
-   sg_dma_address(&sg[0]), sg_dma_len(&sg[0]),
-   dir);
-   /* restore sg */
-   sg->length += qc->pad_len;
-   if (pad_buf)
-   memcpy(qc->buf_virt + sg->length - qc->pad_len,
-  pad_buf, qc->pad_len);
}
 
qc->flags &= ~ATA_QCFLAG_DMAMAP;
@@ -4807,16 +4786,8 @@ void ata_noop_qc_prep(struct ata_queued_
 
 void ata_sg_init_one(struct ata_queued_cmd *qc, void *buf, unsigned int buflen)
 {
-   qc->flags |= ATA_QCFLAG_SINGLE;
-
-   qc->__sg = &qc->sgent;
-   qc->n_elem = 1;
-   qc->orig_n_elem = 1;
-   qc->buf_virt = buf;
-   qc->nbytes = buflen;
-   qc->cursg = qc->__sg;
-
sg_init_one(&qc->sgent, buf, buflen);
+   ata_sg_init(qc, &qc->sgent, 1);
 }
 
 /**
@@ -4841,75 +4813,13 @@ void ata_sg_init(struct ata_queued_cmd *
qc->n_elem = n_elem;
qc->orig_n_elem = n_elem;
qc->cursg = qc->__sg;
-}
-
-/**
- * ata_sg_setup_one - DMA-map the memory buffer associated with a command.
- * @qc: Command with memory buffer to be mapped.
- *
- * DMA-map the memory buffer associated with queued_cmd @qc.
- *
- * LOCKING:
- * spin_lock_irqsave(host lock)
- *
- * RETURNS:
- * Zero on success, negative on error.
- */
-
-static int ata_sg_setup_one(struct ata_queued_cmd *qc)
-{
-   struct ata_port *ap = qc->ap;
-   int dir = qc->dma_dir;
-   struct scatterlist *sg = qc->__sg;
-   dma_addr_t dma_address;
-   int trim_sg = 0;
-
-   /* we must lengthen transfers to end on a 32-bit boundary */
-   qc->pad_len = sg->length & 3;
-   if (qc->pad_len) {
-   void *pad_buf = ap->pad + (qc->tag * ATA_DMA_PAD_SZ);
-   struct scatterlist *psg = &qc->pad_sgent;
-
-   WARN_ON(qc->dev->class != ATA_DEV_ATAPI);
-
-   memset(pad_buf, 0, ATA_DMA_PAD_SZ);
-
-   if (qc->tf.flags & ATA_TFLAG_WRITE)
-   memcpy(pad_buf, qc->buf_virt + sg->length - qc->pad_len,
-  qc->pad_len);
-
-   sg_dma_address(psg) = ap->pad_dma + (qc->tag * ATA_DMA_PAD_SZ);
-   sg_dma_len(psg) = ATA_DMA_PAD_SZ;
-   /* trim sg */
-   sg->length -= qc->pad_len;
-   if (sg->length == 0)
-   trim_sg = 1;
-
-   DPRINTK("padding done, sg->length=%u pad_len=%u\n",
-   sg->length, qc->pad_len);
-   }
-
-   if (trim_sg) {
-   qc->n_elem--;
-   goto skip_map;
-   }
-
-   dma_address = dma_map_si

[PATCH 5/7] sg_ring: Convert core scsi code to sg_ring.

2007-12-18 Thread Rusty Russell
If nothing else, the simplification of this logic shows why I prefer
sg_ring over scatterlist chaining.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff --git a/drivers/block/cciss_scsi.c b/drivers/block/cciss_scsi.c
--- a/drivers/block/cciss_scsi.c
+++ b/drivers/block/cciss_scsi.c
@@ -1191,28 +1191,28 @@ cciss_scatter_gather(struct pci_dev *pde
struct scsi_cmnd *cmd)
 {
unsigned int len;
-   struct scatterlist *sg;
+   struct sg_ring *sg;
__u64 addr64;
-   int use_sg, i;
+   int count = 0, i;
 
-   BUG_ON(scsi_sg_count(cmd) > MAXSGENTRIES);
-
-   use_sg = scsi_dma_map(cmd);
-   if (use_sg) {   /* not too many addrs? */
-   scsi_for_each_sg(cmd, sg, use_sg, i) {
-   addr64 = (__u64) sg_dma_address(sg);
-   len  = sg_dma_len(sg);
+   if (scsi_dma_map(cmd) >= 0) {   /* not too many addrs? */
+   sg_ring_for_each(scsi_sgring(cmd), sg, i) {
+   addr64 = (__u64) sg_dma_address(&sg->sg[i]);
+   len  = sg_dma_len(&sg->sg[i]);
cp->SG[i].Addr.lower =
(__u32) (addr64 & (__u64) 0x);
cp->SG[i].Addr.upper =
(__u32) ((addr64 >> 32) & (__u64) 
0x);
cp->SG[i].Len = len;
cp->SG[i].Ext = 0;  // we are not chaining
+   count++;
}
}
 
-   cp->Header.SGList = (__u8) use_sg;   /* no. SGs contig in this cmd */
-   cp->Header.SGTotal = (__u16) use_sg; /* total sgs in this cmd list */
+   cp->Header.SGList = (__u8) count;   /* no. SGs contig in this cmd */
+   cp->Header.SGTotal = (__u16) count; /* total sgs in this cmd list */
+
+   BUG_ON(count > MAXSGENTRIES);
return;
 }
 
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -619,18 +619,17 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
ses->data_direction = scmd->sc_data_direction;
ses->bufflen = scmd->request_bufflen;
ses->buffer = scmd->request_buffer;
-   ses->use_sg = scmd->use_sg;
ses->resid = scmd->resid;
ses->result = scmd->result;
 
if (sense_bytes) {
scmd->request_bufflen = min_t(unsigned,
   sizeof(scmd->sense_buffer), sense_bytes);
-   sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
-  scmd->request_bufflen);
-   scmd->request_buffer = &ses->sense_sgl;
+
+   sg_ring_single(&ses->sense_sg.ring, scmd->sense_buffer,
+  scmd->request_bufflen);
+   scmd->request_buffer = &ses->sense_sg;
scmd->sc_data_direction = DMA_FROM_DEVICE;
-   scmd->use_sg = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->request_bufflen;
@@ -639,7 +638,6 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
scmd->request_buffer = NULL;
scmd->request_bufflen = 0;
scmd->sc_data_direction = DMA_NONE;
-   scmd->use_sg = 0;
if (cmnd) {
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
memcpy(scmd->cmnd, cmnd, cmnd_size);
@@ -678,7 +676,6 @@ void scsi_eh_restore_cmnd(struct scsi_cm
scmd->sc_data_direction = ses->data_direction;
scmd->request_bufflen = ses->bufflen;
scmd->request_buffer = ses->buffer;
-   scmd->use_sg = ses->use_sg;
scmd->resid = ses->resid;
scmd->result = ses->result;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -737,21 +737,31 @@ static inline unsigned int scsi_sgtable_
return index;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static void free_sgring(struct sg_ring *sg)
 {
struct scsi_host_sg_pool *sgp;
-   struct scatterlist *sgl, *prev, *ret;
+
+   sgp = scsi_sg_pools + scsi_sgtable_index(sg->max);
+
+   mempool_free(sg, sgp->pool);
+}
+
+struct sg_ring *scsi_alloc_sgring(struct scsi_cmnd *cmd, unsigned int num,
+ gfp_t gfp_mask)
+{
+   struct scsi_host_sg_pool *sgp;
+   struct sg_ring *sg, *ret;
unsigned int index;
int this, left;
 

[PATCH 4/7] sg_ring: dma_map_sg_ring() helper

2007-12-18 Thread Rusty Russell
Obvious counterpart to dma_map_sg.  Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/base/dma-mapping.c  |   13 +
 include/linux/dma-mapping.h |4 
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 
 /*
  * Managed DMA API
@@ -162,6 +163,59 @@ void dmam_free_noncoherent(struct device
 }
 EXPORT_SYMBOL(dmam_free_noncoherent);
 
+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails.  It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+   unsigned int num;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   num = dma_map_sg(dev, i->sg, i->num, direction);
+   if (num == 0 && i->num != 0)
+   goto unmap;
+   }
+   return 0;
+
+unmap:
+   while (sg) {
+   dma_unmap_sg(dev, sg->sg, sg->num, direction);
+   sg = sg_ring_next(sg, i);
+   }
+   return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction)
+{
+   struct sg_ring *i;
+
+   for (i = sg; i; i = sg_ring_next(i, sg)) {
+   BUG_ON(i->num > i->max);
+   dma_unmap_sg(dev, i->sg, i->num, direction);
+   }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+
 #ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
 
 static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,12 @@ dma_mark_declared_memory_occupied(struct
 }
 #endif
 
+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+  enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+
 /*
  * Managed DMA API
  */
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] sg_ring: blk_rq_map_sg_ring as a counterpart to blk_rq_map_sg.

2007-12-18 Thread Rusty Russell
blk_rq_map_sg_ring as a counterpart to blk_rq_map_sg.

Obvious counterpart to blk_rq_map_sg.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   55 
 include/linux/blkdev.h |1 +
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * for max sense size
@@ -1364,6 +1365,68 @@ new_segment:
 
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+/**
+ * blk_rq_map_sg_ring - map a request to a scatterlist ring.
+ * @q: the request queue this request applies to.
+ * @rq: the request to map
+ * @sg: the sg_ring to populate.
+ *
+ * There must be enough elements in the sg_ring(s) to map the request.
+ */
+void blk_rq_map_sg_ring(struct request_queue *q, struct request *rq,
+   struct sg_ring *sg)
+{
+   struct bio_vec *bvec, *bvprv;
+   struct req_iterator iter;
+   int i, cluster;
+   struct sg_ring *head = sg;
+   struct scatterlist *sgprv;
+
+   i = 0;
+   cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+
+   /*
+* for each bio in rq
+*/
+   bvprv = NULL;
+   sgprv = NULL;
+   rq_for_each_segment(bvec, rq, iter) {
+   int nbytes = bvec->bv_len;
+
+   if (bvprv && cluster) {
+   if (sgprv->length + nbytes > q->max_segment_size)
+   goto new_segment;
+
+   if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+   goto new_segment;
+   if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+   goto new_segment;
+
+   sgprv->length += nbytes;
+   } else {
+new_segment:
+   sg_set_page(sg->sg + i, bvec->bv_page, nbytes,
+   bvec->bv_offset);
+   sgprv = sg->sg + i;
+   if (++i == sg->max) {
+   sg->num = i;
+   sg = sg_ring_next(sg, head);
+   i = 0;
+   }
+   }
+   bvprv = bvec;
+   } /* segments in rq */
+
+   /* If we were still working on an sg_ring, set the number and
+* clear any following sg_rings. */
+   if (sg) {
+   sg->num = i;
+   for (sg = sg_ring_next(sg,head); sg; sg = sg_ring_next(sg,head))
+   sg->num = 0;
+   }
+}
+EXPORT_SYMBOL(blk_rq_map_sg_ring);
+
 /*
  * the standard queue merge functions, can be overridden with device
  * specific ones if so desired
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -777,6 +777,8 @@ extern void blk_ordered_complete_seq(str
 extern void blk_ordered_complete_seq(struct request_queue *, unsigned, int);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+struct sg_ring;
+extern void blk_rq_map_sg_ring(struct request_queue *, struct request *, 
struct sg_ring *);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern void generic_unplug_device(struct request_queue *);
 extern void __generic_unplug_device(struct request_queue *);
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] sg_ring: use in virtio.

2007-12-18 Thread Rusty Russell
Using sg_rings, we can join together scatterlists returned by other
subsystems, without needing to allocate an extra element for chaining.
This helps the virtio_blk device which wants to prepend and append
metadata to the request's scatterlist.

As an added bonus, the old virtio layer used to pass a scatterlist
array and two numbers indicating the number of readable and writable
elements respectively; now we can simply hand two sg_rings which is
much clearer (each sg_ring contains its own length).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/block/virtio_blk.c|   33 ---
 drivers/char/virtio_console.c |   13 +++---
 drivers/net/virtio_net.c  |   22 +-
 drivers/virtio/virtio_ring.c  |   90 ++--
 include/linux/virtio.h|   12 ++---
 5 files changed, 99 insertions(+), 71 deletions(-)

diff -r 6fe7cf582293 drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.cWed Dec 19 16:06:43 2007 +1100
+++ b/drivers/block/virtio_blk.cWed Dec 19 18:29:58 2007 +1100
@@ -5,8 +5,6 @@
 #include 
 #include 
 #include 
-
-#define VIRTIO_MAX_SG  (3+MAX_PHYS_SEGMENTS)
 
 static unsigned char virtblk_index = 'a';
 struct virtio_blk
@@ -24,8 +22,10 @@ struct virtio_blk
 
mempool_t *pool;
 
-   /* Scatterlist: can be too big for stack. */
-   struct scatterlist sg[VIRTIO_MAX_SG];
+   /* Scatterlist ring: can be too big for stack. */
+   DECLARE_SG_RING(out, 1);
+   DECLARE_SG_RING(in, 1);
+   DECLARE_SG_RING(sg, MAX_PHYS_SEGMENTS);
 };
 
 struct virtblk_req
@@ -70,8 +70,8 @@ static bool do_req(struct request_queue 
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
   struct request *req)
 {
-   unsigned long num, out, in;
struct virtblk_req *vbr;
+   struct sg_ring *in;
 
vbr = mempool_alloc(vblk->pool, GFP_ATOMIC);
if (!vbr)
@@ -95,23 +95,24 @@ static bool do_req(struct request_queue 
if (blk_barrier_rq(vbr->req))
vbr->out_hdr.type |= VIRTIO_BLK_T_BARRIER;
 
-   /* This init could be done at vblk creation time */
-   sg_init_table(vblk->sg, VIRTIO_MAX_SG);
-   sg_set_buf(&vblk->sg[0], &vbr->out_hdr, sizeof(vbr->out_hdr));
-   num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
-   sg_set_buf(&vblk->sg[num+1], &vbr->in_hdr, sizeof(vbr->in_hdr));
+   sg_ring_single(&vblk->out.ring, &vbr->out_hdr, sizeof(vbr->out_hdr));
+   sg_ring_init(&vblk->sg.ring, ARRAY_SIZE(vblk->sg.sg));
+   vblk->sg.ring.num = blk_rq_map_sg(q, vbr->req, vblk->sg.sg);
+   sg_ring_single(&vblk->in.ring, &vbr->in_hdr, sizeof(vbr->in_hdr));
 
if (rq_data_dir(vbr->req) == WRITE) {
vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
-   out = 1 + num;
-   in = 1;
+   /* Chain write request onto output buffers. */
+   list_add_tail(&vblk->sg.ring.list, &vblk->out.ring.list);
+   in = &vblk->in.ring;
} else {
vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
-   out = 1;
-   in = 1 + num;
+   /* Chain input (status) buffer at end of read buffers. */
+   list_add_tail(&vblk->in.ring.list, &vblk->sg.ring.list);
+   in = &vblk->sg.ring;
}
 
-   if (vblk->vq->vq_ops->add_buf(vblk->vq, vblk->sg, out, in, vbr)) {
+   if (vblk->vq->vq_ops->add_buf(vblk->vq, &vblk->out.ring, in, vbr)) {
mempool_free(vbr, vblk->pool);
return false;
}
@@ -128,7 +129,7 @@ static void do_virtblk_request(struct re
 
while ((req = elv_next_request(q)) != NULL) {
vblk = req->rq_disk->private_data;
-   BUG_ON(req->nr_phys_segments > ARRAY_SIZE(vblk->sg));
+   BUG_ON(req->nr_phys_segments > ARRAY_SIZE(vblk->sg.sg));
 
/* If this request fails, stop queue and wait for something to
   finish to restart it. */
diff -r 6fe7cf582293 drivers/char/virtio_console.c
--- a/drivers/char/virtio_console.c Wed Dec 19 16:06:43 2007 +1100
+++ b/drivers/char/virtio_console.c Wed Dec 19 18:29:58 2007 +1100
@@ -54,15 +54,15 @@ static struct hv_ops virtio_cons;
  * immediately (lguest's Launcher does). */
 static int put_chars(u32 vtermno, const char *buf, int count)
 {
-   struct scatterlist sg[1];
+   DECLARE_SG_RING(sg, 1);
unsigned int len;
 
/* This is a convenient routine to initialize a single-elem sg list */
-   sg_init_one(sg, buf, count);
+   sg_ring_single(&sg.ring, buf, count);
 
/* add_buf wants a token to identify this buffer: we hand it any
 * non-NULL pointer, since there&#

[PATCH 1/7] sg_ring: introduce sg_ring: a ring of scatterlist arrays.

2007-12-18 Thread Rusty Russell
This patch introduces 'struct sg_ring', a layer on top of scatterlist
arrays.  It meshes nicely with routines which expect a simple array of
'struct scatterlist' because it is easy to break down the ring into
its constituent arrays.

The sg_ring header also encodes the maximum number of entries, useful
for routines which populate an sg.  We need never hand around a number
of elements any more.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 include/linux/sg_ring.h |   74 
 1 files changed, 74 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/sgring.h

diff --git a/include/linux/sg_ring.h b/include/linux/sg_ring.h
new file mode 100644
--- /dev/null
+++ b/include/linux/sg_ring.h
@@ -0,0 +1,107 @@
+#ifndef _LINUX_SG_RING_H
+#define _LINUX_SG_RING_H
+#include 
+
+/**
+ * struct sg_ring - a ring of scatterlists
+ * @list: the list_head chaining them together
+ * @num: the number of valid sg entries
+ * @max: the maximum number of sg entries (size of the sg array).
+ * @sg: the array of scatterlist entries.
+ *
+ * This provides a convenient encapsulation of one or more scatter gather
+ * arrays.
+ */
+struct sg_ring
+{
+   struct list_head list;
+   unsigned int num, max;
+   struct scatterlist sg[0];
+};
+
+/* This helper declares an sg ring on the stack or in a struct. */
+#define DECLARE_SG_RING(name, max) \
+   struct {\
+   struct sg_ring ring;\
+   struct scatterlist sg[max]; \
+   } name
+
+/**
+ * sg_ring_init - initialize a scatterlist ring.
+ * @sg: the sg_ring.
+ * @max: the size of the trailing sg array.
+ *
+ * After initialization sg is alone in the ring.
+ */
+static inline void sg_ring_init(struct sg_ring *sg, unsigned int max)
+{
+#ifdef CONFIG_DEBUG_SG
+   unsigned int i;
+   for (i = 0; i < max; i++)
+   sg->sg[i].sg_magic = SG_MAGIC;
+#endif
+   INIT_LIST_HEAD(&sg->list);
+   sg->max = max;
+   /* FIXME: This is to clear the page bits. */
+   sg_init_table(sg->sg, sg->max);
+}
+
+/**
+ * sg_ring_single - initialize a one-element scatterlist ring.
+ * @sg: the sg_ring.
+ * @buf: the pointer to the buffer.
+ * @buflen: the length of the buffer.
+ *
+ * Does sg_ring_init and also sets up first (and only) sg element.
+ */
+static inline void sg_ring_single(struct sg_ring *sg,
+ const void *buf,
+ unsigned int buflen)
+{
+   sg_ring_init(sg, 1);
+   sg->num = 1;
+   sg_init_one(&sg->sg[0], buf, buflen);
+}
+
+/**
+ * sg_ring_next - next array in a scatterlist ring.
+ * @sg: the sg_ring.
+ * @head: the sg_ring head.
+ *
+ * This will return NULL once @sg has looped back around to @head.
+ */
+static inline struct sg_ring *sg_ring_next(struct sg_ring *sg,
+  const struct sg_ring *head)
+{
+   sg = list_first_entry(&sg->list, struct sg_ring, list);
+   if (sg == head)
+   sg = NULL;
+   return sg;
+}
+
+/* Helper for writing for loops */
+static inline struct sg_ring *sg_ring_iter(const struct sg_ring *head,
+  struct sg_ring *sg, unsigned int *i)
+{
+   (*i)++;
+   /* While loop lets us skip any zero-entry sg_ring arrays */
+   while (*i == sg->num) {
+   *i = 0;
+   sg = sg_ring_next(sg, head);
+   if (!sg)
+   break;
+   }
+   return sg;
+}
+
+/**
+ * sg_ring_for_each - iterate through an entire sg_ring ring
+ * @head: the head of the sg_ring.
+ * @sg: the sg_ring iterator.
+ * @i: an (unsigned) integer which refers to sg->sg[i].
+ *
+ * The current scatterlist element is sg->sg[i].
+ */
+#define sg_ring_for_each(head, sg, i) \
+   for (sg = head, i = 0; sg; sg = sg_ring_iter(head, sg, &i))
+#endif /* _LINUX_SG_RING_H */
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] sg_ring: a ring of scatterlist arrays

2007-12-18 Thread Rusty Russell
This patch series is the start of my attempt to simplify and make explicit
the chained scatterlist logic.

It's not complete: my SATA box boots and seems happy, but all the other
users of SCSI need to be updated and checked.  But I've gotten far enough
to believe it's worth persuing.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]scsi: BUG_ON() impossible condition.

2007-11-18 Thread Rusty Russell
If blk_rq_map_sg wrote more than was allocated in the scatterlist,
BUG_ON() is probably the right thing to do.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_tgt_lib.c |   11 +++
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1102,7 +1102,6 @@ void scsi_io_completion(struct scsi_cmnd
  *
  * Returns: 0 on success
  * BLKPREP_DEFER if the failure is retryable
- * BLKPREP_KILL if the failure is fatal
  */
 static int scsi_init_io(struct scsi_cmnd *cmd)
 {
@@ -1136,17 +1135,9 @@ static int scsi_init_io(struct scsi_cmnd
 * each segment.
 */
count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-   if (likely(count <= cmd->use_sg)) {
-   cmd->use_sg = count;
-   return BLKPREP_OK;
-   }
-
-   printk(KERN_ERR "Incorrect number of segments after building list\n");
-   printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
-   printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
-   req->current_nr_sectors);
-
-   return BLKPREP_KILL;
+   BUG_ON(count > cmd->use_sg);
+   cmd->use_sg = count;
+   return BLKPREP_OK;
 }
 
 static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev,
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -367,14 +367,9 @@ static int scsi_tgt_init_cmd(struct scsi
 
dprintk("cmd %p cnt %d %lu\n", cmd, cmd->use_sg, rq_data_dir(rq));
count = blk_rq_map_sg(rq->q, rq, cmd->request_buffer);
-   if (likely(count <= cmd->use_sg)) {
-   cmd->use_sg = count;
-   return 0;
-   }
-
-   eprintk("cmd %p cnt %d\n", cmd, cmd->use_sg);
-   scsi_free_sgtable(cmd);
-   return -EINVAL;
+   BUG_ON(count > cmd->use_sg);
+   cmd->use_sg = count;
+   return 0;
 }
 
 /* TODO: test this crap and replace bio_map_user with new interface maybe */
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html