Re: [PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-18 Thread Konrad Rzeszutek Wilk
On Tue, Apr 18, 2017 at 09:42:20AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 18/04/17 08:27 AM, Konrad Rzeszutek Wilk wrote:
> > Interesting that you didn't CC any of the maintainers. Could you 
> > do that in the future please?
> 
> Please read the cover letter. The distribution list for the patchset
> would have been way too large to cc every maintainer (even as limited as
> it was, I had mailing lists yelling at me). My plan was to get buy in

I am not sure if you know, but you can add on each patch the respective
maintainer via 'CC'. That way you can have certain maintainers CCed only
on the subsystems they cover. You put it after (or before) your SoB and
git send-email happilly picks it up.

It does mean that for every patch you have to run something like this:

$ more add_cc 
#!/bin/bash

git diff HEAD^.. > /tmp/a
echo "---"
scripts/get_maintainer.pl --no-l /tmp/a | while read file
do
echo "Cc: $file"
done

Or such.


> for the first patch, get it merged and resend the rest independently to
> their respective maintainers. Of course, though, I'd be open to other
> suggestions.
> 
> >>>
> >>> Signed-off-by: Logan Gunthorpe 
> >>> ---
> >>>  drivers/block/xen-blkfront.c | 33 +++--
> >>>  1 file changed, 27 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >>> index 5067a0a..7dcf41d 100644
> >>> --- a/drivers/block/xen-blkfront.c
> >>> +++ b/drivers/block/xen-blkfront.c
> >>> @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, 
> >>> struct blkfront_ring_info *ri
> >>>   BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> >>>
> >>>   if (setup.need_copy) {
> >>> - setup.bvec_off = sg->offset;
> >>> - setup.bvec_data = kmap_atomic(sg_page(sg));
> >>> + setup.bvec_off = 0;
> >>> + setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
> >>> + if (IS_ERR(setup.bvec_data)) {
> >>> + /*
> >>> +  * This should really never happen unless
> >>> +  * the code is changed to use memory that is
> >>> +  * not mappable in the sg. Seeing there is a
> >>> +  * questionable error path out of here,
> >>> +  * we WARN.
> >>> +  */
> >>> + WARN(1, "Non-mappable memory used in sg!");
> >>> + return 1;
> >>> + }
> >> ...
> >>
> >> Perhaps add a flag to mark failure as 'unexpected' and trace (and panic?)
> >> inside sg_map().
> 
> Thanks, that's a good suggestion. I'll make the change for v2.
> 
> Logan


Re: [PATCH 17/25] blk-mq: remove the error argument to blk_mq_complete_request

2017-04-18 Thread Konrad Rzeszutek Wilk
On Tue, Apr 18, 2017 at 04:03:21PM +0100, Roger Pau Monné wrote:
> On Thu, Apr 06, 2017 at 05:39:36PM +0200, Christoph Hellwig wrote:
> > Now that we always have a ->complete callback we can remove the direct
> > call to blk_mq_end_request, as well as the error argument to
> > blk_mq_complete_request.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> For blkfront:
> 
> Acked-by: Roger Pau Monné 
> 

And also
Reviewed-by: Konrad Rzeszutek Wilk 


Re: [PATCH 16/25] xen-blkfront: don't use req->errors

2017-04-18 Thread Konrad Rzeszutek Wilk
On Tue, Apr 18, 2017 at 04:00:51PM +0100, Roger Pau Monné wrote:
> On Thu, Apr 06, 2017 at 05:39:35PM +0200, Christoph Hellwig wrote:
> > xen-blkfron is the last users using rq->errros for passing back error to
> > blk-mq, and I'd like to get rid of that.  In the longer run the driver
> > should be moving more of the completion processing into .complete, but
> > this is the minimal change to move forward for now.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Acked-by: Roger Pau Monné 
> 

Reviewed-by: Konrad Rzeszutek Wilk 

Thanks!


Re: kill req->errors

2017-04-18 Thread Konrad Rzeszutek Wilk
On Fri, Apr 07, 2017 at 09:11:24AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 06, 2017 at 04:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> > You wouldn't have a git tree to easily test it? Thanks.
> 
> git://git.infradead.org/users/hch/block.git request-errors
> 
> Gitweb:
> 
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/request-errors

I tested it with Xen and it all worked out. Albeit in hindsight I didn't check
the ones that matter - that is the error paths (duh!).

So let me do that now.


Re: [PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-18 Thread Konrad Rzeszutek Wilk
On Tue, Apr 18, 2017 at 02:13:59PM +, David Laight wrote:
> From: Logan Gunthorpe
> > Sent: 13 April 2017 23:05
> > Straightforward conversion to the new helper, except due to
> > the lack of error path, we have to warn if unmapable memory
> > is ever present in the sgl.

Interesting that you didn't CC any of the maintainers. Could you 
do that in the future please?

> > 
> > Signed-off-by: Logan Gunthorpe 
> > ---
> >  drivers/block/xen-blkfront.c | 33 +++--
> >  1 file changed, 27 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 5067a0a..7dcf41d 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, 
> > struct blkfront_ring_info *ri
> > BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> > 
> > if (setup.need_copy) {
> > -   setup.bvec_off = sg->offset;
> > -   setup.bvec_data = kmap_atomic(sg_page(sg));
> > +   setup.bvec_off = 0;
> > +   setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
> > +   if (IS_ERR(setup.bvec_data)) {
> > +   /*
> > +* This should really never happen unless
> > +* the code is changed to use memory that is
> > +* not mappable in the sg. Seeing there is a
> > +* questionable error path out of here,
> > +* we WARN.
> > +*/
> > +   WARN(1, "Non-mappable memory used in sg!");
> > +   return 1;
> > +   }
> ...
> 
> Perhaps add a flag to mark failure as 'unexpected' and trace (and panic?)
> inside sg_map().
> 
>   David
> 
> 
> ___
> Linux-nvdimm mailing list
> linux-nvd...@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: kill req->errors

2017-04-06 Thread Konrad Rzeszutek Wilk
On Thu, Apr 06, 2017 at 05:39:19PM +0200, Christoph Hellwig wrote:
> Currently the request structure has an errors field that is used in
> various different ways.  The oldest drivers use it as an error count,
> blk-mq and the generic timeout code assume that it holds a Linux
> errno for block completions, and various drivers use it for internal
> status values, often overwriting them with Linux errnos later,
> that is unless they are processing passthrough requests in which
> case they'll leave their errors in it.
> 
> This series kills the ->errors field and replaced it with new fields
> in the drivers (or an odd hack in a union in struct request for
> two bitrotting old drivers).
> 
> Note that this series expects that the patch to remove the mg_disk
> driver has been applied already on top of the block for-next tree.

You wouldn't have a git tree to easily test it? Thanks.


Re: [PATCH] ibft: Expose iBFT acpi header via sysfs

2016-04-01 Thread Konrad Rzeszutek Wilk
On Wed, Mar 23, 2016 at 09:49:26PM -0400, David Bond wrote:
> 
> Some ethernet adapter vendors are supplying products which support optional
> (payed license) features. On some adapters this includes a hardware iscsi
> initiator.  The same adapters in a normal (no extra licenses) mode of
> operation can be used as a software iscsi initiator.  In addition, software
> iscsi boot initiators are becoming a standard part of many vendors uefi
> implementations.  This is creating difficulties during early boot/install
> determining the proper configuration method for these adapters when they
> are used as a boot device.
> 
> The attached patch creates sysfs entries to expose information from the
> acpi header of the ibft table.  This information allows for a method to
> easily determining if an ibft table was created by a ethernet card's
> firmware or the system uefi/bios.  In the case of a hardware initiator this
> information in combination with the pci vendor and device id can be used
> to ascertain any vendor specific behaviors that need to be accommodated.

Heya!

Is there an patch in iscsiadm for this as well? Or is this 
> 
> Reviewed-by: Lee Duncan 
> Signed-off-by: David Bond 
> ---
>  Documentation/ABI/testing/sysfs-ibft | 10 ++
>  drivers/firmware/iscsi_ibft.c| 64 
> +++-
>  drivers/scsi/iscsi_boot_sysfs.c  | 62 ++
>  include/linux/iscsi_boot_sysfs.h | 13 
>  4 files changed, 148 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-ibft 
> b/Documentation/ABI/testing/sysfs-ibft
> index cac3930..7d6725f 100644
> --- a/Documentation/ABI/testing/sysfs-ibft
> +++ b/Documentation/ABI/testing/sysfs-ibft
> @@ -21,3 +21,13 @@ Contact:   Konrad Rzeszutek 
>  Description: The /sys/firmware/ibft/ethernetX directory will contain
>   files that expose the iSCSI Boot Firmware Table NIC data.
>   Usually this contains the IP address, MAC, and gateway of the 
> NIC.
> +
> +What:/sys/firmware/ibft/acpi_header
> +Date:March 2016
> +Contact: David Bond 
> +Description: The /sys/firmware/ibft/acpi_header directory will contain files
> + that expose the SIGNATURE, OEM_ID, and OEM_TABLE_ID fields of 
> the
> + acpi table header of the iBFT structure.  This will allow for
> + identification of the creator of the table which is useful in
> + determining quirks associated with some adapters when used in
> + hardware vs software iscsi initiator mode.
> diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
> index 81037e5..2b1900b 100644
> --- a/drivers/firmware/iscsi_ibft.c
> +++ b/drivers/firmware/iscsi_ibft.c
> @@ -418,6 +418,31 @@ static ssize_t ibft_attr_show_target(void *data, int 
> type, char *buf)
>   return str - buf;
>  }
>  
> +static ssize_t ibft_attr_show_acpitbl(void *data, int type, char *buf)
> +{
> + struct ibft_kobject *entry = data;
> + char *str = buf;
> +
> + switch (type) {
> + case ISCSI_BOOT_ACPITBL_SIGNATURE:
> + str += sprintf_string(str, ACPI_NAME_SIZE,
> +   entry->header->header.signature);
> + break;
> + case ISCSI_BOOT_ACPITBL_OEM_ID:
> + str += sprintf_string(str, ACPI_OEM_ID_SIZE,
> +   entry->header->header.oem_id);
> + break;
> + case ISCSI_BOOT_ACPITBL_OEM_TABLE_ID:
> + str += sprintf_string(str, ACPI_OEM_TABLE_ID_SIZE,
> +   entry->header->header.oem_table_id);
> + break;
> + default:
> + break;
> + }
> +
> + return str - buf;
> +}
> +
>  static int __init ibft_check_device(void)
>  {
>   int len;
> @@ -576,6 +601,24 @@ static umode_t __init ibft_check_initiator_for(void 
> *data, int type)
>   return rc;
>  }
>  
> +static umode_t __init ibft_check_acpitbl_for(void *data, int type)
> +{
> +
> + umode_t rc = 0;
> +
> + switch (type) {
> + case ISCSI_BOOT_ACPITBL_SIGNATURE:
> + case ISCSI_BOOT_ACPITBL_OEM_ID:
> + case ISCSI_BOOT_ACPITBL_OEM_TABLE_ID:
> + rc = S_IRUGO;
> + break;
> + default:
> + break;
> + }
> +
> + return rc;
> +}
> +
>  static void ibft_kobj_release(void *data)
>  {
>   kfree(data);
> @@ -699,6 +742,8 @@ free_ibft_obj:
>  static int __init ibft_register_kobjects(struct acpi_table_ibft *header)
>  {
>   struct ibft_control *control = NULL;
> + struct iscsi_boot_kobj *boot_kobj;
> + struct ibft_kobject *ibft_kobj;
>   void *ptr, *end;
>   int rc = 0;
>   u16 offset;
> @@ -727,6 +772,23 @@ static int __init ibft_register_kobjects(struct 
> acpi_table_ibft *header)
>   }
>   }
>  

What about the rc earlier (from the loop)? You end up ignoring it.
Should we have:

if (rc)
return rc;

Re: [PATCHv2] iscsi_ibft: Add prefix-len attr and display netmask

2016-02-29 Thread Konrad Rzeszutek Wilk
On Mon, Feb 29, 2016 at 1:45 PM, Mike Christie  wrote:
> On 02/25/2016 12:15 PM, Lee Duncan wrote:
>> From: Hannes Reinecke 
>>
>> The iBFT table only specifies a prefix length, not a netmask.
>> And the netmask is pretty much pointless for IPv6.
>> So introduce a new attribute 'prefix-len' and display the
>> netmask attribute only for IPv4 addresses.
>
> The code looks ok to me, but I think this ipv4 comment was left over
> from the last posting since it conflicts with the comment below about
> always displaying it. Maybe it can just be edited when the patch is
> merged to avoid having to resend again. If so,
>
> Reviewed-by: Mike Christie 
>

Cool. Let me roll it up and send the git pull to Linus.

This does not seem be super-urgent so it can wait for the upcoming merge window?

>
>>
>> Some older user-space code might rely on the netmask attribute
>> being present, so we should always display it.
>>
>> Changes from v1:
>>  - Combined two patches into one
>>
>> Signed-off-by: Hannes Reinecke 
>> Signed-off-by: Lee Duncan 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xen/xenbus: Use 'void' instead of 'int' for the return of xenbus_switch_state()

2014-09-29 Thread Konrad Rzeszutek Wilk
On Mon, Sep 29, 2014 at 03:17:10PM +0100, David Vrabel wrote:
> On 29/09/14 15:02, Konrad Rzeszutek Wilk wrote:
> > On Sat, Sep 27, 2014 at 12:36:42AM +0800, Chen Gang wrote:
> >> When xenbus_switch_state() fails, it will call xenbus_switch_fatal()
> > 
> > Only on the first depth, not on the subsequent ones (as in if
> > the first xenbus_switch_fail fails, it won't try to call
> > xenbus_switch_state again and again).
> > 
> >> internally, so need not return any status value, then use 'void' instead
> >> of 'int' for xenbus_switch_state() and __xenbus_switch_state().
> > 
> > When that switch occurs (to XenbusStateConnected) won't the watches
> > fire - meaning we MUST make sure that the watch functions - if they
> > use the xenbus_switch_state() they MUST not hold any locks - because
> > they could be executed once more?
> > 
> > Oh wait, we don't have to worry about that right now as the callbacks
> > that pick up the messages from the XenBus are all gated on one mutex
> > anyhow.
> > 
> > Hm, anyhow, I would add this extra piece of information to the patch:
> > 
> > 
> > diff --git a/drivers/xen/xen-pciback/xenbus.c 
> > b/drivers/xen/xen-pciback/xenbus.c
> > index c214daa..f7399fd 100644
> > --- a/drivers/xen/xen-pciback/xenbus.c
> > +++ b/drivers/xen/xen-pciback/xenbus.c
> > @@ -661,6 +661,12 @@ static void xen_pcibk_be_watch(struct xenbus_watch 
> > *watch,
> >  
> > switch (xenbus_read_driver_state(pdev->xdev->nodename)) {
> > case XenbusStateInitWait:
> > +   /*
> > +* xenbus_switch_state can call xenbus_switch_fatal which will
> > +* immediately set the state to XenbusStateClosing which
> > +* means if we were reading for it here we MUST drop any
> > +* locks so that we don't dead-lock.
> > +*/
> 
> Watches are asynchronous and serialised by the xenwatch thread.  I can't
> see what deadlock you're talking about here.  Particularly since the
> backend doesn't watch its own state node (it watches the frontend one).
> 
> > xen_pcibk_setup_backend(pdev);
> > break;
> >  
> >>
> >> Also need be sure that all callers which check the return value must let
> >> 'err' be 0.
> > 
> > I am bit uncomfortable with that, that is due to:
> > 
> > 
> > .. snip..
> >> diff --git a/drivers/net/xen-netback/xenbus.c 
> >> b/drivers/net/xen-netback/xenbus.c
> >> index 9c47b89..b5c3d47 100644
> >> --- a/drivers/net/xen-netback/xenbus.c
> >> +++ b/drivers/net/xen-netback/xenbus.c
> >> @@ -337,10 +337,7 @@ static int netback_probe(struct xenbus_device *dev,
> >>if (err)
> >>pr_debug("Error writing multi-queue-max-queues\n");
> >>  
> >> -  err = xenbus_switch_state(dev, XenbusStateInitWait);
> >> -  if (err)
> >> -  goto fail;
> >> -
> >> +  xenbus_switch_state(dev, XenbusStateInitWait);
> > 
> > Which if it fails it won't call:
> > 
> > 354 fail:   
> > 
> > 355 pr_debug("failed\n");   
> > 
> > 356 netback_remove(dev);
> > 
> > 357 return err; 
> > 
> > 
> > And since there is no watch on the backend state to go in Closing it won't
> > ever call those and we leak memory.
> 
> It's not leaking the memory.  All resources will be recovered when the
> device is removed.

I presume you mean when the XenBus entries are torn down? It does look
like it would call the .remove functionality. That should take care of that.

In which case we can just remove all of the 'netback_remove()' and also
remove some of the labels.


> 
> > The same is for xen-blkback mechanism in the probe function.
> 
> David
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] xen/xenbus: Use 'void' instead of 'int' for the return of xenbus_switch_state()

2014-09-29 Thread Konrad Rzeszutek Wilk
On Fri, Sep 26, 2014 at 07:07:19PM +0100, David Vrabel wrote:
> On 26/09/14 17:36, Chen Gang wrote:
> > When xenbus_switch_state() fails, it will call xenbus_switch_fatal()
> > internally, so need not return any status value, then use 'void' instead
> > of 'int' for xenbus_switch_state() and __xenbus_switch_state().
> > 
> > Also need be sure that all callers which check the return value must let
> > 'err' be 0.
> 
> I've rewritten the commit message as:
> 
> xen/xenbus: don't return errors from xenbus_switch_state()
> 
> Most users of xenbus_switch_state() weren't handling the failure of
> xenbus_switch_state() correctly.  They either called
> xenbus_dev_fatal() (which xenbus_switch_state() has effectively
> already tried to do), or ignored errors.
> 
> xenbus_switch_state() may fail because:
> 
> a) The device is being unplugged by the toolstack. The device will
> shortly be removed and the error does not need to be handled.
> 
> b) Xenstore is broken.  There isn't much the driver can do in this
> case since xenstore is required to signal failure to the toolstack.
> 
> So, don't report any errors from xenbus_switch_state() which removes
> some unnecessary error handling in some of the drivers.
> 
> I'd appreciate a review from some of the other front/backend driver
> maintainers on whether this is sensible reasoning.

Done that but I am not too keen about this patch - as I think it will
cause memory leaks on failure paths.

> 
> David
> 
> > 
> > And also need change the related comments for xenbus_switch_state().
> > 
> > Signed-off-by: Chen Gang 
> > ---
> >  drivers/block/xen-blkback/xenbus.c |  9 ++---
> >  drivers/net/xen-netback/xenbus.c   |  5 +
> >  drivers/pci/xen-pcifront.c | 15 ++-
> >  drivers/xen/xen-pciback/xenbus.c   | 21 -
> >  drivers/xen/xen-scsiback.c |  5 +
> >  drivers/xen/xenbus/xenbus_client.c | 16 
> >  include/xen/xenbus.h   |  3 ++-
> >  7 files changed, 24 insertions(+), 50 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index 3a8b810..fdcc584 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -587,9 +587,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
> > if (err)
> > goto fail;
> >  
> > -   err = xenbus_switch_state(dev, XenbusStateInitWait);
> > -   if (err)
> > -   goto fail;
> > +   xenbus_switch_state(dev, XenbusStateInitWait);
> >  
> > return 0;
> >  
> > @@ -837,10 +835,7 @@ again:
> > if (err)
> > xenbus_dev_fatal(dev, err, "ending transaction");
> >  
> > -   err = xenbus_switch_state(dev, XenbusStateConnected);
> > -   if (err)
> > -   xenbus_dev_fatal(dev, err, "%s: switching to Connected state",
> > -dev->nodename);
> > +   xenbus_switch_state(dev, XenbusStateConnected);
> >  
> > return;
> >   abort:
> > diff --git a/drivers/net/xen-netback/xenbus.c 
> > b/drivers/net/xen-netback/xenbus.c
> > index 9c47b89..b5c3d47 100644
> > --- a/drivers/net/xen-netback/xenbus.c
> > +++ b/drivers/net/xen-netback/xenbus.c
> > @@ -337,10 +337,7 @@ static int netback_probe(struct xenbus_device *dev,
> > if (err)
> > pr_debug("Error writing multi-queue-max-queues\n");
> >  
> > -   err = xenbus_switch_state(dev, XenbusStateInitWait);
> > -   if (err)
> > -   goto fail;
> > -
> > +   xenbus_switch_state(dev, XenbusStateInitWait);
> > be->state = XenbusStateInitWait;
> >  
> > /* This kicks hotplug scripts, so do it immediately. */
> > diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> > index 53df39a..c1d73b7 100644
> > --- a/drivers/pci/xen-pcifront.c
> > +++ b/drivers/pci/xen-pcifront.c
> > @@ -901,18 +901,16 @@ static int pcifront_try_connect(struct 
> > pcifront_device *pdev)
> > }
> > }
> >  
> > -   err = xenbus_switch_state(pdev->xdev, XenbusStateConnected);
> > -
> > +   xenbus_switch_state(pdev->xdev, XenbusStateConnected);
> > +   return 0;
> >  out:
> > return err;
> >  }
> >  
> >  static int pcifront_try_disconnect(struct pcifront_device *pdev)
> >  {
> > -   int err = 0;
> > enum xenbus_state prev_state;
> >  
> > -
> > prev_state = xenbus_read_driver_state(pdev->xdev->nodename);
> >  
> > if (prev_state >= XenbusStateClosing)
> > @@ -923,11 +921,10 @@ static int pcifront_try_disconnect(struct 
> > pcifront_device *pdev)
> > pcifront_disconnect(pdev);
> > }
> >  
> > -   err = xenbus_switch_state(pdev->xdev, XenbusStateClosed);
> > +   xenbus_switch_state(pdev->xdev, XenbusStateClosed);
> >  
> >  out:
> > -
> > -   return err;
> > +   return 0;
> >  }
> >  
> >  static int pcifront_attach_devices(struct pcifront_device *pdev)
> > @@ -1060,8 +1057,8 @@ static int pcifront_detach_devices(struct 
> > pcifront_device *pdev)
> >

Re: [PATCH] xen/xenbus: Use 'void' instead of 'int' for the return of xenbus_switch_state()

2014-09-29 Thread Konrad Rzeszutek Wilk
On Sat, Sep 27, 2014 at 12:36:42AM +0800, Chen Gang wrote:
> When xenbus_switch_state() fails, it will call xenbus_switch_fatal()

Only on the first depth, not on the subsequent ones (as in if
the first xenbus_switch_fail fails, it won't try to call
xenbus_switch_state again and again).

> internally, so need not return any status value, then use 'void' instead
> of 'int' for xenbus_switch_state() and __xenbus_switch_state().

When that switch occurs (to XenbusStateConnected) won't the watches
fire - meaning we MUST make sure that the watch functions - if they
use the xenbus_switch_state() they MUST not hold any locks - because
they could be executed once more?

Oh wait, we don't have to worry about that right now as the callbacks
that pick up the messages from the XenBus are all gated on one mutex
anyhow.

Hm, anyhow, I would add this extra piece of information to the patch:


diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index c214daa..f7399fd 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -661,6 +661,12 @@ static void xen_pcibk_be_watch(struct xenbus_watch *watch,
 
switch (xenbus_read_driver_state(pdev->xdev->nodename)) {
case XenbusStateInitWait:
+   /*
+* xenbus_switch_state can call xenbus_switch_fatal which will
+* immediately set the state to XenbusStateClosing which
+* means if we were reading for it here we MUST drop any
+* locks so that we don't dead-lock.
+*/
xen_pcibk_setup_backend(pdev);
break;
 
> 
> Also need be sure that all callers which check the return value must let
> 'err' be 0.

I am bit uncomfortable with that, that is due to:


.. snip..
> diff --git a/drivers/net/xen-netback/xenbus.c 
> b/drivers/net/xen-netback/xenbus.c
> index 9c47b89..b5c3d47 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -337,10 +337,7 @@ static int netback_probe(struct xenbus_device *dev,
>   if (err)
>   pr_debug("Error writing multi-queue-max-queues\n");
>  
> - err = xenbus_switch_state(dev, XenbusStateInitWait);
> - if (err)
> - goto fail;
> -
> + xenbus_switch_state(dev, XenbusStateInitWait);

Which if it fails it won't call:

354 fail:   

355 pr_debug("failed\n");   

356 netback_remove(dev);

357 return err; 


And since there is no watch on the backend state to go in Closing it won't
ever call those and we leak memory.

The same is for xen-blkback mechanism in the probe function.

>   be->state = XenbusStateInitWait;
>  
>   /* This kicks hotplug scripts, so do it immediately. */
> diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> index 53df39a..c1d73b7 100644
> --- a/drivers/pci/xen-pcifront.c
> +++ b/drivers/pci/xen-pcifront.c
> @@ -901,18 +901,16 @@ static int pcifront_try_connect(struct pcifront_device 
> *pdev)
>   }
>   }
>  
> - err = xenbus_switch_state(pdev->xdev, XenbusStateConnected);
> -
> + xenbus_switch_state(pdev->xdev, XenbusStateConnected);
> + return 0;
>  out:
>   return err;
>  }
>  
>  static int pcifront_try_disconnect(struct pcifront_device *pdev)
>  {
> - int err = 0;
>   enum xenbus_state prev_state;
>  
> -
>   prev_state = xenbus_read_driver_state(pdev->xdev->nodename);
>  
>   if (prev_state >= XenbusStateClosing)
> @@ -923,11 +921,10 @@ static int pcifront_try_disconnect(struct 
> pcifront_device *pdev)
>   pcifront_disconnect(pdev);
>   }
>  
> - err = xenbus_switch_state(pdev->xdev, XenbusStateClosed);
> + xenbus_switch_state(pdev->xdev, XenbusStateClosed);
>  
>  out:
> -
> - return err;
> + return 0;
>  }
>  
>  static int pcifront_attach_devices(struct pcifront_device *pdev)
> @@ -1060,8 +1057,8 @@ static int pcifront_detach_devices(struct 
> pcifront_device *pdev)
>   domain, bus, slot, func);
>   }
>  
> - err = xenbus_switch_state(pdev->xdev, XenbusStateReconfiguring);
> -
> + xenbus_switch_state(pdev->xdev, XenbusStateReconfiguring);
> + return 0;
>  out:
>   return err;
>  }
> diff --git a/drivers/xen/xen-pciback/xenbus.c 
> b/drivers/xen/xen-pciback/xenbus.c
> index c214daa..630a215 100644

The xen-pcifront are OK, this one:

> --- a/drivers/xen/xen-pciback/xenbus.c
> +++ b/drivers/xen/xen-pciback/xenbus.c
> @@ -184,10 +184,7 @@ static int xen_pcibk_attach(struct xen_pcibk_device 
> *pdev)
>  
>   dev_dbg(&pdev->xdev->dev, "Connecting...\n");
>  
> - err = xenbus_switch_state(pdev->xdev, XenbusStateConnected);
> - if (err)
> - xenbus_dev_fatal(pdev->xdev, err,
> -  

Re: [Xen-devel] [PATCH V5 0/5] Add XEN pvSCSI support

2014-08-22 Thread Konrad Rzeszutek Wilk
On Fri, Aug 22, 2014 at 02:21:55PM -0700, Nicholas A. Bellinger wrote:
> Hi Juergen & Co,
> 
> On Mon, 2014-08-18 at 11:31 +0200, jgr...@suse.com wrote:
> > This series adds XEN pvSCSI support. With pvSCSI it is possible to use 
> > physical
> > SCSI devices from a XEN domain.
> > 
> > The support consists of a backend in the privileged Domain-0 doing the real
> > I/O and a frontend in the unprivileged domU passing I/O-requests to the 
> > backend.
> > 
> > The code is taken (and adapted) from the original pvSCSI implementation done
> > for Linux 2.6 in 2008 by Fujitsu.
> > 
> > [PATCH V5 1/5] xen/events: support threaded irqs for interdomain event 
> > channels
> > [PATCH V5 2/5] Add XEN pvSCSI protocol description
> > [PATCH V5 3/5] Introduce xen-scsifront module
> > [PATCH V5 4/5] Introduce XEN scsiback module
> > [PATCH V5 5/5] add xen pvscsi maintainer
> > 
> > Changes in V5:
> > - Added patch to support threaded irqs for interdomain event channels
> > - several changes in xen-scsifront after comments from Christoph Hellwig
> > - several changes in xen-scsiback after comments from Christoph Hellwig
> > - several changes in xen-scsiback after comments from James Bottomley
> > 
> > Changes in V4:
> > - Re-add define for VSCSIIF_ACT_SCSI_SG_PRESET to vscsiif.h to indicate this
> >   action value should not be used in future enhancements
> > 
> > Changes in V3:
> > - added some comments to the protocol header file
> > - removed the CDB emulation from xen-scsiback, handled by core target
> >   infrastructure
> > - several changes in xen-scsifront after comments from Christoph Hellwig
> > 
> > Changes in V2:
> > - use core target infrastructure by backend instead of pure SCSI passthrough
> > - add support for larger SG lists by putting them in grant page(s)
> > - add command abort capability
> > 
> 
> For the XEN scsiback parts as a new target fabric driver, feel free to
> add my:
> 
> Reviewed-by: Nicholas Bellinger 
> 
> So I assume this will be merged for v3.18 via the xen.git tree, yes..?


When I chatted with Christopher Hellwig he recommended it be done that way.

I am almost done with the Xen bits.
> 
> --nab
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH V5 3/5] Introduce xen-scsifront module

2014-08-22 Thread Konrad Rzeszutek Wilk
> --- /dev/null
> +++ b/drivers/scsi/xen-scsifront.c
> @@ -0,0 +1,1017 @@
> +/*
> + * Xen SCSI frontend driver
> + *
> + * Copyright (c) 2008, FUJITSU Limited
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the 
> Software,
> + * and to permit persons to whom the Software is furnished to do so, subject 
> to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#define DEBUG

:-) I think you don't want that in the driver.

> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +
> +#define GRANT_INVALID_REF0
> +
> +#define VSCSIFRONT_OP_ADD_LUN1
> +#define VSCSIFRONT_OP_DEL_LUN2

Shouldn't those be defined in the vscsiff.h file?

> +
> +#define DEFAULT_TASK_COMM_LENTASK_COMM_LEN

Not sure why you need the DEFAULT_ ? Could you use TASK_COMM_LEN?

> +
> +/* tuning point*/

Missing stop and a space after the 't':

/* Tuning point. */

> +#define VSCSIIF_DEFAULT_CMD_PER_LUN 10
> +#define VSCSIIF_MAX_TARGET  64
> +#define VSCSIIF_MAX_LUN 255
> +
> +#define VSCSIIF_RING_SIZE__CONST_RING_SIZE(vscsiif, PAGE_SIZE)
> +#define VSCSIIF_MAX_REQS VSCSIIF_RING_SIZE
> +
> +#define vscsiif_grants_sg(_sg)   (PFN_UP((_sg) * \
> + sizeof(struct scsiif_request_segment)))
> +
> +struct vscsifrnt_shadow {
> + /* command between backend and frontend */
> + unsigned char act;

act? How about 'op' ? Or 'cmd_op'?

> + uint16_t rqid;

rqid? Could you have a comment explaining what that acronym is?
Oh wait - request id? How about just req_id?

> +
> + /* Number of pieces of scatter-gather */
> + unsigned int nr_grants;

s/nr_grants/nr_sg/ ?

> + struct scsiif_request_segment *sg;
> +
> + /* do reset or abort function */

Full stop missing.
> + wait_queue_head_t wq_reset; /* reset work queue   */
> + int wait_reset; /* reset work queue condition */
> + int32_t rslt_reset; /* reset response status  */
> + /* (SUCESS or FAILED) */

Full stop missing. s/SUCESS/SUCCESS/
> +
> + /* requested struct scsi_cmnd is stored from kernel */

Full stop missing.
> + struct scsi_cmnd *sc;
> + int gref[vscsiif_grants_sg(SG_ALL) + SG_ALL];
> +};
> +
> +struct vscsifrnt_info {
> + struct xenbus_device *dev;
> +
> + struct Scsi_Host *host;
> + int host_active;

This 'host_active' seems to be a guard to call 'scsi_host_remove'
through either the disconnect (so backend told us to disconnect)
or the .remove (so XenStore keys are moved). Either way I think
it is possible to run _both_ of them and this being just
an 'int' is not sufficient.

I would recommend you change this to an ref_count.
> +
> + spinlock_t shadow_lock;

Could you move this spinlock below - to where
you have tons of of 'shadow' values please?

> + unsigned int evtchn;
> + unsigned int irq;
> +
> + grant_ref_t ring_ref;
> + struct vscsiif_front_ring ring;
> + struct vscsiif_response ring_res;
> +
> + unsigned long shadow_free;

A comment would help in explainig this. 'shadow_free' sounds 
a bit cryptic. Oh, and xen-blkfront has it as 'shadow_free'
too! Argh, 'shadow_free_bitmask' could be a better name?
Oh, what if we converted to an bitmask type?

Either way - if you think the existing one should be this way
then lets leave it as is. But if you think a better name
is suited I can change in xen-blkfront too.

> 

Re: [Xen-devel] [PATCH V5 2/5] Add XEN pvSCSI protocol description

2014-08-21 Thread Konrad Rzeszutek Wilk
On Wed, Aug 20, 2014 at 04:01:57PM +0200, Juergen Gross wrote:
> On 08/20/2014 03:25 PM, Konrad Rzeszutek Wilk wrote:
> >On Mon, Aug 18, 2014 at 11:31:47AM +0200, jgr...@suse.com wrote:
> >>From: Juergen Gross 
> >>
> >>Add the definition of pvSCSI protocol used between the pvSCSI frontend in a
> >>XEN domU and the pvSCSI backend in a XEN driver domain (usually Dom0).
> >>
> >>This header was originally provided by Fujitsu for XEN based on Linux 
> >>2.6.18.
> >>Changes are:
> >>- added comment
> >>- adapt to Linux style guide
> >>- add support for larger SG-lists by putting them in an own granted page
> >>- remove stale definitions
> >>
> >>Signed-off-by: Juergen Gross 
> >>---
> >>  include/xen/interface/io/vscsiif.h | 214 
> >> +
> >>  1 file changed, 214 insertions(+)
> >>  create mode 100644 include/xen/interface/io/vscsiif.h
> >>
> >>diff --git a/include/xen/interface/io/vscsiif.h 
> >>b/include/xen/interface/io/vscsiif.h
> >>new file mode 100644
> >>index 000..4291889
> >>--- /dev/null
> >>+++ b/include/xen/interface/io/vscsiif.h
> >>@@ -0,0 +1,214 @@
> >>+/**
> >>+ * vscsiif.h
> >>+ *
> >>+ * Based on the blkif.h code.
> >>+ *
> >>+ * This interface is to be regarded as a stable API between XEN domains
> >>+ * running potentially different Linux kernel versions.
> >>+ *
> >>+ * Permission is hereby granted, free of charge, to any person obtaining a 
> >>copy
> >>+ * of this software and associated documentation files (the "Software"), to
> >>+ * deal in the Software without restriction, including without limitation 
> >>the
> >>+ * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> >>and/or
> >>+ * sell copies of the Software, and to permit persons to whom the Software 
> >>is
> >>+ * furnished to do so, subject to the following conditions:
> >>+ *
> >>+ * The above copyright notice and this permission notice shall be included 
> >>in
> >>+ * all copies or substantial portions of the Software.
> >>+ *
> >>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> >>OR
> >>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >>+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> >>THE
> >>+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> >>+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> >>+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> >>+ * DEALINGS IN THE SOFTWARE.
> >>+ *
> >>+ * Copyright(c) FUJITSU Limited 2008.
> >>+ */
> >>+
> >>+#ifndef __XEN__PUBLIC_IO_SCSI_H__
> >>+#define __XEN__PUBLIC_IO_SCSI_H__
> >>+
> >>+#include "ring.h"
> >>+#include "../grant_table.h"
> >>+
> >>+/*
> >>+ * Front->back notifications: When enqueuing a new request, sending a
> >>+ * notification can be made conditional on req_event (i.e., the generic
> >>+ * hold-off mechanism provided by the ring macros). Backends must set
> >>+ * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
> >>+ *
> >>+ * Back->front notifications: When enqueuing a new response, sending a
> >>+ * notification can be made conditional on rsp_event (i.e., the generic
> >>+ * hold-off mechanism provided by the ring macros). Frontends must set
> >>+ * rsp_event appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()).
> >>+ */
> >>+
> >>+/*
> >>+ * Feature and Parameter Negotiation
> >>+ * =
> >>+ * The two halves of a Xen pvSCSI driver utilize nodes within the XenStore 
> >>to
> >>+ * communicate capabilities and to negotiate operating parameters.  This
> >>+ * section enumerates these nodes which reside in the respective front and
> >>+ * backend portions of the XenStore, following the XenBus convention.
> >>+ *
> >>+ * All data in the XenStore is stored as strings.  Nodes specifying numeric
> >>+ * values are encoded in decimal.  Integer value ranges listed below are
> >>+ * expressed as fixed sized integer types capable o

Re: How to get the number of VFs assigned to the guests in XEN

2014-08-20 Thread Konrad Rzeszutek Wilk
On Wed, Aug 20, 2014 at 04:33:29PM +0530, Sreekanth Reddy wrote:
> HI,
> 
> 
> 
> For SRIOV support, currently in the KVM environment, mpt3sas driver can use
> the API pci_vfs_assigned() to know the number of VFs that are currently
> assigned to the running VMs. So that during the PF driver unload time, if
> the return value of this API is greater than zero the our driver won't call
> the pci_disable_sriov() to disable the VFs.
> 
> 
> 
> Now for the same purpose in XEN environment, is there any API similar to
> pci_vfs_assigned() which the low lever device driver can use to know 'the
> number of VFs that are currently assigned to the running VMs'. In XEN
> environment this API pci_vfs_assigned() will return always zero even though
> VFs are assigned to the running VMs.

Odd. It should return the same value. Are the devices binded to pci-back?

> 
> 
> 
> Regards,
> 
> Sreekanth
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH V5 1/5] xen/events: support threaded irqs for interdomain event channels

2014-08-20 Thread Konrad Rzeszutek Wilk
On Mon, Aug 18, 2014 at 11:31:46AM +0200, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> Export bind_interdomain_evtchn_to_irq() so drivers can use threaded
> interrupt handlers with:
> 
>  irq = bind_interdomain_evtchn_to_irq(remote_dom, remote_port);
>  if (irq < 0)
>  /* error */
>  ret = request_threaded_irq(...);
> 
> Signed-off-by: Juergen Gross 
> Acked-by: David Vrabel 

Acked-by: Konrad Rzeszutek Wilk 
> ---
>  drivers/xen/events/events_base.c | 5 +++--
>  include/xen/events.h | 2 ++
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/events/events_base.c 
> b/drivers/xen/events/events_base.c
> index 5b5c5ff..b4bca2d 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -900,8 +900,8 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int 
> cpu)
>   return irq;
>  }
>  
> -static int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
> -   unsigned int remote_port)
> +int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
> +unsigned int remote_port)
>  {
>   struct evtchn_bind_interdomain bind_interdomain;
>   int err;
> @@ -914,6 +914,7 @@ static int bind_interdomain_evtchn_to_irq(unsigned int 
> remote_domain,
>  
>   return err ? : bind_evtchn_to_irq(bind_interdomain.local_port);
>  }
> +EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq);
>  
>  static int find_virq(unsigned int virq, unsigned int cpu)
>  {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 8bee7a7..5321cd9 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -28,6 +28,8 @@ int bind_ipi_to_irqhandler(enum ipi_vector ipi,
>  unsigned long irqflags,
>  const char *devname,
>  void *dev_id);
> +int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
> +unsigned int remote_port);
>  int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
> unsigned int remote_port,
> irq_handler_t handler,
> -- 
> 1.8.4.5
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH V5 2/5] Add XEN pvSCSI protocol description

2014-08-20 Thread Konrad Rzeszutek Wilk
On Mon, Aug 18, 2014 at 11:31:47AM +0200, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> Add the definition of pvSCSI protocol used between the pvSCSI frontend in a
> XEN domU and the pvSCSI backend in a XEN driver domain (usually Dom0).
> 
> This header was originally provided by Fujitsu for XEN based on Linux 2.6.18.
> Changes are:
> - added comment
> - adapt to Linux style guide
> - add support for larger SG-lists by putting them in an own granted page
> - remove stale definitions
> 
> Signed-off-by: Juergen Gross 
> ---
>  include/xen/interface/io/vscsiif.h | 214 
> +
>  1 file changed, 214 insertions(+)
>  create mode 100644 include/xen/interface/io/vscsiif.h
> 
> diff --git a/include/xen/interface/io/vscsiif.h 
> b/include/xen/interface/io/vscsiif.h
> new file mode 100644
> index 000..4291889
> --- /dev/null
> +++ b/include/xen/interface/io/vscsiif.h
> @@ -0,0 +1,214 @@
> +/**
> + * vscsiif.h
> + *
> + * Based on the blkif.h code.
> + *
> + * This interface is to be regarded as a stable API between XEN domains
> + * running potentially different Linux kernel versions.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright(c) FUJITSU Limited 2008.
> + */
> +
> +#ifndef __XEN__PUBLIC_IO_SCSI_H__
> +#define __XEN__PUBLIC_IO_SCSI_H__
> +
> +#include "ring.h"
> +#include "../grant_table.h"
> +
> +/*
> + * Front->back notifications: When enqueuing a new request, sending a
> + * notification can be made conditional on req_event (i.e., the generic
> + * hold-off mechanism provided by the ring macros). Backends must set
> + * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
> + *
> + * Back->front notifications: When enqueuing a new response, sending a
> + * notification can be made conditional on rsp_event (i.e., the generic
> + * hold-off mechanism provided by the ring macros). Frontends must set
> + * rsp_event appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()).
> + */
> +
> +/*
> + * Feature and Parameter Negotiation
> + * =
> + * The two halves of a Xen pvSCSI driver utilize nodes within the XenStore to
> + * communicate capabilities and to negotiate operating parameters.  This
> + * section enumerates these nodes which reside in the respective front and
> + * backend portions of the XenStore, following the XenBus convention.
> + *
> + * All data in the XenStore is stored as strings.  Nodes specifying numeric
> + * values are encoded in decimal.  Integer value ranges listed below are
> + * expressed as fixed sized integer types capable of storing the conversion
> + * of a properly formated node string, without loss of information.
> + *
> + * Any specified default value is in effect if the corresponding XenBus node
> + * is not present in the XenStore.
> + *
> + * XenStore nodes in sections marked "PRIVATE" are solely for use by the
> + * driver side whose XenBus tree contains them.
> + *
> + 
> *
> + *Backend XenBus Nodes
> + 
> *
> + *
> + *-- Backend Device Identification (PRIVATE) 
> --
> + *
> + * p-devname
> + *  Values: string
> + *
> + *  A free string used to identify the physical device (e.g. a disk 
> name).
> + *
> + * p-dev
> + *  Values: string
> + *
> + *  A string specifying the backend device: either a 4-tuple "h:c:t:l"
> + *  (host, controller, target, lun, all integers), or a WWN (e.g.
> + *  "naa.60014054ac780582").
> + *
> + * v-dev
> + *  Values: string
> + *
> + *  A string specifying the frontend device in form of a 4-tuple 
> "h:c:t:l"
> + *  (host,

Re: [Xen-devel] [PATCH V5 5/5] add xen pvscsi maintainer

2014-08-20 Thread Konrad Rzeszutek Wilk
On Mon, Aug 18, 2014 at 11:31:50AM +0200, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> Add myself as maintainer for the Xen pvSCSI stuff.
> 
> Signed-off-by: Juergen Gross 

Acked-by: Konrad Rzeszutek Wilk 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aefa948..360f86f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10086,6 +10086,14 @@ S:   Supported
>  F:   drivers/block/xen-blkback/*
>  F:   drivers/block/xen*
>  
> +XEN PVSCSI DRIVERS
> +M:   Juergen Gross 
> +L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> +S:   Supported
> +F:   drivers/scsi/xen-scsifront.c
> +F:   drivers/xen/xen-scsiback.c
> +F:   include/xen/interface/io/vscsiif.h
> +
>  XEN SWIOTLB SUBSYSTEM
>  M:   Konrad Rzeszutek Wilk 
>  L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> -- 
> 1.8.4.5
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 0/4] Add XEN pvSCSI support

2014-07-25 Thread Konrad Rzeszutek Wilk
On Fri, Jul 25, 2014 at 01:37:29PM +0200, jgr...@suse.com wrote:
> This series adds XEN pvSCSI support. With pvSCSI it is possible to use 
> physical
> SCSI devices from a XEN domain.
> 
> The support consists of a backend in the privileged Domain-0 doing the real
> I/O and a frontend in the unprivileged domU passing I/O-requests to the 
> backend.
> 

About the question that Christopher Hellwig asked - was that ever answered?

> The code is taken (and adapted) from the original pvSCSI implementation done
> for Linux 2.6 in 2008 by Fujitsu.
> 
> [PATCH V2 1/4] Add XEN pvSCSI protocol description
> [PATCH V2 2/4] Introduce xen-scsifront module
> [PATCH V2 3/4] Introduce XEN scsiback module
> [PATCH V2 4/4] add xen pvscsi maintainer
> 
> Changes in V2:
> - use core target infrastructure by backend instead of pure SCSI passthrough
> - add support for larger SG lists by putting them in grant page(s)
> - add command abort capability
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/4] add xen pvscsi maintainer

2014-06-27 Thread Konrad Rzeszutek Wilk
On Fri, Jun 27, 2014 at 04:34:36PM +0200, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> Add myself as maintainer for the Xen pvSCSI stuff.

Woot!

Acked-by: Konrad Rzeszutek Wilk 

> 
> Signed-off-by: Juergen Gross 
> ---
>  MAINTAINERS | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ebc8128..78c60e6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10008,6 +10008,13 @@ S:   Supported
>  F:   arch/x86/pci/*xen*
>  F:   drivers/pci/*xen*
>  
> +XEN PVSCSI DRIVERS
> +M:   Juergen Gross 
> +L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> +S:   Supported
> +F:   drivers/scsi/xen*
> +F:   include/xen/interface/io/vscsiif.h
> +
>  XEN SWIOTLB SUBSYSTEM
>  M:   Konrad Rzeszutek Wilk 
>  L:   xen-de...@lists.xenproject.org (moderated for non-subscribers)
> -- 
> 1.8.4.5
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 1/4] Add XEN pvSCSI protocol description

2014-06-27 Thread Konrad Rzeszutek Wilk
On Fri, Jun 27, 2014 at 04:34:33PM +0200, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> Add the definition of pvSCSI protocol used between the pvSCSI frontend in a
> XEN domU and the pvSCSI backend in a XEN driver domain (usually Dom0).
> 
> This header was originally provided by Fujitsu for XEN based on Linux 2.6.18.
> Changes are:
> - added comment
> - adapt to Linux style guide
> 
> Signed-off-by: Juergen Gross 
> ---
>  include/xen/interface/io/vscsiif.h | 110 
> +
>  1 file changed, 110 insertions(+)
>  create mode 100644 include/xen/interface/io/vscsiif.h
> 
> diff --git a/include/xen/interface/io/vscsiif.h 
> b/include/xen/interface/io/vscsiif.h
> new file mode 100644
> index 000..f8420f3
> --- /dev/null
> +++ b/include/xen/interface/io/vscsiif.h
> @@ -0,0 +1,110 @@
> +/**
> + * vscsiif.h
> + *
> + * Based on the blkif.h code.
> + *
> + * This interface is to be regarded as a stable API between XEN domains
> + * running potentially different Linux kernel versions.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright(c) FUJITSU Limited 2008.
> + */
> +
> +#ifndef __XEN__PUBLIC_IO_SCSI_H__
> +#define __XEN__PUBLIC_IO_SCSI_H__
> +
> +#include "ring.h"
> +#include "../grant_table.h"
> +
> +/* commands between backend and frontend */
> +#define VSCSIIF_ACT_SCSI_CDB 1   /* SCSI CDB command */
> +#define VSCSIIF_ACT_SCSI_ABORT   2   /* SCSI Device(Lun) 
> Abort*/
> +#define VSCSIIF_ACT_SCSI_RESET   3   /* SCSI Device(Lun) 
> Reset*/
> +#define VSCSIIF_ACT_SCSI_SG_PRESET   4   /* Preset SG elements */
> +
> +/*
> + * Maximum scatter/gather segments per request.
> + *
> + * Considering balance between allocating at least 16 "vscsiif_request"
> + * structures on one page (4096 bytes) and the number of scatter/gather
> + * elements needed, we decided to use 26 as a magic number.
> + */
> +#define VSCSIIF_SG_TABLESIZE 26
> +
> +/*
> + * based on Linux kernel 2.6.18

This being a bit more .. new, - do these sizes make sense anymore?
Should they be extended a bit? Or have support for using the
old ones (as default) and then negotiate new sizes with the kernel?
(If of course there is a difference?)
> + */
> +#define VSCSIIF_MAX_COMMAND_SIZE 16
> +#define VSCSIIF_SENSE_BUFFERSIZE 96
> +
> +struct scsiif_request_segment {
> + grant_ref_t gref;
> + uint16_t offset;
> + uint16_t length;
> +};
> +typedef struct scsiif_request_segment vscsiif_segment_t;

No typedefs please.
> +
> +struct vscsiif_request {
> + uint16_t rqid;  /* private guest value, echoed in resp  */
> + uint8_t act;/* command between backend and frontend */
> + uint8_t cmd_len;
> +
> + uint8_t cmnd[VSCSIIF_MAX_COMMAND_SIZE];
> + uint16_t timeout_per_command;   /* The command is issued by twice
> +the value in Backend. */
> + uint16_t channel, id, lun;
> + uint16_t padding;
> + uint8_t sc_data_direction;  /* for DMA_TO_DEVICE(1)
> +DMA_FROM_DEVICE(2)
> +DMA_NONE(3) requests */
> + uint8_t nr_segments;/* Number of pieces of scatter-gather */
> +
> + vscsiif_segment_t seg[VSCSIIF_SG_TABLESIZE];
> + uint32_t reserved[3];
> +};

Could you add a comment saying what the size should be. Just in case
somebody extends this in the future - they will know the limits.

> +typedef struct vscsiif_request vscsiif_request_t;

Ditto.
> +
> +#define VSCSIIF_SG_LIST_SIZE ((sizeof(vscsiif_request_t) - 4) /  
> \
> + sizeof(vscsiif_segment_t))

That -4 ought to have a comment. In case the scsiif_request_segment
gets a new member or such.
> +
> +struct vscsiif_

Re: [PATCH] iscsi_ibft: search for broadcom specific ibft sign

2014-05-13 Thread Konrad Rzeszutek Wilk
On Wed, May 07, 2014 at 02:01:48PM -0500, Mike Christie wrote:
> On 05/07/2014 04:00 AM, vikas.chaudh...@qlogic.com wrote:
> > From: Vikas Chaudhary 
> > 
> > Broadcom iscsi offload firmware uses a non standard ibft sign of "BIFT".
> > This patch modifies the ibft search code to search for "BIFT" along
> > with the other possible values.
> > 
> > Signed-off-by: Vikas Chaudhary 
> > Acked-by: Kevin Tran 
> > Acked-by: Eddie Wai 
> > ---
> >  drivers/firmware/iscsi_ibft.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
> > index 3ee852c..071c2c9 100644
> > --- a/drivers/firmware/iscsi_ibft.c
> > +++ b/drivers/firmware/iscsi_ibft.c
> > @@ -756,6 +756,7 @@ static const struct {
> >  */
> > { ACPI_SIG_IBFT },
> > { "iBFT" },
> > +   { "BIFT" }, /* Broadcom iSCSI Offload */
> >  };
> >  
> >  static void __init acpi_find_ibft_region(void)
> > 
> 
> Patch looks ok to me.
> 
> Signed-off-by: Mike Christie 

I've converted your SoB in an Ack-by as this patch is not going
through your tree but mine.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iscsi_ibft: search for broadcom specific ibft sign

2014-05-09 Thread Konrad Rzeszutek Wilk
On Fri, May 09, 2014 at 11:50:20AM +, Vikas Chaudhary wrote:
> 
> 
> On 08/05/14 2:27 am, "Mike Christie"  wrote:
> 
> >On 05/07/2014 03:30 PM, Mike Christie wrote:
> >> On 05/07/2014 03:15 PM, Peter Jones wrote:
> >>> On Wed, May 07, 2014 at 02:49:59PM -0500, Mike Christie wrote:
> >>>> On 05/07/2014 02:21 PM, Konrad Rzeszutek Wilk wrote:
> >>>>> On Wed, May 07, 2014 at 07:12:31PM +, James Bottomley wrote:
> >>>>>> On Wed, 2014-05-07 at 09:47 -0400, Konrad Rzeszutek Wilk wrote:
> >>>>>>> On Wed, May 07, 2014 at 05:00:20AM -0400,
> >>>>>>>vikas.chaudh...@qlogic.com wrote:
> >>>>>>>> From: Vikas Chaudhary 
> >>>>>>>>
> >>>>>>>> Broadcom iscsi offload firmware uses a non standard ibft sign of
> >>>>>>>>"BIFT".
> >>>>>>>
> >>>>>>> Why? If it uses the standard iBFT format why does it use
> >>>>>>> a non-standard signature?
> >>>>>>
> >>>>>> This is useful as an academic exercise (and perhaps even a reminder
> >>>>>>to
> >>>>>> broadcom not to do it again) but I don't think we can make it a show
> >>>>>> stopper.  The boards have shipped with the non-standard signature,
> >>>>>>so we
> >>>>>> have to work with them.
> >>>>>
> >>>>> I agree as the train has left, but this got me thinking about these
> >>>>> questions that I hope Qlogic folks could answer:
> >>>>>
> >>>>>  - Mention what else is different - perhaps there are other entries
> >>>>>that
> >>>>>are a bit different? Or maybe the are some non-standard ones
> >>>>>added on?
> >>>>>
> >>>>>  - How has this been tested? As in had all the fields been tested
> >>>>>(so CHAP
> >>>>>on/off, extra ports, etc).
> >>>>>
> >>>>
> >>>> This supports the same stuff as was added in the original commit for
> >>>> that string:
> >>>>
> >>>> 140363500ddadad0c09cb512cc0c96a4d3efa053
> >>>>
> >>>> It just was not carried over in the acpi specific table in commit
> >>>> 935a9fee51c945b8942be2d7b4bae069167b4886.
> >>>
> >>> Okay, but that patch leaves the scanning for it pre-ACPI intact.
> >> 
> >> Before 935a9fee51c945b8942be2d7b4bae069167b4886, didn't we check for
> >> BIFT in the ACPI table case?
> >> 
> >> Before that patch, we used to do:
> >> drivers/firmware/iscsi_ibft_find.c:find_ibft_region()
> >> 
> >> for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++)
> >> acpi_table_parse(ibft_signs[i].sign, acpi_find_ibft);
> >> 
> >> and BIFT was in that ibft_signs array.
> >> 
> >> I was just saying I thought since we added support for BIFT, we had been
> >> checking for it in the ACPI case.
> >
> >
> >I think I am in the wrong. When I added that support I thought BIFT was
> >supposed to be for both the ACPI and the RAM case, so I had coded it
> >like above. I am not seeing that in the old mails though, so you might
> >be right and they just are now adding support for ACPI. Will just wait
> >for qlogic/broadcom.
> 
> Mike, In your original patch 140363500ddadad0c09cb512cc0c96a4d3efa053 we
> are checking for BIFT, and BIFT was in ibft_signs[] array which is defined
> in iscsi_ibft_find.c.
> Latter when patch 935a9fee51c945b8942be2d7b4bae069167b4886 get added, this
> patch defined new array of ibft_signs[] in iscsi_ibft.c which does not
> have BIFT signature.
> Patch 935a9fee51c945b8942be2d7b4bae069167b4886 added to fix finding IBFT
> ACPI table on UEFI. We are just enhancing this patch.

In a nutsheel this is a fix for a regression that has been there since 3.2
and introduced by 935a9fee51c945b8942be2d7b4bae069167b4886 ("ibft: Fix finding
IBFT ACPI table on UEFI").

Vikas,
Could you resend the patch and include these details in the commit messages:
That this is a fix for said regression and what cards it impacts (or firmwares).

Thank you.

Since this is a regression I can send the patch to Linus right away - but
I really would like to have that information in the git commit message
so that Linus doesn't look funny at me.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iscsi_ibft: search for broadcom specific ibft sign

2014-05-07 Thread Konrad Rzeszutek Wilk
On Wed, May 07, 2014 at 07:12:31PM +, James Bottomley wrote:
> On Wed, 2014-05-07 at 09:47 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, May 07, 2014 at 05:00:20AM -0400, vikas.chaudh...@qlogic.com wrote:
> > > From: Vikas Chaudhary 
> > > 
> > > Broadcom iscsi offload firmware uses a non standard ibft sign of "BIFT".
> > 
> > Why? If it uses the standard iBFT format why does it use
> > a non-standard signature?
> 
> This is useful as an academic exercise (and perhaps even a reminder to
> broadcom not to do it again) but I don't think we can make it a show
> stopper.  The boards have shipped with the non-standard signature, so we
> have to work with them.

I agree as the train has left, but this got me thinking about these
questions that I hope Qlogic folks could answer:

 - Mention what else is different - perhaps there are other entries that
   are a bit different? Or maybe the are some non-standard ones added on?

 - How has this been tested? As in had all the fields been tested (so CHAP
   on/off, extra ports, etc).

 - Do future hardware of these cards use the standard one? If so what are they?
   "Anything produced in 2012 and later.." ?
 - Is the subset of hardware that use the non-standard small enough? Would it
   be good to mention it in the commit.

Thanks!
> 
> James
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iscsi_ibft: search for broadcom specific ibft sign

2014-05-07 Thread Konrad Rzeszutek Wilk
On Wed, May 07, 2014 at 05:00:20AM -0400, vikas.chaudh...@qlogic.com wrote:
> From: Vikas Chaudhary 
> 
> Broadcom iscsi offload firmware uses a non standard ibft sign of "BIFT".

Why? If it uses the standard iBFT format why does it use
a non-standard signature?

> This patch modifies the ibft search code to search for "BIFT" along
> with the other possible values.
> 
> Signed-off-by: Vikas Chaudhary 
> Acked-by: Kevin Tran 
> Acked-by: Eddie Wai 
> ---
>  drivers/firmware/iscsi_ibft.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
> index 3ee852c..071c2c9 100644
> --- a/drivers/firmware/iscsi_ibft.c
> +++ b/drivers/firmware/iscsi_ibft.c
> @@ -756,6 +756,7 @@ static const struct {
>*/
>   { ACPI_SIG_IBFT },
>   { "iBFT" },
> + { "BIFT" }, /* Broadcom iSCSI Offload */
>  };
>  
>  static void __init acpi_find_ibft_region(void)
> -- 
> 1.8.2.GIT
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "swiotlb buffer is full" with 3.13-rc1+ but not 3.4.

2013-12-03 Thread Konrad Rzeszutek Wilk
On Sat, Nov 30, 2013 at 03:48:44PM -0500, James Bottomley wrote:
> On Sat, 2013-11-30 at 13:56 -0500, Konrad Rzeszutek Wilk wrote:
> > My theory is that the SWIOTLB is not full - it is just that the request
> > is for a compound page that is more than 512kB. Please note that
> > SWIOTLB highest "chunk" of buffer it can deal with is 512kb.
> > 
> > And that is of course the question comes out - why would it try to
> > bounce buffer it. In Xen the answer is simple - the sg chunks cross page
> > boundaries which means that they are not physically contingous - so we
> > have to use the bounce buffer. It would be better if the the sg list
> > provided a large list of 4KB pages instead of compound pages as that
> > could help in avoiding the bounce buffer.
> > 
> > But I digress - this is a theory - I don't know whether the SCSI layer
> > does any colescing of the sg list - and if so, whether there is any
> > easy knob to tell it to not do it.
> 
> Well, SCSI doesn't, but block does.  It's actually an efficiency thing
> since most firmware descriptor formats cope with multiple pages and the
> more descriptors you have for a transaction, the more work the on-board
> processor on the HBA has to do.  If you have an emulated HBA, like
> virtio, you could turn off physical coalesing by setting the
> use_clustering flag to DISABLE_CLUSTERING.  But you can't do that for a
> real card.  I assume the problem here is that the host is passing the
> card directly to the guest and the guest clusters based on its idea of
> guest pages which don't map to contiguous physical pages?

Kind of. Except that in this case the guest does know that it can't map
them contingously - and resorts to using the bounce buffer so that it
can provide a nice chunk of contingous area. This is detected by
the SWIOTLB layer and also the block layer to discourage coalescing
there.

But since SCSI is all about sg list I think it gets tangled up here:

537 for_each_sg(sgl, sg, nelems, i) {   

538 phys_addr_t paddr = sg_phys(sg);

539 dma_addr_t dev_addr = xen_phys_to_bus(paddr);   

540 

541 if (swiotlb_force ||

542 !dma_capable(hwdev, dev_addr, sg->length) ||

543 range_straddles_page_boundary(paddr, sg->length)) { 

544 phys_addr_t map = swiotlb_tbl_map_single(hwdev, 

545  
start_dma_addr,
546  
sg_phys(sg),   
547  
sg->length,
548  dir);  


So it is either not capable of reaching that physical address (so DMA
mask, but I doubt it - this is LSI which can do 64bit). Or the pages
straddle. They can straddle it by well, being offset at odd locations, or
compound pages.

But why would they in the first place - and so many of them - considering
the flow of those printks Ian's is seeing.

James,
The SCSI layer wouldn't do any funny business here right - no reording
of bios? That is all left to the block layer right?


> 
> The way you tell how many physically contiguous pages block is willing
> to merge is by looking at /sys/block//queue/max_segment_size if
> that's 4k then it won't merge, if it's greater than 4k, then it will.

Ah, good idea. Ian, anything there?
> 
> I'm not quite sure what to do ... you can't turn of clustering globally
> in the guest because the virtio drivers use it to reduce ring descriptor
> pressure, what you probably want is some way to flag a pass through
> device.
> 
> James
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html