[PATCH] Add HID quirks for Akai MIDImix.

2016-10-09 Thread Steinar H. Gunderson
The Akai MIDImix (09e8:0031) is a MIDI fader controller that speaks
regular MIDI and works well with Linux. However, initialization gets
delayed due to reports timeout:

  [3643645.631124] hid-generic 0003:09E8:0031.0020: timeout initializing reports
  [3643645.632416] hid-generic 0003:09E8:0031.0020: hiddev0: USB HID v1.11 
Device [AKAI MIDI Mix] on usb-:00:14.0-2/input0

Adding "usbhid.quirks=0x09e8:0x0031:0x2000" on the kernel
command line makes the issues go away.

Signed-off-by: Steinar H. Gunderson <sgunder...@bigfoot.com>
---
 drivers/hid/hid-ids.h   | 3 +++
 drivers/hid/usbhid/hid-quirks.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 4ed9a4f..e92b09d 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -64,6 +64,9 @@
 #define USB_VENDOR_ID_AKAI 0x2011
 #define USB_DEVICE_ID_AKAI_MPKMINI20x0715
 
+#define USB_VENDOR_ID_AKAI_09E80x09E8
+#define USB_DEVICE_ID_AKAI_09E8_MIDIMIX0x0031
+
 #define USB_VENDOR_ID_ALCOR0x058f
 #define USB_DEVICE_ID_ALCOR_USBRS232   0x9720
 
diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c
index b4b8c6a..bb40008 100644
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -56,6 +56,7 @@ static const struct hid_blacklist {
 
{ USB_VENDOR_ID_AIREN, USB_DEVICE_ID_AIREN_SLIMPLUS, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_AKAI, USB_DEVICE_ID_AKAI_MPKMINI2, 
HID_QUIRK_NO_INIT_REPORTS },
+   { USB_VENDOR_ID_AKAI_09E8, USB_DEVICE_ID_AKAI_09E8_MIDIMIX, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_UC100KM, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_CS124U, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_2PORTKVM, HID_QUIRK_NOGET },
-- 
2.8.0.rc3.226.g39d4020

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dwc3-exynos: Fix deferred probing storm.

2016-05-30 Thread Steinar H. Gunderson
On Fri, May 27, 2016 at 04:26:42PM +0300, Felipe Balbi wrote:
>> Sent. As a fix, there's a chance it could go into 4.7, right?
> yup, shouldn't be a problem. But only after v4.7-rc1 is tagged.

Seemingly v4.7-rc1 is out today (I was surprised at how quick that was).

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH] dwc3-exynos: Fix deferred probing storm.

2016-05-27 Thread Steinar H. Gunderson
On Fri, May 27, 2016 at 04:12:59PM +0300, Felipe Balbi wrote:
> yes, please do that. Keep in mind, also, that we're still in the middle
> of the merge window and nothing will really happen until v4.7-rc1 is
> tagged.

Sent. As a fix, there's a chance it could go into 4.7, right?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] dwc3-exynos: Fix deferred probing storm.

2016-05-27 Thread Steinar H. Gunderson
dwc3-exynos has two problems during init if the regulators are slow
to come up (for instance if the I2C bus driver is not on the initramfs)
and return probe deferral. First, every time this happens, the driver
leaks the USB phys created; they need to be deallocated on error.

Second, since the phy devices are created before the regulators fail,
this means that there's a new device to re-trigger deferred probing,
which causes it to essentially go into a busy loop of re-probing the
device until the regulators come up.

Move the phy creation to after the regulators have succeeded, and also
fix cleanup on failure. On my ODROID XU4 system (with Debian's initramfs
which doesn't contain the I2C driver), this reduces the number of probe
attempts (for each of the two controllers) from more than 2000 to eight.

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlow...@samsung.com>
Reviewed-by: Vivek Gautam <gautam.vi...@samsung.com>
Fixes: d720f057fda4 ("usb: dwc3: exynos: add nop transceiver support")
Cc: <sta...@vger.kernel.org>
---
 drivers/usb/dwc3/dwc3-exynos.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/dwc3/dwc3-exynos.c b/drivers/usb/dwc3/dwc3-exynos.c
index dd5cb55..2f1fb7e 100644
--- a/drivers/usb/dwc3/dwc3-exynos.c
+++ b/drivers/usb/dwc3/dwc3-exynos.c
@@ -128,12 +128,6 @@ static int dwc3_exynos_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, exynos);
 
-   ret = dwc3_exynos_register_phys(exynos);
-   if (ret) {
-   dev_err(dev, "couldn't register PHYs\n");
-   return ret;
-   }
-
exynos->dev = dev;
 
exynos->clk = devm_clk_get(dev, "usbdrd30");
@@ -183,20 +177,29 @@ static int dwc3_exynos_probe(struct platform_device *pdev)
goto err3;
}
 
+   ret = dwc3_exynos_register_phys(exynos);
+   if (ret) {
+   dev_err(dev, "couldn't register PHYs\n");
+   goto err4;
+   }
+
if (node) {
ret = of_platform_populate(node, NULL, NULL, dev);
if (ret) {
dev_err(dev, "failed to add dwc3 core\n");
-   goto err4;
+   goto err5;
}
} else {
dev_err(dev, "no device node, failed to add dwc3 core\n");
ret = -ENODEV;
-   goto err4;
+   goto err5;
}
 
return 0;
 
+err5:
+   platform_device_unregister(exynos->usb2_phy);
+   platform_device_unregister(exynos->usb3_phy);
 err4:
regulator_disable(exynos->vdd10);
 err3:
-- 
2.8.0.rc3.226.g39d4020

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dwc3-exynos: Fix deferred probing storm.

2016-05-27 Thread Steinar H. Gunderson
On Fri, May 27, 2016 at 03:23:35PM +0530, Vivek Gautam wrote:
> I don't have any concerns with the patch apart from the ones
> Krzysztof has already pointed out.
> LGTM.

Should I repost the patch, or will people just make these commit message
changes for me? I guess balbi@ is the right person to review and merge,
since he maintains the driver.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: Endless "supply vcc not found, using dummy regulator"

2016-05-27 Thread Steinar H. Gunderson
On Fri, May 27, 2016 at 03:02:48PM +0530, Vivek Gautam wrote:
> Above mentioned patches were not accepted by the maintainers of generic-phy
> and usb. I couldn't get any response on them for quite a long time. So, the
> patches could never make it to the mainline.
> I can try initiating the entire exercise once again and try to get them 
> merged.

That would be nice; having USB3 speeds is certainly attractive.

> AFA dwc3-exynos is concerned, there's definitely a resource leak as
> pointed out by you.
> Please post the suggested patch, adding required people in CC.

http://www.spinics.net/lists/linux-usb/msg141385.html

Is there anyone I should have Cc-ed that I didn't?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: Endless "supply vcc not found, using dummy regulator"

2016-05-26 Thread Steinar H. Gunderson
On Wed, May 25, 2016 at 07:52:36PM +0200, Steinar H. Gunderson wrote:
>> Actually their are some missing patches to tune the usb3 phy.
>> 
>> https://lkml.org/lkml/2014/10/31/266
> This explains why the default networking speed refused to go above
> ~300 Mbit/sec! What happened to the patches, I wonder?

Adding Vivek in case he knows. They certainly don't apply anymore, at least.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: Endless "supply vcc not found, using dummy regulator"

2016-05-25 Thread Steinar H. Gunderson
On Wed, May 25, 2016 at 05:46:51PM +0530, Anand Moon wrote:
> Actually their are some missing patches to tune the usb3 phy.
> 
> https://lkml.org/lkml/2014/10/31/266

This explains why the default networking speed refused to go above
~300 Mbit/sec! What happened to the patches, I wonder?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] dwc3-exynos: Fix deferred probing storm.

2016-05-24 Thread Steinar H. Gunderson
dwc3-exynos has two problems during init if the regulators are slow
to come up (for instance if the I2C bus driver is not on the initramfs)
and return probe deferral. First, every time this happens, the driver
leaks the USB phys created; they need to be deallocated on error.

Second, since the phy devices are created before the regulators fail,
this means that there's a new device to re-trigger deferred probing,
which causes it to essentially go into a busy loop of re-probing the
device until the regulators come up.

Move the phy creation to after the regulators have succeeded, and also
fix cleanup on failure. On my ODROID XU4 system (with Debian's initramfs
which doesn't contain the I2C driver), this reduces the number of probe
attempts (for each of the two controllers) from more than 2000 to eight.

Reported-by: Steinar H. Gunderson <sgunder...@bigfoot.com>
Signed-off-by: Steinar H. Gunderson <se...@google.com>
---
 drivers/usb/dwc3/dwc3-exynos.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/dwc3/dwc3-exynos.c b/drivers/usb/dwc3/dwc3-exynos.c
index dd5cb55..2f1fb7e 100644
--- a/drivers/usb/dwc3/dwc3-exynos.c
+++ b/drivers/usb/dwc3/dwc3-exynos.c
@@ -128,12 +128,6 @@ static int dwc3_exynos_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, exynos);
 
-   ret = dwc3_exynos_register_phys(exynos);
-   if (ret) {
-   dev_err(dev, "couldn't register PHYs\n");
-   return ret;
-   }
-
exynos->dev = dev;
 
exynos->clk = devm_clk_get(dev, "usbdrd30");
@@ -183,20 +177,29 @@ static int dwc3_exynos_probe(struct platform_device *pdev)
goto err3;
}
 
+   ret = dwc3_exynos_register_phys(exynos);
+   if (ret) {
+   dev_err(dev, "couldn't register PHYs\n");
+   goto err4;
+   }
+
if (node) {
ret = of_platform_populate(node, NULL, NULL, dev);
if (ret) {
dev_err(dev, "failed to add dwc3 core\n");
-   goto err4;
+   goto err5;
}
} else {
dev_err(dev, "no device node, failed to add dwc3 core\n");
ret = -ENODEV;
-   goto err4;
+   goto err5;
}
 
return 0;
 
+err5:
+   platform_device_unregister(exynos->usb2_phy);
+   platform_device_unregister(exynos->usb3_phy);
 err4:
regulator_disable(exynos->vdd10);
 err3:
-- 
2.8.0.rc3.226.g39d4020

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Re: Endless "supply vcc not found, using dummy regulator"

2016-05-24 Thread Steinar H. Gunderson
On Tue, May 24, 2016 at 05:53:34PM +0200, Krzysztof Kozlowski wrote:
> exynos->clk = devm_clk_get(dev, "usbdrd30");
> if (IS_ERR(exynos->clk)) {
> + // On each error path since here we need to
> + // revert work done by dwc3_exynos_register_phys()
> dev_err(dev, "couldn't get clock\n");
> return -EINVAL;
> }
> clk_prepare_enable(exynos->clk);

OK, so I took Mark's advice and moved the phy instantiation towards the end
of the function (after the regulators have successfully come up). It reduced
the number of probes, even with the original initramfs, dramatically, so
it seems to work quite well. It also reduces the text for each deferred probe
by a lot, since we no longer have the dummy regulator message for each one
(only the message about “no suspend clk specified” is left). Finally, this
arrangement reduced the need for extra error handling to a minimum.

Cc-ing Felipe and and linux-usb@, and adding the patch as a reply to this
message.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2016-02-24 Thread Steinar H. Gunderson
On Wed, Feb 24, 2016 at 02:30:08PM -0500, Sasha Levin wrote:
> I'm seeing the following warning while fuzzing:
> [ 1595.188189] WARNING: CPU: 3 PID: 26063 at mm/page_alloc.c:3207 
> __alloc_pages_nodemask+0x960/0x29e0()
> [ 1595.188287] Modules linked in:
> [ 1595.188316] CPU: 3 PID: 26063 Comm: syz-executor Not tainted 
> 4.5.0-rc5-next-20160223-sasha-00022-g03b30f1-dirty #2982

I think it was already established that one could cause kernel warnings if
trying to allocate large amounts of memory, but that the usbfs memory limits
could curb truly dangerous amounts. Someone please correct me if I'm
misunderstanding?

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add support for usbfs zerocopy.

2016-02-12 Thread Steinar H. Gunderson
On Wed, Feb 03, 2016 at 11:09:16PM +0100, Steinar H. Gunderson wrote:
> Trying again; sending v4 as a reply to your email.

Did the v4 sending work for you?

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add support for usbfs zerocopy.

2016-02-04 Thread Steinar H. Gunderson
On Thu, Feb 04, 2016 at 11:17:26AM +0100, Bjørn Mork wrote:
> Then use Mutt to reply, but include the patch inline instead of
> attaching it.  I believe this is discussed in the Mutt section of
> Documentation/email-clients.txt

Thanks; if that works (even though it changes the “From” line and such)
that's a good solution. It doesn't work for series of multiple patches, but I
can certainly live with that.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] Add support for usbfs zerocopy.

2016-02-03 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 227 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 203 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 59e7a33..a9fc6ff 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -162,6 +176,111 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+   dma_addr_t dma_handle;
+   in

Re: [PATCH v2] Add support for usbfs zerocopy.

2016-02-03 Thread Steinar H. Gunderson
On Wed, Feb 03, 2016 at 01:23:17PM -0800, Greg Kroah-Hartman wrote:
> Attachments don't work, you know better than that :(

Since I've now been bitten by this several times: Is there any sort of best
practice for integrating git with MUAs? What I'm doing right now is
cut-and-paste from mutt to get the to/cc/in-reply-to headers right, and
that's suboptimal. It feels like it should be a FAQ, but I can't really find
anything obvious.

> I need a _real_ email that I don't have to hand-edit to get it to apply.

Trying again; sending v4 as a reply to your email. This is on top of git
master, since the patch no longer applied there. If you want it against any
other tree, please let me know which.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add support for usbfs zerocopy.

2016-02-03 Thread Steinar H. Gunderson
On Thu, Feb 04, 2016 at 12:15:50AM +0200, Felipe Balbi wrote:
>> Since I've now been bitten by this several times: Is there any sort of best
>> practice for integrating git with MUAs? What I'm doing right now is
>> cut-and-paste from mutt to get the to/cc/in-reply-to headers right, and
>> that's suboptimal. It feels like it should be a FAQ, but I can't really find
>> anything obvious.
> have you tried git send-email ? If you wanna send as a reply to another
> message, use:
> 
>   $ git send-email --to f...@bar.com --cc b...@foo.com 
> --in-reply-to=abcd-message...@foo.bar.com

Unfortunately that doesn't change anything; I still need to cut-and-paste the
to, cc and in-reply-to manually. What I'd want is something like
“git send-email as a reply to the last email in my outbox”.

Attaching the patch gives me exactly this, by the way, but seemingly ends up
with a format that's more cumbersome to receive (which is a bad tradeoff).

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add support for usbfs zerocopy.

2016-02-02 Thread Steinar H. Gunderson
On Mon, Jan 25, 2016 at 09:03:57AM +0100, Steinar H. Gunderson wrote:
> I did git rebase --ignore-date HEAD^ just to reset the date. Sending it as an
> attachment just to be sure.

Hi Greg,

Did this work for you? Is there anything else I should do to this patch?

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add support for usbfs zerocopy.

2016-01-25 Thread Steinar H. Gunderson
On Sun, Jan 24, 2016 at 01:12:08PM -0800, Greg Kroah-Hartman wrote:
> Something is really wrong with your email client, it is saying this is
> sent on Nov 26, the same exact time as your previous patch, yet you sent
> this in January.  Which implies that this is an old patch and not an
> updated one :(

It comes directly from git format-patch. I guess git commit --amend doesn't
properly reset the date?

> Can you send me the latest verison of this, with the date set properly
> on your machine for it?

I did git rebase --ignore-date HEAD^ just to reset the date. Sending it as an
attachment just to be sure.

/* Steinar */
-- 
Software Engineer, Google Switzerland
>From e56d9235b343c5e70061e977639cc7dddeae8164 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Mon, 25 Jan 2016 09:02:34 +0100
Subject: [PATCH v3] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 227 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 203 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..0238c78 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
 	u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int urb_use_count;
+	u32 size;
+	void *mem;
+	dma_addr_t dma_handle;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 struct async {
 	struct list_head asynclist;
 	struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
 	void __user *userbuffer;
 	void __user *userurb;
 	struct urb *urb;
+	struct usb_memory *usbm;
 	unsigned int mem_usage;
 	int status;
 	u32 secid;
@@ -157,6 +171,111 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+	struct usb_dev_state *ps = usbm->ps;
+	unsigned long flags;
+
+	spin_lock_irqsave(>lock, flags);
+	--*count;
+	if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+		list_del(>memlist);
+		spin_unlock_irqrestore(>lock, flags);
+
+		usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+usbm->dma_handle);
+		usbfs_decrease_memory_usage(
+			usbm->size + sizeof(struct usb_memory));
+		kfree(usbm);
+	} else {
+		spin_unlock_irqrestore(>lock, flags);
+	}
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = vma->vm_private_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(>ps->lock, flags);
+	++usbm->vma_use_count;
+	spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+	struct usb

Re: [PATCH] Add support for usbfs zerocopy.

2016-01-06 Thread Steinar H. Gunderson
On Wed, Jan 06, 2016 at 04:22:12PM +0100, Peter Stuge wrote:
>>> Our interface for zero copy reads/writes is O_DIRECT, and that requires
>>> not special memory allocation, just proper alignment.
>> But that assumes you are using I/O using read()/write(). There's no way you
>> can shoehorn USB isochronous reads into the read() interface, O_DIRECT or 
>> not.
> How about aio?

I don't really see how; a USB device does not look much like a file. (Where
would you stick the endpoint, for one? And how would you ever submit an URB
with multiple packets in it, which is essential?) It feels a bit like
trying to use UDP sockets with only read() and write().

In any case, the usbfs interface already exists and is stable. This is about
extending it; replacing it with something new from scratch to get zerocopy
would seem overkill.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2016-01-06 Thread Steinar H. Gunderson
On Tue, Jan 05, 2016 at 10:49:49PM -0800, Christoph Hellwig wrote:
> This is a completely broken usage of the mmap interface.  if you use
> mmap on a device file you must use the actual mmap for the data
> transfer.

Really? V4L does exactly the same thing, from what I can see. It's just a way
of allocating memory with specific properties, roughly similar to hugetlbfs.

> Our interface for zero copy reads/writes is O_DIRECT, and that requires
> not special memory allocation, just proper alignment.

But that assumes you are using I/O using read()/write(). There's no way you
can shoehorn USB isochronous reads into the read() interface, O_DIRECT or not.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2016-01-05 Thread Steinar H. Gunderson
On Tue, Jan 05, 2016 at 04:11:43PM -0800, Greg Kroah-Hartman wrote:
>> Add a new interface for userspace to preallocate memory that can be
>> used with usbfs. This gives two primary benefits:
> Please 'version' your patches, so that I have a chance to figure out
> what patch is what, and what changed between patches.
> 
> otherwise the odds of me picking the "wrong" one is _very_ high...

OK. I won't make any attempt at reconstructing the history, but I'll resend
the one I just sent you as v2, ie. --reroll-count=2.

Somehow it feels like there should be a way to integrate this better into my
MUA, but hopefully this is soon all done anyway. :-)

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Add support for usbfs zerocopy.

2016-01-05 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 227 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 203 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..0238c78 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,111 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+   dma_addr_t dma_handle;
+   in

[PATCH] Add support for usbfs zerocopy.

2016-01-05 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 227 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 203 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..0238c78 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,111 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+   dma_addr_t dma_handle;
+   in

Re: Does vm_operations_struct require a .owner field?

2016-01-05 Thread Steinar H. Gunderson
On Tue, Jan 05, 2016 at 04:31:09PM -0500, Alan Stern wrote:
> Thank you.  So it looks like I was worried about nothing.
> 
> Steinar, you can remove the try_module_get/module_put lines from your
> patch.  Also, the list_del() and comment in usbdev_release() aren't 
> needed -- at that point we know the memory_list has to be empty since 
> there can't be any outstanding URBs or VMA references.  If you take 
> those things out then the patch should be ready for merging.

Good, thanks. Did so, compiled, testing it still works, sending :-)

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2016-01-03 Thread Steinar H. Gunderson
On Mon, Jan 04, 2016 at 01:42:06AM +0100, Markus Rechberger wrote:
> Don't expect anyone else seriously testing this interface aside of me
> and Steinar.

FWIW, I got an email from the OpenKinect people, who are (as far as I know)
currently testing the patch together with my libusb-1.0 patch. It seems
they've been having similar problems about large isochronous memory
allocations failing.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add support for usbfs zerocopy.

2015-12-28 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 238 ++
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 214 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..1e331dd 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,116 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   module_put(THIS_MODULE);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+   dma_a

Re: [PATCH] Add support for usbfs zerocopy.

2015-12-28 Thread Steinar H. Gunderson
On Fri, Dec 25, 2015 at 01:36:44PM -0500, Alan Stern wrote:
> That's okay; lots of drivers do this and people expect it.  It also 
> reduces the total amount of code.

Okay, fixed.

> > > > +   mem = usb_alloc_coherent(ps->dev, size, GFP_KERNEL, 
> > > > _handle);
> > > Shouldn't this be GFP_USER to let limits apply?
> > This I don't really know anything about. Alan?
> There doesn't appear to be very much difference between them.  As far
> as I can tell from the documentation in include/linux/gfp.h, GPF_USER
> would indeed be more appropriate.

OK, changed this one to GFP_USER, but kept all the others at GFP_KERNEL since
it seemed the most consistent with the other code.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2015-12-25 Thread Steinar H. Gunderson
On Fri, Dec 25, 2015 at 09:50:50AM +0100, Oliver Neukum wrote:
> > +   ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
> > +   if (ret) {
> > +   module_put(THIS_MODULE);
> > +   return ret;
> > +   }
> 
> Could you do the error handling in a unified manner with "goto"?

Yes and no. I couldn't do it fully unified; note the error path below that
uses usbfs_decrease_memory_usage() instead. I could do it sort-of-unified,
but it ended up being one of those label cascades, of course:

return 0;

error_free_usbm:
kfree(usbm);
error_decrease_mem:
usbfs_decrease_memory_usage(size + sizeof(struct usb_memory));
error_module_put:
module_put(THIS_MODULE);
return ret;

> > +   mem = usb_alloc_coherent(ps->dev, size, GFP_KERNEL, _handle);
> Shouldn't this be GFP_USER to let limits apply?

This I don't really know anything about. Alan?

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add support for usbfs zerocopy.

2015-12-24 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 238 ++
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 214 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..c6f4450 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,116 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   module_put(THIS_MODULE);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+  

Re: [PATCH] Add support for usbfs zerocopy.

2015-12-24 Thread Steinar H. Gunderson
On Tue, Dec 22, 2015 at 10:38:57AM -0500, Alan Stern wrote:
> Steinar, you can look at those if you want to.  But I really don't 
> think anything more is needed other than the try_module_get and 
> module_put calls.  At least, the Linux Device Drivers book doesn't 
> mention anything else.

OK. I'll send a version with try_module_get/module_put added, then. Tested on
top of 4.3.3.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2015-12-21 Thread Steinar H. Gunderson
On Tue, Dec 15, 2015 at 03:32:45PM +, David Laight wrote:
> That still isn't entirely correct.
> 
> Someone with more knowledge than either of us has needs to sort out
> how to handle this properly.

This discussion sort of stalled. Short of actually becoming a kernel VM
expert (not very likely), is there anything I can do to help move this
forward?

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2015-12-14 Thread Steinar H. Gunderson
On Mon, Dec 14, 2015 at 04:41:24PM +0800, Peter Chen wrote:
>> +ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
>> +if (ret) {
>> +return ret;
>> +}
> The braces are not needed.

New patch attached.

/* Steinar */
-- 
Software Engineer, Google Switzerland
>From 659c4025a0e182a1a205fc4d945a68b8cb3602f0 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.
To: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org,mrechber...@gmail.com

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 229 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 205 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..f64dd3a 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
 	u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int urb_use_count;
+	u32 size;
+	void *mem;
+	dma_addr_t dma_handle;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 struct async {
 	struct list_head asynclist;
 	struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
 	void __user *userbuffer;
 	void __user *userurb;
 	struct urb *urb;
+	struct usb_memory *usbm;
 	unsigned int mem_usage;
 	int status;
 	u32 secid;
@@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+	struct usb_dev_state *ps = usbm->ps;
+	unsigned long flags;
+
+	spin_lock_irqsave(>lock, flags);
+	--*count;
+	if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+		list_del(>memlist);
+		spin_unlock_irqrestore(>lock, flags);
+
+		usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+usbm->dma_handle);
+		usbfs_decrease_memory_usage(
+			usbm->size + sizeof(struct usb_memory));
+		kfree(usbm);
+	} else {
+		spin_unlock_irqrestore(>lock, flags);
+	}
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = vma->vm_private_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(>ps->lock, flags);
+	++usbm->vma_use_count;
+	spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = vma->vm_private_data;
+
+	dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+	.open = usbdev_vm_open,
+	.close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_are

[PATCH] Add support for usbfs zerocopy.

2015-12-09 Thread Steinar H. Gunderson
Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
---
 drivers/usb/core/devio.c  | 230 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 206 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..3349222 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,108 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+   size_t size = vma->vm_end - vma->vm_start;
+   void *mem;
+   unsigned long flags;
+   dma_addr_t dma_handle;
+   in

Re: [PATCH] Add support for usbfs zerocopy.

2015-12-09 Thread Steinar H. Gunderson
On Wed, Dec 09, 2015 at 09:29:20AM -0500, Greg Kroah-Hartman wrote:
> Shouldn't there also be some sort of documentation update with this
> patch as well, so that people know of the new functionality we are
> adding?

I can write documentation if you can point me to a reasonable place.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for usbfs zerocopy.

2015-12-09 Thread Steinar H. Gunderson
On Wed, Dec 09, 2015 at 10:39:35AM -0500, Alan Stern wrote:
>> I can write documentation if you can point me to a reasonable place.
> The only documentation I know of for usbfs is a dreadfully out-of-date 
> section in Documentation/DocBook/usb.tmpl.

I don't volunteer to get your documentation in shape there. :-)

I would guess most usbfs users go through libusb. I have a patch against
libusb that I'll send after the kernel patch is merged (it's hard to argue
for inclusion of something that is not in mainline), and it includes
documentation for the newly added functions, including zerocopy caveats.

/* Steinar */
-- 
Software Engineer, Google Switzerland
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-08 Thread Steinar H. Gunderson
On Tue, Dec 08, 2015 at 05:01:46PM -0500, Alan Stern wrote:
> I don't see anything else to change.  You can submit this to Greg KH 
> and add:
> 
> Acked-by: Alan Stern 

Great!

How do I submit? Just do git send-email to some address?

Do you know what the status is for the LPM blacklisting patch we discussed
earlier?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-08 Thread Steinar H. Gunderson
On Tue, Dec 08, 2015 at 03:26:28PM -0500, Alan Stern wrote:
> I just noticed this -- usbm->memlist needs to be initialized before
> dec_usb_memory_use_count() can be called.  INIT_LIST_HEAD before the
> "if" statement will fix the problem.

Done.

> The patch could use one more ingredient.  In
> include/uapi/linux/usbdevicefs_h, add a #define for USBDEVFS_CAP_MMAP
> and fix up proc_get_capabilities() to report the new capability.  In
> theory, user programs could just try an mmap() call and see if it
> works, but this seems cleaner.

Done.

/* Steinar */

>From 03318604cc6ad9def451784da407e6fcd6af4705 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see
if the running kernel supports this functionality, if just trying mmap() is
not acceptable.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c  | 230 +-
 include/uapi/linux/usbdevice_fs.h |   1 +
 2 files changed, 206 insertions(+), 25 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..3349222 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,108 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+ 

Re: Infrastructure for zerocopy I/O

2015-12-07 Thread Steinar H. Gunderson
On Mon, Dec 07, 2015 at 10:45:55AM -0500, Alan Stern wrote:
>> Here we are. Lightly tested, I believe all comments should be addressed.
> This looks quite good.  I have only a couple of comments.

Excellent. I'm sorry we've needed so many rounds; this is what happens when
you pick up someone else's patch against a code base you've never touched
before. :-)

>>  - Zerocopy; data no longer needs to be copied between the userspace
>>and the kernel, but can instead be read directly by the driver from
>>userspace's buffers. This works for both bulk and isochronous transfers;
>>isochronous also no longer need to memset() the buffer to zero to avoid
>>leaking kernel data.
> Actually it now works for control and interrupt too, although there's
> not much point using it for them.

Updated description.

>> +as->usbm = find_memory_area(ps, uurb);
>> +if (IS_ERR(as->usbm)) {
>> +ret = PTR_ERR(as->usbm);
>> +goto error;
> Hmmm, and then what will happen when the error-handling code does this:

Urg, you pointed this out earlier and I didn't fix it. I'm setting it to
NULL in the error path now.

>> +}
>> +
>> +/* do not use SG buffers when memory mapped segments
>> + * are in use
>> + */
>> +if (!as->usbm) {
> You mean "if (as->usbm)".

Yes. Test now inverted.

>> +num_sgs = 0;
>> +}
> Braces aren't needed.

Went to the dentist to take them off.

> It looks odd repeating the "if (as->usbm)" test like this.  You can merge
> the stuff here into the "else" clause of the preceding code.

Merged. Updated patch below.

/* Steinar */

>From ec536f8abc16d9a920a0891db3bc9b6c05c2c43f Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for all kinds of transfers (even if
   nonsensical for control and interrupt transfers); isochronous also
   no longer need to memset() the buffer to zero to avoid leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 227 ++-
 1 file changed, 203 insertions(+), 24 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..5faf1a0 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps)

Re: Infrastructure for zerocopy I/O

2015-12-06 Thread Steinar H. Gunderson
On Sun, Dec 06, 2015 at 01:06:08AM +0100, Steinar H. Gunderson wrote:
> I'll try to update your patch with the other suggestions tomorrow. Thanks!

Here we are. Lightly tested, I believe all comments should be addressed.

/* Steinar */

>From 73833276a4f359c35edffc2a741dba57f2e82a12 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for both bulk and isochronous transfers;
   isochronous also no longer need to memset() the buffer to zero to avoid
   leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 202 ---
 1 file changed, 192 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..40f988e 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
spinlock_t lock;/* protects the async urb lists */
struct list_head async_pending;
struct list_head async_completed;
+   struct list_head memory_list;
wait_queue_head_t wait; /* wake up if a request completed */
unsigned int discsignr;
struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+   struct list_head memlist;
+   int vma_use_count;
+   int urb_use_count;
+   u32 size;
+   void *mem;
+   dma_addr_t dma_handle;
+   unsigned long vm_start;
+   struct usb_dev_state *ps;
+};
+
 struct async {
struct list_head asynclist;
struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
void __user *userbuffer;
void __user *userurb;
struct urb *urb;
+   struct usb_memory *usbm;
unsigned int mem_usage;
int status;
u32 secid;
@@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps)
ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+   struct usb_dev_state *ps = usbm->ps;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   --*count;
+   if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+   list_del(>memlist);
+   spin_unlock_irqrestore(>lock, flags);
+
+   usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+   usbm->dma_handle);
+   usbfs_decrease_memory_usage(
+   usbm->size + sizeof(struct usb_memory));
+   kfree(usbm);
+   } else {
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>ps->lock, flags);
+   ++usbm->vma_use_count;
+   spin_unlock_irqrestore(>ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = vma->vm_private_data;
+
+   dec_usb_memory_use_count(usbm, >vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+   .open = usbdev_vm_open,
+   .close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct usb_memory *usbm = NULL;
+   struct usb_dev_state *ps = file->private_data;
+ 

Re: Infrastructure for zerocopy I/O

2015-12-05 Thread Steinar H. Gunderson
On Sat, Dec 05, 2015 at 12:36:51PM -0500, Alan Stern wrote:
> I meant that this "if" statement should test only uurb_start.  If the 
> test succeeds, a nested "if" statement should test buffer_length and 
> return an error if it is too big.
> 
> That's as opposed to the way you have it now, where if buffer_length is 
> too big you return NULL.  The system would then try to treat the 
> coherent memory as a normal memory buffer, which may not be a good 
> idea.

OK. Why is it a problem to treat it as a normal memory buffer? (I understand
it would be slower, of course.)

>> I can't see
>> offhand another way to call it only once during the function, save for a bool
>> that says if we called it or not.
> Simply call it at the spot where you initialize as->usbm to NULL.  Then 
> it will always run exactly once.

OK, done.

> You don't really need to do it earlier.  An unnecessary calculation of
> num_sgs won't hurt.

Is it unnecessary? I thought the test was really to force num_sgs==0 for the
DMA case, not to save a little arithmetic.

> Comments below.  I'll skip much of the patch because it looks pretty
> good.

Good, I guess we're finally converting on something acceptable to all parties 
:-)

>> +ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
>> +if (ret) {
>> +kfree(usbm);
>> +return ret;
>> +}
> This part should come first.  If allocating usbm would put us over the
> memory limit, we want to know before we do the allocation.

Done.

>> +usbm->vm_start = vma->vm_start;
>> +usbm->vma_use_count = 1;
> I would move these two lines before the remap_pfn_range() call.  Then
> you can use >vma_use_count as the second argument to
> dec_usb_memory_use_count() and avoid the awkward "if (count)" test at 
> the start of that routine.

Done.

> So this would be:
> 
>   if (uurb_start >= iter->vm_start &&
>   uurb_start < iter->vm_start + iter->size) {
>   if (uurb->buffer_length > iter->vm_start + iter->size -
>   uurb_start) {
>   usbm = ERR_PTR(-EINVAL);
>   } else {
>   usbm = iter;
>   usbm->urb_use_count++;
>   }
>   break;
>   }
> 
> (That's with the file's convention for continuation lines.)

Done.

> Much of this function is still subject to change, so I won't look at it 
> now.

OK.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From 53f132d6fcd9117f50647e604487593adfe85990 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for both bulk and isochronous transfers;
   isochronous also no longer need to memset() the buffer to zero to avoid
   leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 202 ---
 1 file changed, 192 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..6b253e8 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* prote

Re: Infrastructure for zerocopy I/O

2015-12-05 Thread Steinar H. Gunderson
On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote:
> Well, it's partly a matter of taste.  But there's also a race: This 
> code adds usbm to ps->memory_list (making it available to URB 
> submissions running on other CPUs) before usbm has been completely 
> initialized (vm_start isn't set yet).

OK. Joined the functions.

> find_memory_area() is used in only one place.  You can put everything 
> together if you move usbdev_mmap() and associated routines up near the 
> start of the file, after usbdev_read().

Done. I hope I got it the way you wanted.

> (Yes, I did.)  On further thought, testing uurb_start is sufficient.
> If uurb->buffer_length then turns out to be too big, the function
> should return an error (or rather, an ERR_PTR) and the URB submission
> should fail.

I don't understand “should” here. I read it as “you should change your
function for this”, but this doesn't rhyme with that uurb_start should be
sufficient. Do you want me to just remove the check? Is there something else
that would guard against it?

>> Done. (I saw you used a plural; is there more than this one?)
> Well, alloc_urb_memory() also can fail.

I can't find this function. What do you mean?

>> New patch attached. However, it's not working; I'm getting several of these
>> messages:
>> 
>>   [   28.796244] DMAR: Allocating domain for 2-2 failed
> I don't know what the reason is for that.  It may be that your kernel 
> isn't configured to allocate as much coherent memory as you are asking 
> for.  We'll have to investigate further, maybe ask somebody who knows 
> more about how the DMA subsystem works.

I found out why; I had messed up the transfer_dma address calculation,
subtracting the kernel memory address instead of the VM address. After that,
the patch works again.

>> +struct usb_memory {
>> +struct list_head memlist;
>> +int vma_use_count;
>> +int usb_use_count;
> This name has disturbed me.  It seems like urb_use_count would be more
> descriptive ("urb" rather than "usb").

Changed.

> The GFP_DMA32 isn't necessary now.  dma_zalloc_coherent() is smart
> enough to allocate memory that is compatible with the device's DMA
> mask.  If the hardware can do 64-bit DMA then the buffer doesn't need
> to be located in the first 4 GB.

Done.

> By the way, the convention in this file is to indent continuation
> lines by two tab stops beyond the first line, not to align things with
> an open paren or anything else.  (Applies elsewhere too.)

Done. (That's an unusual convention.)

> I would allocate these two in the opposite order.  Just because the
> DMA routines involve more work than kzalloc/kfree.

Done. My thought, by the way, was that the DMA routines were much more likely
to fail.

>> +usbm->mem = mem;
>> +usbm->dma_handle = dma_handle;
>> +usbm->size = size;
>> +usbm->ps = ps;
>> +spin_lock_irqsave(>lock, flags);
>> +list_add_tail(>memlist, >memory_list);
>> +spin_unlock_irqrestore(>lock, flags);
> Like I mentioned earlier, it shouldn't be added to the list until it
> is ready to be used.

Done.

>>  static void free_async(struct async *as)
>>  {
>> +struct usb_memory *usbm = NULL;
> Not needed any more.

Done.

>> +if (find_memory_area(ps, uurb) == NULL) {
> Don't call find_memory_area() twice!  Save this result and use it
> later.  Also, move this call up before the "switch" statement so that
> it will apply to all transfer types, not just bulk.
> 
> Suitable adjustments will be needed in the rest of this routine, but
> nothing that requires additional review comments.

This part I don't understand. You want to populate usbm (by calling
find_memory_area()) unconditionally, also for control transfers? I can't see
offhand another way to call it only once during the function, save for a bool
that says if we called it or not.

New patch attached. Note that I haven't had the chance to retest after some
of the latest changes.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From d03fc262072183887825f889996244e22ff74438 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for both bulk and isochronous transfers;
   isochronous also no longer need to memset() the buffer to zero to avoid
   leaking kernel data.

 - Once the buffers are allocated, USB transfers ca

Re: Infrastructure for zerocopy I/O

2015-12-05 Thread Steinar H. Gunderson
On Sat, Dec 05, 2015 at 05:12:03PM -0500, Alan Stern wrote:
> To tell the truth, I'm not sure whether it would be a problem or not.  
> That's why I said it "may" not be a good idea.

Fair enough.

>>> You don't really need to do it earlier.  An unnecessary calculation of
>>> num_sgs won't hurt.
>> Is it unnecessary? I thought the test was really to force num_sgs==0 for the
>> DMA case, not to save a little arithmetic.
> Well, if you calculate num_sgs and then force it to 0 anyway, the 
> calculation was unnecessary, right?

But the logic never calculates num_sgs if usbm == NULL? The if terminates
early and num_sgs stays 0.

Granted, the last version of the patch botched this and inverted the
condition. I'll fix that.

> BTW, in the future, could you put your patches in the body of the
> emails instead of making them attachments?  That's how we normally do 
> things on this mailing list; it makes replying and commenting on 
> patches easier.

Will do.

>> -num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE);
>> -if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize)
>> -num_sgs = 0;
>> +/* do not use SG buffers when memory mapped segments
>> + * are in use
>> + */
>> +if (as->usbm) {
>> +num_sgs = DIV_ROUND_UP(uurb->buffer_length,
>> +USB_SG_SIZE);
>> +if (num_sgs == 1 ||
>> +num_sgs > ps->dev->bus->sg_tablesize) {
>> +num_sgs = 0;
>> +}
>> +}
> No, no.  Leave this part of the code unchanged.  as hasn't even been
> allocated yet.

Good point. I did warn you about the lack of testing... :-)

By “unchanged”, do you mean unchanged from previous patch or from the
original? Because this was here all along from Markus' version of the patch.

> As pointed out repeatedly by the kbuild test robot, this should be
> IS_ERR, not IS_ERR_VALUE.  Also, you need to set as->usbm back to NULL 
> before jumping.
> 
> At this point, set num_sgs to 0 if as->usbm is non-NULL.

Didn't you just point out that this would be redundant calculation?

I'll try to update your patch with the other suggestions tomorrow. Thanks!

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-04 Thread Steinar H. Gunderson
On Thu, Dec 03, 2015 at 04:31:35PM -0500, Alan Stern wrote:
>> Seemingly controller is already a pointer, so the & is redundant. No idea why
>> it didn't crash plain out. Fixing that, I can allocate, although I have some
>> other bug causing an oops somewhere else.
> Ah, an easy mistake to miss.  But you should have gotten a warning from 
> the compiler about incompatible pointer types.

I did. And I missed it, because a) things were scrolling fast, and
b) I'm used to coding with -Werror, where you simply cannot ignore this kind
of thing by accident :-) (Not to mention that I usually code C++, where this
is plain out an error, not a warning.)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-04 Thread Steinar H. Gunderson
On Fri, Dec 04, 2015 at 01:29:05AM +0100, Steinar H. Gunderson wrote:
> I must be doing something wrong, because I don't seem to get any frames
> from my video capture, and when I close the application, I get a slowpath
> warning:

OK, the slowpath warning is a red herring -- it's just because I do the free
under a spinlock, and I can fix that.

There's still something that causes me to not get any frames, though.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-03 Thread Steinar H. Gunderson
On Thu, Dec 03, 2015 at 10:37:27AM -0500, Alan Stern wrote:
>> FWIW, I tried against >dev->bus->controller, and it didn't give me any
>> error messages, but still returns NULL.
> I'm at a loss.  

Seemingly controller is already a pointer, so the & is redundant. No idea why
it didn't crash plain out. Fixing that, I can allocate, although I have some
other bug causing an oops somewhere else.

But you still think usb_alloc_coherent() is the better way to go? Should be
easy enough to change, just let me know.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-03 Thread Steinar H. Gunderson
On Thu, Dec 03, 2015 at 10:18:20PM +0100, Steinar H. Gunderson wrote:
> But you still think usb_alloc_coherent() is the better way to go? Should be
> easy enough to change, just let me know.

I must be doing something wrong, because I don't seem to get any frames
from my video capture, and when I close the application, I get a slowpath
warning:

[  188.781392] [ cut here ]
[  188.781409] WARNING: CPU: 0 PID: 1747 at 
include/asm-generic/dma-mapping-common.h:274 hcd_buffer_free+0xe9/0x130 
[usbcore]()
[  188.781422] Modules linked in: ctr ccm joydev nls_utf8 nls_cp437 vfat fat 
ext2 arc4 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp 
snd_hda_codec_realtek snd_hda_codec_generic kvm_intel snd_hda_codec_hdmi 
iTCO_wdt iTCO_vendor_support kvm crct10dif_pclmul crc32_pclmul crc32c_intel 
jitterentropy_rng sha256_generic hmac drbg evdev aesni_intel psmouse aes_x86_64 
iwlmvm glue_helper lrw serio_raw gf128mul ablk_helper pcspkr cryptd mac80211 
efivars i2c_i801 sg thinkpad_acpi lpc_ich iwlwifi nvram snd_hda_intel 
rtsx_pci_ms memstick snd_hda_codec snd_seq cfg80211 snd_hwdep mei_me 
snd_seq_device snd_hda_core i915 snd_pcm mei snd_timer wmi i2c_algo_bit 
drm_kms_helper snd ac drm rfkill tpm_tis tpm battery i2c_core soundcore video 
button processor autofs4 ext4 crc16 mbcache jbd2 dm_mod sd_mod
[  188.781526]  rtsx_pci_sdmmc mmc_core ahci libahci libata scsi_mod ehci_pci 
ehci_hcd rtsx_pci mfd_core xhci_pci xhci_hcd e1000e ptp usbcore pps_core 
usb_common thermal
[  188.781550] CPU: 0 PID: 1747 Comm: QXcbEventReader Tainted: GW   
4.3.0-usb+ #34
[  188.781554] Hardware name: LENOVO 20ALCT01WW/20ALCT01WW, BIOS GIET83WW (2.33 
) 08/25/2015
[  188.781557]  c003bb40 812a5113  
81068a2c
[  188.781563]  4000 8802340c1098 8189c400 
8802345d8400
[  188.781570]  0282 c002e279 88020c968000 
ffeb
[  188.781578] Call Trace:
[  188.781584]  [] ? dump_stack+0x40/0x5d
[  188.781596]  [] ? warn_slowpath_common+0x7c/0xb0
[  188.781613]  [] ? hcd_buffer_free+0xe9/0x130 [usbcore]
[  188.781630]  [] ? dec_usb_memory_use_count+0x56/0x80 
[usbcore]
[  188.781647]  [] ? free_async+0xa2/0xf0 [usbcore]
[  188.781666]  [] ? usbdev_release+0xde/0x130 [usbcore]
[  188.781680]  [] ? __fput+0xc5/0x1d0
[  188.781694]  [] ? task_work_run+0x6f/0x90
[  188.781711]  [] ? do_exit+0x36e/0xa80
[  188.781724]  [] ? check_preempt_curr+0x74/0x80
[  188.781737]  [] ? ttwu_do_wakeup+0xf/0xc0
[  188.781753]  [] ? try_to_wake_up+0x3e/0x330
[  188.781767]  [] ? do_group_exit+0x34/0xb0
[  188.781783]  [] ? get_signal+0x297/0x680
[  188.781799]  [] ? do_signal+0x1e/0x670
[  188.781812]  [] ? wake_up_q+0x60/0x60
[  188.781828]  [] ? vfs_write+0x13e/0x180
[  188.781834]  [] ? prepare_exit_to_usermode+0xa3/0xf0
[  188.781840]  [] ? int_ret_from_sys_call+0x25/0x8f
[  188.781844] ---[ end trace 8fd705c87c4a8887 ]---
[  188.781850] [ cut here ]

I'm attaching the current patch for reference. I know I have comments of
yours to attend to, but I'd like to try to figure out what's wrong before I
embark on polish.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From e00443ca99bf5bb4871a7714ce244b63b35e8f83 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for both bulk and isochronous transfers;
   isochronous also no longer need to memset() the buffer to zero to avoid
   leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 201 ++

Re: Infrastructure for zerocopy I/O

2015-12-02 Thread Steinar H. Gunderson
On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote:
>>   [   28.796244] DMAR: Allocating domain for 2-2 failed
> I don't know what the reason is for that.  It may be that your kernel 
> isn't configured to allocate as much coherent memory as you are asking 
> for.  We'll have to investigate further, maybe ask somebody who knows 
> more about how the DMA subsystem works.

I'm thinking; maybe should the memory not be allocated against the USB
device, but against the controller it hangs on? (Note that it complains about
allocating the _domain_, not the memory itself.) Do you know if that's
possible from this point in the code?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-02 Thread Steinar H. Gunderson
On Thu, Dec 03, 2015 at 12:51:14AM +0100, Steinar H. Gunderson wrote:
> I'm thinking; maybe should the memory not be allocated against the USB
> device, but against the controller it hangs on? (Note that it complains about
> allocating the _domain_, not the memory itself.) Do you know if that's
> possible from this point in the code?

FWIW, I tried against >dev->bus->controller, and it didn't give me any
error messages, but still returns NULL.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-12-01 Thread Steinar H. Gunderson
>> + */
>> +if (list_empty(>memory_list)) {
> No, first call find_memory_area().  If the result is NULL then do this.

Done.

>> +as->ps = ps;
>> +
> Why move this?  Isn't it fine where it is?

Reverted. (It was part of some other cleanup that I tried, I think.)

>>  as->userurb = arg;
>> -if (is_in && uurb->buffer_length > 0)
>> +if (is_in && uurb->buffer_length > 0 && usbm == NULL)
> I think you should keep this even when usbm is not NULL.  No?

No, then processcompl() will try to copy data there, which defeats the point
of zerocopy. What other use would the userbuffer member have?

>>  as->userbuffer = uurb->buffer;
>>  else
>>  as->userbuffer = NULL;
> When you use dma_zalloc_coherent(), you have to tell the USB core that
> the buffer is already mapped for DMA by setting the URB_NO_TRANSFER_DMA_MAP
> flag in urb->transfer_flags and by storing the DMA address in
> urb->transfer_dma.

Done. I have one question, though: Should I offset the transfer_dma address
to match with the offset in the buffer, or is this just a handle? (I've
assumed the latter, but I'm not at all sure it's correct.)

>> +if (usbm)
>> +dec_use_count(usbm, >usb_use_count);
>>  kfree(isopkt);
>>  kfree(dr);
>>  if (as)
> You have to change processcompl().  The copy_urb_data() call isn't
> needed when zerocopy is being used -- in fact, it rather defeats the 
> purpose of zerocopy!

See previous comment on this. :-)

>> +if (remap_pfn_range(vma, vma->vm_start,
>> +virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>> +size, vma->vm_page_prot) < 0)
>> +return -EAGAIN;
> As Oliver pointed out, the memory needs to be reclaimed and accounted
> for on the failure pathways.

Done. (I saw you used a plural; is there more than this one?)

New patch attached. However, it's not working; I'm getting several of these
messages:

  [   28.796244] DMAR: Allocating domain for 2-2 failed

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From c470a7907df4b1ff526900400e8ba0a1f4cca529 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

This is essentially a patch by Markus Rechberger with some updates.
The original can be found at

  http://sundtek.de/support/devio_mmap_v0.4.diff

This version has the following changes:

  - Rebased against a newer kernel (with some conflicts fixed).
  - Fixed all checkpatch violations and some style issues.
  - Fixes an issue where isochronous transfers would not really be
zero-copy, but go through a pointless memcpy from one area to
itself.
  - Ask for cached memory instead of uncached.
  - Drop the custom ioctl; use mmap only for allocation.

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 202 ---
 1 file changed, 192 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..95c9fed 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
 	u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int usb_use_count;
+	u32 size;
+	void *mem;
+	dma_addr_t dma_handle;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 struct async {
 	struct list_head asynclist;
 	struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
 	void __user *userbuffer;
 	void __user *userurb;
 	struct urb *urb;
+	struct usb_memory *usbm;
 	unsigned int mem_usage;
 	int status;
 	u32 secid;
@@ -157,6 +171,75 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static struct usb_memory *
+alloc_usb_memory(struct usb_dev_state *ps, size_t size)
+{
+	struct usb_memory *usbm;
+	void *mem;
+	unsigned long flags;
+	dma_addr_t dma_handle;
+
+	mem = dma_zalloc_coherent(>dev->dev, size, _handle,
+  GFP_KERNEL | GFP_DMA32);
+	if (!mem)
+		return NULL;
+
+	usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL);
+	if (!usbm) {
+		dma_free_coherent(>dev->dev, size, mem

Re: Infrastructure for zerocopy I/O

2015-11-26 Thread Steinar H. Gunderson
On Thu, Nov 26, 2015 at 01:26:32AM +0100, Steinar H. Gunderson wrote:
>> There are numerous smaller issues: The allocated memory needs to be 
>> charged against usbfs_memory_usage
> I'll fix this.

Fixed in updated patch (attached).

> I'll fix this, too.

Also fixed.

Now also checkpatch clean.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From 558466c34b1cbb8f12f3c56042558fdce2e7f87e Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

This is essentially a patch by Markus Rechberger with some updates.
The original can be found at

  http://sundtek.de/support/devio_mmap_v0.4.diff

This version has the following changes:

  - Rebased against a newer kernel (with some conflicts fixed).
  - Fixed all checkpatch violations and some style issues.
  - Fixes an issue where isochronous transfers would not really be
zero-copy, but go through a pointless memcpy from one area to
itself.
  - Ask for cached memory instead of uncached.
  - Drop the custom ioctl; use mmap only for allocation.

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 203 ---
 1 file changed, 193 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..9a1a7d6 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -69,6 +69,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -96,6 +97,17 @@ struct async {
 	u8 bulk_status;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int usb_use_count;
+	u32 offset;
+	u32 size;
+	void *mem;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 static bool usbfs_snoop;
 module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic");
@@ -157,6 +169,74 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static struct usb_memory *
+alloc_usb_memory(struct usb_dev_state *ps, size_t size)
+{
+	struct usb_memory *usbm;
+	void *mem;
+	unsigned long flags;
+
+	mem = alloc_pages_exact(size, GFP_KERNEL | GFP_DMA32);
+	if (!mem)
+		return NULL;
+
+	usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL);
+	if (!usbm) {
+		free_pages_exact(mem, size);
+		return NULL;
+	}
+	memset(mem, 0x0, PAGE_SIZE << get_order(size));
+	usbm->mem = mem;
+	usbm->offset = virt_to_phys(mem);
+	usbm->size = size;
+	usbm->ps = ps;
+	spin_lock_irqsave(>lock, flags);
+	list_add_tail(>memlist, >memory_list);
+	spin_unlock_irqrestore(>lock, flags);
+
+	return usbm;
+}
+
+static void dec_use_count(struct usb_memory *usbm, int *count)
+{
+	struct usb_dev_state *ps = usbm->ps;
+	unsigned long flags;
+
+	spin_lock_irqsave(>lock, flags);
+	WARN_ON(count != >usb_use_count && count != >vma_use_count);
+	--*count;
+	if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) {
+		list_del_init(>memlist);
+		free_pages_exact(usbm->mem, usbm->size);
+		usbfs_decrease_memory_usage(
+			usbm->size + sizeof(struct usb_memory));
+		kfree(usbm);
+	}
+	spin_unlock_irqrestore(>lock, flags);
+}
+
+static struct usb_memory *
+find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
+{
+	struct usb_memory *usbm = NULL, *iter;
+	unsigned long flags;
+	unsigned long uurb_start = (unsigned long)uurb->buffer;
+
+	spin_lock_irqsave(>lock, flags);
+	list_for_each_entry(iter, >memory_list, memlist) {
+		if (iter->usb_use_count == 0 &&
+		uurb_start >= iter->vm_start &&
+		uurb->buffer_length <= iter->vm_start - uurb_start +
+(PAGE_SIZE << get_order(iter->size))) {
+			usbm = iter;
+			usbm->usb_use_count++;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(>lock, flags);
+	return usbm;
+}
+
 static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig)
 {
 	loff_t ret;
@@ -288,6 +368,9 @@ static struct async *alloc_async(unsigned int numisoframes)
 
 static void free_async(struct async *as)
 {
+	struct usb_memory *usbm = NULL, *usbm_iter;
+	unsigned long flags;
+	struct usb_dev_state *ps = as->ps;
 	int i;
 
 	put_pid(as->pid);
@@ -297,8 +380,22 @@ static void free_async(struct async *as)
 		if (sg_page(>urb->sg[i]))
 			kfree(sg_virt(>urb->sg[i]));
 	}
+
+	spin_lock_irqsave(>lock, flags);
+	list_for_each_entry(usbm_iter, >memory_list, memlist) {
+		if (usbm_iter->mem == as->urb->tran

Re: Infrastructure for zerocopy I/O

2015-11-25 Thread Steinar H. Gunderson
On Wed, Nov 25, 2015 at 10:29:53AM -0500, Alan Stern wrote:
> I want to see a modified version of your patch.  Several things need to 
> be changed or fixed, but the major change needs to be the way memory is 
> allocated.  It should be done as part of the mmap system call, not as a 
> separate ioctl.
> 
> There are numerous smaller issues: The allocated memory needs to be 
> charged against usbfs_memory_usage, the memory should be allocated 
> using dma_zalloc_coherent instead of kmalloc, proc_do_submiturb should 
> check that the buffer starts anywhere within the allocated memory 
> rather than just at the beginning, and so on.
> 
> I'd make these changes myself, but I just don't have enough free time 
> now.  If anybody else would like to work on them, please feel free.

I can take a stab at fixing some of these. Perhaps not all of them,
but maybe it will be a useful starting point.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-25 Thread Steinar H. Gunderson
On Wed, Nov 25, 2015 at 10:29:53AM -0500, Alan Stern wrote:
> I want to see a modified version of your patch.  Several things need to 
> be changed or fixed, but the major change needs to be the way memory is 
> allocated.  It should be done as part of the mmap system call, not as a 
> separate ioctl.

I took a stab at this. The attached patch doesn't fix any of your other
issues, but could you take a look at whether the mmap part looks sane?
I've verified that it works in practice with my own code, and see no
performance regressions. It is still the case that you can't mix mmap
and non-mmap transfers on the same device; I suppose that should be governed
by a zerocopy flag instead of “are there any mmap-ed areas”.

Let me go through your other issues:

> There are numerous smaller issues: The allocated memory needs to be 
> charged against usbfs_memory_usage

I'll fix this.

> the memory should be allocated using dma_zalloc_coherent instead of
> kmalloc, 

dma_zalloc_coherent returns a dma_handle in addition to the memory itself;
should we just throw this away?

> proc_do_submiturb should 
> check that the buffer starts anywhere within the allocated memory 
> rather than just at the beginning

I'll fix this, too.

> , and so on.

Clarification appreciated ;-)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From 2bb0f7b91d41de0cbd1e8984612252d101fbb2ca Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

This is essentially a patch by Markus Rechberger with some updates.
The original can be found at

  http://sundtek.de/support/devio_mmap_v0.4.diff

This version has the following changes:

  - Rebased against a newer kernel (with some conflicts fixed).
  - Fixed most checkpatch violations (some remain) and some style issues.
  - Fixes an issue where isochronous transfers would not really be
zero-copy, but go through a pointless memcpy from one area to
itself.
  - Ask for cached memory instead of uncached.
  - Drop the custom ioctl; use mmap only for allocation.

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>
---
 drivers/usb/core/devio.c | 188 +++
 1 file changed, 175 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..a17263d56 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -69,6 +69,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -96,6 +97,17 @@ struct async {
 	u8 bulk_status;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int usb_use_count;
+	u32 offset;
+	u32 size;
+	void *mem;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 static bool usbfs_snoop;
 module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic");
@@ -157,6 +169,22 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_use_count(struct usb_memory *usbm, int *count)
+{
+	struct usb_dev_state *ps = usbm->ps;
+	unsigned long flags;
+
+	spin_lock_irqsave(>lock, flags);
+	WARN_ON(count != >usb_use_count && count != >vma_use_count);
+	--*count;
+	if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) {
+		list_del_init(>memlist);
+		free_pages_exact(usbm->mem, usbm->size);
+		kfree(usbm);
+	}
+	spin_unlock_irqrestore(>lock, flags);
+}
+
 static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig)
 {
 	loff_t ret;
@@ -288,6 +316,9 @@ static struct async *alloc_async(unsigned int numisoframes)
 
 static void free_async(struct async *as)
 {
+	struct usb_memory *usbm = NULL, *usbm_iter;
+	unsigned long flags;
+	struct usb_dev_state *ps = as->ps;
 	int i;
 
 	put_pid(as->pid);
@@ -297,8 +328,22 @@ static void free_async(struct async *as)
 		if (sg_page(>urb->sg[i]))
 			kfree(sg_virt(>urb->sg[i]));
 	}
+
+	spin_lock_irqsave(>lock, flags);
+	list_for_each_entry(usbm_iter, >memory_list, memlist) {
+		if (usbm_iter->mem == as->urb->transfer_buffer) {
+			usbm = usbm_iter;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(>lock, flags);
+
 	kfree(as->urb->sg);
-	kfree(as->urb->transfer_buffer);
+	if (usbm == NULL)
+		kfree(as->urb->transfer_buffer);
+	else
+		dec_use_count(usbm, >usb_use_count);
+
 	kfree(as->urb->setup_packet);
 	usb_free_urb(as->urb);
 	usbfs_decrease_memory_usage(as->mem_usage);
@@ -910,6

Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote:
>   Check that userspace requested zerocopy (otherwise the user 
>   program might try to access other data stored in the same cache 
>   lines as the buffer while the I/O is in progres);

Needed for send only, right? For receive, I guess you're not allowed to
modify the buffer while the transfer is going on anyway.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Tue, Nov 17, 2015 at 07:17:31PM +0100, Markus Rechberger wrote:
> 1. the memset on isochronous transfers to empty the buffers in order
> to avoid leaking raw memory to userspace (this costs a lot on intel
> Atoms and is also noticeable on other systems).
> 
> 2. the memory fragmentation. Seems like recent systems have a better
> performance here since we did not get that report for several months
> now, or maybe the user behavior changed.
> Some older Linux systems (maybe 2-3 years old) triggered this issue
> way more often.

I guess if we get transparent zerocopy, both of these are going away
just like with your patch, right? The only difference is really who sets up
the memory area (the kernel or not).

Alan, could we perhaps let the zerocopy flag make the request fail (instead
of going through a bounce buffer) if direct DMA is not possible? That way,
it would be quite obvious that you need to allocate the memory some other way
instead of silently hitting the issues Markus mention.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote:
> Is there an API for allocating user memory below 4 GB?  Would a new 
> MMAP flag be acceptable?

MAP_32BIT (to mmap) can seemingly do this, but from the man page, it sounds
more like a kludge than anything else.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Tue, Nov 17, 2015 at 02:13:49PM -0500, Alan Stern wrote:
> But what other way of allocating memory is there?

For my part, GPU memory versus malloc(). (You can ask OpenGL to permanently
map up a chunk of GPU memory for you into userspace, but you have no
guarantees as of if it's DMA-able. But typical memory from malloc() might.)

It might be overengineering things, though. I'd be fairly happy if I only had
zerocopy in most common situations. (Does xHCI have this 32-bit limitation?)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Tue, Nov 17, 2015 at 03:01:39PM -0500, Alan Stern wrote:
>> On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote:
>>> Is there an API for allocating user memory below 4 GB?  Would a new 
>>> MMAP flag be acceptable?
>> MAP_32BIT (to mmap) can seemingly do this, but from the man page, it sounds
>> more like a kludge than anything else.
> No, that's not what I'm talking about.  MAP_32BIT means that the new 
> memory's _virtual_ address should be below 2 GB.  I'm talking about the 
> _physical_ address.

Sorry, I misunderstood the documentation.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-17 Thread Steinar H. Gunderson
On Tue, Nov 17, 2015 at 03:16:55PM -0500, Alan Stern wrote:
>>> But what other way of allocating memory is there?
>> For my part, GPU memory versus malloc(). (You can ask OpenGL to permanently
>> map up a chunk of GPU memory for you into userspace, but you have no
>> guarantees as of if it's DMA-able. But typical memory from malloc() might.)
> I don't think there's any reason to expect malloc to provide memory
> where you need it.  Also, since the memory it provides isn't locked,
> the kernel can move it to any physical address.

Well, fair enough, given that it can be moved. But for a system with <4GB RAM,
I would be pretty sure.

(In my hypothetical situation, my priority list would be 1. Zerocopy to GPU
memory and process using compute shaders, 2. Zerocopy to CPU memory and
process using AVX2, 3. Non-zerocopy to GPU memory. But in reality, I'm
probably too lazy to maintain two code paths.)

> xHCI always uses 64-bit addresses.  But many EHCI controllers don't, 
> and only a few of the EHCI platform drivers support 64-bit DMA.

OK, sure. But are systems with USB2 only and more than 4GB of RAM common?
(And will they not increasingly die out, if nothing else as USB3 becomes
commonplace?)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-16 Thread Steinar H. Gunderson
On Mon, Nov 16, 2015 at 01:49:50AM +0100, Steinar H. Gunderson wrote:
> Just so we're on the same page, I cleaned up the patch I'm now using, and I'm
> attaching it here. You said the next step would be changing the memory
> allocation interface; I guess I could give it a shot, but I doubt I would
> understand the subtleties involved.

Cc-ing Markus Rechberger (original patch author), as he wanted to be kept in
the loop. Also:

Signed-off-by: Steinar H. Gunderson <se...@google.com>
Signed-off-by: Markus Rechberger <mrechber...@gmail.com>

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-16 Thread Steinar H. Gunderson
On Mon, Nov 16, 2015 at 01:29:58PM -0500, Alan Stern wrote:
> A proposed patch has been posted
> (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not
> convinced that it is the best approach.  For instance, it always tries 
> to get contiguous pages and so is vulnerable to memory fragmentation.

What Alan says is right, but there's one additional piece of information:
The patch also makes allocation an up-front issue (the memory block allocated
can be reused across transfers), so if you can allocate at the start of the
program, you will not be affected by later fragmentation.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infrastructure for zerocopy I/O

2015-11-16 Thread Steinar H. Gunderson
On Mon, Nov 16, 2015 at 07:55:45PM +0100, Christoph Hellwig wrote:
>> A proposed patch has been posted
>> (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not
>> convinced that it is the best approach.  For instance, it always tries 
>> to get contiguous pages and so is vulnerable to memory fragmentation.
> This looks totally crazy to me.  What's the use case for it?

The original use case: DVB capture on embedded devices.

My use case: USB3 uncompressed HD video capture (from multiple cards).

Both of these hit (beyond the speed boost from zerocopy) the problem that
as time goes by, memory gets more fragmented and usbfs fails allocation.
Allocating memory up-front solves that.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-15 Thread Steinar H. Gunderson
On Sat, Nov 14, 2015 at 12:36:48PM -0500, Alan Stern wrote:
> Hmmm.  If the shared memory can be accessed by the CPU using ordinary
> load and store instructions and it can be kmalloc'ed, then it should be
> usable for zerocopy I/O.  But again, only if it satisfies the
> restrictions imposed by the USB controller's DMA hardware.

I don't honestly know how DRM allocates its memory, and the proprietary
drivers from NVIDIA/AMD probably is a different story entirely. I guess the
only real way would be to have the kernel see if the memory given in is
acceptable, and that's probably a different approach from what this one does.

Just so we're on the same page, I cleaned up the patch I'm now using, and I'm
attaching it here. You said the next step would be changing the memory
allocation interface; I guess I could give it a shot, but I doubt I would
understand the subtleties involved.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From 4cf27fbf8000eb6a5ea75c87cd6315ed82d0 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <se...@google.com>
Date: Mon, 16 Nov 2015 01:36:38 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

This is essentially a patch by Markus Rechberger with some updates.
The original can be found at

  http://sundtek.de/support/devio_mmap_v0.4.diff

This version has the following changes:

  - Rebased against a newer kernel (with some conflicts fixed).
  - Fixed most checkpatch violations (some remain).
  - Fixes an issue where isochronous transfers would not really be
zero-copy, but go through a pointless memcpy from one area to
itself.
  - Ask for cached memory instead of uncached.
---
 drivers/usb/core/devio.c  | 229 +++---
 include/uapi/linux/usbdevice_fs.h |   8 ++
 2 files changed, 224 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 986abde..39e8c2b 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -69,6 +69,7 @@ struct usb_dev_state {
 	spinlock_t lock;/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait; /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -96,6 +97,16 @@ struct async {
 	u8 bulk_status;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int usb_use_count;
+	u32 offset;
+	u32 size;
+	void *mem;
+	unsigned long vm_start;
+};
+
 static bool usbfs_snoop;
 module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic");
@@ -288,6 +299,9 @@ static struct async *alloc_async(unsigned int numisoframes)
 
 static void free_async(struct async *as)
 {
+	struct usb_memory *usbm = NULL, *usbm_iter;
+	unsigned long flags;
+	struct usb_dev_state *ps = as->ps;
 	int i;
 
 	put_pid(as->pid);
@@ -297,8 +311,22 @@ static void free_async(struct async *as)
 		if (sg_page(>urb->sg[i]))
 			kfree(sg_virt(>urb->sg[i]));
 	}
+
+	spin_lock_irqsave(>lock, flags);
+	list_for_each_entry(usbm_iter, >memory_list, memlist) {
+		if (usbm_iter->mem == as->urb->transfer_buffer) {
+			usbm = usbm_iter;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(>lock, flags);
+
 	kfree(as->urb->sg);
-	kfree(as->urb->transfer_buffer);
+	if (usbm == NULL)
+		kfree(as->urb->transfer_buffer);
+	else
+		usbm->usb_use_count--;
+
 	kfree(as->urb->setup_packet);
 	usb_free_urb(as->urb);
 	usbfs_decrease_memory_usage(as->mem_usage);
@@ -910,6 +938,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(>list);
 	INIT_LIST_HEAD(>async_pending);
 	INIT_LIST_HEAD(>async_completed);
+	INIT_LIST_HEAD(>memory_list);
 	init_waitqueue_head(>wait);
 	ps->discsignr = 0;
 	ps->disc_pid = get_pid(task_pid(current));
@@ -938,6 +967,8 @@ static int usbdev_release(struct inode *inode, struct file *file)
 	struct usb_dev_state *ps = file->private_data;
 	struct usb_device *dev = ps->dev;
 	unsigned int ifnum;
+	struct list_head *p, *q;
+	struct usb_memory *tmp;
 	struct async *as;
 
 	usb_lock_device(dev);
@@ -962,6 +993,14 @@ static int usbdev_release(struct inode *inode, struct file *file)
 		free_async(as);
 		as = async_getcompleted(ps);
 	}
+
+	list_for_each_safe(p, q, >memory_list) {
+		tmp = list_entry(p, struct usb_memory, memlist);
+		list_del(p);
+		if (tmp->mem)
+			free_pages_exact(tmp->mem, tmp->size);
+		kfree(tmp);
+	}
 	kfree(ps);
 	return 0;
 }
@@ -1289,6 +1328,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 	struct usb_host_endpoint *ep;
 	struct async *as = NULL;
 	struct usb_ctrlrequest *dr = NULL;
+	struct usb_memory *usbm = NULL, *iter = NULL;
 	unsigned int u, totlen, isofrmlen;
 	int i, ret, is_in, num_sgs = 0, ifnum = -1

Re: Page allocation failure

2015-11-14 Thread Steinar H. Gunderson
On Sat, Nov 14, 2015 at 12:09:38PM -0500, Alan Stern wrote:
> And you probably couldn't use GPU memory for this purpose because (I
> assume) the USB host controller wouldn't be able to access it via DMA.

I guess that depends a bit; on my particular system, I'm using an Intel GPU,
so memory is shared. But I assume you can't do this if the memory is
dedicated and just mapped up from a dedicated GPU.

> Incidentally, I wonder how well this patch really does fix your 
> problem.  What happens if you run the old program until it fails with 
> the "unable to allocate memory" error and then try running the new 
> program?  I suspect the new program will also fail under those 
> circumstances.  The only advantage it has is that if the initial 
> allocations succeed then you don't have to worry about later 
> allocations failing, because the buffer memory is allocated only once.

You are right in that it doesn't solve at all the problem “can't allocate
memory on startup”, but that's not the problem I have at all. So it really
does solve my problem, but it does not solve all other related problems.

Incidentally, once it's really bad (without the patch), I can usually run for
a few frames before it starts failing again, so I would assume that the
result of the experiment you're proposing would actually be “it works well”.
Seemingly the memory fragmentation (or whatever it is) just increases the
probability of the allocation failing by a lot, it doesn't move it to 100%.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 12:00:53AM +0100, Steinar H. Gunderson wrote:
> Interesting. I hacked libusb to get the fd. Then I did USBDEVFS_ALLOC_MEMORY,
> which succeeded (well, as soon as I filled mem.size before calling),
> then mmap on it as described (which also succeeded), and everything _works_,
> but I don't think it's actually using zerocopy, because I still see
> copy_user_enhanced_fast_string() using a lot of CPU
> (with libusb_handle_events() in the call stack).

OK, so I did some more testing. It does actually seem to do _something_.
In particular, memset_erms() all but disappeared from my profile, which was
stated in the thread as one advantage of the patch.

I ran overnight for eight hours (one card, continuous stream at about
900 Mbit/sec, 128 kB buffers) and it didn't get any page allocation features.
Before the patch, it was unlikely but not impossible to run for that long.
I've kept it running, but it's weak evidence of solving the problem.

As for general performance, I'm unsure; from a quick glance at perf, I _may_
have more L3 misses when reading the data back in userspace, but I won't say
anything for sure here without a closer look.

So, in general I think it's good news, although I don't fully understand why
I still need the kernel-to-userspace copy for isochronous transfer.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 10:38:54AM -0500, Alan Stern wrote:
>> So, in general I think it's good news, although I don't fully understand why
>> I still need the kernel-to-userspace copy for isochronous transfer.
> Maybe you can add some debugging to copy_user_enhanced_fast_string().  
> Add a flag, and call dump_stack() in that routine if the flag is set.  
> Then set and clear this flag at the appropriate spots in devio.c.

What exactly am I looking for, beyond the stack trace the kernel already
gives me?

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 11:24:39AM -0500, Alan Stern wrote:
>> What exactly am I looking for, beyond the stack trace the kernel already
>> gives me?
> Find out where copy_user_enhanced_fast_string is being called from, and 
> using that information, figure out why it was called.  That will tell 
> you why this approach isn't yielding true zerocopy performance.

The stack trace is simple enough, although I fear there's some inlining going
on:

-9,50% 9,50%  app   [kernel.kallsyms]  [k] 
copy_user_enhanced_fast_string
   - copy_user_enhanced_fast_string
  - 9,45% __GI___ioctl
 - 9,30% op_handle_events
  handle_events
  libusb_handle_events_timeout_completed
  libusb_handle_events
  BMUSBCapture::usb_thread_func
  execute_native_thread_routine

I'm not honestly sure if the patch even tries to do zerocopy for isochronous
(it might be bulk only), but I'll try to understand what's going on.

For the better news: My program ran for eight hours more without hitting the
page allocation failures. So I suppose the patch (with cooperation from user
space, of course) really solves the immediate problem.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 06:50:27PM +0100, Steinar H. Gunderson wrote:
> The stack trace is simple enough, although I fear there's some inlining going
> on:

OK, I guess since copy_user_enhanced_fast_string is an assembler function,
the unwinding doesn't work properly. I added a dump_stack() in
copy_urb_data_to_user() instead, which is probably a better place:

[   38.535633] Call Trace:
[   38.535640]  [] ? dump_stack+0x40/0x5d
[   38.535652]  [] ? copy_urb_data_to_user+0x15/0x110 
[usbcore]
[   38.535654]  [] ? processcompl+0x8d/0x130 [usbcore]
[   38.535656]  [] ? usbdev_do_ioctl+0xa3/0x1310 [usbcore]
[   38.535659]  [] ? usbdev_ioctl+0x5/0x10 [usbcore]
[   38.535663]  [] ? do_vfs_ioctl+0x2be/0x490
[   38.535666]  [] ? ktime_get_ts64+0x3a/0xe0
[   38.535668]  [] ? SyS_ioctl+0x71/0x80
[   38.535672]  [] ? entry_SYSCALL_64_fastpath+0x12/0x71

There's still the jump from processcompl to copy_urb_data_to_user, but I
guess this is the more relevant one. (The dump_stack() triggers hundreds of
times per second.)

I tried just not setting as->userbuffer if usbm == NULL, and lo and behold,
it actually seems to help. I'm wondering if the copy is just from the buffer
to itself? copy_user_enhanced_fast_string is now down at 0.26%, ie.
effectively nothing.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 03:20:56PM -0500, Alan Stern wrote:
>> I tried just not setting as->userbuffer if usbm == NULL, and lo and behold,
>> it actually seems to help.
> In fact, there's no need to call copy_urb_data_to_user at all if the 
> buffer lies in the mmap'ed area.

usbm != NULL is meant to signal this. It only checks for exact match, though,
not containment. (Well, matching start; the size can be smaller.)

>> I'm wondering if the copy is just from the buffer
>> to itself?
> Yes, it is.  I can't imagine why this wasn't handled in the original 
> patch.  Simple oversight?

Or perhaps something changed between patch submission and now? Or maybe it
was never properly tested. I don't know.

>> copy_user_enhanced_fast_string is now down at 0.26%, ie.
>> effectively nothing.
> Very good.

Of course, not all if this gets translated to actual savings. Previously,
the data copy faulted data in from main memory (lots of L3 misses), and that
cost now gets moved into my own userspace code. But that's fine; it's got to
be somewhere.

So what is the road from here? I guess the original questions about cache
coherency still apply, and that this is what I'm seeing in dmesg.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-13 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 04:02:48PM -0500, Alan Stern wrote:
>> So what is the road from here? I guess the original questions about cache
>> coherency still apply, and that this is what I'm seeing in dmesg.
> What questions?  It should be obvious that the user program should not 
> touch the buffer contents while the transfer is taking place.

The subthread I'm thinking of starts at 

  http://marc.info/?l=linux-usb=138091207413756=2

I can't claim to have gone deeply into the details, though.

> What are you seeing in dmesg?

Several copies of

[ 1175.838536] x86/PAT: app:2838 map pfn RAM range req uncached-minus for [mem 
0x9fa4c000-0x9fa4], got write-back

> The next step would be to massage the patch and get it into a form
> suitable for applying.  This may well include changing the way the API
> works; for example, I'm not sure that allocating memory should be a
> separate step from mmap.

Yes, it sounds a bit odd to me, too.

I suppose there's no way to let userspace allocate this memory? Again,
for me personally it would be ideal to be able to give it in from a PBO
(ie., GPU memory).

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-12 Thread Steinar H. Gunderson
On Thu, Nov 12, 2015 at 05:08:32PM -0500, Alan Stern wrote:
> You're right; I got the two patch sets mixed up.  Considering that it 
> was written a couple of years ago, you'll probably have to do some work 
> to get it to apply to the current kernel.  Let me know if you need any 
> help.

There were some minor rejects, thankfully nothing big. I'll see if I can
actually get it to do anything useful, too. :-)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-12 Thread Steinar H. Gunderson
On Fri, Nov 13, 2015 at 12:00:53AM +0100, Steinar H. Gunderson wrote:
> I guess I should either add some debug printks or else see if there's
> anything in the snooping that can help me track it down.

Interestingly enough, I get this at startup:

[ 3316.681227] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x7060-0x7061dfff], got write-back
[ 3316.681260] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x601c-0x601ddfff], got write-back
[ 3316.681296] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x601e-0x601fdfff], got write-back
[ 3316.681309] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x707d8000-0x707dbfff], got write-back
[ 3316.681325] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x60124000-0x60127fff], got write-back
[ 3316.681347] x86/PAT: app:4154 map pfn RAM range req uncached-minus for [mem 
0x82bd8000-0x82bdbfff], got write-back

/* Steinar */
-- 
Homepage: https://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-09 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 11:48:36AM -0500, Alan Stern wrote:
> Order 7?  Maybe you're trying to put too much data into a single
> transfer and encountering problems with memory fragmentation.  Try
> using more frequent, smaller transfers.

I tried going down to 128 kB; it still happens (and if I go even further
down, I can't seem to keep up the data stream reliably, even with realtime
priority on my USB thread -- perhaps not so strange, since that's ~1000
submits per second already).

  [209244.299476] nageru: page allocation failure: order:5, mode:0x2040d0
  [209244.299486] CPU: 3 PID: 6045 Comm: nageru Tainted: GW   4.3.0 
#1
  [209244.299490] Hardware name: LENOVO 20ALCT01WW/20ALCT01WW, BIOS GIET83WW 
(2.33 ) 08/25/2015
  [209244.299494]  88015f5cbba0 812a5113 002040d0 
8113e613
  [209244.299502]  002040d0 880101476f00 88023e5fab08 

  [209244.299507]  0005 0050 002040d0 
88015f5cbc58
  [209244.299512] Call Trace:
  [209244.299524]  [] ? dump_stack+0x40/0x5d
  [209244.299534]  [] ? warn_alloc_failed+0xd3/0x130
  [209244.299540]  [] ? __alloc_pages_nodemask+0x2c6/0xa10
  [209244.299548]  [] ? kmem_getpages+0x58/0x100
  [209244.299556]  [] ? fallback_alloc+0x1b1/0x1f0
  [209244.299572]  [] ? proc_do_submiturb+0x5d6/0xba0 
[usbcore]
  [209244.299578]  [] ? __kmalloc+0x2c1/0x4b0
  [209244.299587]  [] ? proc_do_submiturb+0x5d6/0xba0 
[usbcore]
  [209244.299596]  [] ? proc_do_submiturb+0x5d6/0xba0 
[usbcore]
  [209244.299604]  [] ? usbdev_do_ioctl+0x8d5/0x10d0 [usbcore]
  [209244.299613]  [] ? usbdev_ioctl+0x5/0x10 [usbcore]
  [209244.299621]  [] ? do_vfs_ioctl+0x2be/0x490
  [209244.299629]  [] ? __schedule+0x262/0x880
  [209244.299635]  [] ? SyS_ioctl+0x71/0x80
  [209244.299642]  [] ? prepare_exit_to_usermode+0x89/0xf0
  [209244.299648]  [] ? entry_SYSCALL_64_fastpath+0x12/0x71
  [209244.299651] Mem-Info:
  [209244.299660] active_anon:541969 inactive_anon:342054 isolated_anon:0
   active_file:506674 inactive_file:324830 isolated_file:0
   unevictable:20244 dirty:1494 writeback:0 unstable:0
   slab_reclaimable:214493 slab_unreclaimable:13295
   mapped:162591 shmem:216948 pagetables:10999 bounce:0
   free:38519 free_pcp:106 free_cma:0
  [209244.299667] Node 0 DMA free:15868kB min:20kB low:24kB high:28kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15912kB 
managed:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB 
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
  [209244.299681] lowmem_reserve[]: 0 2951 7916 7916
  [209244.299690] Node 0 DMA32 free:27324kB min:4212kB low:5264kB high:6316kB 
active_anon:771176kB inactive_anon:566580kB active_file:768952kB 
inactive_file:482948kB unevictable:30852kB isolated(anon):0kB 
isolated(file):0kB present:3033320kB managed:3024500kB mlocked:30852kB 
dirty:792kB writeback:0kB mapped:263476kB shmem:336260kB 
slab_reclaimable:327572kB slab_unreclaimable:16432kB kernel_stack:2352kB 
pagetables:15512kB unstable:0kB bounce:0kB free_pcp:276kB local_pcp:0kB 
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
  [209244.299702] lowmem_reserve[]: 0 0 4964 4964
  [209244.299707] Node 0 Normal free:110884kB min:7088kB low:8860kB 
high:10632kB active_anon:1396700kB inactive_anon:801636kB active_file:1257744kB 
inactive_file:816372kB unevictable:50124kB isolated(anon):0kB 
isolated(file):0kB present:5216256kB managed:5083780kB mlocked:50124kB 
dirty:5184kB writeback:0kB mapped:386888kB shmem:531532kB 
slab_reclaimable:530400kB slab_unreclaimable:36748kB kernel_stack:4752kB 
pagetables:28484kB unstable:0kB bounce:0kB free_pcp:148kB local_pcp:28kB 
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
  [209244.299717] lowmem_reserve[]: 0 0 0 0
  [209244.299723] Node 0 DMA: 3*4kB (U) 4*8kB (U) 3*16kB (U) 3*32kB (U) 3*64kB 
(U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
  [209244.299747] Node 0 DMA32: 1733*4kB (UEM) 1266*8kB (UEM) 616*16kB (UEM) 
16*32kB (UEM) 2*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
27556kB
  [209244.299766] Node 0 Normal: 1025*4kB (UEM) 2108*8kB (UEM) 1996*16kB (UEM) 
1227*32kB (UEM) 297*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 
0*4096kB = 72kB
  [209244.299786] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
  [209244.299789] 1049019 total pagecache pages
  [209244.299792] 542 pages in swap cache
  [209244.299795] Swap cache stats: add 16476, delete 15934, find 55862/56175
  [209244.299798] Free swap  = 8206684kB
  [209244.299801] Total swap = 8265724kB
  [209244.299804] 2066372 pages RAM
  

Re: Overly conservative xHCI bandwidth estimation

2015-11-07 Thread Steinar H. Gunderson
On Fri, Nov 06, 2015 at 08:52:52AM +0800, Lu, Baolu wrote:
>> I reproduced the U1/U2 disconnect issues several times. I didn't try the
>> issue of not enough bandwidth for two devices.
> Can you please try the not enough bandwidth issue?

It doesn't work. I can use one card (it takes 3-4 tries to get it started due
to LPM bugs, but it works), but not two.

Note that I can use one card even if the other is plugged in. I just can't
use both at the same time.

> Can you cat /sys/bus/usb/devices/2-2/power/usb3_hardware_lpm_u1
> and /sys/bus/usb/devices/2-2/power/usb3_hardware_lpm_u2 for several
> times? Don't use that device when cat these two files.

I waited a bit longer, and indeed, now it consistently shows enabled on both
cards.

  klump:~> cat /sys/bus/usb/devices/2-1/power/usb3_hardware_lpm_u1
  enabled
  klump:~> cat /sys/bus/usb/devices/2-1/power/usb3_hardware_lpm_u2
  enabled
  klump:~> cat /sys/bus/usb/devices/2-2/power/usb3_hardware_lpm_u1
  enabled
  klump:~> cat /sys/bus/usb/devices/2-2/power/usb3_hardware_lpm_u2
  enabled

For fun, I tried also with a USB 2.0 cable. U1 and U2 comes up as disabled,
but of course, the device doesn't work.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-05 Thread Steinar H. Gunderson
On Thu, Nov 05, 2015 at 04:12:24PM +0800, Lu, Baolu wrote:
> 1) apply the attached patch on top the latest kernel.
> 2) build and install the kernel.
> 3) boot your machine with the new kernel.

Do you want this on top of Alan's LPM-disabling patch or on a clean
4.3.0 tree?

> 4) insert one  Blackmagic Design device into USB3 root port.
> 7) wait for a few seconds
> 6) check below sysfs nodes:

I guess there's no step 5. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-05 Thread Steinar H. Gunderson
On Fri, Nov 06, 2015 at 08:24:15AM +0800, Lu, Baolu wrote:
> Yeah, sorry about it.
> 
> 1) apply the attached patch on top the latest clean kernel.
> 2) build and install the kernel.
> 3) boot your machine with the new kernel.
> 4) insert one  Blackmagic Design device into USB3 root port.
> 5) wait for a few seconds
> 6) check below sysfs nodes:
> 
> $ cat /sys/bus/usb/devices//power/usb3_hardware_lpm_u1
> $ cat /sys/bus/usb/devices//power/usb3_hardware_lpm_u2

OK, what I sent you this earlier today should be exactly this; I think I
might not have sent the results to the list, though.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-05 Thread Steinar H. Gunderson
On Fri, Nov 06, 2015 at 08:39:28AM +0800, Lu, Baolu wrote:
> Have you set CONFIG_PM?

Yes. CONFIG_PM=y.

> Can you reproduce the problem with this kernel?

I reproduced the U1/U2 disconnect issues several times. I didn't try the
issue of not enough bandwidth for two devices.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-04 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 04:00:42PM -0500, Alan Stern wrote:
> That commit was included in (approximately) the 4.1.5 or later stable 
> kernel, and it is included in 4.2.  You should be able to put one of 
> those on a bootable USB stick.

I tried going down this route. After some back and forth, I realize that...
the machine has only two USB ports, so I can't boot off USB stick and still
test how it behaves with two cards. (I could buy a hub, but that would surely
introduce new potential errors in the mix.)

So I'm afraid I can't help you at this point. My recommendation to include
the patch still stands, though.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 11:04:42AM -0500, Alan Stern wrote:
> To make your life easier, here's a patch which blacklists these two 
> devices for Link Power Management.  Please let me know if it works 
> okay.

OK, this worked brilliantly. I can now use both cards at the same time,
and I also see no LPM timeouts anymore. (Tested on top of 4.3.0, with
CONFIG_PM=y.)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 01:28:09PM -0500, Alan Stern wrote:
>> Thanks for testing.  I will submit the patch in two weeks, when the
>> current merge window closes.
> Or maybe not.  This may not be the best way to solve the problem, since
> you found that the two video cards worked okay with one xHCI host
> controller but not with another.  Maybe what we really need to do is 
> blacklist some controllers for LPM, rather than the devices.

Note that the two controllers were running different kernels. The one that
worked had a kernel before LPM was turned on, I believe.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 11:04:42AM -0500, Alan Stern wrote:
> To make your life easier, here's a patch which blacklists these two 
> devices for Link Power Management.  Please let me know if it works 
> okay.

Thanks! I'll give it a shot.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Page allocation failure

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 11:48:36AM -0500, Alan Stern wrote:
> Order 7?  Maybe you're trying to put too much data into a single
> transfer and encountering problems with memory fragmentation.  Try
> using more frequent, smaller transfers.

Yes, my transfers are rather big; 512 kB or so. (They used to be 2 MB.)
Somehow when you want to stream a gigabit or so of data, 16 kB transfers
would seem inefficient :-)

I've tried in the past to have fewer and just have more of them in transit,
but there seems to be a limit as to how many you can have.

Note that this is receive, not send.

> Once the data is in the kernel, the rest of the procedure is basically
> zero-copy.  The problem is getting it there from within your program.  
> We currently don't have any support for zero-copy data submissions,
> although it has been proposed a few times in the past.

Do you know if there's any plans to revive these proposals? I've seen them in
the archives, too, but they all seem to have died down. It seems each stream
costs me ~15% or so of one core (mostly in copy_user_enhanced_fast_string and
memset_erms, which I suppose are about copying to user space), and I'm a bit
strapped for CPU already.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 03:32:46PM -0500, Alan Stern wrote:
> You don't need 4.3.  Anything released within the last few years 
> will contain the LPM code.

According to the bisect, the patch that broke it was from June this year
(2d2a316765d956bc5cb6bb367b2ec52ca59ab8e9), so it would really seem I need
something quite new.

The kernel on the other machine is Ubuntu's 3.19.0-28-generic, as far as I
can see.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-11-02 Thread Steinar H. Gunderson
On Mon, Nov 02, 2015 at 02:38:41PM -0500, Alan Stern wrote:
>> Note that the two controllers were running different kernels. The one that
>> worked had a kernel before LPM was turned on, I believe.
> Oh.  Can you try running a more recent kernel on that computer?  Does 
> it fail with the same "Not enough bandwidth for altsetting 4" error?

Unfortunately I cannot easily; it's under someone else's administration.
Perhaps I could find a live USB stick, but I'm not immediately sure if
there's anything out there with e.g. 4.3 yet.

But I did run an older kernel on the computer with the problem (as part of
the bisection that showed LPM to be breaking it), and it worked then.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Page allocation failure

2015-11-01 Thread Steinar H. Gunderson
Hello again,

I am (still) using libusb-1.0 to drive an USB 3.0 video card. After I turned
off power management (see previous emails :-) ) it seems stable, but after
running something like 5–6 hours, I seem to get problems on URB submission:

  [82029.656250] nageru: page allocation failure: order:7, mode:0x2040d0
  [82029.656263] CPU: 2 PID: 31210 Comm: nageru Tainted: GW   
4.3.0-rc7 #1
  [82029.656267] Hardware name: LENOVO 20ALCT01WW/20ALCT01WW, BIOS GIET83WW 
(2.33 ) 08/25/2015
  [82029.656271]  88015bd73ba0 8e285d33 002040d0 
8e120383
  [82029.656278]  002040d0 88008c9e8500 88023e5fab08 

  [82029.656284]  0007 0050 002040d0 
88015bd73c58
  [82029.656289] Call Trace:
  [82029.656305]  [] ? dump_stack+0x40/0x5d
  [82029.656317]  [] ? warn_alloc_failed+0xd3/0x130
  [82029.656326]  [] ? __alloc_pages_nodemask+0x2c6/0x9f0
  [82029.656334]  [] ? kmem_getpages+0x58/0x100
  [82029.656339]  [] ? fallback_alloc+0x1ab/0x1e0
  [82029.656353]  [] ? proc_do_submiturb+0x5d6/0xba0 [usbcore]
  [82029.656359]  [] ? __kmalloc+0x2a9/0x490
  [82029.656368]  [] ? proc_do_submiturb+0x5d6/0xba0 [usbcore]
  [82029.656376]  [] ? proc_do_submiturb+0x5d6/0xba0 [usbcore]
  [82029.656383]  [] ? do_futex+0x82b/0xab0
  [82029.656391]  [] ? usbdev_do_ioctl+0x8d5/0x10d0 [usbcore]
  [82029.656399]  [] ? usbdev_ioctl+0x5/0x10 [usbcore]
  [82029.656407]  [] ? do_vfs_ioctl+0x2be/0x490
  [82029.656413]  [] ? __schedule+0x263/0x850
  [82029.656419]  [] ? SyS_ioctl+0x71/0x80
  [82029.656425]  [] ? prepare_exit_to_usermode+0x78/0xe0
  [82029.656431]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6a
  [82029.656434] Mem-Info:
  [82029.656446] active_anon:354800 inactive_anon:275136 isolated_anon:0
  active_file:558226 inactive_file:620193 isolated_file:0
  unevictable:76 dirty:419 writeback:0 unstable:0
  slab_reclaimable:174895 slab_unreclaimable:11881
  mapped:128309 shmem:157945 pagetables:8468 bounce:0
  free:14319 free_pcp:216 free_cma:0
  [82029.656456] Node 0 DMA free:15864kB min:20kB low:24kB high:28kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15912kB 
managed:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:4kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB 
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
  [82029.656468] lowmem_reserve[]: 0 2951 7916 7916
  [82029.656475] Node 0 DMA32 free:28656kB min:4212kB low:5264kB high:6316kB 
active_anon:483308kB inactive_anon:429384kB active_file:843100kB 
inactive_file:933236kB unevictable:64kB isolated(anon):0kB isolated(file):0kB 
present:3033320kB managed:3024944kB mlocked:64kB dirty:800kB writeback:0kB 
mapped:162536kB shmem:202532kB slab_reclaimable:267420kB 
slab_unreclaimable:14996kB kernel_stack:2096kB pagetables:11932kB unstable:0kB 
bounce:0kB free_pcp:160kB local_pcp:68kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? no
  [82029.656486] lowmem_reserve[]: 0 0 4964 4964
  [82029.656492] Node 0 Normal free:12756kB min:7088kB low:8860kB high:10632kB 
active_anon:935892kB inactive_anon:671160kB active_file:1389804kB 
inactive_file:1547536kB unevictable:240kB isolated(anon):0kB isolated(file):0kB 
present:5216256kB managed:5083812kB mlocked:240kB dirty:876kB writeback:0kB 
mapped:350700kB shmem:429248kB slab_reclaimable:432156kB 
slab_unreclaimable:32528kB kernel_stack:4544kB pagetables:21940kB unstable:0kB 
bounce:0kB free_pcp:704kB local_pcp:164kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? no
  [82029.656501] lowmem_reserve[]: 0 0 0 0
  [82029.656507] Node 0 DMA: 4*4kB (UE) 5*8kB (UE) 4*16kB (UE) 4*32kB (UE) 
4*64kB (UE) 2*128kB (UE) 1*256kB (E) 1*512kB (E) 2*1024kB (UE) 2*2048kB (EM) 
2*4096kB (M) = 15864kB
  [82029.656534] Node 0 DMA32: 1561*4kB (UEM) 1647*8kB (UEM) 435*16kB (UEM) 
66*32kB (UEM) 6*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
28876kB
  [82029.656553] Node 0 Normal: 2288*4kB (UEM) 219*8kB (UEM) 47*16kB (UEM) 
25*32kB (M) 7*64kB (M) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
13032kB
  [82029.656580] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
  [82029.656582] 1336604 total pagecache pages
  [82029.656585] 193 pages in swap cache
  [82029.656588] Swap cache stats: add 2232, delete 2039, find 315/416
  [82029.656591] Free swap  = 8259324kB
  [82029.656593] Total swap = 8265724kB
  [82029.656595] 2066372 pages RAM
  [82029.656597] 0 pages HighMem/MovableOnly
  [82029.656599] 35216 pages reserved
  [82029.656601] 0 pages hwpoisoned

After it's gone to this point, things are basically so bad that the only
thing that really helps is a reboot; I can restart the program, but it 

Re: Overly conservative xHCI bandwidth estimation

2015-10-31 Thread Steinar H. Gunderson
On Wed, Oct 21, 2015 at 09:49:16AM +0800, Lu, Baolu wrote:
> I could spend some time on this issue a week later.
> I'd like to check whether there is any bugs in the driver itself.
> Otherwise, blacklist this specific device for LPM.

I don't know if anything happened here, but if you need information for a
blacklist, the two devices I've tested (which both have this issue) are:

 - 1edb:bd3b (Blackmagic Design Intensity Shuttle)
 - 1edb:bd4f (Blackmagic Design UltraStudio SDI)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-10-21 Thread Steinar H. Gunderson
On Wed, Oct 21, 2015 at 10:29:11AM -0400, Alan Stern wrote:
>> Is there such a blacklist already, where I can add the devices (there are two
>> distinct IDs behaving identically) and see if it helps?
> There is a blacklist, but it does not affect LPM.  See 
> drivers/usb/core/quirks.c and include/linux/usb/quirks.h.

OK, so to blacklist the device for LPM, some new code would need to get
written.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-10-21 Thread Steinar H. Gunderson
On Wed, Oct 21, 2015 at 09:49:16AM +0800, Lu, Baolu wrote:
> I could spend some time on this issue a week later.
> I'd like to check whether there is any bugs in the driver itself.
> Otherwise, blacklist this specific device for LPM.

Is there such a blacklist already, where I can add the devices (there are two
distinct IDs behaving identically) and see if it helps?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-10-20 Thread Steinar H. Gunderson
On Mon, Sep 28, 2015 at 02:32:13PM +0200, Steinar H. Gunderson wrote:
> Just so that it doesn't get lost: I've reported issues with this specific
> device and LPM not too long ago. It's entirely possible that the device
> somehow is broken, although it works in Windows 7/8/10 and OS X, from what I
> know, and at least Windows 10 uses USB3 LPM.
> 
> Perhaps figuring out what's wrong with the ping timeouts would also fix this
> issue? I'm afraid I don't have any fancy USB analyzer equipment to debug
> with, though.
> 
> In any case, I'll reiterate my request to turn off LPM for a specific device,
> no matter whose fault these specific issues are. :-)

Is there anything else I can do to help fixing these issues? (U1/U2 timeouts
causing disconnects, and xHCI bandwidth estimation being off.) I've been
running with CONFIG_PM=n for a while now, but it is obviously not a good
place to be for a laptop.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-28 Thread Steinar H. Gunderson
On Mon, Sep 28, 2015 at 03:24:04PM +0300, Mathias Nyman wrote:
> Driver will tell the latency value to the host controller, when a exit
> latency time is set the host will know that the link is power managed, and
> host will start to schedule additional PING TP transfers to the device to
> wake the link up to U0.  the PING TP tranfers are sent Max-exit-latency
> microeseconds before the actual transfer to make sure link is woken up in
> time.  It's possible that this impacts on the bandwidth.
> 
> It's also very possible that the max exit latency is calculated wrong, or
> that one of the other parameters set with the same "evaluate context"
> command is wrong

Just so that it doesn't get lost: I've reported issues with this specific
device and LPM not too long ago. It's entirely possible that the device
somehow is broken, although it works in Windows 7/8/10 and OS X, from what I
know, and at least Windows 10 uses USB3 LPM.

Perhaps figuring out what's wrong with the ping timeouts would also fix this
issue? I'm afraid I don't have any fancy USB analyzer equipment to debug
with, though.

In any case, I'll reiterate my request to turn off LPM for a specific device,
no matter whose fault these specific issues are. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-26 Thread Steinar H. Gunderson
On Fri, Sep 25, 2015 at 10:12:51PM +0200, Steinar H. Gunderson wrote:
> I have no idea whatsoever how this breaks bandwidth management, but seemingly
> it does.

To verify; I built 4.3rc-2 with CONFIG_PM=n, and it can drive both cards
without problems whatwoever. So there's something about power management
(LPM?) in newer kernels that causes this issue.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-25 Thread Steinar H. Gunderson
On Fri, Sep 25, 2015 at 10:26:23AM -0400, Alan Stern wrote:
>> Does this mean I have a long bisect ahead of me?
> In the absence of any better suggestions, that's the most 
> straightforward way to get an answer.

Three hours of bisecting (actually three bisects; I found a git bug along the
way :-/) later shows the culprit:

  commit 2d2a316765d956bc5cb6bb367b2ec52ca59ab8e9
  Author: Lu Baolu 
  Date:   Tue Jun 16 09:08:26 2015 +0800
  
  usb: core: lpm: set lpm_capable for root hub device

I have no idea whatsoever how this breaks bandwidth management, but seemingly
it does.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 03:35:10PM +0200, Krzysztof Opasiak wrote:
>> No. The 2nd altset is 2.16, the 4th altset is 1.04. Unless my calculations
>> are wrong.
> How do you do your calculations?

Like I said in my initial email:

>>> The interface of each card has two relevant alternates, 2 and 4. Number 2 
>>> has
>>> bInterval=1, bMaxBurst=11, Mult=2, which from what I can see should be
>>> 11*(2+1)*1024 byte / 125µs, or 2.16 Gbit/sec. Number 4 has bInterval=1,
>>> bMaxBurst=8, Mult=1, which should be 8*(1+1)*1024 byte / 125 µs, or 1.04 
>>> Gbit/sec.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 03:09:55PM +0200, Krzysztof Opasiak wrote:
> But still the problem may exist. Is the 2.16 GBit bandwidth for 4th altset?

No. The 2nd altset is 2.16, the 4th altset is 1.04. Unless my calculations
are wrong.

> Remember that according to USB spec not whole bandwidth can be used by iso
> transfers (in usb 2.0 the limit is 80%, don't know how it looks in 3.0). So
> if 2.16 is lowest possible bandwidth for your device you have 2.16 x 2 =
> 4.32 but 80% * 5 = 4...

But total bandwidth shouldn't be the issue as long as the 2nd altset
(by itself) works fine, no? It's _more_ than twice that of the 4th.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 02:09:41PM +0200, Krzysztof Opasiak wrote:
> Let's start from the beginning. Your device use ISO endpoints which means
> that host allocates specific amount of bandwidth on the bus. More over,
> interfaces in your devices has many alternate settings. Probably each of
> them reserves different amount of bandwidth. When you connect your first
> device, driver selects the highest possible quality and allocate maximum
> bandwidth. When you connect second device driver tries to select altsetting
> for which there is enough bandwidth left but as you see this fail. So we
> get:

I'm sorry, I should have mentioned an important detail here. There is no
kernel driver for my card; I drive it from userspace with libusb-1.0. So I'm
the one setting the alternates. The cards start out in alternate 0, so the
scenario you're outlining shouldn't exist; and indeed, the setting the first
card to 4 (the low-bandwidth alternate) works just fine, it's setting the
second one that's the problem.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 10:17:19AM -0400, Alan Stern wrote:
> However, none of this answers the question of why you can use both
> cards on a different machine but not on yours.  It comes down to the
> implementations of the xHCI controller chips.  In USB-3, bandwidth
> allocation is handled by firmware running on the controller, not by the
> operating system's driver.  The driver presents a series of endpoints
> with all their bandwidth requirements to the controller, and the
> controller either accepts it or rejects it.

OK, I feared as much. The other machine also has an Intel controller,
but as far as I know, a newer one (and the PCI ID is different -- 8086:9cb1).

> It doesn't give any explanation for its decision, and as far as I know, it
> doesn't provide any information about the details of how it allocates the
> bandwidth.

I thought I saw something in the xHCI spec about enumerating the bandwidth
domains to try to get some more insight in what the topology looks like,
but I guess I misunderstood it? (It all wasn't very clear to me.)

I assume there's no way I can lie to the chip? Like, if I know for a fact
that the card will send less data than the alternate claims (like,
I'm using a video mode that will require only a few hundred megabits/second
in practice, but even the lowest alternate claims >1 Gbit/sec).

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 11:22:57AM -0400, Alan Stern wrote:
>> I assume there's no way I can lie to the chip? Like, if I know for a fact
>> that the card will send less data than the alternate claims (like,
>> I'm using a video mode that will require only a few hundred megabits/second
>> in practice, but even the lowest alternate claims >1 Gbit/sec).
> You can always patch the kernel to do whatever you want.  For instance, 
> you could reduce the bMaxBurst value in the descriptor after the kernel 
> gets it from the card.  Short of that, there is no way to affect the 
> outcome.

I tried, but it didn't work; I bMaxBurst to 4 instead of to 15
(in usb_parse_ss_endpoint_companion). It didn't change the outcome.
Perhaps this doesn't actually change what the xHCI controller sees,
though.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 02:33:06PM -0700, Paul Zimmerman wrote:
> IIRC, at least some of the Intel controllers require the bandwidth
> calculations to be done by the xHCI driver, instead of doing it
> themselves in hardware. Perhaps you're tripping over a bug in the xHCI
> driver? Mathias is probably best the one to comment on this.

FWIW, I've checked that the XHCI_SW_BW_CHECKING quirk isn't set for my
device. (I suppose that's the one you're thinking about?)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Fri, Sep 25, 2015 at 12:42:22AM +0200, Steinar H. Gunderson wrote:
> I downgraded to Debian's 3.16 kernel. Both cards came up without a hitch.
> But I only seem to get frames back from the second one.

...and after a quick app fix, I have capture from both cards.

Does this mean I have a long bisect ahead of me?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Wed, Sep 23, 2015 at 10:08:05PM +0200, Steinar H. Gunderson wrote:
> I've tried on another machine and it works fine there, so my code should,
> at least on the surface of it, be fine.

PLOT TWIST:

I downgraded to Debian's 3.16 kernel. Both cards came up without a hitch.
But I only seem to get frames back from the second one.

???

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Overly conservative xHCI bandwidth estimation

2015-09-24 Thread Steinar H. Gunderson
On Thu, Sep 24, 2015 at 04:46:22PM -0400, Alan Stern wrote:
> It does.  Grep for max_burst in drivers/usb/host/x*.c to see where it 
> gets used.  (Note that in a couple of places involving USB-2 devices, 
> the code uses max_burst where it really means multiplicity.)

OK, so this is very curious. I hacked the endpoint descriptors to set
max_burst=1, max_esit_payload=1024, mult=0 (in xhci_endpoint_init, which
seems to be the thing that actually produces the structures the device 
consumed, so I'm reasonably sure nothing else is modifying it on the way).
Still it says COMP_BW_ERR. Setting interval=3 (it used to be 0) has no
effect. Either my override isn't working, or this just isn't about bandwidth,
but something else.

I also tried ignoring the three other endpoints (early-out in
xhci_add_endpoint) just to check that they weren't taking up some resource
I was needing, and it didn't change anything. If I also ignore the video
endpoint itself (0x83), the altsetting _does_ go through, but of course then
I can't get any data from it.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >