Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-11 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 11:02:02PM -0700, Jason Gunthorpe wrote:
> > FYI, I have a patch series in linux-next to switches all remaining
> > architectures to use get_dma_ops, and there are plans to allow generic
> > per-device dma_ops based on that.
> 
> Great, so once that is merged we can drop the ib_* versions of all
> this and just have qib/etc customize get_dma_ops? Other than the
> dma_addr_t size issue that sounds great..

I'm not sure the per-device ops are a done deal, as there has been
vocal opposition to it everytime it came up.  But at least we have
the infrastructure for it now.

Other than that I think we're getting ready to actually remove
dma mapping from the ULPs.  Sagi's MR API that takes a scatterlist
is a first step, as it would allow for trivially moving the
dma_map_sg into the core helpes.  For the client side we now just
need to switch FMRs to use that API as well (given that it seems
like we can't get rid of them) and provide an API to "map" a
scatterlist using the DMA MR for those drivers playing fast and
loose.  On the server side I have a first draft of a R/W API that
does RDMA READ/WRITE requests and handles the required registration
and invalidation internally.  It again takes a scatterlist and handles
dma mapping internal.  Now all the dma mapping will be in the core,
which means they are only one step away from the driver.  Now if the
per-device dma_ops don't work out we can simply have a flag in
struct ib_device that it doesn't need dma mapping and can avoid
the indirection through another set of ops at least.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2015 at 04:46:24PM -0800, Christoph Hellwig wrote:
> On Thu, Dec 10, 2015 at 05:29:50PM -0700, Jason Gunthorpe wrote:
> > Hrm.. sizeof(void *) > sizeof(dma_addr_t) seemed pretty obscure to me,
> > here is the original discussion:
> > 
> > https://lkml.org/lkml/2006/12/13/245
> > 
> > Sounds like someone was worried about sparc64. I doubt it is an actual
> > issue today, but granted the u64 did make some sense.
> 
> sparc64 still uses a 32-bit dma_addr_t.

Hmm, too bad, that choice severly compromises the rdma userspace -
user space can't create mrs that exceed 4G in total :(

> > So, I *believe* the issue is that linux has (had?) no approved way to
> > convert from a device specific dma_addr_t to a virtual address.
> 
> Linux doesn't have an approved way because it's impossible for the
> generic case.  When you have an iommu you have potentially multiple
> page tables mapping physical addresses to device virtual addresses,
> and there is no easy way to do a reverse mapping.

Yes, I know

> > It is really too bad we can't just use get_dma_ops to handle this case
> > and instead require our own infrastructure.
> 
> FYI, I have a patch series in linux-next to switches all remaining
> architectures to use get_dma_ops, and there are plans to allow generic
> per-device dma_ops based on that.

Great, so once that is merged we can drop the ib_* versions of all
this and just have qib/etc customize get_dma_ops? Other than the
dma_addr_t size issue that sounds great..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 05:29:50PM -0700, Jason Gunthorpe wrote:
> Hrm.. sizeof(void *) > sizeof(dma_addr_t) seemed pretty obscure to me,
> here is the original discussion:
> 
> https://lkml.org/lkml/2006/12/13/245
> 
> Sounds like someone was worried about sparc64. I doubt it is an actual
> issue today, but granted the u64 did make some sense.

sparc64 still uses a 32-bit dma_addr_t.

> > Now these drivers will end up dma mapping these virtual addresses later,
> > so we might want figure out a) why the qib & co drivers even need the
> > virtual address, and b) see if we maybe should always do the dma_map
> > in the callers anyway, and just have an additional virtual address field
> > for those drivers if absolutely needed.
> 
> So, I *believe* the issue is that linux has (had?) no approved way to
> convert from a device specific dma_addr_t to a virtual address.

Linux doesn't have an approved way because it's impossible for the
generic case.  When you have an iommu you have potentially multiple
page tables mapping physical addresses to device virtual addresses,
and there is no easy way to do a reverse mapping.

> It is really too bad we can't just use get_dma_ops to handle this case
> and instead require our own infrastructure.

FYI, I have a patch series in linux-next to switches all remaining
architectures to use get_dma_ops, and there are plans to allow generic
per-device dma_ops based on that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2015 at 03:41:12PM -0800, Christoph Hellwig wrote:
> On Thu, Dec 10, 2015 at 10:44:13AM -0700, Jason Gunthorpe wrote:
> > > the signature of the function in struct ib_dma_mapping_ops.
> > 
> > It is supposed to be a dma_addr_t 'cookie' not a u64.
> > 
> > A patch to cleanup the core in this area would be appreciated.
> 
> I walked through the ib_dma_* mess in detail, and sadly speaking it
> has to be a u64.  This is due to the drivers being consolidated into
> rdmavt in fact.

> Those drivers use the addr field in struct ib_sge to point to a kernel
> virtual address, not to a DMA address.  In Linux u64 is the safe
> superset for a dma_addr_t and a pointer so we'll need to go with that.

Hrm.. sizeof(void *) > sizeof(dma_addr_t) seemed pretty obscure to me,
here is the original discussion:

https://lkml.org/lkml/2006/12/13/245

Sounds like someone was worried about sparc64. I doubt it is an actual
issue today, but granted the u64 did make some sense.

I probably would have just banned compiling qib on such a platform,
and kept the dma_addr_t annotations..

> Now these drivers will end up dma mapping these virtual addresses later,
> so we might want figure out a) why the qib & co drivers even need the
> virtual address, and b) see if we maybe should always do the dma_map
> in the callers anyway, and just have an additional virtual address field
> for those drivers if absolutely needed.

So, I *believe* the issue is that linux has (had?) no approved way to
convert from a device specific dma_addr_t to a virtual address.

I always understood that there was some kind of path in qib where it
needed to memcpy with the CPU the 'dma' data, so if it only had a
dma_addr then there was no way to make it work.

Certainly a future generic soft-rocee driver will need to cpu memcpy from
'dma' memory to a skb..

It is really too bad we can't just use get_dma_ops to handle this case
and instead require our own infrastructure.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 10:44:13AM -0700, Jason Gunthorpe wrote:
> > the signature of the function in struct ib_dma_mapping_ops.
> 
> It is supposed to be a dma_addr_t 'cookie' not a u64.
> 
> A patch to cleanup the core in this area would be appreciated.

I walked through the ib_dma_* mess in detail, and sadly speaking it
has to be a u64.  This is due to the drivers being consolidated into
rdmavt in fact.

Those drivers use the addr field in struct ib_sge to point to a kernel
virtual address, not to a DMA address.  In Linux u64 is the safe
superset for a dma_addr_t and a pointer so we'll need to go with that.

Now these drivers will end up dma mapping these virtual addresses later,
so we might want figure out a) why the qib & co drivers even need the
virtual address, and b) see if we maybe should always do the dma_map
in the callers anyway, and just have an additional virtual address field
for those drivers if absolutely needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Dennis Dalessandro

On Thu, Dec 10, 2015 at 07:52:31PM +0200, Leon Romanovsky wrote:

On Thu, Dec 10, 2015 at 11:17:09AM -0500, Dennis Dalessandro wrote:

On Tue, Dec 08, 2015 at 08:08:21AM +0200, Leon Romanovsky wrote:
>On Mon, Dec 07, 2015 at 03:43:06PM -0500, Dennis Dalessandro wrote:
>>+
>>+#define BAD_DMA_ADDRESS ((u64)0)
>What is the advantage in using directly u64 values instead of
>pointers? You will get NULL and functions which return pointers
>without need of casting.
>
>...
>>+static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
>>+ size_t size, enum dma_data_direction direction)
>>+{
>>+   if (WARN_ON(!valid_dma_direction(direction)))
>>+   return BAD_DMA_ADDRESS;
>>+
>>+   return (u64)cpu_addr;
>>+}
>An example of such function.

Honestly I'm not really sure why it's done this way. We are just following
the signature of the function in struct ib_dma_mapping_ops.


Since that would be a core change it's beyond the scope of this patch series 
I think, but something that could certainly be done. I just wouldn't want to 
tie that to rdmavt.


-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Leon Romanovsky
On Thu, Dec 10, 2015 at 11:17:09AM -0500, Dennis Dalessandro wrote:
> On Tue, Dec 08, 2015 at 08:08:21AM +0200, Leon Romanovsky wrote:
> >On Mon, Dec 07, 2015 at 03:43:06PM -0500, Dennis Dalessandro wrote:
> >>+
> >>+#define BAD_DMA_ADDRESS ((u64)0)
> >What is the advantage in using directly u64 values instead of
> >pointers? You will get NULL and functions which return pointers
> >without need of casting.
> >
> >...
> >>+static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
> >>+ size_t size, enum dma_data_direction direction)
> >>+{
> >>+   if (WARN_ON(!valid_dma_direction(direction)))
> >>+   return BAD_DMA_ADDRESS;
> >>+
> >>+   return (u64)cpu_addr;
> >>+}
> >An example of such function.
> 
> Honestly I'm not really sure why it's done this way. We are just following
> the signature of the function in struct ib_dma_mapping_ops.
Is it worth to consider to implement these functions with the pointers?

> 
> -Denny
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2015 at 11:17:09AM -0500, Dennis Dalessandro wrote:
> On Tue, Dec 08, 2015 at 08:08:21AM +0200, Leon Romanovsky wrote:
> >On Mon, Dec 07, 2015 at 03:43:06PM -0500, Dennis Dalessandro wrote:
> >>+
> >>+#define BAD_DMA_ADDRESS ((u64)0)

Just use DMA_ERROR_CODE.

> >>+static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
> >>+ size_t size, enum dma_data_direction direction)
> >>+{
> >>+   if (WARN_ON(!valid_dma_direction(direction)))
> >>+   return BAD_DMA_ADDRESS;
> >>+
> >>+   return (u64)cpu_addr;
> >>+}
> >An example of such function.
> 
> Honestly I'm not really sure why it's done this way. We are just following
> the signature of the function in struct ib_dma_mapping_ops.

It is supposed to be a dma_addr_t 'cookie' not a u64.

A patch to cleanup the core in this area would be appreciated.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Dennis Dalessandro

On Tue, Dec 08, 2015 at 08:08:21AM +0200, Leon Romanovsky wrote:

On Mon, Dec 07, 2015 at 03:43:06PM -0500, Dennis Dalessandro wrote:

+
+#define BAD_DMA_ADDRESS ((u64)0)

What is the advantage in using directly u64 values instead of
pointers? You will get NULL and functions which return pointers
without need of casting.

...

+static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
+ size_t size, enum dma_data_direction direction)
+{
+   if (WARN_ON(!valid_dma_direction(direction)))
+   return BAD_DMA_ADDRESS;
+
+   return (u64)cpu_addr;
+}

An example of such function.


Honestly I'm not really sure why it's done this way. We are just following 
the signature of the function in struct ib_dma_mapping_ops.


-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Haggai Eran
A few nits:

On 07/12/2015 22:43, Dennis Dalessandro wrote:
> +static int rvt_map_sg(struct ib_device *dev, struct scatterlist *sgl,
> +   int nents, enum dma_data_direction direction)
> +{
> + struct scatterlist *sg;
> + u64 addr;
> + int i;
> + int ret = nents;
> +
> + if (WARN_ON(!valid_dma_direction(direction)))
> + return BAD_DMA_ADDRESS;
The function returns 0 on error, so its technically correct, but it
doesn't return a DMA address, so I think it would make more sense to
return 0 here and not BAD_DMA_ADDRESS.

> +
> + for_each_sg(sgl, sg, nents, i) {
> + addr = (u64)page_address(sg_page(sg));
> + if (!addr) {
> + ret = 0;
> + break;
> + }
> + sg->dma_address = addr + sg->offset;
> +#ifdef CONFIG_NEED_SG_DMA_LENGTH
> + sg->dma_length = sg->length;
> +#endif
> + }
> + return ret;
> +}

> diff --git a/drivers/infiniband/sw/rdmavt/vt.h 
> b/drivers/infiniband/sw/rdmavt/vt.h
> index ec210f3..a19a3af 100644
> --- a/drivers/infiniband/sw/rdmavt/vt.h
> +++ b/drivers/infiniband/sw/rdmavt/vt.h
> @@ -52,5 +52,6 @@
>   */
>  
>  #include 
> +#include "dma.h"
Why do you need the dma.h file included here? Won't it be enough to
include it in vt.c?

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-07 Thread Leon Romanovsky
On Mon, Dec 07, 2015 at 03:43:06PM -0500, Dennis Dalessandro wrote:
> +
> +#define BAD_DMA_ADDRESS ((u64)0)
What is the advantage in using directly u64 values instead of
pointers? You will get NULL and functions which return pointers
without need of casting.

...
> +static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
> +   size_t size, enum dma_data_direction direction)
> +{
> + if (WARN_ON(!valid_dma_direction(direction)))
> + return BAD_DMA_ADDRESS;
> +
> + return (u64)cpu_addr;
> +}
An example of such function.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-07 Thread Dennis Dalessandro
This patch adds dma functions to rdmavt. The source is hfi1's version of
dma.c which will be removed by a subsequent hfi1 patch.

Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
---
 drivers/infiniband/sw/rdmavt/Makefile |2 
 drivers/infiniband/sw/rdmavt/dma.c|  187 +
 drivers/infiniband/sw/rdmavt/dma.h|   56 ++
 drivers/infiniband/sw/rdmavt/vt.c |   10 ++
 drivers/infiniband/sw/rdmavt/vt.h |1 
 5 files changed, 255 insertions(+), 1 deletions(-)
 create mode 100644 drivers/infiniband/sw/rdmavt/dma.c
 create mode 100644 drivers/infiniband/sw/rdmavt/dma.h

diff --git a/drivers/infiniband/sw/rdmavt/Makefile 
b/drivers/infiniband/sw/rdmavt/Makefile
index 98a664d..134d2d0 100644
--- a/drivers/infiniband/sw/rdmavt/Makefile
+++ b/drivers/infiniband/sw/rdmavt/Makefile
@@ -7,4 +7,4 @@
 #
 obj-$(CONFIG_INFINIBAND_RDMAVT) += rdmavt.o
 
-rdmavt-y := vt.o
+rdmavt-y := vt.o dma.o
diff --git a/drivers/infiniband/sw/rdmavt/dma.c 
b/drivers/infiniband/sw/rdmavt/dma.c
new file mode 100644
index 000..d03d304
--- /dev/null
+++ b/drivers/infiniband/sw/rdmavt/dma.c
@@ -0,0 +1,187 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *contributors may be used to endorse or promote products derived
+ *from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include 
+#include 
+#include 
+
+#include "dma.h"
+
+#define BAD_DMA_ADDRESS ((u64)0)
+
+/*
+ * The following functions implement driver specific replacements
+ * for the ib_dma_*() functions.
+ *
+ * These functions return kernel virtual addresses instead of
+ * device bus addresses since the driver uses the CPU to copy
+ * data instead of using hardware DMA.
+ */
+
+static int rvt_mapping_error(struct ib_device *dev, u64 dma_addr)
+{
+   return dma_addr == BAD_DMA_ADDRESS;
+}
+
+static u64 rvt_dma_map_single(struct ib_device *dev, void *cpu_addr,
+ size_t size, enum dma_data_direction direction)
+{
+   if (WARN_ON(!valid_dma_direction(direction)))
+   return BAD_DMA_ADDRESS;
+
+   return (u64)cpu_addr;
+}
+
+static void rvt_dma_unmap_single(struct ib_device *dev, u64 addr, size_t size,
+enum dma_data_direction direction)
+{
+   /* This is a stub, nothing to be done here */
+}
+
+static u64 rvt_dma_map_page(struct ib_device *dev, struct page *page,
+   unsigned long offset, size_t size,
+   enum dma_data_direction direction)
+{
+   u64 addr;
+
+   if (WARN_ON(!valid_dma_direction(direction)))
+   return BAD_DMA_ADDRESS;
+
+   if (offset + size > PAGE_SIZE)
+   return BAD_DMA_ADDRESS;
+
+   addr = (u64)page_address(page);
+   if (addr)
+   addr += offset;
+
+   return addr;
+}
+
+static void rvt_dma_unma