RE: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-24 Thread Xiong, Jianxin
> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, August 24, 2021 12:44 PM
> To: Dave Airlie 
> Cc: John Hubbard ; Jason Gunthorpe ; 
> Christian König ; Gal Pressman
> ; Daniel Vetter ; Sumit Semwal 
> ; Doug Ledford
> ; open list:DMA BUFFER SHARING FRAMEWORK 
> ; dri-devel  de...@lists.freedesktop.org>; Linux Kernel Mailing List 
> ; linux-rdma ; 
> Gabbay,
> Oded (Habana) ; Tayar, Tomer (Habana) ; 
> Yossi Leybovich ; Alexander
> Matushevsky ; Leon Romanovsky ; Xiong, 
> Jianxin 
> Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
> 
> On Tue, Aug 24, 2021 at 3:16 PM Dave Airlie  wrote:
> >
> > On Wed, 25 Aug 2021 at 03:36, John Hubbard  wrote:
> > >
> > > On 8/24/21 10:32 AM, Jason Gunthorpe wrote:
> > > ...
> > > >>> And yes at least for the amdgpu driver we migrate the memory to
> > > >>> host memory as soon as it is pinned and I would expect that
> > > >>> other GPU drivers do something similar.
> > > >>
> > > >> Well...for many topologies, migrating to host memory will result
> > > >> in a dramatically slower p2p setup. For that reason, some GPU
> > > >> drivers may want to allow pinning of video memory in some situations.
> > > >>
> > > >> Ideally, you've got modern ODP devices and you don't even need to pin.
> > > >> But if not, and you still hope to do high performance p2p between
> > > >> a GPU and a non-ODP Infiniband device, then you would need to
> > > >> leave the pinned memory in vidmem.
> > > >>
> > > >> So I think we don't want to rule out that behavior, right? Or is
> > > >> the thinking more like, "you're lucky that this old non-ODP setup
> > > >> works at all, and we'll make it work by routing through host/cpu
> > > >> memory, but it will be slow"?
> > > >
> > > > I think it depends on the user, if the user creates memory which
> > > > is permanently located on the GPU then it should be pinnable in
> > > > this way without force migration. But if the memory is inherently
> > > > migratable then it just cannot be pinned in the GPU at all as we
> > > > can't indefinitely block migration from happening, e.g. if the CPU
> > > > touches it later or something.
> > > >
> > >
> > > OK. I just want to avoid creating any API-level assumptions that
> > > dma_buf_pin() necessarily implies or requires migrating to host memory.
> >
> > I'm not sure we should be allowing dma_buf_pin at all on
> > non-migratable memory, what's to stop someone just pinning all the
> > VRAM and making the GPU unusable?
> 
> In a lot of cases we have GPUs with more VRAM than system memory, but we 
> allow pinning in system memory.
> 
> Alex
> 

In addition, the dma-buf exporter can be a non-GPU device.

Jianxin

> >
> > I understand not considering more than a single user in these
> > situations is enterprise thinking, but I do worry about "pinning is
> > always fine" type of thinking when things are shared or multi-user.
> >
> > My impression from this is we've designed hardware that didn't
> > consider the problem, and now to let us use that hardware in horrible
> > ways we should just allow it to pin all the things.
> >
> > Dave.


RE: [PATCH rdma-core v2 3/3] configure: Add check for the presence of DRM headers

2021-02-05 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Friday, February 05, 2021 5:54 AM
> To: Jason Gunthorpe 
> Cc: Xiong, Jianxin ; Yishai Hadas 
> ; Leon Romanovsky ; linux-rdma  r...@vger.kernel.org>; John Hubbard ; Edward Srouji 
> ; Emil Velikov
> ; Gal Pressman ; dri-devel 
> ; Doug Ledford
> ; Ali Alnubani ; Vetter, Daniel 
> ; Christian Koenig
> 
> Subject: Re: [PATCH rdma-core v2 3/3] configure: Add check for the presence 
> of DRM headers
> 
> On Fri, Feb 5, 2021 at 2:22 PM Jason Gunthorpe  wrote:
> >
> > On Thu, Feb 04, 2021 at 04:29:14PM -0800, Jianxin Xiong wrote:
> > > Compilation of pyverbs/dmabuf_alloc.c depends on a few DRM headers
> > > that are installed by either the kernel-header or the libdrm package.
> > > The installation is optional and the location is not unique.
> > >
> > > Check the presence of the headers at both the standard locations and
> > > any location defined by custom libdrm installation. If the headers
> > > are missing, the dmabuf allocation routines are replaced by stubs
> > > that return a suitable error, allowing the related tests to be skipped.
> > >
> > > Signed-off-by: Jianxin Xiong 
> > >  CMakeLists.txt  | 15 +++
> > >  pyverbs/CMakeLists.txt  | 14 --
> > >  pyverbs/dmabuf_alloc.c  |  8 
> > >  pyverbs/dmabuf_alloc_stub.c | 39 +++
> > >  4 files changed, 70 insertions(+), 6 deletions(-)
> > >  create mode 100644 pyverbs/dmabuf_alloc_stub.c
> > >
> > > diff --git a/CMakeLists.txt b/CMakeLists.txt
> > > index 4113423..95aec11 100644
> > > +++ b/CMakeLists.txt
> > > @@ -515,6 +515,21 @@ find_package(Systemd)
> > >  include_directories(${SYSTEMD_INCLUDE_DIRS})
> > >  RDMA_DoFixup("${SYSTEMD_FOUND}" "systemd/sd-daemon.h")
> > >
> > > +# drm headers
> > > +
> > > +# First check the standard locations. The headers could have been installed
> > > +# by either the kernel-headers package or the libdrm package.
> > > +find_path(DRM_INCLUDE_DIRS "drm.h" PATH_SUFFIXES "drm" "libdrm")
> >
> > Is there a reason not to just always call pkg_check_modules?
> 
> Note that the gpu-specific libraries are split out, so I'd also check for 
> those just to be sure - I don't know whether all distros package the
> uapi headers consistently in libdrm or sometimes also in one of the 
> libdrm-$vendor packages.

The headers come from the libdrm-devel package, which presents itself as
"libdrm" under pkg-config. The gpu-specific packages only include the
libraries, not the headers.

The kernel-headers package doesn't have pkg-config info and can't be checked
via pkg_check_modules().

One change I can make here is to use find_path() only for the headers
installed by the kernel-headers package (the "drm" path). The "libdrm" path
is covered by the pkg_check_modules() check anyway.
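For illustration only, the pkg-config half of this probe can be sketched outside the build system; the helper name below is hypothetical, and the real check remains CMake's pkg_check_modules():

```python
import shutil
import subprocess

def libdrm_include_dirs():
    """Return libdrm's include directories as reported by pkg-config,
    or [] when pkg-config or libdrm is absent -- mirroring the CMake
    behavior, where a failed pkg_check_modules(DRM libdrm) simply
    leaves DRM_INCLUDE_DIRS unset."""
    if shutil.which("pkg-config") is None:
        return []
    proc = subprocess.run(["pkg-config", "--cflags-only-I", "libdrm"],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return []
    # pkg-config prints flags such as "-I/usr/include/libdrm"; strip "-I".
    return [tok[2:] for tok in proc.stdout.split() if tok.startswith("-I")]
```

On a system with libdrm-devel installed this typically yields a path like /usr/include/libdrm; elsewhere it degrades to an empty list rather than failing, which is the same "disable the tests, don't break the build" policy the patch takes.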

> -Daniel
> 
> >
> > > +# Then check custom installation of libdrm
> > > +if (NOT DRM_INCLUDE_DIRS)
> > > +  pkg_check_modules(DRM libdrm)
> > > +endif()
> > > +
> > > +if (DRM_INCLUDE_DIRS)
> > > +  include_directories(${DRM_INCLUDE_DIRS})
> > > +endif()
> >
> > This needs a hunk at the end:
> >
> > if (NOT DRM_INCLUDE_DIRS)
> >   message(STATUS " DMABUF NOT supported (disabling some tests)")
> > endif()

Thanks, missed that.

> >
> > >  #-
> > >  # Apply fixups
> > >
> > > diff --git a/pyverbs/CMakeLists.txt b/pyverbs/CMakeLists.txt
> > > index 6fd7625..922253f 100644
> > > +++ b/pyverbs/CMakeLists.txt
> > > @@ -13,8 +13,6 @@ rdma_cython_module(pyverbs ""
> > >cmid.pyx
> > >cq.pyx
> > >device.pyx
> > > -  dmabuf.pyx
> > > -  dmabuf_alloc.c
> > >enums.pyx
> > >mem_alloc.pyx
> > >mr.pyx
> > > @@ -25,6 +23,18 @@ rdma_cython_module(pyverbs ""
> > >xrcd.pyx
> > >  )
> > >
> > > +if (DRM_INCLUDE_DIRS)
> > > +rdma_cython_module(pyverbs ""
> > > +  dmabuf.pyx
> > > +  dmabuf_alloc.c
> > > +)
> > > +else()
> > > +rdma_cython_module(pyverbs ""
> > > +  dmabuf.pyx
> > > +  dmabuf_alloc_stub.c
> > > +)
> > > +endif()
> >
> > Like this:
> >
> > if (DRM_INCLUDE_DIRS)
> >   set(DMABUF_ALLOC dmabuf_alloc.c)
> > else()
> >   set(DMABUF_ALLOC dmabuf_alloc_stub.c)
> > endif()
> > rdma_cython_module(pyverbs ""
> >   dmabuf.pyx
> >   ${DMABUF_ALLOC}
> > )

Sure, will change.

> >
> > Jason
> > ___
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


RE: [PATCH rdma-core 3/3] configure: Add check for the presence of DRM headers

2021-02-04 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, February 04, 2021 1:12 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> ; Edward Srouji ; Yishai Hadas 
> ; John Hubbard
> ; Ali Alnubani ; Gal Pressman 
> ; Emil Velikov
> 
> Subject: Re: [PATCH rdma-core 3/3] configure: Add check for the presence of 
> DRM headers
> 
> On Thu, Feb 04, 2021 at 10:50:51AM -0800, Jianxin Xiong wrote:
> > Compilation of pyverbs/dmabuf_alloc.c depends on a few DRM headers
> > that are installed by either the kernel-header or the libdrm package.
> > The installation is optional and the location is not unique.
> >
> > The standard locations (such as /usr/include/drm, /usr/include/libdrm)
> > are checked first. If that fails, pkg-config is tried to find the include
> > path of a custom libdrm installation. The dmabuf allocation routines now
> > return a suitable error when the headers are not available. The related
> > tests will recognize this error code and skip.
> >
> > Signed-off-by: Jianxin Xiong 
> >  CMakeLists.txt |  7 +++
> >  buildlib/Finddrm.cmake | 19 +++
> >  buildlib/config.h.in   |  2 ++
> >  pyverbs/dmabuf_alloc.c | 47 ++-
> >  4 files changed, 70 insertions(+), 5 deletions(-)
> >  create mode 100644 buildlib/Finddrm.cmake
> >
> > diff --git a/CMakeLists.txt b/CMakeLists.txt
> > index 4113423..feaba3a 100644
> > +++ b/CMakeLists.txt
> > @@ -515,6 +515,13 @@ find_package(Systemd)
> >  include_directories(${SYSTEMD_INCLUDE_DIRS})
> >  RDMA_DoFixup("${SYSTEMD_FOUND}" "systemd/sd-daemon.h")
> >
> > +# drm headers
> > +find_package(drm)
> > +if (DRM_INCLUDE_DIRS)
> > +  include_directories(${DRM_INCLUDE_DIRS})
> > +  set(HAVE_DRM_H 1)
> > +endif()
> > +
> >  #-
> >  # Apply fixups
> >
> > diff --git a/buildlib/Finddrm.cmake b/buildlib/Finddrm.cmake
> > new file mode 100644 index 000..6f8e5f2
> > +++ b/buildlib/Finddrm.cmake
> > @@ -0,0 +1,19 @@
> > +# COPYRIGHT (c) 2021 Intel Corporation.
> > +# Licensed under BSD (MIT variant) or GPLv2. See COPYING.
> > +
> > +# Check standard locations first
> > +find_path(DRM_INCLUDE_DIRS "drm.h" PATH_SUFFIXES "drm" "libdrm")
> > +
> > > +# Check custom libdrm installation, if any
> > > +if (NOT DRM_INCLUDE_DIRS)
> > +  execute_process(COMMAND pkg-config --cflags-only-I libdrm
> > +OUTPUT_VARIABLE _LIBDRM
> > +RESULT_VARIABLE _LIBDRM_RESULT
> > +ERROR_QUIET)
> > +
> > +  if (NOT _LIBDRM_RESULT)
> > +string(REGEX REPLACE "^-I" "" DRM_INCLUDE_DIRS "${_LIBDRM}")
> > +  endif()
> > +  unset(_LIBDRM)
> > +  unset(_LIBDRM_RESULT)
> > +endif()
> 
> I think this should be using pkg_check_modules() ??
> 
> Look at the NL stuff:
> 
>   pkg_check_modules(NL libnl-3.0 libnl-route-3.0 REQUIRED)
>   include_directories(${NL_INCLUDE_DIRS})
>   link_directories(${NL_LIBRARY_DIRS})
>

Yes, this is much simpler than the pkg-config method. 
 
> > +#if HAVE_DRM_H
> > +
> 
> Would rather you use cmake to conditionally compile a dmabuf_alloc.c or a 
> dmabuf_alloc_stub.c than ifdef the entire file

Sure, will try that.

> 
> Jason


RE: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support

2021-02-03 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Wednesday, February 03, 2021 9:53 AM
> To: Xiong, Jianxin 
> Cc: Leon Romanovsky ; Jason Gunthorpe ; Gal 
> Pressman ; Yishai Hadas
> ; linux-rdma ; Edward Srouji 
> ; dri-devel  de...@lists.freedesktop.org>; Christian Koenig ; 
> Doug Ledford ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support
> 
> On Wed, Feb 3, 2021 at 5:57 PM Xiong, Jianxin  wrote:
> >
> > > -Original Message-
> > > From: Leon Romanovsky 
> > > Sent: Tuesday, February 02, 2021 10:03 PM
> > > To: Daniel Vetter 
> > > Cc: Xiong, Jianxin ; Jason Gunthorpe
> > > ; Gal Pressman ; Yishai Hadas
> > > ; linux-rdma ;
> > > Edward Srouji ; dri-devel  > > de...@lists.freedesktop.org>; Christian Koenig
> > > ; Doug Ledford ;
> > > Vetter, Daniel 
> > > Subject: Re: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR
> > > support
> > >
> >
> > <...>
> >
> > > > > > > > > > +#include 
> > > > > > > > > > +#include  #include 
> > > > > > > > > > +#include 
> > > > > > > > >
> > > > > > > > > I assume these should come from the kernel headers package, 
> > > > > > > > > right?
> > > > > > > >
> > > > > > > > This is gross, all kernel headers should be placed in
> > > > > > > > kernel-headers/* and "update" script needs to be extended to 
> > > > > > > > take drm/* files too :(.
> > > > > > >
> > > > > > > drm kernel headers are in the libdrm package. You need that
> > > > > > > anyway for doing the ioctls (if you don't hand-roll the 
> > > > > > > restarting yourself).
> > > > > > >
> > > > > > > Also our userspace has gone over to just outright copying
> > > > > > > the driver headers. Not the generic headers, but for the
> > > > > > > rendering side of gpus, which is the topic here, there's really 
> > > > > > > not much generic stuff.
> > > > > > >
> > > > > > > > Jianxin, are you fixing it?
> > > > > > >
> > > > > > > So fix is either to depend upon libdrm for building, or have
> > > > > > > copies of the headers included in the package for the
> > > > > > > i915/amdgpu/radeon headers (drm/drm.h probably not so good idea).
> > > > > >
> > > > > > We should have a cmake test to not build the drm parts if it can't 
> > > > > > be built, and pyverbs should skip the tests.
> > > > > >
> > > > >
> > > > > Yes, I will add a test for that. Also, on SLES, the headers
> > > > > could be under /usr/include/libdrm instead of /usr/include/drm.
> > > > > The cmake test should check that and use the proper path.
> > > >
> > > > Please use pkgconfig for this, libdrm installs a .pc file to make
> > > > sure you can find the right headers.
> > >
> > > rdma-core uses cmake build system and in our case cmake find_library() is 
> > > preferable over pkgconfig.
> >
> > Only the headers are needed, and they could be installed via either the 
> > libdrm-devel package or the kernel-headers package. The cmake
> find_path() command is more suitable here.
> 
> Except if your distro is buggy (or doesn't support any gpu drivers at
> all) they will never be installed as part of kernel-headers.

Right, that's why we want to check for the existence of the header file 
(find_path() does just that) instead of the existence of the package(s).

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


RE: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support

2021-02-03 Thread Xiong, Jianxin
> -Original Message-
> From: Leon Romanovsky 
> Sent: Tuesday, February 02, 2021 10:03 PM
> To: Daniel Vetter 
> Cc: Xiong, Jianxin ; Jason Gunthorpe 
> ; Gal Pressman ; Yishai Hadas
> ; linux-rdma ; Edward Srouji 
> ; dri-devel  de...@lists.freedesktop.org>; Christian Koenig ; 
> Doug Ledford ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support
> 

<...>

> > > > > > > > +#include 
> > > > > > > > +#include  #include 
> > > > > > > > +#include 
> > > > > > >
> > > > > > > I assume these should come from the kernel headers package, right?
> > > > > >
> > > > > > This is gross, all kernel headers should be placed in
> > > > > > kernel-headers/* and "update" script needs to be extended to take 
> > > > > > drm/* files too :(.
> > > > >
> > > > > drm kernel headers are in the libdrm package. You need that
> > > > > anyway for doing the ioctls (if you don't hand-roll the restarting 
> > > > > yourself).
> > > > >
> > > > > Also our userspace has gone over to just outright copying the
> > > > > driver headers. Not the generic headers, but for the rendering
> > > > > side of gpus, which is the topic here, there's really not much 
> > > > > generic stuff.
> > > > >
> > > > > > Jianxin, are you fixing it?
> > > > >
> > > > > So fix is either to depend upon libdrm for building, or have
> > > > > copies of the headers included in the package for the
> > > > > i915/amdgpu/radeon headers (drm/drm.h probably not so good idea).
> > > >
> > > > We should have a cmake test to not build the drm parts if it can't be 
> > > > built, and pyverbs should skip the tests.
> > > >
> > >
> > > Yes, I will add a test for that. Also, on SLES, the headers could be
> > > under /usr/include/libdrm instead of /usr/include/drm. The cmake test
> > > should check that and use the proper path.
> >
> > Please use pkgconfig for this, libdrm installs a .pc file to make sure
> > you can find the right headers.
> 
> rdma-core uses cmake build system and in our case cmake find_library() is 
> preferable over pkgconfig.

Only the headers are needed, and they could be installed via either the 
libdrm-devel package or the kernel-headers package. The cmake find_path() 
command is more suitable here.
 
> 
> Thanks
> 
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch


RE: [PATCH rdma-core v7 5/6] tests: Add tests for dma-buf based memory regions

2021-02-01 Thread Xiong, Jianxin
> -Original Message-
> From: John Hubbard 
> Sent: Sunday, January 31, 2021 12:45 AM
> To: Xiong, Jianxin ; linux-r...@vger.kernel.org; 
> dri-devel@lists.freedesktop.org
> Cc: Doug Ledford ; Jason Gunthorpe ; Leon 
> Romanovsky ; Sumit Semwal
> ; Christian Koenig ; 
> Vetter, Daniel ; Edward Srouji
> ; Yishai Hadas 
> Subject: Re: [PATCH rdma-core v7 5/6] tests: Add tests for dma-buf based 
> memory regions
> 
> On 1/25/21 11:57 AM, Jianxin Xiong wrote:
> > Define a set of unit tests similar to regular MR tests and a set of
> > tests for send/recv and rdma traffic using dma-buf MRs. Add a utility
> > function to generate access flags for dma-buf based MRs because the
> > set of supported flags is smaller.
> >
> > Signed-off-by: Jianxin Xiong 
> 
> Hi Jianxin,
> 
> It's awesome to see a GPU to IB test suite here!
> 
> > ---
> >   tests/args_parser.py |   4 +
> >   tests/test_mr.py | 266 ++-
> >   tests/utils.py   |  26 +
> >   3 files changed, 295 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/args_parser.py b/tests/args_parser.py index
> > 446535a..5bc53b0 100644
> > --- a/tests/args_parser.py
> > +++ b/tests/args_parser.py
> > @@ -19,6 +19,10 @@ class ArgsParser(object):
> >   parser.add_argument('--port',
> >   help='Use port  of RDMA device', type=int,
> >   default=1)
> > +parser.add_argument('--gpu', nargs='?', type=int, const=0, default=0,
> > +help='GPU unit to allocate dmabuf from')
> > +parser.add_argument('--gtt', action='store_true', default=False,
> > +help='Allocate dmabuf from GTT instead of VRAM')
> 
> Just to be kind to non-GPU people, how about:
> 
>   s/GTT/GTT (Graphics Translation Table)/
> 
> 

<...>

> > +def check_dmabuf_support(unit=0):
> > +"""
> > +Check if dma-buf allocation is supported by the system.
> > +Skip the test on failure.
> > +"""
> > +device_num = 128 + unit
> > +try:
> > +DmaBuf(1, unit=unit)
> 
> unit?? This is a GPU, never anything else! Let's s/unit/gpu/ throughout, yes?
> 

Thanks for the feedback. I will integrate the suggestions in the upcoming patch.

Jianxin
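
To make the skip behavior concrete, here is a minimal sketch of such a helper after the s/unit/gpu/ rename. DmaBufError and alloc_dmabuf are stand-ins for pyverbs' real DmaBuf allocator (not available in this self-contained example, so the stand-in always fails the way the stub does):

```python
import errno
import unittest

class DmaBufError(OSError):
    """Stand-in for the error raised when dma-buf allocation fails,
    e.g. because the stub routines are compiled in or no GPU exists."""

def alloc_dmabuf(gpu=0):
    # Hypothetical allocator; in this sketch it always reports
    # "not supported", as the stub implementation would.
    raise DmaBufError(errno.EOPNOTSUPP, "dma-buf allocation not supported")

def check_dmabuf_support(gpu=0):
    """Skip the calling test, rather than fail it, when dma-buf
    allocation is unsupported on the selected GPU."""
    try:
        alloc_dmabuf(gpu)
    except DmaBufError as e:
        if e.errno in (errno.ENOENT, errno.EOPNOTSUPP, errno.EACCES):
            raise unittest.SkipTest(
                f"dma-buf allocation not supported on GPU {gpu}")
        raise
```

The key point is translating a specific set of allocator errno values into a test-framework skip, so systems without the DRM headers or a capable GPU still pass the suite.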


RE: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support

2021-02-01 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, February 01, 2021 7:29 AM
> To: Daniel Vetter 
> Cc: Leon Romanovsky ; Gal Pressman ; 
> Xiong, Jianxin ; Yishai Hadas
> ; linux-rdma ; Edward Srouji 
> ; dri-devel  de...@lists.freedesktop.org>; Christian Koenig ; 
> Doug Ledford ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core v7 4/6] pyverbs: Add dma-buf based MR support
> 
> On Mon, Feb 01, 2021 at 03:10:00PM +0100, Daniel Vetter wrote:
> > On Mon, Feb 1, 2021 at 7:16 AM Leon Romanovsky  wrote:
> > >
> > > On Sun, Jan 31, 2021 at 05:31:16PM +0200, Gal Pressman wrote:
> > > > On 25/01/2021 21:57, Jianxin Xiong wrote:
> > > > > Define a new sub-class of 'MR' that uses dma-buf object for the
> > > > > memory region. Define a new class 'DmaBuf' as a wrapper for
> > > > > dma-buf allocation mechanism implemented in C.
> > > > >
> > > > > Update the cmake function for cython modules to allow building
> > > > > modules with mixed cython and c source files.
> > > > >
> > > > > Signed-off-by: Jianxin Xiong 
> > > > > buildlib/pyverbs_functions.cmake |  78 +++
> > > > >  pyverbs/CMakeLists.txt   |  11 +-
> > > > >  pyverbs/dmabuf.pxd   |  15 +++
> > > > >  pyverbs/dmabuf.pyx   |  73 ++
> > > > >  pyverbs/dmabuf_alloc.c   | 278 +++
> > > > >  pyverbs/dmabuf_alloc.h   |  19 +++
> > > > >  pyverbs/libibverbs.pxd   |   2 +
> > > > >  pyverbs/mr.pxd   |   6 +
> > > > >  pyverbs/mr.pyx   | 105 ++-
> > > > >  9 files changed, 557 insertions(+), 30 deletions(-)
> > > > >  create mode 100644 pyverbs/dmabuf.pxd
> > > > >  create mode 100644 pyverbs/dmabuf.pyx
> > > > >  create mode 100644 pyverbs/dmabuf_alloc.c
> > > > >  create mode 100644 pyverbs/dmabuf_alloc.h
> > >
> > > <...>
> > >
> > > > > index 000..05eae75
> > > > > +++ b/pyverbs/dmabuf_alloc.c
> > > > > @@ -0,0 +1,278 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > > > +/*
> > > > > + * Copyright 2020 Intel Corporation. All rights reserved. See
> > > > > +COPYING file  */
> > > > > +
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > >
> > > > I assume these should come from the kernel headers package, right?
> > >
> > > This is gross, all kernel headers should be placed in
> > > kernel-headers/* and "update" script needs to be extended to take drm/* 
> > > files too :(.
> >
> > drm kernel headers are in the libdrm package. You need that anyway for
> > doing the ioctls (if you don't hand-roll the restarting yourself).
> >
> > Also our userspace has gone over to just outright copying the driver
> > headers. Not the generic headers, but for the rendering side of gpus,
> > which is the topic here, there's really not much generic stuff.
> >
> > > Jianxin, are you fixing it?
> >
> > So fix is either to depend upon libdrm for building, or have copies of
> > the headers included in the package for the i915/amdgpu/radeon headers
> > (drm/drm.h probably not so good idea).
> 
> We should have a cmake test to not build the drm parts if it can't be built, 
> and pyverbs should skip the tests.
> 

Yes, I will add a test for that. Also, on SLES, the headers could be under
/usr/include/libdrm instead of /usr/include/drm. The cmake test should check
that and use the proper path.
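
The dual-location probe (effectively what CMake's find_path() with PATH_SUFFIXES "drm" "libdrm" ends up doing) can be sketched as follows; the helper name is illustrative:

```python
import os

def find_drm_include_dir(prefixes=("/usr/include",)):
    """Probe the locations distros actually use for drm.h:
    <prefix>/drm (kernel-headers on most distros) and <prefix>/libdrm
    (e.g. SLES). Returns the first directory containing drm.h,
    or None when the headers are not installed."""
    for prefix in prefixes:
        for suffix in ("drm", "libdrm"):
            candidate = os.path.join(prefix, suffix)
            if os.path.isfile(os.path.join(candidate, "drm.h")):
                return candidate
    return None
```

A None result maps to the "build the stub, skip the tests" path; a hit maps to adding the returned directory to the include path.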


RE: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-01-12 Thread Xiong, Jianxin
> -Original Message-
> From: Yishai Hadas 
> Sent: Tuesday, January 12, 2021 4:49 AM
> To: Xiong, Jianxin ; Alex Deucher 
> 
> Cc: Jason Gunthorpe ; Leon Romanovsky ; 
> linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig ; Yishai
> Hadas 
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On 1/11/2021 7:55 PM, Xiong, Jianxin wrote:
> >> -Original Message-
> >> From: Alex Deucher 
> >> Sent: Monday, January 11, 2021 9:47 AM
> >> To: Xiong, Jianxin 
> >> Cc: Jason Gunthorpe ; Leon Romanovsky
> >> ; linux-r...@vger.kernel.org;
> >> dri-devel@lists.freedesktop.org; Doug Ledford ;
> >> Vetter, Daniel ; Christian Koenig
> >> 
> >> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> >>
> >> On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin  
> >> wrote:
> >>>> -Original Message-
> >>>> From: Jason Gunthorpe 
> >>>> Sent: Monday, January 11, 2021 7:43 AM
> >>>> To: Xiong, Jianxin 
> >>>> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> >>>> Doug Ledford ; Leon Romanovsky
> >>>> ; Sumit Semwal ;
> >>>> Christian Koenig ; Vetter, Daniel
> >>>> 
> >>>> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> >>>>
> >>>> On Mon, Jan 11, 2021 at 03:24:18PM +, Xiong, Jianxin wrote:
> >>>>> Jason, will this series be able to get into 5.12?
> >>>> I was going to ask you where things are after the break?
> >>>>
> >>>> Did everyone agree the userspace stuff is OK now? Is Edward OK with
> >>>> the pyverbs changes, etc
> >>>>
> >>> There is no new comment on both the kernel and userspace series.
> >>> I assume silence means no objection. I will ask for opinions on the 
> >>> userspace thread.
> >> Do you have a link to the userspace thread?
> >>
> > https://www.spinics.net/lists/linux-rdma/msg98135.html
> >
> Any reason why the 'fork' comment that was given a few times wasn't
> handled / answered?
> 
> Specifically,
> 
> ibv_reg_dmabuf_mr() doesn't call ibv_dontfork_range() but ibv_dereg_mr does 
> call its opposite API (i.e. ibv_dofork_range())
> 

Sorry, that part was missed. Strangely enough, a few of your replies didn't 
reach my inbox and I just found them in the web archives:  
https://www.spinics.net/lists/linux-rdma/msg97973.html, and 
https://www.spinics.net/lists/linux-rdma/msg98133.html

I will add a check to ibv_dereg_mr() to avoid calling ibv_dofork_range() for
the dmabuf case.

Thanks a lot for bringing this up again.

Jianxin
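
As a toy model of the asymmetry being fixed (the names below are stand-ins, not the real libibverbs implementation): regular MRs go through ibv_dontfork_range() at registration and need the inverse call at deregistration, while dma-buf MRs never set that state up:

```python
class MR:
    """Toy model of a verbs memory region. 'is_dmabuf' marks MRs created
    through the dma-buf registration path, which never called
    dontfork_range() on a CPU mapping at registration time."""
    def __init__(self, is_dmabuf=False):
        self.is_dmabuf = is_dmabuf

def dereg_mr(mr):
    """Sketch of the planned check: only undo the MADV_DONTFORK
    bookkeeping for MRs that actually set it up. Returns the list of
    cleanup steps taken, for illustration."""
    undo_calls = []
    if not mr.is_dmabuf:
        undo_calls.append("dofork_range")  # stands in for ibv_dofork_range()
    return undo_calls
```

Without the guard, deregistering a dma-buf MR would undo fork protection that was never applied, which is the imbalance Yishai pointed out.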
  




RE: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-01-11 Thread Xiong, Jianxin
> -Original Message-
> From: Alex Deucher 
> Sent: Monday, January 11, 2021 9:47 AM
> To: Xiong, Jianxin 
> Cc: Jason Gunthorpe ; Leon Romanovsky ; 
> linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin  
> wrote:
> >
> > > -Original Message-
> > > From: Jason Gunthorpe 
> > > Sent: Monday, January 11, 2021 7:43 AM
> > > To: Xiong, Jianxin 
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford ; Leon Romanovsky
> > > ; Sumit Semwal ; Christian
> > > Koenig ; Vetter, Daniel
> > > 
> > > Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> > >
> > > On Mon, Jan 11, 2021 at 03:24:18PM +, Xiong, Jianxin wrote:
> > > > Jason, will this series be able to get into 5.12?
> > >
> > > I was going to ask you where things are after the break?
> > >
> > > Did everyone agree the userspace stuff is OK now? Is Edward OK with
> > > the pyverbs changes, etc
> > >
> >
> > There is no new comment on both the kernel and userspace series. I
> > assume silence means no objection. I will ask for opinions on the userspace 
> > thread.
> 
> Do you have a link to the userspace thread?
> 
https://www.spinics.net/lists/linux-rdma/msg98135.html



RE: [PATCH rdma-core v4 0/6] Add user space dma-buf support

2021-01-11 Thread Xiong, Jianxin
Does this series look fine now? I would like to try resolving any remaining
issues so we can catch the window for kernel 5.12.

Edward, do you have more opinions on the pyverbs related changes?

Thanks,
Jianxin

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Wednesday, December 02, 2020 12:57 PM
> To: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> ; Edward Srouji 
> Subject: [PATCH rdma-core v4 0/6] Add user space dma-buf support
> 
> This is the fourth version of the patch series. Change log:
> 
> v4:
> * Rework the cmake function rdma_cython_module to support both single
>   source (.pyx) and multiple source (.pyx + [.c]*) scenarios instead
>   of using two separate functions
> * Rename 'dri_*' to 'drm_*' for the dmabuf allocation interface
> * Add option to dmabuf allocation routine to allow allocation from GTT
>   instead of VRAM
> * Add proper CPU access flags when allocating dmabufs
> * Remove 'radeon' driver support from the dmabuf allocation routines
> * Add command line arguments to the tests for selecting GPU unit and
>   setting the option for allocating from GTT
> 
> v3: https://www.spinics.net/lists/linux-rdma/msg98059.html
> * Add parameter 'iova' to the new ibv_reg_dmabuf_mr() API
> * Change the way of allocating dma-buf object - use /dev/dri/renderD*
>   instead of /dev/dri/card* and use GEM object instead of dumb buffer
> * Add cmake function to allow building modules with mixed cython and C
>   source files
> * Add new tests that use dma-buf MRs for send/recv and rdma traffic
> * Skip dma-buf tests on unsupported systems
> * Remove some use of random values in the new tests
> * Add dealloc() and close() methods to the new classes
> * Replace string.format with f-string in python code
> * Fix some coding style issues: spacing, indentation, typo, comments
> 
> v2: https://www.spinics.net/lists/linux-rdma/msg97936.html
> * Put the kernel header updates into a separate commit
> * Add comments for the data structure used in python ioctl calls
> * Fix issues related to symbol versioning
> * Fix styling issues: extra spaces, unnecessary variable, typo
> * Fix an improper error code usage
> * Put the new op into ibv_context_ops instead of verbs_context
> 
> v1: https://www.spinics.net/lists/linux-rdma/msg97865.html
> * Add user space API for registering dma-buf based memory regions
> * Update pyverbs with the new API
> * Add new tests
> 
> This is the user space counterpart of the kernel patch set to add dma-buf 
> support to the RDMA subsystem.
> 
> This series consists of six patches. The first patch updates the kernel 
> headers for dma-buf support. Patch 2 adds the new API function and
> updates the man pages. Patch 3 implements the new API in the mlx5 provider. 
> Patch 4 adds new class definitions to pyverbs for the new API.
> Patch 5 adds a set of new tests for the new API. Patch 6 fixes a bug in the 
> utility code of the tests.
> 
> Pull request at github: https://github.com/linux-rdma/rdma-core/pull/895
> 
> Jianxin Xiong (6):
>   Update kernel headers
>   verbs: Support dma-buf based memory region
>   mlx5: Support dma-buf based memory region
>   pyverbs: Add dma-buf based MR support
>   tests: Add tests for dma-buf based memory regions
>   tests: Bug fix for get_access_flags()
> 
>  buildlib/pyverbs_functions.cmake |  78 ++---
>  debian/libibverbs1.symbols   |   2 +
>  kernel-headers/rdma/ib_user_ioctl_cmds.h |  14 ++
>  kernel-headers/rdma/ib_user_verbs.h  |  14 --
>  libibverbs/CMakeLists.txt|   2 +-
>  libibverbs/cmd_mr.c  |  38 +
>  libibverbs/driver.h  |   7 +
>  libibverbs/dummy_ops.c   |  11 ++
>  libibverbs/libibverbs.map.in |   6 +
>  libibverbs/man/ibv_reg_mr.3  |  27 ++-
>  libibverbs/verbs.c   |  18 ++
>  libibverbs/verbs.h   |  11 ++
>  providers/mlx5/mlx5.c|   2 +
>  providers/mlx5/mlx5.h|   3 +
>  providers/mlx5/verbs.c   |  22 +++
>  pyverbs/CMakeLists.txt   |  11 +-
>  pyverbs/dmabuf.pxd   |  15 ++
>  pyverbs/dmabuf.pyx   |  73 
>  pyverbs/dmabuf_alloc.c   | 278 +++
>  pyverbs/dmabuf_alloc.h   |  19 +++
>  pyverbs/libibverbs.pxd   |   2 +
>  pyverbs/mr.pxd   |   6 +
>  pyverbs/mr.pyx   

RE: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-01-11 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, January 11, 2021 7:43 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On Mon, Jan 11, 2021 at 03:24:18PM +, Xiong, Jianxin wrote:
> > Jason, will this series be able to get into 5.12?
> 
> I was going to ask you where things are after the break?
> 
> Did everyone agree the userspace stuff is OK now? Is Edward OK with the 
> pyverbs changes, etc.?
> 

There is no new comment on both the kernel and userspace series. I assume
silence means no objection. I will ask for opinions on the userspace thread.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-01-11 Thread Xiong, Jianxin
Jason, will this series be able to get into 5.12?

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Tuesday, December 15, 2020 1:27 PM
> To: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> This is the sixteenth version of the patch set. Changelog:
> 
> v16:
> * Add "select DMA_SHARED_BUFFER" to Kconfig when IB UMEM is enabled.
>   This fixes the auto build test error with a random config.
> 
> v15: https://www.spinics.net/lists/linux-rdma/msg98369.html
> * Rebase to the latest linux-rdma 'for-next' branch (commit 0583531bb9ef)
>   to pick up RDMA core and mlx5 updates
> * Let ib_umem_dmabuf_get() return 'struct ib_umem_dmabuf *' instead of
>   'struct ib_umem *'
> * Move the check of on demand paging support to mlx5_ib_reg_user_mr_dmabuf()
> * Check iova alignment at the entry point of the uverb command so that
>   mlx5_umem_dmabuf_default_pgsz() can always succeed
> 
> v14: https://www.spinics.net/lists/linux-rdma/msg98265.html
> * Check return value of dma_fence_wait()
> * Fix a dma-buf leak in ib_umem_dmabuf_get()
> * Fix return value type cast for ib_umem_dmabuf_get()
> * Return -EOPNOTSUPP instead of -EINVAL for unimplemented functions
> * Remove an unnecessary use of unlikely()
> * Remove a left-over commit message that resulted from rebase
> 
> v13: https://www.spinics.net/lists/linux-rdma/msg98227.html
> * Rebase to the latest linux-rdma 'for-next' branch (5.10.0-rc6+)
> * Check for device on-demand paging capability at the entry point of
>   the new verbs command to avoid calling device's reg_user_mr_dmabuf()
>   method when CONFIG_INFINIBAND_ON_DEMAND_PAGING is disabled.
> 
> v12: https://www.spinics.net/lists/linux-rdma/msg97943.html
> * Move the prototype of function ib_umem_dmabuf_release() to ib_umem.h
>   and remove umem_dmabuf.h
> * Break a line that is too long
> 
> v11: https://www.spinics.net/lists/linux-rdma/msg97860.html
> * Rework the parameter checking code inside ib_umem_dmabuf_get()
> * Fix incorrect error handling in the new verbs command handler
> * Put a duplicated code sequence for checking iova and setting page size
>   into a function
> * In the invalidation callback, check for if the buffer has been mapped
>   and thus the presence of a valid driver mr is ensured
> * The patch that checks for dma_virt_ops is dropped because it is no
>   longer needed
> * The patch that documents that dma-buf size is fixed has landed at:
>   https://cgit.freedesktop.org/drm/drm-misc/commit/?id=476b485be03c
>   and thus is no longer included here
> * The matching user space patch set is sent separately
> 
> v10: https://www.spinics.net/lists/linux-rdma/msg97483.html
> * Don't map the pages in ib_umem_dmabuf_get(); use the size information
>   of the dma-buf object to validate the umem size instead
> * Use PAGE_SIZE directly instead of use ib_umem_find_best_pgsz() when
>   the MR is created since the pages have not been mapped yet and dma-buf
>   requires PAGE_SIZE anyway
> * Always call mlx5_umem_find_best_pgsz() after mapping the pages to
>   verify that the page size requirement is satisfied
> * Add a patch to document that dma-buf size is fixed
> 
> v9: https://www.spinics.net/lists/linux-rdma/msg97432.html
> * Clean up the code for sg list in-place modification
> * Prevent dma-buf pages from being mapped multiple times
> * Map the pages in ib_umem_dmabuf_get() so that improper values of
>   address/length/iova can be caught early
> * Check for unsupported flags in the new uverbs command
> * Add missing uverbs_finalize_uobj_create()
> * Sort uverbs objects by name
> * Fix formatting issue -- unnecessary alignment of '='
> * Unmap pages in mlx5_ib_fence_dmabuf_mr()
> * Remove address range checking from pagefault_dmabuf_mr()
> 
> v8: https://www.spinics.net/lists/linux-rdma/msg97370.html
> * Modify the dma-buf sg list in place to get a proper umem sg list and
>   restore it before calling dma_buf_unmap_attachment()
> * Validate the umem sg list with ib_umem_find_best_pgsz()
> * Remove the logic for slicing the sg list at runtime
> 
> v7: https://www.spinics.net/lists/linux-rdma/msg97297.html
> * Rebase on top of latest mlx5 MR patch series
> * Slice dma-buf sg list at runtime instead of creating a new list
> * Preload the buffer page mapping when the MR is created
> * Move the 'dma_virt_ops' check into dma_buf_dynamic_attach()
> 
> v6: https://www.spinics.net/lists/linux-rdma/msg96923.html
> * Move the dma-buf in

RE: [PATCH v14 4/4] RDMA/mlx5: Support dma-buf based userspace memory region

2020-12-09 Thread Xiong, Jianxin
> -Original Message-
> From: Leon Romanovsky 
> Sent: Tuesday, December 08, 2020 10:45 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel 
> Subject: Re: [PATCH v14 4/4] RDMA/mlx5: Support dma-buf based userspace 
> memory region
> 
> On Tue, Dec 08, 2020 at 02:39:15PM -0800, Jianxin Xiong wrote:
> > Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the
> > core functions to import dma-buf based memory region and update the 
> > mappings.
> >
> > Add code to handle dma-buf related page fault.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> > ---
> >  drivers/infiniband/hw/mlx5/main.c|   2 +
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h |  18 +
> >  drivers/infiniband/hw/mlx5/mr.c  | 128 +--
> >  drivers/infiniband/hw/mlx5/odp.c |  86 +--
> >  4 files changed, 225 insertions(+), 9 deletions(-)
> 
> <...>
> 
> >
> > +
> > +   umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, 
> > access_flags,
> > + &mlx5_ib_dmabuf_attach_ops);
> > +   if (IS_ERR(umem)) {
> > +   mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +   return ERR_PTR(PTR_ERR(umem));
> 
> return ERR_CAST(umem);
> 
> > +   }
> 
> <...>
> 
> > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +   err = ib_umem_dmabuf_map_pages(umem_dmabuf);
> > +   if (!err) {
> > +   page_size = mlx5_umem_find_best_pgsz(&umem_dmabuf->umem, mkc,
> > +log_page_size, 0,
> > +umem_dmabuf->umem.iova);
> > +   if (unlikely(page_size < PAGE_SIZE)) {
> > +   ib_umem_dmabuf_unmap_pages(umem_dmabuf);
> > +   err = -EINVAL;
> > +   } else {
> > +   err = mlx5_ib_update_mr_pas(mr, xlt_flags);
> > +   }
> > +   }
> > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> 
> Let's write this section in kernel coding style, please
> 
> dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL); err = 
> ib_umem_dmabuf_map_pages(umem_dmabuf);
> if (err) {
>   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
>   return err;
> }
> .
> 

Thanks a lot. Will fix these.


RE: [PATCH v14 4/4] RDMA/mlx5: Support dma-buf based userspace memory region

2020-12-09 Thread Xiong, Jianxin
> -Original Message-
> From: Yishai Hadas 
> Sent: Tuesday, December 08, 2020 11:13 PM
> To: Xiong, Jianxin 
> Cc: Doug Ledford ; Jason Gunthorpe ; Leon 
> Romanovsky ; Sumit Semwal
> ; Christian Koenig ; 
> Vetter, Daniel ; linux-
> r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Yishai Hadas 
> 
> Subject: Re: [PATCH v14 4/4] RDMA/mlx5: Support dma-buf based userspace 
> memory region
> 
> On 12/9/2020 12:39 AM, Jianxin Xiong wrote:
> > Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the
> > core functions to import dma-buf based memory region and update the 
> > mappings.
> >
> > Add code to handle dma-buf related page fault.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> > ---
> >   drivers/infiniband/hw/mlx5/main.c|   2 +
> >   drivers/infiniband/hw/mlx5/mlx5_ib.h |  18 +
> >   drivers/infiniband/hw/mlx5/mr.c  | 128 +--
> >   drivers/infiniband/hw/mlx5/odp.c |  86 +--
> >   4 files changed, 225 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/main.c
> > b/drivers/infiniband/hw/mlx5/main.c
> > index 4a054eb..c025746 100644
> > --- a/drivers/infiniband/hw/mlx5/main.c
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -1,6 +1,7 @@
> >   // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> >   /*
> >* Copyright (c) 2013-2020, Mellanox Technologies inc. All rights 
> > reserved.
> > + * Copyright (c) 2020, Intel Corporation. All rights reserved.
> >*/
> >
> >   #include 
> > @@ -4069,6 +4070,7 @@ static int mlx5_ib_enable_driver(struct ib_device 
> > *dev)
> > .query_srq = mlx5_ib_query_srq,
> > .query_ucontext = mlx5_ib_query_ucontext,
> > .reg_user_mr = mlx5_ib_reg_user_mr,
> > +   .reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
> > .req_notify_cq = mlx5_ib_arm_cq,
> > .rereg_user_mr = mlx5_ib_rereg_user_mr,
> > .resize_cq = mlx5_ib_resize_cq,
> > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > index 718e59f..6f4d1b4 100644
> > --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > @@ -1,6 +1,7 @@
> >   /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> >   /*
> >* Copyright (c) 2013-2020, Mellanox Technologies inc. All rights 
> > reserved.
> > + * Copyright (c) 2020, Intel Corporation. All rights reserved.
> >*/
> >
> >   #ifndef MLX5_IB_H
> > @@ -704,6 +705,12 @@ static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
> >mr->umem->is_odp;
> >   }
> >
> > +static inline bool is_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > +   return IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && mr->umem &&
> > +  mr->umem->is_dmabuf;
> > +}
> > +
> 
> 
> Didn't we agree that IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)
> should be checked earlier
> 
> upon mlx5_ib_reg_user_mr_dmabuf () to fully prevent the functionality when 
> ODP is not supported  ? see note on previous versions.

Yes, it's now checked at the entry point of the uverbs command for registering
a dmabuf MR (see the following lines extracted from patch 0003 of this set).
The check is on the device's on-demand paging capability, which is cleared when
CONFIG_INFINIBAND_ON_DEMAND_PAGING is not enabled.
 
+   if (!(pd->device->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
+   return -EOPNOTSUPP;

> 
> 
> >   struct mlx5_ib_mw {
> > struct ib_mwibmw;
> > struct mlx5_core_mkey   mmkey;
> > @@ -1239,6 +1246,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const 
> > struct ib_cq_init_attr *attr,
> >   struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
> >   u64 virt_addr, int access_flags,
> >   struct ib_udata *udata);
> > +struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
> > +u64 length, u64 virt_addr,
> > +int fd, int access_flags,
> > +struct ib_udata *udata);
> >   int mlx5_ib_advise_mr(struct ib_pd *pd,
> >   enum ib_uverbs_advise_mr_advice advice,
> >   u32 flags,
> > @@ -1249,1

RE: [PATCH v14 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-12-09 Thread Xiong, Jianxin
> -Original Message-
> From: Leon Romanovsky 
> Sent: Tuesday, December 08, 2020 10:31 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel 
> Subject: Re: [PATCH v14 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Tue, Dec 08, 2020 at 02:39:12PM -0800, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> > ---
> >  drivers/infiniband/core/Makefile  |   2 +-
> >  drivers/infiniband/core/umem.c|   3 +
> >  drivers/infiniband/core/umem_dmabuf.c | 174 ++
> >  include/rdma/ib_umem.h|  47 -
> >  4 files changed, 222 insertions(+), 4 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> 
> <...>
> 
> > +int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) {
> > +   struct sg_table *sgt;
> > +   struct scatterlist *sg;
> > +   struct dma_fence *fence;
> > +   unsigned long start, end, cur = 0;
> > +   unsigned int nmap = 0;
> > +   int i;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (umem_dmabuf->sgt)
> > +   goto wait_fence;
> > +
> > +   sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
> > +   if (IS_ERR(sgt))
> > +   return PTR_ERR(sgt);
> > +
> > +   /* modify the sg list in-place to match umem address and length */
> > +
> > +   start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
> > +   end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
> > +   PAGE_SIZE);
> > +   for_each_sgtable_dma_sg(sgt, sg, i) {
> > +   if (start < cur + sg_dma_len(sg) && cur < end)
> > +   nmap++;
> > +   if (cur <= start && start < cur + sg_dma_len(sg)) {
> > +   unsigned long offset = start - cur;
> > +
> > +   umem_dmabuf->first_sg = sg;
> > +   umem_dmabuf->first_sg_offset = offset;
> > +   sg_dma_address(sg) += offset;
> > +   sg_dma_len(sg) -= offset;
> > +   cur += offset;
> > +   }
> > +   if (cur < end && end <= cur + sg_dma_len(sg)) {
> > +   unsigned long trim = cur + sg_dma_len(sg) - end;
> > +
> > +   umem_dmabuf->last_sg = sg;
> > +   umem_dmabuf->last_sg_trim = trim;
> > +   sg_dma_len(sg) -= trim;
> > +   break;
> > +   }
> > +   cur += sg_dma_len(sg);
> > +   }
> > +
> > +   umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
> > +   umem_dmabuf->umem.sg_head.nents = nmap;
> > +   umem_dmabuf->umem.nmap = nmap;
> > +   umem_dmabuf->sgt = sgt;
> > +
> > +wait_fence:
> > +   /*
> > +* Although the sg list is valid now, the content of the pages
> > +* may be not up-to-date. Wait for the exporter to finish
> > +* the migration.
> > +*/
> > +   fence = dma_resv_get_excl(umem_dmabuf->attach->dmabuf->resv);
> > +   if (fence)
> > +   return dma_fence_wait(fence, false);
> 
> You called to dma_buf_map_attachment() earlier in this function, so if you 
> return an error here, the dma_buf won't be unmapped in
> pagefault_dmabuf_mr()

Yes, that's by design. The next time ib_umem_dmabuf_map_pages() is called,
dma_buf_map_attachment() will be skipped and dma_fence_wait() is retried.

> 
> Thanks


RE: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-12-08 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, December 08, 2020 10:59 AM
> To: Xiong, Jianxin 
> Cc: Leon Romanovsky ; linux-r...@vger.kernel.org; 
> dri-devel@lists.freedesktop.org; Doug Ledford
> ; Sumit Semwal ; Christian 
> Koenig ; Vetter, Daniel
> 
> Subject: Re: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Tue, Dec 08, 2020 at 06:13:20PM +, Xiong, Jianxin wrote:
> 
> > > > +static inline struct ib_umem *ib_umem_dmabuf_get(struct ib_device 
> > > > *device,
> > > > +unsigned long offset,
> > > > +size_t size, int fd,
> > > > +int access,
> > > > +struct 
> > > > dma_buf_attach_ops *ops) {
> > > > +   return ERR_PTR(-EINVAL);
> > >
> > > Probably, It should be EOPNOTSUPP and not EINVAL.
> >
> > EINVAL is used here to be consistent with existing definitions in the same 
> > file.
> 
> They may be wrong, EOPNOTSUPP is right for this situation

Ok, let me change all of them to EOPNOTSUPP.



RE: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-12-08 Thread Xiong, Jianxin
> -Original Message-
> From: Leon Romanovsky 
> Sent: Monday, December 07, 2020 11:06 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel 
> Subject: Re: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Mon, Dec 07, 2020 at 02:15:50PM -0800, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> >
> > Conflicts:
> > include/rdma/ib_umem.h
> 
> This probably leftover from rebase, am I right?
> 
> > ---
> >  drivers/infiniband/core/Makefile  |   2 +-
> >  drivers/infiniband/core/umem.c|   3 +
> >  drivers/infiniband/core/umem_dmabuf.c | 173 ++
> >  include/rdma/ib_umem.h|  43 -
> >  4 files changed, 219 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> >
> > diff --git a/drivers/infiniband/core/Makefile
> > b/drivers/infiniband/core/Makefile
> > index ccf2670..8ab4eea 100644
> > --- a/drivers/infiniband/core/Makefile
> > +++ b/drivers/infiniband/core/Makefile
> > @@ -40,5 +40,5 @@ ib_uverbs-y :=uverbs_main.o 
> > uverbs_cmd.o uverbs_marshall.o \
> > uverbs_std_types_srq.o \
> > uverbs_std_types_wq.o \
> > uverbs_std_types_qp.o
> > -ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> > +ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
> >  ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o diff
> > --git a/drivers/infiniband/core/umem.c
> > b/drivers/infiniband/core/umem.c index 7ca4112..cc131f8 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -2,6 +2,7 @@
> >   * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> >   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> >   * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> >   *
> >   * This software is available to you under a choice of one of two
> >   * licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -278,6 +279,8 @@ void ib_umem_release(struct ib_umem *umem)  {
> > if (!umem)
> > return;
> > +   if (umem->is_dmabuf)
> > +   return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
> > if (umem->is_odp)
> > return ib_umem_odp_release(to_ib_umem_odp(umem));
> >
> > diff --git a/drivers/infiniband/core/umem_dmabuf.c
> > b/drivers/infiniband/core/umem_dmabuf.c
> > new file mode 100644
> > index 000..e50b955
> > --- /dev/null
> > +++ b/drivers/infiniband/core/umem_dmabuf.c
> > @@ -0,0 +1,173 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> > +/*
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "uverbs.h"
> > +
> > +int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) {
> > +   struct sg_table *sgt;
> > +   struct scatterlist *sg;
> > +   struct dma_fence *fence;
> > +   unsigned long start, end, cur;
> > +   unsigned int nmap;
> > +   int i;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (umem_dmabuf->sgt)
> > +   return 0;
> > +
> > +   sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
> > +   if (IS_ERR(sgt))
> > +   return PTR_ERR(sgt);
> > +
> > +   /* modify the sg list in-place to match umem address and length *

RE: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-12-07 Thread Xiong, Jianxin
> -Original Message-
> From: Leon Romanovsky 
> Sent: Monday, December 07, 2020 11:06 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel 
> Subject: Re: [PATCH v13 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Mon, Dec 07, 2020 at 02:15:50PM -0800, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> >
> > Conflicts:
> > include/rdma/ib_umem.h
> 
> This probably leftover from rebase, am I right?

Right. Should have removed it.

> 
> > ---
> >  drivers/infiniband/core/Makefile  |   2 +-
> >  drivers/infiniband/core/umem.c|   3 +
> >  drivers/infiniband/core/umem_dmabuf.c | 173 ++
> >  include/rdma/ib_umem.h|  43 -
> >  4 files changed, 219 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> >
> > diff --git a/drivers/infiniband/core/Makefile
> > b/drivers/infiniband/core/Makefile
> > index ccf2670..8ab4eea 100644
> > --- a/drivers/infiniband/core/Makefile
> > +++ b/drivers/infiniband/core/Makefile
> > @@ -40,5 +40,5 @@ ib_uverbs-y :=uverbs_main.o 
> > uverbs_cmd.o uverbs_marshall.o \
> > uverbs_std_types_srq.o \
> > uverbs_std_types_wq.o \
> > uverbs_std_types_qp.o
> > -ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> > +ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
> >  ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o diff
> > --git a/drivers/infiniband/core/umem.c
> > b/drivers/infiniband/core/umem.c index 7ca4112..cc131f8 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -2,6 +2,7 @@
> >   * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> >   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> >   * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> >   *
> >   * This software is available to you under a choice of one of two
> >   * licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -278,6 +279,8 @@ void ib_umem_release(struct ib_umem *umem)  {
> > if (!umem)
> > return;
> > +   if (umem->is_dmabuf)
> > +   return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
> > if (umem->is_odp)
> > return ib_umem_odp_release(to_ib_umem_odp(umem));
> >
> > diff --git a/drivers/infiniband/core/umem_dmabuf.c
> > b/drivers/infiniband/core/umem_dmabuf.c
> > new file mode 100644
> > index 000..e50b955
> > --- /dev/null
> > +++ b/drivers/infiniband/core/umem_dmabuf.c
> > @@ -0,0 +1,173 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> > +/*
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "uverbs.h"
> > +
> > +int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) {
> > +   struct sg_table *sgt;
> > +   struct scatterlist *sg;
> > +   struct dma_fence *fence;
> > +   unsigned long start, end, cur;
> > +   unsigned int nmap;
> > +   int i;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (umem_dmabuf->sgt)
> > +   return 0;
> > +
> > +   sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
> > +   if (IS_ERR(sgt))
> > +   return PTR_ERR(sgt);
> > +
> > +  

RE: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support

2020-12-01 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, December 01, 2020 4:39 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 30, 2020 at 05:53:39PM +, Xiong, Jianxin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Monday, November 30, 2020 8:08 AM
> > > To: Xiong, Jianxin 
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford ; Leon Romanovsky
> > > ; Sumit Semwal ; Christian
> > > Koenig ; Vetter, Daniel
> > > 
> > > Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR
> > > support
> > >
> > > On Fri, Nov 27, 2020 at 12:55:41PM -0800, Jianxin Xiong wrote:
> > > >
> > > > +function(rdma_multifile_module PY_MODULE MODULE_NAME
> > > > +LINKER_FLAGS)
> > >
> > > I think just replace rdma_cython_module with this? No good reason I can 
> > > see to have two APIs?
> >
> > rdma_cython_module can handle many modules, but this one is for a single 
> > module.
> > If you agree, I can merge the two by slightly tweaking the logic: each
> > module starts with a .pyx file, followed by 0 or more .c and .h files.
> 
> Then have rdma_cython_module call some rdam_single_cython_module() multiple 
> times that has this code below?

Mostly like that. Here is an outline:

function(build_one_module PY_MODULE MODULE_NAME ALL_CFILES)
string(REGEX REPLACE "\\.so$" "" SONAME "${MODULE_NAME}${CMAKE_PYTHON_SO_SUFFIX}")
add_library(..)
set_target_properties(..)
target_link_libraries(..)
install(..)
endfunction()

function(rdma_cython_module ...)
foreach(SRC_FILE ${ARGN})
.. # commands to parse file name
if (${EXT} STREQUAL ".pyx")
if (ALL_CFILES AND MODULE_NAME)
build_one_module(${PY_MODULE} ${MODULE_NAME} ${ALL_CFILES})
set(ALL_CFILES "")
set(MODULE_NAME "")
endif()
.. # commands to convert .pyx to .c
set(ALL_CFILES "${ALL_CFILES};${CFILE}")
elseif (${EXT} STREQUAL ".c")
..
set(ALL_CFILES "${ALL_CFILES};${CFILE}")
else()
continue()
endif()
endforeach()
if (ALL_CFILES AND MODULE_NAME)
build_one_module(${PY_MODULE} ${MODULE_NAME} ${ALL_CFILES})
endif()
endfunction()

> 
> > > Here too? You probably don't need to specify h files at all, at
> > > worst they should only be used with publish_internal_headers
> >
> > Without the .h link, the compiler fail to find the header file (both
> > dmabuf_alloc.c and the generated "dmabuf.c" contain #include
> > "dmabuf_alloc.h").
> 
> Header files are made 'cross module' using the "publish_internal_headers" 
> command
> 
> But we could also hack in a -I directive to fix up the "" include for the 
> cython outupt..
> 
> But it should not be handled here in the cython module command

Sure. That can be fixed.

> 
> Jason


RE: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support

2020-11-30 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Monday, November 30, 2020 6:58 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Leon 
> Romanovsky ; Jason Gunthorpe ;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
> 
> > +cdef class DmaBuf:
> > +def __init__(self, size, unit=0):
> > +"""
> > +Allocate DmaBuf object from a GPU device. This is done through the
> > +DRI device interface. Usually this requires the effective user id
> > +being a member of the 'render' group.
> > +:param size: The size (in number of bytes) of the buffer.
> > +:param unit: The unit number of the GPU to allocate the buffer 
> > from.
> > +:return: The newly created DmaBuf object on success.
> > +"""
> > +self.dmabuf_mrs = weakref.WeakSet()
> > +self.dmabuf = dmabuf_alloc(size, unit)
> > +if self.dmabuf == NULL:
> > +raise PyverbsRDMAErrno(f'Failed to allocate dmabuf of size 
> > {size} on unit {unit}')
> > +self.dri_fd = dmabuf_get_dri_fd(self.dmabuf)
> 
> dri_fd seems unused by the tests

It's used by the read/write methods of the DmaBufMR class for performing mmap.




RE: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support

2020-11-30 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Monday, November 30, 2020 8:56 AM
> To: Jason Gunthorpe 
> Cc: Daniel Vetter ; Xiong, Jianxin 
> ; linux-r...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Leon Romanovsky ; Doug Ledford 
> ; Vetter, Daniel
> ; Christian Koenig 
> Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 30, 2020 at 12:36:42PM -0400, Jason Gunthorpe wrote:
> > On Mon, Nov 30, 2020 at 05:04:43PM +0100, Daniel Vetter wrote:
> > > On Mon, Nov 30, 2020 at 11:55:44AM -0400, Jason Gunthorpe wrote:
> > > > On Mon, Nov 30, 2020 at 03:57:41PM +0100, Daniel Vetter wrote:
> > > > > > +   err = ioctl(dri->fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &gem_create);
> > > > > > +   if (err)
> > > > > > +   return err;
> > > > > > +
> > > > > > +   *handle = gem_create.out.handle;
> > > > > > +   return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int radeon_alloc(struct dri *dri, size_t size,
> > > > > > +uint32_t *handle)
> > > > >
> > > > > Tbh radeon chips are old enough I wouldn't care. Also doesn't
> > > > > support p2p dma-buf, so always going to be in system memory when
> > > > > you share. Plus you also need some more flags like I suggested above 
> > > > > I think.
> > > >
> > > > What about nouveau?
> > >
> > > Reallistically chances that someone wants to use rdma together with
> > > the upstream nouveau driver are roughly nil. Imo also needs someone
> > > with the right hardware to make sure it works (since the flags are
> > > all kinda arcane driver specific stuff testing is really needed).
> >
> > Well, it would be helpful if we can test the mlx5 part of the
> > implementation, and I have a lab stocked with nouveau compatible HW..
> >
> > But you are right someone needs to test/etc, so this does not seem
> > like Jianxin should worry
> 
> Ah yes sounds good. I can help with trying to find how to allocate vram with 
> nouveau if you don't find it. Caveat is that nouveau doesn't do
> dynamic dma-buf exports and hence none of the interesting flows and also not 
> p2p. Not sure how much work it would be to roll that out (iirc
> it wasn't that much amdgpu code really, just endless discussions on the 
> interface semantics and how to roll it out without breaking any of
> the existing dma-buf users).
> 
> Another thing that just crossed my mind: Do we have a testcase for forcing 
> the eviction? Should be fairly easy to provoke with something
> like this:
> 
> - register vram-only buffer with mlx5 and do something that binds it
> - allocate enough vram-only buffers to overfill vram (again figuring out
>   how much vram you have is driver specific)
> - touch each buffer with mmap. that should force the mlx5 buffer out. it
>   might be that eviction isn't lru but preferentially idle buffers (i.e.
>   not used by hw, so anything register to mlx5 won't qualify as first
>   victims). so we might need to instead register a ton of buffers with
>   mlx5 and access them through ibverbs
> - do something with mlx5 again to force the rebinding and test it all
>   keeps working
> 
> That entire invalidate/buffer move flow is the most complex interaction I 
> think.

Right now on my side the evict scenario is tested with the "timeout" feature of
the AMD GPU: the driver moves all VRAM allocations out to system memory after a
certain period of "inactivity" (10s by default). VRAM being accessed by peer DMA
is not counted as activity from the GPU's POV. I can observe the
invalidation/remapping sequence by running an RDMA test for a long enough time.

I agree having a more generic mechanism to force this scenario is going to be useful.
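As a rough sketch of that recipe (untested against any driver): the DmaBuf and
DmaBufMR names come from this series, while the VRAM-size query and the step
that binds the MR to hardware are assumptions left to driver-specific code.
Only the sizing helper below is generic.

```python
# Sketch of the forced-eviction recipe discussed above. DmaBuf/DmaBufMR
# follow the pyverbs naming from this series; everything driver-specific
# (finding vram_bytes, forcing a bind) is assumed to exist elsewhere.

def buffers_to_overfill(vram_bytes, buf_bytes):
    """How many buf_bytes-sized VRAM buffers it takes to exceed vram_bytes,
    so the driver has to evict something (ideally the mlx5-registered MR)."""
    return vram_bytes // buf_bytes + 1

def force_eviction(pd, vram_bytes, buf_bytes=1 << 20):
    from pyverbs.dmabuf import DmaBuf      # from this patch series
    from pyverbs.mr import DmaBufMR
    # 1. register a VRAM-only buffer with mlx5 and do something that binds it
    mr = DmaBufMR(pd, buf_bytes, 0)
    # 2. allocate enough VRAM-only buffers to overfill VRAM
    n = buffers_to_overfill(vram_bytes, buf_bytes)
    spoilers = [DmaBuf(buf_bytes) for _ in range(n)]
    # 3. touching each spoiler (e.g. mmap via its map_offset) should push
    #    the registered buffer out; RDMA traffic on mr then exercises the
    #    invalidate/rebind flow.
    return mr, spoilers
```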

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support

2020-11-30 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, November 30, 2020 8:08 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
> 
> On Fri, Nov 27, 2020 at 12:55:41PM -0800, Jianxin Xiong wrote:
> >
> > +function(rdma_multifile_module PY_MODULE MODULE_NAME LINKER_FLAGS)
> 
> I think just replace rdma_cython_module with this? No good reason I can see 
> to have two APIs?

rdma_cython_module can handle many modules, but this one is for a single module.
If you agree, I can merge the two by slightly tweaking the logic: each module
starts with a .pyx file, followed by zero or more .c and .h files.

> 
> > +  set(ALL_CFILES "")
> > +  foreach(SRC_FILE ${ARGN})
> > +get_filename_component(FILENAME ${SRC_FILE} NAME_WE)
> > +get_filename_component(DIR ${SRC_FILE} DIRECTORY)
> > +get_filename_component(EXT ${SRC_FILE} EXT)
> > +if (DIR)
> > +  set(SRC_PATH "${CMAKE_CURRENT_SOURCE_DIR}/${DIR}")
> > +else()
> > +  set(SRC_PATH "${CMAKE_CURRENT_SOURCE_DIR}")
> > +endif()
> > +if (${EXT} STREQUAL ".pyx")
> > +  set(PYX "${SRC_PATH}/${FILENAME}.pyx")
> > +  set(CFILE "${CMAKE_CURRENT_BINARY_DIR}/${FILENAME}.c")
> > +  include_directories(${PYTHON_INCLUDE_DIRS})
> > +  add_custom_command(
> > +OUTPUT "${CFILE}"
> > +MAIN_DEPENDENCY "${PYX}"
> > +COMMAND ${CYTHON_EXECUTABLE} "${PYX}" -o "${CFILE}"
> > +"-I${PYTHON_INCLUDE_DIRS}"
> > +COMMENT "Cythonizing ${PYX}"
> > +  )
> > +  set(ALL_CFILES "${ALL_CFILES};${CFILE}")
> > +elseif(${EXT} STREQUAL ".c")
> > +  set(CFILE_ORIG "${SRC_PATH}/${FILENAME}.c")
> > +  set(CFILE "${CMAKE_CURRENT_BINARY_DIR}/${FILENAME}.c")
> > +  if (NOT ${CFILE_ORIG} STREQUAL ${CFILE})
> > +rdma_create_symlink("${CFILE_ORIG}" "${CFILE}")
> > +  endif()
> 
> Why does this need the create_symlink? The compiler should work OK from the 
> source file?

You are right, the link for .c is not necessary, but the link for .h is needed.

> 
> > +  set(ALL_CFILES "${ALL_CFILES};${CFILE}")
> > +elseif(${EXT} STREQUAL ".h")
> > +  set(HFILE_ORIG "${SRC_PATH}/${FILENAME}.h")
> > +  set(HFILE "${CMAKE_CURRENT_BINARY_DIR}/${FILENAME}.h")
> > +  if (NOT ${HFILE_ORIG} STREQUAL ${HFILE})
> > +rdma_create_symlink("${HFILE_ORIG}" "${HFILE}")
> 
> Here too? You probably don't need to specify h files at all, at worst they 
> should only be used with publish_internal_headers

Without the .h link, the compiler fails to find the header file (both
dmabuf_alloc.c and the generated "dmabuf.c" contain #include "dmabuf_alloc.h").

> 
> > +  endif()
> > +else()
> > +  continue()
> > +endif()
> > +  endforeach()
> > +  string(REGEX REPLACE "\\.so$" "" SONAME
> > + "${MODULE_NAME}${CMAKE_PYTHON_SO_SUFFIX}")
> > +  add_library(${SONAME} SHARED ${ALL_CFILES})
> > + set_target_properties(${SONAME} PROPERTIES
> > +COMPILE_FLAGS "${CMAKE_C_FLAGS} -fPIC -fno-strict-aliasing 
> > -Wno-unused-function -Wno-redundant-decls -Wno-shadow -Wno-
> cast-function-type -Wno-implicit-fallthrough -Wno-unknown-warning 
> -Wno-unknown-warning-option -Wno-deprecated-declarations
> ${NO_VAR_TRACKING_FLAGS}"
> 
> Ugh, you copy and pasted this, but it shouldn't have existed in the first 
> place. Compiler arguments like this should not be specified manually.
> I should fix it..
> 
> Also you should cc edward on all this pyverbs stuff, he knows it all very well

Will add Edward next time. He commented a lot on the PR at GitHub. The current
GitHub PR is in sync with this version.

> 
> It all looks reasonable to me
> 
> Jason



RE: [PATCH rdma-core v3 5/6] tests: Add tests for dma-buf based memory regions

2020-11-30 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Monday, November 30, 2020 7:00 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Leon 
> Romanovsky ; Jason Gunthorpe ;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [PATCH rdma-core v3 5/6] tests: Add tests for dma-buf based 
> memory regions
> 
> On Fri, Nov 27, 2020 at 12:55:42PM -0800, Jianxin Xiong wrote:
> > Define a set of unit tests similar to regular MR tests and a set of
> > tests for send/recv and rdma traffic using dma-buf MRs. Add a utility
> > function to generate access flags for dma-buf based MRs because the
> > set of supported flags is smaller.
> >
> > Signed-off-by: Jianxin Xiong 
> > ---
> >  tests/test_mr.py | 239 
> > ++-
> >  tests/utils.py   |  26 ++
> >  2 files changed, 264 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/test_mr.py b/tests/test_mr.py index
> > adc649c..52cf20a 100644
> > --- a/tests/test_mr.py
> > +++ b/tests/test_mr.py
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB)  # Copyright (c)
> > 2019 Mellanox Technologies, Inc. All rights reserved. See COPYING file
> > +# Copyright (c) 2020 Intel Corporation. All rights reserved. See
> > +COPYING file
> >  """
> >  Test module for pyverbs' mr module.
> >  """
> > @@ -9,9 +10,10 @@ import errno
> >
> >  from tests.base import PyverbsAPITestCase, RCResources, RDMATestCase
> > from pyverbs.pyverbs_error import PyverbsRDMAError, PyverbsError -from
> > pyverbs.mr import MR, MW, DMMR, MWBindInfo, MWBind
> > +from pyverbs.mr import MR, MW, DMMR, DmaBufMR, MWBindInfo, MWBind
> >  from pyverbs.qp import QPCap, QPInitAttr, QPAttr, QP  from pyverbs.wr
> > import SendWR
> > +from pyverbs.dmabuf import DmaBuf
> >  import pyverbs.device as d
> >  from pyverbs.pd import PD
> >  import pyverbs.enums as e
> > @@ -366,3 +368,238 @@ class DMMRTest(PyverbsAPITestCase):
> >  dm_mr = DMMR(pd, dm_mr_len, 
> > e.IBV_ACCESS_ZERO_BASED,
> >   dm=dm, offset=dm_mr_offset)
> >  dm_mr.close()
> > +
> > +
> > +def check_dmabuf_support():
> > +"""
> > +Check if dma-buf allocation is supported by the system.
> > +Skip the test on failure.
> > +"""
> > +try:
> > +DmaBuf(1)
> 
> Hardcoding gpu unit 1 here (and in other places) is probably not quite what 
> we want. Not sure what you want to do in the test framework
> here instead.

'1' here is the buffer size; the unit argument takes its default value of 0. We
could probably add a command line argument to the test to set the preferred GPU
unit.
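Something like this could do it; the `--gpu-unit` flag name is made up here,
not part of the series:

```python
import argparse

def parse_gpu_unit(argv=None):
    """Return the GPU unit used to pick the /dev/dri device; defaults to 0.
    The --gpu-unit option name is hypothetical, for illustration only."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--gpu-unit', type=int, default=0)
    # parse_known_args lets the unittest runner keep its own arguments
    args, _unknown = parser.parse_known_args(argv)
    return args.gpu_unit
```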

> 
> > +except PyverbsRDMAError as ex:
> > +if ex.error_code == errno.ENOENT:
> > +raise unittest.SkipTest('Device /dev/dri/renderD* is not 
> > present')
> > +if ex.error_code == errno.EACCES:
> > +raise unittest.SkipTest('Lack of permission to access
> > + /dev/dri/renderD*')
> > +
> > +
> > +def check_dmabuf_mr_support(pd):
> > +"""
> > +Check if dma-buf MR registration is supported by the driver.
> > +Skip the test on failure
> > +"""
> > +try:
> > +DmaBufMR(pd, 1, 0)
> > +except PyverbsRDMAError as ex:
> > +if ex.error_code == errno.EOPNOTSUPP:
> > +raise unittest.SkipTest('Reg dma-buf MR is not
> > +supported')
> > +
> > +
> > +class DmaBufMRTest(PyverbsAPITestCase):
> > +"""
> > +Test various functionalities of the DmaBufMR class.
> > +"""
> > +def test_dmabuf_reg_mr(self):
> > +"""
> > +Test ibv_reg_dmabuf_mr()
> > +"""
> > +check_dmabuf_support()
> > +for ctx, attr, attr_ex in self.devices:
> > +with PD(ctx) as pd:
> > +check_dmabuf_mr_support(pd)
> > +flags = u.get_dmabuf_access_flags(ctx)
> > +for f in flags:
> > +len = u.get_mr_length()
> > +for off in [0, len//2]:
> > +with DmaBufMR(pd, len, f, offset=off) as mr:
> > +pass
> > +
> > +def test_dmabuf_dere

RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-25 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Wednesday, November 25, 2020 4:00 PM
> To: Xiong, Jianxin 
> Cc: Daniel Vetter ; Leon Romanovsky ; 
> linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Wed, Nov 25, 2020 at 07:27:07PM +, Xiong, Jianxin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, November 25, 2020 4:15 AM
> > > To: Daniel Vetter 
> > > Cc: Xiong, Jianxin ; Leon Romanovsky
> > > ; linux-r...@vger.kernel.org; dri-
> > > de...@lists.freedesktop.org; Doug Ledford ;
> > > Vetter, Daniel ; Christian Koenig
> > > 
> > > Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR
> > > support
> > >
> > > On Wed, Nov 25, 2020 at 11:50:41AM +0100, Daniel Vetter wrote:
> > >
> > > > Yeah imo makes sense. It's a bunch more code for you to make it
> > > > work on
> > > > i915 and amd, but it's not terrible. And avoids the dependencies,
> > > > and also avoids the abuse of card* and dumb buffers. Plus not
> > > > really more complex, you just need a table or something to match
> > > > from the drm driver name to the driver-specific buffer create
> > > > function. Everything else stays the same.
> > >
> > > If it is going to get more complicated please write it in C then. We
> > > haven't done it yet, but you can link a C function through cython to
> > > the python test script
> > >
> > > If you struggle here I can probably work out the build system bits,
> > > but it should not be too terrible
> >
> > Thanks Daniel and Jason. I have started working in this direction.
> > There should be no technical obstacle here.
> 
> Just to be clear I mean write some 'get dma buf fd' function in C, not the 
> whole test
> 
Yes, that's my understanding.



RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-25 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Wednesday, November 25, 2020 4:15 AM
> To: Daniel Vetter 
> Cc: Xiong, Jianxin ; Leon Romanovsky 
> ; linux-r...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Doug Ledford ; Vetter, 
> Daniel ; Christian Koenig
> 
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Wed, Nov 25, 2020 at 11:50:41AM +0100, Daniel Vetter wrote:
> 
> > Yeah imo makes sense. It's a bunch more code for you to make it work
> > on
> > i915 and amd, but it's not terrible. And avoids the dependencies, and
> > also avoids the abuse of card* and dumb buffers. Plus not really more
> > complex, you just need a table or something to match from the drm
> > driver name to the driver-specific buffer create function. Everything
> > else stays the same.
> 
> If it is going to get more complicated please write it in C then. We haven't 
> done it yet, but you can link a C function through cython to the
> python test script
> 
> If you struggle here I can probably work out the build system bits, but it 
> should not be too terrible

Thanks Daniel and Jason. I have started working in this direction. There should
be no technical obstacle here.


RE: [PATCH rdma-core 5/5] tests: Bug fix for get_access_flags()

2020-11-24 Thread Xiong, Jianxin
> -Original Message-
> From: John Hubbard 
> Sent: Tuesday, November 24, 2020 12:27 PM
> To: Xiong, Jianxin ; linux-r...@vger.kernel.org; 
> dri-devel@lists.freedesktop.org
> Cc: Doug Ledford ; Jason Gunthorpe ; Leon 
> Romanovsky ; Sumit Semwal
> ; Christian Koenig ; 
> Vetter, Daniel 
> Subject: Re: [PATCH rdma-core 5/5] tests: Bug fix for get_access_flags()
> 
> Just some silly nits I stumbled across while trying to understand the tests.
> 
> On 11/23/20 9:53 AM, Jianxin Xiong wrote:
> > The filter defintion is wrong and causes get_access_flags() always
> 
>   definition

Thanks.

> 
> > returning empty list. As the result the MR tests using this function
> > are effectively skipped (but report success).
> >
> > Also fix a typo in the comments.
> 
> Was there another typo somewhere? All I see is an *added* typo...
> 
> >
> > Signed-off-by: Jianxin Xiong 
> > ---
> >   tests/utils.py | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/utils.py b/tests/utils.py index 0ad7110..eee44b4
> > 100644
> > --- a/tests/utils.py
> > +++ b/tests/utils.py
> > @@ -55,8 +55,8 @@ def filter_illegal_access_flags(element):
> >   :param element: A list of access flags to check
> >   :return: True if this list is legal, else False
> >   """
> > -if e.IBV_ACCESS_REMOTE_ATOMIC in element or e.IBV_ACCESS_REMOTE_WRITE:
> > -if e.IBV_ACCESS_LOCAL_WRITE:
> > +if e.IBV_ACCESS_REMOTE_ATOMIC in element or e.IBV_ACCESS_REMOTE_WRITE 
> > in element:
> > +if not e.IBV_ACCESS_LOCAL_WRITE in element:
> >   return False
> >   return True
> >
> > @@ -69,7 +69,7 @@ def get_access_flags(ctx):
> >   added as well.
> >   After verifying that the flags selection is legal, it is appended to 
> > an
> >   array, assuming it wasn't previously appended.
> > -:param ctx: Device Context to check capabilities
> > +:param ctx: Device Coyyntext to check capabilities
> 
> I liked the old spelling. "Coyyntext" just doesn't sound as good. :)

Hmm, I don't know what happened 😊 I was seeing it the other way around.

> 
> >   :param num: Size of initial collection
> >   :return: A random legal value for MR flags
> >   """
> >
> 
> thanks,
> --
> John Hubbard
> NVIDIA


RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-24 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Tuesday, November 24, 2020 7:17 AM
> To: Jason Gunthorpe 
> Cc: Xiong, Jianxin ; Leon Romanovsky 
> ; linux-r...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Doug Ledford ; Vetter, 
> Daniel ; Christian Koenig
> 
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 23, 2020 at 02:05:04PM -0400, Jason Gunthorpe wrote:
> > On Mon, Nov 23, 2020 at 09:53:02AM -0800, Jianxin Xiong wrote:
> >
> > > +cdef class DmaBuf:
> > > +def __init__(self, size, unit=0):
> > > +"""
> > > +Allocate DmaBuf object from a GPU device. This is done through 
> > > the
> > > +DRI device interface (/dev/dri/card*). Usually this
> > > +requires the
> 
> Please use /dev/dri/renderD* instead. That's the interface meant for 
> unpriviledged rendering access. card* is the legacy interface with
> backwards compat galore, don't use.
> 
> Specifically if you do this on a gpu which also has display (maybe some 
> testing on a local developer machine, no idea ...) then you mess with
> compositors and stuff.
> 
> Also wherever you copied this from, please also educate those teams that 
> using /dev/dri/card* for rendering stuff is a Bad Idea (tm)

/dev/dri/renderD* is not always available (e.g. for many iGPUs) and doesn't
support mode setting commands (including dumb_buf). The original intention here
is to have something to support the new tests added, not for general compute.

> 
> > > +effective user id being root or being a member of the 'video' 
> > > group.
> > > +:param size: The size (in number of bytes) of the buffer.
> > > +:param unit: The unit number of the GPU to allocate the buffer 
> > > from.
> > > +:return: The newly created DmaBuf object on success.
> > > +"""
> > > +self.dmabuf_mrs = weakref.WeakSet()
> > > +self.dri_fd = open('/dev/dri/card'+str(unit), O_RDWR)
> > > +
> > > +args = bytearray(32)
> > > > +pack_into('=iiiiiiq', args, 0, 1, size, 8, 0, 0, 0, 0)
> > > +ioctl(self.dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, args)
> > > > +a, b, c, d, self.handle, e, self.size = unpack('=iiiiiiq', args)
> 
> Yeah no, don't allocate render buffers with create_dumb. Every time this 
> comes up I'm wondering whether we should just completely
> disable dma-buf operations on these. Dumb buffers are explicitly only for 
> software rendering for display purposes when the gpu userspace
> stack isn't fully running yet, aka boot splash.
> 
> And yes I know there's endless amounts of abuse of that stuff floating 
> around, especially on arm-soc/android systems.

One alternative is to use the GEM_CREATE method, which can be done via the
renderD* device, but the command is vendor specific, so the logic is a little
bit more complex.

> 
> > > +
> > > +args = bytearray(12)
> > > +pack_into('=iii', args, 0, self.handle, O_RDWR, 0)
> > > +ioctl(self.dri_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, args)
> > > +a, b, self.fd = unpack('=iii', args)
> > > +
> > > +args = bytearray(16)
> > > +pack_into('=iiq', args, 0, self.handle, 0, 0)
> > > +ioctl(self.dri_fd, DRM_IOCTL_MODE_MAP_DUMB, args);
> > > +a, b, self.map_offset = unpack('=iiq', args);
> >
> > Wow, OK
> >
> > Is it worth using ctypes here instead? Can you at least add a comment
> > before each pack specifying the 'struct XXX' this is following?
> >
> > Does this work with normal Intel GPUs, like in a Laptop? AMD too?
> >
> > Christian, I would be very happy to hear from you that this entire
> > work is good for AMD as well
> 
> I think the smallest generic interface for allocating gpu buffers which are 
> more useful than the stuff you get from CREATE_DUMB is gbm.
> That's used by compositors to get bare metal opengl going on linux. Ofc 
> Android has gralloc for the same purpose, and cros has minigbm
> (which isn't the same as gbm at all). So not so cool.

Again, would the "renderD* + GEM_CREATE" combination be an acceptable
alternative? That would be much simpler than going with gbm, with less
dependency in setting up the testing environment.
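A sketch of the per-driver dispatch suggested earlier in the thread: map the
DRM driver name (as reported by DRM_IOCTL_VERSION on the opened renderD* node)
to a vendor-specific allocation function. The driver names and the ioctl names
in the comments are real; the allocator bodies are placeholders.

```python
# Dispatch from DRM driver name to a vendor-specific VRAM allocator.
# Each *_alloc is a placeholder for a wrapper around the driver's
# GEM_CREATE ioctl; only the table lookup is implemented here.

def amdgpu_alloc(fd, size):
    raise NotImplementedError  # would issue DRM_IOCTL_AMDGPU_GEM_CREATE

def i915_alloc(fd, size):
    raise NotImplementedError  # would issue DRM_IOCTL_I915_GEM_CREATE

def radeon_alloc(fd, size):
    raise NotImplementedError  # would issue DRM_IOCTL_RADEON_GEM_CREATE

GEM_ALLOC = {
    'amdgpu': amdgpu_alloc,
    'i915': i915_alloc,
    'radeon': radeon_alloc,
}

def gem_alloc_fn(driver_name):
    """Pick the allocator for the driver behind the render node; raise for
    drivers the test does not know how to allocate VRAM on."""
    try:
        return GEM_ALLOC[driver_name]
    except KeyError:
        raise RuntimeError(f'no GEM allocator for DRM driver {driver_name!r}')
```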

> 
> The other generic option is using vulkan, which works directly on bare metal 
> (without a compositor or anything r

RE: [PATCH v11 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-11-24 Thread Xiong, Jianxin
> -Original Message-
> From: Christoph Hellwig 
> Sent: Tuesday, November 24, 2020 1:34 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Leon Romanovsky ; Sumit Semwal ; 
> Christian Koenig ; Vetter,
> Daniel 
> Subject: Re: [PATCH v11 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> As these are mostly trivial wrappers around the EXPORT_SYMBOL_GPL dmabuf 
> exports please stick to that export style.
> 
> > --- /dev/null
> > +++ b/drivers/infiniband/core/umem_dmabuf.h
> > @@ -0,0 +1,11 @@
> > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> > +/*
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> > + */
> > +
> > +#ifndef UMEM_DMABUF_H
> > +#define UMEM_DMABUF_H
> > +
> > +void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf);
> > +
> > +#endif /* UMEM_DMABUF_H */
> 
> Does this really need a separate header?

The symbol doesn't need to be exported; otherwise it could be put into
"ib_umem.h". Although the prototype could be put directly into the file where it
is called, using a separate header file provides a cleaner interface.


RE: [PATCH rdma-core 1/5] verbs: Support dma-buf based memory region

2020-11-24 Thread Xiong, Jianxin
> -Original Message-
> From: Yishai Hadas 
> Sent: Tuesday, November 24, 2020 2:32 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Jason Gunthorpe ;
> Leon Romanovsky ; Sumit Semwal ; 
> Christian Koenig ; Vetter,
> Daniel ; Yishai Hadas 
> Subject: Re: [PATCH rdma-core 1/5] verbs: Support dma-buf based memory region
> 
> On 11/23/2020 7:53 PM, Jianxin Xiong wrote:
> > Add new API function and new provider method for registering dma-buf
> > based memory region. Update the man page and bump the API version.
> >
> > Signed-off-by: Jianxin Xiong 
> > ---
> >   kernel-headers/rdma/ib_user_ioctl_cmds.h | 14 
> >   libibverbs/cmd_mr.c  | 38 
> > 
> >   libibverbs/driver.h  |  7 ++
> >   libibverbs/dummy_ops.c   | 11 +
> >   libibverbs/libibverbs.map.in |  6 +
> >   libibverbs/man/ibv_reg_mr.3  | 21 --
> >   libibverbs/verbs.c   | 19 
> >   libibverbs/verbs.h   | 10 +
> >   8 files changed, 124 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel-headers/rdma/ib_user_ioctl_cmds.h
> > b/kernel-headers/rdma/ib_user_ioctl_cmds.h
> > index 7968a18..dafc7eb 100644
> > --- a/kernel-headers/rdma/ib_user_ioctl_cmds.h
> > +++ b/kernel-headers/rdma/ib_user_ioctl_cmds.h
> > @@ -1,5 +1,6 @@
> >   /*
> >* Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
> > + * Copyright (c) 2020, Intel Corporation. All rights reserved.
> >*
> >* This software is available to you under a choice of one of two
> >* licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -251,6 +252,7 @@ enum uverbs_methods_mr {
> > UVERBS_METHOD_MR_DESTROY,
> > UVERBS_METHOD_ADVISE_MR,
> > UVERBS_METHOD_QUERY_MR,
> > +   UVERBS_METHOD_REG_DMABUF_MR,
> >   };
> >
> >   enum uverbs_attrs_mr_destroy_ids {
> > @@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
> > UVERBS_ATTR_QUERY_MR_RESP_IOVA,
> >   };
> >
> > +enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
> > +   UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
> > +   UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
> > +   UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
> > +   UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
> > +   UVERBS_ATTR_REG_DMABUF_MR_IOVA,
> > +   UVERBS_ATTR_REG_DMABUF_MR_FD,
> > +   UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
> > +   UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> > +   UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> > +};
> > +
> >   enum uverbs_attrs_create_counters_cmd_attr_ids {
> > UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
> >   };
> > diff --git a/libibverbs/cmd_mr.c b/libibverbs/cmd_mr.c index
> > 42dbe42..91ce2ef 100644
> > --- a/libibverbs/cmd_mr.c
> > +++ b/libibverbs/cmd_mr.c
> > @@ -1,5 +1,6 @@
> >   /*
> >* Copyright (c) 2018 Mellanox Technologies, Ltd.  All rights reserved.
> > + * Copyright (c) 2020 Intel Corporation.  All rights reserved.
> >*
> >* This software is available to you under a choice of one of two
> >* licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -116,3 +117,40 @@ int ibv_cmd_query_mr(struct ibv_pd *pd, struct 
> > verbs_mr *vmr,
> > return 0;
> >   }
> >
> > +int ibv_cmd_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t 
> > length,
> > + uint64_t iova, int fd, int access,
> > + struct verbs_mr *vmr)
> > +{
> > +   DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_MR,
> > +  UVERBS_METHOD_REG_DMABUF_MR,
> > +  9);
> > +   struct ib_uverbs_attr *handle;
> > +   uint32_t lkey, rkey;
> > +   int ret;
> > +
> > +   handle = fill_attr_out_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
> > +   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY, &lkey);
> > +   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY, &rkey);
> > +
> > +   fill_attr_in_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE, pd->handle);
> > +   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_OFFSET, offset);
> > +   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_LENGTH, length);
> > +   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_IOVA, iova);
> > +   fill_attr_in_uint32(cmdb, UVERBS_ATTR_REG_DMABUF_MR_FD, fd);
> > 

RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-24 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, November 24, 2020 7:36 AM
> To: Daniel Vetter 
> Cc: Xiong, Jianxin ; Leon Romanovsky 
> ; linux-r...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Doug Ledford ; Vetter, 
> Daniel ; Christian Koenig
> 
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Tue, Nov 24, 2020 at 04:16:58PM +0100, Daniel Vetter wrote:
> 
> > Compute is the worst, because opencl is widely considered a mistake
> > (maybe opencl 3 is better, but nvidia is stuck on 1.2). The actually
> > used stuff is cuda (nvidia-only), rocm (amd-only) and now with intel
> > also playing we have xe (intel-only).
> 
> > It's pretty glorious :-/
> 
> I enjoyed how the Intel version of CUDA is called "OneAPI" not "Third API" ;)
> 
> Hopefuly xe compute won't leave a lot of half finished abandoned kernel code 
> like Xeon Phi did :(
> 
> > Also I think we discussed this already, but for actual p2p the intel
> > patches aren't in upstream yet. We have some internally, but with very
> > broken locking (in the process of getting fixed up, but it's taking time).
> 
> Someone needs to say this test works on a real system with an unpatched 
> upstream driver.
> 
> I thought AMD had the needed parts merged?

Yes, I have tested these with AMD GPU.

> 
> Jason


RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-23 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, November 23, 2020 10:05 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 23, 2020 at 09:53:02AM -0800, Jianxin Xiong wrote:
> 
> > +cdef class DmaBuf:
> > +def __init__(self, size, unit=0):
> > +"""
> > +Allocate DmaBuf object from a GPU device. This is done through the
> > +DRI device interface (/dev/dri/card*). Usually this requires the
> > +effective user id being root or being a member of the 'video' 
> > group.
> > +:param size: The size (in number of bytes) of the buffer.
> > +:param unit: The unit number of the GPU to allocate the buffer 
> > from.
> > +:return: The newly created DmaBuf object on success.
> > +"""
> > +self.dmabuf_mrs = weakref.WeakSet()
> > +self.dri_fd = open('/dev/dri/card'+str(unit), O_RDWR)
> > +
> > +args = bytearray(32)
> > > > +pack_into('=iiiiiiq', args, 0, 1, size, 8, 0, 0, 0, 0)
> > +ioctl(self.dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, args)
> > > > +a, b, c, d, self.handle, e, self.size = unpack('=iiiiiiq', args)
> > +
> > +args = bytearray(12)
> > +pack_into('=iii', args, 0, self.handle, O_RDWR, 0)
> > +ioctl(self.dri_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, args)
> > +a, b, self.fd = unpack('=iii', args)
> > +
> > +args = bytearray(16)
> > +pack_into('=iiq', args, 0, self.handle, 0, 0)
> > +ioctl(self.dri_fd, DRM_IOCTL_MODE_MAP_DUMB, args);
> > +a, b, self.map_offset = unpack('=iiq', args);
> 
> Wow, OK
> 
> Is it worth using ctypes here instead? Can you at least add a comment before 
> each pack specifying the 'struct XXX' this is following?
> 

The ioctl call only accepts a bytearray; not sure how to use ctypes here. I will
add comments with the actual layout of the parameter structure.
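For reference, a commented sketch of what those packs correspond to. The struct
layout below is from the DRM uapi header; the helper names are just for
illustration and only exercise the packing, not the ioctl itself.

```python
from struct import calcsize, pack_into, unpack_from

# Layout of struct drm_mode_create_dumb (include/uapi/drm/drm_mode.h):
#   __u32 height, width, bpp, flags;   /* in  */
#   __u32 handle, pitch;               /* out */
#   __u64 size;                        /* out */
CREATE_DUMB_FMT = '=iiiiiiq'           # 6 x 32-bit + 1 x 64-bit = 32 bytes

def pack_create_dumb(nbytes):
    """Pack the CREATE_DUMB argument the way the pyverbs code does:
    height=1, width=nbytes, bpp=8, everything else zeroed."""
    args = bytearray(calcsize(CREATE_DUMB_FMT))
    pack_into(CREATE_DUMB_FMT, args, 0, 1, nbytes, 8, 0, 0, 0, 0)
    return args

def unpack_create_dumb(args):
    """Pull the kernel-filled handle and size back out after the ioctl."""
    _height, _width, _bpp, _flags, handle, _pitch, size = \
        unpack_from(CREATE_DUMB_FMT, args)
    return handle, size
```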

> Does this work with normal Intel GPUs, like in a Laptop? AMD too?
> 

Yes, the interface is generic and works with most GPUs. Works with AMD, too.

> Christian, I would be very happy to hear from you that this entire work is 
> good for AMD as well
> 
> Edward should look through this, but I'm glad to see something like this
> 
> Thanks,
> Jason


RE: [PATCH rdma-core 2/5] mlx5: Support dma-buf based memory region

2020-11-23 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, November 23, 2020 10:02 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH rdma-core 2/5] mlx5: Support dma-buf based memory region
> 
> On Mon, Nov 23, 2020 at 09:53:01AM -0800, Jianxin Xiong wrote:
> 
> > +struct ibv_mr *mlx5_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, 
> > size_t length,
> > + uint64_t iova, int fd, int acc) {
> > +   struct mlx5_mr *mr;
> > +   int ret;
> > +   enum ibv_access_flags access = (enum ibv_access_flags)acc;
> 
> Why?

It's copied from mlx5_reg_mr(). I didn't pay attention to this, but it looks
unnecessary now.

> 
> Jason


RE: [PATCH v10 4/6] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-13 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, November 13, 2020 5:06 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v10 4/6] RDMA/mlx5: Support dma-buf based userspace 
> memory region
> 
> On Fri, Nov 13, 2020 at 03:51:20AM +, Xiong, Jianxin wrote:
> 
> > > > +static void mlx5_ib_dmabuf_invalidate_cb(struct
> > > > +dma_buf_attachment
> > > > +*attach) {
> > > > +   struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > > > +   struct mlx5_ib_mr *mr = umem_dmabuf->private;
> > > > +
> > > > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > > > +
> > > > +   if (mr)
> > >
> > > I don't think this 'if (mr)' test is needed anymore? I certainly
> > > prefer it gone as it is kind of messy. I expect unmapping the dma to 
> > > ensure this function is not running, and won't run again.
> >
> > It is still needed. When the dma-buf moves, the callback function of every 
> > attached importer is invoked, regardless if the importer has
> mapped the dma or not.
> >
> > >
> > > > +/**
> > > > + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> > > > + * @mr: to fence
> > > > + *
> > > > + * On return no parallel threads will be touching this MR and no
> > > > +DMA will be
> > > > + * active.
> > > > + */
> > > > +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > > > +   struct ib_umem_dmabuf *umem_dmabuf =
> > > > +to_ib_umem_dmabuf(mr->umem);
> > > > +
> > > > +   /* Prevent new page faults and prefetch requests from 
> > > > succeeding */
> > > > +   xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> > > > +
> > > > +   /* Wait for all running page-fault handlers to finish. */
> > > > +   synchronize_srcu(&mr->dev->odp_srcu);
> > > > +
> > > > +   wait_event(mr->q_deferred_work,
> > > > +!atomic_read(&mr->num_deferred_work));
> > > > +
> > > > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > > +   mlx5_mr_cache_invalidate(mr);
> > > > +   umem_dmabuf->private = NULL;
> > > > +   ib_umem_dmabuf_unmap_pages(umem_dmabuf);
> > > > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > > > +
> > > > +   if (!mr->cache_ent) {
> > > > +   mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > > +   WARN_ON(mr->descs);
> > > > +   }
> > >
> > > I didn't check carefully, but are you sure this destroy_mkey should be 
> > > here??
> >
> > To my understanding, yes. This is similar to what dma_fence_odp_mr()
> > does, just inlined here since it's not called from other places.
> 
> I think you should put the calls to dma_buf_dynamic_attach() and
> dma_buf_detach() into mlx5, it makes the whole thing a little cleaner, then 
> the umem->private isn't needed any more either.

Putting these calls into mlx5 can remove the 'ops' parameter from the
umem_get call, but I don't see how umem->private can be eliminated. In
addition, I feel keeping these two calls in the core provides a better
separation between the core and the driver -- dma-buf API functions are
only called from the core, except for locking/unlocking.

The 'if (mr)' part in the callback can be removed by adding a line
'if (!umem_dmabuf->sgt) return;' before it, if that makes a difference.
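For illustration, the proposed guard can be sketched with userspace stand-ins (the stub types below are hypothetical; the kernel ones are struct ib_umem_dmabuf and struct sg_table, and the real callback zaps the MR's translation entries):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures discussed above. */
struct sg_table_stub { int nents; };

struct umem_dmabuf_stub {
	struct sg_table_stub *sgt;	/* NULL until the pages are mapped */
	void *private;			/* driver MR pointer, e.g. mlx5_ib_mr */
	int invalidations;		/* counts real invalidations, for testing */
};

/* Move-notify callback: the exporter invokes it for every attached
 * importer when the dma-buf moves, whether or not that importer has
 * mapped the buffer, so it bails out early in the not-mapped case. */
void invalidate_cb(struct umem_dmabuf_stub *u)
{
	if (!u->sgt)		/* nothing mapped yet: nothing to invalidate */
		return;
	u->invalidations++;	/* stands in for zapping the MR's MTTs */
}
```

With this guard at the top, the 'mr' pointer is only dereferenced when a mapping actually exists.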

> 
> Jason


RE: [PATCH v10 3/6] RDMA/uverbs: Add uverbs command for dma-buf based MR registration

2020-11-12 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 12, 2020 4:34 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v10 3/6] RDMA/uverbs: Add uverbs command for dma-buf 
> based MR registration
> 
> On Tue, Nov 10, 2020 at 01:41:14PM -0800, Jianxin Xiong wrote:
> > +   mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
> > +   fd, access_flags,
> > +   &attrs->driver_udata);
> > +   if (IS_ERR(mr))
> > +   return PTR_ERR(mr);
> > +
> > +   mr->device = pd->device;
> > +   mr->pd = pd;
> > +   mr->type = IB_MR_TYPE_USER;
> > +   mr->uobject = uobj;
> > +   atomic_inc(&pd->usecnt);
> > +
> > +   uobj->object = mr;
> > +
> > +   uverbs_finalize_uobj_create(attrs,
> > +UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
> > +
> > +   ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> > +&mr->lkey, sizeof(mr->lkey));
> > +   if (ret)
> > +   goto err_dereg;
> > +
> > +   ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> > +&mr->rkey, sizeof(mr->rkey));
> > +   if (ret)
> > +   goto err_dereg;
> > +
> > +   return 0;
> > +
> > +err_dereg:
> > +   ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> 
> This isn't how the error handling works with
> uverbs_finalize_uobj_create() - you just return the error code and the caller 
> deals with destroying the fully initialized HW object properly.
> Calling ib_dereg_mr_user() here will crash the kernel.
> 
> Check the other uses for an example.
> 

Will fix.

> Again I must ask what the plan is for testing as even a basic set of pyverbs 
> sanity tests would catch this.
> 
> I've generally been insisting that all new uABI needs testing support in 
> rdma-core. This *might* be the exception but I want to hear a really
> good reason why we can't have a test first...
> 

So far I have been using a user-space test that focuses on basic
functionality and limited parameter checking (e.g. incorrect addr,
length, flags). This specific error path happens to be untested. I will
work more on the test front to increase the code coverage.

> Jason


RE: [PATCH v10 4/6] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-12 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 12, 2020 4:40 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v10 4/6] RDMA/mlx5: Support dma-buf based userspace 
> memory region
> 
> On Tue, Nov 10, 2020 at 01:41:15PM -0800, Jianxin Xiong wrote:
> 
> > -static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int
> > flags)
> > +int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
> >  {
> > struct mlx5_ib_dev *dev = mr->dev;
> > struct device *ddev = dev->ib_dev.dev.parent; @@ -1255,6 +1267,10 @@
> > static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
> > cur_mtt->ptag =
> > cpu_to_be64(rdma_block_iter_dma_address(&biter) |
> > MLX5_IB_MTT_PRESENT);
> > +
> > +   if (mr->umem->is_dmabuf && (flags & MLX5_IB_UPD_XLT_ZAP))
> > +   cur_mtt->ptag = 0;
> > +
> > cur_mtt++;
> > }
> >
> > @@ -1291,8 +1307,15 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr 
> > *ibmr, struct ib_pd *pd,
> > int err;
> > bool pg_cap = !!(MLX5_CAP_GEN(dev->mdev, pg));
> >
> > -   page_size =
> > -   mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > +   if (umem->is_dmabuf) {
> > +   if ((iova ^ umem->address) & (PAGE_SIZE - 1))
> > +   return ERR_PTR(-EINVAL);
> > +   umem->iova = iova;
> > +   page_size = PAGE_SIZE;
> > +   } else {
> > +   page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
> > +0, iova);
> > +   }
> 
> Urk, maybe this duplicated sequence should be in a function..
> 
> This will also collide with a rereg_mr bugfixing series that should be posted 
> soon..
> 
> > +static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment
> > +*attach) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +   struct mlx5_ib_mr *mr = umem_dmabuf->private;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (mr)
> 
> I don't think this 'if (mr)' test is needed anymore? I certainly prefer it 
> gone as it is kind of messy. I expect unmapping the dma to ensure this
> function is not running, and won't run again.

It is still needed. When the dma-buf moves, the callback function of every
attached importer is invoked, regardless of whether the importer has mapped
the dma or not.

> 
> > +/**
> > + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> > + * @mr: to fence
> > + *
> > + * On return no parallel threads will be touching this MR and no DMA
> > +will be
> > + * active.
> > + */
> > +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> > +
> > +   /* Prevent new page faults and prefetch requests from succeeding */
> > +   xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> > +
> > +   /* Wait for all running page-fault handlers to finish. */
> > +   synchronize_srcu(&mr->dev->odp_srcu);
> > +
> > +   wait_event(mr->q_deferred_work,
> > +!atomic_read(&mr->num_deferred_work));
> > +
> > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +   mlx5_mr_cache_invalidate(mr);
> > +   umem_dmabuf->private = NULL;
> > +   ib_umem_dmabuf_unmap_pages(umem_dmabuf);
> > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (!mr->cache_ent) {
> > +   mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > +   WARN_ON(mr->descs);
> > +   }
> 
> I didn't check carefully, but are you sure this destroy_mkey should be here??

To my understanding, yes. This is similar to what dma_fence_odp_mr() does,
just inlined here since it's not called from other places.
 
> 
> Jason


RE: [PATCH v10 1/6] RDMA/umem: Support importing dma-buf as user memory region

2020-11-12 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 12, 2020 4:31 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v10 1/6] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Tue, Nov 10, 2020 at 01:41:12PM -0800, Jianxin Xiong wrote:
> > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > +  unsigned long offset, size_t size,
> > +  int fd, int access,
> > +  const struct dma_buf_attach_ops *ops) {
> > +   struct dma_buf *dmabuf;
> > +   struct ib_umem_dmabuf *umem_dmabuf;
> > +   struct ib_umem *umem;
> > +   unsigned long end;
> > +   long ret;
> > +
> > +   if (check_add_overflow(offset, (unsigned long)size, &end))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > +   return ERR_PTR(-EINVAL);
> 
> This is weird, what does it do?

This sequence is modeled after the following code from ib_umem_init_odp():

if (check_add_overflow(umem_odp->umem.address,
   (unsigned long)umem_odp->umem.length,
   &end))
return -EOVERFLOW;
end = ALIGN(end, page_size);
if (unlikely(end < page_size))
return -EOVERFLOW;

The part that looks weird is the second check: check_add_overflow() already
catches offset + size wrapping, but it does not catch the case where 'end' is
within PAGE_SIZE - 1 of ULONG_MAX, so that PAGE_ALIGN(end) itself wraps to 0 --
that is what the PAGE_ALIGN(end) < PAGE_SIZE test guards against.
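The two checks can be sketched in userspace C, assuming a 4 KiB PAGE_SIZE and using the GCC/Clang builtin in place of the kernel's check_add_overflow():

```c
#include <limits.h>
#include <stdbool.h>

#define PAGE_SIZE	4096UL
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Userspace sketch of the two range checks in ib_umem_dmabuf_get(). */
bool range_ok(unsigned long offset, unsigned long size)
{
	unsigned long end;

	/* offset + size must not wrap */
	if (__builtin_add_overflow(offset, size, &end))
		return false;
	/* rounding end up to a page boundary must not wrap either */
	if (PAGE_ALIGN(end) < PAGE_SIZE)
		return false;
	return true;
}
```

For example, range_ok(ULONG_MAX - 10, 5) fails only the second check: the addition does not wrap, but PAGE_ALIGN(end) does.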

> 
> > +
> > +   if (unlikely(!ops || !ops->move_notify))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > +   if (!umem_dmabuf)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   umem = &umem_dmabuf->umem;
> > +   umem->ibdev = device;
> > +   umem->length = size;
> > +   umem->address = offset;
> > +   umem->writable = ib_access_writable(access);
> > +   umem->is_dmabuf = 1;
> > +
> > +   if (unlikely(!ib_umem_num_pages(umem))) {
> > +   ret = -EINVAL;
> > +   goto out_free_umem;
> > +   }
> > +
> > +   dmabuf = dma_buf_get(fd);
> > +   if (IS_ERR(dmabuf)) {
> > +   ret = PTR_ERR(dmabuf);
> > +   goto out_free_umem;
> > +   }
> > +
> > +   if (dmabuf->size < offset + size) {
> > +   ret = -EINVAL;
> > +   goto out_release_dmabuf;
> 
> offset + size == end, already computed, in fact move this above the kzalloc
> 
> Jason


RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-11-10 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Tuesday, November 10, 2020 6:44 AM
> To: Jason Gunthorpe 
> Cc: Daniel Vetter ; Xiong, Jianxin 
> ; Leon Romanovsky ; linux-
> r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Vetter, Daniel ;
> Christian Koenig 
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Tue, Nov 10, 2020 at 10:27:57AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 10, 2020 at 03:14:45PM +0100, Daniel Vetter wrote:
> > > On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> > > > On Fri, Nov 06, 2020 at 04:34:07PM +, Xiong, Jianxin wrote:
> > > >
> > > > > > The user could specify a length that is beyond the dma buf,
> > > > > > can the dma buf length be checked during get?
> > > > >
> > > > > In order to check the length, the buffer needs to be mapped. That can 
> > > > > be done.
> > > >
> > > > Do DMA bufs even have definite immutable lengths? Going to be a
> > > > problem if they can shrink
> > >
> > > Yup. Unfortunately that's not document in the structures themselves,
> > > there's just some random comments sprinkled all over that dma-buf
> > > size is invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:
> > >
> > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlig
> > > ht=dma_buf_ops#c.dma_buf_ops
> > >
> > > "Because dma-buf buffers have invariant size over their lifetime, ..."
> > >
> > > Jianxin, can you pls do a kerneldoc patch which updates the comment
> > > for dma_buf.size and dma_buf_export_info.size?

Sure. 

> >
> > So we can check the size without doing an attachment?
> 
> Yeah size should be invariant over the lifetime of the dma_buf (it's also 
> needed for dma_buf_vmap kernel cpu access and dma_buf_mmap
> userspace mmap forwarding). No lock or attachment needed. But like I said, 
> would be good to have the kerneldoc patch to make this clear.
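Since the size is invariant and readable without attaching, the bounds check can be sketched as follows (stub type below is illustrative; the real field is dma_buf.size):

```c
#include <stdbool.h>

/* Minimal model of the exporter-side object: dma_buf.size is fixed at
 * export time, so an importer can validate the requested window before
 * (or without) attaching. */
struct dma_buf_stub { unsigned long size; };

/* Returns true when [offset, offset + len) lies inside the buffer. */
bool dmabuf_range_in_bounds(const struct dma_buf_stub *dmabuf,
			    unsigned long offset, unsigned long len)
{
	unsigned long end;

	if (__builtin_add_overflow(offset, len, &end))
		return false;
	return end <= dmabuf->size;
}
```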
> 
> The history here is that the X shm extension suffered badly from resizeable 
> storage object exploits of the X server, this is why both dma-buf
> (and also drm_gem_buffer_object before that generalization) and memfd are 
> size sealed.
> -Daniel
> 
> > That means the flow should be put back to how it was a series or two
> > ago where the SGL is only attached during page fault with the entire
> > programming sequence under the resv lock

Will do.

> >
> > Jason
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


RE: [PATCH v9 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-09 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, November 09, 2020 12:53 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v9 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Mon, Nov 09, 2020 at 11:23:00AM -0800, Jianxin Xiong wrote:
> > @@ -1291,8 +1303,11 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr 
> > *ibmr, struct ib_pd *pd,
> > int err;
> > bool pg_cap = !!(MLX5_CAP_GEN(dev->mdev, pg));
> >
> > -   page_size =
> > -   mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > +   if (umem->is_dmabuf)
> > +   page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> > +   else
> > +   page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
> > +0, iova);
> 
> Any place touching the sgl has to also hold the resv lock, and sgl might be 
> NULL since an invalidation could come in at any time, eg before
> we get here.
> 
> You can avoid those problems by ignoring the SGL and hard-wiring PAGE_SIZE 
> here
> 
> > +static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt,
> > +  u32 *bytes_mapped, u32 flags) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> > +   u32 xlt_flags = 0;
> > +   int err;
> > +
> > +   if (flags & MLX5_PF_FLAGS_ENABLE)
> > +   xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
> > +
> > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +   err = ib_umem_dmabuf_map_pages(umem_dmabuf);
> > +   if (!err)
> > +   err = mlx5_ib_update_mr_pas(mr, xlt_flags);
> 
> This still has to call mlx5_umem_find_best_pgsz() each time the sgl changes 
> to ensure it is still Ok. Just checking that
> 
>   mlx5_umem_find_best_pgsz() > PAGE_SIZE
> 
> and then throwing away the value is OK

ib_umem_find_best_pgsz() is already called inside ib_umem_dmabuf_map_pages(). 
Do we
still need to call mlx5_umem_find_best_pgsz() here?

> 
> Jason


RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-11-06 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, November 06, 2020 8:40 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Fri, Nov 06, 2020 at 04:34:07PM +, Xiong, Jianxin wrote:
> 
> > > The user could specify a length that is beyond the dma buf, can the
> > > dma buf length be checked during get?
> >
> > In order to check the length, the buffer needs to be mapped. That can be 
> > done.
> 
> Do DMA bufs even have definite immutable lengths? Going to be a problem if 
> they can shrink

Good question. The buffer itself has a fixed size. If for whatever reason the
mapping is not full, it must be temporary. If that does happen,
ib_umem_dmabuf_map_pages will undo the mapping and return an error. It will be
retried later via the page-fault handler.

> 
> > > Also page_size can be 0 because iova is not OK. iova should be
> > > checked for alignment during get as well:
> > >
> > >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> >
> > If ib_umem_dmabuf_map_pages is called during get this error is 
> > automatically caught.
> 
> The ib_umem_find_best_pgsz() checks this equation, yes.
> 
> So you'd map the sgl before allocating the mkey? This then checks the length 
> and iova?

Yes

> 
> Jason
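The alignment equation quoted above can be sketched as a small helper (assuming 4 KiB pages; XOR-ing the two addresses leaves set bits only where they differ, so masking with PAGE_SIZE - 1 compares their in-page offsets):

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL

/* The MR's iova and the umem's starting address must share the same
 * offset within a page, otherwise no valid page size exists for the
 * mkey and ib_umem_find_best_pgsz() returns 0. */
bool iova_offset_ok(unsigned long iova, unsigned long umem_addr)
{
	return ((iova ^ umem_addr) & (PAGE_SIZE - 1)) == 0;
}
```

For example, iova 0x10123 and address 0x7f0123 pass (same 0x123 in-page offset), while 0x10123 and 0x7f0124 fail.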


RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-11-06 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 05, 2020 4:09 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Thu, Nov 05, 2020 at 02:48:05PM -0800, Jianxin Xiong wrote:
> > +   /* modify the sgl in-place to match umem address and length */
> > +
> > +   start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
> > +   end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
> > +   PAGE_SIZE);
> > +   cur = 0;
> > +   nmap = 0;
> > +   for_each_sgtable_dma_sg(sgt, sg, i) {
> > +   if (cur >= end)
> > +   break;
> > +   if (cur + sg_dma_len(sg) <= start) {
> > +   cur += sg_dma_len(sg);
> > +   continue;
> > +   }
> 
> This seems like a strange way to compute intersections 

I can rework that.

> 
>   if (cur <= start && start < cur + sg_dma_len(sg))
> 
> > +   if (cur <= start) {
> > +   unsigned long offset = start - cur;
> > +
> > +   umem_dmabuf->first_sg = sg;
> > +   umem_dmabuf->first_sg_offset = offset;
> > +   sg_dma_address(sg) += offset;
> > +   sg_dma_len(sg) -= offset;
> > +   if (&sg_dma_len(sg) != &sg->length)
> > +   sg->length -= offset;
> 
> We don't need to adjust sg->length, only dma_len, so no reason for this 
> surprising if.
> 
> > +   cur += offset;
> > +   }
> > +   if (cur + sg_dma_len(sg) >= end) {
> 
> Same logic here
> 
> > +   unsigned long trim = cur + sg_dma_len(sg) - end;
> > +
> > +   umem_dmabuf->last_sg = sg;
> > +   umem_dmabuf->last_sg_trim = trim;
> > +   sg_dma_len(sg) -= trim;
> > +   if (&sg_dma_len(sg) != &sg->length)
> > +   sg->length -= trim;
> 
> break, things are done here
> 
> > +   }
> > +   cur += sg_dma_len(sg);
> > +   nmap++;
> > +   }
> 
> > +
> > +   umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
> > +   umem_dmabuf->umem.sg_head.nents = nmap;
> > +   umem_dmabuf->umem.nmap = nmap;
> > +   umem_dmabuf->sgt = sgt;
> > +
> > +   page_size = ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE,
> > +  umem_dmabuf->umem.iova);
> > +
> > +   if (WARN_ON(cur != end || page_size != PAGE_SIZE)) {
> 
> Looks like nothing prevents this warn on to tigger
> 
> The user could specify a length that is beyond the dma buf, can the dma buf 
> length be checked during get?

In order to check the length, the buffer needs to be mapped. That can be done.

> 
> Also page_size can be 0 because iova is not OK. iova should be checked for 
> alignment during get as well:
> 
>   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> 

If ib_umem_dmabuf_map_pages is called during get this error is automatically 
caught.

> But yes, this is the right idea
> 
> Jason
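The intersection logic under discussion can be sketched in plain C (the names below are illustrative, not the kernel's; the real code must also record the first SGE's original dma_address and length so the edits can be undone before handing the SGL back to the exporter at unmap time):

```c
#include <stddef.h>

/* Each entry models one mapped SGE: its DMA address and length. */
struct seg { unsigned long addr, len; };

/* Trim segs[] in place so the kept entries cover exactly [start, end)
 * of the buffer: skip segments wholly before 'start', shave the head
 * of the first overlapping segment and the tail of the last, and stop
 * once 'end' is reached. Returns the index of the first kept segment
 * and stores the kept count in *nmap. */
size_t trim_segs(struct seg *segs, size_t n,
		 unsigned long start, unsigned long end, size_t *nmap)
{
	unsigned long cur = 0;
	size_t first = 0, kept = 0;

	for (size_t i = 0; i < n && cur < end; i++) {
		if (cur + segs[i].len <= start) {	/* wholly before the range */
			cur += segs[i].len;
			first = i + 1;
			continue;
		}
		if (cur < start) {			/* head trim on the first SGE */
			unsigned long off = start - cur;
			segs[i].addr += off;
			segs[i].len -= off;
			cur = start;
		}
		if (cur + segs[i].len > end)		/* tail trim on the last SGE */
			segs[i].len = end - cur;
		cur += segs[i].len;
		kept++;
	}
	*nmap = kept;
	return first;
}
```

Once 'cur' reaches 'end' the loop exits, which matches Jason's suggestion to break after the tail trim rather than keep scanning.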


RE: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration

2020-11-06 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 05, 2020 4:13 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based 
> MR registration
> 
> On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:
> 
> > +   ret = ib_check_mr_access(access_flags);
> > +   if (ret)
> > +   return ret;
> 
> This should also reject unsupportable flags like ACCESS_ON_DEMAND and HUGETLB

Will do.

> 
> > +
> > +   mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
> > +   fd, access_flags,
> > +   &attrs->driver_udata);
> > +   if (IS_ERR(mr))
> > +   return PTR_ERR(mr);
> > +
> > +   mr->device  = pd->device;
> > +   mr->pd  = pd;
> > +   mr->type= IB_MR_TYPE_USER;
> > +   mr->uobject = uobj;
> > +   atomic_inc(&pd->usecnt);
> 
> Fix indenting when copying code please

It could be due to a mix of tabs and spaces. They look aligned in the source
file. Will fix.

> 
> > +
> > +   uobj->object = mr;
> 
> Also bit surprised this works at all, it needs to call
> 
>   uverbs_finalize_uobj_create()
> 
> Right here.
> 

Will fix.

> Seems like the core code is missing some check that the API is being used 
> properly.
> 
> > +
> > +   ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> > +&mr->lkey, sizeof(mr->lkey));
> > +   if (ret)
> > +   goto err_dereg;
> > +
> > +   ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> > +&mr->rkey, sizeof(mr->rkey));
> > +   if (ret)
> > +   goto err_dereg;
> > +
> > +   return 0;
> > +
> > +err_dereg:
> > +   ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> > +
> > +   return ret;
> > +}
> > +
> >  DECLARE_UVERBS_NAMED_METHOD(
> > UVERBS_METHOD_ADVISE_MR,
> > UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
> 
> > @@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
> >  DECLARE_UVERBS_NAMED_OBJECT(
> > UVERBS_OBJECT_MR,
> > UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> > +   &UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
> > &UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
> > &UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> > &UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> 
> I try to keep these sorted, but I see it has become unsorted. You can 
> re-sort it in this patch

Will do.

> 
> Jason


RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-06 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, November 06, 2020 4:49 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Fri, Nov 06, 2020 at 01:11:38AM +, Xiong, Jianxin wrote:
> > > On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > > > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr 
> > > > *alloc_mr_from_cache(struct ib_pd *pd,
> > > > struct mlx5_ib_mr *mr;
> > > > unsigned int page_size;
> > > >
> > > > -   page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 
> > > > 0, iova);
> > > > +   if (umem->is_dmabuf)
> > > > +   page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, 
> > > > iova);
> > >
> > > You said the sgl is not set here, why doesn't this crash? It is certainly 
> > > wrong to call this function without a SGL.
> >
> > The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and 
> > won't crash.
> 
> Just wire this to 4k it is clearer than calling some no-op pgsz

Ok

> 
> 
> > > > +   if (!mr->cache_ent) {
> > > > +   mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > > +   WARN_ON(mr->descs);
> > > > +   }
> > > > +}
> > >
> > > I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> > >
> > > Who calls it on the dereg path?
> > >
> > > This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() 
> > > only from the invalidate callback?
> >
> > It is also called from ib_umem_dmabuf_release().
> 
> Hmm, that is no how the other APIs work, the unmap should be paired with the 
> map in the caller, and the sequence for destroy should be
> 
>  invalidate
>  unmap
>  destroy_mkey
>  release_umem
> 
> I have another series coming that makes the other three destroy flows much 
> closer to that ideal.
> 

Can fix that.

> > > I feel uneasy how this seems to assume everything works sanely, we
> > > can have parallel page faults so pagefault_dmabuf_mr() can be called 
> > > multiple times after an invalidation, and it doesn't protect itself
> against calling ib_umem_dmabuf_map_pages() twice.
> > >
> > > Perhaps the umem code should keep track of the current map state and
> > > exit if there is already a sgl. NULL or not NULL sgl would do and seems 
> > > quite reasonable.
> >
> > ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is 
> > already set.
> 
> How? What I see in patch 1 is an unconditional call to
> dma_buf_map_attachment() ?

My bad. I misread the lines. It used to be there (in v3) but somehow got lost. 

> 
> > > > +   if (is_dmabuf_mr(mr))
> > > > +   return pagefault_dmabuf_mr(mr, umem_dmabuf, 
> > > > user_va,
> > > > +  bcnt, bytes_mapped, 
> > > > flags);
> > >
> > > But this doesn't care about user_va or bcnt it just triggers the whole 
> > > thing to be remapped, so why calculate it?
> >
> > The range check is still needed, in order to catch application errors
> > of using incorrect address or count in verbs command. Passing the
> > values further in is to allow pagefault_dmabuf_mr to generate return
> > value and set bytes_mapped in a way consistent with the page fault
> > handler chain.
> 
> The HW validates the range. The range check in the ODP case is to protect 
> against a HW bug that would cause the kernel to malfunction.
> For dmabuf you don't need to do it

Ok.  So the handler can simply return 0 (as the number of pages mapped) and 
leave bytes_mapped untouched?

> 
> Jason


RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-05 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, November 05, 2020 4:25 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct 
> > ib_pd *pd,
> > struct mlx5_ib_mr *mr;
> > unsigned int page_size;
> >
> > -   page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > +   if (umem->is_dmabuf)
> > +   page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> 
> You said the sgl is not set here, why doesn't this crash? It is certainly 
> wrong to call this function without a SGL.

The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and 
won't crash.

> 
> > +/**
> > + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> > + * @mr: to fence
> > + *
> > + * On return no parallel threads will be touching this MR and no DMA
> > +will be
> > + * active.
> > + */
> > +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> > +
> > +   /* Prevent new page faults and prefetch requests from succeeding */
> > +   xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> > +
> > +   /* Wait for all running page-fault handlers to finish. */
> > +   synchronize_srcu(&mr->dev->odp_srcu);
> > +
> > +   wait_event(mr->q_deferred_work,
> > +!atomic_read(&mr->num_deferred_work));
> > +
> > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +   mlx5_mr_cache_invalidate(mr);
> > +   umem_dmabuf->private = NULL;
> > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   if (!mr->cache_ent) {
> > +   mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > +   WARN_ON(mr->descs);
> > +   }
> > +}
> 
> I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> 
> Who calls it on the dereg path?
> 
> This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only 
> from the invalidate callback?
>

It is also called from ib_umem_dmabuf_release(). 
 
> I feel uneasy how this seems to assume everything works sanely, we can have 
> parallel page faults so pagefault_dmabuf_mr() can be called
> multiple times after an invalidation, and it doesn't protect itself against 
> calling ib_umem_dmabuf_map_pages() twice.
> 
> Perhaps the umem code should keep track of the current map state and exit if 
> there is already a sgl. NULL or not NULL sgl would do and
> seems quite reasonable.
> 

ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is 
already set.

> > @@ -810,22 +871,31 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 
> > io_virt, size_t bcnt,
> > u32 *bytes_mapped, u32 flags)
> >  {
> > struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
> > +   struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> >
> > lockdep_assert_held(&mr->dev->odp_srcu);
> > if (unlikely(io_virt < mr->mmkey.iova))
> > return -EFAULT;
> >
> > -   if (!odp->is_implicit_odp) {
> > +   if (is_dmabuf_mr(mr) || !odp->is_implicit_odp) {
> > u64 user_va;
> > +   u64 end;
> >
> > if (check_add_overflow(io_virt - mr->mmkey.iova,
> > -  (u64)odp->umem.address, &user_va))
> > +  (u64)mr->umem->address, &user_va))
> > return -EFAULT;
> > -   if (unlikely(user_va >= ib_umem_end(odp) ||
> > -ib_umem_end(odp) - user_va < bcnt))
> > +   if (is_dmabuf_mr(mr))
> > +   end = mr->umem->address + mr->umem->length;
> > +   else
> > +   end = ib_umem_end(odp);
> > +   if (unlikely(user_va >= end || end - user_va < bcnt))
> > return -EFAULT;
> > -   return pagefault_real_mr(mr, odp, user_va, bcnt, bytes_mapped,
> > -flags);
> > +   if (is_dmabuf_mr(mr))

RE: [PATCH v7 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-11-04 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Wednesday, November 04, 2020 4:07 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v7 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Wed, Nov 04, 2020 at 02:06:34PM -0800, Jianxin Xiong wrote:
> > +   umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, 
> > access_flags,
> > + &mlx5_ib_dmabuf_attach_ops);
> > +   if (IS_ERR(umem)) {
> > +   mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +   return ERR_PTR(PTR_ERR(umem));
> > +   }
> > +
> > +   mr = alloc_mr_from_cache(pd, umem, virt_addr, access_flags);
> 
> It is very subtle, but this calls mlx5_umem_find_best_pgsz() which calls 
> ib_umem_find_best_pgsz() which goes over the SGL to determine
> the page size to use.
> 

When this is called here, the umem sglist is still NULL because
dma_buf_map_attachment() is not called until a page fault occurs. In patch 1/5,
the function ib_umem_find_best_pgsz() has been modified to always return
PAGE_SIZE for dma-buf based MRs.

> As part of this it does validation of the IOVA vs first page offset vs first 
> page dma address. These little details come into play if the IOVA and
> offset are not PAGE_SIZE aligned, which is very possible if the dma buf 
> exporter or system PAGE_SIZE is over 4k.
> 
> In other words, the dma_address of the first SGL must be the page aligned 
> starting point of the MR. Since the 'skip' approach is being done
> when breaking the SGL into blocks the ib_umem_find_best_pgsz() sees an 
> invalid page size.
> 
> Slicing it has to be done in a way that gives a properly formed SGL.
> 
> My suggestion is to just change the SGL in place. Iterate to the starting SGE 
> in the SGL and assign it to the sg table, modify it to have a offset
> dma_address and reduced length
> 
> Count the number of SGEs to span the remaning range and use that as the new 
> nmaped
> 
> Edit the last SGE to have a reduced length

Do you still think modifying the SGL in place is needed given the above
explanation? I do see some benefits of doing so -- it hides the discrepancy
between the sgl and addr/length from the device drivers and avoids special
handling in the code that uses the sgl.

> 
> Upon unmap undo the edits so the exporter doesn't see the mangled SGL.
> 
> It would be saner if the exporter could do this, but oh well.
> 
> Approximately like this:
> 
>   struct ib_umem *umem = &umem_p->umem;
>   struct scatterlist *sg;
>   int i;
> 
>   for_each_sg(umem_p->umem.sg_head.sgl, sg, umem_p->umem.nmap, i) {
>   if (cur + sg_dma_len(sg) > ALIGN_DOWN(umem->address, 
> PAGE_SIZE)) {
>   unsigned long offset;
> 
>   umem_p->first_sg = sg;
>   umem_p->first_dma_address = sg->dma_address;
>   umem_p->first_dma_length = sg_dma_len(sg);
>   umem_p->first_length = sg->length;
>   offset = ALIGN_DOWN(umem->address, PAGE_SIZE) - cur;
>   sg->dma_address += offset;
>   sg_dma_len(sg) -= offset;
>   sg->length -= offset;
>   }
>   if (ALIGN(umem->address + umem->length, PAGE_SIZE) < cur + 
> sg_dma_len(sg)) {
>   unsigned long trim;
> 
>   umem_p->last_sg = sg;
>   umem_p->last_dma_length = sg_dma_len(sg);
>   umem_p->last_length = sg->length;
>   trim = cur + sg_dma_len(sg) - ALIGN(umem->address + 
> umem->length, PAGE_SIZE);
>   sg_dma_len(sg) -= trim;
>   sg->length -= trim;
>   return npages;
>   }
> cur += sg_dma_len(sg);
>   }
> /* It is essential that the length of the SGL exactly match
>  the adjusted page aligned length of umem->length */
>   return -EINVAL;
> 
> Further, this really only works if the umem->page_size is locked to 4k 
> because this doesn't have code to resize the MKEY, or change the
> underlying page size when the SGL changes.

Yes, now it's locked to 4K. 

> 
> So, I'd say put something like the above in the core code to validate and 
> properly form the umem->sgl
> 
> Then modify the alloc_mr_from_cache to use only PAGE_SIZE:
> 
>  if (umem->is_dma_buf)
> page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
>  else
>   page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> 

This should have been addressed in patch 1/5.

Thanks,
Jianxin

> Jason
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-11-03 Thread Xiong, Jianxin
> -Original Message-
> From: Daniel Vetter 
> Sent: Tuesday, November 03, 2020 12:43 PM
> To: Xiong, Jianxin 
> Cc: Jason Gunthorpe ; linux-rdma ; 
> dri-devel ; Leon
> Romanovsky ; Doug Ledford ; Vetter, 
> Daniel ; Christian Koenig
> 
> Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Tue, Nov 3, 2020 at 6:36 PM Xiong, Jianxin  wrote:
> >
> >
> > > -Original Message-
> > > From: Daniel Vetter 
> > > Sent: Friday, October 23, 2020 6:45 PM
> > > To: Jason Gunthorpe 
> > > Cc: Xiong, Jianxin ; linux-rdma
> > > ; dri-devel
> > > ; Leon Romanovsky
> > > ; Doug Ledford ; Vetter,
> > > Daniel ; Christian Koenig
> > > 
> > > Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as
> > > user memory region
> > >
> > > > > > +
> > > > > > +#ifdef CONFIG_DMA_VIRT_OPS
> > > > > > +   if (device->dma_device->dma_ops == &dma_virt_ops)
> > > > > > +   return ERR_PTR(-EINVAL); #endif
> > > > >
> > > > > Maybe I'm confused, but should we have this check in
> > > > > dma_buf_attach, or at least in dma_buf_dynamic_attach? The
> > > > > p2pdma functions use that too, and I can't imagine how zerocopy
> > > > > should work (which is like the entire point of
> > > > > dma-buf) when we have dma_virt_ops.
> > > >
> > > > The problem is we have RDMA drivers that assume SGL's have a valid
> > > > struct page, and these hacky/wrong P2P sgls that DMABUF creates
> > > > cannot be passed into those drivers.
> > > >
> > > > But maybe this is just a 'drivers are using it wrong' if they call
> > > > this function and expect struct pages..
> > > >
> > > > The check in the p2p stuff was done to avoid this too, but it was
> > > > on a different flow.
> > >
> > > Yeah definitely don't call dma_buf_map_attachment and expect a page
> > > back. In practice you'll get a page back fairly often, but I don't think 
> > > we want to bake that in, maybe we eventually get to non-hacky
> dma_addr_t only sgl.
> > >
> > > What I'm wondering is whether dma_buf_attach shouldn't reject such 
> > > devices directly, instead of each importer having to do that.
> >
> > Coming back here to see if consensus can be reached on who should do the
> > check. My thinking is that it could be overly restrictive for
> > dma_buf_attach to always reject dma_virt_ops. According to the dma-buf
> > documentation the backing storage would be moved to system memory upon
> > mapping unless p2p is requested and can be supported by the exporter. The
> > sg_list for system memory would have struct page present.
> 
> So I'm not clear on what this dma_virt_ops stuff is for, but if it's an 
> entirely virtual device with cpu access, then you shouldn't do
> dma_buf_map_attachment, and then peek at the struct page in the sgl.

This is the key, thanks for pointing that out. I was assuming the importer could
use the struct page if it exists. 

> Instead you need to use dma_buf_vmap/vunmap and dma_buf_begin/end_cpu_access. 
> Otherwise the coherency managed is all potentially
> busted. Also note that cpu access from the kernel to dma-buf is a rather 
> niche feature (only some usb device drivers use it), so expect warts.
> 

dma_virt_ops is a set of dma mapping operations that map a page/sgl to virtual
addresses instead of dma addresses. Drivers that use dma_virt_ops use the
mapping result for cpu access (to emulate DMA) instead of real DMA, so the dma
mapping returned from dma-buf is not compatible with what such drivers expect.
If these drivers are not allowed to peek into the struct page of the sgl, they
have no way to use the sgl correctly. In this sense I agree that drivers using
dma_virt_ops should not call dma_buf_attach(); they should use dma_buf_vmap()
et al. to get cpu access.

> If this is the case, then I think dma_buf_attach should check for this and 
> reject such imports, since that's an importer bug.

So here we go. I will move the check to dma_buf_dynamic_attach (and 
dma_buf_attach
is a wrapper over that).

> 
> If it's otoh something rdma specific, then I guess rdma checking for this is 
> ok.
> 
> As a third option, if it's something about the connectivity between the 
> importing and exporting device, then this should be checked in the
> ->attach callback the exporter can provide, like the p2p check. The
> idea here is that for device specific remapping units (mostly found ond SoC, 
> and not something like a standard iommu managed by the dma-
> api), some of which can even do funny stuff like rotation of 2d images, can 
> be access by some, but not other. And only the exporter is
> aware of these restrictions.
> 
> Now I dunno which case this one here is.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


RE: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-11-03 Thread Xiong, Jianxin


> -Original Message-
> From: Daniel Vetter 
> Sent: Friday, October 23, 2020 6:45 PM
> To: Jason Gunthorpe 
> Cc: Xiong, Jianxin ; linux-rdma 
> ; dri-devel ; 
> Leon
> Romanovsky ; Doug Ledford ; Vetter, 
> Daniel ; Christian Koenig
> 
> Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> > > > +
> > > > +#ifdef CONFIG_DMA_VIRT_OPS
> > > > +   if (device->dma_device->dma_ops == &dma_virt_ops)
> > > > +   return ERR_PTR(-EINVAL); #endif
> > >
> > > Maybe I'm confused, but should we have this check in dma_buf_attach,
> > > or at least in dma_buf_dynamic_attach? The p2pdma functions use that
> > > too, and I can't imagine how zerocopy should work (which is like the
> > > entire point of
> > > dma-buf) when we have dma_virt_ops.
> >
> > The problem is we have RDMA drivers that assume SGL's have a valid
> > struct page, and these hacky/wrong P2P sgls that DMABUF creates cannot
> > be passed into those drivers.
> >
> > But maybe this is just a 'drivers are using it wrong' if they call
> > this function and expect struct pages..
> >
> > The check in the p2p stuff was done to avoid this too, but it was on a
> > different flow.
> 
> Yeah definitely don't call dma_buf_map_attachment and expect a page back. In 
> practice you'll get a page back fairly often, but I don't think
> we want to bake that in, maybe we eventually get to non-hacky dma_addr_t only 
> sgl.
> 
> What I'm wondering is whether dma_buf_attach shouldn't reject such devices 
> directly, instead of each importer having to do that.

Coming back here to see if consensus can be reached on who should do the check.
My thinking is that it could be overly restrictive for dma_buf_attach to always
reject dma_virt_ops. According to the dma-buf documentation, the backing
storage would be moved to system memory upon mapping unless p2p is requested
and can be supported by the exporter. The sg_list for system memory would have
struct page present.

-Jianxin
  
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


RE: [PATCH] dma-buf: Fix static checker warning

2020-11-03 Thread Xiong, Jianxin
> -Original Message-
> From: Christian König 
> Sent: Monday, November 02, 2020 11:57 PM
> To: Xiong, Jianxin ; dri-devel@lists.freedesktop.org
> Cc: Sumit Semwal ; Vetter, Daniel 
> 
> Subject: Re: [PATCH] dma-buf: Fix static checker warning
> 
> Am 03.11.20 um 04:51 schrieb Jianxin Xiong:
> > Here is the warning message:
> >
> > drivers/dma-buf/dma-buf.c:917 dma_buf_map_attachment()
> > error: 'sg_table' dereferencing possible ERR_PTR()
> >
> > Fix by adding error checking before dereferencing the pointer.
> >
> > Fixes: ac80cd17a615 ("dma-buf: Clarify that dma-buf sg lists are page
> > aligned")
> > Reported-by: Dan Carpenter 
> > Signed-off-by: Jianxin Xiong 
> 
> Reviewed-by: Christian König 
> 
> Do you have commit access to drm-misc-next or should I push it?

I don't have commit access.

> 
> > ---
> >   drivers/dma-buf/dma-buf.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 556f62e..0eb80c1 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -908,7 +908,7 @@ struct sg_table *dma_buf_map_attachment(struct 
> > dma_buf_attachment *attach,
> > }
> >
> >   #ifdef CONFIG_DMA_API_DEBUG
> > -   {
> > +   if (!IS_ERR(sg_table)) {
> > struct scatterlist *sg;
> > u64 addr;
> > int len;



RE: [bug report] dma-buf: Clarify that dma-buf sg lists are page aligned

2020-11-02 Thread Xiong, Jianxin
Hi Dan,

Thanks for reporting the bug. I see what was missing. Should I submit a patch
replacing the original one, or just a fix for this bug?

Jianxin 

> -Original Message-
> From: Dan Carpenter 
> Sent: Monday, November 02, 2020 12:15 AM
> To: Xiong, Jianxin 
> Cc: dri-devel@lists.freedesktop.org
> Subject: [bug report] dma-buf: Clarify that dma-buf sg lists are page aligned
> 
> Hello Jianxin Xiong,
> 
> The patch ac80cd17a615: "dma-buf: Clarify that dma-buf sg lists are page 
> aligned" from Oct 14, 2020, leads to the following static checker
> warning:
> 
>   drivers/dma-buf/dma-buf.c:917 dma_buf_map_attachment()
>   error: 'sg_table' dereferencing possible ERR_PTR()
> 
> drivers/dma-buf/dma-buf.c
>897  sg_table = attach->dmabuf->ops->map_dma_buf(attach, 
> direction);
>898  if (!sg_table)
>899  sg_table = ERR_PTR(-ENOMEM);
>900
>901  if (IS_ERR(sg_table) && dma_buf_is_dynamic(attach->dmabuf) &&
>902   !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY))
>903  dma_buf_unpin(attach);
>904
>905  if (!IS_ERR(sg_table) && 
> attach->dmabuf->ops->cache_sgt_mapping) {
> ^
> 
>906  attach->sgt = sg_table;
>907  attach->dir = direction;
>908  }
>909
>910  #ifdef CONFIG_DMA_API_DEBUG
>911  {
>912  struct scatterlist *sg;
>913  u64 addr;
>914  int len;
>915  int i;
>916
>917  for_each_sgtable_dma_sg(sg_table, sg, i) {
> ^ Not checked here.
> 
>918  addr = sg_dma_address(sg);
>919  len = sg_dma_len(sg);
>920  if (!PAGE_ALIGNED(addr) || 
> !PAGE_ALIGNED(len)) {
>921  pr_debug("%s: addr %llx or len %x is 
> not page aligned!\n",
>922   __func__, addr, len);
>923  }
>924  }
>925  }
>926  #endif /* CONFIG_DMA_API_DEBUG */
> 
> regards,
> dan carpenter


RE: [PATCH v6 4/4] RDMA/mlx5: Support dma-buf based userspace memory region

2020-10-28 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Wednesday, October 28, 2020 9:36 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v6 4/4] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Tue, Oct 27, 2020 at 08:33:52PM +, Xiong, Jianxin wrote:
> > > > @@ -801,6 +816,52 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr 
> > > > *imr,
> > > >   * Returns:
> > > >   *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages 
> > > > that are
> > > >   *   not accessible, or the MR is no longer valid.
> > > > + *  -EAGAIN: The operation should be retried
> > > > + *
> > > > + *  >0: Number of pages mapped
> > > > + */
> > > > +static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, struct ib_umem 
> > > > *umem,
> > > > +  u64 io_virt, size_t bcnt, u32 
> > > > *bytes_mapped,
> > > > +  u32 flags)
> > > > +{
> > > > +   struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
> > > > +   u64 user_va;
> > > > +   u64 end;
> > > > +   int npages;
> > > > +   int err;
> > > > +
> > > > +   if (unlikely(io_virt < mr->mmkey.iova))
> > > > +   return -EFAULT;
> > > > +   if (check_add_overflow(io_virt - mr->mmkey.iova,
> > > > +  (u64)umem->address, &user_va))
> > > > +   return -EFAULT;
> > > > +   /* Overflow has already been checked at the umem creation time 
> > > > */
> > > > +   end = umem->address + umem->length;
> > > > +   if (unlikely(user_va >= end || end - user_va < bcnt))
> > > > +   return -EFAULT;
> > >
> > > Why duplicate this sequence? Caller does it
> >
> > The sequence in the caller is for umem_odp only.
> 
> Nothing about umem_odp in this code though??

The code in the caller uses ib_umem_end(odp) instead of the 'end' here, but we
can consolidate that with some minor changes.
  
> 
> > > > /* prefetch with write-access must be supported by the MR */
> > > > if (advice == IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE &&
> > > > -   !odp->umem.writable)
> > > > +   !mr->umem->writable)
> > >
> > > ??
> 
> > There is no need to use umem_odp here, mr->umem is the same as &odp->umem.
> > This change makes the code works for both umem_odp and umem_dmabuf.
> 
> Ok
> 
> Can you please also think about how to test this? I very much prefer to see 
> new pyverbs tests for new APIs.
> 
> Distros are running the rdma-core test suite, if you want this to work widely 
> we need a public test for it.
> 

Will look into that.

> Thanks,
> Jason


RE: [PATCH v6 4/4] RDMA/mlx5: Support dma-buf based userspace memory region

2020-10-27 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, October 27, 2020 1:08 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v6 4/4] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Fri, Oct 23, 2020 at 09:40:01AM -0700, Jianxin Xiong wrote:
> 
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c
> > b/drivers/infiniband/hw/mlx5/mr.c index b261797..3bc412b 100644
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -1,5 +1,6 @@
> >  /*
> >   * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
> > + * Copyright (c) 2020, Intel Corporation. All rights reserved.
> >   *
> >   * This software is available to you under a choice of one of two
> >   * licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -36,6 +37,8 @@  #include   #include
> >   #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1113,6 +1116,8 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 
> > idx, int npages,
> > dma_sync_single_for_cpu(ddev, dma, size, DMA_TO_DEVICE);
> > if (mr->umem->is_odp) {
> > mlx5_odp_populate_xlt(xlt, idx, npages, mr, flags);
> > +   } else if (mr->umem->is_dmabuf && (flags & 
> > MLX5_IB_UPD_XLT_ZAP)) {
> > +   memset(xlt, 0, size);
> > } else {
> > __mlx5_ib_populate_pas(dev, mr->umem, page_shift, idx,
> >npages, xlt,
> > @@ -1462,6 +1467,111 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, 
> > u64 start, u64 length,
> > return ERR_PTR(err);
> >  }
> >
> > +static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment
> > +*attach) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +   struct mlx5_ib_mr *mr = umem_dmabuf->device_context;
> > +
> > +   mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, MLX5_IB_UPD_XLT_ZAP);
> > +   ib_umem_dmabuf_unmap_pages(umem_dmabuf);
> > +}
> > +
> > +static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
> > +   .allow_peer2peer = 1,
> > +   .move_notify = mlx5_ib_dmabuf_invalidate_cb, };
> > +
> > +struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
> > +u64 length, u64 virt_addr,
> > +int fd, int access_flags,
> > +struct ib_udata *udata)
> > +{
> > +   struct mlx5_ib_dev *dev = to_mdev(pd->device);
> > +   struct mlx5_ib_mr *mr = NULL;
> > +   struct ib_umem *umem;
> > +   struct ib_umem_dmabuf *umem_dmabuf;
> > +   int npages;
> > +   int order;
> > +   int err;
> > +
> > +   if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM))
> > +   return ERR_PTR(-EOPNOTSUPP);
> > +
> > +   mlx5_ib_dbg(dev,
> > +   "offset 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, 
> > access_flags 0x%x\n",
> > +   offset, virt_addr, length, fd, access_flags);
> > +
> > +   if (!mlx5_ib_can_load_pas_with_umr(dev, length))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, 
> > access_flags,
> > + &mlx5_ib_dmabuf_attach_ops);
> > +   if (IS_ERR(umem)) {
> > +   mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +   return ERR_PTR(PTR_ERR(umem));
> > +   }
> > +
> > +   npages = ib_umem_num_pages(umem);
> > +   if (!npages) {
> 
> ib_umem_get should reject invalid umems like this

Will move the check to ib_umem_dmabuf_get().

> 
> > +   mlx5_ib_warn(dev, "avoid zero region\n");
> > +   ib_umem_release(umem);
> > +   return ERR_PTR(-EINVAL);
> > +   }
> > +
> > +   order = ilog2(roundup_pow_of_two(npages));
> 
> Must always call ib_umem_find_best_pgsz(), specify PAGE_SIZE as the argument 
> for this scenario

Will fix.

> 
> > +   mlx5_ib_dbg(dev, "npages %d, ncont %d, order %d, page_shift %d\n",
> > +   npages, npages, order, PAGE_SHIFT);
> > +
> > +   mr = alloc_mr_from_cache(pd, umem, virt_addr, length, npages,
> > +PAGE

RE: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-27 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, October 27, 2020 1:00 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Fri, Oct 23, 2020 at 09:39:58AM -0700, Jianxin Xiong wrote:
> > +/*
> > + * Generate a new dma sg list from a sub range of an existing dma sg list.
> > + * Both the input and output have their entries page aligned.
> > + */
> > +static int ib_umem_dmabuf_sgt_slice(struct sg_table *sgt, u64 offset,
> > +   u64 length, struct sg_table *new_sgt) {
> > +   struct scatterlist *sg, *new_sg;
> > +   u64 start, end, off, addr, len;
> > +   unsigned int new_nents;
> > +   int err;
> > +   int i;
> > +
> > +   start = ALIGN_DOWN(offset, PAGE_SIZE);
> > +   end = ALIGN(offset + length, PAGE_SIZE);
> > +
> > +   offset = start;
> > +   length = end - start;
> > +   new_nents = 0;
> > +   for_each_sgtable_dma_sg(sgt, sg, i) {
> > +   len = sg_dma_len(sg);
> > +   off = min(len, offset);
> > +   len -= off;
> > +   len = min(len, length);
> > +   if (len)
> > +   new_nents++;
> > +   length -= len;
> > +   offset -= off;
> > +   }
> > +
> > +   err = sg_alloc_table(new_sgt, new_nents, GFP_KERNEL);
> > +   if (err)
> > +   return err;
> 
> I would really rather not allocate an entirely new table just to take a slice 
> of an existing SGT. Ideally the expoter API from DMA buf would
> prepare the SGL slice properly instead of always giving a whole buffer.
> 
> Alternatively making some small edit to rdma_umem_for_each_dma_block() and 
> ib_umem_find_best_pgsz() would let it slice the SGL at
> runtime
> 
> You need to rebase on top of this series:
> 
> https://patchwork.kernel.org/project/linux-rdma/list/?series=370437
> 
> Which makes mlx5 use those new APIs
> 
> Jason

Thanks. Will rebase and work on the runtime slicing.


RE: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-27 Thread Xiong, Jianxin
> -Original Message-
> From: Christoph Hellwig 
> Sent: Tuesday, October 27, 2020 1:08 AM
> To: Jason Gunthorpe 
> Cc: Christoph Hellwig ; Daniel Vetter ; 
> Xiong, Jianxin ; linux-
> r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Leon Romanovsky 
> ; Doug Ledford ;
> Vetter, Daniel ; Christian Koenig 
> 
> Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Mon, Oct 26, 2020 at 09:26:37AM -0300, Jason Gunthorpe wrote:
> > On Sat, Oct 24, 2020 at 08:48:07AM +0100, Christoph Hellwig wrote:
> > > On Fri, Oct 23, 2020 at 03:20:05PM -0300, Jason Gunthorpe wrote:
> > > > The problem is we have RDMA drivers that assume SGL's have a valid
> > > > struct page, and these hacky/wrong P2P sgls that DMABUF creates
> > > > cannot be passed into those drivers.
> > >
> > > RDMA drivers do not assume scatterlist have a valid struct page,
> > > scatterlists are defined to have a valid struct page.  Any
> > > scatterlist without a struct page is completely buggy.
> >
> > It is not just having the struct page, it needs to be a CPU accessible
> > one for memcpy/etc. They aren't correct with the
> > MEMORY_DEVICE_PCI_P2PDMA SGLs either.
> 
> Exactly.

In the function ib_umem_dmabuf_sgt_slice() (part of this patch) we could
generate a dma address array instead of filling the scatterlist
'umem->sg_head'. The array would be handled similarly to 'umem_odp->dma_list'.
With such a change, the RDMA drivers wouldn't see an incorrectly formed
scatterlist, and the check for dma_virt_ops here wouldn't be needed either.

Would such a proposal address the concern here?

-Jianxin
 


RE: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-23 Thread Xiong, Jianxin


> -Original Message-
> From: Daniel Vetter 
> Sent: Friday, October 23, 2020 9:49 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Leon 
> Romanovsky ; Jason Gunthorpe ;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Fri, Oct 23, 2020 at 09:39:58AM -0700, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> > ---
> >  drivers/infiniband/core/Makefile  |   2 +-
> >  drivers/infiniband/core/umem.c|   4 +
> >  drivers/infiniband/core/umem_dmabuf.c | 197
> > ++
> >  drivers/infiniband/core/umem_dmabuf.h |  11 ++
> >  include/rdma/ib_umem.h|  35 +-
> >  5 files changed, 247 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> >
> > diff --git a/drivers/infiniband/core/Makefile
> > b/drivers/infiniband/core/Makefile
> > index ccf2670..8ab4eea 100644
> > --- a/drivers/infiniband/core/Makefile
> > +++ b/drivers/infiniband/core/Makefile
> > @@ -40,5 +40,5 @@ ib_uverbs-y :=uverbs_main.o 
> > uverbs_cmd.o uverbs_marshall.o \
> > uverbs_std_types_srq.o \
> > uverbs_std_types_wq.o \
> > uverbs_std_types_qp.o
> > -ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> > +ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
> >  ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o diff
> > --git a/drivers/infiniband/core/umem.c
> > b/drivers/infiniband/core/umem.c index e9fecbd..2c45525 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -2,6 +2,7 @@
> >   * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> >   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> >   * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> >   *
> >   * This software is available to you under a choice of one of two
> >   * licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -43,6 +44,7 @@  #include 
> >
> >  #include "uverbs.h"
> > +#include "umem_dmabuf.h"
> >
> >  static void __ib_umem_release(struct ib_device *dev, struct ib_umem
> > *umem, int dirty)  { @@ -269,6 +271,8 @@ void ib_umem_release(struct
> > ib_umem *umem)  {
> > if (!umem)
> > return;
> > +   if (umem->is_dmabuf)
> > +   return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
> > if (umem->is_odp)
> > return ib_umem_odp_release(to_ib_umem_odp(umem));
> >
> > diff --git a/drivers/infiniband/core/umem_dmabuf.c
> > b/drivers/infiniband/core/umem_dmabuf.c
> > new file mode 100644
> > index 000..66b234d
> > --- /dev/null
> > +++ b/drivers/infiniband/core/umem_dmabuf.c
> > @@ -0,0 +1,197 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> > +/*
> > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "uverbs.h"
> > +#include "umem_dmabuf.h"
> > +
> > +/*
> > + * Generate a new dma sg list from a sub range of an existing dma sg list.
> > + * Both the input and output have their entries page aligned.
> > + */
> > +static int ib_umem_dmabuf_sgt_slice(struct sg_table *sgt, u64 offset,
> > +   u64 length, struct sg_table *new_sgt) {
> > +   struct scatterlist *sg, *new_sg;
> > +   u64 start, end, off,

RE: [PATCH v5 0/5] RDMA: Add dma-buf support

2020-10-20 Thread Xiong, Jianxin
> -Original Message-
> From: John Hubbard 
> Sent: Tuesday, October 20, 2020 6:42 PM
> To: Xiong, Jianxin ; linux-r...@vger.kernel.org; 
> dri-devel@lists.freedesktop.org
> Cc: Doug Ledford ; Jason Gunthorpe ; Leon 
> Romanovsky ; Sumit Semwal
> ; Christian Koenig ; 
> Vetter, Daniel 
> Subject: Re: [PATCH v5 0/5] RDMA: Add dma-buf support
> 
> On 10/15/20 3:02 PM, Jianxin Xiong wrote:
> > This is the fifth version of the patch set. Changelog:
> >
> 
> Hi,
> 
> A minor point, but if you can tweak your email sending setup, it would be 
> nice.
> Specifically, make follow-up patches a reply to the first item. That's a list 
> convention, and git format-patch + git send-email *.patch is
> normally sufficient to make that happen, unless you override it by doing 
> something like sending each patch separately...which is my first
> suspicion as to how this happened.
> 
> These patches are difficult to link to, because they don't follow the 
> convention of patches 1-5 being in-reply-to patch 0. So if we want to
> ask people outside of this list to take a peek (I was about to), we have to 
> go collect 5 or 6 different lore.kernel.org URLs, one for each
> patch...
> 
> Take a look on lore and you can see the problem. Here's patch 0, and there is 
> no way from there to find the remaining patches:
> 
> 
> https://lore.kernel.org/dri-devel/1602799340-138152-1-git-send-email-jianxin.xi...@intel.com/
> 
> 
Hi John,

Thanks for pointing this out. I didn't realize that sending the patches
individually, rather than with a single command, would make a difference. Only
versions 4 and 5 have this issue; I will switch back to my old script for the
next version.

Thanks,
Jianxin

> thanks,
> --
> John Hubbard
> NVIDIA
> 



RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-18 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 6:05 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Sat, Oct 17, 2020 at 12:57:21AM +, Xiong, Jianxin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Friday, October 16, 2020 5:28 PM
> > > To: Xiong, Jianxin 
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford ; Leon Romanovsky
> > > ; Sumit Semwal ; Christian
> > > Koenig ; Vetter, Daniel
> > > 
> > > Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as
> > > user memory region
> > >
> > > On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > > > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > > > +  unsigned long addr, size_t size,
> > > > +  int dmabuf_fd, int access,
> > > > +  const struct ib_umem_dmabuf_ops 
> > > > *ops) {
> > > > +   struct dma_buf *dmabuf;
> > > > +   struct ib_umem_dmabuf *umem_dmabuf;
> > > > +   struct ib_umem *umem;
> > > > +   unsigned long end;
> > > > +   long ret;
> > > > +
> > > > +   if (check_add_overflow(addr, (unsigned long)size, &end))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   if (unlikely(!ops || !ops->invalidate || !ops->update))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > > > +   if (!umem_dmabuf)
> > > > +   return ERR_PTR(-ENOMEM);
> > > > +
> > > > +   umem_dmabuf->ops = ops;
> > > > +   INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > > > +
> > > > +   umem = &umem_dmabuf->umem;
> > > > +   umem->ibdev = device;
> > > > +   umem->length = size;
> > > > +   umem->address = addr;
> > >
> > > addr here is offset within the dma buf, but this code does nothing with 
> > > it.
> > >
> > The current code assumes 0 offset, and 'addr' is the nominal starting
> > address of the buffer. If this is to be changed to offset, then yes,
> > some more handling is needed as you mentioned below.
> 
> There is no such thing as 'nominal starting address'
> 
> If the user is to provide any argument it can only be offset and length.
> 
> > > Also, dma_buf_map_attachment() does not do the correct dma mapping
> > > for RDMA, eg it does not use ib_dma_map(). This is not a problem for
> > > mlx5 but it is troublesome to put in the core code.
> >
> > ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource()
> > for dma_buf_map_attachment(). They belong to the same family, but take
> > different address type (kernel address vs MMIO physical address).
> > Could you elaborate what the problem could be for non-mlx5 HCAs?
> 
> They use the virtual dma ops which we intend to remove

We can have a check against the dma device before attaching the dma-buf, so
that the ib_umem_dmabuf_get() call from such drivers would fail. Something
like:

#ifdef CONFIG_DMA_VIRT_OPS
	if (device->dma_device->dma_ops == &dma_virt_ops)
		return ERR_PTR(-EINVAL);
#endif
 
> 
> Jason


RE: [PATCH v5 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration

2020-10-16 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 5:18 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 3/5] RDMA/uverbs: Add uverbs command for dma-buf based 
> MR registration
> 
> On Thu, Oct 15, 2020 at 03:02:55PM -0700, Jianxin Xiong wrote:
> > Implement a new uverbs ioctl method for memory registration with file
> > descriptor as an extra parameter.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > Acked-by: Christian Koenig 
> > Acked-by: Daniel Vetter 
> > drivers/infiniband/core/uverbs_std_types_mr.c | 112 ++
> >  include/uapi/rdma/ib_user_ioctl_cmds.h|  14 
> >  2 files changed, 126 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c
> > b/drivers/infiniband/core/uverbs_std_types_mr.c
> > index 9b22bb5..e54459f 100644
> > +++ b/drivers/infiniband/core/uverbs_std_types_mr.c
> > @@ -1,5 +1,6 @@
> >  /*
> >   * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
> > + * Copyright (c) 2020, Intel Corporation.  All rights reserved.
> >   *
> >   * This software is available to you under a choice of one of two
> >   * licenses.  You may choose to be licensed under the terms of the
> > GNU @@ -178,6 +179,85 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
> > return IS_UVERBS_COPY_ERR(ret) ? ret : 0;  }
> >
> > +static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
> > +   struct uverbs_attr_bundle *attrs)
> > +{
> > +   struct ib_uobject *uobj =
> > +   uverbs_attr_get_uobject(attrs, 
> > UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
> > +   struct ib_pd *pd =
> > +   uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
> > +   struct ib_device *ib_dev = pd->device;
> > +
> > +   u64 start, length, virt_addr;
> > +   u32 fd, access_flags;
> > +   struct ib_mr *mr;
> > +   int ret;
> > +
> > +   if (!ib_dev->ops.reg_user_mr_dmabuf)
> > +   return -EOPNOTSUPP;
> > +
> > +   ret = uverbs_copy_from(&start, attrs,
> > +  UVERBS_ATTR_REG_DMABUF_MR_ADDR);
> 
> This should be called OFFSET uniformly here and in all the call chain below. 
> Not start and not addr.

Right now it does mean the starting address, but that can be changed.

> 
> > +   if (ret)
> > +   return ret;
> > +
> > +   ret = uverbs_copy_from(&length, attrs,
> > +  UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
> > +   if (ret)
> > +   return ret;
> > +
> > +   ret = uverbs_copy_from(&virt_addr, attrs,
> > +  UVERBS_ATTR_REG_DMABUF_MR_HCA_VA);
> 
> I've been trying to call this IOVA

IOVA sounds good to me.

> 
> Jason


RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-16 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 5:28 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > +  unsigned long addr, size_t size,
> > +  int dmabuf_fd, int access,
> > +  const struct ib_umem_dmabuf_ops *ops) {
> > +   struct dma_buf *dmabuf;
> > +   struct ib_umem_dmabuf *umem_dmabuf;
> > +   struct ib_umem *umem;
> > +   unsigned long end;
> > +   long ret;
> > +
> > +   if (check_add_overflow(addr, (unsigned long)size, &end))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (unlikely(!ops || !ops->invalidate || !ops->update))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > +   if (!umem_dmabuf)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   umem_dmabuf->ops = ops;
> > +   INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > +
> > +   umem = &umem_dmabuf->umem;
> > +   umem->ibdev = device;
> > +   umem->length = size;
> > +   umem->address = addr;
> 
> addr here is offset within the dma buf, but this code does nothing with it.
> 
The current code assumes 0 offset, and 'addr' is the nominal starting address
of the buffer. If this is to be changed to offset, then yes, some more
handling is needed as you mentioned below.

> dma_buf_map_attachment gives a complete SGL for the entire DMA buf, but 
> offset/length select a subset.
> 
> You need to edit the sgls to make them properly span the sub-range and follow 
> the peculiar rules for how SGLs in ib_umem's have to be
> constructed.
> 
> Who validates that the total dma length of the SGL is exactly equal to 
> length? That is really important too.
> 
> Also, dma_buf_map_attachment() does not do the correct dma mapping for RDMA, 
> eg it does not use ib_dma_map(). This is not a problem
> for mlx5 but it is troublesome to put in the core code.

ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource() for
dma_buf_map_attachment(). They belong to the same family, but take different
address type (kernel address vs MMIO physical address). Could you elaborate what
the problem could be for non-mlx5 HCAs?
 
> 
> Jason


RE: [PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-10-16 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 11:58 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 
> On Fri, Oct 16, 2020 at 06:40:01AM +, Xiong, Jianxin wrote:
> > > > +   if (!mr)
> > > > +   return -EINVAL;
> > > > +
> > > > +   return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags);
> > > > +}
> > > > +
> > > > +static struct ib_umem_dmabuf_ops mlx5_ib_umem_dmabuf_ops = {
> > > > +   .init = mlx5_ib_umem_dmabuf_xlt_init,
> > > > +   .update = mlx5_ib_umem_dmabuf_xlt_update,
> > > > +   .invalidate = mlx5_ib_umem_dmabuf_xlt_invalidate,
> > > > +};
> > >
> > > I'm not really convinced these should be ops, this is usually a bad 
> > > design pattern.
> > >
> > > Why do I need so much code to extract the sgl from the dma_buf? I
> > > would prefer the dma_buf layer simplify this, not by adding a wrapper 
> > > around it in the IB core code...
> > >
> >
> > We just need a way to call a device specific function to update the
> > NIC's translation table.  I considered three ways: (1) ops registered
> > with ib_umem_get_dmabuf;
> > (2) a single function pointer registered with ib_umem_get_dmabuf; (3)
> > a method in 'struct ib_device'. Option (1) was chosen here with no
> > strong reason. We could consolidate the three functions of the ops into 
> > one, but then we will need to
> > define commands or flags for different update operations.
> 
> I'd rather the driver directly provide the dma_buf ops. Inserting layers
> that do nothing but call other layers is usually a bad idea. I didn't look
> carefully yet at how that would be arranged.

I can work in that direction. One change I can see is that the umem_dmabuf
structure will need to be exposed to the device driver (currently it's private
to the core).
 
> 
> > > > +   ncont = npages;
> > > > +   order = ilog2(roundup_pow_of_two(ncont));
> > >
> > > We still need to deal with contiguity here, this ncont/npages is just 
> > > obfuscation.
> >
> > Since the pages can move, we can't take advantage of contiguity here.
> > This handling is similar to the ODP case. The variables 'ncont' and 
> > 'page_shift' here are not necessary.
> > They are kept just for the sake of signifying the semantics of the
> > following functions that use them.
> 
> Well, in this case we can manage it, and the performance boost is high enough
> that we need to. The work on mlx5 to do it is a bit involved, though.

Maybe as a future enhancement?

> 
> > > > +   err = ib_umem_dmabuf_init_mapping(umem, mr);
> > > > +   if (err) {
> > > > +   dereg_mr(dev, mr);
> > > > +   return ERR_PTR(err);
> > > > +   }
> > >
> > > Did you test the page fault path at all? Looks like some xarray code
> > > is missing here, and this is also missing the related complex teardown 
> > > logic.
> > >
> > > Does this mean you didn't test the pagefault_dmabuf_mr() at all?
> >
> > Thanks for the hint. I was unable to get the test runs to reach the
> > pagefault_dmabuf_mr() function. Now I have found the reason. Along the
> > path of all the page fault handlers, the array "odp_mkeys" is checked
> > against the mr key. Since the dmabuf mr keys are not in the list, the
> > handler is never called.
> >
> > On the other hand, it seems that pagefault_dmabuf_mr() is not needed at all.
> > The pagefault is gracefully handled by retrying until the work thread
> > finished programming the NIC.
> 
> This is a bug of some kind, pagefaults that can't find a mkey in the xarray 
> should cause completion with error.
> 
> Jason


RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-16 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 12:00 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> 
> > +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment
> > +*attach) {
> > +   struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> > +   queue_work(ib_wq, &umem_dmabuf->work);
> 
> Do we really want to queue remapping or should it wait until there is a page 
> fault?

Queuing the remapping here has a performance advantage because it reduces the
chance of taking a page fault.

> 
> What do GPUs do?
> 
> Jason


RE: [PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

2020-10-15 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Thursday, October 15, 2020 5:54 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory 
> region
> 

//snip

> > +static int mlx5_ib_umem_dmabuf_xlt_init(struct ib_umem *umem, void
> > +*context) {
> > +   struct mlx5_ib_mr *mr = context;
> > +   int flags = MLX5_IB_UPD_XLT_ENABLE;
> > +
> > +   if (!mr)
> > +   return -EINVAL;
> > +
> > +   return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags); }
> 
> > +static int mlx5_ib_umem_dmabuf_xlt_update(struct ib_umem *umem, void
> > +*context) {
> > +   struct mlx5_ib_mr *mr = context;
> > +   int flags = MLX5_IB_UPD_XLT_ATOMIC;
> 
> Why are these atomic? Why the strange coding style of declaring a variable?

The atomic flag is copied from the ODP code. I have verified that it is not
necessary. I can remove the definition of 'flags' here.

> 
> > +   if (!mr)
> > +   return -EINVAL;
> 
> Why can this happen? Will dma_buf call move_notify prior to 
> dma_buf_map_attachment? There are locking problems if that happens.

I agree this check is unnecessary.

> 
> > +   return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags); }
> > +
> > +static int mlx5_ib_umem_dmabuf_xlt_invalidate(struct ib_umem *umem,
> > +void *context) {
> > +   struct mlx5_ib_mr *mr = context;
> > +   int flags = MLX5_IB_UPD_XLT_ZAP | MLX5_IB_UPD_XLT_ATOMIC;
> > +
> > +   if (!mr)
> > +   return -EINVAL;
> > +
> > +   return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags); }
> > +
> > +static struct ib_umem_dmabuf_ops mlx5_ib_umem_dmabuf_ops = {
> > +   .init = mlx5_ib_umem_dmabuf_xlt_init,
> > +   .update = mlx5_ib_umem_dmabuf_xlt_update,
> > +   .invalidate = mlx5_ib_umem_dmabuf_xlt_invalidate,
> > +};
> 
> I'm not really convinced these should be ops, this is usually a bad design 
> pattern.
> 
> Why do I need so much code to extract the sgl from the dma_buf? I would 
> prefer the dma_buf layer simplify this, not by adding a wrapper
> around it in the IB core code...
> 

We just need a way to call a device-specific function to update the NIC's
translation table. I considered three ways: (1) ops registered with
ib_umem_get_dmabuf; (2) a single function pointer registered with
ib_umem_get_dmabuf; (3) a method in 'struct ib_device'. Option (1) was chosen
here with no strong reason. We could consolidate the three functions of the
ops into one, but then we would need to define commands or flags for the
different update operations.

> > +struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
> > +u64 length, u64 virt_addr,
> > +int dmabuf_fd, int access_flags,
> > +struct ib_udata *udata)
> > +{
> > +   struct mlx5_ib_dev *dev = to_mdev(pd->device);
> > +   struct mlx5_ib_mr *mr = NULL;
> > +   struct ib_umem *umem;
> > +   int page_shift;
> > +   int npages;
> > +   int ncont;
> > +   int order;
> > +   int err;
> > +
> > +   if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM))
> > +   return ERR_PTR(-EOPNOTSUPP);
> > +
> > +   mlx5_ib_dbg(dev,
> > +   "start 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, 
> > access_flags 0x%x\n",
> > +   start, virt_addr, length, dmabuf_fd, access_flags);
> > +
> > +   if (!mlx5_ib_can_load_pas_with_umr(dev, length))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   umem = ib_umem_dmabuf_get(&dev->ib_dev, start, length, dmabuf_fd,
> > + access_flags, &mlx5_ib_umem_dmabuf_ops);
> > +   if (IS_ERR(umem)) {
> > +   mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +   return ERR_PTR(PTR_ERR(umem));
> > +   }
> > +
> > +   npages = ib_umem_num_pages(umem);
> > +   if (!npages) {
> > +   mlx5_ib_warn(dev, "avoid zero region\n");
> > +   ib_umem_release(umem);
> > +   return ERR_PTR(-EINVAL);
> > +   }
> > +
> > +   page_shift = PAGE_SHIFT;
> > +   ncont = npages;
> > +   order = ilog2(roundup_pow_of_two(ncont));
> 
> We still need to deal with contiguity here, this ncont/npages is just 
> obfuscation

RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-06 Thread Xiong, Jianxin


> -Original Message-
> From: Daniel Vetter 
> Sent: Tuesday, October 06, 2020 2:22 AM
> To: Xiong, Jianxin 
> Cc: Jason Gunthorpe ; Leon Romanovsky ; 
> linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford ; Vetter, Daniel ; 
> Christian Koenig 
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Mon, Oct 05, 2020 at 04:18:11PM +, Xiong, Jianxin wrote:
> > > -Original Message-
> > > From: Jason Gunthorpe 
> > > Sent: Monday, October 05, 2020 6:13 AM
> > > To: Xiong, Jianxin 
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford ; Leon Romanovsky
> > > ; Sumit Semwal ; Christian
> > > Koenig ; Vetter, Daniel
> > > 
> > > Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf
> > > as user memory region
> > >
> > > On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > > > Dma-buf is a standard cross-driver buffer sharing mechanism that
> > > > can be used to support peer-to-peer access from RDMA devices.
> > > >
> > > > Device memory exported via dma-buf is associated with a file descriptor.
> > > > This is passed to the user space as a property associated with the
> > > > buffer allocation. When the buffer is registered as a memory
> > > > region, the file descriptor is passed to the RDMA driver along
> > > > with other parameters.
> > > >
> > > > Implement the common code for importing dma-buf object and mapping
> > > > dma-buf pages.
> > > >
> > > > Signed-off-by: Jianxin Xiong 
> > > > Reviewed-by: Sean Hefty 
> > > > Acked-by: Michael J. Ruhl 
> > > > ---
> > > >  drivers/infiniband/core/Makefile  |   2 +-
> > > >  drivers/infiniband/core/umem.c|   4 +
> > > >  drivers/infiniband/core/umem_dmabuf.c | 291 ++
> > > >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> > > >  drivers/infiniband/core/umem_odp.c|  12 ++
> > > >  include/rdma/ib_umem.h|  19 ++-
> > > >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > > > 100644 drivers/infiniband/core/umem_dmabuf.c
> > > >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> > >
> > > I think this is using ODP too literally, dmabuf isn't going to need
> > > fine grained page faults, and I'm not sure this locking scheme is OK - 
> > > ODP is horrifically complicated.
> > >
> >
> > > If this is the approach then I think we should make dmabuf its own
> > > stand alone API, reg_user_mr_dmabuf()
> >
> > That's the original approach in the first version. We can go back there.
> >
> > >
> > > The implementation in mlx5 will be much more understandable, it
> > > would just do dma_buf_dynamic_attach() and program the XLT exactly the 
> > > same as a normal umem.
> > >
> > > The move_notify() simply zap's the XLT and triggers a work to reload
> > > it after the move. Locking is provided by the dma_resv_lock. Only a small 
> > > disruption to the page fault handler is needed.
> > >
> >
> > We considered such a scheme but didn't go that way due to the lack of
> > notification when the move is done, and thus the work wouldn't know
> > when it could reload.
> >
> > Now I think it again, we could probably signal the reload in the page fault 
> > handler.
> 
> For reinstanting the pages you need:
> 
> - dma_resv_lock, this prevents anyone else from issuing new moves or
>   anything like that
> - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
>   finish. gpus generally don't wait on the cpu, but block the dependent
>   dma operations from being scheduled until that fence fired. But for rdma
>   odp I think you need the cpu wait in your worker here.
> - get the new sg list, write it into your ptes
> - dma_resv_unlock to make sure you're not racing with a concurrent
>   move_notify
> 
> You can also grab multiple dma_resv_lock in atomically, but I think the odp 
> rdma model doesn't require that (gpus need that).
> 
> Note that you're allowed to allocate memory with GFP_KERNEL while holding 
> dma_resv_lock, so this shouldn't impose any issues. You are
> otoh not allowed to cause userspace faults (so no gup/pup or copy*user with 
> f
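The four steps Daniel lists could be sketched as importer-side pseudocode
(kernel-style C, not buildable as-is; the ib_umem_dmabuf fields, the 'private'
context pointer, and the update callback are assumptions drawn from this
series, and error handling is omitted):

```
/* Pseudocode sketch of the reinstate worker following the steps above. */
static void ib_umem_dmabuf_remap_work(struct work_struct *work)
{
	struct ib_umem_dmabuf *umem_dmabuf =
		container_of(work, struct ib_umem_dmabuf, work);
	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
	struct dma_fence *fence;
	struct sg_table *sgt;

	/* 1. prevent anyone else from issuing new moves */
	dma_resv_lock(dmabuf->resv, NULL);

	/* 2. CPU-wait for any pending move to finish */
	fence = dma_resv_get_excl(dmabuf->resv);
	if (fence)
		dma_fence_wait(fence, false);

	/* 3. get the new sg list and write it into the NIC's PTEs */
	sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
	if (!IS_ERR(sgt))
		umem_dmabuf->ops->update(&umem_dmabuf->umem,
					 umem_dmabuf->private);

	/* 4. unlock only now, so we can't race a concurrent move_notify */
	dma_resv_unlock(dmabuf->resv);
}
```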

RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-05 Thread Xiong, Jianxin


> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, October 05, 2020 9:33 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Mon, Oct 05, 2020 at 04:18:11PM +, Xiong, Jianxin wrote:
> 
> > > The implementation in mlx5 will be much more understandable, it
> > > would just do dma_buf_dynamic_attach() and program the XLT exactly the 
> > > same as a normal umem.
> > >
> > > The move_notify() simply zap's the XLT and triggers a work to reload
> > > it after the move. Locking is provided by the dma_resv_lock. Only a small 
> > > disruption to the page fault handler is needed.
> >
> > We considered such a scheme but didn't go that way due to the lack of
> > notification when the move is done, and thus the work wouldn't know
> > when it could reload.
> 
> Well, the work would block on the reservation lock and that indicates the 
> move is done

Got it.  Will work on that.

> 
> It would be nicer if the dma_buf could provide an op that things are ready to 
> go though
> 
> > Now I think it again, we could probably signal the reload in the page fault 
> > handler.
> 
> This also works, with a performance cost
> 
> > > > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > > +   sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > > +DMA_BIDIRECTIONAL);
> > > > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > >
> > > This doesn't look right, this lock has to be held up until the HW is
> > > programmed
> >
> > The mapping remains valid until it is invalidated again. There is a
> > sequence-number check before programming the HW.
> 
> It races, we could immediately trigger invalidation and then re-program the 
> HW with this stale data.
> 
> > > The use of atomic looks probably wrong as well.
> >
> > Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
> 
> It only increments once per invalidation, that usually is racy.

I will rework these parts.

> 
> > > > +   total_pages = ib_umem_odp_num_pages(umem_odp);
> > > > +   for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > > +   addr = sg_dma_address(sg);
> > > > +   pages = sg_dma_len(sg) >> page_shift;
> > > > +   while (pages > 0 && k < total_pages) {
> > > > +   umem_odp->dma_list[k++] = addr | access_mask;
> > > > +   umem_odp->npages++;
> > > > +   addr += page_size;
> > > > +   pages--;
> > >
> > > This isn't fragmenting the sg into a page list properly, won't work
> > > for unaligned things
> >
> > I thought the addresses are aligned, but will add explicit alignment here.
> 
> I have no idea what comes out of dma_buf, I wouldn't make too many 
> assumptions since it does have to pass through the IOMMU layer too
> 
> > > And really we don't need the dma_list for this case, with a fixed
> > > whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> > > XLT programming in mlx5 is fine.
> >
> > The dma_list is used by both "populate_mtt()" and
> > "mlx5_ib_invalidate_range", which are used for XLT programming and
> > invalidating (zapping), respectively.
> 
> Don't use those functions for the dma_buf flow.

Ok.  I think we can use mlx5_ib_update_xlt() directly for the dma-buf case.

> 
> Jason



RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-05 Thread Xiong, Jianxin
> -Original Message-
> From: Christian König 
> Sent: Monday, October 05, 2020 3:55 AM
> To: Xiong, Jianxin ; linux-r...@vger.kernel.org; 
> dri-devel@lists.freedesktop.org
> Cc: Doug Ledford ; Jason Gunthorpe ; Leon 
> Romanovsky ; Sumit Semwal
> ; Vetter, Daniel 
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> Hi Jianxin,
> 
> Am 04.10.20 um 21:12 schrieb Jianxin Xiong:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> 
> well first of all really nice work you have done here.
> 
> Since I'm not an expert on RDMA or its drivers I can't really review any of 
> that part.
> 
> But at least from the DMA-buf side it looks like you are using the interface 
> correctly as intended.
> 
> So feel free to add an Acked-by: Christian König  
> if it helps.
> 
> Thanks,
> Christian.

Thanks, will do.




RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-10-05 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Monday, October 05, 2020 6:13 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong 
> > Reviewed-by: Sean Hefty 
> > Acked-by: Michael J. Ruhl 
> > ---
> >  drivers/infiniband/core/Makefile  |   2 +-
> >  drivers/infiniband/core/umem.c|   4 +
> >  drivers/infiniband/core/umem_dmabuf.c | 291 ++
> >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> >  drivers/infiniband/core/umem_odp.c|  12 ++
> >  include/rdma/ib_umem.h|  19 ++-
> >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> 
> I think this is using ODP too literally, dmabuf isn't going to need fine 
> grained page faults, and I'm not sure this locking scheme is OK - ODP is
> horrifically complicated.
> 

> If this is the approach then I think we should make dmabuf its own stand 
> alone API, reg_user_mr_dmabuf()

That's the original approach in the first version. We can go back there.

> 
> The implementation in mlx5 will be much more understandable, it would just do 
> dma_buf_dynamic_attach() and program the XLT exactly
> the same as a normal umem.
> 
> The move_notify() simply zap's the XLT and triggers a work to reload it after 
> the move. Locking is provided by the dma_resv_lock. Only a
> small disruption to the page fault handler is needed.
> 

We considered such a scheme but didn't go that way due to the lack of
notification when the move is done, and thus the work wouldn't know when it
could reload.

Now that I think about it again, we could probably signal the reload in the
page fault handler.

> > +   dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +   sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > +DMA_BIDIRECTIONAL);
> > +   dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> 
> This doesn't look right, this lock has to be held up until the HW is 
> programmed

The mapping remains valid until it is invalidated again. There is a
sequence-number check before programming the HW.
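The sequence-number check mentioned here follows the mmu-notifier pattern:
sample the sequence, build the mapping, and commit only if no invalidation
bumped the sequence in between. A minimal standalone model (hypothetical and
single-threaded, purely to illustrate the retry logic, not the patch's code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the sequence check: the mapping path samples the
 * sequence number, builds its translation, and only commits if no
 * invalidation happened in between; otherwise the caller retries. */
static unsigned long notifier_seq;
static bool hw_programmed;

static void invalidate(void)		/* move_notify path */
{
	notifier_seq++;
	hw_programmed = false;		/* XLT zapped */
}

static bool try_program_hw(void)
{
	unsigned long seq = notifier_seq;	/* sample before mapping */

	/* ... build the page list from the current sg table ... */

	if (seq != notifier_seq)
		return false;		/* raced with an invalidation: retry */
	hw_programmed = true;
	return true;
}
```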

> 
> The use of atomic looks probably wrong as well.

Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?

> 
> > +   k = 0;
> > +   total_pages = ib_umem_odp_num_pages(umem_odp);
> > +   for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > +   addr = sg_dma_address(sg);
> > +   pages = sg_dma_len(sg) >> page_shift;
> > +   while (pages > 0 && k < total_pages) {
> > +   umem_odp->dma_list[k++] = addr | access_mask;
> > +   umem_odp->npages++;
> > +   addr += page_size;
> > +   pages--;
> 
> This isn't fragmenting the sg into a page list properly, won't work for 
> unaligned things

I thought the addresses were aligned, but I will add explicit alignment
handling here.
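The fragmentation being discussed can be shown in a standalone sketch that
expands (address, length) DMA segments into page-sized dma_list entries. Here
unaligned segments are simply rounded out to page boundaries; the names and
the rounding policy are illustrative, not the kernel code, which would have to
decide whether to reject or specially handle unaligned head/tail portions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct seg { uint64_t addr; uint64_t len; };

/* Hypothetical sketch: expand DMA segments into one entry per page,
 * rounding the start down and the end up to page boundaries. */
static size_t fill_dma_list(const struct seg *segs, size_t nsegs,
			    uint64_t *dma_list, size_t max_pages)
{
	size_t k = 0;

	for (size_t i = 0; i < nsegs; i++) {
		uint64_t addr = segs[i].addr & ~(PAGE_SIZE - 1);
		uint64_t end = (segs[i].addr + segs[i].len + PAGE_SIZE - 1)
			       & ~(PAGE_SIZE - 1);

		for (; addr < end && k < max_pages; addr += PAGE_SIZE)
			dma_list[k++] = addr;
	}
	return k;
}
```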

> 
> And really we don't need the dma_list for this case, with a fixed whole 
> mapping DMA SGL a normal umem sgl is OK and the normal umem
> XLT programming in mlx5 is fine.

The dma_list is used by both "populate_mtt()" and "mlx5_ib_invalidate_range",
which are used for XLT programming and invalidating (zapping), respectively.

> 
> Jason


RE: [RFC PATCH v2 0/3] RDMA: add dma-buf support

2020-07-07 Thread Xiong, Jianxin

> -Original Message-
> From: Christian König 
> Am 03.07.20 um 15:14 schrieb Jason Gunthorpe:
> > On Fri, Jul 03, 2020 at 02:52:03PM +0200, Daniel Vetter wrote:
> >
> >> So maybe I'm just totally confused about the rdma model. I thought:
> >> - you bind a pile of memory for various transactions, that might
> >> happen whenever. Kernel driver doesn't have much if any insight into
> >> when memory isn't needed anymore. I think in the rdma world that's
> >> called registering memory, but not sure.
> > Sure, but once registered the memory is able to be used at any moment
> > with no visibilty from the kernel.
> >
> > Unlike GPU the transactions that trigger memory access do not go
> > through the kernel - so there is no ability to interrupt a command
> > flow and fiddle with mappings.
> 
> This is the same for GPUs with user space queues as well.
> 
> But we can still determine, for a given process, that it is using a DMA-buf 
> which is moved out and so can't run any more unless the DMA-buf is
> accessible again.
> 
> In other words you somehow need to make sure that the hardware is not 
> accessing a piece of memory any more when you want to move it.
> 

While a process can be easily suspended, there is no way to tell the RDMA NIC 
not to process posted work requests that use specific memory regions (or that 
meet any other condition). 

So far it appears to me that DMA-buf dynamic mapping for RDMA is only viable 
with ODP support. For NICs without ODP, a way to allow pinning the device 
memory is still needed.

Jianxin

> Christian.
> 
> >
> > Jason



RE: [RFC PATCH v2 0/3] RDMA: add dma-buf support

2020-06-30 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, June 30, 2020 12:17 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; Doug Ledford ; Sumit 
> Semwal ; Leon Romanovsky
> ; Vetter, Daniel ; Christian Koenig 
> ; dri-
> de...@lists.freedesktop.org
> Subject: Re: [RFC PATCH v2 0/3] RDMA: add dma-buf support
> 
> > >
> > > On Tue, Jun 30, 2020 at 05:21:33PM +, Xiong, Jianxin wrote:
> > > > > > Heterogeneous Memory Management (HMM) utilizes
> > > > > > mmu_interval_notifier and ZONE_DEVICE to support shared
> > > > > > virtual address space and page migration between system memory
> > > > > > and device memory. HMM doesn't support pinning device memory
> > > > > > because pages located on device must be able to migrate to
> > > > > > system memory when accessed by CPU. Peer-to-peer access is
> > > > > > possible if the peer can handle page fault. For RDMA, that means 
> > > > > > the NIC must support on-demand paging.
> > > > >
> > > > > peer-peer access is currently not possible with hmm_range_fault().
> > > >
> > > > Currently hmm_range_fault() always sets the cpu access flag and
> > > > device private pages are migrated to the system RAM in the fault 
> > > > handler.
> > > > However, it's possible to have a modified code flow to keep the
> > > > device private page info for use with peer to peer access.
> > >
> > > Sort of, but only within the same device, RDMA or anything else generic 
> > > can't reach inside a DEVICE_PRIVATE and extract anything
> useful.
> >
> > But pfn is supposed to be all that is needed.
> 
> Needed for what? The PFN of the DEVICE_PRIVATE pages is useless for anything.

Hmm. I thought the pfn corresponds to the address in the BAR range. I could be
wrong here. 

> 
> > > Well, what do you want to happen here? The RDMA parts are
> > > reasonable, but I don't want to add new functionality without a
> > > purpose - the other parts need to be settled out first.
> >
> > At the RDMA side, we mainly want to check if the changes are
> > acceptable. For example, the part about adding 'fd' to the device ops
> > and the ioctl interface. All the previous comments are very helpful
> > for us to refine the patch so that we can be ready when GPU side
> > support becomes available.
> 
> Well, I'm not totally happy with the way the umem and the fd is handled so 
> roughly and incompletely..

Yes, this feedback is very helpful. Will work on improving the code.

> 
> > > Hum. This is not actually so hard to do. The whole dma buf proposal
> > > would make a lot more sense if the 'dma buf MR' had to be the
> > > dynamic kind and the driver had to provide the faulting. It would
> > > not be so hard to change mlx5 to be able to work like this, perhaps.
> > > (the locking might be a bit tricky though)
> >
> > The main issue is that not all NICs support ODP.
> 
> Sure, but there is lots of infrastructure work here to be done on dma buf, 
> having a correct consumer in the form of ODP might be helpful to
> advance it.

Good point. Thanks.

> 
> Jason


RE: [RFC PATCH v2 2/3] RDMA/core: Expand the driver method 'reg_user_mr' to support dma-buf

2020-06-30 Thread Xiong, Jianxin
Cc'd drm people.

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Monday, June 29, 2020 10:32 AM
> To: linux-r...@vger.kernel.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Sumit Semwal
> ; Leon Romanovsky ; Vetter, Daniel 
> 
> Subject: [RFC PATCH v2 2/3] RDMA/core: Expand the driver method 'reg_user_mr' 
> to support dma-buf
> 
> Add a parameter 'fd' for the file descriptor associated with the dma-buf
> object to be imported. A negative value indicates that dma-buf is not
> used.
> 
> Signed-off-by: Jianxin Xiong 
> Reviewed-by: Sean Hefty 
> Acked-by: Michael J. Ruhl 
> ---
>  drivers/infiniband/core/uverbs_cmd.c|  2 +-
>  drivers/infiniband/core/verbs.c |  2 +-
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c|  7 +++-
>  drivers/infiniband/hw/bnxt_re/ib_verbs.h|  2 +-
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h  |  3 +-
>  drivers/infiniband/hw/cxgb4/mem.c   |  8 -
>  drivers/infiniband/hw/efa/efa.h |  2 +-
>  drivers/infiniband/hw/efa/efa_verbs.c   |  7 +++-
>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_mr.c |  7 +++-
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c   |  6 
>  drivers/infiniband/hw/mlx4/mlx4_ib.h|  2 +-
>  drivers/infiniband/hw/mlx4/mr.c |  7 +++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h|  2 +-
>  drivers/infiniband/hw/mlx5/mr.c | 45 ++---
>  drivers/infiniband/hw/mthca/mthca_provider.c|  8 -
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  9 -
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  3 +-
>  drivers/infiniband/hw/qedr/verbs.c  |  8 -
>  drivers/infiniband/hw/qedr/verbs.h  |  3 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c|  8 -
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h|  2 +-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c|  6 +++-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |  2 +-
>  drivers/infiniband/sw/rdmavt/mr.c   |  6 +++-
>  drivers/infiniband/sw/rdmavt/mr.h   |  2 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c   |  6 
>  drivers/infiniband/sw/siw/siw_verbs.c   |  8 -
>  drivers/infiniband/sw/siw/siw_verbs.h   |  3 +-
>  include/rdma/ib_verbs.h |  4 +--
>  30 files changed, 150 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index 060b4eb..0199da2 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -757,7 +757,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle 
> *attrs)
>   }
> 
>   mr = pd->device->ops.reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
> -  cmd.access_flags,
> +  -1, cmd.access_flags,
>&attrs->driver_udata);
>   if (IS_ERR(mr)) {
>   ret = PTR_ERR(mr);
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 56a7133..aa067b2 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -2003,7 +2003,7 @@ struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 
> start, u64 length,
>   }
> 
>   mr = pd->device->ops.reg_user_mr(pd, start, length, virt_addr,
> -  access_flags, NULL);
> +  -1, access_flags, NULL);
> 
>   if (IS_ERR(mr))
>   return mr;
> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
> b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> index 95f6d49..af40457 100644
> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> @@ -3695,7 +3695,7 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 
> *pbl_tbl_orig,
> 
>  /* uverbs */
>  struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
> -   u64 virt_addr, int mr_access_flags,
> +   u64 virt_addr, int fd, int mr_access_flags,
> struct ib_udata *udata)
>  {
>   struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
> @@ -3728,6 +3728,11 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, 
> u64 start, u64 length,
>   /* The fixed portion of the rkey is the same as the lkey */
>   mr->ib_mr.rkey = mr->qplib_mr.rkey;

RE: [RFC PATCH v2 3/3] RDMA/uverbs: Add uverbs command for dma-buf based MR registration

2020-06-30 Thread Xiong, Jianxin
Cc'd drm people.

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Monday, June 29, 2020 10:32 AM
> To: linux-r...@vger.kernel.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Sumit Semwal
> ; Leon Romanovsky ; Vetter, Daniel 
> 
> Subject: [RFC PATCH v2 3/3] RDMA/uverbs: Add uverbs command for dma-buf based 
> MR registration
> 
> Add uverbs command for registering user memory region associated with a 
> dma-buf file descriptor.
> 
> Signed-off-by: Jianxin Xiong 
> Reviewed-by: Sean Hefty 
> Acked-by: Michael J. Ruhl 
> ---
>  drivers/infiniband/core/uverbs_std_types_mr.c | 112 ++
>  include/uapi/rdma/ib_user_ioctl_cmds.h|  14 
>  2 files changed, 126 insertions(+)
> 
> diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c 
> b/drivers/infiniband/core/uverbs_std_types_mr.c
> index c1286a5..2c9ff7c 100644
> --- a/drivers/infiniband/core/uverbs_std_types_mr.c
> +++ b/drivers/infiniband/core/uverbs_std_types_mr.c
> @@ -1,5 +1,6 @@
>  /*
>   * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
> + * Copyright (c) 2020, Intel Corporation.  All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU @@ 
> -154,6 +155,85 @@ static int
> UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
>   return ret;
>  }
> 
> +static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
> + struct uverbs_attr_bundle *attrs)
> +{
> + struct ib_uobject *uobj =
> + uverbs_attr_get_uobject(attrs, 
> UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
> + struct ib_pd *pd =
> + uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
> + struct ib_device *ib_dev = pd->device;
> +
> + u64 addr, length, hca_va;
> + u32 fd;
> + u32 access_flags;
> + struct ib_mr *mr;
> + int ret;
> +
> + if (!ib_dev->ops.reg_user_mr)
> + return -EOPNOTSUPP;
> +
> + ret = uverbs_copy_from(&addr, attrs, UVERBS_ATTR_REG_DMABUF_MR_ADDR);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_copy_from(&length, attrs,
> +UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_copy_from(&hca_va, attrs,
> +UVERBS_ATTR_REG_DMABUF_MR_HCA_VA);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_copy_from(&fd, attrs,
> +UVERBS_ATTR_REG_DMABUF_MR_FD);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_get_flags32(&access_flags, attrs,
> +  UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
> +  IB_ACCESS_SUPPORTED);
> + if (ret)
> + return ret;
> +
> + ret = ib_check_mr_access(access_flags);
> + if (ret)
> + return ret;
> +
> + mr = pd->device->ops.reg_user_mr(pd, addr, length, hca_va,
> +(int)(s32)fd, access_flags,
> +&attrs->driver_udata);
> + if (IS_ERR(mr))
> + return PTR_ERR(mr);
> +
> + mr->device  = pd->device;
> + mr->pd  = pd;
> + mr->type= IB_MR_TYPE_USER;
> + mr->uobject = uobj;
> + atomic_inc(&pd->usecnt);
> +
> + uobj->object = mr;
> +
> + ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> +  &mr->lkey, sizeof(mr->lkey));
> + if (ret)
> + goto err_dereg;
> +
> + ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> +  &mr->rkey, sizeof(mr->rkey));
> + if (ret)
> + goto err_dereg;
> +
> + return 0;
> +
> +err_dereg:
> + ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> +
> + return ret;
> +}
> +
>  DECLARE_UVERBS_NAMED_METHOD(
>   UVERBS_METHOD_ADVISE_MR,
>   UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
> @@ -200,6 +280,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
>   UVERBS_ATTR_TYPE(u32),
>   UA_MANDATORY));
> 
> +DECLARE_UVERBS_NAMED_METHOD(
> + UVERBS_METHOD_REG_DMABUF_MR,
> + UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
> + UVERBS_OBJECT_MR,
> + UVERBS_ACCESS_NEW,
> + UA_MANDATORY),
> + UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDL

RE: [RFC PATCH v2 1/3] RDMA/umem: Support importing dma-buf as user memory region

2020-06-30 Thread Xiong, Jianxin
Cc'd drm people.

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Monday, June 29, 2020 10:32 AM
> To: linux-r...@vger.kernel.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Sumit Semwal
> ; Leon Romanovsky ; Vetter, Daniel 
> 
> Subject: [RFC PATCH v2 1/3] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> Dma-buf is a standard cross-driver buffer sharing mechanism that can be
> used to support peer-to-peer access from RDMA devices.
> 
> Device memory exported via dma-buf can be associated with a file
> descriptor. This is passed to the user space as a property associated
> with the buffer allocation. When the buffer is registered as a memory
> region, the file descriptor is passed to the RDMA driver along with
> other parameters.
> 
> Implement the common code for importing dma-buf object, and pinning and
> mapping dma-buf pages.
> 
> Signed-off-by: Jianxin Xiong 
> Reviewed-by: Sean Hefty 
> Acked-by: Michael J. Ruhl 
> ---
>  drivers/infiniband/core/Makefile  |   2 +-
>  drivers/infiniband/core/umem.c|   4 ++
>  drivers/infiniband/core/umem_dmabuf.c | 105 ++
>  drivers/infiniband/core/umem_dmabuf.h |  11 
>  include/rdma/ib_umem.h|  14 -
>  5 files changed, 134 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.c
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> 
> diff --git a/drivers/infiniband/core/Makefile 
> b/drivers/infiniband/core/Makefile
> index d1b14887..0d4efa0 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -37,5 +37,5 @@ ib_uverbs-y :=  uverbs_main.o 
> uverbs_cmd.o uverbs_marshall.o \
>   uverbs_std_types_mr.o 
> uverbs_std_types_counters.o \
>   uverbs_uapi.o uverbs_std_types_device.o \
>   uverbs_std_types_async_fd.o
> -ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> +ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
>  ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 82455a1..bf1a6c1 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -2,6 +2,7 @@
>   * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
>   * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -42,6 +43,7 @@
>  #include 
> 
>  #include "uverbs.h"
> +#include "umem_dmabuf.h"
> 
>  static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, 
> int dirty)
>  {
> @@ -317,6 +319,8 @@ void ib_umem_release(struct ib_umem *umem)
>  {
>   if (!umem)
>   return;
> + if (umem->is_dmabuf)
> + return ib_umem_dmabuf_release(umem);
>   if (umem->is_odp)
>   return ib_umem_odp_release(to_ib_umem_odp(umem));
> 
> diff --git a/drivers/infiniband/core/umem_dmabuf.c 
> b/drivers/infiniband/core/umem_dmabuf.c
> new file mode 100644
> index 000..406ca64
> --- /dev/null
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +/*
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> + */
> +
> +#include 
> +#include 
> +
> +#include "uverbs.h"
> +
> +struct ib_umem_dmabuf {
> + struct ib_umem  umem;
> + struct dma_buf  *dmabuf;
> + struct dma_buf_attachment *attach;
> + struct sg_table *sgt;
> +};
> +
> +static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
> +{
> + return container_of(umem, struct ib_umem_dmabuf, umem);
> +}
> +
> +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +unsigned long addr, size_t size,
> +int dmabuf_fd, int access)
> +{
> + struct ib_umem_dmabuf *umem_dmabuf;
> + struct sg_table *sgt;
> + enum dma_data_direction dir;
> + long ret;
> + unsigned long end;
> +
> + if (check_add_overflow(addr, size, &end))
> + return ERR_PTR(-EINVAL);
> +
> + if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> + return ERR_PTR(-EINV

RE: [RFC PATCH v2 0/3] RDMA: add dma-buf support

2020-06-30 Thread Xiong, Jianxin
Added to cc-list:
Christian Koenig 
dri-devel@lists.freedesktop.org

> -Original Message-
> From: Xiong, Jianxin 
> Sent: Monday, June 29, 2020 10:32 AM
> To: linux-r...@vger.kernel.org
> Cc: Xiong, Jianxin ; Doug Ledford 
> ; Jason Gunthorpe ; Sumit Semwal
> ; Leon Romanovsky ; Vetter, Daniel 
> 
> Subject: [RFC PATCH v2 0/3] RDMA: add dma-buf support
> 
> When enabled, an RDMA capable NIC can perform peer-to-peer transactions
> over PCIe to access the local memory located on another device. This can
> often lead to better performance than using a system memory buffer for
> RDMA and copying data between the buffer and device memory.
> 
> Current kernel RDMA stack uses get_user_pages() to pin the physical
> pages backing the user buffer and uses dma_map_sg_attrs() to get the
> dma addresses for memory access. This usually doesn't work for peer
> device memory due to the lack of associated page structures.
> 
> Several mechanisms exist today to facilitate device memory access.
> 
> ZONE_DEVICE is a new zone for device memory in the memory management
> subsystem. It allows pages from device memory being described with
> specialized page structures. As the result, calls like get_user_pages()
> can succeed, but what can be done with these page structures may be
> different from system memory. It is further specialized into multiple
> memory types, such as one type for PCI p2pmem/p2pdma and one type for
> HMM.
> 
> PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing
> in a PCI BAR and provides a set of calls to publish, discover, allocate,
> and map such memory for peer-to-peer transactions. One feature of the
> API is that the buffer is allocated by the side that does the DMA
> transfer. This works well with the storage usage case, but is awkward
> with GPU-NIC communication, where typically the buffer is allocated by
> the GPU driver rather than the NIC driver.
> 
> Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
> and ZONE_DEVICE to support shared virtual address space and page
> migration between system memory and device memory. HMM doesn't support
> pinning device memory because pages located on device must be able to
> migrate to system memory when accessed by CPU. Peer-to-peer access
> is possible if the peer can handle page fault. For RDMA, that means
> the NIC must support on-demand paging.
> 
> Dma-buf is a standard mechanism for sharing buffers among different
> device drivers. The buffer to be shared is exported by the owning
> driver and imported by the driver that wants to use it. The exporter
> provides a set of ops that the importer can call to pin and map the
> buffer. In addition, a file descriptor can be associated with a dma-
> buf object as the handle that can be passed to user space.
> 
> This patch series adds dma-buf importer role to the RDMA driver in
> attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> chosen for a few reasons: first, the API is relatively simple and allows
> a lot of flexibility in implementing the buffer manipulation ops.
> Second, it doesn't require page structure. Third, dma-buf is already
> supported in many GPU drivers. However, we are aware that existing GPU
> drivers don't allow pinning device memory via the dma-buf interface.
> Pinning and mapping a dma-buf would cause the backing storage to migrate
> to system RAM. This is due to the lack of knowledge about whether the
> importer can perform peer-to-peer access and the lack of resource limit
> control measure for GPU. For the first part, the latest dma-buf driver
> has a peer-to-peer flag for the importer, but the flag is currently tied
> to dynamic mapping support, which requires on-demand paging support from
> the NIC to work. There are a few possible ways to address these issues,
> such as decoupling peer-to-peer flag from dynamic mapping, allowing more
> leeway for individual drivers to make the pinning decision and adding
> GPU resource limit control via cgroup. We would like to get comments on
> this patch series with the assumption that device memory pinning via
> dma-buf is supported by some GPU drivers, and at the same time welcome
> open discussions on how to address the aforementioned issues as well as
> GPU-NIC peer-to-peer access solutions in general.
> 
> This is the second version of the patch series. Here are the changes
> from the previous version:
> * The Kconfig option is removed. There is no dependence issue since
> dma-buf driver is always enabled.
> * The declaration of new data structure and functions is reorganized to
> minimize the visibility of the changes.
> * The new uverbs command now goes through ioctl() instead of write().
> * The rereg fu

RE: [RFC PATCH v2 0/3] RDMA: add dma-buf support

2020-06-30 Thread Xiong, Jianxin
> -Original Message-
> From: Jason Gunthorpe 
> Sent: Tuesday, June 30, 2020 10:35 AM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; Doug Ledford ; Sumit 
> Semwal ; Leon Romanovsky
> ; Vetter, Daniel ; Christian Koenig 
> 
> Subject: Re: [RFC PATCH v2 0/3] RDMA: add dma-buf support
> 
> On Tue, Jun 30, 2020 at 05:21:33PM +, Xiong, Jianxin wrote:
> > > > Heterogeneous Memory Management (HMM) utilizes
> > > > mmu_interval_notifier and ZONE_DEVICE to support shared virtual
> > > > address space and page migration between system memory and device
> > > > memory. HMM doesn't support pinning device memory because pages
> > > > located on device must be able to migrate to system memory when
> > > > accessed by CPU. Peer-to-peer access is possible if the peer can
> > > > handle page fault. For RDMA, that means the NIC must support on-demand 
> > > > paging.
> > >
> > > peer-peer access is currently not possible with hmm_range_fault().
> >
> > Currently hmm_range_fault() always sets the cpu access flag and device
> > private pages are migrated to the system RAM in the fault handler.
> > However, it's possible to have a modified code flow to keep the device
> > private page info for use with peer to peer access.
> 
> Sort of, but only within the same device, RDMA or anything else generic can't 
> reach inside a DEVICE_PRIVATE and extract anything useful.

But pfn is supposed to be all that is needed.

> 
> > > So.. this patch doesn't really do anything new? We could just make a MR 
> > > against the DMA buf mmap and get to the same place?
> >
> > That's right, the patch alone is just half of the story. The
> > functionality depends on availability of dma-buf exporter that can pin
> > the device memory.
> 
> Well, what do you want to happen here? The RDMA parts are reasonable, but I 
> don't want to add new functionality without a purpose - the
> other parts need to be settled out first.

At the RDMA side, we mainly want to check if the changes are acceptable. For 
example, the part about adding 'fd' to the device ops and the ioctl 
interface. All the previous comments are very helpful for us to refine the 
patch so that we can be ready when GPU side support becomes available.

> 
> The need for the dynamic mapping support for even the current DMA Buf hacky 
> P2P users is really too bad. Can you get any GPU driver to
> support non-dynamic mapping?

We are working in that direction.

> 
> > > > migrate to system RAM. This is due to the lack of knowledge about
> > > > whether the importer can perform peer-to-peer access and the lack
> > > > of resource limit control measure for GPU. For the first part, the
> > > > latest dma-buf driver has a peer-to-peer flag for the importer,
> > > > but the flag is currently tied to dynamic mapping support, which
> > > > requires on-demand paging support from the NIC to work.
> > >
> > > ODP for DMA buf?
> >
> > Right.
> 
> Hum. This is not actually so hard to do. The whole dma buf proposal would 
> make a lot more sense if the 'dma buf MR' had to be the
> dynamic kind and the driver had to provide the faulting. It would not be so 
> hard to change mlx5 to be able to work like this, perhaps. (the
> locking might be a bit tricky though)

The main issue is that not all NICs support ODP.

> 
> > > > There are a few possible ways to address these issues, such as
> > > > decoupling peer-to-peer flag from dynamic mapping, allowing more
> > > > leeway for individual drivers to make the pinning decision and
> > > > adding GPU resource limit control via cgroup. We would like to get
> > > > comments on this patch series with the assumption that device
> > > > memory pinning via dma-buf is supported by some GPU drivers, and
> > > > at the same time welcome open discussions on how to address the
> > > > aforementioned issues as well as GPU-NIC peer-to-peer access solutions 
> > > > in general.
> > >
> > > These seem like DMA buf problems, not RDMA problems, why are you
> > > asking these questions with a RDMA patch set? The usual DMA buf people 
> > > are not even Cc'd here.
> >
> > The intention is to have people from both RDMA and DMA buffer side to
> > comment. Sumit Semwal is the DMA buffer maintainer according to the
> > MAINTAINERS file. I agree more people could be invited to the discussion.
> > Just added Christian Koenig to the cc-list.
> 
> Would be good to have added the drm lists too

Thanks, cc'd dri-devel here, and will also do the same for the previous part of 
the thread.

> 
> > If the umem_description you mentioned is for information used to
> > create the umem (e.g. a structure for all the parameters), then this would 
> > work better.
> 
> It would make some more sense, and avoid all these weird EOPNOTSUPPS.

Good, thanks for the suggestion.

> 
> Jason