Re: [PATCH v0 01/14] IB/hfi1, IB/qib: Make I2C terminology more inclusive

2024-04-03 Thread Leon Romanovsky
On Fri, Mar 29, 2024 at 05:00:25PM +0000, Easwar Hariharan wrote:
> I2C v7, SMBus 3.2, and I3C specifications have replaced "master/slave"
> with more appropriate terms. Inspired by and following on from Wolfram's
> series to fix drivers/i2c [1], fix the terminology where I had a role to
> play, now that the approved verbiage exists in the specification.
> 
> Compile tested; no functional changes intended.
> 
> [1]: 
> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
> 
> Signed-off-by: Easwar Hariharan 
> ---
>  drivers/infiniband/hw/hfi1/chip.c   |  6 ++--
>  drivers/infiniband/hw/hfi1/chip.h   |  2 +-
>  drivers/infiniband/hw/hfi1/chip_registers.h |  2 +-
>  drivers/infiniband/hw/hfi1/file_ops.c   |  2 +-
>  drivers/infiniband/hw/hfi1/firmware.c   | 22 ++---
>  drivers/infiniband/hw/hfi1/pcie.c   |  2 +-
>  drivers/infiniband/hw/hfi1/qsfp.c   | 36 ++---
>  drivers/infiniband/hw/hfi1/user_exp_rcv.c   |  2 +-
>  drivers/infiniband/hw/qib/qib_twsi.c|  6 ++--
>  9 files changed, 40 insertions(+), 40 deletions(-)

hfi1 and qib work perfectly fine with the current terminology. There is
no need to change old code just for the sake of change.

Let's drop this patch.

Thanks


Re: [Intel-gfx] [PATCH][next] treewide: Replace zero-length arrays with flexible-array members

2022-02-15 Thread Leon Romanovsky
On Tue, Feb 15, 2022 at 01:21:10PM -0600, Gustavo A. R. Silva wrote:
> On Tue, Feb 15, 2022 at 10:17:40AM -0800, Kees Cook wrote:
> > On Tue, Feb 15, 2022 at 11:47:43AM -0600, Gustavo A. R. Silva wrote:
> > > There is a regular need in the kernel to provide a way to declare
> > > having a dynamically sized set of trailing elements in a structure.
> > > Kernel code should always use “flexible array members”[1] for these
> > > cases. The older style of one-element or zero-length arrays should
> > > no longer be used[2].
> > > 
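(For illustration, a minimal sketch of the shape of this conversion —
"struct foo" and its members are made-up names, not from the patch:)

    /* Old style: zero-length trailing array (a GCC extension). */
    struct foo {
            unsigned int count;
            u32 data[0];
    };

    /* New style: C99 flexible array member. */
    struct foo {
            unsigned int count;
            u32 data[];
    };

    /* Allocation is identical in both cases, e.g.: */
    struct foo *p = kmalloc(struct_size(p, data, n), GFP_KERNEL);

The flexible-array form is standard C and lets tooling such as
-Warray-bounds and the kernel's bounds checking reason about the
trailing array, which the zero-length idiom defeats.
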
> > > This code was transformed with the help of Coccinelle:
> > > (next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)
> > > 
> > > @@
> > > identifier S, member, array;
> > > type T1, T2;
> > > @@
> > > 
> > > struct S {
> > >   ...
> > >   T1 member;
> > >   T2 array[
> > > - 0
> > >   ];
> > > };
> > 
> > These all look trivially correct to me. Only two didn't have the end of
> > the struct visible in the patch, and checking those showed them to be
> > trailing members as well, so:
> > 
> > Reviewed-by: Kees Cook 
> 
> I'll add this to my -next tree.

I would like to ask you to send the mlx5 patch separately to netdev. We are
working to delete that file completely and prefer to avoid unnecessary merge
conflicts.

Thanks

> 
> Thanks!
> --
> Gustavo


Re: [Intel-gfx] [PATCH rdma-next v4 0/3] SG fix together with update to RDMA umem

2021-08-30 Thread Leon Romanovsky
On Mon, Aug 30, 2021 at 10:31:28AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 30, 2021 at 11:21:00AM +0300, Leon Romanovsky wrote:
> > On Tue, Aug 24, 2021 at 05:25:28PM +0300, Maor Gottlieb wrote:
> > > From: Maor Gottlieb 
> > > 
> > > Changelog:
> > > v4:
> > >  * Unify sg_free_table_entries with __sg_free_table
> > > v3: https://lore.kernel.org/lkml/cover.1627551226.git.leo...@nvidia.com/
> > >  * Rewrote to new API suggestion
> > >  * Split for more patches
> > > v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com
> > >  * Changed implementation of first patch, based on our discussion with
> > >  * Christoph.
> > >https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
> > > v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
> > >  * Fixed sg_page with a _dma_ API in the umem.c
> > > v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com
> > > 
> > > Maor Gottlieb (3):
> > >   lib/scatterlist: Provide a dedicated function to support table append
> > >   lib/scatterlist: Fix wrong update of orig_nents
> > >   RDMA: Use the sg_table directly and remove the opencoded version from
> > > umem
> > > 
> > >  drivers/gpu/drm/drm_prime.c |  13 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  11 +-
> > >  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  14 +-
> > >  drivers/infiniband/core/umem.c  |  56 ---
> > >  drivers/infiniband/core/umem_dmabuf.c   |   5 +-
> > >  drivers/infiniband/hw/hns/hns_roce_db.c |   4 +-
> > >  drivers/infiniband/hw/irdma/verbs.c |   2 +-
> > >  drivers/infiniband/hw/mlx4/doorbell.c   |   3 +-
> > >  drivers/infiniband/hw/mlx4/mr.c |   4 +-
> > >  drivers/infiniband/hw/mlx5/doorbell.c   |   3 +-
> > >  drivers/infiniband/hw/mlx5/mr.c |   3 +-
> > >  drivers/infiniband/hw/qedr/verbs.c  |   2 +-
> > >  drivers/infiniband/sw/rdmavt/mr.c   |   2 +-
> > >  drivers/infiniband/sw/rxe/rxe_mr.c  |   2 +-
> > >  include/linux/scatterlist.h |  56 +--
> > >  include/rdma/ib_umem.h  |  11 +-
> > >  include/rdma/ib_verbs.h |  28 
> > >  lib/scatterlist.c   | 155 
> > >  lib/sg_pool.c   |   3 +-
> > >  tools/testing/scatterlist/main.c|  38 +++--
> > >  20 files changed, 258 insertions(+), 157 deletions(-)
> > 
> > Jason,
> > 
> > Did you add these patches to the -next? I can't find them.
> 
> They sat in linux-next for a while outside the rdma-next tree
> 
> All synced up now

Thanks

> 
> Jason


Re: [Intel-gfx] [PATCH rdma-next v4 0/3] SG fix together with update to RDMA umem

2021-08-30 Thread Leon Romanovsky
On Tue, Aug 24, 2021 at 05:25:28PM +0300, Maor Gottlieb wrote:
> From: Maor Gottlieb 
> 
> Changelog:
> v4:
>  * Unify sg_free_table_entries with __sg_free_table
> v3: https://lore.kernel.org/lkml/cover.1627551226.git.leo...@nvidia.com/
>  * Rewrote to new API suggestion
>  * Split for more patches
> v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com
>  * Changed implementation of first patch, based on our discussion with
>  * Christoph.
>https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
> v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
>  * Fixed sg_page with a _dma_ API in the umem.c
> v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com
> 
> Maor Gottlieb (3):
>   lib/scatterlist: Provide a dedicated function to support table append
>   lib/scatterlist: Fix wrong update of orig_nents
>   RDMA: Use the sg_table directly and remove the opencoded version from
> umem
> 
>  drivers/gpu/drm/drm_prime.c |  13 +-
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  11 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  14 +-
>  drivers/infiniband/core/umem.c  |  56 ---
>  drivers/infiniband/core/umem_dmabuf.c   |   5 +-
>  drivers/infiniband/hw/hns/hns_roce_db.c |   4 +-
>  drivers/infiniband/hw/irdma/verbs.c |   2 +-
>  drivers/infiniband/hw/mlx4/doorbell.c   |   3 +-
>  drivers/infiniband/hw/mlx4/mr.c |   4 +-
>  drivers/infiniband/hw/mlx5/doorbell.c   |   3 +-
>  drivers/infiniband/hw/mlx5/mr.c |   3 +-
>  drivers/infiniband/hw/qedr/verbs.c  |   2 +-
>  drivers/infiniband/sw/rdmavt/mr.c   |   2 +-
>  drivers/infiniband/sw/rxe/rxe_mr.c  |   2 +-
>  include/linux/scatterlist.h |  56 +--
>  include/rdma/ib_umem.h  |  11 +-
>  include/rdma/ib_verbs.h |  28 
>  lib/scatterlist.c   | 155 
>  lib/sg_pool.c   |   3 +-
>  tools/testing/scatterlist/main.c|  38 +++--
>  20 files changed, 258 insertions(+), 157 deletions(-)

Jason,

Did you add these patches to the -next? I can't find them.

Thanks

> 
> -- 
> 2.25.4
> 


[Intel-gfx] [PATCH rdma-next v3 3/3] RDMA: Use the sg_table directly and remove the opencoded version from umem

2021-07-29 Thread Leon Romanovsky
From: Maor Gottlieb 

This allows using the normal sg_table APIs and makes all the code
cleaner. Remove sgt, nents and nmap from ib_umem.
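
(A sketch of what "the normal sg_table APIs" means for a consumer after
this patch; the iteration below uses names from the diff, while the
surrounding driver code is illustrative:)

    struct scatterlist *sg;
    unsigned int i;

    /* DMA entries now come straight from the embedded sg_table in
     * umem->sgt_append.sgt, with no shadow nmap/sg_nents counters: */
    for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i) {
            dma_addr_t addr = sg_dma_address(sg);
            unsigned int len = sg_dma_len(sg);
            /* ... program addr/len into the HW translation tables ... */
    }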

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Jason Gunthorpe 
---
 drivers/infiniband/core/umem.c  | 32 +
 drivers/infiniband/core/umem_dmabuf.c   |  5 ++--
 drivers/infiniband/hw/hns/hns_roce_db.c |  4 ++--
 drivers/infiniband/hw/irdma/verbs.c |  2 +-
 drivers/infiniband/hw/mlx4/doorbell.c   |  3 ++-
 drivers/infiniband/hw/mlx4/mr.c |  4 ++--
 drivers/infiniband/hw/mlx5/doorbell.c   |  3 ++-
 drivers/infiniband/hw/mlx5/mr.c |  3 ++-
 drivers/infiniband/hw/qedr/verbs.c  |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c  |  2 +-
 include/rdma/ib_umem.h  | 10 
 include/rdma/ib_verbs.h | 28 ++
 13 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 42481e7a72e8..86d479772fbc 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -51,11 +51,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
struct scatterlist *sg;
unsigned int i;
 
-   if (umem->nmap > 0)
-   ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents,
-   DMA_BIDIRECTIONAL);
+   if (dirty)
+   ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
+  DMA_BIDIRECTIONAL, 0);
 
-   for_each_sg(umem->sg_head.sgl, sg, umem->sg_nents, i)
+   for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
 
@@ -111,7 +111,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
/* offset into first SGL */
pgoff = umem->address & ~PAGE_MASK;
 
-   for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
+   for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i) {
/* Walk SGL and reduce max page size if VA/PA bits differ
 * for any address.
 */
@@ -121,7 +121,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 * the maximum possible page size as the low bits of the iova
 * must be zero when starting the next chunk.
 */
-   if (i != (umem->nmap - 1))
+   if (i != (umem->sgt_append.sgt.nents - 1))
mask |= va;
pgoff = 0;
}
@@ -231,30 +231,19 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
&umem->sgt_append, page_list, pinned, 0,
pinned << PAGE_SHIFT, ib_dma_max_seg_size(device),
npages, GFP_KERNEL);
-   umem->sg_nents = umem->sgt_append.sgt.nents;
if (ret) {
-   memcpy(&umem->sg_head.sgl, &umem->sgt_append.sgt,
-  sizeof(umem->sgt_append.sgt));
unpin_user_pages_dirty_lock(page_list, pinned, 0);
goto umem_release;
}
}
 
-   memcpy(&umem->sg_head.sgl, &umem->sgt_append.sgt,
-  sizeof(umem->sgt_append.sgt));
if (access & IB_ACCESS_RELAXED_ORDERING)
dma_attr |= DMA_ATTR_WEAK_ORDERING;
 
-   umem->nmap =
-   ib_dma_map_sg_attrs(device, umem->sg_head.sgl, umem->sg_nents,
-   DMA_BIDIRECTIONAL, dma_attr);
-
-   if (!umem->nmap) {
-   ret = -ENOMEM;
+   ret = ib_dma_map_sgtable_attrs(device, &umem->sgt_append.sgt,
+  DMA_BIDIRECTIONAL, dma_attr);
+   if (ret)
goto umem_release;
-   }
-
-   ret = 0;
goto out;
 
 umem_release:
@@ -314,7 +303,8 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, 
size_t offset,
return -EINVAL;
}
 
-   ret = sg_pcopy_to_buffer(umem->sg_head.sgl, umem->sg_nents, dst, length,
+   ret = sg_pcopy_to_buffer(umem->sgt_append.sgt.sgl,
+umem->sgt_append.sgt.orig_nents, dst, length,
 offset + ib_umem_offset(umem));
 
if (ret < 0)
diff --git a/drivers/infiniband/core/umem_dmabuf.c 
b/drivers/infiniband/core/umem_dmabuf.c
index c6e875619fac..e824baf4640d 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -55,9 +55,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
*umem_dmabuf)
cur += sg_dma_len(sg);
}
 
-   umem_dmabuf->umem.sg_head.sgl = umem_dmabu

[Intel-gfx] [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents

2021-07-29 Thread Leon Romanovsky
From: Maor Gottlieb 

orig_nents should represent the number of entries with pages, but
__sg_alloc_table_from_pages sets orig_nents to the total number of
entries in the table. This is wrong when the API is used for dynamic
allocation, where not all the table entries are mapped with pages. It
wasn't observed until now, since RDMA umem, which uses this API in the
dynamic form, doesn't use orig_nents implicitly or explicitly through
the scatterlist APIs.

Fix it by changing the append API to track the SG append table state
and adding an API to free the append table according to the total
number of entries in the table. Now all APIs set orig_nents to the
number of entries with pages.
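
(A sketch of the resulting lifecycle, using the function names from the
diff below; the pinning loop and error handling are elided:)

    struct sg_append_table append = {};
    int ret;

    /* Append batches of pages as they arrive; orig_nents counts only
     * entries that carry pages, total_nents remembers the whole
     * allocation. */
    ret = sg_alloc_append_table_from_pages(&append, page_list, pinned, 0,
                    pinned << PAGE_SHIFT, max_seg, left_pages, GFP_KERNEL);
    if (ret)
            return ret;

    /* append.sgt is a normal struct sg_table for mapping/iteration. */

    /* Free via the append-table state, not via orig_nents: */
    sg_free_append_table(&append);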

Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/umem.c   |  34 ---
 include/linux/scatterlist.h  |  17 +++-
 include/rdma/ib_umem.h   |   1 +
 lib/scatterlist.c| 161 +++
 tools/testing/scatterlist/main.c |  45 +
 5 files changed, 154 insertions(+), 104 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index b741758e528f..42481e7a72e8 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -59,7 +59,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
 
-   sg_free_table(&umem->sg_head);
+   sg_free_append_table(&umem->sgt_append);
 }
 
 /**
@@ -155,8 +155,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
unsigned long dma_attr = 0;
struct mm_struct *mm;
unsigned long npages;
-   int ret;
-   struct scatterlist *sg = NULL;
+   int pinned, ret;
unsigned int gup_flags = FOLL_WRITE;
 
/*
@@ -216,28 +215,33 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
 
while (npages) {
cond_resched();
-   ret = pin_user_pages_fast(cur_base,
+   pinned = pin_user_pages_fast(cur_base,
  min_t(unsigned long, npages,
PAGE_SIZE /
sizeof(struct page *)),
  gup_flags | FOLL_LONGTERM, page_list);
-   if (ret < 0)
+   if (pinned < 0) {
+   ret = pinned;
goto umem_release;
+   }
 
-   cur_base += ret * PAGE_SIZE;
-   npages -= ret;
-   sg = sg_alloc_append_table_from_pages(&umem->sg_head, page_list,
-   ret, 0, ret << PAGE_SHIFT,
-   ib_dma_max_seg_size(device), sg, npages,
-   GFP_KERNEL);
-   umem->sg_nents = umem->sg_head.nents;
-   if (IS_ERR(sg)) {
-   unpin_user_pages_dirty_lock(page_list, ret, 0);
-   ret = PTR_ERR(sg);
+   cur_base += pinned * PAGE_SIZE;
+   npages -= pinned;
+   ret = sg_alloc_append_table_from_pages(
+   &umem->sgt_append, page_list, pinned, 0,
+   pinned << PAGE_SHIFT, ib_dma_max_seg_size(device),
+   npages, GFP_KERNEL);
+   umem->sg_nents = umem->sgt_append.sgt.nents;
+   if (ret) {
+   memcpy(&umem->sg_head.sgl, &umem->sgt_append.sgt,
+  sizeof(umem->sgt_append.sgt));
+   unpin_user_pages_dirty_lock(page_list, pinned, 0);
goto umem_release;
}
}
 
+   memcpy(&umem->sg_head.sgl, &umem->sgt_append.sgt,
+  sizeof(umem->sgt_append.sgt));
if (access & IB_ACCESS_RELAXED_ORDERING)
dma_attr |= DMA_ATTR_WEAK_ORDERING;
 
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 5c700f2a0d18..0c7aa5ccebfc 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -39,6 +39,12 @@ struct sg_table {
unsigned int orig_nents;/* original size of list */
 };
 
+struct sg_append_table {
+   struct sg_table sgt;/* The scatter list table */
+   struct scatterlist *prv;/* last populated sge in the table */
+   unsigned int total_nents;   /* Total entries in the table */
+};
+
 /*
  * Notes on SG table design.
  *
@@ -282,14 +288,15 @@ typedef void (sg_free_fn)(struct scatterlist *, unsigned 
int);
 void __sg_free_table(struct sg_table *, unsigned int, unsigned int,
 sg_free_fn *);
 void sg_free_table(struct sg_table *);
+

[Intel-gfx] [PATCH rdma-next v3 1/3] lib/scatterlist: Provide a dedicated function to support table append

2021-07-29 Thread Leon Romanovsky
From: Maor Gottlieb 

RDMA is the only in-kernel user that uses __sg_alloc_table_from_pages to
append pages dynamically. In the next patch that mode will be extended,
and that function will get more parameters, so separate it into a unique
function to make the change clearer.
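
(For the one-shot callers, the split introduces
sg_alloc_table_from_pages_segment(); a sketch matching the
drm_prime/i915/vmwgfx hunks below:)

    /* One-shot allocation with a maximum segment size; returns 0 or
     * -errno instead of the old ERR_PTR-carrying scatterlist pointer. */
    int err = sg_alloc_table_from_pages_segment(sgt, pages, nr_pages, 0,
                    (unsigned long)nr_pages << PAGE_SHIFT,
                    max_segment, GFP_KERNEL);
    if (err)
            return err;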

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/gpu/drm/drm_prime.c | 13 ---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 11 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  | 14 +++-
 drivers/infiniband/core/umem.c  |  4 +--
 include/linux/scatterlist.h | 39 ++---
 lib/scatterlist.c   | 36 ++-
 tools/testing/scatterlist/main.c| 25 +
 7 files changed, 90 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 2a54f86856af..cf3278041f9c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -807,8 +807,8 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device 
*dev,
   struct page **pages, unsigned int 
nr_pages)
 {
struct sg_table *sg;
-   struct scatterlist *sge;
size_t max_segment = 0;
+   int err;
 
sg = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (!sg)
@@ -818,13 +818,12 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device 
*dev,
max_segment = dma_max_mapping_size(dev->dev);
if (max_segment == 0)
max_segment = UINT_MAX;
-   sge = __sg_alloc_table_from_pages(sg, pages, nr_pages, 0,
- nr_pages << PAGE_SHIFT,
- max_segment,
- NULL, 0, GFP_KERNEL);
-   if (IS_ERR(sge)) {
+   err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
+   nr_pages << PAGE_SHIFT,
+   max_segment, GFP_KERNEL);
+   if (err) {
kfree(sg);
-   sg = ERR_CAST(sge);
+   sg = ERR_PTR(err);
}
return sg;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7487bab11f0b..458f797a9e1e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -133,7 +133,6 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
unsigned int max_segment = i915_sg_segment_size();
struct sg_table *st;
unsigned int sg_page_sizes;
-   struct scatterlist *sg;
struct page **pvec;
int ret;
 
@@ -153,13 +152,11 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
spin_unlock(&obj->mm.notifier_lock);
 
 alloc_table:
-   sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
-num_pages << PAGE_SHIFT, max_segment,
-NULL, 0, GFP_KERNEL);
-   if (IS_ERR(sg)) {
-   ret = PTR_ERR(sg);
+   ret = sg_alloc_table_from_pages_segment(st, pvec, num_pages, 0,
+   num_pages << PAGE_SHIFT,
+   max_segment, GFP_KERNEL);
+   if (ret)
goto err;
-   }
 
ret = i915_gem_gtt_prepare_pages(obj, st);
if (ret) {
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index 0488042fb287..fc372d2e52a1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -363,7 +363,6 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
int ret = 0;
static size_t sgl_size;
static size_t sgt_size;
-   struct scatterlist *sg;
 
if (vmw_tt->mapped)
return 0;
@@ -386,15 +385,12 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
return ret;
 
-   sg = __sg_alloc_table_from_pages(&vmw_tt->sgt, vsgt->pages,
-   vsgt->num_pages, 0,
-   (unsigned long) vsgt->num_pages << PAGE_SHIFT,
-   dma_get_max_seg_size(dev_priv->drm.dev),
-   NULL, 0, GFP_KERNEL);
-   if (IS_ERR(sg)) {
-   ret = PTR_ERR(sg);
+   ret = sg_alloc_table_from_pages_segment(
+   &vmw_tt->sgt, vsgt->pages, vsgt->num_pages, 0,
+   (unsigned long)vsgt->num_pages << PAGE_SHIFT,
+   dma_get_max_seg_size(dev_priv->drm.dev), GFP_KERNEL);
+   if (ret)
+   goto out_sg_alloc_fail;

[Intel-gfx] [PATCH rdma-next v3 0/3] SG fix together with update to RDMA umem

2021-07-29 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v3:
 * Rewrote to new API suggestion
 * Split for more patches
v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com
 * Changed implementation of first patch, based on our discussion with 
Christoph.
   https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
 * Fixed sg_page with a _dma_ API in the umem.c
v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com


Maor Gottlieb (3):
  lib/scatterlist: Provide a dedicated function to support table append
  lib/scatterlist: Fix wrong update of orig_nents
  RDMA: Use the sg_table directly and remove the opencoded version from
umem

 drivers/gpu/drm/drm_prime.c |  13 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  11 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  14 +-
 drivers/infiniband/core/umem.c  |  56 +++---
 drivers/infiniband/core/umem_dmabuf.c   |   5 +-
 drivers/infiniband/hw/hns/hns_roce_db.c |   4 +-
 drivers/infiniband/hw/irdma/verbs.c |   2 +-
 drivers/infiniband/hw/mlx4/doorbell.c   |   3 +-
 drivers/infiniband/hw/mlx4/mr.c |   4 +-
 drivers/infiniband/hw/mlx5/doorbell.c   |   3 +-
 drivers/infiniband/hw/mlx5/mr.c |   3 +-
 drivers/infiniband/hw/qedr/verbs.c  |   2 +-
 drivers/infiniband/sw/rdmavt/mr.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c  |   2 +-
 include/linux/scatterlist.h |  54 +-
 include/rdma/ib_umem.h  |  11 +-
 include/rdma/ib_verbs.h |  28 +++
 lib/scatterlist.c   | 189 
 tools/testing/scatterlist/main.c|  38 ++--
 19 files changed, 275 insertions(+), 169 deletions(-)

-- 
2.31.1



[Intel-gfx] [PATCH rdma-next v2 1/2] lib/scatterlist: Fix wrong update of orig_nents

2021-07-18 Thread Leon Romanovsky
From: Maor Gottlieb 

orig_nents should represent the number of entries with pages, but
__sg_alloc_table_from_pages sets orig_nents to the total number of
entries in the table. This is wrong when the API is used for dynamic
allocation, where not all the table entries are mapped with pages. It
wasn't observed until now, since RDMA umem, which uses this API in the
dynamic form, doesn't use orig_nents implicitly or explicitly through
the scatterlist APIs.

Fix it by:
1. Setting orig_nents to the number of entries with pages also in
   __sg_alloc_table_from_pages.
2. To fix the release flow, adding a new output argument to
   __sg_alloc_table_from_pages which will be set to the total number
   of entries in the table. Users like umem that use this function
   for dynamic allocation should free the table by calling
   sg_free_table_entries and passing that number of entries.
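
(A sketch of that v2 contract, with total_nents as the new output
argument; note this approach was reworked in v3 into the sg_append_table
state shown earlier in this digest:)

    unsigned int total_nents;
    struct scatterlist *sg;

    sg = __sg_alloc_table_from_pages(&sgt, pages, n_pages, 0,
                    n_pages << PAGE_SHIFT, max_seg, prv, left_pages,
                    GFP_KERNEL, &total_nents);
    if (IS_ERR(sg))
            return PTR_ERR(sg);

    /* ... use the table ... */

    /* Release by the total entry count, since orig_nents now counts
     * only the entries that carry pages: */
    sg_free_table_entries(&sgt, total_nents);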

Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/gpu/drm/drm_prime.c |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  2 +-
 drivers/infiniband/core/umem.c  |  5 +-
 include/linux/scatterlist.h |  3 +-
 include/rdma/ib_umem.h  |  3 +-
 lib/scatterlist.c   | 88 ++---
 tools/testing/scatterlist/main.c| 15 +++-
 8 files changed, 80 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 2a54f86856af..1739d10a2c55 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -821,7 +821,7 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device 
*dev,
sge = __sg_alloc_table_from_pages(sg, pages, nr_pages, 0,
  nr_pages << PAGE_SHIFT,
  max_segment,
- NULL, 0, GFP_KERNEL);
+ NULL, 0, GFP_KERNEL, NULL);
if (IS_ERR(sge)) {
kfree(sg);
sg = ERR_CAST(sge);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7487bab11f0b..c341d3e3386c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -155,7 +155,7 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
 alloc_table:
sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
 num_pages << PAGE_SHIFT, max_segment,
-NULL, 0, GFP_KERNEL);
+NULL, 0, GFP_KERNEL, NULL);
if (IS_ERR(sg)) {
ret = PTR_ERR(sg);
goto err;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index 0488042fb287..2ad889272b0a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -390,7 +390,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
vsgt->num_pages, 0,
(unsigned long) vsgt->num_pages << PAGE_SHIFT,
dma_get_max_seg_size(dev_priv->drm.dev),
-   NULL, 0, GFP_KERNEL);
+   NULL, 0, GFP_KERNEL, NULL);
if (IS_ERR(sg)) {
ret = PTR_ERR(sg);
goto out_sg_alloc_fail;
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 0eb40025075f..cf4197363346 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -59,7 +59,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
 
-   sg_free_table(&umem->sg_head);
+   sg_free_table_entries(&umem->sg_head, umem->total_nents);
 }
 
 /**
@@ -229,8 +229,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
sg = __sg_alloc_table_from_pages(&umem->sg_head, page_list, ret,
0, ret << PAGE_SHIFT,
ib_dma_max_seg_size(device), sg, npages,
-   GFP_KERNEL);
-   umem->sg_nents = umem->sg_head.nents;
+   GFP_KERNEL, &umem->total_nents);
if (IS_ERR(sg)) {
unpin_user_pages_dirty_lock(page_list, ret, 0);
ret = PTR_ERR(sg);
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index ecf87484814f..c796c165d5ee 100644
--- a/include/lin

[Intel-gfx] [PATCH rdma-next v2 2/2] RDMA: Use dma_map_sgtable for map umem pages

2021-07-18 Thread Leon Romanovsky
From: Maor Gottlieb 

In order to avoid incorrect usage of sg_table fields, change umem to
use dma_map_sgtable to map the pages for DMA. Since dma_map_sgtable
updates the nents field (number of DMA entries), there is no longer a
need for the nmap variable, hence do some cleanups accordingly.
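
(A sketch of the map/unmap pair after this patch, taken from the umem
hunks below; error handling trimmed:)

    /* dma_map_sgtable() stores the DMA entry count in sgt->nents and
     * returns 0 or -errno, so umem->nmap is no longer needed: */
    ret = ib_dma_map_sgtable_attrs(device, &umem->sg_head,
                                   DMA_BIDIRECTIONAL, dma_attr);
    if (ret)
            goto umem_release;

    /* ... */

    ib_dma_unmap_sgtable_attrs(device, &umem->sg_head,
                               DMA_BIDIRECTIONAL, 0);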

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/umem.c| 28 +++
 drivers/infiniband/core/umem_dmabuf.c |  1 -
 drivers/infiniband/hw/mlx4/mr.c   |  4 ++--
 drivers/infiniband/hw/mlx5/mr.c   |  3 ++-
 drivers/infiniband/sw/rdmavt/mr.c |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c|  3 ++-
 include/rdma/ib_umem.h|  3 ++-
 include/rdma/ib_verbs.h   | 28 +++
 8 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index cf4197363346..77c2df1c91d1 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -51,11 +51,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
struct scatterlist *sg;
unsigned int i;
 
-   if (umem->nmap > 0)
-   ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents,
-   DMA_BIDIRECTIONAL);
+   if (dirty)
+   ib_dma_unmap_sgtable_attrs(dev, &umem->sg_head,
+  DMA_BIDIRECTIONAL, 0);
 
-   for_each_sg(umem->sg_head.sgl, sg, umem->sg_nents, i)
+   for_each_sgtable_sg(&umem->sg_head, sg, i)
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
 
@@ -111,7 +111,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
/* offset into first SGL */
pgoff = umem->address & ~PAGE_MASK;
 
-   for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
+   for_each_sgtable_dma_sg(&umem->sg_head, sg, i) {
/* Walk SGL and reduce max page size if VA/PA bits differ
 * for any address.
 */
@@ -121,7 +121,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 * the maximum possible page size as the low bits of the iova
 * must be zero when starting the next chunk.
 */
-   if (i != (umem->nmap - 1))
+   if (i != (umem->sg_head.nents - 1))
mask |= va;
pgoff = 0;
}
@@ -240,16 +240,10 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
if (access & IB_ACCESS_RELAXED_ORDERING)
dma_attr |= DMA_ATTR_WEAK_ORDERING;
 
-   umem->nmap =
-   ib_dma_map_sg_attrs(device, umem->sg_head.sgl, umem->sg_nents,
-   DMA_BIDIRECTIONAL, dma_attr);
-
-   if (!umem->nmap) {
-   ret = -ENOMEM;
+   ret = ib_dma_map_sgtable_attrs(device, &umem->sg_head,
+  DMA_BIDIRECTIONAL, dma_attr);
+   if (ret)
goto umem_release;
-   }
-
-   ret = 0;
goto out;
 
 umem_release:
@@ -309,8 +303,8 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, 
size_t offset,
return -EINVAL;
}
 
-   ret = sg_pcopy_to_buffer(umem->sg_head.sgl, umem->sg_nents, dst, length,
-offset + ib_umem_offset(umem));
+   ret = sg_pcopy_to_buffer(umem->sg_head.sgl, umem->sg_head.orig_nents,
+dst, length, offset + ib_umem_offset(umem));
 
if (ret < 0)
return ret;
diff --git a/drivers/infiniband/core/umem_dmabuf.c 
b/drivers/infiniband/core/umem_dmabuf.c
index c6e875619fac..2884e4526d78 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -57,7 +57,6 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
*umem_dmabuf)
 
umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
umem_dmabuf->umem.sg_head.nents = nmap;
-   umem_dmabuf->umem.nmap = nmap;
umem_dmabuf->sgt = sgt;
 
 wait_fence:
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 50becc0e4b62..ab5dc8eac7f8 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -200,7 +200,7 @@ int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct 
mlx4_mtt *mtt,
mtt_shift = mtt->page_shift;
mtt_size = 1ULL << mtt_shift;
 
-   for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
+   for_each_sgtable_dma_sg(&umem->sg_head, sg, i) {
if (cur_start_addr + len == sg_dma_address(sg)) {
/* still the same block */
len += sg_dma_len(sg);
@@ -273,7 +273,7 @@ int mlx4_ib_

[Intel-gfx] [PATCH rdma-next v2 0/2] SG fix together with update to RDMA umem

2021-07-18 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v2:
 * Changed implementation of first patch, based on our discussion with 
Christoph.
   https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
 * Fixed sg_page with a _dma_ API in the umem.c
v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com

Maor Gottlieb (2):
  lib/scatterlist: Fix wrong update of orig_nents
  RDMA: Use dma_map_sgtable for map umem pages

 drivers/gpu/drm/drm_prime.c |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  2 +-
 drivers/infiniband/core/umem.c  | 33 +++-
 drivers/infiniband/core/umem_dmabuf.c   |  1 -
 drivers/infiniband/hw/mlx4/mr.c |  4 +-
 drivers/infiniband/hw/mlx5/mr.c |  3 +-
 drivers/infiniband/sw/rdmavt/mr.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c  |  3 +-
 include/linux/scatterlist.h |  3 +-
 include/rdma/ib_umem.h  |  6 +-
 include/rdma/ib_verbs.h | 28 +++
 lib/scatterlist.c   | 88 ++---
 tools/testing/scatterlist/main.c| 15 +++-
 14 files changed, 128 insertions(+), 64 deletions(-)

-- 
2.31.1



Re: [Intel-gfx] [PATCH 02/13] vfio: Introduce a vfio_uninit_group_dev() API call

2021-07-15 Thread Leon Romanovsky
On Wed, Jul 14, 2021 at 09:20:31PM -0300, Jason Gunthorpe wrote:
> From: Max Gurtovoy 
> 
> This pairs with vfio_init_group_dev() and allows undoing any state that is
> stored in the vfio_device unrelated to registration. Add appropriately
> placed calls to all the drivers.
> 
> The following patch will use this to add pre-registration state for the
> device set.
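
(A sketch of the intended pairing — teardown mirrors setup in reverse
order, which is what the review comment below is about; the fsl-mc
names come from the quoted driver, the rest is illustrative:)

    /* probe: */
    vfio_init_group_dev(&vdev->vdev, dev, &ops);
    /* ... driver-private setup, e.g. vfio_fsl_mc_reflck_attach() ... */
    /* ... vfio_register_group_dev() last ... */

    /* remove: strict reverse of probe */
    /* ... vfio_unregister_group_dev() first ... */
    /* ... driver-private teardown, e.g. vfio_fsl_mc_reflck_put() ... */
    vfio_uninit_group_dev(&vdev->vdev);
    kfree(vdev);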
> 
> Signed-off-by: Max Gurtovoy 
> Signed-off-by: Jason Gunthorpe 
> ---
>  Documentation/driver-api/vfio.rst|  4 ++-
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c|  6 +++--
>  drivers/vfio/mdev/vfio_mdev.c| 13 +++---
>  drivers/vfio/pci/vfio_pci.c  |  6 +++--
>  drivers/vfio/platform/vfio_platform_common.c |  7 +++--
>  drivers/vfio/vfio.c  |  5 
>  include/linux/vfio.h |  1 +
>  samples/vfio-mdev/mbochs.c   |  2 ++
>  samples/vfio-mdev/mdpy.c | 25 ++
>  samples/vfio-mdev/mtty.c | 27 
>  10 files changed, 64 insertions(+), 32 deletions(-)

<...>

> @@ -674,6 +675,7 @@ static int vfio_fsl_mc_remove(struct fsl_mc_device 
> *mc_dev)
>  
>   dprc_remove_devices(mc_dev, NULL, 0);
>   vfio_fsl_uninit_device(vdev);
> + vfio_uninit_group_dev(&vdev->vdev);

This is the wrong place; the _uninit_ should be after vfio_fsl_mc_reflck_put().

>   vfio_fsl_mc_reflck_put(vdev->reflck);
>  
>   kfree(vdev);


[Intel-gfx] [PATCH rdma-next v5 0/4] Dynamically allocate SG table from the pages

2020-10-04 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v5:
 * Use sg_init_table to allocate table and avoid changes in __sg_alloc_table
 * Fix offset issue
v4: https://lore.kernel.org/lkml/20200927064647.3106737-1-l...@kernel.org
 * Fixed formatting in first patch.
 * Added fix (clear tmp_netnts) in first patch to fix i915 failure.
 * Added test patches
v3: https://lore.kernel.org/linux-rdma/20200922083958.2150803-1-l...@kernel.org/
 * Squashed Christoph's suggestion to avoid introducing a new API, extending the existing one instead.
v2: https://lore.kernel.org/linux-rdma/20200916140726.839377-1-l...@kernel.org
 * Fixed indentations and comments
 * Deleted sg_alloc_next()
 * Squashed lib/scatterlist patches into one
v1: https://lore.kernel.org/lkml/20200910134259.1304543-1-l...@kernel.org
 * Changed _sg_chain to be __sg_chain
 * Added dependency on ARCH_NO_SG_CHAIN
 * Removed struct sg_append
v0:
 * https://lore.kernel.org/lkml/20200903121853.1145976-1-l...@kernel.org

--
From Maor:

This series extends __sg_alloc_table_from_pages to allow chaining of
new pages to an already initialized SG table.

This allows drivers to utilize the optimization of merging contiguous
pages without needing to pre-allocate all the pages and hold them in
a very large temporary buffer prior to the call to SG table
initialization.

The second patch changes the Infiniband driver to use the new API. It
removes duplicate functionality from the code and benefits from the
optimization of allocating a dynamic SG table from pages.

In a system with 2MB huge pages, without this change, the SG table
would contain 512x more SG entries. E.g. for 100GB memory registration:

        Number of entries      Size
Before  26214400               600.0MB
After   51200                  1.2MB
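
(A quick sanity check of those numbers, assuming 4KB base pages and the
24-byte struct scatterlist the table implies: 100GB / 4KB = 26,214,400
entries * 24B ~= 600MB before; with each 2MB huge page merged into a
single entry, 100GB / 2MB = 51,200 entries * 24B ~= 1.2MB after.)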

Thanks

Maor Gottlieb (2):
  lib/scatterlist: Add support in dynamic allocation of SG table from
pages
  RDMA/umem: Move to allocate SG table from pages

Tvrtko Ursulin (2):
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  tools/testing/scatterlist: Show errors in human readable form

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 ++-
 drivers/infiniband/core/umem.c  |  94 ++-
 include/linux/scatterlist.h |  38 +++---
 lib/scatterlist.c   | 125 
 tools/testing/scatterlist/Makefile  |   3 +-
 tools/testing/scatterlist/linux/mm.h|  35 ++
 tools/testing/scatterlist/main.c|  53 ++---
 8 files changed, 225 insertions(+), 150 deletions(-)

--
2.26.2



[Intel-gfx] [PATCH rdma-next v5 3/4] tools/testing/scatterlist: Show errors in human readable form

2020-10-04 Thread Leon Romanovsky
From: Tvrtko Ursulin 

Instead of just asserting, dump some more useful info about what the test
saw versus what it expected to see.

Signed-off-by: Tvrtko Ursulin 
Cc: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 tools/testing/scatterlist/main.c | 44 
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/tools/testing/scatterlist/main.c b/tools/testing/scatterlist/main.c
index 4899359a31ac..b2c7e9f7b8d3 100644
--- a/tools/testing/scatterlist/main.c
+++ b/tools/testing/scatterlist/main.c
@@ -5,6 +5,15 @@

 #define MAX_PAGES (64)

+struct test {
+   int alloc_ret;
+   unsigned num_pages;
+   unsigned *pfn;
+   unsigned size;
+   unsigned int max_seg;
+   unsigned int expected_segments;
+};
+
 static void set_pages(struct page **pages, const unsigned *array, unsigned num)
 {
unsigned int i;
@@ -17,17 +26,32 @@ static void set_pages(struct page **pages, const unsigned 
*array, unsigned num)

 #define pfn(...) (unsigned []){ __VA_ARGS__ }

+static void fail(struct test *test, struct sg_table *st, const char *cond)
+{
+   unsigned int i;
+
+   fprintf(stderr, "Failed on '%s'!\n\n", cond);
+
+   printf("size = %u, max segment = %u, expected nents = %u\nst->nents = 
%u, st->orig_nents= %u\n",
+  test->size, test->max_seg, test->expected_segments, st->nents,
+  st->orig_nents);
+
+   printf("%u input PFNs:", test->num_pages);
+   for (i = 0; i < test->num_pages; i++)
+   printf(" %x", test->pfn[i]);
+   printf("\n");
+
+   exit(1);
+}
+
+#define VALIDATE(cond, st, test) \
+   if (!(cond)) \
+   fail((test), (st), #cond);
+
 int main(void)
 {
const unsigned int sgmax = SCATTERLIST_MAX_SEGMENT;
-   struct test {
-   int alloc_ret;
-   unsigned num_pages;
-   unsigned *pfn;
-   unsigned size;
-   unsigned int max_seg;
-   unsigned int expected_segments;
-   } *test, tests[] = {
+   struct test *test, tests[] = {
{ -EINVAL, 1, pfn(0), PAGE_SIZE, PAGE_SIZE + 1, 1 },
{ -EINVAL, 1, pfn(0), PAGE_SIZE, 0, 1 },
{ -EINVAL, 1, pfn(0), PAGE_SIZE, sgmax + 1, 1 },
@@ -66,8 +90,8 @@ int main(void)
if (test->alloc_ret)
continue;

-   assert(st.nents == test->expected_segments);
-   assert(st.orig_nents == test->expected_segments);
+   VALIDATE(st.nents == test->expected_segments, &st, test);
+   VALIDATE(st.orig_nents == test->expected_segments, &st, test);

sg_free_table(&st);
}
--
2.26.2



[Intel-gfx] [PATCH rdma-next v5 1/4] lib/scatterlist: Add support in dynamic allocation of SG table from pages

2020-10-04 Thread Leon Romanovsky
From: Maor Gottlieb 

Extend __sg_alloc_table_from_pages to support dynamic allocation of
SG table from pages. It should be used by drivers that can't supply
all the pages at one time.

This function returns the last populated SGE in the table. Users should
pass it as an argument to the function from the second call onward (see
the sketch below). As before, nents will be equal to the number of
populated SGEs (chunks).

With this new extension, drivers can benefit from the optimization of
merging contiguous pages without needing to allocate all pages in
advance and hold them in a large buffer.

E.g. the Infiniband driver allocates a single page to hold the page
pointers. For 1TB memory registration, the temporary buffer would
consume only 4KB, instead of 2GB.
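
(A sketch of that call pattern, condensed from the umem hunk in patch
4/4 of this series; pinning details and error handling are elided:)

    struct scatterlist *sg = NULL;  /* NULL on the first call */

    while (npages) {
            pinned = pin_user_pages_fast(cur_base, batch, gup_flags,
                                         page_list);
            npages -= pinned;
            sg = __sg_alloc_table_from_pages(&sgt, page_list, pinned, 0,
                            pinned << PAGE_SHIFT, max_seg,
                            sg,       /* last SGE from the previous call */
                            npages,   /* pages still expected */
                            GFP_KERNEL);
            if (IS_ERR(sg))
                    break;            /* unpin and bail out */
            cur_base += pinned * PAGE_SIZE;
    }
    sg_mark_end(sg);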

Signed-off-by: Maor Gottlieb 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Leon Romanovsky 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 ++-
 include/linux/scatterlist.h |  38 +++---
 lib/scatterlist.c   | 125 
 tools/testing/scatterlist/main.c|   9 +-
 5 files changed, 142 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 12b30075134a..f2eaed6aca3d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -403,6 +403,7 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
unsigned int max_segment = i915_sg_segment_size();
struct sg_table *st;
unsigned int sg_page_sizes;
+   struct scatterlist *sg;
int ret;

st = kmalloc(sizeof(*st), GFP_KERNEL);
@@ -410,13 +411,12 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
return ERR_PTR(-ENOMEM);

 alloc_table:
-   ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
- 0, num_pages << PAGE_SHIFT,
- max_segment,
- GFP_KERNEL);
-   if (ret) {
+   sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
+num_pages << PAGE_SHIFT, max_segment,
+NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
kfree(st);
-   return ERR_PTR(ret);
+   return ERR_CAST(sg);
}

ret = i915_gem_gtt_prepare_pages(obj, st);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index ab524ab3b0b4..f22acd398b1f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -419,6 +419,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
int ret = 0;
static size_t sgl_size;
static size_t sgt_size;
+   struct scatterlist *sg;

if (vmw_tt->mapped)
return 0;
@@ -441,13 +442,15 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
return ret;

-   ret = __sg_alloc_table_from_pages
-   (&vmw_tt->sgt, vsgt->pages, vsgt->num_pages, 0,
-(unsigned long) vsgt->num_pages << PAGE_SHIFT,
-dma_get_max_seg_size(dev_priv->dev->dev),
-GFP_KERNEL);
-   if (unlikely(ret != 0))
+   sg = __sg_alloc_table_from_pages(&vmw_tt->sgt, vsgt->pages,
+   vsgt->num_pages, 0,
+   (unsigned long) vsgt->num_pages << PAGE_SHIFT,
+   dma_get_max_seg_size(dev_priv->dev->dev),
+   NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
+   ret = PTR_ERR(sg);
goto out_sg_alloc_fail;
+   }

if (vsgt->num_pages > vmw_tt->sgt.nents) {
uint64_t over_alloc =
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 45cf7b69d852..36c47e7e66a2 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -165,6 +165,22 @@ static inline void sg_set_buf(struct scatterlist *sg, 
const void *buf,
 #define for_each_sgtable_dma_sg(sgt, sg, i)\
for_each_sg((sgt)->sgl, sg, (sgt)->nents, i)

+static inline void __sg_chain(struct scatterlist *chain_sg,
+ struct scatterlist *sgl)
+{
+   /*
+* offset and length are unused for chain entry. Clear them.
+*/
+   chain_sg->offset = 0;
+   chain_sg->length = 0;
+
+   /*
+* Set lowest bit to indicate a link pointer, and make sure to clear
+* the termination bit if it happens to be set.

[Intel-gfx] [PATCH rdma-next v5 4/4] RDMA/umem: Move to allocate SG table from pages

2020-10-04 Thread Leon Romanovsky
From: Maor Gottlieb 

Remove the implementation of ib_umem_add_sg_table and instead call
__sg_alloc_table_from_pages, which already has the logic to merge
contiguous pages.

Besides removing duplicated functionality, this reduces the memory
consumption of the SG table significantly. Prior to this patch, the
SG table was allocated in advance regardless of any consideration of
contiguous pages.

In a system with 2MB huge pages, without this change, the SG table
would contain 512x more SG entries. E.g. for 100GB memory registration:

        Number of entries      Size
Before  26214400               600.0MB
After   51200                  1.2MB

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/umem.c | 94 +-
 1 file changed, 12 insertions(+), 82 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c1ab6a4f2bc3..e9fecbdf391b 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -61,73 +61,6 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
sg_free_table(&umem->sg_head);
 }

-/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
- *
- * sg: current scatterlist entry
- * page_list: array of npage struct page pointers
- * npages: number of pages in page_list
- * max_seg_sz: maximum segment size in bytes
- * nents: [out] number of entries in the scatterlist
- *
- * Return new end of scatterlist
- */
-static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
-   struct page **page_list,
-   unsigned long npages,
-   unsigned int max_seg_sz,
-   int *nents)
-{
-   unsigned long first_pfn;
-   unsigned long i = 0;
-   bool update_cur_sg = false;
-   bool first = !sg_page(sg);
-
-   /* Check if new page_list is contiguous with end of previous page_list.
-* sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
-*/
-   if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
-  page_to_pfn(page_list[0])))
-   update_cur_sg = true;
-
-   while (i != npages) {
-   unsigned long len;
-   struct page *first_page = page_list[i];
-
-   first_pfn = page_to_pfn(first_page);
-
-   /* Compute the number of contiguous pages we have starting
-* at i
-*/
-   for (len = 0; i != npages &&
- first_pfn + len == page_to_pfn(page_list[i]) &&
- len < (max_seg_sz >> PAGE_SHIFT);
-len++)
-   i++;
-
-   /* Squash N contiguous pages from page_list into current sge */
-   if (update_cur_sg) {
-   if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
-   sg_set_page(sg, sg_page(sg),
-   sg->length + (len << PAGE_SHIFT),
-   0);
-   update_cur_sg = false;
-   continue;
-   }
-   update_cur_sg = false;
-   }
-
-   /* Squash N contiguous pages into next sge or first sge */
-   if (!first)
-   sg = sg_next(sg);
-
-   (*nents)++;
-   sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
-   first = false;
-   }
-
-   return sg;
-}
-
 /**
  * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
  *
@@ -217,7 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
struct mm_struct *mm;
unsigned long npages;
int ret;
-   struct scatterlist *sg;
+   struct scatterlist *sg = NULL;
unsigned int gup_flags = FOLL_WRITE;

/*
@@ -272,15 +205,9 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,

cur_base = addr & PAGE_MASK;

-   ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
-   if (ret)
-   goto vma;
-
if (!umem->writable)
gup_flags |= FOLL_FORCE;

-   sg = umem->sg_head.sgl;
-
while (npages) {
cond_resched();
ret = pin_user_pages_fast(cur_base,
@@ -292,15 +219,19 @@ struct ib_umem *ib_umem_get(struct ib_device *device, 
unsigned long addr,
goto umem_release;

cur_base += ret * PAGE_SIZE;
-   npages   -= ret;
-
-   sg = ib_umem_add_sg_table(sg, page_list, ret,
-   

[Intel-gfx] [PATCH rdma-next v5 2/4] tools/testing/scatterlist: Rejuvenate bit-rotten test

2020-10-04 Thread Leon Romanovsky
From: Tvrtko Ursulin 

A couple small tweaks are needed to make the test build and run
on current kernels.

Signed-off-by: Tvrtko Ursulin 
Cc: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 tools/testing/scatterlist/Makefile   |  3 ++-
 tools/testing/scatterlist/linux/mm.h | 35 
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/tools/testing/scatterlist/Makefile 
b/tools/testing/scatterlist/Makefile
index cbb003d9305e..c65233876622 100644
--- a/tools/testing/scatterlist/Makefile
+++ b/tools/testing/scatterlist/Makefile
@@ -14,7 +14,7 @@ targets: include $(TARGETS)
 main: $(OFILES)

 clean:
-   $(RM) $(TARGETS) $(OFILES) scatterlist.c linux/scatterlist.h 
linux/highmem.h linux/kmemleak.h asm/io.h
+   $(RM) $(TARGETS) $(OFILES) scatterlist.c linux/scatterlist.h 
linux/highmem.h linux/kmemleak.h linux/slab.h asm/io.h
@rmdir asm

 scatterlist.c: ../../../lib/scatterlist.c
@@ -28,4 +28,5 @@ include: ../../../include/linux/scatterlist.h
@touch asm/io.h
@touch linux/highmem.h
@touch linux/kmemleak.h
+   @touch linux/slab.h
@cp $< linux/scatterlist.h
diff --git a/tools/testing/scatterlist/linux/mm.h 
b/tools/testing/scatterlist/linux/mm.h
index 6f9ac14aa800..6ae907f375d2 100644
--- a/tools/testing/scatterlist/linux/mm.h
+++ b/tools/testing/scatterlist/linux/mm.h
@@ -114,6 +114,12 @@ static inline void *kmalloc(unsigned int size, unsigned 
int flags)
return malloc(size);
 }

+static inline void *
+kmalloc_array(unsigned int n, unsigned int size, unsigned int flags)
+{
+   return malloc(n * size);
+}
+
 #define kfree(x) free(x)

 #define kmemleak_alloc(a, b, c, d)
@@ -122,4 +128,33 @@ static inline void *kmalloc(unsigned int size, unsigned 
int flags)
 #define PageSlab(p) (0)
 #define flush_kernel_dcache_page(p)

+#define MAX_ERRNO  4095
+
+#define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned 
long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+   return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(__force const void *ptr)
+{
+   return (long) ptr;
+}
+
+static inline bool __must_check IS_ERR(__force const void *ptr)
+{
+   return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
+{
+   if (IS_ERR(ptr))
+   return PTR_ERR(ptr);
+   else
+   return 0;
+}
+
+#define IS_ENABLED(x) (0)
+
 #endif
--
2.26.2



Re: [Intel-gfx] [PATCH rdma-next v4 4/4] RDMA/umem: Move to allocate SG table from pages

2020-09-30 Thread Leon Romanovsky
On Wed, Sep 30, 2020 at 12:14:06PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 30, 2020 at 06:05:15PM +0300, Maor Gottlieb wrote:
> > This is right only for the last iteration. E.g. in the first iteration,
> > if there are more pages (left_pages), then we allocate
> > SG_MAX_SINGLE_ALLOC. We don't know how many pages from the second
> > iteration will be squashed into the SGE from the first iteration.
>
> Well, it is 0 or 1 SGEs. Check if the first page is mergeable and
> subtract one from the required length?
>
> I dislike this sg_mark_end() it is something that should be internal,
> IMHO.

I don't think so, but Maor provided a possible solution.
Can you take the patches?

Thanks

>
> Jason


Re: [Intel-gfx] [PATCH rdma-next v4 4/4] RDMA/umem: Move to allocate SG table from pages

2020-09-30 Thread Leon Romanovsky
On Tue, Sep 29, 2020 at 04:59:29PM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 27, 2020 at 09:46:47AM +0300, Leon Romanovsky wrote:
> > @@ -296,11 +223,17 @@ static struct ib_umem *__ib_umem_get(struct ib_device 
> > *device,
> > goto umem_release;
> >
> > cur_base += ret * PAGE_SIZE;
> > -   npages   -= ret;
> > -
> > -   sg = ib_umem_add_sg_table(sg, page_list, ret,
> > -   dma_get_max_seg_size(device->dma_device),
> > -   &umem->sg_nents);
> > +   npages -= ret;
> > +   sg = __sg_alloc_table_from_pages(
> > +   &umem->sg_head, page_list, ret, 0, ret << PAGE_SHIFT,
> > +   dma_get_max_seg_size(device->dma_device), sg, npages,
> > +   GFP_KERNEL);
> > +   umem->sg_nents = umem->sg_head.nents;
> > +   if (IS_ERR(sg)) {
> > +   unpin_user_pages_dirty_lock(page_list, ret, 0);
> > +   ret = PTR_ERR(sg);
> > +   goto umem_release;
> > +   }
> > }
> >
> > sg_mark_end(sg);
>
> Does it still need the sg_mark_end?

It is preserved here for correctness; the release logic doesn't rely on
this marker, but it is better to leave it.

Thanks

>
> Jason


[Intel-gfx] [PATCH rdma-next v4 3/4] tools/testing/scatterlist: Show errors in human readable form

2020-09-27 Thread Leon Romanovsky
From: Tvrtko Ursulin 

Instead of just asserting, dump some more useful info about what the test
saw versus what it expected to see.

Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 tools/testing/scatterlist/main.c | 44 
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/tools/testing/scatterlist/main.c b/tools/testing/scatterlist/main.c
index 4899359a31ac..b2c7e9f7b8d3 100644
--- a/tools/testing/scatterlist/main.c
+++ b/tools/testing/scatterlist/main.c
@@ -5,6 +5,15 @@

 #define MAX_PAGES (64)

+struct test {
+   int alloc_ret;
+   unsigned num_pages;
+   unsigned *pfn;
+   unsigned size;
+   unsigned int max_seg;
+   unsigned int expected_segments;
+};
+
 static void set_pages(struct page **pages, const unsigned *array, unsigned num)
 {
unsigned int i;
@@ -17,17 +26,32 @@ static void set_pages(struct page **pages, const unsigned 
*array, unsigned num)

 #define pfn(...) (unsigned []){ __VA_ARGS__ }

+static void fail(struct test *test, struct sg_table *st, const char *cond)
+{
+   unsigned int i;
+
+   fprintf(stderr, "Failed on '%s'!\n\n", cond);
+
+   printf("size = %u, max segment = %u, expected nents = %u\nst->nents = 
%u, st->orig_nents= %u\n",
+  test->size, test->max_seg, test->expected_segments, st->nents,
+  st->orig_nents);
+
+   printf("%u input PFNs:", test->num_pages);
+   for (i = 0; i < test->num_pages; i++)
+   printf(" %x", test->pfn[i]);
+   printf("\n");
+
+   exit(1);
+}
+
+#define VALIDATE(cond, st, test) \
+   if (!(cond)) \
+   fail((test), (st), #cond);
+
 int main(void)
 {
const unsigned int sgmax = SCATTERLIST_MAX_SEGMENT;
-   struct test {
-   int alloc_ret;
-   unsigned num_pages;
-   unsigned *pfn;
-   unsigned size;
-   unsigned int max_seg;
-   unsigned int expected_segments;
-   } *test, tests[] = {
+   struct test *test, tests[] = {
{ -EINVAL, 1, pfn(0), PAGE_SIZE, PAGE_SIZE + 1, 1 },
{ -EINVAL, 1, pfn(0), PAGE_SIZE, 0, 1 },
{ -EINVAL, 1, pfn(0), PAGE_SIZE, sgmax + 1, 1 },
@@ -66,8 +90,8 @@ int main(void)
if (test->alloc_ret)
continue;

-   assert(st.nents == test->expected_segments);
-   assert(st.orig_nents == test->expected_segments);
+   VALIDATE(st.nents == test->expected_segments, &st, test);
+   VALIDATE(st.orig_nents == test->expected_segments, &st, test);

sg_free_table(&st);
}
--
2.26.2



[Intel-gfx] [PATCH rdma-next v4 1/4] lib/scatterlist: Add support in dynamic allocation of SG table from pages

2020-09-27 Thread Leon Romanovsky
From: Maor Gottlieb 

Extend __sg_alloc_table_from_pages to support dynamic allocation of
SG table from pages. It should be used by drivers that can't supply
all the pages at one time.

This function returns the last populated SGE in the table. Users should
pass it as an argument to the function from the second call onward.
As before, nents will be equal to the number of populated SGEs (chunks).

With this new extension, drivers can benefit from the optimization of
merging contiguous pages without needing to allocate all pages in
advance and hold them in a large buffer.

E.g. the Infiniband driver allocates a single page to hold the page
pointers. For 1TB memory registration, the temporary buffer would
consume only 4KB, instead of 2GB.

Signed-off-by: Maor Gottlieb 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Leon Romanovsky 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 +-
 include/linux/scatterlist.h |  43 +++---
 lib/scatterlist.c   | 160 +++-
 lib/sg_pool.c   |   3 +-
 tools/testing/scatterlist/main.c|   9 +-
 6 files changed, 165 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 12b30075134a..f2eaed6aca3d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -403,6 +403,7 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
unsigned int max_segment = i915_sg_segment_size();
struct sg_table *st;
unsigned int sg_page_sizes;
+   struct scatterlist *sg;
int ret;

st = kmalloc(sizeof(*st), GFP_KERNEL);
@@ -410,13 +411,12 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
return ERR_PTR(-ENOMEM);

 alloc_table:
-   ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
- 0, num_pages << PAGE_SHIFT,
- max_segment,
- GFP_KERNEL);
-   if (ret) {
+   sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
+num_pages << PAGE_SHIFT, max_segment,
+NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
kfree(st);
-   return ERR_PTR(ret);
+   return ERR_CAST(sg);
}

ret = i915_gem_gtt_prepare_pages(obj, st);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index ab524ab3b0b4..f22acd398b1f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -419,6 +419,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
int ret = 0;
static size_t sgl_size;
static size_t sgt_size;
+   struct scatterlist *sg;

if (vmw_tt->mapped)
return 0;
@@ -441,13 +442,15 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
return ret;

-   ret = __sg_alloc_table_from_pages
-   (&vmw_tt->sgt, vsgt->pages, vsgt->num_pages, 0,
-(unsigned long) vsgt->num_pages << PAGE_SHIFT,
-dma_get_max_seg_size(dev_priv->dev->dev),
-GFP_KERNEL);
-   if (unlikely(ret != 0))
+   sg = __sg_alloc_table_from_pages(&vmw_tt->sgt, vsgt->pages,
+   vsgt->num_pages, 0,
+   (unsigned long) vsgt->num_pages << PAGE_SHIFT,
+   dma_get_max_seg_size(dev_priv->dev->dev),
+   NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
+   ret = PTR_ERR(sg);
goto out_sg_alloc_fail;
+   }

if (vsgt->num_pages > vmw_tt->sgt.nents) {
uint64_t over_alloc =
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 45cf7b69d852..c24cc667b56b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -165,6 +165,22 @@ static inline void sg_set_buf(struct scatterlist *sg, 
const void *buf,
 #define for_each_sgtable_dma_sg(sgt, sg, i)\
for_each_sg((sgt)->sgl, sg, (sgt)->nents, i)

+static inline void __sg_chain(struct scatterlist *chain_sg,
+ struct scatterlist *sgl)
+{
+   /*
+* offset and length are unused for chain entry. Clear them.
+*/
+   chain_sg->offset = 0;
+   chain_sg->length = 0;
+
+   /*
+* Set lowest bit to indicate a link pointer, and make sure to clear
+* the termination bit if it happens to be set.
+*/
+   chain_sg->page_link = ((unsigned long)sgl | SG_CHAIN) & ~SG_END;
+}

[Intel-gfx] [PATCH rdma-next v4 2/4] tools/testing/scatterlist: Rejuvenate bit-rotten test

2020-09-27 Thread Leon Romanovsky
From: Tvrtko Ursulin 

A couple small tweaks are needed to make the test build and run
on current kernels.

Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 tools/testing/scatterlist/Makefile   |  3 ++-
 tools/testing/scatterlist/linux/mm.h | 35 
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/tools/testing/scatterlist/Makefile 
b/tools/testing/scatterlist/Makefile
index cbb003d9305e..c65233876622 100644
--- a/tools/testing/scatterlist/Makefile
+++ b/tools/testing/scatterlist/Makefile
@@ -14,7 +14,7 @@ targets: include $(TARGETS)
 main: $(OFILES)

 clean:
-   $(RM) $(TARGETS) $(OFILES) scatterlist.c linux/scatterlist.h 
linux/highmem.h linux/kmemleak.h asm/io.h
+   $(RM) $(TARGETS) $(OFILES) scatterlist.c linux/scatterlist.h 
linux/highmem.h linux/kmemleak.h linux/slab.h asm/io.h
@rmdir asm

 scatterlist.c: ../../../lib/scatterlist.c
@@ -28,4 +28,5 @@ include: ../../../include/linux/scatterlist.h
@touch asm/io.h
@touch linux/highmem.h
@touch linux/kmemleak.h
+   @touch linux/slab.h
@cp $< linux/scatterlist.h
diff --git a/tools/testing/scatterlist/linux/mm.h 
b/tools/testing/scatterlist/linux/mm.h
index 6f9ac14aa800..6ae907f375d2 100644
--- a/tools/testing/scatterlist/linux/mm.h
+++ b/tools/testing/scatterlist/linux/mm.h
@@ -114,6 +114,12 @@ static inline void *kmalloc(unsigned int size, unsigned 
int flags)
return malloc(size);
 }

+static inline void *
+kmalloc_array(unsigned int n, unsigned int size, unsigned int flags)
+{
+   return malloc(n * size);
+}
+
 #define kfree(x) free(x)

 #define kmemleak_alloc(a, b, c, d)
@@ -122,4 +128,33 @@ static inline void *kmalloc(unsigned int size, unsigned 
int flags)
 #define PageSlab(p) (0)
 #define flush_kernel_dcache_page(p)

+#define MAX_ERRNO  4095
+
+#define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned 
long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+   return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(__force const void *ptr)
+{
+   return (long) ptr;
+}
+
+static inline bool __must_check IS_ERR(__force const void *ptr)
+{
+   return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
+{
+   if (IS_ERR(ptr))
+   return PTR_ERR(ptr);
+   else
+   return 0;
+}
+
+#define IS_ENABLED(x) (0)
+
 #endif
--
2.26.2



[Intel-gfx] [PATCH rdma-next v4 4/4] RDMA/umem: Move to allocate SG table from pages

2020-09-27 Thread Leon Romanovsky
From: Maor Gottlieb 

Remove the implementation of ib_umem_add_sg_table and instead
call __sg_alloc_table_from_pages, which already has the logic to
merge contiguous pages.

Besides removing duplicated functionality, this reduces the
memory consumption of the SG table significantly. Prior to this
patch, the SG table was allocated in advance with no consideration
of contiguous pages.

On a huge-page system with a 2MB page size, without this change, the
SG table would contain 512x more SG entries. E.g. for a 100GB memory
registration:

           Number of entries      Size
   Before         26214400      600.0MB
   After             51200        1.2MB
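
For reference, the arithmetic behind the table: 100GB / 4KB = 26,214,400
entries before (one SGE per 4KB page) and 100GB / 2MB = 51,200 entries
after (one SGE per merged huge page); at the 24 bytes per scatterlist
entry implied by these numbers, that is ~600MB versus ~1.2MB.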

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/umem.c | 92 +-
 1 file changed, 12 insertions(+), 80 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 01b680b62846..0ef736970aba 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -63,73 +63,6 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
	sg_free_table(&umem->sg_head);
 }

-/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
- *
- * sg: current scatterlist entry
- * page_list: array of npage struct page pointers
- * npages: number of pages in page_list
- * max_seg_sz: maximum segment size in bytes
- * nents: [out] number of entries in the scatterlist
- *
- * Return new end of scatterlist
- */
-static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
-   struct page **page_list,
-   unsigned long npages,
-   unsigned int max_seg_sz,
-   int *nents)
-{
-   unsigned long first_pfn;
-   unsigned long i = 0;
-   bool update_cur_sg = false;
-   bool first = !sg_page(sg);
-
-   /* Check if new page_list is contiguous with end of previous page_list.
-* sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
-*/
-   if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
-  page_to_pfn(page_list[0])))
-   update_cur_sg = true;
-
-   while (i != npages) {
-   unsigned long len;
-   struct page *first_page = page_list[i];
-
-   first_pfn = page_to_pfn(first_page);
-
-   /* Compute the number of contiguous pages we have starting
-* at i
-*/
-   for (len = 0; i != npages &&
- first_pfn + len == page_to_pfn(page_list[i]) &&
- len < (max_seg_sz >> PAGE_SHIFT);
-len++)
-   i++;
-
-   /* Squash N contiguous pages from page_list into current sge */
-   if (update_cur_sg) {
-   if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
-   sg_set_page(sg, sg_page(sg),
-   sg->length + (len << PAGE_SHIFT),
-   0);
-   update_cur_sg = false;
-   continue;
-   }
-   update_cur_sg = false;
-   }
-
-   /* Squash N contiguous pages into next sge or first sge */
-   if (!first)
-   sg = sg_next(sg);
-
-   (*nents)++;
-   sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
-   first = false;
-   }
-
-   return sg;
-}
-
 /**
  * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
  *
@@ -221,7 +154,7 @@ static struct ib_umem *__ib_umem_get(struct ib_device 
*device,
struct mm_struct *mm;
unsigned long npages;
int ret;
-   struct scatterlist *sg;
+   struct scatterlist *sg = NULL;
unsigned int gup_flags = FOLL_WRITE;

/*
@@ -276,15 +209,9 @@ static struct ib_umem *__ib_umem_get(struct ib_device 
*device,

cur_base = addr & PAGE_MASK;

-   ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
-   if (ret)
-   goto vma;
-
if (!umem->writable)
gup_flags |= FOLL_FORCE;

-   sg = umem->sg_head.sgl;
-
while (npages) {
cond_resched();
ret = pin_user_pages_fast(cur_base,
@@ -296,11 +223,17 @@ static struct ib_umem *__ib_umem_get(struct ib_device 
*device,
goto umem_release;

cur_base += ret * PAGE_SIZE;
-   npages   -= ret;
-
-   sg = ib_umem_add_sg_table(sg, page_list, ret,
-   dma_get_max_seg_

[Intel-gfx] [PATCH rdma-next v4 0/4] Dynamically allocate SG table from the pages

2020-09-27 Thread Leon Romanovsky
From: Leon Romanovsky 


Changelog:
v4:
 * Fixed formatting in first patch.
 * Added fix (clear tmp_nents) in first patch to fix i915 failure.
 * Added test patches
v3: https://lore.kernel.org/linux-rdma/20200922083958.2150803-1-l...@kernel.org/
 * Squashed Christopher's suggestion to avoid introduced new API, but extend 
existing one.
v2: https://lore.kernel.org/linux-rdma/20200916140726.839377-1-l...@kernel.org
 * Fixed indentations and comments
 * Deleted sg_alloc_next()
 * Squashed lib/scatterlist patches into one
v1: https://lore.kernel.org/lkml/20200910134259.1304543-1-l...@kernel.org
 * Changed _sg_chain to be __sg_chain
 * Added dependency on ARCH_NO_SG_CHAIN
 * Removed struct sg_append
v0:
 * https://lore.kernel.org/lkml/20200903121853.1145976-1-l...@kernel.org

--
From Maor:

This series extends __sg_alloc_table_from_pages to allow chaining of
new pages to an already initialized SG table.

This allows drivers to utilize the optimization of merging contiguous
pages without a need to pre-allocate all the pages and hold them in
a very large temporary buffer prior to the call to SG table initialization.

The second patch changes the Infiniband driver to use the new API. It
removes duplicate functionality from the code and benefits from the
optimization of allocating a dynamic SG table from pages.

On a huge-page system with a 2MB page size, without this change, the SG
table would contain 512x more SG entries.
E.g. for a 100GB memory registration:

           Number of entries      Size
   Before         26214400      600.0MB
   After             51200        1.2MB

Thanks

Maor Gottlieb (2):
  lib/scatterlist: Add support in dynamic allocation of SG table from
pages
  RDMA/umem: Move to allocate SG table from pages

Tvrtko Ursulin (2):
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  tools/testing/scatterlist: Show errors in human readable form

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 +-
 drivers/infiniband/core/umem.c  |  92 ++-
 include/linux/scatterlist.h |  43 +++---
 lib/scatterlist.c   | 160 +++-
 lib/sg_pool.c   |   3 +-
 tools/testing/scatterlist/Makefile  |   3 +-
 tools/testing/scatterlist/linux/mm.h|  35 +
 tools/testing/scatterlist/main.c|  53 +--
 9 files changed, 248 insertions(+), 168 deletions(-)

--
2.26.2



Re: [Intel-gfx] [PATCH rdma-next v3 1/2] lib/scatterlist: Add support in dynamic allocation of SG table from pages

2020-09-25 Thread Leon Romanovsky
On Thu, Sep 24, 2020 at 09:21:20AM +0100, Tvrtko Ursulin wrote:
>
> On 22/09/2020 09:39, Leon Romanovsky wrote:
> > From: Maor Gottlieb 
> >
> > Extend __sg_alloc_table_from_pages to support dynamic allocation of
> > SG table from pages. It should be used by drivers that can't supply
> > all the pages at one time.
> >
> > This function returns the last populated SGE in the table. Users should
> > pass it as an argument to the function from the second call onward.
> > As before, nents will be equal to the number of populated SGEs (chunks).
>
> So it's appending and growing the "list", did I get that right? Sounds handy
> indeed. Some comments/questions below.

Yes, we (RDMA) use this function to chain contiguous pages.

>
> >
> > With this new extension, drivers can benefit from the optimization of
> > merging contiguous pages without a need to allocate all pages in advance
> > and hold them in a large buffer.
> >
> > E.g. the Infiniband driver allocates a single page to hold the page
> > pointers. For a 1TB memory registration, the temporary buffer would
> > consume only 4KB instead of 2GB.
> >
> > Signed-off-by: Maor Gottlieb 
> > Signed-off-by: Leon Romanovsky 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
> >   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 +-
> >   include/linux/scatterlist.h |  43 +++---
> >   lib/scatterlist.c   | 158 +++-
> >   lib/sg_pool.c   |   3 +-
> >   tools/testing/scatterlist/main.c|   9 +-
> >   6 files changed, 163 insertions(+), 77 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 12b30075134a..f2eaed6aca3d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -403,6 +403,7 @@ __i915_gem_userptr_alloc_pages(struct 
> > drm_i915_gem_object *obj,
> > unsigned int max_segment = i915_sg_segment_size();
> > struct sg_table *st;
> > unsigned int sg_page_sizes;
> > +   struct scatterlist *sg;
> > int ret;
> >
> > st = kmalloc(sizeof(*st), GFP_KERNEL);
> > @@ -410,13 +411,12 @@ __i915_gem_userptr_alloc_pages(struct 
> > drm_i915_gem_object *obj,
> > return ERR_PTR(-ENOMEM);
> >
> >   alloc_table:
> > -   ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
> > - 0, num_pages << PAGE_SHIFT,
> > - max_segment,
> > - GFP_KERNEL);
> > -   if (ret) {
> > +   sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
> > +num_pages << PAGE_SHIFT, max_segment,
> > +NULL, 0, GFP_KERNEL);
> > +   if (IS_ERR(sg)) {
> > kfree(st);
> > -   return ERR_PTR(ret);
> > +   return ERR_CAST(sg);
> > }
> >
> > ret = i915_gem_gtt_prepare_pages(obj, st);
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
> > b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > index ab524ab3b0b4..f22acd398b1f 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > @@ -419,6 +419,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
> > int ret = 0;
> > static size_t sgl_size;
> > static size_t sgt_size;
> > +   struct scatterlist *sg;
> >
> > if (vmw_tt->mapped)
> > return 0;
> > @@ -441,13 +442,15 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
> > if (unlikely(ret != 0))
> > return ret;
> >
> > -   ret = __sg_alloc_table_from_pages
> > -   (&vmw_tt->sgt, vsgt->pages, vsgt->num_pages, 0,
> > -(unsigned long) vsgt->num_pages << PAGE_SHIFT,
> > -dma_get_max_seg_size(dev_priv->dev->dev),
> > -GFP_KERNEL);
> > -   if (unlikely(ret != 0))
> > +   sg = __sg_alloc_table_from_pages(&vmw_tt->sgt, vsgt->pages,
> > +   vsgt->num_pages, 0,
> > +   (unsigned long) vsgt->num_pages << PAGE_SHIFT,
> > +   dma_get_max_seg_size(dev_priv->dev->dev),
> > +   NULL, 0, GFP_KERNEL);

[Intel-gfx] [PATCH rdma-next v3 1/2] lib/scatterlist: Add support in dynamic allocation of SG table from pages

2020-09-22 Thread Leon Romanovsky
From: Maor Gottlieb 

Extend __sg_alloc_table_from_pages to support dynamic allocation of
SG table from pages. It should be used by drivers that can't supply
all the pages at one time.

This function returns the last populated SGE in the table. Users should
pass it as an argument to the function from the second call onward.
As before, nents will be equal to the number of populated SGEs (chunks).

With this new extension, drivers can benefit from the optimization of
merging contiguous pages without a need to allocate all pages in advance
and hold them in a large buffer.

E.g. the Infiniband driver allocates a single page to hold the page
pointers. For a 1TB memory registration, the temporary buffer would
consume only 4KB instead of 2GB.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 +-
 include/linux/scatterlist.h |  43 +++---
 lib/scatterlist.c   | 158 +++-
 lib/sg_pool.c   |   3 +-
 tools/testing/scatterlist/main.c|   9 +-
 6 files changed, 163 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 12b30075134a..f2eaed6aca3d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -403,6 +403,7 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
unsigned int max_segment = i915_sg_segment_size();
struct sg_table *st;
unsigned int sg_page_sizes;
+   struct scatterlist *sg;
int ret;

st = kmalloc(sizeof(*st), GFP_KERNEL);
@@ -410,13 +411,12 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object 
*obj,
return ERR_PTR(-ENOMEM);

 alloc_table:
-   ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
- 0, num_pages << PAGE_SHIFT,
- max_segment,
- GFP_KERNEL);
-   if (ret) {
+   sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
+num_pages << PAGE_SHIFT, max_segment,
+NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
kfree(st);
-   return ERR_PTR(ret);
+   return ERR_CAST(sg);
}

ret = i915_gem_gtt_prepare_pages(obj, st);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index ab524ab3b0b4..f22acd398b1f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -419,6 +419,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
int ret = 0;
static size_t sgl_size;
static size_t sgt_size;
+   struct scatterlist *sg;

if (vmw_tt->mapped)
return 0;
@@ -441,13 +442,15 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
return ret;

-   ret = __sg_alloc_table_from_pages
-   (&vmw_tt->sgt, vsgt->pages, vsgt->num_pages, 0,
-(unsigned long) vsgt->num_pages << PAGE_SHIFT,
-dma_get_max_seg_size(dev_priv->dev->dev),
-GFP_KERNEL);
-   if (unlikely(ret != 0))
+   sg = __sg_alloc_table_from_pages(&vmw_tt->sgt, vsgt->pages,
+   vsgt->num_pages, 0,
+   (unsigned long) vsgt->num_pages << PAGE_SHIFT,
+   dma_get_max_seg_size(dev_priv->dev->dev),
+   NULL, 0, GFP_KERNEL);
+   if (IS_ERR(sg)) {
+   ret = PTR_ERR(sg);
goto out_sg_alloc_fail;
+   }

if (vsgt->num_pages > vmw_tt->sgt.nents) {
uint64_t over_alloc =
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 45cf7b69d852..c24cc667b56b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -165,6 +165,22 @@ static inline void sg_set_buf(struct scatterlist *sg, 
const void *buf,
 #define for_each_sgtable_dma_sg(sgt, sg, i)\
for_each_sg((sgt)->sgl, sg, (sgt)->nents, i)

+static inline void __sg_chain(struct scatterlist *chain_sg,
+ struct scatterlist *sgl)
+{
+   /*
+* offset and length are unused for chain entry. Clear them.
+*/
+   chain_sg->offset = 0;
+   chain_sg->length = 0;
+
+   /*
+* Set lowest bit to indicate a link pointer, and make sure to clear
+* the termination bit if it happens to be set.
+*/
+   chain_sg->page_link = ((unsigned long)sgl | SG_CHAIN) & ~SG_END;
+}

[Intel-gfx] [PATCH rdma-next v3 0/2] Dynamically allocate SG table from the pages

2020-09-22 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v3:
 * Squashed Christopher's suggestion to avoid introduced new API, but extend 
existing one.
v2: https://lore.kernel.org/linux-rdma/20200916140726.839377-1-l...@kernel.org
 * Fixed indentations and comments
 * Deleted sg_alloc_next()
 * Squashed lib/scatterlist patches into one
v1: https://lore.kernel.org/lkml/20200910134259.1304543-1-l...@kernel.org
 * Changed _sg_chain to be __sg_chain
 * Added dependency on ARCH_NO_SG_CHAIN
 * Removed struct sg_append
v0:
 * https://lore.kernel.org/lkml/20200903121853.1145976-1-l...@kernel.org

--
From Maor:

This series extends __sg_alloc_table_from_pages to allow chaining of
new pages to an already initialized SG table.

This allows drivers to utilize the optimization of merging contiguous
pages without a need to pre-allocate all the pages and hold them in
a very large temporary buffer prior to the call to SG table initialization.

The second patch changes the Infiniband driver to use the new API. It
removes duplicate functionality from the code and benefits from the
optimization of allocating a dynamic SG table from pages.

On a huge-page system with a 2MB page size, without this change, the SG
table would contain 512x more SG entries.
E.g. for a 100GB memory registration:

           Number of entries      Size
   Before         26214400      600.0MB
   After             51200        1.2MB

Thanks

Maor Gottlieb (2):
  lib/scatterlist: Add support in dynamic allocation of SG table from
pages
  RDMA/umem: Move to allocate SG table from pages

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 +-
 drivers/infiniband/core/umem.c  |  92 ++--
 include/linux/scatterlist.h |  43 +++---
 lib/scatterlist.c   | 158 +++-
 lib/sg_pool.c   |   3 +-
 tools/testing/scatterlist/main.c|   9 +-
 7 files changed, 175 insertions(+), 157 deletions(-)

--
2.26.2



Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-05 Thread Leon Romanovsky
On Sun, May 05, 2019 at 12:34:16PM -0400, Kenny Ho wrote:
> (sent again.  Not sure why my previous email was just a reply instead
> of reply-all.)
>
> On Sun, May 5, 2019 at 12:05 PM Leon Romanovsky  wrote:
> > We are talking about two different access patterns for this device
> > memory (DM): one is to use this device memory (DM) and the second is to
> > configure/limit it.
> > Usually those actions will be performed by different groups.
> >
> > The first group (programmers) uses a special API [1] through libibverbs [2]
> > without any notion of cgroups or any limitations. The second group
> > (sysadmins) is less interested in application specifics; for them,
> > "device memory" means "memory" and not "rdma, nic specific, internal
> > memory".
> Um... I am not sure that answered it, especially in the context of
> cgroup (this is just for my curiosity btw, I don't know much about
> rdma.)  You said sysadmins are less interested in application
> specifics but then how would they make the judgement call on how much
> "device memory" is provisioned to one application/container over
> another (let say you have 5 cgroup sharing an rdma device)?  What are
> the consequences of under provisioning "device memory" to an
> application?  And if they are all just memory, can a sysadmin
> provision more system memory in place of device memory (like, are they
> interchangeable)?  I guess I am confused because if device memory is
> just memory (not rdma, nic specific) to sysadmins how would they know
> to set the right amount?

One of the immediate usages of this DM that comes to my mind is very
fast spinlocks for MPI applications. In such a case, the amount of DM
will be a property of the network topology in a given MPI cluster.

In this scenario, a precise amount of memory will ensure that all jobs
continue to give maximal performance despite any programmer's
error in DM allocation.

In the under-provisioning scenario, if the application is written
correctly, users will experience more latency and lower performance
due to the extra PCI accesses.

Slide 3 in Liran's presentation gives a brief overview of the motivation.

Thanks

>
> Regards,
> Kenny
>
> > [1] ibv_alloc_dm()
> > http://man7.org/linux/man-pages/man3/ibv_alloc_dm.3.html
> > https://www.openfabrics.org/images/2018workshop/presentations/304_LLiss_OnDeviceMemory.pdf
> > [2] https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/
> >
> > >
> > > I think we need to be careful about drawing the line between
> > > duplication and over couplings between subsystems.  I have other
> > > thoughts and concerns and I will try to organize them into a response
> > > in the next few days.
> > >
> > > Regards,
> > > Kenny
> > >
> > >
> > > > >
> > > > > Is AMD interested in collaborating to help shape this framework?
> > > > > It is intended to be device-neutral, so could be leveraged by various
> > > > > types of devices.
> > > > > If you have an alternative solution well underway, then maybe
> > > > > we can work together to merge our efforts into one.
> > > > > In the end, the DRM community is best served with common solution.
> > > > >
> > > > >
> > > > > >
> > > > > >>> and with future work, we could extend to:
> > > > > >>> *  track and control share of GPU time (reuse of cpu/cpuacct)
> > > > > >>> *  apply mask of allowed execution engines (reuse of cpusets)
> > > > > >>>
> > > > > >>> Instead of introducing a new cgroup subsystem for GPU devices, a 
> > > > > >>> new
> > > > > >>> framework is proposed to allow devices to register with existing 
> > > > > >>> cgroup
> > > > > >>> controllers, which creates per-device cgroup_subsys_state within 
> > > > > >>> the
> > > > > >>> cgroup.  This gives device drivers their own private cgroup 
> > > > > >>> controls
> > > > > >>> (such as memory limits or other parameters) to be applied to 
> > > > > >>> device
> > > > > >>> resources instead of host system resources.
> > > > > >>> Device drivers (GPU or other) are then able to reuse the existing 
> > > > > >>> cgroup
> > > > > >>> controls, instead of inventing similar ones.
> > > > > >>>
> > >

Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-05 Thread Leon Romanovsky
On Sun, May 05, 2019 at 10:21:30AM -0400, Kenny Ho wrote:
> On Sun, May 5, 2019 at 3:14 AM Leon Romanovsky  wrote:
> > > > Doesn't RDMA already has a separate cgroup?  Why not implement it there?
> > > >
> > >
> > > Hi Kenny, I can't answer for Leon, but I'm hopeful he agrees with 
> > > rationale
> > > I gave in the cover letter.  Namely, to implement in rdma controller, 
> > > would
> > > mean duplicating existing memcg controls there.
> >
> > Exactly, I didn't feel comfortable adding the notion of "device memory"
> > to the RDMA cgroup and postponed that decision to a later point in time.
> > RDMA operates with verbs objects and all our user space API is based around
> > that concept. In the end, a system administrator will have a hard time
> > understanding the differences between memcg and RDMA memory.
> Interesting.  I actually don't understand this part (I worked in
> devops/sysadmin side of things but never with rdma.)  Don't
> applications that use rdma require some awareness of rdma (I mean, you
> mentioned verbs and objects... or do they just use regular malloc for
> buffer allocation and then send it through some function?)  As a user,
> I would have this question: why do I need to configure some part of
> rdma resources under rdma cgroup while other part of rdma resources in
> a different, seemingly unrelated cgroups.

We are talking about two different access patterns for this device
memory (DM): one is to use this device memory (DM) and the second is to
configure/limit it.
Usually those actions will be performed by different groups.

The first group (programmers) uses a special API [1] through libibverbs [2]
without any notion of cgroups or any limitations. The second group (sysadmins)
is less interested in application specifics; for them, "device memory" means
"memory" and not "rdma, nic specific, internal memory".

[1] ibv_alloc_dm()
http://man7.org/linux/man-pages/man3/ibv_alloc_dm.3.html
https://www.openfabrics.org/images/2018workshop/presentations/304_LLiss_OnDeviceMemory.pdf
[2] https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/
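
As a concrete illustration of the programmer-side API in [1], a minimal
libibverbs sketch (error handling trimmed; 'ctx' is an already opened
ibv_context, and the 4KB size is an arbitrary example):

	#include <infiniband/verbs.h>

	static int dm_example(struct ibv_context *ctx, const void *buf, size_t len)
	{
		struct ibv_alloc_dm_attr attr = {
			.length = 4096,		/* bytes of on-device memory */
			.log_align_req = 0,
			.comp_mask = 0,
		};
		struct ibv_dm *dm = ibv_alloc_dm(ctx, &attr);

		if (!dm)
			return -1;
		ibv_memcpy_to_dm(dm, 0, buf, len);	/* host -> device memory */
		return ibv_free_dm(dm);
	}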

>
> I think we need to be careful about drawing the line between
> duplication and over couplings between subsystems.  I have other
> thoughts and concerns and I will try to organize them into a response
> in the next few days.
>
> Regards,
> Kenny
>
>
> > >
> > > Is AMD interested in collaborating to help shape this framework?
> > > It is intended to be device-neutral, so could be leveraged by various
> > > types of devices.
> > > If you have an alternative solution well underway, then maybe
> > > we can work together to merge our efforts into one.
> > > In the end, the DRM community is best served with common solution.
> > >
> > >
> > > >
> > > >>> and with future work, we could extend to:
> > > >>> *  track and control share of GPU time (reuse of cpu/cpuacct)
> > > >>> *  apply mask of allowed execution engines (reuse of cpusets)
> > > >>>
> > > >>> Instead of introducing a new cgroup subsystem for GPU devices, a new
> > > >>> framework is proposed to allow devices to register with existing 
> > > >>> cgroup
> > > >>> controllers, which creates per-device cgroup_subsys_state within the
> > > >>> cgroup.  This gives device drivers their own private cgroup controls
> > > >>> (such as memory limits or other parameters) to be applied to device
> > > >>> resources instead of host system resources.
> > > >>> Device drivers (GPU or other) are then able to reuse the existing 
> > > >>> cgroup
> > > >>> controls, instead of inventing similar ones.
> > > >>>
> > > >>> Per-device controls would be exposed in cgroup filesystem as:
> > > >>> 
> > > >>> mount/<root>/<subsys>.devices/<dev_name>/<subsys files>
> > > >>> such as (for example):
> > > >>> mount/<root>/memory.devices/<dev_name>/memory.max
> > > >>> mount/<root>/memory.devices/<dev_name>/memory.current
> > > >>> mount/<root>/cpu.devices/<dev_name>/cpu.stat
> > > >>> mount/<root>/cpu.devices/<dev_name>/cpu.weight
> > > >>>
> > > >>> The drm/i915 patch in this series is based on top of other RFC work 
> > > >>> [1]
> > > >>> for i915 device memory support.
> > > >>>
> > > >>> AMD [2] and Intel [3] have proposed related work in this area within 
> > > >>> the
> > > >>> last few years, listed below as reference.  This new RFC reuses existing
> > > >>> cgroup controllers and takes a different approach than prior work.

Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-05 Thread Leon Romanovsky
On Fri, May 03, 2019 at 02:14:33PM -0700, Welty, Brian wrote:
>
> On 5/2/2019 3:48 PM, Kenny Ho wrote:
> > On 5/2/2019 1:34 AM, Leon Romanovsky wrote:
> >> Count us (Mellanox) in too: our RDMA devices expose special and
> >> limited-in-size device memory to the users, and we would like to provide
> >> an option to use cgroups to control its exposure.
>
> Hi Leon, great to hear and happy to work with you and RDMA community
> to shape this framework for use by RDMA devices as well.  The intent
> was to support more than GPU devices.
>
> Incidentally, I also wanted to ask about the rdma cgroup controller
> and if there is interest in updating the device registration implemented
> in that controller.  It could use the cgroup_device_register() that is
> proposed here.   But this is perhaps future work, so can discuss separately.

I'll try to take a look later this week.

>
>
> > Doesn't RDMA already has a separate cgroup?  Why not implement it there?
> >
>
> Hi Kenny, I can't answer for Leon, but I'm hopeful he agrees with rationale
> I gave in the cover letter.  Namely, to implement in rdma controller, would
> mean duplicating existing memcg controls there.

Exactly, I didn't feel comfortable adding the notion of "device memory"
to the RDMA cgroup and postponed that decision to a later point in time.
RDMA operates with verbs objects and all our user space API is based around
that concept. In the end, a system administrator will have a hard time
understanding the differences between memcg and RDMA memory.

>
> Is AMD interested in collaborating to help shape this framework?
> It is intended to be device-neutral, so could be leveraged by various
> types of devices.
> If you have an alternative solution well underway, then maybe
> we can work together to merge our efforts into one.
> In the end, the DRM community is best served with common solution.
>
>
> >
> >>> and with future work, we could extend to:
> >>> *  track and control share of GPU time (reuse of cpu/cpuacct)
> >>> *  apply mask of allowed execution engines (reuse of cpusets)
> >>>
> >>> Instead of introducing a new cgroup subsystem for GPU devices, a new
> >>> framework is proposed to allow devices to register with existing cgroup
> >>> controllers, which creates per-device cgroup_subsys_state within the
> >>> cgroup.  This gives device drivers their own private cgroup controls
> >>> (such as memory limits or other parameters) to be applied to device
> >>> resources instead of host system resources.
> >>> Device drivers (GPU or other) are then able to reuse the existing cgroup
> >>> controls, instead of inventing similar ones.
> >>>
> >>> Per-device controls would be exposed in cgroup filesystem as:
> >>> mount/<root>/<subsys>.devices/<dev_name>/<subsys files>
> >>> such as (for example):
> >>> mount/<root>/memory.devices/<dev_name>/memory.max
> >>> mount/<root>/memory.devices/<dev_name>/memory.current
> >>> mount/<root>/cpu.devices/<dev_name>/cpu.stat
> >>> mount/<root>/cpu.devices/<dev_name>/cpu.weight
> >>>
> >>> The drm/i915 patch in this series is based on top of other RFC work [1]
> >>> for i915 device memory support.
> >>>
> >>> AMD [2] and Intel [3] have proposed related work in this area within the
> >>> last few years, listed below as reference.  This new RFC reuses existing
> >>> cgroup controllers and takes a different approach than prior work.
> >>>
> >>> Finally, some potential discussion points for this series:
> >>> * merge proposed <subsys>.devices into a single devices directory?
> >>> * allow devices to have multiple registrations for subsets of resources?
> >>> * document a 'common charging policy' for device drivers to follow?
> >>>
> >>> [1] https://patchwork.freedesktop.org/series/56683/
> >>> [2] 
> >>> https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> >>> [3] 
> >>> https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
> >>>
> >>>
> >>> Brian Welty (5):
> >>>   cgroup: Add cgroup_subsys per-device registration framework
> >>>   cgroup: Change kernfs_node for directories to store
> >>> cgroup_subsys_state
> >>>   memcg: Add per-device support to memory cgroup subsystem
> >>>   drm: Add memory cgroup registration and DRIVER_CGROUPS feature bit
> >>>   drm/i915: Use memory cgroup for enforcing device memory limit
> >>>
> >>>  drivers/

Re: [Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-02 Thread Leon Romanovsky
On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> In containerized or virtualized environments, there is desire to have
> controls in place for resources that can be consumed by users of a GPU
> device.  This RFC patch series proposes a framework for integrating
> use of existing cgroup controllers into device drivers.
> The i915 driver is updated in this series as our primary use case to
> leverage this framework and to serve as an example for discussion.
>
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
> *  control allocation of device memory (reuse of memcg)

Count us (Mellanox) in too: our RDMA devices expose special and
limited-in-size device memory to the users, and we would like to provide
an option to use cgroups to control its exposure.

> and with future work, we could extend to:
> *  track and control share of GPU time (reuse of cpu/cpuacct)
> *  apply mask of allowed execution engines (reuse of cpusets)
>
> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup.  This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.
>
> Per-device controls would be exposed in cgroup filesystem as:
> mount/<root>/<subsys>.devices/<dev_name>/<subsys files>
> such as (for example):
> mount/<root>/memory.devices/<dev_name>/memory.max
> mount/<root>/memory.devices/<dev_name>/memory.current
> mount/<root>/cpu.devices/<dev_name>/cpu.stat
> mount/<root>/cpu.devices/<dev_name>/cpu.weight
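
To make the proposed layout concrete, a hypothetical snippet capping
device memory for one device (all paths and names are illustrative,
not part of the RFC):

	#include <fcntl.h>
	#include <unistd.h>

	/* Cap device 'card0' at 256 MiB within cgroup 'jobs'. */
	static int cap_card0_memory(void)
	{
		int fd = open("/sys/fs/cgroup/jobs/memory.devices/card0/memory.max",
			      O_WRONLY);

		if (fd < 0)
			return -1;
		write(fd, "268435456", 9);	/* 256 MiB in bytes */
		close(fd);
		return 0;
	}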
>
> The drm/i915 patch in this series is based on top of other RFC work [1]
> for i915 device memory support.
>
> AMD [2] and Intel [3] have proposed related work in this area within the
> last few years, listed below as reference.  This new RFC reuses existing
> cgroup controllers and takes a different approach than prior work.
>
> Finally, some potential discussion points for this series:
> * merge proposed <subsys>.devices into a single devices directory?
> * allow devices to have multiple registrations for subsets of resources?
> * document a 'common charging policy' for device drivers to follow?
>
> [1] https://patchwork.freedesktop.org/series/56683/
> [2] https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [3] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
>
>
> Brian Welty (5):
>   cgroup: Add cgroup_subsys per-device registration framework
>   cgroup: Change kernfs_node for directories to store
> cgroup_subsys_state
>   memcg: Add per-device support to memory cgroup subsystem
>   drm: Add memory cgroup registration and DRIVER_CGROUPS feature bit
>   drm/i915: Use memory cgroup for enforcing device memory limit
>
>  drivers/gpu/drm/drm_drv.c  |  12 +
>  drivers/gpu/drm/drm_gem.c  |   7 +
>  drivers/gpu/drm/i915/i915_drv.c|   2 +-
>  drivers/gpu/drm/i915/intel_memory_region.c |  24 +-
>  include/drm/drm_device.h   |   3 +
>  include/drm/drm_drv.h  |   8 +
>  include/drm/drm_gem.h  |  11 +
>  include/linux/cgroup-defs.h|  28 ++
>  include/linux/cgroup.h |   3 +
>  include/linux/memcontrol.h |  10 +
>  kernel/cgroup/cgroup-v1.c  |  10 +-
>  kernel/cgroup/cgroup.c | 310 ++---
>  mm/memcontrol.c| 183 +++-
>  13 files changed, 552 insertions(+), 59 deletions(-)
>
> --
> 2.21.0
>

Re: [Intel-gfx] [PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-16 Thread Leon Romanovsky
On Mon, Jul 16, 2018 at 04:12:49PM -0700, Andrew Morton wrote:
> On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko  wrote:
>
> > From: Michal Hocko 
> >
> > There are several blockable mmu notifiers which might sleep in
> > mmu_notifier_invalidate_range_start and that is a problem for the
> > oom_reaper because it needs to guarantee a forward progress so it cannot
> > depend on any sleepable locks.
> >
> > Currently we simply back off and mark an oom victim with blockable mmu
> > notifiers as done after a short sleep. That can result in selecting a
> > new oom victim prematurely because the previous one still hasn't torn
> > its memory down yet.
> >
> > We can do much better though. Even if mmu notifiers use sleepable locks
> > there is no reason to automatically assume those locks are held.
> > Moreover majority of notifiers only care about a portion of the address
> > space and there is absolutely zero reason to fail when we are unmapping an
> > unrelated range. Many notifiers do really block and wait for HW which is
> > harder to handle and we have to bail out though.
> >
> > This patch handles the low hanging fruit. 
> > __mmu_notifier_invalidate_range_start
> > gets a blockable flag and callbacks are not allowed to sleep if the
> > flag is set to false. This is achieved by using trylock instead of the
> > sleepable lock for most callbacks and continue as long as we do not
> > block down the call chain.
>
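
As an illustration of the resulting contract, a driver callback would look
roughly like this (hypothetical driver struct, lock, and flush helper; the
signature matches the hunk quoted later in this thread):

	static int my_invalidate_range_start(struct mmu_notifier *mn,
					     struct mm_struct *mm,
					     unsigned long start,
					     unsigned long end,
					     bool blockable)
	{
		struct my_dev *dev = container_of(mn, struct my_dev, mn);

		if (blockable)
			mutex_lock(&dev->inval_lock);
		else if (!mutex_trylock(&dev->inval_lock))
			return -EAGAIN;	/* cannot sleep: ask caller to back off */

		my_dev_flush_range(dev, start, end);	/* hypothetical helper */
		mutex_unlock(&dev->inval_lock);
		return 0;
	}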
> I assume device driver developers are wondering "what does this mean
> for me".  As I understand it, the only time they will see
> blockable==false is when their driver is being called in response to an
> out-of-memory condition, yes?  So it is a very rare thing.

I can't say for everyone, but at least for me (mlx5), it is not a rare event.
I'm seeing OOM very often while running my tests in low-memory VMs.

Thanks

>
> Any suggestions regarding how the driver developers can test this code
> path?  I don't think we presently have a way to fake an oom-killing
> event?  Perhaps we should add such a thing, given the problems we're
> having with that feature.
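
As an illustration, a minimal userspace hog like the one below will
reliably drive a low-memory VM (or a tightly limited memcg) into the
OOM killer; it is a generic reproducer, not a kernel test hook:

	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		for (;;) {
			char *p = malloc(1 << 20);	/* 1 MiB */

			if (p)	/* touch it so pages are really backed */
				memset(p, 0xaa, 1 << 20);
		}
	}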




Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-16 Thread Leon Romanovsky
On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > because I have no idea how.
> > > >
> > > > Any further feedback is highly appreciated of course.
> > >
> > > Any other feedback before I post this as non-RFC?
> >
> > From the mlx5 perspective, which is the primary user of umem_odp.c, your
> > change looks ok.
>
> Can I assume your Acked-by?
>
> Thanks for your review!

For mlx and umem_odp pieces,
Acked-by: Leon Romanovsky 

Thanks




Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Leon Romanovsky
On Wed, Jul 11, 2018 at 01:13:18PM +0200, Michal Hocko wrote:
> On Wed 11-07-18 13:14:47, Leon Romanovsky wrote:
> > On Wed, Jul 11, 2018 at 11:03:53AM +0200, Michal Hocko wrote:
> > > On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> > > > On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > > > > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > > > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > > > > This is the v2 of RFC based on the feedback I've received so 
> > > > > > > > far. The
> > > > > > > > code even compiles as a bonus ;) I haven't runtime tested it 
> > > > > > > > yet, mostly
> > > > > > > > because I have no idea how.
> > > > > > > >
> > > > > > > > Any further feedback is highly appreciated of course.
> > > > > > >
> > > > > > > Any other feedback before I post this as non-RFC?
> > > > > >
> > > > > > From the mlx5 perspective, which is the primary user of umem_odp.c,
> > > > > > your change looks ok.
> > > > >
> > > > > Can I assume your Acked-by?
> > > >
> > > > I didn't have a chance to test it because it applies on our rdma-next, 
> > > > but
> > > > fails to compile.
> > >
> > > What is the compilation problem? Is it caused by the patch or some other
> > > unrelated changed?
> >
> > Thanks for pushing me to take a look at it.
> > Your patch needs the following hunk to properly compile at least on my 
> > system.
>
> I suspect you were trying the original version. I've posted an updated
> patch here http://lkml.kernel.org/r/20180627074421.gf32...@dhcp22.suse.cz
> and all these issues should be fixed there. Including many other fixes.
>

Ohh, you used --reply-to; IMHO it is the best way to make sure that the
patch will be lost :)

> Could you have a look at that one please?

I grabbed it, the results will be overnight only.

Thanks

> --
> Michal Hocko
> SUSE Labs




Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-11 Thread Leon Romanovsky
On Wed, Jul 11, 2018 at 11:03:53AM +0200, Michal Hocko wrote:
> On Tue 10-07-18 19:20:20, Leon Romanovsky wrote:
> > On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> > > On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > > > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > > > This is the v2 of RFC based on the feedback I've received so far. 
> > > > > > The
> > > > > > code even compiles as a bonus ;) I haven't runtime tested it yet, 
> > > > > > mostly
> > > > > > because I have no idea how.
> > > > > >
> > > > > > Any further feedback is highly appreciated of course.
> > > > >
> > > > > Any other feedback before I post this as non-RFC?
> > > >
> > > > From the mlx5 perspective, which is the primary user of umem_odp.c,
> > > > your change looks ok.
> > >
> > > Can I assume your Acked-by?
> >
> > I didn't have a chance to test it because it applies on our rdma-next, but
> > fails to compile.
>
> What is the compilation problem? Is it caused by the patch or some other
> unrelated changed?

Thanks for pushing me to take a look at it.
Your patch needs the following hunk to properly compile at least on my system.

I'll take it to our regression.

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 369867501bed..1f364a157097 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -155,9 +155,9 @@ struct mmu_notifier_ops {
 * cannot block, mmu_notifier_ops.flags should have
 * MMU_INVALIDATE_DOES_NOT_BLOCK set.
 */
-   void (*invalidate_range_start)(struct mmu_notifier *mn,
+   int (*invalidate_range_start)(struct mmu_notifier *mn,
   struct mm_struct *mm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end, 
bool blockable);
void (*invalidate_range_end)(struct mmu_notifier *mn,
 struct mm_struct *mm,
 unsigned long start, unsigned long end);
@@ -229,7 +229,7 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm,
 unsigned long address);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
  unsigned long address, pte_t pte);
-extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+extern int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
  unsigned long start, unsigned long end,
  bool blockable);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6adac113e96d..92f70e4c6252 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,7 +95,7 @@ static inline int check_stable_address_space(struct mm_struct 
*mm)
return 0;
 }

-void __oom_reap_task_mm(struct mm_struct *mm);
+bool __oom_reap_task_mm(struct mm_struct *mm);

 extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7e0c6e78ae5c..7c7bd6f3298e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1,6 +1,6 @@
 /*
  *  linux/mm/oom_kill.c
- *
+ *
  *  Copyright (C)  1998,2000  Rik van Riel
  * Thanks go out to Claus Fischer for some serious inspiration and
  * for goading me into coding this file...
@@ -569,7 +569,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, 
struct mm_struct *mm)
if (!__oom_reap_task_mm(mm)) {
up_read(&mm->mmap_sem);
ret = false;
-   goto out_unlock;
+   goto unlock_oom;
}

pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, 
file-rss:%lukB, shmem-rss:%lukB\n",

Thanks

> --
> Michal Hocko
> SUSE Labs




Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-10 Thread Leon Romanovsky
On Tue, Jul 10, 2018 at 04:14:10PM +0200, Michal Hocko wrote:
> On Tue 10-07-18 16:40:40, Leon Romanovsky wrote:
> > On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> > > On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > > > This is the v2 of RFC based on the feedback I've received so far. The
> > > > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > > > because I have no idea how.
> > > >
> > > > Any further feedback is highly appreciated of course.
> > >
> > > Any other feedback before I post this as non-RFC?
> >
> > From the mlx5 perspective, which is the primary user of umem_odp.c, your
> > change looks ok.
>
> Can I assume your Acked-by?

I didn't have a chance to test it because it applies on our rdma-next, but
fails to compile.

Thanks

>
> Thanks for your review!
> --
> Michal Hocko
> SUSE Labs
>




Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

2018-07-10 Thread Leon Romanovsky
On Mon, Jul 09, 2018 at 02:29:08PM +0200, Michal Hocko wrote:
> On Wed 27-06-18 09:44:21, Michal Hocko wrote:
> > This is the v2 of RFC based on the feedback I've received so far. The
> > code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
> > because I have no idea how.
> >
> > Any further feedback is highly appreciated of course.
>
> Any other feedback before I post this as non-RFC?

From the mlx5 perspective, which is the primary user of umem_odp.c, your change looks ok.

Thanks




Re: [Intel-gfx] [PATCH rdma-next 01/21] drm/i915: Move u64-to-ptr helpers to general header

2018-05-15 Thread Leon Romanovsky
On Mon, May 14, 2018 at 02:10:54PM -0600, Jason Gunthorpe wrote:
> On Thu, May 03, 2018 at 04:36:55PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leo...@mellanox.com>
> >
> > The macro u64_to_ptr() and function ptr_to_u64() are useful enough
> > to be part of a general header, so move them there and allow the RDMA
> > subsystem to reuse them.
> >
> > Signed-off-by: Leon Romanovsky <leo...@mellanox.com>
> > ---
> >  drivers/gpu/drm/i915/i915_utils.h | 12 ++--
> >  include/linux/kernel.h| 12 
> >  2 files changed, 14 insertions(+), 10 deletions(-)
>
> This patch and the next one to kernel.h will need to be Ack'd be
> someone.. But I am not sure who.. AndrewM perhaps?

Why is that?

kernel.h doesn't have an explicit maintainer and is managed by
community effort.

Thanks

>
> Jason




[Intel-gfx] [PATCH rdma-next 01/21] drm/i915: Move u64-to-ptr helpers to general header

2018-05-04 Thread Leon Romanovsky
From: Leon Romanovsky <leo...@mellanox.com>

The macro u64_to_ptr() and function ptr_to_u64() are useful enough
to be part of a general header, so move them there and allow the RDMA
subsystem to reuse them.
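
For illustration (not part of the patch), the round trip these helpers
provide, e.g. for stashing a pointer in a u64 field:

	struct foo { int x; };
	struct foo f = { .x = 1 };

	u64 cookie = ptr_to_u64(&f);			/* pointer -> u64 */
	struct foo *p = u64_to_ptr(struct foo, cookie);	/* u64 -> typed pointer */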

Signed-off-by: Leon Romanovsky <leo...@mellanox.com>
---
 drivers/gpu/drm/i915/i915_utils.h | 12 ++--
 include/linux/kernel.h| 12 
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 51dbfe5bb418..de3bfda7bf96 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -25,6 +25,8 @@
 #ifndef __I915_UTILS_H
 #define __I915_UTILS_H

+#include 
+
 #undef WARN_ON
 /* Many gcc seem to no see through this and fall over :( */
 #if 0
@@ -102,16 +104,6 @@
__T;\
 })

-static inline u64 ptr_to_u64(const void *ptr)
-{
-   return (uintptr_t)ptr;
-}
-
-#define u64_to_ptr(T, x) ({\
-   typecheck(u64, x);  \
-   (T *)(uintptr_t)(x);\
-})
-
 #define __mask_next_bit(mask) ({   \
int __idx = ffs(mask) - 1;  \
mask &= ~BIT(__idx);\
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 6a1eb0b0aad9..a738393c9694 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -70,6 +70,18 @@
  */
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

+static inline u64 ptr_to_u64(const void *ptr)
+{
+   return (uintptr_t)ptr;
+}
+
+#define u64_to_ptr(T, x) ( \
+{  \
+   typecheck(u64, x);  \
+   (T *)(uintptr_t)(x);\
+}  \
+)
+
 #define u64_to_user_ptr(x) (   \
 {  \
typecheck(u64, x);  \
--
2.14.3



Re: [Intel-gfx] [PATCH 00/12] radix-tree: split out struct radix_tree_root out to <linux/radix-tree-root.h>

2017-10-10 Thread Leon Romanovsky
On Mon, Oct 09, 2017 at 01:10:01AM +0900, Masahiro Yamada wrote:

<...>
>
> By splitting out the radix_tree_root definition,
> we can reduce the header file dependency.
>
> Reducing the header dependency will help for speeding the kernel
> build, suppressing unnecessary recompile of objects during
> git-bisect'ing, etc.
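
(As an illustrative sketch of the intended pattern: a header that only
embeds the root can include the new, smaller header from this series
instead of the full radix-tree API; header name taken from the diffstat
below:)

	#include <linux/radix-tree-root.h>	/* introduced by this series */

	struct my_cache {
		struct radix_tree_root entries;	/* embedded root only */
	};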

If we judge by the diffstat of this series, there won't be any
visible change in anything mentioned above.

<...>

>
> Masahiro Yamada (12):
>   radix-tree: replace  with 
>   radix-tree: split struct radix_tree_root to <linux/radix-tree-root.h>
>   irqdomain: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   writeback: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   iocontext.h: replace <linux/radix-tree.h> with
>     <linux/radix-tree-root.h>
>   fs: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   blkcg: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   fscache: include <linux/radix-tree.h>
>   sh: intc: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   net/mlx4: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   net/mlx5: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>   drm/i915: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
>
>  drivers/gpu/drm/i915/i915_gem.c|  1 +
>  drivers/gpu/drm/i915/i915_gem_context.c|  1 +
>  drivers/gpu/drm/i915/i915_gem_context.h|  2 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  1 +
>  drivers/gpu/drm/i915/i915_gem_object.h |  1 +
>  drivers/net/ethernet/mellanox/mlx4/cq.c|  1 +
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  2 +-
>  drivers/net/ethernet/mellanox/mlx4/qp.c|  1 +
>  drivers/net/ethernet/mellanox/mlx4/srq.c   |  1 +
>  drivers/sh/intc/internals.h|  2 +-
>  include/linux/backing-dev-defs.h   |  2 +-
>  include/linux/blk-cgroup.h |  2 +-
>  include/linux/fs.h |  2 +-
>  include/linux/fscache.h|  1 +
>  include/linux/iocontext.h  |  2 +-
>  include/linux/irqdomain.h  |  2 +-
>  include/linux/mlx4/device.h|  2 +-
>  include/linux/mlx4/qp.h|  1 +
>  include/linux/mlx5/driver.h|  2 +-
>  include/linux/mlx5/qp.h|  1 +
>  include/linux/radix-tree-root.h| 24 
>  include/linux/radix-tree.h |  8 ++--
>  22 files changed, 46 insertions(+), 16 deletions(-)
>  create mode 100644 include/linux/radix-tree-root.h
>
> --
> 2.7.4
>




Re: [Intel-gfx] [PATCH 00/12] radix-tree: split out struct radix_tree_root out to <linux/radix-tree-root.h>

2017-10-10 Thread Leon Romanovsky
On Mon, Oct 09, 2017 at 02:58:58PM +0900, Masahiro Yamada wrote:
> 2017-10-09 3:52 GMT+09:00 Leon Romanovsky <leo...@mellanox.com>:
> > On Mon, Oct 09, 2017 at 01:10:01AM +0900, Masahiro Yamada wrote:
> >
> > <...>
> >>
> >> By splitting out the radix_tree_root definition,
> >> we can reduce the header file dependency.
> >>
> >> Reducing the header dependency will help for speeding the kernel
> >> build, suppressing unnecessary recompile of objects during
> >> git-bisect'ing, etc.
> >
> > If we judge by the diffstat of this series, there won't be any
> > visible change in anything mentioned above.
>
>
> Of course, judging by the diffstat is wrong.
>

I'm more than happy to be wrong and you for sure can help me.
Can you provide any quantitative support of your claims?

Thanks

>
>
> --
> Best Regards
> Masahiro Yamada

