Re: [ewg] OFED (EWG) meeting agenda for tomorrow (Jan 26)
> > 3. OFED 1.5 schedule
> >
> > Betsy from Qlogic suggested moving the release earlier. On the other
> > hand, Olga from Voltaire asked to stay with the July time frame.
> >
> > Based on the decisions in 1 & 2 we should decide on the release schedule.

We should decide whether we want to have one or two OFED releases per year. If we decide to go with one OFED release per year, I think we should postpone the OFED 1.5 release to October and have a dot release in the middle.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] OFED (EWG) meeting agenda for tomorrow (Jan 26)
These are the agenda items for the meeting tomorrow:

1. Decide on a 1.4.1 release
   If yes - what is the scope of the release (suggestions: RH 5.3, SLES 11, RDS with iWARP, Open MPI 1.3)?
   According to the decisions we took for point releases, we add only new OSes and critical bug fixes. Since some of the above features do not meet this criterion, we need to decide.

2. OFED 1.5 kernel base
   In the last meeting we decided on 2.6.29. However, there is a concern: 2.6.29 is already in its release phase (RC2 is out), so any new kernel code we develop will be posted for 2.6.30 only, and we would then need to back-port it if we want to take it into 1.5. Thus it seems more reasonable to use 2.6.30 as the kernel base.

3. OFED 1.5 schedule
   Betsy from Qlogic suggested moving the release earlier. On the other hand, Olga from Voltaire asked to stay with the July time frame.
   Based on the decisions in 1 & 2 we should decide on the release schedule.

Tziporet
Re: [ewg] RE: RHEL 5.3 and OFED 1.4.x
Woodruff, Robert J wrote:

> Personally I do not have a problem with including it, since MPI is an
> isolated component and does not affect the core stack, but I thought
> we had discussed in Sonoma last year not including major new features
> in point releases, to reduce the QA that is needed. And, in general, I
> think that is the way kernel.org works: point releases are just for
> bug fixes. In any case, let's discuss it again in the EWG on Monday.

I will add this to the agenda.
Note that we will start working on adding the RH 5.3 backports now, to see how much effort it is.

Tziporet
[ewg] [PATCH v2] mlx4_ib: Optimize hugetlb pages support
Since Linux may not merge adjacent pages into a single scatter entry through calls to dma_map_sg(), we check for the special case of hugetlb pages, which are likely to be mapped to contiguous DMA addresses, and take advantage of this when they are. This results in a significantly lower number of MTT segments used for registering hugetlb memory regions.

Signed-off-by: Eli Cohen
---
 drivers/infiniband/hw/mlx4/mr.c | 80 ++
 1 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 8e4d26d..7748823 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -119,6 +119,65 @@ out:
 	return err;
 }
 
+static int handle_hugetlb_user_mr(struct ib_pd *pd, struct mlx4_ib_mr *mr,
+				  u64 virt_addr, int access_flags)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	struct mlx4_ib_dev *dev = to_mdev(pd->device);
+	struct ib_umem_chunk *chunk;
+	unsigned dsize;
+	dma_addr_t daddr;
+	unsigned cur_size = 0;
+	dma_addr_t uninitialized_var(cur_addr);
+	int n;
+	struct ib_umem *umem = mr->umem;
+	u64 *arr;
+	int err = 0;
+	int i;
+	int j = 0;
+
+	n = PAGE_ALIGN(umem->length + (umem->address & ~HPAGE_MASK)) >> HPAGE_SHIFT;
+	arr = kmalloc(n * sizeof *arr, GFP_KERNEL);
+	if (!arr)
+		return -ENOMEM;
+
+	list_for_each_entry(chunk, &umem->chunk_list, list)
+		for (i = 0; i < chunk->nmap; ++i) {
+			daddr = sg_dma_address(&chunk->page_list[i]);
+			dsize = sg_dma_len(&chunk->page_list[i]);
+			if (!cur_size) {
+				cur_addr = daddr;
+				cur_size = dsize;
+			} else if (cur_addr + cur_size != daddr) {
+				err = -EINVAL;
+				goto out;
+			} else
+				cur_size += dsize;
+
+			if (cur_size > HPAGE_SIZE) {
+				err = -EINVAL;
+				goto out;
+			} else if (cur_size == HPAGE_SIZE) {
+				cur_size = 0;
+				arr[j++] = cur_addr;
+			}
+		}
+
+	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, umem->length,
+			    convert_access(access_flags), n, HPAGE_SHIFT, &mr->mmr);
+	if (err)
+		goto out;
+
+	err = mlx4_write_mtt(dev->dev, &mr->mmr.mtt, 0, n, arr);
+
+out:
+	kfree(arr);
+	return err;
+#else
+	return -ENOSYS;
+#endif
+}
+
 struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata)
@@ -140,17 +199,20 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err_free;
 	}
 
-	n = ib_umem_page_count(mr->umem);
-	shift = ilog2(mr->umem->page_size);
+	if (!mr->umem->hugetlb ||
+	    handle_hugetlb_user_mr(pd, mr, virt_addr, access_flags)) {
+		n = ib_umem_page_count(mr->umem);
+		shift = ilog2(mr->umem->page_size);
 
-	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
-			    convert_access(access_flags), n, shift, &mr->mmr);
-	if (err)
-		goto err_umem;
+		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
+				    convert_access(access_flags), n, shift, &mr->mmr);
+		if (err)
+			goto err_umem;
 
-	err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
-	if (err)
-		goto err_mr;
+		err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
+		if (err)
+			goto err_mr;
+	}
 
 	err = mlx4_mr_enable(dev->dev, &mr->mmr);
 	if (err)
-- 
1.6.1
[ewg] [PATCH] ib_core: save process's virtual address in struct ib_umem
Add an "address" field to struct ib_umem so that low-level drivers have this information, which may be needed in order to correctly calculate the number of huge pages.

Signed-off-by: Eli Cohen
---
 drivers/infiniband/core/umem.c | 1 +
 include/rdma/ib_umem.h         | 1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 6f7c096..4c076c4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -102,6 +102,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->context   = context;
 	umem->length    = size;
 	umem->offset    = addr & ~PAGE_MASK;
+	umem->address   = addr;
 	umem->page_size = PAGE_SIZE;
 	/*
 	 * We ask for writable memory if any access flags other than
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e..c385bb6 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -43,6 +43,7 @@ struct ib_umem {
 	struct ib_ucontext *context;
 	size_t		length;
 	int		offset;
+	unsigned long	address;
 	int		page_size;
 	int		writable;
 	int		hugetlb;
-- 
1.6.1
Re: [ewg] Re: [PATCH v1] mlx4_ib: Optimize hugetlb pages support
On Thu, Jan 22, 2009 at 09:07:41PM -0800, Roland Dreier wrote:

> seems this might underestimate by 1 if the region doesn't start/end on
> a huge-page aligned boundary (but we would still want to use big pages
> to register it).

Looks like we must pass the virtual address through struct ib_umem to the low-level driver.

> I think we could avoid the uninitialized_var() stuff and having
> restart at all by just doing cur_size = 0 at the start of the loop,
> and then instead of if (restart) just test if cur_size is 0.

Initializing cur_size and eliminating restart works fine, but cur_addr still needs this trick. I am sending two patches, one for ib_core and one for mlx4.