Re: [ewg] New Open MPI SRPM uploaded (v1.3)

2009-01-21 Thread Jeff Squyres

Thanks!  It was a long time coming.  :-)

On Jan 21, 2009, at 2:17 AM, Tziporet Koren wrote:


Jeff Squyres wrote:
Open MPI v1.3 SRPM has been uploaded to ~jsquyres/ofed_1.4, and I  
updated latest.txt.



Congratulations on the 1.3 release
Tziporet



--
Jeff Squyres
Cisco Systems

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] New Open MPI SRPM uploaded (v1.3)

2009-01-21 Thread Steve Wise

Indeed! Congrats!

This version supports iWARP.



Jeff Squyres wrote:

Thanks!  It was a long time coming.  :-)

On Jan 21, 2009, at 2:17 AM, Tziporet Koren wrote:


Jeff Squyres wrote:
Open MPI v1.3 SRPM has been uploaded to ~jsquyres/ofed_1.4, and I 
updated latest.txt.



Congratulations on the 1.3 release
Tziporet







[ewg] Re: [PATCH] mlx4_ib: Optimize hugetlb pages support

2009-01-21 Thread Roland Dreier
  Since Linux does not merge adjacent pages into a single scatter entry through
  calls to dma_map_sg(), we do this for huge pages which are guaranteed to be
  comprised of adjacent natural pages of size PAGE_SIZE. This will result in a
  significantly lower number of MTT segments used for registering hugetlb
  memory regions.

Actually, on some platforms adjacent pages will be put into a single sg
entry (e.g. look at VMERGE under arch/powerpc), so your
mlx4_ib_umem_write_huge_mtt() function won't work in such cases.  In any
case, relying on such a fragile implementation detail with no checking is
not a good idea for future maintainability.  So this needs to be
reimplemented to work no matter how the DMA mapping layer presents the
hugetlb region (and also to fail gracefully if the bus addresses returned
from DMA mapping don't work with the huge page size -- since it would be
correct for the DMA mapping layer to map every PAGE_SIZE chunk to an
arbitrary bus address, e.g. if the IOMMU address space becomes very
fragmented).

  @@ -142,15 +176,34 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
   
   	n = ib_umem_page_count(mr->umem);
   	shift = ilog2(mr->umem->page_size);
  -
  -	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
  +	if (mr->umem->hugetlb) {
  +		nhuge = ALIGN(n << shift, HPAGE_SIZE) >> HPAGE_SHIFT;
  +		shift_huge = HPAGE_SHIFT;
  +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
  +				    convert_access(access_flags), nhuge, shift_huge, &mr->mmr);
  +		if (err)
  +			goto err_umem;
  +
  +		err = mlx4_ib_umem_write_huge_mtt(dev, &mr->mmr.mtt, mr->umem, nhuge);
  +		if (err) {
  +			if (err != -EAGAIN)
  +				goto err_mr;
  +			else {
  +				mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
  +				goto regular_pages;
  +			}
  +		}
  +	} else {
  +regular_pages:
  +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
   			    convert_access(access_flags), n, shift, &mr->mmr);
  -	if (err)
  -		goto err_umem;
  +		if (err)
  +			goto err_umem;
   
  -	err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
  -	if (err)
  -		goto err_mr;
  +		err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
  +		if (err)
  +			goto err_mr;
  +	}
   
   	err = mlx4_mr_enable(dev->dev, &mr->mmr);
   	if (err)
   
   err = mlx4_mr_enable(dev-dev, mr-mmr);
   if (err)

Also I think this could be made much clearer by putting the huge page
handling code in a separate function like mlx4_ib_reg_hugetlb_user_mr()
and then you could just do something like

if (!mr->umem->hugetlb || mlx4_ib_reg_hugetlb_user_mr(...)) {
	// usual code path
}

rather than having to use a goto into another block.

 - R.