> Since Linux does not merge adjacent pages into a single scatter entry
> through calls to dma_map_sg(), we do this for huge pages which are
> guaranteed to be comprised of adjacent natural pages of size PAGE_SIZE.
> This will result in a significantly lower number of MTT segments used
> for registering hugetlb memory regions.
Actually, on some platforms adjacent pages will be put into a single sg
entry (e.g. look at VMERGE under arch/powerpc), so your
mlx4_ib_umem_write_huge_mtt() function won't work in such cases. In any
case, relying on such a fragile implementation detail with no checking is
not a good idea for future maintainability. So this needs to be
reimplemented to work no matter how the DMA mapping layer presents the
hugetlb region, and also to fail gracefully if the bus addresses returned
from DMA mapping don't work with the huge page size -- since it would be
correct for the DMA mapping layer to map every PAGE_SIZE chunk to an
arbitrary bus address, e.g. if the IOMMU address space becomes very
fragmented.

> @@ -142,15 +176,34 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
>
>  	n = ib_umem_page_count(mr->umem);
>  	shift = ilog2(mr->umem->page_size);
> -
> -	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
> +	if (mr->umem->hugetlb) {
> +		nhuge = ALIGN(n << shift, HPAGE_SIZE) >> HPAGE_SHIFT;
> +		shift_huge = HPAGE_SHIFT;
> +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
> +				    convert_access(access_flags), nhuge, shift_huge, &mr->mmr);
> +		if (err)
> +			goto err_umem;
> +
> +		err = mlx4_ib_umem_write_huge_mtt(dev, &mr->mmr.mtt, mr->umem, nhuge);
> +		if (err) {
> +			if (err != -EAGAIN)
> +				goto err_mr;
> +			else {
> +				mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
> +				goto regular_pages;
> +			}
> +		}
> +	} else {
> +regular_pages:
> +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
>  			    convert_access(access_flags), n, shift, &mr->mmr);
> -	if (err)
> -		goto err_umem;
> +		if (err)
> +			goto err_umem;
>
> -	err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
> -	if (err)
> -		goto err_mr;
> +		err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
> +		if (err)
> +			goto err_mr;
> +	}
>
>  	err = mlx4_mr_enable(dev->dev, &mr->mmr);
>  	if (err)

Also, I think this could be made much clearer by putting the huge page
handling code in a separate function like mlx4_ib_reg_hugetlb_user_mr(),
and then you could just do something like

	if (!mr->umem->hugetlb || mlx4_ib_reg_hugetlb_user_mr(...)) {
		/* usual code path */
	}

rather than having to use a goto into another block.

 - R.

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg