On 01/02/2018 14:10, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> [...]
>>>> BTW, what's the root cause for requiring HVAs in the buffer?
>>>
>>> It's a side effect of the kernel/userspace API which always wants
>>> a single HVA/len pair to map memory for the application.
>>>
>>
>> Hi Eduardo and Michael,
>>
>>>> Can this be fixed?
>>>
>>> I think yes. It'd need to be a kernel patch for the RDMA subsystem
>>> mapping an s/g list with actual memory. The HVA/len pair would then just
>>> be used to refer to the region, without creating the two mappings.
>>>
>>> Something like splitting the register mr into
>>>
>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>
>>> addmemory(mr, offset, hva, len) - pin memory
>>>
>>> register mr - pass it to HW
>>>
>>> As a nice side effect we won't burn so much virtual address space.
>>>
>>
>> We would still need a contiguous virtual address space range (for post-send)
>> which we don't have since guest contiguous virtual address space
>> will always end up as non-contiguous host virtual address space.
>>
>> I am not sure the RDMA HW can handle a large VA with holes.
>
> I'm confused. Why would the hardware see and care about virtual
> addresses?
The post-send operations bypass the kernel, and the process puts
GVAs in the work request.

> How exactly does the hardware translate VAs to
> PAs?

The HW maintains a page-directory-like structure, separate from the
MMU's, that maps VA -> phys pages.

> What if the process page tables change?
>

Since the page tables the HW uses are its own, we just need the phys
pages to be pinned.

>>
>> An alternative would be 0-based MR, QEMU intercepts the post-send
>> operations and can subtract the guest VA base address.
>> However I didn't see the implementation in kernel for 0 based MRs
>> and also the RDMA maintainer said it would work for local keys
>> and not for remote keys.
>
> This is also unexpected: are GVAs visible to the virtual RDMA
> hardware?

Yes, explained above.

> Where does the QEMU pvrdma code translate GVAs to
> GPAs?
>

During reg_mr (memory registration commands).
Then it registers the same addresses to the real HW
(as host virtual addresses).

Thanks,
Marcel

>>
>>> This will fix rdma with hugetlbfs as well which is currently broken.
>>>
>>
>> There is already a discussion on the linux-rdma list:
>> https://www.spinics.net/lists/linux-rdma/msg60079.html
>> But it will take some (actually a lot of) time, we are currently talking
>> about a possible API. And it does not solve the re-mapping...
>>
>> Thanks,
>> Marcel
>>
>>>> --
>>>> Eduardo
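
To make the translation discussed above a bit more concrete, here is a
rough sketch in C of a device-private VA -> phys page lookup and of the
base-address subtraction a 0-based MR would need. This is a toy model,
not pvrdma or kernel code: every name in it (toy_mr, toy_mr_translate,
toy_gva_to_mr_offset, the TOY_* constants) is hypothetical, and it
assumes 4 KiB pages and a page-aligned MR base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and a small fixed-size table, for illustration. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  8

/* Hypothetical device-side view of one memory region (MR). */
struct toy_mr {
    uint64_t base_va;              /* VA the MR was registered with (page-aligned) */
    uint64_t len;                  /* length of the MR in bytes */
    uint64_t pages[TOY_MAX_PAGES]; /* pinned physical page addresses, in VA order */
};

/*
 * Translate a VA taken from a work request using the device's own table.
 * The physical pages need not be contiguous; only the VA range is.
 * Returns 0 on success, -1 if the VA lies outside the MR.
 */
static int toy_mr_translate(const struct toy_mr *mr, uint64_t va, uint64_t *pa)
{
    if (va < mr->base_va || va - mr->base_va >= mr->len)
        return -1;

    uint64_t off = va - mr->base_va;
    size_t idx = (size_t)(off >> TOY_PAGE_SHIFT);

    /* Invariant of this toy model: the MR fits in the fixed-size table. */
    assert(idx < TOY_MAX_PAGES);

    *pa = mr->pages[idx] + (off & (TOY_PAGE_SIZE - 1));
    return 0;
}

/*
 * 0-based MR variant: whoever intercepts post-send subtracts the guest
 * base VA, so the work request carries an offset into the MR instead.
 */
static uint64_t toy_gva_to_mr_offset(uint64_t gva, uint64_t guest_base_va)
{
    return gva - guest_base_va;
}
```

The lookup depends only on the table filled in at registration time,
which is why changes to the process page tables do not matter as long as
the pages stay pinned.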