On Tue, Jun 04, 2013 at 02:50:33PM +0300, Haggai Eran wrote:

> Our HCAs use their own page tables, in addition to a TLB cache. A miss
> in the TLB cache that can be filled from the HCA's page tables will not
> cause an RNR NAK, since the HCA can fill it relatively fast without the
> help of the operating system. If the page is missing from the HCA's page
> table though it will trigger a page fault and ask the OS to bring that
> page. Since this might take longer, in these cases we send an RNR NAK.

I also saw the presentation at the OFA conference and had several
questions..

So, my assumption:
 - There is a fast small TLB inside the HCA
 - There is a larger page table the HCA accesses inside the host
   memory

AFAIK, this is basically the construction we have today, and the
larger page table is expected to be fully populated.

Thus, I assume, on-demand allows pages that are 'absent' in the larger
page table to generate faults to the CPU?

So how does lifetime work here?
 - Can you populate the larger page table as soon as registration
   happens, relying on mmu notifier and HCA faults to keep it
   consistent?
 - After a fault happens are the faulted pages pinned? How does
   lifetime work here? What happens when the kernel wants to evict a
   page that has currently ongoing RDMA? What happens if user space
   munmaps something while the remote is doing RDMA to it?
 - If I recall the presentation, the fault-in operation was very slow,
   what is the cause for this?

> > He was very concerned about what the size of the TLB on the HCA,
> > and therefore what the actual run-time behavior would be for
> > sending around large messages via MPI -- i.e., would RDMA'ing 1GB
> > messages now incur this
> > HCA-must-reload-its-TLB-and-therefore-incur-RNR-NAKs behavior?
> >
> We have a mechanism to prefetch the pages needed for a large message
> upon the first page fault, which can also help amortizing the cost of
> the page fault for larger messages.

My reaction was that a pre-fault WR is needed to make this
performant.

But, I also don't fully understand why we need so many faults from the
HCA in the first place. If you've properly solved the lifetime issues
then the initial registration can meaningfully pre-initialize the page
table in many cases, and computing the physical address of a page
should not be so expensive.
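
To make the comparison concrete, here is a toy model of the two-level
lookup as I understand it from the description above. It is only a
sketch: ToyHCA, register(), invalidate(), access() and transfer() are
names made up for illustration, the FIFO TLB and the one-NAK-per-fault
accounting are assumptions, and none of this reflects the real HCA or
driver behavior. It just counts how many RNR NAKs a large message would
see under three policies: an empty table, prefetch on first fault, and
a table populated at registration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHCA:
    """Hypothetical model: small on-device TLB plus a larger page table."""
    tlb_capacity: int = 4            # assumed size, purely illustrative
    prefetch_on_fault: bool = False  # prefetch the whole message on a fault
    tlb: list = field(default_factory=list)       # FIFO of cached pages
    page_table: set = field(default_factory=set)  # pages the HCA can resolve
    rnr_naks: int = 0

    def register(self, first_page, npages, prepopulate):
        """Registration; optionally fill the page table up front."""
        if prepopulate:
            self.page_table.update(range(first_page, first_page + npages))

    def invalidate(self, page):
        """Stand-in for an mmu-notifier style callback dropping a mapping."""
        self.page_table.discard(page)
        if page in self.tlb:
            self.tlb.remove(page)

    def _tlb_fill(self, page):
        # Evict the oldest entry when the TLB is full (assumed FIFO policy).
        if len(self.tlb) >= self.tlb_capacity:
            self.tlb.pop(0)
        self.tlb.append(page)

    def access(self, page, message_pages=()):
        if page in self.tlb:
            return "tlb-hit"
        if page in self.page_table:
            # TLB miss served from the HCA page table: no OS help, no NAK.
            self._tlb_fill(page)
            return "table-hit"
        # Page absent from the HCA page table: fault to the OS, RNR NAK.
        self.rnr_naks += 1
        self.page_table.add(page)
        if self.prefetch_on_fault:
            self.page_table.update(message_pages)
        self._tlb_fill(page)
        return "fault"


def transfer(hca, first_page, npages):
    """Touch every page of one message; return the RNR NAK count so far."""
    pages = range(first_page, first_page + npages)
    for page in pages:
        hca.access(page, pages)
    return hca.rnr_naks
```

In this model a 256-page message costs 256 NAKs with an empty table, 1
with prefetch on the first fault, and 0 when registration populates the
table, regardless of how small the TLB is. A later invalidate() of a
page brings the fault back for that page only, which is the case the
lifetime questions above are about.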

Jason