Re: [openib-general] Re: RDMA memory registration
On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:
> We believe the IOPROC patch is generic and powerful and would allow other
> RDMA NICs to solve the page registration problems in a different manner.
> For NICs which require page registration, new VM hooks can be used to avoid
> pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> patch has such a hook.

The key phrase here is "avoid pages being unloaded whilst DMAs are active".

Correct RDMA behavior requires preventing any loss of the content of
those pages in the period from the end of the DMA until the next
completion is reaped. If the kernel were to start transferring the pages
immediately after the DMA completed, what would prevent the associated
receive completion from being generated before the migration was
completed? And if a migration is in progress, how is this feedback given
to the RDMA device, and when?

Explicitly suspending a Memory Registration allows detection of the
problem while the disposition of the packet is still pending. Postponing
determination that the target memory is suspended until the actual DMA
transfer is attempted is problematic.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: RDMA memory registration
> On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:
> > as our recent IOPROC patch on lkml shows, it's not that invasive. There
> > are just 24 hooks added to the Linux VM code paths - which we have been
> > able to maintain outside the mainline tree for many years now.
> > As these hooks only need to synchronise the Elan's MMU state with that of
> > the CPU, the device drivers calls don't change the Linux MM behaviour.
> >
> > We believe the IOPROC patch is generic and powerful and would allow other
> > RDMA NICs to solve the page registration problems in a different manner.
> > For NICs which require page registration, new VM hooks can be used to avoid
> > pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> > patch has such a hook.

david, I just saw this. I'll need to look at that patch, it sounds pretty
neat.

Thanks

ron
Re: [openib-general] Re: RDMA memory registration
An ex post facto notification of a PTE change would enable the RDMA
Device driver to know when a Memory Region had been invalidated so that
it could probably declare an access violation and tear all the
connections using it down.

But if the intent is to allow it to migrate the memory region to the new
mapping it would need a more synchronized notice. It needs to be told of
a *pending* change, so that it can indicate when it has completed any
data movements based on the old data. It can then use the new data.

This has generally been discussed as a two part interface: suspend (to
request that the old mapping no longer be used) and resume (to resume
usage of the mapping with the new values), and it is generally done at a
Memory Region scope rather than on a per PTE basis.

RDMA has strict ordering requirements. In particular, completing a
receive work request represents a guarantee to the consumer that the
prior writes have been updated in its buffer. With an unsynchronized
notice that "PTE entry X has been changed" I don't see how it can
fulfill those semantics. It cannot know if portions of an RDMA Write
were placed to the old physical location, and therefore it cannot know
that the entire RDMA Write payload will be in user memory at the
anticipated locations when it generates the work completion. If it
cannot make that guarantee it is obligated to terminate the connection.

On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:
> Ronald G. Minnich wrote:
> > On Fri, 29 Apr 2005, Greg Lindahl wrote:
> > > It doesn't imply that there's an MMU, either. I know that Myricom uses a
> > > little lookup routine in software on their nic, which most people
> > > wouldn't call an MMU. I don't know what Mellanox does for this, they
> > > don't talk much about what's hardware and what's software on their nic.
> > > I think Quadrics actually uses the TLB of their risc cpu on their nic
> > > for this lookup, but that's just a guess.
> >
> > but only quadrics rewrites the mm layer code ..
>
> Hi Ron,
> as our recent IOPROC patch on lkml shows, it's not that invasive. There
> are just 24 hooks added to the Linux VM code paths - which we have been able
> to maintain outside the mainline tree for many years now.
> As these hooks only need to synchronise the Elan's MMU state with that of the
> CPU, the device drivers calls don't change the Linux MM behaviour.
>
> We believe the IOPROC patch is generic and powerful and would allow other
> RDMA NICs to solve the page registration problems in a different manner.
> For NICs which require page registration, new VM hooks can be used to avoid
> pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> patch has such a hook.
>
> Cheers
> Addy.
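To make the two part suspend/resume interface above concrete, here is a minimal sketch in C of a Memory Region state machine. Every name in it (`mem_region`, `mr_suspend`, `mr_resume`, `mr_dma_begin`, `mr_dma_reaped`) is hypothetical and is not taken from the IOPROC patch or from any verbs API. It assumes the device can count DMAs whose completions have not yet been reaped, and that a new mapping may only be installed once that count has drained to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is a model, not a driver. */
enum mr_state { MR_ACTIVE, MR_SUSPENDING, MR_SUSPENDED };

struct mem_region {
    enum mr_state state;
    unsigned inflight;        /* DMAs placed whose completion is not yet reaped */
    unsigned long phys_base;  /* current physical mapping */
};

/* Device side: may a new DMA start against this region? */
static bool mr_dma_begin(struct mem_region *mr)
{
    if (mr->state != MR_ACTIVE)
        return false;         /* packet disposition is still pending */
    mr->inflight++;
    return true;
}

/* Called once the completion covering a DMA has been reaped. */
static void mr_dma_reaped(struct mem_region *mr)
{
    assert(mr->inflight > 0);
    mr->inflight--;
    if (mr->state == MR_SUSPENDING && mr->inflight == 0)
        mr->state = MR_SUSPENDED;
}

/* Kernel side: announce a *pending* change. True if already quiesced. */
static bool mr_suspend(struct mem_region *mr)
{
    if (mr->state == MR_ACTIVE)
        mr->state = mr->inflight ? MR_SUSPENDING : MR_SUSPENDED;
    return mr->state == MR_SUSPENDED;
}

/* Kernel side: install the new mapping. Only legal once fully suspended. */
static bool mr_resume(struct mem_region *mr, unsigned long new_phys_base)
{
    if (mr->state != MR_SUSPENDED)
        return false;
    mr->phys_base = new_phys_base;
    mr->state = MR_ACTIVE;
    return true;
}
```

In this model a suspend request issued while a DMA is outstanding leaves the region in `MR_SUSPENDING`, and `mr_resume` refuses to swap the mapping until the outstanding completion has been reaped. That is the ordering guarantee an unsynchronized "PTE X has changed" notice cannot give.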
Re: [openib-general] Re: RDMA memory registration
On Tue, May 03, 2005 at 09:42:12AM +0100, David Addison wrote:
> >This doesn't scale well as more cards are added to the box.
> >I think I understand why it's good for single cards though.
>
> With the IOPROC patch the device driver hooks are registered on a per
> process or perhaps better still, a per VMA basis.

I was originally thinking the registrations are global (for all memory)
and not per process. Per process or per VMA seems reasonable to me.

> And for processes/VMAs where there are no registrations the overhead
> is very low.

Yes - thanks. I'm still reading the LKML thread you started:
http://lkml.org/lkml/2005/4/26/198

In particular, the comments from Brice Goglin:
http://lkml.org/lkml/2005/4/26/222

openib.org folks can find the IOPROC patch for 2.6.12-rc3 archived here:
http://lkml.org/lkml/diff/2005/4/26/198/1

> With multiple cards in a box, all using different device drivers,
> I guess there could end up being multiple registrations per process/VMA.
> But I'm not sure this will be a common case for RDMA use in real life.

I agree. Gateways between fabrics are the only case I can think of.
This won't be a problem until someone at a large national lab
tries to connect two "legacy" fabrics together.

thanks,
grant
Re: [openib-general] Re: RDMA memory registration
Grant Grundler wrote:
> On Fri, Apr 29, 2005 at 08:22:24PM +0200, Brice Goglin wrote:
> > For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
> > an ioproc hook in the fork path. This hook (a function in your driver)
> > would be called for each registered page. It will decide whether
> > the page should be pre-copied or not and update the registration
> > table (or whatever stores address translations in the NIC).
> > In addition, the driver would probably pre-copy cow pages when
> > registering them.
>
> This doesn't scale well as more cards are added to the box.
> I think I understand why it's good for single cards though.

With the IOPROC patch the device driver hooks are registered on a per
process or perhaps better still, a per VMA basis. And for processes/VMAs
where there are no registrations the overhead is very low.

With multiple cards in a box, all using different device drivers, I guess
there could end up being multiple registrations per process/VMA. But I'm
not sure this will be a common case for RDMA use in real life.

Cheers
Addy.
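The per VMA registration described above can be modelled roughly as follows. This is only a sketch in C under stated assumptions: the names `vma_hooks`, `vma_hook_register` and `vma_notify_unmap` are made up for illustration and are not the IOPROC patch's actual interface, and a fixed-size array stands in for whatever list the real patch uses. It is meant to show why a VMA with no registrations costs almost nothing, and why several cards add one callback each.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 4   /* illustrative limit, one slot per device driver */

/* Callback a driver would supply to learn that a range is going away. */
typedef void (*unmap_fn)(void *driver_ctx, unsigned long start,
                         unsigned long end);

struct vma_hooks {    /* hypothetical per-VMA registration list */
    unmap_fn fn[MAX_HOOKS];
    void *ctx[MAX_HOOKS];
    size_t count;
};

/* Register one driver's hook on one VMA. Returns 0, or -1 when full. */
static int vma_hook_register(struct vma_hooks *h, unmap_fn fn, void *ctx)
{
    if (h->count == MAX_HOOKS)
        return -1;
    h->fn[h->count] = fn;
    h->ctx[h->count] = ctx;
    h->count++;
    return 0;
}

/* VM path: notify every registered driver. Returns the callback count,
 * which is zero (an empty loop) for VMAs with no registrations. */
static size_t vma_notify_unmap(struct vma_hooks *h, unsigned long start,
                               unsigned long end)
{
    for (size_t i = 0; i < h->count; i++)
        h->fn[i](h->ctx[i], start, end);
    return h->count;
}

/* Example driver callback: tallies invalidated 4 KiB pages in *ctx. */
static void count_pages(void *ctx, unsigned long start, unsigned long end)
{
    *(unsigned long *)ctx += (end - start) / 4096;
}
```

With two drivers registered on the same VMA, one unmap of an 8 KiB range makes two callbacks, while an unregistered VMA makes none. The multi-card cost raised earlier in the thread therefore grows with the number of registrations on a VMA, not with the number of cards in the box.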
Re: [openib-general] Re: RDMA memory registration
Ronald G. Minnich wrote:
> On Fri, 29 Apr 2005, Greg Lindahl wrote:
> > It doesn't imply that there's an MMU, either. I know that Myricom uses a
> > little lookup routine in software on their nic, which most people
> > wouldn't call an MMU. I don't know what Mellanox does for this, they
> > don't talk much about what's hardware and what's software on their nic.
> > I think Quadrics actually uses the TLB of their risc cpu on their nic
> > for this lookup, but that's just a guess.
>
> but only quadrics rewrites the mm layer code ..

Hi Ron,
as our recent IOPROC patch on lkml shows, it's not that invasive. There
are just 24 hooks added to the Linux VM code paths - which we have been
able to maintain outside the mainline tree for many years now.
As these hooks only need to synchronise the Elan's MMU state with that of
the CPU, the device driver calls don't change the Linux MM behaviour.

We believe the IOPROC patch is generic and powerful and would allow other
RDMA NICs to solve the page registration problems in a different manner.
For NICs which require page registration, new VM hooks can be used to
avoid pages being unloaded whilst DMAs are active. Our latest cut of the
IOPROC patch has such a hook.

Cheers
Addy.
Re: [openib-general] Re: RDMA memory registration
Greg Lindahl wrote:
> On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:
> > Being mostly clueless about Quadrics implementation, I'm probably
> > missing something that makes Quadrics a MMU but not the IB variants.
> > Can someone clue me in please?
>
> As far as I can tell it's mostly a marketing distinction. Many
> Quadrics customers run with memory registration, and Mellanox could
> probably alter their firmware to not require registration. Myricom
> certainly can, and in fact Patrick Geoffray claimed they were doing so
> in their MX software. The only one I know of that isn't that flexible
> is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
> requires memory registration and is clearly not an MMU.

Greg,
only a few of our evaluation customers use the patch-free (and hence
page-pinning) software release. Most do apply our simple IOPROC patch and
run without requiring page pinning whilst still achieving the peak
bandwidth and low latency of our hardware.

Cheers
Addy.
Re: [openib-general] Re: RDMA memory registration
oops, hit send too soon. Finishing the response...

On 4/29/05, Caitlin Bestler <[EMAIL PROTECTED]> wrote:
> On 4/29/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> > Bill> I'm very confused at this point. Can you briefly explain how
> > Bill> this works, or point me to a description? I don't see how
> > Bill> you could do user level I/O without registering the memory
> > Bill> with the hardware. I'm especially confused by the comment
> > Bill> (may not have been yours) that the memory doesn't have to be
> > Bill> pinned. -- Bill Jordan InfiniCon Systems
> >
> > You add a hook to the kernel so it tells you if a page is about to be
> > paged out or otherwise move. Then you set a bit in the adapter's page
> > table so that it won't try to access that page without telling you.
> > If the adapter asks for the page, you get the kernel to fault the page
> > in and program the new physical mapping in the adapter.
>
> Yes, and you could even have a system that was capable of doing
> DMA to a user virtual map (in fact some minis back around 1980
> had exactly that capability).
>
> But there are *two* issues involved here:
>
> One is that the RDMA hardware, however it is marketed, essentially
> needs to act as an MMU. That means that it has to be synchronized
> with normal MMU. The traditional sledge-hammer approach to
> "synchronizing" is to require that the mapping be frozen.

You *could* define a method that attempts to be more dynamic in this
synchronization, but since it is an ex post facto mechanism that must
work with multiple hardware cards it needs to be defined recognizing
that it is not instantaneous. It is virtually the same problem as memory
suspend in general; basically the RDMA Hardware's MMU is not making
calculations for each and every access to the host bus.

Secondly there is the problem that an advertised buffer is implicitly a
promise to the peer that the buffer is available. Using RNRs (or
dropping TCP segments for iWARP) while paging an image from disk is just
not playing fair. No host should advertise 20 GB of buffers to its peer
when it only has 2 GB of physical memory backing it up.

When an application registers memory it believes it has permission from
the OS to advertise buffers within it. RNRs are appropriate to move
memory around, not to allow a host to overadvertise.
Re: [openib-general] Re: RDMA memory registration
On Fri, 29 Apr 2005, Caitlin Bestler wrote:
> One is that the RDMA hardware, however it is marketed, essentially
> needs to act as an MMU. That means that it has to be synchronized
> with normal MMU. The traditional sledge-hammer approach to

ah ha! his RDMA mmu just crashed his mm layer. It happens.

ron
Re: [openib-general] Re: RDMA memory registration
On 4/29/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Bill> I'm very confused at this point. Can you briefly explain how
> Bill> this works, or point me to a description? I don't see how
> Bill> you could do user level I/O without registering the memory
> Bill> with the hardware. I'm especially confused by the comment
> Bill> (may not have been yours) that the memory doesn't have to be
> Bill> pinned. -- Bill Jordan InfiniCon Systems
>
> You add a hook to the kernel so it tells you if a page is about to be
> paged out or otherwise move. Then you set a bit in the adapter's page
> table so that it won't try to access that page without telling you.
> If the adapter asks for the page, you get the kernel to fault the page
> in and program the new physical mapping in the adapter.

Yes, and you could even have a system that was capable of doing
DMA to a user virtual map (in fact some minis back around 1980
had exactly that capability).

But there are *two* issues involved here:

One is that the RDMA hardware, however it is marketed, essentially
needs to act as an MMU. That means that it has to be synchronized
with normal MMU. The traditional sledge-hammer approach to
Re: [openib-general] Re: RDMA memory registration
On Fri, Apr 29, 2005 at 03:07:40PM -0600, Ronald G. Minnich wrote:
> On Fri, 29 Apr 2005, Greg Lindahl wrote:
> > It doesn't imply that there's an MMU, either. I know that Myricom uses a
> > little lookup routine in software on their nic, which most people
> > wouldn't call an MMU. I don't know what Mellanox does for this, they
> > don't talk much about what's hardware and what's software on their nic.
> > I think Quadrics actually uses the TLB of their risc cpu on their nic
> > for this lookup, but that's just a guess.
>
> but only quadrics rewrites the mm layer code ..

Mellanox, although they have the capability, does not use the feature.
In the existing model the Mellanox hardware assumes that the page is
present, hence the entire discussion about how to make sure the page
stays put and that the user mapping to that page stays put.

-Libor
Re: [openib-general] Re: RDMA memory registration
On Fri, 29 Apr 2005, Greg Lindahl wrote:
> It doesn't imply that there's an MMU, either. I know that Myricom uses a
> little lookup routine in software on their nic, which most people
> wouldn't call an MMU. I don't know what Mellanox does for this, they
> don't talk much about what's hardware and what's software on their nic.
> I think Quadrics actually uses the TLB of their risc cpu on their nic
> for this lookup, but that's just a guess.

but only quadrics rewrites the mm layer code ..

ron
RE: [openib-general] Re: RDMA memory registration
On Fri, 29 Apr 2005, Rimmer, Todd wrote:
> But that implies the hardware has an MMU and it also puts an interrupt
> in the path per page sent.

yes. it does. and it doesn't do per page sent, just per page that has no
pte on the nic when received.

ron
Re: [openib-general] Re: RDMA memory registration
> Todd> But that implies the hardware has an MMU and it also puts an
> Todd> interrupt in the path per page sent.
>
> Well, there's one interrupt per non-resident page sent. But nearly
> all of the time the page will be present.

It doesn't imply that there's an MMU, either. I know that Myricom uses a
little lookup routine in software on their nic, which most people
wouldn't call an MMU. I don't know what Mellanox does for this, they
don't talk much about what's hardware and what's software on their nic.
I think Quadrics actually uses the TLB of their risc cpu on their nic
for this lookup, but that's just a guess.

-- greg
Re: [openib-general] Re: RDMA memory registration
On Fri, 29 Apr 2005, Bill Jordan wrote:
> I'm very confused at this point. Can you briefly explain how this works,
> or point me to a description? I don't see how you could do user level
> I/O without registering the memory with the hardware. I'm especially
> confused by the comment (may not have been yours) that the memory
> doesn't have to be pinned.

you modify the mm layer of linux, so that the PTEs on the Quadrics card
are in sync with the PTEs in the mm layer. Then you are in a position to
have a NIC incite page faults for incoming packets.

I think greg got it right -- in practice, it's not done any more.
Quadrics has a kernel-patch-free source base now, I'm told.

ron
Re: [openib-general] Re: RDMA memory registration
Todd> But that implies the hardware has an MMU and it also puts an
Todd> interrupt in the path per page sent.

Well, there's one interrupt per non-resident page sent. But nearly
all of the time the page will be present.

Todd> Wasn't the assertion that there was no MMU in the hardware?

I don't think so. Greg's original message said this doesn't work for
PathScale's part precisely because they don't have an MMU.

- R.
RE: [openib-general] Re: RDMA memory registration
> You add a hook to the kernel so it tells you if a page is about to be
> paged out or otherwise move. Then you set a bit in the adapter's page
> table so that it won't try to access that page without telling you.
> If the adapter asks for the page, you get the kernel to fault the page
> in and program the new physical mapping in the adapter.

But that implies the hardware has an MMU and it also puts an interrupt
in the path per page sent. Wasn't the assertion that there was no MMU in
the hardware?

Todd Rimmer
Re: [openib-general] Re: RDMA memory registration
Bill> I'm very confused at this point. Can you briefly explain how
Bill> this works, or point me to a description? I don't see how
Bill> you could do user level I/O without registering the memory
Bill> with the hardware. I'm especially confused by the comment
Bill> (may not have been yours) that the memory doesn't have to be
Bill> pinned. -- Bill Jordan InfiniCon Systems

You add a hook to the kernel so it tells you if a page is about to be
paged out or otherwise move. Then you set a bit in the adapter's page
table so that it won't try to access that page without telling you.
If the adapter asks for the page, you get the kernel to fault the page
in and program the new physical mapping in the adapter.

- R.
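The mechanism described in this message can be modelled in a few lines of C. This is a toy simulation rather than driver code: `nic_table`, `hook_page_unmap`, `nic_translate` and `host_fault_in` are hypothetical names, the table size is arbitrary, and the "fault in" step simply invents a frame number. It assumes the adapter keeps one valid bit per page and only asks the host for help when that bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NIC_PAGES 8   /* illustrative size of the adapter's page table */

struct nic_pte {
    bool valid;          /* cleared by the kernel hook before a page moves */
    unsigned long pfn;   /* physical frame programmed into the adapter */
};

struct nic_table {
    struct nic_pte pte[NIC_PAGES];
    unsigned faults;     /* times the adapter had to ask the host */
};

/* Hook the kernel would call when a page is about to be paged out. */
static void hook_page_unmap(struct nic_table *t, size_t vpn)
{
    assert(vpn < NIC_PAGES);
    t->pte[vpn].valid = false;
}

/* Stand-in for "get the kernel to fault the page in". */
static unsigned long host_fault_in(size_t vpn)
{
    return 0x100 + vpn;  /* arbitrary frame number for the simulation */
}

/* Adapter access path: only entries without a valid bit cost a fault. */
static unsigned long nic_translate(struct nic_table *t, size_t vpn)
{
    assert(vpn < NIC_PAGES);
    struct nic_pte *p = &t->pte[vpn];
    if (!p->valid) {
        t->faults++;                  /* one interrupt per non-resident page */
        p->pfn = host_fault_in(vpn);  /* program the new physical mapping */
        p->valid = true;
    }
    return p->pfn;
}
```

Repeated accesses to a resident page leave `faults` unchanged. The counter only advances on the first touch and after `hook_page_unmap` has cleared the entry, which matches the point made elsewhere in the thread that the interrupt cost is per non-resident page rather than per page sent.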
Re: [openib-general] Re: RDMA memory registration
On 4/29/05, Greg Lindahl <[EMAIL PROTECTED]> wrote:
> On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:
> > Being mostly clueless about Quadrics implementation, I'm probably
> > missing something that makes Quadrics a MMU but not the IB variants.
> > Can someone clue me in please?
>
> As far as I can tell it's mostly a marketing distinction. Many
> Quadrics customers run with memory registration, and Mellanox could
> probably alter their firmware to not require registration. Myricom
> certainly can, and in fact Patrick Geoffray claimed they were doing so
> in their MX software. The only one I know of that isn't that flexible
> is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
> requires memory registration and is clearly not an MMU.
>
> Confused yet?

I'm very confused at this point. Can you briefly explain how this works,
or point me to a description? I don't see how you could do user level
I/O without registering the memory with the hardware. I'm especially
confused by the comment (may not have been yours) that the memory
doesn't have to be pinned.

--
Bill Jordan
InfiniCon Systems
Re: [openib-general] Re: RDMA memory registration
On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:
> Being mostly clueless about Quadrics implementation, I'm probably
> missing something that makes Quadrics a MMU but not the IB variants.
> Can someone clue me in please?

As far as I can tell it's mostly a marketing distinction. Many
Quadrics customers run with memory registration, and Mellanox could
probably alter their firmware to not require registration. Myricom
certainly can, and in fact Patrick Geoffray claimed they were doing so
in their MX software. The only one I know of that isn't that flexible
is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
requires memory registration and is clearly not an MMU.

Confused yet?

-- greg
[openib-general] Re: RDMA memory registration
Bill> Are you suggesting making the partial pages their own VMA,
Bill> or marking the entire buffer with this flag? I originally
Bill> thought the entire buffer should be copy on fork (instead of
Bill> copy on write), and I believe this is the path Mellanox was
Bill> pursuing with the VM_NO_COW flag. However, if applications
Bill> are registering gigs of ram, it would be very bad to have
Bill> the entire area copied on fork.

It's up to userspace really but I would expect that the partial pages
would be in a vma by themselves.

- R.
Re: [openib-general] Re: RDMA memory registration
On Fri, Apr 29, 2005 at 08:22:24PM +0200, Brice Goglin wrote:
> For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
> an ioproc hook in the fork path. This hook (a function in your driver)
> would be called for each registered page. It will decide whether
> the page should be pre-copied or not and update the registration
> table (or whatever stores address translations in the NIC).
> In addition, the driver would probably pre-copy cow pages when
> registering them.

This doesn't scale well as more cards are added to the box.
I think I understand why it's good for single cards though.

> It's nice to see these two works coming to LKML at the same time.
> It would be great if we could merge them and get a generic solution
> that's suitable to both registration based cards (IB/Myri/Ammasso)
> and MMU-based cards (Quadrics).

Aren't the Mellanox mem-free cards more or less MMUs as well?
I had that impression after attending Dror Goldenberg's talk, though
I don't think he asserted that. The Openib.org developers conf
(Feb 2005) slideset is here:
http://www.openib.org/docs/oib_wkshp_022005/memfree-hca-mellanox-dgoldenberg.pdf

Being mostly clueless about Quadrics implementation, I'm probably
missing something that makes Quadrics an MMU but not the IB variants.
Can someone clue me in please?

thanks,
grant
[openib-general] Re: RDMA memory registration
Brice> Do you plan to work with David Addison from Quadrics ? For
Brice> sure, your hardware have very different capabilities. But
Brice> ioproc_ops is a really nice solution and might help a lot
Brice> when dealing with deregistration and fork.

I'm following the discussion with interest. Some hardware (eg
Mellanox HCAs) has the ability to use these hooks to avoid pinning
pages at all, but in general IB and iWARP need to pin pages so the
mapping doesn't change.

Brice> For instance, instead of adding PROT_DONT/ALWAYSCOPY, you
Brice> may use an ioproc hook in the fork path. This hook (a
Brice> function in your driver) would be called for each
Brice> registered page. It will decide whether the page should be
Brice> pre-copied or not and update the registration table (or
Brice> whatever stores address translations in the NIC). In
Brice> addition, the driver would probably pre-copy cow pages when
Brice> registering them.

This sort of monkeying around with the VM from driver code seems
much more complicated than letting userspace handle it.

- R.
[openib-general] Re: RDMA memory registration
Roland Dreier wrote:
> 2) For fork() support:
>
>    a) Extend mprotect() with PROT_DONTCOPY so processes can avoid
>       copy-on-write problems.
>
>    b) (maybe someday?) Add a VM_ALWAYSCOPY flag and extend mprotect()
>       with PROT_ALWAYSCOPY so processes can mark pages to be
>       pre-copied into child processes, to handle the case where only
>       half a page is registered.
>
> I believe this puts the code that must be trusted into the kernel and
> gives userspace primitives that let apps handle the rest.

Do you plan to work with David Addison from Quadrics?
For sure, your hardware has very different capabilities.
But ioproc_ops is a really nice solution and might help a lot when
dealing with deregistration and fork.

For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
an ioproc hook in the fork path. This hook (a function in your driver)
would be called for each registered page. It will decide whether
the page should be pre-copied or not and update the registration
table (or whatever stores address translations in the NIC).
In addition, the driver would probably pre-copy cow pages when
registering them.

It's nice to see these two works coming to LKML at the same time.
It would be great if we could merge them and get a generic solution
that's suitable to both registration based cards (IB/Myri/Ammasso)
and MMU-based cards (Quadrics).

Brice
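One way to picture how userspace might choose between the two proposed flags is the sketch below. It is an illustration only: the enum values merely echo the proposed PROT_DONTCOPY and PROT_ALWAYSCOPY names and are not real mprotect() flags, the 4 KiB page size is an assumption, and `page_policy` is a hypothetical helper. A page lying wholly inside the registered range is marked "don't copy", while a page shared with unregistered data is marked "always copy" so the child still gets its own copy of the rest.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ ((uintptr_t)4096)   /* assumed page size */

/* Illustrative policies named after the flags proposed in the thread. */
enum fork_policy {
    POLICY_DEFAULT_COW,   /* page not registered: normal copy-on-write */
    POLICY_DONTCOPY,      /* page wholly registered: child gets no mapping */
    POLICY_ALWAYSCOPY     /* page partly registered: pre-copy into child */
};

/* Choose a policy for the page starting at page_start, given a
 * registered byte range [reg_start, reg_end). */
static enum fork_policy page_policy(uintptr_t page_start,
                                    uintptr_t reg_start, uintptr_t reg_end)
{
    uintptr_t page_end = page_start + PAGE_SZ;

    if (reg_end <= page_start || reg_start >= page_end)
        return POLICY_DEFAULT_COW;    /* no overlap with the registration */
    if (reg_start <= page_start && reg_end >= page_end)
        return POLICY_DONTCOPY;       /* every byte of the page is registered */
    return POLICY_ALWAYSCOPY;         /* "only half a page is registered" */
}
```

For a buffer registered at bytes 100 through 8291, the first and third pages come out as always-copy and the second as don't-copy. This is consistent with the suggestion elsewhere in the thread that partial pages would end up in a vma by themselves.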