Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread Caitlin Bestler
On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:

> We believe the IOPROC patch is generic and powerful and would allow other
> RDMA NICs to solve the page registration problems in a different manner.
> For NICs which require page registration, new VM hooks can be used to avoid
> pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> patch has such a hook.
> 

The key phrase here is "avoid pages being unloaded whilst DMAs are active".
Correct RDMA behavior requires preventing any loss of the content of those
pages in the period from the end of the DMA until the next completion is
reaped.

If the kernel were to start transferring the pages immediately after the DMA
completed, what would prevent the associated receive completion from being
generated before the migration was completed?

And if a migration is in progress, how is this feedback given to the RDMA
device, and when? Explicitly suspending a Memory Registration allows detection
of the problem while the disposition of the packet is still pending. Postponing
the determination that the target memory is suspended until the actual DMA
transfer is attempted is problematic.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread Ronald G. Minnich


> On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:
> > as our recent IOPROC patch on lkml shows, it's not that invasive. There
> > are just 24 hooks added to the Linux VM code paths - which we have been
> > able to maintain outside the mainline tree for many years now.
> > As these hooks only need to synchronise the Elan's MMU state with that of
> > the CPU, the device driver calls don't change the Linux MM behaviour.
> >
> > We believe the IOPROC patch is generic and powerful and would allow other
> > RDMA NICs to solve the page registration problems in a different manner.
> > For NICs which require page registration, new VM hooks can be used to avoid
> > pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> > patch has such a hook.
> > 

david, I just saw this. I'll need to look at that patch, it sounds pretty 
neat. Thanks


ron


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread Caitlin Bestler
An ex post facto notification of a PTE change would enable the RDMA
Device driver to know when a Memory Region had been invalidated
so that it could probably declare an access violation and tear all 
the connections using it down.

But if the intent is to allow it to migrate the memory region to the
new mapping it would need a more synchronized notice. It needs
to be told of a *pending* change, so that it can indicate when it
has completed any data movements based on the old data.
It can then use the new data. This has generally been discussed
as a two part interface: suspend (to request that the old mapping
no longer be used) and resume (to resume usage of the mapping
with the new values), and it is generally done at a Memory Region
scope rather than on a per PTE basis.
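
A minimal sketch of the two-part interface described above, written in
Python for brevity. This is a toy model and not driver code: the class
MemoryRegion and the methods begin_dma, end_dma, suspend and resume are
hypothetical names, not part of any existing verbs or kernel API. It only
shows the intended ordering: suspend stops new use of the old mapping and
reports whether data movement based on it has drained, and resume installs
the new mapping.

```python
class MemoryRegion:
    """Toy model of a registered region; all names are hypothetical."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)   # virtual page -> physical frame
        self.suspended = False
        self.in_flight = 0             # DMAs started against the current mapping

    def begin_dma(self, vpage):
        if self.suspended:
            # Detected before any data is placed, while the disposition
            # of the packet is still pending.
            raise RuntimeError("region suspended")
        self.in_flight += 1
        return self.mapping[vpage]

    def end_dma(self):
        self.in_flight -= 1

    def suspend(self):
        """Request that the old mapping no longer be used.

        Returns True once no data movement based on the old mapping remains.
        """
        self.suspended = True
        return self.in_flight == 0

    def resume(self, new_mapping):
        """Resume use of the region with the new values."""
        if self.in_flight:
            raise RuntimeError("old-mapping DMAs still in flight")
        self.mapping = dict(new_mapping)
        self.suspended = False


mr = MemoryRegion({0: 100})
frame = mr.begin_dma(0)          # DMA targets frame 100
drained = mr.suspend()           # a DMA on the old mapping is still pending
mr.end_dma()
drained_after = mr.suspend()     # nothing pending, the page may now move
mr.resume({0: 200})
new_frame = mr.begin_dma(0)      # DMA now targets frame 200
```

The model tracks a single counter per region, which matches the point that
this is done at Memory Region scope rather than per PTE.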

RDMA has strict ordering requirements. In particular, completing
a receive work request represents a guarantee to the consumer
that the prior writes have been updated in its buffer. With an
unsynchronized notice that "PTE entry X has been changed"
I don't see how it can fulfill those semantics. It cannot know if
portions of an RDMA Write were placed to the old physical
location, and therefore it cannot know that the entire RDMA
Write payload will be in user memory at the anticipated locations
when it generates the work completion. If it cannot make that
guarantee it is obligated to terminate the connection.
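
The hazard can be illustrated with a small simulation, sketched in Python.
Everything here is an assumption made for illustration: the 8-byte page, the
two frames, and the function name place_write are invented, and real
hardware does not work on Python dictionaries. The point is only that when
the page moves between two halves of one RDMA Write and the device is told
afterwards, the second half lands in the old frame and the consumer sees an
incomplete payload.

```python
PAGE = 8  # toy page size in bytes


def place_write(synchronized):
    """Place one RDMA Write while the kernel migrates the target page."""
    frames = {1: bytearray(PAGE), 2: bytearray(PAGE)}   # physical frames
    pte = {"frame": 1}             # CPU mapping of the user buffer
    nic_frame = pte["frame"]       # translation held by the device
    payload = b"ABCDEFGH"

    # The first half of the write lands through the old mapping.
    frames[nic_frame][0:4] = payload[0:4]

    if synchronized:
        # Pending change announced: the device finishes all data movement
        # based on the old mapping before the page is allowed to move.
        frames[nic_frame][4:8] = payload[4:8]

    # The kernel migrates the page: copy frame 1 to frame 2, update the PTE.
    frames[2][:] = frames[1]
    pte["frame"] = 2

    if synchronized:
        nic_frame = pte["frame"]   # resume with the new mapping
    else:
        # The device hears about the change only after the fact, so the
        # second half still goes to the old frame.
        frames[nic_frame][4:8] = payload[4:8]

    # Completion is generated; the consumer reads through the CPU mapping.
    return bytes(frames[pte["frame"]])


with_sync = place_write(True)
without_sync = place_write(False)
```

In the unsynchronized run the last four bytes seen by the consumer are still
zero, so the completion would have promised data that is not there.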


On 5/3/05, David Addison <[EMAIL PROTECTED]> wrote:
> Ronald G. Minnich wrote:
> >
> > On Fri, 29 Apr 2005, Greg Lindahl wrote:
> >
> >>It doesn't imply that there's an MMU, either. I know that Myricom uses a
> >>little lookup routine in software on their nic, which most people
> >>wouldn't call an MMU. I don't know what Mellanox does for this, they
> >>don't talk much about what's hardware and what's software on their nic.
> >>I think Quadrics actually uses the TLB of their risc cpu on their nic
> >>for this lookup, but that's just a guess.
> >
> > but only quadrics rewrites the mm layer code ..
> >
> >
> Hi Ron,
> as our recent IOPROC patch on lkml shows, it's not that invasive. There
> are just 24 hooks added to the Linux VM code paths - which we have been able
> to maintain outside the mainline tree for many years now.
> As these hooks only need to synchronise the Elan's MMU state with that of the
> CPU, the device driver calls don't change the Linux MM behaviour.
>
> We believe the IOPROC patch is generic and powerful and would allow other
> RDMA NICs to solve the page registration problems in a different manner.
> For NICs which require page registration, new VM hooks can be used to avoid
> pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC
> patch has such a hook.
> 
> Cheers
> Addy.


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread Grant Grundler
On Tue, May 03, 2005 at 09:42:12AM +0100, David Addison wrote:
> >This doesn't scale well as more cards are added to the box.
> >I think I understand why it's good for single cards though.
>
> With the IOPROC patch the device driver hooks are registered on a per 
> process or perhaps better still, a per VMA basis.

I was originally thinking the registrations are global (for all memory)
and not per process. Per process or per VMA seems reasonable to me.
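
To make the scaling argument concrete, here is a toy sketch in Python of
hooks registered per VMA. The class HookRegistry and its methods are
hypothetical and are not the IOPROC API. The sketch only shows that a VM
path touching a VMA with no registrations pays for one lookup, and that
several drivers on the same VMA are each called in turn.

```python
class HookRegistry:
    """Toy per-VMA hook table; names are hypothetical, not the IOPROC API."""

    def __init__(self):
        self.by_vma = {}                  # VMA id -> list of driver callbacks

    def register(self, vma, callback):
        self.by_vma.setdefault(vma, []).append(callback)

    def notify(self, vma, event):
        """Called from a VM code path; returns how many hooks ran."""
        hooks = self.by_vma.get(vma)
        if not hooks:
            return 0                      # no registration: just a lookup
        for callback in hooks:
            callback(vma, event)
        return len(hooks)


seen = []
registry = HookRegistry()
# Two cards with different drivers registering on the same VMA.
registry.register("vma-A", lambda vma, ev: seen.append(("driver1", vma, ev)))
registry.register("vma-A", lambda vma, ev: seen.append(("driver2", vma, ev)))

ran_unregistered = registry.notify("vma-B", "unmap")
ran_registered = registry.notify("vma-A", "unmap")
```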

> And for processes/VMAs where there are no registrations the overhead
> is very low.

Yes - thanks. I'm still reading the LKML thread you started:
http://lkml.org/lkml/2005/4/26/198

In particular, the comments from Brice Goglin:
http://lkml.org/lkml/2005/4/26/222

openib.org folks can find the IOPROC patch for 2.6.12-rc3 archived here:
http://lkml.org/lkml/diff/2005/4/26/198/1

> With multiple cards in a box, all using different device drivers,
> I guess there could end up being multiple registrations per process/VMA.
> But I'm not sure this will be a common case for RDMA use in real life.

I agree. Gateways between fabrics are the only case I can think of.
This won't be a problem until someone at a large national lab tries
to connect two "legacy" fabrics together.

thanks,
grant


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread David Addison
Grant Grundler wrote:
> On Fri, Apr 29, 2005 at 08:22:24PM +0200, Brice Goglin wrote:
> > For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
> > an ioproc hook in the fork path. This hook (a function in your driver)
> > would be called for each registered page. It will decide whether
> > the page should be pre-copied or not and update the registration
> > table (or whatever stores address translations in the NIC).
> > In addition, the driver would probably pre-copy cow pages when
> > registering them.
>
> This doesn't scale well as more cards are added to the box.
> I think I understand why it's good for single cards though.

With the IOPROC patch the device driver hooks are registered on a per process
or perhaps better still, a per VMA basis. And for processes/VMAs where there
are no registrations the overhead is very low.

With multiple cards in a box, all using different device drivers, I guess there
could end up being multiple registrations per process/VMA. But I'm not sure
this will be a common case for RDMA use in real life.

Cheers
Addy.


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread David Addison
Ronald G. Minnich wrote:
> On Fri, 29 Apr 2005, Greg Lindahl wrote:
> > It doesn't imply that there's an MMU, either. I know that Myricom uses a
> > little lookup routine in software on their nic, which most people
> > wouldn't call an MMU. I don't know what Mellanox does for this, they
> > don't talk much about what's hardware and what's software on their nic.
> > I think Quadrics actually uses the TLB of their risc cpu on their nic
> > for this lookup, but that's just a guess.
>
> but only quadrics rewrites the mm layer code ..

Hi Ron,
as our recent IOPROC patch on lkml shows, it's not that invasive. There
are just 24 hooks added to the Linux VM code paths - which we have been able to
maintain outside the mainline tree for many years now.
As these hooks only need to synchronise the Elan's MMU state with that of the
CPU, the device driver calls don't change the Linux MM behaviour.

We believe the IOPROC patch is generic and powerful and would allow other
RDMA NICs to solve the page registration problems in a different manner.
For NICs which require page registration, new VM hooks can be used to avoid
pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC patch
has such a hook.

Cheers
Addy.


Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread David Addison
Greg Lindahl wrote:
> On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:
> > Being mostly clueless about Quadrics implementation, I'm probably
> > missing something that makes Quadrics a MMU but not the IB variants.
> > Can someone clue me in please?
>
> As far as I can tell it's mostly a marketing distinction. Many
> Quadrics customers run with memory registration, and Mellanox could
> probably alter their firmware to not require registration.  Myricom
> certainly can, and in fact Patrick Geoffrey claimed they were doing so
> in their MX software. The only one I know of that isn't that flexible
> is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
> requires memory registration and is clearly not an MMU.

Greg,
only a few of our evaluation customers use the patch free (and hence
page-pinning) software release.
Most do apply our simple IOPROC patch and run without requiring page
pinning whilst still achieving the peak bandwidth and low latency
of our hardware.

Cheers
Addy.



Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Caitlin Bestler
Oops, hit send too soon. Finishing the response...

On 4/29/05, Caitlin Bestler <[EMAIL PROTECTED]> wrote:
> On 4/29/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> > Bill> I'm very confused at this point. Can you briefly explain how
> > Bill> this works, or point me to a description? I don't see how
> > Bill> you could do user level I/O without registering the memory
> > Bill> with the hardware. I'm especially confused by the comment
> > Bill> (may not have been yours) that the memory doesn't have to be
> > Bill> pinned.  -- Bill Jordan InfiniCon Systems
> >
> > You add a hook to the kernel so it tells you if a page is about to be
> > paged out or otherwise move.  Then you set a bit in the adapter's page
> > table so that it won't try to access that page without telling you.
> > If the adapter asks for the page, you get the kernel to fault the page
> > in and program the new physical mapping in the adapter.
> >
> 
> Yes, and you could even have a system that was capable of doing
> DMA to a user virtual map (in fact some minis back around 1980
> had exactly that capability).
> 
> But there are *two* issues involved here:
> 
> One is that the RDMA hardware, however it is marketed, essentially
> needs to act as an MMU. That means that it has to be synchronized
> with normal MMU. The traditional sledge-hammer approach to
> 
"synchronizing" is to require that the mapping be frozen. You *could*
define a method that attempts to be more dynamic in this synchronization,
but since it is an ex post facto mechanism that must work with multiple
hardware cards, it needs to be defined recognizing that it is not
instantaneous. It is virtually the same problem as memory suspend in
general: the RDMA hardware's MMU is not making calculations for each and
every access to the host bus.

Secondly, there is the problem that an advertised buffer is implicitly a
promise to the peer that the buffer is available. Using RNRs (or dropping
TCP segments for iWARP) while paging an image from disk is just not
playing fair. No host should advertise 20 GB of buffers to its peer when it
only has 2 GB of physical memory backing it up. When an application
registers memory it believes it has permission from the OS to advertise
buffers within it. RNRs are appropriate to move memory around, not to
allow a host to overadvertise.
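
The second point amounts to a simple admission rule, sketched here in
Python. The class AdvertiseBudget and its methods are hypothetical names
used only for illustration; no RDMA stack is claimed to expose such an
object. The figures are the ones from the text above: 2 GB of physical
memory and a peer that would otherwise be offered 20 GB of buffers.

```python
GB = 1 << 30


class AdvertiseBudget:
    """Tracks advertised receive buffers against physical backing (toy model)."""

    def __init__(self, physical_bytes):
        self.physical_bytes = physical_bytes
        self.advertised = 0

    def advertise(self, nbytes):
        """Admit the advertisement only if it stays within backed memory."""
        if self.advertised + nbytes > self.physical_bytes:
            return False
        self.advertised += nbytes
        return True

    def retire(self, nbytes):
        """Buffer consumed or withdrawn; its bytes can be advertised again."""
        self.advertised -= nbytes


budget = AdvertiseBudget(2 * GB)
first = budget.advertise(1 * GB)      # fits
second = budget.advertise(1 * GB)     # fits exactly
third = budget.advertise(18 * GB)     # would overadvertise, so it is refused
```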


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich


On Fri, 29 Apr 2005, Caitlin Bestler wrote:

> One is that the RDMA hardware, however it is marketed, essentially
> needs to act as an MMU. That means that it has to be synchronized
> with normal MMU. The traditional sledge-hammer approach to 

ah ha! his RDMA mmu just crashed his mm layer. It happens. 

ron


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Caitlin Bestler
On 4/29/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Bill> I'm very confused at this point. Can you briefly explain how
> Bill> this works, or point me to a description? I don't see how
> Bill> you could do user level I/O without registering the memory
> Bill> with the hardware. I'm especially confused by the comment
> Bill> (may not have been yours) that the memory doesn't have to be
> Bill> pinned.  -- Bill Jordan InfiniCon Systems
> 
> You add a hook to the kernel so it tells you if a page is about to be
> paged out or otherwise move.  Then you set a bit in the adapter's page
> table so that it won't try to access that page without telling you.
> If the adapter asks for the page, you get the kernel to fault the page
> in and program the new physical mapping in the adapter.
> 

Yes, and you could even have a system that was capable of doing
DMA to a user virtual map (in fact some minis back around 1980
had exactly that capability).

But there are *two* issues involved here:

One is that the RDMA hardware, however it is marketed, essentially
needs to act as an MMU. That means that it has to be synchronized
with normal MMU. The traditional sledge-hammer approach to 



Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Libor Michalek
On Fri, Apr 29, 2005 at 03:07:40PM -0600, Ronald G. Minnich wrote:
> On Fri, 29 Apr 2005, Greg Lindahl wrote:
> 
> > It doesn't imply that there's an MMU, either. I know that Myricom uses a
> > little lookup routine in software on their nic, which most people
> > wouldn't call an MMU. I don't know what Mellanox does for this, they
> > don't talk much about what's hardware and what's software on their nic.
> > I think Quadrics actually uses the TLB of their risc cpu on their nic
> > for this lookup, but that's just a guess.
> 
> but only quadrics rewrites the mm layer code ..

  Mellanox, although they have the capability, does not use the feature.
In the existing model the mellanox hardware assumes that the page is
present, hence the entire discussion about how to make sure the page
stays put and that the user mapping to that page stays put.

-Libor


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich


On Fri, 29 Apr 2005, Greg Lindahl wrote:

> It doesn't imply that there's an MMU, either. I know that Myricom uses a
> little lookup routine in software on their nic, which most people
> wouldn't call an MMU. I don't know what Mellanox does for this, they
> don't talk much about what's hardware and what's software on their nic.
> I think Quadrics actually uses the TLB of their risc cpu on their nic
> for this lookup, but that's just a guess.

but only quadrics rewrites the mm layer code ..

ron


RE: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich


On Fri, 29 Apr 2005, Rimmer, Todd wrote:

> But that implies the hardware has an MMU and it also puts an interrupt
> in the path per page sent.

yes. it does. and it doesn't do per page sent, just per page that has no 
pte on the nic when received.

ron


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Greg Lindahl
> Todd> But that implies the hardware has an MMU and it also puts an
> Todd> interrupt in the path per page sent.
> 
> Well, there's one interrupt per non-resident page sent.  But nearly
> all of the time the page will be present.

It doesn't imply that there's an MMU, either. I know that Myricom uses
a little lookup routine in software on their nic, which most people
wouldn't call an MMU. I don't know what Mellanox does for this, they
don't talk much about what's hardware and what's software on their
nic. I think Quadrics actually uses the TLB of their risc cpu on their
nic for this lookup, but that's just a guess.

-- greg



Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich


On Fri, 29 Apr 2005, Bill Jordan wrote:

> I'm very confused at this point. Can you briefly explain how this works,
> or point me to a description? I don't see how you could do user level
> I/O without registering the memory with the hardware. I'm especially
> confused by the comment (may not have been yours) that the memory
> doesn't have to be pinned. 

you modify the mm layer of linux, so that the PTEs on the Quadrics card 
are in sync with the PTEs in the mm layer. Then you are in a position to 
have a NIC incite page faults for incoming packets. 

I think greg got it right -- in practice, it's not done any more. Quadrics 
has a kernel-patch-free source base now, I'm told.

ron


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Roland Dreier
Todd> But that implies the hardware has an MMU and it also puts an
Todd> interrupt in the path per page sent.

Well, there's one interrupt per non-resident page sent.  But nearly
all of the time the page will be present.

Todd> Wasn't the assertion that there was no MMU in the hardware?

I don't think so.  Greg's original message said this doesn't work for
PathScale's part precisely because they don't have an MMU.

 - R.


RE: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Rimmer, Todd
> You add a hook to the kernel so it tells you if a page is about to be
> paged out or otherwise move.  Then you set a bit in the adapter's page
> table so that it won't try to access that page without telling you.
> If the adapter asks for the page, you get the kernel to fault the page
> in and program the new physical mapping in the adapter.

But that implies the hardware has an MMU and it also puts an interrupt in the 
path per page sent.

Wasn't the assertion that there was no MMU in the hardware?

Todd Rimmer



Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Roland Dreier
Bill> I'm very confused at this point. Can you briefly explain how
Bill> this works, or point me to a description? I don't see how
Bill> you could do user level I/O without registering the memory
Bill> with the hardware. I'm especially confused by the comment
Bill> (may not have been yours) that the memory doesn't have to be
Bill> pinned.  -- Bill Jordan InfiniCon Systems

You add a hook to the kernel so it tells you if a page is about to be
paged out or otherwise move.  Then you set a bit in the adapter's page
table so that it won't try to access that page without telling you.
If the adapter asks for the page, you get the kernel to fault the page
in and program the new physical mapping in the adapter.
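
The sequence above can be sketched as a toy simulation in Python. The
classes Kernel and Adapter and all their methods are hypothetical; they
stand in for the kernel hook, the valid bit in the adapter's page table, and
the fault path, and do not correspond to any real driver interface.

```python
class Kernel:
    """Toy VM: maps virtual pages to frames and calls hooks before a move."""

    def __init__(self):
        self.page_table = {}      # vpage -> frame, resident pages only
        self.next_frame = 0
        self.hooks = []           # callbacks invoked before a page moves

    def fault_in(self, vpage):
        if vpage not in self.page_table:
            self.page_table[vpage] = self.next_frame
            self.next_frame += 1
        return self.page_table[vpage]

    def page_out(self, vpage):
        for hook in self.hooks:
            hook(vpage)           # tell the driver first
        self.page_table.pop(vpage, None)


class Adapter:
    """Toy adapter page table with a per-entry valid bit."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.table = {}           # vpage -> (frame, valid)
        self.faults = 0
        kernel.hooks.append(self.invalidate)

    def invalidate(self, vpage):
        # The hook: clear the valid bit so the adapter must ask before access.
        if vpage in self.table:
            frame, _ = self.table[vpage]
            self.table[vpage] = (frame, False)

    def translate(self, vpage):
        frame, valid = self.table.get(vpage, (None, False))
        if not valid:
            self.faults += 1                      # one interrupt to the host
            frame = self.kernel.fault_in(vpage)   # kernel faults the page in
            self.table[vpage] = (frame, True)     # program the new mapping
        return frame


kernel = Kernel()
nic = Adapter(kernel)
a = nic.translate(7)            # first touch: faults once
b = nic.translate(7)            # resident: no fault
faults_while_resident = nic.faults
kernel.page_out(7)              # hook clears the valid bit
c = nic.translate(7)            # faults again and picks up the new frame
```

Only accesses to pages without a valid entry on the adapter take the fault
path; accesses to resident pages do not involve the host at all.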

 - R.


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Bill Jordan
On 4/29/05, Greg Lindahl <[EMAIL PROTECTED]> wrote:
> On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:
> 
> > Being mostly clueless about Quadrics implementation, I'm probably
> > missing something that makes Quadrics a MMU but not the IB variants.
> > Can someone clue me in please?
> 
> As far as I can tell it's mostly a marketing distinction. Many
> Quadrics customers run with memory registration, and Mellanox could
> probably alter their firmware to not require registration.  Myricom
> certainly can, and in fact Patrick Geoffrey claimed they were doing so
> in their MX software. The only one I know of that isn't that flexible
> is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
> requires memory registration and is clearly not an MMU.
> 
> Confused yet?

I'm very confused at this point. Can you briefly explain how this
works, or point me to a description? I don't see how you could do user
level I/O without registering the memory with the hardware. I'm
especially confused by the comment (may not have been yours) that the
memory doesn't have to be pinned.
-- 
Bill Jordan
InfiniCon Systems


Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Greg Lindahl
On Fri, Apr 29, 2005 at 12:33:54PM -0700, Grant Grundler wrote:

> Being mostly clueless about Quadrics implementation, I'm probably
> missing something that makes Quadrics a MMU but not the IB variants.
> Can someone clue me in please?

As far as I can tell it's mostly a marketing distinction. Many
Quadrics customers run with memory registration, and Mellanox could
probably alter their firmware to not require registration.  Myricom
certainly can, and in fact Patrick Geoffrey claimed they were doing so
in their MX software. The only one I know of that isn't that flexible
is PathScale's InfiniPath. Ours is a pure hardware mechanism, but it
requires memory registration and is clearly not an MMU.

Confused yet?

-- greg


[openib-general] Re: RDMA memory registration

2005-04-29 Thread Roland Dreier
Bill> Are you suggesting making the partial pages their own VMA,
Bill> or marking the entire buffer with this flag? I originally
Bill> thought the entire buffer should be copy on fork (instead of
Bill> copy on write), and I believe this is the path Mellanox was
Bill> pursuing with the VM_NO_COW flag.  However, if applications
Bill> are registering gigs of ram, it would be very bad to have
Bill> the entire area copied on fork.

It's up to userspace really but I would expect that the partial pages
would be in a vma by themselves.

 - R.



Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Grant Grundler
On Fri, Apr 29, 2005 at 08:22:24PM +0200, Brice Goglin wrote:
> For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
> an ioproc hook in the fork path. This hook (a function in your driver)
> would be called for each registered page. It will decide whether
> the page should be pre-copied or not and update the registration
> table (or whatever stores address translations in the NIC).
> In addition, the driver would probably pre-copy cow pages when
> registering them.

This doesn't scale well as more cards are added to the box.
I think I understand why it's good for single cards though.

> It's nice to see these two works coming to LKML at the same time.
> It would be great if we could merge them and get a generic solution
> that's suitable to both registration based cards (IB/Myri/Ammasso)
> and MMU-based cards (Quadrics).

Aren't the mellanox mem-free cards more or less MMUs as well?
I had that impression after attending Dror Goldberg's talk
though I don't think he asserted that.
Openib.org developers conf (Feb 2005) slideset is here:

http://www.openib.org/docs/oib_wkshp_022005/memfree-hca-mellanox-dgoldenberg.pdf

Being mostly clueless about Quadrics implementation, I'm probably
missing something that makes Quadrics a MMU but not the IB variants.
Can someone clue me in please?

thanks,
grant


[openib-general] Re: RDMA memory registration

2005-04-29 Thread Roland Dreier
Brice> Do you plan to work with David Addison from Quadrics ?  For
Brice> sure, your hardware has very different capabilities.  But
Brice> ioproc_ops is a really nice solution and might help a lot
Brice> when dealing with deregistration and fork.

I'm following the discussion with interest.  Some hardware (eg
Mellanox HCAs) has the ability to use these hooks to avoid pinning
pages at all, but in general IB and iWARP need to pin pages so the
mapping doesn't change.

Brice> For instance, instead of adding PROT_DONT/ALWAYSCOPY, you
Brice> may use an ioproc hook in the fork path. This hook (a
Brice> function in your driver) would be called for each
Brice> registered page. It will decide whether the page should be
Brice> pre-copied or not and update the registration table (or
Brice> whatever stores address translations in the NIC).  In
Brice> addition, the driver would probably pre-copy cow pages when
Brice> registering them.

This sort of monkeying around with the VM from driver code seems much
more complicated than letting userspace handle it.

 - R.


[openib-general] Re: RDMA memory registration

2005-04-29 Thread Brice Goglin
Roland Dreier wrote:
> 2) For fork() support:
>    a) Extend mprotect() with PROT_DONTCOPY so processes can avoid
>       copy-on-write problems.
>    b) (maybe someday?) Add a VM_ALWAYSCOPY flag and extend mprotect()
>       with PROT_ALWAYSCOPY so processes can mark pages to be
>       pre-copied into child processes, to handle the case where only
>       half a page is registered.
>
> I believe this puts the code that must be trusted into the kernel and
> gives userspace primitives that let apps handle the rest.
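
As a rough illustration of what the two proposed flags would mean at fork
time, here is a toy model in Python. PROT_DONTCOPY and PROT_ALWAYSCOPY are
the proposals from the list above; the helper names make_page and fork, and
the dictionary layout, are invented for this sketch and do not reflect how
the kernel represents pages.

```python
DEFAULT, DONTCOPY, ALWAYSCOPY = "default", "dontcopy", "alwayscopy"


def make_page(data, flag=DEFAULT):
    return {"flag": flag, "data": bytearray(data), "cow": False}


def fork(parent):
    """Return the child's page map; marks shared parent pages copy-on-write."""
    child = {}
    for vpage, page in parent.items():
        if page["flag"] == DONTCOPY:
            continue                      # child gets no mapping at all
        if page["flag"] == ALWAYSCOPY:
            # Pre-copied at fork time, so neither side is copy-on-write.
            child[vpage] = make_page(page["data"], ALWAYSCOPY)
        else:
            page["cow"] = True            # shared until someone writes
            child[vpage] = {"flag": DEFAULT, "data": page["data"], "cow": True}
    return child


parent = {
    0: make_page(b"heap"),                # ordinary page
    1: make_page(b"rdma", DONTCOPY),      # fully registered page
    2: make_page(b"half", ALWAYSCOPY),    # page only partly registered
}
child = fork(parent)
```

In this model the registered page 1 is never marked copy-on-write in the
parent, so a later write by the parent cannot move it away from the frame
the adapter was given. Page 2 is duplicated eagerly, which is the case for a
buffer that covers only half a page.
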

Do you plan to work with David Addison from Quadrics?
For sure, your hardware has very different capabilities.
But ioproc_ops is a really nice solution and might help a lot
when dealing with deregistration and fork.

For instance, instead of adding PROT_DONT/ALWAYSCOPY, you may use
an ioproc hook in the fork path. This hook (a function in your driver)
would be called for each registered page. It will decide whether
the page should be pre-copied or not and update the registration
table (or whatever stores address translations in the NIC).
In addition, the driver would probably pre-copy cow pages when
registering them.

It's nice to see these two works coming to LKML at the same time.
It would be great if we could merge them and get a generic solution
that's suitable to both registration based cards (IB/Myri/Ammasso)
and MMU-based cards (Quadrics).

Brice