On Nov 30 2010, Ralph Castain wrote:

Here is what one IB vendor says about the issue on their web site (redacted to protect the innocent):

"At the time of this release, the (redacted-openib) driver has issues with buffers sharing pages when fork( ) is used. Pinned (locked in memory) pages are normally marked copy-on-write during a fork.

That is TRULY demented!  It is almost always precisely the wrong thing
to do.

If a page is pinned before a fork and subsequently written to while RDMA operations are being performed on the same page, silent data corruption can occur as RDMA operations continue to stream data to a page that has moved. To avoid this, the (redacted-openib) driver does not use copy-on-write behavior during a fork for pinned pages. Instead, access to these pages by the child process will result in a segmentation violation."

That is sane.  Not user-friendly, but at least sane.

While there is some variation, I believe you will find that all IB comm shares this problem. So it is wise to avoid using fork if you want to use the openib transport.

Yes and no.  Some such communication may allow RDMA only to shared memory,
which solves the problem in another way.  Several specialist HPC networks
were (are?) like that, and I can see no reason why an IB driver should not
use the same design.  That, of course, means that most MPI transfers need
a copy.

Hence the warning. Ignoring it is purely a "user beware" situation, but we provide that mechanism for the truly adventurous...or IB developers who want to someday resolve the problem.

Well, there is a much simpler case where it will "just work", which is
very probably what the OP was doing.  When the fork is immediately
followed by an exec in the child process, there isn't an issue.  We all
know the history, but the mainframe designs of having a proper spawn
primitive were much cleaner.  However, that's not what we've got.
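
For concreteness, here is a minimal sketch of that fork-then-exec pattern in C. The name run_child is hypothetical, chosen only for illustration and not taken from any MPI or IB library, and the sketch assumes that no registered (pinned) buffer shares a page with the child's stack. The child does nothing between fork and exec except call execvp, so it never touches the parent's heap or static data.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of any MPI or IB API): run argv[0] in a
 * child process and return its exit status, or -1 on failure. */
int run_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: exec immediately, without reading or writing any of the
         * parent's heap or static data. */
        execvp(argv[0], argv);
        /* Reached only if exec failed. _exit avoids running atexit
         * handlers and flushing stdio buffers inherited from the parent. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example { "ls", "-l", NULL }. Anything more elaborate in the child before the exec (malloc, stdio, touching communication buffers) moves away from the safe case described above.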

It might be worth adding to the note that this is the ONLY case when the
ordinary user is advised to use that facility.  Or it might not, depending
on the level of Clue that readers are expected to have.


Regards,
Nick Maclaren.




