Ok, I think we're all good to go now:

- Brad's problems were initially cluster config errors; later we determined that they *may* be eHCA gen 1 issues with RDMA CM. We're definitely deferring those fixes until after v1.3 because IBM doesn't care about RDMA CM for eHCA.

- Jon's issues *look* like MPI layer issues, not BTL connectivity issues, and they were spurious. So we need to keep testing there.

However, I'm going to wait to merge until after tomorrow morning's MTT results because of today's openib BTL breakage, which was caused by yesterday's ob1 commits. I'd like to get a good, solid night of openib MTT testing in before merging all this new stuff.



On Oct 1, 2008, at 11:21 AM, Jon Mason wrote:

On Wed, Oct 01, 2008 at 08:08:48AM -0400, Jeff Squyres wrote:
Per the call yesterday, I'll merge this into the trunk once I get it
working with Brad on PPC.

Brad and I discovered a missing htonl/ntohl somewhere in the code last
night right before I had to go offline (i.e., we can see the IP
addresses are backwards, but don't know where it's coming from) on PPC,
so I haven't finished yet.  We'll probably get it fixed up today.
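
For illustration only, here is a minimal C sketch of how a single missing (or extra) byte swap makes an IPv4 address print with its octets reversed. It is not taken from the Open MPI source; the helper names `swap32`, `parse_ipv4`, and `fmt_ipv4` are hypothetical, and it assumes a POSIX system that provides `inet_pton` and `inet_ntop`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit value.
 * Unlike htonl/ntohl, this swaps on every host, so it models a
 * conversion that was skipped or applied once too often. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Hypothetical helper: parse dotted-quad text into a 32-bit address
 * in network byte order. Returns 0 on success, -1 on bad input. */
static int parse_ipv4(const char *text, uint32_t *net_addr)
{
    struct in_addr a;
    if (inet_pton(AF_INET, text, &a) != 1) {
        return -1;
    }
    *net_addr = a.s_addr;
    return 0;
}

/* Hypothetical helper: format an address that is expected to be in
 * network byte order as dotted-quad text. */
static const char *fmt_ipv4(uint32_t net_addr, char *buf, size_t len)
{
    struct in_addr a;
    a.s_addr = net_addr;
    const char *s = inet_ntop(AF_INET, &a, buf, (socklen_t) len);
    assert(s != NULL);
    return s;
}
```

With these helpers, formatting the parsed value of "10.1.2.3" gives "10.1.2.3" back, while formatting `swap32()` of that value gives "3.2.1.10". Since `htonl`/`ntohl` are no-ops on big-endian hosts such as PPC and byte swaps on little-endian hosts, a value that skips one conversion on its way between the two kinds of machine would show up reversed in the same way.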

My tests yesterday showed some errors. Unfortunately, I lost the system
before I could take a look. I'll re-run them and verify that everything
is still sane.



On Sep 30, 2008, at 10:05 AM, Jeff Squyres wrote:

(putting this on devel just so that others can see it)

Ok, I put in all the things in the RDMA CM CPC HG tree that we've
talked about and it now should work out of the box with:

- any iWARP (no need for kernel hacks to have initiator send first)
- any IB (set up the stuff to do the initiator_depth and
responder_resources properly)
- any [valid but] bizarre IP addressing scheme

Could everyone try the HG tree again to ensure it still/now works for
you out of the box?

  http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/openib-fd-progress/

Try with changeset 106 (b046bf97deab) or later.  The only thing that
is missing is a bit better scalability on allocating buffers for the
CTS.  Now that all the other changes are in, I'll be working on that
today and tomorrow.

--
Jeff Squyres
Cisco Systems



_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
