A correction: the fix is in tonight's nightly tarball. You can get it here:
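For anyone who wants to try it, a minimal fetch-and-build sketch follows; the install prefix and `-j` level are assumptions, so adjust them for your site:

```shell
# Fetch and build the nightly snapshot that carries the fix.
# The --prefix path below is only an example; pick your own.
wget http://www.open-mpi.org/nightly/v1.8/openmpi-v1.8.3-272-g4e4f997.tar.bz2
tar xjf openmpi-v1.8.3-272-g4e4f997.tar.bz2
cd openmpi-v1.8.3-272-g4e4f997
./configure --prefix=$HOME/opt/openmpi-1.8-nightly
make -j4 all install
```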
openmpi-v1.8.3-272-g4e4f997.tar.bz2
<http://www.open-mpi.org/nightly/v1.8/openmpi-v1.8.3-272-g4e4f997.tar.bz2>

On Mon, Dec 15, 2014 at 2:40 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Hey Tom
>
> Note that rc2 had a bug in the out-of-band messaging system - might be
> what you are hitting. I'd suggest working with rc4.
>
>
> On Mon, Dec 15, 2014 at 12:57 PM, Tom Wurgler <twu...@goodyear.com> wrote:
>
>> I have to take it back. While the first job was less than a node's
>> worth of cores and ran properly on the cores I wanted, more testing is
>> revealing other problems.
>>
>> Anything that spans more than one node crashes and burns, with a core
>> dump, and nothing in the files to indicate why.
>>
>> Note this is still rc2....
>>
>> More testing on-going....
>>
>>
>> ------------------------------
>> *From:* devel <devel-boun...@open-mpi.org> on behalf of Tom Wurgler <twu...@goodyear.com>
>> *Sent:* Monday, December 15, 2014 1:23 PM
>> *To:* Open MPI Developers
>> *Subject:* Re: [OMPI devel] 1.8.4rc Status
>>
>> It seems to be working in rc2 after all.
>>
>> I was still trying to use a rankfile, but it appears that is no longer
>> needed.
>>
>> Thanks!
>>
>>
>> ------------------------------
>> *From:* devel <devel-boun...@open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org>
>> *Sent:* Monday, December 15, 2014 8:45 AM
>> *To:* Open MPI Developers
>> *Subject:* Re: [OMPI devel] 1.8.4rc Status
>>
>> Should be there in rc4, and I thought it made it to rc2 for that
>> matter. I'll take a gander.
>>
>> FWIW: I'm working off-list with IBM to tighten the LSF integration so
>> we correctly read and follow their binding directives. This will also be
>> in 1.8.4, as we are in final test with it now.
>>
>> Ralph
>>
>>
>> On Mon, Dec 15, 2014 at 5:40 AM, Tom Wurgler <twu...@goodyear.com> wrote:
>>>
>>> Forgive me if I've missed it, but I believe using physical OR logical
>>> core numbering was going to be reimplemented in the 1.8.4 series.
>>>
>>> I've checked out rc2 and as far as I can tell, it isn't there as yet.
>>> Is this correct?
>>>
>>> thanks!
>>>
>>>
>>> ------------------------------
>>> *From:* devel <devel-boun...@open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org>
>>> *Sent:* Monday, December 15, 2014 8:35 AM
>>> *To:* Open MPI Developers
>>> *Subject:* [OMPI devel] 1.8.4rc Status
>>>
>>> Hi folks
>>>
>>> Trying to summarize the current situation on releasing 1.8.4.
>>> Remaining identified issues:
>>>
>>> 1. TCP/BTL hang under mpi-thread-multiple. Asked George to look into
>>> it.
>>>
>>> 2. hwloc updates required. Brice committed them to the hwloc 1.7 repo.
>>> Gilles volunteered to create the PR from there.
>>>
>>> 3. Fortran f08 bindings disabled for compilers not meeting certain
>>> conditions. PR from Gilles awaiting review by Jeff.
>>>
>>> 4. Topo signature issue reported by IBM. Ralph is waiting for more
>>> debug.
>>>
>>> 5. MPI/IO issue reported by Eric Chamberland. Gilles investigating.
>>>
>>> 6. make check issue on SPARC. Problem and fix reported by Paul
>>> Hargrove; Ralph will commit.
>>>
>>> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>>> multi-threaded C libraries; apparently need "-mt=yes" in both compile
>>> and link. Need someone to investigate.
>>>
>>> Please let me know if I've missed anything.
>>> Ralph
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16595.php
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16604.php
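Since the thread turns on rankfiles versus the built-in mapping options, here is a sketch of both approaches in 1.8-era syntax; the hostnames (`node01`, `node02`) and the executable name (`./a.out`) are placeholders, not anything from the thread:

```shell
# Explicit rankfile, the approach Tom had been using;
# hostnames and slot numbers are placeholders.
cat > myrankfile <<'EOF'
rank 0=node01 slot=0
rank 1=node01 slot=1
rank 2=node02 slot=0
rank 3=node02 slot=1
EOF
mpirun -np 4 -rf myrankfile ./a.out

# The mapping/binding options that can make the rankfile unnecessary;
# --report-bindings prints where each rank actually landed.
mpirun -np 4 --map-by core --bind-to core --report-bindings ./a.out
```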