[OMPI devel] 1.2.8 testing
Short version: looks good from Cisco. If we can get one other confirmation, let's ship it.

More details: I have a very heterogeneous cluster, which makes 1.2.x testing difficult with openib (there's a bug in the port matching logic that makes it not always work -- we decided a long time ago not to fix it). So I have a lot of false failures in my 1.2.8rc1 MTT results. I went through and checked them, and manually re-ran the failures that I found. As a result, the only real failures that I find are:

1. Various flavors of Fortran bsend fail with the PGI compiler suite --> I'll take a good college swing at this one today and see if there's an obvious fix; it could be simple, since it appears to be a Fortran-only failure. Specifically, these fail:

MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f
MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f

And the _c versions of them all pass:

MPI_Bsend_init_rtoa_c
MPI_Bsend_rtoa_c
MPI_Ibsend_rtoa_c
MPI_Bsend_init_rtoa_c
MPI_Bsend_rtoa_c
MPI_Ibsend_rtoa_c

2. Some variants of spawn, which also failed in my 1.2.7 testing (we agreed long ago not to fix these for v1.2.x).

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] 1.2.8 testing
On Oct 13, 2008, at 1:34 PM, Jeff Squyres wrote:

MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f
MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f

These tests fail with the PGI Fortran compiler because they are trying to allocate a 1.5MB buffer on the stack (i.e., they segv before the first executable line of code). Reducing the size of the buffer makes the tests pass.

The size of the buffer was increased by Rolf when he made the Intel tests able to be run with more than 64 procs. So I'm pretty sure this is a new failure. Rolf and I will work out what to do about the Intel test, but for 1.2.8, I think we're good to go.

It would be good to get one more confirmation from someone else, though.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] 1.2.8 testing
On Oct 13, 2008, at 2:12 PM, Jeff Squyres wrote:

These tests fail with the PGI fortran compiler because they are trying to allocate a 1.5MB buffer on the stack (i.e., they segv before the first executable line of code).

s/1.5MB/16MB/

--
Jeff Squyres
Cisco Systems
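
As an illustration of the stack-allocation diagnosis in this message, here is a minimal sketch in Python (Unix only) that checks whether a buffer of a given size would fit under the process's soft stack limit. The helper names `fits_under_limit` and `fits_on_stack` are hypothetical, and the 8 MB figure is only an assumption about a common Linux default, not something stated in the thread. The failing tests themselves are Fortran and are not reproduced here.

```python
import resource

MB = 1024 * 1024


def fits_under_limit(nbytes, soft_limit):
    """Return True if a buffer of nbytes is smaller than soft_limit.

    An unlimited stack (RLIM_INFINITY) always fits.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return nbytes < soft_limit


def fits_on_stack(nbytes):
    """Check nbytes against this process's current soft stack limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return fits_under_limit(nbytes, soft)


if __name__ == "__main__":
    # Sizes mentioned in the thread: the original 1.5MB figure and the
    # corrected 16MB figure.
    for label, size in (("1.5MB", int(1.5 * MB)), ("16MB", 16 * MB)):
        verdict = "fits" if fits_on_stack(size) else "does not fit"
        print(f"{label} buffer {verdict} under the current soft stack limit")

    # Assumption: 8 MB is used here only as an example limit.
    print("16MB under an assumed 8MB limit:", fits_under_limit(16 * MB, 8 * MB))
```

If the limit turns out to be the cause, the usual options are the one already described in the thread (shrinking the buffer), raising the stack limit in the shell that launches the job, or moving the buffer off the stack.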
Re: [OMPI devel] 1.2.8 testing
I'll test 1.2.8 on our Lobo system tomorrow (out today). The primary issue we are seeing there, frankly, is that some of the tests simply fail when you get up to 16ppn. In one case, it appears that the memory allocated during the test overflows the available memory on the node when you get that many procs. So sorting out which tests run at 16ppn and which don't has become a bit of a challenge.

I'll see what I can do, though.

Ralph

On Oct 13, 2008, at 12:12 PM, Jeff Squyres wrote:

On Oct 13, 2008, at 1:34 PM, Jeff Squyres wrote:

MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f
MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f

These tests fail with the PGI Fortran compiler because they are trying to allocate a 1.5MB buffer on the stack (i.e., they segv before the first executable line of code). Reducing the size of the buffer makes the tests pass.

The size of the buffer was increased by Rolf when he made the Intel tests able to be run with more than 64 procs. So I'm pretty sure this is a new failure. Rolf and I will work out what to do about the Intel test, but for 1.2.8, I think we're good to go.

It would be good to get one more confirmation from someone else, though.

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
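
The 16ppn memory problem Ralph describes comes down to simple arithmetic: every process on a node allocates its own copy of the test's buffers, so the per-node footprint grows linearly with ppn. A rough sketch of that check follows, in Python. The function names are hypothetical and all sizes in the usage lines are made-up illustrations, not measurements from Lobo or from any test in the suite.

```python
GiB = 1024 ** 3


def max_procs_per_node(per_proc_bytes, node_bytes, reserve_bytes=0):
    """Largest ppn whose combined footprint fits in the node's memory.

    reserve_bytes is memory set aside for the OS and other services.
    """
    if per_proc_bytes <= 0:
        raise ValueError("per_proc_bytes must be positive")
    usable = node_bytes - reserve_bytes
    return max(usable // per_proc_bytes, 0)


def runs_at(ppn, per_proc_bytes, node_bytes, reserve_bytes=0):
    """Return True if ppn processes of this size fit on one node."""
    return ppn <= max_procs_per_node(per_proc_bytes, node_bytes, reserve_bytes)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only: a 32 GiB node with
    # 2 GiB reserved, and two tests with different per-process footprints.
    node, reserve = 32 * GiB, 2 * GiB
    for name, per_proc in (("small_test", 1 * GiB), ("large_test", 3 * GiB)):
        ok = runs_at(16, per_proc, node, reserve)
        print(f"{name}: fits at 16ppn = {ok}")
```

A table of per-test footprints fed through a check like this would be one way to sort tests into those that can run at 16ppn and those that cannot, assuming the footprints can be measured or estimated.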
Re: [OMPI devel] 1.2.8 testing
K, thanks. I don't think I've run the tests up to 16ppn to be able to help out here, sorry...

On Oct 13, 2008, at 2:41 PM, Ralph Castain wrote:

I'll test 1.2.8 on our Lobo system tomorrow (out today). The primary issue we are seeing there, frankly, is that some of the tests simply fail when you get up to 16ppn. In one case, it appears that the memory allocated during the test overflows the available memory on the node when you get that many procs. So sorting out which tests run at 16ppn and which don't has become a bit of a challenge.

I'll see what I can do, though.

Ralph

On Oct 13, 2008, at 12:12 PM, Jeff Squyres wrote:

On Oct 13, 2008, at 1:34 PM, Jeff Squyres wrote:

MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f
MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f

These tests fail with the PGI Fortran compiler because they are trying to allocate a 1.5MB buffer on the stack (i.e., they segv before the first executable line of code). Reducing the size of the buffer makes the tests pass.

The size of the buffer was increased by Rolf when he made the Intel tests able to be run with more than 64 procs. So I'm pretty sure this is a new failure. Rolf and I will work out what to do about the Intel test, but for 1.2.8, I think we're good to go.

It would be good to get one more confirmation from someone else, though.

--
Jeff Squyres
Cisco Systems