[OMPI devel] 1.2.8 testing

2008-10-13 Thread Jeff Squyres
Short version: looks good from Cisco.  If we can get one other  
confirmation, let's ship it.


More details:  I have a very heterogeneous cluster, which makes 1.2.x
testing difficult with openib (there's a bug in the port matching  
logic that makes it not always work -- we decided a long time ago not  
to fix it).  So I have a lot of false failures in my 1.2.8rc1 MTT  
results.


I went through them and manually re-ran the failures that I found.
As a result, the only real failures that I find are:


1. various flavors of Fortran bsend fail with the PGI compiler suite
  --> I'll take a good college swing at this one today and see if
there's an obvious fix; it could be simple, since it appears to be a
Fortran-only failure.  Specifically, these fail:


MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f
MPI_Bsend_init_rtoa_f
MPI_Bsend_rtoa_f
MPI_Ibsend_rtoa_f

And the _c versions of them all pass:

MPI_Bsend_init_rtoa_c
MPI_Bsend_rtoa_c
MPI_Ibsend_rtoa_c
MPI_Bsend_init_rtoa_c
MPI_Bsend_rtoa_c
MPI_Ibsend_rtoa_c

2. some variants of spawn, which also failed in my 1.2.7 testing (we  
agreed long ago not to fix these for v1.2.x)


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 1.2.8 testing

2008-10-13 Thread Jeff Squyres

On Oct 13, 2008, at 1:34 PM, Jeff Squyres wrote:


> MPI_Bsend_init_rtoa_f
> MPI_Bsend_rtoa_f
> MPI_Ibsend_rtoa_f
> MPI_Bsend_init_rtoa_f
> MPI_Bsend_rtoa_f
> MPI_Ibsend_rtoa_f



These tests fail with the PGI fortran compiler because they are trying  
to allocate a 1.5MB buffer on the stack (i.e., they segv before the  
first executable line of code).  Reducing the size of the buffer makes  
the tests pass.


The size of the buffer was increased by Rolf when he made the intel  
tests able to be run with more than 64 procs.  So I'm pretty sure this  
is a new failure.


Rolf and I will work out what to do about the intel test, but for  
1.2.8, I think we're good to go.  It would be good to get one more
confirmation from someone else, though.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 1.2.8 testing

2008-10-13 Thread Jeff Squyres

On Oct 13, 2008, at 2:12 PM, Jeff Squyres wrote:

> These tests fail with the PGI fortran compiler because they are
> trying to allocate a 1.5MB buffer on the stack (i.e., they segv
> before the first executable line of code).



s/1.5MB/16MB/
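
As a rough illustration of why a 16MB automatic buffer can crash before the first executable statement: an array that the compiler places on the stack has to fit under the process's stack size limit, and a common Linux default soft limit is 8 MiB (an assumption here; check `ulimit -s` on the actual nodes). The sketch below is Python rather than Fortran, uses hypothetical helper names (`fits_in_stack`, `current_stack_limit`) and an arbitrary 64 KiB headroom figure, and only compares sizes; it does not reproduce the PGI behavior itself.

```python
import resource  # Unix-only standard library module

MIB = 1024 * 1024


def fits_in_stack(buffer_bytes, stack_limit_bytes, headroom_bytes=64 * 1024):
    """Return True if a stack-allocated buffer of this size fits under the limit.

    headroom_bytes is a hypothetical allowance for the rest of the call
    stack; the real requirement depends on the program.
    """
    if stack_limit_bytes == resource.RLIM_INFINITY:
        return True  # no limit configured
    return buffer_bytes + headroom_bytes <= stack_limit_bytes


def current_stack_limit():
    """Return the soft stack limit of this process, in bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft


if __name__ == "__main__":
    limit = current_stack_limit()
    for size in (1 * MIB, 16 * MIB):
        print(size // MIB, "MiB buffer fits:", fits_in_stack(size, limit))
```

When the check says the buffer does not fit, the usual options are to shrink the buffer (which is what made the tests pass here), allocate it dynamically instead of on the stack, or raise the stack limit before launching.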

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 1.2.8 testing

2008-10-13 Thread Ralph Castain
I'll test 1.2.8 on our Lobo system tomorrow (out today). Primary issue  
we are seeing there frankly is that some of the tests simply fail when  
you get up to 16ppn - in one case, it appears that the memory  
allocated during the test overflows available memory on the node when  
you get that many procs. So sorting out which tests run at 16ppn and  
which don't has become a bit of a challenge.


I'll see what I can do, though.
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.2.8 testing

2008-10-13 Thread Jeff Squyres
K, thanks.  I don't think I've run the tests up to 16ppn to be able to  
help out here, sorry...





--
Jeff Squyres
Cisco Systems