Ralph,

This patch fixed it. num_nodes was being used uninitialised, so the client was getting a bogus value for the number of nodes.

Ashley,

On Mon, 2009-05-18 at 10:09 +0100, Ashley Pittman wrote:
> No joy I'm afraid, now I get errors when I run it. This is a single
> node job run with the command line "mpirun -n 3 ./a.out". I've attached
> the strace output and gzipped /tmp files from the machine. Valgrind on
> the opmi-ps process doesn't show anything interesting.
>
> [alpha:29942] [[35044,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file /mnt/home/debian/ashley/code/OpenMPI/ompi-trunk-tes/trunk/orte/util/comm/comm.c at line 242
> [alpha:29942] [[35044,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file /mnt/home/debian/ashley/code/OpenMPI/ompi-trunk-tes/trunk/orte/tools/orte-ps/orte-ps.c at line 818
>
> Ashley.
>
> On Sat, 2009-05-16 at 08:15 -0600, Ralph Castain wrote:
> > This is fixed now, Ashley - sorry for the problem.
> >
> >
> > On May 15, 2009, at 4:47 AM, Ashley Pittman wrote:
> >
> > > On Thu, 2009-05-14 at 22:49 -0600, Ralph Castain wrote:
> > >> It is definitely broken at the moment, Ashley. I have it pretty well
> > >> fixed, but need/want to cleanup some corner cases that have plagued us
> > >> for a long time.
> > >>
> > >> Should have it for you sometime Friday.
> > >
> > > Ok, thanks. I might try switching to slurm in the mean-time, I know my
> > > code works with that.
> > >
> > > Can you let me know when it's fixed on or off list and I'll do an
> > > update.
> > >
> > > Ashley,
> > >
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel

Index: orte/orted/orted_comm.c
===================================================================
--- orte/orted/orted_comm.c	(revision 21248)
+++ orte/orted/orted_comm.c	(working copy)
@@ -837,6 +837,7 @@
                     goto CLEANUP;
                 }
             } else {
+                num_nodes = 0;
                 /* count number of nodes */
                 for (i=0; i < orte_node_pool->size; i++) {
                     if (NULL != opal_pointer_array_get_item(orte_node_pool, i)) {
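
For anyone reading this later, here is a small standalone sketch of the same pattern, to show why the one-line initialisation matters. It is not the ORTE code: `count_non_null` and the plain pointer array are hypothetical stand-ins for the loop over orte_node_pool, and the NULL test stands in for the opal_pointer_array_get_item() check. The only point is that a counter incremented in a loop has to start from zero, otherwise whatever happened to be in that stack slot gets added to the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the node pool: a plain array of pointers
 * in which unused slots are NULL. This is NOT the real ORTE structure. */
static int count_non_null(void *const *slots, size_t size)
{
    int num_nodes = 0;  /* the fix: without this the count starts from an indeterminate value */
    size_t i;

    assert(slots != NULL || size == 0);

    /* count the occupied slots, mirroring the "count number of nodes" loop */
    for (i = 0; i < size; i++) {
        if (NULL != slots[i]) {
            num_nodes++;
        }
    }
    return num_nodes;
}
```

If the `= 0` is dropped, the function may still appear to work on some builds, which is probably why this kind of bug only shows up as a bogus count on the receiving side.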