Re: [OMPI devel] bug?
Circling some off-list comments back to the list...

While we could and should error out more gracefully, this really isn't a supportable operation. What the command

mpirun -n 2 -slot-list 1,3 foo

appears to do is launch a 2-process job consisting of vpid=1 and vpid=3, as opposed to the normal vpid=0 and vpid=1. Not only is ORTE not prepared to handle this scenario, I believe it will cause problems in some areas within OMPI.

I can try to make it fail nicer. Someone with more knowledge of the intended slot-list behavior would have to make it do what was actually intended, or at least explain what is supposed to happen.

Ralph

On Sep 24, 2009, at 7:03 PM, Eugene Loh wrote:

mpirun -V
mpirun (Open MPI) 1.4a1-1

Ralph Castain wrote:

Sigh - you really need to remember to tell us what version you're talking about.

On Sep 24, 2009, at 5:39 PM, Eugene Loh wrote:

I assume this is a bug?

% mpirun -np 2 -slot-list 1,3 hostname
[saem9:10337] [[455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 875
[saem9:10337] *** Process received signal ***
[saem9:10337] Signal: Segmentation fault (11)
[saem9:10337] Signal code: Address not mapped (1)
[saem9:10337] Failing at address: 0x4c
[saem9:10337] [ 0] [0xe600]
[saem9:10337] [ 1] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x78a) [0xf7f8c206]
[saem9:10337] [ 2] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/openmpi/mca_plm_rsh.so [0xf7d13564]
[saem9:10337] [ 3] mpirun [0x804b49d]
[saem9:10337] [ 4] mpirun [0x804a456]
[saem9:10337] [ 5] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7d348ac]
[saem9:10337] [ 6] mpirun(orte_daemon_recv+0x201) [0x804a3b1]
[saem9:10337] *** End of error message ***
Segmentation fault

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] bug?
Thanks, filed as https://svn.open-mpi.org/trac/ompi/ticket/2030
Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014
I think there is a problem with this change - here is a warning I get when compiling on Mac and Linux:

ompi_debuggers.c:265: warning: no previous prototype for ‘MPIR_Breakpoint’

Can you please take a look?

Thanks
Ralph

On Sep 25, 2009, at 1:14 PM, emall...@osl.iu.edu wrote:

Author: emallove
Date: 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
New Revision: 22014
URL: https://svn.open-mpi.org/trac/ompi/changeset/22014

Log:
Remove `static` from `MPIR_Breakpoint` so Intel compilers will not inline it

Text files modified:
trunk/ompi/debuggers/ompi_debuggers.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

Modified: trunk/ompi/debuggers/ompi_debuggers.c
==============================================================================
--- trunk/ompi/debuggers/ompi_debuggers.c (original)
+++ trunk/ompi/debuggers/ompi_debuggers.c 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
@@ -261,7 +261,7 @@
  * defined in orterun for the starter. It should never conflict with
  * this one, but we'll make it static, just to be sure.
  */
-static void *MPIR_Breakpoint(void)
+void *MPIR_Breakpoint(void)
 {
     return NULL;
 }

___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn
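
For reference, GCC emits "no previous prototype" (-Wmissing-prototypes) when a non-static function is defined without an earlier prototype declaration in scope. One conventional way to quiet it is to declare the prototype before the definition. The sketch below only illustrates that pattern; it is not necessarily the fix that was applied to the tree, and placing the prototype in the same file rather than in a header is an assumption made here to keep the example self-contained.

```c
#include <stddef.h>  /* NULL */

/*
 * Prototype declared ahead of the definition. Its placement in this
 * file is an assumption for illustration; in a real tree it would
 * more likely live in a header that this file includes.
 */
void *MPIR_Breakpoint(void);

/* Non-static definition, matching the signature introduced in r22014. */
void *MPIR_Breakpoint(void)
{
    return NULL;
}
```

Compiling this as a single translation unit with gcc -Wmissing-prototypes -c should not produce the warning, because the declaration precedes the definition while the function keeps external linkage.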