Re: [OMPI devel] bug?

2009-09-25 Thread Ralph Castain
Circling some off-list comments back to the list: while we could and
should error out more gracefully, this really isn't a supportable
operation. What the command


mpirun -n 2 -slot-list 1,3 foo

appears to do is cause us to launch a 2-process job consisting of  
vpid=1 and vpid=3, as opposed to the normal vpid=0 and 1.


Not only is ORTE not prepared to handle this scenario, I believe it  
will cause problems in some areas within OMPI.


I can try to make it fail more gracefully, but someone with more
knowledge of the intended slot-list behavior would have to make it do
what they actually intended, or at least explain what is supposed to
happen.


Ralph
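
To make "fail nicer" concrete, a guard along the following lines could reject the request before launch instead of letting it reach the launcher and segfault. This is only an illustrative sketch: the helper name `vpids_are_contiguous` is hypothetical and is not an ORTE function, and it assumes (as the command above appears to show) that the slot-list values end up being used as vpids.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of ORTE: returns true only when the n
 * requested ids are exactly 0..n-1, each appearing once. */
bool vpids_are_contiguous(const int *vpids, size_t n)
{
    bool ok = true;
    /* calloc zero-initializes, so every id starts out as "not seen". */
    bool *seen = calloc(n ? n : 1, sizeof *seen);

    if (seen == NULL) {
        return false;
    }
    for (size_t i = 0; i < n && ok; i++) {
        if (vpids[i] < 0 || (size_t)vpids[i] >= n || seen[vpids[i]]) {
            ok = false;               /* out of range or duplicated */
        } else {
            seen[vpids[i]] = true;
        }
    }
    free(seen);
    return ok;
}
```

With such a check, the ids {1, 3} for a 2-process job would be refused up front, while {0, 1} would pass.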

On Sep 24, 2009, at 7:03 PM, Eugene Loh wrote:


mpirun -V
mpirun (Open MPI) 1.4a1-1

Ralph Castain wrote:

Sigh - you really need to remember to tell us what version you're
talking about.


On Sep 24, 2009, at 5:39 PM, Eugene Loh wrote:


I assume this is a bug?

% mpirun -np 2 -slot-list 1,3 hostname
[saem9:10337] [[455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 875
[saem9:10337] *** Process received signal ***
[saem9:10337] Signal: Segmentation fault (11)
[saem9:10337] Signal code: Address not mapped (1)
[saem9:10337] Failing at address: 0x4c
[saem9:10337] [ 0] [0xe600]
[saem9:10337] [ 1] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x78a) [0xf7f8c206]
[saem9:10337] [ 2] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/openmpi/mca_plm_rsh.so [0xf7d13564]
[saem9:10337] [ 3] mpirun [0x804b49d]
[saem9:10337] [ 4] mpirun [0x804a456]
[saem9:10337] [ 5] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7d348ac]
[saem9:10337] [ 6] mpirun(orte_daemon_recv+0x201) [0x804a3b1]
[saem9:10337] *** End of error message ***
Segmentation fault



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] bug?

2009-09-25 Thread Eugene Loh

Thanks, filed as https://svn.open-mpi.org/trac/ompi/ticket/2030






Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014

2009-09-25 Thread Ralph Castain
I think there is a problem with this change - here is a warning I get  
when compiling on Mac and Linux:


ompi_debuggers.c:265: warning: no previous prototype for ‘MPIR_Breakpoint’


Can you please take a look?

Thanks
Ralph
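
For reference, GCC emits "no previous prototype" (under `-Wmissing-prototypes`) when a non-static function is defined without an earlier declaration. Assuming the function has to stay non-static, as r22014 intends, one way to quiet the warning is to declare a prototype ahead of the definition. This is a minimal sketch, not necessarily how it was or should be fixed in the tree; where the prototype belongs (this file or a header) is left open.

```c
#include <stddef.h>

/* Prototype ahead of the definition, so the compiler has a previous
 * declaration to check the definition against. */
void *MPIR_Breakpoint(void);

/* Left non-static so the symbol remains externally visible. */
void *MPIR_Breakpoint(void)
{
    return NULL;
}
```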

On Sep 25, 2009, at 1:14 PM, emall...@osl.iu.edu wrote:


Author: emallove
Date: 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
New Revision: 22014
URL: https://svn.open-mpi.org/trac/ompi/changeset/22014

Log:
Remove `static` from `MPIR_Breakpoint` so Intel compilers will not inline it


Text files modified:
  trunk/ompi/debuggers/ompi_debuggers.c | 2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Modified: trunk/ompi/debuggers/ompi_debuggers.c
===================================================================

--- trunk/ompi/debuggers/ompi_debuggers.c   (original)
+++ trunk/ompi/debuggers/ompi_debuggers.c	2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)

@@ -261,7 +261,7 @@
 * defined in orterun for the starter.  It should never conflict with
 * this one, but we'll make it static, just to be sure.
 */
-static void *MPIR_Breakpoint(void)
+void *MPIR_Breakpoint(void)
{
return NULL;
}
___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn