date:20140222

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain


On Feb 22, 2014, at 10:14 AM, Suraj Prabhakaran wrote:

>> Yeah, we added those capabilities specifically for this purpose. Indeed, 
>> another researcher added this to Torque a couple of years ago, though it 
>> didn't get pushed upstream. Also was added to Slurm.
> 
> Thanks for your help. By any chance, do you have more info on that one? Or a 
> faint idea where I can find some info on that? I never found anything like 
> that.

The Torque work was being done by someone in Ohio back in 2007 - never heard 
how it ended up. The name was Prakash Velayutham - you can search the user 
archives to find the email thread. This was a much older version of OMPI and 
things have changed a lot. IIRC, someone else since then also asked some 
questions indicating they were investigating it, but I don't offhand recall who 
and didn't immediately see it in the archives.

The Slurm work was done under my direction while at EMC. Jimmy Cao did the 
coding, but you won't find anything in our archives about it. It was committed 
to Slurm about a year ago and is in the current release. You have to configure 
Slurm to use it, but the orte/mca/ras framework was updated to support a 
dynamic alloc request (only the Slurm plugin actually implements it today).

HTH
Ralph

> 
> Best,
> Suraj
> 
> On Feb 22, 2014, at 6:38 PM, Ralph Castain wrote:
> 
>> 
>> On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran  
>> wrote:
>> 
>>> Thanks Ralph.
>>> 
>>> I cannot get rid of Torque since I am actually working on dynamic 
>>> allocation of nodes for a running job on Torque. What I actually want to do 
>>> is spawn processes on the dynamically assigned nodes since that is the 
>>> easiest way to expand MPI processes when a resource allocation is expanded. 
>> 
>> No problem - just set "-mca plm rsh" on your cmd line. You'll still use the 
>> Torque allocator, but use ssh to launch the results.
>> 
>>> 
>>> I also considered whether any of my changes to the Torque daemons 
>>> could be the problem, but they cannot be, for two reasons.
>>> 
>>> 1. For the cases which I have sent you, no dynamic allocation is done. Just 
>>> using MPI_Comm_spawn on a normal allocation of resources. So my changes to 
>>> torque are irrelevant here as they are not even called. 
>>> 2. Further, the processes start successfully on all the nodes. Torque logs 
>>> don't report any problems and the processes do exist on all the nodes. And 
>>> the fact that "sometimes" they work and don't have a problem! 
>>> 
>>> I am not sure how many users really use MPI_Comm_spawn (spawning large 
>>> processes) under the Torque environment to actually notice such a problem, 
>>> because mpiexec works just fine for any number of processes. 
>> 
>> Not many - comm_spawn is only used by researchers upon occasion. I haven't 
>> seen a "real" application yet, though we may just have not heard about it. 
>> Still, we do have users with Torque, and perhaps someone can check it.
>> 
>>> 
>>> Any suggestions or hints on this would be highly appreciated. OpenMPI also 
>>> seems to be the only implementation we can use for this work at the moment 
>>> because of the "add-host" info argument for MPI_Comm_spawn which we are 
>>> using comfortably when spawning onto dynamically allocated hosts which were 
>>> not a part of the original allocation. 
>> 
>> Yeah, we added those capabilities specifically for this purpose. Indeed, 
>> another researcher added this to Torque a couple of years ago, though it 
>> didn't get pushed upstream. Also was added to Slurm.
>> 
>> Sadly, I no longer have access to a Torque machine and so I can only offer 
>> advice. OMPI is executing a state machine, so you could look at one of the 
>> procs on a machine where they are stalled (look for someone not reporting 
>> out of the modex) and see where it hung. You can also watch it move thru the 
>> state machine by setting
>> 
>> -mca state_base_verbose 10
>> 
>> on your command line.
>> 
>> Happy to provide advice - sorry for the problem
>> Ralph
>> 
>> 
>>> 
>>> Best,
>>> Suraj
>>> 
>>> 
>>> On Feb 22, 2014, at 4:30 PM, Ralph Castain wrote:
>>> 
>>>> 
>>>> On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran wrote:
>>>> 
>>>>> Hmm.. but actually the MPI_Comm_spawn of parents and MPI_Init of 
>>>>> children never returned!
>>>> 
>>>> Understood - my point was that the output shows no errors or issues. For 
>>>> some reason, the progress thread appears to just stop. This usually 
>>>> indicates some kind of recursive behavior, but that isn't showing up in 
>>>> the output.
>>>> 
>>>>> 
>>>>> I configured MPI with 
>>>>> 
>>>>> ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/
>>>> 
>>>> Should be fine. I don't have access to a Torque-based system, and we 
>>>> aren't hearing issues from other Torque users, so this may have something 
>>>> to do with how Torque is configured on your system. Perhaps

[OMPI devel] openmpi-1.7.5a1r30797 fails building on SL 5.5

2014-02-22 Thread Adrian Reber

On a Scientific Linux 5.5 system, the nightly snapshot
openmpi-1.7.5a1r30797 fails to build with the following errors:


Making all in romio
make[3]: Entering directory 
`/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio/romio'
make[4]: Entering directory 
`/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio/romio'
make[4]: Leaving directory 
`/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio/romio'
make[3]: Leaving directory 
`/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio/romio'
make[3]: Entering directory 
`/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio'
  CCLD mca_io_romio.la
romio/.libs/libromio_dist.a(delete.o): In function `lstat64':
delete.c:(.text+0x0): multiple definition of `lstat64'
romio/.libs/libromio_dist.a(close.o):close.c:(.text+0x0): first defined here
romio/.libs/libromio_dist.a(fsync.o): In function `lstat64':
fsync.c:(.text+0x0): multiple definition of `lstat64'
romio/.libs/libromio_dist.a(close.o):close.c:(.text+0x0): first defined here
romio/.libs/libromio_dist.a(get_amode.o): In function `lstat64':
get_amode.c:(.text+0x0): multiple definition of `lstat64'
romio/.libs/libromio_dist.a(close.o):close.c:(.text+0x0): first defined here
romio/.libs/libromio_dist.a(get_atom.o): In function `lstat64':
get_atom.c:(.text+0x0): multiple definition of `lstat64'

and many more of those errors. 1.7.4 also fails.

The following can be seen during configure (with no parameters):

WARNING: Unknown architecture ... proceeding anyway

Adrian

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran

> Yeah, we added those capabilities specifically for this purpose. Indeed, 
> another researcher added this to Torque a couple of years ago, though it 
> didn't get pushed upstream. Also was added to Slurm.

Thanks for your help. By any chance, do you have more info on that one? Or a faint 
idea where I can find some info on that? I never found anything like that.

Best,
Suraj

On Feb 22, 2014, at 6:38 PM, Ralph Castain wrote:

> 
> On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran  
> wrote:
> 
>> Thanks Ralph.
>> 
>> I cannot get rid of Torque since I am actually working on dynamic allocation 
>> of nodes for a running job on Torque. What I actually want to do is spawn 
>> processes on the dynamically assigned nodes since that is the easiest 
>> way to expand MPI processes when a resource allocation is expanded. 
> 
> No problem - just set "-mca plm rsh" on your cmd line. You'll still use the 
> Torque allocator, but use ssh to launch the results.
> 
>> 
>> I also considered whether any of my changes to the Torque daemons could 
>> be the problem, but they cannot be, for two reasons.
>> 
>> 1. For the cases which I have sent you, no dynamic allocation is done. Just 
>> using MPI_Comm_spawn on a normal allocation of resources. So my changes to 
>> torque are irrelevant here as they are not even called. 
>> 2. Further, the processes start successfully on all the nodes. Torque logs 
>> don't report any problems and the processes do exist on all the nodes. And 
>> the fact that "sometimes" they work and don't have a problem! 
>> 
>> I am not sure how many users really use MPI_Comm_spawn (spawning large 
>> processes) under the Torque environment to actually notice such a problem, 
>> because mpiexec works just fine for any number of processes. 
> 
> Not many - comm_spawn is only used by researchers upon occasion. I haven't 
> seen a "real" application yet, though we may just have not heard about it. 
> Still, we do have users with Torque, and perhaps someone can check it.
> 
>> 
>> Any suggestions or hints on this would be highly appreciated. OpenMPI also 
>> seems to be the only implementation we can use for this work at the moment 
>> because of the "add-host" info argument for MPI_Comm_spawn which we are 
>> using comfortably when spawning onto dynamically allocated hosts which were 
>> not a part of the original allocation. 
> 
> Yeah, we added those capabilities specifically for this purpose. Indeed, 
> another researcher added this to Torque a couple of years ago, though it 
> didn't get pushed upstream. Also was added to Slurm.
> 
> Sadly, I no longer have access to a Torque machine and so I can only offer 
> advice. OMPI is executing a state machine, so you could look at one of the 
> procs on a machine where they are stalled (look for someone not reporting out 
> of the modex) and see where it hung. You can also watch it move thru the 
> state machine by setting
> 
> -mca state_base_verbose 10
> 
> on your command line.
> 
> Happy to provide advice - sorry for the problem
> Ralph
> 
> 
>> 
>> Best,
>> Suraj
>> 
>> 
>> On Feb 22, 2014, at 4:30 PM, Ralph Castain wrote:
>> 
>>> 
>>> On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran 
>>>  wrote:
>>> 
>>>> Hmm.. but actually the MPI_Comm_spawn of parents and MPI_Init of children 
>>>> never returned!
>>> 
>>> Understood - my point was that the output shows no errors or issues. For 
>>> some reason, the progress thread appears to just stop. This usually 
>>> indicates some kind of recursive behavior, but that isn't showing up in the 
>>> output.
>>> 
>>>> 
>>>> I configured MPI with 
>>>> 
>>>> ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/
>>> 
>>> Should be fine. I don't have access to a Torque-based system, and we aren't 
>>> hearing issues from other Torque users, so this may have something to do 
>>> with how Torque is configured on your system. Perhaps someone with a 
>>> Torque-based system on the list could also test this?
>>> 
>>> Meantime, I would suggest just using rsh/ssh (since you said that works) 
>>> for now as Torque really isn't doing anything for you in this use-case.
>>> 
>>> 
>>>> 
>>>> 
>>>> On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote:
>>>> 
>>>>> Strange - it all looks just fine. How was OMPI configured?
>>>>> 
>>>>> On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran wrote:
>>>>> 
>>>>>> Ok, I figured out that it was not a problem with the node grsacc04 
>>>>>> because I now conducted the same on totally different set of nodes. 
>>>>>> 
>>>>>> I must really say that with --bind-to none option, the program completed 
>>>>>> "many" times compared to earlier but still "sometimes" it hangs! 
>>>>>> Attaching now the output of the same case conducted on different set of 
>>>>>> nodes with the --bind-to none option.
>>>>>> 
>>>>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>>>> grpcomm_base_verbose

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain


On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran wrote:

> Thanks Ralph.
> 
> I cannot get rid of Torque since I am actually working on dynamic allocation 
> of nodes for a running job on Torque. What I actually want to do is spawn 
> processes on the dynamically assigned nodes since that is the easiest 
> way to expand MPI processes when a resource allocation is expanded. 

No problem - just set "-mca plm rsh" on your cmd line. You'll still use the 
Torque allocator, but use ssh to launch the results.

> 
> I also considered whether any of my changes to the Torque daemons could 
> be the problem, but they cannot be, for two reasons.
> 
> 1. For the cases which I have sent you, no dynamic allocation is done. Just 
> using MPI_Comm_spawn on a normal allocation of resources. So my changes to 
> torque are irrelevant here as they are not even called. 
> 2. Further, the processes start successfully on all the nodes. Torque logs 
> don't report any problems and the processes do exist on all the nodes. And 
> the fact that "sometimes" they work and don't have a problem! 
> 
> I am not sure how many users really use MPI_Comm_spawn (spawning large 
> processes) under the Torque environment to actually notice such a problem, 
> because mpiexec works just fine for any number of processes. 

Not many - comm_spawn is only used by researchers upon occasion. I haven't seen 
a "real" application yet, though we may just have not heard about it. Still, we 
do have users with Torque, and perhaps someone can check it.

> 
> Any suggestions or hints on this would be highly appreciated. OpenMPI also 
> seems to be the only implementation we can use for this work at the moment 
> because of the "add-host" info argument for MPI_Comm_spawn which we are using 
> comfortably when spawning onto dynamically allocated hosts which were not a 
> part of the original allocation. 

Yeah, we added those capabilities specifically for this purpose. Indeed, 
another researcher added this to Torque a couple of years ago, though it didn't 
get pushed upstream. Also was added to Slurm.

Sadly, I no longer have access to a Torque machine and so I can only offer 
advice. OMPI is executing a state machine, so you could look at one of the 
procs on a machine where they are stalled (look for someone not reporting out 
of the modex) and see where it hung. You can also watch it move thru the state 
machine by setting

-mca state_base_verbose 10

on your command line.
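
As a rough sketch, the two suggestions in this message (the rsh launcher and the state-machine verbosity) could be combined into one small launch script. The flags and ./simple_spawn are taken from this thread; the script itself and the log file name are assumptions for illustration only. If mpiexec is not on the PATH, the script just prints the command it would have run.

```shell
#!/bin/sh
# Sketch only: every flag below is quoted from this thread, and ./simple_spawn
# is the test program named in it. The log file name is a made-up placeholder.
LAUNCH_CMD="mpiexec -mca plm rsh -mca state_base_verbose 10 --bind-to none -np 3 ./simple_spawn"
LOG_FILE="spawn-state.log"   # hypothetical name, not from the thread

if command -v mpiexec >/dev/null 2>&1; then
    # Capture stderr as well as stdout, since verbose MCA output may be
    # written to either stream.
    $LAUNCH_CMD 2>&1 | tee "$LOG_FILE"
else
    echo "mpiexec not found; would run: $LAUNCH_CMD"
fi
```

Keeping the log makes it easier to compare a hung run against a run that completed.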

Happy to provide advice - sorry for the problem
Ralph


> 
> Best,
> Suraj
> 
> 
> On Feb 22, 2014, at 4:30 PM, Ralph Castain wrote:
> 
>> 
>> On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran  
>> wrote:
>> 
>>> Hmm.. but actually the MPI_Comm_spawn of parents and MPI_Init of children 
>>> never returned!
>> 
>> Understood - my point was that the output shows no errors or issues. For 
>> some reason, the progress thread appears to just stop. This usually 
>> indicates some kind of recursive behavior, but that isn't showing up in the 
>> output.
>> 
>>> 
>>> I configured MPI with 
>>> 
>>> ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/
>> 
>> Should be fine. I don't have access to a Torque-based system, and we aren't 
>> hearing issues from other Torque users, so this may have something to do 
>> with how Torque is configured on your system. Perhaps someone with a 
>> Torque-based system on the list could also test this?
>> 
>> Meantime, I would suggest just using rsh/ssh (since you said that works) for 
>> now as Torque really isn't doing anything for you in this use-case.
>> 
>> 
>>> 
>>> 
>>> On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote:
>>> 
>>>> Strange - it all looks just fine. How was OMPI configured?
>>>> 
>>>> On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran wrote:
>>>> 
>>>>> Ok, I figured out that it was not a problem with the node grsacc04 
>>>>> because I now conducted the same on totally different set of nodes. 
>>>>> 
>>>>> I must really say that with --bind-to none option, the program completed 
>>>>> "many" times compared to earlier but still "sometimes" it hangs! 
>>>>> Attaching now the output of the same case conducted on different set of 
>>>>> nodes with the --bind-to none option.
>>>>> 
>>>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>>> grpcomm_base_verbose 5 --bind-to none -np 3 ./example
>>>>> 
>>>>> Best,
>>>>> Suraj
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Feb 21, 2014, at 5:03 PM, Ralph Castain wrote:
>>>>> 
>>>>>> Well, that all looks fine. However, I note that the procs on grsacc04 
>>>>>> all stopped making progress at the same point, which is why the job 
>>>>>> hung. All the procs on the other nodes were just fine.
>>>>>> 
>>>>>> So let's try a couple of things:
>>>>>> 
>>>>>> 1. add "--bind-to none" to your cmd line so we avoid any contention 
>>>>>> issues
>>>>>> 
>>>>>> 2. if possible, remove grsacc04 from the

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran

Thanks Ralph.

I cannot get rid of Torque since I am actually working on dynamic allocation of 
nodes for a running job on Torque. What I actually want to do is spawn 
processes on the dynamically assigned nodes since that is the easiest way 
to expand MPI processes when a resource allocation is expanded. 

I also considered whether any of my changes to the Torque daemons could be 
the problem, but they cannot be, for two reasons.

1. For the cases which I have sent you, no dynamic allocation is done. Just 
using MPI_Comm_spawn on a normal allocation of resources. So my changes to 
torque are irrelevant here as they are not even called. 
2. Further, the processes start successfully on all the nodes. Torque logs 
don't report any problems and the processes do exist on all the nodes. And the 
fact that "sometimes" they work and don't have a problem! 

I am not sure how many users really use MPI_Comm_spawn (spawning large 
processes) under the Torque environment to actually notice such a problem, 
because mpiexec works just fine for any number of processes. 

Any suggestions or hints on this would be highly appreciated. OpenMPI also 
seems to be the only implementation we can use for this work at the moment 
because of the "add-host" info argument for MPI_Comm_spawn which we are using 
comfortably when spawning onto dynamically allocated hosts which were not a 
part of the original allocation. 

Best,
Suraj

On Feb 22, 2014, at 4:30 PM, Ralph Castain wrote:

> 
> On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran  
> wrote:
> 
>> Hmm.. but actually the MPI_Comm_spawn of parents and MPI_Init of children 
>> never returned!
> 
> Understood - my point was that the output shows no errors or issues. For some 
> reason, the progress thread appears to just stop. This usually indicates some 
> kind of recursive behavior, but that isn't showing up in the output.
> 
>> 
>> I configured MPI with 
>> 
>> ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/
> 
> Should be fine. I don't have access to a Torque-based system, and we aren't 
> hearing issues from other Torque users, so this may have something to do with 
> how Torque is configured on your system. Perhaps someone with a Torque-based 
> system on the list could also test this?
> 
> Meantime, I would suggest just using rsh/ssh (since you said that works) for 
> now as Torque really isn't doing anything for you in this use-case.
> 
> 
>> 
>> 
>> On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote:
>> 
>>> Strange - it all looks just fine. How was OMPI configured?
>>> 
>>> On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran 
>>>  wrote:
>>> 
>>>> Ok, I figured out that it was not a problem with the node grsacc04 because 
>>>> I now conducted the same on totally different set of nodes. 
>>>>
>>>> I must really say that with --bind-to none option, the program completed 
>>>> "many" times compared to earlier but still "sometimes" it hangs! Attaching 
>>>> now the output of the same case conducted on different set of nodes with 
>>>> the --bind-to none option.
>>>>
>>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>> grpcomm_base_verbose 5 --bind-to none -np 3 ./example
>>>>
>>>> Best,
>>>> Suraj
>>>>
>>>> On Feb 21, 2014, at 5:03 PM, Ralph Castain wrote:
>>>>
>>>>> Well, that all looks fine. However, I note that the procs on grsacc04 all 
>>>>> stopped making progress at the same point, which is why the job hung. All 
>>>>> the procs on the other nodes were just fine.
>>>>> 
>>>>> So let's try a couple of things:
>>>>> 
>>>>> 1. add "--bind-to none" to your cmd line so we avoid any contention issues
>>>>> 
>>>>> 2. if possible, remove grsacc04 from the allocation (you can just use the 
>>>>> -host option on the cmd line to ignore it), and/or replace that host with 
>>>>> another one. Let's see if the problem has something to do with that 
>>>>> specific node.
>>>>> 
>>>>> 
>>>>> On Feb 21, 2014, at 4:08 AM, Suraj Prabhakaran wrote:
>>>>> 
>>>>>> Right, so I have the output here. Same case, 
>>>>>> 
>>>>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>>>> grpcomm_base_verbose 5  -np 3 ./simple_spawn
>>>>>> 
>>>>>> Output attached. 
>>>>>> 
>>>>>> Best,
>>>>>> Suraj
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Feb 21, 2014, at 5:30 AM, Ralph Castain wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Feb 20, 2014, at 7:05 PM, Suraj Prabhakaran wrote:
>>>>>>> 
>>>>>>>> Thanks Ralph!
>>>>>>>>
>>>>>>>> I must have mentioned though. Without the Torque environment, spawning 
>>>>>>>> with ssh works ok. But Under the torque environment, not. 
>>>>>>> 
>>>>>>> Ah, no - you forgot to mention that point.
>>>>>>> 
>>>>>>>>
>>>>>>>> I started the simple_spawn with 3 processes and spawned 9 processes (3 
>>>>>>>> per node on 3 nodes). 
>>>>>>>>
>>>>>>>> There is no problem with the Torque environment

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain


On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran wrote:

> Hmm.. but actually the MPI_Comm_spawn of parents and MPI_Init of children 
> never returned!

Understood - my point was that the output shows no errors or issues. For some 
reason, the progress thread appears to just stop. This usually indicates some 
kind of recursive behavior, but that isn't showing up in the output.

> 
> I configured MPI with 
> 
> ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/

Should be fine. I don't have access to a Torque-based system, and we aren't 
hearing issues from other Torque users, so this may have something to do with 
how Torque is configured on your system. Perhaps someone with a Torque-based 
system on the list could also test this?

Meantime, I would suggest just using rsh/ssh (since you said that works) for 
now as Torque really isn't doing anything for you in this use-case.


> 
> 
> On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote:
> 
>> Strange - it all looks just fine. How was OMPI configured?
>> 
>> On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran  
>> wrote:
>> 
>>> Ok, I figured out that it was not a problem with the node grsacc04 because 
>>> I now conducted the same on totally different set of nodes. 
>>> 
>>> I must really say that with --bind-to none option, the program completed 
>>> "many" times compared to earlier but still "sometimes" it hangs! Attaching 
>>> now the output of the same case conducted on different set of nodes with 
>>> the --bind-to none option.
>>> 
>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>> grpcomm_base_verbose 5 --bind-to none -np 3 ./example
>>> 
>>> Best,
>>> Suraj
>>> 
>>> 
>>> 
>>> 
>>> On Feb 21, 2014, at 5:03 PM, Ralph Castain wrote:
>>> 
>>>> Well, that all looks fine. However, I note that the procs on grsacc04 all 
>>>> stopped making progress at the same point, which is why the job hung. All 
>>>> the procs on the other nodes were just fine.
>>>> 
>>>> So let's try a couple of things:
>>>> 
>>>> 1. add "--bind-to none" to your cmd line so we avoid any contention issues
>>>> 
>>>> 2. if possible, remove grsacc04 from the allocation (you can just use the 
>>>> -host option on the cmd line to ignore it), and/or replace that host with 
>>>> another one. Let's see if the problem has something to do with that 
>>>> specific node.
>>>> 
>>>> 
>>>> On Feb 21, 2014, at 4:08 AM, Suraj Prabhakaran wrote:
>>>> 
>>>>> Right, so I have the output here. Same case, 
>>>>> 
>>>>> mpiexec  -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>>> grpcomm_base_verbose 5  -np 3 ./simple_spawn
>>>>> 
>>>>> Output attached. 
>>>>> 
>>>>> Best,
>>>>> Suraj
>>>>> 
>>>>> 
>>>>> 
>>>>> On Feb 21, 2014, at 5:30 AM, Ralph Castain wrote:
>>>>> 
>>>>>> 
>>>>>> On Feb 20, 2014, at 7:05 PM, Suraj Prabhakaran wrote:
>>>>>> 
>>>>>>> Thanks Ralph!
>>>>>>> 
>>>>>>> I must have mentioned though. Without the Torque environment, spawning 
>>>>>>> with ssh works ok. But Under the torque environment, not. 
>>>>>> 
>>>>>> Ah, no - you forgot to mention that point.
>>>>>> 
>>>>>>> 
>>>>>>> I started the simple_spawn with 3 processes and spawned 9 processes (3 
>>>>>>> per node on 3 nodes). 
>>>>>>> 
>>>>>>> There is no problem with the Torque environment because all the 9 
>>>>>>> processes are started on the respective nodes. But the MPI_Comm_spawn 
>>>>>>> of the parent and MPI_Init of the children, "sometimes" don't return!
>>>>>> 
>>>>>> Seems odd - the launch environment has nothing to do with MPI_Init, so 
>>>>>> if the processes are indeed being started, they should run. One 
>>>>>> possibility is that they aren't correctly getting some wireup info.
>>>>>> 
>>>>>> Can you configure OMPI --enable-debug and then rerun the example with 
>>>>>> "-mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca 
>>>>>> grpcomm_base_verbose 5" on the command line?
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> This is the output of simple_spawn - which confirms the above 
>>>>>>> statement. 
>>>>>>> 
>>>>>>> [pid 31208] starting up!
>>>>>>> [pid 31209] starting up!
>>>>>>> [pid 31210] starting up!
>>>>>>> 0 completed MPI_Init
>>>>>>> Parent [pid 31208] about to spawn!
>>>>>>> 1 completed MPI_Init
>>>>>>> Parent [pid 31209] about to spawn!
>>>>>>> 2 completed MPI_Init
>>>>>>> Parent [pid 31210] about to spawn!
>>>>>>> [pid 28630] starting up!
>>>>>>> [pid 28631] starting up!
>>>>>>> [pid 9846] starting up!
>>>>>>> [pid 9847] starting up!
>>>>>>> [pid 9848] starting up!
>>>>>>> [pid 6363] starting up!
>>>>>>> [pid 6361] starting up!
>>>>>>> [pid 6362] starting up!
>>>>>>> [pid 28632] starting up!
>>>>>>> 
>>>>>>> Any hints?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Suraj
>>>>>>> 
>>>>>>> On Feb 21, 2014, at 3:44 AM, Ralph Castain wrote:
>>>>>>> 
>>>>>>>> Hmmm...I don't see anything immediately glaring. What do you mean by 
>>>>>>>> "doesn't work"? Is there some specific
