I would like to be able to start a non-oversubscribed run of a program in 
OpenMPI as if it were oversubscribed, so that the processes run in Degraded 
Mode. That would give me the option to start a second, simultaneous run on 
the same nodes if necessary.
(Basically, I have a program that asks for some data, runs for a while, 
prints some results, then stops and asks for more data. Collecting and 
entering the additional data takes some time, so I would like to start a 
second instance of the program that can run while I'm entering data for the 
first instance, and accept input while the first instance is running.)

Since I have single-processor nodes, the obvious solution would be to set 
slots=0 for each of my nodes, so that using 1 slot for every run causes the 
nodes to be oversubscribed. However, it seems that slots=0 is treated like 
slots=infinity, so my processes run in Aggressive Mode, and I lose the ability 
to oversubscribe my nodes with two independent processes.
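The behavior described above boils down to a per-node oversubscription test. 
This is a hypothetical sketch, not OpenMPI's actual code: `is_oversubscribed`, 
`progress_mode`, and the `zero_means_unlimited` switch are illustrative names, 
and it assumes a node runs in Degraded Mode exactly when it hosts more 
processes than slots.

```python
# Hypothetical sketch (not OpenMPI source): choose a progress mode per node.
# zero_means_unlimited models the observed treatment of slots=0 as "infinity".

def is_oversubscribed(procs_on_node: int, slots: int, zero_means_unlimited: bool) -> bool:
    """Return True if the node hosts more processes than it has slots."""
    if slots == 0 and zero_means_unlimited:
        return False  # slots=0 read as "unlimited", so never oversubscribed
    return procs_on_node > slots

def progress_mode(procs_on_node: int, slots: int, zero_means_unlimited: bool = True) -> str:
    """Degraded Mode when oversubscribed, Aggressive Mode otherwise."""
    if is_oversubscribed(procs_on_node, slots, zero_means_unlimited):
        return "degraded"
    return "aggressive"

# One process per run on a single-processor node listed with slots=0:
print(progress_mode(1, 0))                              # prints aggressive (observed)
print(progress_mode(1, 0, zero_means_unlimited=False))  # prints degraded (requested)
```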

So, I tried setting '--mca mpi_yield_when_idle 1', since this sounded like it 
was meant to force Degraded Mode. But it didn't seem to do anything: my 
processes still ran in Aggressive Mode. I skimmed through the source code 
quickly, and it doesn't look like mpi_yield_when_idle is ever actually used.
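Roughly, the difference between the two modes is whether a process waiting 
for a message spins on the CPU or gives it up between polls. The following is 
only a conceptual sketch, not OpenMPI's progress engine; `wait_for` and 
`make_flag` are hypothetical names, and the `yield_when_idle` argument merely 
stands in for what mpi_yield_when_idle is expected to control.

```python
import os
import time

# Use sched_yield where the OS provides it (POSIX); otherwise fall back to sleep(0).
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))

def wait_for(flag, yield_when_idle: bool, timeout: float = 1.0) -> int:
    """Poll flag() until it returns True and return the number of failed polls.

    With yield_when_idle=False the loop spins continuously (like Aggressive
    Mode); with True it yields the CPU after each failed poll (like Degraded
    Mode), so another process on the same CPU can run.
    """
    polls = 0
    deadline = time.monotonic() + timeout
    while not flag():
        polls += 1
        if time.monotonic() > deadline:
            raise TimeoutError("flag never became true")
        if yield_when_idle:
            _yield_cpu()
    return polls

def make_flag(ready_after: int):
    """Return a flag that becomes True after `ready_after` failed checks."""
    calls = 0
    def flag() -> bool:
        nonlocal calls
        calls += 1
        return calls > ready_after
    return flag

print(wait_for(make_flag(3), yield_when_idle=True))  # prints 3
```

When two runs share a single-processor node, the yielding variant lets a run 
that is idle hand its time slice to the one that is computing, instead of 
spending it on polling.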

So, could either slots=0 be changed to really mean slots=0, or could 
mpi_yield_when_idle be implemented so I can force my processes to run in 
Degraded Mode?


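I also noticed another bug in the scheduler. With this hostfile:
 A slots=2 max-slots=2
 B slots=2 max-slots=2
'mpirun -np 5' quits with an over-subscription error (as expected, since 
max-slots allows at most 4 processes in total), but 'mpirun -np 3 --host B' 
hangs and just chews up CPU cycles forever instead of reporting the same error.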
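The capacity arithmetic behind the expected behavior can be sketched as 
follows. This is a hypothetical check, not OpenMPI's scheduler code; `fits` 
and `MAX_SLOTS` are illustrative names, and it assumes max-slots is a hard 
per-node cap on the number of processes.

```python
# Hypothetical pre-launch check (not OpenMPI source): max-slots as a hard cap.

MAX_SLOTS = {"A": 2, "B": 2}  # values from the hostfile above

def fits(np_requested: int, hosts: list[str], max_slots: dict[str, int]) -> bool:
    """Return True if np_requested processes fit within the hosts' max-slots."""
    capacity = sum(max_slots[h] for h in hosts)
    return np_requested <= capacity

# Under this assumption, both commands should fail the same way.
for np_requested, hosts in [(5, ["A", "B"]), (3, ["B"])]:
    verdict = "ok" if fits(np_requested, hosts, MAX_SLOTS) else "error: exceeds max-slots"
    print(f"mpirun -np {np_requested} on {hosts}: {verdict}")
```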


And finally, the FAQ entry at http://www.open-mpi.org/faq/?category=tuning 
("11. How do I tell Open MPI to use processor and/or memory affinity?") 
mentions that OpenMPI will automatically disable processor affinity on 
oversubscribed nodes. When I first read it, I assumed that processor affinity 
and Degraded Mode were incompatible. However, it seems that independent, 
non-oversubscribed processes running in Degraded Mode work fine with processor 
affinity; only processes that are actually oversubscribed have problems. It 
would be nice to add a note that Degraded Mode and processor affinity work 
together, even though processor affinity and oversubscription do not.

Thanks a ton!
-Paul
