Re: [OMPI users] round-robin scheduling question [hostfile]

2009-02-23 Thread Raymond Wan


Hi Ralph,


Ralph Castain wrote:
...
The man page will describe all the various options. Which one is best 
for your app really depends on what the app is doing, the capabilities 
and topology of your cluster, etc. A little experimentation can help you 
get a feel for when to use which one.



Thank you for the explanation!  So far, I've only been using -np and letting the rest sort itself out by magic :-) -- but I'll try the options you suggested, along with the other options in the mpirun man page, to see what works for my application...


Thanks again!

Ray



Re: [OMPI users] round-robin scheduling question [hostfile]

2009-02-21 Thread Ralph Castain


On Feb 21, 2009, at 1:05 AM, Raymond Wan wrote:



Hi Ralph,

Thank you very much for your explanation!


Ralph Castain wrote:

It is a little bit of both:
* historical, because most MPIs default to mapping by slot, and
* performance, because procs that share a node can communicate via  
shared memory, which is faster than sending messages over an  
interconnect, and most apps are communication-bound
If your app is disk-intensive, then mapping it -bynode may be a  
better



Ok -- by this, it seems that there is no "rule" that says one is  
obviously better than the other.  It depends on factors such as disk  
access and shared memory access, and which one dominates.  So, is it  
worth trying both to see?


Can't hurt! You might be able to tell by knowing what your app is  
doing, but otherwise, feel free to experiment.







option for you. That's why we provide it. Note, however, that you  
can still wind up with multiple procs on a node. All "bynode" means  
is that the ranks are numbered consecutively bynode - it doesn't  
mean that there is only one proc/node.




I see.  But if the number of processes (as specified using -np) is  
less than the number of nodes and "by node" is chosen, is it  
guaranteed that only one process will be on each node?


That is correct.


Is there a way to write the hostfile to ensure this?


You don't need to do anything in the hostfile - if you use bynode and  
np < #nodes, it is guaranteed that you will have only one proc/node
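
As a rough sketch (the host names, file name, and app name here are hypothetical): with a hostfile "myhosts" containing

   nodeA slots=4
   nodeB slots=4
   nodeC slots=4

a command such as

   mpirun -np 2 -bynode -hostfile myhosts ./my_app

should start rank 0 on nodeA and rank 1 on nodeB, with no node running more than one process.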




I was curious whether, if a node has 4 slots, writing it 4 times in  
the hostfile with 1 slot each has any meaning.  It might be a bad idea,  
as we would be trying to fool mpirun?


It won't have any meaning as we aggregate the results. In other words,  
we read through the hostfile, and if a host appears more than once, we  
simply add the #slots on subsequent entries to the earlier one. So we  
wind up with just one instance of that host that has the total number  
of slots allocated to it.
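
To sketch that aggregation with a hypothetical host name, a hostfile containing

   node1 slots=1
   node1 slots=1
   node1 slots=1
   node1 slots=1

is treated exactly like one containing

   node1 slots=4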







If you truly want one proc/node, then you should use the -pernode  
option. This maps one proc on each node up to either the number of  
procs you specified or the number of available nodes. If you don't  
specify -np, we just put one proc on each node in your allocation/hostfile.



I see ... I was not aware of that option; thank you!


Do a "man mpirun" and you will see that there are several mapping  
options that might interest you, including:


1. npernode - lets you specify how many procs/node (as opposed to  
"pernode", where you only get one proc/node - obviously, pernode is  
the equivalent of "-npernode 1")


2. seq - a sequential mapper. This mapper will read a file (which can  
be different from the hostfile used to specify your allocation) and  
assign one proc to each entry in a sequential manner like this:


node1 > rank 0 goes on node1
node5 > rank 1 goes on node5
node1 > rank 2 goes on node1
...

3. rank_file - allows you to specify that rank x goes on node foo, and  
what core/socket that rank should be bound to
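
A few sketches of how these could be invoked (the host names, file names, and the rankfile line format are illustrative assumptions, and the way the seq mapper is pointed at its file may vary; please check the mpirun man page for your Open MPI version before relying on them):

   # two procs on each node listed in the hostfile
   mpirun -npernode 2 -hostfile myhosts ./my_app

   # sequential mapper, selected through the rmaps MCA framework;
   # ranks follow the order of the entries in the file that is read
   mpirun -mca rmaps seq -hostfile seq_hosts ./my_app

   # rankfile mapping; "myrankfile" might contain lines such as
   #   rank 0=node1 slot=0
   #   rank 1=node5 slot=1:0
   mpirun -np 2 -rf myrankfile -hostfile myhosts ./my_app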


The man page will describe all the various options. Which one is best  
for your app really depends on what the app is doing, the capabilities  
and topology of your cluster, etc. A little experimentation can help  
you get a feel for when to use which one.


HTH
Ralph




Ray







Re: [OMPI users] round-robin scheduling question [hostfile]

2009-02-21 Thread Raymond Wan


Hi Ralph,

Thank you very much for your explanation!


Ralph Castain wrote:

It is a little bit of both:

* historical, because most MPIs default to mapping by slot, and

* performance, because procs that share a node can communicate via 
shared memory, which is faster than sending messages over an 
interconnect, and most apps are communication-bound


If your app is disk-intensive, then mapping it -bynode may be a better 



Ok -- by this, it seems that there is no "rule" that says one is obviously 
better than the other.  It depends on factors such as disk access and shared memory 
access, and which one dominates.  So, is it worth trying both to see?


option for you. That's why we provide it. Note, however, that you can 
still wind up with multiple procs on a node. All "bynode" means is that 
the ranks are numbered consecutively bynode - it doesn't mean that there 
is only one proc/node.




I see.  But if the number of processes (as specified using -np) is less than the number of nodes and "by node" is chosen, is it guaranteed that only one process will be on each node?  Is there a way to write the hostfile to ensure this?


I was curious whether, if a node has 4 slots, writing it 4 times in the hostfile 
with 1 slot each has any meaning.  It might be a bad idea, as we would be trying to fool 
mpirun?



If you truly want one proc/node, then you should use the -pernode 
option. This maps one proc on each node up to either the number of procs 
you specified or the number of available nodes. If you don't specify 
-np, we just put one proc on each node in your allocation/hostfile.



I see ... I was not aware of that option; thank you!

Ray





Re: [OMPI users] round-robin scheduling question [hostfile]

2009-02-20 Thread Ralph Castain

It is a little bit of both:

* historical, because most MPIs default to mapping by slot, and

* performance, because procs that share a node can communicate via  
shared memory, which is faster than sending messages over an  
interconnect, and most apps are communication-bound


If your app is disk-intensive, then mapping it -bynode may be a better  
option for you. That's why we provide it. Note, however, that you can  
still wind up with multiple procs on a node. All "bynode" means is  
that the ranks are numbered consecutively bynode - it doesn't mean  
that there is only one proc/node.


If you truly want one proc/node, then you should use the -pernode  
option. This maps one proc on each node up to either the number of  
procs you specified or the number of available nodes. If you don't  
specify -np, we just put one proc on each node in your allocation/hostfile.
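
Continuing the sketch above (hypothetical names), with that two-node hostfile

   mpirun -pernode -hostfile myhosts ./my_app

should start exactly one proc on nodeA and one on nodeB, no matter how many slots each node advertises.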


HTH
Ralph

On Feb 20, 2009, at 1:25 AM, Raymond Wan wrote:



Hi all,

According to FAQ 14 (How do I control how my processes are scheduled  
across nodes?) [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling],  
it says that the default scheduling policy is by slot and not by  
node.  I'm curious why the default is "by slot": I am thinking of  
explicitly specifying "by node", but I'm wondering whether there is an  
issue I haven't considered.
I would think that one reason for "by node" is to distribute HDD  
access across machines [as is the case for me since my program is  
HDD access intensive].  Or perhaps I am mistaken?  I'm now thinking  
that "by slot" is the default because processes with ranks that are  
close together might do similar tasks and you would want them on the  
same node?  Is that the reason?


Also, at the end of this FAQ, it says "NOTE:  This is the scheduling  
policy in Open MPI because of a long historical precedent..." --  
does this "This" refer to "the fact that there are two scheduling  
policies" or "the fact that 'by slot' is the default"?  If the  
latter, then that explains why "by slot" is the default, I guess...


Thank you!

Ray







[OMPI users] round-robin scheduling question [hostfile]

2009-02-20 Thread Raymond Wan


Hi all,

According to FAQ 14 (How do I control how my processes are scheduled across nodes?) [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling], it says that the default scheduling policy is by slot and not by node.  I'm curious why the default is "by slot": I am thinking of explicitly specifying "by node", but I'm wondering whether there is an issue I haven't considered.


I would think that one reason for "by node" is to distribute HDD access across machines 
[as is the case for me since my program is HDD access intensive].  Or perhaps I am mistaken?  I'm 
now thinking that "by slot" is the default because processes with ranks that are close 
together might do similar tasks and you would want them on the same node?  Is that the reason?

Also, at the end of this FAQ, it says "NOTE:  This is the scheduling policy in Open MPI because of a long historical 
precedent..." --  does this "This" refer to "the fact that there are two scheduling policies" or
"the fact that 'by slot' is the default"?  If the latter, then that explains why "by slot" is the default, I 
guess...

Thank you!

Ray