Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Jeff Squyres
I think the point is that as a group, we consciously, deliberately,  
and painfully decided not to support multi-cluster.  And as a result,  
we ripped out a lot of supporting code.  Starting down this path again  
will likely result in a) re-opening all the discussions, b) re-adding  
a lot of code (or code effectively similar to what was there before).   
Let's not forget that there were many unsolved problems surrounding  
multi-cluster last time, too.


It was also pointed out in Ralph's mails that, at least from the  
descriptions provided, adding the field in orte_node_t does not  
actually solve the problem that ORNL is trying to solve.


If we, as a group, decide to re-add all this stuff, then a) recognize  
that we are flip-flopping *again* on this issue, and b) it will take a  
lot of coding effort to do so.  I do think that since this was a group  
decision last time, it should be a group decision this time, too.  If  
this does turn out to be as large a sub-project as described, I  
would be opposed to the development occurring on the trunk; hg trees  
are perfect for this kind of stuff.


I personally have no customers who are doing cross-cluster kinds of  
things, so I don't personally care if cross-cluster functionality  
works its way [back] in.  But I recognize that OMPI core members are  
investigating it.  So the points I'm making are procedural; I have no  
real dog in this fight...



On Sep 22, 2008, at 4:40 PM, George Bosilca wrote:


Ralph,

There is NO need to have this discussion again; it was painful
enough last time. From my perspective, I do not understand why you
are making so much noise about this one. How a 4-line change in some
ALPS-specific files (a Cray system very specific to ORNL) can
generate more than 3 A4 pages of email is still beyond my
comprehension.


If they want to do multi-cluster, and they do not break anything in
ORTE/OMPI, and they do not ask other people to do it for them, why
try to stop them?


 george.

On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote:

There was a very long drawn-out discussion about this early in  
2007. Rather than rehash all that, I'll try to summarize it here.  
It may get confusing - it helped a whole lot to be in a room with a  
whiteboard. There were also presentations on the subject - I  
believe the slides may still be in the docs repository.


Because terminology quickly gets confusing, we adopted a slightly  
different one for these discussions. We talk about OMPI being a  
"single cell" system - i.e., jobs executed via mpirun can only span  
nodes that are reachable by that mpirun. In a typical managed  
environment, a cell aligns quite well with a "cluster". In an  
unmanaged environment where the user provides a hostfile, the cell  
will contain all nodes specified in the hostfile.


We don't filter or abort for non-matching hostnames - if mpirun can  
launch on that node, then great. What we don't support is asking  
mpirun to remotely execute another mpirun on the frontend of  
another cell in order to launch procs on the nodes in -that- cell,  
nor do we ask mpirun to in any way manage (or even know about) any  
procs running on a remote cell.


I see what you are saying about the ALPS node name. However, the  
field you want to add doesn't have anything to do with accept/ 
connect. The orte_node_t object is used solely by mpirun to keep  
track of the node pool it controls - i.e., the nodes upon which it  
is launching jobs. Thus, the mpirun on cluster A will have  
"nid" entries it got from its allocation, and the mpirun on  
cluster B will have "nid" entries it got from its allocation -  
but the two mpiruns will never exchange that information, nor will  
the mpirun on cluster A ever have a need to know the node entries  
for cluster B. Each mpirun launches and manages procs -only- on the  
nodes in its own allocation.


I agree you will have issues when doing the connect/accept modex as  
the nodenames are exchanged and are no longer unique in your  
scenario. However, that info stays in the  ompi_proc_t - it never  
gets communicated to the ORTE layer as we couldn't care less down  
there about the remote procs since they are under the control of a  
different mpirun. So if you need to add a cluster id field for this  
purpose, it needs to go in ompi_proc_t - not in the orte structures.


And for that, you probably need to discuss it with the MPI team as  
changes to ompi_proc_t will likely generate considerable discussion.


FWIW: this is one reason I warned Galen about the problems in  
reviving multi-cluster operations again. We used to deal with multi- 
cells in the process name itself, but all that support has been  
removed from OMPI.


Hope that helps
Ralph

On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote:


I may be opening a can of worms...

But, what prevents a user from running across clusters in a "normal
OMPI", i.e., non-ALPS environment?  When he puts hosts int

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Ralph Castain

Because:

1. last time we went through this, it all started with a single pebble  
in the pond - and I don't want to get engulfed again; and


2. if you bothered to read the email, then you would see that I  
pointed out this change doesn't even do what they are trying to do.  
The change needs to be done elsewhere.


I'm not trying to stop them - I just don't want to go back to the
patch-our-way-to-hell methodology...and I want to point them to where
they need to make the change so it -will- work.


Ralph

On Sep 22, 2008, at 2:40 PM, George Bosilca wrote:


Ralph,

There is NO need to have this discussion again; it was painful
enough last time. From my perspective, I do not understand why you
are making so much noise about this one. How a 4-line change in some
ALPS-specific files (a Cray system very specific to ORNL) can
generate more than 3 A4 pages of email is still beyond my
comprehension.


If they want to do multi-cluster, and they do not break anything in
ORTE/OMPI, and they do not ask other people to do it for them, why
try to stop them?


 george.

On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote:

There was a very long drawn-out discussion about this early in  
2007. Rather than rehash all that, I'll try to summarize it here.  
It may get confusing - it helped a whole lot to be in a room with a  
whiteboard. There were also presentations on the subject - I  
believe the slides may still be in the docs repository.


Because terminology quickly gets confusing, we adopted a slightly  
different one for these discussions. We talk about OMPI being a  
"single cell" system - i.e., jobs executed via mpirun can only span  
nodes that are reachable by that mpirun. In a typical managed  
environment, a cell aligns quite well with a "cluster". In an  
unmanaged environment where the user provides a hostfile, the cell  
will contain all nodes specified in the hostfile.


We don't filter or abort for non-matching hostnames - if mpirun can  
launch on that node, then great. What we don't support is asking  
mpirun to remotely execute another mpirun on the frontend of  
another cell in order to launch procs on the nodes in -that- cell,  
nor do we ask mpirun to in any way manage (or even know about) any  
procs running on a remote cell.


I see what you are saying about the ALPS node name. However, the  
field you want to add doesn't have anything to do with accept/ 
connect. The orte_node_t object is used solely by mpirun to keep  
track of the node pool it controls - i.e., the nodes upon which it  
is launching jobs. Thus, the mpirun on cluster A will have  
"nid" entries it got from its allocation, and the mpirun on  
cluster B will have "nid" entries it got from its allocation -  
but the two mpiruns will never exchange that information, nor will  
the mpirun on cluster A ever have a need to know the node entries  
for cluster B. Each mpirun launches and manages procs -only- on the  
nodes in its own allocation.


I agree you will have issues when doing the connect/accept modex as  
the nodenames are exchanged and are no longer unique in your  
scenario. However, that info stays in the  ompi_proc_t - it never  
gets communicated to the ORTE layer as we couldn't care less down  
there about the remote procs since they are under the control of a  
different mpirun. So if you need to add a cluster id field for this  
purpose, it needs to go in ompi_proc_t - not in the orte structures.


And for that, you probably need to discuss it with the MPI team as  
changes to ompi_proc_t will likely generate considerable discussion.


FWIW: this is one reason I warned Galen about the problems in  
reviving multi-cluster operations again. We used to deal with multi- 
cells in the process name itself, but all that support has been  
removed from OMPI.


Hope that helps
Ralph

On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote:


I may be opening a can of worms...

But, what prevents a user from running across clusters in a "normal
OMPI", i.e., non-ALPS environment?  When he puts hosts into his
hostfile, does it parse and abort/filter non-matching hostnames?  The
problem for ALPS-based systems is that nodes are addressed via NID,PID
pairs at the portals level.  Thus, these are unique only within a
cluster.  In point of fact, I could rewrite all of the ALPS support to
identify the nodes by "cluster_id".NID.  It would be a bit inefficient
within a cluster because we would have to extract the NID from this
syntax as we go down to the portals layer.  It also would lead to a
larger degree of change within the OMPI ALPS code base.  However, I can
give ALPS-based systems the same feature set as the rest of the world.
It is just more efficient to use an additional pointer in the
orte_node_t structure, and it results in a far simpler code structure.
This makes it easier to maintain.

The only thing that "this change" really does is to identify the
cluster under which the ALPS allocatio

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread George Bosilca

Ralph,

There is NO need to have this discussion again; it was painful enough
last time. From my perspective, I do not understand why you are making
so much noise about this one. How a 4-line change in some ALPS-specific
files (a Cray system very specific to ORNL) can generate more than 3 A4
pages of email is still beyond my comprehension.


If they want to do multi-cluster, and they do not break anything in
ORTE/OMPI, and they do not ask other people to do it for them, why try
to stop them?


  george.

On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote:

There was a very long drawn-out discussion about this early in 2007.  
Rather than rehash all that, I'll try to summarize it here. It may  
get confusing - it helped a whole lot to be in a room with a  
whiteboard. There were also presentations on the subject - I believe  
the slides may still be in the docs repository.


Because terminology quickly gets confusing, we adopted a slightly  
different one for these discussions. We talk about OMPI being a  
"single cell" system - i.e., jobs executed via mpirun can only span  
nodes that are reachable by that mpirun. In a typical managed  
environment, a cell aligns quite well with a "cluster". In an  
unmanaged environment where the user provides a hostfile, the cell  
will contain all nodes specified in the hostfile.


We don't filter or abort for non-matching hostnames - if mpirun can  
launch on that node, then great. What we don't support is asking  
mpirun to remotely execute another mpirun on the frontend of another  
cell in order to launch procs on the nodes in -that- cell, nor do we  
ask mpirun to in any way manage (or even know about) any procs  
running on a remote cell.


I see what you are saying about the ALPS node name. However, the  
field you want to add doesn't have anything to do with accept/ 
connect. The orte_node_t object is used solely by mpirun to keep  
track of the node pool it controls - i.e., the nodes upon which it  
is launching jobs. Thus, the mpirun on cluster A will have "nid"  
entries it got from its allocation, and the mpirun on cluster B will  
have "nid" entries it got from its allocation - but the two  
mpiruns will never exchange that information, nor will the mpirun on  
cluster A ever have a need to know the node entries for cluster B.  
Each mpirun launches and manages procs -only- on the nodes in its  
own allocation.


I agree you will have issues when doing the connect/accept modex as  
the nodenames are exchanged and are no longer unique in your  
scenario. However, that info stays in the  ompi_proc_t - it never  
gets communicated to the ORTE layer as we couldn't care less down  
there about the remote procs since they are under the control of a  
different mpirun. So if you need to add a cluster id field for this  
purpose, it needs to go in ompi_proc_t - not in the orte structures.


And for that, you probably need to discuss it with the MPI team as  
changes to ompi_proc_t will likely generate considerable discussion.


FWIW: this is one reason I warned Galen about the problems in  
reviving multi-cluster operations again. We used to deal with multi- 
cells in the process name itself, but all that support has been  
removed from OMPI.


Hope that helps
Ralph

On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote:


I may be opening a can of worms...

But, what prevents a user from running across clusters in a "normal
OMPI", i.e., non-ALPS environment?  When he puts hosts into his
hostfile, does it parse and abort/filter non-matching hostnames?  The
problem for ALPS-based systems is that nodes are addressed via NID,PID
pairs at the portals level.  Thus, these are unique only within a
cluster.  In point of fact, I could rewrite all of the ALPS support to
identify the nodes by "cluster_id".NID.  It would be a bit inefficient
within a cluster because we would have to extract the NID from this
syntax as we go down to the portals layer.  It also would lead to a
larger degree of change within the OMPI ALPS code base.  However, I can
give ALPS-based systems the same feature set as the rest of the world.
It is just more efficient to use an additional pointer in the
orte_node_t structure, and it results in a far simpler code structure.
This makes it easier to maintain.

The only thing that "this change" really does is to identify the
cluster under which the ALPS allocation is made.  If you are addressing
a node in another cluster (e.g., via accept/connect), the
clustername/NID pair is unique for ALPS, just as a hostname on a
cluster node is unique between clusters.  If you do a gethostname() on
a normal cluster node, you are going to get mynameN, or something
similar.  If you do a gethostname() on an ALPS node, you are going to
get nidN; there is no differentiation between cluster A and cluster B.

Perhaps my earlier comment was not accurate.  In reality, it provides
the same degree of identification for AL

Re: [OMPI devel] -display-map

2008-09-22 Thread Ralph Castain
Sorry for delay - was on vacation and am now trying to work my way  
back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In this  
case, ORTE has already used the name returned by gethostname to create  
its session directory structure long before mpirun reads a hostfile.  
This is why we retain the value from gethostname instead of allowing  
it to be overwritten by the name in whatever allocation we are given.  
Using the name in the hostfile would require that I either find some way  
to remember any prior name, or that I tear down and rebuild the  
session directory tree - neither seems attractive nor simple (e.g.,  
what happens when the user provides multiple entries in the hostfile  
for the node, each with a different IP address based on another  
interface in that node? Sounds crazy, but we have already seen it done  
- which one do I use?).


2. We don't actually store the hostfile info anywhere - we just use it  
and forget it. For us to add an XML attribute containing any hostfile- 
related info would therefore require us to re-read the hostfile. I  
could have it do that -only- in the case of "XML output required", but  
it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname" lookup  
of the IP address or hostname to see if it matches instead of just  
doing a strcmp. This is what we have to do internally as we frequently  
have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the  
local OS hasn't cached the IP address for the node in question it can  
take a little time to DNS resolve it, but otherwise works fine.


I can point you to the code in OPAL that we use - I would think  
something similar would be easy to implement in your code and would  
readily solve the problem.
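
A minimal sketch of that kind of lookup-and-compare (this is not the
OPAL code; the helper name is made up, and it assumes plain IPv4
resolution via gethostbyname):

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>

/* Return non-zero if the two host identifiers (hostname, FQDN, or
 * dotted IP string) resolve to the same IPv4 address. */
static int hosts_match(const char *a, const char *b)
{
    struct in_addr addr_a;
    struct hostent *h;

    h = gethostbyname(a);
    if (NULL == h || NULL == h->h_addr_list[0]) return 0;
    /* copy the first address: gethostbyname() reuses a static buffer */
    memcpy(&addr_a, h->h_addr_list[0], sizeof(addr_a));

    h = gethostbyname(b);
    if (NULL == h || NULL == h->h_addr_list[0]) return 0;
    return 0 == memcmp(&addr_a, h->h_addr_list[0], sizeof(addr_a));
}

int main(int argc, char *argv[])
{
    if (3 == argc) {
        printf("%s vs %s: %s\n", argv[1], argv[2],
               hosts_match(argv[1], argv[2]) ? "same host" : "different");
    }
    return 0;
}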


Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:


Ralph,

The problem we're seeing is just with the head node. If I specify a  
particular IP address for the head node in the hostfile, it gets  
changed to the FQDN when displayed in the map. This is a problem for  
us as we need to be able to match the two, and since we're not  
necessarily running on the head node, we can't always do the same  
resolution you're doing.


Would it be possible to use the same address that is specified in  
the hostfile, or alternatively provide an XML attribute that  
contains this information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently". The  
only changes I am aware of wrt nodes consisted of some changes to  
the order in which we use the nodes when specified by hostfile or - 
host, and a little #if protectionism needed by Brian for the Cray  
port.


Are you seeing this for every node? Reason I ask: I can't offhand  
think of anything in the code base that would replace a host name  
with the FQDN because we don't get that info for remote nodes. The  
only exception is the head node (where mpirun sits) - in that lone  
case, we default to the name returned to us by gethostname(). We do  
that because the head node is frequently accessible on a more  
global basis than the compute nodes - thus, the FQDN is required to  
ensure that there is no address confusion on the network.


If the user refers to compute nodes in a hostfile or -host (or in  
an allocation from a resource manager) by non-FQDN, we just assume  
they know what they are doing and the name will correctly resolve  
to a unique address.



On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:


Hi,

Has the behavior of the -display-map option changed recently in the  
1.3 branch? We're now seeing the host  
name as a fully resolved DN rather than the entry that was  
specified in the hostfile. Is there any particular reason for  
this? If so, would it be possible to add the hostfile entry to the  
output since we need to be able to match the two?


Thanks,

Greg




Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Ralph Castain
There was a very long drawn-out discussion about this early in 2007.  
Rather than rehash all that, I'll try to summarize it here. It may get  
confusing - it helped a whole lot to be in a room with a whiteboard.  
There were also presentations on the subject - I believe the slides  
may still be in the docs repository.


Because terminology quickly gets confusing, we adopted a slightly  
different one for these discussions. We talk about OMPI being a  
"single cell" system - i.e., jobs executed via mpirun can only span  
nodes that are reachable by that mpirun. In a typical managed  
environment, a cell aligns quite well with a "cluster". In an  
unmanaged environment where the user provides a hostfile, the cell  
will contain all nodes specified in the hostfile.


We don't filter or abort for non-matching hostnames - if mpirun can  
launch on that node, then great. What we don't support is asking  
mpirun to remotely execute another mpirun on the frontend of another  
cell in order to launch procs on the nodes in -that- cell, nor do we  
ask mpirun to in any way manage (or even know about) any procs running  
on a remote cell.


I see what you are saying about the ALPS node name. However, the field  
you want to add doesn't have anything to do with accept/connect. The  
orte_node_t object is used solely by mpirun to keep track of the node  
pool it controls - i.e., the nodes upon which it is launching jobs.  
Thus, the mpirun on cluster A will have "nid" entries it got from  
its allocation, and the mpirun on cluster B will have "nid"  
entries it got from its allocation - but the two mpiruns will never  
exchange that information, nor will the mpirun on cluster A ever have  
a need to know the node entries for cluster B. Each mpirun launches  
and manages procs -only- on the nodes in its own allocation.


I agree you will have issues when doing the connect/accept modex as  
the nodenames are exchanged and are no longer unique in your scenario.  
However, that info stays in the  ompi_proc_t - it never gets  
communicated to the ORTE layer as we couldn't care less down there  
about the remote procs since they are under the control of a different  
mpirun. So if you need to add a cluster id field for this purpose, it  
needs to go in ompi_proc_t - not in the orte structures.
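
To make that concrete, a rough sketch of the idea (illustrative names
only - this is not the real ompi_proc_t definition):

#include <stdio.h>
#include <string.h>

/* The nodename exchanged in the connect/accept modex ("nid01234" on
 * every ALPS machine) is only unique when paired with a cluster
 * identifier carried at the OMPI layer. */
typedef struct {
    const char *proc_hostname;   /* nodename as exchanged in the modex */
    const char *proc_cluster;    /* hypothetical cluster-id field */
} example_proc_t;

static int same_node(const example_proc_t *a, const example_proc_t *b)
{
    return 0 == strcmp(a->proc_cluster, b->proc_cluster) &&
           0 == strcmp(a->proc_hostname, b->proc_hostname);
}

int main(void)
{
    example_proc_t p = { "nid01234", "clusterA" };
    example_proc_t q = { "nid01234", "clusterB" };
    /* Same nodename, different clusters: not the same node. */
    printf("same node? %s\n", same_node(&p, &q) ? "yes" : "no");
    return 0;
}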


And for that, you probably need to discuss it with the MPI team as  
changes to ompi_proc_t will likely generate considerable discussion.


FWIW: this is one reason I warned Galen about the problems in reviving  
multi-cluster operations again. We used to deal with multi-cells in  
the process name itself, but all that support has been removed from  
OMPI.


Hope that helps
Ralph

On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote:


I may be opening a can of worms...

But, what prevents a user from running across clusters in a "normal
OMPI", i.e., non-ALPS environment?  When he puts hosts into his
hostfile, does it parse and abort/filter non-matching hostnames?  The
problem for ALPS-based systems is that nodes are addressed via NID,PID
pairs at the portals level.  Thus, these are unique only within a
cluster.  In point of fact, I could rewrite all of the ALPS support to
identify the nodes by "cluster_id".NID.  It would be a bit inefficient
within a cluster because we would have to extract the NID from this
syntax as we go down to the portals layer.  It also would lead to a
larger degree of change within the OMPI ALPS code base.  However, I can
give ALPS-based systems the same feature set as the rest of the world.
It is just more efficient to use an additional pointer in the
orte_node_t structure, and it results in a far simpler code structure.
This makes it easier to maintain.

The only thing that "this change" really does is to identify the
cluster under which the ALPS allocation is made.  If you are addressing
a node in another cluster (e.g., via accept/connect), the
clustername/NID pair is unique for ALPS, just as a hostname on a
cluster node is unique between clusters.  If you do a gethostname() on
a normal cluster node, you are going to get mynameN, or something
similar.  If you do a gethostname() on an ALPS node, you are going to
get nidN; there is no differentiation between cluster A and cluster B.

Perhaps my earlier comment was not accurate.  In reality, it provides
the same degree of identification for ALPS nodes as hostname provides
for normal clusters.  From your perspective, it is immaterial that it
also would allow us to support our limited form of multi-cluster
support.  However, in and of itself, it only provides the same level of
identification as is done for other cluster nodes.
--
Ken
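
To illustrate the trade-off Ken describes, a hypothetical sketch of the
"cluster_id".NID naming (names are made up; this is not the actual ALPS
code) - the extra parse in extract_nid() on the way down to the portals
layer is the cost that a separate clustername field in orte_node_t
avoids:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: build a globally unique node name "<cluster_id>.<nid>". */
static char *make_node_name(const char *cluster_id, int nid)
{
    size_t len = strlen(cluster_id) + 16;
    char *name = malloc(len);
    if (NULL != name) {
        snprintf(name, len, "%s.%d", cluster_id, nid);
    }
    return name;
}

/* Hypothetical: recover the plain NID for the portals layer - the
 * extra step that makes this encoding less efficient within a cluster. */
static int extract_nid(const char *node_name)
{
    const char *dot = strrchr(node_name, '.');
    return atoi(NULL == dot ? node_name : dot + 1);
}

int main(void)
{
    char *name = make_node_name("clusterA", 42);
    if (NULL != name) {
        printf("%s -> nid %d\n", name, extract_nid(name));
        free(name);
    }
    return 0;
}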


-Original Message-
From: Ralph Castain [mailto:r...@lanl.gov]
Sent: Monday, September 22, 2008 2:33 PM
To: Open MPI Developers
Cc: Matney Sr, Kenneth D.
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

The issue isn't with adding a string. The question is whether or not
OMPI is to su

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Ralph Castain
The issue isn't with adding a string. The question is whether or not  
OMPI is to support one job running across multiple clusters. We made a  
conscious decision (after lengthy discussions on OMPI core and ORTE  
mailing lists, plus several telecons) to not do so - we require that  
the job execute on a single cluster, while allowing connect/accept to  
occur between jobs on different clusters.


It is difficult to understand why we need a string (or our old "cell  
id") to tell us which cluster we are on if we are only following that  
operating model. From the commit comment, and from what I know of the  
system, the only rationale for adding such a designator is to shift  
back to the one-mpirun-spanning-multiple-cluster model.


If we are now going to make that change, then it merits a similar  
level of consideration as the last decision to move away from that  
model. Making that move involves considerably more than just adding a  
cluster id string. You may think that now, but the next step is  
inevitably to bring back remote launch, killing jobs on all clusters  
when one cluster has a problem, etc.


Before we go down this path and re-open Pandora's box, we should at  
least agree that is what we intend to do...or agree on what hard  
constraints we will place on multi-cluster operations. Frankly, I'm  
tired of bouncing back-and-forth on even the most basic design  
decisions.


Ralph



On Sep 22, 2008, at 11:55 AM, Richard Graham wrote:

What Ken put in is what is needed for the limited multi-cluster  
capabilities
we need, just one additional string.  I don't think there is a need  
for any

discussion of such a small change.

Rich


On 9/22/08 1:32 PM, "Ralph Castain"  wrote:


We really should discuss that as a group first - there is quite a bit
of code required to actually support multi-clusters that has been
removed.

Our operational model that was agreed to quite a while ago is that
mpirun can -only- extend over a single "cell". You can connect/accept
multiple mpiruns that are sitting on different cells, but you cannot
execute a single mpirun across multiple cells.

Please keep this on your own development branch for now. Bringing it
into the trunk will require discussion as this changes the operating
model, and has significant code consequences when we look at abnormal
terminations, comm_spawn, etc.

Thanks
Ralph

On Sep 22, 2008, at 11:26 AM, Richard Graham wrote:


This check in was in error - I had not realized that the checkout
was from
the 1.3 branch, so we will fix this, and put these into the trunk
(1.4).  We
are going to bring in some limited multi-cluster support - limited
is the
operative word.

Rich


On 9/22/08 12:50 PM, "Jeff Squyres"  wrote:

I notice that Ken Matney (the committer) is not on the devel  
list; I

added him explicitly to the CC line.

Ken: please see below.


On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:


Whoa! We made a decision NOT to support multi-cluster apps in OMPI
over a year ago!

Please remove this from 1.3 - we should discuss if/when this would
even be allowed in the trunk.

Thanks
Ralph

On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:


Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600

Log:
Added member to orte_node_t to enable multi-cluster jobs in ALPS
scheduled systems (like Cray XT).

Text files modified:
branches/v1.3/orte/runtime/orte_globals.h | 4 
1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================

--- branches/v1.3/orte/runtime/orte_globals.h (original)
+++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54
EDT (Mon, 22 Sep 2008)
@@ -222,6 +222,10 @@
/** Username on this node, if specified */
char *username;
char *slot_list;
+/** Clustername (machine name of cluster) on which this node
+resides.  ALPS scheduled systems need this to enable
+multi-cluster support.  */
+char *clustername;
} orte_node_t;
ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);





[OMPI devel] Commit access to 1.3 restricted to gatekeeper(s)

2008-09-22 Thread Tim Mattox
Hello All,
As it became apparent this morning, it was well past the time
to actually restrict commit access to the 1.3 branch.  As of this
afternoon, all changes to the 1.3 branch must occur via the
CMR process we are all familiar with from the 1.2 branch. See:
 https://svn.open-mpi.org/trac/ompi/wiki/SubmittingChangesetMoveReqs

Sorry for the delay in actually closing off access since it was agreed
that we would close things off two weeks ago, with some fuzz for
a few remaining already-in-the-works CMRs.
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Richard Graham
What Ken put in is what is needed for the limited multi-cluster capabilities
we need, just one additional string.  I don't think there is a need for any
discussion of such a small change.

Rich


On 9/22/08 1:32 PM, "Ralph Castain"  wrote:

> We really should discuss that as a group first - there is quite a bit
> of code required to actually support multi-clusters that has been
> removed.
> 
> Our operational model that was agreed to quite a while ago is that
> mpirun can -only- extend over a single "cell". You can connect/accept
> multiple mpiruns that are sitting on different cells, but you cannot
> execute a single mpirun across multiple cells.
> 
> Please keep this on your own development branch for now. Bringing it
> into the trunk will require discussion as this changes the operating
> model, and has significant code consequences when we look at abnormal
> terminations, comm_spawn, etc.
> 
> Thanks
> Ralph
> 
> On Sep 22, 2008, at 11:26 AM, Richard Graham wrote:
> 
>> This check in was in error - I had not realized that the checkout
>> was from
>> the 1.3 branch, so we will fix this, and put these into the trunk
>> (1.4).  We
>> are going to bring in some limited multi-cluster support - limited
>> is the
>> operative word.
>> 
>> Rich
>> 
>> 
>> On 9/22/08 12:50 PM, "Jeff Squyres"  wrote:
>> 
>>> I notice that Ken Matney (the committer) is not on the devel list; I
>>> added him explicitly to the CC line.
>>> 
>>> Ken: please see below.
>>> 
>>> 
>>> On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:
>>> 
 Whoa! We made a decision NOT to support multi-cluster apps in OMPI
 over a year ago!
 
 Please remove this from 1.3 - we should discuss if/when this would
 even be allowed in the trunk.
 
 Thanks
 Ralph
 
 On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:
 
> Author: matney
> Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
> New Revision: 19600
> URL: https://svn.open-mpi.org/trac/ompi/changeset/19600
> 
> Log:
> Added member to orte_node_t to enable multi-cluster jobs in ALPS
> scheduled systems (like Cray XT).
> 
> Text files modified:
> branches/v1.3/orte/runtime/orte_globals.h | 4 
> 1 files changed, 4 insertions(+), 0 deletions(-)
> 
> Modified: branches/v1.3/orte/runtime/orte_globals.h
> ==============================================================================
> --- branches/v1.3/orte/runtime/orte_globals.h (original)
> +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54
> EDT (Mon, 22 Sep 2008)
> @@ -222,6 +222,10 @@
>  /** Username on this node, if specified */
>  char *username;
>  char *slot_list;
> +/** Clustername (machine name of cluster) on which this node
> +resides.  ALPS scheduled systems need this to enable
> +multi-cluster support.  */
> +char *clustername;
> } orte_node_t;
> ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);


Re: [OMPI devel] proper way to shut down orted

2008-09-22 Thread Ralph Castain
Hmmm...well, there -used- to be a tool that was distributed with the  
1.2 series for doing just that, but I don't see it in the 1.2.7  
release. Not sure when or how that got dropped - probably fell through  
a crack.


Unfortunately, minus that tool, there is no clean way to shut this  
down. However, you can re-use the universe name if you simply go to  
your tmp directory, find your session directory, and just rm -rf the  
directory with that universe name.


So you want to look for something like "/tmp/openmpi-sessions- 
username@hostname_0/univ3" per your example below, and blow the  
"univ3" directory tree away.


Sorry it isn't simpler - trying to re-release with that tool is  
probably more trouble than it is worth now, especially given that the  
"seed" operation isn't used anymore beginning with the upcoming 1.3  
release.


Ralph

On Sep 22, 2008, at 10:08 AM, Timothy Kaiser wrote:


Greetings,

I have a manager/worker application.  The
manager is called "t2a" and the workers "w2d"

I launch the manager and each worker with
its own mpiexec with n=1.  They connect using
various calls including MPI_Open_port,
MPI_Comm_accept, MPI_Comm_connect and
MPI_Intercomm_merge.

It works fine.


I am using the command:

orted --persistent --seed --scope public --universe univ3 --set-sid

to set up the universe and the mpiexec commands are:

mpiexec -np 1 --universe univ3 t2a

mpiexec -np 1 --universe univ3 w2d

mpiexec -np 1 --universe univ3 w2d

mpiexec -np 1 --universe univ3 w2d


Question:

What is the proper way to shutdown orted?
I have found that if I just kill orted then
I can't reuse the universe name.

Platforms and OpenMPI versions:

OS X  openmpi-1.2.7 or openmpi-1.2.6 (ethernet)

Rocks openmpi-1.2.6 (Infiniband)




Thanks!

Tim

--
--
Timothy H. Kaiser, Ph.D. tkai...@mines.edu  CSM::GECO
"Nobody made a greater mistake than he who did nothing
because he could only do a little" (Edmund Burke)






Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Ralph Castain
We really should discuss that as a group first - there is quite a bit  
of code required to actually support multi-clusters that has been  
removed.


Our operational model that was agreed to quite a while ago is that  
mpirun can -only- extend over a single "cell". You can connect/accept  
multiple mpiruns that are sitting on different cells, but you cannot  
execute a single mpirun across multiple cells.


Please keep this on your own development branch for now. Bringing it  
into the trunk will require discussion as this changes the operating  
model, and has significant code consequences when we look at abnormal  
terminations, comm_spawn, etc.


Thanks
Ralph

On Sep 22, 2008, at 11:26 AM, Richard Graham wrote:

This check in was in error - I had not realized that the checkout  
was from
the 1.3 branch, so we will fix this, and put these into the trunk  
(1.4).  We
are going to bring in some limited multi-cluster support - limited  
is the

operative word.

Rich


On 9/22/08 12:50 PM, "Jeff Squyres"  wrote:


I notice that Ken Matney (the committer) is not on the devel list; I
added him explicitly to the CC line.

Ken: please see below.


On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:


Whoa! We made a decision NOT to support multi-cluster apps in OMPI
over a year ago!

Please remove this from 1.3 - we should discuss if/when this would
even be allowed in the trunk.

Thanks
Ralph

On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:


Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600

Log:
Added member to orte_node_t to enable multi-cluster jobs in ALPS
scheduled systems (like Cray XT).

Text files modified:
branches/v1.3/orte/runtime/orte_globals.h | 4 
1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================

--- branches/v1.3/orte/runtime/orte_globals.h (original)
+++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54
EDT (Mon, 22 Sep 2008)
@@ -222,6 +222,10 @@
 /** Username on this node, if specified */
 char *username;
 char *slot_list;
+/** Clustername (machine name of cluster) on which this node
+resides.  ALPS scheduled systems need this to enable
+multi-cluster support.  */
+char *clustername;
} orte_node_t;
ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);





Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Richard Graham
This check in was in error - I had not realized that the checkout was from
the 1.3 branch, so we will fix this, and put these into the trunk (1.4).  We
are going to bring in some limited multi-cluster support - limited is the
operative word.

Rich


On 9/22/08 12:50 PM, "Jeff Squyres"  wrote:

> I notice that Ken Matney (the committer) is not on the devel list; I
> added him explicitly to the CC line.
> 
> Ken: please see below.
> 
> 
> On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:
> 
>> Whoa! We made a decision NOT to support multi-cluster apps in OMPI
>> over a year ago!
>> 
>> Please remove this from 1.3 - we should discuss if/when this would
>> even be allowed in the trunk.
>> 
>> Thanks
>> Ralph
>> 
>> On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:
>> 
>>> Author: matney
>>> Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
>>> New Revision: 19600
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/19600
>>> 
>>> Log:
>>> Added member to orte_node_t to enable multi-cluster jobs in ALPS
>>> scheduled systems (like Cray XT).
>>> 
>>> Text files modified:
>>> branches/v1.3/orte/runtime/orte_globals.h | 4 
>>> 1 files changed, 4 insertions(+), 0 deletions(-)
>>> 
>>> Modified: branches/v1.3/orte/runtime/orte_globals.h
>>> ==============================================================================
>>> --- branches/v1.3/orte/runtime/orte_globals.h (original)
>>> +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54
>>> EDT (Mon, 22 Sep 2008)
>>> @@ -222,6 +222,10 @@
>>>   /** Username on this node, if specified */
>>>   char *username;
>>>   char *slot_list;
>>> +/** Clustername (machine name of cluster) on which this node
>>> +resides.  ALPS scheduled systems need this to enable
>>> +multi-cluster support.  */
>>> +char *clustername;
>>> } orte_node_t;
>>> ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);
>>> 



Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Jeff Squyres
I notice that Ken Matney (the committer) is not on the devel list; I  
added him explicitly to the CC line.


Ken: please see below.


On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:

Whoa! We made a decision NOT to support multi-cluster apps in OMPI  
over a year ago!


Please remove this from 1.3 - we should discuss if/when this would  
even be allowed in the trunk.


Thanks
Ralph

On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:


Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600

Log:
Added member to orte_node_t to enable multi-cluster jobs in ALPS
scheduled systems (like Cray XT).

Text files modified:
branches/v1.3/orte/runtime/orte_globals.h | 4 
1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================

--- branches/v1.3/orte/runtime/orte_globals.h   (original)
+++ branches/v1.3/orte/runtime/orte_globals.h	2008-09-22 12:35:54  
EDT (Mon, 22 Sep 2008)

@@ -222,6 +222,10 @@
  /** Username on this node, if specified */
  char *username;
  char *slot_list;
+/** Clustername (machine name of cluster) on which this node
+resides.  ALPS scheduled systems need this to enable
+multi-cluster support.  */
+char *clustername;
} orte_node_t;
ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Ralph Castain
Whoa! We made a decision NOT to support multi-cluster apps in OMPI  
over a year ago!


Please remove this from 1.3 - we should discuss if/when this would  
even be allowed in the trunk.


Thanks
Ralph

On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:


Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600

Log:
Added member to orte_node_t to enable multi-cluster jobs in ALPS
scheduled systems (like Cray XT).

Text files modified:
 branches/v1.3/orte/runtime/orte_globals.h | 4 
 1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================

--- branches/v1.3/orte/runtime/orte_globals.h   (original)
+++ branches/v1.3/orte/runtime/orte_globals.h	2008-09-22 12:35:54  
EDT (Mon, 22 Sep 2008)

@@ -222,6 +222,10 @@
   /** Username on this node, if specified */
   char *username;
   char *slot_list;
+/** Clustername (machine name of cluster) on which this node
+resides.  ALPS scheduled systems need this to enable
+multi-cluster support.  */
+char *clustername;
} orte_node_t;
ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r19599

2008-09-22 Thread Jeff Squyres

Ken --

Can you please also apply this to the trunk?

Thanks.

On Sep 22, 2008, at 12:20 PM, mat...@osl.iu.edu wrote:


Author: matney
Date: 2008-09-22 12:20:18 EDT (Mon, 22 Sep 2008)
New Revision: 19599
URL: https://svn.open-mpi.org/trac/ompi/changeset/19599

Log:
Add #include for stdio.h to allow make check to run with gcc 4.2.4 (on
Cray XT platform).

Text files modified:
  branches/v1.3/test/datatype/checksum.c | 1 +
  branches/v1.3/test/datatype/position.c | 1 +
  2 files changed, 2 insertions(+), 0 deletions(-)

Modified: branches/v1.3/test/datatype/checksum.c
==============================================================================
--- branches/v1.3/test/datatype/checksum.c  (original)
+++ branches/v1.3/test/datatype/checksum.c	2008-09-22 12:20:18 EDT  
(Mon, 22 Sep 2008)

@@ -15,6 +15,7 @@
#include "ompi/datatype/datatype.h"
#include "ompi/datatype/datatype_checksum.h"

+#include <stdio.h>
#include 
#include 


Modified: branches/v1.3/test/datatype/position.c
==============================================================================

--- branches/v1.3/test/datatype/position.c  (original)
+++ branches/v1.3/test/datatype/position.c	2008-09-22 12:20:18 EDT  
(Mon, 22 Sep 2008)

@@ -11,6 +11,7 @@
 */

#include "ompi_config.h"
+#include <stdio.h>
#include 
#include "ompi/datatype/convertor.h"
#include "ompi/datatype/datatype.h"



--
Jeff Squyres
Cisco Systems



[OMPI devel] proper way to shut down orted

2008-09-22 Thread Timothy Kaiser

Greetings,

I have a manager/worker application.  The
manager is called "t2a" and the workers "w2d"

I launch the manager and each worker with
its own mpiexec with n=1.  They connect using
various calls including MPI_Open_port,
MPI_Comm_accept, MPI_Comm_connect and
MPI_Intercomm_merge.

It works fine.
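
(For reference, a bare-bones sketch of that pattern; the name
publish/lookup used here to pass the port string from manager to
worker is an assumption, not necessarily how t2a/w2d exchange it:)

#include <mpi.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter, merged;
    int is_manager;

    MPI_Init(&argc, &argv);
    is_manager = (argc > 1 && 0 == strcmp(argv[1], "manager"));

    if (is_manager) {
        /* manager: open a port and wait for a worker to connect */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("t2a-service", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    } else {
        /* worker: look up the port and connect */
        MPI_Lookup_name("t2a-service", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    /* merge the intercommunicator into a single intracommunicator */
    MPI_Intercomm_merge(inter, is_manager ? 0 : 1, &merged);

    MPI_Comm_free(&merged);
    MPI_Comm_disconnect(&inter);
    if (is_manager) {
        MPI_Unpublish_name("t2a-service", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    }
    MPI_Finalize();
    return 0;
}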


I am using the command:

orted --persistent --seed --scope public --universe univ3 --set-sid

to set up the universe and the mpiexec commands are:

mpiexec -np 1 --universe univ3 t2a

mpiexec -np 1 --universe univ3 w2d

mpiexec -np 1 --universe univ3 w2d

mpiexec -np 1 --universe univ3 w2d


Question:

What is the proper way to shutdown orted?
I have found that if I just kill orted then
I can't reuse the universe name.

Platforms and OpenMPI versions:

OS X  openmpi-1.2.7 or openmpi-1.2.6 (ethernet)

Rocks openmpi-1.2.6 (Infiniband)




Thanks!

Tim

--
--
Timothy H. Kaiser, Ph.D. tkai...@mines.edu  CSM::GECO
"Nobody made a greater mistake than he who did nothing
because he could only do a little" (Edmund Burke)




Re: [OMPI devel] -display-map and mpi_spawn

2008-09-22 Thread Ralph Castain
We always output the entire map, so you'll see the parent procs as  
well as the child



On Sep 16, 2008, at 12:52 PM, Greg Watson wrote:


Hi Ralph,

No I'm happy to get a map at the beginning and at every spawn. Do  
you send the whole map again, or only an update?


Regards,

Greg

On Sep 11, 2008, at 9:09 AM, Ralph Castain wrote:

It already somewhat does. If you use --display-map at mpirun, you  
automatically get display-map whenever MPI_Spawn is called.


We didn't provide a mechanism by which you could only display-map  
for MPI_Spawn (and not for the original mpirun), but it would be  
trivial to do so - just have to define an info-key for that  
purpose. Is that what you need?
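
Something along these lines, presumably - the "ompi_display_map" key
name below is purely hypothetical (no such key is defined today), and
"./worker" is a placeholder; it just shows where a spawn-only switch
would plug in:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* hypothetical key requesting a map display for just this spawn */
    MPI_Info_set(info, "ompi_display_map", "true");

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, info, 0, MPI_COMM_SELF,
                   &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}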



On Sep 11, 2008, at 5:35 AM, Greg Watson wrote:


Ralph,

At the moment -display-map shows the process mapping when mpirun  
first starts, but I'm wondering about processes created  
dynamically. Would it be possible to trigger a map update when  
MPI_Spawn is called?


Regards,

Greg