Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
I think the point is that as a group, we consciously, deliberately, and painfully decided not to support multi-cluster. And as a result, we ripped out a lot of supporting code. Starting down this path again will likely result in a) re-opening all the discussions, b) re-adding a lot of code (or code effectively similar to what was there before). Let's not forget that there were many unsolved problems surrounding multi-cluster last time, too. It was also pointed out in Ralph's mails that, at least from the descriptions provided, adding the field in orte_node_t does not actually solve the problem that ORNL is trying to solve. If we, as a group, decide to re-add all this stuff, then a) recognize that we are flip-flopping *again* on this issue, and b) it will take a lot of coding effort to do so. I do think that since this was a group decision last time, it should be a group decision this time, too. If this does turn out to be as large of a sub-project as described, I would be opposed to the development occurring on the trunk; hg trees are perfect for this kind of stuff. I personally have no customers who are doing cross-cluster kinds of things, so I don't personally care if cross-cluster functionality works its way [back] in. But I recognize that OMPI core members are investigating it. So the points I'm making are procedural; I have no real dog in this fight... On Sep 22, 2008, at 4:40 PM, George Bosilca wrote: Ralph, There is NO need to have this discussion again, it was painful enough last time. From my perspective I do not understand why are you making so much noise on this one. How a 4 lines change in some ALPS specific files (Cray system very specific to ORNL) can generate more than 3 A4 pages of emails, is still something out of my perception. If they want to do multi-cluster and they do not break anything in ORTE/OMPI and they do not ask other people to do it for them why trying to stop them ? george. On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote: There was a very long drawn-out discussion about this early in 2007. Rather than rehash all that, I'll try to summarize it here. It may get confusing - it helped a whole lot to be in a room with a whiteboard. There were also presentations on the subject - I believe the slides may still be in the docs repository. Because terminology quickly gets confusing, we adopted a slightly different one for these discussions. We talk about OMPI being a "single cell" system - i.e., jobs executed via mpirun can only span nodes that are reachable by that mpirun. In a typical managed environment, a cell aligns quite well with a "cluster". In an unmanaged environment where the user provides a hostfile, the cell will contain all nodes specified in the hostfile. We don't filter or abort for non-matching hostnames - if mpirun can launch on that node, then great. What we don't support is asking mpirun to remotely execute another mpirun on the frontend of another cell in order to launch procs on the nodes in -that- cell, nor do we ask mpirun to in any way manage (or even know about) any procs running on a remote cell. I see what you are saying about the ALPS node name. However, the field you want to add doesn't have anything to do with accept/ connect. The orte_node_t object is used solely by mpirun to keep track of the node pool it controls - i.e., the nodes upon which it is launching jobs. 
Thus, the mpirun on cluster A will have "nid" entries it got from its allocation, and the mpirun on cluster B will have "nid" entries it got from its allocation - but the two mpiruns will never exchange that information, nor will the mpirun on cluster A ever have a need to know the node entries for cluster B. Each mpirun launches and manages procs -only- on the nodes in its own allocation. I agree you will have issues when doing the connect/accept modex as the nodenames are exchanged and are no longer unique in your scenario. However, that info stays in the ompi_proc_t - it never gets communicated to the ORTE layer as we couldn't care less down there about the remote procs since they are under the control of a different mpirun. So if you need to add a cluster id field for this purpose, it needs to go in ompi_proc_t - not in the orte structures. And for that, you probably need to discuss it with the MPI team as changes to ompi_proc_t will likely generate considerable discussion. FWIW: this is one reason I warned Galen about the problems in reviving multi-cluster operations again. We used to deal with multi-cells in the process name itself, but all that support has been removed from OMPI. Hope that helps. Ralph On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote: I may be opening a can of worms... But, what prevents a user from running across clusters in a "normal OMPI", i.e., non-ALPS environment? When he puts hosts into his hostfile, does it parse and abort/filter non-matching hostnames?
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
Because: 1. last time we went through this, it all started with a single pebble in the pond - and I don't want to get engulfed again; and 2. if you bothered to read the email, then you would see that I pointed out this change doesn't even do what they are trying to do. The change needs to be done elsewhere. I'm not trying to stop them - but I don't want to go back to the patch-our-way-to-hell methodology... and I wanted to point them to where they need to make the change so it -will- work. Ralph On Sep 22, 2008, at 2:40 PM, George Bosilca wrote: Ralph, There is NO need to have this discussion again, it was painful enough last time. From my perspective I do not understand why are you making so much noise on this one. How a 4 lines change in some ALPS specific files (Cray system very specific to ORNL) can generate more than 3 A4 pages of emails, is still something out of my perception. If they want to do multi-cluster and they do not break anything in ORTE/OMPI and they do not ask other people to do it for them why trying to stop them ? george. On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote: There was a very long drawn-out discussion about this early in 2007. Rather than rehash all that, I'll try to summarize it here. It may get confusing - it helped a whole lot to be in a room with a whiteboard. There were also presentations on the subject - I believe the slides may still be in the docs repository. Because terminology quickly gets confusing, we adopted a slightly different one for these discussions. We talk about OMPI being a "single cell" system - i.e., jobs executed via mpirun can only span nodes that are reachable by that mpirun. In a typical managed environment, a cell aligns quite well with a "cluster". In an unmanaged environment where the user provides a hostfile, the cell will contain all nodes specified in the hostfile. We don't filter or abort for non-matching hostnames - if mpirun can launch on that node, then great. What we don't support is asking mpirun to remotely execute another mpirun on the frontend of another cell in order to launch procs on the nodes in -that- cell, nor do we ask mpirun to in any way manage (or even know about) any procs running on a remote cell. I see what you are saying about the ALPS node name. However, the field you want to add doesn't have anything to do with accept/connect. The orte_node_t object is used solely by mpirun to keep track of the node pool it controls - i.e., the nodes upon which it is launching jobs. Thus, the mpirun on cluster A will have "nid" entries it got from its allocation, and the mpirun on cluster B will have "nid" entries it got from its allocation - but the two mpiruns will never exchange that information, nor will the mpirun on cluster A ever have a need to know the node entries for cluster B. Each mpirun launches and manages procs -only- on the nodes in its own allocation. I agree you will have issues when doing the connect/accept modex as the nodenames are exchanged and are no longer unique in your scenario. However, that info stays in the ompi_proc_t - it never gets communicated to the ORTE layer as we couldn't care less down there about the remote procs since they are under the control of a different mpirun. So if you need to add a cluster id field for this purpose, it needs to go in ompi_proc_t - not in the orte structures. And for that, you probably need to discuss it with the MPI team as changes to ompi_proc_t will likely generate considerable discussion.
FWIW: this is one reason I warned Galen about the problems in reviving multi-cluster operations again. We used to deal with multi-cells in the process name itself, but all that support has been removed from OMPI. Hope that helps. Ralph On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote: I may be opening a can of worms... But, what prevents a user from running across clusters in a "normal OMPI", i.e., non-ALPS environment? When he puts hosts into his hostfile, does it parse and abort/filter non-matching hostnames? The problem for ALPS-based systems is that nodes are addressed via NID,PID pairs at the portals level. Thus, these are unique only within a cluster. In point of fact, I could rewrite all of the ALPS support to identify the nodes by "cluster_id".NID. It would be a bit inefficient within a cluster because we would have to extract the NID from this syntax as we go down to the portals layer. It also would lead to a larger degree of change within the OMPI ALPS code base. However, I can give ALPS-based systems the same feature set as the rest of the world. It just is more efficient to use an additional pointer in the orte_node_t structure, and it results in a far simpler code structure. This makes it easier to maintain. The only thing that "this change" really does is to identify the cluster under which the ALPS allocation is made.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
Ralph, There is NO need to have this discussion again, it was painful enough last time. From my perspective I do not understand why are you making so much noise on this one. How a 4 lines change in some ALPS specific files (Cray system very specific to ORNL) can generate more than 3 A4 pages of emails, is still something out of my perception. If they want to do multi-cluster and they do not break anything in ORTE/OMPI and they do not ask other people to do it for them why trying to stop them ? george. On Sep 22, 2008, at 3:59 PM, Ralph Castain wrote: There was a very long drawn-out discussion about this early in 2007. Rather than rehash all that, I'll try to summarize it here. It may get confusing - it helped a whole lot to be in a room with a whiteboard. There were also presentations on the subject - I believe the slides may still be in the docs repository. Because terminology quickly gets confusing, we adopted a slightly different one for these discussions. We talk about OMPI being a "single cell" system - i.e., jobs executed via mpirun can only span nodes that are reachable by that mpirun. In a typical managed environment, a cell aligns quite well with a "cluster". In an unmanaged environment where the user provides a hostfile, the cell will contain all nodes specified in the hostfile. We don't filter or abort for non-matching hostnames - if mpirun can launch on that node, then great. What we don't support is asking mpirun to remotely execute another mpirun on the frontend of another cell in order to launch procs on the nodes in -that- cell, nor do we ask mpirun to in any way manage (or even know about) any procs running on a remote cell. I see what you are saying about the ALPS node name. However, the field you want to add doesn't have anything to do with accept/ connect. The orte_node_t object is used solely by mpirun to keep track of the node pool it controls - i.e., the nodes upon which it is launching jobs. Thus, the mpirun on cluster A will have "nid" entries it got from its allocation, and the mpirun on cluster B will have "nid" entries it got from its allocation - but the two mpiruns will never exchange that information, nor will the mpirun on cluster A ever have a need to know the node entries for cluster B. Each mpirun launches and manages procs -only- on the nodes in its own allocation. I agree you will have issues when doing the connect/accept modex as the nodenames are exchanged and are no longer unique in your scenario. However, that info stays in the ompi_proc_t - it never gets communicated to the ORTE layer as we couldn't care less down there about the remote procs since they are under the control of a different mpirun. So if you need to add a cluster id field for this purpose, it needs to go in ompi_proc_t - not in the orte structures. And for that, you probably need to discuss it with the MPI team as changes to ompi_proc_t will likely generate considerable discussion. FWIW: this is one reason I warned Galen about the problems in reviving multi-cluster operations again. We used to deal with multi- cells in the process name itself, but all that support has been removed from OMPI. Hope that helps Ralph On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote: I may be opening a can of worms... But, what prevents a user from running across clusters in a "normal OMPI", i.e., non-ALPS environment? When he puts hosts into his hostfile, does it parse and abort/filter non-matching hostnames? 
The problem for ALPS-based systems is that nodes are addressed via NID,PID pairs at the portals level. Thus, these are unique only within a cluster. In point of fact, I could rewrite all of the ALPS support to identify the nodes by "cluster_id".NID. It would be a bit inefficient within a cluster because we would have to extract the NID from this syntax as we go down to the portals layer. It also would lead to a larger degree of change within the OMPI ALPS code base. However, I can give ALPS-based systems the same feature set as the rest of the world. It just is more efficient to use an additional pointer in the orte_node_t structure, and it results in a far simpler code structure. This makes it easier to maintain. The only thing that "this change" really does is to identify the cluster under which the ALPS allocation is made. If you are addressing a node in another cluster (e.g., via accept/connect), the clustername/NID pair is unique for ALPS, just as a hostname on a cluster node is unique between clusters. If you do a gethostname() on a normal cluster node, you are going to get mynameN, or something similar. If you do a gethostname() on an ALPS node, you are going to get nidN; there is no differentiation between cluster A and cluster B. Perhaps my earlier comment was not accurate. In reality, it provides the same degree of identification for ALPS nodes as hostname provides for normal clusters.
Re: [OMPI devel] -display-map
Sorry for delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one for two reasons: 1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the node, each with a different IP address based on another interface in that node? Sounds crazy, but we have already seen it done - which one do I use?). 2. We don't actually store the hostfile info anywhere - we just use it and forget it. For us to add an XML attribute containing any hostfile- related info would therefore require us to re-read the hostfile. I could have it do that -only- in the case of "XML output required", but it seems rather ugly. An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches instead of just doing a strcmp. This is what we have to do internally as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to DNS resolve it, but otherwise works fine. I can point you to the code in OPAL that we use - I would think something similar would be easy to implement in your code and would readily solve the problem. Ralph On Sep 19, 2008, at 7:18 AM, Greg Watson wrote: Ralph, The problem we're seeing is just with the head node. If I specify a particular IP address for the head node in the hostfile, it gets changed to the FQDN when displayed in the map. This is a problem for us as we need to be able to match the two, and since we're not necessarily running on the head node, we can't always do the same resolution you're doing. Would it be possible to use the same address that is specified in the hostfile, or alternatively provide an XML attribute that contains this information? Thanks, Greg On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote: Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or - host, and a little #if protectionism needed by Brian for the Cray port. Are you seeing this for every node? Reason I ask: I can't offhand think of anything in the code base that would replace a host name with the FQDN because we don't get that info for remote nodes. The only exception is the head node (where mpirun sits) - in that lone case, we default to the name returned to us by gethostname(). We do that because the head node is frequently accessible on a more global basis than the compute nodes - thus, the FQDN is required to ensure that there is no address confusion on the network. If the user refers to compute nodes in a hostfile or -host (or in an allocation from a resource manager) by non-FQDN, we just assume they know what they are doing and the name will correctly resolve to a unique address. 
On Sep 10, 2008, at 9:45 AM, Greg Watson wrote: Hi, Has the behavior of the -display-map option changed recently in the 1.3 branch? We're now seeing the host name as a fully resolved DN rather than the entry that was specified in the hostfile. Is there any particular reason for this? If so, would it be possible to add the hostfile entry to the output since we need to be able to match the two? Thanks, Greg
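A rough sketch of the resolve-and-compare check Ralph suggests above (a DNS lookup rather than a plain strcmp). This is not the OPAL helper he refers to; it uses the standard resolver, handles IPv4 only, and compares only the first address returned, so treat it as an illustration of the idea rather than a drop-in implementation.

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <netinet/in.h>

    /* Return 1 if the two hostnames/addresses resolve to the same IPv4 address. */
    static int same_host(const char *a, const char *b)
    {
        struct addrinfo hints, *ra = NULL, *rb = NULL;
        int ok_a, ok_b, match = 0;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;

        ok_a = (0 == getaddrinfo(a, NULL, &hints, &ra));
        ok_b = (0 == getaddrinfo(b, NULL, &hints, &rb));
        if (ok_a && ok_b) {
            struct sockaddr_in *sa = (struct sockaddr_in *) ra->ai_addr;
            struct sockaddr_in *sb = (struct sockaddr_in *) rb->ai_addr;
            match = (sa->sin_addr.s_addr == sb->sin_addr.s_addr);
        }
        if (ok_a) freeaddrinfo(ra);
        if (ok_b) freeaddrinfo(rb);
        return match;
    }

    int main(int argc, char **argv)
    {
        if (3 == argc) {
            printf("%s and %s: %s\n", argv[1], argv[2],
                   same_host(argv[1], argv[2]) ? "same host" : "different");
        }
        return 0;
    }

As Ralph notes, the first lookup can be slow if the name is not cached locally, but it resolves the FQDN vs. non-FQDN vs. IP-address mismatch that a plain string compare cannot.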
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
There was a very long drawn-out discussion about this early in 2007. Rather than rehash all that, I'll try to summarize it here. It may get confusing - it helped a whole lot to be in a room with a whiteboard. There were also presentations on the subject - I believe the slides may still be in the docs repository. Because terminology quickly gets confusing, we adopted a slightly different one for these discussions. We talk about OMPI being a "single cell" system - i.e., jobs executed via mpirun can only span nodes that are reachable by that mpirun. In a typical managed environment, a cell aligns quite well with a "cluster". In an unmanaged environment where the user provides a hostfile, the cell will contain all nodes specified in the hostfile. We don't filter or abort for non-matching hostnames - if mpirun can launch on that node, then great. What we don't support is asking mpirun to remotely execute another mpirun on the frontend of another cell in order to launch procs on the nodes in -that- cell, nor do we ask mpirun to in any way manage (or even know about) any procs running on a remote cell. I see what you are saying about the ALPS node name. However, the field you want to add doesn't have anything to do with accept/connect. The orte_node_t object is used solely by mpirun to keep track of the node pool it controls - i.e., the nodes upon which it is launching jobs. Thus, the mpirun on cluster A will have "nid" entries it got from its allocation, and the mpirun on cluster B will have "nid" entries it got from its allocation - but the two mpiruns will never exchange that information, nor will the mpirun on cluster A ever have a need to know the node entries for cluster B. Each mpirun launches and manages procs -only- on the nodes in its own allocation. I agree you will have issues when doing the connect/accept modex as the nodenames are exchanged and are no longer unique in your scenario. However, that info stays in the ompi_proc_t - it never gets communicated to the ORTE layer as we couldn't care less down there about the remote procs since they are under the control of a different mpirun. So if you need to add a cluster id field for this purpose, it needs to go in ompi_proc_t - not in the orte structures. And for that, you probably need to discuss it with the MPI team as changes to ompi_proc_t will likely generate considerable discussion. FWIW: this is one reason I warned Galen about the problems in reviving multi-cluster operations again. We used to deal with multi-cells in the process name itself, but all that support has been removed from OMPI. Hope that helps Ralph On Sep 22, 2008, at 1:39 PM, Matney Sr, Kenneth D. wrote: I may be opening a can of worms... But, what prevents a user from running across clusters in a "normal OMPI", i.e., non-ALPS environment? When he puts hosts into his hostfile, does it parse and abort/filter non-matching hostnames? The problem for ALPS based systems is that nodes are addressed via NID,PID pairs at the portals level. Thus, these are unique only within a cluster. In point of fact, I could rewrite all of the ALPS support to identify the nodes by "cluster_id".NID. It would be a bit inefficient within a cluster because, we would have to extract the NID from this syntax as we go down to the portals layer. It also would lead to a larger degree of change within the OMPI ALPS code base. However, I can give ALPS-based systems the same feature set as the rest of the world. 
It just is more efficient to use an additional pointer in the orte_node_t structure, and it results in a far simpler code structure. This makes it easier to maintain. The only thing that "this change" really does is to identify the cluster under which the ALPS allocation is made. If you are addressing a node in another cluster (e.g., via accept/connect), the clustername/NID pair is unique for ALPS, just as a hostname on a cluster node is unique between clusters. If you do a gethostname() on a normal cluster node, you are going to get mynameN, or something similar. If you do a gethostname() on an ALPS node, you are going to get nidN; there is no differentiation between cluster A and cluster B. Perhaps my earlier comment was not accurate. In reality, it provides the same degree of identification for ALPS nodes as hostname provides for normal clusters. From your perspective, it is immaterial that it also would allow us to support our limited form of multi-cluster support. However, in and of itself, it only provides the same level of identification as is done for other cluster nodes. -- Ken -Original Message- From: Ralph Castain [mailto:r...@lanl.gov] Sent: Monday, September 22, 2008 2:33 PM To: Open MPI Developers Cc: Matney Sr, Kenneth D. Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600 The issue isn't with adding a string. The question is whether or not OMPI is to support one job running across multiple clusters.
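As an aside, the "cluster_id".NID encoding Ken says he could have used instead of the extra orte_node_t pointer can be illustrated with a short sketch. The helper names and the "jaguar" cluster id below are made up for illustration and are not OMPI code; the parse step on the way down to the portals layer is the inefficiency he refers to.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Compose a globally unique node name from a cluster id and an ALPS nid,
     * e.g. "jaguar" + 1234 -> "jaguar.1234". */
    static void compose_nodename(char *buf, size_t len, const char *cluster_id, int nid)
    {
        snprintf(buf, len, "%s.%d", cluster_id, nid);
    }

    /* Split "jaguar.1234" back into cluster "jaguar" and nid 1234.
     * Returns 0 on success. */
    static int parse_nodename(const char *name, char *cluster, size_t clen, int *nid)
    {
        const char *dot = strrchr(name, '.');
        if (NULL == dot || (size_t)(dot - name) >= clen) {
            return -1;
        }
        memcpy(cluster, name, (size_t)(dot - name));
        cluster[dot - name] = '\0';
        *nid = (int) strtol(dot + 1, NULL, 10);
        return 0;
    }

    int main(void)
    {
        char name[64], cluster[32];
        int nid;

        compose_nodename(name, sizeof(name), "jaguar", 1234);
        if (0 == parse_nodename(name, cluster, sizeof(cluster), &nid)) {
            printf("%s -> cluster %s, nid %d\n", name, cluster, nid);
        }
        return 0;
    }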
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
The issue isn't with adding a string. The question is whether or not OMPI is to support one job running across multiple clusters. We made a conscious decision (after lengthy discussions on OMPI core and ORTE mailing lists, plus several telecons) to not do so - we require that the job execute on a single cluster, while allowing connect/accept to occur between jobs on different clusters. It is difficult to understand why we need a string (or our old "cell id") to tell us which cluster we are on if we are only following that operating model. From the commit comment, and from what I know of the system, the only rationale for adding such a designator is to shift back to the one-mpirun-spanning-multiple-cluster model. If we are now going to make that change, then it merits a similar level of consideration as the last decision to move away from that model. Making that move involves considerably more than just adding a cluster id string. You may think that now, but the next step is inevitably to bring back remote launch, killing jobs on all clusters when one cluster has a problem, etc. Before we go down this path and re-open Pandora's box, we should at least agree that is what we intend to do...or agree on what hard constraints we will place on multi-cluster operations. Frankly, I'm tired of bouncing back-and-forth on even the most basic design decisions. Ralph On Sep 22, 2008, at 11:55 AM, Richard Graham wrote: What Ken put in is what is needed for the limited multi-cluster capabilities we need, just one additional string. I don't think there is a need for any discussion of such a small change. Rich On 9/22/08 1:32 PM, "Ralph Castain" wrote: We really should discuss that as a group first - there is quite a bit of code required to actually support multi-clusters that has been removed. Our operational model that was agreed to quite a while ago is that mpirun can -only- extend over a single "cell". You can connect/accept multiple mpiruns that are sitting on different cells, but you cannot execute a single mpirun across multiple cells. Please keep this on your own development branch for now. Bringing it into the trunk will require discussion as this changes the operating model, and has significant code consequences when we look at abnormal terminations, comm_spawn, etc. Thanks Ralph On Sep 22, 2008, at 11:26 AM, Richard Graham wrote: This check in was in error - I had not realized that the checkout was from the 1.3 branch, so we will fix this, and put these into the trunk (1.4). We are going to bring in some limited multi-cluster support - limited is the operative word. Rich On 9/22/08 12:50 PM, "Jeff Squyres" wrote: I notice that Ken Matney (the committer) is not on the devel list; I added him explicitly to the CC line. Ken: please see below. On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote: Whoa! We made a decision NOT to support multi-cluster apps in OMPI over a year ago! Please remove this from 1.3 - we should discuss if/when this would even be allowed in the trunk. Thanks Ralph On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote: Author: matney Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008) New Revision: 19600 URL: https://svn.open-mpi.org/trac/ompi/changeset/19600 Log: Added member to orte_node_t to enable multi-cluster jobs in ALPS scheduled systems (like Cray XT). 
Text files modified:
   branches/v1.3/orte/runtime/orte_globals.h |     4
   1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================
--- branches/v1.3/orte/runtime/orte_globals.h (original)
+++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
@@ -222,6 +222,10 @@
     /** Username on this node, if specified */
     char *username;
     char *slot_list;
+    /** Clustername (machine name of cluster) on which this node
+        resides. ALPS scheduled systems need this to enable
+        multi-cluster support. */
+    char *clustername;
 } orte_node_t;
 ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);
[OMPI devel] Commit access to 1.3 restricted to gatekeeper(s)
Hello All, As it became apparent this morning, it was well past the time to actually restrict commit access to the 1.3 branch. As of this afternoon, all changes to the 1.3 branch must occur via the CMR process we are all familiar with from the 1.2 branch. See: https://svn.open-mpi.org/trac/ompi/wiki/SubmittingChangesetMoveReqs Sorry for the delay in actually closing off access since it was agreed that we would close things off two weeks ago, with some fuzz for a few remaining already-in-the-works CMRs. -- Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/ tmat...@gmail.com || timat...@open-mpi.org I'm a bright... http://www.the-brights.net/
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
What Ken put in is what is needed for the limited multi-cluster capabilities we need, just one additional string. I don't think there is a need for any discussion of such a small change. Rich On 9/22/08 1:32 PM, "Ralph Castain" wrote: > We really should discuss that as a group first - there is quite a bit > of code required to actually support multi-clusters that has been > removed. > > Our operational model that was agreed to quite a while ago is that > mpirun can -only- extend over a single "cell". You can connect/accept > multiple mpiruns that are sitting on different cells, but you cannot > execute a single mpirun across multiple cells. > > Please keep this on your own development branch for now. Bringing it > into the trunk will require discussion as this changes the operating > model, and has significant code consequences when we look at abnormal > terminations, comm_spawn, etc. > > Thanks > Ralph > > On Sep 22, 2008, at 11:26 AM, Richard Graham wrote: > >> This check in was in error - I had not realized that the checkout >> was from >> the 1.3 branch, so we will fix this, and put these into the trunk >> (1.4). We >> are going to bring in some limited multi-cluster support - limited >> is the >> operative word. >> >> Rich >> >> >> On 9/22/08 12:50 PM, "Jeff Squyres" wrote: >> >>> I notice that Ken Matney (the committer) is not on the devel list; I >>> added him explicitly to the CC line. >>> >>> Ken: please see below. >>> >>> >>> On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote: >>> Whoa! We made a decision NOT to support multi-cluster apps in OMPI over a year ago! Please remove this from 1.3 - we should discuss if/when this would even be allowed in the trunk. Thanks Ralph On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote: > Author: matney > Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008) > New Revision: 19600 > URL: https://svn.open-mpi.org/trac/ompi/changeset/19600 > > Log: > Added member to orte_node_t to enable multi-cluster jobs in ALPS > scheduled systems (like Cray XT). > > Text files modified: > branches/v1.3/orte/runtime/orte_globals.h | 4 > 1 files changed, 4 insertions(+), 0 deletions(-) > > Modified: branches/v1.3/orte/runtime/orte_globals.h > = > = > = > = > = > = > = > = > = > = > = > === > --- branches/v1.3/orte/runtime/orte_globals.h (original) > +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 > EDT (Mon, 22 Sep 2008) > @@ -222,6 +222,10 @@ > /** Username on this node, if specified */ > char *username; > char *slot_list; > +/** Clustername (machine name of cluster) on which this node > +resides. ALPS scheduled systems need this to enable > +multi-cluster support. */ > +char *clustername; > } orte_node_t; > ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t); > > ___ > svn mailing list > s...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/svn ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] proper way to shut down orted
Hmmm...well, there -used- to be a tool that was distributed with the 1.2 series for doing just that, but I don't see it in the 1.2.7 release. Not sure when or how that got dropped - probably fell through a crack. Unfortunately, minus that tool, there is no clean way to shut this down. However, you can re-use the universe name if you simply go to your tmp directory, find your session directory, and just rm -rf the directory with that universe name. So you want to look for something like "/tmp/openmpi-sessions-username@hostname_0/univ3" per your example below, and blow the "univ3" directory tree away. Sorry it isn't simpler - trying to re-release with that tool is probably more trouble than it is worth now, especially given that the "seed" operation isn't used anymore beginning with the upcoming 1.3 release. Ralph On Sep 22, 2008, at 10:08 AM, Timothy Kaiser wrote: Greetings, I have a manager/worker application. The manager is called "t2a" and the workers "w2d". I launch the manager and each worker with its own mpiexec with n=1. They connect using various calls including MPI_Open_port, MPI_Comm_accept, MPI_Comm_connect and MPI_Intercomm_merge. It works fine. I am using the command:
orted --persistent --seed --scope public --universe univ3 --set-sid
to set up the universe, and the mpiexec commands are:
mpiexec -np 1 --universe univ3 t2a
mpiexec -np 1 --universe univ3 w2d
mpiexec -np 1 --universe univ3 w2d
mpiexec -np 1 --universe univ3 w2d
Question: What is the proper way to shut down orted? I have found that if I just kill orted then I can't reuse the universe name. Platforms and OpenMPI versions: OS X openmpi-1.2.7 or openmpi-1.2.6 (ethernet); Rocks openmpi-1.2.6 (Infiniband). Thanks! Tim -- Timothy H. Kaiser, Ph.D. tkai...@mines.edu CSM::GECO "Nobody made a greater mistake than he who did nothing because he could only do a little" (Edmund Burke)
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
We really should discuss that as a group first - there is quite a bit of code required to actually support multi-clusters that has been removed. Our operational model that was agreed to quite a while ago is that mpirun can -only- extend over a single "cell". You can connect/accept multiple mpiruns that are sitting on different cells, but you cannot execute a single mpirun across multiple cells. Please keep this on your own development branch for now. Bringing it into the trunk will require discussion as this changes the operating model, and has significant code consequences when we look at abnormal terminations, comm_spawn, etc. Thanks Ralph On Sep 22, 2008, at 11:26 AM, Richard Graham wrote: This check in was in error - I had not realized that the checkout was from the 1.3 branch, so we will fix this, and put these into the trunk (1.4). We are going to bring in some limited multi-cluster support - limited is the operative word. Rich On 9/22/08 12:50 PM, "Jeff Squyres" wrote: I notice that Ken Matney (the committer) is not on the devel list; I added him explicitly to the CC line. Ken: please see below. On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote: Whoa! We made a decision NOT to support multi-cluster apps in OMPI over a year ago! Please remove this from 1.3 - we should discuss if/when this would even be allowed in the trunk. Thanks Ralph On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote: Author: matney Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008) New Revision: 19600 URL: https://svn.open-mpi.org/trac/ompi/changeset/19600 Log: Added member to orte_node_t to enable multi-cluster jobs in ALPS scheduled systems (like Cray XT). Text files modified: branches/v1.3/orte/runtime/orte_globals.h | 4 1 files changed, 4 insertions(+), 0 deletions(-) Modified: branches/v1.3/orte/runtime/orte_globals.h = = = = = = = = = = = === --- branches/v1.3/orte/runtime/orte_globals.h (original) +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008) @@ -222,6 +222,10 @@ /** Username on this node, if specified */ char *username; char *slot_list; +/** Clustername (machine name of cluster) on which this node +resides. ALPS scheduled systems need this to enable +multi-cluster support. */ +char *clustername; } orte_node_t; ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t); ___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
This check in was in error - I had not realized that the checkout was from the 1.3 branch, so we will fix this, and put these into the trunk (1.4). We are going to bring in some limited multi-cluster support - limited is the operative word. Rich On 9/22/08 12:50 PM, "Jeff Squyres" wrote: > I notice that Ken Matney (the committer) is not on the devel list; I > added him explicitly to the CC line. > > Ken: please see below. > > > On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote: > >> Whoa! We made a decision NOT to support multi-cluster apps in OMPI >> over a year ago! >> >> Please remove this from 1.3 - we should discuss if/when this would >> even be allowed in the trunk. >> >> Thanks >> Ralph >> >> On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote: >> >>> Author: matney >>> Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008) >>> New Revision: 19600 >>> URL: https://svn.open-mpi.org/trac/ompi/changeset/19600 >>> >>> Log: >>> Added member to orte_node_t to enable multi-cluster jobs in ALPS >>> scheduled systems (like Cray XT). >>> >>> Text files modified: >>> branches/v1.3/orte/runtime/orte_globals.h | 4 >>> 1 files changed, 4 insertions(+), 0 deletions(-) >>> >>> Modified: branches/v1.3/orte/runtime/orte_globals.h >>> = >>> = >>> = >>> = >>> = >>> = >>> = >>> = >>> = >>> = >>> --- branches/v1.3/orte/runtime/orte_globals.h (original) >>> +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 >>> EDT (Mon, 22 Sep 2008) >>> @@ -222,6 +222,10 @@ >>> /** Username on this node, if specified */ >>> char *username; >>> char *slot_list; >>> +/** Clustername (machine name of cluster) on which this node >>> +resides. ALPS scheduled systems need this to enable >>> +multi-cluster support. */ >>> +char *clustername; >>> } orte_node_t; >>> ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t); >>> >>> ___ >>> svn mailing list >>> s...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/svn >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
I notice that Ken Matney (the committer) is not on the devel list; I added him explicitly to the CC line. Ken: please see below. On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote: Whoa! We made a decision NOT to support multi-cluster apps in OMPI over a year ago! Please remove this from 1.3 - we should discuss if/when this would even be allowed in the trunk. Thanks Ralph On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:
Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600
Log: Added member to orte_node_t to enable multi-cluster jobs in ALPS scheduled systems (like Cray XT).
Text files modified:
   branches/v1.3/orte/runtime/orte_globals.h |     4
   1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================
--- branches/v1.3/orte/runtime/orte_globals.h (original)
+++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
@@ -222,6 +222,10 @@
     /** Username on this node, if specified */
     char *username;
     char *slot_list;
+    /** Clustername (machine name of cluster) on which this node
+        resides. ALPS scheduled systems need this to enable
+        multi-cluster support. */
+    char *clustername;
 } orte_node_t;
 ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);
--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
Whoa! We made a decision NOT to support multi-cluster apps in OMPI over a year ago! Please remove this from 1.3 - we should discuss if/when this would even be allowed in the trunk. Thanks Ralph On Sep 22, 2008, at 10:35 AM, mat...@osl.iu.edu wrote:
Author: matney
Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
New Revision: 19600
URL: https://svn.open-mpi.org/trac/ompi/changeset/19600
Log: Added member to orte_node_t to enable multi-cluster jobs in ALPS scheduled systems (like Cray XT).
Text files modified:
   branches/v1.3/orte/runtime/orte_globals.h |     4
   1 files changed, 4 insertions(+), 0 deletions(-)

Modified: branches/v1.3/orte/runtime/orte_globals.h
==============================================================================
--- branches/v1.3/orte/runtime/orte_globals.h (original)
+++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
@@ -222,6 +222,10 @@
     /** Username on this node, if specified */
     char *username;
     char *slot_list;
+    /** Clustername (machine name of cluster) on which this node
+        resides. ALPS scheduled systems need this to enable
+        multi-cluster support. */
+    char *clustername;
 } orte_node_t;
 ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r19599
Ken -- Can you please also apply this to the trunk? Thanks. On Sep 22, 2008, at 12:20 PM, mat...@osl.iu.edu wrote:
Author: matney
Date: 2008-09-22 12:20:18 EDT (Mon, 22 Sep 2008)
New Revision: 19599
URL: https://svn.open-mpi.org/trac/ompi/changeset/19599
Log: Add #include for stdio.h to allow make check to run with gcc 4.2.4 (on Cray XT platform).
Text files modified:
   branches/v1.3/test/datatype/checksum.c |     1 +
   branches/v1.3/test/datatype/position.c |     1 +
   2 files changed, 2 insertions(+), 0 deletions(-)

Modified: branches/v1.3/test/datatype/checksum.c
==============================================================================
--- branches/v1.3/test/datatype/checksum.c (original)
+++ branches/v1.3/test/datatype/checksum.c 2008-09-22 12:20:18 EDT (Mon, 22 Sep 2008)
@@ -15,6 +15,7 @@
 #include "ompi/datatype/datatype.h"
 #include "ompi/datatype/datatype_checksum.h"
+#include <stdio.h>
 #include
 #include

Modified: branches/v1.3/test/datatype/position.c
==============================================================================
--- branches/v1.3/test/datatype/position.c (original)
+++ branches/v1.3/test/datatype/position.c 2008-09-22 12:20:18 EDT (Mon, 22 Sep 2008)
@@ -11,6 +11,7 @@
 */
 #include "ompi_config.h"
+#include <stdio.h>
 #include
 #include "ompi/datatype/convertor.h"
 #include "ompi/datatype/datatype.h"
--
Jeff Squyres
Cisco Systems
[OMPI devel] proper way to shut down orted
Greetings, I have a manager/worker application. The manager is called "t2a" and the workers "w2d". I launch the manager and each worker with its own mpiexec with n=1. They connect using various calls including MPI_Open_port, MPI_Comm_accept, MPI_Comm_connect and MPI_Intercomm_merge. It works fine. I am using the command:
orted --persistent --seed --scope public --universe univ3 --set-sid
to set up the universe, and the mpiexec commands are:
mpiexec -np 1 --universe univ3 t2a
mpiexec -np 1 --universe univ3 w2d
mpiexec -np 1 --universe univ3 w2d
mpiexec -np 1 --universe univ3 w2d
Question: What is the proper way to shut down orted? I have found that if I just kill orted then I can't reuse the universe name. Platforms and OpenMPI versions: OS X openmpi-1.2.7 or openmpi-1.2.6 (ethernet); Rocks openmpi-1.2.6 (Infiniband). Thanks! Tim
--
Timothy H. Kaiser, Ph.D. tkai...@mines.edu CSM::GECO "Nobody made a greater mistake than he who did nothing because he could only do a little" (Edmund Burke)
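For readers unfamiliar with the calls Tim lists, here is a minimal, hedged sketch of the port-based accept/connect pattern he describes. It is not the actual t2a/w2d code: it handles a single worker, and it assumes the port string is handed to the worker on the command line, which may not be how the real application exchanges it.

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm inter, merged;
        char port[MPI_MAX_PORT_NAME];
        int is_mgr = (argc > 1 && 0 == strcmp(argv[1], "manager"));

        MPI_Init(&argc, &argv);

        if (is_mgr) {
            /* Manager side: open a port and wait for one worker to connect. */
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port: %s\n", port);   /* hand this string to the worker somehow */
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Close_port(port);
        } else if (argc > 2) {
            /* Worker side: connect to the port string given as argv[2] (an assumption). */
            strncpy(port, argv[2], MPI_MAX_PORT_NAME - 1);
            port[MPI_MAX_PORT_NAME - 1] = '\0';
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        } else {
            MPI_Finalize();
            return 1;
        }

        /* Collapse the inter-communicator into one intra-communicator,
         * as Tim does with MPI_Intercomm_merge. */
        MPI_Intercomm_merge(inter, is_mgr ? 0 : 1, &merged);

        /* ... application traffic over 'merged' ... */

        MPI_Comm_free(&merged);
        MPI_Comm_free(&inter);
        MPI_Finalize();
        return 0;
    }

Tim's setup differs in that three separately launched workers connect, presumably with the manager accepting each one in turn, all under the persistent orted universe named "univ3".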
Re: [OMPI devel] -display-map and mpi_spawn
We always output the entire map, so you'll see the parent procs as well as the child On Sep 16, 2008, at 12:52 PM, Greg Watson wrote: Hi Ralph, No I'm happy to get a map at the beginning and at every spawn. Do you send the whole map again, or only an update? Regards, Greg On Sep 11, 2008, at 9:09 AM, Ralph Castain wrote: It already somewhat does. If you use --display-map at mpirun, you automatically get display-map whenever MPI_Spawn is called. We didn't provide a mechanism by which you could only display-map for MPI_Spawn (and not for the original mpirun), but it would be trivial to do so - just have to define an info-key for that purpose. Is that what you need? On Sep 11, 2008, at 5:35 AM, Greg Watson wrote: Ralph, At the moment -display-map shows the process mapping when mpirun first starts, but I'm wondering about processes created dynamically. Would it be possible to trigger a map update when MPI_Spawn is called? Regards, Greg ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
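As a reference point for the info-key idea Ralph mentions, the user-side call would look roughly like the sketch below. The key name "display_map" is invented here purely for illustration; no such key is defined in this thread, and the executable name "worker" is likewise a placeholder.

    #include <mpi.h>

    /* Spawn three copies of "worker" with an MPI_Info object attached.
     * The info key only shows where such a per-spawn option would be passed. */
    void spawn_with_info(void)
    {
        MPI_Comm children;
        MPI_Info info;

        MPI_Info_create(&info);
        MPI_Info_set(info, "display_map", "true");   /* hypothetical key */

        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 3, info, 0,
                       MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        /* ... communicate with the spawned procs over 'children' ... */
    }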