On further thought, perhaps I should be clearer. If you are saying that you need to read the hostfile to display the cluster *before* the user actually submits a job for execution, then fine - go ahead and call rds.query.

What I'm trying to communicate to you is that you need to call setup_job when you are launching the resulting application.

If you want, you could do the following:

1. call orte_rds.query(ORTE_JOBID_INVALID) to get your host info. Note that only a hostfile will be read here - so if you are in (for example) a bproc environment, you won't get any node info at this point.

2. when you are ready to launch the app, call orte_rmgr.spawn with an attribute list that contains ORTE_RMGR_SPAWN_FLOW with a value of ORTE_RMGR_SETUP | ORTE_RMGR_ALLOC | ORTE_RMGR_MAP | ORTE_RMGR_SETUP_TRIGS | ORTE_RMGR_LAUNCH. This will tell spawn to do everything *except* rds.query so you avoid re-entering the hostfile info.

Unfortunately, if you want to see node info prior to launch on anything other than a hostfile, we really don't have a way to do that right now. The ORTE 2.0 design allows for it, but we haven't implemented that yet - probably a few months away.

Hope that helps
Ralph


On 1/29/07 6:45 PM, "Ralph Castain" <r...@lanl.gov> wrote:

> On 1/29/07 5:57 PM, "Greg Watson" <gwat...@lanl.gov> wrote:
>
>> Ralph,
>>
>> On Jan 29, 2007, at 11:10 AM, Ralph H Castain wrote:
>>
>>> On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote:
>>>
>>>> No, we have always called query() first, just after orte_init().
>>>> Since query() has never required a job id before, this used to work.
>>>> I think the call was required to kick the SOH into action, but I'm
>>>> not sure if it was needed for any other purpose.
>>>
>>> Query has nothing to do with the SOH - the only time you would
>>> "need" it would be if you are reading a hostfile. Otherwise, it
>>> doesn't do anything at the moment.
>>>
>>> Not calling setup_job would be risky, in my opinion...
>>
>> We've had this discussion before. We *need* to read the hostfile
>> before calling setup_job() because we have to populate the registry
>> with node information. If you're saying that this is now no longer
>> possible, then I'd respectfully ask that this functionality be
>> restored before you release 1.2. If there is some other way to
>> achieve this, then please let me know. We've been doing this ever
>> since 1.0 and in the alpha and beta versions of 1.2.
>
> I think you don't understand what setup_job does. Setup_job has four
> arguments:
>
> (a) an array of app_context objects that contain the application to be
> launched
>
> (b) the number of elements in that array
>
> (c) a pointer to a location where the jobid for this job is to be returned;
> and
>
> (d) a list of attributes that allows the caller to "fine-tune" behavior
>
> With that info, setup_job will:
>
> (a) create a new jobid for your application; and
>
> (b) store the app_context info in appropriate places in the registry
>
> And that is *all* setup_job does - it simply gets a jobid and initializes some
> important info in the registry. It never looks at node information, nor does
> it in any way impact node info.
>
> Calling rds.query after rmgr.setup_job is how we always do it. In truth, the
> precise ordering of those two operations is immaterial as they have absolutely
> nothing in common. However, we always do it in the described order so that
> rds.query can have a valid jobid. As I said, at the moment rds.query doesn't
> actually use the jobid, though that will change at some point in the future.
>
> Although it isn't *absolutely* necessary, I would still suggest that you call
> rmgr.setup_job before calling rds.query to ensure that any subsequent
> operations have all the info they require to function correctly. You can see
> the progression we use in orte/mca/rmgr/urm/rmgr_urm.c - I believe you will
> find it helpful to follow that logic.
>
> Alternatively, if you want, you can simply repeatedly call orte_rmgr.spawn and
> use the attributes I built for you to step your way through the standard
> launch. As you probably recall, I gave you the ability to specify - at a very
> atomistic level - exactly which steps in the spawn process were to be
> implemented at each call into rmgr.spawn. You can look at the referenced file
> to see the attribute for each step in the procedure.
>
>>>> Are there likely to be further API changes before the release
>>>> version? We are trying to release PTP, but I think this is impossible
>>>> until your API's stabilize.
>>>
>>> None planned, other than what I mentioned above. If you want to
>>> support Open MPI 1.2, you may need a slight phase shift, though, so
>>> you can see the final release.
>>
>> Please explain "phase shift".
>>
>> Greg
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

------ End of Forwarded Message