Hi George

Your help is welcome! See below for some thoughts

On Oct 29, 2008, at 8:12 AM, George Bosilca wrote:

Thanks Ralph, this indeed fixed my problem. However, I've run into more trouble ...

I have a simple application that keeps spawning MPI processes; they exchange some data, then the children disconnect and vanish. I keep doing this in a loop, which is absolutely legal from the MPI standard's perspective. However, with the Open MPI trunk I run into two kinds of trouble:
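In outline, the loop looks something like this (just a sketch - the "./slave" executable, the child count of 2, and the iteration count are placeholders):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    for (int i = 0; i < 100; i++) {
        MPI_Comm children;

        /* launch the children and get an intercommunicator to them */
        MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        /* ... exchange some data over the intercommunicator ... */

        /* both sides disconnect; the children then finalize and exit,
         * so nothing from this iteration should linger */
        MPI_Comm_disconnect(&children);
    }

    MPI_Finalize();
    return 0;
}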

1. I run out of fds. Apparently the orteds don't close the connections when the children disconnect, and after a few iterations I exhaust the available fds; the orteds start complaining and everything ends up being killed. If I check with lsof, I can see the pending fds (in an invalid state) still attached to the orted.

Good point - this was actually the case with the old system too, IIRC. We didn't have a mechanism by which the orted could reach down into the iof and "close" the file descriptors from a child process when it terminates.

Here is what I suggest:

1. In the orte/mca/odls/base/odls_base_default_fns.c:odls_base_default_wait_local_proc function, add a call to orte_iof.close(child->name).

2. In orte/mca/iof/orted/iof_orted.c:orted_close, look up the read events and sinks that refer to that process and close those fds. Be sure to also terminate the read events, clean up any output still queued on the sink's write event, and release those objects.

That should do the trick - a rough sketch of both pieces follows.
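Treat the details below as a sketch rather than a drop-in patch: the sinks list on mca_iof_orted_component, the sink field names, and the orted_close signature are assumptions from memory, but the shape of the change should be right.

/* (1) orte/mca/odls/base/odls_base_default_fns.c, in
 *     odls_base_default_wait_local_proc(): once the child is known to be
 *     gone, tell the IOF to drop everything it holds for that process.   */
orte_iof.close(child->name);

/* (2) orte/mca/iof/orted/iof_orted.c - sketch of orted_close().
 *     List and field names below are illustrative only.                  */
static int orted_close(const orte_process_name_t *peer)
{
    opal_list_item_t *item, *next;

    for (item = opal_list_get_first(&mca_iof_orted_component.sinks);
         item != opal_list_get_end(&mca_iof_orted_component.sinks);
         item = next) {
        orte_iof_sink_t *sink = (orte_iof_sink_t*)item;
        next = opal_list_get_next(item);

        if (sink->name.jobid == peer->jobid &&
            sink->name.vpid  == peer->vpid) {
            /* remove it from the list; the destructor should stop the
             * write event, drop any queued output, and close the fd */
            opal_list_remove_item(&mca_iof_orted_component.sinks, item);
            OBJ_RELEASE(sink);
        }
    }

    /* ... and do the same walk over the read events that refer to this
     * process: opal_event_del() the event, close the fd, OBJ_RELEASE.    */

    return ORTE_SUCCESS;
}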



2. I tried to be helpful and provide a hostfile describing the cluster. I even annotated the nodes with the number of slots and max-slots. When we spawn processes we correctly load-balance them on the available nodes, but when they finish we do not release the resources. After a few iterations we run out of available nodes, and the application exits with the following error:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
 ./slave

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

However, at this point there is only one MPI process running, the master. All other resources are fully available for the children.

This isn't an IOF issue, but rather a problem in how we track resource usage in mpirun. When a job completes, we don't "release" its resources back to the node pool.

Been that way since day one, now that I think about it - just nobody noticed! :-)

Here is what I suggest. In orte/mca/plm/base/plm_base_launch_support.c:orte_plm_base_check_job_completed - this is where we detect that a job has actually completed. You could add a call here to a new routine that:

1. calls orte_rmaps.get_job_map(job) to get the map for this job - that will tell you exactly which nodes were used and how many slots were used on each of them

2. the nodes in the map are stored as pointers to the corresponding orte_node_t objects in the global orte_node_pool, so all you would need to do is cycle through the resulting array of node pointers, decrementing each node's slots_in_use by the appropriate amount.

That should do the trick - a rough sketch is below. I can't think of anything else that would be required, though I can't swear I didn't miss something.
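Something along these lines, for instance. Again, just a sketch: the routine name is made up, the map/node accessors are from memory (check the rmaps types header and orte_globals for the real fields), and where the per-node proc count lives is for the implementer to verify.

/* Hypothetical helper, called from orte_plm_base_check_job_completed()
 * once the job is known to be complete.                                  */
static void release_job_resources(orte_job_t *jdata)
{
    orte_job_map_t *map;
    orte_std_cntr_t i;

    /* (1) the map records exactly which nodes this job used and how
     * many slots it consumed on each of them */
    map = orte_rmaps.get_job_map(jdata->jobid);
    if (NULL == map) {
        return;   /* nothing was mapped, nothing to hand back */
    }

    /* (2) the map's node entries point at the orte_node_t objects in the
     * global orte_node_pool, so decrementing here gives the slots back
     * to the pool for the next spawn */
    for (i = 0; i < map->num_nodes; i++) {
        orte_node_t *node;
        orte_std_cntr_t used;

        node = (orte_node_t*)opal_pointer_array_get_item(map->nodes, i);
        if (NULL == node) {
            continue;
        }

        /* "used" = the number of this job's procs mapped onto this node */
        used = node->num_procs;   /* illustrative only */
        node->slots_in_use = (node->slots_in_use > used)
                           ? node->slots_in_use - used : 0;
    }
}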

Thanks!
Ralph




I would like to get involved and help fix these two problems, but I'm having a hard time figuring out where to start. Any pointers would be welcome.

 Thanks,
   george.

On Oct 28, 2008, at 10:50 AM, Ralph Castain wrote:

Done...r19820

On Oct 28, 2008, at 8:37 AM, Ralph Castain wrote:

Yes, of course it does - the problem is in a sanity check I just installed over the weekend.

Easily fixed...


On Oct 28, 2008, at 8:33 AM, George Bosilca wrote:

Ralph,

I ran into trouble with the new IO framework when I spawn a new process. The following error message is dumped and the job is aborted.

--------------------------------------------------------------------------
The requested stdin target is out of range for this job - it points
to a process rank that is greater than the number of process in the
job.

Specified target: INVALID
Number of procs: 2

This could be caused by specifying a negative number for the stdin
target, or by mistyping the desired rank. Please correct the cmd line
and try again.
--------------------------------------------------------------------------

Is the new IO framework supposed to support MPI-2 dynamics?

Thanks,
george.

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
