Hi Ken,

Thank you very much for your reply; I will think about it and run some more tests. I was only thinking about using MPI threads, but yes, as you say, if two threads get scheduled on the same core, that wouldn't be pretty at all. I can still run a few more tests of that functionality, but I don't expect great results.

I'm not sure I correctly understand what you say about spawn. I found a presentation on the web by Richard Graham saying that the spawn functionality is implemented, and that presentation also claims full MPI-2 support on the Cray XT. When I said that I had problems with the MPI_Comm_accept/connect functions, I meant that I actually get errors when I try a "simple" MPI_Open_port. Do you know where in the code I can find out whether this function is implemented or not? If it is implemented, knowing where it is defined would help me track down the origin of my problem and possibly extend support for this functionality (if that is feasible). I would like to be able to link two different jobs together using these functions, i.e., creating a communicator between the jobs.
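
To make the use case concrete, here is a rough sketch of the pattern I have in mind (the server/client split and function names are just for illustration, not code we actually run). One job opens a port and accepts, the other connects using the published port name:

#include <mpi.h>
#include <stdio.h>

/* Job A ("server"): called after MPI_Init. */
void open_and_accept(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    /* This is the call that currently fails for me on the XT5. */
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("port name: %s\n", port);   /* hand the port name to job B somehow */

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    MPI_Close_port(port);
}

/* Job B ("client"): called after MPI_Init, with the port name from job A. */
void connect_to_server(char *port)
{
    MPI_Comm intercomm;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
}

The resulting intercommunicator is what I mean by "creating a communicator between the jobs".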

Thanks,

Jerome

On 07/09/2010 07:16 PM, Matney Sr, Kenneth D. wrote:
Hello Jerome,

The first one is simple: Portals is not thread-safe on the Cray XT. As I recall, only the master thread can post an event, although any thread can receive the event. I might have it backwards, though; it has been a couple of years since I played with this.

The second one depends on how you use your Cray XT. In our case, the machine is used process-per-core, i.e., not as a collection of SMPs. For performance reasons, you definitely do not want MPI threads. Also, since it is run process-per-core, there is nothing to be gained with progress threads; Portals events will generate a kernel-level interrupt. Whether you can run the XT as a cluster of SMPs is another question entirely. We really have not tried this in the context of OMPI, but, in conjunction with Portals, it might open a "can of worms". For example, any thread can be run on any core, but the Portals ID for a thread will be the NID/PID pair for that core. If two threads get scheduled to the same core, it would not be pretty.

I could see lots of reasons why spawn might fail. First, it is run on a compute node, and there is no way for a compute node to launch a process on another compute node. Also, there will be no rank/size initialization forthcoming from ALPS. So, even if it got past this, it would be running on the same node as its parent.
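
For reference, the kind of call we are talking about is just the following (a minimal sketch; "./child" is a placeholder executable name):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;

    MPI_Init(&argc, &argv);
    /* On the setup described above, ALPS will not launch the children,
       so this would be expected to fail or, at best, end up running the
       children on the parent's node. */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}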
-- Ken Matney, Sr.
    Oak Ridge National Laboratory


On Jul 9, 2010, at 7:53 AM, Jerome Soumagne wrote:

Hi,

As I said in the previous e-mail, we've recently installed OpenMPI on a Cray XT5 machine, and we therefore use the portals and ALPS libraries. Thanks for providing the configuration script from Jaguar; it was very helpful and only had to be slightly adapted to use the latest CNL version installed on this machine.

I have some questions, though, regarding the use of the portals btl and mtl components. I noticed that when I compiled OpenMPI with MPI thread support enabled and ran a job, the portals components refused to initialize because of these funny lines:

./mtl_portals_component.c
182     /* we don't run with no stinkin' threads */
183     if (enable_progress_threads || enable_mpi_threads) return NULL;

I'd like to know why MPI threads are disabled here, since threads are supported on the XT5. Does the btl/mtl require thread-safety to be implemented, or is it because of the portals library itself?

I would also like to use the MPI_Comm_accept/connect functions. It seems that this is not possible with the portals mtl, even though spawn appears to be supported. Did I do something wrong, or is it really not supported? If it is not, would it be possible to extend this module to support these functions? We could help with that.

I'd also like to know whether there are any plans for creating a module to use the DMAPP interface for the Gemini interconnect.

Thanks.

Jerome


--
Jérôme Soumagne
Scientific Computing Research Group
CSCS, Swiss National Supercomputing Centre
Galleria 2, Via Cantonale  | Tel: +41 (0)91 610 8258
CH-6928 Manno, Switzerland | Fax: +41 (0)91 610 8282





