Re: [OMPI devel] System V Shared Memory for OpenMPI: Request for Community Input and Testing

2010-05-05 Thread Samuel K. Gutierrez

On May 5, 2010, at 6:10 AM, Jeff Squyres wrote:


On May 4, 2010, at 9:53 AM, Ashley Pittman wrote:

Point noted.  But actually -- can you give specific reasons as to
why a user should care?  Keep in mind that this would be a
short-lived fork'ed process -- not "spawn" in the MPI sense of the word.


You might be running the job under Valgrind or another debugger,
BLCR has some issues with fork as I remember, and traditionally
there have been IB mapping issues here as well.  I'm sure you could
make a case against any of those points if you wanted to, but I
think the argument stands: doing this kind of run-time check
shouldn't be needed.


Mmm; good points (especially Valgrind).  BLCR and OpenFabrics verbs  
shouldn't be much of an issue here, but I can see that there might  
be unexpectedness if you're running under Valgrind or some other  
debugger.


It might be possible, however, to construct the code so that if it
failed to initialise it just wasn't used, rather than aborting the
job.  That would have much the same effect as a run-time test but
without having to fork new processes and create short-lived shared
memory regions.


That's how most of the network transports are in OMPI today -- if  
they fail to init, they are just skipped.
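
A minimal sketch of the "skip it if it fails to init" pattern described above. This is not OMPI's component-selection code; the `backend_t` type, `select_backend()`, and the two stand-in init functions are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical backend descriptor; not an OMPI type. */
typedef struct {
    const char *name;
    int (*init)(void);          /* returns 0 on success, nonzero on failure */
} backend_t;

/* Stand-ins for real init routines (assumptions for the example). */
int sysv_init_fails(void) { return -1; }   /* pretend sysv is unavailable */
int mmap_init_ok(void)    { return 0; }    /* pretend mmap works */

/* Return the first backend whose init succeeds, or NULL if none does.
 * A failing backend is reported and skipped; the job is not aborted. */
const backend_t *select_backend(const backend_t *list, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (list[i].init() == 0) {
            return &list[i];
        }
        fprintf(stderr, "%s: init failed, skipping\n", list[i].name);
    }
    return NULL;
}
```

With a list of `{"sysv", sysv_init_fails}` followed by `{"mmap", mmap_init_ok}`, `select_backend()` returns the mmap entry, so a sysv failure costs one diagnostic line rather than the whole job.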


The problem here is that you really need 2 processes to do this  
test.  I suppose it could be done with local ranks 0 and 1 instead  
of forking a new process -- they would just need to communicate via  
RML to sync up, I suppose.
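
For concreteness, here is a rough sketch of what a forked two-process probe could look like. It assumes the property being tested is whether a second process can still attach to a System V segment after it has been marked for removal; that is an assumption made for the example, not something stated in this thread. `probe_attach_after_rmid()` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a second process can attach after IPC_RMID, 0 if it cannot,
 * and -1 if the probe could not run (no sysv support, fork refused, ...). */
int probe_attach_after_rmid(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0) {
        return -1;
    }

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* Mark the segment for removal while we still hold an attachment. */
    if (shmctl(id, IPC_RMID, NULL) != 0) {
        shmdt(addr);
        return -1;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: drop the inherited mapping, then try a fresh attach by id. */
        shmdt(addr);
        void *p = shmat(id, NULL, 0);
        _exit(p == (void *)-1 ? 1 : 0);
    } else if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            result = (WEXITSTATUS(status) == 0) ? 1 : 0;
        }
    }
    /* pid < 0 means fork failed (e.g. a process limit); result stays -1. */

    shmdt(addr);   /* last detach lets the kernel destroy the segment */
    return result;
}
```

A return of -1 is deliberately treated as "inconclusive" rather than fatal, so a caller could simply skip the sysv path when the probe cannot run.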


I need to think about it a little more, but I like this solution.

Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory



I should of course have said fork where I mentioned spawn above to
avoid any confusion; spawn has a specific meaning in the context of MPI.


I still think a better understanding of the issue is required
before any decision here is made, though.  I'm surprised by Samuel's
description of the problem because it's not how I remember it, and
from what Chris says it doesn't reflect what is in the Linux Git code
either.  I'd like to see why there is an apparent difference in
behaviour before a decision is made to only support one.


There's no intent to only support sysv or mmap.  Samuel's work was  
to extend OMPI to support sysv in the case where it would be  
advantageous (e.g., guaranteed cleanup of the shmem segment).  The  
mmap stuff is definitely not going to be removed.
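
One common way to get that kind of cleanup with System V shared memory is to mark the segment for removal right after attaching, so the kernel destroys it once the last process detaches or exits. The sketch below illustrates that general technique only; it is not taken from Samuel's patch, and `alloc_self_cleaning_segment()` is a hypothetical name.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private sysv segment, attach it, and immediately mark it for
 * removal. Returns the attached address, or NULL on failure. */
void *alloc_self_cleaning_segment(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0) {
        return NULL;
    }

    void *addr = shmat(id, NULL, 0);

    /* Mark for removal whether or not the attach worked: with no
     * attachments the segment goes away at once, otherwise it is
     * destroyed when the last attachment is dropped (including on
     * abnormal process exit). */
    shmctl(id, IPC_RMID, NULL);

    return (addr == (void *)-1) ? NULL : addr;
}
```

Whether other processes can still attach to such a segment after it has been marked for removal is platform-dependent (Linux documents that it is permitted), so a multi-process design would need either to attach everyone before the removal call or to check the platform's behaviour first.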


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






Re: [OMPI devel] System V Shared Memory for OpenMPI: Request for Community Input and Testing

2010-05-05 Thread Terry Dontje

Jeff Squyres wrote:

On May 4, 2010, at 9:53 AM, Ashley Pittman wrote:

  

Point noted.  But actually -- can you give specific reasons as to why a user should care? 
 Keep in mind that this would be a short-lived fork'ed process -- not "spawn" 
in the MPI sense of the word.
  

You might be running the job under Valgrind or another debugger, BLCR has some
issues with fork as I remember, and traditionally there have been IB mapping
issues here as well.  I'm sure you could make a case against any of those
points if you wanted to, but I think the argument stands: doing this kind of
run-time check shouldn't be needed.



Mmm; good points (especially Valgrind).  BLCR and OpenFabrics verbs shouldn't 
be much of an issue here, but I can see that there might be unexpectedness if 
you're running under Valgrind or some other debugger.
  
Couldn't you also run into problems if a job is running under an RM that 
is enforcing a number of processes limit on the job?


--td
  

It might be possible, however, to construct the code so that if it failed to
initialise it just wasn't used, rather than aborting the job.  That would have
much the same effect as a run-time test but without having to fork new
processes and create short-lived shared memory regions.



That's how most of the network transports are in OMPI today -- if they fail to 
init, they are just skipped.

The problem here is that you really need 2 processes to do this test.  I 
suppose it could be done with local ranks 0 and 1 instead of forking a new 
process -- they would just need to communicate via RML to sync up, I suppose.

  

I should of course have said fork where I mentioned spawn above to avoid any
confusion; spawn has a specific meaning in the context of MPI.

I still think a better understanding of the issue is required before any
decision here is made, though.  I'm surprised by Samuel's description of the
problem because it's not how I remember it, and from what Chris says it doesn't
reflect what is in the Linux Git code either.  I'd like to see why there is an
apparent difference in behaviour before a decision is made to only support one.



There's no intent to only support sysv or mmap.  Samuel's work was to extend 
OMPI to support sysv in the case where it would be advantageous (e.g., 
guaranteed cleanup of the shmem segment).  The mmap stuff is definitely not 
going to be removed.

  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com