Re: [OMPI users] Open MPI 1.4.2 released

2010-05-27 Thread David Singleton

On 05/28/2010 08:20 AM, Jeff Squyres wrote:

On May 16, 2010, at 5:21 AM, Aleksej Saushev wrote:


http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/parallel/openmpi/patches/


Sorry for the high latency reply...

aa: We haven't added RPATH support yet.  We've talked about it but never done 
it.  There are some in OMPI who insist that rpath support needs to be optional. 
 A full patch solution would be appreciated.



We have problems with rpath overriding LD_RUN_PATH.  LD_RUN_PATH is
an intrinsic part of the way we configure our users' environment.  We
effectively use (impose) rpath, but through the flexible, concatenable
LD_RUN_PATH.

David
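
(A minimal sketch of the conflict described above, with placeholder paths.  Per 
the GNU ld documentation, LD_RUN_PATH is only consulted when no explicit -rpath 
appears on the link line, so a wrapper compiler that always injects -rpath 
would silently override it.)

  export LD_RUN_PATH=/site/apps/lib:/site/mpi/lib

  # No explicit -rpath on the link line: ld embeds the run path from LD_RUN_PATH.
  mpicc hello.c -o hello

  # An explicit -rpath (e.g. injected by a wrapper compiler): LD_RUN_PATH is ignored.
  mpicc hello.c -o hello -Wl,-rpath,/opt/openmpi/lib

  # Inspect which run path actually ended up in the binary.
  readelf -d hello | grep -iE 'rpath|runpath'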


Re: [hwloc-users] hwloc on systems with more than 64 cpus?

2010-05-27 Thread Jirka Hladky
On Thursday 27 May 2010 11:47:25 pm Brice Goglin wrote:
> On 27/05/2010 23:28, Jirka Hladky wrote:
> >> hwloc-calc doesn't accept input from stdin, it only reads the
> >> command-line. We have a TODO entry about this, I'll work on it soon.
> >> 
> >> For now, you can do:
> >>  hwloc-distrib ... | xargs -n 1 utils/hwloc-calc
> > 
> > I forgot to use the "-n 1" switch in xargs to send only 1 cpu set per
> > hwloc-calc command.
> > 
> > This works just fine: :-)
> > hwloc-distrib --single 8 | xargs -n1 hwloc-calc --taskset
> > 
> > Perhaps you can add this example to the hwloc-distrib man page?
> 
> I've added the stdin support to hwloc-calc so I don't think it matters
> anymore: "hwloc-distrib --single 8 | hwloc-calc --taskset" should do
> what you want. I'll add something like this to the manpage.
> 
> Brice

Great, thanks!
Jirka
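
(For what it's worth, a sketch of one way the masks from that pipeline might be 
consumed; "./app" is a placeholder for a real program, and the xargs form from 
earlier in the thread is used since it is known to emit one mask per cpuset.)

  # Distribute 8 single-PU cpusets over the machine, convert each one to a
  # taskset-style mask, and launch one bound copy of ./app per mask.
  for mask in $(hwloc-distrib --single 8 | xargs -n 1 hwloc-calc --taskset); do
      taskset "$mask" ./app &
  done
  wait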



Re: [OMPI users] Open MPI 1.4.2 released

2010-05-27 Thread Jeff Squyres
On May 16, 2010, at 5:21 AM, Aleksej Saushev wrote:

> http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/parallel/openmpi/patches/

Sorry for the high latency reply...

aa: We haven't added RPATH support yet.  We've talked about it but never done 
it.  There are some in OMPI who insist that rpath support needs to be optional. 
 A full patch solution would be appreciated.

ab: This should now be moot on the dev trunk as of r23158.  It won't go to 
v1.4, but it is slated for the v1.5 series.  I was waiting for your reply to my 
off-list pings on testing this stuff before I filed a v1.5 CMR, but I just went 
ahead and filed one anyway: https://svn.open-mpi.org/trac/ompi/ticket/2423.

ac: ditto to ab

ad: ditto to ab

ae: ditto to ab

af: ditto to ab -- but I might have missed this one.  Can you test?

ag: ditto to ab -- but I might have missed this one.  Can you test?

ah: this should be applied -- did we miss it?  Gah!  I just checked and it 
didn't go.  What the heck happened here... (checking)  I see that it went into 
v1.5.  It supposedly went into v1.4 in r22890.  gahh!  It looks like the 
commit message on r22890 *says* it brought in r22640, but it didn't actually *do* 
it.  :-(

ag: should be moot by ab, above.

ai: I think you explained this to me before, but I forget (sorry!).  These are 
actually configuration files, not example files.  Hence, we install them into 
sysconfdir.  Is this a difference of definitions, somehow?  (i.e., what you 
define as usage policies for exampledir and sysconfdir)

aj: ditto to ai

ak: ditto to ai

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] request_get_status: Recheck request status [PATCH]

2010-05-27 Thread Jeff Squyres
Thanks for the ping -- sorry it took so long!

Committed to the SVN trunk in r23215; I filed CMRs for v1.4 and v1.5.  It's 
technically not a bug, so I don't know if the v1.4 RMs will allow it.


On May 27, 2010, at 12:02 PM, Shaun Jackman wrote:

> Ping.
> 
> On Tue, 2010-05-04 at 14:06 -0700, Shaun Jackman wrote:
> > Hi Jeff,
> >
> > request_get_status polls request->req_complete before calling
> > opal_progress. Ideally, it would check req_complete, call opal_progress,
> > and check req_complete one final time. This patch mirrors the logic of
> > ompi_request_default_test in ompi/request/req_test.c.
> >
> > We've discussed this patch on the mailing list previously. I think we
> > both agreed it was a good idea, but it never got applied.
> >
> > Cheers,
> > Shaun
> >
> > 2009-09-14  Shaun Jackman  
> >
> >   * ompi/mpi/c/request_get_status.c (MPI_Request_get_status):
> >   If opal_progress is called then check the status of the request
> >   before returning. opal_progress is called only once. This logic
> >   parallels MPI_Test (ompi_request_default_test).
> >
> > --- ompi/mpi/c/request_get_status.c.orig  2008-11-04 12:56:27.0 -0800
> > +++ ompi/mpi/c/request_get_status.c   2009-09-24 15:30:09.99585 -0700
> > @@ -41,6 +41,10 @@
> >  int MPI_Request_get_status(MPI_Request request, int *flag,
> > MPI_Status *status)
> >  {
> > +#if OMPI_ENABLE_PROGRESS_THREADS == 0
> > +int do_it_once = 0;
> > +#endif
> > +
> >  MEMCHECKER(
> >  memchecker_request();
> >  );
> > @@ -57,6 +61,9 @@
> >  }
> >  }
> > 
> > +#if OMPI_ENABLE_PROGRESS_THREADS == 0
> > + recheck_request_status:
> > +#endif
> >  opal_atomic_mb();
> >  if( (request == MPI_REQUEST_NULL) || (request->req_state == OMPI_REQUEST_INACTIVE) ) {
> >  *flag = true;
> > @@ -78,9 +85,17 @@
> >  }
> >  return MPI_SUCCESS;
> >  }
> > -*flag = false;
> >  #if OMPI_ENABLE_PROGRESS_THREADS == 0
> > -opal_progress();
> > +if( 0 == do_it_once ) {
> > +/**
> > + * If we run the opal_progress then check the status of the request before
> > + * leaving. We will call the opal_progress only once per call.
> > + */
> > +opal_progress();
> > +do_it_once++;
> > +goto recheck_request_status;
> > +}
> >  #endif
> > +*flag = false;
> >
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] request_get_status: Recheck request status [PATCH]

2010-05-27 Thread Shaun Jackman
Ping.

On Tue, 2010-05-04 at 14:06 -0700, Shaun Jackman wrote:
> Hi Jeff,
> 
> request_get_status polls request->req_complete before calling
> opal_progress. Ideally, it would check req_complete, call opal_progress,
> and check req_complete one final time. This patch mirrors the logic of
> ompi_request_default_test in ompi/request/req_test.c.
> 
> We've discussed this patch on the mailing list previously. I think we
> both agreed it was a good idea, but it never got applied.
> 
> Cheers,
> Shaun
> 
> 2009-09-14  Shaun Jackman  
> 
>   * ompi/mpi/c/request_get_status.c (MPI_Request_get_status):
>   If opal_progress is called then check the status of the request
>   before returning. opal_progress is called only once. This logic
>   parallels MPI_Test (ompi_request_default_test).
> 
> --- ompi/mpi/c/request_get_status.c.orig  2008-11-04 12:56:27.0 -0800
> +++ ompi/mpi/c/request_get_status.c   2009-09-24 15:30:09.99585 -0700
> @@ -41,6 +41,10 @@
>  int MPI_Request_get_status(MPI_Request request, int *flag,
> MPI_Status *status) 
>  {
> +#if OMPI_ENABLE_PROGRESS_THREADS == 0
> +int do_it_once = 0;
> +#endif
> +
>  MEMCHECKER(
>  memchecker_request();
>  );
> @@ -57,6 +61,9 @@
>  }
>  }
>  
> +#if OMPI_ENABLE_PROGRESS_THREADS == 0
> + recheck_request_status:
> +#endif
>  opal_atomic_mb();
>  if( (request == MPI_REQUEST_NULL) || (request->req_state == OMPI_REQUEST_INACTIVE) ) {
>  *flag = true;
> @@ -78,9 +85,17 @@
>  }
>  return MPI_SUCCESS;
>  }
> -*flag = false;
>  #if OMPI_ENABLE_PROGRESS_THREADS == 0
> -opal_progress();
> +if( 0 == do_it_once ) {
> +/**
> + * If we run the opal_progress then check the status of the request before
> + * leaving. We will call the opal_progress only once per call.
> + */
> +opal_progress();
> +do_it_once++;
> +goto recheck_request_status;
> +}
>  #endif
> +*flag = false;
> 



Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-27 Thread Jeff Squyres
On May 26, 2010, at 3:32 PM, Michael E. Thomadakis wrote:

> How do you handle thread/task and memory affinity? Do you pass the requested 
> affinity settings to the batch scheduler and then let it issue the specific 
> placements for threads to the nodes?

Not as of yet, no.  At the moment, Open MPI only obeys its own affinity 
settings, usually passed via mpirun (see mpirun(1)).

> This is something we are concerned about, as we are running multiple jobs on the 
> same node and we don't want to oversubscribe cores by binding their threads 
> inadvertently.
> 
> Looking at ompi_info 
>  $ ompi_info | grep -i aff
>MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
>MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
> 
> does this mean we have full affinity support included, or do I need to 
> involve HWLOC in any way?

Yes, Open MPI processes can bind themselves to sockets/cores.  The 1.4 series 
uses PLPA behind the scenes for processor affinity (the first_use component 
is for memory affinity).  The 1.5 series will eventually use hwloc (we just 
recently imported it into our development trunk, but it's still "soaking" 
before moving over to the v1.5 branch; we've found at least one minor problem 
so far).  It'll likely be there for the v1.5.1 series.

That being said, you can certainly ignore OMPI's intrinsic binding capabilities 
and use a standalone program like hwloc-bind or taskset to bind MPI processes.
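
(For illustration only -- the option names below are those documented for the 
1.4 series and are not taken from this message; check mpirun(1) on your 
installation.)

  # Let Open MPI bind one process per core and report the resulting bindings:
  mpirun -np 8 --bind-to-core --report-bindings ./a.out

  # Older equivalent via an MCA parameter:
  mpirun -np 8 --mca mpi_paffinity_alone 1 ./a.out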

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Deadlock question

2010-05-27 Thread Gijsbert Wiesenekker

On May 24, 2010, at 20:27 , Eugene Loh wrote:

> Gijsbert Wiesenekker wrote:
> 
>> My MPI program consists of a number of processes that send 0 or more 
>> messages (using MPI_Isend) to 0 or more other processes. The processes check 
>> periodically if messages are available to be processed. It was running fine 
>> until I increased the message size, and I got deadlock problems. Googling 
>> showed I was running into a classic deadlock problem (see for example 
>> http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html). The workarounds 
>> suggested, like changing the order of MPI_Send and MPI_Recv, do not work in my 
>> case, as it could be that one process does not send any message at all to 
>> the other processes, so MPI_Recv would wait indefinitely.
>> Any suggestions on how to avoid deadlock in this case?
>> 
> The problems you describe would seem to arise with blocking functions like 
> MPI_Send and MPI_Recv.  With the non-blocking variants MPI_Isend/MPI_Irecv, 
> there shouldn't be this problem.  There should be no requirement of ordering 
> the functions in the way that web page describes... that workaround is 
> suggested for the blocking calls.  It feels to me that something is missing 
> from your description.
> 
> If you know the maximum size any message will be, you can post an MPI_Irecv 
> with wild card tags and source ranks.  You can post MPI_Isend calls for 
> whatever messages you want to send.  You can use MPI_Test to check if any 
> message has been received;  if so, process the received message and re-post 
> the MPI_Irecv.  You can use MPI_Test to check if any send messages have 
> completed;  if so, you can reuse those send buffers.  You need some signal to 
> indicate to processes that no further messages will be arriving.

My program was running fine using the methods you describe 
(MPI_Isend/MPI_Test/MPI_Irecv), until I increased the message size. My program 
was not running very efficiently because of the MPI overhead associated with 
sending/receiving a large number of small messages. So I decided to combine 
messages before sending them, and then I got the deadlock problems: the 
MPI_Test calls never returned true, so the MPI_Isend calls never completed. As 
described at the link given above, the reason was that I had exhausted the MPI 
system buffer space, in combination with the unsafe ordering of the 
send/receive calls (but I cannot see how I can change that order given the 
nature of my program).
See also, for example, 
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.pe.doc/pe_422/am10600481.html:
 'Destination buffer space unavailability cannot cause a safe MPI program to 
fail, but could cause hangs in unsafe MPI programs. An unsafe program is one 
that assumes MPI can guarantee system buffering of sent data until the receive 
is posted.' 

Gijsbert
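
(For reference, a minimal compilable sketch of the non-blocking pattern Eugene 
describes above.  The buffer size, tag, and termination handling are 
illustrative assumptions and are not taken from the program discussed in this 
thread.)

#include <mpi.h>
#include <stdlib.h>

#define MAX_MSG  4096   /* assumed upper bound on any (combined) message size */
#define TAG_DATA 1

int main(int argc, char **argv)
{
    int flag, done = 0;
    char *recvbuf = malloc(MAX_MSG);
    char *sendbuf = malloc(MAX_MSG);
    MPI_Request rreq, sreq = MPI_REQUEST_NULL;
    MPI_Status status;

    MPI_Init(&argc, &argv);

    /* Pre-post one wildcard receive so any incoming message has a place to land. */
    MPI_Irecv(recvbuf, MAX_MSG, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &rreq);

    while (!done) {
        /* Did anything arrive?  If so, process it and re-post the receive. */
        MPI_Test(&rreq, &flag, &status);
        if (flag) {
            /* ... handle recvbuf using status.MPI_SOURCE / status.MPI_TAG ... */
            MPI_Irecv(recvbuf, MAX_MSG, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &rreq);
        }

        /* Only reuse the send buffer after the previous send has completed
         * (MPI_Test resets a completed request to MPI_REQUEST_NULL). */
        if (sreq != MPI_REQUEST_NULL)
            MPI_Test(&sreq, &flag, MPI_STATUS_IGNORE);
        if (sreq == MPI_REQUEST_NULL) {
            /* ... if there is something to send, fill sendbuf and post, e.g.:
             *     MPI_Isend(sendbuf, n, MPI_CHAR, dest, TAG_DATA,
             *               MPI_COMM_WORLD, &sreq);
             */
        }

        /* ... do local work; set 'done' once some explicit termination signal
         *     (e.g. a dedicated "no more messages" tag) has been received ... */
        done = 1;   /* placeholder so this sketch terminates */
    }

    /* Clean up the still-pending wildcard receive before finalizing. */
    MPI_Cancel(&rreq);
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);

    free(recvbuf);
    free(sendbuf);
    MPI_Finalize();
    return 0;
}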