Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Hjelm, Nathan T
Since I have a system that has the scif libraries installed I will try to 
reproduce and see if I can come up with a fix. It will probably be sometime 
next week at the earliest.

-Nathan

From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
[gilles.gouaillar...@iferc.org]
Sent: Wednesday, May 07, 2014 9:03 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] regression with derived datatypes

On 2014/05/08 2:15, Ralph Castain wrote:
> I wonder if that might also explain the issue reported by Gilles regarding 
> the scif BTL? In his example, the problem only occurred if the message was 
> split across scif and vader. If so, then it might be that splitting messages 
> in general is broken.
>
I am afraid there is a misunderstanding:
the problem always occurs with scif,vader,self (regardless of the ompi v1.8
version)
the problem occurs with scif,self only if r31496 is applied to ompi v1.8


In my previous email
http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
I reported the following interesting fact:

With ompi v1.8 (latest r31678), the following command produces incorrect
results:
mpirun -host localhost -np 2 --mca btl scif,self ./test_scif

but with ompi v1.8 r31309, the very same command produces correct results

Elena pointed out that r31496 is a suspect, so I took the latest v1.8
(r31678), reverted r31496, and ...


mpirun -host localhost -np 2 --mca btl scif,self ./test_scif

works again!

Note that the "default"
mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
still produces incorrect results

In order to reproduce the issue, a MIC is *not* needed;
you only need to install the software stack, load the mic kernel module,
and make sure you can read/write /dev/mic/*
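
A quick way to sanity-check those prerequisites (a sketch; /dev/mic/scif is an assumed node name -- the exact device nodes depend on the MPSS release):

```shell
# Check that a device node exists and is readable/writable by the current user.
check_rw() {
    [ -r "$1" ] && [ -w "$1" ] && echo "$1 is readable/writable"
}

# /dev/mic/scif is an assumption; list /dev/mic/ to see the actual nodes.
check_rw /dev/mic/scif || echo "scif device not usable -- check the MPSS install and permissions"
```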

Bottom line, there are two issues here:
1) r31496 broke something : mpirun -np 2 -host localhost --mca btl
scif,self ./test_scif
2) something else never worked : mpirun -np 2 -host localhost --mca btl
scif,vader,self ./test_scif
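
The two cases above can be scripted for regression checking; a sketch, assuming test_scif exits nonzero on incorrect results (the attached source calls MPI_Abort) and that mpirun is on the PATH:

```shell
# Run one reproduction case and report PASS/FAIL based on exit status.
run_case() {
    desc=$1; shift
    if "$@"; then echo "PASS: $desc"; else echo "FAIL: $desc"; fi
}

if command -v mpirun >/dev/null 2>&1; then
    run_case "scif,self (broken by r31496)"   mpirun -np 2 -host localhost --mca btl scif,self ./test_scif
    run_case "scif,vader,self (never worked)" mpirun -np 2 -host localhost --mca btl scif,vader,self ./test_scif
else
    echo "mpirun not found"
fi
```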

Gilles

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14739.php


Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread George Bosilca
Nathan, or anybody with access to the target hardware,

If you can provide minimalistic output of the application with and
without the above-mentioned patch, and with mpi_ddt_unpack_debug,
mpi_ddt_pack_debug, and mpi_ddt_position_debug set to 1, I would try
to help.

  George.


On Thu, May 8, 2014 at 2:50 AM, Hjelm, Nathan T  wrote:
> Since I have a system that has the scif libraries installed I will try to 
> reproduce and see if I can come up with a fix. It will probably be sometime 
> next week at the earliest.
>
> -Nathan
> 
> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
> [gilles.gouaillar...@iferc.org]
> Sent: Wednesday, May 07, 2014 9:03 PM
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> On 2014/05/08 2:15, Ralph Castain wrote:
>> I wonder if that might also explain the issue reported by Gilles regarding 
>> the scif BTL? In his example, the problem only occurred if the message was 
>> split across scif and vader. If so, then it might be that splitting messages 
>> in general is broken.
>>
> i am afraid there is a misunderstanding :
> the problem always occur with scif,vader,self (regardless the ompi v1.8
> version)
> the problem occurs with scif,self only if r31496 is applied to ompi v1.8
>
>
> In my previous email
> http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
> i reported the following interesting fact :
>
> with ompi v1.8 (latest r31678), the following command produces incorrect
> results :
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> but with ompi v1.8 r31309, the very same command produces correct results
>
> Elena pointed that r31496 is a suspect. so i took the latest v1.8
> (r31678) and reverted r31496 and ...
>
>
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> works again !
>
> note that the "default"
> mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
> still produces incorrect results
>
> in order to reproduce the issue, a MIC is *not* needed,
> you only need to install the software stack, load the mic kernel module
> and make sure you can read/write /dev/mic/*
>
> bottom line, there are two issues here :
> 1) r31496 broke something : mpirun -np 2 -host localhost --mca btl
> scif,self ./test_scif
> 2) something else never worked : mpirun -np 2 -host localhost --mca btl
> scif,vader,self ./test_scif
>
> Gilles
>


Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Gilles Gouaillardet
George,

You do not need any hardware; just download MPSS from Intel and install it.
Make sure the mic kernel module is loaded *and* you can read/write to the
newly created /dev/mic/* devices.

/* I am now running this on a virtual machine with no MIC whatsoever */

I was able to improve things a bit for the new attached test case
/* send MPI_PACKED / recv newtype */
with the attached unpack.patch.

It has to be applied on r31678 (aka the latest checkout of the v1.8 branch).

With this patch (zero regression testing so far; it might solve one problem
but break something else!)

mpirun -np 2 -host localhost --mca btl self,scif,vader ./test_scif2
works fine :-)

but

mpirun -np 2 -host localhost --mca btl scif,vader ./test_scif2
still crashes (and it did not crash before r31496)

I will provide the output you requested shortly.

Cheers,

Gilles
/*
 * This test is an oversimplified version of collective/bcast_struct
 * that comes with the ibm test suite.
 * it must be run on two tasks on a single host where the MIC software stack
 * is present (e.g. libscif.so is present, the mic driver is loaded,
 * /dev/mic/* are accessible, and the scif btl is available).
 *
 * mpirun -np 2 -host localhost --mca btl scif,vader,self ./test_scif
 * will produce incorrect results with trunk and v1.8
 *
 * mpirun -np 2 --mca btl ^scif -host localhost ./test_scif
 * will work with trunk and v1.8
 *
 * mpirun -np 2 --mca btl scif,self -host localhost ./test_scif
 * will produce correct results with v1.8 r31309 (but eventually crash in
 * MPI_Finalize) and produce incorrect results with v1.8 r31671 and trunk r31667
 *
 * Copyright (c) 2011  Oracle and/or its affiliates.  All rights reserved.
 * Copyright (c) 2014  Research Organization for Information Science
 * and Technology (RIST). All rights reserved.
 */
/*

 MESSAGE PASSING INTERFACE TEST CASE SUITE

 Copyright IBM Corp. 1995

 IBM Corp. hereby grants a non-exclusive license to use, copy, modify, and
 distribute this software for any purpose and without fee provided that the
 above copyright notice and the following paragraphs appear in all copies.

 IBM Corp. makes no representation that the test cases comprising this
 suite are correct or are an accurate representation of any standard.

 In no event shall IBM be liable to any party for direct, indirect, special
 incidental, or consequential damage arising out of the use of this software
 even if IBM Corp. has been advised of the possibility of such damage.

 IBM CORP. SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING, BUT NOT LIMITED
 TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS AND IBM
 CORP. HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 ENHANCEMENTS, OR MODIFICATIONS.



 These test cases reflect an interpretation of the MPI Standard.  They
 are, in most cases, unit tests of specific MPI behaviors.  If a user of any
 test case from this set believes that the MPI Standard requires behavior
 different than that implied by the test case we would appreciate feedback.

 Comments may be sent to:
Richard Treumann
treum...@kgn.ibm.com


*/
#include <stdio.h>
#include <stdlib.h>
#include <poll.h>
#include "mpi.h"

#define ompitest_error(file,line,...) {fprintf(stderr, "FUCK at %s:%d root=%d size=%d (i,j)=(%d,%d)\n", file, line, root, i0, i, j); MPI_Abort(MPI_COMM_WORLD, 1);}

const int SIZE = 1000;

int main(int argc, char **argv)
{
   int myself;

   double a[2], t_stop;
   int ii, size;
   int len[2];
   MPI_Aint disp[2];
   MPI_Datatype type[2], newtype, t1, t2;
   struct foo_t {
   int i[3];
   double d[3];
   } foo, *bar;
   struct pfoo_t {
   int i[2];
   double d[2];
   } pfoo, *pbar;
   int i0, i, j, root, nseconds = 600, done_flag;
   int _dbg=0;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD,&myself);
   MPI_Comm_size(MPI_COMM_WORLD,&size);
   // _dbg = (0 == myself);
   while (_dbg) poll(NULL,0,1);

   if ( argc > 1 ) nseconds = atoi(argv[1]);
   t_stop = MPI_Wtime() + nseconds;

   /*-*/
   /* Build a datatype that is guaranteed to have holes; send/recv
  large numbers of them */

   MPI_Type_vector(2, 1, 2, MPI_INT, &t1);
   MPI_Type_commit(&t1);
   MPI_Type_vector(2, 1, 2, MPI_DOUBLE, &t2);
   MPI_Type_commit(&t2);

   len[0] = len[1] = 1;
   MPI_Address(&foo.i[0], &disp[0]);
   MPI_Address(&foo.d[0], &disp[1]);
   printf ("%d: %lx %lx\n", myself, (long) disp[0], (long) disp[1]);
   disp[0] -= (MPI_Aint) &foo;
   disp[1] -= (MPI_Aint) &foo;
   printf ("%d: %ld %ld\n", myself, disp[0], disp[1]);
   type[0] = t1;
   type[1] = t2;
   MPI_Type_struct(2, len, disp, type, &newtype);
   MPI_Type_commit(&newtype);

Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Hjelm, Nathan T
If you can get me the backtrace from one of the crash core files I would like 
to see what is going on there.

-Nathan

From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
[gilles.gouaillar...@iferc.org]
Sent: Thursday, May 08, 2014 1:32 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

George,

you do not need any hardware, just download MPSS from Intel and install it.
make sure the mic kernel module is loaded *and* you can read/write to the
newly created /dev/mic/* devices.

/* i am now running this on a virtual machine with no MIC whatsoever */

i was able to improve things a bit for the new attached test case
/* send MPI_PACKED / recv newtype */
with the attached unpack.patch.

it has to be applied on r31678 (aka the latest checkout of the v1.8 branch)

with this patch (zero regression test so far, it might solve one problem
but break anything else !)

mpirun -np 2 -host localhost --mca btl,scif,vader ./test_scif2
works fine :-)

but

mpirun -np 2 -host localhost --mca btl scif,vader ./test_scif2
still crashes (and it did not crash before r31496)

i will provide the output you requested shortly

Cheers,

Gilles


Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Gilles Gouaillardet
Nathan and George,

Here are the output files of the original test_scif.c;
the command line was:

mpirun -np 2 -host localhost --mca btl scif,vader,self --mca
mpi_ddt_unpack_debug 1 --mca mpi_ddt_pack_debug 1 --mca
mpi_ddt_position_debug 1 a.out

This is a silent failure and there is no core file;
the test itself detects that it did not receive the expected value
/* grep "expected" in the output */

Gilles

On 2014/05/08 16:43, Hjelm, Nathan T wrote:
> If you can get me the backtrace from one of the crash core files I would like 
> to see what is going on there.
>



Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Elena Elkina
Hi,

My reproducer failed even with one port enabled (-mca btl_openib_if_include
mlx4_0:1).
I tried with trunk as well - the same issue.

Best,
Elena


On Thu, May 8, 2014 at 11:49 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Nathan and George,
>
> here are the output files of the original test_scif.c
> the command line was
>
> mpirun -np 2 -host localhost --mca btl scif,vader,self --mca
> mpi_ddt_unpack_debug 1 --mca mpi_ddt_pack_debug 1 --mca
> mpi_ddt_position_debug 1 a.out
>
> this is a silent failure and there is no core file
> the test itself detects it did not receive the expected value
> /* grep "expected" in the output */
>
> Gilles
>
> On 2014/05/08 16:43, Hjelm, Nathan T wrote:
> > If you can get me the backtrace from one of the crash core files I would
> like to see what is going on there.
> >
>


Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Gilles Gouaillardet
Nathan and George,

Here are the (compressed) traces.

Gilles

On 2014/05/08 16:43, Hjelm, Nathan T wrote:
> If you can get me the backtrace from one of the crash core files I would like 
> to see what is going on there.
>
> -Nathan
> 
> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
> [gilles.gouaillar...@iferc.org]
> Sent: Thursday, May 08, 2014 1:32 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> George,
>
> you do not need any hardware, just download MPSS from Intel and install it.
> make sure the mic kernel module is loaded *and* you can read/write to the
> newly created /dev/mic/* devices.
>
> /* i am now running this on a virtual machine with no MIC whatsoever */
>
> i was able to improve things a bit for the new attached test case
> /* send MPI_PACKED / recv newtype */
> with the attached unpack.patch.
>
> it has to be applied on r31678 (aka the latest checkout of the v1.8 branch)
>
> with this patch (zero regression test so far, it might solve one problem
> but break anything else !)
>
> mpirun -np 2 -host localhost --mca btl,scif,vader ./test_scif2
> works fine :-)
>
> but
>
> mpirun -np 2 -host localhost --mca btl scif,vader ./test_scif2
> still crashes (and it did not crash before r31496)
>
> i will provide the output you requested shortly
>
> Cheers,
>
> Gilles



r31678.log.bz2
Description: Binary data


r31678withoutr31496.log.bz2
Description: Binary data


[OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Jeff Squyres (jsquyres)
WHAT: Remove the backwards-compatibility autogen.sh sym link

WHY: Because it's time

WHERE: svn rm autogen.sh

TIMEOUT: Teleconf next Tuesday, 13 May 2014

MORE DETAIL:

We converted from autogen.sh to autogen.pl nearly 4 years ago (2010-09-17).  
The autogen.sh->autogen.pl sym link was put in shortly thereafter as a stopgap 
measure to give people time to update their automated scripts from autogen.sh 
to autogen.pl (or better yet, test and see which name they should invoke).

Every time I type "./au", it stops at "./autogen.", which is just annoying.

It's been nearly 4 years.  I think it's time to cut the cord: remove the 
autogen.sh sym link and move on.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Joshua Ladd
+1



On Thu, May 8, 2014 at 6:08 AM, Jeff Squyres (jsquyres)
wrote:

> WHAT: Remove the backwards-compatibility autogen.sh sym link
>
> WHY: Because it's time
>
> WHERE: svn rm autogen.sh
>
> TIMEOUT: Teleconf next Tuesday, 13 May 2014
>
> MORE DETAIL:
>
> We converted from autogen.sh to autogen.pl nearly 4 years ago
> (2010-09-17).  The autogen.sh->autogen.pl sym link was put in shortly
> thereafter as a stopgap measure to give people time to update their
> automated scripts from autogen.sh to autogen.pl (or better yet, test and
> see which name they should invoke).
>
> Every time I type "./au", it stops at "./autogen.", which is just
> annoying.
>
> It's been nearly 4 years.  I think it's time to cut the cord: remove the
> autogen.sh sym link and move on.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>


Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Stephen Poole

+1

Best
Steve...


On 5/8/14, 6:08 AM, Jeff Squyres (jsquyres) wrote:
> WHAT: Remove the backwards-compatibility autogen.sh sym link
>
> WHY: Because it's time
>
> WHERE: svn rm autogen.sh
>
> TIMEOUT: Teleconf next Tuesday, 13 May 2014
>
> MORE DETAIL:
>
> We converted from autogen.sh to autogen.pl nearly 4 years ago
(2010-09-17).  The autogen.sh->autogen.pl sym link was put in shortly
thereafter as a stopgap measure to give people time to update their
automated scripts from autogen.sh to autogen.pl (or better yet, test and
see which name they should invoke).
>
> Every time I type "./au", it stops at "./autogen.", which is just
annoying.
>
> It's been nearly 4 years.  I think it's time to cut the cord: remove
the autogen.sh sym link and move on.
>




Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Ashley Pittman

This will break my build but it’s an easy fix so don’t let that stop you.

Ashley.

On 8 May 2014, at 11:08, Jeff Squyres (jsquyres)  wrote:

> WHAT: Remove the backwards-compatibility autogen.sh sym link
> 
> WHY: Because it's time
> 
> WHERE: svn rm autogen.sh
> 
> TIMEOUT: Teleconf next Tuesday, 13 May 2014
> 
> MORE DETAIL:
> 
> We converted from autogen.sh to autogen.pl nearly 4 years ago (2010-09-17).  
> The autogen.sh->autogen.pl sym link was put in shortly thereafter as a 
> stopgap measure to give people time to update their automated scripts from 
> autogen.sh to autogen.pl (or better yet, test and see which name they should 
> invoke).
> 
> Every time I type "./au", it stops at "./autogen.", which is just 
> annoying.
> 
> It's been nearly 4 years.  I think it's time to cut the cord: remove the 
> autogen.sh sym link and move on.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 



[OMPI devel] VPATH builds broken?

2014-05-08 Thread Ashley Pittman

I started getting build failures against trunk on the 29th, most likely as a 
result of this commit:

https://github.com/open-mpi/ompi-svn-mirror/commit/3f42cbf50670c5b311cc4414dbb3f4ccf762e455

It looks like there was another commit almost immediately afterwards which 
fixed the first problem (include file errors); however, I’m still seeing build 
failures with the following error. I don’t know if this is still a side effect 
of the previous VPATH problem or something else.

Making all in mpi
make[10]: Entering directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
ln -s 
../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/handler.c
 handler.c
  CC   otfmerge_mpi-handler.o
ln -s 
../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/otfmerge.c
 otfmerge.c
  CC   otfmerge_mpi-otfmerge.o
  CCLD otfmerge-mpi
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_dstore_peer'
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_value_load'
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_value_unload'
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_dstore_nonpeer'
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_dstore_internal'
/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
 undefined reference to `opal_dstore'
collect2: error: ld returned 1 exit status
make[10]: *** [otfmerge-mpi] Error 1
make[10]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
make[9]: *** [all-recursive] Error 1
make[9]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge'
make[8]: *** [all-recursive] Error 1
make[8]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools'
make[7]: *** [all-recursive] Error 1
make[7]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
make[6]: *** [all] Error 2
make[6]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
make[5]: *** [all-recursive] Error 1
make[5]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory 
`/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/space/jenkins/workspace/open-mpi/build/ompi'
make: *** [all-recursive] Error 1


The build script I’m using is fairly simple; it’s working from a clean checkout 
each time but is doing a “VPATH” or out-of-tree build:

cd source
./autogen.sh
cd ..
[ -d build ] && rm -rf build
[ -d install ] && rm -rf install
mkdir build
cd build
../source/configure --enable-mpirun-prefix-by-default --prefix 
$WORKSPACE/install
make
make install

Ashley,

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Joshua Ladd
Hi, Adam

We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
push upstream. However, to use it, it will require linking in a proprietary
Mellanox library that accelerates the collective operations (available in
MOFED versions 2.1 and higher.)  Similar in spirit to the MXM MTL or FCA
COLL components in OMPI.

Best,

Josh


On Wed, May 7, 2014 at 11:45 AM, Moody, Adam T.  wrote:

>  Hi Josh,
> Are your changes to OMPI or SLURM's PMI2 implementation?  Do you plan to
> push those changes back upstream?
> -Adam
>
>
>  --
> *From:* devel [devel-boun...@open-mpi.org] on behalf of Joshua Ladd [
> jladd.m...@gmail.com]
> *Sent:* Wednesday, May 07, 2014 7:56 AM
> *To:* Open MPI Developers
>
> *Subject:* Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
> specifically requested
>
>   Ah, I see. Sorry for the reactionary comment - but this feature falls
> squarely within my "jurisdiction", and we've invested a lot in improving
> OMPI jobstart under srun.
>
> That being said (now that I've taken some deep breaths and carefully read
> your original email :)), what you're proposing isn't a bad idea. I think it
> would be good to maybe add a "--with-pmi2" flag to configure since
> "--with-pmi" automagically uses PMI2 if it finds the header and lib. This
> way, we could experiment with PMI1/PMI2 without having to rebuild SLURM or
> hack the installation.
>
>  Josh
>
>
> On Wed, May 7, 2014 at 10:45 AM, Ralph Castain  wrote:
>
>> Okay, then we'll just have to develop a workaround for all those Slurm
>> releases where PMI-2 is borked :-(
>>
>>  FWIW: I think people misunderstood my statement. I specifically did
>> *not* propose to *lose* PMI-2 support. I suggested that we change it to
>> "on-by-request" instead of the current "on-by-default" so we wouldn't keep
>> getting asked about PMI-2 bugs in Slurm. Once the Slurm implementation
>> stabilized, then we could reverse that policy.
>>
>>  However, given that both you and Chris appear to prefer to keep it
>> "on-by-default", we'll see if we can find a way to detect that PMI-2 is
>> broken and then fall back to PMI-1.
>>
>>
>>   On May 7, 2014, at 7:39 AM, Joshua Ladd  wrote:
>>
>>  Just saw this thread, and I second Chris' observations: at scale we
>> are seeing huge gains in jobstart performance with PMI2 over PMI1. We
>> *CANNOT* lose this functionality. For competitive reasons, I cannot
>> provide exact numbers, but let's say the difference is in the ballpark of a
>> full order-of-magnitude on 20K ranks versus PMI1. PMI1 is completely
>> unacceptable/unusable at scale. Certainly PMI2 still has scaling issues,
>> but there is no contest between PMI1 and PMI2.  We (MLNX) are actively
>> working to resolve some of the scalability issues in PMI2.
>>
>>  Josh
>>
>>  Joshua S. Ladd
>>  Staff Engineer, HPC Software
>>  Mellanox Technologies
>>
>>  Email: josh...@mellanox.com
>>
>>
>> On Wed, May 7, 2014 at 4:00 AM, Ralph Castain  wrote:
>>
>>> Interesting - how many nodes were involved? As I said, the bad scaling
>>> becomes more evident at a fairly high node count.
>>>
>>> On May 7, 2014, at 12:07 AM, Christopher Samuel 
>>> wrote:
>>>
>>> > -BEGIN PGP SIGNED MESSAGE-
>>> > Hash: SHA1
>>> >
>>> > Hiya Ralph,
>>> >
>>> > On 07/05/14 14:49, Ralph Castain wrote:
>>> >
>>> >> I should have looked closer to see the numbers you posted, Chris -
>>> >> those include time for MPI wireup. So what you are seeing is that
>>> >> mpirun is much more efficient at exchanging the MPI endpoint info
>>> >> than PMI. I suspect that PMI2 is not much better as the primary
>>> >> reason for the difference is that mpirun sends blobs, while PMI
>>> >> requires that everything be encoded into strings and sent in little
>>> >> pieces.
>>> >>
>>> >> Hence, mpirun can exchange the endpoint info (the dreaded "modex"
>>> >> operation) much faster, and MPI_Init completes faster. Rest of the
>>> >> computation should be the same, so long compute apps will see the
>>> >> difference narrow considerably.
>>> >
>>> > Unfortunately it looks like I had an enthusiastic cleanup at some point
>>> > and so I cannot find the out files from those runs at the moment, but
>>> > I did find some comparisons from around that time.
>>> >
>>> > This first pair are comparing running NAMD with OMPI 1.7.3a1r29103
>>> > run with mpirun and srun successively from inside the same Slurm job.
>>> >
>>> > mpirun namd2 macpf.conf
>>> > srun --mpi=pmi2 namd2 macpf.conf
>>> >
>>> > Firstly the mpirun output (grep'ing the interesting bits):
>>> >
>>> > Charm++> Running on MPI version: 2.1
>>> > Info: Benchmark time: 512 CPUs 0.0959179 s/step 0.555081 days/ns
>>> 1055.19 MB memory
>>> > Info: Benchmark time: 512 CPUs 0.0929002 s/step 0.537617 days/ns
>>> 1055.19 MB memory
>>> > Info: Benchmark time: 512 CPUs 0.0727373 s/step 0.420933 days/ns
>>> 1055.19 MB memory
>>> > Info: Benchmark time: 512 CPUs 0.0779532 s/step 0.451118 days/ns
>>> 1055.19 MB memory
>>> > Info

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Chris Samuel
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:

> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
> push upstream. However, to use it, it will require linking in a proprietary
> Mellanox library that accelerates the collective operations (available in
> MOFED versions 2.1 and higher.)

What about those of us who cannot run Mellanox OFED?

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Jeff Squyres (jsquyres)
On May 8, 2014, at 8:59 AM, Ashley Pittman  wrote:

> This will break my build but it’s an easy fix so don’t let that stop you.

Something like this should do ya:

--- bogus   2014-05-08 06:26:19.759259593 -0700
+++ bogus-new   2014-05-08 06:26:22.567481480 -0700
@@ -14,7 +14,11 @@
 
 
 
-./autogen.sh
+if test -x autogen.sh; then
+   ./autogen.sh
+else
+   ./autogen.pl
+fi
 
 
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Ashley Pittman

I was thinking of something even easier than that ;)  I try to keep an eye on 
the message queue functionality so it’s not often that I need to build code 
over four years old from source.

Ashley.

On 8 May 2014, at 14:27, Jeff Squyres (jsquyres)  wrote:

> On May 8, 2014, at 8:59 AM, Ashley Pittman  wrote:
> 
>> This will break my build but it’s an easy fix so don’t let that stop you.
> 
> Something like this should do ya:
> 
> --- bogus 2014-05-08 06:26:19.759259593 -0700
> +++ bogus-new 2014-05-08 06:26:22.567481480 -0700
> @@ -14,7 +14,11 @@
> 
> 
> 
> -./autogen.sh
> +if test -x autogen.sh; then
> +   ./autogen.sh
> +else
> +   ./autogen.pl
> +fi
> 
> 
> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Ralph Castain

On May 8, 2014, at 6:23 AM, Chris Samuel  wrote:

> On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
> 
>> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
>> push upstream. However, to use it, it will require linking in a proprietary
>> Mellanox library that accelerates the collective operations (available in
>> MOFED versions 2.1 and higher.)
> 
> What about those of us who cannot run Mellanox OFED?

Artem and I are working on a new PMIx plugin that will resolve it for 
non-Mellanox cases.

> 
> All the best,
> Chris
> -- 
> Christopher SamuelSenior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci
> 



Re: [OMPI devel] VPATH builds broken?

2014-05-08 Thread Jeff Squyres (jsquyres)
I'm unable to reproduce your error, even with a git clone of the mirror.  
Perhaps you need to "git clean -df"?


On May 8, 2014, at 9:09 AM, Ashley Pittman  wrote:

> 
> I started getting build failures against trunk on the 29th, most likely as a 
> result of this commit:
> 
> https://github.com/open-mpi/ompi-svn-mirror/commit/3f42cbf50670c5b311cc4414dbb3f4ccf762e455
> 
> It looks like there was another commit almost immediately afterwards which 
> fixed the first problem (include file errors) however I’m still seeing build 
> failures with the following error, I don’t know if this is still aside effect 
> of the previous VPATH problem or something else.
> 
> Making all in mpi
> make[10]: Entering directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
> ln -s 
> ../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/handler.c
>  handler.c
>  CC   otfmerge_mpi-handler.o
> ln -s 
> ../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/otfmerge.c
>  otfmerge.c
>  CC   otfmerge_mpi-otfmerge.o
>  CCLD otfmerge-mpi
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_dstore_peer'
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_value_load'
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_value_unload'
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_dstore_nonpeer'
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_dstore_internal'
> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `opal_dstore'
> collect2: error: ld returned 1 exit status
> make[10]: *** [otfmerge-mpi] Error 1
> make[10]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
> make[9]: *** [all-recursive] Error 1
> make[9]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge'
> make[8]: *** [all-recursive] Error 1
> make[8]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools'
> make[7]: *** [all-recursive] Error 1
> make[7]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
> make[6]: *** [all] Error 2
> make[6]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
> make[5]: *** [all-recursive] Error 1
> make[5]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib'
> make[4]: *** [all-recursive] Error 1
> make[4]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
> make[3]: *** [all] Error 2
> make[3]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory 
> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/space/jenkins/workspace/open-mpi/build/ompi'
> make: *** [all-recursive] Error 1
> 
> 
> The build script I’m using is fairly simple, it’s working from a clean 
> checkout each time but is doing a “VPATH” or out-of-tree build
> 
> cd source
> ./autogen.sh
> cd ..
> [ -d build ] && rm -rf build
> [ -d build ] && rm -rf install
> mkdir build
> cd build
> ../source/configure --enable-mpirun-prefix-by-default --prefix 
> $WORKSPACE/install
> make
> make install
> 
> Ashley,
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14753.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Joshua Ladd
Chris,

The necessary packages will be supported and available in community OFED.

Josh


On Thu, May 8, 2014 at 9:23 AM, Chris Samuel  wrote:

> On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
>
> > We (MLNX) are working on a new SLURM PMI2 plugin that we plan to
> eventually
> > push upstream. However, to use it, it will require linking in a
> proprietary
> > Mellanox library that accelerates the collective operations (available in
> > MOFED versions 2.1 and higher.)
>
> What about those of us who cannot run Mellanox OFED?
>
> All the best,
> Chris
> --
>  Christopher Samuel    Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14755.php
>


Re: [OMPI devel] VPATH builds broken?

2014-05-08 Thread Ashley Pittman

Ah, it was something at my end.  I had a bug in my build script: it wasn’t 
wiping the install directory before doing the build.  This might be an 
indication that something in the build is picking up the install directory in 
preference to the build directory, but I don’t think that represents a real 
problem; frankly I’m surprised this worked as long as it did.
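For readers hitting the same symptom, here is a hedged sketch of what the corrected cleanup step might look like. It assumes the layout from the build script quoted earlier (source/, build/, and install/ side by side) and that the bug was a removal guarded by the wrong directory test; the exact original mistake is an assumption on my part.

```shell
#!/bin/sh
# Sketch of a fixed cleanup step for a VPATH build script like the one
# quoted above. Assumption: the original guarded both removals with
# "[ -d build ]", so install/ survived once build/ was already gone.
mkdir -p build install             # simulate leftovers from a previous run

[ -d build ] && rm -rf build       # each guard now tests the directory
[ -d install ] && rm -rf install   # it actually removes

mkdir build                        # fresh out-of-tree build directory
```

With each removal guarded by its own target, a stale install tree can no longer survive a run in which the build tree had already been cleaned.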

Ashley,

On 8 May 2014, at 14:52, Jeff Squyres (jsquyres)  wrote:

> I'm unable to reproduce your error, even with a git clone of the mirror.  
> Perhaps you need to "git clean -df"?
> 
> 
> On May 8, 2014, at 9:09 AM, Ashley Pittman  wrote:
> 
>> 
>> I started getting build failures against trunk on the 29th, most likely as a 
>> result of this commit:
>> 
>> https://github.com/open-mpi/ompi-svn-mirror/commit/3f42cbf50670c5b311cc4414dbb3f4ccf762e455
>> 
>> It looks like there was another commit almost immediately afterwards which 
>> fixed the first problem (include file errors); however, I’m still seeing build 
>> failures with the following error. I don’t know if this is still a side 
>> effect of the previous VPATH problem or something else.
>> 
>> Making all in mpi
>> make[10]: Entering directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
>> ln -s 
>> ../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/handler.c
>>  handler.c
>> CC   otfmerge_mpi-handler.o
>> ln -s 
>> ../../../../../../../../../../source/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/otfmerge.c
>>  otfmerge.c
>> CC   otfmerge_mpi-otfmerge.o
>> CCLD otfmerge-mpi
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_dstore_peer'
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_value_load'
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_value_unload'
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_dstore_nonpeer'
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_dstore_internal'
>> /space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `opal_dstore'
>> collect2: error: ld returned 1 exit status
>> make[10]: *** [otfmerge-mpi] Error 1
>> make[10]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge/mpi'
>> make[9]: *** [all-recursive] Error 1
>> make[9]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools/otfmerge'
>> make[8]: *** [all-recursive] Error 1
>> make[8]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf/tools'
>> make[7]: *** [all-recursive] Error 1
>> make[7]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
>> make[6]: *** [all] Error 2
>> make[6]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib/otf'
>> make[5]: *** [all-recursive] Error 1
>> make[5]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt/extlib'
>> make[4]: *** [all-recursive] Error 1
>> make[4]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
>> make[3]: *** [all] Error 2
>> make[3]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt/vt'
>> make[2]: *** [all-recursive] Error 1
>> make[2]: Leaving directory 
>> `/space/jenkins/workspace/open-mpi/build/ompi/contrib/vt'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory `/space/jenkins/workspace/open-mpi/build/ompi'
>> make: *** [all-recursive] Error 1
>> 
>> 
>> The build script I’m using is fairly simple, it’s working from a clean 
>> checkout each time but is doing a “VPATH” or out-of-tree build
>> 
>> cd source
>> ./autogen.sh
>> cd ..
>> [ -d build ] && rm -rf build
>> [ -d build ] && rm -rf install
>> mkdir build
>> cd build
>> ../source/configure --enable-mpirun-prefix-by-default --prefix 
>> $WORKSPACE/install
>> make
>> make install
>> 
>> Ashley,
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/05/14753.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14759.php



Re: [OMPI devel] RFC: continue cleanup of build system abstractions

2014-05-08 Thread Ralph Castain
This RFC is now complete - the renaming exercise is done. My apologies to all 
for the churn, and my deepest thanks for your patience.

I know it will take a while to get used to the revised names and to avoid 
breaking the abstractions going forward. We have a "canary" for most of the 
abstraction breaks, so we can deal with them rather quickly when they occur.

Please let me know if/when you hit issues and we'll fix them as quickly as 
possible. I think the system is pretty close to right, but (as usual) there may 
be things in areas we can't compile that are broken.

Thanks again for your patience during this transition.
Ralph


On Apr 27, 2014, at 4:39 PM, Ralph Castain  wrote:

> WHAT:   continue the cleanup of build system abstractions that was started
>  a couple of years ago by Brian, Jeff, and me. The objective is to fix
>  all the naming conventions for things like OMPI_CHECK_PACKAGE
>  so they accurately reflect their targeted level in the code base - e.g.,
>  OMPI_foo gets used for things in the MPI layer. This basically just
>  corrects some historical decisions made before we cared as much
>  about abstractions.
> 
> WHEN:  to be done in a series of commits over the next two months
> 
> HOW:a simple search_replace.pl across the repo
> 
> First step:
>OMPI_CHECK_PACKAGE->  OPAL_CHECK_PACKAGE
>OMPI_CHECK_FUNC_LIB->  OPAL_CHECK_FUNC_LIB
>OMPI_CHECK_COMPILER_WORKS   ->  OPAL_CHECK_COMPILER_WORKS
>OMPI_CHECK_WITHDIR  ->  OPAL_CHECK_WITHDIR
> 
> 
> TIMEOUT:  if nobody raises an objection, sometime after the Tues telecon
> 
> 
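
For anyone following along, the first-step renames listed in the RFC amount to a repo-wide search and replace. The actual change was made with the repo's search_replace.pl, so the grep/sed loop below is only an illustrative sketch (the demo file and its contents are invented for the example):

```shell
#!/bin/sh
# Illustrative equivalent of the RFC's first renaming pass. The real
# change used the repo's search_replace.pl; treat this as a sketch only.
mkdir -p demo
printf 'OMPI_CHECK_PACKAGE([foo])\nOMPI_CHECK_WITHDIR([bar])\n' \
    > demo/config.m4                     # hypothetical sample m4 file

for old in OMPI_CHECK_PACKAGE OMPI_CHECK_FUNC_LIB \
           OMPI_CHECK_COMPILER_WORKS OMPI_CHECK_WITHDIR
do
    new="OPAL_${old#OMPI_}"              # OMPI_CHECK_* -> OPAL_CHECK_*
    grep -rl "$old" demo 2>/dev/null | xargs -r sed -i "s/$old/$new/g"
done

grep OPAL_ demo/config.m4                # show the renamed macros
```

Because none of the four macro names is a prefix of another, the replacement order does not matter here.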



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Christopher Samuel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/05/14 23:45, Ralph Castain wrote:

> Artem and I are working on a new PMIx plugin that will resolve it 
> for non-Mellanox cases.

Ah yes of course, sorry my bad!

- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlNsGcsACgkQO2KABBYQAh/ATgCfeQHS1KsZbLS8Hdux6p98K3w3
DqsAn3vZJMtYGs1xWK4ubK26ceuACtf1
=zPyS
-END PGP SIGNATURE-


Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Christopher Samuel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/05/14 00:16, Joshua Ladd wrote:

> The necessary packages will be supported and available in community
> OFED.

We're constrained to what is in RHEL6 I'm afraid.

This is because we have to run GPFS over IB to BG/Q from the same NSDs
that talk GPFS to all our Intel clusters.   We did try MOFED 2.x (in
connected mode) on a new Intel cluster during its bring up last year
which worked for MPI but stopped it talking to the NSDs.  Reverting to
vanilla RHEL6 fixed it.

Not your problem though. :-)  As Ralph has said there is work on an
alternative solution that we will be able to use.

Thanks!
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlNsG88ACgkQO2KABBYQAh8+SwCfZWpViBFwuhlxqERXpbXbr8Eq
awwAnjj7NJ2/zUGBeZNT0UPwkmaGOaLR
=nPxl
-END PGP SIGNATURE-