[OMPI devel] autogen broken

2014-12-09 Thread George Bosilca
After updating to the latest master (3a14c8e), I started having issues with the 
VPATH build on Mac OS X. Both autogen.pl and configure succeeded, but when make 
is invoked I get the following error:

Making all in opal
Making all in include
/Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
Making all in libltdl
CDPATH="${ZSH_VERSION+.}:" && cd ../../../ompi/opal/libltdl && /bin/sh 
/Users/bosilca/unstable/ompi/trunk/ompi/config/missing aclocal-1.14 -I 
../../config
aclocal-1.14: error: ../../config/autogen_found_items.m4:312: file 
‘opal/mca/backtrace/configure.m4’ does not exist

I tried to wipe out everything and start from scratch but to no avail. Any 
ideas what’s going wrong and/or how to fix this?

  George.





Re: [OMPI devel] autogen broken

2014-12-09 Thread Nick Papior Andersen
I experience the exact same thing.
Please see my bug-report on this (and the work-around) here:
http://www.open-mpi.org/community/lists/devel/2014/11/16371.php





-- 
Kind regards Nick


Re: [OMPI devel] autogen broken

2014-12-09 Thread Ralph Castain
Yeah, we’ve confirmed at Intel that OMPI won’t build with libtool 2.4.3+

I made Jeff aware of it, but we’re both too busy to dig into this before the 
holiday.
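
Given that report, one quick sanity check before running autogen.pl is to compare the installed GNU libtool version against 2.4.3. The sketch below is illustrative only and is not part of Open MPI: version_ge and check_libtool are hypothetical helper names, the comparison handles plain dotted numeric versions only, and the extraction command in the trailing comment assumes GNU libtool's usual first line of --version output.

```shell
# version_ge A B: succeed (exit 0) when dotted numeric version A >= B.
# Hypothetical helper for illustration; no suffixes such as "-rc1" are handled.
version_ge() {
    vg_a=$1
    vg_b=$2
    while [ -n "$vg_a" ] || [ -n "$vg_b" ]; do
        # Leading component of each version; a missing component counts as 0.
        vg_x=${vg_a%%.*}
        vg_y=${vg_b%%.*}
        [ -n "$vg_x" ] || vg_x=0
        [ -n "$vg_y" ] || vg_y=0
        if [ "$vg_x" -gt "$vg_y" ]; then return 0; fi
        if [ "$vg_x" -lt "$vg_y" ]; then return 1; fi
        # Drop the component that was just compared.
        case $vg_a in *.*) vg_a=${vg_a#*.} ;; *) vg_a= ;; esac
        case $vg_b in *.*) vg_b=${vg_b#*.} ;; *) vg_b= ;; esac
    done
    return 0
}

# check_libtool VERSION: print "affected" for 2.4.3 or newer, "ok" otherwise.
check_libtool() {
    if version_ge "$1" 2.4.3; then
        echo "affected"
    else
        echo "ok"
    fi
}

# Possible use (assumption: GNU libtool prints "libtool (GNU libtool) X.Y.Z" first;
# on Mac OS X the GNU tool is often installed as glibtool):
#   check_libtool "$(libtool --version | sed -n '1s/.* \([0-9][0-9.]*\)$/\1/p')"
```

Taking the version string as an argument keeps the comparison separate from however the local libtool happens to be named or installed.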





Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Pim Schellart
The version that “lstopo --version” reports is the same (1.8) on all nodes, but 
we may indeed be hitting the second issue. We can try to compile a new version 
of openmpi, but how do we ensure that the external programs (e.g. SLURM) are 
using the same hwloc version as the one embedded in openmpi? Is it enough to 
just compile hwloc 1.9 separately as well and link against that? Also, if this 
is an issue, should we file a bug against hwloc or openmpi on Ubuntu for 
mismatching versions?
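
A check like the one described above (the same lstopo version on every node) can be scripted. The following is only a sketch under stated assumptions: same_version is a hypothetical helper, its input is one "<node> <version string>" line per node, and the srun invocation in the trailing comment assumes srun, hostname and lstopo are available inside the allocation.

```shell
# same_version: read "<node> <version string>" lines on stdin.
# Prints "consistent" when every node reports the same version string,
# otherwise prints "mismatch:" followed by the distinct version strings.
same_version() {
    sv_distinct=$(cut -d' ' -f2- | sort -u)
    # Count non-empty lines; awk always exits 0, even on empty input.
    sv_count=$(printf '%s\n' "$sv_distinct" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$sv_count" -le 1 ]; then
        echo "consistent"
    else
        echo "mismatch:"
        printf '%s\n' "$sv_distinct"
    fi
}

# Possible use inside a two-node SLURM allocation (one task per node):
#   srun -N 2 --ntasks-per-node=1 sh -c 'echo "$(hostname) $(lstopo --version)"' | same_version
```

Note that this only compares the lstopo binaries found on each node; it says nothing about which hwloc the installed Open MPI was built against.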

> On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
> 
> Hmmm…they probably linked that to the external, system hwloc version, so it 
> sounds like one or more of your nodes has a different hwloc rpm on it.
> 
> I couldn’t leaf thru your output well enough to see all the lstopo versions, 
> but you might check to ensure they are the same.
> 
> Looking at the code base, you may also hit a problem here. OMPI 1.6 series 
> was based on hwloc 1.3 - the output you sent indicated you have hwloc 1.8, 
> which is quite a big change. OMPI 1.8 series is based on hwloc 1.9, so at 
> least that is closer (though probably still a mismatch).
> 
> Frankly, I’d just download and install an OMPI tarball myself and avoid these 
> headaches. This mismatch in required versions is why we embed hwloc as it is 
> a critical library for OMPI, and we had to ensure that the version matched 
> our internal requirements.
> 
> 
>> On Dec 8, 2014, at 8:50 AM, Pim Schellart  wrote:
>> 
>> It is the default openmpi that comes with Ubuntu 14.04.
>> 
>>> On 08 Dec 2014, at 17:17, Ralph Castain  wrote:
>>> 
>>> Pim: is this an OMPI you built, or one you were given somehow? If you built 
>>> it, how did you configure it?
>>> 
>>>> On Dec 8, 2014, at 8:12 AM, Brice Goglin wrote:
>>>>
>>>> It likely depends on how SLURM allocates the cpuset/cgroup inside the
>>>> nodes. The XML warning is related to these restrictions inside the node.
>>>> Anyway, my feeling is that there's an old OMPI or an old hwloc somewhere.
>>>>
>>>> How do we check after install whether OMPI uses the embedded or the
>>>> system-wide hwloc?
>>>>
>>>> Brice
>>>>
>>>> On 08/12/2014 17:07, Pim Schellart wrote:
>>>>> Dear Ralph,
>>>>>
>>>>> the nodes are called coma## and as you can see in the logs the nodes of
>>>>> the broken example are the same as the nodes of the working one, so that
>>>>> doesn’t seem to be the cause. Unless (very likely) I’m missing something.
>>>>> Anything else I can check?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pim
>>>>>
>>>>>> On 08 Dec 2014, at 17:03, Ralph Castain wrote:
>>>>>>
>>>>>> As Brice said, OMPI has its own embedded version of hwloc that we use,
>>>>>> so there is no Slurm interaction to be considered. The most likely cause
>>>>>> is that one or more of your nodes is picking up a different version of
>>>>>> OMPI. So things “work” if you happen to get nodes where all the versions
>>>>>> match, and “fail” when you get a combination that includes a different
>>>>>> version.
>>>>>>
>>>>>> Is there some way you can narrow down your search to find the node(s)
>>>>>> that are picking up the different version?
>>>>>>
>>>>>>> On Dec 8, 2014, at 7:48 AM, Pim Schellart wrote:
>>>>>>>
>>>>>>> Dear Brice,
>>>>>>>
>>>>>>> I am not sure why this is happening since all code seems to be using
>>>>>>> the same hwloc library version (1.8) but it does :) An MPI program is
>>>>>>> started through SLURM on two nodes with four CPU cores total (divided
>>>>>>> over the nodes) using the following script:
>>>>>>>
>>>>>>> #! /bin/bash
>>>>>>> #SBATCH -N 2 -n 4
>>>>>>> /usr/bin/mpiexec /usr/bin/lstopo --version
>>>>>>> /usr/bin/mpiexec /usr/bin/lstopo --of xml
>>>>>>> /usr/bin/mpiexec  /path/to/my_mpi_code
>>>>>>>
>>>>>>> When this is submitted multiple times it gives “out-of-order” warnings
>>>>>>> in about 9/10 cases but works without warnings in 1/10 cases. I
>>>>>>> attached the output (with xml) for both the working and `broken` case.
>>>>>>> Note that the xml is of course printed (differently) multiple times for
>>>>>>> each task/core. As always, any help would be appreciated.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Pim Schellart
>>>>>>>
>>>>>>> P.S. $ mpirun --version
>>>>>>> mpirun (Open MPI) 1.6.5
>>>>>>>
>>>>>>>> On 07 Dec 2014, at 13:50, Brice Goglin wrote:
>>>>>>>>
>>>>>>>> Hello
>>>>>>>> The github issue you're referring to was closed 18 months ago. The
>>>>>>>> warning (it's not an error) is only supposed to appear if you're
>>>>>>>> importing into a recent hwloc an XML that was exported from an old
>>>>>>>> hwloc. I don't see how that could happen when using Open MPI since the
>>>>>>>> hwloc versions on both sides are the same.
>>>>>>>> Make sure you're not confusing it with another error described here
>>>>>>>>
>>>>>>>> http://www.open-mpi.org/projects/hwloc/doc/v1.10.0/a00028.php#faq_os_error
>>>>>>>> Otherwise please report the exact Open MPI and/or hwloc versions as
>>>>>>>> well as

[OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called too late

2014-12-09 Thread Pascal Deveze

When MPI is compiled with --enable-mpi-thread-multiple, a call to 
opal_using_threads() always returns 0 in the btl_xxx_component_init() routine 
of the BTLs, even if the application calls MPI_Init_thread() with 
MPI_THREAD_MULTIPLE.

This is because opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is 
called too late.

I propose the following patch, which solves the problem for me:

diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
index 35509cf..c2370fc 100644
--- a/ompi/runtime/ompi_mpi_init.c
+++ b/ompi/runtime/ompi_mpi_init.c
@@ -512,6 +512,13 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided)
     }
 #endif
 
+    /* If thread support was enabled, then setup OPAL to allow for
+       them. */
+    if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
+        (*provided != MPI_THREAD_SINGLE)) {
+        opal_set_using_threads(true);
+    }
+
     /* initialize datatypes. This step should be done early as it will
      * create the local convertor and local arch used in the proc
      * init.
@@ -724,13 +731,6 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided)
         goto error;
     }
 
-    /* If thread support was enabled, then setup OPAL to allow for
-       them. */
-    if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
-        (*provided != MPI_THREAD_SINGLE)) {
-        opal_set_using_threads(true);
-    }
-
     /* start PML/BTL's */
     ret = MCA_PML_CALL(enable(true));
     if( OMPI_SUCCESS != ret ) {


Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Ralph Castain
There is no linkage between slurm and ompi when it comes to hwloc. If you 
directly launch your app using srun, then slurm will use its version of hwloc 
to do the binding. If you use mpirun to launch the app, then we’ll use our 
internal version to do it.

The two are completely isolated from each other.



Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called too late

2014-12-09 Thread Ralph Castain
Hi Pascal

Is this in the trunk or in the 1.8 series (or both)?




Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called too late

2014-12-09 Thread Pascal Deveze
Hi Ralph,

This is in the trunk.




Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called too late

2014-12-09 Thread Ralph Castain
Kewl - I’ll fix. Thanks!



Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Pim Schellart
Ah, OK, so that was where the confusion came from. I did see hwloc in the SLURM 
sources but couldn’t immediately figure out where exactly it was used. We will 
try compiling openmpi with the embedded hwloc. Are there any particular flags I 
should set?


Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Gilles Gouaillardet
Pim,

if you configure OpenMPI with --with-hwloc=external (or something like
--with-hwloc=/usr), it is very likely that
OpenMPI will use the same hwloc library (i.e. the "system" library) that
is used by SLURM.

/* I do not know how Ubuntu packages OpenMPI ... */


The default (i.e. no --with-hwloc parameter on the configure command
line) is to use the hwloc library that is embedded within OpenMPI.

Gilles
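
The two configurations described above, and one possible way to tell afterwards which one an installation used, might be sketched as follows. This is an assumption-laden illustration rather than an official procedure: the --prefix paths are placeholders, hwloc_flavor is a hypothetical helper, and it assumes that ompi_info prints a line of the form "MCA hwloc: <component> (...)" in which the component is named "external" for a system hwloc and something like "hwloc191" for the embedded copy. The exact output may differ between Open MPI versions.

```shell
# Embedded hwloc (the default; no --with-hwloc option given):
#   ./configure --prefix=$HOME/ompi-embedded
# System-wide hwloc, as suggested above:
#   ./configure --prefix=$HOME/ompi-external --with-hwloc=external

# hwloc_flavor: read `ompi_info` output on stdin and report which hwloc
# component appears to have been built (hypothetical helper).
hwloc_flavor() {
    # Extract the component name from the first "MCA hwloc:" line, if any.
    hf_comp=$(sed -n 's/^ *MCA hwloc: *\([^ ]*\).*/\1/p' | head -n 1)
    case $hf_comp in
        external) echo "external" ;;
        "")       echo "unknown" ;;
        *)        echo "embedded ($hf_comp)" ;;
    esac
}

# Possible use, with the ompi_info belonging to the installation in question:
#   ompi_info | hwloc_flavor
```

Reading from stdin keeps the helper independent of which ompi_info is first on PATH, which matters when several Open MPI installations coexist on a node.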


Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-09 Thread Piotr Lesnicki
Hi,

We do indeed have a fix for XRC support on our branch at Bull. Sorry that I
neglected to contribute it, my bad…

I attach the patch, made on top of the current v1.6.6 (should I
submit it as a pull request instead?).

For v1.8+, a merge of the v1.6 code is not enough, as the openib connect
method changed from xoob to udcm. I made a version against a pre-git state
of the tree, so I will update it and make a pull request.

Piotr





From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
[gilles.gouaillar...@iferc.org]
Sent: Monday, 8 December 2014 03:27
To: Open MPI Developers
Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12

Hi Piotr,

this is quite an old thread now, but I have not seen any support for XRC
with ofed 3.12 yet
(neither in trunk nor in v1.8)

my understanding is that Bull already did something similar for the v1.6
series,
so let me put this the other way around:

does Bull have any plans to contribute this work?
(for example, publish a patch for the v1.6 series, or submit pull
request(s) for the master and v1.8 branches)

Cheers,

Gilles

On 2014/04/23 21:58, Piotr Lesnicki wrote:
> Hi,
>
> In OFED-3.12 the API for XRC has changed. I did not find
> corresponding changes in Open MPI: for example the function
> 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
> longer exists in ofed-3.12-rc1.
>
> Are there any plans to support the new XRC API ?
>
>
> --
> Piotr
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14583.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/12/16445.php
diff --git a/ompi/config/ompi_check_openib.m4 b/ompi/config/ompi_check_openib.m4
index 187356f..97ee8fb 100644
--- a/ompi/config/ompi_check_openib.m4
+++ b/ompi/config/ompi_check_openib.m4
@@ -15,6 +15,7 @@
 # reserved.
 # Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved.
 # Copyright (c) 2010-2012 Oracle and/or its affiliates.  All rights reserved.
+# Copyright (c) 2014  Bull SAS.  All rights reserved.
 # $COPYRIGHT$
 # 
 # Additional copyrights may follow
@@ -175,6 +176,7 @@ AC_DEFUN([OMPI_CHECK_OPENIB],[
 # (unconditionally)
 $1_have_xrc=0
 $1_have_rdmacm=0
+$1_have_xrc_connectib=0
 $1_have_opensm_devel=0
 
 # If we have the openib stuff available, find out what we've got
@@ -188,10 +190,15 @@ AC_DEFUN([OMPI_CHECK_OPENIB],[
 [#include ])
 
# ibv_create_xrc_rcv_qp was added in OFED 1.3
+	   # ibv_open_xrcd was added in  OFED 3.12 (new API)
if test "$enable_connectx_xrc" = "yes"; then
-   AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp], [$1_have_xrc=1])
+   AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp ibv_cmd_open_xrcd], [$1_have_xrc=1])
+   fi
+   if test "$enable_connectx_xrc" = "yes"; then
+   AC_CHECK_FUNCS([ibv_cmd_open_xrcd], [$1_have_xrc_connectib=1])
fi
 
+
if test "no" != "$enable_openib_dynamic_sl"; then
# We need ib_types.h file, which is installed with opensm-devel
# package. However, ib_types.h has a bad include directive,
@@ -279,6 +286,15 @@ AC_DEFUN([OMPI_CHECK_OPENIB],[
 AC_MSG_RESULT([no])
 fi
 
+AC_MSG_CHECKING([if ConnectIB XRC support is enabled])
+AC_DEFINE_UNQUOTED([OMPI_HAVE_CONNECTIB_XRC], [$$1_have_xrc_connectib],
+[Enable features required for ConnectIB XRC support])
+if test "1" = "$$1_have_xrc_connectib"; then
+AC_MSG_RESULT([yes])
+else
+AC_MSG_RESULT([no])
+fi
+
 AC_MSG_CHECKING([if dynamic SL is enabled])
 AC_DEFINE_UNQUOTED([OMPI_ENABLE_DYNAMIC_SL], [$$1_have_opensm_devel],
 [Enable features required for dynamic SL support])
diff --git a/ompi/mca/btl/openib/btl_openib.c b/ompi/mca/btl/openib/btl_openib.c
index 8a9d942..80f833b 100644
--- a/ompi/mca/btl/openib/btl_openib.c
+++ b/ompi/mca/btl/openib/btl_openib.c
@@ -17,6 +17,7 @@
  * Copyright (c) 2006-2007 Voltaire All rights reserved.
  * Copyright (c) 2008-2012 Oracle and/or its affiliates.  All rights reserved.
  * Copyright (c) 2009  IBM Corporation.  All rights reserved.
+ * Copyright (c) 2014  Bull SAS.  All rights reserved
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -323,10 +324,26 @@ static int create_srq(mca_btl_openib_module_t *openib_btl)
 openib_btl->qps[qp].u.srq_qp.rd_posted = 0;
 #if HAVE_XRC
 if(BTL_OPENIB_QP_TYPE_XRC(qp)) {
+#if OMPI_HAVE_CONNECTIB_XRC
+		struct ibv_srq_init_attr_ex attr_ex;
+		memset(&attr_ex, 0, sizeof(struct ibv_srq_init_attr_ex));
+		attr_ex.attr.max_wr = attr.attr.max_wr;
+		
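A note on the guard used in this patch: configure sets the shell variable to 0 before probing and passes it to AC_DEFINE_UNQUOTED, so OMPI_HAVE_CONNECTIB_XRC should always be defined, as either 0 or 1. That is presumably why the C code tests it with #if rather than #ifdef. The sketch below illustrates the difference. The fallback #define and the two helper functions are hypothetical, added only so the example compiles on its own; they are not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Normally provided by the configure-generated config header; defaulted
 * here (an assumption for this sketch) so the file compiles standalone. */
#ifndef OMPI_HAVE_CONNECTIB_XRC
#define OMPI_HAVE_CONNECTIB_XRC 0
#endif

/* The macro is meant to carry a value, not merely to exist. */
static_assert(OMPI_HAVE_CONNECTIB_XRC == 0 || OMPI_HAVE_CONNECTIB_XRC == 1,
              "configure is expected to define this macro to 0 or 1");

/* Hypothetical helper: reports which XRC code path was compiled in. */
const char *xrc_api_in_use(void)
{
#if OMPI_HAVE_CONNECTIB_XRC
    return "new";      /* OFED 3.12-style XRC API */
#else
    return "legacy";   /* older XRC API */
#endif
}

/* Hypothetical helper: shows why #ifdef would be the wrong test.
 * The macro is defined even when its value is 0. */
int macro_is_defined(void)
{
#ifdef OMPI_HAVE_CONNECTIB_XRC
    return 1;
#else
    return 0;
#endif
}
```

With the macro at 0, macro_is_defined() still returns 1 while xrc_api_in_use() selects the legacy path, so an #ifdef guard would wrongly enable the new-API code on systems that lack it.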

Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-09 Thread Gilles Gouaillardet
Thanks Piotr,

Based on the ompi community rules, a PR should be made against master, so the code 
can be reviewed and shaken a bit.
I already prepared such a PR based on your patch and I will push it tomorrow.

Then the changes will be backported to the v1.8 branch, assuming this is not 
considered as a new feature.

Ralph, can you please comment on that ?

Cheers,

Gilles


Piotr Lesnicki wrote:
>Hi,
>
>We indeed have a fix for XRC support on our branch at Bull and sorry I
>neglected to contribute it, my bad…
>
>I attach the patch here, on top of the current v1.6.6 (should I submit
>it as a pull request instead?).
>
>For v1.8+, a merge of the v1.6 code is not enough as openib connect
>changed from xoob to udcm. I made a version on a pre-git state, so I
>will update it and make a pull request.
>
>Piotr
>
>
>
>
>
>From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>[gilles.gouaillar...@iferc.org]
>Sent: Monday, 8 December 2014 03:27
>To: Open MPI Developers
>Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12
>
>Hi Piotr,
>
>this is quite an old thread now, but i did not see any support for XRC
>with ofed 3.12 yet
>(neither in trunk nor in v1.8)
>
>my understanding is that Bull already did something similar for the v1.6
>series,
>so let me put this the other way around :
>
>does Bull have any plan to contribute this work ?
>(for example, publish a patch for the v1.6 series, or submit pull
>request(s) for master and v1.8 branch)
>
>Cheers,
>
>Gilles
>
>On 2014/04/23 21:58, Piotr Lesnicki wrote:
>> Hi,
>>
>> In OFED-3.12 the API for XRC has changed. I did not find
>> corresponding changes in Open MPI: for example the function
>> 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
>> longer exists in ofed-3.12-rc1.
>>
>> Are there any plans to support the new XRC API ?
>>
>>
>> --
>> Piotr
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14583.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/12/16445.php


[OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Jeff Squyres (jsquyres)
As I mentioned on the call a week ago, the usnic BTL has been updated to use 
libfabric (instead of verbs).

What is libfabric?
--> Think of it as a "next generation verbs" -- it's OS-bypass networking for a 
wide range of network hardware, and libfabric contains many more capabilities 
than the verbs API.  libfabric is being developed by mostly the same people who 
initially developed verbs; it's not in competition with verbs -- it's a true 
"next generation" effort.  See http://ofiwg.github.io/libfabric/ for more 
detail.

Why should anyone care?
--> The usnic BTL has been updated to use libfabric.  I have therefore removed 
all usnic-specific code from the verbs parts of the OMPI code base (e.g., 
opal/mca/common).  Additionally, there will shortly be another commit that 
introduces another OMPI network device that uses libfabric.

Did you really just embed libfabric in opal/mca/common/libfabric?
--> Yes -- but this is temporary.  libfabric isn't v1.0 yet -- there aren't 
libfabric tarballs being distributed.  Hence, other than git-cloning the 
libfabric github repo, you can't easily build OMPI against libfabric.  So we 
are temporarily embedding a copy of libfabric in OMPI, partly for convenience, 
and partly because the libfabric API is still changing slightly -- we need a 
stable libfabric stake in the ground against which to build the usnic and other 
components.  We'll update the embedded libfabric periodically to keep up with 
its development (e.g., I just did, earlier this morning).  We anticipate 
removing the embedded copy of libfabric at some point in the future.

Whoa; I'm getting a slew of -pedantic warnings when compiling libfabric!
--> Yeah, sorry about that.  :-(  I added a pragma this morning that should 
remove some of them, but there's still a bunch of -pedantic warnings when you 
compile opal/mca/common/libfabric.  We're working with libfabric upstream to 
get them fixed.  Stay tuned.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-477-g09d03a1

2014-12-09 Thread Ralph Castain
I believe this just reverted a commit last night from Howard that he needed to 
fix the build on the Cray.


> On Dec 9, 2014, at 5:52 AM, git...@crest.iu.edu wrote:
> 
> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  09d03a154bcb5ba1fae45895a20c7d4ffb9846ab (commit)
>  from  18d9fdfd8ddd9e778ea1193a9f44a0b0423b7a76 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> 
> commit 09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> Author: Jeff Squyres 
> Date:   Tue Dec 9 05:52:24 2014 -0800
> 
>libfabric: fix some typos in the usnic configury
> 
> diff --git a/opal/mca/common/libfabric/configure.m4 
> b/opal/mca/common/libfabric/configure.m4
> index a255fc3..26b39e1 100644
> --- a/opal/mca/common/libfabric/configure.m4
> +++ b/opal/mca/common/libfabric/configure.m4
> @@ -267,12 +267,12 @@ AC_DEFUN([_OPAL_COMMON_LIBFABRIC_CHECK_INCDIR],[
> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> opal_common_libfabric_usnic_happy=1
> AC_CHECK_HEADER([linux/netlink.h], [],
> -[opal_common_libfabric_happy=0], [
> +[opal_common_libfabric_usnic_happy=0], [
> #include 
> #include 
> ])
> AC_CHECK_LIB([nl], [nl_connect], [],
> - [opal_common_libfabric_happy=0])
> + [opal_common_libfabric_usnic_happy=0])
> 
> opal_common_libfabric_CPPFLAGS="$opal_common_libfabric_CPPFLAGS 
> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src 
> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src/usnic_direct"
> 
> opal_common_libfabric_LIBADD="\$(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib${OPAL_LIB_PREFIX}mca_common_libfabric.la"
> @@ -286,5 +286,5 @@ 
> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> # 
> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC_CONDITIONALS],[
> AM_CONDITIONAL([OPAL_COMMON_LIBFABRIC_HAVE_PROVIDER_USNIC],
> -   [test $opal_common_libfabric_happy -eq 1])
> +   [test $opal_common_libfabric_usnic_happy -eq 1])
> ])
> 
> 
> ---
> 
> Summary of changes:
> opal/mca/common/libfabric/configure.m4 | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> 
> hooks/post-receive
> -- 
> open-mpi/ompi
> ___
> ompi-commits mailing list
> ompi-comm...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits



Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-09 Thread Ralph Castain
Fixing XRC for the newer ofed would be acceptable to me for the 1.8 series - 
thanks!


> On Dec 9, 2014, at 5:07 AM, Gilles Gouaillardet 
>  wrote:
> 
> Thanks Piotr,
> 
> Based on the ompi community rules, a PR should be made against master, so the code 
> can be reviewed and shaken a bit.
> I already prepared such a PR based on your patch and I will push it tomorrow.
> 
> Then the changes will be backported to the v1.8 branch, assuming this is not 
> considered as a new feature.
> 
> Ralph, can you please comment on that ?
> 
> Cheers,
> 
> Gilles
> 
> 
> Piotr Lesnicki wrote:
>> Hi,
>> 
>> We indeed have a fix for XRC support on our branch at Bull and sorry I
>> neglected to contribute it, my bad…
>> 
>> I attach the patch here, on top of the current v1.6.6 (should I submit
>> it as a pull request instead?).
>> 
>> For v1.8+, a merge of the v1.6 code is not enough as openib connect
>> changed from xoob to udcm. I made a version on a pre-git state, so I
>> will update it and make a pull request.
>> 
>> Piotr
>> 
>> 
>> 
>> 
>> 
>> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>> [gilles.gouaillar...@iferc.org]
>> Sent: Monday, 8 December 2014 03:27
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12
>> 
>> Hi Piotr,
>> 
>> this is quite an old thread now, but i did not see any support for XRC
>> with ofed 3.12 yet
>> (neither in trunk nor in v1.8)
>> 
>> my understanding is that Bull already did something similar for the v1.6
>> series,
>> so let me put this the other way around :
>> 
>> does Bull have any plan to contribute this work ?
>> (for example, publish a patch for the v1.6 series, or submit pull
>> request(s) for master and v1.8 branch)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/04/23 21:58, Piotr Lesnicki wrote:
>>> Hi,
>>> 
>>> In OFED-3.12 the API for XRC has changed. I did not find
>>> corresponding changes in Open MPI: for example the function
>>> 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
>>> longer exists in ofed-3.12-rc1.
>>> 
>>> Are there any plans to support the new XRC API ?
>>> 
>>> 
>>> --
>>> Piotr
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14583.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16445.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16467.php



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-477-g09d03a1

2014-12-09 Thread Jeff Squyres (jsquyres)
Yes, it did, because Howard's commit was wrong.

I'm not sure what the exact problem was he was fixing (the commit message 
wasn't very specific), but the shell variable names were already correct -- 
they are to indicate whether a specific provider (usnic, in this case) can be 
built; not the libfabric core.
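
To illustrate the distinction: a failed usnic probe should clear only the provider flag and leave the core flag alone. A minimal C model of that logic follows, assuming two independent flags. The struct and function names are made up for illustration and only mirror the shell variables in configure.m4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two configure shell variables. */
struct build_flags {
    bool core_happy;   /* mirrors opal_common_libfabric_happy */
    bool usnic_happy;  /* mirrors opal_common_libfabric_usnic_happy */
};

/* Hypothetical probe results for the usnic provider's dependencies. */
struct probes {
    bool have_netlink_header;  /* linux/netlink.h found */
    bool have_libnl;           /* nl_connect found in libnl */
};

/* Decide whether the usnic provider can be built. A failed probe
 * clears only the provider flag; the core flag is left untouched. */
struct build_flags check_usnic_provider(struct build_flags in, struct probes p)
{
    struct build_flags out = in;

    out.usnic_happy = true;
    if (!p.have_netlink_header) {
        out.usnic_happy = false;
    }
    if (!p.have_libnl) {
        out.usnic_happy = false;
    }

    /* The provider check must never change the core decision. */
    assert(out.core_happy == in.core_happy);
    return out;
}
```

On a machine without libnl, this leaves core_happy as it was and sets usnic_happy to false, so only the usnic provider is skipped.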

However, there was a problem where provider libs were being unconditionally 
added; this *may* have been Howard's problem...?

I just pushed a fix for that: 
https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca

Howard's here at the Forum with me; I'll consult with him in person later this 
morning.




On Dec 9, 2014, at 7:15 AM, Ralph Castain  wrote:

> I believe this just reverted a commit last night from Howard that he needed 
> to fix the build on the Cray.
> 
> 
>> On Dec 9, 2014, at 5:52 AM, git...@crest.iu.edu wrote:
>> 
>> This is an automated email from the git hooks/post-receive script. It was
>> generated because a ref change was pushed to the repository containing
>> the project "open-mpi/ompi".
>> 
>> The branch, master has been updated
>>  via  09d03a154bcb5ba1fae45895a20c7d4ffb9846ab (commit)
>> from  18d9fdfd8ddd9e778ea1193a9f44a0b0423b7a76 (commit)
>> 
>> Those revisions listed above that are new to this repository have
>> not appeared on any other notification email; so we list those
>> revisions in full, below.
>> 
>> - Log -
>> https://github.com/open-mpi/ompi/commit/09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
>> 
>> commit 09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
>> Author: Jeff Squyres 
>> Date:   Tue Dec 9 05:52:24 2014 -0800
>> 
>>   libfabric: fix some typos in the usnic configury
>> 
>> diff --git a/opal/mca/common/libfabric/configure.m4 
>> b/opal/mca/common/libfabric/configure.m4
>> index a255fc3..26b39e1 100644
>> --- a/opal/mca/common/libfabric/configure.m4
>> +++ b/opal/mca/common/libfabric/configure.m4
>> @@ -267,12 +267,12 @@ AC_DEFUN([_OPAL_COMMON_LIBFABRIC_CHECK_INCDIR],[
>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
>>opal_common_libfabric_usnic_happy=1
>>AC_CHECK_HEADER([linux/netlink.h], [],
>> -[opal_common_libfabric_happy=0], [
>> +[opal_common_libfabric_usnic_happy=0], [
>> #include 
>> #include 
>> ])
>>AC_CHECK_LIB([nl], [nl_connect], [],
>> - [opal_common_libfabric_happy=0])
>> + [opal_common_libfabric_usnic_happy=0])
>> 
>>opal_common_libfabric_CPPFLAGS="$opal_common_libfabric_CPPFLAGS 
>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src 
>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src/usnic_direct"
>>
>> opal_common_libfabric_LIBADD="\$(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib${OPAL_LIB_PREFIX}mca_common_libfabric.la"
>> @@ -286,5 +286,5 @@ 
>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
>> # 
>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC_CONDITIONALS],[
>>AM_CONDITIONAL([OPAL_COMMON_LIBFABRIC_HAVE_PROVIDER_USNIC],
>> -   [test $opal_common_libfabric_happy -eq 1])
>> +   [test $opal_common_libfabric_usnic_happy -eq 1])
>> ])
>> 
>> 
>> ---
>> 
>> Summary of changes:
>> opal/mca/common/libfabric/configure.m4 | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> 
>> hooks/post-receive
>> -- 
>> open-mpi/ompi
>> ___
>> ompi-commits mailing list
>> ompi-comm...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16469.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-477-g09d03a1

2014-12-09 Thread Ralph Castain
No problem - just wanted to make sure you were aware of it.


> On Dec 9, 2014, at 7:21 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Yes, it did, because Howard's commit was wrong.
> 
> I'm not sure what the exact problem was he was fixing (the commit message 
> wasn't very specific), but the shell variable names were already correct -- 
> they are to indicate whether a specific provider (usnic, in this case) can be 
> built; not the libfabric core.
> 
> However, there was a problem where provider libs were being unconditionally 
> added; this *may* have been Howard's problem...?
> 
> I just pushed a fix for that: 
> https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca
> 
> Howard's here at the Forum with me; I'll consult with him in person later 
> this morning.
> 
> 
> 
> 
> On Dec 9, 2014, at 7:15 AM, Ralph Castain  wrote:
> 
>> I believe this just reverted a commit last night from Howard that he needed 
>> to fix the build on the Cray.
>> 
>> 
>>> On Dec 9, 2014, at 5:52 AM, git...@crest.iu.edu wrote:
>>> 
>>> This is an automated email from the git hooks/post-receive script. It was
>>> generated because a ref change was pushed to the repository containing
>>> the project "open-mpi/ompi".
>>> 
>>> The branch, master has been updated
>>> via  09d03a154bcb5ba1fae45895a20c7d4ffb9846ab (commit)
>>>from  18d9fdfd8ddd9e778ea1193a9f44a0b0423b7a76 (commit)
>>> 
>>> Those revisions listed above that are new to this repository have
>>> not appeared on any other notification email; so we list those
>>> revisions in full, below.
>>> 
>>> - Log -
>>> https://github.com/open-mpi/ompi/commit/09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
>>> 
>>> commit 09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
>>> Author: Jeff Squyres 
>>> Date:   Tue Dec 9 05:52:24 2014 -0800
>>> 
>>>  libfabric: fix some typos in the usnic configury
>>> 
>>> diff --git a/opal/mca/common/libfabric/configure.m4 
>>> b/opal/mca/common/libfabric/configure.m4
>>> index a255fc3..26b39e1 100644
>>> --- a/opal/mca/common/libfabric/configure.m4
>>> +++ b/opal/mca/common/libfabric/configure.m4
>>> @@ -267,12 +267,12 @@ AC_DEFUN([_OPAL_COMMON_LIBFABRIC_CHECK_INCDIR],[
>>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
>>>   opal_common_libfabric_usnic_happy=1
>>>   AC_CHECK_HEADER([linux/netlink.h], [],
>>> -[opal_common_libfabric_happy=0], [
>>> +[opal_common_libfabric_usnic_happy=0], [
>>> #include 
>>> #include 
>>> ])
>>>   AC_CHECK_LIB([nl], [nl_connect], [],
>>> - [opal_common_libfabric_happy=0])
>>> + [opal_common_libfabric_usnic_happy=0])
>>> 
>>>   opal_common_libfabric_CPPFLAGS="$opal_common_libfabric_CPPFLAGS 
>>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src 
>>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src/usnic_direct"
>>>   
>>> opal_common_libfabric_LIBADD="\$(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib${OPAL_LIB_PREFIX}mca_common_libfabric.la"
>>> @@ -286,5 +286,5 @@ 
>>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
>>> # 
>>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC_CONDITIONALS],[
>>>   AM_CONDITIONAL([OPAL_COMMON_LIBFABRIC_HAVE_PROVIDER_USNIC],
>>> -   [test $opal_common_libfabric_happy -eq 1])
>>> +   [test $opal_common_libfabric_usnic_happy -eq 1])
>>> ])
>>> 
>>> 
>>> ---
>>> 
>>> Summary of changes:
>>> opal/mca/common/libfabric/configure.m4 | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> 
>>> hooks/post-receive
>>> -- 
>>> open-mpi/ompi
>>> ___
>>> ompi-commits mailing list
>>> ompi-comm...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16469.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16471.php



Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-09 Thread Nathan Hjelm

Ralph, I corrected this as part of the thread multiple pull request in
1.8.

https://github.com/rhc54/ompi-release/commit/52823d592c3759c53ed63ed1f63fe200d2491220#diff-3673b21a7f42dc0665ea4470b3171df1R510

-Nathan

On Tue, Dec 09, 2014 at 12:31:55AM -0800, Ralph Castain wrote:
>Hi Pascal
>Is this in the trunk or in the 1.8 series (or both)?
> 
>  On Dec 9, 2014, at 12:28 AM, Pascal Deveze 
>  wrote:
>   
>  When MPI is compiled with --enable-mpi-thread-multiple, a call
>  to opal_using_threads() always returns 0 in the routine
>  btl_xxx_component_init() of the BTLs, even if the application calls
>  MPI_Init_thread() with MPI_THREAD_MULTIPLE.
>   
>  This is because opal_set_using_threads(true) in
>  ompi/runtime/ompi_mpi_init.c is called too late.
>   
>  I propose the following patch that solves the problem for me:
>   
>  diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
>  index 35509cf..c2370fc 100644
>  --- a/ompi/runtime/ompi_mpi_init.c
>  +++ b/ompi/runtime/ompi_mpi_init.c
>  @@ -512,6 +512,13 @@ int ompi_mpi_init(int argc, char **argv, int
>  requested, int *provided)
>   }
>  #endif
>   
>  +/* If thread support was enabled, then setup OPAL to allow for
>  +   them. */
>  +if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
>  +(*provided != MPI_THREAD_SINGLE)) {
>  +opal_set_using_threads(true);
>  +}
>  +
>   /* initialize datatypes. This step should be done early as it will
>* create the local convertor and local arch used in the proc
>* init.
>  @@ -724,13 +731,6 @@ int ompi_mpi_init(int argc, char **argv, int
>  requested, int *provided)
>  goto error;
>   }
>   
>  -/* If thread support was enabled, then setup OPAL to allow for
>  -   them. */
>  -if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
>  -(*provided != MPI_THREAD_SINGLE)) {
>  -opal_set_using_threads(true);
>  -}
>  -
>   /* start PML/BTL's */
>   ret = MCA_PML_CALL(enable(true));
>   if( OMPI_SUCCESS != ret ) {
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this
>  post: http://www.open-mpi.org/community/lists/devel/2014/12/16459.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16461.php
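
The ordering problem described in the quoted report can be reproduced in miniature. The sketch below is a simplified model, not Open MPI code: the flag, its accessors, and the component struct are hypothetical stand-ins for opal_set_using_threads(), opal_using_threads(), and a BTL's component_init(). It assumes the component reads the flag once, during its own initialization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the OPAL "using threads" flag. */
static bool using_threads = false;

void set_using_threads(bool enable) { using_threads = enable; }
bool is_using_threads(void) { return using_threads; }

/* Hypothetical component that snapshots the flag at init time. */
struct component {
    bool initialized;
    bool thread_safe;
};

void component_init(struct component *c)
{
    c->thread_safe = is_using_threads();  /* read once, never refreshed */
    c->initialized = true;
}

/* Original ordering: the component is initialized before the flag is
 * set, so it never sees that threads were requested. */
bool init_flag_late(bool threads_requested)
{
    struct component c = { false, false };

    set_using_threads(false);
    component_init(&c);
    if (threads_requested) {
        set_using_threads(true);          /* too late for c */
    }
    assert(c.initialized);
    return c.thread_safe;
}

/* Patched ordering: the flag is set first, then the component runs. */
bool init_flag_early(bool threads_requested)
{
    struct component c = { false, false };

    set_using_threads(false);
    if (threads_requested) {
        set_using_threads(true);
    }
    component_init(&c);
    assert(c.initialized);
    return c.thread_safe;
}
```

With threads requested, init_flag_late() reports a component that believes it is single-threaded, while init_flag_early() reports a thread-aware one. That is the effect of moving the opal_set_using_threads(true) block earlier in ompi_mpi_init().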





Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-09 Thread Ralph Castain
Kewl - I wonder why it wasn’t fixed in trunk then?


> On Dec 9, 2014, at 7:52 AM, Nathan Hjelm  wrote:
> 
> 
> Ralph, I corrected this as part of the thread multiple pull request in
> 1.8.
> 
> https://github.com/rhc54/ompi-release/commit/52823d592c3759c53ed63ed1f63fe200d2491220#diff-3673b21a7f42dc0665ea4470b3171df1R510
> 
> -Nathan
> 
> On Tue, Dec 09, 2014 at 12:31:55AM -0800, Ralph Castain wrote:
>>   Hi Pascal
>>   Is this in the trunk or in the 1.8 series (or both)?
>> 
>> On Dec 9, 2014, at 12:28 AM, Pascal Deveze 
>> wrote:
>> 
>> When MPI is compiled with --enable-mpi-thread-multiple, a call
>> to opal_using_threads() always returns 0 in the routine
>> btl_xxx_component_init() of the BTLs, even if the application calls
>> MPI_Init_thread() with MPI_THREAD_MULTIPLE.
>> 
>> This is because opal_set_using_threads(true) in
>> ompi/runtime/ompi_mpi_init.c is called too late.
>> 
>> I propose the following patch that solves the problem for me:
>> 
>> diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
>> index 35509cf..c2370fc 100644
>> --- a/ompi/runtime/ompi_mpi_init.c
>> +++ b/ompi/runtime/ompi_mpi_init.c
>> @@ -512,6 +512,13 @@ int ompi_mpi_init(int argc, char **argv, int
>> requested, int *provided)
>>  }
>> #endif
>> 
>> +/* If thread support was enabled, then setup OPAL to allow for
>> +   them. */
>> +if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
>> +(*provided != MPI_THREAD_SINGLE)) {
>> +opal_set_using_threads(true);
>> +}
>> +
>>  /* initialize datatypes. This step should be done early as it will
>>   * create the local convertor and local arch used in the proc
>>   * init.
>> @@ -724,13 +731,6 @@ int ompi_mpi_init(int argc, char **argv, int
>> requested, int *provided)
>> goto error;
>>  }
>> 
>> -/* If thread support was enabled, then setup OPAL to allow for
>> -   them. */
>> -if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
>> -(*provided != MPI_THREAD_SINGLE)) {
>> -opal_set_using_threads(true);
>> -}
>> -
>>  /* start PML/BTL's */
>>  ret = MCA_PML_CALL(enable(true));
>>  if( OMPI_SUCCESS != ret ) {
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this
>> post: http://www.open-mpi.org/community/lists/devel/2014/12/16459.php
> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16461.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16473.php



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-477-g09d03a1

2014-12-09 Thread Howard Pritchard
Well, the build is broken again for Cray.  I'd like to have this stop.


2014-12-09 7:23 GMT-08:00 Ralph Castain :

> No problem - just wanted to make sure you were aware of it.
>
>
> > On Dec 9, 2014, at 7:21 AM, Jeff Squyres (jsquyres) 
> wrote:
> >
> > Yes, it did, because Howard's commit was wrong.
> >
> > I'm not sure what the exact problem was he was fixing (the commit
> message wasn't very specific), but the shell variable names were already
> correct -- they are to indicate whether a specific provider (usnic, in this
> case) can be built; not the libfabric core.
> >
> > However, there was a problem where provider libs were being
> unconditionally added; this *may* have been Howard's problem...?
> >
> > I just pushed a fix for that:
> https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca
> >
> > Howard's here at the Forum with me; I'll consult with him in person
> later this morning.
> >
> >
> >
> >
> > On Dec 9, 2014, at 7:15 AM, Ralph Castain  wrote:
> >
> >> I believe this just reverted a commit last night from Howard that he
> needed to fix the build on the Cray.
> >>
> >>
> >>> On Dec 9, 2014, at 5:52 AM, git...@crest.iu.edu wrote:
> >>>
> >>> This is an automated email from the git hooks/post-receive script. It
> was
> >>> generated because a ref change was pushed to the repository containing
> >>> the project "open-mpi/ompi".
> >>>
> >>> The branch, master has been updated
> >>> via  09d03a154bcb5ba1fae45895a20c7d4ffb9846ab (commit)
> >>>from  18d9fdfd8ddd9e778ea1193a9f44a0b0423b7a76 (commit)
> >>>
> >>> Those revisions listed above that are new to this repository have
> >>> not appeared on any other notification email; so we list those
> >>> revisions in full, below.
> >>>
> >>> - Log -
> >>>
> https://github.com/open-mpi/ompi/commit/09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> >>>
> >>> commit 09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> >>> Author: Jeff Squyres 
> >>> Date:   Tue Dec 9 05:52:24 2014 -0800
> >>>
> >>>  libfabric: fix some typos in the usnic configury
> >>>
> >>> diff --git a/opal/mca/common/libfabric/configure.m4
> b/opal/mca/common/libfabric/configure.m4
> >>> index a255fc3..26b39e1 100644
> >>> --- a/opal/mca/common/libfabric/configure.m4
> >>> +++ b/opal/mca/common/libfabric/configure.m4
> >>> @@ -267,12 +267,12 @@ AC_DEFUN([_OPAL_COMMON_LIBFABRIC_CHECK_INCDIR],[
> >>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> >>>   opal_common_libfabric_usnic_happy=1
> >>>   AC_CHECK_HEADER([linux/netlink.h], [],
> >>> -[opal_common_libfabric_happy=0], [
> >>> +[opal_common_libfabric_usnic_happy=0], [
> >>> #include 
> >>> #include 
> >>> ])
> >>>   AC_CHECK_LIB([nl], [nl_connect], [],
> >>> - [opal_common_libfabric_happy=0])
> >>> + [opal_common_libfabric_usnic_happy=0])
> >>>
> >>>   opal_common_libfabric_CPPFLAGS="$opal_common_libfabric_CPPFLAGS
> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src
> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src/usnic_direct"
> >>>
>  
> opal_common_libfabric_LIBADD="\$(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib${OPAL_LIB_PREFIX}
> mca_common_libfabric.la"
> >>> @@ -286,5 +286,5 @@
> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> >>> # 
> >>>
> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC_CONDITIONALS],[
> >>>   AM_CONDITIONAL([OPAL_COMMON_LIBFABRIC_HAVE_PROVIDER_USNIC],
> >>> -   [test $opal_common_libfabric_happy -eq 1])
> >>> +   [test $opal_common_libfabric_usnic_happy -eq 1])
> >>> ])
> >>>
> >>>
> >>> ---
> >>>
> >>> Summary of changes:
> >>> opal/mca/common/libfabric/configure.m4 | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>>
> >>> hooks/post-receive
> >>> --
> >>> open-mpi/ompi
> >>> ___
> >>> ompi-commits mailing list
> >>> ompi-comm...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
> >>
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16469.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16471.php
>
> ___
> devel mailing list
> de..

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-09 Thread Nathan Hjelm
Just hadn't gotten around to it yet :). Still working on free list and
lifo stuff.

-Nathan

On Tue, Dec 09, 2014 at 07:56:04AM -0800, Ralph Castain wrote:
> Kewl - I wonder why it wasn’t fixed in trunk then?
> 
> 
> > On Dec 9, 2014, at 7:52 AM, Nathan Hjelm  wrote:
> > 
> > 
> > Ralph, I corrected this as part of the thread multiple pull request in
> > 1.8.
> > 
> > https://github.com/rhc54/ompi-release/commit/52823d592c3759c53ed63ed1f63fe200d2491220#diff-3673b21a7f42dc0665ea4470b3171df1R510
> > 
> > -Nathan
> > 
> > On Tue, Dec 09, 2014 at 12:31:55AM -0800, Ralph Castain wrote:
> >>   Hi Pascal
> >>   Is this in the trunk or in the 1.8 series (or both)?
> >> 
> >> On Dec 9, 2014, at 12:28 AM, Pascal Deveze 
> >> wrote:
> >> 
> >> In the case where MPI is compiled with --enable-mpi-thread-multiple, a call
> >> to opal_using_threads() always returns 0 in the routine
> >> btl_xxx_component_init() of the BTLs, even if the application calls
> >> MPI_Init_thread() with MPI_THREAD_MULTIPLE.
> >> 
> >> This is because opal_set_using_threads(true) in
> >> ompi/runtime/ompi_mpi_init.c is called too late.
> >> 
> >> I propose the following patch that solves the problem for me:
> >> 
> >> diff --git a/ompi/runtime/ompi_mpi_init.c 
> >> b/ompi/runtime/ompi_mpi_init.c
> >> index 35509cf..c2370fc 100644
> >> --- a/ompi/runtime/ompi_mpi_init.c
> >> +++ b/ompi/runtime/ompi_mpi_init.c
> >> @@ -512,6 +512,13 @@ int ompi_mpi_init(int argc, char **argv, int
> >> requested, int *provided)
> >>  }
> >> #endif
> >> 
> >> +/* If thread support was enabled, then setup OPAL to allow for
> >> +   them. */
> >> +if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
> >> +(*provided != MPI_THREAD_SINGLE)) {
> >> +opal_set_using_threads(true);
> >> +}
> >> +
> >>  /* initialize datatypes. This step should be done early as it will
> >>   * create the local convertor and local arch used in the proc
> >>   * init.
> >> @@ -724,13 +731,6 @@ int ompi_mpi_init(int argc, char **argv, int
> >> requested, int *provided)
> >> goto error;
> >>  }
> >> 
> >> -/* If thread support was enabled, then setup OPAL to allow for
> >> -   them. */
> >> -if ((OPAL_ENABLE_PROGRESS_THREADS == 1) ||
> >> -(*provided != MPI_THREAD_SINGLE)) {
> >> -opal_set_using_threads(true);
> >> -}
> >> -
> >>  /* start PML/BTL's */
> >>  ret = MCA_PML_CALL(enable(true));
> >>  if( OMPI_SUCCESS != ret ) {
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this
> >> post: http://www.open-mpi.org/community/lists/devel/2014/12/16459.php
> > 
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2014/12/16461.php
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/12/16473.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16474.php
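
The ordering problem described in the quoted patch can be sketched in isolation. This is a minimal illustration, not Open MPI code: every name in it (sketch_mpi_init, sketch_component_init, the set_flag_early switch, and the integer thread level where 0 stands in for MPI_THREAD_SINGLE) is hypothetical. It only shows why a component that samples the flag during its own init sees false when the setter runs afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global consulted by opal_using_threads(). */
static bool using_threads = false;

static bool sketch_using_threads(void) { return using_threads; }
static void sketch_set_using_threads(bool have) { using_threads = have; }

/* Hypothetical component: remembers what the flag was when it initialized. */
static bool component_saw_threads = false;

static void sketch_component_init(void)
{
    component_saw_threads = sketch_using_threads();
}

/*
 * Hypothetical init routine. thread_level == 0 stands in for
 * MPI_THREAD_SINGLE; any other value means threads were requested.
 * set_flag_early selects between the two orderings discussed in the thread.
 * Returns what the component observed during its init.
 */
static bool sketch_mpi_init(int thread_level, bool set_flag_early)
{
    /* Reset state so each call models a fresh start. */
    using_threads = false;
    component_saw_threads = false;

    if (set_flag_early && thread_level != 0) {
        sketch_set_using_threads(true);   /* patched ordering */
    }

    sketch_component_init();              /* component samples the flag here */

    if (!set_flag_early && thread_level != 0) {
        sketch_set_using_threads(true);   /* original ordering: too late */
    }

    return component_saw_threads;
}
```

Under these assumptions, sketch_mpi_init(3, false) returns false because the component ran before the flag was set, while sketch_mpi_init(3, true) returns true, which mirrors the effect of moving the opal_set_using_threads(true) call earlier in ompi_mpi_init().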




Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-477-g09d03a1

2014-12-09 Thread Jeff Squyres (jsquyres)
I would, too.  Can you provide more detail than "it's broken"?


On Dec 9, 2014, at 7:59 AM, Howard Pritchard  wrote:

> Well the build is broken again for cray.  I'd like to have this stop.
> 
> 
> 2014-12-09 7:23 GMT-08:00 Ralph Castain :
> No problem - just wanted to make sure you were aware of it.
> 
> 
> > On Dec 9, 2014, at 7:21 AM, Jeff Squyres (jsquyres)  
> > wrote:
> >
> > Yes, it did, because Howard's commit was wrong.
> >
> > I'm not sure what the exact problem was he was fixing (the commit message 
> > wasn't very specific), but the shell variable names were already correct -- 
> > they are to indicate whether a specific provider (usnic, in this case) can 
> > be built; not the libfabric core.
> >
> > However, there was a problem where provider libs were being unconditionally 
> > added; this *may* have been Howard's problem...?
> >
> > I just pushed a fix for that: 
> > https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca
> >
> > Howard's here at the Forum with me; I'll consult with him in person later 
> > this morning.
> >
> >
> >
> >
> > On Dec 9, 2014, at 7:15 AM, Ralph Castain  wrote:
> >
> >> I believe this just reverted a commit last night from Howard that he 
> >> needed to fix the build on the Cray.
> >>
> >>
> >>> On Dec 9, 2014, at 5:52 AM, git...@crest.iu.edu wrote:
> >>>
> >>> This is an automated email from the git hooks/post-receive script. It was
> >>> generated because a ref change was pushed to the repository containing
> >>> the project "open-mpi/ompi".
> >>>
> >>> The branch, master has been updated
> >>> via  09d03a154bcb5ba1fae45895a20c7d4ffb9846ab (commit)
> >>>from  18d9fdfd8ddd9e778ea1193a9f44a0b0423b7a76 (commit)
> >>>
> >>> Those revisions listed above that are new to this repository have
> >>> not appeared on any other notification email; so we list those
> >>> revisions in full, below.
> >>>
> >>> - Log -
> >>> https://github.com/open-mpi/ompi/commit/09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> >>>
> >>> commit 09d03a154bcb5ba1fae45895a20c7d4ffb9846ab
> >>> Author: Jeff Squyres 
> >>> Date:   Tue Dec 9 05:52:24 2014 -0800
> >>>
> >>>  libfabric: fix some typos in the usnic configury
> >>>
> >>> diff --git a/opal/mca/common/libfabric/configure.m4 
> >>> b/opal/mca/common/libfabric/configure.m4
> >>> index a255fc3..26b39e1 100644
> >>> --- a/opal/mca/common/libfabric/configure.m4
> >>> +++ b/opal/mca/common/libfabric/configure.m4
> >>> @@ -267,12 +267,12 @@ AC_DEFUN([_OPAL_COMMON_LIBFABRIC_CHECK_INCDIR],[
> >>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> >>>   opal_common_libfabric_usnic_happy=1
> >>>   AC_CHECK_HEADER([linux/netlink.h], [],
> >>> -[opal_common_libfabric_happy=0], [
> >>> +[opal_common_libfabric_usnic_happy=0], [
> >>> #include 
> >>> #include 
> >>> ])
> >>>   AC_CHECK_LIB([nl], [nl_connect], [],
> >>> - [opal_common_libfabric_happy=0])
> >>> + [opal_common_libfabric_usnic_happy=0])
> >>>
> >>>   opal_common_libfabric_CPPFLAGS="$opal_common_libfabric_CPPFLAGS 
> >>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src 
> >>> -I$OPAL_TOP_SRCDIR/opal/mca/common/libfabric/libfabric/prov/usnic/src/usnic_direct"
> >>>   
> >>> opal_common_libfabric_LIBADD="\$(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib${OPAL_LIB_PREFIX}mca_common_libfabric.la"
> >>> @@ -286,5 +286,5 @@ 
> >>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC],[
> >>> # 
> >>> AC_DEFUN([_OPAL_COMMON_LIBFABRIC_EMBEDDED_PROVIDER_USNIC_CONDITIONALS],[
> >>>   AM_CONDITIONAL([OPAL_COMMON_LIBFABRIC_HAVE_PROVIDER_USNIC],
> >>> -   [test $opal_common_libfabric_happy -eq 1])
> >>> +   [test $opal_common_libfabric_usnic_happy -eq 1])
> >>> ])
> >>>
> >>>
> >>> ---
> >>>
> >>> Summary of changes:
> >>> opal/mca/common/libfabric/configure.m4 | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>>
> >>> hooks/post-receive
> >>> --
> >>> open-mpi/ompi
> >>> ___
> >>> ompi-commits mailing list
> >>> ompi-comm...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
> >>
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2014/12/16469.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/

Re: [OMPI devel] autogen broken

2014-12-09 Thread George Bosilca
Thanks for the hint, it does fix the problem. But it has to be applied on
the build directory after every configure ...

  George.


On Tue, Dec 9, 2014 at 2:27 AM, Nick Papior Andersen 
wrote:

> I experience the exact same thing.
> Please see my bug-report on this (and the work-around) here:
> http://www.open-mpi.org/community/lists/devel/2014/11/16371.php
>
> 2014-12-09 7:57 GMT+01:00 George Bosilca :
>
>> After updating to the latest master (3a14c8e), I start having issues with
>> the VPATH build on Mac OS X. The autogen.pl and configure succeeded but
>> when make is invoked I got the following error:
>>
>> Making all in opal
>> Making all in include
>> /Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
>> Making all in libltdl
>> CDPATH="${ZSH_VERSION+.}:" && cd ../../../ompi/opal/libltdl && /bin/sh
>> /Users/bosilca/unstable/ompi/trunk/ompi/config/missing aclocal-1.14 -I
>> ../../config
>> aclocal-1.14: error: ../../config/autogen_found_items.m4:312: file
>> ‘opal/mca/backtrace/configure.m4’ does not exist
>>
>> I tried to wipe out everything and start from scratch but to no avail.
>> Any ideas what’s going wrong and/or how to fix this?
>>
>>   George.
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16455.php
>
>
>
>
> --
> Kind regards Nick
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16456.php
>


Re: [OMPI devel] autogen broken

2014-12-09 Thread Nick Papior Andersen
Yes, I know, but it might give a clue as to what goes wrong in the autogen
script (the config path is erroneous). What is peculiar is that it only
happens for that sub-directory, so that should narrow the search down :)

I am glad it worked :)

2014-12-09 16:32 GMT+00:00 George Bosilca :

> Thanks for the hint, it does fix the problem. But it has to be applied
> on the build directory after every configure ...
>
>   George.
>
>
> On Tue, Dec 9, 2014 at 2:27 AM, Nick Papior Andersen wrote:
>
>> I experience the exact same thing.
>> Please see my bug-report on this (and the work-around) here:
>> http://www.open-mpi.org/community/lists/devel/2014/11/16371.php
>>
>> 2014-12-09 7:57 GMT+01:00 George Bosilca :
>>
>>> After updating to the latest master (3a14c8e), I start having issues
>>> with the VPATH build on Mac OS X. The autogen.pl and configure
>>> succeeded but when make is invoked I got the following error:
>>>
>>> Making all in opal
>>> Making all in include
>>> /Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
>>> Making all in libltdl
>>> CDPATH="${ZSH_VERSION+.}:" && cd ../../../ompi/opal/libltdl && /bin/sh
>>> /Users/bosilca/unstable/ompi/trunk/ompi/config/missing aclocal-1.14 -I
>>> ../../config
>>> aclocal-1.14: error: ../../config/autogen_found_items.m4:312: file
>>> ‘opal/mca/backtrace/configure.m4’ does not exist
>>>
>>> I tried to wipe out everything and start from scratch but to no avail.
>>> Any ideas what’s going wrong and/or how to fix this?
>>>
>>>   George.
>>>
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16455.php
>>
>>
>>
>>
>> --
>> Kind regards Nick
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16456.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16478.php
>



-- 
Kind regards Nick


Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Ralph Castain
Just as an FYI: we discovered that libfabric requires libnl, and that the 
configure logic doesn’t kick you out if libnl isn’t found - you just fail 
during compile.


> On Dec 9, 2014, at 6:29 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> As I mentioned on the call a week ago, the usnic BTL has been updated to use 
> libfabric (instead of verbs).
> 
> What is libfabric?
> --> Think of it as a "next generation verbs" -- it's OS-bypass networking for 
> a wide range of network hardware, and libfabric contains many more 
> capabilities than the verbs API.  libfabric is being developed by most of the 
> same people who initially developed verbs; it's not in competition with verbs 
> -- it's a true "next generation" effort.  See 
> http://ofiwg.github.io/libfabric/ for more detail.
> 
> Why should anyone care?
> --> The usnic BTL has been updated to use libfabric.  I have therefore 
> removed all usnic-specific code from the verbs parts of the OMPI code base 
> (e.g., opal/mca/common).  Additionally, there will shortly be another commit 
> that introduces another OMPI network device that uses libfabric.
> 
> Did you really just embed libfabric in opal/common/libfabric?
> --> Yes -- but this is temporary.  libfabric isn't v1.0 yet -- there aren't 
> libfabric tarballs being distributed.  Hence, other than git-cloning the 
> libfabric github repo, you can't easily build OMPI against libfabric.  So we 
> are temporarily embedding a copy of libfabric in OMPI, partly for 
> convenience, and partly because the libfabric API is still changing slightly 
> -- we need a stable libfabric stake in the ground against which to build the 
> usnic and other components.  We'll update the embedded libfabric periodically 
> to keep up with its development (e.g., I just did, earlier this morning).  We 
> anticipate removing the embedded copy of libfabric at some point in the 
> future.
> 
> Whoa; I'm getting a slew of -pedantic warnings when compiling libfabric!
> --> Yeah, sorry about that.  :-(  I added a pragma this morning that should 
> remove some of them, but there's still a bunch of -pedantic warnings when you 
> compile opal/mca/common/libfabric.  We're working with libfabric upstream to 
> get them fixed.  Stay tuned.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16468.php



Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Jeff Squyres (jsquyres)
That was fixed earlier this morning.

On Dec 9, 2014, at 1:45 PM, Ralph Castain  wrote:

> Just as an FYI: we discovered that libfabric requires libnl, and that the 
> configure logic doesn’t kick you out if libnl isn’t found - you just fail 
> during compile.
> 
> 
>> On Dec 9, 2014, at 6:29 AM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>> As I mentioned on the call a week ago, the usnic BTL has been updated to use 
>> libfabric (instead of verbs).
>> 
>> What is libfabric?
>> --> Think of it as a "next generation verbs" -- it's OS-bypass networking 
>> for a wide range of network hardware, and libfabric contains many more 
>> capabilities than the verbs API.  libfabric is being developed by most of the 
>> same people who initially developed verbs; it's not in competition with 
>> verbs -- it's a true "next generation" effort.  See 
>> http://ofiwg.github.io/libfabric/ for more detail.
>> 
>> Why should anyone care?
>> --> The usnic BTL has been updated to use libfabric.  I have therefore 
>> removed all usnic-specific code from the verbs parts of the OMPI code base 
>> (e.g., opal/mca/common).  Additionally, there will shortly be another commit 
>> that introduces another OMPI network device that uses libfabric.
>> 
>> Did you really just embed libfabric in opal/common/libfabric?
>> --> Yes -- but this is temporary.  libfabric isn't v1.0 yet -- there aren't 
>> libfabric tarballs being distributed.  Hence, other than git-cloning the 
>> libfabric github repo, you can't easily build OMPI against libfabric.  So we 
>> are temporarily embedding a copy of libfabric in OMPI, partly for 
>> convenience, and partly because the libfabric API is still changing slightly 
>> -- we need a stable libfabric stake in the ground against which to build the 
>> usnic and other components.  We'll update the embedded libfabric 
>> periodically to keep up with its development (e.g., I just did, earlier this 
>> morning).  We anticipate removing the embedded copy of libfabric at some 
>> point in the future.
>> 
>> Whoa; I'm getting a slew of -pedantic warnings when compiling libfabric!
>> --> Yeah, sorry about that.  :-(  I added a pragma this morning that should 
>> remove some of them, but there's still a bunch of -pedantic warnings when 
>> you compile opal/mca/common/libfabric.  We're working with libfabric 
>> upstream to get them fixed.  Stay tuned.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16468.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16480.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Howard Pritchard
Hi Ralph,

Jeff fixed this in c40fd09.  That's the problem I hit, in addition to
later not having psm_infinipath.  After that commit, and commit cd0a54d,
you should be able to config and make again.

Howard






2014-12-09 13:45 GMT-08:00 Ralph Castain :

> Just as an FYI: we discovered that libfabric requires libnl, and that the
> configure logic doesn’t kick you out if libnl isn’t found - you just fail
> during compile.
>
>
> > On Dec 9, 2014, at 6:29 AM, Jeff Squyres (jsquyres) 
> wrote:
> >
> > As I mentioned on the call a week ago, the usnic BTL has been updated to
> use libfabric (instead of verbs).
> >
> > What is libfabric?
> > --> Think of it as a "next generation verbs" -- it's OS-bypass
> networking for a wide range of network hardware, and libfabric contains
> many more capabilities than the verbs API.  libfabric is being developed by
> most of the same people who initially developed verbs; it's not in competition
> with verbs -- it's a true "next generation" effort.  See
> http://ofiwg.github.io/libfabric/ for more detail.
> >
> > Why should anyone care?
> > --> The usnic BTL has been updated to use libfabric.  I have therefore
> removed all usnic-specific code from the verbs parts of the OMPI code base
> (e.g., opal/mca/common).  Additionally, there will shortly be another
> commit that introduces another OMPI network device that uses libfabric.
> >
> > Did you really just embed libfabric in opal/common/libfabric?
> > --> Yes -- but this is temporary.  libfabric isn't v1.0 yet -- there
> aren't libfabric tarballs being distributed.  Hence, other than git-cloning
> the libfabric github repo, you can't easily build OMPI against libfabric.
> So we are temporarily embedding a copy of libfabric in OMPI, partly for
> convenience, and partly because the libfabric API is still changing
> slightly -- we need a stable libfabric stake in the ground against which to
> build the usnic and other components.  We'll update the embedded libfabric
> periodically to keep up with its development (e.g., I just did, earlier
> this morning).  We anticipate removing the embedded copy of libfabric at
> some point in the future.
> >
> > Whoa; I'm getting a slew of -pedantic warnings when compiling libfabric!
> > --> Yeah, sorry about that.  :-(  I added a pragma this morning that
> should remove some of them, but there's still a bunch of -pedantic warnings
> when you compile opal/mca/common/libfabric.  We're working with libfabric
> upstream to get them fixed.  Stay tuned.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16468.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16480.php


Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Ralph Castain
Oh I can configure and make just fine - I have nl installed on my machines. The 
problem was hit by our folks at Intel, who didn’t have libnl and it didn’t kick 
out. So far as they tell me (as of 2 minutes ago), it still doesn’t, though 
I’ll double-check with them.


> On Dec 9, 2014, at 1:59 PM, Howard Pritchard  wrote:
> 
> Hi Ralph,
> 
> Jeff fixed this in c40fd09.  That's the problem I hit, in addition to
> later not having psm_infinipath.  After that commit, and commit cd0a54d,
> you should be able to config and make again.
> 
> Howard
> 
> 
> 
> 
> 
> 
> 2014-12-09 13:45 GMT-08:00 Ralph Castain :
> Just as an FYI: we discovered that libfabric requires libnl, and that the 
> configure logic doesn’t kick you out if libnl isn’t found - you just fail 
> during compile.
> 
> 
> > On Dec 9, 2014, at 6:29 AM, Jeff Squyres (jsquyres) wrote:
> >
> > As I mentioned on the call a week ago, the usnic BTL has been updated to 
> > use libfabric (instead of verbs).
> >
> > What is libfabric?
> > --> Think of it as a "next generation verbs" -- it's OS-bypass networking 
> > for a wide range of network hardware, and libfabric contains many more 
> > capabilities than the verbs API.  libfabric is being developed by most of the 
> > same people who initially developed verbs; it's not in competition with 
> > verbs -- it's a true "next generation" effort.  See 
> > http://ofiwg.github.io/libfabric/  for 
> > more detail.
> >
> > Why should anyone care?
> > --> The usnic BTL has been updated to use libfabric.  I have therefore 
> > removed all usnic-specific code from the verbs parts of the OMPI code base 
> > (e.g., opal/mca/common).  Additionally, there will shortly be another 
> > commit that introduces another OMPI network device that uses libfabric.
> >
> > Did you really just embed libfabric in opal/common/libfabric?
> > --> Yes -- but this is temporary.  libfabric isn't v1.0 yet -- there aren't 
> > libfabric tarballs being distributed.  Hence, other than git-cloning the 
> > libfabric github repo, you can't easily build OMPI against libfabric.  So 
> > we are temporarily embedding a copy of libfabric in OMPI, partly for 
> > convenience, and partly because the libfabric API is still changing 
> > slightly -- we need a stable libfabric stake in the ground against which to 
> > build the usnic and other components.  We'll update the embedded libfabric 
> > periodically to keep up with its development (e.g., I just did, earlier 
> > this morning).  We anticipate removing the embedded copy of libfabric at 
> > some point in the future.
> >
> > Whoa; I'm getting a slew of -pedantic warnings when compiling libfabric!
> > --> Yeah, sorry about that.  :-(  I added a pragma this morning that should 
> > remove some of them, but there's still a bunch of -pedantic warnings when 
> > you compile opal/mca/common/libfabric.  We're working with libfabric 
> > upstream to get them fixed.  Stay tuned.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/12/16468.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16480.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16482.php



Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Jeff Squyres (jsquyres)
On Dec 9, 2014, at 3:08 PM, Ralph Castain  wrote:

> Oh I can configure and make just fine - I have nl installed on my machines. 
> The problem was hit by our folks at Intel, who didn’t have libnl and it 
> didn’t kick out. So far as they tell me (as of 2 minutes ago), it still 
> doesn’t, though I’ll double-check with them.

Ok, please do.  If I still have something in there that mucks you guys up, let 
me know.  I thought I had made the addition of -lnl be conditional on whether 
libnl was there as of 
https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca.

If I missed something, please let me know.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] opal_lifo/opal_fifo fail with make distcheck

2014-12-09 Thread Howard Pritchard
Hi Folks,

I've tried running make distcheck on master and get failures for
opal_fifo/opal_lifo:

make[4]: Leaving directory
`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'

make  check-TESTS

make[4]: Entering directory
`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'

make[5]: Entering directory
`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'

FAIL: opal_lifo

FAIL: opal_fifo

Has anyone else seen this?

Howard


Re: [OMPI devel] Update to usnic BTL / libfabric

2014-12-09 Thread Ralph Castain
Okay, they checked with my latest ORCM update and it is indeed working right 
now - thx!

> On Dec 9, 2014, at 3:29 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> On Dec 9, 2014, at 3:08 PM, Ralph Castain  wrote:
> 
>> Oh I can configure and make just fine - I have nl installed on my machines. 
>> The problem was hit by our folks at Intel, who didn’t have libnl and it 
>> didn’t kick out. So far as they tell me (as of 2 minutes ago), it still 
>> doesn’t, though I’ll double-check with them.
> 
> Ok, please do.  If I still have something in there that mucks you guys up, 
> let me know.  I thought I had made the addition of -lnl be conditional on 
> whether libnl was there as of 
> https://github.com/open-mpi/ompi/commit/c40fd09d2a0575e493137158644fd2b610a48aca.
> 
> If I missed something, please let me know.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16484.php



Re: [OMPI devel] (no subject)

2014-12-09 Thread Kevin Buckley
On 9 December 2014 at 03:29, Howard Pritchard  wrote:
> Hello Kevin,
>
> Could you try testing with Open MPI 1.8.3?  There was a bug in 1.8.1
> that you are likely hitting in your testing.
>
> Thanks,
>
> Howard

Bingo!

Seems to have got rid of those messages.

Thanks.