Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
Ralph,

You are right,
please disregard my previous post; it was irrelevant.

I just noticed that, unlike OMPI v1.8 (hwloc 1.7.2 based => no warning),
master has this warning (hwloc 1.9.1).

I will build SLURM against a recent hwloc and see what happens.
(FWIW, RHEL6 comes with hwloc 1.5, RHEL7 comes with hwloc 1.7, and neither has
this warning.)

Cheers,

Gilles
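
As an aside on version checking: hwloc itself can report which library a binary
picked up at run time via hwloc_get_api_version(), which can be compared against
the HWLOC_API_VERSION it was compiled against. A minimal sketch, using only the
stock hwloc API (nothing OMPI- or SLURM-specific is assumed):

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    /* version of the libhwloc actually loaded at run time */
    unsigned runtime = hwloc_get_api_version();

    printf("compiled against hwloc API 0x%08x, running against 0x%08x\n",
           (unsigned) HWLOC_API_VERSION, runtime);

    /* the upper 16 bits carry the major version (e.g. 0x0001.. for 1.x) */
    if ((runtime >> 16) != (((unsigned) HWLOC_API_VERSION) >> 16))
        printf("warning: major hwloc API version mismatch\n");

    return 0;
}

Built once against the system hwloc and once against the hwloc an OMPI install
uses, this makes it obvious which library each side really resolves to.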

On 2014/12/11 12:02, Ralph Castain wrote:
> Per his prior notes, he is using mpirun to launch his jobs. Brice has 
> confirmed that OMPI doesn't have that hwloc warning in it. So either he has 
> inadvertently linked against the Ubuntu system version of hwloc, or the 
> message must be coming from Slurm.
>
>
>> On Dec 10, 2014, at 6:14 PM, Gilles Gouaillardet 
>>  wrote:
>>
>> Pim,
>>
>> at this stage, all i can do is acknowledge your slurm is configured to use 
>> cgroups.
>>
>> and based on your previous comment (e.g. problem only occurs with several 
>> jobs on the same node)
>> that *could* be a bug in OpenMPI (or hwloc).
>>
>> by the way, how do you start your mpi application ?
>> - do you use mpirun ?
>> - do you use srun --resv-ports ?
>>
>> i'll try to reproduce this in my test environment.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/12/11 2:45, Pim Schellart wrote:
>>> Dear Gilles et al.,
>>>
>>> we tested with openmpi compiled from source (version 1.8.3) both with:
>>>
>>> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
>>> --with-libltdl=external --with-devel-headers --with-slurm 
>>> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>>
>>> and
>>>
>>> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
>>> --disable-silent-rules --with-libltdl=external --with-devel-headers 
>>> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>>
>>> (e.g. with embedded and external hwloc) and the issue remains the same. 
>>> Meanwhile we have found another interesting detail. A job is started 
>>> consisting of four tasks split over two nodes. If this is the only job 
>>> running on those nodes the out-of-order warnings do not appear. However, if 
>>> multiple jobs are running the warnings do appear but only for the jobs that 
>>> are started later. We suspect that this is because for the first started 
>>> job the CPU cores assigned are 0 and 1 whereas they are different for the 
>>> later started jobs. I attached the output (including lstopo ---of xml 
>>> output (called for each task)) for both the working and broken case again.
>>>
>>> Kind regards,
>>>
>>> Pim Schellart
>>>
>>>
>>>
>>>
 On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
   
 wrote:

 Pim,

 if you configure OpenMPI with --with-hwloc=external (or something like
 --with-hwloc=/usr) it is very likely
 OpenMPI will use the same hwloc library (e.g. the "system" library) that
 is used by SLURM

 /* i do not know how Ubuntu packages OpenMPI ... */


 The default (e.g. no --with-hwloc parameter in the configure command
 line) is to use the hwloc library that is embedded within OpenMPI

 Gilles

 On 2014/12/09 17:34, Pim Schellart wrote:
> Ah, ok so that was where the confusion came from, I did see hwloc in the 
> SLURM sources but couldn't immediately figure out where exactly it was 
> used. We will try compiling openmpi with the embedded hwloc. Any 
> particular flags I should set?
>
>> On 09 Dec 2014, at 09:30, Ralph Castain  
>>  wrote:
>>
>> There is no linkage between slurm and ompi when it comes to hwloc. If 
>> you directly launch your app using srun, then slurm will use its version 
>> of hwloc to do the binding. If you use mpirun to launch the app, then 
>> we'll use our internal version to do it.
>>
>> The two are completely isolated from each other.
>>
>>
>>> On Dec 9, 2014, at 12:25 AM, Pim Schellart  
>>>  wrote:
>>>
>>> The version that "lstopo --version" reports is the same (1.8) on all 
>>> nodes, but we may indeed be hitting the second issue. We can try to 
>>> compile a new version of openmpi, but how do we ensure that the 
>>> external programs (e.g. SLURM) are using the same hwloc version as the 
>>> one embedded in openmpi? Is it enough to just compile hwloc 1.9 
>>> separately as well and link against that? Also, if this is an issue, 
>>> should we file a bug against hwloc or openmpi on Ubuntu for mismatching 
>>> versions?
>>>
 On 09 Dec 2014, at 00:50, Ralph Castain  
  wrote:

 Hmmm...they probably linked that to the external, system hwloc 
 version, so it sounds like one or more of your nodes has a different 
 hwloc rpm on it.

 I couldn't leaf thru your output well enough to see all the lstopo 
 versions, but you might check to ensure they are the same.

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
Per his prior notes, he is using mpirun to launch his jobs. Brice has confirmed 
that OMPI doesn’t have that hwloc warning in it. So either he has inadvertently 
linked against the Ubuntu system version of hwloc, or the message must be 
coming from Slurm.


> On Dec 10, 2014, at 6:14 PM, Gilles Gouaillardet 
>  wrote:
> 
> Pim,
> 
> at this stage, all i can do is acknowledge your slurm is configured to use 
> cgroups.
> 
> and based on your previous comment (e.g. problem only occurs with several 
> jobs on the same node)
> that *could* be a bug in OpenMPI (or hwloc).
> 
> by the way, how do you start your mpi application ?
> - do you use mpirun ?
> - do you use srun --resv-ports ?
> 
> i'll try to reproduce this in my test environment.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/12/11 2:45, Pim Schellart wrote:
>> Dear Gilles et al.,
>> 
>> we tested with openmpi compiled from source (version 1.8.3) both with:
>> 
>> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
>> --with-libltdl=external --with-devel-headers --with-slurm 
>> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>> 
>> and
>> 
>> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
>> --disable-silent-rules --with-libltdl=external --with-devel-headers 
>> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>> 
>> (e.g. with embedded and external hwloc) and the issue remains the same. 
>> Meanwhile we have found another interesting detail. A job is started 
>> consisting of four tasks split over two nodes. If this is the only job 
>> running on those nodes the out-of-order warnings do not appear. However, if 
>> multiple jobs are running the warnings do appear but only for the jobs that 
>> are started later. We suspect that this is because for the first started job 
>> the CPU cores assigned are 0 and 1 whereas they are different for the later 
>> started jobs. I attached the output (including lstopo —of xml output (called 
>> for each task)) for both the working and broken case again.
>> 
>> Kind regards,
>> 
>> Pim Schellart
>> 
>> 
>> 
>> 
>>> On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
>>>   
>>> wrote:
>>> 
>>> Pim,
>>> 
>>> if you configure OpenMPI with --with-hwloc=external (or something like
>>> --with-hwloc=/usr) it is very likely
>>> OpenMPI will use the same hwloc library (e.g. the "system" library) that
>>> is used by SLURM
>>> 
>>> /* i do not know how Ubuntu packages OpenMPI ... */
>>> 
>>> 
>>> The default (e.g. no --with-hwloc parameter in the configure command
>>> line) is to use the hwloc library that is embedded within OpenMPI
>>> 
>>> Gilles
>>> 
>>> On 2014/12/09 17:34, Pim Schellart wrote:
 Ah, ok so that was where the confusion came from, I did see hwloc in the 
 SLURM sources but couldn’t immediately figure out where exactly it was 
 used. We will try compiling openmpi with the embedded hwloc. Any 
 particular flags I should set?
 
> On 09 Dec 2014, at 09:30, Ralph Castain  
>  wrote:
> 
> There is no linkage between slurm and ompi when it comes to hwloc. If you 
> directly launch your app using srun, then slurm will use its version of 
> hwloc to do the binding. If you use mpirun to launch the app, then we’ll 
> use our internal version to do it.
> 
> The two are completely isolated from each other.
> 
> 
>> On Dec 9, 2014, at 12:25 AM, Pim Schellart  
>>  wrote:
>> 
>> The version that “lstopo --version” reports is the same (1.8) on all 
>> nodes, but we may indeed be hitting the second issue. We can try to 
>> compile a new version of openmpi, but how do we ensure that the external 
>> programs (e.g. SLURM) are using the same hwloc version as the one 
>> embedded in openmpi? Is it enough to just compile hwloc 1.9 separately 
>> as well and link against that? Also, if this is an issue, should we file 
>> a bug against hwloc or openmpi on Ubuntu for mismatching versions?
>> 
>>> On 09 Dec 2014, at 00:50, Ralph Castain  
>>>  wrote:
>>> 
>>> Hmmm…they probably linked that to the external, system hwloc version, 
>>> so it sounds like one or more of your nodes has a different hwloc rpm 
>>> on it.
>>> 
>>> I couldn’t leaf thru your output well enough to see all the lstopo 
>>> versions, but you might check to ensure they are the same.
>>> 
>>> Looking at the code base, you may also hit a problem here. OMPI 1.6 
>>> series was based on hwloc 1.3 - the output you sent indicated you have 
>>> hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
>>> hwloc 1.9, so at least that is closer (though probably still a 
>>> mismatch).
>>> 
>>> Frankly, I’d just download and install an OMPI tarball myself and avoid 
 these headaches. This mismatch in required versions is why we embed 
 hwloc as it is a critical library for OMPI, and we had to ensure that 
 the version matched our internal requirements.

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
Pim,

At this stage, all I can do is confirm that your SLURM is configured to
use cgroups.

Based on your previous comment (i.e. the problem only occurs when several
jobs run on the same node), this *could* be a bug in OpenMPI (or hwloc).

By the way, how do you start your MPI application?
- do you use mpirun?
- do you use srun --resv-ports?

I'll try to reproduce this in my test environment.

Cheers,

Gilles

On 2014/12/11 2:45, Pim Schellart wrote:
> Dear Gilles et al.,
>
> we tested with openmpi compiled from source (version 1.8.3) both with:
>
> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
> --with-libltdl=external --with-devel-headers --with-slurm 
> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>
> and
>
> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
> --disable-silent-rules --with-libltdl=external --with-devel-headers 
> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>
> (e.g. with embedded and external hwloc) and the issue remains the same. 
> Meanwhile we have found another interesting detail. A job is started 
> consisting of four tasks split over two nodes. If this is the only job 
> running on those nodes the out-of-order warnings do not appear. However, if 
> multiple jobs are running the warnings do appear but only for the jobs that 
> are started later. We suspect that this is because for the first started job 
> the CPU cores assigned are 0 and 1 whereas they are different for the later 
> started jobs. I attached the output (including lstopo ---of xml output 
> (called for each task)) for both the working and broken case again.
>
> Kind regards,
>
> Pim Schellart
>
>
>
>
>> On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
>>  wrote:
>>
>> Pim,
>>
>> if you configure OpenMPI with --with-hwloc=external (or something like
>> --with-hwloc=/usr) it is very likely
>> OpenMPI will use the same hwloc library (e.g. the "system" library) that
>> is used by SLURM
>>
>> /* i do not know how Ubuntu packages OpenMPI ... */
>>
>>
>> The default (e.g. no --with-hwloc parameter in the configure command
>> line) is to use the hwloc library that is embedded within OpenMPI
>>
>> Gilles
>>
>> On 2014/12/09 17:34, Pim Schellart wrote:
>>> Ah, ok so that was where the confusion came from, I did see hwloc in the 
>>> SLURM sources but couldn't immediately figure out where exactly it was 
>>> used. We will try compiling openmpi with the embedded hwloc. Any particular 
>>> flags I should set?
>>>
 On 09 Dec 2014, at 09:30, Ralph Castain  wrote:

 There is no linkage between slurm and ompi when it comes to hwloc. If you 
 directly launch your app using srun, then slurm will use its version of 
 hwloc to do the binding. If you use mpirun to launch the app, then we'll 
 use our internal version to do it.

 The two are completely isolated from each other.


> On Dec 9, 2014, at 12:25 AM, Pim Schellart  wrote:
>
> The version that "lstopo --version" reports is the same (1.8) on all 
> nodes, but we may indeed be hitting the second issue. We can try to 
> compile a new version of openmpi, but how do we ensure that the external 
> programs (e.g. SLURM) are using the same hwloc version as the one 
> embedded in openmpi? Is it enough to just compile hwloc 1.9 separately as 
> well and link against that? Also, if this is an issue, should we file a 
> bug against hwloc or openmpi on Ubuntu for mismatching versions?
>
>> On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
>>
>> Hmmm...they probably linked that to the external, system hwloc version, 
>> so it sounds like one or more of your nodes has a different hwloc rpm on 
>> it.
>>
>> I couldn't leaf thru your output well enough to see all the lstopo 
>> versions, but you might check to ensure they are the same.
>>
>> Looking at the code base, you may also hit a problem here. OMPI 1.6 
>> series was based on hwloc 1.3 - the output you sent indicated you have 
>> hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>>
>> Frankly, I'd just download and install an OMPI tarball myself and avoid 
>> these headaches. This mismatch in required versions is why we embed 
>> hwloc as it is a critical library for OMPI, and we had to ensure that 
>> the version matched our internal requirements.
>>
>>
>>> On Dec 8, 2014, at 8:50 AM, Pim Schellart  wrote:
>>>
>>> It is the default openmpi that comes with Ubuntu 14.04.
>>>
 On 08 Dec 2014, at 17:17, Ralph Castain  wrote:

 Pim: is this an OMPI you built, or one you were given somehow? If you 
 built it, how did you configure it?

> On Dec 8, 2014, at 8:12 AM, Brice Goglin  
> wrote:
>
> It likely depends on how SLURM allocates the cpuset/cgroup inside the
> nodes. The XML warning is related to these restrictions inside the node.
> Anyway, my feeling is that there's an old OMPI or an old hwloc somewhere.

Re: [OMPI devel] opal_lifo/opal_fifo fail with make distcheck

2014-12-10 Thread Nathan Hjelm

The failure was due to the use of opal_init() in the tests. I thought it
was ok to use because it is used by other tests (which turned out to be
disabled) but that isn't the case. opal_init_util() has to be used
instead. I pushed a fix to master last night.

-Nathan
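
For readers hitting the same failure, here is a minimal sketch of the scaffolding
such a standalone class test ends up with after that change. The
opal_init_util()/opal_finalize_util() entry points and OPAL_SUCCESS are assumed
to come from opal/runtime/opal.h and opal/constants.h as usual; the real
opal_fifo/opal_lifo tests are more involved.

#include <stdio.h>
#include "opal/runtime/opal.h"
#include "opal/constants.h"

int main(int argc, char *argv[])
{
    /* bring up only the OPAL utility layer; a bare "make check" test
     * cannot rely on everything opal_init() would open */
    if (OPAL_SUCCESS != opal_init_util(&argc, &argv)) {
        fprintf(stderr, "opal_init_util failed\n");
        return 1;
    }

    /* ... exercise the class under test (e.g. opal_fifo_t) here ... */

    opal_finalize_util();
    return 0;
}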

On Tue, Dec 09, 2014 at 03:35:27PM -0800, Howard Pritchard wrote:
>Hi Folks,
>I've tried running make distcheck on master and get failures for
>opal_fifo/opal_lifo:
> 
>make[4]: Leaving directory
>`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'
> 
>make  check-TESTS
> 
>make[4]: Entering directory
>`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'
> 
>make[5]: Entering directory
>`/global/u2/h/hpp/ompi/openmpi-gitclone/_build/test/class'
> 
>FAIL: opal_lifo
> 
>FAIL: opal_fifo
> 
>Has anyone else seen this?  
> 
>Howard

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16485.php





Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
I think you actually already answered this - if that warning message isn’t in 
OMPI’s internal code, and the user gets it when building with either internal 
or external hwloc support, then it must be coming from Slurm.

This assumes that ldd libopen-pal.so doesn’t show OMPI to actually be linked 
against the external hwloc in both cases :-)


> On Dec 10, 2014, at 10:23 AM, Brice Goglin  wrote:
> 
> Unfortunately I don't think we have any way to know which process and
> hwloc version generates a XML so far. I am currently looking at adding
> this to hwloc 1.10.1 because of this thread.
> 
> One thing that could help would be to dump the XML file that OMPI
> receives. Just write the entire buffer to a file before passing it to
> hwloc, and send it to me in the broken case. If "lstopo -i file.xml"
> shows the warning, we'll know for sure it's coming from a old hwloc
> somewhere.
> 
> Brice
> 
> 
> 
> On 10/12/2014 19:11, Ralph Castain wrote:
>> Brice: is there any way to tell if these are coming from Slurm vs OMPI? 
>> Given this data, I’m suspicious that this might have something to do with 
>> Slurm and not us.
>> 
>> 
>>> On Dec 10, 2014, at 9:45 AM, Pim Schellart  wrote:
>>> 
>>> Dear Gilles et al.,
>>> 
>>> we tested with openmpi compiled from source (version 1.8.3) both with:
>>> 
>>> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
>>> --with-libltdl=external --with-devel-headers --with-slurm 
>>> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>> 
>>> and
>>> 
>>> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
>>> --disable-silent-rules --with-libltdl=external --with-devel-headers 
>>> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>> 
>>> (e.g. with embedded and external hwloc) and the issue remains the same. 
>>> Meanwhile we have found another interesting detail. A job is started 
>>> consisting of four tasks split over two nodes. If this is the only job 
>>> running on those nodes the out-of-order warnings do not appear. However, if 
>>> multiple jobs are running the warnings do appear but only for the jobs that 
>>> are started later. We suspect that this is because for the first started 
>>> job the CPU cores assigned are 0 and 1 whereas they are different for the 
>>> later started jobs. I attached the output (including lstopo —of xml output 
>>> (called for each task)) for both the working and broken case again.
>>> 
>>> Kind regards,
>>> 
>>> Pim Schellart
>>> 
>>> 
>>> 
 On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
  wrote:
 
 Pim,
 
 if you configure OpenMPI with --with-hwloc=external (or something like
 --with-hwloc=/usr) it is very likely
 OpenMPI will use the same hwloc library (e.g. the "system" library) that
 is used by SLURM
 
 /* i do not know how Ubuntu packages OpenMPI ... */
 
 
 The default (e.g. no --with-hwloc parameter in the configure command
 line) is to use the hwloc library that is embedded within OpenMPI
 
 Gilles
 
 On 2014/12/09 17:34, Pim Schellart wrote:
> Ah, ok so that was where the confusion came from, I did see hwloc in the 
> SLURM sources but couldn’t immediately figure out where exactly it was 
> used. We will try compiling openmpi with the embedded hwloc. Any 
> particular flags I should set?
> 
>> On 09 Dec 2014, at 09:30, Ralph Castain  wrote:
>> 
>> There is no linkage between slurm and ompi when it comes to hwloc. If 
>> you directly launch your app using srun, then slurm will use its version 
>> of hwloc to do the binding. If you use mpirun to launch the app, then 
>> we’ll use our internal version to do it.
>> 
>> The two are completely isolated from each other.
>> 
>> 
>>> On Dec 9, 2014, at 12:25 AM, Pim Schellart  
>>> wrote:
>>> 
>>> The version that “lstopo --version” reports is the same (1.8) on all 
>>> nodes, but we may indeed be hitting the second issue. We can try to 
>>> compile a new version of openmpi, but how do we ensure that the 
>>> external programs (e.g. SLURM) are using the same hwloc version as the 
>>> one embedded in openmpi? Is it enough to just compile hwloc 1.9 
>>> separately as well and link against that? Also, if this is an issue, 
>>> should we file a bug against hwloc or openmpi on Ubuntu for mismatching 
>>> versions?
>>> 
 On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
 
 Hmmm…they probably linked that to the external, system hwloc version, 
 so it sounds like one or more of your nodes has a different hwloc rpm 
 on it.
 
 I couldn’t leaf thru your output well enough to see all the lstopo 
 versions, but you might check to ensure they are the same.
 
 Looking at the code base, you may also hit a problem here. OMPI 1.6 
 series was based on hwloc 1.3 - the output you sent indicated you have 
 hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
 hwloc 1.9, so at least that is closer (though probably still a mismatch).

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Brice Goglin
Unfortunately I don't think we have any way to know which process and
hwloc version generated an XML so far. I am currently looking at adding
this to hwloc 1.10.1 because of this thread.

One thing that could help would be to dump the XML file that OMPI
receives. Just write the entire buffer to a file before passing it to
hwloc, and send it to me in the broken case. If "lstopo -i file.xml"
shows the warning, we'll know for sure it's coming from an old hwloc
somewhere.

Brice
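
To make the suggested dump concrete, here is a minimal sketch of such a debug
helper. The buffer/length names are placeholders rather than the actual OMPI
variables; the only point is to write the received bytes out untouched so they
can be replayed with "lstopo -i dumped.xml".

#include <stdio.h>

/* hypothetical debug helper: write the hwloc XML buffer received by the
 * process to a file before it is handed to hwloc */
static void dump_hwloc_xml(const char *xmlbuf, size_t len, const char *path)
{
    FILE *f = fopen(path, "w");
    if (NULL == f) {
        perror("fopen");
        return;
    }
    if (fwrite(xmlbuf, 1, len, f) != len)
        perror("fwrite");
    fclose(f);
}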



On 10/12/2014 19:11, Ralph Castain wrote:
> Brice: is there any way to tell if these are coming from Slurm vs OMPI? Given 
> this data, I’m suspicious that this might have something to do with Slurm and 
> not us.
>
>
>> On Dec 10, 2014, at 9:45 AM, Pim Schellart  wrote:
>>
>> Dear Gilles et al.,
>>
>> we tested with openmpi compiled from source (version 1.8.3) both with:
>>
>> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
>> --with-libltdl=external --with-devel-headers --with-slurm 
>> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>
>> and
>>
>> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
>> --disable-silent-rules --with-libltdl=external --with-devel-headers 
>> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>>
>> (e.g. with embedded and external hwloc) and the issue remains the same. 
>> Meanwhile we have found another interesting detail. A job is started 
>> consisting of four tasks split over two nodes. If this is the only job 
>> running on those nodes the out-of-order warnings do not appear. However, if 
>> multiple jobs are running the warnings do appear but only for the jobs that 
>> are started later. We suspect that this is because for the first started job 
>> the CPU cores assigned are 0 and 1 whereas they are different for the later 
>> started jobs. I attached the output (including lstopo —of xml output (called 
>> for each task)) for both the working and broken case again.
>>
>> Kind regards,
>>
>> Pim Schellart
>>
>> 
>>
>>> On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
>>>  wrote:
>>>
>>> Pim,
>>>
>>> if you configure OpenMPI with --with-hwloc=external (or something like
>>> --with-hwloc=/usr) it is very likely
>>> OpenMPI will use the same hwloc library (e.g. the "system" library) that
>>> is used by SLURM
>>>
>>> /* i do not know how Ubuntu packages OpenMPI ... */
>>>
>>>
>>> The default (e.g. no --with-hwloc parameter in the configure command
>>> line) is to use the hwloc library that is embedded within OpenMPI
>>>
>>> Gilles
>>>
>>> On 2014/12/09 17:34, Pim Schellart wrote:
 Ah, ok so that was where the confusion came from, I did see hwloc in the 
 SLURM sources but couldn’t immediately figure out where exactly it was 
 used. We will try compiling openmpi with the embedded hwloc. Any 
 particular flags I should set?

> On 09 Dec 2014, at 09:30, Ralph Castain  wrote:
>
> There is no linkage between slurm and ompi when it comes to hwloc. If you 
> directly launch your app using srun, then slurm will use its version of 
> hwloc to do the binding. If you use mpirun to launch the app, then we’ll 
> use our internal version to do it.
>
> The two are completely isolated from each other.
>
>
>> On Dec 9, 2014, at 12:25 AM, Pim Schellart  wrote:
>>
>> The version that “lstopo --version” reports is the same (1.8) on all 
>> nodes, but we may indeed be hitting the second issue. We can try to 
>> compile a new version of openmpi, but how do we ensure that the external 
>> programs (e.g. SLURM) are using the same hwloc version as the one 
>> embedded in openmpi? Is it enough to just compile hwloc 1.9 separately 
>> as well and link against that? Also, if this is an issue, should we file 
>> a bug against hwloc or openmpi on Ubuntu for mismatching versions?
>>
>>> On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
>>>
>>> Hmmm…they probably linked that to the external, system hwloc version, 
>>> so it sounds like one or more of your nodes has a different hwloc rpm 
>>> on it.
>>>
>>> I couldn’t leaf thru your output well enough to see all the lstopo 
>>> versions, but you might check to ensure they are the same.
>>>
>>> Looking at the code base, you may also hit a problem here. OMPI 1.6 
>>> series was based on hwloc 1.3 - the output you sent indicated you have 
>>> hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
>>> hwloc 1.9, so at least that is closer (though probably still a 
>>> mismatch).
>>>
>>> Frankly, I’d just download and install an OMPI tarball myself and avoid 
>>> these headaches. This mismatch in required versions is why we embed 
>>> hwloc as it is a critical library for OMPI, and we had to ensure that 
>>> the version matched our internal requirements.
>>>
>>>
 On Dec 8, 2014, at 8:50 AM, Pim Schellart  
 wrote:

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
Brice: is there any way to tell if these are coming from Slurm vs OMPI? Given 
this data, I’m suspicious that this might have something to do with Slurm and 
not us.


> On Dec 10, 2014, at 9:45 AM, Pim Schellart  wrote:
> 
> Dear Gilles et al.,
> 
> we tested with openmpi compiled from source (version 1.8.3) both with:
> 
> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
> --with-libltdl=external --with-devel-headers --with-slurm 
> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
> 
> and
> 
> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
> --disable-silent-rules --with-libltdl=external --with-devel-headers 
> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
> 
> (e.g. with embedded and external hwloc) and the issue remains the same. 
> Meanwhile we have found another interesting detail. A job is started 
> consisting of four tasks split over two nodes. If this is the only job 
> running on those nodes the out-of-order warnings do not appear. However, if 
> multiple jobs are running the warnings do appear but only for the jobs that 
> are started later. We suspect that this is because for the first started job 
> the CPU cores assigned are 0 and 1 whereas they are different for the later 
> started jobs. I attached the output (including lstopo —of xml output (called 
> for each task)) for both the working and broken case again.
> 
> Kind regards,
> 
> Pim Schellart
> 
> 
> 
>> On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
>>  wrote:
>> 
>> Pim,
>> 
>> if you configure OpenMPI with --with-hwloc=external (or something like
>> --with-hwloc=/usr) it is very likely
>> OpenMPI will use the same hwloc library (e.g. the "system" library) that
>> is used by SLURM
>> 
>> /* i do not know how Ubuntu packages OpenMPI ... */
>> 
>> 
>> The default (e.g. no --with-hwloc parameter in the configure command
>> line) is to use the hwloc library that is embedded within OpenMPI
>> 
>> Gilles
>> 
>> On 2014/12/09 17:34, Pim Schellart wrote:
>>> Ah, ok so that was where the confusion came from, I did see hwloc in the 
>>> SLURM sources but couldn’t immediately figure out where exactly it was 
>>> used. We will try compiling openmpi with the embedded hwloc. Any particular 
>>> flags I should set?
>>> 
 On 09 Dec 2014, at 09:30, Ralph Castain  wrote:
 
 There is no linkage between slurm and ompi when it comes to hwloc. If you 
 directly launch your app using srun, then slurm will use its version of 
 hwloc to do the binding. If you use mpirun to launch the app, then we’ll 
 use our internal version to do it.
 
 The two are completely isolated from each other.
 
 
> On Dec 9, 2014, at 12:25 AM, Pim Schellart  wrote:
> 
> The version that “lstopo --version” reports is the same (1.8) on all 
> nodes, but we may indeed be hitting the second issue. We can try to 
> compile a new version of openmpi, but how do we ensure that the external 
> programs (e.g. SLURM) are using the same hwloc version as the one 
> embedded in openmpi? Is it enough to just compile hwloc 1.9 separately as 
> well and link against that? Also, if this is an issue, should we file a 
> bug against hwloc or openmpi on Ubuntu for mismatching versions?
> 
>> On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
>> 
>> Hmmm…they probably linked that to the external, system hwloc version, so 
>> it sounds like one or more of your nodes has a different hwloc rpm on it.
>> 
>> I couldn’t leaf thru your output well enough to see all the lstopo 
>> versions, but you might check to ensure they are the same.
>> 
>> Looking at the code base, you may also hit a problem here. OMPI 1.6 
>> series was based on hwloc 1.3 - the output you sent indicated you have 
>> hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>> 
>> Frankly, I’d just download and install an OMPI tarball myself and avoid 
>> these headaches. This mismatch in required versions is why we embed 
>> hwloc as it is a critical library for OMPI, and we had to ensure that 
>> the version matched our internal requirements.
>> 
>> 
>>> On Dec 8, 2014, at 8:50 AM, Pim Schellart  wrote:
>>> 
>>> It is the default openmpi that comes with Ubuntu 14.04.
>>> 
 On 08 Dec 2014, at 17:17, Ralph Castain  wrote:
 
 Pim: is this an OMPI you built, or one you were given somehow? If you 
 built it, how did you configure it?
 
> On Dec 8, 2014, at 8:12 AM, Brice Goglin  
> wrote:
> 
> It likely depends on how SLURM allocates the cpuset/cgroup inside the
> nodes. The XML warning is related to these restrictions inside the 
> node.
> Anyway, my feeling is that there's a old OMPI or a old hwloc 
> somewhere.

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Brice Goglin
The warning does not exist in the hwloc code inside OMPI 1.8, so there's
something strange happening in your first test. I would assume it's
using the external hwloc in both cases for some reason. Running ldd on
libopen-pal.so could be a way to check whether it depends on an external
libhwloc.so or not.

I still can't reproduce any warning with your XML outputs.

Which hwloc do you have running on the frontend/master node where mpirun
is launched? Try loading each XML output on the frontend node with
"lstopo -i foo.xml". You'll need a way to split the outputs of each
node, for instance mpirun myscript.sh where myscript.sh does lstopo
$(hostname).xml

Brice
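
The same per-file check can also be done programmatically instead of through
"lstopo -i", using the stock hwloc XML import calls; a minimal sketch:

#include <stdio.h>
#include <hwloc.h>

/* import a dumped topology XML (e.g. node01.xml) the way a consumer would,
 * to see whether the import itself triggers the warning */
int main(int argc, char *argv[])
{
    hwloc_topology_t topology;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.xml\n", argv[0]);
        return 1;
    }
    hwloc_topology_init(&topology);
    if (hwloc_topology_set_xml(topology, argv[1]) < 0 ||
        hwloc_topology_load(topology) < 0) {
        fprintf(stderr, "failed to import %s\n", argv[1]);
        hwloc_topology_destroy(topology);
        return 1;
    }
    printf("%s: %d PUs\n", argv[1],
           hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU));
    hwloc_topology_destroy(topology);
    return 0;
}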


On 10/12/2014 18:45, Pim Schellart wrote:
> Dear Gilles et al.,
>
> we tested with openmpi compiled from source (version 1.8.3) both with:
>
> ./configure --prefix=/usr/local/openmpi --disable-silent-rules 
> --with-libltdl=external --with-devel-headers --with-slurm 
> --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>
> and
>
> ./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
> --disable-silent-rules --with-libltdl=external --with-devel-headers 
> --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
>
> (e.g. with embedded and external hwloc) and the issue remains the same. 
> Meanwhile we have found another interesting detail. A job is started 
> consisting of four tasks split over two nodes. If this is the only job 
> running on those nodes the out-of-order warnings do not appear. However, if 
> multiple jobs are running the warnings do appear but only for the jobs that 
> are started later. We suspect that this is because for the first started job 
> the CPU cores assigned are 0 and 1 whereas they are different for the later 
> started jobs. I attached the output (including lstopo —of xml output (called 
> for each task)) for both the working and broken case again.
>
> Kind regards,
>
> Pim Schellart
>
>
>
>
>> On 09 Dec 2014, at 09:38, Gilles Gouaillardet 
>>  wrote:
>>
>> Pim,
>>
>> if you configure OpenMPI with --with-hwloc=external (or something like
>> --with-hwloc=/usr) it is very likely
>> OpenMPI will use the same hwloc library (e.g. the "system" library) that
>> is used by SLURM
>>
>> /* i do not know how Ubuntu packages OpenMPI ... */
>>
>>
>> The default (e.g. no --with-hwloc parameter in the configure command
>> line) is to use the hwloc library that is embedded within OpenMPI
>>
>> Gilles
>>
>> On 2014/12/09 17:34, Pim Schellart wrote:
>>> Ah, ok so that was where the confusion came from, I did see hwloc in the 
>>> SLURM sources but couldn’t immediately figure out where exactly it was 
>>> used. We will try compiling openmpi with the embedded hwloc. Any particular 
>>> flags I should set?
>>>
 On 09 Dec 2014, at 09:30, Ralph Castain  wrote:

 There is no linkage between slurm and ompi when it comes to hwloc. If you 
 directly launch your app using srun, then slurm will use its version of 
 hwloc to do the binding. If you use mpirun to launch the app, then we’ll 
 use our internal version to do it.

 The two are completely isolated from each other.


> On Dec 9, 2014, at 12:25 AM, Pim Schellart  wrote:
>
> The version that “lstopo --version” reports is the same (1.8) on all 
> nodes, but we may indeed be hitting the second issue. We can try to 
> compile a new version of openmpi, but how do we ensure that the external 
> programs (e.g. SLURM) are using the same hwloc version as the one 
> embedded in openmpi? Is it enough to just compile hwloc 1.9 separately as 
> well and link against that? Also, if this is an issue, should we file a 
> bug against hwloc or openmpi on Ubuntu for mismatching versions?
>
>> On 09 Dec 2014, at 00:50, Ralph Castain  wrote:
>>
>> Hmmm…they probably linked that to the external, system hwloc version, so 
>> it sounds like one or more of your nodes has a different hwloc rpm on it.
>>
>> I couldn’t leaf thru your output well enough to see all the lstopo 
>> versions, but you might check to ensure they are the same.
>>
>> Looking at the code base, you may also hit a problem here. OMPI 1.6 
>> series was based on hwloc 1.3 - the output you sent indicated you have 
>> hwloc 1.8, which is quite a big change. OMPI 1.8 series is based on 
>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>>
>> Frankly, I’d just download and install an OMPI tarball myself and avoid 
>> these headaches. This mismatch in required versions is why we embed 
>> hwloc as it is a critical library for OMPI, and we had to ensure that 
>> the version matched our internal requirements.
>>
>>
>>> On Dec 8, 2014, at 8:50 AM, Pim Schellart  wrote:
>>>
>>> It is the default openmpi that comes with Ubuntu 14.04.
>>>
 On 08 Dec 2014, at 17:17, Ralph Castain  wrote:

 Pim: is this an OMPI you built, or one you were given somehow? If you 
 built it, how did you configure it?

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Pim Schellart
Dear Gilles et al.,

we tested with openmpi compiled from source (version 1.8.3) both with:

./configure --prefix=/usr/local/openmpi --disable-silent-rules 
--with-libltdl=external --with-devel-headers --with-slurm 
--enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi

and

./configure --prefix=/usr/local/openmpi --with-hwloc=/usr 
--disable-silent-rules --with-libltdl=external --with-devel-headers 
--with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi

(i.e. with embedded and external hwloc) and the issue remains the same.
Meanwhile we have found another interesting detail. A job is started consisting
of four tasks split over two nodes. If this is the only job running on those
nodes, the out-of-order warnings do not appear. However, if multiple jobs are
running, the warnings do appear, but only for the jobs that were started later.
We suspect that this is because the first started job is assigned CPU cores 0
and 1, whereas the later jobs are assigned different cores. I attached the
output (including lstopo --of xml output, called for each task) for both
the working and broken case again.

Kind regards,

Pim Schellart


[attachments not preserved in this archive: lstopo --of xml topology output for each task, for both the working and broken cases]


Re: [OMPI devel] [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-10 Thread Joshua Ladd
Thanks, Gilles

We're back to looking at this (yet again). It's a false positive, yes, but it's
not completely benign: the max_reg that gets calculated is much smaller than it
should be. With OFED 3.12, max_reg should be 2*TOTAL_RAM. We should have a fix
for 1.8.4.

Josh
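
To make the numbers concrete, here is a small standalone sketch of the arithmetic
in the calculate_max_reg() excerpt quoted below, using made-up parameters
(4 KiB pages, num_mtt = 1<<20, no reserved MTTs, 48 GiB of RAM). It only
illustrates why the small log_mtts_per_seg value reported in this thread trips
the 75%-of-RAM check while the larger mlx4_core value would not.

#include <stdio.h>
#include <stdint.h>

/* hypothetical values, chosen only for illustration */
#define PAGE_SIZE  4096ULL
#define NUM_MTT    (1ULL << 20)
#define RESERVED   0ULL
#define MEM_TOTAL  (48ULL << 30)   /* 48 GiB */

static void show(unsigned log_mtts_per_seg)
{
    uint64_t mtts_per_seg = 1ULL << log_mtts_per_seg;
    uint64_t max_reg = (NUM_MTT - RESERVED) * PAGE_SIZE * mtts_per_seg;

    printf("log_mtts_per_seg=%u -> max_reg=%llu MiB (%s)\n",
           log_mtts_per_seg, (unsigned long long)(max_reg >> 20),
           max_reg < MEM_TOTAL * 3 / 4 ? "warning" : "ok");
}

int main(void)
{
    show(3);   /* value seen in ib_mthca here: 32 GiB, below 3/4 of 48 GiB */
    show(5);   /* value seen in mlx4_core:    128 GiB, no warning          */
    return 0;
}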

On Mon, Dec 8, 2014 at 9:25 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

>  Folks,
>
> FWIW, i observe a similar behaviour on my system.
>
> imho, the root cause is OFED has been upgraded from a (quite) older
> version to latest 3.12 version
>
> here is the relevant part of code (btl_openib.c from the master) :
>
>
> static uint64_t calculate_max_reg (void)
> {
> if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt",
> &statinfo)) {
> } else if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt",
> &statinfo)) {
> mtts_per_seg = 1 <<
> read_module_param("/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
> num_mtt =
> read_module_param("/sys/module/ib_mthca/parameters/num_mtt", 1 << 20);
> reserved_mtt =
> read_module_param("/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
>
> max_reg = (num_mtt - reserved_mtt) * opal_getpagesize () *
> mtts_per_seg;
> } else if (
> (0 == stat("/sys/module/mlx5_core", &statinfo)) ||
> (0 == stat("/sys/module/mlx4_core/parameters", &statinfo)) ||
> (0 == stat("/sys/module/ib_mthca/parameters", &statinfo))
> ) {
> /* mlx5 means that we have ofed 2.0 and it can always register
> 2xmem_total for any mlx hca */
> max_reg = 2 * mem_total;
> } else {
> /* ... elided in this excerpt ... */
> }
>
> /* Print a warning if we can't register more than 75% of physical
>memory.  Abort if the abort_not_enough_reg_mem MCA param was
>set. */
> if (max_reg < mem_total * 3 / 4) {
> /* ... warning / optional abort elided in this excerpt ... */
> }
> return (max_reg * 7) >> 3;
> }
>
> with OFED 3.12, the /sys/module/mlx4_core/parameters/log_num_mtt pseudo
> file does *not* exist any more
> /sys/module/ib_mthca/parameters/num_mtt exists so the second path is taken
> and mtts_per_seg is read from
> /sys/module/ib_mthca/parameters/log_mtts_per_seg
>
> i noted that log_mtts_per_seg is also a parameter of mlx4_core :
> /sys/module/mlx4_core/parameters/log_mtts_per_seg
>
> the value is 3 in ib_mthca (and leads to a warning) but 5 in mlx4_core
> (big enough, and does not lead to a warning if this value is read)
>
>
> i had no time to read the latest ofed doc, so i cannot answer :
> - should log_mtts_per_seg be read from mlx4_core instead ?
> - is the warning a false positive ?
>
>
> my only point is this warning *might* be a false positive and the root
> cause *might* be calculate_max_reg logic
> *could* be wrong with the latest OFED stack.
>
> Could the Mellanox folks comment on this ?
>
> Cheers,
>
> Gilles
>
>
>
>
>
> On 2014/12/09 3:18, Götz Waschk wrote:
>
> Hi,
>
> here's another test with openmpi 1.8.3. With 1.8.1, 32GB was detected, now
> it is just 16:
> % mpirun -np 2 /usr/lib64/openmpi-intel/bin/mpitests-osu_get_bw
> --
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory.  This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered.  You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel module
> parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>   Local host:  pax95
>   Registerable memory: 16384 MiB
>   Total memory:49106 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --
> # OSU MPI_Get Bandwidth Test v4.3
> # Window creation: MPI_Win_allocate
> # Synchronization: MPI_Win_flush
> # Size  Bandwidth (MB/s)
> 1  28.56
> 2  58.74
>
>
> So it wasn't fixed for RHEL 6.6.
>
> Regards, Götz
>
> On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk  
>  wrote:
>
>
>  Hi,
>
> I had tested 1.8.4rc1 and it wasn't fixed. I can try again though,
> maybe I had made an error.
>
> Regards, Götz Waschk
>
> On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd  
>  wrote:
>
>  Hi,
>
> This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3
>
>  a
>
>  shot?
>
> Best,
>
> Josh
>
> On Mon, Dec 8, 2014 at 8:43 AM, Götz Waschk  
> 
>
>  wrote:
>
>  Dear Open-MPI experts,
>
> I have updated my little cluster from Scientific Linux 6.5 to 6.6,
> this included extensive changes in the Infiniband drivers and a newer
openmpi version (1.8.1). Now I'm getting this message.

Re: [OMPI devel] OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Hi,

I already figured this out and did the port :-)

Cheers,

Gilles

Piotr Lesnicki  wrote:
>Hi,
>
>Gilles, the patch I sent is vs v1.6, so I prepare a patch vs master.
>
>The patch on v1.6 can not apply on master because of changes in the
>btl openib: connecting XRC queues has changed from XOOB to UDCM.
>
>Piotr
>
>
>From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>[gilles.gouaillar...@iferc.org]
>Sent: Wednesday, 10 December 2014 09:20
>To: Open MPI Developers
>Subject: Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12
>
>Piotr and all,
>
>i issued PR #313 (vs master) based on your patch:
>https://github.com/open-mpi/ompi/pull/313
>
>could you please have a look at it ?
>
>Cheers,
>
>Gilles
>
>On 2014/12/09 22:07, Gilles Gouaillardet wrote:
>> Thanks Piotr,
>>
>> Based on the ompi community rules, a pr should be made vs the master, so 
>> code can be reviewed and shaken a bit.
>> I already prepared such a pr based on your patch and i will push it tomorrow.
>>
>> Then the changes will be backported to the v1.8 branch, assuming this is not 
>> considered as a new feature.
>>
>> Ralph, can you please comment on that ?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> Piotr Lesnicki wrote:
>>> Hi,
>>>
>>> We indeed have a fix for XRC support on our branch at Bull and sorry I
>>> neglected to contribute it, my bad…
>>>
>>> I join here the patch on top of current v1.6.6 (should I rather
>>> submit it as a pull request ?).
>>>
>>> For v1.8+, a merge of the v1.6 code is not enough as openib connect
>>> changed from xoob to udcm. I made a version on a pre-git state, so I
>>> will update it and make a pull request.
>>>
>>> Piotr
>>>
>>>
>>>
>>>
>>> 
>>> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>>> [gilles.gouaillar...@iferc.org]
>>> Sent: Monday, 8 December 2014 03:27
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12
>>>
>>> Hi Piotr,
>>>
>>> this  is quite an old thread now, but i did not see any support for XRC
>>> with ofed 3.12 yet
>>> (nor in trunk nor in v1.8)
>>>
>>> my understanding is that Bull already did something similar for the v1.6
>>> series,
>>> so let me put this the other way around :
>>>
>>> does Bull have any plan to contribute this work ?
>>> (for example, publish a patch for the v1.6 series, or submit pull
>>> request(s) for master and v1.8 branch)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/04/23 21:58, Piotr Lesnicki wrote:
 Hi,

 In OFED-3.12 the API for XRC has changed. I did not find
 corresponding changes in Open MPI: for example the function
 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
 longer exists in ofed-3.12-rc1.

 Are there any plans to support the new XRC API ?


 --
 Piotr
 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post:
 http://www.open-mpi.org/community/lists/devel/2014/04/14583.php
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16445.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16467.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/12/16488.php
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/12/16489.php

Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Piotr Lesnicki
Hi,

Gilles, the patch I sent is against v1.6, so I will prepare a patch against master.

The v1.6 patch cannot be applied to master because of changes in the
openib btl: connecting XRC queues has changed from XOOB to UDCM.

Piotr


From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
[gilles.gouaillar...@iferc.org]
Sent: Wednesday, 10 December 2014 09:20
To: Open MPI Developers
Subject: Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

Piotr and all,

i issued PR #313 (vs master) based on your patch:
https://github.com/open-mpi/ompi/pull/313

could you please have a look at it ?

Cheers,

Gilles

On 2014/12/09 22:07, Gilles Gouaillardet wrote:
> Thanks Piotr,
>
> Based on the ompi community rules, a pr should be made vs the master, so code 
> can be reviewed and shaken a bit.
> I already prepared such a pr based on your patch and i will push it tomorrow.
>
> Then the changes will be backported to the v1.8 branch, assuming this is not 
> considered as a new feature.
>
> Ralph, can you please comment on that ?
>
> Cheers,
>
> Gilles
>
>
> Piotr Lesnicki wrote:
>> Hi,
>>
>> We indeed have a fix for XRC support on our branch at Bull and sorry I
>> neglected to contribute it, my bad…
>>
>> I join here the patch on top of current v1.6.6 (should I rather
>> submit it as a pull request ?).
>>
>> For v1.8+, a merge of the v1.6 code is not enough as openib connect
>> changed from xoob to udcm. I made a version on a pre-git state, so I
>> will update it and make a pull request.
>>
>> Piotr
>>
>>
>>
>>
>> 
>> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>> [gilles.gouaillar...@iferc.org]
>> Sent: Monday, 8 December 2014 03:27
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12
>>
>> Hi Piotr,
>>
>> this  is quite an old thread now, but i did not see any support for XRC
>> with ofed 3.12 yet
>> (nor in trunk nor in v1.8)
>>
>> my understanding is that Bull already did something similar for the v1.6
>> series,
>> so let me put this the other way around :
>>
>> does Bull have any plan to contribute this work ?
>> (for example, publish a patch for the v1.6 series, or submit pull
>> request(s) for master and v1.8 branch)
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/04/23 21:58, Piotr Lesnicki wrote:
>>> Hi,
>>>
>>> In OFED-3.12 the API for XRC has changed. I did not find
>>> corresponding changes in Open MPI: for example the function
>>> 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
>>> longer exists in ofed-3.12-rc1.
>>>
>>> Are there any plans to support the new XRC API ?
>>>
>>>
>>> --
>>> Piotr
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14583.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16445.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16467.php



Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Piotr and all,

I issued PR #313 (against master) based on your patch:
https://github.com/open-mpi/ompi/pull/313

Could you please have a look at it?

Cheers,

Gilles

On 2014/12/09 22:07, Gilles Gouaillardet wrote:
> Thanks Piotr,
>
> Based on the ompi community rules, a pr should be made vs the master, so code 
> can be reviewed and shaken a bit.
> I already prepared such a pr based on your patch and i will push it tomorrow.
>
> Then the changes will be backported to the v1.8 branch, assuming this is not 
> considered as a new feature.
>
> Ralph, can you please comment on that ?
>
> Cheers,
>
> Gilles
>
>
> Piotr Lesnicki wrote:
>> Hi,
>>
>> We indeed have a fix for XRC support on our branch at Bull and sorry I
>> neglected to contribute it, my bad…
>>
>> I join here the patch on top of current v1.6.6 (should I rather
>> submit it as a pull request ?).
>>
>> For v1.8+, a merge of the v1.6 code is not enough as openib connect
>> changed from xoob to udcm. I made a version on a pre-git state, so I
>> will update it and make a pull request.
>>
>> Piotr
>>
>>
>>
>>
>> 
>> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
>> [gilles.gouaillar...@iferc.org]
>> Sent: Monday, 8 December 2014 03:27
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12
>>
>> Hi Piotr,
>>
>> this  is quite an old thread now, but i did not see any support for XRC
>> with ofed 3.12 yet
>> (nor in trunk nor in v1.8)
>>
>> my understanding is that Bull already did something similar for the v1.6
>> series,
>> so let me put this the other way around :
>>
>> does Bull have any plan to contribute this work ?
>> (for example, publish a patch for the v1.6 series, or submit pull
>> request(s) for master and v1.8 branch)
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/04/23 21:58, Piotr Lesnicki wrote:
>>> Hi,
>>>
>>> In OFED-3.12 the API for XRC has changed. I did not find
>>> corresponding changes in Open MPI: for example the function
>>> 'ibv_create_xrc_rcv_qp()' queried in openmpi configure script no
>>> longer exists in ofed-3.12-rc1.
>>>
>>> Are there any plans to support the new XRC API ?
>>>
>>>
>>> --
>>> Piotr
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14583.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16445.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16467.php