Re: [OMPI devel] RFC: BTL Interface Change (2 of 5)

2014-08-18 Thread George Bosilca
Nathan,

Indeed, the original design allowed for multiple uses of the same
descriptor: not concurrent, as the text in btl.h indicates, but
consecutive. The MCA_BTL_FLAGS_RDMA_MATCHED flag is a quirk needed for
Portals, and I am not sure it is currently in use anywhere in the code base.

My problem with the proposed approach is that we now have two critical
sections in the fast path: one to allocate/reserve the descriptor (at the
BTL level, on a call from the PML), and another to allocate whatever
structure the BTL needs to store the callback information (again on a
call from the PML to the BTL). In the previous design, we carefully
analyzed all communication paths and tried to minimize the number of
back-and-forth calls between the PML and BTL layers in order to preserve
performance.

 George.




On Thu, Jul 10, 2014 at 2:57 PM, Nathan Hjelm  wrote:

>
> What: Change the descriptor completion callback function to include
> cbdata and context pointers.
>
> Old callback:
>
> typedef void (*mca_btl_base_completion_fn_t)(
> struct mca_btl_base_module_t* module,
> struct mca_btl_base_endpoint_t* endpoint,
> struct mca_btl_base_descriptor_t* descriptor,
> int status);
>
>
> New callback:
>
> typedef void (*mca_btl_base_completion_fn_t)(
> struct mca_btl_base_module_t* module,
> struct mca_btl_base_endpoint_t* endpoint,
> struct mca_btl_base_descriptor_t* descriptor,
> void *cbdata, void *context, int status);
>
>
> Why: The BTL interface provides support for using a single descriptor
> with multiple concurrent RDMA operations. BTLs support this feature if
> the following flag is not set:
>
> /** RDMA put/get calls must have a matching prepare_{src,dst} call
> on the target with the same base (and possibly bound). */
> #define MCA_BTL_FLAGS_RDMA_MATCHED 0x0040
>
>
> The problem is that in order to pass back the correct callback data and
> context to the completion function BTLs need to modify the
> descriptor. This could be a disaster in a multi-threaded application if
> one thread is calling the completion callback while another thread is
> preparing to start a put/get operation. To avoid such races it is better
> to provide the callback data as separate arguments.
>
> The change is straightforward and the commit will update all BTLs and
> BTL users to use the new completion callback signature.
>
>
> When: As this was discussed at the developers' meeting last month, I am
> setting a short timeout for this RFC. This times out next Wed (July
> 16).
>
>
> I would really like feedback on this change. Can it be improved? Should
> the segment data be passed back to the function (not something I need
> for RMA but might be useful elsewhere)? Would it be better to remove the
> simultaneous RDMA feature in favor of a lightweight descriptor clone (I
> have this implemented as well and I have no problem with providing
> both features)?
>
>
> This is another in a series of RFCs to improve (I hope) the BTL
> interface for one-sided operations. The next RFC will be on the
> one-sided BTL interface.
>
> -Nathan Hjelm
> HPC-5, LANL
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15101.php
>
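A minimal sketch of the idea behind the new signature, assuming hypothetical stand-in structs (not the real Open MPI definitions) and a hypothetical per-operation record `struct rdma_op`. Only the callback typedef follows the RFC text. The BTL keeps callback state per operation and passes it to the callback, so the shared descriptor is only read on the completion path.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real BTL types;
 * only the callback typedef below follows the RFC text. */
struct mca_btl_base_module_t     { int id; };
struct mca_btl_base_endpoint_t   { int peer; };
struct mca_btl_base_descriptor_t { int flags; };

/* New-style completion callback from the RFC: cbdata and context are
 * passed as arguments instead of being read from the descriptor. */
typedef void (*mca_btl_base_completion_fn_t)(
    struct mca_btl_base_module_t *module,
    struct mca_btl_base_endpoint_t *endpoint,
    struct mca_btl_base_descriptor_t *descriptor,
    void *cbdata, void *context, int status);

/* Hypothetical per-operation record. Each in-flight put/get carries its
 * own callback state, so two operations sharing one descriptor never
 * overwrite each other's cbdata/context. */
struct rdma_op {
    struct mca_btl_base_descriptor_t *des;
    mca_btl_base_completion_fn_t cbfunc;
    void *cbdata;
    void *context;
};

/* Hypothetical BTL completion path, run when the network reports that
 * one operation has finished. The descriptor is never modified here. */
static void rdma_op_complete(struct mca_btl_base_module_t *module,
                             struct mca_btl_base_endpoint_t *endpoint,
                             const struct rdma_op *op, int status)
{
    op->cbfunc(module, endpoint, op->des, op->cbdata, op->context, status);
}

/* Example user callback: stores the completion status in the int that
 * the caller passed as cbdata. */
static void record_status(struct mca_btl_base_module_t *module,
                          struct mca_btl_base_endpoint_t *endpoint,
                          struct mca_btl_base_descriptor_t *descriptor,
                          void *cbdata, void *context, int status)
{
    (void)module; (void)endpoint; (void)descriptor; (void)context;
    *(int *)cbdata = status;
}
```

Two `rdma_op` records can point at the same descriptor with different cbdata, and completing them in either order leaves each caller's state intact. The record itself still has to be allocated somewhere (for example from a free list), which is the second fast-path allocation George objects to in his reply.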


Re: [hwloc-devel] --enable-plugins broken

2014-08-18 Thread Jeff Squyres (jsquyres)
Good enough; thanks for the refresher. :)

Sent from my phone. No type good. 

> On Aug 18, 2014, at 2:07 PM, "Brice Goglin"  wrote:
> 
> On 18/08/2014 20:37, Jeff Squyres (jsquyres) wrote:
>> I notice that --enable-plugins seems to be broken -- it always ends in:
>> 
>> configure: WARNING: Plugin support requested, but could not find ltdl.h
>> configure: error: Cannot continue
>> 
>> if you don't have libltdl installed.  Is this intentional?  I.e., have we 
>> already relied on an external libltdl?  Or have we previously embedded 
>> libltdl (via LT_CONFIG_LTDL_DIR), and it has just bit-rotted?
> 
> We had both external and embedded ltdl support in the beginning. We
> removed embedded in 1.7.1.
> Brice
> 
> 
> commit 7491172a4754b0e198f699cb31b7c65c59ac4df6
> Author: Brice Goglin 
> Date:   Wed May 15 08:15:49 2013 +
> 
>Stop embedding libltdl, always use the system-wide libltdl
> 
>The ltdl embedding caused problems to the hwloc embedding such as
>  http://www.open-mpi.org/community/lists/hwloc-devel/2013/04/3659.php
>We fixed the embedding AM_CONDITIONAL problem in
>  https://svn.open-mpi.org/trac/hwloc/changeset/5605
>but the generated tarballs now (sometimes) miss libltdl,
>causing configure to break.
>The patch in the first link above worked around that issue but...
> 
>* Embedding ltdl is useful when you rely on recent ltdl features,
>  while ltdl 1.5 (couldn't test earlier) looks OK for hwloc,
>  and that version is very old and available everywhere.
>* the ltdl embedding ability isn't perfect in "recursive" mode
>  (we had a hack for its config.h file in our configure
>   see http://lists.gnu.org/archive/html/libtool/2012-08/msg00016.html)
>* we only need ltdl when --enable-plugins (not by default)
> 
>That's enough reasons to consider not embedding ltdl anymore.
>We now require people to install libltdl-dev/libtool-ltdl-dev
>if they want plugins.
> 
>This commit was SVN r5618.
> 
> 
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-devel/2014/08/4176.php


Re: [hwloc-devel] --enable-plugins broken

2014-08-18 Thread Brice Goglin
On 18/08/2014 20:37, Jeff Squyres (jsquyres) wrote:
> I notice that --enable-plugins seems to be broken -- it always ends in:
>
> configure: WARNING: Plugin support requested, but could not find ltdl.h
> configure: error: Cannot continue
>
> if you don't have libltdl installed.  Is this intentional?  I.e., have we 
> already relied on an external libltdl?  Or have we previously embedded 
> libltdl (via LT_CONFIG_LTDL_DIR), and it has just bit-rotted?
>

We had both external and embedded ltdl support in the beginning. We
removed embedded in 1.7.1.
Brice


commit 7491172a4754b0e198f699cb31b7c65c59ac4df6
Author: Brice Goglin 
Date:   Wed May 15 08:15:49 2013 +

Stop embedding libltdl, always use the system-wide libltdl

The ltdl embedding caused problems to the hwloc embedding such as
  http://www.open-mpi.org/community/lists/hwloc-devel/2013/04/3659.php
We fixed the embedding AM_CONDITIONAL problem in
  https://svn.open-mpi.org/trac/hwloc/changeset/5605
but the generated tarballs now (sometimes) miss libltdl,
causing configure to break.
The patch in the first link above worked around that issue but...

* Embedding ltdl is useful when you rely on recent ltdl features,
  while ltdl 1.5 (couldn't test earlier) looks OK for hwloc,
  and that version is very old and available everywhere.
* the ltdl embedding ability isn't perfect in "recursive" mode
  (we had a hack for its config.h file in our configure
   see http://lists.gnu.org/archive/html/libtool/2012-08/msg00016.html)
* we only need ltdl when --enable-plugins (not by default)

That's enough reasons to consider not embedding ltdl anymore.
We now require people to install libltdl-dev/libtool-ltdl-dev
if they want plugins.

This commit was SVN r5618.
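Before rerunning configure with --enable-plugins, one quick way to see whether a given compiler can find the header that the configure check complains about is a tiny test file. This is a sketch assuming GCC or Clang (for `__has_include`); `have_ltdl_header` and `ltdl_hint` are hypothetical helper names, and the check does not verify that the libltdl library itself links.

```c
/* Detect whether <ltdl.h> is visible to this compiler, roughly the
 * header check that configure performs for --enable-plugins.
 * __has_include is a GCC/Clang extension (standardized in C23); on a
 * compiler without it, this reports "not found" unconditionally. */
#if defined(__has_include)
#  if __has_include(<ltdl.h>)
#    define HAVE_LTDL_H 1
#  endif
#endif

#ifndef HAVE_LTDL_H
#  define HAVE_LTDL_H 0
#endif

/* Returns 1 if the libltdl development header can be found, else 0. */
int have_ltdl_header(void)
{
    return HAVE_LTDL_H;
}

/* Hint text reflecting the commit message: plugins need a system libltdl. */
const char *ltdl_hint(void)
{
    return HAVE_LTDL_H
        ? "ltdl.h found: the libltdl header is installed"
        : "ltdl.h not found: install the libltdl development package";
}
```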




[hwloc-devel] --enable-plugins broken

2014-08-18 Thread Jeff Squyres (jsquyres)
I notice that --enable-plugins seems to be broken -- it always ends in:

configure: WARNING: Plugin support requested, but could not find ltdl.h
configure: error: Cannot continue

if you don't have libltdl installed.  Is this intentional?  I.e., have we 
already relied on an external libltdl?  Or have we previously embedded libltdl 
(via LT_CONFIG_LTDL_DIR), and it has just bit-rotted?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
In the case of PGI compilers prior to 13, a workaround is to configure
with --disable-oshmem-profile

On 2014/08/18 16:21, Gilles Gouaillardet wrote:
> Josh, Paul,
>
> the problem with old PGI compilers comes from the preprocessor (!)
>
> with pgi 12.10 :
> oshmem/shmem/fortran/start_pes_f.c
> SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
> gets expanded as
>
> #pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak start_pes_ = pstart_pes_ )
>
> whereas with pgi 14.7, it gets expanded as
>
> #pragma weak START_PES = PSTART_PES
> #pragma weak start_pes_ = pstart_pes_
> #pragma weak start_pes__ = pstart_pes__
>
> from oshmem/shmem/fortran/profile/pbindings.h :
> #define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)
>
> #define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME, lower_name)                  \
>     SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ## UPPER_NAME)              \
>     SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ## _)    \
>     SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name ## __)
>
> a workaround is to manually expand the SHMEM_GENERATE_WEAK_BINDINGS
> macro and replace
>
> SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
> with
>
> SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
> SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
> SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)
>
> /* I was unable to get something that works by simply replacing the
> definition of the SHMEM_GENERATE_WEAK_BINDINGS macro */
>
> Of course, this would have to be repeated in all the source files ...
>
>
> Cheers,
>
> Gilles
>
> On 2014/08/15 3:44, Paul Hargrove wrote:
>> Josh,
>>
>> The specific compilers that caused the most problems are the older PGI
>> compilers (any before 13.x).
>> In this case the user has the option to update their compiler to 13.10 or
>> newer.
>>
>> There are also issues with IBM's xlf.
>> For the IBM compiler I have never found a version that builds/links the MPI
>> f08 bindings, and now also find no version that can link the OSHMEM fortran
>> bindings.
>>
>> -Paul
>>
>>
>> On Thu, Aug 14, 2014 at 11:30 AM, Joshua Ladd  wrote:
>>
>>> We will update the README accordingly. Thank you, Paul.
>>>
>>> Josh
>>>
>>>
>>> On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) <
>>> jsquy...@cisco.com> wrote:
>>>
>>>> Good points.
>>>>
>>>> Mellanox -- can you update per Paul's suggestions?
>>>>
>>>>
>>>> On Aug 13, 2014, at 8:26 PM, Paul Hargrove  wrote:
>>>>
>>>>> The following is NOT a bug report.
>>>>> This is just an observation that may deserve some text in the README.
>>>>>
>>>>> I've reported issues in the past with some Fortran compilers (mostly
>>>>> older XLC and PGI) which either cannot build the "use mpi_f08" module, or
>>>>> cannot correctly link to it (and sometimes this fails only if configured
>>>>> with --enable-debug).
>>>>> Testing the OSHMEM Fortran bindings (enabled by default on Linux) I
>>>>> have found several compilers which fail to link the examples
>>>>> (hello_oshmemfh and ring_oshmemfh).  I reported one specific instance (with
>>>>> xlc-11/xlf-13) back in February:
>>>>> http://www.open-mpi.org/community/lists/devel/2014/02/14057.php
>>>>> So far I have these failures only on platforms where the Fortran
>>>>> compiler is *known* to be broken for the MPI f90 and/or f08 bindings.
>>>>> Specifically, all the failing platforms are ones on which either:
>>>>> + Configure determines (without my help) that FC cannot build the F90
>>>>>   and/or F08 modules.
>>>>> OR
>>>>> + I must pass --enable-mpi-fortran=usempi or --enable-mpi-fortran=mpifh
>>>>>   for cases configure cannot detect.
>>>>> So, I do *not* believe there is anything wrong with the OSHMEM code,
>>>>> which is why I started this post with "The following is NOT a bug report".
>>>>> However, I have two recommendations to make:
>>>>> 1) Documentation
>>>>>
>>>>> The README says just:
>>>>>
>>>>> --disable-oshmem-fortran
>>>>>   Disable building only the Fortran OSHMEM bindings.
>>>>>
>>>>> So, I recommend adding a sentence there referencing the "Compiler
>>>>> Notes" section of the README which has details on some known bad Fortran
>>>>> compilers.
>>>>> 2) Configure:
>>>>>
>>>>> As I noted above, at least some of the failures are on platforms where
>>>>> configure has determined it cannot build the f08 MPI bindings.  So, maybe
>>>>> there is something that could be done at configure time to disqualify some
>>>>> Fortran compilers from building the OSHMEM Fortran bindings, too.
>>>>> -Paul
>>>>>
>>>>> --
>>>>> Paul H. Hargrove  phhargr...@lbl.gov
>>>>> Future Technologies Group
>>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
Josh, Paul,

the problem with old PGI compilers comes from the preprocessor (!)

with pgi 12.10 :
oshmem/shmem/fortran/start_pes_f.c
SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)

gets expanded as

#pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak start_pes_ = pstart_pes_ )

whereas with pgi 14.7, it gets expanded as

#pragma weak START_PES = PSTART_PES
#pragma weak start_pes_ = pstart_pes_
#pragma weak start_pes__ = pstart_pes__

from oshmem/shmem/fortran/profile/pbindings.h :
#define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)

#define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME, lower_name)                  \
    SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ## UPPER_NAME)              \
    SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ## _)    \
    SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name ## __)

a workaround is to manually expand the SHMEM_GENERATE_WEAK_BINDINGS
macro and replace

SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)

with

SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)

/* I was unable to get something that works by simply replacing the
definition of the SHMEM_GENERATE_WEAK_BINDINGS macro */

Of course, this would have to be repeated in all the source files ...
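A minimal, self-contained check of the one-pragma-per-line workaround, assuming GCC or Clang on an ELF platform, where `#pragma weak a = b` makes `a` a weak alias of `b`. The P-prefixed function bodies and `last_npes` are hypothetical stand-ins, not OSHMEM code; only the `SHMEM_GENERATE_WEAK_PRAGMA` definition is taken from pbindings.h as quoted above.

```c
/* Same helper as in oshmem/shmem/fortran/profile/pbindings.h: stringize
 * the pragma text and hand it to the C99 _Pragma operator. */
#define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)

/* Prototypes for the public names, which become weak aliases. */
void START_PES(int *npes);
void start_pes_(int *npes);
void start_pes__(int *npes);

/* Hypothetical stand-ins for the profiling (P-prefixed) entry points;
 * they just record the argument so a caller can see which ran. */
int last_npes = -1;

void PSTART_PES(int *npes)   { last_npes = *npes; }
void pstart_pes_(int *npes)  { last_npes = *npes; }
void pstart_pes__(int *npes) { last_npes = *npes; }

/* The workaround: one pragma per macro invocation, one invocation per
 * line, instead of three _Pragma()s produced by a single
 * SHMEM_GENERATE_WEAK_BINDINGS expansion. */
SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)
```

Calling `start_pes_(&n)` from C then runs `pstart_pes_`, which is a cheap way to confirm that a given compiler expands and honors the per-line pragmas.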


Cheers,

Gilles

On 2014/08/15 3:44, Paul Hargrove wrote:
> Josh,
>
> The specific compilers that caused the most problems are the older PGI
> compilers (any before 13.x).
> In this case the user has the option to update their compiler to 13.10 or
> newer.
>
> There are also issues with IBM's xlf.
> For the IBM compiler I have never found a version that builds/links the MPI
> f08 bindings, and now also find no version that can link the OSHMEM fortran
> bindings.
>
> -Paul
>
>
> On Thu, Aug 14, 2014 at 11:30 AM, Joshua Ladd  wrote:
>
>> We will update the README accordingly. Thank you, Paul.
>>
>> Josh
>>
>>
>> On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>>
>>> Good points.
>>>
>>> Mellanox -- can you update per Paul's suggestions?
>>>
>>>
>>> On Aug 13, 2014, at 8:26 PM, Paul Hargrove  wrote:
>>>
>>>> The following is NOT a bug report.
>>>> This is just an observation that may deserve some text in the README.
>>>>
>>>> I've reported issues in the past with some Fortran compilers (mostly
>>>> older XLC and PGI) which either cannot build the "use mpi_f08" module, or
>>>> cannot correctly link to it (and sometimes this fails only if configured
>>>> with --enable-debug).
>>>> Testing the OSHMEM Fortran bindings (enabled by default on Linux) I
>>>> have found several compilers which fail to link the examples
>>>> (hello_oshmemfh and ring_oshmemfh).  I reported one specific instance (with
>>>> xlc-11/xlf-13) back in February:
>>>> http://www.open-mpi.org/community/lists/devel/2014/02/14057.php
>>>> So far I have these failures only on platforms where the Fortran
>>>> compiler is *known* to be broken for the MPI f90 and/or f08 bindings.
>>>> Specifically, all the failing platforms are ones on which either:
>>>> + Configure determines (without my help) that FC cannot build the F90
>>>>   and/or F08 modules.
>>>> OR
>>>> + I must pass --enable-mpi-fortran=usempi or --enable-mpi-fortran=mpifh
>>>>   for cases configure cannot detect.
>>>> So, I do *not* believe there is anything wrong with the OSHMEM code,
>>>> which is why I started this post with "The following is NOT a bug report".
>>>> However, I have two recommendations to make:
>>>> 1) Documentation
>>>>
>>>> The README says just:
>>>>
>>>> --disable-oshmem-fortran
>>>>   Disable building only the Fortran OSHMEM bindings.
>>>>
>>>> So, I recommend adding a sentence there referencing the "Compiler
>>>> Notes" section of the README which has details on some known bad Fortran
>>>> compilers.
>>>> 2) Configure:
>>>>
>>>> As I noted above, at least some of the failures are on platforms where
>>>> configure has determined it cannot build the f08 MPI bindings.  So, maybe
>>>> there is something that could be done at configure time to disqualify some
>>>> Fortran compilers from building the OSHMEM Fortran bindings, too.
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove  phhargr...@lbl.gov
>>>> Future Technologies Group
>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2014/08/15643.php
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>