Re: [OMPI devel] barrier before calling del_procs

2014-07-23 Thread Yossi Etigin
1.   If the barrier is before del_procs, it does guarantee that all MPI calls 
have been completed by all other ranks, but it does not guarantee that all ACKs have 
been delivered. For MXM, closing the connection (del_procs call completed) 
guarantees that my rank got all ACKs. So we need a barrier between del_procs 
and pml_finalize, because it is only safe to destroy the global PML resources 
once all other ranks have closed their connections.



2.   To avoid a situation where rank A starts disconnecting from 
rank B while rank B is still doing MPI work. In that case rank B will no longer be able 
to communicate with rank A, even though it still has work to do.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Monday, July 21, 2014 9:11 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] barrier before calling del_procs

On Mon, Jul 21, 2014 at 1:41 PM, Yossi Etigin <yos...@mellanox.com> wrote:
Right, but:

1.   IMHO the rte_barrier is in the wrong place (in the trunk)

In the trunk we have the rte_barrier prior to del_procs, which is what I would 
have expected: quiesce the BTLs by reaching a point where everybody agrees 
that no more MPI messages will be exchanged, and then delete the BTLs.


2.   In addition to the rte_barrier, we also need an mpi_barrier
Care to provide the reasoning for this barrier? Why and where should it be 
placed?

  George.




From: devel 
[mailto:devel-boun...@open-mpi.org] On 
Behalf Of George Bosilca
Sent: Monday, July 21, 2014 8:19 PM
To: Open MPI Developers

Subject: Re: [OMPI devel] barrier before calling del_procs

There was a long thread of discussion on why we must use an rte_barrier and not 
an mpi_barrier during the finalize. Basically, as long as we have 
connectionless, unreliable BTLs we need an external mechanism to ensure a complete 
tear-down of the entire infrastructure. Thus, we rely on an rte_barrier 
not because it guarantees the correctness of the code, but because it gives 
all processes enough time to flush all HPC traffic.

  George.


On Mon, Jul 21, 2014 at 1:10 PM, Yossi Etigin <yos...@mellanox.com> wrote:
I see. But in branch v1.8, in r31869, Ralph reverted the commit that moved 
del_procs after the barrier:
  "Revert r31851 until we can resolve how to close these leaks without causing 
the usnic BTL to fail during disconnect of intercommunicators
   Refs #4643"
Also, we need an rte barrier after del_procs - because otherwise rank A could 
call pml_finalize() before rank B finishes disconnecting from rank A.

I think the order in finalize should be like this:
1. mpi_barrier(world)
2. del_procs()
3. rte_barrier()
4. pml_finalize()
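
A toy, compilable sketch of that ordering (the helper names below are just 
stand-ins for the corresponding Open MPI internals, not their real signatures), 
annotating why each step has to precede the next:

#include <stdio.h>

/* 1. After this point no rank will issue any further MPI calls. */
static void mpi_barrier_world(void) { printf("mpi_barrier(world)\n"); }

/* 2. Close the connections; e.g. MXM waits here for all transport-level ACKs. */
static void del_procs(void)         { printf("del_procs()\n"); }

/* 3. Every rank has finished disconnecting from its peers. */
static void rte_barrier(void)       { printf("rte_barrier()\n"); }

/* 4. Only now is it safe to destroy the global PML resources. */
static void pml_finalize(void)      { printf("pml_finalize()\n"); }

int main(void)
{
    mpi_barrier_world();
    del_procs();
    rte_barrier();
    pml_finalize();
    return 0;
}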

-Original Message-
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Monday, July 21, 2014 8:01 PM
To: Open MPI Developers
Cc: Yossi Etigin
Subject: Re: [OMPI devel] barrier before calling del_procs

I should add that it is an rte barrier and not an MPI barrier for technical 
reasons.

-Nathan

On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
>We already have an rte barrier before del procs
>
>Sent from my iPhone
>On Jul 21, 2014, at 8:21 AM, Yossi Etigin <yos...@mellanox.com> wrote:
>
>  Hi,
>
>
>
>  We get occasional hangs with MTL/MXM during finalize, because a global
>  synchronization is needed before calling del_procs.
>
>  e.g rank A may call del_procs() and disconnect from rank B, while rank B
>  is still working.
>
>  What do you think about adding an MPI barrier on COMM_WORLD before
>  calling del_procs()?
>
>

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15204.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15206.php


___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15208.php



[OMPI devel] v1.8 - compile/dist problem

2014-07-23 Thread Mike Dubman
  CC   libvt_mpi_la-vt_iowrap_helper.lo
  CC   libvt_mpi_la-vt_libwrap.lo
  CC   libvt_mpi_la-vt_mallocwrap.lo
  CC   libvt_mpi_la-vt_mpifile.lo
make[6]: Entering directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify.cc
vt_unify.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs.cc
vt_unify_defs.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs_recs.cc
vt_unify_defs_recs.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_events_stats.cc
vt_unify_events_stats.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_handlers.cc
vt_unify_handlers.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_hooks.cc
vt_unify_hooks.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_markers.cc
vt_unify_markers.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_sync.cc
vt_unify_sync.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_tkfac.cc
vt_unify_tkfac.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_usrcom.cc
vt_unify_usrcom.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/hooks/vt_unify_hooks_base.cc
hooks/vt_unify_hooks_base.cc
ln: failed to create symbolic link ‘hooks/vt_unify_hooks_base.cc’: No
such file or directory
Makefile:1593: recipe for target 'hooks/vt_unify_hooks_base.cc' failed
make[6]: *** [hooks/vt_unify_hooks_base.cc] Error 1
make[6]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
Makefile:3548: recipe for target '../tools/vtunify/mpi/libvt-mpi-unify.la'
failed
make[5]: *** [../tools/vtunify/mpi/libvt-mpi-unify.la] Error 2
make[5]: *** Waiting for unfinished jobs
make[5]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/vtlib'
Makefile:810: recipe for target 'all-recursive' failed
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
Makefile:679: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
Makefile:1579: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt'
Makefile:3152: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi'
Makefile:1714: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.QzMv2a (%build)


Re: [OMPI devel] v1.8 - compile/dist problem

2014-07-23 Thread Jeff Squyres (jsquyres)
Mike --

Are you having the same jenkins problem we ran into yesterday?  If so, it's a 
simple fix:

http://www.open-mpi.org/community/lists/devel/2014/07/15211.php


On Jul 23, 2014, at 9:01 AM, Mike Dubman  wrote:

> 
>   CC   libvt_mpi_la-vt_iowrap_helper.lo
>   CC   libvt_mpi_la-vt_libwrap.lo
>   CC   libvt_mpi_la-vt_mallocwrap.lo
>   CC   libvt_mpi_la-vt_mpifile.lo
> make[6]: Entering directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify.cc
>  vt_unify.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs.cc
>  vt_unify_defs.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs_recs.cc
>  vt_unify_defs_recs.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_events_stats.cc
>  vt_unify_events_stats.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_handlers.cc
>  vt_unify_handlers.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_hooks.cc
>  vt_unify_hooks.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_markers.cc
>  vt_unify_markers.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_sync.cc
>  vt_unify_sync.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_tkfac.cc
>  vt_unify_tkfac.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_usrcom.cc
>  vt_unify_usrcom.cc
> ln -s 
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/hooks/vt_unify_hooks_base.cc
>  hooks/vt_unify_hooks_base.cc
> ln: failed to create symbolic link ‘hooks/vt_unify_hooks_base.cc’: No 
> such file or directory
> Makefile:1593: recipe for target 'hooks/vt_unify_hooks_base.cc' failed
> make[6]: *** [hooks/vt_unify_hooks_base.cc] Error 1
> make[6]: Leaving directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> Makefile:3548: recipe for target '../tools/vtunify/mpi/libvt-mpi-unify.la' 
> failed
> make[5]: *** [../tools/vtunify/mpi/libvt-mpi-unify.la] Error 2
> make[5]: *** Waiting for unfinished jobs
> make[5]: Leaving directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/vtlib'
> Makefile:810: recipe for target 'all-recursive' failed
> make[4]: *** [all-recursive] Error 1
> make[4]: Leaving directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> Makefile:679: recipe for target 'all' failed
> make[3]: *** [all] Error 2
> make[3]: Leaving directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> Makefile:1579: recipe for target 'all-recursive' failed
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory 
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt'
> Makefile:3152: recipe for target 'all-recursive' failed
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi'
> Makefile:1714: recipe for target 'all-recursive' failed
> make: *** [all-recursive] Error 1
> error: Bad exit status from /var/tmp/rpm-tmp.QzMv2a (%build)
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15214.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] v1.8 - compile/dist problem

2014-07-23 Thread Bert Wesarg

Noticed by the VT guys.

On 07/23/2014 03:01 PM, Mike Dubman wrote:

   CC   libvt_mpi_la-vt_iowrap_helper.lo
   CC   libvt_mpi_la-vt_libwrap.lo
   CC   libvt_mpi_la-vt_mallocwrap.lo
   CC   libvt_mpi_la-vt_mpifile.lo
make[6]: Entering directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify.cc
vt_unify.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs.cc
vt_unify_defs.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs_recs.cc
vt_unify_defs_recs.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_events_stats.cc
vt_unify_events_stats.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_handlers.cc
vt_unify_handlers.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_hooks.cc
vt_unify_hooks.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_markers.cc
vt_unify_markers.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_sync.cc
vt_unify_sync.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_tkfac.cc
vt_unify_tkfac.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_usrcom.cc
vt_unify_usrcom.cc
ln -s
/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/hooks/vt_unify_hooks_base.cc
hooks/vt_unify_hooks_base.cc
ln: failed to create symbolic link ‘hooks/vt_unify_hooks_base.cc’: No
such file or directory
Makefile:1593: recipe for target 'hooks/vt_unify_hooks_base.cc' failed
make[6]: *** [hooks/vt_unify_hooks_base.cc] Error 1
make[6]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
Makefile:3548: recipe for target '../tools/vtunify/mpi/libvt-mpi-unify.la'
failed
make[5]: *** [../tools/vtunify/mpi/libvt-mpi-unify.la] Error 2
make[5]: *** Waiting for unfinished jobs
make[5]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/vtlib'
Makefile:810: recipe for target 'all-recursive' failed
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
Makefile:679: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
Makefile:1579: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt'
Makefile:3152: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
'/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi'
Makefile:1714: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.QzMv2a (%build)



___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15214.php



--
Dipl.-Inf. Bert Wesarg
wiss. Mitarbeiter

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 (351) 463-42451
Fax: +49 (351) 463-37773
E-Mail: bert.wes...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI devel] v1.8 - compile/dist problem

2014-07-23 Thread Mike Dubman
nope, we use git.
it passed on rhel 6.x, failed on ubuntu/debian/fedora and rhel 7.x


On Wed, Jul 23, 2014 at 4:03 PM, Jeff Squyres (jsquyres)  wrote:

> Mike --
>
> Are you having the same jenkins problem we ran into yesterday?  If so,
> it's a simple fix:
>
> http://www.open-mpi.org/community/lists/devel/2014/07/15211.php
>
>
> On Jul 23, 2014, at 9:01 AM, Mike Dubman  wrote:
>
> >
> >   CC   libvt_mpi_la-vt_iowrap_helper.lo
> >   CC   libvt_mpi_la-vt_libwrap.lo
> >   CC   libvt_mpi_la-vt_mallocwrap.lo
> >   CC   libvt_mpi_la-vt_mpifile.lo
> > make[6]: Entering directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify.cc
> vt_unify.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs.cc
> vt_unify_defs.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs_recs.cc
> vt_unify_defs_recs.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_events_stats.cc
> vt_unify_events_stats.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_handlers.cc
> vt_unify_handlers.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_hooks.cc
> vt_unify_hooks.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_markers.cc
> vt_unify_markers.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_sync.cc
> vt_unify_sync.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_tkfac.cc
> vt_unify_tkfac.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_usrcom.cc
> vt_unify_usrcom.cc
> > ln -s
> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/hooks/vt_unify_hooks_base.cc
> hooks/vt_unify_hooks_base.cc
> > ln: failed to create symbolic link ‘hooks/vt_unify_hooks_base.cc’:
> No such file or directory
> > Makefile:1593: recipe for target 'hooks/vt_unify_hooks_base.cc' failed
> > make[6]: *** [hooks/vt_unify_hooks_base.cc] Error 1
> > make[6]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> > Makefile:3548: recipe for target '../tools/vtunify/mpi/
> libvt-mpi-unify.la' failed
> > make[5]: *** [../tools/vtunify/mpi/libvt-mpi-unify.la] Error 2
> > make[5]: *** Waiting for unfinished jobs
> > make[5]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/vtlib'
> > Makefile:810: recipe for target 'all-recursive' failed
> > make[4]: *** [all-recursive] Error 1
> > make[4]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> > Makefile:679: recipe for target 'all' failed
> > make[3]: *** [all] Error 2
> > make[3]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> > Makefile:1579: recipe for target 'all-recursive' failed
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt'
> > Makefile:3152: recipe for target 'all-recursive' failed
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory
> '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi'
> > Makefile:1714: recipe for target 'all-recursive' failed
> > make: *** [all-recursive] Error 1
> > error: Bad exit status from /var/tmp/rpm-tmp.QzMv2a (%build)
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15214.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15215.php
>


Re: [OMPI devel] v1.8 - compile/dist problem

2014-07-23 Thread Jeff Squyres (jsquyres)
Are you sure something isn't stale?  I.e., did you do a fresh checkout since 
the last build, or a "git clean", or something?


On Jul 23, 2014, at 10:02 AM, Mike Dubman  wrote:

> nope, we use git.
> it passed on rhel 6.x, failed on ubuntu/debian/fedora and rhel 7.x
> 
> 
> On Wed, Jul 23, 2014 at 4:03 PM, Jeff Squyres (jsquyres)  
> wrote:
> Mike --
> 
> Are you having the same jenkins problem we ran into yesterday?  If so, it's a 
> simple fix:
> 
> http://www.open-mpi.org/community/lists/devel/2014/07/15211.php
> 
> 
> On Jul 23, 2014, at 9:01 AM, Mike Dubman  wrote:
> 
> >
> >   CC   libvt_mpi_la-vt_iowrap_helper.lo
> >   CC   libvt_mpi_la-vt_libwrap.lo
> >   CC   libvt_mpi_la-vt_mallocwrap.lo
> >   CC   libvt_mpi_la-vt_mpifile.lo
> > make[6]: Entering directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify.cc
> >  vt_unify.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs.cc
> >  vt_unify_defs.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_defs_recs.cc
> >  vt_unify_defs_recs.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_events_stats.cc
> >  vt_unify_events_stats.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_handlers.cc
> >  vt_unify_handlers.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_hooks.cc
> >  vt_unify_hooks.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_markers.cc
> >  vt_unify_markers.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_sync.cc
> >  vt_unify_sync.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_tkfac.cc
> >  vt_unify_tkfac.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/vt_unify_usrcom.cc
> >  vt_unify_usrcom.cc
> > ln -s 
> > /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/hooks/vt_unify_hooks_base.cc
> >  hooks/vt_unify_hooks_base.cc
> > ln: failed to create symbolic link ‘hooks/vt_unify_hooks_base.cc’: No 
> > such file or directory
> > Makefile:1593: recipe for target 'hooks/vt_unify_hooks_base.cc' failed
> > make[6]: *** [hooks/vt_unify_hooks_base.cc] Error 1
> > make[6]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/tools/vtunify/mpi'
> > Makefile:3548: recipe for target '../tools/vtunify/mpi/libvt-mpi-unify.la' 
> > failed
> > make[5]: *** [../tools/vtunify/mpi/libvt-mpi-unify.la] Error 2
> > make[5]: *** Waiting for unfinished jobs
> > make[5]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt/vtlib'
> > Makefile:810: recipe for target 'all-recursive' failed
> > make[4]: *** [all-recursive] Error 1
> > make[4]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> > Makefile:679: recipe for target 'all' failed
> > make[3]: *** [all] Error 2
> > make[3]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt/vt'
> > Makefile:1579: recipe for target 'all-recursive' failed
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi/contrib/vt'
> > Makefile:3152: recipe for target 'all-recursive' failed
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory 
> > '/var/tmp/OFED_topdir/BUILD/openmpi-1.8.2rc2/ompi'
> > Makefile:1714: recipe for target 'all-recursive' failed
> > make: *** [all-recursive] Error 1
> > error: Bad exit status from /var/tmp/rpm-tmp.QzMv2a (%build)
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15214.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15215.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15217.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.co

[OMPI devel] Annual SVN account maintenance

2014-07-23 Thread Jeff Squyres (jsquyres)
It is that time again -- it's time to clean house of SVN write access accounts.

SHORT VERSION
=

Edit the wiki to preserve your organization's SVN accounts by COB, Thursday, 
July 31, 2014:

https://svn.open-mpi.org/trac/ompi/wiki/2014-SVN-summer-cleaning

If you don't indicate which accounts are still in use by the deadline, THEY 
WILL BE DELETED.

MORE DETAIL
===

Once a year, we prune non-active SVN accounts. For example:

* People who are no longer involved in Open MPI
* People who no longer need write access to Open MPI's Subversion
* People who have moved organizations and no longer have that organization's 
permission to commit to Open MPI

Each organization is responsible for editing this wiki page and moving each of 
their IDs to the "keep SVN write access" or "remove SVN write access" columns. 
There are also some unknown IDs (meaning: Jeff didn't recognize those IDs and 
therefore didn't know with which organization to group them). Please claim your 
organization's IDs.

ALL ORGANIZATIONS MUST CLASSIFY THEIR COMMITTERS!

To help with your classification, here are lists of IDs from AUTHORS who have 
(and have not) had commits in the last year:

AUTHORS with commits in the past year:
--
adrian -> Adrian Reber, HE
alex -> Alex Margolin, Mellanox
alinas -> Alina Sklarevich, Mellanox
amikheev -> Alex Mikheev, Mellanox
bosilca -> George Bosilca, UTK
brbarret -> Brian Barrett, IU, LANL, SNL
devendar -> Devendar Bureddy, Mellanox
dgoodell -> Dave Goodell, Cisco
edgar -> Edgar Gabriel, HLRS, UH, UTK
ggouaillardet -> Gilles Gouaillardet, RIST
hadi -> Hadi Montakhabi, UH
hjelmn -> Nathan Hjelm, LANL
hpcchris -> Christoph Niethammer, HLRS
hppritcha -> Howard Pritchard, LANL
jladd -> Joshua Ladd, Mellanox
jroman -> Jose E. Roman, UPV
jsquyres -> Jeff Squyres, Cisco, IU
jurenz -> Matthias Jurenz, ZIH
manjugv -> Manjunath Gorentla Venkata, ORNL
miked -> Mike Dubman, Mellanox
mpiteam -> Automated commits, None
naughtont -> Tom Naughton, ORNL
osvegis -> Oscar Vega-Gisbert, UPV
pasha -> Pavel Shamis, Mellanox, ORNL
regrant -> Ryan Grant, SNL
rfaucett -> Reese Faucette, Cisco
rhc -> Ralph Castain, LANL, Cisco, Intel
rolfv -> Rolf Vandevaart, Sun, Oracle, NVIDIA
swise -> Steve Wise, Chelsio
vasily -> Vasily Filipov, Mellanox
vvenkatesan -> Vishwanath Venkatesan, UH
yosefe -> Yossi Etigin, Mellanox

AUTHORS with NO commits in the past year:
-
Anya -> Anya Tatashina, Sun
abbyz -> Abhishek Kulkarni, IU
adi -> Adrian Knoth, UJ
adkulkar -> Abhishek Kulkarni, IU
afriedle -> Andrew Friedley, IU, SNL
alekseys -> Aleksey Senin, Mellanox
angskun -> Thara Angskun, UTK
apant -> Avneesh Pant, QLogic
artpol -> Artem Polyakov, Individual
bbenton -> Brad Benton, IBM
bouteill -> Aurelien Bouteiller, UTK
bricka -> Alex Brick, NU
casswell -> Laura Casswell, LANL
coti -> Camille Coti, UTK, INRIA
csbell -> Christian Bell, QLogic
cyeoh -> Chris Yeoh, IBM
damico -> Bill D'Amico, Cisco
ddd -> David Daniel, LANL
derbeyn -> Nadia Derby, Bull
dgdimick -> Denis Dimick, LANL
dkerr -> Donald Kerr, Sun, Oracle
dlacher -> Dan Lacher, Sun
dorons -> Doron Shoham, Mellanox
emallove -> Ethan Mallove, Sun, Oracle
eugene -> Eugene Loh, Sun, Oracle
gef -> Graham Fagg, UTK
gingery -> Ginger Young, LANL
gleb -> Gleb Natapov, Voltaire
gshipman -> Galen Shipman, LANL
gwatson -> Greg Watson, LANL
herault -> Thomas Herault, INRIA
hpcstork -> Sven Stork, HLRS
htor -> Torsten Hoefler, IU, TUC
igb -> Iain Bason, Sun, Oracle
igoru -> Igor Usarov, Mellanox
jdmason -> Jon Mason, Chelsio
jjhursey -> Josh Hursey, IU, ORNL, LANL, LBNL, UWL
jnysal -> Nysal Jan, IBM
karenn -> Karen Norteman, Sun
kliteyn -> Yevgeny Kliteynik, Mellanox
kmroz -> Karl Mroz, UBC
knuepfer -> Andreas Knuepfer, ZIH
koenig -> Greg Koenig, ORNL
lemarini -> Pierre Lemarinier, UTK
lennyve -> Lenny Verkhovsky, Mellanox
lums -> Andrew Lumsdaine, IU
matney -> Ken Matney, ORNL
mitch -> Mitch Sukalski, SNL
mschaara -> Mohamad Chaarawi, UH
mt -> Mark Taylor, LANL
ollie -> Li-Ta Lo, LANL
paklui -> Pak Lui, Sun
patrick -> Patrick Geoffray, Myricom
penoff -> Brad Penoff, UBC
pjesa -> Jelena Pjesivac-Grbovic, UTK
pkambadu -> Prabhanjan Kambadur, IU
rasmussn -> Craig Rasmussen, LANL, UO
rbbrigh -> Ron Brightwell, SNL
rlgraham -> Rich Graham, ORNL, LANL, Mellanox
rta -> Rob Awles, LANL
rusraink -> Rainer Keller, HLRS, ORNL
sami -> Sami Ayyorgun, LANL
samuel -> Samuel K. Gutierrez, LANL
santhana -> Gopal Santhanaraman, OSU
sboehm -> Swen Boehm, ORNL
sharonm -> Sharon Melamed, Voltaire
shiqing -> Shiqing Fan, HLRS
sjeaugey -> Sylvain Jeaugey, Bull
surs -> Sayantan Sur, OSU
sushant -> Sushant Sharma, LANL
tdd -> Terry Dontje, Sun, Oracle
timattox -> Tim Mattox, IU, Cisco
tprins -> Tim Prins, IU, LANL
twoodall -> Tim Woodall, LANL
vsahay -> Vishal Sahay, IU
wbland -> Wesley Bland, UTK
yaeld -> Yael Dalen, Mellanox
yuw -> Weikuan Yu, LANL, OSU

DEADLINE FOR ACCOUNT CLASSIFICATION: COB, Thursday, July 31, 2014.  All 
acc

Re: [OMPI devel] Annual SVN account maintenance

2014-07-23 Thread Ralph Castain
Just a little confusing here as some of these folks have changed organizations, 
some of the orgs have dropped out of sight, etc. So it isn't entirely clear who 
owns what on your chart.

Looking at your lists, they are full of people who haven't been involved with OMPI 
for many years. So I'd say you should remove everyone on your "no commits in 
last year" list who doesn't respond and/or doesn't have someone from an active member 
org specifically request that they be retained.


On Jul 23, 2014, at 7:52 AM, Jeff Squyres (jsquyres)  wrote:

> It is that time again -- it's time to clean house of SVN write access 
> accounts.
> 
> SHORT VERSION
> =
> 
> Edit the wiki to preserve your organization's SVN accounts by COB, Thursday, 
> July 31, 2014:
> 
>https://svn.open-mpi.org/trac/ompi/wiki/2014-SVN-summer-cleaning
> 
> If you don't indicate which accounts are still in use by the deadline, THEY 
> WILL BE DELETED.
> 
> MORE DETAIL
> ===
> 
> Once a year, we prune non-active SVN accounts. For example:
> 
> * People who are no longer involved in Open MPI
> * People who no longer need write access to Open MPI's Subversion
> * People who have moved organizations and no longer have that organization's 
> permission to commit to Open MPI
> 
> Each organization is responsible for editing this wiki page and moving each 
> of their IDs to the "keep SVN write access" or "remove SVN write access" 
> columns. There's also some unknown IDs (meaning: Jeff didn't recognize those 
> IDs and therefore didn't know with which organization to group them). Please 
> claim your organization's IDs.
> 
> ALL ORGANIZATIONS MUST CLASSIFY THEIR COMMITTERS!
> 
> To help with your classification, here's lists of IDs from AUTHORS who have 
> (and have not) had commits in the last year:
> 
> AUTHORS with commits in the past year:
> --
> adrian -> Adrian Reber, HE
> alex -> Alex Margolin, Mellanox
> alinas -> Alina Sklarevich, Mellanox
> amikheev -> Alex Mikheev, Mellanox
> bosilca -> George Bosilca, UTK
> brbarret -> Brian Barrett, IU, LANL, SNL
> devendar -> Devendar Bureddy, Mellanox
> dgoodell -> Dave Goodell, Cisco
> edgar -> Edgar Gabriel, HLRS, UH, UTK
> ggouaillardet -> Gilles Gouaillardet, RIST
> hadi -> Hadi Montakhabi, UH
> hjelmn -> Nathan Hjelm, LANL
> hpcchris -> Christoph Niethammer, HLRS
> hppritcha -> Howard Pritchard, LANL
> jladd -> Joshua Ladd, Mellanox
> jroman -> Jose E. Roman, UPV
> jsquyres -> Jeff Squyres, Cisco, IU
> jurenz -> Matthias Jurenz, ZIH
> manjugv -> Manjunath Gorentla Venkata, ORNL
> miked -> Mike Dubman, Mellanox
> mpiteam -> Automated commits, None
> naughtont -> Tom Naughton, ORNL
> osvegis -> Oscar Vega-Gisbert, UPV
> pasha -> Pavel Shamis, Mellanox, ORNL
> regrant -> Ryan Grant, SNL
> rfaucett -> Reese Faucette, Cisco
> rhc -> Ralph Castain, LANL, Cisco, Intel
> rolfv -> Rolf Vandevaart, Sun, Oracle, NVIDIA
> swise -> Steve Wise, Chelsio
> vasily -> Vasily Filipov, Mellanox
> vvenkatesan -> Vishwanath Venkatesan, UH
> yosefe -> Yossi Etigin, Mellanox
> 
> AUTHORS with NO commits in the past year:
> -
> Anya -> Anya Tatashina, Sun
> abbyz -> Abhishek Kulkarni, IU
> adi -> Adrian Knoth, UJ
> adkulkar -> Abhishek Kulkarni, IU
> afriedle -> Andrew Friedley, IU, SNL
> alekseys -> Aleksey Senin, Mellanox
> angskun -> Thara Angskun, UTK
> apant -> Avneesh Pant, QLogic
> artpol -> Artem Polyakov, Individual
> bbenton -> Brad Benton, IBM
> bouteill -> Aurelien Bouteiller, UTK
> bricka -> Alex Brick, NU
> casswell -> Laura Casswell, LANL
> coti -> Camille Coti, UTK, INRIA
> csbell -> Christian Bell, QLogic
> cyeoh -> Chris Yeoh, IBM
> damico -> Bill D'Amico, Cisco
> ddd -> David Daniel, LANL
> derbeyn -> Nadia Derby, Bull
> dgdimick -> Denis Dimick, LANL
> dkerr -> Donald Kerr, Sun, Oracle
> dlacher -> Dan Lacher, Sun
> dorons -> Doron Shoham, Mellanox
> emallove -> Ethan Mallove, Sun, Oracle
> eugene -> Eugene Loh, Sun, Oracle
> gef -> Graham Fagg, UTK
> gingery -> Ginger Young, LANL
> gleb -> Gleb Natapov, Voltaire
> gshipman -> Galen Shipman, LANL
> gwatson -> Greg Watson, LANL
> herault -> Thomas Herault, INRIA
> hpcstork -> Sven Stork, HLRS
> htor -> Torsten Hoefler, IU, TUC
> igb -> Iain Bason, Sun, Oracle
> igoru -> Igor Usarov, Mellanox
> jdmason -> Jon Mason, Chelsio
> jjhursey -> Josh Hursey, IU, ORNL, LANL, LBNL, UWL
> jnysal -> Nysal Jan, IBM
> karenn -> Karen Norteman, Sun
> kliteyn -> Yevgeny Kliteynik, Mellanox
> kmroz -> Karl Mroz, UBC
> knuepfer -> Andreas Knuepfer, ZIH
> koenig -> Greg Koenig, ORNL
> lemarini -> Pierre Lemarinier, UTK
> lennyve -> Lenny Verkhovsky, Mellanox
> lums -> Andrew Lumsdaine, IU
> matney -> Ken Matney, ORNL
> mitch -> Mitch Sukalski, SNL
> mschaara -> Mohamad Chaarawi, UH
> mt -> Mark Taylor, LANL
> ollie -> Li-Ta Lo, LANL
> paklui -> Pak Lui, Sun
> patrick -> Patrick Geoffray, Myricom
> penoff -> Brad Penoff, UBC
> pjesa -> Jelena Pjesivac-Grbovic, UTK
> pkambadu -> Prabhanjan Kam

Re: [OMPI devel] Annual SVN account maintenance

2014-07-23 Thread Joshua Ladd
Done.


On Wed, Jul 23, 2014 at 11:02 AM, Ralph Castain  wrote:

> Just a little confusing here as some of these folks have changed
> organizations, some of the orgs have dropped out of sight, etc. So it isn't
> entirely clear who owns what on your chart.
>
> Looking at your lists, it is full of people who haven't been involved with
> OMPI for many years. So I'd say you should remove everyone on your "no
> commits in last year" list who doesn't respond and/or have someone from an
> active member org specifically request they be retained.
>
>
> On Jul 23, 2014, at 7:52 AM, Jeff Squyres (jsquyres) 
> wrote:
>
> > It is that time again -- it's time to clean house of SVN write access
> accounts.
> >
> > SHORT VERSION
> > =
> >
> > Edit the wiki to preserve your organization's SVN accounts by COB,
> Thursday, July 31, 2014:
> >
> >https://svn.open-mpi.org/trac/ompi/wiki/2014-SVN-summer-cleaning
> >
> > If you don't indicate which accounts are still in use by the deadline,
> THEY WILL BE DELETED.
> >
> > MORE DETAIL
> > ===
> >
> > Once a year, we prune non-active SVN accounts. For example:
> >
> > * People who are no longer involved in Open MPI
> > * People who no longer need write access to Open MPI's Subversion
> > * People who have moved organizations and no longer have that
> organization's permission to commit to Open MPI
> >
> > Each organization is responsible for editing this wiki page and moving
> each of their IDs to the "keep SVN write access" or "remove SVN write
> access" columns. There's also some unknown IDs (meaning: Jeff didn't
> recognize those IDs and therefore didn't know with which organization to
> group them). Please claim your organization's IDs.
> >
> > ALL ORGANIZATIONS MUST CLASSIFY THEIR COMMITTERS!
> >
> > To help with your classification, here's lists of IDs from AUTHORS who
> have (and have not) had commits in the last year:
> >
> > AUTHORS with commits in the past year:
> > --
> > adrian -> Adrian Reber, HE
> > alex -> Alex Margolin, Mellanox
> > alinas -> Alina Sklarevich, Mellanox
> > amikheev -> Alex Mikheev, Mellanox
> > bosilca -> George Bosilca, UTK
> > brbarret -> Brian Barrett, IU, LANL, SNL
> > devendar -> Devendar Bureddy, Mellanox
> > dgoodell -> Dave Goodell, Cisco
> > edgar -> Edgar Gabriel, HLRS, UH, UTK
> > ggouaillardet -> Gilles Gouaillardet, RIST
> > hadi -> Hadi Montakhabi, UH
> > hjelmn -> Nathan Hjelm, LANL
> > hpcchris -> Christoph Niethammer, HLRS
> > hppritcha -> Howard Pritchard, LANL
> > jladd -> Joshua Ladd, Mellanox
> > jroman -> Jose E. Roman, UPV
> > jsquyres -> Jeff Squyres, Cisco, IU
> > jurenz -> Matthias Jurenz, ZIH
> > manjugv -> Manjunath Gorentla Venkata, ORNL
> > miked -> Mike Dubman, Mellanox
> > mpiteam -> Automated commits, None
> > naughtont -> Tom Naughton, ORNL
> > osvegis -> Oscar Vega-Gisbert, UPV
> > pasha -> Pavel Shamis, Mellanox, ORNL
> > regrant -> Ryan Grant, SNL
> > rfaucett -> Reese Faucette, Cisco
> > rhc -> Ralph Castain, LANL, Cisco, Intel
> > rolfv -> Rolf Vandevaart, Sun, Oracle, NVIDIA
> > swise -> Steve Wise, Chelsio
> > vasily -> Vasily Filipov, Mellanox
> > vvenkatesan -> Vishwanath Venkatesan, UH
> > yosefe -> Yossi Etigin, Mellanox
> >
> > AUTHORS with NO commits in the past year:
> > -
> > Anya -> Anya Tatashina, Sun
> > abbyz -> Abhishek Kulkarni, IU
> > adi -> Adrian Knoth, UJ
> > adkulkar -> Abhishek Kulkarni, IU
> > afriedle -> Andrew Friedley, IU, SNL
> > alekseys -> Aleksey Senin, Mellanox
> > angskun -> Thara Angskun, UTK
> > apant -> Avneesh Pant, QLogic
> > artpol -> Artem Polyakov, Individual
> > bbenton -> Brad Benton, IBM
> > bouteill -> Aurelien Bouteiller, UTK
> > bricka -> Alex Brick, NU
> > casswell -> Laura Casswell, LANL
> > coti -> Camille Coti, UTK, INRIA
> > csbell -> Christian Bell, QLogic
> > cyeoh -> Chris Yeoh, IBM
> > damico -> Bill D'Amico, Cisco
> > ddd -> David Daniel, LANL
> > derbeyn -> Nadia Derby, Bull
> > dgdimick -> Denis Dimick, LANL
> > dkerr -> Donald Kerr, Sun, Oracle
> > dlacher -> Dan Lacher, Sun
> > dorons -> Doron Shoham, Mellanox
> > emallove -> Ethan Mallove, Sun, Oracle
> > eugene -> Eugene Loh, Sun, Oracle
> > gef -> Graham Fagg, UTK
> > gingery -> Ginger Young, LANL
> > gleb -> Gleb Natapov, Voltaire
> > gshipman -> Galen Shipman, LANL
> > gwatson -> Greg Watson, LANL
> > herault -> Thomas Herault, INRIA
> > hpcstork -> Sven Stork, HLRS
> > htor -> Torsten Hoefler, IU, TUC
> > igb -> Iain Bason, Sun, Oracle
> > igoru -> Igor Usarov, Mellanox
> > jdmason -> Jon Mason, Chelsio
> > jjhursey -> Josh Hursey, IU, ORNL, LANL, LBNL, UWL
> > jnysal -> Nysal Jan, IBM
> > karenn -> Karen Norteman, Sun
> > kliteyn -> Yevgeny Kliteynik, Mellanox
> > kmroz -> Karl Mroz, UBC
> > knuepfer -> Andreas Knuepfer, ZIH
> > koenig -> Greg Koenig, ORNL
> > lemarini -> Pierre Lemarinier, UTK
> > lennyve -> Lenny Verkhovsky, Mellanox
> > lums -> Andrew Lumsdaine, IU
> > matney -> Ken Matney, ORN

Re: [OMPI devel] Annual SVN account maintenance

2014-07-23 Thread Jeff Squyres (jsquyres)
On Jul 23, 2014, at 11:02 AM, Ralph Castain  wrote:

> Just a little confusing here as some of these folks have changed 
> organizations, some of the orgs have dropped out of sight, etc. So it isn't 
> entirely clear who owns what on your chart.

Note that the chart I included in the email is pulled directly from the AUTHORS 
file.  So if things have changed, people should update their entries in AUTHORS.

We keep a FULL list of people who have *ever* committed in AUTHORS; we never 
trim it.

> Looking at your lists, it is full of people who haven't been involved with 
> OMPI for many years. So I'd say you should remove everyone on your "no 
> commits in last year" list who doesn't respond and/or have someone from an 
> active member org specifically request they be retained.

I'm guessing all of those will drop off, too.

I did a quick check and I think that (almost) everyone who hasn't had a commit 
in the last year isn't on the wiki page to be classified (some exceptions 
include the IU staff who have admin rights over the repo, etc.).

Regardless, I think those who have drifted away from the project won't update 
the wiki, and we'll end up deleting their accounts next week.  If a mistake is 
made, it's easy/trivial to restore an SVN account.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] barrier before calling del_procs

2014-07-23 Thread George Bosilca
My understanding is that both of these points are based on the fact that
there are ongoing communications between two processes when one of them
decides to shut down. From an MPI perspective, I can hardly see a case where
this is legit.

  George.



On Wed, Jul 23, 2014 at 8:33 AM, Yossi Etigin  wrote:

>  1.   If the barrier is before del_proc, it does guarantee all MPI
> calls have been completed by all other ranks, but it does not guarantee all
> ACKs have been delivered. For MXM, closing the connection (del_procs call
> completed) guarantees that my rank got all ACKs. So we need a barrier
> between del_procs and pml_finalize, because only when all other ranks
> closed their connection it’s safe to destroy the global pml resources.
>
>
>
> 2.   In order to avoid a situation when rankA starts disconnecting
> from rankB, while rankB is still doing MPI work. In this case rankB will
> not be able to communicate with rankA any more, while it still has work to
> do.
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George
> Bosilca
> *Sent:* Monday, July 21, 2014 9:11 PM
>
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI devel] barrier before calling del_procs
>
>
>
> On Mon, Jul 21, 2014 at 1:41 PM, Yossi Etigin  wrote:
>
> Right, but:
>
> 1.   IMHO the rte_barrier is in the wrong place (in the trunk)
>
>
>
> In the trunk we have the rte_barrier prior to del_proc, which is what I
> would have expected: quiesce the BTLs by reaching a point where
> everybody agrees that no more MPI messages will be exchanged, and then
> delete the BTLs.
>
>
>
>  2.   In addition to the rte_barrier, need also mpi_barrier
>
>  Care for providing a reasoning for this barrier? Why and where should it
> be placed?
>
>
>
>   George.
>
>
>
>
>
>
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George
> Bosilca
> *Sent:* Monday, July 21, 2014 8:19 PM
> *To:* Open MPI Developers
>
>
> *Subject:* Re: [OMPI devel] barrier before calling del_procs
>
>
>
> There was a long thread of discussion on why we must use an rte_barrier
> and not an mpi_barrier during the finalize. Basically, as long as we have
> connectionless unreliable BTLs we need an external mechanism to ensure
> complete tear-down of the entire infrastructure. Thus, we need to rely on
> an rte_barrier not because it guarantees the correctness of the code, but
> because it provides enough time to all processes to flush all HPC traffic.
>
>
>
>   George.
>
>
>
>
>
> On Mon, Jul 21, 2014 at 1:10 PM, Yossi Etigin  wrote:
>
> I see. But in branch v1.8, in 31869, Ralph reverted the commit which moved
> del_procs after the barrier:
>   "Revert r31851 until we can resolve how to close these leaks without
> causing the usnic BTL to fail during disconnect of intercommunicators
>Refs #4643"
> Also, we need an rte barrier after del_procs - because otherwise rankA
> could call pml_finalize() before rankB finishes disconnecting from rankA.
>
> I think the order in finalize should be like this:
> 1. mpi_barrier(world)
> 2. del_procs()
> 3. rte_barrier()
> 4. pml_finalize()
>
>
> -Original Message-
> From: Nathan Hjelm [mailto:hje...@lanl.gov]
> Sent: Monday, July 21, 2014 8:01 PM
> To: Open MPI Developers
> Cc: Yossi Etigin
> Subject: Re: [OMPI devel] barrier before calling del_procs
>
> I should add that it is an rte barrier and not an MPI barrier for
> technical reasons.
>
> -Nathan
>
> On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> >We already have an rte barrier before del procs
> >
> >Sent from my iPhone
> >On Jul 21, 2014, at 8:21 AM, Yossi Etigin 
> wrote:
> >
> >  Hi,
> >
> >
> >
> >  We get occasional hangs with MTL/MXM during finalize, because a
> global
> >  synchronization is needed before calling del_procs.
> >
> >  e.g rank A may call del_procs() and disconnect from rank B, while
> rank B
> >  is still working.
> >
> >  What do you think about adding an MPI barrier on COMM_WORLD before
> >  calling del_procs()?
> >
> >
>
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15204.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15206.php
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15208.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/li

[OMPI devel] 1.8.2rc2 ready for test

2014-07-23 Thread Ralph Castain
Usual place:

http://www.open-mpi.org/software/ompi/v1.8/

Please test and report problems by Wed July 30

Thanks
Ralph



Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Jeff Squyres (jsquyres)
George --

Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?

I am currently outputting some information about peer processes in the usnic 
BTL to include the peer's VPID, which is the MCW rank.  I'll be sad if that 
goes away...


On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:

> Ralph,
> 
> There are two reasons that prevent me from pushing this RFC forward.
> 
> 1. Minor: The code has some minor issues related to the last set of BTL/PML 
> changes, and I didn't find the time to fix them.
> 
> 2. Major: Not all BTLs have been updated and validated. What we need at this 
> point from their respective developers is a little help with the validation 
> process. We need to validate that the new code works as expected and passes 
> all tests.
> 
> The move will be ready to go as soon as all BTL developers raise the green 
> flag. I got it from Jeff (but the last USNIC commit broke something), and 
> myself. In other words, TCP, self, SM and USNIC are good to go. For the 
> others, as I didn't hear back from their developers/maintainers, I assume 
> they are not yet ready. Here I am referring to OpenIB, Portals4, Scif, 
> smcuda, ugni, usnic and vader.
> 
>   George.
> 
> PS: As a reminder the code is available at 
> https://bitbucket.org/bosilca/ompi-btl
> 
> 
> 
> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  wrote:
> Hi Folks,
> 
> No work is planned for the uGNI BTL at this time either.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Thursday, July 10, 2014 5:04 PM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure 
> in OPAL
> 
> FWIW: I can't speak for other BTL maintainers, but I'm out of the office for 
> the next week, and the usnic BTL will be standing still during that time.  
> Once I return, I will be making additional changes in the usnic BTL (new 
> features, updates, ...etc.).
> 
> So if you have the cycles, doing it in the next week or so would be good 
> because at least there will be no conflicts with usnic BTL concurrent 
> development.  :-)
> 
> 
> 
> 
> On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
> 
> > George: any update on when this will happen?
> >
> >
> > On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
> >
> >> WHAT: Open our low-level communication infrastructure by moving all
> >> necessary components
> >>  (btl/rcache/allocator/mpool) down in OPAL
> >>
> >> WHY: All the components required for inter-process communications are
> >> currently deeply integrated in the OMPI
> >> layer. Several groups/institutions have expressed interest
> >> in having a more generic communication
> >> infrastructure, without all the OMPI layer dependencies.
> >> This communication layer should be made
> >> available at a different software level, available to all
> >> layers in the Open MPI software stack. As an
> >> example, our ORTE layer could replace the current OOB and
> >> instead use the BTL directly, gaining
> >> access to more reactive network interfaces than TCP.
> >> Similarly, external software libraries could take
> >> advantage of our highly optimized AM (active message)
> >> communication layer for their own purpose.
> >>
> >> UTK with support from Sandia, developed a version of
> >> Open MPI where the entire communication
> >> infrastructure has been moved down to OPAL
> >> (btl/rcache/allocator/mpool). Most of the moved
> >> components have been updated to match the new schema,
> >> with few exceptions (mainly BTLs
> >> where I have no way of compiling/testing them). Thus, the
> >> completion of this RFC is tied to
> >> being able to complete this move for all BTLs. For this
> >> we need help from the rest of the Open MPI
> >> community, especially those supporting some of the BTLs.
> >> A non-exhaustive list of BTLs that
> >> qualify here is: mx, portals4, scif, udapl, ugni, usnic.
> >>
> >> WHERE:  bitbucket.org/bosilca/ompi-btl (updated today with respect to
> >> trunk r31952)
> >>
> >> TIMEOUT: After all the BTLs have been amended to match the new
> >> location and usage. We will discuss
> >> the last bits regarding this RFC at the Open MPI
> >> developers meeting in Chicago, June 24-26. The
> >> RFC will become final only after the meeting.
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/06/14974.php
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this pos

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Ralph Castain
You should be able to memcpy it to an ompi_process_name_t and then extract it 
as usual
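
A minimal sketch of that struct-copy idea. It assumes, purely for illustration, 
that the opaque OPAL name and ompi_process_name_t share the same in-memory layout 
(jobid followed by vpid); the stand-in typedefs below are not the real definitions 
from the OMPI headers:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t jobid; uint32_t vpid; } ompi_process_name_t; /* assumed layout */
typedef ompi_process_name_t opal_process_name_t;                       /* assumed: same bits */

static uint32_t peer_mcw_rank(const opal_process_name_t *opal_name)
{
    ompi_process_name_t name;
    memcpy(&name, opal_name, sizeof(name)); /* reinterpret as the familiar OMPI name */
    return name.vpid;                       /* VPID == MPI_COMM_WORLD rank for MCW procs */
}

int main(void)
{
    opal_process_name_t peer = { .jobid = 1, .vpid = 42 };
    printf("peer MCW rank = %u\n", peer_mcw_rank(&peer)); /* prints 42 */
    return 0;
}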


On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  wrote:

> George --
> 
> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
> 
> I am currently outputting some information about peer processes in the usnic 
> BTL to include the peer's VPID, which is the MCW rank.  I'll be sad if that 
> goes away...
> 
> 
> On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
> 
>> Ralph,
>> 
>> There are two reasons that prevent me from pushing this RFC forward.
>> 
>> 1. Minor: The code has some minor issues related to the last set of BTL/PML 
>> changes, and I didn't find the time to fix them.
>> 
>> 2. Major: Not all BTLs have been updated and validated. What we need at this 
>> point from their respective developers is a little help with the validation 
>> process. We need to validate that the new code works as expected and passes 
>> all tests.
>> 
>> The move will be ready to go as soon as all BTL developers raise the green 
>> flag. I got it from Jeff (but the last USNIC commit broke something), and 
>> myself. In other words, TCP, self, SM and USNIC are good to go. For the 
>> others, as I didn't hear back from their developers/maintainers, I assume 
>> they are not yet ready. Here I am referring to OpenIB, Portals4, Scif, 
>> smcuda, ugni, usnic and vader.
>> 
>>  George.
>> 
>> PS: As a reminder the code is available at 
>> https://bitbucket.org/bosilca/ompi-btl
>> 
>> 
>> 
>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
>> wrote:
>> Hi Folks,
>> 
>> No work is planned for the uGNI BTL at this time either.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
>> (jsquyres)
>> Sent: Thursday, July 10, 2014 5:04 PM
>> To: Open MPI Developers List
>> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
>> infrastructure in OPAL
>> 
>> FWIW: I can't speak for other BTL maintainers, but I'm out of the office for 
>> the next week, and the usnic BTL will be standing still during that time.  
>> Once I return, I will be making additional changes in the usnic BTL (new 
>> features, updates, ...etc.).
>> 
>> So if you have the cycles, doing it in the next week or so would be good 
>> because at least there will be no conflicts with usnic BTL concurrent 
>> development.  :-)
>> 
>> 
>> 
>> 
>> On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
>> 
>>> George: any update on when this will happen?
>>> 
>>> 
>>> On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
>>> 
 WHAT: Open our low-level communication infrastructure by moving all
 necessary components
 (btl/rcache/allocator/mpool) down in OPAL
 
 WHY: All the components required for inter-process communications are
 currently deeply integrated in the OMPI
 layer. Several groups/institutions have expressed interest
 in having a more generic communication
infrastructure, without all the OMPI layer dependencies.
 This communication layer should be made
available at a different software level, available to all
 layers in the Open MPI software stack. As an
example, our ORTE layer could replace the current OOB and
 instead use the BTL directly, gaining
access to more reactive network interfaces than TCP.
 Similarly, external software libraries could take
advantage of our highly optimized AM (active message)
 communication layer for their own purpose.
 
 UTK with support from Sandia, developed a version of
 Open MPI where the entire communication
 infrastructure has been moved down to OPAL
 (btl/rcache/allocator/mpool). Most of the moved
components have been updated to match the new schema,
 with few exceptions (mainly BTLs
where I have no way of compiling/testing them). Thus, the
 completion of this RFC is tied to
 being able to complete this move for all BTLs. For this
 we need help from the rest of the Open MPI
community, especially those supporting some of the BTLs.
 A non-exhaustive list of BTLs that
qualify here is: mx, portals4, scif, udapl, ugni, usnic.
 
 WHERE:  bitbucket.org/bosilca/ompi-btl (updated today with respect to
 trunk r31952)
 
 TIMEOUT: After all the BTLs have been amended to match the new
 location and usage. We will discuss
the last bits regarding this RFC at the Open MPI
 developers meeting in Chicago, June 24-26. The
RFC will become final only after the meeting.
 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post:
 http://www.open-mpi.org/communi

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Jeff Squyres (jsquyres)
Ralph and I chatted in IM.

For the moment, I'm masking off the lower 32 bits to get the VPID, the 
uppermost 16 as the job family, and the next 16 as the sub-family.

If George makes the name a handle with accessors to get the parts, we can 
switch to using that.
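
A small, self-contained illustration of that interim masking, assuming (as 
described above, but not verified against the tree) that the 64-bit name packs 
the 16-bit job family in the top bits, the 16-bit sub-family next, and the 
32-bit VPID at the bottom:

#include <stdint.h>
#include <stdio.h>

static uint32_t name_vpid(uint64_t name)       { return (uint32_t)(name & 0xffffffffULL); }
static uint16_t name_job_family(uint64_t name) { return (uint16_t)(name >> 48); }
static uint16_t name_sub_family(uint64_t name) { return (uint16_t)((name >> 32) & 0xffff); }

int main(void)
{
    /* Hypothetical name: family 0x1234, sub-family 0x0001, vpid 42. */
    uint64_t name = ((uint64_t)0x1234 << 48) | ((uint64_t)0x0001 << 32) | 42u;
    printf("family=0x%04x sub=0x%04x vpid=%u\n",
           (unsigned)name_job_family(name),
           (unsigned)name_sub_family(name),
           (unsigned)name_vpid(name));
    return 0;
}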



On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:

> You should be able to memcpy it to an ompi_process_name_t and then extract it 
> as usual
> 
> 
> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> George --
>> 
>> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
>> 
>> I am currently outputting some information about peer processes in the usnic 
>> BTL to include the peer's VPID, which is the MCW rank.  I'll be sad if that 
>> goes away...
>> 
>> 
>> On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
>> 
>>> Ralph,
>>> 
>>> There are two reasons that prevent me from pushing this RFC forward.
>>> 
>>> 1. Minor: The code has some minor issues related to the last set of BTL/PML 
>>> changes, and I didn't find the time to fix them.
>>> 
>>> 2. Major: Not all BTLs have been updated and validated. What we need at 
>>> this point from their respective developers is a little help with the 
>>> validation process. We need to validate that the new code works as expected 
>>> and passes all tests.
>>> 
>>> The move will be ready to go as soon as all BTL developers raise the green 
>>> flag. I got it from Jeff (but the last USNIC commit broke something), and 
>>> myself. In other words, TCP, self, SM and USNIC are good to go. For the 
>>> others, as I didn't hear back from their developers/maintainers, I assume 
>>> they are not yet ready. Here I am referring to OpenIB, Portals4, Scif, 
>>> smcuda, ugni, usnic and vader.
>>> 
>>> George.
>>> 
>>> PS: As a reminder the code is available at 
>>> https://bitbucket.org/bosilca/ompi-btl
>>> 
>>> 
>>> 
>>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
>>> wrote:
>>> Hi Folks,
>>> 
>>> No work is planned for the uGNI BTL at this time either.
>>> 
>>> Howard
>>> 
>>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
>>> (jsquyres)
>>> Sent: Thursday, July 10, 2014 5:04 PM
>>> To: Open MPI Developers List
>>> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
>>> infrastructure in OPAL
>>> 
>>> FWIW: I can't speak for other BTL maintainers, but I'm out of the office 
>>> for the next week, and the usnic BTL will be standing still during that 
>>> time.  Once I return, I will be making additional changes in the usnic BTL 
>>> (new features, updates, ...etc.).
>>> 
>>> So if you have the cycles, doing it in the next week or so would be good 
>>> because at least there will be no conflicts with usnic BTL concurrent 
>>> development.  :-)
>>> 
>>> 
>>> 
>>> 
>>> On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
>>> 
 George: any update on when this will happen?
 
 
 On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
 
> WHAT:Open our low-level communication infrastructure by moving all
> necessary components
>(btl/rcache/allocator/mpool) down in OPAL
> 
> WHY: All the components required for inter-process communications are
> currently deeply integrated in the OMPI
>   layer. Several groups/institutions have expressed interest
> in having a more generic communication
>   infrastructure, without all the OMPI layer dependencies.
> This communication layer should be made
>   available at a different software level, available to all
> layers in the Open MPI software stack. As an
>   example, our ORTE layer could replace the current OOB and
> instead use the BTL directly, gaining
>   access to more reactive network interfaces than TCP.
> Similarly, external software libraries could take
>   advantage of our highly optimized AM (active message)
> communication layer for their own purpose.
> 
>   UTK with support from Sandia, developed a version of
> Open MPI where the entire communication
>   infrastructure has been moved down to OPAL
> (btl/rcache/allocator/mpool). Most of the moved
>   components have been updated to match the new schema,
> with few exceptions (mainly BTLs
>   where I have no way of compiling/testing them). Thus, the
> completion of this RFC is tied to
>   being able to complete this move for all BTLs. For this
> we need help from the rest of the Open MPI
>   community, especially those supporting some of the BTLs.
> A non-exhaustive list of BTLs that
>   qualify here is: mx, portals4, scif, udapl, ugni, usnic.
> 
> WHERE:  bitbucket.org/bosilca/ompi-btl (updated today with respect to
> trunk r31952)
> 
> TIMEOUT: After all the BTLs have been amended to match the new
> location and

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread George Bosilca
I was struggling with a similar issue while trying to fix the OpenIB 
compilation. And I chose to implement a different approach, which does not 
require knowledge of what’s inside opal_process_name_t.

Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid and 
opal_process_name_jobid. They will remain there until we figure out a nice way 
to get rid of them completely.

HINT: I personally prefer to get rid of vpid and jobid completely. As long as 
we need the info only for a visual clue, the output of OPAL_NAME_PRINT might be 
enough.
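
A hedged sketch of what using those accessors for a debug message might look
like; the exact signatures are whatever opal/util/proc.h declares, and here
they are assumed to take an opal_process_name_t and return 32-bit ids:

    #include "opal/util/proc.h"   /* assumed to declare the accessors */
    #include <stdio.h>

    /* Illustrative only: print a peer's jobid/vpid given its opal-level
     * name.  In the common single-job case the vpid matches the MCW rank. */
    static void print_peer(opal_process_name_t name)
    {
        printf("peer jobid=%u vpid=%u\n",
               (unsigned) opal_process_name_jobid(name),
               (unsigned) opal_process_name_vpid(name));
    }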

  George.

On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres)  wrote:

> Ralph and I chatted in IM.
> 
> For the moment, I'm masking off the lower 32 bits to get the VPID, the 
> uppermost 16 as the job family, and the next 16 as the sub-family.
> 
> If George makes the name be a handle with accessors to get the parts, we can 
> switch to using that.
> 
> 
> 
> On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:
> 
>> You should be able to memcpy it to an ompi_process_name_t and then extract 
>> it as usual
>> 
>> 
>> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>>> George --
>>> 
>>> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
>>> 
>>> I am currently outputting some information about peer processes in the 
>>> usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be sad 
>>> if that goes away...
>>> 
>>> 
>>> On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
>>> 
 Ralph,
 
 There are two reasons that prevent me from pushing this RFC forward.
 
 1. Minor: The code has some minor issues related to the last set of 
BTL/PML changes, and I didn't find the time to fix them.
 
 2. Major: Not all BTLs have been updated and validated. What we need at 
 this point from their respective developers is a little help with the 
 validation process. We need to validate that the new code works as 
 expected and passes all tests.
 
 The move will be ready to go as soon as all BTL developers raise the green 
 flag. I got it from Jeff (but the last USNIC commit broke something), and 
 myself. In other words, TCP, self, SM and USNIC are good to go. For the 
others, as I didn't hear back from their developers/maintainers, I assume 
 they are not yet ready. Here I am referring to OpenIB, Portals4, Scif, 
 smcuda, ugni, usnic and vader.
 
 George.
 
 PS: As a reminder the code is available at 
 https://bitbucket.org/bosilca/ompi-btl
 
 
 
 On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
 wrote:
 Hi Folks,
 
No work is planned for the uGNI BTL at this time either.
 
 Howard
 
 
 -Original Message-
 From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
 (jsquyres)
 Sent: Thursday, July 10, 2014 5:04 PM
 To: Open MPI Developers List
 Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
 infrastructure in OPAL
 
 FWIW: I can't speak for other BTL maintainers, but I'm out of the office 
 for the next week, and the usnic BTL will be standing still during that 
 time.  Once I return, I will be making additional changes in the usnic BTL 
 (new features, updates, ...etc.).
 
 So if you have the cycles, doing it in the next week or so would be good 
 because at least there will be no conflicts with usnic BTL concurrent 
 development.  :-)
 
 
 
 
 On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
 
> George: any update on when this will happen?
> 
> 
> On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
> 
>> WHAT:Open our low-level communication infrastructure by moving all
>> necessary components
>>   (btl/rcache/allocator/mpool) down in OPAL
>> 
>> WHY: All the components required for inter-process communications are
>> currently deeply integrated in the OMPI
>>  layer. Several groups/institutions have expressed interest
>> in having a more generic communication
>>  infrastructure, without all the OMPI layer dependencies.
>> This communication layer should be made
>>  available at a different software level, available to all
>> layers in the Open MPI software stack. As an
>>  example, our ORTE layer could replace the current OOB and
>> instead use the BTL directly, gaining
>>  access to more reactive network interfaces than TCP.
>> Similarly, external software libraries could take
>>  advantage of our highly optimized AM (active message)
>> communication layer for their own purpose.
>> 
>>  UTK with support from Sandia, developed a version of
>> Open MPI where the entire communication
>>  infrastructure has been moved down to OPAL
>> (btl/rcache/allocator/mpo

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Jeff Squyres (jsquyres)
Sweet; I'll have a look at all of that -- thanks.

On Jul 23, 2014, at 10:15 PM, George Bosilca  wrote:

> I was struggling with a similar issue while trying to fix the OpenIB 
> compilation. And I chose to implement a different approach, which does not 
> require knowledge of what’s inside opal_process_name_t.
> 
> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid 
> and opal_process_name_jobid. They will remain there until we figure out a 
> nice way to get rid of them completely.
> 
> HINT: I personally prefer to get rid of vpid and jobid completely. As long as 
> we need the info only for a visual clue, the output of OPAL_NAME_PRINT might be 
> enough.
> 
>  George.
> 
> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres)  
> wrote:
> 
>> Ralph and I chatted in IM.
>> 
>> For the moment, I'm masking off the lower 32 bits to get the VPID, the 
>> uppermost 16 as the job family, and the next 16 as the sub-family.
>> 
>> If George makes the name be a handle with accessors to get the parts, we can 
>> switch to using that.
>> 
>> 
>> 
>> On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:
>> 
>>> You should be able to memcpy it to an ompi_process_name_t and then extract 
>>> it as usual
>>> 
>>> 
>>> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
 George --
 
 Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
 
 I am currently outputting some information about peer processes in the 
 usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be sad 
 if that goes away...
 
 
 On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
 
> Ralph,
> 
> There are two reasons that prevent me from pushing this RFC forward.
> 
> 1. Minor: The code has some minor issues related to the last set of 
> BTL/PML changes, and I didn't find the time to fix them.
> 
> 2. Major: Not all BTLs have been updated and validated. What we need at 
> this point from their respective developers is a little help with the 
> validation process. We need to validate that the new code works as 
> expected and passes all tests.
> 
> The move will be ready to go as soon as all BTL developers raise the 
> green flag. I got it from Jeff (but the last USNIC commit broke 
> something), and myself. In other words, TCP, self, SM and USNIC are good 
> to go. For the others, as I didn't hear back from their 
> developers/maintainers, I assume they are not yet ready. Here I am 
> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
> 
> George.
> 
> PS: As a reminder the code is available at 
> https://bitbucket.org/bosilca/ompi-btl
> 
> 
> 
> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
> wrote:
> Hi Folks,
> 
> No work is planned for the uGNI BTL at this time either.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Thursday, July 10, 2014 5:04 PM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
> infrastructure in OPAL
> 
> FWIW: I can't speak for other BTL maintainers, but I'm out of the office 
> for the next week, and the usnic BTL will be standing still during that 
> time.  Once I return, I will be making additional changes in the usnic 
> BTL (new features, updates, ...etc.).
> 
> So if you have the cycles, doing it in the next week or so would be good 
> because at least there will be no conflicts with usnic BTL concurrent 
> development.  :-)
> 
> 
> 
> 
> On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
> 
>> George: any update on when this will happen?
>> 
>> 
>> On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
>> 
>>> WHAT:Open our low-level communication infrastructure by moving all
>>> necessary components
>>>  (btl/rcache/allocator/mpool) down in OPAL
>>> 
>>> WHY: All the components required for inter-process communications are
>>> currently deeply integrated in the OMPI
>>> layer. Several groups/institutions have expressed interest
>>> in having a more generic communication
>>> infrastructure, without all the OMPI layer dependencies.
>>> This communication layer should be made
>>> available at a different software level, available to all
>>> layers in the Open MPI software stack. As an
>>> example, our ORTE layer could replace the current OOB and
>>> instead use the BTL directly, gaining
>>> access to more reactive network interfaces than TCP.
>>> Similarly, external software libraries could take
>>> advantage of our highly optimized AM (active message)
>>> communication la

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Ralph Castain
Sounds reasonable. However, keep in mind that some BTLs actually require the 
notion of a jobid and rank-within-that-job. If the current ones don't, I assure 
you that at least one off-trunk one definitely does.

Some of the MTLs, of course, definitely rely on those fields.


On Jul 23, 2014, at 7:15 PM, George Bosilca  wrote:

> I was struggling with a similar issue while trying to fix the OpenIB 
> compilation. And I chose to implement a different approach, which does not 
> require knowledge of what’s inside opal_process_name_t.
> 
> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid 
> and opal_process_name_jobid. They will remain there until we figure out a 
> nice way to get rid of them completely.
> 
> HINT: I personally prefer to get rid of vpid and jobid completely. As long as 
> we need the info only for a visual clue, the output of OPAL_NAME_PRINT might be 
> enough.
> 
>  George.
> 
> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres)  
> wrote:
> 
>> Ralph and I chatted in IM.
>> 
>> For the moment, I'm masking off the lower 32 bits to get the VPID, the 
>> uppermost 16 as the job family, and the next 16 as the sub-family.
>> 
>> If George makes the name be a handle with accessors to get the parts, we can 
>> switch to using that.
>> 
>> 
>> 
>> On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:
>> 
>>> You should be able to memcpy it to an ompi_process_name_t and then extract 
>>> it as usual
>>> 
>>> 
>>> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
 George --
 
 Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
 
 I am currently outputting some information about peer processes in the 
 usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be sad 
 if that goes away...
 
 
 On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
 
> Ralph,
> 
> There are two reasons that prevent me from pushing this RFC forward.
> 
> 1. Minor: The code has some minor issues related to the last set of 
> BTL/PML changes, and I didn't find the time to fix them.
> 
> 2. Major: Not all BTLs have been updated and validated. What we need at 
> this point from their respective developers is a little help with the 
> validation process. We need to validate that the new code works as 
> expected and passes all tests.
> 
> The move will be ready to go as soon as all BTL developers raise the 
> green flag. I got it from Jeff (but the last USNIC commit broke 
> something), and myself. In other words, TCP, self, SM and USNIC are good 
> to go. For the others, as I didn't hear back from their 
> developers/maintainers, I assume they are not yet ready. Here I am 
> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
> 
> George.
> 
> PS: As a reminder the code is available at 
> https://bitbucket.org/bosilca/ompi-btl
> 
> 
> 
> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
> wrote:
> Hi Folks,
> 
> No work is planned for the uGNI BTL at this time either.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Thursday, July 10, 2014 5:04 PM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
> infrastructure in OPAL
> 
> FWIW: I can't speak for other BTL maintainers, but I'm out of the office 
> for the next week, and the usnic BTL will be standing still during that 
> time.  Once I return, I will be making additional changes in the usnic 
> BTL (new features, updates, ...etc.).
> 
> So if you have the cycles, doing it in the next week or so would be good 
> because at least there will be no conflicts with usnic BTL concurrent 
> development.  :-)
> 
> 
> 
> 
> On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:
> 
>> George: any update on when this will happen?
>> 
>> 
>> On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
>> 
>>> WHAT:Open our low-level communication infrastructure by moving all
>>> necessary components
>>>  (btl/rcache/allocator/mpool) down in OPAL
>>> 
>>> WHY: All the components required for inter-process communications are
>>> currently deeply integrated in the OMPI
>>> layer. Several groups/institutions have expressed interest
>>> in having a more generic communication
>>> infrastructure, without all the OMPI layer dependencies.
>>> This communication layer should be made
>>> available at a different software level, available to all
>>> layers in the Open MPI software stack. As an
>>> example, our ORTE layer could replace the current OOB and
>>> instead use the BTL directly, gain

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread George Bosilca
A BTL should be completely agnostic to the notions of vpid and jobid.  
Unfortunately, as you mentioned, some of the BTLs are relying on this 
information in diverse ways.

- If they rely on it for output purposes, this is a trivial matter, as a BTL is 
supposed to relay any error upward and some upper layer will decide how to 
handle it. As the callers are in the OMPI layer, they can output a meaningful 
message (including the rank and whatnot).

- Some other BTLs use this information to create connections. Clearly not the 
best decision, as it has bitten us for quite some time (for example, it is the 
major reason preventing SM support across different MPI worlds). Moreover, other 
programming paradigms that can use the BTLs are not subject to a rank-based 
concept. Thus, this usage should be banned and replaced by a more sensible 
approach (to be defined). Until then, the current solution provides an 
acceptable band-aid.
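
Purely as an illustration of that direction (hypothetical types, not an OMPI
interface), a rank-agnostic BTL could key its connection setup on an opaque,
self-describing address blob exchanged out of band, so it never interprets
jobids or vpids:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical address blob: whatever the BTL needs to reach a peer.
     * Upper layers move it around opaquely (e.g. via the modex). */
    typedef struct {
        uint8_t  addr[16];   /* network address in a BTL-specific format */
        uint16_t port;       /* listening endpoint                       */
        uint64_t nonce;      /* random tag used to pair connect/accept   */
    } btl_addr_blob_t;

    /* Deciding whether an incoming connection belongs to a known peer by
     * comparing blobs, never jobid/vpid. */
    static int blob_matches(const btl_addr_blob_t *a, const btl_addr_blob_t *b)
    {
        return a->port == b->port && a->nonce == b->nonce &&
               memcmp(a->addr, b->addr, sizeof(a->addr)) == 0;
    }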

George. 

PS: The PML and MTL remaining at the OMPI layer do not create any issues with 
accessing the local or the MPI rank.

On Jul 23, 2014, at 22:19 , Ralph Castain  wrote:

> Sounds reasonable. However, keep in mind that some BTLs actually require the 
> notion of a jobid and rank-within-that-job. If the current ones don't, I 
> assure you that at least one off-trunk one definitely does
> 
> Some of the MTL's, of course, definitely rely on those fields.
> 
> 
> On Jul 23, 2014, at 7:15 PM, George Bosilca  wrote:
> 
>> I was struggling with a similar issue while trying to fix the OpenIB 
>> compilation. And I chose to implement a different approach, which does not 
>> require knowledge of what’s inside opal_process_name_t.
>> 
>> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid 
>> and opal_process_name_jobid. They will remain there until we figure out a 
>> nice way to get rid of them completely.
>> 
>> HINT: I personally prefer to get rid of vpid and jobid completely. As long 
>> as we need the info only for a visual clue, the output of OPAL_NAME_PRINT might 
>> be enough.
>> 
>> George.
>> 
>> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres)  
>> wrote:
>> 
>>> Ralph and I chatted in IM.
>>> 
>>> For the moment, I'm masking off the lower 32 bits to get the VPID, the 
>>> uppermost 16 as the job family, and the next 16 as the sub-family.
>>> 
>>> If George makes the name be a handle with accessors to get the parts, we 
>>> can switch to using that.
>>> 
>>> 
>>> 
>>> On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:
>>> 
 You should be able to memcpy it to an ompi_process_name_t and then extract 
 it as usual
 
 
 On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
 wrote:
 
> George --
> 
> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
> 
> I am currently outputting some information about peer processes in the 
> usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be sad 
> if that goes away...
> 
> 
> On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
> 
>> Ralph,
>> 
>> There are two reasons that prevent me from pushing this RFC forward.
>> 
>> 1. Minor: The code has some minor issues related to the last set of 
>> BTL/PML changes, and I didn't find the time to fix them.
>> 
>> 2. Major: Not all BTLs have been updated and validated. What we need at 
>> this point from their respective developers is a little help with the 
>> validation process. We need to validate that the new code works as 
>> expected and passes all tests.
>> 
>> The move will be ready to go as soon as all BTL developers raise the 
>> green flag. I got it from Jeff (but the last USNIC commit broke 
>> something), and myself. In other words, TCP, self, SM and USNIC are good 
>> to go. For the others, as I didn't hear back from their 
>> developers/maintainers, I assume they are not yet ready. Here I am 
>> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
>> 
>> George.
>> 
>> PS: As a reminder the code is available at 
>> https://bitbucket.org/bosilca/ompi-btl
>> 
>> 
>> 
>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
>> wrote:
>> Hi Folks,
>> 
>> No work is planned for the uGNI BTL at this time either.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff 
>> Squyres (jsquyres)
>> Sent: Thursday, July 10, 2014 5:04 PM
>> To: Open MPI Developers List
>> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
>> infrastructure in OPAL
>> 
>> FWIW: I can't speak for other BTL maintainers, but I'm out of the office 
>> for the next week, and the usnic BTL will be standing still during that 
>> time.  Once I return, I will be making additional changes in the usnic 
>> BTL (new features, updat

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread Ralph Castain
I agree with the goal - we'll have to work this out at a later time. One key 
will be maintaining a memory-efficient mapping of opal_identifier to an RTE 
identifier, which typically requires some notion of launch grouping and rank 
within that grouping.
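
For illustration only (the names and layout below are hypothetical, not OMPI
code), such a mapping could be as simple as a small open-addressed table from
a 64-bit opal identifier to an RTE-level (jobid, vpid) pair, kept compact as
two flat arrays:

    #include <stdint.h>

    typedef struct { uint32_t jobid; uint32_t vpid; } rte_id_t;

    #define MAP_SLOTS 1024                 /* power of two for cheap masking */

    /* Assumes the map is zero-initialized and that 0 is never a valid
     * opal identifier (an assumption made for this sketch). */
    typedef struct {
        uint64_t key[MAP_SLOTS];           /* 0 means "empty slot" */
        rte_id_t val[MAP_SLOTS];
    } id_map_t;

    static int id_map_put(id_map_t *m, uint64_t opal_id, rte_id_t rte_id)
    {
        for (unsigned i = 0; i < MAP_SLOTS; i++) {
            unsigned s = (unsigned)((opal_id + i) & (MAP_SLOTS - 1));
            if (m->key[s] == 0 || m->key[s] == opal_id) {   /* empty or same key */
                m->key[s] = opal_id;
                m->val[s] = rte_id;
                return 0;
            }
        }
        return -1;                         /* table full */
    }

    static const rte_id_t *id_map_get(const id_map_t *m, uint64_t opal_id)
    {
        for (unsigned i = 0; i < MAP_SLOTS; i++) {
            unsigned s = (unsigned)((opal_id + i) & (MAP_SLOTS - 1));
            if (m->key[s] == opal_id) return &m->val[s];
            if (m->key[s] == 0)       return NULL;          /* not present */
        }
        return NULL;
    }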


On Jul 23, 2014, at 7:36 PM, George Bosilca  wrote:

> A BTL should be completely agnostic to the notions of vpid and jobid.  
> Unfortunately, as you mentioned, some of the BTLs are relying on this 
> information in diverse ways.
> 
> - If they rely on it for output purposes, this is a trivial matter, as a BTL is 
> supposed to relay any error upward and some upper layer will decide how to 
> handle it. As the callers are in the OMPI layer, they can output a 
> meaningful message (including the rank and whatnot).
> 
> - Some other BTLs use this information to create connections. Clearly not the 
> best decision, as it has bitten us for quite some time (for example, it is the 
> major reason preventing SM support across different MPI worlds). Moreover, 
> other programming paradigms that can use the BTLs are not subject to a 
> rank-based concept. Thus, this usage should be banned and replaced by a more 
> sensible approach (to be defined). Until then, the current solution provides 
> an acceptable band-aid.
> 
> George. 
> 
> PS: The PML and MTL remaining at the OMPI layer do not create any issues with 
> accessing the local or the MPI rank.
> 
> On Jul 23, 2014, at 22:19 , Ralph Castain  wrote:
> 
>> Sounds reasonable. However, keep in mind that some BTLs actually require the 
>> notion of a jobid and rank-within-that-job. If the current ones don't, I 
>> assure you that at least one off-trunk one definitely does
>> 
>> Some of the MTL's, of course, definitely rely on those fields.
>> 
>> 
>> On Jul 23, 2014, at 7:15 PM, George Bosilca  wrote:
>> 
>>> I was struggling with a similar issue while trying to fix the OpenIB 
>>> compilation. And I chose to implement a different approach, which does not 
>>> require knowledge of what’s inside opal_process_name_t.
>>> 
>>> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid 
>>> and opal_process_name_jobid. They will remain there until we figure out a 
>>> nice way to get rid of them completely.
>>> 
>>> HINT: I personally prefer to get rid of vpid and jobid completely. As long 
>>> as we need the info only for a visual clue, the output of OPAL_NAME_PRINT 
>>> might be enough.
>>> 
>>> George.
>>> 
>>> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
 Ralph and I chatted in IM.
 
 For the moment, I'm masking off the lower 32 bits to get the VPID, the 
 uppermost 16 as the job family, and the next 16 as the sub-family.
 
 If George makes the name be a handle with accessors to get the parts, we 
 can switch to using that.
 
 
 
 On Jul 23, 2014, at 9:57 PM, Ralph Castain  wrote:
 
> You should be able to memcpy it to an ompi_process_name_t and then 
> extract it as usual
> 
> 
> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> George --
>> 
>> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
>> 
>> I am currently outputting some information about peer processes in the 
>> usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be 
>> sad if that goes away...
>> 
>> 
>> On Jul 15, 2014, at 2:06 AM, George Bosilca  wrote:
>> 
>>> Ralph,
>>> 
>>> There are two reasons that prevent me from pushing this RFC forward.
>>> 
>>> 1. Minor: The code has some minor issues related to the last set of 
>>> BTL/PML changes, and I didn't find the time to fix them.
>>> 
>>> 2. Major: Not all BTLs have been updated and validated. What we need at 
>>> this point from their respective developers is a little help with the 
>>> validation process. We need to validate that the new code works as 
>>> expected and passes all tests.
>>> 
>>> The move will be ready to go as soon as all BTL developers raise the 
>>> green flag. I got it from Jeff (but the last USNIC commit broke 
>>> something), and myself. In other words, TCP, self, SM and USNIC are 
>>> good to go. For the others, as I didn't hear back from their 
>>> developers/maintainers, I assume they are not yet ready. Here I am 
>>> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
>>> 
>>> George.
>>> 
>>> PS: As a reminder the code is available at 
>>> https://bitbucket.org/bosilca/ompi-btl
>>> 
>>> 
>>> 
>>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P  
>>> wrote:
>>> Hi Folks,
>>> 
>>> No work is planned for the uGNI BTL at this time either.
>>> 
>>> Howard
>>> 
>>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff 
>>> Squyres (jsquyres)
>>> S