Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Howard Pritchard
thanks Ralph.  I will add it to one of the UH jenkins scripts.

--

sent from my smart phone so no good typing.

Howard
On Sep 16, 2015 10:28 PM, "Ralph Castain"  wrote:

> Actually, Edgar attached a simple reproducer to the first message in this
> thread.
>
>
> On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard 
> wrote:
>
>> Edgar
>>
>> Do you have a simple test we could run with jenkins ghprb that would
>> catch this going forward?
>>
>> I could add it to some of the checks we run on your UH slave node.
>>
>> Howard
>>
>> --
>>
>> sent from my smart phone so no good typing.
>>
>> Howard
>> On Sep 16, 2015 12:36 PM, "Nathan Hjelm"  wrote:
>>
>>>
>>> I see the problem. Before my changes ompi_comm_dup signalled that the
>>> communicator was not an inter-communicator by setting remote_size to
>>> 0. The remote size is now from the remote group if one was supplied
>>> (which is the case with intra-communicators) so ompi_comm_dup needs to
>>> make sure NULL is passed for the remote_group when duplicating
>>> intra-communicators.
>>>
>>> I opened a PR. Once jenkins finishes I will merge it onto master.
>>>
>>> -Nathan
>>>
>>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>>> > yes, I did a fresh pull this morning; for me it deadlocks reliably for
>>> > 2 or more processes.
>>> >
>>> > Thanks
>>> > Edgar
>>> >
>>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>>> > >
>>> > >The reproducer is working for me with master on OS X 10.10. Some changes
>>> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
>>> > >
>>> > >-Nathan
>>> > >
>>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>>> > >>something is borked right now on master in the management of inter
>>> vs. intra
>>> > >>communicators. It looks like intra communicators are wrongly
>>> selecting the
>>> > >>inter coll module thinking that it is an inter communicator, and we
>>> have
>>> > >>hangs because of that. I attach a small reproducer, where a bcast of
>>> a
>>> > >>duplicate of MPI_COMM_WORLD hangs, because the inter collective
>>> module is
>>> > >>being selected.
>>> > >>
>>> > >>Thanks
>>> > >>Edgar
>>> > >
>>> > >>#include <stdio.h>
>>> > >>#include "mpi.h"
>>> > >>
>>> > >>int main( int argc, char *argv[] )
>>> > >>{
>>> > >>   MPI_Comm comm1;
>>> > >>   int root=0;
>>> > >>   int rank2, size2, global_buf=1;
>>> > >>   int rank, size;
>>> > >>
>>> > >>   MPI_Init ( &argc, &argv );
>>> > >>
>>> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>>> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
>>> > >>
>>> > >>/* Setting up a new communicator */
>>> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>>> > >>
>>> > >>   MPI_Comm_size ( comm1, &size2 );
>>> > >>   MPI_Comm_rank ( comm1, &rank2 );
>>> > >>
>>> > >>
>>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>>> > >>   if ( rank == root ) {
>>> > >>   printf("Bcast on MPI_COMM_WORLD finished\n");
>>> > >>   }
>>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>>> > >>   if ( rank == root ) {
>>> > >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>>> > >>   }
>>> > >>
>>> > >>   MPI_Comm_free ( &comm1 );
>>> > >>
>>> > >>   MPI_Finalize ();
>>> > >>   return ( 0 );
>>> > >>}
>>> > >
>>> > >>___
>>> > >>devel mailing list
>>> > >>de...@open-mpi.org
>>> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > >>Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
>>> > >
>>> > >
>>> > >
>>> > >___
>>> > >devel mailing list
>>> > >de...@open-mpi.org
>>> > >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > >Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
>>> > >
>>> > ___
>>> > devel mailing list
>>> > de...@open-mpi.org
>>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18049.php
>>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18057.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18059.php
>


Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Ralph Castain
Actually, Edgar attached a simple reproducer to the first message in this
thread.


On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard 
wrote:

> Edgar
>
> Do you have a simple test we could run with jenkins ghprb that would catch
> this going forward?
>
> I could add it to some of the checks we run on your UH slave node.
>
> Howard
>
> --
>
> sent from my smart phone so no good typing.
>
> Howard
> On Sep 16, 2015 12:36 PM, "Nathan Hjelm"  wrote:
>
>>
>> I see the problem. Before my changes ompi_comm_dup signalled that the
>> communicator was not an inter-communicator by setting remote_size to
>> 0. The remote size is now from the remote group if one was supplied
>> (which is the case with intra-communicators) so ompi_comm_dup needs to
>> make sure NULL is passed for the remote_group when duplicating
>> intra-communicators.
>>
>> I opened a PR. Once jenkins finishes I will merge it onto master.
>>
>> -Nathan
>>
>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>> > yes, I did a fresh pull this morning; for me it deadlocks reliably for
>> > 2 or more processes.
>> >
>> > Thanks
>> > Edgar
>> >
>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>> > >
>> > >The reproducer is working for me with master on OS X 10.10. Some changes
>> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
>> > >
>> > >-Nathan
>> > >
>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>> > >>something is borked right now on master in the management of inter
>> vs. intra
>> > >>communicators. It looks like intra communicators are wrongly
>> selecting the
>> > >>inter coll module thinking that it is an inter communicator, and we
>> have
>> > >>hangs because of that. I attach a small reproducer, where a bcast of a
>> > >>duplicate of MPI_COMM_WORLD hangs, because the inter collective
>> module is
>> > >>being selected.
>> > >>
>> > >>Thanks
>> > >>Edgar
>> > >
>> > >>#include <stdio.h>
>> > >>#include "mpi.h"
>> > >>
>> > >>int main( int argc, char *argv[] )
>> > >>{
>> > >>   MPI_Comm comm1;
>> > >>   int root=0;
>> > >>   int rank2, size2, global_buf=1;
>> > >>   int rank, size;
>> > >>
>> > >>   MPI_Init ( &argc, &argv );
>> > >>
>> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
>> > >>
>> > >>/* Setting up a new communicator */
>> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>> > >>
>> > >>   MPI_Comm_size ( comm1, &size2 );
>> > >>   MPI_Comm_rank ( comm1, &rank2 );
>> > >>
>> > >>
>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>> > >>   if ( rank == root ) {
>> > >>   printf("Bcast on MPI_COMM_WORLD finished\n");
>> > >>   }
>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>> > >>   if ( rank == root ) {
>> > >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>> > >>   }
>> > >>
>> > >>   MPI_Comm_free ( &comm1 );
>> > >>
>> > >>   MPI_Finalize ();
>> > >>   return ( 0 );
>> > >>}
>> > >
>> > >>___
>> > >>devel mailing list
>> > >>de...@open-mpi.org
>> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > >>Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
>> > >
>> > >
>> > >
>> > >___
>> > >devel mailing list
>> > >de...@open-mpi.org
>> > >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > >Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
>> > >
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18049.php
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18057.php
>


Re: [OMPI devel] --enable-spare-groups build broken

2015-09-16 Thread Ralph Castain
Yes - Nathan made some changes related to the add_procs code. I doubt that
configure option was checked...

On Wed, Sep 16, 2015 at 7:13 PM, Jeff Squyres (jsquyres)  wrote:

> Did something change in the group structure in the last 24-48 hours?
>
> --enable-spare-groups groups are currently broken:
>
> 
> make[2]: Entering directory `/home/jsquyres/git/ompi/ompi/debuggers'
>   CC   libdebuggers_la-ompi_debuggers.lo
> In file included from ../../ompi/communicator/communicator.h:38:0,
>  from ../../ompi/mca/pml/base/pml_base_request.h:32,
>  from ompi_debuggers.c:67:
> ../../ompi/group/group.h: In function ‘ompi_group_get_proc_ptr’:
> ../../ompi/group/group.h:366:52: error: ‘peer_id’ undeclared (first use in
> this function)
>  return ompi_group_dense_lookup (group, peer_id, allocate);
> ^
> ../../ompi/group/group.h:366:52: note: each undeclared identifier is
> reported only once for each function it appears in
> -
>
> Can someone have a look?
>
> Thanks.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18056.php


Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Howard Pritchard
Edgar

Do you have a simple test we could run with jenkins ghprb that would catch
this going forward?

I could add it to some of the checks we run on your UH slave node.

Howard

--

sent from my smart phone so no good typing.

Howard
On Sep 16, 2015 12:36 PM, "Nathan Hjelm"  wrote:

>
> I see the problem. Before my changes ompi_comm_dup signalled that the
> communicator was not an inter-communicator by setting remote_size to
> 0. The remote size is now from the remote group if one was supplied
> (which is the case with intra-communicators) so ompi_comm_dup needs to
> make sure NULL is passed for the remote_group when duplicating
> intra-communicators.
>
> I opened a PR. Once jenkins finishes I will merge it onto master.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> > yes, I did a fresh pull this morning; for me it deadlocks reliably for
> > 2 or more processes.
> >
> > Thanks
> > Edgar
> >
> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> > >
> > >The reproducer is working for me with master on OS X 10.10. Some changes
> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
> > >
> > >-Nathan
> > >
> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> > >>something is borked right now on master in the management of inter vs.
> intra
> > >>communicators. It looks like intra communicators are wrongly selecting
> the
> > >>inter coll module thinking that it is an inter communicator, and we
> have
> > >>hangs because of that. I attach a small reproducer, where a bcast of a
> > >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module
> is
> > >>being selected.
> > >>
> > >>Thanks
> > >>Edgar
> > >
> > >>#include <stdio.h>
> > >>#include "mpi.h"
> > >>
> > >>int main( int argc, char *argv[] )
> > >>{
> > >>   MPI_Comm comm1;
> > >>   int root=0;
> > >>   int rank2, size2, global_buf=1;
> > >>   int rank, size;
> > >>
> > >>   MPI_Init ( &argc, &argv );
> > >>
> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> > >>
> > >>/* Setting up a new communicator */
> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> > >>
> > >>   MPI_Comm_size ( comm1, &size2 );
> > >>   MPI_Comm_rank ( comm1, &rank2 );
> > >>
> > >>
> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> > >>   if ( rank == root ) {
> > >>   printf("Bcast on MPI_COMM_WORLD finished\n");
> > >>   }
> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> > >>   if ( rank == root ) {
> > >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> > >>   }
> > >>
> > >>   MPI_Comm_free ( &comm1 );
> > >>
> > >>   MPI_Finalize ();
> > >>   return ( 0 );
> > >>}
> > >
> > >>___
> > >>devel mailing list
> > >>de...@open-mpi.org
> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >>Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> > >
> > >
> > >
> > >___
> > >devel mailing list
> > >de...@open-mpi.org
> > >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> > >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18049.php
>


[OMPI devel] --enable-spare-groups build broken

2015-09-16 Thread Jeff Squyres (jsquyres)
Did something change in the group structure in the last 24-48 hours?

--enable-spare-groups groups are currently broken:


make[2]: Entering directory `/home/jsquyres/git/ompi/ompi/debuggers'
  CC   libdebuggers_la-ompi_debuggers.lo
In file included from ../../ompi/communicator/communicator.h:38:0,
 from ../../ompi/mca/pml/base/pml_base_request.h:32,
 from ompi_debuggers.c:67:
../../ompi/group/group.h: In function ‘ompi_group_get_proc_ptr’:
../../ompi/group/group.h:366:52: error: ‘peer_id’ undeclared (first use in this 
function)
 return ompi_group_dense_lookup (group, peer_id, allocate);
^
../../ompi/group/group.h:366:52: note: each undeclared identifier is reported 
only once for each function it appears in 
-

Can someone have a look?

Thanks.
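The undeclared-identifier error above is the classic shape of a renamed
parameter: the inline function's body in group.h still references the old
name. A minimal sketch of the pattern and its fix (every name here except
ompi_group_dense_lookup and peer_id is a hypothetical stand-in, not the real
Open MPI code):

```c
#include <assert.h>

/* Stand-in for ompi_group_dense_lookup(): combines its inputs so the call
 * site below is checkable. */
static int dense_lookup(int group, int peer_id, int allocate) {
    return group * 100 + peer_id * 10 + allocate;
}

/* Broken shape (does not compile): the parameter was renamed, say to
 * "rank", but the body still uses the old name:
 *
 *   static int get_proc_ptr(int group, int rank, int allocate) {
 *       return dense_lookup(group, peer_id, allocate);  // 'peer_id' undeclared
 *   }
 */

/* Fixed shape: the body uses the parameter that is actually in scope. */
static int get_proc_ptr(int group, int rank, int allocate) {
    return dense_lookup(group, rank, allocate);
}
```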

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread George Bosilca
On Wed, Sep 16, 2015 at 3:11 PM, Владимир Трущин 
wrote:

> Sorry, “We saw the following problem in OMPI_FREE_LIST_GET_MT…”.
>

That's exactly what the WAIT macro is supposed to solve: wait (grow the
freelist and call opal_progress) until an item becomes available.

  George.
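The GET/WAIT distinction can be sketched with a self-contained toy free list
(the toy_* names are illustrative stand-ins, not the real ompi_free_list_t
API): GET makes a single grow attempt and may hand back NULL, while WAIT
keeps growing and retrying (the real macro would also call opal_progress)
until an item is available.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the free list internals (illustrative only). */
typedef struct toy_item { struct toy_item *next; } toy_item_t;
typedef struct {
    toy_item_t *head;   /* LIFO of free items */
    int grow_budget;    /* how many more items the allocator may create */
    int grow_step;      /* items added per grow, like fl_num_per_alloc */
} toy_free_list_t;

static void toy_push(toy_free_list_t *fl, toy_item_t *it) {
    it->next = fl->head;
    fl->head = it;
}

static toy_item_t *toy_pop(toy_free_list_t *fl) {
    toy_item_t *it = fl->head;
    if (it != NULL) fl->head = it->next;
    return it;
}

/* Like ompi_free_list_grow(): add up to grow_step items, or fail. */
static int toy_grow(toy_free_list_t *fl) {
    if (fl->grow_budget <= 0) return -1;   /* "temp out of resource" */
    for (int i = 0; i < fl->grow_step && fl->grow_budget > 0; ++i) {
        toy_push(fl, (toy_item_t *) malloc(sizeof(toy_item_t)));
        fl->grow_budget--;
    }
    return 0;
}

/* GET semantics: one grow attempt; the caller must handle NULL. */
static toy_item_t *toy_get(toy_free_list_t *fl) {
    toy_item_t *it = toy_pop(fl);
    if (NULL == it) {
        (void) toy_grow(fl);
        it = toy_pop(fl);
    }
    return it;
}

/* WAIT semantics: retry (grow, and in the real code call opal_progress())
 * until an item is available; never returns NULL.  Caution: with an
 * exhausted grow budget and no outside progress, this loop spins. */
static toy_item_t *toy_wait(toy_free_list_t *fl) {
    toy_item_t *it;
    while (NULL == (it = toy_pop(fl))) {
        (void) toy_grow(fl);   /* plus opal_progress() in the real macro */
    }
    return it;
}
```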



>
>
> *From:* Владимир Трущин [mailto:vdtrusc...@compcenter.org]
> *Sent:* Wednesday, September 16, 2015 10:09 PM
> *To:* 'Open MPI Developers'
> *Subject:* RE: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
>
>
>
> George,
>
>
>
> You are right. The sequence of calls in our test is MPI_Irecv ->
> mca_pml_ob1_irecv -> MCA_PML_OB1_RECV_REQUEST_ALLOC. We will try to use
> OMPI_FREE_LIST_WAIT_MT.
>
>
>
> We saw the following problem in OMPI_FREE_LIST_WAIT_MT. It returned NULL in
> the case where thread A was suspended after the call to ompi_free_list_grow.
> Meanwhile, other threads took all the items from the free list at the macro's
> first call to opal_atomic_lifo_pop. So when thread A resumed and made the
> macro's second call to opal_atomic_lifo_pop, it returned NULL.
>
>
>
> Best regards,
>
> Vladimir.
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org
> ] *On Behalf Of *George Bosilca
> *Sent:* Wednesday, September 16, 2015 7:00 PM
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
>
>
>
> Alexey,
>
>
>
> This is not necessarily the fix for all cases. Most of the internal uses
> of the free_list can easily accommodate the fact that no more elements
> are available. Based on your description of the problem I would assume you
> encounter this problem once the MCA_PML_OB1_RECV_REQUEST_ALLOC is called.
> In this particular case the problem is the fact that we call
> OMPI_FREE_LIST_GET_MT and that the upper level is unable to correctly deal
> with the case where the returned item is NULL. In this particular case the
> real fix is to use the blocking version of the free_list accessor (similar
> to the case for send) OMPI_FREE_LIST_WAIT_MT.
>
>
>
>
>
> It is also possible that I misunderstood your problem. If the solution
> above doesn't work can you describe exactly where the NULL return of the
> OMPI_FREE_LIST_GET_MT is creating an issue?
>
>
>
> George.
>
>
>
>
>
> On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих 
> wrote:
>
> Hi all,
>
> We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
> support level) where several threads submit a lot of MPI_Irecv() requests
> simultaneously, and encountered an intermittent
> OMPI_ERR_TEMP_OUT_OF_RESOURCE bug after MCA_PML_OB1_RECV_REQUEST_ALLOC()
> because OMPI_FREE_LIST_GET_MT() returned NULL. Investigating this bug, we
> found that the thread calling ompi_free_list_grow() sometimes has no free
> items in the LIFO list at exit, because other threads retrieved all the new
> items via opal_atomic_lifo_pop().
>
> So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>
>
>
> #define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
> {                                                                          \
>     item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));  \
>     if( OPAL_UNLIKELY(NULL == item) ) {                                    \
>         if(opal_using_threads()) {                                         \
>             int rc;                                                        \
>             opal_mutex_lock(&((fl)->fl_lock));                             \
>             do {                                                           \
>                 rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
>                 if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break;              \
>                 item = (ompi_free_list_item_t*)                            \
>                     opal_atomic_lifo_pop(&((fl)->super));                  \
>             } while (!item);                                               \
>             opal_mutex_unlock(&((fl)->fl_lock));                           \
>         } else {                                                           \
>             ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);             \
>             item = (ompi_free_list_item_t*)                                \
>                 opal_atomic_lifo_pop(&((fl)->super));                      \
>         } /* opal_using_threads() */                                       \
>     } /* NULL == item */                                                   \
> }
>
>
>
>
>
> Another workaround is to increase the value of  pml_ob1_free_list_inc
> parameter.
>
>
>
> Regards,
>
> Alexey
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Владимир Трущин
Sorry, “We saw the following problem in OMPI_FREE_LIST_GET_MT…”.



*From:* Владимир Трущин [mailto:vdtrusc...@compcenter.org]
*Sent:* Wednesday, September 16, 2015 10:09 PM
*To:* 'Open MPI Developers'
*Subject:* RE: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



George,



You are right. The sequence of calls in our test is MPI_Irecv ->
mca_pml_ob1_irecv -> MCA_PML_OB1_RECV_REQUEST_ALLOC. We will try to use
OMPI_FREE_LIST_WAIT_MT.



We saw the following problem in OMPI_FREE_LIST_WAIT_MT. It returned NULL in
the case where thread A was suspended after the call to ompi_free_list_grow.
Meanwhile, other threads took all the items from the free list at the macro's
first call to opal_atomic_lifo_pop. So when thread A resumed and made the
macro's second call to opal_atomic_lifo_pop, it returned NULL.



Best regards,

Vladimir.



*From:* devel [mailto:devel-boun...@open-mpi.org
] *On Behalf Of *George Bosilca
*Sent:* Wednesday, September 16, 2015 7:00 PM
*To:* Open MPI Developers
*Subject:* Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



Alexey,



This is not necessarily the fix for all cases. Most of the internal uses of
the free_list can easily accommodate the fact that no more elements are
available. Based on your description of the problem I would assume you
encounter this problem once the MCA_PML_OB1_RECV_REQUEST_ALLOC is called.
In this particular case the problem is the fact that we call
OMPI_FREE_LIST_GET_MT and that the upper level is unable to correctly deal
with the case where the returned item is NULL. In this particular case the
real fix is to use the blocking version of the free_list accessor (similar
to the case for send) OMPI_FREE_LIST_WAIT_MT.





It is also possible that I misunderstood your problem. If the solution
above doesn't work can you describe exactly where the NULL return of the
OMPI_FREE_LIST_GET_MT is creating an issue?



George.





On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих 
wrote:

Hi all,

We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
support level) where several threads submit a lot of MPI_Irecv() requests
simultaneously, and encountered an intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE
bug after MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
returned NULL. Investigating this bug, we found that the thread calling
ompi_free_list_grow() sometimes has no free items in the LIFO list at exit,
because other threads retrieved all the new items via opal_atomic_lifo_pop().

So we suggest to change OMPI_FREE_LIST_GET_MT() as below:



#define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
{                                                                          \
    item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));  \
    if( OPAL_UNLIKELY(NULL == item) ) {                                    \
        if(opal_using_threads()) {                                         \
            int rc;                                                        \
            opal_mutex_lock(&((fl)->fl_lock));                             \
            do {                                                           \
                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break;              \
                item = (ompi_free_list_item_t*)                            \
                    opal_atomic_lifo_pop(&((fl)->super));                  \
            } while (!item);                                               \
            opal_mutex_unlock(&((fl)->fl_lock));                           \
        } else {                                                           \
            ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);             \
            item = (ompi_free_list_item_t*)                                \
                opal_atomic_lifo_pop(&((fl)->super));                      \
        } /* opal_using_threads() */                                       \
    } /* NULL == item */                                                   \
}





Another workaround is to increase the value of  pml_ob1_free_list_inc
parameter.



Regards,

Alexey




___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/09/18039.php


Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Владимир Трущин
George,



You are right. The sequence of calls in our test is MPI_Irecv ->
mca_pml_ob1_irecv -> MCA_PML_OB1_RECV_REQUEST_ALLOC. We will try to use
OMPI_FREE_LIST_WAIT_MT.



We saw the following problem in OMPI_FREE_LIST_WAIT_MT. It returned NULL in
the case where thread A was suspended after the call to ompi_free_list_grow.
Meanwhile, other threads took all the items from the free list at the macro's
first call to opal_atomic_lifo_pop. So when thread A resumed and made the
macro's second call to opal_atomic_lifo_pop, it returned NULL.
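That interleaving can be replayed deterministically with a toy LIFO
(illustrative names; single-threaded on purpose, forcing the schedule the
real race only hits occasionally): thread A's grow is followed by rival pops
that drain the list before A's second pop, so A sees NULL again.

```c
#include <stddef.h>

typedef struct node { struct node *next; } node_t;

static void lifo_push(node_t **head, node_t *n) {
    n->next = *head;
    *head = n;
}

static node_t *lifo_pop(node_t **head) {
    node_t *n = *head;
    if (n != NULL) *head = n->next;
    return n;
}

/* Replay the interleaving: A finds the list empty, grows it, but rival
 * threads drain every new item before A's second pop. */
static node_t *replay_race(void) {
    static node_t items[2];
    node_t *head = NULL;
    node_t *a = lifo_pop(&head);          /* A: first pop -> NULL          */
    lifo_push(&head, &items[0]);          /* A: ompi_free_list_grow() adds */
    lifo_push(&head, &items[1]);          /*    new items to the list      */
    while (lifo_pop(&head) != NULL) {}    /* rivals: drain every new item  */
    a = lifo_pop(&head);                  /* A: second pop -> NULL again   */
    return a;
}
```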



Best regards,

Vladimir.



*From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George
Bosilca
*Sent:* Wednesday, September 16, 2015 7:00 PM
*To:* Open MPI Developers
*Subject:* Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



Alexey,



This is not necessarily the fix for all cases. Most of the internal uses of
the free_list can easily accommodate the fact that no more elements are
available. Based on your description of the problem I would assume you
encounter this problem once the MCA_PML_OB1_RECV_REQUEST_ALLOC is called.
In this particular case the problem is the fact that we call
OMPI_FREE_LIST_GET_MT and that the upper level is unable to correctly deal
with the case where the returned item is NULL. In this particular case the
real fix is to use the blocking version of the free_list accessor (similar
to the case for send) OMPI_FREE_LIST_WAIT_MT.





It is also possible that I misunderstood your problem. If the solution
above doesn't work can you describe exactly where the NULL return of the
OMPI_FREE_LIST_GET_MT is creating an issue?



George.





On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих 
wrote:

Hi all,

We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
support level) where several threads submit a lot of MPI_Irecv() requests
simultaneously, and encountered an intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE
bug after MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
returned NULL. Investigating this bug, we found that the thread calling
ompi_free_list_grow() sometimes has no free items in the LIFO list at exit,
because other threads retrieved all the new items via opal_atomic_lifo_pop().

So we suggest to change OMPI_FREE_LIST_GET_MT() as below:



#define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
{                                                                          \
    item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));  \
    if( OPAL_UNLIKELY(NULL == item) ) {                                    \
        if(opal_using_threads()) {                                         \
            int rc;                                                        \
            opal_mutex_lock(&((fl)->fl_lock));                             \
            do {                                                           \
                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break;              \
                item = (ompi_free_list_item_t*)                            \
                    opal_atomic_lifo_pop(&((fl)->super));                  \
            } while (!item);                                               \
            opal_mutex_unlock(&((fl)->fl_lock));                           \
        } else {                                                           \
            ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);             \
            item = (ompi_free_list_item_t*)                                \
                opal_atomic_lifo_pop(&((fl)->super));                      \
        } /* opal_using_threads() */                                       \
    } /* NULL == item */                                                   \
}





Another workaround is to increase the value of  pml_ob1_free_list_inc
parameter.



Regards,

Alexey




___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/09/18039.php


Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Rolf vandeVaart
The bfo was my creation many years ago.  Can we keep it around for a little 
longer?  If we blow it away, then we should probably clean up all the code I 
also have in the openib BTL for supporting failover.  There is also some 
configure code that would have to go as well.

Rolf

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
>Hjelm
>Sent: Wednesday, September 16, 2015 1:43 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
>
>
>Not sure. I give a +1 for blowing them away. We can bring them back later if
>needed.
>
>-Nathan
>
>On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>>As they don't even compile why are we keeping them around?
>>  George.
>>On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm 
>wrote:
>>
>>  iboffload and bfo are opal ignored by default. Neither exists in the
>>  release branch.
>>
>>  -Nathan
>>  On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>>  >While looking into a possible fix for this problem we should also
>>  cleanup
>>  >in the trunk the leftover from the OMPI_FREE_LIST.
>>  >$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>>  >./opal/mca/btl/usnic/btl_usnic_compat.h:161:
>>  > OMPI_FREE_LIST_GET_MT(list, (item))
>>  >./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
>>  >OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);
>>  \
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
>>  > OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
>>  > OMPI_FREE_LIST_GET_MT(task_list, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
>>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
>>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
>>  > OMPI_FREE_LIST_GET_MT(&iboffload->device-
>>frags_free[qp_index],
>>  item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
>>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
>>  > OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
>>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
>>  > OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
>>  >I wonder how these are even compiling ...
>>  >  George.
>>  >On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>>  
>>  >wrote:
>>  >
>>  >  Alexey,
>>  >  This is not necessarily the fix for all cases. Most of the
>>  internal uses
>>  >  of the free_list can easily accommodate to the fact that no more
>>  >  elements are available. Based on your description of the problem
>>  I would
>>  >  assume you encounter this problem once the
>>  >  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular
>case
>>  the
>>  >  problem is that fact that we call OMPI_FREE_LIST_GET_MT and that
>>  the
>>  >  upper level is unable to correctly deal with the case where the
>>  returned
>>  >  item is NULL. In this particular case the real fix is to use the
>>  >  blocking version of the free_list accessor (similar to the case
>>  for
>>  >  send) OMPI_FREE_LIST_WAIT_MT.
>>  >  It is also possible that I misunderstood your problem. IF the
>>  solution
>>  >  above doesn't work can you describe exactly where the NULL return
>>  of the
>>  >  OMPI_FREE_LIST_GET_MT is creating an issue?
>>  >  George.
>>  >  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>>  >   wrote:
>>  >
>>  >Hi all,
>>  >
>>  >We experimented with MPI+OpenMP hybrid application
>>  >(MPI_THREAD_MULTIPLE support level)  where several threads
>>  submits a
>>  >lot of MPI_Irecv() requests simultaneously and encountered an
>>  >intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
>>  >MCA_PML_OB1_RECV_REQUEST_ALLOC()  because
>>  OMPI_FREE_LIST_GET_MT()
>>  > returned NULL.  Investigating this bug we found that sometimes
>>  the
>>  >thread calling ompi_free_list_grow()  don't have any free items
>>  in
>>  >LIFO list at exit because other threads  retrieved  all new
>>  items at
>>  >opal_atomic_lifo_pop()
>>  >
>>  >So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>>  >
>>  >
>>  >
>>  >#define OMPI_FREE_LIST_GET_MT(fl,
>>  >item)
>>\
>>  

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm

Not sure. I give a +1 for blowing them away. We can bring them back
later if needed.

-Nathan

On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>As they don't even compile why are we keeping them around?
>  George.
>On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm  wrote:
> 
>  iboffload and bfo are opal ignored by default. Neither exists in the
>  release branch.
> 
>  -Nathan
>  On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>  >While looking into a possible fix for this problem we should also
>  cleanup
>  >in the trunk the leftover from the OMPI_FREE_LIST.
>  >$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>  >./opal/mca/btl/usnic/btl_usnic_compat.h:161:
>  > OMPI_FREE_LIST_GET_MT(list, (item))
>  >./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
>  >OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item); 
>  \
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
>  > OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
>  > OMPI_FREE_LIST_GET_MT(task_list, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
>  > OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index],
>  item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
>  > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
>  > OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
>  > OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
>  >I wonder how these are even compiling ...
>  >  George.
>  >On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>  
>  >wrote:
>  >
>  >  Alexey,
>  >  This is not necessarily the fix for all cases. Most of the
>  internal uses
>  >  of the free_list can easily accommodate to the fact that no more
>  >  elements are available. Based on your description of the problem
>  I would
>  >  assume you encounter this problem once the
>  >  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular case
>  the
>  >  problem is that fact that we call OMPI_FREE_LIST_GET_MT and that
>  the
>  >  upper level is unable to correctly deal with the case where the
>  returned
>  >  item is NULL. In this particular case the real fix is to use the
>  >  blocking version of the free_list accessor (similar to the case
>  for
>  >  send) OMPI_FREE_LIST_WAIT_MT.
>  >  It is also possible that I misunderstood your problem. IF the
>  solution
>  >  above doesn't work can you describe exactly where the NULL return
>  of the
>  >  OMPI_FREE_LIST_GET_MT is creating an issue?
>  >  George.
>  >  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>  >   wrote:
>  >
>  >Hi all,
>  >
>  >We experimented with MPI+OpenMP hybrid application
>  >(MPI_THREAD_MULTIPLE support level)  where several threads
>  submits a
>  >lot of MPI_Irecv() requests simultaneously and encountered an
>  >intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
>  >MCA_PML_OB1_RECV_REQUEST_ALLOC()  because 
>  OMPI_FREE_LIST_GET_MT()
>  > returned NULL.  Investigating this bug we found that sometimes
>  the
>  >thread calling ompi_free_list_grow()  don't have any free items
>  in
>  >LIFO list at exit because other threads  retrieved  all new
>  items at
>  >opal_atomic_lifo_pop()
>  >
>  >So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>  >
>  >
>  >
>  >#define OMPI_FREE_LIST_GET_MT(fl, item)                                 \
>  >{                                                                       \
>  >    item = (ompi_free_list_item_t*)                                     \
>  >        opal_atomic_lifo_pop(&((fl)->super));                           \
>  >    if( OPAL_UNLIKELY(NULL == item) ) {                                 \
>  >        if( opal_using_threads() ) {                                    \
>  >            int rc;                                                     \

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread George Bosilca
As they don't even compile, why are we keeping them around?

  George.


On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm  wrote:

>
> iboffload and bfo are opal ignored by default. Neither exists in the
> release branch.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
> >While looking into a possible fix for this problem we should also
> cleanup
> >in the trunk the leftover from the OMPI_FREE_LIST.
> >$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
> >./opal/mca/btl/usnic/btl_usnic_compat.h:161:
> > OMPI_FREE_LIST_GET_MT(list, (item))
> >./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
> >OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);  \
> >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
> > OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
> > OMPI_FREE_LIST_GET_MT(task_list, item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
> > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
> > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
> > OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index],
> item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
> > OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
> > OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
> >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
> > OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
> >I wonder how these are even compiling ...
> >  George.
> >On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca  >
> >wrote:
> >
> >  Alexey,
> >  This is not necessarily the fix for all cases. Most of the internal
> uses
> >  of the free_list can easily accommodate to the fact that no more
> >  elements are available. Based on your description of the problem I
> would
> >  assume you encounter this problem once the
> >  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular case
> the
> >  problem is that fact that we call OMPI_FREE_LIST_GET_MT and that the
> >  upper level is unable to correctly deal with the case where the
> returned
> >  item is NULL. In this particular case the real fix is to use the
> >  blocking version of the free_list accessor (similar to the case for
> >  send) OMPI_FREE_LIST_WAIT_MT.
> >  It is also possible that I misunderstood your problem. IF the
> solution
> >  above doesn't work can you describe exactly where the NULL return
> of the
> >  OMPI_FREE_LIST_GET_MT is creating an issue?
> >  George.
> >  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
> >   wrote:
> >
> >Hi all,
> >
> >We experimented with MPI+OpenMP hybrid application
> >(MPI_THREAD_MULTIPLE support level)  where several threads
> submits a
> >lot of MPI_Irecv() requests simultaneously and encountered an
> >intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
> >MCA_PML_OB1_RECV_REQUEST_ALLOC()  because  OMPI_FREE_LIST_GET_MT()
> > returned NULL.  Investigating this bug we found that sometimes
> the
> >thread calling ompi_free_list_grow()  don't have any free items in
> >LIFO list at exit because other threads  retrieved  all new items
> at
> >opal_atomic_lifo_pop()
> >
> >So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
> >
> >
> >
> >#define OMPI_FREE_LIST_GET_MT(fl, item)                                  \
> >{                                                                        \
> >    item = (ompi_free_list_item_t*)                                      \
> >        opal_atomic_lifo_pop(&((fl)->super));                            \
> >    if( OPAL_UNLIKELY(NULL == item) ) {                                  \
> >        if( opal_using_threads() ) {                                     \
> >            int rc;                                                      \
> >            opal_mutex_lock(&((fl)->fl_lock));                           \
> >            do {                                                         \
> >                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);  \
> >                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;           \
> >                item = (ompi_free_list_item_t*)                          \
> >                    opal_atomic_lifo_pop(&((fl)->super));                \
> >            } while (!item);                                             \
> >            opal_mutex_unlock(&((fl)->fl_lock));                         \
> >        } else {                                                         \

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

I see the problem. Before my changes ompi_comm_dup signalled that the
communicator was not an inter-communicator by setting remote_size to
0. The remote size is now from the remote group if one was supplied
(which is the case with intra-communicators) so ompi_comm_dup needs to
make sure NULL is passed for the remote_group when duplicating
intra-communicators.

I opened a PR. Once jenkins finishes I will merge it onto master.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
> 
> Thanks
> Edgar
> 
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OX 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>something is borked right now on master in the management of inter vs. intra
> >>communicators. It looks like intra communicators are wrongly selecting the
> >>inter coll module thinking that it is an inter communicator, and we have
> >>hangs because of that. I attach a small replicator, where a bcast of a
> >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include 
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>   MPI_Comm comm1;
> >>   int root=0;
> >>   int rank2, size2, global_buf=1;
> >>   int rank, size;
> >>
> >>   MPI_Init ( &argc, &argv );
> >>
> >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> >>
> >>/* Setting up a new communicator */
> >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> >>
> >>   MPI_Comm_size ( comm1, &size2 );
> >>   MPI_Comm_rank ( comm1, &rank2 );
> >>
> >>
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>   if ( rank == root ) {
> >>   printf("Bcast on MPI_COMM_WORLD finished\n");
> >>   }
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> >>   if ( rank == root ) {
> >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>   }
> >>
> >>   MPI_Comm_free ( &comm1 );
> >>
> >>   MPI_Finalize ();
> >>   return ( 0 );
> >>}
> >
> >>___
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: 
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >___
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php




Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm

iboffload and bfo are opal ignored by default. Neither exists in the
release branch.

-Nathan

On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>While looking into a possible fix for this problem we should also cleanup
>in the trunk the leftover from the OMPI_FREE_LIST.
>$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>./opal/mca/btl/usnic/btl_usnic_compat.h:161:  
> OMPI_FREE_LIST_GET_MT(list, (item))
>./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:  
>OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);  \
>./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:  
> OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:  
> OMPI_FREE_LIST_GET_MT(task_list, item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:  
> OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:  
> OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:  
> OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index], item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:  
> OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:  
> OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
>./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:  
> OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
>I wonder how these are even compiling ...
>  George.
>On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca 
>wrote:
> 
>  Alexey,
>  This is not necessarily the fix for all cases. Most of the internal uses
>  of the free_list can easily accommodate to the fact that no more
>  elements are available. Based on your description of the problem I would
>  assume you encounter this problem once the
>  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular case the
>  problem is that fact that we call OMPI_FREE_LIST_GET_MT and that the
>  upper level is unable to correctly deal with the case where the returned
>  item is NULL. In this particular case the real fix is to use the
>  blocking version of the free_list accessor (similar to the case for
>  send) OMPI_FREE_LIST_WAIT_MT.
>  It is also possible that I misunderstood your problem. IF the solution
>  above doesn't work can you describe exactly where the NULL return of the
>  OMPI_FREE_LIST_GET_MT is creating an issue?
>  George.
>  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>   wrote:
> 
>Hi all,
> 
>We experimented with MPI+OpenMP hybrid application
>(MPI_THREAD_MULTIPLE support level)  where several threads submits a
>lot of MPI_Irecv() requests simultaneously and encountered an
>intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
>MCA_PML_OB1_RECV_REQUEST_ALLOC()  because  OMPI_FREE_LIST_GET_MT()
> returned NULL.  Investigating this bug we found that sometimes the
>thread calling ompi_free_list_grow()  don't have any free items in
>LIFO list at exit because other threads  retrieved  all new items at
>opal_atomic_lifo_pop() 
> 
>So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
> 
> 
> 
>#define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
>{                                                                          \
>    item = (ompi_free_list_item_t*)                                        \
>        opal_atomic_lifo_pop(&((fl)->super));                              \
>    if( OPAL_UNLIKELY(NULL == item) ) {                                    \
>        if( opal_using_threads() ) {                                       \
>            int rc;                                                        \
>            opal_mutex_lock(&((fl)->fl_lock));                             \
>            do {                                                           \
>                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
>                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;             \
>                item = (ompi_free_list_item_t*)                            \
>                    opal_atomic_lifo_pop(&((fl)->super));                  \

[OMPI devel] edison/hopper jenkins nodes back on line

2015-09-16 Thread Howard Pritchard
Hi Folks,

I had to update my password for NERSC systems and that broke the credentials
the IU jenkins was using to launch on those nodes.  Should be working again.

Sorry for the inconvenience,

Howard


Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread George Bosilca
While looking into a possible fix for this problem, we should also clean up
the leftovers from OMPI_FREE_LIST in the trunk.

$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
./opal/mca/btl/usnic/btl_usnic_compat.h:161:OMPI_FREE_LIST_GET_MT(list,
(item))
./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);  \
./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
 OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
 OMPI_FREE_LIST_GET_MT(task_list, item);
./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
 OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
 OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
 OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index], item);
./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
 OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
 OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
 OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);

I wonder how these are even compiling ...

  George.



On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca 
wrote:

> Alexey,
>
> This is not necessarily the fix for all cases. Most of the internal uses
> of the free_list can easily accommodate to the fact that no more elements
> are available. Based on your description of the problem I would assume you
> encounter this problem once the MCA_PML_OB1_RECV_REQUEST_ALLOC is called.
> In this particular case the problem is that fact that we call
> OMPI_FREE_LIST_GET_MT and that the upper level is unable to correctly deal
> with the case where the returned item is NULL. In this particular case the
> real fix is to use the blocking version of the free_list accessor (similar
> to the case for send) OMPI_FREE_LIST_WAIT_MT.
>
>
> It is also possible that I misunderstood your problem. IF the solution
> above doesn't work can you describe exactly where the NULL return of the
> OMPI_FREE_LIST_GET_MT is creating an issue?
>
> George.
>
>
> On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих 
> wrote:
>
>> Hi all,
>>
>> We experimented with MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
>> support level)  where several threads submits a lot of MPI_Irecv() requests
>> simultaneously and encountered an intermittent bug
>> OMPI_ERR_TEMP_OUT_OF_RESOURCE after MCA_PML_OB1_RECV_REQUEST_ALLOC()
>> because  OMPI_FREE_LIST_GET_MT()  returned NULL.  Investigating this bug we
>> found that sometimes the thread calling ompi_free_list_grow()  don’t have
>> any free items in LIFO list at exit because other threads  retrieved  all
>> new items at opal_atomic_lifo_pop()
>>
>> So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>>
>>
>>
>> #define OMPI_FREE_LIST_GET_MT(fl, item)                                   \
>> {                                                                         \
>>     item = (ompi_free_list_item_t*)                                       \
>>         opal_atomic_lifo_pop(&((fl)->super));                             \
>>     if( OPAL_UNLIKELY(NULL == item) ) {                                   \
>>         if( opal_using_threads() ) {                                      \
>>             int rc;                                                       \
>>             opal_mutex_lock(&((fl)->fl_lock));                            \
>>             do {                                                          \
>>                 rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);   \
>>                 if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;            \
>>                 item = (ompi_free_list_item_t*)                           \
>>                     opal_atomic_lifo_pop(&((fl)->super));                 \
>>             } while (!item);                                              \
>>             opal_mutex_unlock(&((fl)->fl_lock));                          \
>>         } else {                                                          \
>>             ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);            \
>>             item = (ompi_free_list_item_t*)                               \
>>                 opal_atomic_lifo_pop(&((fl)->super));                     \
>>         } /* opal_using_threads() */                                      \
>>     } /* NULL == item */                                                  \
>> }
>>
>>
>>
>>
>>
>> Another workaround is to increase the value of  pml_ob1_free_list_inc
>> parameter.
>>
>>
>>
>> Regards,
>>
>> Alexey
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread George Bosilca
Alexey,

This is not necessarily the fix for all cases. Most of the internal uses of
the free_list can easily accommodate the fact that no more elements are
available. Based on your description of the problem, I would assume you
encounter this problem once MCA_PML_OB1_RECV_REQUEST_ALLOC is called.
In this particular case the problem is the fact that we call
OMPI_FREE_LIST_GET_MT and that the upper level is unable to correctly deal
with the case where the returned item is NULL. Here the real fix is to use
the blocking version of the free_list accessor (similar to the case for
send), OMPI_FREE_LIST_WAIT_MT.


It is also possible that I misunderstood your problem. If the solution
above doesn't work, can you describe exactly where the NULL return of
OMPI_FREE_LIST_GET_MT is creating an issue?

George.


On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих 
wrote:

> Hi all,
>
> We experimented with MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
> support level)  where several threads submits a lot of MPI_Irecv() requests
> simultaneously and encountered an intermittent bug
> OMPI_ERR_TEMP_OUT_OF_RESOURCE after MCA_PML_OB1_RECV_REQUEST_ALLOC()
> because  OMPI_FREE_LIST_GET_MT()  returned NULL.  Investigating this bug we
> found that sometimes the thread calling ompi_free_list_grow()  don’t have
> any free items in LIFO list at exit because other threads  retrieved  all
> new items at opal_atomic_lifo_pop()
>
> So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>
>
>
> #define OMPI_FREE_LIST_GET_MT(fl, item)                                   \
> {                                                                         \
>     item = (ompi_free_list_item_t*)                                       \
>         opal_atomic_lifo_pop(&((fl)->super));                             \
>     if( OPAL_UNLIKELY(NULL == item) ) {                                   \
>         if( opal_using_threads() ) {                                      \
>             int rc;                                                       \
>             opal_mutex_lock(&((fl)->fl_lock));                            \
>             do {                                                          \
>                 rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);   \
>                 if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;            \
>                 item = (ompi_free_list_item_t*)                           \
>                     opal_atomic_lifo_pop(&((fl)->super));                 \
>             } while (!item);                                              \
>             opal_mutex_unlock(&((fl)->fl_lock));                          \
>         } else {                                                          \
>             ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);            \
>             item = (ompi_free_list_item_t*)                               \
>                 opal_atomic_lifo_pop(&((fl)->super));                     \
>         } /* opal_using_threads() */                                      \
>     } /* NULL == item */                                                  \
> }
>
>
>
>
>
> Another workaround is to increase the value of  pml_ob1_free_list_inc
> parameter.
>
>
>
> Regards,
>
> Alexey
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18039.php
>


Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

I just realized my branch is behind master. Updating now and will retest.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
> 
> Thanks
> Edgar
> 
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OX 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>something is borked right now on master in the management of inter vs. intra
> >>communicators. It looks like intra communicators are wrongly selecting the
> >>inter coll module thinking that it is an inter communicator, and we have
> >>hangs because of that. I attach a small replicator, where a bcast of a
> >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include 
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>   MPI_Comm comm1;
> >>   int root=0;
> >>   int rank2, size2, global_buf=1;
> >>   int rank, size;
> >>
> >>   MPI_Init ( &argc, &argv );
> >>
> >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> >>
> >>/* Setting up a new communicator */
> >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> >>
> >>   MPI_Comm_size ( comm1, &size2 );
> >>   MPI_Comm_rank ( comm1, &rank2 );
> >>
> >>
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>   if ( rank == root ) {
> >>   printf("Bcast on MPI_COMM_WORLD finished\n");
> >>   }
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> >>   if ( rank == root ) {
> >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>   }
> >>
> >>   MPI_Comm_free ( &comm1 );
> >>
> >>   MPI_Finalize ();
> >>   return ( 0 );
> >>}
> >
> >>___
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: 
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >___
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php




Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Edgar Gabriel
yes, I did a fresh pull this morning; for me it deadlocks reliably for 2
or more processes.


Thanks
Edgar

On 9/16/2015 10:42 AM, Nathan Hjelm wrote:


The reproducer is working for me with master on OX 10.10. Some changes
to ompi_comm_set went in yesterday. Are you on the latest hash?

-Nathan

On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:

something is borked right now on master in the management of inter vs. intra
communicators. It looks like intra communicators are wrongly selecting the
inter coll module thinking that it is an inter communicator, and we have
hangs because of that. I attach a small replicator, where a bcast of a
duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
being selected.

Thanks
Edgar



#include 
#include "mpi.h"

int main( int argc, char *argv[] )
{
   MPI_Comm comm1;
   int root=0;
   int rank2, size2, global_buf=1;
   int rank, size;

   MPI_Init ( &argc, &argv );

   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
   MPI_Comm_size ( MPI_COMM_WORLD, &size );

/* Setting up a new communicator */
   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );

   MPI_Comm_size ( comm1, &size2 );
   MPI_Comm_rank ( comm1, &rank2 );


   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
   if ( rank == root ) {
   printf("Bcast on MPI_COMM_WORLD finished\n");
   }
   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
   if ( rank == root ) {
   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
   }

   MPI_Comm_free ( &comm1 );

   MPI_Finalize ();
   return ( 0 );
}



___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/09/18040.php




___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/09/18042.php



Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

The reproducer is working for me with master on OS X 10.10. Some changes
to ompi_comm_set went in yesterday. Are you on the latest hash?

-Nathan

On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> something is borked right now on master in the management of inter vs. intra
> communicators. It looks like intra communicators are wrongly selecting the
> inter coll module thinking that it is an inter communicator, and we have
> hangs because of that. I attach a small replicator, where a bcast of a
> duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> being selected.
> 
> Thanks
> Edgar

> #include 
> #include "mpi.h"
> 
> int main( int argc, char *argv[] )
> {
>   MPI_Comm comm1;
>   int root=0;
>   int rank2, size2, global_buf=1;
>   int rank, size;
> 
>   MPI_Init ( &argc, &argv );
> 
>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> 
> /* Setting up a new communicator */
>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> 
>   MPI_Comm_size ( comm1, &size2 );
>   MPI_Comm_rank ( comm1, &rank2 );
> 
> 
>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>   if ( rank == root ) {
>   printf("Bcast on MPI_COMM_WORLD finished\n");
>   }
>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>   if ( rank == root ) {
>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>   }
> 
>   MPI_Comm_free ( &comm1 );
> 
>   MPI_Finalize ();
>   return ( 0 );
> }

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18040.php





Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm

The formatting of the code got all messed up. Please send a diff and I
will take a look. ompi free list no longer exists in master or the next
release branch but the change may be worthwhile for the opal free list
code.

-Nathan

On Wed, Sep 16, 2015 at 04:03:44PM +0300, Алексей Рыжих wrote:
>Hi all,
> 
>We experimented with MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
>support level)  where several threads submits a lot of MPI_Irecv()
>requests simultaneously and encountered an intermittent bug
>OMPI_ERR_TEMP_OUT_OF_RESOURCE after MCA_PML_OB1_RECV_REQUEST_ALLOC() 
>because  OMPI_FREE_LIST_GET_MT()  returned NULL.  Investigating this bug
>we found that sometimes the thread calling ompi_free_list_grow()  don't
>have any free items in LIFO list at exit because other threads  retrieved
> all new items at opal_atomic_lifo_pop() 
> 
>So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
> 
> 
> 
>#define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
>{                                                                          \
>    item = (ompi_free_list_item_t*)                                        \
>        opal_atomic_lifo_pop(&((fl)->super));                              \
>    if( OPAL_UNLIKELY(NULL == item) ) {                                    \
>        if( opal_using_threads() ) {                                       \
>            int rc;                                                        \
>            opal_mutex_lock(&((fl)->fl_lock));                             \
>            do {                                                           \
>                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
>                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;             \
>                item = (ompi_free_list_item_t*)                            \
>                    opal_atomic_lifo_pop(&((fl)->super));                  \
>            } while (!item);                                               \
>            opal_mutex_unlock(&((fl)->fl_lock));                           \
>        } else {                                                           \
>            ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);             \
>            item = (ompi_free_list_item_t*)                                \
>                opal_atomic_lifo_pop(&((fl)->super));                      \
>        } /* opal_using_threads() */                                       \
>    } /* NULL == item */                                                   \
>}
> 
> 
> 
> 
> 
>Another workaround is to increase the value of  pml_ob1_free_list_inc
>parameter.
> 
> 
> 
>Regards,
> 
>Alexey
> 
> 

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18039.php





[OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Edgar Gabriel
Something is borked right now on master in the management of inter vs.
intra communicators. It looks like intra communicators are wrongly
selecting the inter coll module, thinking that they are inter
communicators, and we have hangs because of that. I attach a small
reproducer, where a bcast on a duplicate of MPI_COMM_WORLD hangs
because the inter collective module is being selected.


Thanks
Edgar
#include 
#include "mpi.h"

int main( int argc, char *argv[] )
{
  MPI_Comm comm1;
  int root=0;
  int rank2, size2, global_buf=1;
  int rank, size;

  MPI_Init ( &argc, &argv );

  MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
  MPI_Comm_size ( MPI_COMM_WORLD, &size );

/* Setting up a new communicator */
  MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );

  MPI_Comm_size ( comm1, &size2 );
  MPI_Comm_rank ( comm1, &rank2 );


  MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
  if ( rank == root ) {
  printf("Bcast on MPI_COMM_WORLD finished\n");
  }
  MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
  if ( rank == root ) {
  printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
  }

  MPI_Comm_free ( &comm1 );

  MPI_Finalize ();
  return ( 0 );
}


[OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Алексей Рыжих
Hi all,

We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
support level) where several threads submit a lot of MPI_Irecv() requests
simultaneously, and encountered an intermittent
OMPI_ERR_TEMP_OUT_OF_RESOURCE bug after MCA_PML_OB1_RECV_REQUEST_ALLOC()
because OMPI_FREE_LIST_GET_MT() returned NULL. Investigating this bug, we
found that the thread calling ompi_free_list_grow() sometimes has no free
items left in the LIFO list on exit, because other threads retrieved all
the new items via opal_atomic_lifo_pop().

So we suggest changing OMPI_FREE_LIST_GET_MT() as below:



#define OMPI_FREE_LIST_GET_MT(fl, item)                                     \
{                                                                           \
    item = (ompi_free_list_item_t*)                                         \
        opal_atomic_lifo_pop(&((fl)->super));                               \
    if( OPAL_UNLIKELY(NULL == item) ) {                                     \
        if( opal_using_threads() ) {                                        \
            int rc;                                                         \
            opal_mutex_lock(&((fl)->fl_lock));                              \
            do {                                                            \
                rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);     \
                if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;              \
                item = (ompi_free_list_item_t*)                             \
                    opal_atomic_lifo_pop(&((fl)->super));                   \
            } while (!item);                                                \
            opal_mutex_unlock(&((fl)->fl_lock));                            \
        } else {                                                            \
            ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);              \
            item = (ompi_free_list_item_t*)                                 \
                opal_atomic_lifo_pop(&((fl)->super));                       \
        } /* opal_using_threads() */                                        \
    } /* NULL == item */                                                    \
}





Another workaround is to increase the value of the pml_ob1_free_list_inc
parameter.
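For reference, MCA parameters such as pml_ob1_free_list_inc can be passed on
the mpirun command line. A hedged sketch only: the value 256 and the program
name are arbitrary examples, not recommendations from the thread:

```shell
# Illustrative only: raise the ob1 free-list growth increment so each
# ompi_free_list_grow() call adds more items per allocation.
# "./hybrid_app" is a hypothetical MPI_THREAD_MULTIPLE program.
mpirun --mca pml_ob1_free_list_inc 256 -np 8 ./hybrid_app
```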



Regards,

Alexey


[OMPI devel] Interaction between orterun and user program

2015-09-16 Thread Kay Khandan (Hamed)
Hello everyone,

My name is Kay. I’m a huge "oom-pi" fan, but only recently have I been looking 
at it from a devel perspective.

I would appreciate it if somebody could point me to an entry point for 
understanding how orterun and the user program interact, and more importantly 
how to change the way they interact.

The reason: I am writing a plugin for MPI support in another message-passing 
system. This plugin is loaded from a dynamic library some time after the 
process has started and runs on a separate thread. Therefore, (1) it does not 
receive any command-line arguments, and (2) it is not allowed to use the 
standard pipes (file descriptors 0, 1, 2). With that in mind, I'd like to 
interface this plugin from inside the so-called ARE (the name of the runtime 
environment for this particular message-passing system) to our old friend 
ORTE. I have the option of running "are" as a user program under orterun:

$ orterun are ./actual-user-program

It might be wishful thinking, but I am also kind of hoping that I could get 
orterun out of the way altogether by embedding part of its implementation 
directly inside that plugin.

I'd appreciate hearing your insights.

Best,
— Kay

Re: [OMPI devel] Commit 6e6a3e96

2015-09-16 Thread Gilles Gouaillardet
George,

I will revisit this.
If I added the const modifier where it is not required by the standard, that
was not intentional; it was a mistake.

thanks for the report

Gilles

On Wednesday, September 16, 2015, George Bosilca 
wrote:

> Gilles,
>
> Your commit 6e6a3e96 is only partially correct. There is no point arguing
> about the correctness of the const keyword for the send buffer. I can also
> understand your willingness to diverge from the MPI standard in order to
> fix the interface for irecv_init. But there is definitely no reason to
> have const for:
> - the receive buffer of any receive function
> - the free buffer (mca_allocator_*_free)
>
> Please revise your patch.
>
> Thanks,
>   George.
>
>