Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-17 Thread Алексей Рыжих
George,

Thank you for the response.

In my opinion, our solution with a do/while() loop in OMPI_FREE_LIST_GET_MT
is better for our MPI+OpenMP hybrid application than using
OMPI_FREE_LIST_WAIT_MT, because with OMPI_FREE_LIST_WAIT_MT an MPI_Irecv()
will be suspended in opal_progress() until one of the MPI_Irecv() requests
from another thread completes.

Also, this is not a case of the list having reached its free_list_max_num
limit. What happens is that the other threads consume all of the items from
the free list before the thread running ompi_free_list_grow() has finished,
so the thread executing ompi_free_list_grow() gets NULL.
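
For context, a minimal sketch of the usage pattern being described here (this
is not the reporters' actual test; thread and request counts are arbitrary,
and every rank is assumed to run the same number of OpenMP threads):

    /* Many OpenMP threads posting MPI_Irecv() concurrently under
     * MPI_THREAD_MULTIPLE, which stresses the ob1 receive-request free list. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define REQS_PER_THREAD 1024

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int peer = (rank + 1) % size;            /* simple send/recv ring */

        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int buf[REQS_PER_THREAD], out = tid;
            MPI_Request req[REQS_PER_THREAD];

            /* Every thread floods the receive path with nonblocking receives. */
            for (int i = 0; i < REQS_PER_THREAD; i++)
                MPI_Irecv(&buf[i], 1, MPI_INT, MPI_ANY_SOURCE, tid,
                          MPI_COMM_WORLD, &req[i]);

            /* Matching sends (tagged by thread id) so the requests complete. */
            for (int i = 0; i < REQS_PER_THREAD; i++)
                MPI_Send(&out, 1, MPI_INT, peer, tid, MPI_COMM_WORLD);

            MPI_Waitall(REQS_PER_THREAD, req, MPI_STATUSES_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }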



Sorry for my poor English.



Alexey.



*From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George
Bosilca
*Sent:* Wednesday, September 16, 2015 10:18 PM
*To:* Open MPI Developers
*Subject:* Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



On Wed, Sep 16, 2015 at 3:11 PM, Владимир Трущин wrote:

Sorry, “We saw the following problem in OMPI_FREE_LIST_GET_MT…”.



That's exactly what the WAIT macro is supposed to solve: it waits (growing
the freelist and calling opal_progress) until an item becomes available.



  George.







*From:* Владимир Трущин [mailto:vdtrusc...@compcenter.org]
*Sent:* Wednesday, September 16, 2015 10:09 PM
*To:* 'Open MPI Developers'
*Subject:* RE: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



George,



You are right. The sequence of calls in our test is MPI_Irecv ->
mca_pml_ob1_irecv -> MCA_PML_OB1_RECV_REQUEST_ALLOC. We will try to use
OMPI_FREE_LIST_WAIT_MT.



We saw the following problem in OMPI_FREE_LIST_WAIT_MT: it returned NULL
when thread A was suspended right after its call to ompi_free_list_grow. In
the meantime, the other threads took all of the items from the free list at
the first call of opal_atomic_lifo_pop in the macro. So when thread A resumed
and called the second opal_atomic_lifo_pop in the macro, it got NULL.
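
The window being described can be reproduced outside Open MPI. Below is a
minimal standalone sketch (plain pthreads, not Open MPI code) in which an
atomic counter stands in for the LIFO: one thread refills the pool, the other
threads drain it with lock-free pops, and the refilling thread's own pop then
comes back empty:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int pool;              /* items currently on the "free list" */

    static int try_pop(void)             /* 1 = got an item, 0 = list empty    */
    {
        int n = atomic_load(&pool);
        while (n > 0)
            if (atomic_compare_exchange_weak(&pool, &n, n - 1))
                return 1;
        return 0;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        while (try_pop())                /* drain whatever the grower added    */
            ;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];

        atomic_fetch_add(&pool, 8);      /* "thread A" grows the list, then is */
        for (int i = 0; i < 4; i++)      /* preempted while the other threads  */
            pthread_create(&t[i], NULL, consumer, NULL);   /* pop every item   */
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);

        /* Thread A finally performs its own pop and finds nothing left. */
        printf("grower's pop %s\n", try_pop() ? "succeeded" : "returned NULL");
        return 0;
    }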



Best regards,

Vladimir.



*From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George Bosilca
*Sent:* Wednesday, September 16, 2015 7:00 PM
*To:* Open MPI Developers
*Subject:* Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()



Alexey,



This is not necessarily the fix for all cases. Most of the internal uses of
the free_list can easily accommodate the fact that no more elements are
available. Based on your description of the problem I would assume you
encounter it when MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this
particular case the problem is the fact that we call OMPI_FREE_LIST_GET_MT
and that the upper level is unable to correctly deal with the case where the
returned item is NULL. Here the real fix is to use the blocking version of
the free_list accessor (similar to the send case), OMPI_FREE_LIST_WAIT_MT.
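
For reference, a rough sketch of what that difference looks like at a call
site (illustration only: "recv_request_list" is a placeholder for the real
ob1 free list, and the fragment assumes the v1.8-era ompi_free_list headers
rather than being a drop-in patch):

    ompi_free_list_item_t *item;

    /* Non-blocking accessor: the caller must cope with NULL itself. */
    OMPI_FREE_LIST_GET_MT(recv_request_list, item);
    if (OPAL_UNLIKELY(NULL == item)) {
        return OMPI_ERR_TEMP_OUT_OF_RESOURCE;   /* the error the reporters hit */
    }

    /* Blocking accessor: grows the list and drives progress until an item
     * becomes available, so no NULL check is needed afterwards. */
    OMPI_FREE_LIST_WAIT_MT(recv_request_list, item);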





It is also possible that I misunderstood your problem. If the solution
above doesn't work, can you describe exactly where the NULL return of
OMPI_FREE_LIST_GET_MT is creating an issue?



George.





On Wed, Sep 16, 2015 at 9:03 AM, Алексей Рыжих wrote:

Hi all,

We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
support level) in which several threads submit a lot of MPI_Irecv() requests
simultaneously, and encountered an intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE
error after MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
returned NULL. Investigating this bug, we found that sometimes the thread
calling ompi_free_list_grow() has no free items left in the LIFO list on
exit, because the other threads retrieved all of the new items via
opal_atomic_lifo_pop().

So we suggest changing OMPI_FREE_LIST_GET_MT() as below:



    #define OMPI_FREE_LIST_GET_MT(fl, item)                                  \
    {                                                                        \
        item = (ompi_free_list_item_t*)                                      \
            opal_atomic_lifo_pop(&((fl)->super));                            \
        if( OPAL_UNLIKELY(NULL == item) ) {                                  \
            if(opal_using_threads()) {                                       \
                int rc;                                                      \
                opal_mutex_lock(&((fl)->fl_lock));                           \
                do {                                                         \
                    rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);  \
                    if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break;            \
                    item = (ompi_free_list_item_t*)                          \
                        opal_atomic_lifo_pop(&((fl)->super));                \
                } while (!item);                                             \
                opal_mutex_unlock(&((fl)->fl_lock));                         \
            } else {                                                         \
                ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);           \
                item

Re: [OMPI devel] --enable-spare-groups build broken

2015-09-17 Thread Nathan Hjelm

No, it was not. Will fix.

-Nathan

On Wed, Sep 16, 2015 at 07:26:58PM -0700, Ralph Castain wrote:
>Yes - Nathan made some changes related to the add_procs code. I doubt that
>configure option was checked...
>On Wed, Sep 16, 2015 at 7:13 PM, Jeff Squyres (jsquyres) wrote:
> 
>  Did something change in the group structure in the last 24-48 hours?
> 
>  --enable-spare-groups groups are currently broken:
> 
>  
>  make[2]: Entering directory `/home/jsquyres/git/ompi/ompi/debuggers'
>CC   libdebuggers_la-ompi_debuggers.lo
>  In file included from ../../ompi/communicator/communicator.h:38:0,
>   from ../../ompi/mca/pml/base/pml_base_request.h:32,
>   from ompi_debuggers.c:67:
>  ../../ompi/group/group.h: In function `ompi_group_get_proc_ptr':
>  ../../ompi/group/group.h:366:52: error: `peer_id' undeclared (first use
>  in this function)
>   return ompi_group_dense_lookup (group, peer_id, allocate);
>  ^
>  ../../ompi/group/group.h:366:52: note: each undeclared identifier is
>  reported only once for each function it appears in
>  -
> 
>  Can someone have a look?
> 
>  Thanks.
>  --
>  Jeff Squyres
>  jsquy...@cisco.com
>  For corporate legal information go to:
>  http://www.cisco.com/web/about/doing_business/legal/cri/
> 
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2015/09/18056.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18057.php





Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-17 Thread Jeff Squyres (jsquyres)
On Sep 16, 2015, at 12:02 PM, George Bosilca  wrote:
> 
> ./opal/mca/btl/usnic/btl_usnic_compat.h:161:OMPI_FREE_LIST_GET_MT(list, 
> (item))

FWIW: This one exists because we use the same usnic BTL code between master and 
v1.8/v1.10.  We have some configury that figures out in which tree the usNIC 
BTL is being compiled, and reacts accordingly.

Hence, this OMPI_FREE_LIST_GET_MT is only used when compiling in v1.8/v1.10, 
and is ignored in master/v2.x.
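
As an illustration of that pattern (this is not the actual contents of
btl_usnic_compat.h: the guard macro and the master-side accessor shown here
are assumptions), one way such a version shim can look is:

    #if USNIC_BTL_IN_V18    /* compiling inside the v1.8 / v1.10 tree */
    #  define USNIC_COMPAT_FREE_LIST_GET(list, item) \
              OMPI_FREE_LIST_GET_MT(list, (item))
    #else                   /* master / v2.x: free lists moved down into OPAL */
    #  define USNIC_COMPAT_FREE_LIST_GET(list, item) \
              (item) = opal_free_list_get_mt(list)
    #endif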

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] regression running mpi applications with dvm

2015-09-17 Thread Mark Santcroos
Hi (Ralph),

Over the last months I have been focussing on exec throughput, and not so much 
on the application payload (read: mainly using /bin/sleep ;-)
As things are stabilising now, I returned my attention to "real" applications,
only to discover that launching MPI applications (built with the same Open MPI 
version) within a DVM doesn't work anymore (see error below).

I've been doing a binary search, but that turned out to be not so trivial 
because of other problems in the window of the change.
So far I've narrowed it down to:

64ec498 - March 5 - works on my laptop (but not on the Crays)
b67b361 - March 28 - works once per DVM launch on my laptop, but consecutive 
orte-submits get a connect error
b209c9e - March 30 - same MPI_Init issue as HEAD

Going further into mid-March I ran into build issues with verbs, runtime issues 
with default binding complaining about missing libnumactl, runtime tcp oob 
errors, etc.
So I don't know whether the binary search will yield much more than I was able 
to dig up now.

What can I do to get closer to debugging the actual issue?

Thanks!

Mark


OMPI_PREFIX=/Users/mark/proj/openmpi/installed/HEAD
OMPI_MCA_orte_hnp_uri=723386368.0;usock;tcp://192.168.0.103:56672
OMPI_MCA_ess=tool
[netbook:70703] Job [11038,3] has launched
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[netbook:70704] Local abort before MPI_INIT completed completed successfully, 
but am not able to aggregate error messages, and not able to guarantee that all 
other processes were killed!



Re: [OMPI devel] regression running mpi applications with dvm

2015-09-17 Thread Ralph Castain
Ouch - this is on current master HEAD? I'm on travel right now, but I'll be
back Fri evening and can look at it this weekend. Probably something silly
that needs to be fixed.




Re: [OMPI devel] regression running mpi applications with dvm

2015-09-17 Thread Mark Santcroos

> On 17 Sep 2015, at 20:34 , Ralph Castain  wrote:
> 
> Ouch - this is on current master HEAD?

Yep!

> I'm on travel right now, but I'll be back Fri evening and can look at it this 
> weekend. Probably something silly that needs to be fixed.

Thanks!

Obviously I didn't check every single version between March and now, but it's 
safe to assume that it didn't work in between either, I guess.





Re: [OMPI devel] regression running mpi applications with dvm

2015-09-17 Thread Ralph Castain
Might not - there has been a very large amount of change over the last few
months, and I confess I haven't been checking the DVM regularly. So let me
take a step back and look at that code.

I'll also include the extensions you requested in the other email - I
didn't forget them, I've just been somewhat overwhelmed lately.




Re: [OMPI devel] regression running mpi applications with dvm

2015-09-17 Thread Mark Santcroos

> On 17 Sep 2015, at 20:48 , Ralph Castain  wrote:
> Might not - there has been a very large amount of change over the last few 
> months, and I confess I haven't been checking the DVM regularly. So let me 
> take a step back and look at that code.

Ok.

> I'll also include the extensions you requested on the other email - I didn't 
> forget them, just somewhat overwhelmed lately

Don't worry too much about these, at least not in the short term; I actually 
worked around those ... I still have to reply to that mail though, let me do 
that straight away!



[OMPI devel] papers/reports about Open MPI collective algorithms

2015-09-17 Thread Dahai Guo
Hi,
Are there any technical reports or papers summarizing the collective algorithms 
used in Open MPI, such as MPI_Barrier, MPI_Bcast, and MPI_Alltoall?
Dahai

Re: [OMPI devel] Interaction between orterun and user program

2015-09-17 Thread Jeff Squyres (jsquyres)
Ralph is the guy who needs to answer this for you -- he's on travel at the 
moment; his response may be a little delayed...


> On Sep 16, 2015, at 4:17 AM, Kay Khandan (Hamed)  wrote:
> 
> Hello everyone,
> 
> My name is Kay. I’m a huge "oom-pi" fan, but only recently have I been 
> looking at it from a devel perspective.
> 
> I would appreciate it if somebody could show me the entry point into 
> understanding how orterun and the user program interact, and more importantly 
> how to change the way they interact.
> 
> The reason: I am making a plugin for MPI support in another message passing 
> system. This plugin is loaded from a dynamic library sometime after the 
> process is started and is run on a separate thread. Therefore, (1) it does 
> not receive any command line arguments, and (2) it is not allowed to use 
> standard pipes (file descriptors 0, 1, 2). With that in mind, I’d like to 
> interface this plugin from inside the so-called ARE (which is the name of the 
> runtime environment for this particular message passing system) to our old 
> friend ORTE. I have the option to run “are” as a user program run by orterun.
> 
> $orterun are ./actual-user-program 
> 
> It might be wishful thinking, but I am also kinda hoping that I could get 
> orterun out of the way altogether by embedding a part of its implementation 
> directly inside that plugin.
> 
> I’d appreciate hearing your insights.
> 
> Best,
> — Kay
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18038.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-17 Thread George Bosilca
Alexey,

There is a conceptual difference between GET and WAIT: one can return NULL
while the other cannot. If you want a solution with do {} while, I think the
best place for it is in the PML OB1 recv functions themselves (around the
OMPI_FREE_LIST_GET_MT call) and not inside the OMPI_FREE_LIST_GET_MT macro
itself.

  George.
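
A sketch of that caller-side alternative (not actual Open MPI code:
"recv_request_list" is a placeholder, and a real patch would also need to
hold fl_lock around the grow, as the macro itself does):

    ompi_free_list_item_t *item;
    int rc = OMPI_SUCCESS;

    do {
        OMPI_FREE_LIST_GET_MT(recv_request_list, item);
        if (OPAL_LIKELY(NULL != item))
            break;                          /* got a receive request */
        /* List temporarily empty: try to add more items, and stop retrying
         * only if growing fails (e.g. free_list_max_num was reached). */
        rc = ompi_free_list_grow(recv_request_list,
                                 recv_request_list->fl_num_per_alloc);
    } while (OMPI_SUCCESS == rc);

    if (OPAL_UNLIKELY(NULL == item)) {
        return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
    }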



Re: [OMPI devel] orte-dvm and orte_max_vm_size

2015-09-17 Thread Mark Santcroos
Hi Ralph,

Sorry for the late reply, something along the lines of "swamped" ;-)

> On 03 Sep 2015, at 16:04 , Ralph Castain  wrote:
> The purpose of orte_max_vm_size is to subdivide the allocation - i.e., for a 
> given mpirun execution, you can specify to only use a certain number of the 
> allocated nodes. If you want to further limit the VM to specific nodes in the 
> allocation, then you would use -host option.

*nods* Thanks, that's also how I interpreted it.

> It’s a little more complicated for your use-case as orte-dvm defines the VM, 
> not orte-submit. The latter simply tells orte-dvm to launch an application - 
> the daemons have already been established by orte-dvm and cannot change. So 
> if you want to setup orte-dvm and then submit to only some of the nodes, you 
> would have to use the -host option. Note that -host supports an extended 
> syntax for this purpose - you can ask for a specific number of “empty” nodes, 
> you can tell it to use only so many slots on a node, etc.

Ack. My question originated from running the dvm on a limited set.

> I’m confused by your examples because the max_vm_size values don’t seem 
> right. If you have a VM of size 1 or 2, then max_vm_size can only be 1 or 2. 
> You can’t have a max_vm_size larger than the number of available nodes. This 
> is probably the source of the problem you are seeing - I can add some 
> protection to ensure this doesn’t happen.

I screwed up my write-up, the actual calls were correct, but I understand your 
confusion :-)
(In my code I have a "reservation size", which I mixed up with the VM size in 
my original mail)

> We don’t appear to support either -host or -np as MCA params.
> I’m not sure -np would make sense,

I probably agree with that.

> but we could add a param for -host.

Yeah, that would help.

> We do have a param for the default hostfile, but that probably wouldn’t help 
> here.

I was expecting such a thing actually, that also raised my MCA question.

> We can certainly extend the orte-dvm and orte-submit cmd lines. I only 
> brought over a minimal set at first in order to get things running quickly, 
> but no problem with increasing capability. Just a question of finding a 
> little time.

Fully understandable!

> For ompi_info, try doing “ompi_info -l 9” to get the full output of params.

Right, I tried that. So either I don't understand it completely or it doesn't 
work as expected, as I don't manage to get e.g. "orte_max_vm_size" in the 
output from that.

(I also believe that -all sets the level to 9 already)

Thanks!

Mark


> 
> 
>> On Sep 3, 2015, at 5:08 AM, Mark Santcroos wrote:
>> 
>> Hi,
>> 
>> I've been running into a funny issue with using orte-dvm (Hi Ralph ;-) 
>> while trying to define the size of the created VM, and for that I use "--mca 
>> orte_max_vm_size", which in general seems to work.
>> 
>> In this example I have a PBS job of 4 nodes and want to run the DVM on < 4 
>> nodes.
>> If I create the VM with size 3 or 4 (max_vm_size 1 and 0 respectively) 
>> everything works as expected.
>> However, when I create a VM of size 1 or 2 (max_vm_size 3 and 2 
>> respectively) I get the stack trace below once I use orte-submit to start 
>> something within the VM.
>> 
>> [nid01280:02498] [[39239,0],0] orted:comm:process_commands() Processing 
>> Command: ORTE_DAEMON_SPAWN_JOB_CMD
>> orte-dvm: ../../../../../src/ompi/opal/class/opal_list.h:547: 
>> _opal_list_append: Assertion `0 == item->opal_list_item_refcount' failed.
>> [nid01280:02498] *** Process received signal ***
>> [nid01280:02498] Signal: Aborted (6)
>> [nid01280:02498] Signal code:  (-6)
>> [nid01280:02498] [ 0] /lib64/libpthread.so.0(+0xf810)[0x2ba3e274a810]
>> [nid01280:02498] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2ba3e298b885]
>> [nid01280:02498] [ 2] /lib64/libc.so.6(abort+0x181)[0x2ba3e298ce61]
>> [nid01280:02498] [ 3] /lib64/libc.so.6(__assert_fail+0xf0)[0x2ba3e2984740]
>> [nid01280:02498] [ 4] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/lib/libopen-rte.so.0(+0x83f16)[0x2ba3e1687f16]
>> [nid01280:02498] [ 5] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/lib/libopen-rte.so.0(orte_plm_base_setup_virtual_machine+0x473)[0x2ba3e16907fe]
>> [nid01280:02498] [ 6] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/lib/openmpi/mca_plm_alps.so(+0x274d)[0x2ba3e666574d]
>> [nid01280:02498] [ 7] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0xd81)[0x2ba3e198cee1]
>> [nid01280:02498] [ 8] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/bin/orte-dvm[0x402e20]
>> [nid01280:02498] [ 9] 
>> /lib64/libc.so.6(__libc_start_main+0xe6)[0x2ba3e2977c36]
>> [nid01280:02498] [10] 
>> /global/homes/m/marksant/openmpi/edison/installed/HEAD/bin/orte-dvm[0x401d19]
>> [nid01280:02498] *** End of error message ***
>> [nid05888:25419] 
>> [[39239,0],1]:../../../../../../src/ompi/orte/mca/errmgr/default_orted/errmgr_default_orted.c(251)
>>  updating exit status to 1
>