The formatting of the code got mangled in the mail. Please send a diff and I will take a look. ompi_free_list no longer exists in master or in the next release branch, but the change may be worthwhile for the opal_free_list code.
-Nathan
On Wed, Sep 16, 2015 at 04:03:44PM +0300, Алексей Рыжих wrote:
> Hi all,
>
> We experimented with an MPI+OpenMP hybrid application
> (MPI_THREAD_MULTIPLE support level) in which several threads submit
> many MPI_Irecv() requests simultaneously, and we encountered an
> intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE error after
> MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
> returned NULL. Investigating this bug, we found that the thread
> calling ompi_free_list_grow() sometimes has no free items left in the
> LIFO list on exit, because other threads have already retrieved all
> the new items with opal_atomic_lifo_pop().
>
> So we suggest changing OMPI_FREE_LIST_GET_MT() as follows:
>
> #define OMPI_FREE_LIST_GET_MT(fl, item) \
> { \
>     item = (ompi_free_list_item_t*) \
>         opal_atomic_lifo_pop(&((fl)->super)); \
>     if( OPAL_UNLIKELY(NULL == item) ) { \
>         if(opal_using_threads()) { \
>             int rc; \
>             opal_mutex_lock(&((fl)->fl_lock)); \
>             do { \
>                 rc = ompi_free_list_grow((fl), \
>                                          (fl)->fl_num_per_alloc); \
>                 if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break; \
>                 item = (ompi_free_list_item_t*) \
>                     opal_atomic_lifo_pop(&((fl)->super)); \
>             } while (!item); \
>             opal_mutex_unlock(&((fl)->fl_lock)); \
>         } else { \
>             ompi_free_list_grow((fl), (fl)->fl_num_per_alloc); \
>             item = (ompi_free_list_item_t*) \
>                 opal_atomic_lifo_pop(&((fl)->super)); \
>         } /* opal_using_threads() */ \
>     } /* NULL == item */ \
> }
>
> Another workaround is to increase the value of the
> pml_ob1_free_list_inc parameter.
>
> Regards,
>
> Alexey
>
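
For anyone following the thread, the failure mode described above and the proposed retry loop can be modeled with a small single-threaded sketch. Everything named toy_* below is hypothetical and is not Open MPI code: the free list is a plain linked LIFO, and the steals_left counter stands in for other threads that pop every new item before the growing thread gets one. The real macro holds fl_lock around the loop; the lock is left out here because the simulation runs in one thread. The single-attempt variant is only an assumed model of the behavior reported in the post, not a copy of the existing macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Open MPI types. */
typedef struct toy_item { struct toy_item *next; } toy_item_t;

typedef struct {
    toy_item_t *head;   /* LIFO of free items */
    size_t per_alloc;   /* items added by each grow call */
    int steals_left;    /* grows whose items are all taken by "others" */
    int grow_calls;     /* how many times toy_grow() ran */
} toy_free_list_t;

/* Pop one item, or return NULL if the list is empty. */
static toy_item_t *toy_pop(toy_free_list_t *fl)
{
    toy_item_t *it = fl->head;
    if (it != NULL) fl->head = it->next;
    return it;
}

/* Add per_alloc items, then simulate competing threads draining them. */
static int toy_grow(toy_free_list_t *fl)
{
    /* With per_alloc == 0 the retry loop below could never finish. */
    assert(fl->per_alloc > 0);
    fl->grow_calls++;
    for (size_t i = 0; i < fl->per_alloc; i++) {
        toy_item_t *it = malloc(sizeof *it);
        if (it == NULL) return -1;      /* out of memory: report failure */
        it->next = fl->head;
        fl->head = it;
    }
    if (fl->steals_left > 0) {          /* others grab every new item */
        toy_item_t *it;
        fl->steals_left--;
        while ((it = toy_pop(fl)) != NULL) free(it);
    }
    return 0;
}

/* Assumed model of the reported behavior: grow once, pop once. */
static toy_item_t *toy_get_once(toy_free_list_t *fl)
{
    toy_item_t *it = toy_pop(fl);
    if (it == NULL && toy_grow(fl) == 0) it = toy_pop(fl);
    return it;
}

/* Proposed behavior: keep growing until a pop succeeds or grow fails. */
static toy_item_t *toy_get_retry(toy_free_list_t *fl)
{
    toy_item_t *it = toy_pop(fl);
    while (it == NULL) {
        if (toy_grow(fl) != 0) break;   /* give up only on a real failure */
        it = toy_pop(fl);
    }
    return it;
}
```

With steals_left set to 2, toy_get_once() returns NULL after a single grow, while toy_get_retry() returns an item on the third grow. The retry version only returns NULL when growing itself fails, which is the property the proposed macro is after.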
