George --

I believe that this is the subject of a few long-standing tickets (i.e., what 
to do when we run out of registered memory -- right now, we hang, for a few 
reasons).  I think that this is Mellanox's attempt to at least warn the user 
that we have run out of registered memory and will therefore hang.

Once the hangs have been fixed, I'm assuming this message can be removed.

Note, too, that this is in the BTL registration code (openib_reg_mr), not in 
the code that the PML invokes directly.  So it's the mpool's fault -- not the 
PML's fault.
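
To make the retry path George describes below concrete, here is a minimal sketch in C of "out of resources is not an error": the lower layer reports that it has no registration slot, and the upper layer parks the request on a pending list and re-issues it later.  Every name here (sketch_pool_t, sketch_register, sketch_send_start, sketch_progress_pending, the SKETCH_* codes) is hypothetical and made up for illustration; none of this is the actual openib BTL, mpool, or ob1 code, and the "pool" is just a counter standing in for registered memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; stand-ins only, not the real OMPI constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

#define SKETCH_MAX_PENDING 16

/* Toy model of registered memory: a count of free registration slots. */
typedef struct {
    int free_slots;
} sketch_pool_t;

/* FIFO ring buffer of request ids that are waiting for a slot. */
typedef struct {
    int    pending[SKETCH_MAX_PENDING];
    size_t head;
    size_t count;
} sketch_pending_t;

/* Lower layer: try to take one registration slot. */
static int sketch_register(sketch_pool_t *pool)
{
    assert(pool != NULL);
    if (pool->free_slots == 0) {
        return SKETCH_ERR_OUT_OF_RESOURCE;  /* retryable, not fatal */
    }
    pool->free_slots--;
    return SKETCH_SUCCESS;
}

/* Upper layer: start a request; if no slot is available, park the request.
 * If the pending list itself is full, the request is not parked and the
 * caller only sees the out-of-resource return code. */
static int sketch_send_start(sketch_pool_t *pool, sketch_pending_t *q,
                             int req_id)
{
    int rc = sketch_register(pool);
    if (rc == SKETCH_ERR_OUT_OF_RESOURCE && q->count < SKETCH_MAX_PENDING) {
        q->pending[(q->head + q->count) % SKETCH_MAX_PENDING] = req_id;
        q->count++;
    }
    return rc;
}

/* Later: re-issue parked requests in FIFO order until the pool runs dry
 * again.  Returns how many requests were restarted. */
static int sketch_progress_pending(sketch_pool_t *pool, sketch_pending_t *q)
{
    int restarted = 0;
    while (q->count > 0 && sketch_register(pool) == SKETCH_SUCCESS) {
        q->head = (q->head + 1) % SKETCH_MAX_PENDING;
        q->count--;
        restarted++;
    }
    return restarted;
}
```

In this toy model, a parked request is only restarted once something else gives a slot back to the pool; if nothing ever does, it simply stays on the pending list.  That is only a way to picture the situation, not a statement about why the real code hangs.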



On Mar 6, 2012, at 10:05 AM, George Bosilca wrote:

> I didn't check the code thoroughly, but OMPI_ERR_OUT_OF_RESOURCE is not an 
> error. If the registration returns out of resources, the BTL will return 
> OUT_OF_RESOURCE (for example, via mca_btl_openib_prepare_src). At the 
> upper level, the PML (in the mca_pml_ob1_send_request_start function) 
> intercepts it and inserts the request into a pending list. Later on, this 
> pending list will be examined and the request for resources re-issued.
> 
> Why do we need to trigger a BTL_ERROR for OUT_OF_RESOURCE?
> 
>   george.
> 
> On Mar 6, 2012, at 09:48 , Jeffrey Squyres wrote:
> 
> > Mike --
> >
> > I would make this error a bit better.  I.e., use orte_show_help(), so 
> > you can explain the issue more and also suppress duplicates (i.e., if it 
> > fails to register multiple times).
> >
> >
> > On Mar 6, 2012, at 8:25 AM, mi...@osl.iu.edu wrote:
> >
> >> Author: miked
> >> Date: 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
> >> New Revision: 26106
> >> URL: https://svn.open-mpi.org/trac/ompi/changeset/26106
> >>
> >> Log:
> >> print error which is ignored on upper layer
> >> Text files modified:
> >>  trunk/ompi/mca/btl/openib/btl_openib_component.c |     2 ++
> >>  1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
> >> ==============================================================================
> >> --- trunk/ompi/mca/btl/openib/btl_openib_component.c (original)
> >> +++ trunk/ompi/mca/btl/openib/btl_openib_component.c 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
> >> @@ -569,6 +569,8 @@
> >>    openib_reg->mr = ibv_reg_mr(device->ib_pd, base, size, access_flag);
> >>
> >>    if (NULL == openib_reg->mr) {
> >> +        BTL_ERROR(("%s: error pinning openib memory errno says %s",
> >> +                       __func__, strerror(errno)));
> >>        return OMPI_ERR_OUT_OF_RESOURCE;
> >>    }
> >>
> >> _______________________________________________
> >> svn-full mailing list
> >> svn-f...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

