Re: [OMPI devel] ompi-dmtcp configure options
On Feb 2, 2011, at 10:38 AM, Javier Martinez Canillas wrote:

> Yes, I'm really interested in this development branch even if it's
> still unstable. We are using OpenMPI + BLCR now but have a few
> portability issues since we do not have permissions to load kernel
> modules on many systems that our application is supposed to run on. So
> mtcp in this context fits like a glove. I will keep an eye on this
> repo and help in any way I can.

Ok, thanks. Work is progressing there; I got word from the MTCP guys the other day that it's generally "working", but there are still some subtle problems creeping in at run-time.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] OMPI 1.4.3 hangs in gather
What IP interfaces are configured on the cluster? In particular, are any IPoIB interfaces configured?

If you use the dynamic connection method but restrict either the number or type of IP interfaces to be used via oob_tcp_if_{include,exclude}, do you still see the problem?

--brad

On Wed, Jan 26, 2011 at 12:14 AM, Doron Shoham wrote:

> Using the flag --mca mpi_preconnect_mpi seems to have solved the issue with the
> oob connection manager.
> This solution is not scalable, but it looks more and more like a connection
> establishment problem.
> I'm still trying to figure out what the root cause is and how to
> solve it.
> Any ideas will be more than welcome.
>
> Thanks,
> Doron
>
> On Tue, Jan 18, 2011 at 3:29 PM, Terry Dontje wrote:
>
>> On 01/18/2011 07:48 AM, Jeff Squyres wrote:
>>
>>> IBCM is broken and disabled (has been for a long time).
>>>
>>> Did you mean RDMACM?
>>
>> No I think I meant OMPI oob.
>>
>> sorry,
>>
>> --
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle *- Performance Technologies*
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.don...@oracle.com
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356
Eugene --

This ROMIO fix needs to go upstream.

On Feb 3, 2011, at 6:53 PM, eug...@osl.iu.edu wrote:

> Author: eugene
> Date: 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> New Revision: 24356
> URL: https://svn.open-mpi.org/trac/ompi/changeset/24356
>
> Log:
> Some minor changes to help the openib BTL build and run on Solaris:
> - poll() can return POLLRDNORM even if not requested (Solaris bug)
> - MIN macro not defined in btl_openib.c
>   and while we're at it, we clean up the MIN definition in ad_bgl_pset.h
> - btl_openib_connect_rdmacm.c was calling rdma_destroy_id() twice
>   leading to undefined behavior (a hang on Solaris)
>
> Text files modified:
>    trunk/ompi/mca/btl/openib/btl_openib.c                        | 3 +++
>    trunk/ompi/mca/btl/openib/btl_openib_async.c                  | 7 +++
>    trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c | 6 +-
>    trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h       | 2 +-
>    4 files changed, 16 insertions(+), 2 deletions(-)
>
> Modified: trunk/ompi/mca/btl/openib/btl_openib.c
> ==
> --- trunk/ompi/mca/btl/openib/btl_openib.c (original)
> +++ trunk/ompi/mca/btl/openib/btl_openib.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -70,6 +70,9 @@
>  #ifdef HAVE_UNISTD_H
>  #include
>  #endif
> +#ifndef MIN
> +#define MIN(a,b) ((a)<(b)?(a):(b))
> +#endif
>
>  mca_btl_openib_module_t mca_btl_openib_module = {
>      {
>
> Modified: trunk/ompi/mca/btl/openib/btl_openib_async.c
> ==
> --- trunk/ompi/mca/btl/openib/btl_openib_async.c (original)
> +++ trunk/ompi/mca/btl/openib/btl_openib_async.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -432,6 +432,13 @@
>                  /* no events */
>                  break;
>              case POLLIN:
> +#if defined(__SVR4) && defined(__sun)
> +                /*
> +                 * Need workaround for Solaris IB user verbs since
> +                 * "Poll on IB async fd returns POLLRDNORM revent even though it is masked out"
> +                 */
> +            case POLLIN | POLLRDNORM:
> +#endif
>                  /* Processing our event */
>                  if (0 == i) {
>                      /* 0 poll we use for comunication with main thread */
>
> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c
> ==
> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c (original)
> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -1922,7 +1922,11 @@
>      return OMPI_SUCCESS;
>
>  out5:
> -    rdma_destroy_id(context->id);
> +    /*
> +     * Since rdma_create_id() succeeded, we need "rdma_destroy_id(context->id)".
> +     * But don't do it here since it's part of out4:OBJ_RELEASE(context),
> +     * and we don't want to do it twice.
> +     */
>  out4:
>      opal_list_remove_first(&(server->ids));
>      OBJ_RELEASE(context);
>
> Modified: trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h
> ==
> --- trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h (original)
> +++ trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -47,7 +47,7 @@
>
>
>  #undef MIN
> -#define MIN(a,b) ((a
> +#define MIN(a,b) ((a)<(b) ? (a) : (b))
>
>
>  /* Default is to choose 8 aggregator nodes in each 32 CN pset.
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full

--
Jeff Squyres
jsquy...@cisco.com
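The POLLRDNORM item in the r24356 log comes down to how `revents` is tested: an exact-value match on `POLLIN` misses the case where the OS sets an extra bit. The sketch below is not the code from the commit (which adds a second `case` label under a Solaris `#if`); it only contrasts an exact match with a bit-mask test. Both helper names are hypothetical and do not exist in Open MPI.

```c
/* Ask for the XSI poll flags (POLLRDNORM) even under strict -std= modes. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <poll.h>

/* Hypothetical helper: exact-value match, in the style of a
 * "switch (revents) { case POLLIN: ... }" dispatch. */
static int exact_match_is_readable(short revents)
{
    switch (revents) {
    case POLLIN:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: bit-mask test that ignores any additional
 * bits (such as POLLRDNORM) reported alongside POLLIN. */
static int fd_is_readable(short revents)
{
    return (revents & POLLIN) != 0;
}

/* Small self-check showing where the two tests disagree. */
static void check_pollrdnorm_handling(void)
{
    short extra_bit = POLLIN | POLLRDNORM;

    assert(exact_match_is_readable(POLLIN));
    assert(!exact_match_is_readable(extra_bit)); /* event would be dropped */
    assert(fd_is_readable(POLLIN));
    assert(fd_is_readable(extra_bit));           /* event is still seen */
}
```

Whether a mask test would be appropriate in btl_openib_async.c depends on how the surrounding switch treats error bits such as POLLERR, which this sketch does not model.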
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356
Jeff Squyres wrote:

> Eugene --
>
> This ROMIO fix needs to go upstream.

Makes sense. Whom do I pester about that? Is r24356 (and now CMR 2712) okay as is? The ROMIO change is an unimportant stylistic change, so I'm okay cutting it loose from the other changes in the putback.
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356
I think the fix is fine as it is -- you're right; it's bad style and your macro is clearly more robust.

Actually, we should probably wait and lump your fix into more ROMIO fixes to go upstream -- I'm getting a *lot* of compiler warnings with this new ROMIO version. I haven't looked into them yet.

On Feb 3, 2011, at 7:54 PM, Eugene Loh wrote:

> Jeff Squyres wrote:
>
>> Eugene --
>>
>> This ROMIO fix needs to go upstream.
>
> Makes sense. Whom do I pester about that? Is r24356 (and now CMR 2712) okay
> as is? The ROMIO change is an unimportant stylistic change, so I'm okay
> cutting it loose from the other changes in the putback.

--
Jeff Squyres
jsquy...@cisco.com
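To illustrate why a fully parenthesized MIN is "more robust", here is a minimal sketch. `MIN_UNSAFE` is a hypothetical unparenthesized form written for this example (the old ad_bgl_pset.h definition is truncated in the archived diff, so it is not reproduced here); `MIN_SAFE` follows the form added in r24356. The wrapper function names are also made up for the example.

```c
#include <assert.h>

/* Hypothetical unparenthesized macro, for illustration only. */
#define MIN_UNSAFE(a,b) ((a < b ? a : b))

/* Same shape as the definition added to ad_bgl_pset.h in r24356. */
#define MIN_SAFE(a,b) ((a)<(b) ? (a) : (b))

/* Expands to: value & (mask < limit) ? (value & mask) : limit
 * because '<' binds tighter than '&'. Compilers may warn here. */
static int min_masked_unsafe(int value, int mask, int limit)
{
    return MIN_UNSAFE(value & mask, limit);
}

/* Expands to: (value & mask) < (limit) ? (value & mask) : (limit) */
static int min_masked_safe(int value, int mask, int limit)
{
    return MIN_SAFE(value & mask, limit);
}

/* Small self-check: the true minimum of (6 & 4) and 5 is 4. */
static void check_min_macros(void)
{
    assert(min_masked_safe(6, 4, 5) == 4);
    assert(min_masked_unsafe(6, 4, 5) == 5); /* wrong answer from precedence */
}
```

Neither form protects against arguments with side effects, since each argument can still be evaluated twice.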