Re: [OMPI devel] ompi-dmtcp configure options

2011-02-03 Thread Jeff Squyres
On Feb 2, 2011, at 10:38 AM, Javier Martinez Canillas wrote:

> Yes, I'm really interested in this development branch even if it's
> still unstable. We are using OpenMPI + BLCR now but have a few
> portability issues, since we do not have permission to load kernel
> modules on many of the systems where our application is supposed to run.
> So mtcp in this context fits like a glove. I will keep an eye on this
> repo and help in any way I can.

Ok, thanks.  Work is progressing there; I got word from the MTCP guys the other 
day that it's generally "working", but there are still some subtle problems 
creeping in at run-time.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-02-03 Thread Brad Benton
What IP interfaces are configured on the cluster?  In particular, are there
IPoIB interfaces that are configured?  If you use the dynamic connection
method but restrict either the number or type of IP interfaces to be used
via oob_tcp_if_{include,exclude}, do you still see the problem?

--brad


On Wed, Jan 26, 2011 at 12:14 AM, Doron Shoham wrote:

> Using the flag --mca mpi_preconnect_mpi seems to have solved the issue with the
> oob connection manager.
> This solution is not scalable, but it looks more and more like a connection
> establishment problem.
> I'm still trying to figure out what the root cause is and how to
> solve it.
> Any ideas would be more than welcome.
>
>
> Thanks,
> Doron
>
> On Tue, Jan 18, 2011 at 3:29 PM, Terry Dontje wrote:
>
>>  On 01/18/2011 07:48 AM, Jeff Squyres wrote:
>>
>> IBCM is broken and disabled (has been for a long time).
>>
>> Did you mean RDMACM?
>>
>>
>>
>> No I think I meant OMPI oob.
>>
>> sorry,
>>
>> --
>>
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle *- Performance Technologies*
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.don...@oracle.com
>>
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
>
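
On the "not scalable" remark above: a rough sketch of the arithmetic,
assuming that preconnecting wires up a connection between every pair of
MPI processes during MPI_Init. The function name is hypothetical and
this is not Open MPI code; it only counts pairs.

```c
#include <assert.h>

/*
 * Number of pairwise connections in a fully connected job of n
 * processes: n * (n - 1) / 2.  Hypothetical helper for illustration.
 */
unsigned long long full_mesh_connections(unsigned long long n)
{
    /* Keep n small enough that n * (n - 1) cannot overflow 64 bits. */
    assert(n <= 0xFFFFFFFFULL);
    return n * (n - 1) / 2;
}
```

Under that assumption the connection count grows quadratically with the
job size, which is why preconnecting works as a diagnostic on a small
run but is unattractive as a permanent fix on large ones.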


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356

2011-02-03 Thread Jeff Squyres
Eugene --

This ROMIO fix needs to go upstream.




On Feb 3, 2011, at 6:53 PM, eug...@osl.iu.edu wrote:

> Author: eugene
> Date: 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> New Revision: 24356
> URL: https://svn.open-mpi.org/trac/ompi/changeset/24356
> 
> Log:
> Some minor changes to help the openib BTL build and run on Solaris:
> - poll() can return POLLRDNORM even if not requested (Solaris bug)
> - MIN macro not defined in btl_openib.c
>  and while we're at it, we clean up the MIN definition in ad_bgl_pset.h
> - btl_openib_connect_rdmacm.c was calling rdma_destroy_id() twice
>  leading to undefined behavior (a hang on Solaris)
> 
> Text files modified:
>   trunk/ompi/mca/btl/openib/btl_openib.c                        | 3 +++
>   trunk/ompi/mca/btl/openib/btl_openib_async.c                  | 7 +++++++
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c | 6 +++++-
>   trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h       | 2 +-
>   4 files changed, 16 insertions(+), 2 deletions(-)
> 
> Modified: trunk/ompi/mca/btl/openib/btl_openib.c
> ==
> --- trunk/ompi/mca/btl/openib/btl_openib.c (original)
> +++ trunk/ompi/mca/btl/openib/btl_openib.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -70,6 +70,9 @@
> #ifdef HAVE_UNISTD_H
> #include <unistd.h>
> #endif
> +#ifndef MIN
> +#define MIN(a,b) ((a)<(b)?(a):(b))
> +#endif
> 
> mca_btl_openib_module_t mca_btl_openib_module = {
> {
> 
> Modified: trunk/ompi/mca/btl/openib/btl_openib_async.c
> ==
> --- trunk/ompi/mca/btl/openib/btl_openib_async.c (original)
> +++ trunk/ompi/mca/btl/openib/btl_openib_async.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -432,6 +432,13 @@
> /* no events */
> break;
> case POLLIN:
> +#if defined(__SVR4) && defined(__sun)
> +/*
> + * Need workaround for Solaris IB user verbs since
> + * "Poll on IB async fd returns POLLRDNORM revent even though it is masked out"
> + */
> +case POLLIN | POLLRDNORM:
> +#endif
> /* Processing our event */
> if (0 == i) {
> /* 0 poll we use for comunication with main thread */
> 
> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c
> ==
> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c (original)
> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -1922,7 +1922,11 @@
> return OMPI_SUCCESS;
> 
> out5:
> -rdma_destroy_id(context->id);
> +/*
> + * Since rdma_create_id() succeeded, we need "rdma_destroy_id(context->id)".
> + * But don't do it here since it's part of out4:OBJ_RELEASE(context),
> + * and we don't want to do it twice.
> + */
> out4:
> opal_list_remove_first(&(server->ids));
> OBJ_RELEASE(context);
> 
> Modified: trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h
> ==
> --- trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h (original)
> +++ trunk/ompi/mca/io/romio/romio/adio/ad_bgl/ad_bgl_pset.h 2011-02-03 18:53:21 EST (Thu, 03 Feb 2011)
> @@ -47,7 +47,7 @@
> 
> 
> #undef MIN
> -#define MIN(a,b) ((a
> +#define MIN(a,b) ((a)<(b) ? (a) : (b))
> 
> 
> /* Default is to choose 8 aggregator nodes in each 32 CN pset. 
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
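
For reference, the POLLRDNORM item in the log comes down to how revents
is tested. The commit adds an extra case label on Solaris; the sketch
below shows the underlying issue with an exact-match test next to a
bit-mask test, which is a different technique from the one committed.
The function names are hypothetical, and it assumes a platform where
POLLRDNORM is a separate bit from POLLIN (true on Linux and Solaris).

```c
#define _XOPEN_SOURCE 700   /* request POLLRDNORM on strict POSIX builds */
#include <assert.h>
#include <poll.h>

/* Exact match: behaves like a `case POLLIN:` label in a switch on revents. */
int readable_exact(short revents)
{
    return revents == POLLIN;
}

/* Bit-mask test: true whenever POLLIN is set, whatever else is set. */
int readable_masked(short revents)
{
    return (revents & POLLIN) != 0;
}

/* Aborts if the two tests do not behave as described above. */
void check_revents_tests(void)
{
    short plain = POLLIN;
    short extra = POLLIN | POLLRDNORM;   /* what the Solaris poll() reports */

    assert(readable_exact(plain) && readable_masked(plain));
    assert(!readable_exact(extra));      /* exact match misses the event */
    assert(readable_masked(extra));      /* mask test still sees POLLIN */
    assert(!readable_masked(0));         /* no events at all */
}
```

The exact match silently drops the event once an unrequested bit shows
up, which matches the symptom the Solaris workaround addresses.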


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
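
The rdma_destroy_id() item in the log above is a cleanup-ownership
issue: per the new comment, releasing the context already destroys the
id, so the error path must not destroy it a second time. A minimal
sketch of that pattern follows. Every name in it is a hypothetical
stand-in; it does not use librdmacm or the OPAL object system.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an rdma_cm_id and the context owning it. */
struct fake_id { int unused; };
struct fake_context { struct fake_id *id; };

static int destroy_calls = 0;   /* how often fake_destroy_id() has run */

static void fake_destroy_id(struct fake_id *id)
{
    destroy_calls++;
    free(id);
}

/* Destructor: releasing the context also destroys the id it owns. */
static void fake_context_release(struct fake_context *ctx)
{
    if (ctx->id != NULL) {
        fake_destroy_id(ctx->id);
        ctx->id = NULL;
    }
    free(ctx);
}

/* Returns 0 on success, -1 on failure; fail_late simulates a late error. */
int setup_listener(int fail_late)
{
    struct fake_context *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) {
        return -1;
    }
    ctx->id = calloc(1, sizeof(*ctx->id));
    if (ctx->id == NULL) {
        goto out4;
    }
    if (fail_late) {
        goto out5;
    }
    fake_context_release(ctx);   /* normal teardown, to keep the demo leak-free */
    return 0;

out5:
    /* No fake_destroy_id(ctx->id) here: the release below already does it. */
out4:
    fake_context_release(ctx);
    return -1;
}

/* Aborts unless the late-failure path destroys the id exactly once. */
void check_single_destroy(void)
{
    destroy_calls = 0;
    assert(setup_listener(1) == -1);
    assert(destroy_calls == 1);
}
```

Putting the destroy call back under out5 would free the same id twice,
which is the undefined behavior the log describes.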




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356

2011-02-03 Thread Eugene Loh

Jeff Squyres wrote:

> Eugene --
>
> This ROMIO fix needs to go upstream.

Makes sense.  Whom do I pester about that?  Is r24356 (and now CMR 2712) 
okay as is?  The ROMIO change is an unimportant stylistic change, so I'm 
okay cutting it loose from the other changes in the putback.


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24356

2011-02-03 Thread Jeff Squyres
I think the fix is fine as it is -- you're right; it's bad style and your macro 
is clearly more robust.
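
To spell out "more robust": a small sketch of what goes wrong when macro
arguments are not parenthesized. MIN_UNSAFE is a hypothetical example
written for this illustration, not the old ROMIO definition; MIN_SAFE
has the same shape as the macro added to btl_openib.c in r24356.

```c
#include <assert.h>

/* Hypothetical unparenthesized form, for illustration only. */
#define MIN_UNSAFE(a, b) (a < b ? a : b)

/* Fully parenthesized form, same shape as the MIN added in r24356. */
#define MIN_SAFE(a, b) ((a) < (b) ? (a) : (b))

/*
 * Intended result: the smaller of (x & 3) and 2.
 * The unsafe macro expands to (x & 3 < 2 ? x & 3 : 2).  Because '<'
 * binds tighter than '&', the condition becomes x & (3 < 2), which is
 * x & 0, so the result is always 2.
 */
int min_unsafe_masked(int x)
{
    return MIN_UNSAFE(x & 3, 2);
}

/* Same computation with the parenthesized macro. */
int min_safe_masked(int x)
{
    return MIN_SAFE(x & 3, 2);
}

/* Aborts if the two macros do not differ as described above. */
void check_min_macros(void)
{
    assert(min_safe_masked(1) == 1);     /* min(1, 2) */
    assert(min_unsafe_masked(1) == 2);   /* wrong answer from precedence */
    assert(min_safe_masked(7) == 2);     /* min(3, 2) */
}
```

Both forms still evaluate an argument twice, so neither is safe with
side effects such as MIN_SAFE(i++, n); the parentheses only fix the
precedence problem.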

Actually, we should probably wait and lump your fix into more ROMIO fixes to go 
upstream -- I'm getting a *lot* of compiler warnings with this new ROMIO 
version.  I haven't looked into them yet.


On Feb 3, 2011, at 7:54 PM, Eugene Loh wrote:

> Jeff Squyres wrote:
> 
>> Eugene --
>> 
>> This ROMIO fix needs to go upstream.
>> 
> Makes sense.  Whom do I pester about that?  Is r24356 (and now CMR 2712) okay 
> as is?  The ROMIO change is an unimportant stylistic change, so I'm okay 
> cutting it loose from the other changes in the putback.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/