Re: [OMPI devel] patch for building gm btl

2008-01-03 Thread Jeff Squyres

On Jan 3, 2008, at 5:13 PM, Patrick Geoffray wrote:


One thing I quickly learned with MTT is that there are only
24 hours in a day :-)



Sweet.  :-)

Be sure to ask any questions you have about MTT on the MTT users list (mtt-us...@open-mpi.org).


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] patch for building gm btl

2008-01-03 Thread Patrick Geoffray

Paul,

Paul H. Hargrove wrote:
discuss what tests we will run, but it will probably be a very minimal 
set.  Once we both have MTT set up and running GM tests, we should 
compare configs to avoid overlap (and thus increase coverage).


That would be great. I have only one 32-node 2G cluster I can use 
full-time for MTT testing for GM, MX, OpenMPI, MPICH{1,2}, HP-MPI, and 
many more.  One thing I quickly learned with MTT is that there are only 
24 hours in a day :-)


Patrick


Re: [OMPI devel] patch for building gm btl

2008-01-03 Thread Paul H. Hargrove

Patrick,

 Thanks for the info.
 Jeff and I are working (well Jeff is working anyway) to get MTT set up 
on my cluster to do MTT builds against both the GM-1.6.4 and GM-2.0.19 
libs I have installed.  While there is no current development at Myricom 
for GM, there are still folks with older hardware in the field who are 
using GM (and in my case will continue to do so until the hardware dies).
 We have only 2 nodes (GM-2.0.19) to run on and Jeff and I have yet to 
discuss what tests we will run, but it will probably be a very minimal 
set.  Once we both have MTT set up and running GM tests, we should 
compare configs to avoid overlap (and thus increase coverage).


-Paul

Patrick Geoffray wrote:

Hi Paul,

Paul H. Hargrove wrote:
  
The fact that this has gone unfixed for 2 months suggests to me that 
nobody is building the GM BTL.  So, how would I go about checking ...



  

a) ...if there exists any periodic build of the GM BTL via MTT?



We are deploying MTT on all our clusters. Right now, we use our own MTT 
server, but we will report a subset of the tests to the OpenMPI server 
once everything is working.


  

c) ...which GM library versions such builds, if any, compile against



There are no GM tests currently under our still-evolving MTT setup. Once 
we have a working setup, we will run a single Pallas test on 32 nodes 
with GM-2.1.28, two 2G NICs per node (single and dual port). There is no 
active development on GM, just kernel updates, so the GM version does 
not matter much.


Patrick



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900




Re: [OMPI devel] patch for building gm btl

2008-01-03 Thread Patrick Geoffray

Hi Paul,

Paul H. Hargrove wrote:
The fact that this has gone unfixed for 2 months suggests to me that 
nobody is building the GM BTL.  So, how would I go about checking ...



a) ...if there exists any periodic build of the GM BTL via MTT?


We are deploying MTT on all our clusters. Right now, we use our own MTT 
server, but we will report a subset of the tests to the OpenMPI server 
once everything is working.



c) ...which GM library versions such builds, if any, compile against


There are no GM tests currently under our still-evolving MTT setup. Once 
we have a working setup, we will run a single Pallas test on 32 nodes 
with GM-2.1.28, two 2G NICs per node (single and dual port). There is no 
active development on GM, just kernel updates, so the GM version does 
not matter much.


Patrick


[OMPI devel] Fixing SPARC bus errors

2008-01-03 Thread Rolf vandeVaart


Greetings.  We have seen some bus errors when compiling a user 
application with certain compiler flags and running on a SPARC-based 
server.  The issue is that some structures are not word- or double-word-aligned, 
causing a bus error.  I have tracked down two places where I can 
make a minor change and everything seems to work fine.   However, I want 
to see if anyone has issues with these changes.  The two changes are 
shown below.


burl-ct-v440-0 206 =>svn diff
Index: ompi/mca/btl/sm/btl_sm_frag.h
===================================================================
--- ompi/mca/btl/sm/btl_sm_frag.h    (revision 17039)
+++ ompi/mca/btl/sm/btl_sm_frag.h    (working copy)
@@ -9,6 +9,7 @@
* University of Stuttgart.  All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
+ * Copyright (c) 2008  Sun Microsystems, Inc.  All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -41,6 +42,10 @@
   struct mca_btl_sm_frag_t *frag;
   size_t len;
   mca_btl_base_tag_t tag;
+   /* Add a 4 byte pad to round out structure to 16 bytes for 32-bit
+    * and to 24 bytes for 64-bit.  Helps prevent bus errors for strict
+    * alignment cases like SPARC. */
+   char pad[4];
};
typedef struct mca_btl_sm_hdr_t mca_btl_sm_hdr_t;


Index: ompi/mca/pml/ob1/pml_ob1_recvfrag.h
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvfrag.h    (revision 17039)
+++ ompi/mca/pml/ob1/pml_ob1_recvfrag.h    (working copy)
@@ -9,6 +9,7 @@
* University of Stuttgart.  All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
+ * Copyright (c) 2008  Sun Microsystems, Inc.  All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -67,7 +68,8 @@
   unsigned char* _ptr = (unsigned char*)frag->addr;   \
   /* init recv_frag */\
   frag->btl = btl;\
-frag->hdr = *(mca_pml_ob1_hdr_t*)hdr;   \
+memcpy(&frag->hdr, (void *)((mca_pml_ob1_hdr_t*)hdr),  \
+   sizeof(mca_pml_ob1_hdr_t));  \
   frag->num_segments = 1; \
   _size = segs[0].seg_len;\
   for( i = 1; i < cnt; i++ ) {\
burl-ct-v440-0 207 =>
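
For readers less familiar with strict-alignment targets, here is a minimal
standalone sketch, entirely separate from the Open MPI sources (the struct,
field names, and buffer layout are invented for illustration), showing the
two ideas behind the changes above: an explicit pad that keeps a header
struct a multiple of its natural alignment, and a memcpy in place of a
struct assignment through a possibly misaligned pointer.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

typedef struct {
    void    *frag;      /* 4 bytes on 32-bit, 8 on 64-bit */
    size_t   len;
    uint8_t  tag;
    char     pad[4];    /* analogous to the sm_hdr pad: keeps back-to-back
                         * headers in a shared buffer naturally aligned */
} fake_hdr_t;

int main(void)
{
    unsigned char buffer[64];
    memset(buffer, 0, sizeof(buffer));

    /* Pretend the header landed at an unaligned offset in a receive
     * buffer, as can happen when headers and payload are packed. */
    unsigned char *unaligned = buffer + 1;

    fake_hdr_t local;

    /* UNSAFE on SPARC and other strict-alignment CPUs:
     *     local = *(fake_hdr_t *)unaligned;
     * the generated word/doubleword loads trap with SIGBUS. */

    /* SAFE: memcpy makes no alignment assumption about its source. */
    memcpy(&local, unaligned, sizeof(local));

    printf("sizeof(fake_hdr_t) = %zu, tag = %u\n",
           sizeof(fake_hdr_t), (unsigned) local.tag);
    return 0;
}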


The ticket associated with this issue is 
https://svn.open-mpi.org/trac/ompi/ticket/1148


Rolf


Re: [OMPI devel] Common initialization code for IB.

2008-01-03 Thread Jeff Squyres

On Jan 3, 2008, at 9:03 AM, Gleb Natapov wrote:

 In Paris we've talked about putting HCA discovery and initialization code
outside of openib BTL so other components that want to use IB will be able
to share common code, data and registration cache. Other components I am
thinking about are ofud and multicast collectives. I started to look at
this and I have a couple of problems with this approach. Currently openib
BTL has if_include/if_exclude parameters to control which HCAs should be
used. Should we make those parameters global and initialize only HCAs
that are not excluded by those filters, or should we initialize all HCAs
and each component will have its own include/exclude filters?


Good question.  I think the optimal solution would be to have one set
of globals (common_of_if_include or somesuch?) with optional
per-component overrides.  E.g., tell all of OMPI to if_include mthca0, but
then tell just the multicast collectives to if_include ipath1 (for
whatever reason).  This would allow fine-grained selection of which
communication types use which devices.


To minimize the repetition of code, this could be effected by having a
function in the common/of area that does all the work for the
include/exclude behavior.  You can simply call it with any of the MCA param
values, such as: common_of_if_in/exclude, btl_openib_if_in/exclude,
coll_of_if_in/exclude, ... and it can return a list of ports to use.
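
As a rough illustration only (none of the function or parameter names below
exist in the tree; they are placeholders for the idea), a common helper with
a per-component fallback could look something like this:

#include <stdbool.h>
#include <string.h>
#include <stdio.h>

/* Return true if port_name passes the comma-separated include/exclude
 * lists.  An empty include list means "include everything".  A real
 * implementation would tokenize the lists instead of using strstr. */
static bool common_of_port_selected(const char *port_name,
                                    const char *if_include,
                                    const char *if_exclude)
{
    if (if_include != NULL && if_include[0] != '\0') {
        return strstr(if_include, port_name) != NULL;
    }
    if (if_exclude != NULL && if_exclude[0] != '\0') {
        return strstr(if_exclude, port_name) == NULL;
    }
    return true;
}

int main(void)
{
    /* Global default: everything uses mthca0. */
    const char *common_if_include = "mthca0";

    /* Per-component override: the multicast collective asks for ipath1;
     * the openib BTL sets nothing and falls back to the common value. */
    const char *coll_if_include = "ipath1";
    const char *btl_if_include  = NULL;

    const char *coll_filter = coll_if_include ? coll_if_include
                                              : common_if_include;
    const char *btl_filter  = btl_if_include  ? btl_if_include
                                              : common_if_include;

    printf("coll uses mthca0? %d\n",
           common_of_port_selected("mthca0", coll_filter, NULL));
    printf("btl  uses mthca0? %d\n",
           common_of_port_selected("mthca0", btl_filter, NULL));
    return 0;
}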



Another
problem is how multicast collective knows that all processes in a
communicator are reachable via the same network, do we have a mechanism
in ompi to check this?



Good question.

Perhaps the common_of stuff could hang some data off the ompi_proc_t  
that can be read by any of-like component (btl openib, coll of  
multicast, etc.)...?  This could contain a subnet ID, or perhaps a  
reachable flag, or somesuch.
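
Sketching that loosely (ompi_proc_t's real layout is not shown here and the
per-proc structure below is invented purely for illustration), the common
layer could cache something like this for each peer, which an of-like
component could then scan across a communicator:

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Data a common OpenFabrics layer might cache per peer process. */
typedef struct {
    uint64_t subnet_id;     /* IB subnet prefix of the peer's port */
    bool     reachable;     /* filled in during common init/modex  */
} common_of_proc_info_t;

/* Stand-in for ompi_proc_t with an opaque slot that of-like
 * components (btl openib, coll of multicast, ...) can read. */
typedef struct {
    int   rank;
    void *of_info;          /* points at common_of_proc_info_t */
} fake_proc_t;

/* A multicast collective could answer "is everyone on my subnet?"
 * by scanning the communicator's proc list. */
static bool all_on_same_subnet(fake_proc_t **procs, size_t n,
                               uint64_t my_subnet)
{
    for (size_t i = 0; i < n; i++) {
        const common_of_proc_info_t *info = procs[i]->of_info;
        if (info == NULL || !info->reachable ||
            info->subnet_id != my_subnet) {
            return false;
        }
    }
    return true;
}

int main(void)
{
    common_of_proc_info_t a = { 0xfe80000000000000ULL, true };
    common_of_proc_info_t b = { 0xfe80000000000000ULL, true };
    fake_proc_t p0 = { 0, &a }, p1 = { 1, &b };
    fake_proc_t *procs[] = { &p0, &p1 };

    printf("same subnet: %d\n",
           all_on_same_subnet(procs, 2, 0xfe80000000000000ULL));
    return 0;
}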


--
Jeff Squyres
Cisco Systems


[OMPI devel] Common initialization code for IB.

2008-01-03 Thread Gleb Natapov
Hi,

  In Paris we've talked about putting HCA discovery and initialization code
outside of openib BTL so other components that want to use IB will be able
to share common code, data and registration cache. Other components I am
thinking about are ofud and multicast collectives. I started to look at
this and I have a couple of problems with this approach. Currently openib
BTL has if_include/if_exclude parameters to control which HCAs should be
used. Should we make those parameters global and initialize only HCAs
that are not excluded by those filters, or should we initialize all HCAs
and each component will have its own include/exclude filters? Another
problem is how multicast collective knows that all processes in a
communicator are reachable via the same network, do we have a mechanism
in ompi to check this?

--
Gleb.