Re: [OMPI devel] [openMPI-infiniband] openMPI in IB network when openSM with LASH is running

2007-11-29 Thread Keshetti Mahesh
On Nov 28, 2007 9:58 PM, Jeff Squyres  wrote:
> There is work starting literally right about now to allow Open MPI to
> use the RDMA CM and/or the IBCM for creating OpenFabrics connections
> (IB or iWARP).
>
>

When is this expected to be completed?

-Mahesh


[OMPI devel] configure: error: Cannot support Fortran MPI_ADDRESS_KIND!

2007-11-29 Thread geetha r
Hi,
   I want to install openmpi-1.2.4 on a Windows machine through Cygwin.

I used the following command to build openmpi-1.2.4 on Windows:

./configure --disable-mpi-f77 --with-devel-headers


I get the following error:
-
configure: error: Cannot support Fortran MPI_ADDRESS_KIND!


PS: Can somebody please point me to how to resolve this error, so that
Open MPI gets installed on Windows?

I don't want the Fortran-specific stuff, yet I am still getting this problem.
Is it a bug?

I am using the g77 and g++ compilers from the MinGW package (latest).
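
A possible workaround (an assumption, untested: the MPI_ADDRESS_KIND probe
appears to belong to the Fortran 90 support checks, which --disable-mpi-f77
alone does not turn off) would be to disable the f90 bindings as well:

./configure --disable-mpi-f77 --disable-mpi-f90 --with-devel-headers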

cheers,
geetha


[OMPI devel] Using MTT to test the newly added SCTP BTL

2007-11-29 Thread Karol Mroz

Good morning...

So Brad Penoff and I would like to begin MTT tests of the SCTP BTL, but due
to the presence of the .ompi_ignore file, the SCTP BTL source is not
included in nightly tarballs. I was curious whether anyone would object to
our removing .ompi_ignore.

I posted a similar message to MTT-Users, and a couple of concerns were
raised which I will address below.

Jeff Squyres wrote:
> One solution might be to remove the .ompi_ignore but to only enable  
> the SCTP BTL when an explicit --with-sctp flag is given to configure  
> (or something similar).  You might want to run this by the [OMPI]  
> group first, but there's precedent for it, so I doubt anyone would  
> object.

The situation at present is that the SCTP BTL only builds on FreeBSD,
OS X, and Linux, and only if SCTP is found in a standard place. On Linux,
for instance, you need to have installed the lksctp package in order for
the SCTP BTL to build. We also have a --with-sctp configure option where
you can specify the SCTP path should it not be in a standard location. If
SCTP does not exist on the system, then the BTL will not build and, more
importantly, will not break the build of the overall system.
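
For example (the install prefix here is only illustrative), building
against an SCTP stack outside the default search path would look like:

./configure --with-sctp=/opt/lksctp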

My question now is: do we need to alter the above behavior (as initially
suggested by Jeff), or is having the SCTP BTL build iff SCTP is found
sufficient?

Thanks in advance for any advice on this matter.

--
Karol Mroz
km...@cs.ubc.ca



[OMPI devel] Using ompi_proc_t's proc_name.vpid as Universal rank

2007-11-29 Thread Sajjad Tabib
Hello,

I have a proprietary transport/messaging layer that sits below an MTL
component. This transport layer requires Open MPI to assign each process a
rank that is unique to that process and will not change from execution to
termination. In effect, I am trying to find a one-to-one correspondence
between a process's universal rank in Open MPI and its rank in this
transport layer. I began looking at ompi_proc_t from different processes
and seemingly found a unique identifier, proc_name.vpid. Consequently, I
assigned the rank of each process in my transport layer based on the value
of the local vpid of each process.
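
In sketch form, the mapping looks roughly like this (my_transport_set_rank
is a stand-in for the proprietary hook; the field layout assumed is the
1.2-era ompi_proc_t, with an orte_process_name_t member named proc_name):

#include "ompi/proc/proc.h"

/* hypothetical hook into the proprietary transport layer */
extern void my_transport_set_rank(ompi_proc_t *proc, uint32_t rank);

static void assign_transport_ranks(size_t nprocs, ompi_proc_t **procs)
{
    size_t i;
    for (i = 0; i < nprocs; i++) {
        /* vpid is unique per process within the job, so reuse it
         * as the transport-layer rank of this peer */
        my_transport_set_rank(procs[i], (uint32_t)procs[i]->proc_name.vpid);
    }
}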
I have not tested this thoroughly, but it has been working so far. However,
I would like to make sure that this is a good approach, or at least to know
whether there are other ways to do this. I would appreciate it if you could
leave me feedback or give suggestions on how to assign universal ranks to a
proprietary transport software.

Thanks for your help,

Sajjad Tabib


Re: [OMPI devel] Indirect calls to wait* and test*

2007-11-29 Thread Aurelien Bouteiller
This patch introduces customisable wait/test for requests as discussed  
at the face-to-face ompi meeting in Paris.


A new global structure (ompi_request_functions) holding all the pointers
to the wait/test functions has been added. ompi_request_wait* and
ompi_request_test* have been #defined to expand to the corresponding
fields of that structure (e.g., ompi_request_functions.req_wait). The
default implementations of the wait/test functions have been renamed from
ompi_request_% to ompi_request_default_%; those functions are the static
initializers of the ompi_request_functions structure.
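
In sketch form, the indirection looks like this (only two of the eight
pointers are shown, and member names other than req_wait are my
abbreviations; see the patch for the complete structure):

#include "ompi/request/request.h"

typedef struct ompi_request_fns_t {
    int (*req_wait)(ompi_request_t **request, ompi_status_public_t *status);
    int (*req_test)(ompi_request_t **request, int *completed,
                    ompi_status_public_t *status);
    /* ... req_wait_any, req_wait_some, req_wait_all,
     *     req_test_any, req_test_some, req_test_all ... */
} ompi_request_fns_t;

/* statically initialized with the default implementations */
ompi_request_fns_t ompi_request_functions = {
    ompi_request_default_wait,
    ompi_request_default_test,
    /* ... */
};

/* existing call sites keep compiling unchanged */
#define ompi_request_wait (ompi_request_functions.req_wait)
#define ompi_request_test (ompi_request_functions.req_test)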


To modify the defaults, a component 1) copies the ompi_request_functions
structure (the type ompi_request_fns_t can be used to declare a suitable
variable), and 2) changes some of the functions according to its needs.
This is best done at MPI_Init time, when there are no threads. Should the
component be unloaded, it has to restore the defaults it replaced. The
ompi_request_default_* functions should never be called directly anywhere
in the code. If a component needs to access the previously defined
implementation of wait, it should call its local copy of the function.
Component implementors should keep in mind that another component might
already have changed the defaults, and those functions need to be called.
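
As a concrete sketch of this protocol (the mylog_* names are hypothetical,
building on the structure sketched above):

static ompi_request_fns_t previous_fns;  /* the component's local copy */

static int mylog_request_wait(ompi_request_t **request,
                              ompi_status_public_t *status)
{
    /* ... record the wait event for message logging ... */
    /* chain to whatever implementation was installed before us */
    return previous_fns.req_wait(request, status);
}

/* run at MPI_Init time, while there is a single thread */
static void mylog_install(void)
{
    previous_fns = ompi_request_functions;                 /* 1) copy */
    ompi_request_functions.req_wait = mylog_request_wait;  /* 2) replace */
}

/* run if the component is unloaded: put back what we saved */
static void mylog_uninstall(void)
{
    ompi_request_functions = previous_fns;
}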


The performance impact on NetPIPE -a (async recv mode) shows no
measurable overhead. Here follows a comparison (obtained with "diff -y")
of the original and modified assembly code generated from
ompi/mpi/c/wait.c. The only significant difference is an extra movl to
load the address of the ompi_request_functions structure into eax, which
explains why there is no measurable cost on latency.


ORIGINAL:

L2:
        movl    L_ompi_request_null$non_lazy_ptr-"L001$pb"(%ebx), %eax
        cmpl    %eax, (%edi)
        je      L18
        movl    %esi, 4(%esp)
        movl    %edi, (%esp)
        call    L_ompi_request_wait$stub

MODIFIED:

L2:
        movl    L_ompi_request_null$non_lazy_ptr-"L001$pb"(%ebx), %eax
        cmpl    %eax, (%edi)
        je      L18
        movl    L_ompi_request_functions$non_lazy_ptr-"L001$pb"(%ebx), %eax   (added)
        movl    %esi, 4(%esp)
        movl    %edi, (%esp)
        call    *16(%eax)                                                     (changed)

Here is the patch for those who want to try it themselves.



custom_request_wait_and_test.patch
Description: Binary data




If I receive comments outlining the need, thread-safe accessors could be
added to allow components to change the functions at any time during
execution, not only during MPI_Init/Finalize. Please make noise if you
find this useful.
If the comments do not suggest extra work, I expect to commit this code
to the trunk next week.


Aurelien

On Oct 8, 2007, at 06:01, Aurelien Bouteiller wrote:


For message logging purposes, we need to interface with wait_any,
wait_some, test, test_any, test_some, and test_all. It is not possible to
use PMPI for this purpose. During the face-to-face meeting in Paris
(5-12 October 2007) we discussed this issue and came to the conclusion
that the best way to achieve this is to replace direct calls to
ompi_request_wait* and test* with indirect calls (the same way as PML
send, recv, etc.).

The basic idea is to declare a static structure containing the 8 pointers
to all the functions. This structure is initialized at compile time with
the current basic wait/test functions. Before the end of MPI_Init, any
component may replace the basics with specialized functions.

The expected cost is less than 0.01 us of latency according to preliminary
tests. The method is consistent with the way we call PML send/recv. The
mechanism could later be used to strip grequests out of the critical path
when they are not used.

--
Aurelien Bouteiller, PhD
Innovative Computing Laboratory - MPI group
+1 865 974 6321
1122 Volunteer Boulevard
Claxton Education Building Suite 350
Knoxville, TN 37996

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Dr. Aurelien Bouteiller, Sr. Research Associate
Innovative Computing Laboratory - MPI group
+1 865 974 6321
1122 Volunteer Boulevard
Claxton Education Building Suite 350
Knoxville, TN 37996



[OMPI devel] Branch for iWARP uDAPL enablement

2007-11-29 Thread Jon Mason
I created a public branch to make available the patch which gets OMPI
uDAPL to kinda work on iWARP.  The branch can be found at:
http://svn.open-mpi.org/svn/ompi/tmp-public/iwarp-ompi-v1.2/

The branch contains an updated version of the patch Steve Wise sent out
some time ago.  Below is the patch (on top of the ompi v1.2 tree) that
enables this.

I am currently focusing on other issues, and might not be able to get
back to it for a while.  Therefore, I wanted to make this patch
available to anyone who might need it or want to work on flushing out
the existing bugs.  Feel free to contact me if there are any questions.

Thanks,
Jon

==

This patch gets OMPI uDAPL to kinda work on iWARP.

Specifically, this patch addresses three issues needed for iWARP to work:
1. Force the first DTO from connecting side
2. Post receive buffers for the connection
3. Flush outstanding writes with a 0B read

This patch enforces the rule that the first DTO on every connection must
come from the connecting side.

On iWARP, the connection may be TERMINATED if a SEND arrives on a QP
and no corresponding RECV buffer is posted.  This patch posts the
receive buffers prior to the connection setup completing.

There is a race condition in which the receive buffers for a large write
may be freed prior to the completion of the write. This patch posts a 0B
read after a large write and uses the 0B read completion to trigger the
write completion to the upper layers.

With this patch some MPI test cases using the uDAPL BTL will run,
while others continue to fail.  Without this patch, no MPI programs
will run if using the uDAPL BTL.

This patch breaks IB support, and should not be checked in to the
regular tree until that is fixed.

Index: ompi/mca/btl/udapl/btl_udapl_endpoint.c
===================================================================
--- ompi/mca/btl/udapl/btl_udapl_endpoint.c (revision 16805)
+++ ompi/mca/btl/udapl/btl_udapl_endpoint.c (working copy)
@@ -130,7 +130,7 @@
     remote_buffer.segment_length = frag->triplet.segment_length;

     /* write the data out */
-    cookie.as_ptr = frag;
+    cookie.as_ptr = frag;
     rc = dat_ep_post_rdma_write(endpoint->endpoint_eager,
         1,
         &(frag->triplet),
@@ -367,7 +367,9 @@
         }
     }

-    (*ep_attr).max_recv_dtos = btl->udapl_max_recv_dtos;
+    (*ep_attr).max_recv_dtos = btl->udapl_max_recv_dtos + 1;
+    (*ep_attr).max_rdma_read_in = 4;
+    (*ep_attr).max_rdma_read_out = 4;

     /* Set max_request_dtos :
      * The max_request_dtos should equal the max number of
@@ -429,6 +431,74 @@
     return rc;
 }

+int mca_btl_udapl_addrdata_send(mca_btl_udapl_module_t* btl,
+    DAT_EP_HANDLE endpoint)
+{
+    mca_btl_udapl_frag_t* frag;
+    DAT_DTO_COOKIE cookie;
+    static int32_t connection_seq = 1;
+    int rc;
+
+    /* Send our local address data over this EP */
+    frag = (mca_btl_udapl_frag_t*)mca_btl_udapl_alloc(
+        (mca_btl_base_module_t*)btl, sizeof(mca_btl_udapl_addr_t) +
+        sizeof(int32_t));
+    cookie.as_ptr = frag;
+
+    memcpy(frag->segment.seg_addr.pval,
+        &btl->udapl_addr, sizeof(mca_btl_udapl_addr_t));
+    memcpy((char *)frag->segment.seg_addr.pval + sizeof(mca_btl_udapl_addr_t),
+        &connection_seq, sizeof(int32_t));
+    connection_seq++;
+
+    frag->type = MCA_BTL_UDAPL_CONN_SEND;
+
+    rc = dat_ep_post_send(endpoint, 1,
+        &frag->triplet, cookie, DAT_COMPLETION_DEFAULT_FLAG);
+    if(DAT_SUCCESS != rc) {
+        char* major;
+        char* minor;
+
+        dat_strerror(rc, (const char**)&major,
+            (const char**)&minor);
+        BTL_ERROR(("ERROR: %s %s %s\n", "dat_ep_post_send",
+            major, minor));
+        return OMPI_ERROR;
+    }
+
+    return OMPI_SUCCESS;
+}
+
+static inline int mca_btl_udapl_addrdata_recv(mca_btl_udapl_module_t* btl,
+    DAT_EP_HANDLE endpoint)
+{
+    mca_btl_udapl_frag_t* frag;
+    DAT_DTO_COOKIE cookie;
+    int rc;
+
+    /* Post a receive to get the peer's address data */
+    frag = (mca_btl_udapl_frag_t*)mca_btl_udapl_alloc(
+        (mca_btl_base_module_t*)btl, sizeof(mca_btl_udapl_addr_t) +
+        sizeof(int32_t));
+    cookie.as_ptr = frag;
+
+    frag->type = MCA_BTL_UDAPL_CONN_RECV;
+
+    rc = dat_ep_post_recv(endpoint, 1,
+        &frag->triplet, cookie, DAT_COMPLETION_DEFAULT_FLAG);
+    if(DAT_SUCCESS != rc) {
+        char* major;
+        char* minor;
+
+        dat_strerror(rc, (const char**)&major,
+            (const char**)&minor);
+        BTL_ERROR(("ERROR: %s %s %s\n", "dat_ep_post_recv",
+            major, minor));
+        return OMPI_ERROR;
+    }
+    return OMPI_SUCCESS;
+}
+
 /*
  * Create a uDAPL endpoint
  *
@@ -457,6 +527,15 @@
             major, minor));
         dat_ep_free(udapl_endpoint);
         udapl_endpoint = DAT_HANDLE_NULL;
+    } else {
+        DAT_CONTEXT c;
+
+        /* pre-post recv buffer for exchanging address data */
+        mca_btl_udapl_addrdata_r