Re: [OMPI devel] Moving fragments in btl sm

2007-11-09 Thread Torje Henriksen


Thanks, I don't know how long it would have taken me to find it on my own.

And thanks to you too of course, Ollie :)


Regards,

-Torje


On Thu, 8 Nov 2007, George Bosilca wrote:

The real memory copy happens in the convertor, more specifically in
ompi_convertor_pack for the sender and in ompi_convertor_unpack for the
receiver. In fact, none of the BTLs call memcpy directly; all memory movement
is done via the convertor.


george.
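
A minimal sketch of the division of labor described above, written in Python rather than taken from the Open MPI sources. ToyConvertor, transfer, and the frag_size parameter are hypothetical names used only for illustration; the real ompi_convertor_pack and ompi_convertor_unpack have different signatures. The point it tries to show is that the queue only passes fragment references around, while the pack and unpack steps are where bytes actually move.

```python
from collections import deque


class ToyConvertor:
    """Hypothetical stand-in for a convertor: it owns a position in a user buffer."""

    def __init__(self, user_buffer):
        self.buf = user_buffer  # bytearray owned by the application
        self.pos = 0            # bytes packed or unpacked so far

    def pack(self, frag, max_bytes):
        """Sender side: copy up to max_bytes from the user buffer into frag."""
        n = min(max_bytes, len(self.buf) - self.pos)
        frag[:n] = self.buf[self.pos:self.pos + n]  # the actual data copy
        self.pos += n
        return n

    def unpack(self, frag, nbytes):
        """Receiver side: copy nbytes from frag into the user buffer."""
        self.buf[self.pos:self.pos + nbytes] = frag[:nbytes]  # the actual data copy
        self.pos += nbytes


def transfer(message, frag_size=8):
    """Move message through fixed-size fragments; the queue holds references only."""
    sender = ToyConvertor(bytearray(message))
    recv_buf = bytearray(len(message))
    receiver = ToyConvertor(recv_buf)
    fifo = deque()  # stands in for the circular buffer

    while sender.pos < len(message):
        frag = bytearray(frag_size)
        n = sender.pack(frag, frag_size)
        fifo.append((frag, n))           # "write to head": no payload copy here
        frag_in, n_in = fifo.popleft()   # "read from tail": no payload copy here
        receiver.unpack(frag_in, n_in)

    return bytes(recv_buf)
```

Calling transfer(b"hello shared memory") walks the message through three fragments and returns the same bytes on the receiving side.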

On Nov 8, 2007, at 7:38 AM, Torje Henriksen wrote:


Hi,

I have a question that I shouldn't need to ask, but I'm
kind of lost in the code.

The btl sm component is using the circular buffers to write and read
fragments (sending and receiving).

In write_to_head and read_from_tail I can only see pointers being set,
no data being moved. So where does the actual data movement/copying take
place? I'm thinking there may be a callback function somewhere :)


Thank you for your help now and earlier.


Best regards,

Torje Henriksen
(tor...@stud.cs.uit.no)

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] === CREATE FAILURE ===

2007-11-09 Thread Josh Hursey

Sorry about this. I fixed the problem in r16706.

It seems that an old version of BLCR is installed in the default path  
on the IU machine that builds the Open MPI tarballs. This has not  
been a problem in the past since I have not been using any new  
features of BLCR. Since I am starting to use some of this new  
functionality the IU machine was not able to find the right symbols  
and died. I implemented some configure checks to work around this.


Sorry again,
Josh

On Nov 8, 2007, at 9:13 PM, MPI Team wrote:



ERROR: Command returned a non-zero exist status
   make -j 4 distcheck

Start time: Thu Nov  8 21:00:26 EST 2007
End time:   Thu Nov  8 21:13:07 EST 2007

======================================================================

[... previous lines snipped ...]
/bin/sh ../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
    -I../opal/include -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal   -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/condition.lo -MD -MP -MF $depbase.Tpo -c -o
    threads/condition.lo ../../opal/threads/condition.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include
    -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    runtime/opal_params.lo -MD -MP -MF runtime/.deps/opal_params.Tpo -c
    ../../opal/runtime/opal_params.c  -fPIC -DPIC -o
    runtime/.libs/opal_params.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include
    -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    runtime/opal_cr.lo -MD -MP -MF runtime/.deps/opal_cr.Tpo -c
    ../../opal/runtime/opal_cr.c  -fPIC -DPIC -o runtime/.libs/opal_cr.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include
    -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/condition.lo -MD -MP -MF threads/.deps/condition.Tpo -c
    ../../opal/threads/condition.c  -fPIC -DPIC -o
    threads/.libs/condition.o

depbase=`echo threads/mutex.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
    -I../opal/include -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal   -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/mutex.lo -MD -MP -MF $depbase.Tpo -c -o
    threads/mutex.lo ../../opal/threads/mutex.c &&\
mv -f $depbase.Tpo $depbase.Plo
depbase=`echo threads/thread.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
    -I../opal/include -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal   -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/thread.lo -MD -MP -MF $depbase.Tpo -c -o
    threads/thread.lo ../../opal/threads/thread.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include
    -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/mutex.lo -MD -MP -MF threads/.deps/mutex.Tpo -c
    ../../opal/threads/mutex.c  -fPIC -DPIC -o threads/.libs/mutex.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include
    -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/thread.lo -MD -MP -MF threads/.deps/thread.Tpo -c
    ../../opal/threads/thread.c  -fPIC -DPIC -o threads/.libs/thread.o

depbase=`echo threads/tsd.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
    -I../opal/include -I../orte/include -I../ompi/include
    -I../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal   -I../..
    -I.. -I../../opal/include -I../../orte/include -I../../ompi/include
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT
    threads/tsd.lo -

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-09 Thread Jeff Squyres

Should there be another option for passing MCA parameters between
processes, such as via stdin (or any file descriptor)?  I.e., during
the command line parsing to check for command line MCA params, perhaps
a new argument could be introduced: -mcauri <uri>, where <uri> could
take a few different forms:


- file://stdin: (note the 2 //, not 3, so "stdin" would never conflict  
with a real file named /stdin)  Read the parameters in off stdin.


- rml://...rml contact info...: read in the MCA params via the RML
(although I assume that reading via the RML would be *way* too late
during the MCA setup process -- I mentioned this option for
completeness, even though I don't think it'll work)


- ip://ipaddress:port: open a socket back and read the MCA params in  
over a socket.  This could have some scalability issues...?  But who  
knows; it could be tied into the hierarchical startup such that we  
wouldn't have to have an all-to-one connection scheme.  Certainly it  
would cause scalability problems when paired with today's all-to-one  
RML connection scheme for the OOB.


I'm not sure that the rml: and ip: schemes are worthwhile.  Maybe a  
file://stdin kind of approach could work?  Or perhaps some other kind  
of URI/IPC...?  (I really haven't thought through the issues -- this  
is off the top of my head)
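
To make the three proposed forms concrete, here is a rough Python sketch of how such a URI might be told apart. Nothing here exists in Open MPI: the -mcauri option is only a proposal in this message, and classify_mca_uri and its return values are hypothetical. It relies on the standard urllib.parse module, which puts "stdin" in the network-location slot for file://stdin and leaves that slot empty for file:///stdin, so the two-slash form cannot collide with a real file named /stdin.

```python
from urllib.parse import urlparse


def classify_mca_uri(uri):
    """Return (kind, detail) for the URI forms floated in the message above."""
    if uri.startswith("rml://"):
        # Contact info may contain ';' or '/', so keep everything after the scheme.
        return ("rml", uri[len("rml://"):])

    parsed = urlparse(uri)
    if parsed.scheme == "file":
        if parsed.netloc == "stdin" and parsed.path == "":
            return ("stdin", None)        # file://stdin (two slashes)
        if parsed.netloc == "":
            return ("file", parsed.path)  # file:///stdin names a real file /stdin
    elif parsed.scheme == "ip":
        if parsed.hostname and parsed.port:
            return ("socket", (parsed.hostname, parsed.port))
    raise ValueError("unrecognized MCA parameter URI: " + uri)
```

For example, classify_mca_uri("file://stdin") yields the stdin case, while classify_mca_uri("file:///stdin") yields an ordinary file path.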




On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:


Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of the
problem. As for what a string param can contain, I have no opinion. I only
note that it must handle special characters such as ';', '/', etc. that are
typically found in URIs. I cannot think of any reason it should have a
quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins"  wrote:

The alias option you presented does not work. I think we do some weird
things to find the absolute path for ssh, instead of just issuing the
command.

I would spend some time fixing this, but I don't want to do it wrong. We
could quote all the param values and change the parser to remove the
quotes, but this assumes that an mca param does not contain quotes.

So I guess there are 2 questions that need to be answered before a fix
is made:

1. What exactly can a string mca param contain? Can it have quotes,
spaces, or anything else?

2. Which mca parameters should be forwarded? Should it be just the ones
from the command line? From the environment? From config files?

Tim
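
The quote-then-strip idea can be tried out in a few lines of Python. This is only a sketch under POSIX shell quoting rules, using the standard shlex module; it is not the Open MPI command line parser. build_cmdline and parse_cmdline are hypothetical helpers, and apart from pls_rsh_agent the parameter names below are made up. It suggests that values containing spaces, ';', '/', and even quotes can survive a round trip if the sender escapes embedded quotes rather than assuming there are none.

```python
import shlex


def build_cmdline(params):
    """Render each parameter as '-mca name value' with the value shell-quoted."""
    parts = []
    for name, value in params.items():
        parts += ["-mca", name, shlex.quote(value)]
    return " ".join(parts)


def parse_cmdline(cmdline):
    """Split with shell rules (which strips the quoting) and rebuild the dict."""
    tokens = shlex.split(cmdline)
    if len(tokens) % 3 != 0:
        raise ValueError("expected repeated '-mca name value' triples")
    params = {}
    for i in range(0, len(tokens), 3):
        flag, name, value = tokens[i:i + 3]
        if flag != "-mca":
            raise ValueError("unexpected token: " + flag)
        params[name] = value
    return params


# Hypothetical values: one with a space, one URI-like, one with quotes.
example = {
    "pls_rsh_agent": "ssh -Y",
    "example_uri_param": "0.0.1;tcp://10.0.0.1:5000",
    "example_quoted_param": 'say "hi", it\'s fine',
}
roundtrip = parse_cmdline(build_cmdline(example))
```

Here roundtrip compares equal to example. A real fix would also have to account for how many times the string is re-parsed between mpirun and the orted, which this sketch does not model.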

Ralph Castain wrote:
What changed is that we never passed mca params to the orted before - they
always went to the app, but it's the orted that has the issue. There is a
bug ticket thread on this subject - I don't recall the number offhand.

Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people needed
various mca params to get to the orteds to control their behavior. The only
way to resolve that problem was to pass the params via the command line,
which is what was done.

Except for a very few cases, all of our mca params are single values that do
not include spaces, so this is not a problem that is causing widespread
issues. As I said, I already had to deal with one special case that didn't
involve spaces, but did have special characters that required quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note, though,
the general fix will take a larger effort. If someone is willing to do so,
that is fine with me - I was only offering solutions that would fill the
interim time, as I haven't heard anyone step up to say they would fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
around things if you will fix the mca cmd line parser to cleanly remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins"  wrote:

I'm curious what changed to make this a problem. How were we passing mca
params from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in general, allow
string mca params to have spaces (as far as I understand it). So a more
general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:

Sorry for the delay - I wasn't ignoring the issue.

There are several fixes to this problem - in order from least to most
work:

1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
It won't affect anything on the backend because the daemon/procs don't use
ssh.

2. include "pls_rsh_agent" in the array of mca params not to be passed to
the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix the specific
problem cited here, but I admit that listing every such param by name would
get tedious.

3. we could ea

Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-09 Thread Don Kerr

Gleb,

Another question.  What about the case of one node with 2 ports and one 
node with one port.  Does the open ib btl allow the side with 2 ports to 
establish two  endpoints to the single remote port?


-DON

Gleb Natapov wrote:


On Thu, Nov 01, 2007 at 11:15:21AM -0400, Don Kerr wrote:

How would the openib btl handle the following scenario:
Two nodes, each with two ports, all ports are on the same subnet and switch.

Would striping occur over 4 connections or 2?

Only two connections will be created.

If 2, is it an equal distribution, or are both local ports connected to the
same remote port?

Equal distribution.

--
Gleb.
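
The difference between the two candidate layouts in the exchange above can be written out as a small Python sketch. It covers only the symmetric case that was answered (two ports on each side) and says nothing about how the openib btl actually selects its pairs; the function names and the port labels are hypothetical.

```python
from itertools import product


def full_mesh(local_ports, remote_ports):
    """Every local port connected to every remote port (the '4 connections' case)."""
    return list(product(local_ports, remote_ports))


def one_to_one(local_ports, remote_ports):
    """Each local port paired with a distinct remote port (the '2 connections' case)."""
    if len(local_ports) != len(remote_ports):
        # The thread only answers the equal-count case; do not guess beyond it.
        raise ValueError("sketch covers only equal port counts")
    return list(zip(local_ports, remote_ports))


local = ["A:port1", "A:port2"]    # hypothetical labels for node A's ports
remote = ["B:port1", "B:port2"]   # hypothetical labels for node B's ports
mesh = full_mesh(local, remote)
pairs = one_to_one(local, remote)
```

With these inputs mesh holds four pairs and pairs holds two, with each remote port appearing exactly once, which matches the "equal distribution" reading rather than both local ports landing on the same remote port.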
 



Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-09 Thread Jeff Squyres

Don --

Are you asking what *does* it do, or what *should* a BTL do?




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-09 Thread Don Kerr

Both. I was thinking of listing what I think are the multi-rail
requirements, but first I wanted to understand the current state of things.


Jeff Squyres wrote:


Don --

Are you asking what *does* it do, or what *should* a BTL do?


[OMPI devel] initial SCTP BTL commit comments?

2007-11-09 Thread Brad Penoff

Greetings Open MPI developers,

Karol Mroz and I at UBC have been working on a BTL component for SCTP.
With our own internal testing, the BTL has stabilized, so we were
hoping to commit it to ompi-trunk.  Prior to doing so, though, we
wanted to get some feedback from the community.  In particular, we were
curious whether there are any objections to putting an initial version in
the trunk, initially with an ompi_ignore.  The SCTP BTL component
stands alone completely.  So what we're wondering is:


Any objections to us committing an SCTP BTL to ompi-trunk if it has
the ompi_ignore file in it first?


I'll try to tell a little bit about this new SCTP BTL.  Feel free to
write back if you have any questions.

For starters, SCTP is an IP-based transport protocol.  There are
kernel-based implementations on most major operating systems.  The
best implementation seems to be the FreeBSD stack (now by default in
FreeBSD 7), but the Linux one (lksctp.sf.net) has been getting better
and is currently a module in the vanilla kernel.  These have been the
only two stacks that we have tested on so far; we've been able to run
a handful of our own tests in addition to the OSU, NAS, and Intel
benchmarks.  At present, our autoconf rules only build the component
on these two platforms.  We've also conformed to the Open MPI coding
standards as outlined on the wiki.

For fault tolerance purposes, SCTP connections (termed "associations")
can be made aware of multiple interfaces on the endpoints by binding
to more than one interface (for performance, the CMT extension uses
this multihoming feature to stripe data).  SCTP also has several
different APIs that it supports.  Like TCP, there can be a one-to-one
socket per connection.  Another option is that like UDP, there can be
a single one-to-many socket that is used for all connections.  The
SCTP BTL has the option of using either socket style, depending on the
value of the btl_sctp_if_11 MCA option.  When this value is 1, the
one-to-one socket is used and like the TCP BTL, there are as many BTL
component modules as the number of network cards specified with
if_include and friends.  By default, this value is 0 which means that
a single one-to-many socket is used; here only one BTL module is used
and internally, SCTP itself handles within that one socket all the
network cards specified with if_include, etc.
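
As a rough illustration of the two modes just described (not code from the component), the effect of btl_sctp_if_11 on the number of BTL modules might be pictured like this in Python. plan_modules and the dictionary layout are hypothetical; the socket type names reflect the standard SCTP sockets API, where the one-to-one style uses SOCK_STREAM and the one-to-many style uses SOCK_SEQPACKET.

```python
def plan_modules(interfaces, if_11=0):
    """Return one description per BTL module for the given interface list."""
    if if_11 == 1:
        # One-to-one style: one module (and its own sockets) per interface.
        return [
            {"style": "one-to-one", "socket_type": "SOCK_STREAM", "interfaces": [name]}
            for name in interfaces
        ]
    # Default, one-to-many style: a single module whose one socket
    # covers every listed interface.
    return [
        {"style": "one-to-many", "socket_type": "SOCK_SEQPACKET",
         "interfaces": list(interfaces)}
    ]


nics = ["eth0", "eth1"]  # hypothetical interface names
per_nic = plan_modules(nics, if_11=1)
shared = plan_modules(nics)
```

With two interfaces, per_nic contains two module descriptions and shared contains one that lists both interfaces.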

Currently, both the one-to-one and the one-to-many make use of the
event library offered by Open MPI.  The callback functions for the
one-to-many style however are quite unique as multiple endpoints may
be interested in the events that poll returns.  Currently we use these
unique callback functions, but in the future the hope is to play with
the potential benefits of a btl_progress function, particularly for
the one-to-many style.
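
One way to picture why the one-to-many callbacks differ: a single readable event on the shared socket may carry messages for several endpoints, so the callback has to demultiplex. The Python sketch below is an assumption about the general shape of such a callback, not the component's code; OneToManyDispatcher, its methods, and the use of an association id as the lookup key are all hypothetical.

```python
class OneToManyDispatcher:
    """Routes messages read from one shared socket to per-endpoint callbacks."""

    def __init__(self):
        self.endpoints = {}   # association id -> callback for that endpoint
        self.unclaimed = []   # messages for associations nobody registered

    def register(self, assoc_id, callback):
        self.endpoints[assoc_id] = callback

    def on_readable(self, messages):
        """Handle one poll event; messages is an iterable of (assoc_id, payload)."""
        for assoc_id, payload in messages:
            callback = self.endpoints.get(assoc_id)
            if callback is None:
                self.unclaimed.append((assoc_id, payload))
            else:
                callback(payload)


dispatcher = OneToManyDispatcher()
received = {1: [], 2: []}
dispatcher.register(1, received[1].append)
dispatcher.register(2, received[2].append)
dispatcher.on_readable([(1, b"a"), (2, b"b"), (1, b"c"), (9, b"x")])
```

After the call, each registered endpoint has only its own payloads, in arrival order, and the message for the unknown association 9 sits in dispatcher.unclaimed.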

At a high level, that's a review of the SCTP BTL component.  The
current design does not make use of the SCTP multistreaming feature;
that is the intent of a future MTL so that we have access to MPI
information (like the context and tag).  The question here is whether I can
go ahead and commit, initially with the proper ignore files.  Any
comments/suggestions/feedback?

Thanks!
brad