Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 10:44:11AM +0200, Christian Kauhaus wrote:

> Hello *,

Hi.

> University of Jena (Germany). Our work group is digging into how to
> connect several clusters on a campus. 

I think I'm also a member of this workgroup, though I am not
working at University of Jena, but studying there.

> First we are interested to integrate IPv6 support into the tcp btl.
> Does anyone know if there is someone already working on this?

I have a first quick and dirty patch, replacing AF_INET by AF_INET6,
the sockaddr_in structs and so on.

I think it is broken: the calculation of net1 and net2 in
btl_tcp_proc.c isn't really ported, and to be honest, I don't
understand the details yet, e.g. whether I have to port name lookups,
whether there are high-level structures relying on IPv4 structs,
and so on.

At least it compiles ;) (let's ship it)

I don't know if this patched tcp-component can handle
IPv6 connections, I've never tested it. I think it
even breaks IPv4 functionality; we should make clear
how IPv4 and IPv6 may work in parallel (or may not, if
one considers IPv4 deprecated ;)

You can retrieve the patch here:

   http://cluster.inf-ra.uni-jena.de/~adi/ompi.ipv6.v1.patch

I'd also appreciate any suggestions, hints or even success stories ;)



-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Bill Gates's Motto: "If you can't make it good, make it look good!"


Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 09:07:39AM -0500, Brian Barrett wrote:

> > I have a first quick and dirty patch, replacing AF_INET by AF_INET6,
> > the sockaddr_in structs and so on.
> Is there a way to do this to better support both IPv4 and IPv6?

I think so, too. There are probably two different ways to achieve
this: either provide two components "tcp" and "tcp6" or use
v6-mapped-v4 addresses. The first would surely result in a lot
of shared code, but I think this won't be a problem. If it is
possible to have two components (and thereby several modules)
for communication, this might be a solution.

The other way, v6-mapped-v4, is how normal userland daemons
are usually implemented. The application only listens on
v6-sockets, v4-addresses are mapped to ::ffff:a.b.c.d/96,
where a.b.c.d is the normal 32bit v4-address:

Mar 31 13:58:26 ltw pop3-login: Login: x [::ffff:84.184.164.40]

Perhaps it's a good idea to port any internal structure to
IPv6, as it is able to represent the whole v4 namespace.
One can always determine whether it is a real v6 or only
a mapped v4 address (the common ::ffff: prefix)


> mca_btl_tcp_proc_insert(), which is what I think you're referring to  
> by the net1/net2 code, that's intended to be used to try to get all  
> the multi-nic scenarios wired up in the most advantageous way  
> possible.  So we look at the combination IPv4 addr and netmask and  
> prefer to connect two endpoints in the same subnet.

Ok, this is how I understood the code. The current implementation
does a bitwise AND on uint32, for IPv6 this will be 128 bits.

I don't know of any predeclared type of this size, so we have
to find a different solution. Though the final decision will
always be boolean ("Are we on the same network?" Yes/No), we
have to represent the correct answer.

There is only one comparison between net1 and net2, so the
decision is a local one and we don't really need the
netmasks.

> I'm not sure how IPv6 deals with netmasks and routing, but I'm
> assuming there would be something similar.

Pretty much the same. Netmasks are now called "prefixlen",
integers between 0 (like IPv4's /0) and 128 (like IPv4's /32).
The typical on-link prefixlen is /64; smaller networks with a
longer prefixlen (e.g. /112) are unusual, though they might exist.

Routing aggregation is done by enlarging the prefix.
A typical one is /48, this means 2^16 networks with 2^64
hosts each.

In short: the LAN prefixlen will be 64 in most cases.
Larger networks (e.g. /48) are only for routing.

I apologize for calling the numerical smaller value of 48
the larger prefix than 64. This just refers to the network
size as the /64 is the smaller network.
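
The "same network?" decision for 128-bit addresses can be made without any 128-bit integer type by comparing the prefix byte by byte. The following is only a sketch of that idea; same_prefix and same_network are hypothetical names, and the addresses in the usage note are examples.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Return 1 if a and b agree in their first prefixlen bits, else 0. */
static int same_prefix(const struct in6_addr *a, const struct in6_addr *b,
                       unsigned prefixlen)
{
    unsigned full = prefixlen / 8;  /* whole bytes covered by the prefix */
    unsigned rest = prefixlen % 8;  /* leftover bits in the following byte */

    assert(prefixlen <= 128);
    if (memcmp(a->s6_addr, b->s6_addr, full) != 0) {
        return 0;
    }
    if (rest != 0) {
        unsigned char mask = (unsigned char)(0xffu << (8 - rest));
        if ((a->s6_addr[full] & mask) != (b->s6_addr[full] & mask)) {
            return 0;
        }
    }
    return 1;
}

/* Convenience wrapper for textual addresses; returns -1 on a parse error. */
static int same_network(const char *a, const char *b, unsigned prefixlen)
{
    struct in6_addr x, y;

    if (inet_pton(AF_INET6, a, &x) != 1 || inet_pton(AF_INET6, b, &y) != 1) {
        return -1;
    }
    return same_prefix(&x, &y, prefixlen);
}
```

With a /64 prefixlen, same_network("2001:638:906:2::20", "2001:638:906:2::1", 64) reports a match, while two hosts that only share a /48 do not.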


> > I don't know if this patched tcp-component can handle
> > IPv6 connections, I've never tested it. I think it
> > even breaks IPv4 functionality; we should make clear
> > how IPv4 and IPv6 may work in parallel (or may not, if
> > one considers IPv4 deprecated ;)
>  From a practical standpoint, Open MPI has to support both IPv4 and  
> IPv6 for the foreseeable future. 

I think so, too. We're dual stacked.

> We currently try to wire up one connection per "IP device", so it
> seems like we should be able to find some way to automatically
> switch between IPv6 or IPv4 based on what we determine is available
> on that host, right?

That's right. The orte-oob seems to be the right place for
this decision, assuming that ompi/mca/btl/tcp can handle
both or that two different components provide the desired
functionality.

Implementing this dual stack behaviour isn't that hard, almost
every userland tool does it this way: try the v6 and if it
fails, use v4. The user can usually force the code to use
either v4 or v6. This shouldn't be too hard in case of
v6-mapped-v4. The only thing to take care of is RFC 1918 networks.

adi@drcomp:~$ telnet ::ffff:127.0.0.1 25

(works fine)

To automatically select the right protocol, it might be good
to prefer IPv4 (smaller headers->less overhead). The user
can still force the use of IPv6 via DNS (assigning special
IPv6-only hostnames)




Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 09:36:31AM -0500, Jeff Squyres (jsquyres) wrote:

> I have no personal experience with IPv6, but one thought that strikes me
> is that the components might be able to figure out what to do by looking
> at/parsing either the hostnames or the results that come back from
> resolving the hostname...?

Yes. You can ask the resolver for v4, v6 or any of them.
The libc functions are standardized and handle both.
The socket family, too. You just have to specify whether
to use AF_INET or AF_INET6. That's all.

The newer lookup functions (getaddrinfo() and friends) return
a linked list of dynamically allocated entries, one per result,
which matters for multi-homed hosts. The common way
is to iterate over this list, try every address in turn
and free the list manually afterwards.
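
As an illustration of that iterate-then-free pattern, here is a small sketch built on the standard getaddrinfo()/freeaddrinfo() pair. The function name count_results is hypothetical; a real caller would try to connect to each entry instead of merely counting them.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve node for any address family and count the results per family.
 * Returns 0 on success, -1 if the lookup fails. */
static int count_results(const char *node, int flags, int *n_v4, int *n_v6)
{
    struct addrinfo hints, *res, *ai;

    assert(node != NULL && n_v4 != NULL && n_v6 != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* ask for v4 and v6 alike */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */
    hints.ai_flags = flags;

    *n_v4 = 0;
    *n_v6 = 0;
    if (getaddrinfo(node, NULL, &hints, &res) != 0) {
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            ++*n_v4;
        } else if (ai->ai_family == AF_INET6) {
            ++*n_v6;
        }
    }
    freeaddrinfo(res);  /* the list is heap-allocated and must be released */
    return 0;
}
```

Passing AI_NUMERICHOST in flags restricts the call to literal addresses, which is handy for testing without DNS.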

The whole process in its naive implementation is straightforward.

Will we run into trouble with listen()/accept()? If we use
v6-mapped-v4 (::ffff:a.b.c.d/96), we only have one socket
to bind to and to listen on. But if we create two separate
sockets, can they be served without blocking each other? In other
words: does OMPI already handle more than one listen socket?

Would this be a problem in case of a btl/tcp6-component?

(I really prefer the v6-mapped-v4 solution with a single
 socket, thus eliminating this problem)





Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 05:21:42PM +0200, Ralf Wildenhues wrote:

> > Perhaps it's a good idea to port any internal structure to
> > IPv6, as it is able to represent the whole v4 namespace.
> > One can always determine whether it is a real v6 or only
> > a mapped v4 address (the common ::ffff: prefix)
> I'm far from knowledgeable in this networking area, but I have a
> maybe-naive question here: Won't you have to assume in this case that
> the host operating system has IPv6 support, so that the corresponding
> data structures are defined?

This is true. I don't know of any modern OS without IPv6 support,
even Windows provides these structures ;)

If there is really a platform without sockaddr_in6, this should
be caught by configure (reverting to v4-only code, a little
tricky, yes).

As far as I know: All BSDs have v6, Linux has, HPUX, AIX, Solaris,
Windows (XP for sure, 2000 experimental, 9X/ME don't).




Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 05:55:28PM +0200, Ralf Wildenhues wrote:

> Have not:
> HP-UX 11.00

HPUX 11iv2 has, for the early HPUX-11 versions there
is TOUR (Transport Optional Upgrade Release)




Re: [OMPI devel] IPv6 support in OpenMPI?

2006-03-31 Thread Adrian Knoth
On Fri, Mar 31, 2006 at 11:06:55AM -0800, Brooks Davis wrote:

> > One little problem here is that it is possible to disable the
> > IPv6-mapped IPv4 addresses at least under Linux and some BSD variants.
> > For Linux, have a look at sys.net.ipv6.bindv6only.  Some authors even
> More specifically, KAME derived (BSD) stacks disable them by default so

In addition, OpenBSD doesn't provide mapped addresses. Though this
is a violation of RFC 4291, RFC 4038 "recommends" this approach:

   Note that some systems will disable (by default) support for internal
   IPv4-mapped IPv6 addresses.  The security concerns regarding these
   are legitimate, but disabling them internally breaks one transition
   mechanism for server applications originally written to bind() and
   listen() to a single socket by using a wildcard address.  This forces
   the software developer to rewrite the daemon to create two separate
   sockets, one for IPv4 only and the other for IPv6 only, and then to
   use select().  However, mapping-enabling of IPv4 addresses on any
   particular system is controlled by the OS owner and not necessarily
   by a developer.  This complicates developers' work, as they now have
   to rewrite the daemon network code to handle both environments, even
   for the same OS.

> it might be best to assume it doesn't work since you'll probably have
> to support that case anyway.

ACK. But what does this imply? Do we already have select()ed
binds, in other words, can we simply spawn two listen()-sockets?

If we conclude not to use mapped addresses, will we end up
with btl/tcp and btl/tcp6?

The OS support issue can be handled this way:

union sockaddr_union {
    struct sockaddr sa;
    struct sockaddr_in sin;
#ifdef HAVE_IPV6
    struct sockaddr_in6 sin6;
#endif
};

and later:

/* copy IP to sockaddr */
static inline void
sin_set_ip(union sockaddr_union *so, const struct ip_addr *ip)
{
    if (ip == NULL) {
#ifdef HAVE_IPV6
        so->sin6.sin6_family = AF_INET6;
        so->sin6.sin6_addr = in6addr_any;
#else
        so->sin.sin_family = AF_INET;
        so->sin.sin_addr.s_addr = INADDR_ANY;
#endif
        return;
    }
    [..]
}


(code shamelessly borrowed from dovecot/src/lib/network.c)

It might be a little harder to read, but it keeps both
versions (IPv4-only and IPv6) close together.




[OMPI devel] How to test OpenMPI?

2006-05-02 Thread Adrian Knoth
Hi,

as already mentioned some weeks ago, we plan to provide IPv6-support
for OpenMPI.

Before touching the code, we'd like to have a test environment to ensure
not to break anything.

There is a test/-directory, but the tests inside seem to be very basic,
no network testing or anything running longer than a few milliseconds ;)

Is there a better/larger testsuite available? Or how do you test
your code modifications?

TIA



[OMPI devel] Building ompi occasionally touches the source files

2006-07-17 Thread Adrian Knoth
Hi,

I have a bunch of boxes used to test and compile OMPI (we're talking
about the openmpi-1.1 release).

Two of them are Debian sarge (current stable), two are
Debian testing (i386+amd64) and one is Debian unstable (amd64)

The source is shared via svn, so all of them are certainly using the
same code.

I have the following directory layout:

  trunk
  trunk/Makefile
  trunk/src
  trunk/ARCH
  trunk/build/ARCH

where ARCH is dynamically determined by the Makefile, trunk/src/ contains
the openmpi-1.1 tarball, trunk/build/ARCH is for building ompi and
trunk/ARCH is the install directory.

Everything is fine on the Debian sarge hosts.

Trouble starts on the Debian testing boxes:

 1. If compiling without my special layout, in other words, just
untarring, ./configure && make, everything is fine

 2. If compiling inside my directory layout, the build 

a) changes the following two files in trunk/src/

adi@ten:~/trunk/src$ svn st
M  opal/util/show_help_lex.c
M  opal/util/keyval/keyval_lex.c

b) fails to complete (see attachment), the errors are all
   related to lex.

If I chmod -R -w trunk/src/ and call "make", everything works,
no build error at all.


And now to the Debian unstable (amd64) box: (I'm not root on
this machine, so I cannot guarantee anything.)

Building without a separate builddir works fine. With my layout, the
lex error occurs here as well, but I cannot work around it with chmod:

config.status: executing depfiles commands
config.status: executing pml-direct commands
config.status: creating ompi/mca/pml/pml_direct_call.h
rm: cannot remove `/home/adi/trunk/src/opal/util/keyval/keyval_lex.c': Permission denied
make: *** [/home/adi/trunk/src/opal/util/keyval/keyval_lex.c] Error 1

I'll attach two files: 

   i386-testing.log.gz   (Debian testing without chmod trick, failing build)
   amd64-unstable.log.gz (Debian unstable with failing chmod trick)


Feel free to ask for more information.



TIA


PS: My Makefile sets the following variables to disable autoconf et al.:

export ACLOCAL=/bin/true 
export AMTAR=/bin/true 
export AUTOCONF=/bin/true
export AUTOHEADER=/bin/true 
export AUTOMAKE=/bin/true 
export MAKEINFO=/bin/true




Re: [OMPI devel] Building ompi occasionally touches the source files

2006-07-18 Thread Adrian Knoth
On Tue, Jul 18, 2006 at 12:34:21PM +0200, Christian Kauhaus wrote:

> >b) fails to complete (see attachment), the errors are all
> >   related to lex.
> What are the flex versions used on these systems? On Debian stable it is
> flex 2.5.31 and on my Gentoo box it is flex 2.5.33, both giving correct
> builds. 

My testing boxes use 2.5.33, the unstable host is currently offline,
but according to packages.debian.org, it is also 2.5.33.

AFAIK, flex is only needed for developer builds (just read about it
in configure ;)




Re: [OMPI devel] Building ompi occasionally touches the source files

2006-07-20 Thread Adrian Knoth
On Mon, Jul 17, 2006 at 10:05:05PM +0200, Adrian Knoth wrote:

Hi,


> The source is shared via svn, so it's for sure all are using the
> same code.

>  2. If compiling inside my directory layout, the build 
> 
> a) changes the following two files in trunk/src/
> 
> adi@ten:~/trunk/src$ svn st
> M  opal/util/show_help_lex.c
> M  opal/util/keyval/keyval_lex.c
> 
> b) fails to complete (see attachment), the errors are all
>related to lex.

We've solved this bug and it is not OMPI-related.

The long story:

Whenever a .l-file is newer than its corresponding .c-file, the
.c-file is regenerated by a rule from our toplevel Makefile.

svn changes the c/m-time, so flex is called, and, probably because
the autoconf tools are disabled, this call is made with wrong parameters.

The question why this behaviour was only seen on some of my
hosts is even weirder: The svn checkout generates timestamps.
If you check out to an ext2/ext3 filesystem, everything works,
but the failing hosts all use xfs.

We believe that xfs has a higher resolution for timestamps,
so the timestamps of corresponding .l- and .c-files differ by
a very small delta, which forces flex to be rerun
and breaks the code.

Our fix is quite simple: we touch every .c-file that belongs
to an existing .l-file. Obviously, the
c-files are now "newer" and regeneration is prevented.
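
To make the suspected effect concrete, here is a small model of a make-style "is the source newer?" test. It rests on the assumption voiced above (not verified here) that one filesystem keeps sub-second timestamps while the other rounds to whole seconds; struct stamp and needs_rebuild are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* A file timestamp split into seconds and nanoseconds. */
struct stamp {
    long sec;
    long nsec;
};

/* make-style check: must the target be rebuilt because the source is newer?
 * With subsecond == false, only whole seconds are compared. */
static bool needs_rebuild(struct stamp src, struct stamp tgt, bool subsecond)
{
    assert(src.nsec >= 0 && tgt.nsec >= 0);
    if (src.sec != tgt.sec) {
        return src.sec > tgt.sec;
    }
    return subsecond && src.nsec > tgt.nsec;
}
```

Two files written within the same second compare as equally old at one-second resolution, but the later one counts as newer once nanoseconds are taken into account, which is enough to trigger the flex rule.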


Sorry for the inconvenience.



Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-19 Thread Adrian Knoth
On Thu, Aug 17, 2006 at 11:48:44PM +0100, Jonathan Underwood wrote:

> Hi,

Hi!

> Compiling a file with the gcc options -Wall and -pedantic gives the
> following warning:
> mpi.h:147: warning: ISO C90 does not support 'long long'
> Is this intentional, or is this a bug?

If you do not insist on using C90, you may compile with -std=c99
to get rid of this message ;)

I don't have the C90 (ANSI C) standard at my fingertips, but I can
confirm it does not support "long long".

Perhaps we should use int64_t instead (though <stdint.h> is itself
a C99 addition).



[OMPI devel] A few notes on IPv6 status

2006-08-19 Thread Adrian Knoth
Hi,

as mentioned earlier this year, I'm now working on IPv6 support
for OpenMPI.

The main design goals are:

   - do not break existing IPv4 code
   - compile on SUSv2 (without new socket API)
   - do not use mapped addresses
   - test the new code on many systems

The porting of OPAL is more or less finished (at least on Linux, but
I'll do some investigations on *BSD and Solaris) and I've halfway
ported ORTE (perhaps I can manage it within the next two or three
weeks). I'll still have to write more test code, but that's already
scheduled for tomorrow.

Christian Kauhaus proposed to set up a blog containing news about
progress, early patches and so on. Is there anyone who'd like to
read it? ;)

I have a few questions to discuss:

 In opal/util/if.c:

/*
 *  Attempt to resolve the adddress as either a dotted decimal formated
 *  string or a hostname and lookup corresponding interface.
 */

int opal_ifaddrtoname(const char* if_addr, char* if_name, int length)


And somewhere below:

#define ADDRLEN 100
bool
opal_ifislocal(char *hostname)
{
    char addrname[ADDRLEN + 1];
    [..]
    ret = opal_ifaddrtoname(hostname, addrname, ADDRLEN);

Why ADDRLEN? Shouldn't IF_NAMESIZE (defined 32) do the job?
opal_ifaddrtoname copies the interface name to its second
parameter (here: addrname), so the largest string can only
be as long as IF_NAMESIZE.

ORTE-question:

According to RFC 3986 (and some others), I've implemented the
service string as follows:

#ifdef IPV6
    if (addr.sin6_family == AF_INET) {
        ptr += sprintf(ptr, "tcp://%s:%d", opal_sockaddr2str(&addr),
                       ntohs(mca_oob_tcp_component.tcp_listen_port));
    }

    if (addr.sin6_family == AF_INET6) {
        ptr += sprintf(ptr, "tcp://[%s]:%d", opal_sockaddr2str(&addr),
                       ntohs(mca_oob_tcp_component.tcp_listen_port));
    }
#else
    ptr += sprintf(ptr, "tcp://%s:%d", inet_ntoa(addr.sin_addr),
                   ntohs(mca_oob_tcp_component.tcp_listen_port));
#endif


Do you agree with a resulting URL like tcp://[2001:6f8::1]:port or
do you think it should be tcp6://?

I prefer the first one due to its RFC compliance. Both versions
won't interfere with existing libraries, because parse_uri would
return ORTE_ERR_BAD_PARAM in case of IPv6-connect strings on
ipv6-unaware systems.
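
For readers who want to try the bracketed form outside of ORTE, here is a self-contained sketch. It is not the oob/tcp code: format_uri is a hypothetical helper that takes the address as text and uses only inet_pton() and snprintf().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Write "tcp://host:port" into buf, bracketing IPv6 literals as RFC 3986
 * requires. Returns 0 on success, -1 if text is not a valid address or
 * buf is too small. */
static int format_uri(char *buf, size_t len, const char *text, unsigned port)
{
    struct in6_addr a6;
    struct in_addr a4;
    int n;

    assert(buf != NULL && text != NULL);
    if (strlen(text) >= INET6_ADDRSTRLEN) {
        return -1;
    }
    if (inet_pton(AF_INET6, text, &a6) == 1) {
        n = snprintf(buf, len, "tcp://[%s]:%u", text, port);
    } else if (inet_pton(AF_INET, text, &a4) == 1) {
        n = snprintf(buf, len, "tcp://%s:%u", text, port);
    } else {
        return -1;
    }
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

The brackets keep the port separator unambiguous, since an IPv6 literal itself contains colons.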

Is it ok to use -DIPV6 or should I rename it? Is there already
a way to get the operating system we're compiling for? (uname -s)

IPv6 interface discovery (talking about opal/util/if.c again)
needs special treatment on some systems. Right now, I have
-DLINUX_IPV6 and I'd probably need to catch more (at least
HPUX defines SIOCGLIFADDR which is also present on OpenBSD).

If I'd have something like -DLINUX, I wouldn't need to
introduce more defines (like -DLINUX_IPV6 or -DBSD_IPV6).

There is probably more to discuss (i.e. the CIDR support I've
implemented), but let's delay this until the first patch ;)


Best regards.



Re: [OMPI devel] A few notes on IPv6 status

2006-08-21 Thread Adrian Knoth
On Sat, Aug 19, 2006 at 11:07:26PM +0200, Adrian Knoth wrote:

> Hi,

Hi!

> Do you agree with a resulting URL like tcp://[2001:6f8::1]:port or
> do you think it should be tcp6://?

I've changed this to tcp6://, because orte/mca/oob/tcp/oob_tcp.c
contains the following lines:

/* setup the IP address for storage */
tmp = mca_oob.oob_get_addr();
tmp2 = strrchr(tmp, '/') + 1;
tmp3 = strrchr(tmp, ':');
if(NULL == tmp2 || NULL == tmp3) {

The old way (tcp://[IPv6]) would require code to remove
'[' and ']' iff af_family == AF_INET6.

tcp6:// does not need any special treatment.
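
Why the strrchr() approach copes with tcp6:// but not with brackets can be seen in a small stand-alone sketch. split_uri is a hypothetical function that mimics the idea of the lines quoted above (last '/' and last ':'); it is not the ORTE code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split "scheme://host:port" at the last '/' and the last ':'.
 * Returns 0 on success, -1 on malformed input or a too-small host buffer. */
static int split_uri(const char *uri, char *host, size_t hostlen, int *port)
{
    const char *slash, *colon;
    size_t n;

    assert(uri != NULL && host != NULL && port != NULL);
    slash = strrchr(uri, '/');
    colon = strrchr(uri, ':');
    if (slash == NULL || colon == NULL || colon <= slash) {
        return -1;
    }
    n = (size_t)(colon - (slash + 1));
    if (n == 0 || n >= hostlen) {
        return -1;
    }
    memcpy(host, slash + 1, n);
    host[n] = '\0';
    *port = atoi(colon + 1);
    return 0;
}
```

Because the port always follows the final colon, an unbracketed IPv6 literal comes out intact, whereas the bracketed form leaves '[' and ']' in the host part and would need an extra stripping step.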

> IPv6 interface discovery (talking about opal/util/if.c again)
> needs special treatment on some systems. Right now, I have
> -DLINUX_IPV6 and I'd probably need to catch more (at least
> HPUX defines SIOCGLIFADDR which is also present on OpenBSD).
> 
> If I'd have something like -DLINUX, I wouldn't need to
> introduce more defines (like -DLINUX_IPV6 or -DBSD_IPV6).

I'm now using the compiler defines:

#ifdef __linux__
#endif





[OMPI devel] First IPv6 communication with ORTE

2006-08-24 Thread Adrian Knoth
Hi,

I'm glad to announce the first IPv6 launch of orted:

tcp6       0    960 2001:638:906:2:20:43810 2001:638:906:2::1:43421 ESTABLISHED 18368/orted


Unit testing discovered the relevant bugs. They're now fixed
and it's actually working. Who'd ever guess this? ;)

I'm going to prepare some developer information next week.

Best regards.



[OMPI devel] [IPv6] new component oob/tcp6

2006-09-01 Thread Adrian Knoth
Hi,

yesterday I felt impelled to create a new ORTE oob component: tcp6.

I was able to compile the library with either IPv4 or IPv6 support,
but not with both (that is: two different ompi installations or
at least two different DSO versions).

As far as I can see, many functions use mca_oob_tcp_component.tcp_listen_sd.
Unfortunately, as I am not allowed to use v4mapped addresses (not supported
by the Windows IPv6 stack, disabled by default on *BSD), this socket
is either AF_INET or AF_INET6, but not both (both means AF_INET6 *and*
accepting v4mapped addresses).

Do you agree to go on with two oob components, tcp and tcp6?
There is a lot of duplicated code, but we might refactor this
once everything else is done.

On the other hand, this whole procedure might be totally useless:
two nodes may exchange IPv4-URIs via IPv6 containing identical
RFC1918 networks. One would prefer IPv4 due to less overhead,
but with IPv6, these v4-addresses might be at different locations
anywhere in the world.

In other words: IPv6 must be tried first or mixing with IPv4
cannot be reliable. In this case, a lot of code may be removed
and we'll end up with either two installations/DSOs (as mentioned
above) or with runtime detection of af_family (i.e. look for
global IPv6 addresses and iff found, disable IPv4 completely)

What do you think - which way is best? Use cases?




Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-01 Thread Adrian Knoth
On Fri, Sep 01, 2006 at 07:01:25AM -0600, Ralph Castain wrote:

> > Do you agree to go on with two oob components, tcp and tcp6?
> Yes, I think that's the right approach

It's a deal. ;)

> I think this can be supported nicely in the framework system. All we
> have to do is set the IPv6 component's priority higher than IPv4.

Do you mean that priority?:

   MCA oob: parameter "oob_tcp6_priority" (current value: "0")


> We then can deal with the "try IPv6 first" by traversing the component
> list in priority order. As an example, see the RAS framework.

Where is it done? It's outside the mca/oob directory, right?
My knowledge about orte is currently more or less limited to
this subdirectory ;)

> it. In this case, we need both OOB components active, and we need a routing
> table that tells us which one to use to talk to various processes. I suspect
> the routing table belongs in the RML framework. If you look at the PLS
> framework, you'll see where we "front" the select function to give you the
> ability to specify a preferred selection. We might have to do the same thing
> with the OOB to allow the RML to say "send this buffer using this specific
> OOB component", while still allowing it to say "send this buffer using the
> *best* component".

Sounds good (but I don't have to do it on my own, do I?).

Right now it looks like this:

   orterun -np 2 -host hostA,hostB some_command

uses IPv4 and it is still working.

   orterun -mca oob ^tcp hostA,hostB some_command

hangs. The HNP correctly generated the tcp6://-URIs, but I guess
the remote node tries to connect with its oob/tcp module (which
cannot handle IPv6 anymore).

So I chmod 0 the mca_oob_tcp.so to prevent its loading, thus resulting
in a working IPv6 connection.

(for now, I don't know why this happens (the hang), but at least
 the oob/tcp6 component is working at all)

> I suspect that backend processes (i.e., non-HNP processes) really will
> only use one or the other.

The question also arises for the btl/tcp component: if all nodes
should be able to communicate with each other, they must use the
same address family.


Thanks for your help.



Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Adrian Knoth
On Wed, Sep 06, 2006 at 05:44:23PM +0200, Christian Kauhaus wrote:

> Our current plan is to look into the hostfile and see if there are 
> 
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.

Speaking of which: Today, I've extended rds/hostfile/ to accept
IPv6 addresses.

This now gives me the possibility to specify IPv6 addresses,
resulting in an IPv4 (yes, I-P-v-four) connection.

Obviously, I'll have to investigate ;)


(just to let you know I'm working on it)



Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Adrian Knoth
On Thu, Sep 07, 2006 at 11:46:28AM -0400, Jeff Squyres wrote:

> > On Fri, Sep 01, 2006 at 07:01:25AM -0600, Ralph Castain wrote:
> > 
> >>> Do you agree to go on with two oob components, tcp and tcp6?
> >> Yes, I think that's the right approach
> > 
> > It's a deal. ;)
> Actually, I would disagree here (sorry for jumping in late! :-( ).

No problem, just two hours ago, Christian and I decided to drop
the idea of oob/tcp6 and go on with only one oob-tcp-component.

It shouldn't be that hard and I'll try it tonight or tomorrow.

> Can we just have one component that handles both ivp4 and ivp6?

Yes. At least that's what I try to code ;)

> Appropriate #if's can be added

Are already present.

> (I'm willing to help with the configure.m4 mojo -- the

That's good. Just check for struct sockaddr_in6 and add
-DIPV6 to the CFLAGS. This flag is currently needed by
opal/util/if.* and orte/mca/oob/tcp/*, so one might limit
it to the two corresponding makefiles.

We can also set/define IPV6 in something_config.h.
It'd also be a good idea to have a --disable-ipv6 configure flag.




Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Adrian Knoth
On Thu, Sep 07, 2006 at 07:51:28PM +0200, Adrian Knoth wrote:

> No problem, just two hours ago, Christian and I decided to drop
> the idea of oob/tcp6 and go on with only one oob-tcp-component.
> It shouldn't be that hard and I'll try it tonight or tomorrow.

Looks quite promising:

adi@ipc654:~/ompi/trunk/test$ (orterun -np 2 -host amun,ipc654 netstat -tpln) 2> /dev/null | grep orte
tcp        0      0 0.0.0.0:44012   0.0.0.0:*   LISTEN   1332/orted
tcp        0      0 0.0.0.0:42706   0.0.0.0:*   LISTEN   1329/orterun
tcp        0      0 0.0.0.0:36376   0.0.0.0:*   LISTEN   27961/orted
tcp6       0      0 :::56783        :::*        LISTEN   27961/orted
tcp6       0      0 :::34615        :::*        LISTEN   1329/orterun
tcp6       0      0 :::39837        :::*        LISTEN   1332/orted


This is one component with two listening sockets.
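
A minimal sketch of the two-listener idea, independent of the oob/tcp code: one helper opens a wildcard listener per address family, and the v6 socket is marked IPV6_V6ONLY so it never depends on mapped addresses. open_listener is a hypothetical name, and the backlog of 16 is an arbitrary choice.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a listening TCP socket on the wildcard address of the given family
 * (AF_INET or AF_INET6). Returns the descriptor, or -1 on failure. */
static int open_listener(int family, unsigned short port)
{
    int on = 1;
    int fd;

    if (family != AF_INET && family != AF_INET6) {
        return -1;
    }
    fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (family == AF_INET6) {
        struct sockaddr_in6 sin6;

        memset(&sin6, 0, sizeof(sin6));
        sin6.sin6_family = AF_INET6;
        sin6.sin6_addr = in6addr_any;
        sin6.sin6_port = htons(port);
        /* v6-only: IPv4 peers are served by the separate AF_INET socket. */
        if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0 ||
            bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) < 0) {
            close(fd);
            return -1;
        }
    } else {
        struct sockaddr_in sin;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            close(fd);
            return -1;
        }
    }
    if (listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling it once with AF_INET and once with AF_INET6 yields two descriptors that can be watched together by select() or an event loop; a failure of the AF_INET6 call can simply be treated as "no IPv6 on this host".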

The main work isn't done yet: mca_oob_tcp_peer_start_connect().

I've extended it a little bit:

static int  mca_oob_tcp_peer_start_connect(mca_oob_tcp_peer_t* peer,
   uint16_t af_family);

where af_family is one of {AF_INET, AF_INET6}. I start with AF_INET
and within mca_oob_tcp_peer_start_connect, I call this function
again with AF_INET6 (one level of recursion) to try the other
address family.

This approach (coded last week when I still had a single component)
is bad (long timeouts before trying AF_INET6) and probably wrong:

for the accepting sockets, I've added

   opal_event_t   tcp6_send_event;
   opal_event_t   tcp6_recv_event;

and perhaps something like this is necessary for peers, too (don't
know this, yet. I'll have a look at it tomorrow).


So long



[OMPI devel] [IPv6] ORTE layer working

2006-09-12 Thread Adrian Knoth
Hi,

I'm glad to announce a first working version of IPv4+IPv6 orte.

It contains:

   - IPv6 interface discovery on Linux
   - a single orte/mca/oob/tcp component
   - a single module (no multiple instances)
   - two listening sockets
   - two connecting sockets

The listening sockets always stay open, the connecting sockets
are tried concurrently and if one succeeds to connect (a real
orte connect, mca_oob_tcp_peer_connected()), the other one
will be closed.

Of course, Jeff's ipv6-configure-patch is also included.

This work is still based on OpenMPI-v1.1, but I'll port
it to the v1.2 svn checkout, hopefully by the end
of the week ;) It will then appear in the svn-/tmp/.

If someone is interested in the code right now, I could
create a snapshot of my svn working copy and put it
on the webserver.





Re: [OMPI devel] [IPv6] ORTE layer working

2006-09-22 Thread Adrian Knoth
On Tue, Sep 12, 2006 at 05:44:49PM +0200, Adrian Knoth wrote:

> I'm glad to announce a first working version of IPv4+IPv6 orte.
> 
> It contains:
>- IPv6 interface discovery on Linux
>- a single orte/mca/oob/tcp component
>- a single module (no multiple instances)
>- two listening sockets
>- two connecting sockets

Solaris IPv6 interface discovery is now working, too.

> This work is still based on OpenMPI-v1.1, but I'll port
> it to the v1.2 svn checkout, hopefully until the end
> of the week ;)

It's based on 1.2-svn since last weekend.

> It will then appear in the svn-/tmp/.

It won't appear in svn-/tmp/ until our local mandarin manages
to finish his vacation and sign the license paper. If you
need/want to access the code, see next paragraph: ;)

> If someone is interested in the code right now, I could
> create a snapshot of my svn working copy and put it
> on the webserver.


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

A screw without a nail is a thread.


[OMPI devel] IPv6 in btl/tcp

2006-10-11 Thread Adrian Knoth
Hi,

this mail starts like all the others before ;):

I'm glad to announce a first working version of btl/tcp
with both IPv4 and IPv6 support.

adi@ipc654:~/ompi/trunk/test$ ruby ringtest.rb 
Loaded suite ringtest
Started
0: sending message (0) to 1
1: got message (1) from 0, sending to 2
2: got message (2) from 1, sending to 0
0: got message (2) from 2
3 additional processes aborted (not shown)

This ringtest was done in an IPv6-only environment, so
process launching (orte) and MPI communication were
done via IPv6.

Unfortunately, the process crashed afterwards, but as
mentioned above, it's the very first version. (As I write
these lines, the svn checkin is only three minutes old.)

The ringtest also works fine in plain IPv4 environments and
mixed environments within the same cluster. It fails on
mixed multi-cluster setups and heterogeneous OSs, but I'm
going to fix these issues on Saturday (or next week).

(I'm currently passing complete struct sockaddr_storages
 from btl_tcp_addr.h to the pml, which gives a different
 sizeof(mca_btl_tcp_addr_t) on different systems and trips
 the check in btl_tcp_proc.c:
   if (0 != (size % sizeof(mca_btl_tcp_addr_t)))
 That's obviously wrong.)
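
One way around the size mismatch, sketched here under assumptions, is
to exchange a fixed-layout record instead of the platform's
sockaddr_storage. struct wire_addr and wire_addr_pack() are
hypothetical names, not what btl_tcp_addr.h contains; the point is
only that every field has a fixed width and the padding is explicit.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical fixed-layout address record: 20 bytes on every ABI. */
struct wire_addr {
    uint8_t  addr[16];  /* IPv6 address, or IPv4 address in the first 4 bytes */
    uint16_t port;      /* network byte order */
    uint8_t  family;    /* 4 or 6; not the platform's AF_* value */
    uint8_t  pad;       /* explicit padding, always 0 */
};

/* Fill `out` from a sockaddr_storage; returns 0 on success, -1 otherwise. */
int wire_addr_pack(struct wire_addr *out, const struct sockaddr_storage *ss)
{
    memset(out, 0, sizeof(*out));
    if (AF_INET == ss->ss_family) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;
        memcpy(out->addr, &in4->sin_addr, 4);
        out->port = in4->sin_port;
        out->family = 4;
        return 0;
    }
    if (AF_INET6 == ss->ss_family) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;
        memcpy(out->addr, &in6->sin6_addr, 16);
        out->port = in6->sin6_port;
        out->family = 6;
        return 0;
    }
    return -1;
}
```

The family is stored as 4 or 6 rather than AF_INET/AF_INET6 because
the numeric AF_* values are not the same on every operating system.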


Let's see...

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Q: Was können Frauen, was Männer nicht können?
A: Kinder kriegen, Periode kriegen, nach dem Tod noch Sex haben


Re: [OMPI devel] IPv6 in btl/tcp

2006-10-16 Thread Adrian Knoth
On Wed, Oct 11, 2006 at 11:28:13PM +0200, Adrian Knoth wrote:

> The ringtest also works fine in plain IPv4 environments and
> mixed environments within the same cluster. It fails on
> mixed multi-cluster setups and heterogenous OSs, but I'm
> going to fix these issues on Saturday (or next week).

I've fixed it:

0: sending message (0) to 1
0: got message (3) from 3
[0,1,1][/home/racl/adi/ompi/trunk/src/ompi/mca/btl/tcp/btl_tcp_endpoint.c:194:mca_btl_tcp_endpoint_dump] accepted: 192.168.1.132 - 192.168.1.1 nodelay 1 sndbuf 262144 rcvbuf 262144 flags 0802
3: got message (3) from 2, sending to 0
2: got message (2) from 1, sending to 3
[0,1,0][/home/racl/adi/ompi/trunk/src/ompi/mca/btl/tcp/btl_tcp_endpoint.c:194:mca_btl_tcp_endpoint_dump] connected: 192.168.1.1 - 192.168.1.132 nodelay 1 sndbuf 262144 rcvbuf 262144 flags 0802
[0,1,0][/home/racl/adi/ompi/trunk/src/ompi/mca/btl/tcp/btl_tcp_endpoint.c:194:mca_btl_tcp_endpoint_dump] accepted: 141.35.14.189 - 141.35.13.178 nodelay 1 sndbuf 262144 rcvbuf 262144 flags 0802
[0,1,1][/home/racl/adi/ompi/trunk/src/ompi/mca/btl/tcp/btl_tcp_endpoint.c:194:mca_btl_tcp_endpoint_dump] connected: 2001:638:906:1:20e:a6ff:fe3d:48d6 - 2001:638:906:2:213:d3ff:fec5:3480 nodelay 1 sndbuf 262144 rcvbuf 262144 flags 0802
1: got message (1) from 0, sending to 2

This is a ringtest between two different Linux machines and two
Solaris hosts (both x86) in a mixed environment.

The two Linux nodes talk via RFC1918 (192.168.1.x), the fastest connection
between them. One of them talks to a Solaris host via public IPv4 (141.x.y.z),
also the fastest connection. The other Linux system (which is 192.168.1.131
and doesn't have a public IPv4 address) uses IPv6 to communicate
with the second Solaris host, because no faster (other) connection is
available. (It's a node from the formerly RFC1918-only cluster, which
now has its own IPv6 subnet, 2001:638:908:1::/64.)


Things are getting interesting... ;)


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

[X] <-- nail here for new monitor


Re: [OMPI devel] IPv6 in btl/tcp

2006-10-17 Thread Adrian Knoth
On Mon, Oct 16, 2006 at 07:22:12PM -0600, Brian Barrett wrote:

> I just committed some code in the TCP OOB component to deal with  
> packing / unpacking sockaddr_in structures for cases where there is  
> different heterogeneity / padding.  I think it's going to require  
> some work to make it IPv6 friendly.  Just an FYI.

You're talking about the patch for #493? I've incorporated
this fix and extended it to support IPv6. Works fine.


Thanks for the pointer.

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

He who keeps banging the drum will one day go down the drain.


[OMPI devel] New oob/tcp?

2006-10-25 Thread Adrian Knoth
Hi,

I've seen a new oob/tcp component in the v1.2 branch (copied from
the trunk). Of course, it doesn't merge with my IPv6 patch, so
I'm currently using the old oob/tcp in my branch.

Is this new component considered stable, and is it therefore worth
porting the IPv6 patch to it?


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Therapy: reading the Linux kernel sources aloud [...] accompanied by
soothing music: cat /boot/bzImage > /dev/audio   (Matthias Lipp in dasr)


Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Adrian Knoth
On Wed, Oct 25, 2006 at 06:27:47AM -0600, Ralph H Castain wrote:

> I don't see any new component, Adrian. There have been a few updates to the
> existing component, some of which might cause conflicts with the merge, but
> those shouldn't be too hard to resolve.

Ok, I just saw something with "create_listen_thread" and so on, but
didn't look closer.

I'm currently trying to merge...


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

If your opponent praises you, you have done something wrong.


Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Adrian Knoth
On Wed, Oct 25, 2006 at 02:48:33PM +0200, Adrian Knoth wrote:

> > I don't see any new component, Adrian. There have been a few updates to the
> > existing component, some of which might cause conflicts with the merge, but
> > those shouldn't be too hard to resolve.
> Ok, I just saw something with "create_listen_thread" and so on, but
> didn't look closer.

The "new" (current) oob/tcp (in the v1.2 branch) does not have Brian's
fix for #493. (the following constant is missing, the code, too)

   MCA_OOB_TCP_ADDR_TYPE_AFINET

There are probably more differences...

If you want, I can do the merge and we'll use my IPv6 oob with
all the patches up to r12050.


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

> Pine ist geil. Jedenfalls als Mailer. (Holger Marzen und)
Pine stinkt.  Insbesondere als Mailer.  (Felix von Leitner in dasr)


[OMPI devel] IPv6 code uploaded to svn

2006-10-25 Thread Adrian Knoth
Hi,

I've uploaded my current IPv6 code to /tmp/adi-ipv6/.

The checkin was split to ease the review.

What has changed:

 OPAL: (changeset 12308)
The OPAL layer can now detect IPv6 addresses on Linux and
Solaris. The functions in if.c were rewritten to handle
the new address structures.

Note that I do not use the kernel index for NIC enumeration,
because there is no 1:1 mapping between the index and
the address.

I've also switched to CIDR notation (like 10.0.0.1/8 instead
of 10.0.0.1/255.0.0.0) to be able to use the same code
for IPv6 (which needs /0 .. /128)
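
To illustrate why a prefix length is convenient for both families,
here is a small sketch of an IPv6 "same network" test for any prefix
from /0 to /128. The function name ipv6_same_prefix() is an assumption
made for this example and does not come from if.c.

```c
#include <netinet/in.h>
#include <string.h>

/* Return 1 if `a` and `b` share their first `prefixlen` bits (0..128), else 0. */
int ipv6_same_prefix(const struct in6_addr *a, const struct in6_addr *b,
                     unsigned prefixlen)
{
    unsigned full_bytes, rest_bits;

    if (prefixlen > 128) {
        return 0;
    }
    full_bytes = prefixlen / 8;   /* bytes that must match completely */
    rest_bits  = prefixlen % 8;   /* leading bits of the next byte */

    if (0 != memcmp(a->s6_addr, b->s6_addr, full_bytes)) {
        return 0;
    }
    if (rest_bits) {
        unsigned char mask = (unsigned char)(0xFF << (8 - rest_bits));
        if ((a->s6_addr[full_bytes] & mask) != (b->s6_addr[full_bytes] & mask)) {
            return 0;
        }
    }
    return 1;
}
```

The same byte-then-bits comparison works for IPv4 with a 4-byte
address, which is what makes a shared code path possible.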

 ORTE:
mca/rds: (changeset 12309)
This is an extended version of the hostfile parser.
It supports IPv6 addresses, but also accepts invalid ones
like 2001::dead::1 ("::" is only allowed once).
I don't mind; they get caught by the lookup functions.
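
As a rough illustration of "caught by the lookup functions": the
standard inet_pton() already rejects a literal with two "::" groups.
The wrapper name is_valid_ipv6_literal() is made up for this sketch;
which lookup function the rds code actually ends up calling is not
shown here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return 1 if `text` is a well-formed IPv6 address literal, 0 otherwise. */
int is_valid_ipv6_literal(const char *text)
{
    struct in6_addr tmp;

    /* inet_pton() returns 1 on success and 0 for a malformed literal. */
    return 1 == inet_pton(AF_INET6, text, &tmp);
}
```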

mca/oob/tcp: (changeset 12310)
This is "my" implementation of the oob/tcp component.
It is based on the latest version from the v1.2 branch.
The new thread listener hasn't been ported to IPv6 yet;
we could do this once the code is in the trunk/v1.2.

As you can see, I've duplicated the listening sockets
and the corresponding events. That's why I also extended
some functions by a new parameter "sd" in order to know
to which socket (either tcp_listen_sd or tcp6_listen_sd)
the function should be applied.


 OMPI: (changeset 12311)
A straight-forward implementation very similar to the
oob/tcp code. I haven't tested the --disable-ipv6 case,
this still has to be done. The address selection code
may need some improvements, I currently use my own
function from oob_tcp_addr.c, but it's probably better
to ask is_ipv4_private than using (!is_ipv4_public).

There are still some changes to be done to port it
to MCA_BTL_BASE_VERSION_1_0_1, I haven't noticed this
until the checkin. I'll do this on Friday.
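
A positive "is this address private?" test could look like the
following sketch. The name ipv4_is_rfc1918() is an assumption, not the
function from oob_tcp_addr.c. Asking the positive question avoids
treating every non-public address (loopback, link-local and so on) as
if it were an RFC1918 address.

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Return 1 if `addr` lies in 10/8, 172.16/12 or 192.168/16 (RFC 1918). */
int ipv4_is_rfc1918(const struct in_addr *addr)
{
    uint32_t host = ntohl(addr->s_addr);   /* compare in host byte order */

    if ((host & 0xFF000000u) == 0x0A000000u) return 1;  /* 10.0.0.0/8 */
    if ((host & 0xFFF00000u) == 0xAC100000u) return 1;  /* 172.16.0.0/12 */
    if ((host & 0xFFFF0000u) == 0xC0A80000u) return 1;  /* 192.168.0.0/16 */
    return 0;
}
```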


To sum it up: I believe the code is ready to go into v1.2,
but it should be reviewed, tested, extended...


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Better quality control for dog food! Children should be examined by
food chemists twice a year.  (from de.talk.jokes)


[OMPI devel] MPI between amd64 and x86

2006-11-01 Thread Adrian Knoth
Hi,

I'm currently testing the new IPv6 code in a lot of
different setups.

It's doing fine with Linux and Solaris, both on x86.
There are also no problems between multiple amd64s,
but I wasn't able to communicate between x86 and amd64.

The oob connection is up, but the BTL hangs. gdb (remote) shows:

#0  0xb7d3bac9 in sigprocmask () from /lib/tls/libc.so.6
#1  0xb7eb956c in opal_evsignal_recalc ()
   from /home/racl/adi/ompi/trunk/Linux-i686/lib/libopal.so.0
#2  0xb7eba033 in poll_dispatch ()
   from /home/racl/adi/ompi/trunk/Linux-i686/lib/libopal.so.0
#3  0xb7eb8d5d in opal_event_loop ()
   from /home/racl/adi/ompi/trunk/Linux-i686/lib/libopal.so.0
#4  0xb7eb2f58 in opal_progress ()
   from /home/racl/adi/ompi/trunk/Linux-i686/lib/libopal.so.0
#5  0xb7c72505 in mca_pml_ob1_recv ()
   from /home/racl/adi/ompi/trunk/Linux-i686//lib/openmpi/mca_pml_ob1.so
#6  0xb7fa8c10 in PMPI_Recv ()
   from /home/racl/adi/ompi/trunk/Linux-i686/lib/libmpi.so.0
#7  0x080488cd in main ()


and the local gdb:

#0  0x2b4b4d99 in __libc_sigaction () from /lib/libpthread.so.0
#1  0x2aee4c26 in opal_evsignal_recalc ()
   from /home/adi//trunk/Linux-x86_64/lib/libopal.so.0
#2  0x2aee44b1 in opal_event_loop ()
   from /home/adi//trunk/Linux-x86_64/lib/libopal.so.0
#3  0x2aedfc10 in opal_progress ()
   from /home/adi//trunk/Linux-x86_64/lib/libopal.so.0
#4  0x2d6a0c8c in mca_pml_ob1_recv ()
   from /home/adi/trunk/Linux-x86_64//lib/openmpi/mca_pml_ob1.so
#5  0x2ac429f9 in PMPI_Recv ()
   from /home/adi//trunk/Linux-x86_64/lib/libmpi.so.0
#6  0x00400b39 in main ()


The ompi-1.1.2-release also shows this problem, so I'm not
sure if it's my fault.

I've added some debug output to my ringtest (see below) and
got the following result:

1: waiting for message
0: sending message (0) to 1
0: sent message

Here's the code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank;
    int size;
    int message = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (!rank) {
        /* rank 0 starts the ring and waits for the last rank */
        printf("%i: sending message (%i) to %i\n", rank, message, 1);
        MPI_Send(&message, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("%i: sent message\n", rank);
        MPI_Recv(&message, 1, MPI_INT, size-1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("%i: got message (%i) from %i\n", rank, message, size-1);
    } else {
        /* everyone else increments the message and passes it on */
        printf("%i: waiting for message\n", rank);
        MPI_Recv(&message, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        message += 1;
        MPI_Send(&message, 1, MPI_INT, (rank+1)%size, 0, MPI_COMM_WORLD);
        printf("%i: got message (%i) from %i, sending to %i\n", rank, message,
               rank-1, (rank+1)%size);
    }

    MPI_Finalize();
    return 0;
}

Nothing special, but as seen in the gdb output and also
from the debug lines, both processes are waiting in PMPI_Recv(),
expecting a message to arrive.

Is this a known problem? What's wrong? Usercode? ompi?
As far as I can see (tcpdump and strace), all tcp connections
are up, so the message might have got stuck between rank0 and rank1.


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Windows not found - Abort/Retry/Smile


Re: [OMPI devel] MPI between amd64 and x86

2006-11-04 Thread Adrian Knoth
On Sat, Nov 04, 2006 at 02:07:58PM +0530, Nysal Jan wrote:

> >come from the BTL headers where the fields do not have the same
> >alignment inside. The original question was asked by Nysal Jan on an
> >email with the subject "SEGV in EM64T <--> PPC64 communication" on
> >Oct. 11 2006. Unfortunately, we still have the same problem.
> I'm forwarding that email. Further investigation showed that the same
> issue exists with a few other ob1 headers as well. A 64-bit build doesn't
> have this problem. I'm not sure if this might be the same issue that you
> are facing. You could test if the attached patch works for you (Although
> this is not the right solution).

The attached patch solves my issue, and I think it is exactly
the problem I was facing (I saw the hang in pml_ob1).

Is there already a ticket assigned for it?
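
For illustration only, this is the kind of layout difference that can
bite between a 32-bit and a 64-bit build. The two structs are invented
for this sketch and are not the real ob1 or BTL headers. With the usual
x86 ABIs, a uint64_t member is aligned to 4 bytes on i386 but to
8 bytes on amd64, so the first struct has a different layout on each
side, while the second one is identical on both.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout depends on the ABI: `seq` typically sits at offset 4 on i386
 * and at offset 8 on amd64. */
struct hdr_implicit {
    uint8_t  type;
    uint64_t seq;
};

/* Same fields with explicit padding: offset 8 and size 16 on both. */
struct hdr_explicit {
    uint8_t  type;
    uint8_t  pad[7];
    uint64_t seq;
};

/* Compile-time checks: the array size becomes -1 if the layout is off. */
typedef char hdr_explicit_size_check[(sizeof(struct hdr_explicit) == 16) ? 1 : -1];
typedef char hdr_explicit_offset_check[(offsetof(struct hdr_explicit, seq) == 8) ? 1 : -1];
```

Explicit padding is only one option; packing and unpacking the fields
one by one into a byte buffer avoids the question of struct layout
altogether.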


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Mathematics without axioms is nothing more than hot air. (Matthias Heidbrink)
Even with axioms, mathematics is nothing more than hot air. (Florian Weimer)


[OMPI devel] valgrind messages important?

2006-11-12 Thread Adrian Knoth
Hi,

I'm currently tracing a segfault in mpi_init which is caused
by ompi/runtime/ompi_mpi_init.c:569

ret = MCA_PML_CALL(add_procs(procs, nprocs));
free(procs);

In most cases, no segfault occurs and everything works fine,
but with some special combinations of machines, I can trigger
the bug.

If I choose a working configuration and increase the number
of IPv6 addresses on one of the machines, the segfault occurs.

It cannot be triggered by adding IPv4 addresses, and disabling
IPv6 completely solves the problem.

The debugger shows that free internally calls mem2chunk.
The working configuration has a chunksize of 16 (bytes?),
the failing one has $BIGNUM, thus causing the segfault.
(trying to free unallocated memory)

I think these long IPv6 addresses overwrite a buffer (or at
least some memory which is allocated inside OMPI's memory
pool, thus delaying the segfault).
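
This is not a diagnosis of the actual bug, just a sketch of the
classic way such an overflow happens: a text buffer sized for IPv4
(INET_ADDRSTRLEN) is too small for an IPv6 address
(INET6_ADDRSTRLEN). The helper name format_ipv6() is made up;
inet_ntop() itself checks the length it is given and fails instead of
writing past the end.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Format an IPv6 address into `buf`. Returns 0 on success and -1 if the
 * buffer is too small; inet_ntop() refuses to write past `len`.
 */
int format_ipv6(const struct in6_addr *addr, char *buf, size_t len)
{
    return (NULL != inet_ntop(AF_INET6, addr, buf, (socklen_t)len)) ? 0 : -1;
}
```

A raw memcpy() or strcpy() into the smaller buffer would not fail like
this; it would silently overwrite whatever follows.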

There are two issues found by valgrind, but I wanted to
check the "normal" valgrind output first. With the nightly
snapshot 1.2b1r12555, I got the following "errors":

==8948== Conditional jump or move depends on uninitialised value(s)
==8948==    at 0x1B92884D: ompi_attr_create_predefined_callback (attribute_predefined.c:374)
==8948==    by 0x1BC869B8: orte_gpr_proxy_deliver_notify_msg (gpr_proxy_deliver_notify_msg.c:144)
==8948==    by 0x1B9FEDF7: mca_oob_xcast (oob_base_xcast.c:147)
==8948==    by 0x1B947E49: ompi_mpi_init (ompi_mpi_init.c:542)
==8948==    by 0x1B97D657: MPI_Init (pinit.c:71)
==8948==    by 0x8048846: main (in /home/racl/adi/ompi/trunk/test/vm/ring)

and

==8948== Syscall param writev(vector[...]) points to uninitialised byte(s)
==8948==    at 0x1BBCD5E8: (within /lib/tls/libc-2.3.2.so)
==8948==    by 0x1BD873C1: mca_btl_tcp_frag_send (btl_tcp_frag.c:104)
==8948==    by 0x1BD87133: mca_btl_tcp_endpoint_send_handler (btl_tcp_endpoint.c:689)
==8948==    by 0x1BA48AD3: opal_event_process_active (event.c:463)
==8948==    by 0x1BA48E11: opal_event_base_loop (event.c:600)
==8948==    by 0x1BA48BE3: opal_event_loop (event.c:514)
==8948==    by 0x1BA4211D: opal_progress (opal_progress.c:259)
==8948==    by 0x1BD59D24: opal_condition_wait (condition.h:81)
==8948==    by 0x1BD5AD00: mca_pml_ob1_send (pml_ob1_isend.c:128)
==8948==    by 0x1B985CD9: MPI_Send (psend.c:63)
==8948==    by 0x80488B6: main (in /home/racl/adi/ompi/trunk/test/vm/ring)
==8948==  Address 0x80FEECE is not stack'd, malloc'd or (recently) free'd


Should I worry about these two?

The segfault itself is probably related to this output:

==3324== Syscall param writev(vector[...]) points to uninitialised byte(s)
==3324==    at 0x1BBB45E8: (within /lib/tls/libc-2.3.2.so)
==3324==    by 0x1BC57191: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:234)
==3324==    by 0x1BC58658: mca_oob_tcp_peer_send (oob_tcp_peer.c:194)
==3324==    by 0x1BC5E873: mca_oob_tcp_send (oob_tcp_send.c:152)
==3324==    by 0x1B9FEC92: mca_oob_send_packed (oob_base_send.c:78)
==3324==    by 0x1BC6CE92: orte_gpr_proxy_exec_compound_cmd (gpr_proxy_compound_cmd.c:117)
==3324==    by 0x1B94503A: ompi_mpi_init (ompi_mpi_init.c:523)
==3324==    by 0x1B97AE7F: MPI_Init (pinit.c:71)
==3324==    by 0x8048846: main (in /home/racl/adi/ompi/trunk/test/vm/ring)
==3324==  Address 0x822BF11 is not stack'd, malloc'd or (recently) free'd

But I still have to look closer.

Is there a way to disable OMPI's ptmalloc2 and use the
system's free/malloc? (hopefully causing the segfault right where
it is done, probably a memcpy with wrong size)

Or are there other ways to debug such an issue?

TIA

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

It is a paradox when red wine makes someone "blue" (German slang for drunk).


Re: [OMPI devel] Cross-Cluster OpenMPI

2006-11-19 Thread Adrian Knoth
On Sun, Nov 19, 2006 at 02:35:27AM -0500, Resat Umit Payli wrote:

> Hi;

Hi!

> I am interested in using OpenMPI cross-cluster runs on the Grid
> environments.

It's not Grid, but "our" IPv6 code is intended to be run
on multi-clusters.

(That is, if you're only looking to use all of your machines, as long as
 they're reachable via IPv6, which will be the default in, uuuhh...
 only 10 years.)


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Even with truths, one has to watch the expiry date.


[OMPI devel] IPv6 up and working

2006-11-24 Thread Adrian Knoth
Hi,

Last week I rewrote my btl-tcp component to improve several
aspects, mainly to avoid oversubscription of interfaces.
I now have:

   - the MCA parameter btl_tcp_disable_family={4|6}
     to force the use of a specific address family at runtime

   - a working include/exclude list for interfaces (broken with
     the old design)

   - the ability to use different address families for different
     directions of communication, so

     ,--------------.               ,---------------.
     |  Host A      |   IPv4 --->   |  Host B       |
     |  RFC1918/NAT |               |  IPv4 public  |
     |  IPv6        |  <--- IPv6    |  IPv6         |
     `--------------'               `---------------'

And probably something more ;) Yesterday, Thomas Peiselt,
Christian Kauhaus and I (from the Cluster- & Metacomputing Group at the 
University of Jena) also did a first benchmark and we're proud to come
up with very good news:

   IPv6 is not much slower than IPv4.

Latency and bandwidth utilisation behave comparably, resulting
in 111.17 MB/s with IPv4 versus 109.52 MB/s with IPv6 running IMB 2.3 
PingPong over 1GigEthernet (in other words: IPv6 delivered 98.5% of 
the IPv4 bandwidth).

I expect to confirm this tendency (less than 2% performance loss)
by detailed benchmarking next week.

Thomas is going to improve address family selection so that IPv4 is
used whenever possible, squeezing out the last 1.5%
of bandwidth ;) (my implementation tends to prefer IPv6, but
I guess he'll fix it)

Christian is very interested in testing. We have automatic builds,
a cluster with IPv4/IPv6 and a traffic light turning from green
to red whenever a build/test fails, so he's looking forward
to providing testing facilities to the OMPI project, but I guess
he'll raise his questions (mainly with respect to MTT) in a
separate mail.


I'm going to upload the new btl-tcp component to tmp/adi-ipv6
in the next few days. It's currently only a branch in our local
Subversion repository, which I'd like to merge first.



-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

It's easier to nod with an empty head.


Re: [OMPI devel] Major revision to the RML/OOB

2006-12-05 Thread Adrian Knoth
On Mon, Dec 04, 2006 at 06:26:26AM -0700, Ralph Castain wrote:

> Hello all

Hi!

> With some luck and (hopefully) not too many conflicting priorities, Jeff
> and I may complete this work by Christmas
[..]
> As always, feel free to comment and/or make suggestions!

You wrote a lot about oob, sockets and connections. Does this
imply changes to oob/tcp? If so, I suggest integrating the
IPv6 support first (may be ported from /tmp/adi-ipv6, see

for details).

Of course, I'd like to help. Has anybody ever tested the code?
(surely we did, but someone else?)


-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Windows 98? Why? I haven't finished playing the old one yet.


Re: [OMPI devel] Major revision to the RML/OOB

2006-12-06 Thread Adrian Knoth
On Wed, Dec 06, 2006 at 07:07:42AM -0700, Ralph H Castain wrote:

> The concern is that we want to leave open the possibility of putting this
> revision into 1.2 since it will have a major performance impact on both
> startup time and the max cluster size we can support. The IP6 code is
> scheduled for 1.3 and we don't know what the performance impact will look
> like - hence the hesitation.

I agree not to include IPv6 in the v1.2 (you might remove the configure
patch from the v1.2 line, or leave it there without really using it)

If one considers the current v1.2 branch as stable, the trunk could
be used for the new v1.3 line.

I therefore suggest to move the OPAL changes into the trunk,
also the small hostfile code (lex code for IPv6) and the btl code.

When you've completed all changes to the OOB, we can have a look
and do the necessary IPv6 changes afterwards. Though I feel the oob/tcp
is the hardest part of all (it got the most modifications), I hope
that it's possible to copy a lot of the existing patch. Perhaps
your rewrite simplifies something.

I'm currently not developing new code, so at least the IPv6 codebase
isn't a moving target.


Just let me know if I could help.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany  


Re: [OMPI devel] Major revision to the RML/OOB

2006-12-08 Thread Adrian Knoth
On Thu, Dec 07, 2006 at 11:12:23AM -0500, Jeff Squyres wrote:

Hi,

> > I therefore suggest to move the OPAL changes into the trunk,
> > also the small hostfile code (lex code for IPv6) and the btl code.
> Can you describe the changes in opal that were made for IPv6?

These changes are limited to three files: opal/util/if.[ch] and
the new opal/include/opal/ipv6compat.h. The latter one is only
required for compatibility with old SUSv2 systems.

In if.c, I've added IPv6 interface discovery for Linux and Solaris,
Thomas Peiselt also contributed getifaddrs() support for *BSD/OSX.
Helper functions were extended to deal with struct sockaddr_storage.

I've introduced CIDR netmask handling, so the netmask field no longer
holds a full bit mask but simply the prefix length: 8, 16 or
whatever. There are helper functions to convert from and to CIDR.

/* convert a netmask (in network byte order) to CIDR notation */
static int prefix (uint32_t netmask)

/* convert a CIDR prefixlen to netmask (in network byte order) */
uint32_t opal_prefix2netmask (uint32_t prefixlen)
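
A minimal sketch of what two such IPv4 helpers could look like,
assuming contiguous netmasks. The sketch_ names are chosen here to
make clear that this is not the code from if.c.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Convert a CIDR prefix length (0..32) to a netmask in network byte order. */
uint32_t sketch_prefix2netmask(uint32_t prefixlen)
{
    if (0 == prefixlen) {
        return 0;                  /* avoid an undefined shift by 32 */
    }
    if (prefixlen > 32) {
        prefixlen = 32;
    }
    return htonl(0xFFFFFFFFu << (32 - prefixlen));
}

/* Convert a contiguous netmask (network byte order) to its prefix length. */
int sketch_netmask2prefix(uint32_t netmask)
{
    uint32_t host = ntohl(netmask);
    int len = 0;

    while (host & 0x80000000u) {   /* count the leading one bits */
        len++;
        host <<= 1;
    }
    return len;
}
```

For example, a prefix length of 8 corresponds to 255.0.0.0 and a
prefix length of 24 to 255.255.255.0.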

I've also extended the interface struct, still containing if_index,
but that's just its number in the opal_list. The new field is
called if_kernel_index, representing the associated kernel interface
index for this device. My BTL/TCP code also exchanges this new
information to enable the remote to detect if two or more addresses
are assigned to the same interface, thus preventing oversubscription
(multiple connections to the same interface but to different addresses,
 which is very likely if you have at least one IPv6 address and one
 IPv4 address on the same interface)
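
The idea can be sketched as follows, assuming the peer has sent one
kernel index per address. struct remote_addr and
pick_one_per_interface() are hypothetical names; the sketch simply
keeps the first address per interface, whereas a real implementation
would also weigh which address family is preferable.

```c
#include <stddef.h>

/* Hypothetical description of one remote address. */
struct remote_addr {
    int kernel_index;  /* interface index reported by the peer */
    int family;        /* 4 or 6 */
};

/*
 * Mark, for each address, whether it is the first one seen on its interface.
 * `use[i]` is set to 1 for addresses to connect to and 0 for the rest.
 * Returns the number of distinct interfaces.
 */
size_t pick_one_per_interface(const struct remote_addr *addrs, size_t n, int *use)
{
    size_t i, j, distinct = 0;

    for (i = 0; i < n; i++) {
        use[i] = 1;
        for (j = 0; j < i; j++) {
            if (addrs[j].kernel_index == addrs[i].kernel_index) {
                use[i] = 0;   /* interface already covered */
                break;
            }
        }
        distinct += (size_t)use[i];
    }
    return distinct;
}
```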

The code in if.c handles both AF_INET and AF_INET6, so it's no
problem to use it without using IPv6 somewhere else (i.e. oob/tcp,
btl/tcp).

HTH

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Drink wet cement and get really stoned!


[OMPI devel] NFS race condition in romio

2007-01-08 Thread Adrian Knoth
Hi,

we're facing an NFS race condition if File_Open is called for
a nonexistent file:

#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI::Init(argc, argv);
    MPI::File _outputFile;
    double dummy = 42;

    _outputFile = MPI::File::Open(MPI::COMM_WORLD,
                                  "foo",
                                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                                  MPI::INFO_NULL);
    _outputFile.Set_errhandler(MPI::ERRORS_ARE_FATAL);
    _outputFile.Write(&dummy, 1, MPI::DOUBLE);
    _outputFile.Close();
    MPI::Finalize();
}

If run on two or more nodes with shared NFS, it usually fails:

ADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN (line 55): 
**filenoexist fooMPI_FILE_CLOSE (line 51): **iobadfh
ADIO_OPEN (line 273): **oremote_fail
ADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN (line 55): 
**filenoexist fooADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN 
(line 55): **filenoexist foo[amun2:12137] *** An error occurred in 
MPI_File_write
[amun2:12137] *** on a NULL file
MPI_FILE_CLOSE (line 51): **iobadfh
MPI_FILE_CLOSE (line 51): **iobadfh
MPI_FILE_CLOSE (line 51): **iobadfh
[amun2:12137] *** MPI_ERR_FILE: invalid file
[amun2:12137] *** MPI_ERRORS_ARE_FATAL (goodbye)
[inge:19493] *** An error occurred in MPI_File_write
[inge:19493] *** on a NULL file
[amun4:10186] *** An error occurred in MPI_File_write
[amun4:10186] *** on a NULL file
[amun3:11146] *** An error occurred in MPI_File_write
[amun3:11146] *** on a NULL file


(There are chances that this code will succeed if it is run on only two
 nodes and rank=0 is the NFS client and rank=1 is the NFS server)

The file is created on rank 0, closed and later reopened by all N
processes as described in ad_open.c around line 163. Unfortunately,
NFS isn't fast enough to inform all clients about the new file.
Mounting the share with sync doesn't solve this issue either.

A well-placed system("ls") in the code remedies the problem.
To avoid this noisy call, I've reimplemented this ls with
open(".") and stat("."), but stat() isn't necessary.
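
The "quiet ls" can be sketched as a small standalone helper. This is
an illustration under assumptions, not the patch attached to the
ticket: touch_parent_dir() is a made-up name, and whether opening the
directory really refreshes the client's view depends on the NFS
client in use. Note that only the copy of the path is freed, since
dirname() returns a pointer into its argument or into static storage.

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Open and close the directory containing `filename`.
 * Returns 0 on success, -1 on failure.
 */
int touch_parent_dir(const char *filename)
{
    size_t len = strlen(filename) + 1;
    char *copy = malloc(len);        /* dirname() may modify its argument */
    char *dir;
    int fd;

    if (NULL == copy) {
        return -1;
    }
    memcpy(copy, filename, len);
    dir = dirname(copy);             /* points into `copy` or static storage */
    fd = open(dir, O_RDONLY);
    free(copy);                      /* free only the copy, never `dir` */
    if (fd < 0) {
        return -1;
    }
    close(fd);
    return 0;
}
```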

The attached patch fixes this problem, but perhaps there is
a better way to do it. What about upstream? (MPICH)?

(I guess NFS is widely used, so there should be more people
 facing this issue).


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
Index: ompi/mca/io/romio/romio/adio/common/ad_open.c
===
--- ompi/mca/io/romio/romio/adio/common/ad_open.c   (revision 1913)
+++ ompi/mca/io/romio/romio/adio/common/ad_open.c   (working copy)
@@ -5,6 +5,9 @@
  *   See COPYRIGHT notice in top-level directory.
  */

+#include 
+#include 
+
 #include "adio.h"
 #include "adio_extern.h"
 #include "adio_cb_config_list.h"
@@ -226,8 +229,19 @@
 }
 fd->access_mode = access_mode;

-(*(fd->fns->ADIOI_xxx_Open))(fd, error_code);
+if ((ADIO_NFS == fd->file_system)) {
+char *dirc = ADIOI_Strdup(filename);
+char *dname = dirname (dirc);
+int my_fd;

+my_fd = open (dname, O_RDONLY);
+//stat (my_fd, NULL);
+close (my_fd);
+ADIOI_Free(dirc);
+free (dname);
+(*(fd->fns->ADIOI_xxx_Open))(fd, error_code);
+}
+
 /* if error, may be it was due to the change in amode above. 
therefore, reopen with access mode provided by the user.*/ 
 fd->access_mode = orig_amode_wronly;  


Re: [OMPI devel] NFS race condition in romio

2007-01-08 Thread Adrian Knoth
On Mon, Jan 08, 2007 at 11:49:32PM +0100, Adrian Knoth wrote:

> The attached patch fixes this problem, but perhaps there is

New patch, I've missed the non-NFS case.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
Index: ompi/mca/io/romio/romio/adio/common/ad_open.c
===
--- ompi/mca/io/romio/romio/adio/common/ad_open.c   (revision 1913)
+++ ompi/mca/io/romio/romio/adio/common/ad_open.c   (working copy)
@@ -5,6 +5,9 @@
  *   See COPYRIGHT notice in top-level directory.
  */

+#include 
+#include 
+
 #include "adio.h"
 #include "adio_extern.h"
 #include "adio_cb_config_list.h"
@@ -226,6 +229,17 @@
 }
 fd->access_mode = access_mode;

+if ((ADIO_NFS == fd->file_system)) {
+char *dirc = ADIOI_Strdup(filename);
+char *dname = dirname (dirc);
+int my_fd;
+
+my_fd = open (dname, O_RDONLY);
+//stat (my_fd, NULL);
+close (my_fd);
+ADIOI_Free(dirc);
+free (dname);
+}
 (*(fd->fns->ADIOI_xxx_Open))(fd, error_code);

 /* if error, may be it was due to the change in amode above. 


Re: [OMPI devel] NFS race condition in romio

2007-01-09 Thread Adrian Knoth
On Tue, Jan 09, 2007 at 12:03:38AM +0100, Adrian Knoth wrote:

> > The attached patch fixes this problem, but perhaps there is
> New patch, I've missed the non-NFS case.

This patch was wrong, too (containing a double free segfault).
Don't code when dog-tired... ;)

I've created ticket #733 and attached the new (3rd) patch, so not
everybody on the list gets spammed with diff files.



-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] SOS!! Run-time error

2007-04-15 Thread Adrian Knoth
On Sun, Apr 15, 2007 at 01:40:01PM -0400, chaitali dherange wrote:

> Hi,

Hi!

>   I have downloaded the developer version of source code by downloading a
> nightly Subversion snapshot tarball.And have installed the openmpi.

Things get much clearer when you compile Open MPI with
--enable-debug.


> [oolong:09783] *** Process received signal ***
> [oolong:09783] Signal: Segmentation fault (11)
> [oolong:09783] Signal code:  (128)
> [oolong:09783] Failing at address: (nil)

NULL-pointer dereference, so at least the segfault is correct ;)


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] SOS... help needed :(

2007-04-16 Thread Adrian Knoth
On Sun, Apr 15, 2007 at 10:25:06PM -0400, chaitali dherange wrote:

> Hi,

Hi!

> giving more priority to the MPI calls over the non MPI ones.

> static I mean.. we know that our clusters use Infiniband for MPI ...
> so all the non MPI communication can be assumed to be TCP
> communication using the 'mca_btl_tcp_send()' from the
> ompi/mca/btl/tcp/btl_tcp.c file.

I don't see why you call BTL/IB a MPI call, but BTL/TCP is non-MPI.

The BTL components are used to provide MPI data transport. Depending on
your installed hardware, this transport can be done via IB, Myrinet or
at least TCP. Open MPI is even able to mix multiple transports and do
message striping.

I suggest you read the comments in pml.h to make things clear. Don't get
confused, they still use the old terminology 'PTL' instead of 'BTL', but
just consider them to be equal.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] replace 'atoi' with 'strtol'

2007-04-18 Thread Adrian Knoth
On Wed, Apr 18, 2007 at 01:16:54PM -0400, George Bosilca wrote:

> That's right, long and int have the same size on Windows 32 and 64  
> bits (always 32 bits). However, they are considered as being  
> different types (!!!).

How about (u)int32_t? When I was an Ada programmer, subtypes with the
appropriate range were always encouraged (i.e.: define the semantic
range and let the compiler/runtime library warn you on range
violations (the well-known "CONSTRAINT_ERROR"))
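
C has no range-checked subtypes, but the check can be done by hand
when replacing atoi() with strtol(). A minimal sketch, where
parse_int32() is a hypothetical helper name:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal int32_t; returns 0 on success, -1 on garbage or overflow. */
int parse_int32(const char *text, int32_t *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || '\0' != *end) {
        return -1;                      /* no digits, or trailing junk */
    }
    if (ERANGE == errno) {
        return -1;                      /* does not even fit into a long */
    }
    if (value < INT32_MIN || value > INT32_MAX) {
        return -1;                      /* fits a long, but not an int32_t */
    }
    *out = (int32_t)value;
    return 0;
}
```

Unlike atoi(), this reports bad input instead of silently returning 0
or an undefined result, and the outcome does not depend on whether
long is 32 or 64 bits wide.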


Adr"int considered harmful"ian

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI devel] sockaddr* vs. sockaddr_storage*

2007-04-29 Thread Adrian Knoth
Hi, especially bosilca (George?)

r14544 broke the IPv6 support (see Ticket #1008). I've committed a quick
patch, but I guess we (George and me?) will have to look closer in order
to provide the desired functionality.

There's another question concerning r14544: why did you change
sockaddr_storage* to sockaddr* for some btl/tcp functions? Is there
something special about sockaddr*? For me, sockaddr_storage is a
complete replacement for sockaddr... we also have the necessary defines
in ipv6compat.h, so we could always make use of ss_family.
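
For context, a common convention is to keep an address in a
sockaddr_storage and hand a sockaddr* plus the matching length to
functions, since calls such as bind() and connect() take a
struct sockaddr * anyway. The helper below is only a sketch of that
convention; sockaddr_length() is a made-up name and not part of the
btl/tcp code.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Length to hand to bind()/connect() for the address behind `sa`, 0 if unknown. */
socklen_t sockaddr_length(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return (socklen_t)sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t)sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

A caller would store the address in a struct sockaddr_storage ss and
pass (struct sockaddr *)&ss together with the returned length.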


Cheerio

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] sockaddr* vs. sockaddr_storage*

2007-04-29 Thread Adrian Knoth
On Sun, Apr 29, 2007 at 10:18:01AM -0400, George Bosilca wrote:

> I have to ask you to remove r14549 quickly as it bring back the trunk  
> to the stage it was before r14544 (only random support for multiple  

I'll have a look how to accomplish both: IPv6 and a reverted r14549.

> BTL). It's not that I don't care about IPv6, it's just that I care  
> more about multi TCP BTL working in the way it is supposed to work.  

There'd be less trouble if we all had automatic testing, so nobody
breaks stuff somebody else relies on.

See, you have committed something that made my internal tests turn red:

   http://cluster.inf-ra.uni-jena.de:8010/

If only I had a URL indicating when *I* break something *you* rely on.


BTW: How does multi TCP BTL work? I see num_links, but I wonder if
kernel channel bonding would achieve the same results...

> PS: Please read the commit log for the r14544. It explain why I  
> changed from sockaddr_storage* to sockaddr*.

It doesn't:

   > Second, the IPv6 RFC suggest to use sockaddr_storage as a holder
   > for the IP information, but use a sockaddr* when we pass it to 
   > functions.

I don't understand the second part: "but use a sockaddr*". Why?


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] sockaddr* vs. sockaddr_storage*

2007-04-29 Thread Adrian Knoth
On Sun, Apr 29, 2007 at 06:07:03PM +0200, Adrian Knoth wrote:

> > I have to ask you to remove r14549 quickly as it bring back the trunk  
> > to the stage it was before r14544 (only random support for multiple  
> I'll have a look how to accomplish both: IPv6 and a reverted r14549.

Does r14550 satisfy your needs?


Cheerio

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] sockaddr* vs. sockaddr_storage*

2007-05-01 Thread Adrian Knoth
On Tue, May 01, 2007 at 07:39:07AM -0700, Jeff Squyres wrote:

> > (b) that
> > IPv6 was correctly operating...which were the two issues in this  
> > discussion.
> We currently do not have any IPv6 setup in our MPI testing equipment  

We automatically check every trunk commit against our IPv6 tests, so at
least someone from Jena would notice problems.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Add a bug fix to 1.2.x version

2007-05-02 Thread Adrian Knoth
On Wed, May 02, 2007 at 02:07:17PM +0300, Sharon Melamed wrote:

> Hi,

Hi!

> Change set 14463 - [1]https://svn.open-mpi.org/trac/ompi/changeset/14463.
> I would like to integrate this change to version 1.2.x.

I guess you're looking for

   https://svn.open-mpi.org/trac/ompi/wiki/SubmittingChangesetMoveReqs

HTH


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-24 Thread Adrian Knoth
On Tue, Jul 24, 2007 at 08:41:27AM -0600, Brian Barrett wrote:

> > man malloc tells me this:
> > "If size was equal to 0, either NULL or a pointer suitable to be  
> > passed to free()
> > is returned". So may be we should just return NULL and be done with  
> > it?
> 
> Which is also what POSIX says:
> 
>http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html
> 
> I vote with gleb -- return NULL, don't set errno, and be done with  

I'd like to second. Just if this is a poll ;)


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [Pkg-openmpi-maintainers] Bug#433142: openmpi: FTBFS on GNU/kFreeBSD

2007-07-24 Thread Adrian Knoth
On Sat, Jul 14, 2007 at 03:55:12PM -0500, Dirk Eddelbuettel wrote:

> | the current version fails to build on GNU/kFreeBSD.
> | 
> | It needs small fixups for munmap hackery and stacktrace.
> | It also needs to exclude linux specific build-depends.
> | Please find attached patch with that.
> 
> Thanks for that patch.

> | It would be nice if you can ask upstream
> | to include changes to opal/util/stacktrace.c and
> | opal/mca/memory/ptmalloc2/opal_ptmalloc2_munmap.c .

I've neither seen a ticket nor any discussion within the last days. Did
you get any response?

AFAIK, kFreeBSD isn't a major target for OMPI, but if these patches
don't break anything, I don't mind including them.


I'll give them a run inside our virtual testing environment, but I'd
feel better with additional feedback from MTT.


HTH


PS: https://svn.open-mpi.org/trac/ompi/ticket/1105

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [u...@hermann-uwe.de: [Pkg-openmpi-maintainers] Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-02 Thread Adrian Knoth
On Thu, Aug 02, 2007 at 02:31:30AM +, Dirk Eddelbuettel wrote:

> Dear Open MPI developers,

Hi!

> We (as in the Debian maintainer for Open MPI) got this bug report from
> Uwe who sees mpi apps segfault on Debian systems with the FreeBSD
> kernel.

> Any input would be greatly appreciated!

Uwe, could you please recompile with --enable-debug and rerun the test?
If possible, also provide a gdb backtrace, probably with details about
the failing frame.

If you're able to provide shell access to kFreeBSD, things would even be
easier. If not, I'll follow the QEMU instructions on your website and
investigate on my own ;)


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [u...@hermann-uwe.de: [Pkg-openmpi-maintainers] Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-13 Thread Adrian Knoth
On Thu, Aug 02, 2007 at 10:51:13AM +0200, Adrian Knoth wrote:

> > We (as in the Debian maintainer for Open MPI) got this bug report from
> > Uwe who sees mpi apps segfault on Debian systems with the FreeBSD
> > kernel.
> > Any input would be greatly appreciated!
> I'll follow the QEMU instructions on your website and investigate on
> my own ;)

I was able to get OMPI running on kfreebsd-amd64. I used a nightly
snapshot from the trunk, so the problem is "more or less fixed by
upstream" ;)

adi@debian:~$ ./ompi/bin/mpirun -np 2 ring
0: sending message (0) to 1
0: sent message
1: waiting for message
1: got message (1) from 0, sending to 0
0: got message (1) from 1

adi@debian:~$ ./ompi/bin/ompi_info 
Open MPI: 1.3a1r15820
   Open MPI SVN revision: r15820
Open RTE: 1.3a1r15820
   Open RTE SVN revision: r15820
OPAL: 1.3a1r15820
   OPAL SVN revision: r15820
  Prefix: /home/adi/ompi
 Configured architecture: x86_64-unknown-kfreebsd6.2-gnu


I'll now compile the 1.2.3 release tarball and see if I can reproduce
the segfaults. On the other hand, I guess nobody is using OMPI on
GNU/kFreeBSD, so upgrading the openmpi-package to a subversion snapshot
would also fix the problem (think of "fixed in experimental").


JFTR: It's currently not possible to compile OMPI on amd64 (out of the
box). Though it compiles on i386

   
http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-i386&stamp=1187000200&file=log&as=raw

it fails on amd64:

   
http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-amd64&stamp=1186969782&file=log&as=raw

stacktrace.c: In function 'opal_show_stackframe':
stacktrace.c:145: error: 'FPE_FLTDIV' undeclared (first use in this function)
stacktrace.c:145: error: (Each undeclared identifier is reported only once
stacktrace.c:145: error: for each function it appears in.)
stacktrace.c:146: error: 'FPE_FLTOVF' undeclared (first use in this function)
stacktrace.c:147: error: 'FPE_FLTUND' undeclared (first use in this function)
make[4]: *** [stacktrace.lo] Error 1
make[4]: Leaving directory `/build/buildd/openmpi-1.2.3/opal/util'


This is caused by libc0.1-dev in /usr/include/bits/sigcontext.h, the
relevant #define's are placed in an #ifdef __i386__ condition. After
extending this for __x86_64__, everything works fine.
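
Independent of the header fix, code that uses these constants could
guard each one, so a libc that lacks a definition still compiles. A
minimal sketch under that assumption follows. fpe_code_name is a
hypothetical helper and not the code from opal/util/stacktrace.c.

```c
#include <signal.h>

/* Hypothetical helper: map the si_code delivered with SIGFPE to a
 * printable name. Every case is wrapped in #ifdef, so a constant the
 * libc headers do not provide as a macro is simply skipped. */
static const char *fpe_code_name(int si_code)
{
    switch (si_code) {
#ifdef FPE_INTDIV
    case FPE_INTDIV: return "FPE_INTDIV";
#endif
#ifdef FPE_FLTDIV
    case FPE_FLTDIV: return "FPE_FLTDIV";
#endif
#ifdef FPE_FLTOVF
    case FPE_FLTOVF: return "FPE_FLTOVF";
#endif
#ifdef FPE_FLTUND
    case FPE_FLTUND: return "FPE_FLTUND";
#endif
    default:
        return "unknown";
    }
}
```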

Should I file a bugreport against libc0.1-dev or will you take care?


I'll keep you posted...

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [u...@hermann-uwe.de: [Pkg-openmpi-maintainers] Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-13 Thread Adrian Knoth
On Mon, Aug 13, 2007 at 04:26:31PM -0500, Dirk Eddelbuettel wrote:

> > I'll now compile the 1.2.3 release tarball and see if I can reproduce

The 1.2.3 release also works fine:

adi@debian:~$ ./ompi123/bin/mpirun -np 2 ring
0: sending message (0) to 1
0: sent message
1: waiting for message
1: got message (1) from 0, sending to 0
0: got message (1) from 1

adi@debian:~$ ./ompi123/bin/ompi_info 
Open MPI: 1.2.3
   Open MPI SVN revision: r15136
Open RTE: 1.2.3
   Open RTE SVN revision: r15136
OPAL: 1.2.3
   OPAL SVN revision: r15136
  Prefix: /home/adi/ompi123
 Configured architecture: x86_64-unknown-kfreebsd6.2-gnu

> > the segfaults. On the other hand, I guess nobody is using OMPI on
> > GNU/kFreeBSD, so upgrading the openmpi-package to a subversion snapshot
> > would also fix the problem (think of "fixed in experimental").
> Well, I generally prefer to follow upstream releases, and Jeff from the
> upstream team echoed that. Let's wait for 1.2.4, shall we?

That's fine, v1.2 is the production release.

> | JFTR: It's currently not possible to compile OMPI on amd64 (out of the
> | box). Though it compiles on i386
> | 
> |
> http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-i386&stamp=1187000200&file=log&as=raw
> | 
> | it fails on amd64:
> | 
> |
> http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-amd64&stamp=1186969782&file=log&as=raw
> | 
> | stacktrace.c: In function 'opal_show_stackframe':
> | stacktrace.c:145: error: 'FPE_FLTDIV' undeclared (first use in this
> | function)
> | stacktrace.c:145: error: (Each undeclared identifier is reported only
> | once
> | stacktrace.c:145: error: for each function it appears in.)
> | stacktrace.c:146: error: 'FPE_FLTOVF' undeclared (first use in this
> | function)
> | stacktrace.c:147: error: 'FPE_FLTUND' undeclared (first use in this
> | function)
> | make[4]: *** [stacktrace.lo] Error 1
> | make[4]: Leaving directory `/build/buildd/openmpi-1.2.3/opal/util'
> | 
> | 
> | This is caused by libc0.1-dev in /usr/include/bits/sigcontext.h, the
> | relevant #define's are placed in an #ifdef __i386__ condition. After
> | extending this for __x86_64__, everything works fine.
> | 
> | Should I file a bugreport against libc0.1-dev or will you take care?
> I'm confused. What is libc0.1-dev?


   http://packages.debian.org/unstable/libdevel/libc0.1-dev

It's the "libc6-dev" for GNU/kFreeBSD, at least that's how I understand
it.

> Also note that I happened to have uploaded a third Debian revision of 1.2.3
> yesterday, and that Debian release 1.2.3-3 built fine on amd as per:
> 
> http://buildd.debian.org/build.php?&pkg=openmpi&ver=1.2.3-3&arch=amd64&file=log
> 
> So are we sure there's a bug?

Yes, absolutely. I was a little bit imprecise: with amd64, I meant
kfreebsd-amd64, not Linux-amd64.

If you follow my two links and read their headlines, you can see that
these are the buildlogs of 1.2.3-3 on kfreebsd, working for i386, but
failing for amd64.

This is caused by "wrong" libc headers on kfreebsd, that's why I thought
Uwe might want to have a look at it.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [Pkg-openmpi-maintainers] Bug#435581: [u...@hermann-uwe.de: Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-17 Thread Adrian Knoth
On Fri, Aug 17, 2007 at 02:11:02AM +0200, Uwe Hermann wrote:

> > | The 1.2.3 release also works fine:
> I think Adrian used a tarball, not the Debian package?
> I'll try a local, manual install too, maybe the bug is Debian-related only?

I've tried both: the tarball works fine, the Debian package
segfaults. I suspect it's the threading support, so someone (Uwe?) could
try to remove it from debian/rules.

Ok, I'll check this for amd64, but it will take some time to compile in
the qemu ;)




-- 
mail: a...@thur.de  http://adi.thur.de  PGP/GPG: key via keyserver

Die Stosstange ist aller Laster Anfang.


Re: [OMPI devel] [Pkg-openmpi-maintainers] Bug#435581: [u...@hermann-uwe.de: Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-17 Thread Adrian Knoth
On Fri, Aug 17, 2007 at 09:25:05AM +0200, Adrian Knoth wrote:

> I've tried both: the tarball works fine, the Debian package
> segfaults. I suspect it's the threading support, so someone (Uwe?) could
> try to remove it from debian/rules.

Ok, --enable-progress-threads and --enable-mpi-threads cause the
segfaults. If you compile without, everything works.

I'll now try if it's mpi-threads or the progress-threads, and also check
the upcoming v1.2.4.


How does Debian feel about disabling threads on kFreeBSD? Are there
known issues with pthreads on kFreeBSD?

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [Pkg-openmpi-maintainers] Bug#435581: [u...@hermann-uwe.de: Bug#435581: openmpi-bin: Segfault on Debian GNU/kFreeBSD]

2007-08-17 Thread Adrian Knoth
On Fri, Aug 17, 2007 at 08:26:50AM -0400, Jeff Squyres wrote:

> > Ok, --enable-progress-threads and --enable-mpi-threads cause the
> > segfaults. If you compile without, everything works.
> 
> > I'll now try if it's mpi-threads or the progress-threads, and also  
> > check
> > the upcoming v1.2.4.
> The --enable-progress-threads and --enable-mpi-threads configure  
> options result in broken-ness on the v1.2 branch; you should not use  
> them.

That's why I wondered why Debian has enabled them.

Dirk: Do you mind removing them from debian/rules, thus fixing the
issue?



-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Small manual page patches from Debian package

2007-09-28 Thread Adrian Knoth
On Thu, Sep 27, 2007 at 09:18:39PM -0500, Dirk Eddelbuettel wrote:

> Dear Open MPI developers,

Hi!

> The Debian (source) package for Open MPI still carries a few tiny patches
> that we thought we had submitted to you, but then maybe we got that mixed up
> with some new manual pages I sent in on June 29.  In any event, the files are

Your changes were applied to the trunk (upcoming v1.3) and are still
present.

I'll file a Changeset Move Request to get them over to the v1.2 branch,
so they'll be shipped with ompi-1.2.5.

Thanks for the reminder.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16691

2007-11-08 Thread Adrian Knoth
On Thu, Nov 08, 2007 at 07:51:28AM -0500, Jeff Squyres wrote:

[r16691]
> Whoa; I'm not sure we want to apply this.

Me neither.

> All ROMIO patches *must* be coordinated with the ROMIO maintainers.   

Upstream? That's the upstream patch.

Jiri Polach has extracted the fix for this problem. Updating OMPI to a
newer ROMIO version should do the trick, so we might want to revert
r16693 and r16691.

You decide.

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16691

2007-11-08 Thread Adrian Knoth
On Thu, Nov 08, 2007 at 08:02:09AM -0500, Jeff Squyres wrote:

> >> All ROMIO patches *must* be coordinated with the ROMIO maintainers.
> > Upstream? That's the upstream patch.
> That was extracted from ROMIO itself?  Which release?

From Jiri:


The patch was extracted from a ROMIO sources that come with MPICH2 1.0.6.

As noted on the ROMIO web page (http://www-unix.mcs.anl.gov/romio/):

"Note: The version of ROMIO described on this page is an old one. We
haven't released newer versions of ROMIO as independent packages for a
while; they were included as part of MPICH2 and MPICH-1. You can get the
latest version of ROMIO when you download MPICH2 or MPICH-1."


--- end of Jiri ---

> > Jiri Polach has extracted the fix for this problem. Updating OMPI to a
> > newer ROMIO version should do the trick, so we might want to revert
> > r16693 and r16691.
> It would be great to upgrade to a newer version of ROMIO.  Do you have  
> the cycles to do it?

Let's see ;) If life is going to be boring, I'll have a look at it ;)

> If this is slated for v1.3, then I think it would be much better to  
> back out that patch and then do a real upgrade.

ACK.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI devel] IPv4 mapped IPv6 addresses

2007-12-14 Thread Adrian Knoth
Hi!

The current BTL/TCP and OOB/TCP code contains separate sockets for IPv4
and IPv6. Though it has never been a problem for me, this might cause an
out-of-FDs-error in large clusters. (IIRC, rhc has already pointed out
this issue)

A possible way to reduce FD consumption would be the use of IPv4 mapped
IPv6 addresses. These addresses let one use a single AF_INET6 socket for
both, IPv4 and IPv6.
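
For illustration, a minimal sketch of both halves: building the mapped
form ::ffff:a.b.c.d from an IPv4 address, and opening an AF_INET6
socket with IPV6_V6ONLY switched off. The helper names are hypothetical
and this is not the existing BTL/TCP or OOB/TCP code.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: build the IPv4-mapped IPv6 address
 * ::ffff:a.b.c.d from an IPv4 address in network byte order. */
static struct in6_addr map_v4_to_v6(struct in_addr v4)
{
    struct in6_addr v6;

    memset(&v6, 0, sizeof(v6));              /* bytes 0..9 stay zero */
    v6.s6_addr[10] = 0xff;
    v6.s6_addr[11] = 0xff;
    memcpy(&v6.s6_addr[12], &v4.s_addr, 4);  /* last 4 bytes: IPv4 */
    return v6;
}

/* Hypothetical helper: open one AF_INET6 stream socket that is meant to
 * serve IPv4 peers as well. Returns the descriptor, or -1 on failure
 * (for example when the system has no IPv6 support). */
static int open_dual_stack_socket(void)
{
    int sd = socket(AF_INET6, SOCK_STREAM, 0);

    if (sd < 0) {
        return -1;
    }
#ifdef IPV6_V6ONLY
    {
        int off = 0;
        /* 0 = also accept IPv4 via mapped addresses */
        if (setsockopt(sd, IPPROTO_IPV6, IPV6_V6ONLY,
                       &off, sizeof(off)) < 0) {
            close(sd);
            return -1;
        }
    }
#endif
    return sd;
}
```

On a system that refuses to clear IPV6_V6ONLY, open_dual_stack_socket
returns -1 in this sketch, so the caller can fall back to separate
sockets.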

One year ago, I chose not to employ these addresses, mainly for two
reasons:

   - Windows XP doesn't support them
   - OpenBSD has disabled them, but the system administrator can enable
     them at runtime

These limitations are also mentioned here:

   http://en.wikipedia.org/wiki/IPv4_mapped_address#Limitations

Nowadays, Vista (and the Windows Server line) has support for
IPv4-mapped IPv6 addresses.

If disabled on OpenBSD systems, the code wouldn't be able to do IPv4,
but as already mentioned, the admin could easily fix this.

Should we consider moving towards these mapped addresses? The
implications:

   - less code, only one socket to handle
   - better FD consumption
   - breaks WinXP support, but not Vista/Longhorn or later
   - requires non-default kernel runtime setting on OpenBSD for IPv4
     connections

FWIW, FD consumption is the only real issue to consider.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Minor patch for !IPV6_V6ONLY

2008-01-01 Thread Adrian Knoth
On Mon, Dec 31, 2007 at 08:05:38PM -0800, Paul H. Hargrove wrote:

> I just tried today to build the OMPI trunk on an old RH8 box and found 
> that for
>   OPAL_WANT_IPV6 && !defined(IPV6_V6ONLY)
> the file oob_tcp.c fails to compile due to unbalanced braces.
> 
> Swapping an #endif with a closing brace (patch below) fixed the problem 
> for me.

Thanks for the patch, you were absolutely right. Fixed in r17028.



-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] btl tcp port to xensocket

2008-01-09 Thread Adrian Knoth
On Tue, Jan 08, 2008 at 10:51:45PM -0800, Muhammad Atif wrote:

> I am planning to port tcp component to xensocket, which is a fast
> interdomain communication mechanism for guest domains in Xen. I may

Just to get things right: You first partition your SMP/Multicore system
with Xen, and then want to re-combine it later for MPI communication?

Wouldn't it be easier to leave the unpartitioned host alone and use
shared memory communication instead?

> As per design, and the fact that these sockets are not normal sockets,
> I have to pass certain information (basically memory references, guest
> domain info etc) to other peers once sockets have been created. I

There's ORTE, the runtime environment. It employs OOB/tcp to have a so
called out-of-band channel. ORTE also provides a general purpose
registry (GPR).

Once a TCP connection between the headnode process and all other peers
is established, you can store your required information in the GPR.

> understand that mca_pml_base_modex_send and recv (or simply using
> mca_btl_tcp_component_exchange) can be used to exchange information,

Use mca_pml_base_modex_send (now ompi_modex_send) and encode your
required information. It's getting stored in the GPR. Read it back with
mca_pml_base_modex_recv (ompi_modex_recv), as it is done in
mca_btl_tcp_component_exchange and mca_btl_tcp_proc_create.

> but I cannot seem to get them to communicate. So to put my question in
> a very simple way. I want to create a socket structure containing
> necessary information, and then pass it to all other peers before
> start of actual mpi communication. What is the easiest way to do it.


Quite the same way. mca_btl_tcp_component_exchange assembles the
required information and stores it in the GPR by calling
ompi_modex_send.

mca_btl_tcp_proc_create (think of "the other peers") reads this
information into local context.


I guess you might want to copy btl/tcp to let's say btl/xen, so you can
modify internal structures, if required. Perhaps xensockets don't need
IP addresses, as they are actually memory sockets.

However, you'll still need TCP communication between Xen guests for the
OOB channel.


As mentioned above, I'm not sure if it's reasonable to use Xen and MPI
at all. Virtualization overhead might decrease your performance, and
that's usually the last thing you want to have when using MPI ;)


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] btl tcp port to xensocket

2008-01-17 Thread Adrian Knoth
On Tue, Jan 15, 2008 at 04:07:02PM -0800, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now if
> i want to do modex send/recv , to my understanding, mca_btl_tcp_addr_t
> is used (ref code/function is mca_btl_tcp_component_exchange). For
> xensockets, I need to send only one additional integer remote_domU_id
> across to say all the peers (in refined code it would be specific to
> each domain, but i just want to have clear understanding before i move
> any further). Now I have changed the struct mca_btl_tcp_addr_t present
> in btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> structure 12. Upon receive mca_btl_tcp_proc_create() gives an error
> after mca_pml_base_modex_recv() and at this statement if(0 != (size %
> sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still
> expecting size 8, where as i have made the size 12.  I am unable to
> pin point the exact location where the size 8 is still embedded. Any
> ideas?

Just an idea: the size check after the modex receive gives you this error:

   BTL_ERROR(("mca_base_modex_recv: invalid size %d: btl-size: %d\n",
              size, sizeof(mca_btl_tcp_addr_t)));


So what is wrong? Is btl-size shown as 12 or as 8? It should be 12. And
is size just 8? So this means you forgot to include your new socket in
your modex_send_request.

See mca_btl_tcp_component_exchange: We copy the information to be sent
into the addrs array and increase xfer_size afterwards (telling the
function how many bytes to be transferred).

Perhaps you missed something there.
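
A minimal, self-contained sketch of how the two sides have to agree.
demo_addr_t is a hypothetical 12-byte stand-in for the extended address
struct, and pack_addrs / recv_size_ok only mimic the roles of the
exchange and the receive-side check. None of this is the real btl/tcp
code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the extended address struct (12 bytes). */
typedef struct {
    uint32_t addr;      /* IPv4 address */
    uint16_t port;      /* listening port */
    uint16_t inuse;     /* bookkeeping field */
    int32_t  domu_id;   /* the newly added integer */
} demo_addr_t;

/* Receiver side: same kind of modulo test as in the check quoted
 * above. Returns 1 if 'size' is a whole number of entries. */
static int recv_size_ok(size_t size)
{
    return 0 == (size % sizeof(demo_addr_t));
}

/* Sender side: copy 'count' entries into 'buf' and return the number
 * of bytes to transfer, or 0 if the buffer is too small. The byte
 * count is derived from the struct, so it grows with every new field. */
static size_t pack_addrs(const demo_addr_t *addrs, size_t count,
                         void *buf, size_t buf_len)
{
    size_t xfer_size = count * sizeof(demo_addr_t);

    if (xfer_size > buf_len) {
        return 0;
    }
    memcpy(buf, addrs, xfer_size);
    return xfer_size;
}
```

If the sender still announces 8 bytes per entry while the receiver
already divides by 12, recv_size_ok(8) fails in exactly the way
described.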


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Trunk borked

2008-01-28 Thread Adrian Knoth
On Mon, Jan 28, 2008 at 07:26:56AM -0700, Ralph H Castain wrote:

> We seem to have a problem on the trunk this morning. I am building on a

There are more errors:

/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function
`fsetpos':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request
for member `__pos' in something not a structure or union
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function
`fsetpos64':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:876: error: request
for member `__pos' in something not a structure or union
gmake[5]: *** [vt_iowrap.o] Error 1
gmake[5]: Leaving directory
`/tmp/ompi/build/SunOS-i86pc/ompi/ompi/contrib/vt/vt/vtlib'
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function
`fsetpos':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request
for member `__pos' in something not a structure or union
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function
`fsetpos64':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:876: error: request
for member `__pos' in something not a structure or union
gmake[5]: *** [vt_iowrap.o] Error 1
gmake[5]: Leaving directory
`/tmp/ompi/build/SunOS-i86pc/ompi/ompi/contrib/vt/vt/vtlib'


Just my $0.02

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-30 Thread Adrian Knoth
On Tue, Jan 29, 2008 at 07:37:42PM -0500, George Bosilca wrote:

> The previous code was correct. Each IP address correspond to a  
> specific endpoint, and therefore to a specific BTL. This enable us to  
> have multiple TCP BTL at the same time, and allow the OB1 PML to  
> stripe the data over all of them.
> 
> Unfortunately, your commit disable the multi-rail over TCP. Please  
> undo it.

That's exactly what I had in mind when I said "this might break
functionality".

So we need as many endpoints as IP addresses? Then, simply connecting
them leads to oversubscription: two parallel connections on the same
media. That's where the kernel index enters the scene: we'll have to
make sure not to open two parallel connections to the same remote kernel
index.
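
As a rough sketch of that rule (not the code I'm going to commit): keep
at most one address per remote kernel index. The struct and function
names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical record: one exported address of the remote peer. */
typedef struct {
    const char *addr;   /* textual address, for illustration only */
    int kernel_index;   /* remote kernel interface index */
} demo_remote_addr_t;

/* Pick at most one address per remote kernel index. The positions of
 * the chosen entries are written to 'chosen' (capacity: 'count'); the
 * return value is the number of entries picked. The scan is quadratic,
 * which is fine for the handful of addresses a node exports. */
static size_t pick_one_per_index(const demo_remote_addr_t *addrs,
                                 size_t count, size_t *chosen)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        int seen = 0;

        for (size_t j = 0; j < n; j++) {
            if (addrs[chosen[j]].kernel_index == addrs[i].kernel_index) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            chosen[n++] = i;
        }
    }
    return n;
}
```

With two addresses on index 2 and one on index 3, only the first
address of index 2 and the one of index 3 would be used for outgoing
connections.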

I'll revert the patch and come up with another solution, but for the
moment, let me point out that the assumption "One interface, one
address" isn't true. So, the previous code was also wrong.


I hope not to run into model limitations: avoiding oversubscription
means keeping the number of endpoints per peer lower than the number of
its interfaces, but accepting incoming connections from this peer means
having all its addresses (probably more than #remote_NICs) available in
order to accept them.

As mentioned earlier: it's very common to have multiple addresses per
interface, and it's the kernel who assigns the source address, so
there's nothing one could say about an incoming connection. Only that it
could be any of all exported addresses. Any.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-30 Thread Adrian Knoth
On Wed, Jan 30, 2008 at 09:20:45AM -0500, Tim Mattox wrote:

> > As mentioned earlier: it's very common to have multiple addresses per
> > interface, and it's the kernel who assigns the source address, so
> > there's nothing one could say about an incoming connection. Only that it
> > could be any of all exported addresses. Any.
> This is only partially correct.  Yes, by default the Linux kernel will
   ^^^
> fill in the IP header with any of the IP addresses associated with

I just reverted the patch and will look for a fix. As Jeff always says:
Let OMPI work out of the box.

So I cannot rely on special /proc settings.


Anyway, thanks for the pointer.

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-30 Thread Adrian Knoth
On Wed, Jan 30, 2008 at 12:05:50PM -0500, George Bosilca wrote:

> What is the real issue behind this whole discussion?

Hanging connections. See

   https://svn.open-mpi.org/trac/ompi/ticket/1206

The multi-address peer tries to connect, but btl_tcp_proc_accept denies
due to non-matching addresses. (fewer btl_endpoints than possible source
addresses)
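
What the accept side would need instead, as a minimal IPv4-only sketch:
compare the source address against every address the peer exported, not
only against the ones that got an endpoint. peer_exported_addr is a
hypothetical helper and not btl_tcp_proc_accept itself.

```c
#include <stddef.h>
#include <netinet/in.h>

/* Hypothetical check: return 1 if the source address of an incoming
 * connection is among the addresses the peer exported, 0 otherwise.
 * All addresses are in network byte order. */
static int peer_exported_addr(const struct in_addr *exported,
                              size_t count, struct in_addr src)
{
    for (size_t i = 0; i < count; i++) {
        if (exported[i].s_addr == src.s_addr) {
            return 1;
        }
    }
    return 0;
}
```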

r17331 and r17332 haven't fixed the issue. Don't code when leaving the
office ;) I'll have a look at it tomorrow.

Sorry for all the noise in the trunk.

> multiple IP addresses by interface the connection step will work. Now  
> I can see a benefit of having multiple socket over the same link (and  
> it's already implemented in Open MPI), but I don't see the interest of  
> using multiple IP in this case.

I have an easy to reproduce testcase for #1206. If you like, we can step
through the debugger in a shared screen (screen -x) or VNC session.

Just mail me if you're interested. ;)



-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-30 Thread Adrian Knoth
On Wed, Jan 30, 2008 at 03:38:00PM +0100, Bogdan Costescu wrote:

> The results is that, with the default Linux kernel settings, there is 
> no way to tell which way a connection will take in a multi-rail TCP/IP 
> setup. Even more, when the ARP cache expires and a new ARP request is 
> made, the answer (MAC address) from the target/destination could be 
> different, so that from that moment on the connection could switch to 
> a different media. I've tested this recently with the RHEL5 kernels 
> with one gigabit and one Myri-10G connection, seeing a TCP stream 
> switching randomly between the gigabit and the Myri-10G connection.

That's weird. I've never seen this, but according to the various ARP
settings in the Linux kernel, I could imagine such a scenario.

IPv6 doesn't use ARP, but neighbourhood discovery. It's completely
different, and I hope it behaves "link local". It's a whole protocol
("ICMPv6"), so things might be better.


JFTR: http://www-uxsup.csx.cam.ac.uk/courses/ipv6_basics/x84.html

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-31 Thread Adrian Knoth
On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:

> > What is the real issue behind this whole discussion?
> Hanging connections.
> I'll have a look at it tomorrow.

To everybody who's interested in BTL-TCP, especially George and (to a
minor degree) rhc:

I've integrated something I call "magic address selection code".
See the comments in r17348.

Can you check

   https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp

if it's working for you? Read: multi-rail TCP, FNN, whatever is
important to you?


The code is proof of concept and could use a little tuning (if it's
working at all. Over here, it satisfies all tests).

I vaguely remember that at least Ralph doesn't like

   int a[perm_size * sizeof(int)];

where perm_size is dynamically evaluated (read: array size is runtime
dependent)
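
A possible replacement, sketched under the assumption that only
perm_size ints are needed (as declared above, the array holds
perm_size * sizeof(int) elements): allocate on the heap and free
afterwards. alloc_permutation is a hypothetical helper, not what is in
the branch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an identity permutation of perm_size
 * ints on the heap instead of using a runtime-sized array. Returns
 * NULL if perm_size is 0 or the allocation fails; the caller frees. */
static int *alloc_permutation(size_t perm_size)
{
    int *a;

    if (0 == perm_size) {
        return NULL;
    }
    a = malloc(perm_size * sizeof(*a));
    if (NULL == a) {
        return NULL;
    }
    for (size_t i = 0; i < perm_size; i++) {
        a[i] = (int) i;
    }
    return a;
}
```

This also gives a clean error path when the allocation fails, which a
runtime-sized array on the stack cannot offer.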

There are also some large arrays, search for MAX_KERNEL_INTERFACE_INDEX.
Perhaps it's better to replace them with an appropriate OMPI data
structure. I don't know what fits best, you guys know the details...


So please give the code a try, and if it's working, feel free to clean up
whatever is necessary to make it the OMPI style or give me some pointers
what to change.


I'd like to point to Thomas' diploma thesis. The PDF explains the theory
behind the code, it's like a rationale. Unfortunately, the PDF has some
typos, but I guess you'll get the idea. It's a graph matching algorithm,
Chapter 3 covers everything in detail:

 http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)

2008-02-12 Thread Adrian Knoth
On Fri, Feb 01, 2008 at 11:40:20AM -0500, Tim Prins wrote:

> Adrian,

Hi!

Sorry for the late reply and thanks for your testing.

> 1. There are some warnings when compiling:

I've fixed these issues.

> 2. If I exclude all my tcp interfaces, the connection fails properly, 
> but I do get a malloc request for 0 bytes:
> tprins@odin examples]$ mpirun -mca btl tcp,self  -mca btl_tcp_if_exclude 
> eth0,ib0,lo -np 2 ./ring_c
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
> 

Not my fault, but I guess we could fix it anyway. Should we?

> 3. If the exclude list does not contain 'lo', or the include list 
> contains 'lo', the job hangs when using multiple nodes:

That's weird. Loopback interfaces should automatically be excluded right
from the beginning. See opal/util/if.c.

I neither know nor have checked where things go wrong. Do you want to
investigate? As already mentioned, this should not happen.

Can you post the output of "ip a s" or "ifconfig -a"?

> However, the great news about this patch is that it appears to fix 
> https://svn.open-mpi.org/trac/ompi/ticket/1027 for me.

It also fixes my #1206. I'd like to merge tmp-public/btl-tcp into the
trunk, especially before the 1.3 code freeze. Any objections?


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)

2008-02-22 Thread Adrian Knoth
On Fri, Feb 15, 2008 at 09:02:10AM -0500, Tim Prins wrote:

> >> 3. If the exclude list does not contain 'lo', or the include list 
> >> contains 'lo', the job hangs when using multiple nodes:
> > That's weird. Loopback interfaces should automatically be excluded right
> > from the beginning. See opal/util/if.c.
> I took a quick glance at this file, and I'd be lying if I said I 
> understood what was going on in it. One thing I did notice is that the 
> parameter btl_tcp_if_exclude defaults to 'lo', but the user can of 
> course overwrite it.

I was wrong. To be more precise, there are conflicting comments in if.c:

#if 0
    if ((ifr->ifr_flags & IFF_LOOPBACK) != 0)
        continue;
#endif


And:

/* skip interface if it is a loopback device (IFF_LOOPBACK set) */
/* or if it is a point-to-point interface */
/* TODO: do we really skip p2p? */
if (0 != (cur_ifaddrs->ifa_flags & IFF_LOOPBACK)
    || 0 != (cur_ifaddrs->ifa_flags & IFF_POINTOPOINT)) {
    continue;
}

and:

if ( (! IN6_IS_ADDR_LOOPBACK (&my_addr->sin6_addr)) &&
     (! IN6_IS_ADDR_LINKLOCAL (&my_addr->sin6_addr))) {
    /* create interface for newly found address */


and:

/* generate the interface name on your own 
   loopback: lo
   Rest:eth0, eth1, . */

if (if_list[i].iiFlags & IFF_LOOPBACK) {
    sprintf (intf.if_name, "lo");
} else {
    sprintf (intf.if_name, "eth%u", interface_counter++);
}


To be honest: When porting to IPv6, I've excluded lo, because I see no
use in using it.

That is what the code reflects: 127.0.0.1 is included (IPv4-lo), but ::1
is excluded (IPv6-lo).
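
To make the asymmetry concrete, here is a minimal sketch of a check that
treats both address families alike. The helper addr_is_loopback is
hypothetical and not part of opal/util/if.c; it assumes "loopback" means
127.0.0.0/8 for IPv4 and ::1 for IPv6.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from opal/util/if.c: true if sa holds
 * an IPv4 address in 127.0.0.0/8 or the IPv6 loopback address ::1. */
bool addr_is_loopback(const struct sockaddr *sa)
{
    if (AF_INET == sa->sa_family) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        uint32_t host = ntohl(in4->sin_addr.s_addr);
        return (host >> 24) == 127;     /* first octet is 127 */
    }
    if (AF_INET6 == sa->sa_family) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr) != 0;
    }
    return false;                       /* unknown address family */
}
```

A single predicate like this would let the IPv4 and IPv6 paths apply the
same policy, whichever policy is chosen.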


> It might be worth looking into this further. If the user got an error or 
> the job aborted if they did something wrong with 'lo' I would not worry 
> about it at all. But the fact that it causes a hang is worrisome to me.

It could be treated as the user's fault.

I see three approaches:

   a) Remove lo globally (in if.c). I expect objections. ;)

   b) Print a warning from BTL/TCP if the interfaces in use contain lo,
      like "Warning: You've included the loopback for communication.
      This may cause hanging processes due to unreachable peers."

   c) Throw away 127.0.0.1 on the remote side. But when doing so, what's
      the use of including it at all?


So as mentioned earlier: It could be the user's fault. ;) If he includes
lo, this means he wants to announce 127.0.0.1 to remote peers. And this
sounds useless (unless you want local communication without SM).




-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI devel] Logo as a vector graphic

2008-03-13 Thread Adrian Knoth
Hi!

Next week, I'll give a talk at ICNS'08:

http://www.iaria.org/conferences2008/ICNS08.html

(Is anybody around?)

I'd like to show the Open MPI logo on one of my slides, but I cannot
find a vectorized version (svg, eps, whatever) or at least a high-res
bitmap.


Does such a file exist, and if so, where can I download it?


TIA

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Logo as a vector graphic

2008-03-13 Thread Adrian Knoth
On Thu, Mar 13, 2008 at 08:07:18AM -0500, Jeff Squyres wrote:

> Try this one. 

Thanks, that's beautiful. I'll send you the slides once they are ready,
the logo really fits well ;)

> We usually snip off the words at the bottom.

I also did so. How do you crop the image? I used pdfcrop which is part
of the tetex distribution, but I guess there are better PS editors out
for Linux/Unix. I didn't find one, pdfcrop was fine, but JFTR...


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Logo as a vector graphic

2008-03-13 Thread Adrian Knoth
On Thu, Mar 13, 2008 at 06:06:12PM +0100, Andreas Schäfer wrote:

> > Heh.  I usually use the png or jpg version and just crop there.  :-)
> As this seems to be of public interest, please find attached a vector
> version of the logo without text. (-8

Now things are getting difficult... why is my version (see attachment)
so much smaller? ;) SCNR

adi@drcomp:/home/adi$ pdfinfo /tmp/openmpi_final.pdf
Title:  Untitled-1
Creator:Adobe Illustrator CS2
Producer:   Adobe PDF library 7.77
CreationDate:   Thu Mar 13 18:04:12 2008
ModDate:Thu Mar 13 18:04:12 2008
Tagged: no
Pages:  1
Encrypted:  no
Page size:  141.732 x 141.732 pts
File size:  193508 bytes
Optimized:  no
PDF version:1.4

adi@drcomp:/home/adi$ pdfinfo /tmp/ompilogo.pdf 
Creator:TeX
Producer:   pdfeTeX-1.21a
CreationDate:   Thu Mar 13 14:28:39 2008
Tagged: no
Pages:  1
Encrypted:  no
Page size:  125 x 127 pts
File size:  10068 bytes
Optimized:  no
PDF version:1.4


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


ompilogo.pdf
Description: Adobe PDF document


Re: [OMPI devel] Logo as a vector graphic

2008-03-29 Thread Adrian Knoth
On Thu, Mar 13, 2008 at 02:35:41PM +0100, Adrian Knoth wrote:

> > We usually snip off the words at the bottom.
> I also did so. How do you crop the image? I used pdfcrop which is part
> of the tetex distribution, but I guess there are better PS editors out
> for Linux/Unix. I didn't find one, pdfcrop was fine, but JFTR...

inkscape 0.46 and later supports editing PDFs. JFTR ;)


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] --disable-ipv6 broken on trunk

2008-04-02 Thread Adrian Knoth
On Wed, Apr 02, 2008 at 06:36:02AM -0400, Josh Hursey wrote:

> It seems that builds configured with '--disable-ipv6' are broken on  
> the trunk. I suspect r18055 for this break since the tarball from two  
> ---
> oob_tcp.c: In function `mca_oob_tcp_fini':
> oob_tcp.c:1364: error: structure has no member named `tcp6_listen_sd'
> oob_tcp.c:1365: error: structure has no member named `tcp6_recv_event'
> ---
> Can someone take a look at this?

Fixed in r18071. Thanks for the report.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI devel] Change in btl/tcp

2008-04-16 Thread Adrian Knoth
Hi!

As of r18169, I've changed the acceptance rules for incoming BTL-TCP
connections.

The old code would have denied a connection in case of non-matching
addresses (comparison between source address and expected source
address).

Unfortunately, you cannot always say which source address an incoming
packet will have (it's the sender's kernel that decides), so rejecting a
connection due to a "wrong" source address caused a complete hang.

I had several cases, mostly multi-cluster setups, where this happened
all the time. (Typical scenario: you're expecting the headnode's
internal address, but since you're talking to another cluster,
the kernel uses the headnode's external address.)

Though I've tested it as much as possible, I don't know if it breaks
your setup, especially the multi-rail stuff. George?


Cheerio

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Change in btl/tcp

2008-04-18 Thread Adrian Knoth
On Fri, Apr 18, 2008 at 08:04:17AM -0400, Tim Prins wrote:

> Hi Adrian,

Hi!

> After this change, I am getting a lot of errors of the form:
> [sif2][[12854,1],9][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by
> peer (104)
> 
> See for instance: http://www.open-mpi.org/mtt/index.php?do_redir=615

That's weird. I've tried hello_c.c on about ten machines with different
network configurations, none of them showed any problems at all.

Do you have a very special setup? And if need be, would it be possible
to debug on your machine?


Of all MTT sites, this error only occurs on Odin and Sif. What's so
special about these clusters?

> I have found this especially easy to reproduce if I run 16 processes all 
> with just the tcp and self btls on the same machine, running the 
> 'hello_c' program in the examples directory.

Unfortunately, I can't reproduce it that way. If this is related to the
change, then it would mean that mca_btl_tcp_proc_accept() returns false,
either after the large loop or in mca_btl_tcp_endpoint_accept().

Do you have the cycles to add some BTL_VERBOSE-lines to see where things
go wrong? Or even to step through with the debugger?

If you want me to do it, I could provide you with my ssh key.


Cheerio


-- 
mail: a...@thur.de  http://adi.thur.de  PGP/GPG: key via keyserver

Dying is only half as bad if you smoke KIM.


Re: [OMPI devel] Change in btl/tcp

2008-04-18 Thread Adrian Knoth
On Fri, Apr 18, 2008 at 01:00:40PM -0400, Josh Hursey wrote:

> The trick is to force Open MPI to use only tcp,self and nothing else.  
> Did you try adding this (-mca btl tcp,self) to the runtime parameter  
> set?

Sure. Even with 64 processes, I cannot trigger this behaviour. Neither
on Linux nor Solaris.

Any special compile flags?

I guess a little bit more debug output could probably reveal the
culprit.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Change in btl/tcp

2008-04-21 Thread Adrian Knoth
On Mon, Apr 21, 2008 at 09:04:28AM -0400, Josh Hursey wrote:

> Adrian,

Hi!

> Has there been any progress on this bug? If you still cannot reproduce  
> it, if you send either Tim Prins or I a debugging patch we can run  
> with it. Or we can try to arrange access to one of our machines for you.

A login would probably be the easiest.

> This bug is making it difficult for us to continue working off of the  
> trunk since we get these connection errors so frequently.

I propose the following: We (either you or me) immediately revert
r18169, thus enabling you to work on the trunk again.

Afterwards, I'll try to find a solution to the problem and commit it
when it passes all tests on your and my clusters. I'd really appreciate
a login to one or two machines at odin/sif to make sure things will be
right this time.

If you like, here's my ssh key:

   http://cluster.inf-ra.uni-jena.de/~adi/id_dsa.pub

Just add it to the authorized_keys file, so no password needs to be
exchanged.


TIA.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] multiple GigE interfaces...

2008-06-23 Thread Adrian Knoth
On Wed, Jun 18, 2008 at 05:13:28PM -0700, Muhammad Atif wrote:

>  Hi again... I was on a break from Xensocket stuff This time some
>  general questions...

Hi.

> question. What if I have multiple Ethernet cards (say 5) on two of my
> quad core machines.  The IP addresses (and the subnets of course) are 
>          Machine A   Machine B
> eth0 is  y.y.1.a     y.y.1.z
> eth1 is  y.y.4.b     y.y.4.y
> eth2 is  y.y.4.c     ...
> eth3 is  y.y.4.d     ...
> 
>  ...

This sounds pretty weird. And I guess your netmasks don't allow
separating the NICs, do they?

> from the FAQ's/Some emails in user lists  it is clear that if I want
> to run a job on multiple ethernets, I can use --mca btl_tcp_if_include
> eth0,eth1. This

You can, but you don't have to. If you don't specify something, OMPI
will choose "something right".

> will run the job on two of the subnets utilizing both the Ethernet
> cards. Is it doing some sort of load balancing? or some round robin
> mechanism? What part of code is responsible for this work?

As far as I know, it's handled by OB1 (PML), which does striping across
several BTL instances.

So in other words, as long as both segments are equally fast, the load
balancing should do fine. If they differ in performance, OB1 doesn't
find an optimal solution. If you're hitting this case, ask htor; he has
an auto-tuning replacement, but that's not going to be part of OMPI.
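
To illustrate why link speed matters for striping, here is a minimal
sketch that splits a message across links in proportion to a per-link
weight. It is purely illustrative: stripe_by_weight and its weights are
hypothetical, and this is not how OB1 actually schedules fragments.

```c
#include <stddef.h>

/* Hypothetical sketch, not OB1 code: split len bytes across n links in
 * proportion to their bandwidth weights. chunk[i] receives the byte
 * count for link i; the rounding remainder goes to the last link.
 * Returns 0 on success, -1 if n is 0 or all weights are 0. */
int stripe_by_weight(size_t len, const unsigned *weight, size_t n,
                     size_t *chunk)
{
    unsigned long long total = 0;
    size_t assigned = 0;

    for (size_t i = 0; i < n; ++i) {
        total += weight[i];
    }
    if (0 == n || 0 == total) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        chunk[i] = (size_t)((unsigned long long)len * weight[i] / total);
        assigned += chunk[i];
    }
    chunk[n - 1] += len - assigned;     /* hand out the rounding leftover */
    return 0;
}
```

With equal weights this degenerates to an even split, which is only a
good choice when the links really are equally fast.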

> eth1,eth2,eth3,eth4. Notice that all of these ethNs are on same subnet.
> Even in the FAQ's (which mostly answers our lame questions)  its not
> entirely clear how communication will be done.  Each process will have
> tcp_num_btls equal to interfaces, but then what? Is it some sort of
> load balancing or similar stuff which is not clear in tcpdump?

I feel you could end up with communication stalls, the typical hang
situation. One problem that might occur: the TCP component looks for
remote addresses on the "same" network, so the component might be unable
to decide whether your IP is on the same physical network or uses
the wrong link. Then, you won't gain anything.

Another problem: at least the Linux kernel (without tweaking) decides
which interface and address to use for outgoing communication. If you
have multiple subnets, then the kernel would go for the closest match
between local and remote addresses, but in your case, it might be some
kind of lottery.
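
If you want to see which source address the kernel would pick for a
given destination, a connected UDP socket is a cheap way to ask. This is
a minimal IPv4-only sketch; the helper name kernel_source_for is
hypothetical, and the port number is arbitrary because no packet is
sent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: asks the kernel which local IPv4 address it would
 * use to reach dst_ip. connect() on a UDP socket sends nothing; it only
 * makes the kernel choose a route and a source address, which
 * getsockname() then reports. Returns 0 and fills out on success,
 * -1 on error. */
int kernel_source_for(const char *dst_ip, char *out, socklen_t outlen)
{
    struct sockaddr_in dst, src;
    socklen_t srclen = sizeof src;
    int rc = -1;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);            /* arbitrary; nothing is sent */
    if (1 != inet_pton(AF_INET, dst_ip, &dst.sin_addr)) {
        return -1;                      /* not a valid IPv4 literal */
    }

    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) {
        return -1;
    }
    if (0 == connect(sd, (struct sockaddr *)&dst, sizeof dst) &&
        0 == getsockname(sd, (struct sockaddr *)&src, &srclen) &&
        NULL != inet_ntop(AF_INET, &src.sin_addr, out, outlen)) {
        rc = 0;
    }
    close(sd);
    return rc;
}
```

Calling it with the peer's address on each node shows whether the
kernel's choice matches the address the peer expects to see.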


> related question is what if I want to run 8 process job (on 2x4
> cluster) and want to pin a process to an network interface. OpenMPI to
> my understanding does not give any control of allocating IP to a
> process (like MPICH)

You could just set btl_tcp_if_include=ethX, thus giving you the right
network interface. Obviously, this requires separate networks.


> or is there some magical --mca thingie. I think only way to go is
> adding routing tables... am i thinking in right direction? If yes, then
> the performance of my boxes decrease when i trying to force the routing

Routing should be fast, since it's done at kernel level. I cannot speak
for Xen-based virtual interfaces.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Funny warning message

2008-07-28 Thread Adrian Knoth
On Mon, Jul 28, 2008 at 05:14:29PM +0300, Lenny Verkhovsky wrote:

> -advisable to configure rd_win smaller then (rd_num - rd_low), but currently
> +advisable to configure rd_win bigger then (rd_num - rd_low), but currently
                                          ^ a


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] TCP BTL routability (was: ticket #972)

2008-07-29 Thread Adrian Knoth
On Tue, Jul 29, 2008 at 03:25:00PM -0400, Jeff Squyres wrote:

> For reference, the FAQ entry is here:
> 
> http://www.open-mpi.org/faq/?category=tcp#tcp-routability
> 
> It looks like we now *always* assume that two TCP peers are routable.   

As long as they share the same address family (IPv4 or IPv6).

> The code in question is in btl_tcp_proc.c with the loop starting at  
> line 413.

Yes. The FAQ is outdated, the new code is very different.

We now use graph theory: imagine a bipartite graph where each interface
is a vertex (one peer on the left, the other on the right, no
connections inside each peer, only from left to right, hence a bipartite
graph).

Every edge in this graph is given a weight depending on its quality. The
quality is "defined" in btl_tcp_proc.h:

enum mca_btl_tcp_connection_quality { 
CQ_NO_CONNECTION,
CQ_PRIVATE_DIFFERENT_NETWORK,
CQ_PRIVATE_SAME_NETWORK,
CQ_PUBLIC_DIFFERENT_NETWORK,
CQ_PUBLIC_SAME_NETWORK
};

CQ_NO_CONNECTION (weight 0) is for different address families, so we
don't try to connect from IPv6 to IPv4 and vice versa. The more likely a
connection is to be established, the higher the weight. So public
addresses on the same network (read: very close, probably sharing the
same link) are the best one can get; private addresses on different
networks have the lowest probability of a successful connection.
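
As a rough illustration of how such a weight could be assigned, here is
a minimal IPv4-only sketch. The enum is the one quoted above; everything
else is an assumption: the helper names are hypothetical, "private" is
taken to mean the RFC 1918 ranges, an edge counts as private if either
end is private, and "same network" is decided with a single prefix
length. The real code in btl_tcp_proc.c may decide differently.

```c
#include <stdbool.h>
#include <stdint.h>

enum mca_btl_tcp_connection_quality {
    CQ_NO_CONNECTION,
    CQ_PRIVATE_DIFFERENT_NETWORK,
    CQ_PRIVATE_SAME_NETWORK,
    CQ_PUBLIC_DIFFERENT_NETWORK,
    CQ_PUBLIC_SAME_NETWORK
};

/* Hypothetical convenience macro: builds a host-byte-order IPv4 address. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* True for the RFC 1918 private ranges; addr is in host byte order. */
static bool ipv4_is_private(uint32_t addr)
{
    return (addr >> 24) == 10                       /* 10.0.0.0/8     */
        || (addr >> 20) == ((172u << 4) | 1u)       /* 172.16.0.0/12  */
        || (addr >> 16) == ((192u << 8) | 168u);    /* 192.168.0.0/16 */
}

/* Hypothetical classifier for one IPv4-to-IPv4 edge.
 * prefix_len must be in the range 0..32. */
enum mca_btl_tcp_connection_quality
ipv4_edge_quality(uint32_t local, uint32_t remote, unsigned prefix_len)
{
    uint32_t mask = (0 == prefix_len) ? 0u
                                      : 0xFFFFFFFFu << (32 - prefix_len);
    bool same_net = (local & mask) == (remote & mask);

    if (ipv4_is_private(local) || ipv4_is_private(remote)) {
        return same_net ? CQ_PRIVATE_SAME_NETWORK
                        : CQ_PRIVATE_DIFFERENT_NETWORK;
    }
    return same_net ? CQ_PUBLIC_SAME_NETWORK
                    : CQ_PUBLIC_DIFFERENT_NETWORK;
}
```

Because the enumerators are declared in ascending order of quality,
their integer values can serve directly as edge weights.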

We then try to find a matching in the graph, that is, a set of edges in
which no two edges share a common endpoint on either side, thus avoiding
oversubscription.

In order to support striping, we look for the largest matching (read:
selecting as many edges (links) as possible).

In order to ensure connectivity, we then choose from the variety of
largest matchings the one with the highest summed weights. These edges
denote the addresses with the best probability of a successful
connection.

In terms of graph theory, this is called a maximum cardinality maximum
weight matching.
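
For a handful of interfaces, the selection rule can be sketched with a
plain exhaustive search. This is not the algorithm used in
btl_tcp_proc.c or described in the thesis; the names, the MAX_IF cap and
the brute-force approach are assumptions made to keep the example short.
A weight of 0 means "no edge".

```c
#include <stddef.h>

#define MAX_IF 8    /* hypothetical cap on interfaces per peer */

struct best_matching {
    int card;               /* number of matched pairs   */
    int weight;             /* sum of the chosen weights */
    int match[MAX_IF];      /* remote index per local interface, or -1 */
};

/* Exhaustive search: local interface i is either left unmatched or
 * paired with any still-unused remote interface j with w[i][j] > 0. */
static void search(size_t i, size_t nl, size_t nr, int w[MAX_IF][MAX_IF],
                   unsigned used, int card, int weight,
                   int cur[MAX_IF], struct best_matching *b)
{
    if (i == nl) {
        /* prefer more edges first, then the larger weight sum */
        if (card > b->card || (card == b->card && weight > b->weight)) {
            b->card = card;
            b->weight = weight;
            for (size_t k = 0; k < nl; ++k) {
                b->match[k] = cur[k];
            }
        }
        return;
    }
    cur[i] = -1;
    search(i + 1, nl, nr, w, used, card, weight, cur, b);
    for (size_t j = 0; j < nr; ++j) {
        if (w[i][j] > 0 && 0 == (used & (1u << j))) {
            cur[i] = (int)j;
            search(i + 1, nl, nr, w, used | (1u << j),
                   card + 1, weight + w[i][j], cur, b);
        }
    }
    cur[i] = -1;
}

/* Fills match[i] with the remote index paired with local interface i,
 * or -1 if it stays unmatched. Returns the number of matched pairs.
 * Assumes nl <= MAX_IF and nr <= MAX_IF. */
int max_card_max_weight(size_t nl, size_t nr, int w[MAX_IF][MAX_IF],
                        int match[MAX_IF])
{
    struct best_matching b = { -1, -1, { 0 } };
    int cur[MAX_IF];

    search(0, nl, nr, w, 0u, 0, 0, cur, &b);
    for (size_t k = 0; k < nl; ++k) {
        match[k] = b.match[k];
    }
    return b.card;
}
```

Cardinality is compared before weight, so a matching with two mediocre
edges beats a matching with a single excellent one. That ordering is
what keeps striping available.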


You can find the whole background story in Chapter 3:

   http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf


We also have a brief IEEE paper on this:

   http://www.ieeexplore.ieee.org/xpl/freeabs_all.jsp?isnumber=4476518&arnumber=4476565&count=56&index=46


In other words: #972 is somewhat obsolete, the FAQ entry should surely
be removed/updated. I don't know to which extent, but if you want me to
write some lines, I could probably invent a not so scientific
description.


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Additional excluded tcp inteface

2008-11-07 Thread Adrian Knoth
On Fri, Nov 07, 2008 at 09:49:43AM -0500, Rolf Vandevaart wrote:

> I do not think anyone will have a problem with this, but just thought I 
> would mention that I am planning on adding an additional interface to 
> the excluded list for the tcp btl.  I want to add "sppp" to the list. 
> This is an internal interface to one of our servers and needs to be 
> treated like the "lo" interface.

Is it possible to detect this interface and exclude it right from the
beginning in opal/util/if.c? Are there special flags that apply to this
interface, so that we have a classification?


Just my $0.02

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de