Bogdan Costescu <bogdan.coste...@iwr.uni-heidelberg.de>:
>- are all computers that should participate in a job configured
>similarly (only IPv6 or both IPv4 and IPv6) ? If not all are, then
>should some part of the computers communicate over one protocol and
>the rest over the other ? I think that this split communication would

This should really be possible. If we write the connection handling code
correctly, the Internet Protocol version should not matter. Many other
daemons are coded in just this way. The basic algorithm is like this:

    /* retrieve list of addresses bound to the given target host */
    getaddrinfo(..., &addr_list);
    for (addr_res in addr_list) {
        /* initialize socket of the correct address family */
        fd = socket(addr_res->ai_family, ...);
        if (try_to_connect(fd))
            break;
        close(fd);  /* failed, try the next address */
    }

So the resolver already does the complicated work for us, since it
returns all addresses associated with a given target (hostname or
IP address notation) in order of decreasing preference.

>- a related point is whether the 2 protocols should really be regarded
>as 2 different communication channels. OpenMPI is able to use several
>communication channels between 2 processes/MPI ranks at the same time,
>so should the same physical interface be split between the 2 logical
>protocols for communication between the same pair of computers ?

This one is sort of complicated. According to OMPI, there are several
interfaces on a host, and each interface has access to some fraction of
the total bandwidth. Now we also have two different protocols on each
interface. Possible scenarios:

- We add the IP version to the OMPI interface name. So instead of eth0
  and eth1 we would get eth0 eth0.v6 eth1 eth1.v6. Using this approach
  one could quite easily state her preferences using the btl command
  line arguments. Of course, the latency/bandwidth code would need to
  be reworked, since now all traffic on an IPv6 interface would take
  available bandwidth away from the corresponding IPv4 interface.

- We do not add the IP version to the interface name, but perform
  protocol selection automatically based on resolver results. In this
  case the modification to the interface selection algorithm would
  probably be a minor one. Drawback: we cannot control the IP version
  beyond the resolver configuration, which is probably out of the
  user's reach.
  Since IPv6 imposes a slightly higher protocol overhead, users might
  want to use IPv4 in the local network, but cannot do anything if the
  automatic selection gets it wrong.

- We introduce another parameter, which allows an IP version selection
  both globally and on a per-interface basis. Something like:

      IPv4-only / prefer IPv4 / auto (resolver) / prefer IPv6 / IPv6-only

The third approach would possibly be the cleanest one.

>of the computers. For example, if the remote computer has IPv6
>configured but the sshd is restricted to bind to IPv4, then a ssh
>connection to this computer using the IPv6 address (which would be
>specified in the hostfile) will fail, while OpenMPI processes [...]

In my experience, this is no problem. We currently have some IPv6 test
networks running, and one of our clusters also does IPv6 on its internal
ethernet. Hosts which are generally not IPv6-ready get no IPv6 address
in the DNS / hosts file. This prevents any contact using IPv6, since
their address is not known. Hosts which have some IPv6 support get a
double entry in the DNS or hosts file. Since it is standard behaviour
for every IPv6 app to try all known addresses for the target host until
one succeeds, we are also able to connect to an IPv6-enabled host where
the target daemon does not listen on an IPv6 interface. For example, we
ran for several weeks without an IPv6-enabled rsh, which is used to
handle MPI job startup on the cluster, without any problems.

>IMHO, some discussion of them should occur before the actual coding...

I agree. So here we go :-)

Christian

-- 
Dipl.-Inf. Christian Kauhaus                               <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376 * Fax: +49 3641 9 46372 * Raum 3217
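
P.S. To make the resolver loop and the third scenario a bit more
concrete, here is a minimal C sketch of how the two could fit together.
It is only an illustration under some assumptions: the names ip_policy,
policy_hint_family, policy_first_family and connect_by_policy are
hypothetical and not part of OMPI, and "prefer IPvX" is interpreted here
as "try all IPvX addresses first, then fall back to the other family".
For "auto" the order returned by getaddrinfo() is kept unchanged.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy values mirroring the five settings proposed above. */
enum ip_policy { IP_V4_ONLY, IP_PREFER_V4, IP_AUTO, IP_PREFER_V6, IP_V6_ONLY };

/* Address family to request from the resolver; AF_UNSPEC means "both". */
static int policy_hint_family(enum ip_policy p)
{
    if (p == IP_V4_ONLY) return AF_INET;
    if (p == IP_V6_ONLY) return AF_INET6;
    return AF_UNSPEC;
}

/* Family to try first, or AF_UNSPEC to keep the resolver's own order. */
static int policy_first_family(enum ip_policy p)
{
    if (p == IP_V4_ONLY || p == IP_PREFER_V4) return AF_INET;
    if (p == IP_V6_ONLY || p == IP_PREFER_V6) return AF_INET6;
    return AF_UNSPEC;
}

/* Returns a connected TCP socket, or -1 if no address could be reached. */
static int connect_by_policy(const char *host, const char *port,
                             enum ip_policy p)
{
    struct addrinfo hints, *list, *ai;
    int first = policy_first_family(p);
    int fd = -1, pass;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = policy_hint_family(p);
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &list) != 0)
        return -1;

    /* Pass 0 tries the preferred family (or everything for "auto");
       pass 1 tries whatever pass 0 skipped. */
    for (pass = 0; pass < 2 && fd < 0; pass++) {
        for (ai = list; ai != NULL; ai = ai->ai_next) {
            int preferred = (first == AF_UNSPEC || ai->ai_family == first);
            if (preferred != (pass == 0))
                continue;
            fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd < 0)
                continue;
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                break;      /* connected, stop searching */
            close(fd);      /* failed, try the next address */
            fd = -1;
        }
    }
    freeaddrinfo(list);
    return fd;
}
```

A caller would use something like connect_by_policy("node17", "5000",
IP_PREFER_V4), where host name and port are made up. A per-interface
setting would additionally need a bind() to the chosen local address
before connect(), which is left out here.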