>>>>> Manuel Prinz <man...@debian.org> writes:

 > thanks for the report! I also took this upstream, but unfortunately
 > neither upstream nor I can reproduce the bug since we do not have
 > multi-homed IPv6 hosts available for testing.

        “Fortunately,” it appears that you don't need one, as the
        problem apparently arises on multi-IPv4-homed hosts as well.

        To work around the problem, I've tried both the combination
        of the

    --mca oob_tcp_disable_family 6 \
    --mca btl_tcp_disable_family 6 \

        options and building the package without IPv6 support:

--- openmpi-1.4.2/debian/rules
+++ openmpi-1.4.2/debian/rules
@@ -57,6 +57,7 @@
                        --includedir=\$${prefix}/lib/openmpi/include    \
                        --with-devel-headers \
                        --enable-heterogeneous \
+                       --disable-ipv6 \
                        $(TORQUE)
 
 # Thread support disabled because it's broken, see bug #435581

        To my surprise, it didn't help!

        Then, however, I observed that the system is IPv4-multihomed
        as well:

$ ip -4 addr
…
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    inet 192.168.57.XX/24 scope global eth0
    inet 192.168.57.ZZ/24 scope global eth0
…
$ 

        As soon as I removed one of the addresses (with
        # ip addr del), the problem was gone.  (That is, as long as
        IPv6 is turned off; I cannot drop the extra IPv6 addresses on
        that host without running into issues.)
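
        To check whether a given host is IPv4-multihomed in the sense
        above, something along the following lines may help.  It is
        only a sketch: the function name is made up for illustration,
        and it assumes the one-line-per-address format printed by
        ‘ip -4 -o addr show’ (index, interface name, ‘inet’, address).

```shell
# Hypothetical helper: reads `ip -4 -o addr show` output on stdin and
# prints "IFACE COUNT" for every interface carrying more than one
# IPv4 address.
multihomed_ifaces() {
    awk '$3 == "inet" { n[$2]++ }
         END { for (i in n) if (n[i] > 1) print i, n[i] }'
}

# Usage (on the affected host):
#   ip -4 -o addr show scope global | multihomed_ifaces
```

        For the host shown above, this would be expected to list eth0
        with a count of 2; empty output means no interface has more
        than one global IPv4 address.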

        To reproduce the problem, one can try, e.g. (assuming A.B.C.D
        is an unused address in the network, MASK is the netmask, and
        ethN is the network interface):

root# ip addr add A.B.C.D/MASK dev ethN 
root# 

$ mkdir -- test 
$ cd test/ 
$ cp -- /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt 
$ rm -f -- hpccoutf.txt 
$ mpirun.openmpi \
      --mca btl_base_verbose 30 \
      --mca oob_tcp_debug 1 \
      --mca oob_tcp_disable_family 6 \
      --mca btl_tcp_disable_family 6 \
      hpcc \
      < /dev/null 

        Normally this creates ‘hpccoutf.txt’ almost immediately; with
        the problem being discussed, ‘hpcc’ instead hangs before it
        even tries to open (create) the file.

        Removing the extra IP addresses should eliminate the problem.
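
        Since the symptom is a hang rather than an error, the check
        can be automated roughly as follows.  This is a sketch only:
        the function name and the 60-second limit are my assumptions,
        and it relies on timeout(1) from GNU coreutils.

```shell
# Hypothetical helper: run a command for at most SECS seconds and
# succeed only if FILE exists afterwards.  The command may still be
# running when the limit expires; only the file's existence matters.
creates_file_within() {
    secs=$1
    file=$2
    shift 2
    rm -f -- "$file"
    timeout "$secs" "$@" < /dev/null > /dev/null 2>&1
    [ -e "$file" ]
}

# Usage (in the test/ directory from above; 60 is an arbitrary limit):
#   creates_file_within 60 hpccoutf.txt \
#       mpirun.openmpi \
#           --mca oob_tcp_disable_family 6 \
#           --mca btl_tcp_disable_family 6 \
#           hpcc \
#       && echo "not stuck" || echo "stuck"
```

        The exit status of the command itself is deliberately ignored,
        as ‘hpcc’ is expected to be killed by the time limit whether
        or not it got as far as creating the file.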

[…]

-- 
Long Happy Life.
