Hello,

I seem to be experiencing the exact issue discussed below (back in December). I'm using the 2.6.27 kernel and the bonding driver that ships with it. Was there ever a solution or patch for this? I have been using the ib-bond scripts, but setting up the bond with the standard OS tools or through sysfs gives the same result.
Regular TCP/IP unicast works, though dmesg is full of warnings about multicast failing. Multicast does not work at all. Any hints or suggestions would be greatly appreciated.

Best regards,
Dennis Portello

> Or Gerlitz wrote:
>>> If I am not mistaken the issue you mention is a little different from the one I pointed out.
>>> Without bonding I see the following:
>>> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>>> However, with bonding what I see is:
>>> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22
>>
>> Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid
>> argument). So you can get EAGAIN when the underlying core SA agent is
>> not ready to send SA queries, while you get EINVAL when attempting to
>> join on a junk MGID. I am confident that we have been seeing joins on
>> junk MGIDs for a long time and that it has been reported on this list
>> (google...) in the past, with no resolution yet.
>
> Or,
>
> I looked through the mailing list going back more than a year. The closest
> I can find to this issue (-EINVAL) was when you reported problems with junk MGID on a
> child interface (and that works properly now).
>
> I agree that the -EAGAIN problem has been known for some time now. However, this issue with
> IPoIB bonding is new. My recollection is that it all worked properly around the end of October.
> I had not tested since then, so this is something that must have cropped up in the interregnum.
>
>>
>> Under bonding there might be a window in time where, from the kernel
>> network stack's perspective, the bonding device ether-type is ethernet
>> and not infiniband, and hence the wrong function (ip_eth_mc_map instead of
>> ip_ib_mc_map) would be called to do the mapping from the IP
>> multicast address to the HW multicast address.
>>
>>
>>> Subsequently an ib-bond status does not reveal any slave as active, as shown below:
>>> ib-bond --status
>>> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9
>>> slave0: ib0
>>> slave1: ib1
>>
>> As this script is not standard and is deprecated, I would recommend not
>> using it, but rather the classic /proc/net/bonding/bond0 entry, along
>> with ip addr show on bond0, ib0, ib1.
> Thanks for alerting me to the fact that the ib-bond script was deprecated. Again, this seemed
> to all work about 6 weeks ago. Is that (ib-bond is deprecated) documented somewhere?
>
> Pradeep
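
For what it is worth, the mapping mix-up Or describes would be consistent with the junk MGID quoted above. Below is a rough userspace sketch in Python of the two mappings, not the kernel code itself. The function names eth_mc_map, ib_mc_map and mgid_of are mine, as are the pkey and scope parameters. I am also assuming that the group being joined is 224.0.0.1, that the rest of the 20-byte hardware address is zero when only the 6-byte Ethernet result has been written into it, and that the MGID is taken from bytes 4..19 of that address. Under those assumptions the Ethernet-style mapping reproduces the 0001:0000:... value from the log, which fits the theory but does not prove it.

```python
import socket
import struct

IPOIB_HW_ADDR_LEN = 20  # 4 bytes (reserved + QPN) followed by a 16-byte GID


def eth_mc_map(ip: str) -> bytes:
    """Ethernet-style mapping: 01:00:5e followed by the low 23 bits of the group."""
    addr = struct.unpack("!I", socket.inet_aton(ip))[0]
    return bytes([0x01, 0x00, 0x5E,
                  (addr >> 16) & 0x7F, (addr >> 8) & 0xFF, addr & 0xFF])


def ib_mc_map(ip: str, pkey: int = 0xFFFF, scope: int = 0x2) -> bytes:
    """IPoIB-style mapping for IPv4 (pkey/scope arguments are assumptions of this sketch)."""
    addr = struct.unpack("!I", socket.inet_aton(ip))[0]
    buf = bytearray(IPOIB_HW_ADDR_LEN)       # byte 0 stays 0 (reserved)
    buf[1:4] = b"\xff\xff\xff"               # multicast QPN
    buf[4] = 0xFF                            # start of the MGID
    buf[5] = 0x10 | scope
    buf[6:8] = b"\x40\x1b"                   # IPv4 signature
    buf[8:10] = struct.pack("!H", pkey)      # partition key
    buf[16:20] = struct.pack("!I", addr & 0x0FFFFFFF)  # low 28 bits of the group
    return bytes(buf)


def mgid_of(hw_addr: bytes) -> str:
    """Format bytes 4..19 of a (zero-padded) 20-byte address like the MGID in the log."""
    padded = hw_addr.ljust(IPOIB_HW_ADDR_LEN, b"\x00")
    gid = padded[4:20]
    return ":".join(f"{gid[i]:02x}{gid[i + 1]:02x}" for i in range(0, 16, 2))


if __name__ == "__main__":
    group = "224.0.0.1"  # assumed group, see the note above
    print(mgid_of(eth_mc_map(group)))  # 0001:0000:0000:0000:0000:0000:0000:0000
    print(mgid_of(ib_mc_map(group)))   # ff12:401b:ffff:0000:0000:0000:0000:0001
```

If this is what is happening, the join on 0001:0000:... would be expected to fail with -22 (EINVAL), since that value is not a valid multicast GID.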
_______________________________________________
general mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
