Jason Gunthorpe wrote:
> [...] The socket that is bound to a device will then use its device for
> sending, but other sockets not bound to devices will do route lookups and
> use the lo device.
> Do: [...] To see the difference in each side.

Sure, makes sense: the ping-reply code does a route lookup and will use the
loopback device.

I took a second look at ping w.r.t. the various sysctl states. When rp_filter
is set to its default

> # sysctl -a | grep -wE "accept_local|rp_filter|arp_ignore" | grep ib
> net.ipv4.conf.ib0.rp_filter = 1
> net.ipv4.conf.ib0.accept_local = 1
> net.ipv4.conf.ib0.arp_ignore = 1
> net.ipv4.conf.ib1.rp_filter = 1
> net.ipv4.conf.ib1.accept_local = 1
> net.ipv4.conf.ib1.arp_ignore = 1

ping isn't working, since there's no ARP reply:

> # ping -I ib0 192.168.20.100
> PING 192.168.20.100 (192.168.20.100) from 192.168.20.1 ib0: 56(84) bytes of data.
> From 192.168.20.1 icmp_seq=2 Destination Host Unreachable
> From 192.168.20.1 icmp_seq=3 Destination Host Unreachable
> From 192.168.20.1 icmp_seq=4 Destination Host Unreachable

> # tcpdump -ni ib0
> 18:04:39.492306 ARP, Request who-has 192.168.20.100 tell 192.168.20.1, length 56
> 18:04:40.492541 ARP, Request who-has 192.168.20.100 tell 192.168.20.1, length 56

> # tcpdump -ni ib1
> 18:04:42.497039 ARP, Request who-has 192.168.20.100 tell 192.168.20.1, length 56
> 18:04:43.497268 ARP, Request who-has 192.168.20.100 tell 192.168.20.1, length 56

Once I set net.ipv4.conf.ib1.rp_filter=0, ARP replies are generated and ping
works as you explained: echo-request externally, echo-reply internally.

> # tcpdump -ni ib1
> 18:06:33.103248 ARP, Request who-has 192.168.20.100 tell 192.168.20.1, length 56
> 18:06:33.103281 ARP, Reply 192.168.20.100 is-at 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:e8, length 56
> 18:06:33.103369 ARP, Reply 192.168.20.100 is-at 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:e8, length 56
> 18:06:33.103461 IP 192.168.20.1 > 192.168.20.100: ICMP echo request, id 26906, seq 1, length 64
> 18:06:34.107465 IP 192.168.20.1 > 192.168.20.100: ICMP echo request, id 26906, seq 2, length 64

Now, if I return rp_filter to 1, ping keeps working using the neighbour
previously created.
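
For anyone who wants to check their own box, here is a rough sketch of a test for the state in which the ARP replies appeared above: rp_filter at 0 with accept_local at 1 on the responding interface. The function name arp_reply_expected and the PROC_ROOT override are made up for illustration, and taking the larger of conf/all and conf/<dev> for rp_filter is my reading of the kernel's ip-sysctl documentation, not something verified in this thread.

```shell
#!/bin/sh
# Sketch only: arp_reply_expected and PROC_ROOT are hypothetical names.
# PROC_ROOT defaults to the live sysctl tree under /proc/sys.
# Returns 0 if the interface has an effective rp_filter of 0 and
# accept_local of 1, 1 if not, and 2 if a sysctl file cannot be read.
arp_reply_expected() {
    dev="$1"
    conf="${PROC_ROOT:-/proc/sys}/net/ipv4/conf"

    rp_dev=$(cat "$conf/$dev/rp_filter") || return 2
    rp_all=$(cat "$conf/all/rp_filter") || return 2
    accept=$(cat "$conf/$dev/accept_local") || return 2

    # Assumption: the kernel applies the larger of conf/all and conf/<dev>
    # when doing source validation on <dev>.
    rp=$rp_dev
    if [ "$rp_all" -gt "$rp" ]; then
        rp=$rp_all
    fi

    [ "$rp" -eq 0 ] && [ "$accept" -eq 1 ]
}
```

On the setup above this would be run as `arp_reply_expected ib1 && echo "ib1 should answer"`. It only inspects the sysctl values; it says nothing about a neighbour entry that already exists.
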
ping even keeps working when I set net.ipv4.conf.ib1.accept_local to 0,
which is a bit weird unless this sysctl acts at the neighbour level (i.e.
it controls ARP replies rather than every packet xmit).

> To really effect a full external loopback you need to have both sides
> bound to their respective devices. Note that binding to a device and
> binding to a source IP are not the same thing in Linux.

Even without being fully into the details of what binding to a source IP
actually translates to, I understand there's a difference.

> In the RDMA CM case the listening side doesn't do any IP
> routing operations at all so a device bind isn't necessary.

Yes, indeed. As for the active side, the RDMA CM doesn't have a
BINDTODEVICE equivalent.

As for the original issue we were discussing here, Sean - the conclusion is
that with upstream 2.6.35 bits, for the rdma connection to go from hca1 port1
to hca1 port2 (or from hca1 port1 to hca2 port1), the rdma-cm needs a
neighbour, similarly to a ping -I ib0 to the ib1 address. A neighbour isn't
created unless the responding NIC (ib1 in my example) has both rp_filter set
to 0 and accept_local set to 1. Jason, does this make sense?

Or.