On 6/8/15 1:58 PM, Hannes Frederic Sowa wrote:
Hi Shrijeet,

On Mo, 2015-06-08 at 11:35 -0700, Shrijeet Mukherjee wrote:
From: Shrijeet Mukherjee <s...@cumulusnetworks.com>

Incoming frames for IP protocol stacks need the IIF to be changed
from the actual interface to the VRF device. This allows the IIF
rule to be used to select tables (or do regular PBR)

This change selects the iif to be the VRF device if it exists and
the incoming iif is enslaved to the VRF device.

Since VRF aware sockets are always bound to the VRF device this
system allows return traffic to find the socket of origin.

changes are in the arp_rcv, icmp_rcv and ip_rcv paths

Question: I did not wrap the rcv modifications in CONFIG_NET_VRF,
as it would create code variations and the vrf_ptr check is already
there. I can make that whole thing modular.

From an architectural level I think the output path looks good. For the
input path I would also like to propose my (I think) more flexible solution:


Something is still not right on the output path. For example, I see the wrong source address showing up with ping -I vrf0:

# ping -I vrf0 1.1.1.254
ping: Warning: source address might be selected on device other than vrf0.
PING 1.1.1.254 (1.1.1.254) from 172.16.1.52 vrf0: 56(84) bytes of data.
64 bytes from 1.1.1.254: icmp_seq=1 ttl=64 time=0.215 ms
...

The reason is that the datagram connect function fails to look up the outbound route in the VRF and falls back to the main table. (As an aside, the fallback to other tables is something that should not be happening for VRFs; you want to use the table specific to the VRF.)

The route lookup fails because it passes in oif = VRF device (this VRF design relies on bind-to-device, which sets oif in the flow). That is good for selecting the table to use for the lookups, but not good for selecting the route within the table, where oif is compared against each nexthop's own output interface.

This is one way to fix the connect problem:

diff --git a/include/net/route.h b/include/net/route.h
index fe22d03afb6a..a18798caec25 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -245,11 +245,18 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
                     __be16 sport, __be16 dport,
                     struct sock *sk)
 {
+   struct net_device *dev = dev_get_by_index(sock_net(sk), oif);
    __u8 flow_flags = 0;

    if (inet_sk(sk)->transparent)
        flow_flags |= FLOWI_FLAG_ANYSRC;

+   if (dev) {
+       if (netif_is_vrf(dev))
+           flow_flags |= FLOWI_FLAG_VRFSRC;
+       dev_put(dev);
+   }
+
    flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
               protocol, flow_flags, dst, src, dport, sport);
 }


This essentially tells fib_table_lookup to drop the OIF comparison after selecting the table, per this change made in the patch Shrijeet posted:

                        if (!(flp->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
                                if (flp->flowi4_oif &&
                                    flp->flowi4_oif != nh->nh_oif)
                                        continue;
                        }
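
To make the effect of that check concrete, here is a minimal user-space sketch of the nexthop filter. It is not kernel code: the struct layouts, the sketch_* names, the SKETCH_FLAG_VRFSRC value and the ifindex numbers are all made up for illustration, and the real lookup walks a trie rather than a flat array. It only models the decision "skip this nexthop if oif is set and differs, unless the VRF flag is present".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FLOWI_FLAG_VRFSRC; the value is arbitrary. */
#define SKETCH_FLAG_VRFSRC 0x08u

/* One route entry: prefix/mask in host byte order plus egress ifindex. */
struct sketch_nexthop {
    uint32_t prefix;
    uint32_t mask;
    int      oif;
};

/* The parts of the flow key that matter for this check. */
struct sketch_flow {
    uint32_t daddr;
    int      oif;      /* 0 means "not bound to a device" */
    unsigned flags;
};

/*
 * Return the egress ifindex of the first entry that matches the flow,
 * or -1 if nothing in this table matches.
 */
int sketch_lookup(const struct sketch_nexthop *routes, size_t n,
                  const struct sketch_flow *fl)
{
    assert(routes != NULL && fl != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct sketch_nexthop *nh = &routes[i];

        if ((fl->daddr & nh->mask) != nh->prefix)
            continue;

        /* Same shape as the quoted check: compare oif only when the
         * VRF flag is absent. */
        if (!(fl->flags & SKETCH_FLAG_VRFSRC)) {
            if (fl->oif && fl->oif != nh->oif)
                continue;
        }
        return nh->oif;
    }
    return -1;
}
```

With a single 1.1.1.0/24 entry whose egress ifindex is 3 and a flow bound to a (hypothetical) VRF ifindex of 10, the lookup misses without the flag and returns 3 with it, which mirrors the connect failure and the fix described above.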
