This patch modifies the OSv routecache to handle the case when a route
entry gets added for a non-loopback address and loopback device before
packets start coming from outside of the local network. The practical
example is an app that on startup calls itself on its HTTP endpoint but
using non-loopback address like 192.168.122.15 before any external
requests from outside of the guest. This is exactly what would happen
when running SeaWeedFS (see #1188). This bug can be also replicated
with a golang-httpclient and httpserver-api apps:

./scripts/build image=golang-httpclient,httpserver-monitoring-api
.script/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version'

After it starts, run 'curl http://localhost:8000/os/version' and
observe curl never receive the response.

Let us assume we have an app that binds to 192.168.122.15 and listens
on some port. When a call to connect on a non-loopback address comes from
the app itself, it goes through the layers of the networking stack and
eventually calls route_cache::lookup() to find a route entry for
192.168.122.15 as in the stack trace below.

  in route_cache::lookup (dst=dst@entry=0xffff8000014bb84c, fibnum=<optimized 
out>, ret=ret@entry=0xffff8000014bb860) at ./bsd/sys/net/routecache.hh:197
  in_pcbladdr (inp=inp@entry=0xffffa0000155a200, 
faddr=faddr@entry=0xffff8000014bb98c, laddr=laddr@entry=0xffff8000014bb988, 
cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
  in_pcbconnect_setup (inp=0xffffa0000155a200, nam=0xffffa000008e0510, 
laddrp=0xffff8000014bb9e4, lportp=0xffff8000014bb9e2, 
faddrp=0xffffa0000155a264, fportp=0xffffa0000155a254, oinpp=0xffff8000014bb9e8, 
cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
  tcp_connect (tp=tp@entry=0xffffa00000f76800, 
nam=nam@entry=0xffffa000008e0510, td=<optimized out>) at 
bsd/sys/netinet/tcp_usrreq.cc:1089
  tcp_usr_connect (td=<optimized out>, nam=0xffffa000008e0510, so=<optimized 
out>) at bsd/sys/netinet/tcp_usrreq.cc:463
  tcp_usr_connect (so=<optimized out>, nam=0xffffa000008e0510, td=<optimized 
out>) at bsd/sys/netinet/tcp_usrreq.cc:436
  kern_connect (fd=<optimized out>, sa=0xffffa000008e0510) at 
bsd/sys/kern/uipc_syscalls.cc:374

Initially the route cache is empty so it adds new entry that associates the
address 192.168.122.15, the loopback device and its netmask 255.0.0.0.
Once all subsequent networking stack calls to handle this particular
HTTP request complete, at some point later external client calls the
same HTTP endpoint from outside of OSv guest. While handling this call,
at some point we get to this point as illustrated by the stack trace below:

  in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>, 
ret=0xffff800000c4ead0) at ./bsd/sys/net/routecache.hh:197
  in tcp_maxmtu (inc=<optimized out>, flags=0x0) at 
bsd/sys/netinet/tcp_subr.cc:1690
  in tcp_mssopt (inc=0xffffa00000f7b910) at bsd/sys/netinet/tcp_input.cc:3131
  in syncookie_generate (flowlabel=<synthetic pointer>, sc=0xffffa00000f7b900, 
sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
  in _syncache_add (inc=<optimized out>, to=<optimized out>, th=<optimized 
out>, inp=<optimized out>, lsop=0xffff800000c4edc8, m=0xffffa00000f7bf00, 
toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
  in tcp_input (m=0xffffa00000f7bf00, off0=<optimized out>) at 
bsd/sys/netinet/tcp_input.cc:941
  in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
  in netisr_dispatch_src (proto=1, source=<optimized out>, 
m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769
  in netisr_dispatch_src (proto=9, source=<optimized out>, 
m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769
  in virtio::net::receiver (this=0xffff900000ba3000) at 
drivers/virtio-net.cc:545

This time the destination address (dst) is 192.168.122.1 which is a
gateway for the non-loopback (eth0) network. However at this point we have
a single route entry created when handling the first request and the search()
method used by route_cache::lookup() simply compares this entry network
with the network of the IP address in question by applying netmask
255.0.0.0 and it matches so that entry is returned. The problem is that
this and all subsequent packets get routed to the loopback device
instead of the virtio device in this case. And the client never receives
relevant packets to complete the handshake and TCP connection does
not get established from the client perspective. Effectively the
application will never be able to receive any external requests, however
it can still handle any internal ones like the 1st one in this example.

What is interesting if we revert the sequence in this example and send
external request before the app makes an internal request to itself
everything works fine. It is because when handling the external call,
the initial and only route entry gets created for IP 192.168.122.1 and
netmask 255.255.255.0 and non-loopback (eth0) device. The subsequent
internal request would match the netmask and go through the ifloop()
like this illustrates:

0xffff80000119b040 /httpclient.so   0        15.368111615 net_packet_in        
b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S], seq 744996567, win 
65535, options [mss 1460,nop,wscale 6,sackOK,TS val 15368 ecr 0], length 0'
  log_packet_in(mbuf*, int) core/net_trace.cc:143
  netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, 
mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
  netisr_queue_src bsd/sys/net/netisr.cc:684
  netisr_queue_src bsd/sys/net/netisr.cc:712
  if_simloop bsd/sys/net/if_loop.cc:291
  ether_output bsd/sys/net/if_ethersubr.cc:256
  ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) 
bsd/sys/netinet/ip_output.cc:621
  tcp_output bsd/sys/netinet/tcp_output.cc:1385
  tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
  tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
  kern_connect bsd/sys/kern/uipc_syscalls.cc:374

So this patch fixes this problem by changing the search() method
to handle the first scenario. In essence, instead of simply
comparing the networks of the entry IP address and dst, it first
identifies type of the device the entry is associated with. If
non-loopback it does the same as before, if device is loopback if checks
if dst is loopback in which case it is a match otherwise it compares
full IP addresses. With the patch applied, in first scenario
when the second external call is handled the entry device is
checked which is loopback, and new logic returns null as full addresses
would not match.

This patch has been backported from the original one by Jan-Michael Kho
contributed to the Spirent fork of OSv -
https://github.com/SpirentOrion/osv/commit/f6e6e54ba14a7e9c1e04574769d0dd1832d66b92.

Co-authored-by: "Jan-Michael Kho" <jan....@spirent.com>
Co-authored-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
---
 bsd/sys/net/routecache.hh | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/bsd/sys/net/routecache.hh b/bsd/sys/net/routecache.hh
index 0c5c1f40..f71ac8f6 100644
--- a/bsd/sys/net/routecache.hh
+++ b/bsd/sys/net/routecache.hh
@@ -54,6 +54,7 @@
 #include <bsd/sys/net/if.h>
 #include <bsd/sys/net/if_dl.h>
 #include <bsd/sys/netinet/in_var.h>
+#include <bsd/sys/netinet/in.h>
 
 #include <bsd/sys/net/route.h>
 
@@ -88,6 +89,10 @@ public:
 #endif
         return *this;
     }
+
+    bool is_loopback(void) const {
+        return (rt_ifp && (rt_ifp->if_flags & IFF_LOOPBACK)) ? true : false;
+    }
 };
 
 // Silly routing table implementation, allowing search given address in list
@@ -116,10 +121,26 @@ public:
         }
         entries.emplace_front(a, n, r);
     }
+    // address should be in host order
+    bool is_loopback_net(u32 address) const {
+        return ((address >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) ? true : 
false;
+    }
     nonlockable_rtentry *search(u32 address) {
         for (silly_rtable_entry &e : entries) {
-            if ((e.address & e.netmask) == (address & e.netmask)) {
-                return &e.rte;
+            if (e.rte.is_loopback() == false) {
+                if ((e.address & e.netmask) == (address & e.netmask)) {
+                    return &e.rte;
+                }
+            } else {
+                if (is_loopback_net(address)) {
+                    return &e.rte;
+                }
+                // We shouldn't use this entry on IP addresses just because 
they're
+                // on the same network as our non-loopback address. So match 
the entire
+                // address.
+                if (e.address == address) {
+                    return &e.rte;
+                }
             }
         }
         return nullptr;
-- 
2.35.1

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/20220723181515.293695-1-jwkozaczuk%40gmail.com.

Reply via email to