Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
On Sunday, July 24, 2022 at 10:21:48 AM UTC-4 n...@scylladb.com wrote:
> On Sun, Jul 24, 2022 at 5:11 PM Waldek Kozaczuk wrote:
>
>> BTW there are more routecache-related patches that I think we can bring
>> in from the Spirent fork, which I documented here -
>> https://github.com/cloudius-systems/osv/issues/1197.
>> This one looks like a bug fix -
>> https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617
>
> Yes, looks correct to me.
>
>> This one is more of an enhancement -
>> https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.
>
> This patch seems to suggest our cache was missing rt_gateway and it needs
> it. But why didn't I ever notice this? Unfortunately the patch doesn't
> say what kind of real-life scenario it was trying to fix.
>
>>> I'm not sure this is the *best* way to tell these two cases apart, but
>>> I probably agree it's a "good enough" way. As you can see in the
>>> comments I wrote in routecache.hh (which I should have copy-edited more
>>> carefully, I see they have a lot of silly mistakes...), routecache.hh
>>> is all about the art of the "good enough", so it makes sense to
>>> continue this.
>>
>> In a nutshell, this new version of the logic checks the device
>> associated with the cached (present) route entry to determine what kind
>> of match it should do - just the netmask for a non-loopback device, or a
>> more involved match for a loopback one.
>
> Yes, but consider for example that even on the real loopback (not an
> optimization where the outside address is "converted" to loopback) you
> can have 127.0.0.1 and 127.0.0.2 both on the same loopback. The new code
> will force it to use two different routing table entries instead of one.

I think it will use one entry. Consider this part of the new code:

    if (is_loopback_net(address)) {
        return &e.rte;
    }
    // We shouldn't use this entry on IP addresses just because they're
    // on the same network as our non-loopback address. So match the entire
    // address.
    if (e.address == address) {
        return &e.rte;
    }

The 1st if statement will qualify both 127.0.0.1 and 127.0.0.2 as loopback,
no? If that is the case, it will return the same entry for both.

> It's not a huge problem - most people never use multiple loopback
> addresses. But someone who will use twenty of these addresses (I'm
> sometimes doing this to run multiple servers on the same host for
> testing) will suffer, because the route cache only has room for 10
> routes.
>
> I'm tending to accept this patch, but I have a feeling that the "real"
> fix would be something else. But this fix is probably "good enough".
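To illustrate the point about the first if statement above: a loopback-net
check of the following shape treats every 127.x.x.x address as loopback, so
one cached entry would indeed serve 127.0.0.1 and 127.0.0.2 alike. This is a
minimal sketch that assumes is_loopback_net() tests membership in
127.0.0.0/8; it is not the verified routecache.hh implementation.

    #include <arpa/inet.h>
    #include <cassert>
    #include <cstdint>

    // Assumed behavior of is_loopback_net(): membership in 127.0.0.0/8,
    // i.e. the whole loopback network rather than just 127.0.0.1.
    static bool is_loopback_net(uint32_t addr_host_order) {
        return (addr_host_order & 0xff000000u) == 0x7f000000u;
    }

    int main() {
        // Both 127.0.0.1 and 127.0.0.2 qualify, so the cache lookup quoted
        // above would return the same entry for either of them.
        assert(is_loopback_net(ntohl(inet_addr("127.0.0.1"))));
        assert(is_loopback_net(ntohl(inet_addr("127.0.0.2"))));
        assert(!is_loopback_net(ntohl(inet_addr("192.168.122.15"))));
        return 0;
    }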
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
On Sun, Jul 24, 2022 at 5:11 PM Waldek Kozaczuk wrote:

> BTW there are more routecache-related patches that I think we can bring
> in from the Spirent fork, which I documented here -
> https://github.com/cloudius-systems/osv/issues/1197.
> This one looks like a bug fix -
> https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617

Yes, looks correct to me.

> This one is more of an enhancement -
> https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.

This patch seems to suggest our cache was missing rt_gateway and it needs
it. But why didn't I ever notice this? Unfortunately the patch doesn't say
what kind of real-life scenario it was trying to fix.

>> I'm not sure this is the *best* way to tell these two cases apart, but I
>> probably agree it's a "good enough" way. As you can see in the comments
>> I wrote in routecache.hh (which I should have copy-edited more
>> carefully, I see they have a lot of silly mistakes...), routecache.hh is
>> all about the art of the "good enough", so it makes sense to continue
>> this.
>
> In a nutshell, this new version of the logic checks the device associated
> with the cached (present) route entry to determine what kind of match it
> should do - just the netmask for a non-loopback device, or a more
> involved match for a loopback one.

Yes, but consider for example that even on the real loopback (not an
optimization where the outside address is "converted" to loopback) you can
have 127.0.0.1 and 127.0.0.2 both on the same loopback. The new code will
force it to use two different routing table entries instead of one.

It's not a huge problem - most people never use multiple loopback
addresses. But someone who will use twenty of these addresses (I'm
sometimes doing this to run multiple servers on the same host for testing)
will suffer, because the route cache only has room for 10 routes.

I'm tending to accept this patch, but I have a feeling that the "real" fix
would be something else. But this fix is probably "good enough".
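To quantify the multiple-loopback-addresses concern: with a fixed-capacity
cache and one entry per distinct address, twenty loopback addresses cycling
through ten slots never hit. The following is a toy model only - it assumes
(as stated above) a 10-entry cache, and round-robin eviction is an
illustrative simplification, not the actual routecache.hh policy.

    #include <array>
    #include <cstdint>
    #include <cstdio>

    int main() {
        constexpr int capacity = 10;             // "room for 10 routes"
        std::array<uint32_t, capacity> cache{};  // cached addresses, 0 = empty
        int next = 0, hits = 0, misses = 0;

        for (int round = 0; round < 100; round++) {
            for (uint32_t addr = 1; addr <= 20; addr++) {  // 127.0.0.1 .. 127.0.0.20
                bool hit = false;
                for (uint32_t cached : cache) {
                    if (cached == addr) { hit = true; break; }
                }
                if (hit) { hits++; continue; }
                misses++;
                cache[next] = addr;              // evict round-robin
                next = (next + 1) % capacity;
            }
        }
        // Sequential access over 20 addresses thrashes the 10 slots:
        // prints hits=0 misses=2000.
        std::printf("hits=%d misses=%d\n", hits, misses);
        return 0;
    }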
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
BTW there are more routecache-related patches that I think we can bring in
from the Spirent fork, which I documented here -
https://github.com/cloudius-systems/osv/issues/1197.

This one looks like a bug fix -
https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617

This one is more of an enhancement -
https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.

What do you think about those above? The patch we are discussing here
addresses a "real" bug I have come across myself.

On Sunday, July 24, 2022 at 6:02:22 AM UTC-4 n...@scylladb.com wrote:
> Hi, this code is probably correct given that both you and the Spirent
> guys gave it a lot of thought, but there's something I'm not sure I
> understand:
>
> If I understand correctly (but please correct me if I'm wrong), the
> problem is that OSv has somewhere an "optimization" that internal
> connections to the node's external IP address are handled internally
> (loopback) instead of being sent to the (virtual) network card, so if we
> cache this knowledge, then next time if the connection is of the "other"
> type (external connection after we cached the internal route, or internal
> connection after the external route) the result will be either not
> working (passing an external connection through loopback) or inefficient
> (passing an internal connection through the real device).

Right, that is exactly what I am thinking as well. I am not an expert
here, but I think the if_simloop() function called from ether_output() is
where that optimization happens. For illustration, see these two traces:

"local call (from itself on 192.168.122.15)"

    0x8119b040 /httpclient.so 015.368111615 net_packet_in
    b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S],
    seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,
    TS val 15368 ecr 0], length 0'
     log_packet_in(mbuf*, int) core/net_trace.cc:143
     netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
     netisr_queue_src bsd/sys/net/netisr.cc:684
     netisr_queue_src bsd/sys/net/netisr.cc:712
     *if_simloop bsd/sys/net/if_loop.cc:291*
     *ether_output bsd/sys/net/if_ethersubr.cc:256*
     *ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621*
     tcp_output bsd/sys/netinet/tcp_output.cc:1385
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
     kern_connect bsd/sys/kern/uipc_syscalls.cc:374

"external call"

    0x807e0040 virtio-net-rx0 4.284705130 net_packet_out
    b'IP 192.168.122.15.8000 > 192.168.122.1.45818: Flags [.], ack 512095,
    win 65534, length 0'
     log_packet_out(mbuf*, int) core/net_trace.cc:148
     *ether_output bsd/sys/net/if_ethersubr.cc:397*  // This should really be ether_output_frame
     *ether_output bsd/sys/net/if_ethersubr.cc:366*
     *ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621*
     tcp_twrespond bsd/sys/netinet/tcp_timewait.cc:553
     tcp_twstart bsd/sys/netinet/tcp_timewait.cc:281
     tcp_do_segment(mbuf*, tcphdr*, socket*, tcpcb*, int, int, unsigned char, int, bool&) bsd/sys/netinet/tcp_input.cc:2511
     tcp_input bsd/sys/netinet/tcp_input.cc:956
     ip_input bsd/sys/netinet/ip_input.cc:774
     netisr_dispatch_src bsd/sys/net/netisr.cc:769

BTW, in the case of the local call, the qualification "net_packet_in" vs
"net_packet_out" might be confusing. So this if_simloop() might be what is
used to handle this kind of "loop" to itself.
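To make the suspected short-circuit concrete, here is a compilable sketch of
the control flow the traces suggest: ether_output() diverting a packet
addressed to one of the guest's own IP addresses back up the stack via
if_simloop() instead of handing it to the NIC. The struct layouts, the
equality check, and the stub bodies are simplified assumptions; the real
logic lives in bsd/sys/net/if_ethersubr.cc and bsd/sys/net/if_loop.cc and is
considerably more involved.

    #include <cstdio>

    struct mbuf  { unsigned int dst_ip; };  // stand-in for the real mbuf
    struct ifnet { unsigned int ip; };      // stand-in for the real ifnet

    // Re-inject the packet as if it had been received - which is why the
    // outgoing connect() in the first trace shows up as net_packet_in.
    static int if_simloop(ifnet*, mbuf*) {
        std::puts("looped back up the stack (net_packet_in)");
        return 0;
    }

    // Hand the frame to the (virtual) device - the external-call path.
    static int ether_output_frame(ifnet*, mbuf*) {
        std::puts("sent out through the NIC (net_packet_out)");
        return 0;
    }

    static int ether_output_sketch(ifnet* ifp, mbuf* m) {
        if (m->dst_ip == ifp->ip) {
            // "Local call": the destination is our own (non-loopback)
            // address, so the packet never touches the virtio device.
            return if_simloop(ifp, m);
        }
        return ether_output_frame(ifp, m);
    }

    int main() {
        ifnet eth0{0xC0A87A0F};       // 192.168.122.15
        mbuf to_self{0xC0A87A0F};     // connect() to our own address
        mbuf to_gateway{0xC0A87A01};  // reply toward 192.168.122.1
        ether_output_sketch(&eth0, &to_self);     // takes the if_simloop path
        ether_output_sketch(&eth0, &to_gateway);  // takes the device path
        return 0;
    }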
> If I understand correctly your patch works because external connections
> happen to look up the gateway (*.1) instead of the actual IP address, so
> it makes it easy to tell these two cases apart. Is this true?

I think these are actually the "responses" back from 192.168.122.15 to the
"world outside" (pointed to by the gateway 192.168.122.1) when the
routecache is used as well.

> I'm not sure this is the *best* way to tell these two cases apart, but I
> probably agree it's a "good enough" way. As you can see in the comments
> I wrote in routecache.hh (which I should have copy-edited more carefully,
> I see they have a lot of silly mistakes...), routecache.hh is all about
> the art of the "good enough", so it makes sense to continue this.

In a nutshell, this new version of the logic checks the device associated
with the cached (present) route entry to determine what kind of match it
should do - just the netmask for a non-loopback device, or a more involved
match for a loopback one.

> --
> Nadav Har'El
> n...@scylladb.com
>
> On Sat, Jul 23, 2022 at 9:15 PM Waldemar Kozaczuk wrote:
>
>> This patch modifies the OSv routecache to handle the case when a route
>> entry gets added for a non-loopback address and loopback device before
>> packets start coming from outside of the local network. The practical
>> example is an app that on startup calls itself on its HTTP endpoint but
>>
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
Hi, this code is probably correct given that both you and the Spirent guys
gave it a lot of thought, but there's something I'm not sure I understand:

If I understand correctly (but please correct me if I'm wrong), the problem
is that OSv has somewhere an "optimization" that internal connections to
the node's external IP address are handled internally (loopback) instead of
being sent to the (virtual) network card. So if we cache this knowledge,
then the next time, if the connection is of the "other" type (an external
connection after we cached the internal route, or an internal connection
after the external route), the result will be either not working (passing
an external connection through loopback) or inefficient (passing an
internal connection through the real device).

If I understand correctly your patch works because external connections
happen to look up the gateway (*.1) instead of the actual IP address, so it
makes it easy to tell these two cases apart. Is this true?

I'm not sure this is the *best* way to tell these two cases apart, but I
probably agree it's a "good enough" way. As you can see in the comments I
wrote in routecache.hh (which I should have copy-edited more carefully, I
see they have a lot of silly mistakes...), routecache.hh is all about the
art of the "good enough", so it makes sense to continue this.

--
Nadav Har'El
n...@scylladb.com

On Sat, Jul 23, 2022 at 9:15 PM Waldemar Kozaczuk wrote:

> This patch modifies the OSv routecache to handle the case when a route
> entry gets added for a non-loopback address and loopback device before
> packets start coming from outside of the local network. The practical
> example is an app that on startup calls itself on its HTTP endpoint, but
> using a non-loopback address like 192.168.122.15, before any external
> requests from outside of the guest. This is exactly what would happen
> when running SeaWeedFS (see #1188). This bug can also be replicated with
> the golang-httpclient and httpserver-api apps:
>
>     ./scripts/build image=golang-httpclient,httpserver-monitoring-api
>     ./scripts/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version'
>
> After it starts, run 'curl http://localhost:8000/os/version' and observe
> that curl never receives the response.
>
> Let us assume we have an app that binds to 192.168.122.15 and listens on
> some port. When a call to connect on a non-loopback address comes from
> the app itself, it goes through the layers of the networking stack and
> eventually calls route_cache::lookup() to find a route entry for
> 192.168.122.15, as in the stack trace below.
>     in route_cache::lookup (dst=dst@entry=0x814bb84c, fibnum=<optimized out>,
>         ret=ret@entry=0x814bb860) at ./bsd/sys/net/routecache.hh:197
>     in_pcbladdr (inp=inp@entry=0xa155a200, faddr=faddr@entry=0x814bb98c,
>         laddr=laddr@entry=0x814bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
>     in_pcbconnect_setup (inp=0xa155a200, nam=0xa08e0510, laddrp=0x814bb9e4,
>         lportp=0x814bb9e2, faddrp=0xa155a264, fportp=0xa155a254,
>         oinpp=0x814bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
>     tcp_connect (tp=tp@entry=0xa0f76800, nam=nam@entry=0xa08e0510,
>         td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089
>     tcp_usr_connect (td=<optimized out>, nam=0xa08e0510, so=<optimized out>)
>         at bsd/sys/netinet/tcp_usrreq.cc:463
>     tcp_usr_connect (so=<optimized out>, nam=0xa08e0510, td=<optimized out>)
>         at bsd/sys/netinet/tcp_usrreq.cc:436
>     kern_connect (fd=<optimized out>, sa=0xa08e0510) at
>         bsd/sys/kern/uipc_syscalls.cc:374
>
> Initially the route cache is empty, so it adds a new entry that
> associates the address 192.168.122.15 with the loopback device and its
> netmask 255.0.0.0. Once all subsequent networking stack calls to handle
> this particular HTTP request complete, at some point later an external
> client calls the same HTTP endpoint from outside of the OSv guest. While
> handling this call, we get to the point illustrated by the stack trace
> below:
>
>     in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>,
>         ret=0x80c4ead0) at ./bsd/sys/net/routecache.hh:197
>     in tcp_maxmtu (inc=<optimized out>, flags=0x0) at
>         bsd/sys/netinet/tcp_subr.cc:1690
>     in tcp_mssopt (inc=0xa0f7b910) at bsd/sys/netinet/tcp_input.cc:3131
>     in syncookie_generate (flowlabel=<optimized out>, sc=0xa0f7b900,
>         sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
>     in _syncache_add (inc=<optimized out>, to=<optimized out>,
>         th=<optimized out>, inp=<optimized out>, lsop=0x80c4edc8,
>         m=0xa0f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
>     in tcp_input (m=0xa0f7bf00, off0=<optimized out>) at
>         bsd/sys/netinet/tcp_input.cc:941
>     in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
>     in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xa0f7bf00)
>         at bsd/sys/net/netisr.cc:769
>     in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xa0f7bf00)
>         at bsd/sys/net/netisr.cc:769
>     in virtio::net::receiver (this=0x90ba3000) at drivers/virtio-net.cc:545
>
> This time the destination address (dst) is 192.168.122.1, which is the
> gateway for the non-loopback (eth0) network.
[osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
This patch modifies the OSv routecache to handle the case when a route
entry gets added for a non-loopback address and loopback device before
packets start coming from outside of the local network. The practical
example is an app that on startup calls itself on its HTTP endpoint, but
using a non-loopback address like 192.168.122.15, before any external
requests from outside of the guest. This is exactly what would happen when
running SeaWeedFS (see #1188). This bug can also be replicated with the
golang-httpclient and httpserver-api apps:

    ./scripts/build image=golang-httpclient,httpserver-monitoring-api
    ./scripts/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version'

After it starts, run 'curl http://localhost:8000/os/version' and observe
that curl never receives the response.

Let us assume we have an app that binds to 192.168.122.15 and listens on
some port. When a call to connect on a non-loopback address comes from the
app itself, it goes through the layers of the networking stack and
eventually calls route_cache::lookup() to find a route entry for
192.168.122.15, as in the stack trace below.

    in route_cache::lookup (dst=dst@entry=0x814bb84c, fibnum=<optimized out>,
        ret=ret@entry=0x814bb860) at ./bsd/sys/net/routecache.hh:197
    in_pcbladdr (inp=inp@entry=0xa155a200, faddr=faddr@entry=0x814bb98c,
        laddr=laddr@entry=0x814bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
    in_pcbconnect_setup (inp=0xa155a200, nam=0xa08e0510, laddrp=0x814bb9e4,
        lportp=0x814bb9e2, faddrp=0xa155a264, fportp=0xa155a254,
        oinpp=0x814bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
    tcp_connect (tp=tp@entry=0xa0f76800, nam=nam@entry=0xa08e0510,
        td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089
    tcp_usr_connect (td=<optimized out>, nam=0xa08e0510, so=<optimized out>)
        at bsd/sys/netinet/tcp_usrreq.cc:463
    tcp_usr_connect (so=<optimized out>, nam=0xa08e0510, td=<optimized out>)
        at bsd/sys/netinet/tcp_usrreq.cc:436
    kern_connect (fd=<optimized out>, sa=0xa08e0510) at
        bsd/sys/kern/uipc_syscalls.cc:374

Initially the route cache is empty, so it adds a new entry that associates
the address 192.168.122.15 with the loopback device and its netmask
255.0.0.0. Once all subsequent networking stack calls to handle this
particular HTTP request complete, at some point later an external client
calls the same HTTP endpoint from outside of the OSv guest. While handling
this call, we get to the point illustrated by the stack trace below:

    in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>,
        ret=0x80c4ead0) at ./bsd/sys/net/routecache.hh:197
    in tcp_maxmtu (inc=<optimized out>, flags=0x0) at
        bsd/sys/netinet/tcp_subr.cc:1690
    in tcp_mssopt (inc=0xa0f7b910) at bsd/sys/netinet/tcp_input.cc:3131
    in syncookie_generate (flowlabel=<optimized out>, sc=0xa0f7b900,
        sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
    in _syncache_add (inc=<optimized out>, to=<optimized out>,
        th=<optimized out>, inp=<optimized out>, lsop=0x80c4edc8,
        m=0xa0f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
    in tcp_input (m=0xa0f7bf00, off0=<optimized out>) at
        bsd/sys/netinet/tcp_input.cc:941
    in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
    in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xa0f7bf00)
        at bsd/sys/net/netisr.cc:769
    in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xa0f7bf00)
        at bsd/sys/net/netisr.cc:769
    in virtio::net::receiver (this=0x90ba3000) at drivers/virtio-net.cc:545

This time the destination address (dst) is 192.168.122.1, which is the
gateway for the non-loopback (eth0) network.
However, at this point we have a single route entry, created when handling
the first request, and the search() method used by route_cache::lookup()
simply compares this entry's network with the network of the IP address in
question by applying the netmask 255.0.0.0. It matches, so that entry is
returned. The problem is that this and all subsequent packets get routed
to the loopback device instead of the virtio device. The client never
receives the packets needed to complete the handshake, and the TCP
connection does not get established from the client's perspective.
Effectively, the application will never be able to receive any external
requests, although it can still handle internal ones like the 1st one in
this example.

Interestingly, if we reverse the sequence in this example and send an
external request before the app makes an internal request to itself,
everything works fine. That is because when handling the external call,
the initial and only route entry gets created for IP 192.168.122.1,
netmask 255.255.255.0 and the non-loopback (eth0) device. A subsequent
internal request would match the netmask and go through if_simloop(), as
this trace illustrates:

    0x8119b040 /httpclient.so 015.368111615 net_packet_in
    b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S],
    seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,
    TS val 15368 ecr 0], length 0'
     log_packet_in(mbuf*, int) core/net_trace.cc:143
     netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
     netisr_queue_src bsd/sys/net/netisr.cc:684
     netisr_queue_src bsd/sys/net/netisr.cc:712
     if_simloop bsd/sys/net/if_loop.cc:291
     ether_output bsd/sys/net/if_ethersubr.cc:256
     ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621
     tcp_output bsd/sys/netinet/tcp_output.cc:1385
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
     kern_connect bsd/sys/kern/uipc_syscalls.cc:374
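To make the failure mode concrete: below is a compilable sketch of the
netmask-only match described above, together with the device-aware match
the patch introduces (mirroring the is_loopback_net()/exact-address logic
quoted earlier in the thread). The cache_entry layout and helper names are
illustrative assumptions, not the literal routecache.hh code.

    #include <arpa/inet.h>
    #include <cassert>
    #include <cstdint>

    struct cache_entry {
        uint32_t address;   // host byte order
        uint32_t netmask;   // host byte order
        bool     loopback;  // device associated with the cached route
    };

    // The buggy behavior: an entry created for 192.168.122.15 via the
    // loopback device carries netmask 255.0.0.0, so the gateway
    // 192.168.122.1 "hits" the loopback entry too.
    static bool netmask_only_match(const cache_entry& e, uint32_t addr) {
        return (e.address & e.netmask) == (addr & e.netmask);
    }

    // The fix: for an entry whose device is loopback, additionally require
    // a real loopback address (127.0.0.0/8) or an exact address match.
    static bool fixed_match(const cache_entry& e, uint32_t addr) {
        if (!netmask_only_match(e, addr)) return false;
        if (!e.loopback) return true;
        bool on_loopback_net = (addr & 0xff000000u) == 0x7f000000u;
        return on_loopback_net || e.address == addr;
    }

    int main() {
        uint32_t self = ntohl(inet_addr("192.168.122.15"));
        uint32_t gw   = ntohl(inet_addr("192.168.122.1"));
        cache_entry e{self, 0xff000000u /* 255.0.0.0 */, true};

        assert(netmask_only_match(e, gw));  // the bug: the gateway matches
        assert(!fixed_match(e, gw));        // fixed: the gateway is rejected
        assert(fixed_match(e, self));       // internal traffic still matches
        return 0;
    }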