Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
On Sunday, July 24, 2022 at 10:21:48 AM UTC-4 n...@scylladb.com wrote:
> On Sun, Jul 24, 2022 at 5:11 PM Waldek Kozaczuk wrote:
>
>> BTW there are more routecache-related patches that I think we can bring
>> in from the Spirent fork, which I documented here -
>> https://github.com/cloudius-systems/osv/issues/1197.
>> This one looks like a bug fix -
>> https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617
>
> Yes, looks correct to me.
>
>> This one is more of an enhancement -
>> https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.
>
> This patch seems to suggest our cache was missing rt_gateway and it needs
> it. But why didn't I ever notice this? Unfortunately the patch doesn't
> say what kind of real-life scenario it was trying to fix.
>
>>> I'm not sure this is the *best* way to tell these two cases apart, but
>>> I probably agree it's a "good enough" way. As you can see in the
>>> comments I wrote in routecache.hh (which I should have copy-edited more
>>> carefully, I see they have a lot of silly mistakes...), routecache.hh
>>> is all about the art of the "good enough", so it makes sense to
>>> continue this.
>>
>> In a nutshell, this new version of the logic checks the device
>> associated with the cached (present) route entry to determine what kind
>> of match it should do - just the netmask for a non-loopback device, or a
>> more involved match for a loopback one.
>
> Yes, but consider for example that even on the real loopback (not an
> optimization where the outside address is "converted" to loopback) you
> can have 127.0.0.1 and 127.0.0.2 both on the same loopback. The new code
> will force it to use two different routing table entries instead of one.

I think it will use one entry. Consider this part of the new code:

    if (is_loopback_net(address)) {
        return &e.rte;
    }
    // We shouldn't use this entry on IP addresses just because they're
    // on the same network as our non-loopback address. So match the entire
    // address.
    if (e.address == address) {
        return &e.rte;
    }

The 1st if statement will qualify both 127.0.0.1 and 127.0.0.2 as loopback,
no? If that is the case, it will return the same entry for both.

> It's not a huge problem - most people never use multiple loopback
> addresses. But someone who will use twenty of these addresses (I'm
> sometimes doing this to run multiple servers on the same host for
> testing) will suffer, because the route cache only has room for 10
> routes.
>
> I'm tending to accept this patch, but I have a feeling that the "real"
> fix would be something else. But this fix is probably "good enough".
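To illustrate the point about the first if statement above: a loopback-net
check of the following shape treats every 127.x.x.x address as loopback, so
one cached entry would indeed serve 127.0.0.1 and 127.0.0.2 alike. This is a
minimal sketch that assumes is_loopback_net() tests membership in
127.0.0.0/8; it is not the verified routecache.hh implementation.

    #include <arpa/inet.h>
    #include <cassert>
    #include <cstdint>

    // Assumed behavior of is_loopback_net(): membership in 127.0.0.0/8,
    // i.e. the whole loopback network rather than just 127.0.0.1.
    static bool is_loopback_net(uint32_t addr_host_order) {
        return (addr_host_order & 0xff000000u) == 0x7f000000u;
    }

    int main() {
        // Both 127.0.0.1 and 127.0.0.2 qualify, so the cache lookup quoted
        // above would return the same entry for either of them.
        assert(is_loopback_net(ntohl(inet_addr("127.0.0.1"))));
        assert(is_loopback_net(ntohl(inet_addr("127.0.0.2"))));
        assert(!is_loopback_net(ntohl(inet_addr("192.168.122.15"))));
        return 0;
    }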
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
On Sun, Jul 24, 2022 at 5:11 PM Waldek Kozaczuk wrote:

> BTW there are more routecache-related patches that I think we can bring
> in from the Spirent fork, which I documented here -
> https://github.com/cloudius-systems/osv/issues/1197.
> This one looks like a bug fix -
> https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617

Yes, looks correct to me.

> This one is more of an enhancement -
> https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.

This patch seems to suggest our cache was missing rt_gateway and it needs
it. But why didn't I ever notice this? Unfortunately the patch doesn't say
what kind of real-life scenario it was trying to fix.

>> I'm not sure this is the *best* way to tell these two cases apart, but I
>> probably agree it's a "good enough" way. As you can see in the comments
>> I wrote in routecache.hh (which I should have copy-edited more
>> carefully, I see they have a lot of silly mistakes...), routecache.hh is
>> all about the art of the "good enough", so it makes sense to continue
>> this.
>
> In a nutshell, this new version of the logic checks the device associated
> with the cached (present) route entry to determine what kind of match it
> should do - just the netmask for a non-loopback device, or a more
> involved match for a loopback one.

Yes, but consider for example that even on the real loopback (not an
optimization where the outside address is "converted" to loopback) you can
have 127.0.0.1 and 127.0.0.2 both on the same loopback. The new code will
force it to use two different routing table entries instead of one.

It's not a huge problem - most people never use multiple loopback
addresses. But someone who will use twenty of these addresses (I'm
sometimes doing this to run multiple servers on the same host for testing)
will suffer, because the route cache only has room for 10 routes.

I'm tending to accept this patch, but I have a feeling that the "real" fix
would be something else. But this fix is probably "good enough".
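To quantify the multiple-loopback-addresses concern: with a fixed-capacity
cache and one entry per distinct address, twenty loopback addresses cycling
through ten slots never hit. The following is a toy model only - it assumes
(as stated above) a 10-entry cache, and round-robin eviction is an
illustrative simplification, not the actual routecache.hh policy.

    #include <array>
    #include <cstdint>
    #include <cstdio>

    int main() {
        constexpr int capacity = 10;             // "room for 10 routes"
        std::array<uint32_t, capacity> cache{};  // cached addresses, 0 = empty
        int next = 0, hits = 0, misses = 0;

        for (int round = 0; round < 100; round++) {
            for (uint32_t addr = 1; addr <= 20; addr++) {  // 127.0.0.1 .. 127.0.0.20
                bool hit = false;
                for (uint32_t cached : cache) {
                    if (cached == addr) { hit = true; break; }
                }
                if (hit) { hits++; continue; }
                misses++;
                cache[next] = addr;              // evict round-robin
                next = (next + 1) % capacity;
            }
        }
        // Sequential access over 20 addresses thrashes the 10 slots:
        // prints hits=0 misses=2000.
        std::printf("hits=%d misses=%d\n", hits, misses);
        return 0;
    }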
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
BTW there are more routecache-related patches that I think we can bring in
from the Spirent fork, which I documented here -
https://github.com/cloudius-systems/osv/issues/1197.

This one looks like a bug fix -
https://github.com/SpirentOrion/osv/commit/38df7da9fbe2f171297f41ecae58edc9bd436617

This one is more of an enhancement -
https://github.com/SpirentOrion/osv/commit/a20c112bf47a4a36b1fff17313f9d1b2e3ce919c.

What do you think about those above? The patch we are discussing here
addresses a "real" bug I have come across myself.

On Sunday, July 24, 2022 at 6:02:22 AM UTC-4 n...@scylladb.com wrote:
> Hi, this code is probably correct given that both you and the Spirent
> guys gave it a lot of thought, but there's something I'm not sure I
> understand:
>
> If I understand correctly (but please correct me if I'm wrong), the
> problem is that OSv has somewhere an "optimization" that internal
> connections to the node's external IP address are handled internally
> (loopback) instead of being sent to the (virtual) network card, so if we
> cache this knowledge, then next time if the connection is of the "other"
> type (external connection after we cached the internal route, or internal
> connection after the external route) the result will be either not
> working (passing an external connection through loopback) or inefficient
> (passing an internal connection through the real device).

Right, that is exactly what I am thinking as well. I am not an expert
here, but I think the if_simloop() function called from ether_output() is
where that optimization happens. For illustration, see these two traces:

"local call (from itself on 192.168.122.15)"

    0x8119b040 /httpclient.so 015.368111615 net_packet_in
    b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S],
    seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,
    TS val 15368 ecr 0], length 0'
     log_packet_in(mbuf*, int) core/net_trace.cc:143
     netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
     netisr_queue_src bsd/sys/net/netisr.cc:684
     netisr_queue_src bsd/sys/net/netisr.cc:712
     *if_simloop bsd/sys/net/if_loop.cc:291*
     *ether_output bsd/sys/net/if_ethersubr.cc:256*
     *ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621*
     tcp_output bsd/sys/netinet/tcp_output.cc:1385
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
     kern_connect bsd/sys/kern/uipc_syscalls.cc:374

"external call"

    0x807e0040 virtio-net-rx0 4.284705130 net_packet_out
    b'IP 192.168.122.15.8000 > 192.168.122.1.45818: Flags [.], ack 512095,
    win 65534, length 0'
     log_packet_out(mbuf*, int) core/net_trace.cc:148
     *ether_output bsd/sys/net/if_ethersubr.cc:397*  // This should really be ether_output_frame
     *ether_output bsd/sys/net/if_ethersubr.cc:366*
     *ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621*
     tcp_twrespond bsd/sys/netinet/tcp_timewait.cc:553
     tcp_twstart bsd/sys/netinet/tcp_timewait.cc:281
     tcp_do_segment(mbuf*, tcphdr*, socket*, tcpcb*, int, int, unsigned char, int, bool&) bsd/sys/netinet/tcp_input.cc:2511
     tcp_input bsd/sys/netinet/tcp_input.cc:956
     ip_input bsd/sys/netinet/ip_input.cc:774
     netisr_dispatch_src bsd/sys/net/netisr.cc:769

BTW, in the case of the local call, the qualification "net_packet_in" vs
"net_packet_out" might be confusing. So this if_simloop() might be what is
used to handle this kind of "loop" to itself.
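To make the suspected short-circuit concrete, here is a compilable sketch of
the control flow the traces suggest: ether_output() diverting a packet
addressed to one of the guest's own IP addresses back up the stack via
if_simloop() instead of handing it to the NIC. The struct layouts, the
equality check, and the stub bodies are simplified assumptions; the real
logic lives in bsd/sys/net/if_ethersubr.cc and bsd/sys/net/if_loop.cc and is
considerably more involved.

    #include <cstdio>

    struct mbuf  { unsigned int dst_ip; };  // stand-in for the real mbuf
    struct ifnet { unsigned int ip; };      // stand-in for the real ifnet

    // Re-inject the packet as if it had been received - which is why the
    // outgoing connect() in the first trace shows up as net_packet_in.
    static int if_simloop(ifnet*, mbuf*) {
        std::puts("looped back up the stack (net_packet_in)");
        return 0;
    }

    // Hand the frame to the (virtual) device - the external-call path.
    static int ether_output_frame(ifnet*, mbuf*) {
        std::puts("sent out through the NIC (net_packet_out)");
        return 0;
    }

    static int ether_output_sketch(ifnet* ifp, mbuf* m) {
        if (m->dst_ip == ifp->ip) {
            // "Local call": the destination is our own (non-loopback)
            // address, so the packet never touches the virtio device.
            return if_simloop(ifp, m);
        }
        return ether_output_frame(ifp, m);
    }

    int main() {
        ifnet eth0{0xC0A87A0F};       // 192.168.122.15
        mbuf to_self{0xC0A87A0F};     // connect() to our own address
        mbuf to_gateway{0xC0A87A01};  // reply toward 192.168.122.1
        ether_output_sketch(&eth0, &to_self);     // takes the if_simloop path
        ether_output_sketch(&eth0, &to_gateway);  // takes the device path
        return 0;
    }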
> If I understand correctly your patch works because external connections
> happen to look up the gateway (*.1) instead of the actual IP address, so
> it makes it easy to tell these two cases apart. Is this true?

I think these are actually the "responses" back from 192.168.122.15 to the
"world outside" (pointed to by the gateway 192.168.122.1) when the
routecache is used as well.

> I'm not sure this is the *best* way to tell these two cases apart, but I
> probably agree it's a "good enough" way. As you can see in the comments
> I wrote in routecache.hh (which I should have copy-edited more carefully,
> I see they have a lot of silly mistakes...), routecache.hh is all about
> the art of the "good enough", so it makes sense to continue this.

In a nutshell, this new version of the logic checks the device associated
with the cached (present) route entry to determine what kind of match it
should do - just the netmask for a non-loopback device, or a more involved
match for a loopback one.

> --
> Nadav Har'El
> n...@scylladb.com
>
> On Sat, Jul 23, 2022 at 9:15 PM Waldemar Kozaczuk wrote:
>
>> This patch modifies the OSv routecache to handle the case when a route
>> entry gets added for a non-loopback address and loopback device before
>> packets start coming from outside of the local network. The practical
>> example is an app that on startup calls itself on its HTTP endpoint but
>>
Re: [osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
Hi, this code is probably correct given that both you and the Spirent guys
gave it a lot of thought, but there's something I'm not sure I understand:

If I understand correctly (but please correct me if I'm wrong), the problem
is that OSv has somewhere an "optimization" that internal connections to
the node's external IP address are handled internally (loopback) instead of
being sent to the (virtual) network card. So if we cache this knowledge,
then the next time, if the connection is of the "other" type (an external
connection after we cached the internal route, or an internal connection
after the external route), the result will be either not working (passing
an external connection through loopback) or inefficient (passing an
internal connection through the real device).

If I understand correctly your patch works because external connections
happen to look up the gateway (*.1) instead of the actual IP address, so it
makes it easy to tell these two cases apart. Is this true?

I'm not sure this is the *best* way to tell these two cases apart, but I
probably agree it's a "good enough" way. As you can see in the comments I
wrote in routecache.hh (which I should have copy-edited more carefully, I
see they have a lot of silly mistakes...), routecache.hh is all about the
art of the "good enough", so it makes sense to continue this.

--
Nadav Har'El
n...@scylladb.com

On Sat, Jul 23, 2022 at 9:15 PM Waldemar Kozaczuk wrote:

> This patch modifies the OSv routecache to handle the case when a route
> entry gets added for a non-loopback address and loopback device before
> packets start coming from outside of the local network. The practical
> example is an app that on startup calls itself on its HTTP endpoint, but
> using a non-loopback address like 192.168.122.15, before any external
> requests from outside of the guest. This is exactly what would happen
> when running SeaWeedFS (see #1188). This bug can also be replicated with
> the golang-httpclient and httpserver-api apps:
>
>     ./scripts/build image=golang-httpclient,httpserver-monitoring-api
>     ./scripts/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version'
>
> After it starts, run 'curl http://localhost:8000/os/version' and observe
> that curl never receives the response.
>
> Let us assume we have an app that binds to 192.168.122.15 and listens on
> some port. When a call to connect on a non-loopback address comes from
> the app itself, it goes through the layers of the networking stack and
> eventually calls route_cache::lookup() to find a route entry for
> 192.168.122.15, as in the stack trace below.
>     in route_cache::lookup (dst=dst@entry=0x814bb84c, fibnum=<optimized out>,
>         ret=ret@entry=0x814bb860) at ./bsd/sys/net/routecache.hh:197
>     in_pcbladdr (inp=inp@entry=0xa155a200, faddr=faddr@entry=0x814bb98c,
>         laddr=laddr@entry=0x814bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
>     in_pcbconnect_setup (inp=0xa155a200, nam=0xa08e0510, laddrp=0x814bb9e4,
>         lportp=0x814bb9e2, faddrp=0xa155a264, fportp=0xa155a254,
>         oinpp=0x814bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
>     tcp_connect (tp=tp@entry=0xa0f76800, nam=nam@entry=0xa08e0510,
>         td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089
>     tcp_usr_connect (td=<optimized out>, nam=0xa08e0510, so=<optimized out>)
>         at bsd/sys/netinet/tcp_usrreq.cc:463
>     tcp_usr_connect (so=<optimized out>, nam=0xa08e0510, td=<optimized out>)
>         at bsd/sys/netinet/tcp_usrreq.cc:436
>     kern_connect (fd=<optimized out>, sa=0xa08e0510) at
>         bsd/sys/kern/uipc_syscalls.cc:374
>
> Initially the route cache is empty, so it adds a new entry that
> associates the address 192.168.122.15 with the loopback device and its
> netmask 255.0.0.0. Once all subsequent networking stack calls to handle
> this particular HTTP request complete, at some point later an external
> client calls the same HTTP endpoint from outside of the OSv guest. While
> handling this call, we get to the point illustrated by the stack trace
> below:
>
>     in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>,
>         ret=0x80c4ead0) at ./bsd/sys/net/routecache.hh:197
>     in tcp_maxmtu (inc=<optimized out>, flags=0x0) at
>         bsd/sys/netinet/tcp_subr.cc:1690
>     in tcp_mssopt (inc=0xa0f7b910) at bsd/sys/netinet/tcp_input.cc:3131
>     in syncookie_generate (flowlabel=<optimized out>, sc=0xa0f7b900,
>         sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
>     in _syncache_add (inc=<optimized out>, to=<optimized out>,
>         th=<optimized out>, inp=<optimized out>, lsop=0x80c4edc8,
>         m=0xa0f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
>     in tcp_input (m=0xa0f7bf00, off0=<optimized out>) at
>         bsd/sys/netinet/tcp_input.cc:941
>     in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
>     in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xa0f7bf00)
>         at bsd/sys/net/netisr.cc:769
>     in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xa0f7bf00)
>         at bsd/sys/net/netisr.cc:769
>     in virtio::net::receiver (this=0x90ba3000) at drivers/virtio-net.cc:545
>
> This time the destination address (dst) is 192.168.122.1, which is the
> gateway for the non-loopback (eth0) network.
[osv-dev] [PATCH] routecache: handle case when sent to our non-loopback address
This patch modifies the OSv routecache to handle the case when a route
entry gets added for a non-loopback address and loopback device before
packets start coming from outside of the local network. The practical
example is an app that on startup calls itself on its HTTP endpoint, but
using a non-loopback address like 192.168.122.15, before any external
requests from outside of the guest. This is exactly what would happen when
running SeaWeedFS (see #1188). This bug can also be replicated with the
golang-httpclient and httpserver-api apps:

    ./scripts/build image=golang-httpclient,httpserver-monitoring-api
    ./scripts/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version'

After it starts, run 'curl http://localhost:8000/os/version' and observe
that curl never receives the response.

Let us assume we have an app that binds to 192.168.122.15 and listens on
some port. When a call to connect on a non-loopback address comes from the
app itself, it goes through the layers of the networking stack and
eventually calls route_cache::lookup() to find a route entry for
192.168.122.15, as in the stack trace below.

    in route_cache::lookup (dst=dst@entry=0x814bb84c, fibnum=<optimized out>,
        ret=ret@entry=0x814bb860) at ./bsd/sys/net/routecache.hh:197
    in_pcbladdr (inp=inp@entry=0xa155a200, faddr=faddr@entry=0x814bb98c,
        laddr=laddr@entry=0x814bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
    in_pcbconnect_setup (inp=0xa155a200, nam=0xa08e0510, laddrp=0x814bb9e4,
        lportp=0x814bb9e2, faddrp=0xa155a264, fportp=0xa155a254,
        oinpp=0x814bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
    tcp_connect (tp=tp@entry=0xa0f76800, nam=nam@entry=0xa08e0510,
        td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089
    tcp_usr_connect (td=<optimized out>, nam=0xa08e0510, so=<optimized out>)
        at bsd/sys/netinet/tcp_usrreq.cc:463
    tcp_usr_connect (so=<optimized out>, nam=0xa08e0510, td=<optimized out>)
        at bsd/sys/netinet/tcp_usrreq.cc:436
    kern_connect (fd=<optimized out>, sa=0xa08e0510) at
        bsd/sys/kern/uipc_syscalls.cc:374

Initially the route cache is empty, so it adds a new entry that associates
the address 192.168.122.15 with the loopback device and its netmask
255.0.0.0. Once all subsequent networking stack calls to handle this
particular HTTP request complete, at some point later an external client
calls the same HTTP endpoint from outside of the OSv guest. While handling
this call, we get to the point illustrated by the stack trace below:

    in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>,
        ret=0x80c4ead0) at ./bsd/sys/net/routecache.hh:197
    in tcp_maxmtu (inc=<optimized out>, flags=0x0) at
        bsd/sys/netinet/tcp_subr.cc:1690
    in tcp_mssopt (inc=0xa0f7b910) at bsd/sys/netinet/tcp_input.cc:3131
    in syncookie_generate (flowlabel=<optimized out>, sc=0xa0f7b900,
        sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
    in _syncache_add (inc=<optimized out>, to=<optimized out>,
        th=<optimized out>, inp=<optimized out>, lsop=0x80c4edc8,
        m=0xa0f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
    in tcp_input (m=0xa0f7bf00, off0=<optimized out>) at
        bsd/sys/netinet/tcp_input.cc:941
    in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
    in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xa0f7bf00)
        at bsd/sys/net/netisr.cc:769
    in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xa0f7bf00)
        at bsd/sys/net/netisr.cc:769
    in virtio::net::receiver (this=0x90ba3000) at drivers/virtio-net.cc:545

This time the destination address (dst) is 192.168.122.1, which is the
gateway for the non-loopback (eth0) network.
However, at this point we have a single route entry, created when handling
the first request, and the search() method used by route_cache::lookup()
simply compares this entry's network with the network of the IP address in
question by applying the netmask 255.0.0.0. It matches, so that entry is
returned. The problem is that this and all subsequent packets get routed
to the loopback device instead of the virtio device. The client never
receives the packets needed to complete the handshake, and the TCP
connection does not get established from the client's perspective.
Effectively, the application will never be able to receive any external
requests, although it can still handle internal ones like the 1st one in
this example.

Interestingly, if we reverse the sequence in this example and send an
external request before the app makes an internal request to itself,
everything works fine. That is because when handling the external call,
the initial and only route entry gets created for IP 192.168.122.1,
netmask 255.255.255.0 and the non-loopback (eth0) device. A subsequent
internal request would match the netmask and go through if_simloop(), as
this trace illustrates:

    0x8119b040 /httpclient.so 015.368111615 net_packet_in
    b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S],
    seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,
    TS val 15368 ecr 0], length 0'
     log_packet_in(mbuf*, int) core/net_trace.cc:143
     netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
     netisr_queue_src bsd/sys/net/netisr.cc:684
     netisr_queue_src bsd/sys/net/netisr.cc:712
     if_simloop bsd/sys/net/if_loop.cc:291
     ether_output bsd/sys/net/if_ethersubr.cc:256
     ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621
     tcp_output bsd/sys/netinet/tcp_output.cc:1385
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
     tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
     kern_connect bsd/sys/kern/uipc_syscalls.cc:374
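To make the failure mode concrete: below is a compilable sketch of the
netmask-only match described above, together with the device-aware match
the patch introduces (mirroring the is_loopback_net()/exact-address logic
quoted earlier in the thread). The cache_entry layout and helper names are
illustrative assumptions, not the literal routecache.hh code.

    #include <arpa/inet.h>
    #include <cassert>
    #include <cstdint>

    struct cache_entry {
        uint32_t address;   // host byte order
        uint32_t netmask;   // host byte order
        bool     loopback;  // device associated with the cached route
    };

    // The buggy behavior: an entry created for 192.168.122.15 via the
    // loopback device carries netmask 255.0.0.0, so the gateway
    // 192.168.122.1 "hits" the loopback entry too.
    static bool netmask_only_match(const cache_entry& e, uint32_t addr) {
        return (e.address & e.netmask) == (addr & e.netmask);
    }

    // The fix: for an entry whose device is loopback, additionally require
    // a real loopback address (127.0.0.0/8) or an exact address match.
    static bool fixed_match(const cache_entry& e, uint32_t addr) {
        if (!netmask_only_match(e, addr)) return false;
        if (!e.loopback) return true;
        bool on_loopback_net = (addr & 0xff000000u) == 0x7f000000u;
        return on_loopback_net || e.address == addr;
    }

    int main() {
        uint32_t self = ntohl(inet_addr("192.168.122.15"));
        uint32_t gw   = ntohl(inet_addr("192.168.122.1"));
        cache_entry e{self, 0xff000000u /* 255.0.0.0 */, true};

        assert(netmask_only_match(e, gw));  // the bug: the gateway matches
        assert(!fixed_match(e, gw));        // fixed: the gateway is rejected
        assert(fixed_match(e, self));       // internal traffic still matches
        return 0;
    }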