On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets <i.maxim...@ovn.org> wrote: > > On 8/31/22 01:32, Han Zhou wrote: > > > > > > On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets <i.maxim...@ovn.org <mailto:i.maxim...@ovn.org>> wrote: > >> > >> On 8/24/22 08:40, Han Zhou wrote: > >> > The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to > >> > registers is causing a critical dataplane performance impact to > >> > short-lived connections, because it unwildcards megaflows with exact > >> > match on dst IP and L4 ports. Any new connections with a different > >> > client side L4 port will encounter datapath flow miss and upcall to > >> > ovs-vswitchd, which makes typical use cases such as HTTP1.0 based > >> > RESTful API calls suffer big performance degredations. > >> > > >> > These fields (dst IP and port) were saved to registers to solve a > >> > problem of LB hairpin use case when different VIPs are sharing > >> > overlapping backend+port [0]. The change [0] might not have as wide > >> > performance impact as it is now because at that time one of the match > >> > condition "REGBIT_CONNTRACK_NAT == 1" was set only for established and > >> > natted traffic, while now the impact is more obvious because > >> > REGBIT_CONNTRACK_NAT is now set for all IP traffic (if any VIP > >> > configured on the LS) since commit [1], after several other indirectly > >> > related optimizations and refactors. > >> > > >> > This patch fixes the problem by modifying the priority-120 flows in > >> > ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any > >> > traffic with the REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4 > >> > port only for traffic matching the LB VIPs, because these are the ones > >> > that need to be saved for the hairpin purpose. The existed priority-110 > >> > flows will match the rest of the traffic just like before but wouldn't > >> > not save dst IP and L4 port, so any server->client traffic would not > >> > unwildcard megaflows with client side L4 ports. > >> > >> Hmm, but if higher priority flows have matches on these fields, datapath > >> flows will have them unwildcarded anyway. So, why exactly that is better > >> than the current approach? > >> > > Hi Ilya, > > > > The problem of the current approach is that it blindly saves the L4 dst port for any traffic in any direction, as long as there are VIPs configured on the datapath. > > So consider the most typical scenario of a client sending API requests to server backends behind a VIP. On the server side, any *reply* packets would hit the flow that saves the client side L4 port because for server->client direction the client port is the dst. If the client sends 10 requests, each with a different source port, the server side will end up with unwildcarded DP flows like below: (192.168.1.2 is client IP) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224), packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226), packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798), packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250), packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940), packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938), packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f) > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236), packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26) > > ... > > > > As a result, DP flows explode and every new request is going to be a miss and upcall to userspace, which is very inefficient. Even worse, as the flow is so generic, even traffic unrelated to the VIP would have the same impact, as long as a server on a LS with any VIP configuration is replying client requests. > > With the fix, only the client->VIP packets would hit such flows, and in those cases the dst port is the server (well known) port, which is expected to be matched in megaflows anyway, while the client side port is not unwildcarded, so new requests/replies will match megaflows in fast path. > > The above megaflows become: > > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst= 128.0.0.0/128.0.0.0,frag=no <http://128.0.0.0/128.0.0.0,frag=no>), packets:263, bytes:112082, used:0.013s, flags:SFP., actions:ct(zone=8,nat),recirc(0xd) > > > Oh, OK. Thanks for the explanation! > > So, it's a reply traffic, and it will not have matches on L3 level unwildcarded > too much since, I suppose, it has a destination address typically in a different > subnet.
After the fix, yes. Before the fix, no, because of the flow that saves dst IP and port to registers. > So, the ipv4 trie on addresses cuts off the rest of the L3/L4 headers > including source ip and the ports from the match criteria. Sorry I am not sure if I understand your question here. If you are talking about the server(source)->client(destination) direction, for the source/server ip and port, this is correct (before and after the fix). If you are talking about the client ip and ports, it is the case after the fix, but not before the fix. Thanks, Han > > Did I get that right? > > > > > Thanks, > > Han > > > >> I see how that can help for the case where vIPs has no ports specified, > >> because we will not have ports unwildcarded in this case, but I thought > >> it's a very unlikely scenario for, e.g., ovn-kubernetes setups. And if > >> even one vIP will have a port, all the datapath flows will have a port > >> match. Or am I missing something? > >> > >> Best regards, Ilya Maximets. > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev