On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 8/31/22 01:32, Han Zhou wrote:
> >
> >
> > On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>
> >> On 8/24/22 08:40, Han Zhou wrote:
> >> > The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
> >> > registers is causing a critical dataplane performance impact to
> >> > short-lived connections, because it unwildcards megaflows with exact
> >> > match on dst IP and L4 ports. Any new connection with a different
> >> > client side L4 port will encounter a datapath flow miss and upcall to
> >> > ovs-vswitchd, which makes typical use cases such as HTTP/1.0 based
> >> > RESTful API calls suffer big performance degradations.
> >> >
> >> > These fields (dst IP and port) were saved to registers to solve a
> >> > problem in the LB hairpin use case when different VIPs share an
> >> > overlapping backend+port [0]. The change [0] might not have had as wide
> >> > a performance impact as it does now, because at that time one of the
> >> > match conditions, "REGBIT_CONNTRACK_NAT == 1", was set only for
> >> > established and natted traffic, while now the impact is more obvious
> >> > because REGBIT_CONNTRACK_NAT is set for all IP traffic (if any VIP is
> >> > configured on the LS) since commit [1], after several other indirectly
> >> > related optimizations and refactors.
> >> >
> >> > This patch fixes the problem by modifying the priority-120 flows in
> >> > ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for
> >> > any traffic with REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
> >> > port only for traffic matching the LB VIPs, because these are the ones
> >> > that need to be saved for the hairpin purpose. The existing
> >> > priority-110 flows will match the rest of the traffic just like before
> >> > but will not save dst IP and L4 port, so any server->client traffic
> >> > will not unwildcard megaflows with client side L4 ports.
> >>
> >> Hmm, but if higher priority flows have matches on these fields, datapath
> >> flows will have them unwildcarded anyway.  So, why exactly is that better
> >> than the current approach?
> >>
> > Hi Ilya,
> >
> > The problem with the current approach is that it blindly saves the L4
> > dst port for any traffic in any direction, as long as there are VIPs
> > configured on the datapath.
> > So consider the most typical scenario of a client sending API requests
> > to server backends behind a VIP. On the server side, any *reply* packets
> > would hit the flow that saves the client side L4 port, because in the
> > server->client direction the client port is the dst. If the client sends
> > 10 requests, each with a different source port, the server side will end
> > up with unwildcarded DP flows like the ones below (192.168.1.2 is the
> > client IP):
> >
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224), packets:5, bytes:2475, used:1.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x20)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226), packets:5, bytes:2475, used:1.105s, flags:FP., actions:ct(zone=8,nat),recirc(0x21)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798), packets:5, bytes:2475, used:0.574s, flags:FP., actions:ct(zone=8,nat),recirc(0x40)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250), packets:5, bytes:2475, used:0.872s, flags:FP., actions:ct(zone=8,nat),recirc(0x2d)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940), packets:5, bytes:2475, used:0.109s, flags:FP., actions:ct(zone=8,nat),recirc(0x60)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938), packets:5, bytes:2475, used:0.118s, flags:FP., actions:ct(zone=8,nat),recirc(0x5f)
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236), packets:5, bytes:2475, used:0.938s, flags:FP., actions:ct(zone=8,nat),recirc(0x26)
> > ...
> >
> > As a result, DP flows explode and every new request results in a flow
> > miss and an upcall to userspace, which is very inefficient. Even worse,
> > because the flow is so generic, even traffic unrelated to the VIP would
> > have the same impact, as long as a server on a LS with any VIP
> > configured is replying to client requests.
> > With the fix, only the client->VIP packets would hit such flows, and in
> > those cases the dst port is the server's (well-known) port, which is
> > expected to be matched in megaflows anyway, while the client side port
> > is not unwildcarded, so new requests/replies will match megaflows in the
> > fast path.
> > The above megaflows become:
> >
> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=128.0.0.0/128.0.0.0,frag=no), packets:263, bytes:112082, used:0.013s, flags:SFP., actions:ct(zone=8,nat),recirc(0xd)
>
>
> Oh, OK.  Thanks for the explanation!
>
> So, it's reply traffic, and it will not have matches on the L3 level
> unwildcarded too much since, I suppose, it has a destination address
> typically in a different subnet.

After the fix, yes. Before the fix, no, because of the flow that saves dst
IP and port to registers.
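
(To make that concrete: a rough sketch only, not the exact flows ovn-northd
emits, and the register numbering below is purely illustrative. Megaflow
wildcarding tracks which fields the translation consults, so any action that
reads tcp.dst in order to save it, e.g. something along the lines of

    move:NXM_OF_TCP_DST[]->NXM_NX_REG2[0..15]

forces the resulting megaflow to carry an exact match on tcp(dst=...). For
server->client packets that dst is the client's ephemeral port, different
for every connection, hence the flow explosion shown above.)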

> So, the ipv4 trie on addresses cuts off the rest of the L3/L4 headers
> including source ip and the ports from the match criteria.

Sorry, I am not sure I understand your question here.
If you are talking about the server(source)->client(destination) direction,
then for the source/server IP and port this is correct (both before and
after the fix).
If you are talking about the client IP and ports, it is the case after the
fix, but not before the fix.
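
(In case it helps to double-check this on a hypervisor, a rough way to
observe the difference; interface names and conntrack zones will of course
differ per setup:

    # count megaflows that carry an exact match on a TCP destination port
    ovs-appctl dpctl/dump-flows | grep -c 'tcp(dst='
    # watch the datapath flow count seen by the revalidators
    ovs-appctl upcall/show

Before the fix the first number grows with every new client connection on
the reply path; after the fix those reply-direction megaflows collapse into
a few entries like the 128.0.0.0/128.0.0.0 one above.)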

Thanks,
Han

>
> Did I get that right?
>
> >
> > Thanks,
> > Han
> >
> >> I see how that can help for the case where vIPs have no ports specified,
> >> because we will not have ports unwildcarded in this case, but I thought
> >> it's a very unlikely scenario for, e.g., ovn-kubernetes setups.  And if
> >> even one vIP has a port, all the datapath flows will have a port
> >> match.  Or am I missing something?
> >>
> >> Best regards, Ilya Maximets.
>