Hi,

We've hit a scaling issue recently [1] in the following topology:

- External network connected to public logical switch "LS-pub"
- ~300 logical networks (LR-i <--> LS-i <--> VMi) connected to LS-pub
with dnat_and_snat rules.

While trying to ping the VMs from outside, the ARP request packet from
the external host doesn't reach all the LR-i pipelines because it gets
dropped due to "Too many resubmits".

This happens because the MC_FLOOD group for LS-pub installs OpenFlow
entries that resubmit the packet:
- through the patch ports towards all LR-i (one entry in table 32 with
300 resubmit actions).
- to the egress pipeline for each VIF that's local to LS-pub (one
entry in table 33).

This means that for the ARP broadcast packet we'll execute the LR-i
ingress/egress pipeline 300 times. Each execution performs a fair
number of resubmits through the different tables of the pipeline, so
the total number of resubmits for the single initial broadcast packet
exceeds 4K, the maximum allowed by OVS.
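
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch. It assumes the 4K limit is exactly 4096 and uses a made-up
figure of 14 resubmits per pipeline execution; the function names and
that per-pipeline count are hypothetical and not measured from a real
deployment.

```python
# Hypothetical model of the resubmit budget for one flooded packet.
# None of these numbers are measured; they only illustrate the scaling.
MAX_RESUBMITS = 4096  # the "4K" limit, assumed here to be exactly 4096


def flood_resubmits(num_ports: int, resubmits_per_pipeline: int) -> int:
    """Total resubmits for one flooded packet.

    Counts one resubmit per logical output port from the flood entry,
    plus the resubmits done while running that port's pipelines.
    """
    return num_ports * (1 + resubmits_per_pipeline)


def max_ports(resubmits_per_pipeline: int, limit: int = MAX_RESUBMITS) -> int:
    """Largest number of output ports whose flood stays within the limit."""
    return limit // (1 + resubmits_per_pipeline)


if __name__ == "__main__":
    per_pipeline = 14  # assumed value, for illustration only
    total = flood_resubmits(300, per_pipeline)
    print(f"300 ports: {total} resubmits, over limit: {total > MAX_RESUBMITS}")
    print(f"ports that fit in the budget: {max_ports(per_pipeline)}")
```

Under these assumed numbers the budget runs out a little below 300
output ports, and the port count that fits shrinks as the per-pipeline
resubmit count grows.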

After looking at the implementation I couldn't figure out a way to
avoid running the full pipelines for each potential logical output
port (patch or local VIF) because we might have flows later in the
pipelines that perform actions based on the value of the logical
output port (e.g., out ACL, QoS).

Do you think there's a different approach that we could use to
implement flooding of broadcast/unknown unicast packets that would
require fewer resubmit actions?

This issue could also appear in a flat topology with a single logical
switch and multiple VIFs (>300). In this case the resubmits would be
part of the OpenFlow entry in table 33 but the result would be the
same: too many resubmits due to the egress pipeline resubmits for each
logical output port.

Thanks,
Dumitru

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1756945
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
