
Sorry for what is most likely a disconnected reply to a thread - I can't seem
to figure out how to reply to a thread from before I was subscribed to the ML.

We've been testing OVN scaling for our OpenStack cloud, and found what seems to
be an OF flow explosion that is basically a mirror of the issue reported by
Girish a week or so ago.

In OpenStack, Neutron creates a "default" security group that has 4 rules (2
each for IPv4 and IPv6):

- allow all egress traffic from the port
- allow all ingress traffic from other ports belonging to the same default group

What we have discovered in our testing is that this second rule translates
into the following ACL in OVN:

outport == @pg_304cc336_8db3_4efd_a558_408e648e6259 && ip4 && ip4.src == 
where port_group `pg_304cc336_8db3_4efd_a558_408e648e6259` is defined in
nbdb and contains all ports attached to the SG, and address_set
`pg_304cc336_8db3_4efd_a558_408e648e6259_ip4` is defined in sbdb and seems to
hold the list of addresses assigned to ports from that port_group [1].

As Girish has explained in his email, such ACLs are translated into a bunch of 
duplicated flows that only seem to differ in metadata:

# ovs-ofctl dump-flows br-int |egrep "(12474|12475)"
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,ip,reg0=0x100/0x100,reg15=0x3,metadata=0x20e 
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
 cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20e 
 cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20d 
(See http://paste.openstack.org/show/803598/ for the full output of grep)
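
To check how much of a table consists of flows that differ only in metadata,
the dump can be grouped with a small script. This is only a minimal sketch: it
assumes one flow per line (as ovs-ofctl prints it, before mail wrapping), and
the helper names group_flows and duplicated are made up for illustration.

```python
import re
from collections import defaultdict

# Per-flow counters and timers; they differ between otherwise identical flows.
VOLATILE = re.compile(
    r"\b(?:cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)
# The metadata match field (logical datapath key), optionally masked.
METADATA = re.compile(r"\bmetadata=(0x[0-9a-fA-F]+)(?:/0x[0-9a-fA-F]+)?,?")


def group_flows(lines):
    """Map each flow, stripped of counters and metadata, to the set of
    metadata values it was seen with. Flows without metadata are skipped."""
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        found = METADATA.search(line)
        if not found:
            continue
        key = METADATA.sub("", VOLATILE.sub("", line)).strip().rstrip(",")
        groups[key].add(found.group(1))
    return dict(groups)


def duplicated(groups):
    """Keep only flows that appear on more than one datapath."""
    return {key: dps for key, dps in groups.items() if len(dps) > 1}
```

Feeding it the lines of `ovs-ofctl dump-flows br-int table=45` and printing
len(dps) for each key in duplicated(group_flows(lines)) shows how many
datapaths carry a copy of each flow.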

His idea of changing this conjunction into one that additionally matches on
metadata seems to make sense in this particular instance, given that all ports
from all datapaths need to evaluate the same set of rules. Possibly it makes
sense for all ACLs too?

Anyway, to understand how OF flows are generated by ovn-controller, I took a
quick look at the source code, and it seems that right now all flows are
forcefully matched to their datapath (by unconditional matching on metadata).
Would it make sense to introduce a notion of a "datapath unbound flow" for
cases where the conjunction is already matching on metadata?
Are there some other parts of OVN code that heavily depend on flows being 
installed per-dp?
How would that affect OVS performance when matching packets in userspace? In
our testing we've ended up with over 1M flows installed in table 45, which
seems to dwarf any potential performance loss from having flows that don't
match on the metadata field, but perhaps I'm wrong? Still, that's a lot of
flows, and it puts a hard scaling limit on some OpenStack deployments, given
that this SG is attached by default to all ports on all VMs.
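
To get a feel for the scaling, here is a back-of-the-envelope model. It is an
assumption, not something derived from the ovn-controller source: it supposes
that each address in the address set produces one conjunction flow per
datapath, that each port produces one flow on its own datapath, and that there
is one conj_id flow per datapath. The function name and the "unbound" variant
(address flows installed once, everything else unchanged) are hypothetical.

```python
def estimate_conjunction_flows(num_datapaths, num_addresses, num_ports):
    """Return (per_datapath_total, unbound_total) under the simplified model.

    per_datapath_total: address flows are repeated on every datapath.
    unbound_total: address flows are installed once, without metadata;
    port flows and conj_id flows are assumed to stay per datapath.
    """
    port_flows = num_ports            # reg15 is only meaningful per datapath
    conj_id_flows = num_datapaths     # one conj_id match per datapath
    per_datapath_total = (
        num_datapaths * num_addresses + port_flows + conj_id_flows
    )
    unbound_total = num_addresses + port_flows + conj_id_flows
    return per_datapath_total, unbound_total


if __name__ == "__main__":
    # Hypothetical sizes, not the numbers from our test environment.
    bound, unbound = estimate_conjunction_flows(500, 2000, 2000)
    print(f"per-datapath: {bound} flows, unbound: {unbound} flows")
```

Under this model the first term grows with datapaths times addresses, which
is the product that would have to be avoided to stay well below 1M flows.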

[1] Although apparently not the additional IP addresses allowed on a port via
allowed-address-pair - I think I've seen this issue before while testing Magnum.

  Krzysztof Klimonda
