You're right that my patch was a bit of a short-term hack. It
addressed a particular case where we could easily discard flows that
we know can never match (because of the logical port included in the
match).

Using conjunctive matches is a better, more general solution to the
explosion of flows when using address sets, but that hasn't been
worked on yet.
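To make the difference concrete, here is a rough flow-count model. It assumes a match that is a conjunction of independent disjunctions, for example a set of logical ports ANDed with an address set. Expanding every combination yields one OpenFlow flow per combination. OVS conjunctive matching (the `conjunction` action) instead needs roughly one flow per value per dimension, plus one flow that matches `conj_id`. The function names are hypothetical, and the model ignores everything else OVN installs.

```python
from math import prod
from typing import Sequence, Sized


def cross_product_flows(dims: Sequence[Sized]) -> int:
    """Flows needed when every combination of values becomes its own flow."""
    return prod(len(d) for d in dims)


def conjunctive_flows(dims: Sequence[Sized]) -> int:
    """Flows needed with OpenFlow conjunctive matching.

    Each value in each dimension gets one flow carrying conjunction(id, k/n),
    plus one flow that matches conj_id and applies the ACL's actions.
    A single dimension needs no conjunction at all.
    """
    if len(dims) < 2:
        return cross_product_flows(dims)
    return sum(len(d) for d in dims) + 1


# Hypothetical example: 3 logical ports ANDed with a 32-address set.
ports = ["1", "2", "3"]
address_set = [f"10.0.0.{i}" for i in range(1, 33)]
print(cross_product_flows([ports, address_set]))  # 96
print(conjunctive_flows([ports, address_set]))    # 36
```

The cross product grows multiplicatively with each dimension, while the conjunctive form grows additively. That is why conjunctive matches address the general address-set explosion, not only the case where a port restriction lets flows be discarded.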
On Wed, Dec 13, 2017 at 5:00 PM, Kevin Lin wrote:
> Thanks for the replies!
>
> We’re using v2.8.1.
>
> I don’t completely understand Russell’s patch, but I don’t think our ACLs
> were taking advantage of it. Do the ACLs need to be “tagged” with port
> information in order for it to be useful?
>
> Previously, our ACLs matched only on L3 and above. I brought the number of
> flows down from 252981 to 16224 by modifying the ACLs to include the logical
> port. For example:
>
> from-lport: (inport == "1" || inport == "2" || inport == "3") && (ip4.dst == $addressSet)
> to-lport: (ip4.src == $addressSet) && (outport == "1" || outport == "2" || outport == "3")
>
>
> Is that the right way to take advantage of Russell’s patch? This explodes
> the address set on one side of the connection, but does decrease the number
> of flows installed on each vswitchd.
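A simplified sketch of why naming the logical port helps. It assumes, as described at the top of this thread, that ovn-controller can skip an ACL whose named port is not bound to the local chassis, because such an ACL can never match there. To keep the sketch small, each ACL is tied to a single `outport`. `Acl`, `flows_on_chassis`, and the numbers are hypothetical and are not OVN's actual data structures.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Acl:
    outport: Optional[str]  # logical port named in the match, or None
    n_flows: int            # flows the ACL expands to (e.g. address-set size)


def flows_on_chassis(acls: Iterable[Acl], local_ports: Iterable[str]) -> int:
    """Sum the flows a chassis installs, skipping ACLs that name a port
    that is not bound locally (they can never match on this chassis)."""
    local = set(local_ports)
    return sum(a.n_flows for a in acls
               if a.outport is None or a.outport in local)


# Hypothetical: 32 destination ports, one ACL each, 32-member address set.
untagged = [Acl(None, 32) for _ in range(32)]
tagged = [Acl(str(i), 32) for i in range(1, 33)]
print(flows_on_chassis(untagged, ["1", "2"]))  # 1024
print(flows_on_chassis(tagged, ["1", "2"]))    # 64
```

Each ACL still expands its address set in full. The saving comes only from each chassis hosting a subset of the ports, which is the trade-off described above.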
>
> Even with the decrease in flows in vswitchd, I’m still seeing the log
> messages. Do 16224 flows per vswitchd instance and 559 flows in ovn-sb
> sound reasonable?
>
> Thanks,
> —Kevin
>
> On Dec 13, 2017, at 7:55 AM, Mark Michelson wrote:
>
> On 12/12/2017 03:26 PM, Kevin Lin wrote:
>
> Hi again,
> We’re trying to scale up our OVN deployment and we’re seeing some worrying
> log messages.
> The topology is 32 containers connected to another 32 containers on 10
> different ports. This is running on 17 machines (one machine runs ovn-northd
> and ovsdb-server, the other 16 run ovn-controller, ovs-vswitchd, and
> ovsdb-server). We’re using an address set for the source group, but not the
> destination group. We’re also creating a different ACL for each port. So the
> ACLs look like:
> One address set for { container1, container2, … container32 }
> addressSet -> container1 on port 80
> addressSet -> container1 on port 81
> …
> addressSet -> container1 on port 90
> addressSet -> container2 on port 80
> …
> addressSet -> container32 on port 90
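A back-of-the-envelope count for the ACL layout listed above. It assumes one ACL per (destination container, port) pair for ports 80 through 90 as listed, with each ACL expanding to one flow per address-set member and nothing discarded per chassis. OVN installs many other flows as well, so treat this only as a relative indicator. `estimate_acl_flows` is a hypothetical helper.

```python
from typing import Sized, Tuple


def estimate_acl_flows(n_sources: int, destinations: Sized,
                       ports: Sized) -> Tuple[int, int]:
    """Return (number of ACLs, flows if each ACL expands per source address)."""
    n_acls = len(destinations) * len(ports)
    return n_acls, n_acls * n_sources


destinations = [f"container{i}" for i in range(1, 33)]
ports = range(80, 91)  # ports 80 through 90, as in the listing
print(estimate_acl_flows(32, destinations, ports))  # (352, 11264)
```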
> The ovn-controller log:
> 2017-12-12T20:14:49Z|11878|timeval|WARN|Unreasonably long 1843ms poll interval (1840ms user, 0ms system)
> 2017-12-12T20:14:49Z|11879|timeval|WARN|disk: 0 reads, 16 writes
> 2017-12-12T20:14:49Z|11880|timeval|WARN|context switches: 0 voluntary, 21 involuntary
> 2017-12-12T20:14:49Z|11881|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (36% CPU usage)
> 2017-12-12T20:14:49Z|11882|poll_loop|DBG|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (36% CPU usage)
> 2017-12-12T20:14:49Z|11883|jsonrpc|DBG|tcp:172.31.2.181:6640: received reply, result=[], id="echo"
> 2017-12-12T20:14:49Z|11884|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b11, pid=2268452876
> 2017-12-12T20:14:49Z|11885|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b11, pid=2268452876
> 2017-12-12T20:14:49Z|11886|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x0): 0 bytes of payload
> 2017-12-12T20:14:49Z|11887|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x0): 0 bytes of payload
> 2017-12-12T20:14:51Z|11888|timeval|WARN|Unreasonably long 1851ms poll interval (1844ms user, 8ms system)
> 2017-12-12T20:14:51Z|11889|timeval|WARN|context switches: 0 voluntary, 11 involuntary
> 2017-12-12T20:14:52Z|11890|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (73% CPU usage)
> 2017-12-12T20:14:52Z|11891|jsonrpc|DBG|tcp:172.31.2.181:6640: received request, method="echo", params=[], id="echo"
> 2017-12-12T20:14:52Z|11892|jsonrpc|DBG|tcp:172.31.2.181:6640: send reply, result=[], id="echo"
> 2017-12-12T20:14:52Z|11893|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b12, pid=2268452876
> 2017-12-12T20:14:52Z|11894|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b12, pid=2268452876
> 2017-12-12T20:14:52Z|11895|netdev_linux|DBG|Dropped 18 log messages in last 56 seconds (most recently, 3 seconds ago) due to excessive rate
> 2017-12-12T20:14:52Z|11896|netdev_linux|DBG|unknown qdisc "mq"
> 2017-12-12T20:14:54Z|11897|hmap|DBG|Dropped 15511 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> 2017-12-12T20:14:54Z|11898|hmap|DBG|ovn/lib/expr.c:2644: 6 nodes in bucket (128 nodes, 64 buckets)
> 2017-12-12T20:14:54Z|11899|timeval|WA