On Tue, Jul 31, 2018 at 12:43 PM, Matteo Croce <mcr...@redhat.com> wrote:
> On Mon, Jul 16, 2018 at 4:54 PM Matteo Croce <mcr...@redhat.com> wrote:
>>
>> On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar <pshe...@ovn.org> wrote:
>> >
>> > On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce <mcr...@redhat.com> wrote:
>> > > From: Stefano Brivio <sbri...@redhat.com>
>> > >
>> > > Open vSwitch sends to userspace all received packets that have
>> > > no associated flow (thus doing an "upcall"). Then the userspace
>> > > program creates a new flow and determines the actions to apply
>> > > based on its configuration.
>> > >
>> > > When a single port generates a high rate of upcalls, it can
>> > > prevent other ports from dispatching their own upcalls. vswitchd
>> > > overcomes this problem by creating many netlink sockets for each
>> > > port, but it quickly exceeds any reasonable maximum number of
>> > > open files when dealing with huge numbers of ports.
>> > >
>> > > This patch queues all the upcalls into a list, ordering them in
>> > > a per-port round-robin fashion, and schedules a deferred work to
>> > > queue them to userspace.
>> > >
>> > > The algorithm to queue upcalls in a round-robin fashion,
>> > > provided by Stefano, is based on these two rules:
>> > > - upcalls for a given port must be inserted after all the other
>> > >   occurrences of upcalls for the same port already in the queue,
>> > >   in order to avoid out-of-order upcalls for a given port
>> > > - insertion happens once the highest upcall count for any given
>> > >   port (excluding the one currently at hand) is greater than the
>> > >   count for the port we're queuing to -- if this condition is
>> > >   never true, the upcall is queued at the tail. This results in
>> > >   a per-port round-robin order.
>> > >
>> > > In order to implement a fair round-robin behaviour, a variable
>> > > queueing delay is introduced. This will be zero if the upcall
>> > > rate is below a given threshold, and grows linearly with the
>> > > queue utilisation (i.e. upcall rate) otherwise.
>> > >
>> > > This ensures fairness among ports under load and with few
>> > > netlink sockets.
>> > >
>> > Thanks for the patch.
>> > This patch adds the following overhead to upcall handling:
>> >  1. kmalloc
>> >  2. a global spin-lock
>> >  3. a context switch to a single worker thread
>> > I think this could become a bottleneck on most multi-core systems.
>> > You have mentioned issues with the existing fairness mechanism; can
>> > you elaborate on those? I think we could improve that before
>> > implementing heavyweight fairness in upcall handling.
>>
>> Hi Pravin,
>>
>> vswitchd allocates N * P netlink sockets, where N is the number of
>> online CPU cores and P the number of ports.
>> With some setups, this number can grow quite fast, also exceeding the
>> system maximum file descriptor limit.
>> I've seen a 48-core server fail with -EMFILE when trying to create
>> the more than 65535 netlink sockets needed to handle 1800+ ports.
>>
>> I made a previous attempt to reduce the sockets to one per CPU, but
>> this was discussed and rejected on ovs-dev because it would remove
>> fairness among ports [1].
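Before getting to the FD question: to check that I'm reading the
queueing rule above correctly, here is a rough sketch of the insertion
logic as I understand it. This is not the patch code: the struct and
function names are made up, per-port counters live in a fixed array
instead of a proper hash table, and all locking is left out.

#define MAX_PORTS 64

struct upcall {
    struct upcall *next;
    unsigned int port;
    /* skb, upcall info, ... in the real thing */
};

struct upcall_queue {
    struct upcall *head;
    /* number of upcalls currently queued, per port */
    unsigned int count[MAX_PORTS];
};

static void rr_enqueue(struct upcall_queue *q, struct upcall *new)
{
    unsigned int seen[MAX_PORTS] = { 0 };    /* counts walked so far */
    unsigned int mine = q->count[new->port]; /* already queued for this port */
    unsigned int max_other = 0;              /* highest count of any other port */
    struct upcall **link;

    for (link = &q->head; *link; link = &(*link)->next) {
        struct upcall *cur = *link;

        /* Rule 1: every upcall already queued for this port stays
         * ahead of the new one (seen[new->port] == mine).
         * Rule 2: insert once some other port has more upcalls
         * ahead of this point than this port does. */
        if (seen[new->port] == mine && max_other > mine)
            break;

        seen[cur->port]++;
        if (cur->port != new->port && seen[cur->port] > max_other)
            max_other = seen[cur->port];
    }

    new->next = *link;    /* insert before *link, or at the tail */
    *link = new;
    q->count[new->port]++;
}

If that matches what the patch does, my earlier concern still applies:
every upcall takes a global lock, walks the queue, and goes through a
single worker thread before it reaches userspace.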
Rather than reducing the number of threads down to 1, we could find a
better number of FDs per port. How about this simple solution:

1. Allocate (N * P) FDs as long as that stays under the FD limit.
2. If the FD limit is hit (-EMFILE), halve N and repeat step 1.

There is a rough sketch of what I mean at the end of this mail.

Thanks,
Pravin.

>> I think that the current approach of opening a huge number of sockets
>> doesn't really work (it certainly doesn't scale), and it still needs
>> some queueing logic (either in kernel or user space) if we really want
>> to be sure that low-traffic ports get their upcall quota when other
>> ports are doing way more traffic.
>>
>> If you are concerned about the kmalloc or the spinlock, we can solve
>> them with a kmem_cache, or with two copies of the list and RCU. I'll
>> be happy to discuss the implementation details, as long as we all
>> agree that the current implementation doesn't scale well and has an
>> issue.
>>
>> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html
>>
>> --
>> Matteo Croce
>> per aspera ad upstream
>
> Hi all,
>
> any idea on how to solve the file descriptor limit hit by the netlink
> sockets? I see this issue happen very often, and raising the FD limit
> to 400k does not seem the right way to solve it.
> Any other suggestion on how to improve the patch, or to solve the
> problem in a different way?
>
> Regards,
>
> --
> Matteo Croce
> per aspera ad upstream
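Here is the sketch mentioned above. It is untested and uses made-up
names rather than the actual vswitchd code; it is only meant to show
the allocate-then-halve fallback:

#include <errno.h>
#include <linux/netlink.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open one NETLINK_GENERIC socket per (handler, port) pair.
 * Returns 0 on success or -errno on failure; on failure all sockets
 * opened so far are closed again.  'fds' must have room for
 * n_handlers * n_ports entries. */
static int try_alloc_sockets(int n_handlers, int n_ports, int *fds)
{
    int i;

    for (i = 0; i < n_handlers * n_ports; i++) {
        fds[i] = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (fds[i] < 0) {
            int err = -errno;

            while (i--)
                close(fds[i]);
            return err;
        }
    }
    return 0;
}

/* Step 1: try N * P sockets.  Step 2: on -EMFILE, halve N and retry.
 * Returns the number of handlers we ended up with, or a negative
 * errno if even one socket per port does not fit. */
static int alloc_upcall_sockets(int n_cpus, int n_ports, int *fds)
{
    int n = n_cpus;
    int err;

    while ((err = try_alloc_sockets(n, n_ports, fds)) == -EMFILE && n > 1)
        n /= 2;

    return err < 0 ? err : n;
}

This keeps the existing per-port sockets (and the fairness they give
us), and only reduces the number of handler sockets per port when the
system really cannot fit N * P descriptors.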