> It is great that you are considering lock-free optimizations.
> Apart from that, with regard to structural changes for scalability, I would
> still prefer a structure without the centralized dispatcher. But as
> also mentioned by Alex, this is not urgent.

Completely agree, I'd like to get rid of it too.  It turns out to be
rather difficult, so we've been focusing on lower-hanging fruit first.
But this is definitely on our radar.

> I am kind of worried about this point, because in our testing we
> observed that when the system is loaded, even with ~90% CPU for the
> dispatcher thread, it cannot handle upcalls in batches in most cases.
> Here is the data collected by Chengyuan:
> n_upcalls total            1076606
> n_upcalls in a batch > 5:             4537
> n_upcalls in a batch > 10:          825
> n_upcalls in a batch > 20:           103
> n_upcalls in a batch >= 50:           0
>
> Less than 5% of upcalls were handled in a batch bigger than 5, and
> the MAX_BATCH number 50 was never hit.
> When we increased hping speed above this threshold, batch handling
> did increase a lot, but the fastpath had already started dropping
> packets because the dispatcher was now the bottleneck and could not
> handle upcalls fast enough:
>         lookups: hit:165917 missed:8042062 lost:168576
>
> This means batching will not happen naturally: it is either no batching,
> or batching but overloaded.

Thanks for collecting these numbers.  I haven't actually done these
measurements on my end yet, and I'm fairly surprised to see the
batching isn't as effective as I would have guessed.  This is
definitely something we'll investigate going forward once we've
attacked some of the other performance projects we're looking at.
Hopefully there's a solution that strikes a balance between latency
and throughput.
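
For reference, here is a rough sketch of the arithmetic behind the
numbers quoted above.  It only restates Chengyuan's counters as
percentages.  Two things are my assumptions: that each "in a batch"
counter counts upcalls rather than batches, and that "lost" can be
compared against "missed" to estimate how many upcalls were dropped.
The helper name is made up for illustration.

```python
def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Counters copied from the quoted measurements.
total_upcalls = 1076606
batched = {"> 5": 4537, "> 10": 825, "> 20": 103, ">= 50": 0}

# Share of all upcalls that fell into each batch-size bucket
# (assumes the counters count upcalls, not batches).
batch_pct = {size: share(n, total_upcalls) for size, n in batched.items()}
for size, pct in batch_pct.items():
    print("batch %-5s %.4f%% of upcalls" % (size, pct))

# Datapath lookup counters from the overloaded run.
hit, missed, lost = 165917, 8042062, 168576

# Assumption: "lost" packets are misses that never reached userspace.
lost_pct = share(lost, missed)
miss_pct = share(missed, hit + missed)
print("missed: %.2f%% of lookups, lost: %.2f%% of misses"
      % (miss_pct, lost_pct))
```

Even under those assumptions, the share of upcalls in batches larger
than 5 comes out well under 1%, which matches the "no batching unless
overloaded" observation.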

One more thing that occurred to me on the issue of lock contention.
Are you running with hyperthreading enabled?  If so, you're going to
have quite a few more threads hitting various locks, but not
necessarily more CPU for them to take advantage of.  I definitely
would not expect our performance to scale linearly once we've got more
threads than the number of real cores on the system. It might be worth
manually reducing the number of handlers and checking what kind of
numbers you get.
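
As a quick way to check, something like the sketch below compares
logical CPUs against real cores using the text of /proc/cpuinfo on
Linux.  It assumes the usual "processor", "physical id" and "core id"
fields are present (some virtual machines and architectures omit
them), and the function name and sample text are made up for
illustration.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from cpuinfo text."""
    logical = 0
    cores = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyperthread siblings share the same (socket, core) pair.
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    # Fall back to the logical count if core topology is not reported.
    physical = len(cores) if cores else logical
    return logical, physical


# Made-up sample: one socket, two cores, hyperthreading enabled.
SAMPLE = "\n\n".join(
    "processor : %d\nphysical id : 0\ncore id : %d" % (cpu, cpu % 2)
    for cpu in range(4)
)

logical, physical = count_cpus(SAMPLE)
print("logical CPUs: %d, physical cores: %d" % (logical, physical))
if logical > physical:
    print("hyperthreading appears enabled; try handlers <= %d" % physical)
```

On a real machine you would pass in open("/proc/cpuinfo").read()
instead of the sample, then set the handler count to at most the
number of physical cores and rerun the test.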

Ethan