Hi, just some thoughts on the results.
test-conntrack.c now prepares only one batch of packets for each thread, and keeps running ct on the same batch until the packet count reaches the n_pkt parameter, which in your case is 16777216. So I guess the optimization gain largely comes from removing the global lock on the timeout lists, which is taken/released for each packet. The split-bucket patch reduces that contention and thus increases the connection setup rate; I think we may need a new benchmark for the setup rate.

Paolo Valerio <pvale...@redhat.com> wrote on Monday, March 7, 2022 at 20:39:
> This series aims to share the work done so far, and to start a
> discussion of a slightly different approach in terms of the way we
> handle the connections. It's not considered ready, as further work
> and extensive tests are needed.
>
> These are some performance numbers obtained in the following way:
>
>   ./tests/ovstest test-conntrack benchmark $i 16777216 32 1
>
>   $i    baseline     ct-scale    ct-bucket
>    2     16480 ms     3527 ms     3180 ms
>    4     37375 ms     4997 ms     4477 ms
>    8     63239 ms     9692 ms     8954 ms
>   16    131583 ms    19303 ms    18062 ms
>
> Both ct-scale and ct-bucket introduce a noticeable improvement in
> terms of performance. It's worth mentioning that ct-scale, by keeping
> the exp_lists, is more efficient during the sweeping.
>
> There are two main logical changes that the series tries to
> introduce. The first one is the reintroduction of buckets, the second
> one is the removal of the expiration lists.
>
> The first two patches are a prereq and were picked (untouched) from
> this series (by Gaetan):
>
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=249027&state=*
>
> The third patch is a follow-up of the following:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/
>
> Some logic has changed, but the overall idea remained the same.
> Additional details are in the patch description.
> The fourth patch introduces the buckets and removes the expiration
> lists. The buckets were reintroduced as cmaps instead of hmaps, trying
> to narrow down the critical sections as well.
>
> An alternative approach may involve replacing the cmaps with linked
> lists (without increasing the number of locks) and increasing the
> number of buckets (this could involve the ability to change the bucket
> count when the user changes the connection limit). This could improve
> stale connection eviction during packet processing, but it is
> currently out of scope for this series (it can be discussed, though).
>
> The insertion, namely the creation of the connection, can probably be
> improved (or at least an improvement can be evaluated), as, in the
> case of NAT, it acquires the bucket locks by calling
> nat_get_unique_tuple_lock(). This is needed because we may need to
> know the reverse tuple before locking, in order to acquire the locks
> in the right order and avoid ABBA deadlocks.
>
> Further details about locking may be found in the description of
> patch #4.
>
> There is a known limitation, related mostly to the zone limit:
> keeping stale entries leads to a mismatch between the current number
> of per-zone connections and the actual number of valid connections.
> This means that with a small zone limit we may wait a whole
> "next_wakeup" period before the entries get cleaned up (assuming we
> have candidates for removal). To avoid this, a zone cleanup from the
> packet path has been added, but alternative approaches, such as
> triggering a per-zone cleanup when a connection threshold has been
> reached (storing the connections per bucket and per zone during
> creation), may still be viable.
>
> Expiration lists have been removed for the sake of evaluation, but
> they could be included again, duplicated among the buckets.
> In this way we could distribute the contention that affects write
> access to a single data structure on every connection update (in
> effect reducing the number of connections per list).
>
> Patch #6 reintegrates the command below with multiple buckets:
>
>   ovs-appctl dpctl/ct-bkts
>
> A couple of minor patches have been kept out of the series for the
> time being, as they strongly depend on #4.
>
> Gaetan Rivet (3):
>   conntrack: Use a cmap to store zone limits
>   conntrack-tp: Use a cmap to store timeout policies
>   conntrack: Use an atomic conn expiration value
>
> Paolo Valerio (2):
>   conntrack: Split single cmap to multiple buckets.
>   conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets
>
> Peng He (1):
>   conntrack: Replaces nat_conn introducing key directionality.
>
>  lib/conntrack-private.h |   60 ++-
>  lib/conntrack-tp.c      |  102 ++--
>  lib/conntrack.c         | 1062 +++++++++++++++++++++++----------------
>  lib/conntrack.h         |    6 +-
>  lib/dpif-netdev.c       |    5 +-
>  tests/system-traffic.at |    5 +-
>  6 files changed, 707 insertions(+), 533 deletions(-)

--
hepeng

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev