When ping-pong'in a live VM migration between two machines running
OVS-DPDK every now and then the ping misses would increase
dramatically. For example:

===========Stream Rate: 3Mpps===========
No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
 0       3Mpps      128     13974       115      7168374
 1       3Mpps      145     13620        17      1169770
 2       3Mpps      140     14499       116      7141175
 3       3Mpps      142     13358        16      1150606
 4       3Mpps      136     14004        16      1124020
 5       3Mpps      139     15494       214     13170452
 6       3Mpps      136     15610       217     13282413
 7       3Mpps      146     13194        17      1167512
 8       3Mpps      148     12871        16      1162655
 9       3Mpps      137     15615       214     13170656

I identified this issue being introduced in OVS commit
https://github.com/openvswitch/ovs/commit/f3e7ec254738364101eed8f04b1d954cb510615c
and more specific due to DPDK commit
http://dpdk.org/browse/dpdk/commit/?id=af14759181240120f76c82f894982e8f33f0ba2a

The combined changes no longer have OVS start the vhost socket polling
thread at startup, but DPDK will do it on its own when the first vhost
client is started.

Figuring out the reason why this happens kept me puzzled for quite some time...
What happens is that the callbacks called from the vhost thread are
calling ovsrcu_synchronize() as part of destroy_device(). This will
end-up calling seq_wait__(), and the problem is with this part:

   176static void
   177seq_wait__(struct seq *seq, uint64_t value, const char *where)
   178    OVS_REQUIRES(seq_mutex)
   179{
>> 180    unsigned int id = ovsthread_id_self();
   181    uint32_t hash = hash_int(id, 0);
   182    struct seq_waiter *waiter;
   183
   184    HMAP_FOR_EACH_IN_BUCKET (waiter, hmap_node, hash, &seq->waiters) {
>> 185        if (waiter->ovsthread_id == id) {
   186            if (waiter->value != value) {
   187                /* The current value is different from the value we've 
already
   188                 * waited for, */
   189                poll_immediate_wake_at(where);
   190            } else {
   191                /* Already waiting on 'value', nothing more to do. */
   192            }
>> 193            return;
   194        }
   195    }
   196

By default, all created threads outside of OVS will get thread id 0,
which is equal to the main ovs thread. So for example in the function
above if the main thread is waiting already we won't add ourselves as
a waiter.

The fix below assigns UINT_MAX to none OVS created threads, which will
fix this specific issue. However if more none OVS threads gets added
the issue might arise again.

Currently, I do not see another solution that will work unless DPDK is
adding some framework/callback support when new threads get created.

Signed-off-by: Eelco Chaudron <echau...@redhat.com>
---
 lib/ovs-thread.c        | 2 +-
 lib/ovs-thread.h        | 7 +++++++
 vswitchd/ovs-vswitchd.c | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index f8bc06d38..e663794ea 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -315,7 +315,7 @@ ovs_barrier_block(struct ovs_barrier *barrier)
     }
 }
 
-DEFINE_EXTERN_PER_THREAD_DATA(ovsthread_id, 0);
+DEFINE_EXTERN_PER_THREAD_DATA(ovsthread_id, UINT_MAX);
 
 struct ovsthread_aux {
     void *(*start)(void *);
diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
index 03fd80439..4f543eb57 100644
--- a/lib/ovs-thread.h
+++ b/lib/ovs-thread.h
@@ -525,4 +525,11 @@ bool may_fork(void);
 int count_cpu_cores(void);
 bool thread_is_pmd(void);
 
+static inline void
+ovsthread_set_id(unsigned int id)
+{
+    assert_single_threaded();
+    *ovsthread_id_get() = id;
+}
+
 #endif /* ovs-thread.h */
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 414b54780..ed456e2d2 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -39,6 +39,7 @@
 #include "ovsdb-idl.h"
 #include "ovs-rcu.h"
 #include "ovs-router.h"
+#include "ovs-thread.h"
 #include "openvswitch/poll-loop.h"
 #include "simap.h"
 #include "stream-ssl.h"
@@ -78,6 +79,7 @@ main(int argc, char *argv[])
     int retval;
 
     set_program_name(argv[0]);
+    ovsthread_set_id(0);
 
     ovs_cmdl_proctitle_init(argc, argv);
     service_start(&argc, &argv);
-- 
2.17.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to