Hi there,

I have the same issue with OVS 2.4 (latest commit on the 2.4 branch) and DPDK 2.0.0 in a Debian 8 environment. After a while it just gets stuck.
Regards,
Federico

-----Original Message-----
From: discuss [mailto:[email protected]] On Behalf Of Ben Pfaff
Sent: Tuesday, January 26, 2016 7:13 AM
To: Daniele di Proietto <[email protected]>
Cc: [email protected]
Subject: Re: [ovs-discuss] dpdk watchdog stuck?

Daniele, I think that you said in our meeting today that there was some
sort of bug that falsely blames a thread.  Can you explain further?

On Mon, Jan 25, 2016 at 09:29:52PM +0100, Patrik Andersson R wrote:
> Right, that is likely for sure. I will look there first.
>
> What do you think of the case where the thread is "main"? I have
> examples of that one as well, but so far I have not been able to
> figure out what causes it.
>
> ...
> ovs-vswitchd.log.1.1.1.1:2016-01-23T01:47:19.026Z|00016|ovs_rcu(urcu2)|WARN|blocked 32768000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-23T10:53:27.026Z|00017|ovs_rcu(urcu2)|WARN|blocked 65536000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T05:05:43.026Z|00018|ovs_rcu(urcu2)|WARN|blocked 131072000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:40.826Z|00001|ovs_rcu(urcu1)|WARN|blocked 1092 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:41.805Z|00002|ovs_rcu(urcu1)|WARN|blocked 2072 ms waiting for main to quiesce
> ...
>
> Could it be connected with the deletion of a netdev port?
>
> Regards,
>
> Patrik
>
>
> On 01/25/2016 07:50 PM, Ben Pfaff wrote:
> > On Mon, Jan 25, 2016 at 03:09:09PM +0100, Patrik Andersson R wrote:
> >> During robustness testing, where VMs are booted and deleted using
> >> nova boot/delete in rather rapid succession, VMs get stuck in the
> >> "spawning" state after a few test cycles. Presumably this is because
> >> OVS no longer responds to port additions and deletions, or rather
> >> because responses to these requests become painfully slow. Other
> >> requests to ovs-vswitchd also fail to complete in any reasonable
> >> time frame; "ovs-appctl vlog/set" is one example.
> >>
> >> The only conclusion I can draw at the moment is that some thread
> >> (I have observed both main and dpdk_watchdog3) blocks the
> >> ovsrcu_synchronize() operation for an "infinite" time and there is
> >> no fallback to get out of this. To recover, the minimum intervention
> >> seems to be a restart of the openvswitch-switch service, but that
> >> seems to cause other issues longer term.
> >>
> >> When this happens, the following can be observed in the vswitchd log:
> >>
> >> 2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
> >
> > This looks like a bug somewhere in the DPDK code. The watchdog code
> > is really simple:
> >
> >     static void *
> >     dpdk_watchdog(void *dummy OVS_UNUSED)
> >     {
> >         struct netdev_dpdk *dev;
> >
> >         pthread_detach(pthread_self());
> >
> >         for (;;) {
> >             ovs_mutex_lock(&dpdk_mutex);
> >             LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> >                 ovs_mutex_lock(&dev->mutex);
> >                 check_link_status(dev);
> >                 ovs_mutex_unlock(&dev->mutex);
> >             }
> >             ovs_mutex_unlock(&dpdk_mutex);
> >             xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> >         }
> >
> >         return NULL;
> >     }
> >
> > Although it looks at first glance like it doesn't quiesce, xsleep()
> > does that internally, so I guess check_link_status() must be hanging.
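The reason xsleep() still counts as quiescing is that it brackets the sleep
in an RCU quiescent section, so ovsrcu_synchronize() callers do not have to
wait for the sleeping thread. A minimal sketch of that pattern, assuming the
ovsrcu API declared in OVS's lib/ovs-rcu.h (the helper name
sleep_and_quiesce() is hypothetical; the real xsleep() lives in
lib/timeval.c and may differ in detail by version):

    /* Sketch: how a long-running OVS thread can sleep without blocking
     * ovsrcu_synchronize() in other threads.  Pattern follows the ovsrcu
     * API in lib/ovs-rcu.h; the helper name is hypothetical. */
    #include <unistd.h>

    #include "ovs-rcu.h"   /* ovsrcu_quiesce_start(), ovsrcu_quiesce_end() */

    static void
    sleep_and_quiesce(unsigned int seconds)
    {
        /* Entering a quiescent state declares that this thread holds no
         * RCU-protected pointers, so ovsrcu_synchronize() callers stop
         * waiting for it for the duration of the sleep. */
        ovsrcu_quiesce_start();
        sleep(seconds);
        ovsrcu_quiesce_end();
    }

Given that, a "blocked N ms waiting for dpdk_watchdog3 to quiesce" warning
means the watchdog thread never made it back to xsleep(), which is
consistent with it being stuck inside check_link_status() or on one of the
mutexes in the loop above, rather than a problem in the RCU machinery
itself.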
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
