Hi there,

I am seeing the same issue with OVS 2.4 (latest commit on the 2.4 branch) and
DPDK 2.0.0 in a Debian 8 environment.
After a while it just gets stuck.

Regards,
Federico

-----Original Message-----
From: discuss [mailto:[email protected]] On Behalf Of Ben Pfaff
Sent: Tuesday, January 26, 2016 7:13 AM
To: Daniele di Proietto <[email protected]>
Cc: [email protected]
Subject: Re: [ovs-discuss] dpdk watchdog stuck?

Daniele, I think that you said in our meeting today that there was some sort of 
bug that falsely blames a thread.  Can you explain further?

On Mon, Jan 25, 2016 at 09:29:52PM +0100, Patrik Andersson R wrote:
> Right, that is quite likely. I will look there first.
> 
> What do you think of the case where the blocked thread is "main"? I have
> examples of that one as well, but so far I have not been able to figure
> out what causes it.
> 
> ...
> ovs-vswitchd.log.1.1.1.1:2016-01-23T01:47:19.026Z|00016|ovs_rcu(urcu2)|WARN|blocked 32768000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-23T10:53:27.026Z|00017|ovs_rcu(urcu2)|WARN|blocked 65536000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T05:05:43.026Z|00018|ovs_rcu(urcu2)|WARN|blocked 131072000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:40.826Z|00001|ovs_rcu(urcu1)|WARN|blocked 1092 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:41.805Z|00002|ovs_rcu(urcu1)|WARN|blocked 2072 ms waiting for main to quiesce
> ...
> 
> Could it be connected to the deletion of a netdev port?
> 
> Regards,
> 
> Patrik
> 
> 
> On 01/25/2016 07:50 PM, Ben Pfaff wrote:
> >On Mon, Jan 25, 2016 at 03:09:09PM +0100, Patrik Andersson R wrote:
> >>during robustness testing, where VMs are booted and deleted with nova 
> >>boot/delete in rapid succession, VMs get stuck in the spawning state 
> >>after a few test cycles. Presumably this is because OVS stops 
> >>responding to port additions and deletions, or rather because 
> >>responses to these requests become painfully slow. Other requests to 
> >>the vswitchd also fail to complete in any reasonable time frame; 
> >>ovs-appctl vlog/set is one example.
> >>
> >>The only conclusion I can draw at the moment is that some thread 
> >>(I have observed both main and dpdk_watchdog3) blocks the 
> >>ovsrcu_synchronize() operation for "infinite" time, and there is no 
> >>fallback to get out of this state. To recover, the minimum 
> >>intervention seems to be a restart of the openvswitch-switch 
> >>service, but that appears to cause other issues in the longer term.
> >>
> >>In the vswitch log when this happens the following can be observed:
> >>
> >>2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 
> >>1000 ms waiting for dpdk_watchdog3 to quiesce
> >This looks like a bug somewhere in the DPDK code.  The watchdog code 
> >is really simple:
> >
> >     static void *
> >     dpdk_watchdog(void *dummy OVS_UNUSED)
> >     {
> >         struct netdev_dpdk *dev;
> >
> >         pthread_detach(pthread_self());
> >
> >         for (;;) {
> >             ovs_mutex_lock(&dpdk_mutex);
> >             LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> >                 ovs_mutex_lock(&dev->mutex);
> >                 check_link_status(dev);
> >                 ovs_mutex_unlock(&dev->mutex);
> >             }
> >             ovs_mutex_unlock(&dpdk_mutex);
> >             xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> >         }
> >
> >         return NULL;
> >     }
> >
> >Although it looks at first glance like it doesn't quiesce, xsleep() 
> >does that internally, so I guess check_link_status() must be hanging.
> 
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
