[dpdk-dev] ovs crash when running traffic from VM to VM over DPDK and vhostuser

2016-05-02 Thread Yi Ba
Running with DPDK 16.04 and the latest OVS from git, and removing 
"mrg_rxbuf=off" from the virtio params, the crash is no longer observed. 
However, we are now seeing OVS get stuck, and will post this to the OVS 
mailing list:

2016-05-02T17:26:18.804Z|00111|ovs_rcu|WARN|blocked 1000 ms waiting for 
pmd145 to quiesce
2016-05-02T17:26:19.805Z|00112|ovs_rcu|WARN|blocked 2001 ms waiting for pmd145 
to quiesce
2016-05-02T17:26:21.804Z|00113|ovs_rcu|WARN|blocked 4000 ms waiting for pmd145 
to quiesce
2016-05-02T17:26:25.805Z|00114|ovs_rcu|WARN|blocked 8001 ms waiting for pmd145 
to quiesce
2016-05-02T17:26:33.805Z|00115|ovs_rcu|WARN|blocked 16001 ms waiting for pmd145 
to quiesce
2016-05-02T17:26:49.805Z|00116|ovs_rcu|WARN|blocked 32001 ms waiting for pmd145 
to quiesce
2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms 
waiting for pmd145 to quiesce
2016-05-02T17:27:15.841Z|8|ovs_rcu(urcu3)|WARN|blocked 128001 ms waiting 
for pmd145 to quiesce
2016-05-02T17:27:21.805Z|00117|ovs_rcu|WARN|blocked 64000 ms waiting for pmd145 
to quiesce
2016-05-02T17:28:25.804Z|00118|ovs_rcu|WARN|blocked 128000 ms waiting for 
pmd145 to quiesce
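
For triage, the warnings above can be summarised per stalled thread. The following is a minimal sketch, not part of OVS; the regular expression and the helper name longest_stall are assumptions based only on the log lines quoted in this message.

```python
import re

# Pattern for the ovs_rcu warnings quoted above. The optional "(name)" suffix
# covers reporters such as "ovs_rcu(urcu3)". This is inferred from the quoted
# lines only, not from the OVS sources.
RCU_WARN = re.compile(
    r"\|ovs_rcu(?:\((?P<reporter>[^)]+)\))?\|WARN\|"
    r"blocked (?P<ms>\d+) ms waiting for (?P<thread>\S+) to quiesce"
)

def longest_stall(log_text):
    """Return {stalled thread: longest reported blocked time in ms}."""
    # Archived mail wraps long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    stalls = {}
    for match in RCU_WARN.finditer(flat):
        thread = match.group("thread")
        stalls[thread] = max(stalls.get(thread, 0), int(match.group("ms")))
    return stalls

sample = """2016-05-02T17:26:18.804Z|00111|ovs_rcu|WARN|blocked 1000 ms waiting for
pmd145 to quiesce
2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms
waiting for pmd145 to quiesce"""

print(longest_stall(sample))
```

Every warning in the excerpt names the same thread, pmd145, so a summary like this mainly confirms that a single PMD thread is the one failing to quiesce.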


On Wednesday, 6 April 2016 10:56 AM, Yuanhan Liu  wrote:


 On Tue, Apr 05, 2016 at 08:36:19PM +, Yi Ba wrote:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ff1ddffb700 (LWP 21287)]
> 0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, 
> secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)
>    at /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c:452
> 452    /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c: No such 
> file or directory.
> (gdb) bt
> #0  0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, 
> secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)

It looks like a known issue, which has been fixed in this release. So,
could you please just try again with the latest DPDK code? It should
be able to solve your issue.

    --yliu





2016-04-07 Thread Yuanhan Liu
On Tue, Apr 05, 2016 at 08:36:19PM +, Yi Ba wrote:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ff1ddffb700 (LWP 21287)]
> 0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, 
> secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)
> at /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c:452
> 452 /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c: No 
> such file or directory.
> (gdb) bt
> #0  0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, 
> secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)

It looks like a known issue, which has been fixed in this release. So,
could you please just try again with the latest DPDK code? It should
be able to solve your issue.

--yliu


2016-04-05 Thread Yi Ba



This OVS crash was first reported to the openvswitch bug report mailing list, 
but it was suggested that we post it to dpdk, as the crash is in netdev code.
   - What you did that made the problem appear.
We have an OpenStack Kilo setup with 3 controllers and 3 compute hosts. One of 
the controllers runs ODL, which manages the OVS on each compute host. The 
compute hosts run hlinux, which is HPE's Debian 8-based OS.
Each host has 2 NUMA nodes, each with 12 cores (24 with Hyper-Threading) and 
64GB of memory.
We patched neutron to create vhostuser ports (which are not available in 
stable Kilo) so that it works with DPDK and achieves the highest possible 
throughput.
OVS was running with "-c 4" and pmd-core-mask 0x38. All of these cores were 
isolated.
nova was configured with vcpu_pin_set=6-11, and the flavor had 6 vCPUs. The 
flavor had 16 1GB huge pages, backed by real 1GB huge pages on the host.
We then ran a DPDK-based traffic generator inside 2 VMs, each sending traffic 
directly to the other VM's MAC and IP.
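
To make the core layout above easier to check, the masks can be expanded into explicit core numbers. This is a minimal sketch with hypothetical helper names (mask_to_cores, range_to_cores). It assumes "-c 4" is read as a hexadecimal core bitmask, as the DPDK EAL does, and that vcpu_pin_set contains only plain cores and inclusive ranges (exclusions such as "^7" are not handled).

```python
def mask_to_cores(mask):
    """Expand a hexadecimal CPU bitmask (bit N set means core N) into a list."""
    value = int(mask, 16)  # accepts both "38" and "0x38"
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

def range_to_cores(spec):
    """Expand comma-separated cores and inclusive ranges, e.g. "6-11"."""
    cores = []
    for part in spec.split(","):
        low, _, high = part.partition("-")
        cores.extend(range(int(low), int(high or low) + 1))
    return cores

dpdk_cores = mask_to_cores("4")     # the "-c 4" argument
pmd_cores = mask_to_cores("0x38")   # pmd-core-mask 0x38
vm_cores = range_to_cores("6-11")   # vcpu_pin_set=6-11

print("dpdk:", dpdk_cores, "pmd:", pmd_cores, "vm:", vm_cores)
print("pmd/vm overlap:", sorted(set(pmd_cores) & set(vm_cores)))
```

Under these assumptions the DPDK main core is core 2, the PMD threads use cores 3-5, and the VMs are pinned to cores 6-11, so the PMD and VM core sets do not overlap.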

   - What you expected to happen.   
We expected traffic to flow.   

   - What actually happened.   
OVS crashed (in DPDK code). The backtrace is attached.



   - The Open vSwitch version number (as output by ovs-vswitchd --version)
root@BASE-CCP-CPN-N0001-NETCLM:~# ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.5.0
Compiled Apr  4 2016 08:51:09

   - Any local patches or changes you have applied (if any).   
applied ce179f1163f947fe8dc5afa35a2cdd0756bb53a0   

The following are also handy sometimes:   
   - The kernel version on which Open vSwitch is running (from /proc/version) 
and the distribution and version number of your OS (e.g. "Centos 5.0").
root@BASE-CCP-CPN-N0001-NETCLM:~# cat /proc/version
Linux version 3.14.48-1-amd64-hlinux (pbuilder@build) (gcc version 4.9.2 
(Debian 4.9.2-10) ) #hlinux1 SMP Thu Aug 6 16:02:22 UTC 2015

   - If you have Open vSwitch configured to connect to an OpenFlow controller, 
the output of ovs-ofctl show for each bridge configured in the vswitchd 
configuration database.
We are using ODL. The outputs are attached.

   - A fix or workaround, if you have one   
We disabled mergeable RX buffers (mrg_rxbuf=off) in QEMU.


We can supply more info if necessary, like our exact build process etc.




Attachments (scrubbed by the list archive): ovs-ofctl.txt, ovs-vswitchd-gdb.txt