[ovs-discuss] fedora 28 bootloop with ovsdb-server and networking

2018-06-25 Thread Vasiliy Tolstov
Hi. I have a very bad issue when booting some servers.
I have InfiniBand hardware with IPoIB (IP over InfiniBand).
Sometimes, when the IB network is not ready (subnet manager down, link
down), the networking service (systemd-networkd) fails to start because it
cannot bring the ib* devices up. Then ovsdb-server cannot start because
networking is not ready.
So on the KVM console I see messages like:

Failed to start Open vSwitch database unit
Stopped Open vSwitch database unit
Starting Open vSwitch database unit
Failed to start Networking service
After that the messages loop from the beginning (I waited more than 30
minutes, but the tty console never appeared).
What can I do in such a case? Why does ovsdb-server have a hard dependency
on networking? As I understand it, it could bring its connections up later,
once networking is ready?
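
A minimal sketch of one possible workaround, assuming the restart loop comes
from a hard Requires= dependency between the packaged units (the unit and
target names below are assumptions for Fedora 28; check them first with
systemctl cat):

# Inspect how the packaged units are wired together.
systemctl cat ovsdb-server.service openvswitch.service systemd-networkd.service

# Hypothetical drop-in that downgrades any hard dependency on the network
# unit to a soft one, so a failed network service no longer loops the boot.
mkdir -p /etc/systemd/system/ovsdb-server.service.d
cat > /etc/systemd/system/ovsdb-server.service.d/soft-network.conf <<'EOF'
[Unit]
# An empty assignment clears Requires= lines inherited from the packaged unit.
Requires=
Wants=network-pre.target
After=network-pre.target
EOF
systemctl daemon-reload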

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru


[ovs-discuss] Bridge not taking ip address of bonded interface on dhclient command

2018-06-25 Thread Tejali Bhujbal
I am using Ubuntu server with the configuration below, but sometimes vmbr0
does not get an IP address at all.
Is there any alternative to dhclient?


ovs-vsctl add-br vmbr0
ifconfig vmbr0 up
ovs-vsctl add-bond vmbr0 bond0 enp7s0f0 enp7s0f1 trunks=1529,1530
ovs-vsctl set port bond0 lacp=active
ovs-vsctl set port bond0 bond_mode=balance-tcp
ovs-vsctl add-port vmbr0 vlan1529 tag=1529 -- set interface vlan1529 type=internal
ovs-vsctl add-port vmbr0 vlan1530 tag=1530 -- set interface vlan1530 type=internal
ifconfig bond0 0
dhclient vmbr0
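
A minimal sketch of one way to make this more robust, assuming the failure
is a timing issue (dhclient runs before the LACP bond has finished
negotiating); the field checked in bond/show and the retry counts are
assumptions, not a verified recipe:

# Wait (up to ~60s) for the bond members to become usable before asking
# for a lease; ovs-appctl bond/show reports the bond state.
for i in $(seq 1 30); do
    ovs-appctl bond/show bond0 | grep -q 'may_enable: true' && break
    sleep 2
done

# Retry dhclient a few times instead of relying on a single attempt.
for i in 1 2 3; do
    dhclient -1 -v vmbr0 && break
    sleep 5
done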


Re: [ovs-discuss] Error vhost-user socket device setup failure for socket...

2018-06-25 Thread Ian Stokes

On 6/22/2018 9:13 PM, kro...@gmx.com wrote:

Hi all,
I am trying to use OVS (2.5.4) with DPDK on Ubuntu server 16.04.3 (KVM 
host) purely for inter-VM communication.

I have been following this guide very closely:
https://help.ubuntu.com/lts/serverguide/DPDK.html


Hi,

Is there a hard requirement for using OVS 2.5.4?

DPDK support was at a very early stage back then. There have been many bug 
fixes, new features and performance improvements for OVS with DPDK since 
then. (The file descriptor error is fixed in a later release for sure.)

If you can, I'd recommend moving to the latest OVS 2.9 and testing your 
use case there to see whether you still hit the segfault issue.


Ian


While I don't have any issue at all using a normal OVS bridge for the VMs 
running on the KVM host:

$ sudo ovs-vsctl add-br br-MGT

I cannot succeed in using OVS with DPDK bridges:
$ sudo ovs-vsctl add-br br-LAN1 -- set bridge br-LAN1 datapath_type=netdev
$ sudo ovs-vsctl add-br br-LAN2 -- set bridge br-LAN2 datapath_type=netdev
I have created two ports as follows:
$ sudo ovs-vsctl add-port br-LAN1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
$ sudo ovs-vsctl add-port br-LAN2 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser

The VM is defined with one interface on each bridge; the two vhost-user
interfaces each use a unix socket source in client mode (mode='client') and
sit on their own PCI address (function='0x0').

After the VM is started, I observe the following errors for both 
vhost-user1 and vhost-user2:
Jun 23 03:35:14 ubt-ovs ovs-vswitchd[2095]: VHOST_CONFIG: fail to bind 
fd:66, remove file:/var/run/openvswitch/vhost-user2 and try again.
Jun 23 03:35:14 ubt-ovs ovs-vswitchd[2095]: 
ovs|00023|dpdk|ERR|vhost-user socket device setup failure for socket 
/var/run/openvswitch/vhost-user2

Also, I observe a segmentation fault crash:
Jun 23 03:35:14 ubt-ovs kernel: [  272.156977] vhost_thread2[1540]: 
segfault at 18 ip 7f379d9b395f sp 7f379c619740 error 4 in 
libdpdk.so.0[7f379d979000+1ea000]
Jun 23 03:35:14 ubt-ovs ovs-vswitchd[1531]: 
ovs|3|daemon_unix(monitor)|ERR|1 crashes: pid 1532 died, killed 
(Segmentation fault), core dumped, restarting


I have tried removing the files /var/run/openvswitch/vhost-user*, deleting 
the dpdkvhostuser ports, and restarting the KVM host...

But both kinds of issues reoccur systematically.

What investigation would you suggest to help me understand the cause(s) 
of these issues?
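
A minimal sketch of some checks that might narrow this down, assuming the
bind failure comes from stale socket files left behind by the earlier
ovs-vswitchd crash (paths and service names are taken from the logs above;
coredumpctl requires systemd-coredump to be installed):

# Is anything still listening on the vhost-user sockets, or are they stale?
ss -xlp | grep vhost-user

# Remove stale sockets only while ovs-vswitchd is stopped, then restart it
# so the dpdkvhostuser ports are recreated cleanly before starting the VM.
systemctl stop openvswitch-switch
rm -f /var/run/openvswitch/vhost-user1 /var/run/openvswitch/vhost-user2
systemctl start openvswitch-switch

# Get a backtrace for the segfault reported in the kernel/ovs logs.
coredumpctl gdb ovs-vswitchd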


More complete logs below.

 From the file /var/log/openvswitch/ovs-vswitchd.log:

2018-06-22T19:30:58.531Z|2|vlog|INFO|opened log file 
/var/log/openvswitch/ovs-vswitchd.log
2018-06-22T19:30:58.638Z|3|ovs_numa|INFO|Discovered 6 CPU cores on 
NUMA node 0
2018-06-22T19:30:58.638Z|4|ovs_numa|INFO|Discovered 1 NUMA nodes and 
6 CPU cores
2018-06-22T19:30:58.638Z|5|reconnect|INFO|unix:/var/run/openvswitch/db.sock: 
connecting...
2018-06-22T19:30:58.640Z|6|reconnect|INFO|unix:/var/run/openvswitch/db.sock: 
connected
2018-06-22T19:30:58.645Z|7|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath supports recirculation
2018-06-22T19:30:58.645Z|8|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS 
label stack length probed as 3
2018-06-22T19:30:58.645Z|9|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath supports unique flow ids
2018-06-22T19:30:58.645Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath does not support ct_state
2018-06-22T19:30:58.645Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath does not support ct_zone
2018-06-22T19:30:58.645Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath does not support ct_mark
2018-06-22T19:30:58.645Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: 
Datapath does not support ct_label
2018-06-22T19:30:58.659Z|00014|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports recirculation
2018-06-22T19:30:58.659Z|00015|ofproto_dpif|INFO|system@ovs-system: MPLS 
label stack length probed as 1
2018-06-22T19:30:58.659Z|00016|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports unique flow ids
2018-06-22T19:30:58.659Z|00017|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports ct_state
2018-06-22T19:30:58.659Z|00018|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports ct_zone
2018-06-22T19:30:58.659Z|00019|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports ct_mark
2018-06-22T19:30:58.659Z|00020|ofproto_dpif|INFO|system@ovs-system: 
Datapath supports ct_label
2018-06-22T19:30:58.672Z|1|ofproto_dpif_upcall(handler12)|INFO|received 
packet on unassociated datapath port 0
2018-06-22T19:30:58.673Z|00021|bridge|INFO|bridge br-LAN2: added 
interface br-LAN2 on port 65534
2018-06-22T19:30:58.674Z|00022|bridge|INFO|bridge br-MGT: added 
interface ens160 on port 1
2018-06-22T19:30:58.702Z|00023|bridge|INFO|bridge br-MGT: added 
interface br-MGT on port 65534
2018-06-22T19:30:58.702Z|1|ofproto_dpif_upcall(handler15)|INFO|received 
packet on unassociated datapath port 1

Re: [ovs-discuss] Bad checksums observed with nsh encapsulation

2018-06-25 Thread Jaime Caamaño Ruiz
Hello

I looked a bit more into the issue.

This is happening when OVS receives an skb with CHECKSUM_PARTIAL. For a
normal VM-to-VM, non-NSH scenario, OVS delivers the same CHECKSUM_PARTIAL to
the receiver, which then won't verify the checksum.

But when we are pushing NSH headers, the first receiver may not be the
final receiver, and CHECKSUM_PARTIAL may not reach the final receiver,
which will then verify and reject the bad checksum.

So I think it may be necessary to handle the CHECKSUM_PARTIAL case in
nsh_push, something like adding:

/* Resolve the pending partial checksum before pushing the NSH header,
 * since the next receiver may not be the one that completes it. */
if (skb->ip_summed == CHECKSUM_PARTIAL) {
        skb_checksum_help(skb);
}

I tried that and it got rid of my problem.

Any thoughts?

BR
Jaime.


-Original Message-
From: Jaime Caamaño Ruiz 
Reply-To: jcaam...@suse.com
To: jcaam...@suse.com, ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Bad checksums observed with nsh
encapsulation
Date: Thu, 14 Jun 2018 18:15:10 +0200

Hello

I have done a follow-up test very similar to the previous one, but this
time using two compute nodes, such that the client and server reside on one
of them and the VNF on the other. This means that packets coming from the
client or server that are NSH encapsulated are then forwarded to the VNF's
compute node, egressing through a VXLAN tunnel port (vxlan+eth+nsh+payload).

In this scenario I don't observe the checksum problem. So it is the
combination of NSH encapsulation + tap port egress where the checksum is
sometimes observed to be incorrect.

BR
Jaime.


-Original Message-
From: Jaime Caamaño Ruiz  
Reply-To: jcaam...@suse.com
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: [ovs-discuss] Bad checksums observed with nsh encapsulation
Date: Wed, 13 Jun 2018 12:51:59 +0200

Hello

I am facing a problem where eth+nsh encapsulated packets egress OVS with an
incorrect checksum.

The scenario is:

client <-> vnf <-> server

All guests are on the same host, so this is VM-to-VM traffic; the tap ports
are added directly to the OVS bridge. TCP traffic from/to server port 80 is
encapsulated with eth+nsh and traverses the vnf. I exercise the traffic
using nc on both the client and the server.

I include captures at the client [1] and at the vnf [2] where I attempt
three TCP connections on port 80. The general observation is that packets
generated on the client/server are seen there with wrong checksums due to
offloading, but then arrive at the vnf with a correct checksum. But not all
of them. For the first connection attempt you can see that the SYN (frame
74) and ACK (78) are OK, but then the FIN (79) is not OK. A retransmitted
FIN (80) is still not OK, and then a further FIN retransmission (93) is OK.
Much the same happens for the second attempt. The third attempt shows a bad
SYN (104) coming from the server.

Two additional observations:

- This does not happen if I try the same on a port other than 80, so that
the traffic goes directly from the client to the server with no eth+nsh
encapsulation.

- This does not happen if I disable tx offloading on both the server and
the client (see the sketch right after this list).
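
A minimal illustration of that workaround, assuming the guest-side interface
is named eth0 (the interface name is an assumption); disabling transmit
checksum offload inside both guests makes the stack fill in the checksum
before the packet ever reaches OVS:

# Run inside both the client and the server guests.
ethtool -K eth0 tx off
ethtool -k eth0 | grep tx-checksumming   # confirm it is now off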

I also include the flows [3] and the ofproto trace [4] for the FIN (79)
generated by the client, which is eth+nsh encapsulated and forwarded to the
vnf. The decision on whether a packet should be eth+nsh encapsulated is made
in table 101 by setting reg2, which is then checked in table 221. The packet
is NSH encapsulated in table 222 and then Ethernet encapsulated in table 83.
If not encapsulated, the packet would go from table 221 back to table 220
and be output there without any further actions.

I am using OVS 2.9.2 with the OVS tree kernel module. The kernel is 4.4.

Am I understanding the problem correctly, in that OVS is responsible for
these checksums when offloading is enabled?
Any pointers on how I can debug this further?
Why would only some of the eth+nsh packets exhibit this problem, and not
all of them?
Why would these bad packets be OK after retransmission?

[1] https://filebin.net/8mnypc2qm4vninof/client.pcap?t=b097kh0m
[2] https://filebin.net/8mnypc2qm4vninof/vnf_eth0.pcap?t=b097kh0m
[3] https://hastebin.com/nuhexufaze.sql
[4] https://hastebin.com/yevufanula.http

Thanks for your help,
Jaime.









[ovs-discuss] a question about ovs crash relationship with learn action

2018-06-25 Thread wangyunjian
I'm running OVS 2.7.0 on a Linux 3.10.0 kernel. I found an OVS crash.
I suspect it is caused by a use-after-free: match->flow is set to NULL in
the minimatch_destroy function. The stack is as follows:

(gdb) bt
#0  0x7ff273b71197 in raise () from /usr/lib64/libc.so.6
#1  0x7ff273b72888 in abort () from /usr/lib64/libc.so.6
#2  0x00787289 in PAT_abort ()
#3  0x007843cd in patchIllInsHandler ()
#4  
#5  0x004cbfae in miniflow_n_values (flow=0x0) at lib/flow.h:540
#6  0x004cc95f in minimask_hash (mask=0x0, basis=0) at 
lib/classifier-private.h:321
#7  0x004cf613 in find_subtable (cls=0x38ad6e8, mask=0x0) at 
lib/classifier.c:1406
#8  0x004cefa7 in classifier_find_rule_exactly (cls=0x38ad6e8, 
target=0x7ff118025500, version=18446744073709551615) at lib/classifier.c:1178
#9  0x0047bcaf in collect_rules_strict (ofproto=0x389bc30, 
criteria=0x7ff1180254f8, rules=0x7ff118025588) at ofproto/ofproto.c:4253
#10 0x0047eba3 in modify_flow_start_strict (ofproto=0x389bc30, 
ofm=0x7ff1180254f0) at ofproto/ofproto.c:5492
#11 0x00482c9f in ofproto_flow_mod_start (ofproto=0x389bc30, 
ofm=0x7ff1180254f0) at ofproto/ofproto.c:7506
#12 0x0047dc01 in ofproto_flow_mod_learn_start (ofm=0x7ff1180254f0) at 
ofproto/ofproto.c:5088
#13 0x0047dd4b in ofproto_flow_mod_learn (ofm=0x7ff1180254f0, 
keep_ref=true) at ofproto/ofproto.c:5140
#14 0x004b55d4 in xlate_push_stats_entry (entry=0x7ff118015148, 
stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:130
#15 0x004b57b6 in xlate_push_stats (xcache=0x7ff1180254a0, 
stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:183
#16 0x004a312f in revalidate_ukey (udpif=0x38a5260, 
ukey=0x7ff0fc015910, stats=0x7ff11d668260, odp_actions=0x7ff11d66a3d0, 
reval_seq=25145760, recircs=0x7ff11d66a3b0) at 
ofproto/ofproto-dpif-upcall.c:2134
#17 0x004a3d76 in revalidate (revalidator=0x4cdda08) at 
ofproto/ofproto-dpif-upcall.c:2428
#18 0x004a0528 in udpif_revalidator (arg=0x4cdda08) at 
ofproto/ofproto-dpif-upcall.c:954
#19 0x0058f811 in ovsthread_wrapper (aux_=0x55088a0) at 
lib/ovs-thread.c:682
#20 0x7ff27549adc5 in start_thread () from /usr/lib64/libpthread.so.0

Any idea about this?
Thanks,
Yunjian