[ovs-discuss] fedora 28 bootloop with ovsdb-server and networking
Hi. I have a very bad issue when booting some servers. I have InfiniBand hardware with IPoIB (IP over InfiniBand). Sometimes, when the IB network is not ready (subnet manager down, link down), the networking service (systemd-networkd) fails to start because it cannot bring the ib* devices up. But then ovsdb-server cannot start because networking is not ready. So on the KVM console I see messages like:

Failed to start Open vSwitch database unit
Stopped Open vSwitch database unit
Starting Open vSwitch database unit
Failed to start Networking service

after which the messages loop from the beginning (I waited more than 30 minutes, but the tty console never appeared). What can I do in this case? Why does ovsdb-server have a hard dependency on networking? As I understand it, it could bring its connection up some time later, once networking is ready.

--
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
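[Editorial note: one common way to break this kind of ordering loop is a systemd drop-in override that orders the unit after the network without hard-requiring it. This is a sketch only, under assumptions: the unit name `ovsdb-server.service`, the target names, and the drop-in path below depend on how the distro packages OVS and may differ on Fedora 28.]

```ini
# Hypothetical drop-in: /etc/systemd/system/ovsdb-server.service.d/override.conf
[Unit]
# Order after the network, but make it a soft (Wants=) rather than a
# hard (Requires=) dependency, so a down IB link does not fail the unit
# and loop the whole boot:
After=network.target
Wants=network.target

[Service]
# If the daemon still fails early, retry instead of giving up:
Restart=on-failure
RestartSec=5
```

After adding the drop-in, run `systemctl daemon-reload`. Since ovsdb-server listens on a unix socket by default, it does not strictly need the network to be up before it starts.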
[ovs-discuss] Bridge not taking ip address of bonded interface on dhclient command
I am using Ubuntu server with the configuration below, but sometimes vmbr0 does not get an IP address at all. Is there an alternative to dhclient?

ovs-vsctl add-br vmbr0
ifconfig vmbr0 up
ovs-vsctl add-bond vmbr0 bond0 enp7s0f0 enp7s0f1 trunks=1529,1530
ovs-vsctl set port bond0 lacp=active
ovs-vsctl set port bond0 bond_mode=balance-tcp
ovs-vsctl add-port vmbr0 vlan1529 tag=1529 -- set interface vlan1529 type=internal
ovs-vsctl add-port vmbr0 vlan1530 tag=1530 -- set interface vlan1530 type=internal
ifconfig bond0 0
dhclient vmbr0
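[Editorial note: since the failure is intermittent, a retry wrapper around dhclient may be enough; with LACP bonds the links can take a few seconds to negotiate, so the first DHCP attempt may simply come too early. A sketch only, not a tested setup — the attempt count and sleep interval are arbitrary assumptions:]

```shell
# Retry a command until it succeeds or the attempt limit is reached.
retry() {
    max=$1; shift
    n=0
    until "$@"; do
        n=$((n + 1))
        [ "$n" -ge "$max" ] && return 1
        sleep 2    # give LACP negotiation time to settle between tries
    done
}

# Intended usage (interface name from the post); dhclient -1 makes each
# attempt try once and exit instead of daemonizing:
#   retry 5 dhclient -1 vmbr0
```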
Re: [ovs-discuss] Error vhost-user socket device setup failure for socket...
On 6/22/2018 9:13 PM, kro...@gmx.com wrote:

Hi all, I am trying to use OVS (2.5.4) with DPDK on Ubuntu server 16.04.3 (KVM host), purely for inter-VM communication. I have been following this guide very closely: https://help.ubuntu.com/lts/serverguide/DPDK.html

Hi,

Is there a hard requirement to use OVS 2.5.4? DPDK support was at a very early stage back then, and there have been many bug fixes, new features and performance improvements for OVS with DPDK since. (The file descriptor error is fixed in a later release for sure.) If you can, I'd recommend moving to the latest OVS 2.9 and testing your use case there to see whether the segfault still occurs.

Ian

While I don't have any issue at all with using a normal OVS bridge for the VMs running on the KVM host:

$ sudo ovs-vsctl add-br br-MGT

I cannot succeed in using OVS with DPDK bridges:

$ sudo ovs-vsctl add-br br-LAN1 -- set bridge br-LAN1 datapath_type=netdev
$ sudo ovs-vsctl add-br br-LAN2 -- set bridge br-LAN2 datapath_type=netdev

I have created 2 ports as follows:

$ sudo ovs-vsctl add-port br-LAN1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
$ sudo ovs-vsctl add-port br-LAN2 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser

The VM is defined with one vhost-user interface (mode='client') on each bridge. [libvirt interface XML stripped by the list archive; only fragments such as function='0x0'/> and mode='client'/> survive]

After the VM is started, I observe the following errors for both vhost-user1 and vhost-user2:

Jun 23 03:35:14 ubt-ovs ovs-vswitchd[2095]: VHOST_CONFIG: fail to bind fd:66, remove file:/var/run/openvswitch/vhost-user2 and try again.
Jun 23 03:35:14 ubt-ovs ovs-vswitchd[2095]: ovs|00023|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhost-user2

Also, I observe a segmentation fault crash:

Jun 23 03:35:14 ubt-ovs kernel: [ 272.156977] vhost_thread2[1540]: segfault at 18 ip 7f379d9b395f sp 7f379c619740 error 4 in libdpdk.so.0[7f379d979000+1ea000]
Jun 23 03:35:14 ubt-ovs ovs-vswitchd[1531]: ovs|3|daemon_unix(monitor)|ERR|1 crashes: pid 1532 died, killed (Segmentation fault), core dumped, restarting

I have tried removing the files /var/run/openvswitch/vhost-user*, deleting the dpdkvhostuser ports, restarting the KVM host... But both issues occur again systematically. What investigation would you suggest to help me understand the cause(s)? More complete logs below.

From the file /var/log/openvswitch/ovs-vswitchd.log:

2018-06-22T19:30:58.531Z|2|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2018-06-22T19:30:58.638Z|3|ovs_numa|INFO|Discovered 6 CPU cores on NUMA node 0
2018-06-22T19:30:58.638Z|4|ovs_numa|INFO|Discovered 1 NUMA nodes and 6 CPU cores
2018-06-22T19:30:58.638Z|5|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2018-06-22T19:30:58.640Z|6|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2018-06-22T19:30:58.645Z|7|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2018-06-22T19:30:58.645Z|8|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2018-06-22T19:30:58.645Z|9|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2018-06-22T19:30:58.645Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state
2018-06-22T19:30:58.645Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone
2018-06-22T19:30:58.645Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark
2018-06-22T19:30:58.645Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label
2018-06-22T19:30:58.659Z|00014|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2018-06-22T19:30:58.659Z|00015|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2018-06-22T19:30:58.659Z|00016|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2018-06-22T19:30:58.659Z|00017|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2018-06-22T19:30:58.659Z|00018|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2018-06-22T19:30:58.659Z|00019|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2018-06-22T19:30:58.659Z|00020|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2018-06-22T19:30:58.672Z|1|ofproto_dpif_upcall(handler12)|INFO|received packet on unassociated datapath port 0
2018-06-22T19:30:58.673Z|00021|bridge|INFO|bridge br-LAN2: added interface br-LAN2 on port 65534
2018-06-22T19:30:58.674Z|00022|bridge|INFO|bridge br-MGT: added interface ens160 on port 1
2018-06-22T19:30:58.702Z|00023|bridge|INFO|bridge br-MGT: added interface br-MGT on port 65534
2018-06-22T19:30:58.702Z|1|ofproto_dpif_upcall(handler15)|INFO|received packet on unassociated datapath port 1
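[Editorial note: until an upgrade is possible, the stale-socket part of this can at least be automated. The "fail to bind ... remove file ... and try again" message means the socket path already exists; the sketch below removes such a path only when nothing is listening on it. The socket path is from the log above; detecting a listener via `ss -xl` is an assumption (`lsof -U` would work as well):]

```shell
# Remove a vhost-user socket file only if no process is listening on it,
# so a live ovs-vswitchd instance is never disturbed.
clean_stale_sock() {
    sock=$1
    [ -S "$sock" ] || return 0            # not a socket: nothing to do
    if ! ss -xl 2>/dev/null | grep -qF "$sock"; then
        rm -f "$sock"                     # no listener: stale leftover
    fi
}

# Intended usage (path from the log above), before restarting the VM:
#   clean_stale_sock /var/run/openvswitch/vhost-user2
```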
Re: [ovs-discuss] Bad checksums observed with nsh encapsulation
Hello

I looked a bit more into the issue. This happens when OVS receives a CHECKSUM_PARTIAL skb. For a normal vm2vm non-nsh scenario, OVS hands the same CHECKSUM_PARTIAL to the receiver, which then won't verify the checksum. But when we are pushing nsh headers, the first receiver may not be the final receiver, and CHECKSUM_PARTIAL may not reach the final receiver, which will then verify and reject the bad checksum. So I think it may be necessary to handle the CHECKSUM_PARTIAL case in nsh_push, something like adding:

    if (skb->ip_summed == CHECKSUM_PARTIAL) {
        skb_checksum_help(skb);
    }

I tried that and it got rid of my problem. Any thoughts?

BR
Jaime.

-----Original Message-----
From: Jaime Caamaño Ruiz
Reply-To: jcaam...@suse.com
To: jcaam...@suse.com, ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Bad checksums observed with nsh encapsulation
Date: Thu, 14 Jun 2018 18:15:10 +0200

Hello

I have done a follow-up test very similar to the previous one, but this time using two computes, such that the client and server reside on one of them and the vnf on the other. This means that packets coming from either client or server that are being nsh encapsulated are forwarded to the vnf compute, egressing through a vxlan tunnel port (vxlan+eth+nsh+payload). In this scenario I don't observe the checksum problem. So it is the combination of nsh encapsulation + tap port egress where the checksum is sometimes observed to be incorrect.

BR
Jaime.

-----Original Message-----
From: Jaime Caamaño Ruiz
Reply-To: jcaam...@suse.com
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: [ovs-discuss] Bad checksums observed with nsh encapsulation
Date: Wed, 13 Jun 2018 12:51:59 +0200

Hello

I am facing a problem where eth+nsh encapsulated packets egress OVS with an incorrect checksum. The scenario is client - vnf - server, with all guests on the same host, so this is vm2vm traffic; the tap ports are added directly to the ovs bridge. TCP traffic from/to server port 80 is encapsulated with eth+nsh and traverses the vnf.
I exercise the traffic using nc on both the client and the server. I include captures at the client [1] and at the vnf [2] where I attempt three tcp connections on port 80. The general observation is that packets generated on the client/server are seen there with wrong checksums due to offloading, but then arrive at the vnf with a correct checksum. But not all of them. For the first connection attempt you can see that the SYN (frame 74) and ACK (78) are ok, but then the FIN (79) is not ok. A retransmitted FIN (80) is still not ok, and then a further FIN retransmission (93) is ok. Much the same happens for the second attempt. The third attempt shows a bad SYN (104) coming from the server.

Two additional observations:
- This does not happen if I try the same on a port other than 80, so that the traffic goes directly from the client to the server with no eth+nsh encapsulation.
- This does not happen if I disable tx offloading on both the server and the client.

I also include the flows [3] and the ofproto trace [4] for the FIN (79), generated by the client, which is eth+nsh encapsulated and forwarded to the vnf. The decision on whether a packet should be eth+nsh encapsulated happens in table 101 by setting reg2, which is then checked in table 221. The packet is nsh encapsulated in table 222 and then ethernet encapsulated in table 83. If not encapsulated, the packet goes from 221 back to 220 and is output there without any further actions.

I am using OVS 2.9.2 with the OVS tree kernel module. The kernel is 4.4.

Am I understanding the problem correctly in that OVS is responsible for these checksums when offloading is enabled? Any pointers on how I can debug this further? Why would just some of the eth+nsh packets exhibit this problem and not all? Why would these bad packets be ok after retransmissions?
[1] https://filebin.net/8mnypc2qm4vninof/client.pcap?t=b097kh0m
[2] https://filebin.net/8mnypc2qm4vninof/vnf_eth0.pcap?t=b097kh0m
[3] https://hastebin.com/nuhexufaze.sql
[4] https://hastebin.com/yevufanula.http

Thanks for your help,
Jaime.
[ovs-discuss] a question about an OVS crash possibly related to the learn action
I'm running OVS 2.7.0 on a Linux 3.10.0 kernel. I found an OVS crash. I suspect it is caused by a use-after-free: match->flow is set to NULL in the minimatch_destroy function. The stack is:

(gdb) bt
#0  0x7ff273b71197 in raise () from /usr/lib64/libc.so.6
#1  0x7ff273b72888 in abort () from /usr/lib64/libc.so.6
#2  0x00787289 in PAT_abort ()
#3  0x007843cd in patchIllInsHandler ()
#4  <signal handler called>
#5  0x004cbfae in miniflow_n_values (flow=0x0) at lib/flow.h:540
#6  0x004cc95f in minimask_hash (mask=0x0, basis=0) at lib/classifier-private.h:321
#7  0x004cf613 in find_subtable (cls=0x38ad6e8, mask=0x0) at lib/classifier.c:1406
#8  0x004cefa7 in classifier_find_rule_exactly (cls=0x38ad6e8, target=0x7ff118025500, version=18446744073709551615) at lib/classifier.c:1178
#9  0x0047bcaf in collect_rules_strict (ofproto=0x389bc30, criteria=0x7ff1180254f8, rules=0x7ff118025588) at ofproto/ofproto.c:4253
#10 0x0047eba3 in modify_flow_start_strict (ofproto=0x389bc30, ofm=0x7ff1180254f0) at ofproto/ofproto.c:5492
#11 0x00482c9f in ofproto_flow_mod_start (ofproto=0x389bc30, ofm=0x7ff1180254f0) at ofproto/ofproto.c:7506
#12 0x0047dc01 in ofproto_flow_mod_learn_start (ofm=0x7ff1180254f0) at ofproto/ofproto.c:5088
#13 0x0047dd4b in ofproto_flow_mod_learn (ofm=0x7ff1180254f0, keep_ref=true) at ofproto/ofproto.c:5140
#14 0x004b55d4 in xlate_push_stats_entry (entry=0x7ff118015148, stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:130
#15 0x004b57b6 in xlate_push_stats (xcache=0x7ff1180254a0, stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:183
#16 0x004a312f in revalidate_ukey (udpif=0x38a5260, ukey=0x7ff0fc015910, stats=0x7ff11d668260, odp_actions=0x7ff11d66a3d0, reval_seq=25145760, recircs=0x7ff11d66a3b0) at ofproto/ofproto-dpif-upcall.c:2134
#17 0x004a3d76 in revalidate (revalidator=0x4cdda08) at ofproto/ofproto-dpif-upcall.c:2428
#18 0x004a0528 in udpif_revalidator (arg=0x4cdda08) at ofproto/ofproto-dpif-upcall.c:954
#19 0x0058f811 in ovsthread_wrapper (aux_=0x55088a0) at lib/ovs-thread.c:682
#20 0x7ff27549adc5 in start_thread () from /usr/lib64/libpthread.so.0

Any idea about this?

Thanks,
Yunjian