[ovs-discuss] OpenFlow port number leak causing OVN GW data-plane down
Hello folks,

I am writing to share a problem and its fix, and also to ask a question.

I found a problem this week that brought the OVN GW data plane down. After onboarding a hypervisor to an existing OVN deployment where a lot of hypervisors and VMs had been running well, the BFD status of every tunnel on every GW node suddenly went down, as shown by "ovs-vsctl show" (and of course, all VMs lost connectivity). The ovs-vswitchd logs on all GW nodes showed the same thing: ofp port number 65535 was used when creating the new tunnel interface to the new hypervisor, e.g.:

2018-11-06T01:29:10.042Z|142103|dpif(ovs-vswitchd)|WARN|system@ovs-system: failed to add ovn-aded97-0 as port: Device or resource busy
2018-11-06T01:29:10.045Z|142104|bridge(ovs-vswitchd)|INFO|bridge br-int: added interface ovn-aded97-0 on port 65535
2018-11-06T01:29:11.479Z|142108|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure bfd on nonexistent port 65535
2018-11-06T01:29:11.479Z|142109|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure LLDP on nonexistent port 65535
2018-11-06T01:29:11.479Z|142110|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure datapath on nonexistent port 65535
...
2018-11-06T01:29:18.783Z|142117|bfd(ovs-vswitchd)|INFO|ovn-aded97-0: BFD state change: admin_down->down "No Diagnostic"->"No Diagnostic".
2018-11-06T01:29:18.785Z|00061|bfd(monitor82)|INFO|Interface ovn-aded97-0 remote mult value 0 changed to 3
2018-11-06T01:29:18.785Z|00062|bfd(monitor82)|INFO|ovn-aded97-0: New remote min_rx.
...
2018-11-06T01:29:18.773Z|142111|bridge(ovs-vswitchd)|INFO|bridge br-int: deleted interface ovn-aded97-0 on port 65535
...
2018-11-06T01:29:18.779Z|142115|dpif(ovs-vswitchd)|WARN|system@ovs-system: failed to add ovn-aded97-0 as port: Device or resource busy
2018-11-06T01:29:18.782Z|142116|bridge(ovs-vswitchd)|INFO|bridge br-int: added interface ovn-aded97-0 on port 65535
...
2018-11-06T01:29:18.785Z|00064|bfd(monitor82)|WARN|ovn-aded97-0: Incorrect your_disc.
...
After debugging the OVS code, here is the reason why 65535 was used as the port number. The auto-generated port number range is 1 - 32768. If all the numbers are used, the function alloc_ofp_port() returns OFPP_NONE, which is 65535. But the caller doesn't check whether the returned port is valid, and just continues using this invalid number.

The setup doesn't have that many hypervisors and tunnels; the port numbers were exhausted because of a port number leak in corner cases. In particular, when the OVN SB has redundant chassis (with the same IP), ovn-controller creates redundant tunnel interfaces. ovs-vswitchd fails to add the redundant port to ofproto, but every time it tries to add the port it generates a new port number and never frees it afterwards. In this environment other events change ovsdb frequently, so on every ovsdb change ovs-vswitchd tries to add the redundant port again and leaks another port number. Over a long period, ovs-vswitchd reaches a state where no valid number is available, which triggered the problem above: 65535 was used as the tunnel port number.

The recovery was pretty simple - just restart ovs on all GW nodes.

For these problems, I submitted two fixes:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=74665
(in addition, I am working on avoiding adding redundant entries to the OVN SB chassis table)

Now to my question. The time when all the GW BFD statuses went down matches perfectly with the time when port number 65535 was first used. However, I still don't understand why using port number 65535 would cause BFD status down on all tunnels (to other GWs and all hypervisors). Could someone help explain this, so that we can be confident that there are no other potential problems?

Thanks,
Han
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
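The leak described in this message can be illustrated with a small model. This is only a sketch in Python, not the OVS C code: the class PortAllocator and the functions add_port_leaky, add_port_fixed and always_busy are hypothetical names invented for illustration. The range 1 - 32768 and the OFPP_NONE value 65535 are taken from the message. The sketch shows how allocating a number on every failed add without freeing it exhausts the pool, and how checking the result and freeing on failure avoids that.

```python
OFPP_NONE = 65535        # sentinel returned when no port number is free
MAX_AUTO_PORT = 32768    # upper bound of the auto-assigned range, per the message


class PortAllocator:
    """Toy model of an OpenFlow port number pool (hypothetical, not OVS code)."""

    def __init__(self, max_port=MAX_AUTO_PORT):
        self.max_port = max_port
        self.in_use = set()
        self.next_hint = 1

    def alloc(self):
        """Return a free port number, or OFPP_NONE if the range is exhausted."""
        if len(self.in_use) >= self.max_port:
            return OFPP_NONE
        port = self.next_hint
        while port in self.in_use:
            port = port % self.max_port + 1   # wrap around within 1..max_port
        self.in_use.add(port)
        self.next_hint = port % self.max_port + 1
        return port

    def free(self, port):
        self.in_use.discard(port)


def add_port_leaky(allocator, datapath_add):
    """Models the bug: the number is kept even when the datapath add fails."""
    port = allocator.alloc()
    datapath_add(port)       # result ignored; port is never freed on failure
    return port


def add_port_fixed(allocator, datapath_add):
    """Models a fix: reject OFPP_NONE and free the number on failure."""
    port = allocator.alloc()
    if port == OFPP_NONE:
        return None
    if not datapath_add(port):
        allocator.free(port)
        return None
    return port


def always_busy(port):
    """Stand-in for a datapath that answers 'Device or resource busy'."""
    return False
```

With a deliberately tiny pool, for example PortAllocator(max_port=4), four failed calls to add_port_leaky use up every number and the fifth call returns 65535, while add_port_fixed can fail any number of times without shrinking the pool.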
Re: [ovs-discuss] Issue when using local_ip with VXLAN tunnels in OVS
On Wed, Nov 7, 2018 at 6:43 PM Gregory Rose wrote:
> On 11/6/2018 3:17 PM, Gregory Rose wrote:
> >
> > I see. It appears you are right and I misread the documentation. OK,
> > I'll investigate further then.
> >
> > - Greg
>
> Siva,
>
> I am still looking into this but wanted to update you on my status.
>
> I am seeing problems with the vxlan local_ip option myself. Either I
> don't actually understand the docs and examples I've looked at or else
> there is a real bug, but given the reports from you and Marcus (in the
> email thread you first referenced) I think there must be some issue.
>
> Flavio mentioned that it might be an MTU issue but the testing I'm doing
> would not be affected by the MTU so I think there is something else.
>
> Hopefully I can make some progress on this soon - I'll let you know.
>
> Thanks,
>
> - Greg

Thanks for the update.
Re: [ovs-discuss] Issue when using local_ip with VXLAN tunnels in OVS
On 11/6/2018 3:17 PM, Gregory Rose wrote:
> I see. It appears you are right and I misread the documentation. OK,
> I'll investigate further then.
>
> - Greg

Siva,

I am still looking into this but wanted to update you on my status.

I am seeing problems with the vxlan local_ip option myself. Either I don't actually understand the docs and examples I've looked at or else there is a real bug, but given the reports from you and Marcus (in the email thread you first referenced) I think there must be some issue.

Flavio mentioned that it might be an MTU issue but the testing I'm doing would not be affected by the MTU so I think there is something else.

Hopefully I can make some progress on this soon - I'll let you know.

Thanks,

- Greg
Re: [ovs-discuss] RFC: incremental computation for OVN with DDlog
On Wed, Nov 07, 2018 at 08:57:00AM -0500, Mark Michelson wrote:
> Thanks for the e-mail, Ben. I'm 100% behind this effort. The performance
> benefits and the potential drop in CPU usage of OVN components are absolutely
> worth it. I have some questions inline below with regards to specific points
> you've brought up.
>
> On 11/02/2018 01:44 PM, Ben Pfaff wrote:
> > I was asked in an OVN meeting to send out an email talking about what
> > we're working on to make ovn-northd and ovn-controller faster. Here's
> > my summary.
> >
> > OVN is essentially a stack of compilers. At the top, the CMS dumps
> > some configuration into the northbound database (NBDB). Then:
> >
> > 1. ovn-northd centrally translates the high-level NBDB description
> >    into logical flows in the southbound database (SBDB).
> >
> > 2. ovn-controller, on each HV, translates the SBDB logical flows
> >    into "physical" (OpenFlow) flows for the local hypervisor and
> >    passes them to ovs-vswitchd.
> >
> > 3. ovs-vswitchd translates OpenFlow flows into datapath flows on
> >    demand as traffic appears.
> >
> > Currently, OVN implements steps 1 and 2 with code that translates all
> > input to output in one go. When any of the input changes, it
> > re-translates all of it. This is fine for small deployments, but it
> > scales poorly beyond about 1000 hypervisors, at which point each
> > translation step begins to take multiple seconds. Larger deployments
> > call for incremental computation, in which a small change in the input
> > requires only a small amount of computation to yield a small change in
> > the output.
> >
> > It is difficult to implement incremental computation in C. For
> > ovn-controller, two attempts have been made already. The first attempt,
> > in 2016, increased code complexity without similar benefit
> > (https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/078272.html).
> > A recent approach, by Han Zhou, shows a much bigger improvement, but it
> > also increases complexity greatly and definitely makes maintenance more
> > difficult.
> >
> > Justin and I are proposing a new approach, based on an incremental
> > computation engine called Differential Datalog, or DDlog for short
> > (https://github.com/ryzhyk/differential-datalog). DDlog is open source
> > software developed at the VMware Research Group in Palo Alto by Leonid
> > Ryzhyk, Mihai Budiu, and others. It uses an underlying engine developed
> > by Frank McSherry at Microsoft Research, called Differential Dataflow
> > (https://github.com/frankmcsherry/differential-dataflow). Here's a talk
> > that Leonid gave on DDlog earlier this year:
> > https://ovsorbit.org/#e58
> >
> > DDlog appears suitable for steps 1 and 2, that is, for both ovn-northd
> > and ovn-controller. Justin and I are starting with ovn-northd, because
> > it is a simpler case, and once we've arrived at some minimum amount of
> > success, Han is going to apply what we've learned to ovn-controller as
> > well. Leonid and Mihai have been working very closely with us (we have
> > literally been writing DDlog code in conference rooms in 90 minute
> > sessions with everyone clustered around laptops) and none of it could
> > happen without them.
> >
> > Here's the process we'll need to follow to get DDlog to work with
> > ovn-northd:
> >
> > * DDlog needs to be able to talk to OVSDB for input (reading data from
> >   the northbound database) and output (writing data to the southbound
> >   database). Therefore, we need to write OVSDB adapters for DDlog.
> >   Leonid has already done an important part of this work. There is
> >   more work to do plumbing the adapter into ovn-northd's database
> >   connections.
>
> Is this work in one of the repos you previously linked? If not, is there
> somewhere we can find the WIP?

The part that is implemented is a DDlog API that accepts and produces
the JSON format that OVSDB understands. The API for this is in
rust/template/ddlog.h in the northd branch at
https://github.com/ryzhyk/differential-datalog. You can search for
"json" in that file to see what's there.

The missing piece is OVS client library code to pass the JSON to and
from the actual database server.

> > * We need to translate the C flow generation code in ovn-northd into
> >   DDlog's domain specific language. There are some tricky parts to
> >   this but we expect the bulk of it to be straightforward and probably
> >   easier to read in DDlog than in C. We've started with the tricky
> >   parts, which you can find at
> >
> >   https://github.com/ryzhyk/differential-datalog/blob/northd/test/ovn/ovn_northd.dl
> >
> >   Please don't take the code there as illustrative of what one would
> >   typically see for flow generation, because as I said, these are the
> >   hard parts.
>
> Thanks for the code examples. Seeing sample DDlog is very nice, even if it's
> not necessarily illustrative of what the final product will be.
>
> For those of us doing work right now to add new features to
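The contrast drawn in this thread between re-translating all input and incremental computation can be sketched in a few lines. The Python below is a hand-written illustration of the idea only: it is not DDlog, not OVN code, and does not show how DDlog works internally. The names translate, full_recompute and IncrementalTranslator, and the flow string format, are hypothetical.

```python
def translate(port):
    """Hypothetical per-row translation: one logical port -> one flow string."""
    name, addr = port
    return f"match(dst={addr}) -> output({name})"


def full_recompute(ports):
    """Re-translate every input row; the cost grows with the whole input."""
    return {name: translate((name, addr)) for name, addr in ports.items()}


class IncrementalTranslator:
    """Keeps the previous output and touches only the rows that changed."""

    def __init__(self):
        self.flows = {}
        self.translations = 0    # counts per-row work, for illustration only

    def apply(self, added=None, removed=()):
        """Apply an input delta and return the updated output."""
        for name in removed:
            self.flows.pop(name, None)
        for name, addr in (added or {}).items():
            self.flows[name] = translate((name, addr))
            self.translations += 1
        return self.flows
```

In this toy model, adding one port to an input of 1000 ports costs one call to translate in the incremental version, while full_recompute performs 1001. The price is that the delta handling has to be written and kept correct by hand, which is the kind of complexity the thread says is hard to manage in C.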
Re: [ovs-discuss] Adding a linux bond to ovs bridge
On Wed, Nov 07, 2018 at 06:24:02AM +, Srinivas via discuss wrote:
> We have a system where we support linux bond mode 1 (active-backup) and bond
> mode 4 (LACP). The bonds are pre-created on boot up. Can we add these bonds as
> ports to the ovs bridge without breaking any of the linux bond
> functionality?

Ordinarily this will work.
Re: [ovs-discuss] unsupported version 0x5
On Wed, Nov 07, 2018 at 11:48:58AM +0500, Ramzah Rehman wrote:
> I was just running ryu app simple_switch_15.py with ovs-vswitchd version
> 2.10.90 but the controller threw an error. Is OpenFlow15 not supported by ovs
> 2.10.90? If that's the case, how to enable support for OpenFlow15? I had
> set OpenFlow15 in switch as a protocol as well.

The FAQ says:

Q: What versions of OpenFlow does Open vSwitch support?

    A: The following table lists the versions of OpenFlow supported by each
    version of Open vSwitch:

    ===================== ===== ===== ===== ===== ===== ===== =====
    Open vSwitch          OF1.0 OF1.1 OF1.2 OF1.3 OF1.4 OF1.5 OF1.6
    ===================== ===== ===== ===== ===== ===== ===== =====
    1.9 and earlier        yes   ---   ---   ---   ---   ---   ---
    1.10, 1.11             yes   ---   (*)   (*)   ---   ---   ---
    2.0, 2.1               yes   (*)   (*)   (*)   ---   ---   ---
    2.2                    yes   (*)   (*)   (*)   (%)   (*)   ---
    2.3, 2.4               yes   yes   yes   yes   (*)   (*)   ---
    2.5, 2.6, 2.7          yes   yes   yes   yes   (*)   (*)   (*)
    2.8                    yes   yes   yes   yes   yes   (*)   (*)
    ===================== ===== ===== ===== ===== ===== ===== =====

    --- Not supported.
    yes Supported and enabled by default
    (*) Supported, but missing features, and must be enabled by user.
    (%) Experimental, unsafe implementation.

    In any case, the user may override the default:

    - To enable OpenFlow 1.0, 1.1, 1.2, and 1.3 on bridge br0::

          $ ovs-vsctl set bridge br0 \
              protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13

    - To enable OpenFlow 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5 on bridge br0::

          $ ovs-vsctl set bridge br0 \
              protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15

    - To enable only OpenFlow 1.0 on bridge br0::

          $ ovs-vsctl set bridge br0 protocols=OpenFlow10

    All current versions of ovs-ofctl enable only OpenFlow 1.0 by default.
    Use the -O option to enable support for later versions of OpenFlow in
    ovs-ofctl. For example::

        $ ovs-ofctl -O OpenFlow13 dump-flows br0

    (Open vSwitch 2.2 had an experimental implementation of OpenFlow 1.4 that
    could cause crashes. We don't recommend enabling it.)

    :doc:`/topics/openflow` tracks support for OpenFlow 1.1 and later features.
    When support for OpenFlow 1.5 and 1.6 is solidly implemented, Open vSwitch
    will enable those versions by default.
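As a reading aid for errors like the one in the subject line: OpenFlow identifies each protocol version by a one-byte wire version in the message header (0x01 for 1.0 through 0x06 for 1.5). The Python sketch below maps those values to the protocol names used with ovs-vsctl and builds a protocols= value in the form shown in the FAQ excerpt. The helper names protocol_name and protocols_arg are hypothetical, and treating the 0x5 in the subject as a wire version is an assumption; under that assumption it denotes OpenFlow 1.4, not 1.5.

```python
# OpenFlow wire version (header byte) -> protocol name as used by ovs-vsctl.
WIRE_VERSIONS = {
    0x01: "OpenFlow10",
    0x02: "OpenFlow11",
    0x03: "OpenFlow12",
    0x04: "OpenFlow13",
    0x05: "OpenFlow14",
    0x06: "OpenFlow15",
}


def protocol_name(wire_version):
    """Return the protocol name for a wire version, or 'unknown'."""
    return WIRE_VERSIONS.get(wire_version, "unknown")


def protocols_arg(max_wire_version):
    """Build a 'protocols=' value enabling every version up to the given one."""
    names = [name for version, name in sorted(WIRE_VERSIONS.items())
             if version <= max_wire_version]
    return "protocols=" + ",".join(names)
```

For example, protocols_arg(0x06) yields the same value as the FAQ command that enables OpenFlow 1.0 through 1.5 on br0.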
Re: [ovs-discuss] RFC: incremental computation for OVN with DDlog
Thanks for the e-mail, Ben. I'm 100% behind this effort. The performance benefits and the potential drop in CPU usage of OVN components are absolutely worth it. I have some questions inline below with regards to specific points you've brought up.

On 11/02/2018 01:44 PM, Ben Pfaff wrote:
> I was asked in an OVN meeting to send out an email talking about what
> we're working on to make ovn-northd and ovn-controller faster. Here's
> my summary.
>
> OVN is essentially a stack of compilers. At the top, the CMS dumps
> some configuration into the northbound database (NBDB). Then:
>
> 1. ovn-northd centrally translates the high-level NBDB description
>    into logical flows in the southbound database (SBDB).
>
> 2. ovn-controller, on each HV, translates the SBDB logical flows
>    into "physical" (OpenFlow) flows for the local hypervisor and
>    passes them to ovs-vswitchd.
>
> 3. ovs-vswitchd translates OpenFlow flows into datapath flows on
>    demand as traffic appears.
>
> Currently, OVN implements steps 1 and 2 with code that translates all
> input to output in one go. When any of the input changes, it
> re-translates all of it. This is fine for small deployments, but it
> scales poorly beyond about 1000 hypervisors, at which point each
> translation step begins to take multiple seconds. Larger deployments
> call for incremental computation, in which a small change in the input
> requires only a small amount of computation to yield a small change in
> the output.
>
> It is difficult to implement incremental computation in C. For
> ovn-controller, two attempts have been made already. The first attempt,
> in 2016, increased code complexity without similar benefit
> (https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/078272.html).
> A recent approach, by Han Zhou, shows a much bigger improvement, but it
> also increases complexity greatly and definitely makes maintenance more
> difficult.
>
> Justin and I are proposing a new approach, based on an incremental
> computation engine called Differential Datalog, or DDlog for short
> (https://github.com/ryzhyk/differential-datalog). DDlog is open source
> software developed at the VMware Research Group in Palo Alto by Leonid
> Ryzhyk, Mihai Budiu, and others. It uses an underlying engine developed
> by Frank McSherry at Microsoft Research, called Differential Dataflow
> (https://github.com/frankmcsherry/differential-dataflow). Here's a talk
> that Leonid gave on DDlog earlier this year:
> https://ovsorbit.org/#e58
>
> DDlog appears suitable for steps 1 and 2, that is, for both ovn-northd
> and ovn-controller. Justin and I are starting with ovn-northd, because
> it is a simpler case, and once we've arrived at some minimum amount of
> success, Han is going to apply what we've learned to ovn-controller as
> well. Leonid and Mihai have been working very closely with us (we have
> literally been writing DDlog code in conference rooms in 90 minute
> sessions with everyone clustered around laptops) and none of it could
> happen without them.
>
> Here's the process we'll need to follow to get DDlog to work with
> ovn-northd:
>
> * DDlog needs to be able to talk to OVSDB for input (reading data from
>   the northbound database) and output (writing data to the southbound
>   database). Therefore, we need to write OVSDB adapters for DDlog.
>   Leonid has already done an important part of this work. There is
>   more work to do plumbing the adapter into ovn-northd's database
>   connections.

Is this work in one of the repos you previously linked? If not, is there somewhere we can find the WIP?

> * We need to translate the C flow generation code in ovn-northd into
>   DDlog's domain specific language. There are some tricky parts to
>   this but we expect the bulk of it to be straightforward and probably
>   easier to read in DDlog than in C. We've started with the tricky
>   parts, which you can find at
>
>   https://github.com/ryzhyk/differential-datalog/blob/northd/test/ovn/ovn_northd.dl
>
>   Please don't take the code there as illustrative of what one would
>   typically see for flow generation, because as I said, these are the
>   hard parts.

Thanks for the code examples. Seeing sample DDlog is very nice, even if it's not necessarily illustrative of what the final product will be.

For those of us doing work right now to add new features to OVN, how should we approach the conversion to DDlog? As an example, I have some multicast work in progress that will add some new northbound data. It also introduces ovn-northd changes to generate logical flows and southbound data. My assumption is that I should focus 100% on the C implementation for now. When should I consider adding the analogous DDlog changes?

Is there some sort of plan for how to keep DDlog up to date in the face of new C development? For instance, would we implement a policy that states that C changes will not be accepted without equivalent DDlog changes? For this initial conversion, would we declare a C feature freeze date that states that no
Re: [ovs-discuss] Issue when using local_ip with VXLAN tunnels in OVS
On Tue, Nov 06, 2018 at 06:21:22PM -0500, Siva Teja ARETI wrote:
> Yes. Packet counts are incremented.
>
> [root@vm1 ~]# ovs-ofctl dump-ports testbr0
> OFPST_PORT reply (xid=0x2): 3 ports
>   port LOCAL: rx pkts=0, bytes=0, drop=68726, errs=0, frame=0, over=0, crc=0
>            tx pkts=0, bytes=0, drop=0, errs=0, coll=0
>   port vxlan0: rx pkts=0, bytes=0, drop=?, errs=?, frame=?, over=?, crc=?
>            tx pkts=58, bytes=2436, drop=?, errs=?, coll=?
>   port "2cfb62a9b0f04_l": rx pkts=69211, bytes=2918374, drop=0, errs=0, frame=0, over=0, crc=0
>            tx pkts=190, bytes=17532, drop=0, errs=0, coll=0

It sounds like you have an MTU issue. When using VXLAN, the packet is
not passing somewhere.

fbl

>
> Siva Teja.
>
> On Tue, Nov 6, 2018 at 6:15 PM Flavio Leitner wrote:
>
> > On Tue, Nov 06, 2018 at 02:09:23PM -0500, Siva Teja ARETI wrote:
> > > Answers in line.
> > >
> > > Siva Teja.
> > >
> > > On Tue, Nov 6, 2018 at 1:56 PM Flavio Leitner wrote:
> > >
> > > > On Tue, Nov 06, 2018 at 11:51:49AM -0500, Siva Teja ARETI wrote:
> > > > > Hi Greg,
> > > > >
> > > > > Thanks for looking into this.
> > > > >
> > > > > I have two VMs in my setup each with two interfaces. Trying to set up the
> > > > > VXLAN tunnels across these interfaces which are in different subnets. A
> > > > > docker container is attached to ovs bridge using ovs-docker utility on each
> > > > > VM and doing a ping from one container to another.
> > > >
> > > > Do you see any interesting related messages in 'dmesg' output or in
> > > > ovs-vswitchd.log?
> > > >
> > >
> > > I could not find any interesting messages in dmesg or in ovs-vswitchd.log
> > > output.
> > >
> > > > If I recall correctly, the "ip l" should show the vxlan dev named
> > > > vxlan_sys_
> > > >
> > >
> > > Yes. I can see the dev on both of my VMs
> > >
> > > [root@vm1 ~]# ifconfig vxlan_sys_4789
> > > vxlan_sys_4789: flags=4163 mtu 65000
> > >         inet6 fe80::2a:28ff:fed2:d4f6 prefixlen 64 scopeid 0x20
> > >         ether 02:2a:28:d2:d4:f6 txqueuelen 1000 (Ethernet)
> > >         RX packets 0 bytes 0 (0.0 B)
> > >         RX errors 0 dropped 0 overruns 0 frame 0
> > >         TX packets 48 bytes 1680 (1.6 KiB)
> > >         TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
> >
> > Do you see TX increasing as you execute the test?
> > or in ovs-ofctl dump-ports ?
> >
> > Thanks,
> > fbl
> >
> > > > fbl
> > > >
> > > > > *VM1 details:*
> > > > >
> > > > > [root@vm1 ~]# ip a
> > > > > ...
> > > > > 3: eth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
> > > > >     link/ether 52:54:00:b8:05:be brd ff:ff:ff:ff:ff:ff
> > > > >     inet 30.30.0.59/24 brd 30.30.0.255 scope global dynamic eth1
> > > > >        valid_lft 3002sec preferred_lft 3002sec
> > > > >     inet6 fe80::5054:ff:feb8:5be/64 scope link
> > > > >        valid_lft forever preferred_lft forever
> > > > > 4: eth2: mtu 1500 qdisc pfifo_fast state UP qlen 1000
> > > > >     link/ether 52:54:00:f0:64:37 brd ff:ff:ff:ff:ff:ff
> > > > >     inet 20.20.0.183/24 brd 20.20.0.255 scope global dynamic eth2
> > > > >        valid_lft 3248sec preferred_lft 3248sec
> > > > >     inet6 fe80::5054:ff:fef0:6437/64 scope link
> > > > >        valid_lft forever preferred_lft forever
> > > > > ...
> > > > > [root@vm1 ~]# ovs-vsctl show
> > > > > ff70c814-d1b0-4018-aee8-8b635187afee
> > > > >     Bridge "testbr0"
> > > > >         Port "gre0"
> > > > >             Interface "gre0"
> > > > >                 type: gre
> > > > >                 options: {local_ip="20.20.0.183", remote_ip="30.30.0.193"}
> > > > >         Port "testbr0"
> > > > >             Interface "testbr0"
> > > > >                 type: internal
> > > > >         Port "2cfb62a9b0f04_l"
> > > > >             Interface "2cfb62a9b0f04_l"
> > > > >     ovs_version: "2.9.2"
> > > > > [root@vm1 ~]# ip rule list
> > > > > 0: from all lookup local
> > > > > 32765: from 20.20.0.183 lookup siva
> > > > > 32766: from all lookup main
> > > > > 32767: from all lookup default
> > > > > [root@vm1 ~]# ip route show table siva
> > > > > default dev eth2 scope link src 20.20.0.183
> > > > > [root@vm1 ~]# # A docker container is attached to ovs bridge using ovs-docker utility
> > > > > [root@vm1 ~]# docker ps
> > > > > CONTAINER ID  IMAGE    COMMAND  CREATED     STATUS     PORTS  NAMES
> > > > > be4ab434db99  busybox  "sh"     5 days ago  Up 5 days         admiring_euclid
> > > > > [root@vm1 ~]# nsenter -n -t `docker inspect be4 --format={{.State.Pid}}` -- ip a
> > > > > 1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
> > > > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > >     inet 127.0.0.1/8 scope