[ovs-dev] [PATCH V2 1/1] dpif-netlink: Log eth type 0x1234 not offloadable
Ethernet type 0x1234 is used for testing and is not offloadable. When
testing offloadable features, log a message about using this value.

Signed-off-by: Eli Britstein
Acked-by: Roi Dayan
Signed-off-by: Eli Britstein
---
 lib/dpif-netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index ba80a0079..a0d51ae61 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put)
     /* When we try to install a dummy flow from a probed feature. */
     if (match.flow.dl_type == htons(0x1234)) {
+        VLOG_INFO_RL(&rl, "eth 0x1234 is special and not offloadable");
         return EOPNOTSUPP;
     }
--
2.17.2
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] tnl-neigh-cache: Purge learnt neighbors when port/bridge is deleted
Hi Flavio,

I am trying to emulate the test-case scenario you mentioned earlier,
where we need to clean up the neighbor cache when an external interface
goes down. To study that, I wrote a small script based off the test case
in system-traffic.at, "datapath - ping over vxlan tunnel". This
experiment gave me good experience with the netdev and kmod
implementations of VxLAN.

Tests I performed using the new script:

1. ovs-vswitchd started normally, for kernel-mode VxLAN tests:
   a. VxLAN interface part of OVS.
   b. VxLAN interface external to OVS.
   In kernel mode, native tunneling is off, so no ARP entries are learnt
   in tnl-neigh-cache. Please refer to
   ofproto-dpif-xlate.c:terminate_native_tunnel(). So, whatever changes
   we are making here (as part of my commit) have no effect in kernel
   mode of operation.

2. ovs-vswitchd started with "--disable-system", for usermode VxLAN
   tests:
   a. VxLAN interface part of OVS. This test case works; entries are
      cleanly removed when the user deletes the ports from the bridge.
   b. VxLAN interface external to OVS. I could not get this case to
      work.

3. ovs-vswitchd started normally (without --disable-system), for
   usermode VxLAN tests:
   a. VxLAN interface part of OVS. This test case works; entries are
      cleanly removed when the user deletes the ports from the bridge.
   b. VxLAN interface external to OVS. I could not get this case to
      work.

I could not get use cases 2b and 3b to work at all. Do you expect these
to work normally? The reason I ask is that, as I understand the code,
even though "ovs/route/show" shows the route to the remote VTEP, OVS
does not know which ODP port to send the packet out of.
Please refer to ofproto-dpif-xlate.c:native_tunnel_output() and
tnl_route_lookup_flow(); packet transmission fails with "native tunnel
routing failed".

If you agree that 2b and 3b are not supported use cases, then I can
submit my code changes as per your review comments. Please let me know
what you think.

-Vasu

*Vasu Dasari*

On Mon, Jun 24, 2019 at 6:15 PM Flavio Leitner wrote:
> On Mon, Jun 24, 2019 at 05:43:26PM -0400, Vasu Dasari wrote:
> > On Mon, Jun 24, 2019 at 3:58 PM Flavio Leitner wrote:
> > > On Wed, Jun 19, 2019 at 11:02:07PM -0400, Vasu Dasari wrote:
> > > > +{
> > > > +    struct tnl_neigh_entry *neigh;
> > > > +    bool changed = false;
> > > > +
> > > > +    ovs_mutex_lock(&mutex);
> > > > +    CMAP_FOR_EACH(neigh, cmap_node, &table) {
> > > > +        if (nullable_string_is_equal(neigh->br_name, br_name)) {
> > > > +            tnl_neigh_delete(neigh);
> > > > +            changed = true;
> > >
> > > Do you expect to match on additional entries? Otherwise it
> > > could bail out here.
> >
> > Say there are two or more neighbors on the port or on bridge, then by
> > bailing out we would be missing others. So, will leave it there.
>
> You're right.
>
> [...]
> > > However, as I mentioned in the discussion, the tnl IP address could
> > > be on a device that doesn't belong to OVS at all. For example:
> [...]
> > > The tnl endpoint must have a valid route, so I suggest to hook the
> > > tnl_neigh_cache_flush into route-table.c which receives a notification
> > > when a route changes. If a route is deleted, we should flush the
> > > device's tnl cache. Doing so, would cover both userspace and kernel
> > > datapath for OVS and non-OVS devices.
> >
> > I see what you are saying. Let me play with code a bit and resubmit
> > patch.
>
> OK, looking forward to it!
> Thanks Vasu,
> fbl
Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.
On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet
packets and encap with GRE header with outer IPv6 header. Encapsulation
of layer 3 packet over IPv6 GRE, ip6gre, is not supported yet.

I tested it by running:
  # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
  # make check TESTSUITEFLAGS='-k ip6gre'

Signed-off-by: William Tu
---
v1-v2
  - rebase to master
v2-v3
  - update documentation suggested by Eli
v3-v4
  - squash Eli's documentation
v4-v5
  - remove using 'ip6gretap', use only 'ip6gre' with
    options:packet_type=legacy_l2
---

LGTM - ran into two problems testing, user error on one and an issue on
a single system that is not reproducible anywhere else.

Ran check-kmod for systems running the out-of-tree kernel modules and
check-kernel on a system with the upstream kernel - all checks out well.
Passes a travis check -
https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116

Thanks William!

Tested-by: Greg Rose
Reviewed-by: Greg Rose

- Greg

 Documentation/faq/configuration.rst | 13 +++
 NEWS                                |  1 +
 datapath/linux/compat/ip6_gre.c     |  2 +-
 lib/dpif-netlink-rtnl.c             |  8 +++-
 tests/system-traffic.at             | 40 +++++
 tests/tunnel-push-pop-ipv6.at       | 69 +++++
 vswitchd/vswitch.xml                | 32 ++---
 7 files changed, 151 insertions(+), 14 deletions(-)

diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
index cb2c6b4eca98..ff3b71a5d4ef 100644
--- a/Documentation/faq/configuration.rst
+++ b/Documentation/faq/configuration.rst
@@ -212,6 +212,19 @@ Q: Does Open vSwitch support ERSPAN?
         options:erspan_ver=2 options:erspan_dir=1 \
         options:erspan_hwid=4
 
+Q: Does Open vSwitch support IPv6 GRE?
+
+    A: Yes. L2 tunnel interface GRE over IPv6 is supported.
+    L3 GRE tunnel over IPv6 is not supported.
+
+    ::
+
+        $ ovs-vsctl add-br br0
+        $ ovs-vsctl add-port br0 at_gre0 -- \
+              set int at_gre0 type=ip6gre \
+              options:remote_ip=fc00:100::1 \
+              options:packet_type=legacy_l2
+
 Q: How do I connect two bridges?
 
 A: First, why do you want to do this?  Two connected bridges are not much

diff --git a/NEWS b/NEWS
index a38ab258fc6c..c7e84ed7931d 100644
--- a/NEWS
+++ b/NEWS
@@ -47,6 +47,7 @@ Post-v2.11.0
    - Linux datapath:
      * Support for the kernel versions 4.19.x and 4.20.x.
      * Support for the kernel version 5.0.x.
+   - Add L2 GRE tunnel over IPv6 support.
 
 v2.11.0 - 19 Feb 2019

diff --git a/datapath/linux/compat/ip6_gre.c b/datapath/linux/compat/ip6_gre.c
index ca4e66133570..ab50c72d0753 100644
--- a/datapath/linux/compat/ip6_gre.c
+++ b/datapath/linux/compat/ip6_gre.c
@@ -2550,7 +2550,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
-    .kind       = "ip6gre",
+    .kind       = "ip6gretap",
     .maxtype    = RPL_IFLA_GRE_MAX,
     .policy     = ip6gre_policy,
     .priv_size  = sizeof(struct ip6_tnl),

diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index 2e23a8c14fcf..582274c46774 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -104,7 +104,13 @@ vport_type_to_kind(enum ovs_vport_type type,
     case OVS_VPORT_TYPE_IP6ERSPAN:
         return "ip6erspan";
     case OVS_VPORT_TYPE_IP6GRE:
-        return "ip6gre";
+        if (tnl_cfg->pt_mode == NETDEV_PT_LEGACY_L2) {
+            return "ip6gretap";
+        } else if (tnl_cfg->pt_mode == NETDEV_PT_LEGACY_L3) {
+            return NULL;
+        } else {
+            return NULL;
+        }
     case OVS_VPORT_TYPE_NETDEV:
     case OVS_VPORT_TYPE_INTERNAL:
     case OVS_VPORT_TYPE_LISP:

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index d23ee897b0b2..8ea450887076 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -340,6 +340,46 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PI
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping over ip6gre L2 tunnel])
+OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS out
Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.
On 7/2/2019 10:42 AM, Gregory Rose wrote:

On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet
packets and encap with GRE header with outer IPv6 header. Encapsulation
of layer 3 packet over IPv6 GRE, ip6gre, is not supported yet.

I tested it by running:
  # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
  # make check TESTSUITEFLAGS='-k ip6gre'

On a Ubuntu 18.04 system with the distro provided 4.15.0-43-generic
kernel I get the following splat:

[ 510.469589] Kernel panic - not syncing: Fatal exception in interrupt
[ 510.470789] Kernel Offset: 0x1e00 from 0x8100 (relocation range: 0x8000-0xbfff)
[ 510.471360] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

[snip]

I've run into an interesting problem but I don't think it's related to
this patch. I can revert the patch on that system and get the same
problem. In the interest of getting this patch tested and reviewed I'll
just try a different distro/machine.

- Greg
Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?
Hello,

On 02/07/2019 19:47, Flavio Leitner wrote:

On Tue, Jul 02, 2019 at 10:00:44AM -0700, Ben Pfaff wrote:

Hi Ilya and Ian. Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers. He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run. His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay. Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.

A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly. The userspace datapath focuses
on performance, so do you have a set of benchmarks that you recommend
for comparison? Are there other criteria that would be important
(besides correctness)?

(I'd take answers from anyone, not just Ian and Ilya!)

Hi Ben,

We use 1M constant IPv4 flows, and 200k new and 200k retiring IPv4 flows
per second to compare, as it seems to be close to some production
workloads.

What is the format of the rules in the classifier? Do you also remove
and add 200k rules/s?

fbl

Michal
Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?
Hello everyone,

On 02/07/2019 19:43, Ian Stokes wrote:

On 7/2/2019 6:00 PM, Ben Pfaff wrote:

Hi Ilya and Ian. Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers. He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run. His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay. Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.

Thanks for the introduction Ben. I wasn't aware of Michal's work. The
timing is quite apt, as currently I've been looking at Harry's (CCd)
approach to improving performance for the current DPCLS.

https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358790.html

I was hoping to have this approach in place for OVS 2.12, as there's
been ongoing testing/reviews across the community with general
improvements of ~15% or more for both x86 and ARM architectures, which
looks promising. I'd be interested in seeing Michal's proposals; if
there are patches available that would be great.

Now I am implementing the DPCLS API so we can test the thing. There are
also some differences and problems, such as:

* support for priority of the rules
* only a single matching rule
* C++
* my limited knowledge of DPCLS

A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly. The userspace datapath focuses
on performance, so do you have a set of benchmarks that you recommend
for comparison? Are there other criteria that would be important
(besides correctness)?

Agreed, that's something we've been looking for feedback on from the
community to date. Most testing we have seen has been performance
focused.

There are so many corner cases... Any test is appreciated.

The areas I think would be good to measure are flow addition/flow
deletion overhead in CPU cycles. Also, memory and time complexity
comparisons for existing and proposed techniques might be nice to see.

Currently I do have comparisons with OVS TSS, DPDK librte_acl,
PartitionSort, HyperSplit, HyperCuts and a few others. (C-based
benchmarks; some of them not optimized implementations, originally for
development of a hardware-based classifier.) All previously mentioned
except TSS have worst-case complexities from the same class. However, it
is more complicated, as it depends on the size of the matching field and
other factors which have different meanings between algorithms. I do not
have understandable and short definitions yet.

Thanks
Ian

(I'd take answers from anyone, not just Ian and Ilya!)

Thanks,
Ben.

Thanks,
Michal
Re: [ovs-dev] [PATCH v4 1/3] doc: Move vhost tx retry info to separate section.
On 7/2/2019 1:32 AM, Kevin Traynor wrote:

vhost tx retry is applicable to vhost-user and vhost-user-client, but
was in the section that compares them. Also, moved further down the doc
as prefer to have more fundamental info about vhost nearer the top.

Fixes: 6d6513bfc657 ("doc: Add info on vhost tx retries.")
Reported-by: David Marchand
Signed-off-by: Kevin Traynor
---
 Documentation/topics/dpdk/vhost-user.rst | 72 ++++++++++++------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index a6cf9d1cc..5f393aced 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -76,40 +76,4 @@ mode ports require QEMU version 2.7. Ports of type vhost-user are currently
 deprecated and will be removed in a future release.
 
-vhost tx retries
-----------------
-
-When sending a batch of packets to a vhost-user or vhost-user-client interface,
-it may happen that some but not all of the packets in the batch are able to be
-sent to the guest. This is often because there is not enough free descriptors
-in the virtqueue for all the packets in the batch to be sent. In this case
-there will be a retry, with a default maximum of 8 occurring. If at any time no
-packets can be sent, it may mean the guest is not accepting packets, so there
-are no (more) retries.
-
-.. note::
-
-  Maximum vhost tx batch size is defined by NETDEV_MAX_BURST, and is currently
-  as 32.
-
-Tx Retries may be reduced or even avoided by some external configuration, such
-as increasing the virtqueue size through the ``rx_queue_size`` parameter
-introduced in QEMU 2.7.0 / libvirt 2.3.0::
-
-
-
-
-
-
-
-
-
-The guest application will also need need to provide enough descriptors. For
-example with ``testpmd`` the command line argument can be used::
-
-  --rxd=1024 --txd=1024
-
-The guest should also have sufficient cores dedicated for consuming and
-processing packets at the required rate.
-
 .. _dpdk-vhost-user:
 
@@ -521,4 +485,40 @@ DPDK vHost User ports can be configured to use Jumbo Frames. For more
 information, refer to :doc:`jumbo-frames`.
 
+vhost tx retries
+----------------
+
+When sending a batch of packets to a vhost-user or vhost-user-client interface,
+it may happen that some but not all of the packets in the batch are able to be
+sent to the guest. This is often because there is not enough free descriptors
+in the virtqueue for all the packets in the batch to be sent. In this case
+there will be a retry, with a default maximum of 8 occurring. If at any time no
+packets can be sent, it may mean the guest is not accepting packets, so there
+are no (more) retries.
+
+.. note::
+
+  Maximum vhost tx batch size is defined by NETDEV_MAX_BURST, and is currently
+  as 32.
+
+Tx Retries may be reduced or even avoided by some external configuration, such
+as increasing the virtqueue size through the ``rx_queue_size`` parameter
+introduced in QEMU 2.7.0 / libvirt 2.3.0::
+
+
+
+
+
+
+
+
+
+The guest application will also need need to provide enough descriptors. For
+example with ``testpmd`` the command line argument can be used::
+
+  --rxd=1024 --txd=1024
+
+The guest should also have sufficient cores dedicated for consuming and
+processing packets at the required rate.
+

Looks ok to me.

@David as this was suggested by you, are you happy with the change?

Thanks
Ian
Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports
On 02/07/2019 13:30, Sriram Vatala wrote:
> Hi Kevin,
>
> Please consider this as a gentle reminder regarding the below mail
> discussion and let me know your thoughts if any.
>

Hi Sriram,

I think your suggested approach is the best available at present, i.e.
use a more generic name and document the most likely cause.

I thought about something with rte_vhost_avail_entries() but I'm not
sure it's sufficient. You would need to re-write the OVS vhost send
code, and it seems to return 0 for multiple reasons.

thanks,
Kevin.

PS, I am on leave with limited email access, so best to discuss further
with Ian and Ilya if required.

> Thanks & Regards,
> Sriram.
>
> -----Original Message-----
> From: Sriram Vatala
> Sent: 26 June 2019 11:01
> To: 'Venkatesan Pradeep' ; 'Kevin Traynor' ;
> 'ovs-dev@openvswitch.org'
> Cc: 'Ilya Maximets'
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
> dpdk and vhostuser ports
>
> Hi,
>
> Agreed. As per suggestions, I would go ahead with a generic name for
> packet drops due to transmission failure (i.e. packets left
> un-transmitted in the output packet buffer argument after the call to
> the transmit API), and since the most likely reason for this to happen
> is when the transmit queue is full or has been filled up, I will
> highlight this reason in the documentation.
>
> Thanks & Regards,
> Sriram.
>
> -----Original Message-----
> From: Venkatesan Pradeep
> Sent: 24 June 2019 12:30
> To: Sriram Vatala ; 'Kevin Traynor'
> Cc: 'Ilya Maximets'
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
> dpdk and vhostuser ports
>
> Hi,
>
> Since the reason for introducing a new counter to track drops due to
> the tx queue being full was to quickly isolate that the problem was on
> the VM side rather than on OVS, having a generic counter and name
> wouldn't help. One would still need to have some knowledge of the code
> to understand that the likely reason for the tx failure was the queue
> being full. Short of having a new API that can return the failure
> code, we won't be able to say the drop reason for sure. If we want to
> use a generic name, perhaps we can update the documentation to
> highlight the likely reason.
>
> Thanks,
>
> Pradeep
>
> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org On Behalf Of Sriram Vatala via dev
> Sent: Wednesday, June 19, 2019 8:44 PM
> To: 'Kevin Traynor' ; ovs-dev@openvswitch.org
> Cc: 'Ilya Maximets'
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
> dpdk and vhostuser ports
>
> Hi Kevin,
>
> Thanks for your inputs. I will send the updated patch ASAP.
>
> Thanks & Regards,
> Sriram.
>
> -----Original Message-----
> From: Kevin Traynor
> Sent: 19 June 2019 19:07
> To: Sriram Vatala ; ovs-dev@openvswitch.org
> Cc: 'Ilya Maximets'
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
> dpdk and vhostuser ports
>
> On 19/06/2019 11:07, Sriram Vatala wrote:
>>
>> Hi Kevin,
>>
>> Thanks for reviewing the patch. Please find my answers to your
>> comments below.
>>
>> Comment-1
>> I'm not sure about this name, you would need to know that it was the
>> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
>> send pkts.
>> ---
>> Agree that queue full is not the only reason for which the above DPDK
>> APIs will fail to send packets. But queue full is the most common
>> reason. Instead of overlooking other reasons like an invalid queue-id,
>> I can change the name of the counter to a generic name, something like
>> "tx_failed_drops". What do you think?
>>
>
> Sounds good to me.
>
>> Comment-2
>> There can be other drops earlier in this function
>> ("__netdev_dpdk_vhost_send"), should they be logged also?
>> ---
>> These are the drops for an invalid queue-id or vhost device id, and
>> if the device is not up. Again, these are very unlikely to happen,
>> but I can add a rate-limited warning log.
>>
>
> Well I guess it doesn't invalidate your patches which provide drop
> stats for mtu/qos/qfull (or new name), so maybe it's ok to not log
> those other drops in the context of this patchset.
>
>> Comment-3
>>> rte_spinlock_lock(&dev->stats_lock);
>>> netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>>> +                                     dropped);
>>> +dev->stats.tx_mtu_drops += mtu_drops;
>>
>> There is a function just above for updating vhost tx stats. Now the
>> updates are split, with some updated in the function and some updated
>> here; it would be better to update them all in the same place.
>> ---
>> Agree. Will change the implementation here.
>>
>> Comment-4
>>>
>>> +dropped += mtu_drops + qos_drops + qfull_drops;
>>
>> = is enough, you don't need +=
>> ---
>> Actually in the code above this part in the "dpdk_do_tx_copy"
>> function, dropped will be updated if mbuf alloc fails.
>> Here is the code snippet:
>>
>> pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
>> if (OVS_UNL
Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?
On Tue, Jul 02, 2019 at 10:00:44AM -0700, Ben Pfaff wrote:
> Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
> student currently looking at packet classifiers.  He's implemented a
> novel classifier that is faster than the one already in OVS in the
> benchmarks that he's run.  His classifier is tree-based, like most
> high-performance classifiers, but also incremental so that flows can be
> inserted and deleted individually without undue delay.  Ultimately, it
> might make sense to replace the OVS userspace datapath classifier by one
> based on the concepts that he's come up with.
>
> A difficulty with classifiers, however, is coming up with an appropriate
> set of benchmarks to compare them fairly.  The userspace datapath
> focuses on performance, so do you have a set of benchmarks that you
> recommend for comparison?  Are there other criteria that would be
> important (besides correctness)?
>
> (I'd take answers from anyone, not just Ian and Ilya!)

Hi Ben,

We use 1M constant IPv4 flows, and 200k new and 200k retiring IPv4 flows
per second to compare as it seems to be close to some production
workloads.

fbl
Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?
On 7/2/2019 6:00 PM, Ben Pfaff wrote:

Hi Ilya and Ian. Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers. He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run. His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay. Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.

Thanks for the introduction Ben. I wasn't aware of Michal's work. The
timing is quite apt, as currently I've been looking at Harry's (CCd)
approach to improving performance for the current DPCLS.

https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358790.html

I was hoping to have this approach in place for OVS 2.12, as there's
been ongoing testing/reviews across the community with general
improvements of ~15% or more for both x86 and ARM architectures, which
looks promising. I'd be interested in seeing Michal's proposals; if
there are patches available that would be great.

A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly. The userspace datapath focuses
on performance, so do you have a set of benchmarks that you recommend
for comparison? Are there other criteria that would be important
(besides correctness)?

Agreed, that's something we've been looking for feedback on from the
community to date. Most testing we have seen has been performance
focused.

The areas I think would be good to measure are flow addition/flow
deletion overhead in CPU cycles. Also, memory and time complexity
comparisons for existing and proposed techniques might be nice to see.

Thanks
Ian

(I'd take answers from anyone, not just Ian and Ilya!)

Thanks,
Ben.
Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.
On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet
packets and encap with GRE header with outer IPv6 header. Encapsulation
of layer 3 packet over IPv6 GRE, ip6gre, is not supported yet.

I tested it by running:
  # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
  # make check TESTSUITEFLAGS='-k ip6gre'

On a Ubuntu 18.04 system with the distro provided 4.15.0-43-generic
kernel I get the following splat:

[ 510.428122] general protection fault: [#1] SMP PTI
[ 510.428192] Modules linked in: openvswitch(OE) tunnel6 nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev snd soundcore input_leds mac_hid serio_raw qemu_fw_cfg sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc qxl ttm
[ 510.428536] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops aesni_intel aes_x86_64 crypto_simd glue_helper cryptd drm psmouse i2c_piix4 virtio_net virtio_blk pata_acpi floppy
[ 510.428609] CPU: 0 PID: 35 Comm: kworker/0:1 Tainted: G OE 4.15.0-43-generic #46-Ubuntu
[ 510.428646] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 510.428727] Workqueue: ipv6_addrconf addrconf_dad_work
[ 510.428763] RIP: 0010:__gre6_xmit+0x11f/0x2c0 [openvswitch]
[ 510.428787] RSP: 0018:b1e840d8f3e8 EFLAGS: 00010286
[ 510.428820] RAX: ffea RBX: 9dfaa8a1 RCX:
[ 510.428848] RDX: af55e000 RSI: 0040 RDI: 9dfaae9cda00
[ 510.428875] RBP: b1e840d8f440 R08: 5865 R09: b1e840d8f464
[ 510.428903] R10: 2000 R11: 9dfab7002c00 R12: 9dfaae9cda00
[ 510.428930] R13: R14: b1e840d8f468 R15:
[ 510.428972] FS: () GS:9dfabfc0() knlGS:
[ 510.429003] CS: 0010 DS: ES: CR0: 80050033
[ 510.429950] CR2: 55d347881e60 CR3: 0001b760a005 CR4: 003606f0
[ 510.430860] DR0: DR1: DR2:
[ 510.431683] DR3: DR6: fffe0ff0 DR7: 0400
[ 510.432494] Call Trace:
[ 510.433318]  ip6gre_tunnel_xmit+0x103/0x530 [openvswitch]
[ 510.434083]  ? __wake_up_common+0x73/0x130
[ 510.434891]  ? __wake_up_common_lock+0x8e/0xc0
[ 510.435763]  __ip6gre_tunnel_xmit+0x12/0x20 [openvswitch]
[ 510.436562]  ovs_vport_send+0xd4/0x170 [openvswitch]
[ 510.437393]  do_output+0x53/0x160 [openvswitch]
[ 510.438221]  do_execute_actions+0x9a1/0x1880 [openvswitch]
[ 510.438968]  ? __wake_up_common+0x73/0x130
[ 510.439746]  ? __wake_up_common_lock+0x8e/0xc0
[ 510.440515]  ? __skb_flow_dissect+0xd29/0x1450
[ 510.441295]  ovs_execute_actions+0x4c/0x120 [openvswitch]
[ 510.442030]  ? ovs_execute_actions+0x4c/0x120 [openvswitch]
[ 510.442819]  ovs_dp_process_packet+0x95/0x150 [openvswitch]
[ 510.443649]  ? ovs_ct_update_key+0x85/0xe0 [openvswitch]
[ 510.92]  ovs_vport_receive+0x7e/0xd0 [openvswitch]
[ 510.445294]  ? select_idle_sibling+0x29/0x410
[ 510.446200]  ? select_task_rq_fair+0x637/0xab0
[ 510.446990]  ? __update_load_avg_se.isra.38+0x1c0/0x1d0
[ 510.447763]  ? try_to_wake_up+0x59/0x480
[ 510.448549]  internal_dev_xmit+0x28/0x60 [openvswitch]
[ 510.449356]  dev_hard_start_xmit+0xa8/0x210
[ 510.450173]  __dev_queue_xmit+0x690/0x7d0
[ 510.450908]  ? mroute6_socket+0x78/0xa0
[ 510.451704]  dev_queue_xmit+0x10/0x20
[ 510.452413]  ? dev_queue_xmit+0x10/0x20
[ 510.453101]  ip6_finish_output2+0x339/0x520
[ 510.453776]  ? ipv6_get_l4proto+0x96/0xe0 [nf_conntrack_ipv6]
[ 510.454429]  ip6_finish_output+0x13a/0x1b0
[ 510.455049]  ? ip6_finish_output+0x13a/0x1b0
[ 510.455662]  ip6_output+0x6c/0x110
[ 510.456285]  ? ip6_fragment+0x9f0/0x9f0
[ 510.456825]  NF_HOOK.constprop.34+0x45/0xb0
[ 510.457401]  ? ipv6_icmp_sysctl_init+0x40/0x40
[ 510.458022]  mld_sendpack+0x163/0x210
[ 510.458572]  mld_send_initial_cr.part.23+0x86/0xa0
[ 510.459214]  ipv6_mc_dad_complete+0x2b/0x40
[ 510.459744]  addrconf_dad_completed+0x302/0x370
[ 510.460268]  ? __switch_to_asm+0x40/0x70
[ 510.460768]  ? __switch_to_asm+0x34/0x70
[ 510.461272]  ? __switch_to_asm+0x40/0x70
[ 510.461766]  ? __switch_to_asm+0x34/0x70
[ 510.462271]  ? __switch_to_asm+0x40/0x70
[ 510.462738]  addrconf_dad_work+0x2b8/0x440 [
Re: [ovs-dev] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable
On Sun, Jun 30, 2019 at 06:16:05AM +, Eli Britstein wrote:
> Ethernet type 0x1234 is used for testing and not being offloadable. For
> testing offloadable features, warn about using this value.
>
> Signed-off-by: Eli Britstein
> ---
>  lib/dpif-netlink.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index ba80a0079..327076784 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct
> dpif_flow_put *put)
>
>      /* When we try to install a dummy flow from a probed feature. */
>      if (match.flow.dl_type == htons(0x1234)) {
> +        VLOG_WARN("eth 0x1234 is special and not offloadable");
>          return EOPNOTSUPP;
>      }

This should be rate-limited.

Are you sure that you want it to be a warning? It does not seem to be an
actual problem, so INFO or even DBG might be better.
[ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?
Hi Ilya and Ian. Please allow me to introduce Michal Orsak, a grad student currently looking at packet classifiers. He's implemented a novel classifier that is faster than the one already in OVS in the benchmarks that he's run. His classifier is tree-based, like most high-performance classifiers, but also incremental so that flows can be inserted and deleted individually without undue delay. Ultimately, it might make sense to replace the OVS userspace datapath classifier by one based on the concepts that he's come up with. A difficulty with classifiers, however, is coming up with an appropriate set of benchmarks to compare them fairly. The userspace datapath focuses on performance, so do you have a set of benchmarks that you recommend for comparison? Are there other criteria that would be important (besides correctness)? (I'd take answers from anyone, not just Ian and Ilya!) Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable
On 7/2/2019 4:02 PM, Van Haaren, Harry wrote: -Original Message- From: Stokes, Ian Sent: Tuesday, July 2, 2019 3:40 PM To: Van Haaren, Harry ; ovs-dev@openvswitch.org Cc: i.maxim...@samsung.com Subject: Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable On 5/15/2019 6:02 PM, Ian Stokes wrote: On 5/8/2019 4:13 PM, Harry van Haaren wrote: This allows plugging-in of different subtable hash-lookup-verify routines, and allows special casing of those functions based on known context (eg: # of bits set) of the specific subtable. Signed-off-by: Harry van Haaren --- v9: - Use count_1bits in favour of __builtin_popcount (Ilya) v6: - Implement subtable effort per packet "lookups_match" counter (Ilya) - Remove double newline (Eelco) - Remove doubel * before comments (Eelco) - Reword comments in dpcls_lookup() for clarity (Harry) --- lib/dpif-netdev.c | 135 +++--- 1 file changed, 93 insertions(+), 42 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5a6f2abac..e2e2c14e7 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level) /* Datapath Classifier. */ +/* forward declaration for lookup_func typedef */ Minor nit above, general rule of thumb for comment style, start with capital letter and finish with period. +struct dpcls_subtable; + +/* Lookup function for a subtable in the dpcls. This function is called + * by each subtable with an array of packets, and a bitmask of packets to + * perform the lookup on. Using a function pointer gives flexibility to + * optimize the lookup function based on subtable properties and the + * CPU instruction set available at runtime. 
+ */ +typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, + uint32_t keys_map, const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); Alignment for the arguments above looks odd, typically arguments are kept vertically in line with one another (similar to dpcls_subtable_lookup_generic below). + +/* Prototype for generic lookup func, using same code path as before */ Period at end of comment above. +uint32_t +dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, + uint32_t keys_map, + const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); + + One minor follow up, extra whitespace seems to have been added here, can be reduced to 1 new line. Will fix. + uint32_t pkts_matched = count_1bits(found_map); Just to clarify, at this point found_map has been returned and it only contains matches where packets were found? i.e. it doesn't contain matches from possible hash collisions? + lookups_match += pkts_matched * subtable_pos; Hi Harry, can you clarify above why '* subtable_pos' is used? Is that right? I'm just thinking that you already have the number of packets matched from the count_1bits(found_map). The "lookups match" counter variable name is a bit misleading, but I'd rather not change it in this patchset, as it's orthogonal to the DPCLS function pointer concept. The counter counts the "number of subtables searched per hit packet", so to speak. See Ilya's input and my response here: https://www.mail-archive.com/ovs-dev@openvswitch.org/msg31442.html This is pseudo code of the expected behavior: 1st subtable packet hits are += 1, 2nd subtable packet hits are += 2, 3rd subtable packet hits are += 3, etc. At the end of a burst, we have a bitmask of packets hit, and a counter for "subtable effort per packet matched". That two experienced OVS folks asked ~ the same question hints that it's time for a cleanup & refactor, perhaps even a /* descriptive comment */ :) Ya, that makes more sense; it confirms what I suspected :).
Sure a description of the purpose would help here in the next revision. Thanks Ian ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv13] netdev-afxdp: add new netdev type for AF_XDP.
On 28.06.2019 19:37, William Tu wrote: >> >> >> One more thing I noticed is the same issue as you had with completion queue, >> but >> with rx queue. When I'm trying to send traffic from 2 threads to the same >> port, > > Do the 2 threads send traffic using afxdp tx? Yes. > >> I'm starting receiving same pointers from rx ring. Not only the same ring >> entries, >> but there were cases where two identical pointers were stored sequentially in >> the rx ring. > > I use a similar way as used in the completion queue (assign UINT64_MAX to the rx ring > at netdev_afxdp_rxq_recv) but do not see any identical pointers. > >> I'm more and more thinking that it's a kernel/libbpf bug. The last bit that >> is left >> for checking is the pointers inside the fill queue. All other parts in OVS >> seem to >> work correctly. I'll send more information about the testcase later after >> re-checking >> with the most recent bpf-next. >> > > Looking forward to your investigation! Thanks a lot. It was a kernel bug: the generic receive path doesn't have any locks, so generic receive could be triggered from different cores at the same time, breaking the rx and fill queues. I tried to run 2 traffic flows over the veth pair, one side of which was opened by netdev-afxdp in OVS. And OVS constantly crashed because two kernel threads tried to allocate the same addresses from the fill queue and pushed them to the rx queue. That is the root cause of the duplicated addresses in the RX queue. Data in these descriptors was most probably corrupted too. I've sent a patch for this issue: https://lore.kernel.org/bpf/20190702143634.19688-1-i.maxim...@samsung.com/ I'm still having some trouble with this scenario. Sometimes the traffic simply stops flowing. But this seems like a different issue. Most likely, one more kernel issue... However, OVS doesn't crash for me anymore. And this is good news.
- Full testcase description -

ip netns add at_ns0
ip netns add at_ns1

ip link add p0 type veth peer name patch-p0
ethtool -K p0 tx off rxvlan off txvlan off
ip link set p0 netns at_ns0
ip link set dev patch-p0 up
ip link set dev patch-p0 promisc on
ip netns exec at_ns0 ip addr add "10.1.1.1/24" dev p0
ip netns exec at_ns0 ip link set dev p0 up

ip link add p1 type veth peer name patch-p1
ethtool -K p1 tx off rxvlan off txvlan off
ip link set p1 netns at_ns1
ip link set dev patch-p1 up
ip link set dev patch-p1 promisc on
ip netns exec at_ns1 ip addr add "10.1.1.2/24" dev p1
ip netns exec at_ns1 ip link set dev p1 up

# up the internal port of ovs bridge
ip link set dev br0 up
ip addr add dev br0 10.1.1.13/24

[shell#1] ip netns exec at_ns1 iperf3 -s
[shell#2] ip netns exec at_ns1 iperf3 -s -p 5008
[shell#3] ip netns exec at_ns0 iperf3 -c 10.1.1.2 -t 3600
[shell#4] iperf3 -c 10.1.1.2 -t 3600 -p 5008   # Works via internal port.

- For this testcase to work you need the 'skb_unclone' patch applied in the kernel, otherwise TCP traffic will not flow. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable
> -Original Message- > From: Stokes, Ian > Sent: Tuesday, July 2, 2019 3:40 PM > To: Van Haaren, Harry ; ovs-dev@openvswitch.org > Cc: i.maxim...@samsung.com > Subject: Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function > pointers/subtable > > On 5/15/2019 6:02 PM, Ian Stokes wrote: > > On 5/8/2019 4:13 PM, Harry van Haaren wrote: > >> This allows plugging-in of different subtable hash-lookup-verify > >> routines, and allows special casing of those functions based on > >> known context (eg: # of bits set) of the specific subtable. > >> > >> Signed-off-by: Harry van Haaren > >> > >> --- > >> > >> v9: > >> - Use count_1bits in favour of __builtin_popcount (Ilya) > >> > >> v6: > >> - Implement subtable effort per packet "lookups_match" counter (Ilya) > >> - Remove double newline (Eelco) > >> - Remove doubel * before comments (Eelco) > >> - Reword comments in dpcls_lookup() for clarity (Harry) > >> --- > >> lib/dpif-netdev.c | 135 +++--- > >> 1 file changed, 93 insertions(+), 42 deletions(-) > >> > >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > >> index 5a6f2abac..e2e2c14e7 100644 > >> --- a/lib/dpif-netdev.c > >> +++ b/lib/dpif-netdev.c > >> @@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level) > >> > >> /* Datapath Classifier. */ > >> +/* forward declaration for lookup_func typedef */ > > > > Minor nit above, general rule of thumb for comment style, start with > > capital letter and finish with period. > > > >> +struct dpcls_subtable; > >> + > >> +/* Lookup function for a subtable in the dpcls. This function is called > >> + * by each subtable with an array of packets, and a bitmask of > >> packets to > >> + * perform the lookup on. Using a function pointer gives flexibility to > >> + * optimize the lookup function based on subtable properties and the > >> + * CPU instruction set available at runtime. 
> >> + */ > >> +typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable > >> *subtable, > >> + uint32_t keys_map, const struct netdev_flow_key *keys[], > >> + struct dpcls_rule **rules); > > > > Alignment for the arguments above looks odd, typically arguments are > > kept vertically in line with one another (similar to > > dpcls_subtable_lookup_generic below). > > > >> + > >> +/* Prototype for generic lookup func, using same code path as before */ > > > > Period at end of comment above. > > > >> +uint32_t > >> +dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, > >> + uint32_t keys_map, > >> + const struct netdev_flow_key *keys[], > >> + struct dpcls_rule **rules); > >> + > >> + > > One minor follow up, extra whitespace seems to have been added here, can > be reduced to 1 new line. Will fix. > > > >> + uint32_t pkts_matched = count_1bits(found_map); > > > > Just to clarify, at this point found_map has been returned and it only > > contains matches where packets were found? i.e. it doesn't contain > > matches from possible hash collisions? > > > >> + lookups_match += pkts_matched * subtable_pos; > > Hi Harry, can you clarify above why '* subtable_pos' is used? Is that > right? I'm just thinking that you already have the number of packets > matched from the count_1bits(found_map). The "lookups match" counter variable name is a bit misleading, but I'd rather not change it in this patchset, as it's orthogonal to the DPCLS function pointer concept. The counter counts the "number of subtables searched per hit packet", so to speak. See Ilya's input and my response here: https://www.mail-archive.com/ovs-dev@openvswitch.org/msg31442.html This is pseudo code of the expected behavior: 1st subtable packet hits are += 1, 2nd subtable packet hits are += 2, 3rd subtable packet hits are += 3, etc. At the end of a burst, we have a bitmask of packets hit, and a counter for "subtable effort per packet matched".
That two experienced OVS folks asked ~ the same question hints that it's time for a cleanup & refactor, perhaps even a /* descriptive comment */ :) Thanks for review / comments! -Harry ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable
On 5/15/2019 6:02 PM, Ian Stokes wrote: On 5/8/2019 4:13 PM, Harry van Haaren wrote: This allows plugging-in of different subtable hash-lookup-verify routines, and allows special casing of those functions based on known context (eg: # of bits set) of the specific subtable. Signed-off-by: Harry van Haaren --- v9: - Use count_1bits in favour of __builtin_popcount (Ilya) v6: - Implement subtable effort per packet "lookups_match" counter (Ilya) - Remove double newline (Eelco) - Remove doubel * before comments (Eelco) - Reword comments in dpcls_lookup() for clarity (Harry) --- lib/dpif-netdev.c | 135 +++--- 1 file changed, 93 insertions(+), 42 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5a6f2abac..e2e2c14e7 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level) /* Datapath Classifier. */ +/* forward declaration for lookup_func typedef */ Minor nit above, general rule of thumb for comment style, start with capital letter and finish with period. +struct dpcls_subtable; + +/* Lookup function for a subtable in the dpcls. This function is called + * by each subtable with an array of packets, and a bitmask of packets to + * perform the lookup on. Using a function pointer gives flexibility to + * optimize the lookup function based on subtable properties and the + * CPU instruction set available at runtime. + */ +typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, + uint32_t keys_map, const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); Alignment for the arguments above looks odd, typically arguments are kept vertically in line with one another *similar to dpcls_subtable_lookup_generic below). + +/* Prototype for generic lookup func, using same code path as before */ Period at end of comment above. 
+uint32_t +dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, + uint32_t keys_map, + const struct netdev_flow_key *keys[], + struct dpcls_rule **rules); + + One minor follow up, extra whitespace seems to have been added here, can be reduced to 1 new line. /* A set of rules that all have the same fields wildcarded. */ struct dpcls_subtable { /* The fields are only used by writers. */ @@ -7581,6 +7602,13 @@ struct dpcls_subtable { struct cmap rules; /* Contains "struct dpcls_rule"s. */ uint32_t hit_cnt; /* Number of match hits in subtable in current optimization interval. */ + + /* the lookup function to use for this subtable. If there is a known + * property of the subtable (eg: only 3 bits of miniflow metadata is + * used for the lookup) then this can point at an optimized version of + * the lookup function for this particular subtable. */ the -> The above. + dpcls_subtable_lookup_func lookup_func; + struct netdev_flow_key mask; /* Wildcards for fields (const). */ /* 'mask' must be the last field, additional space is allocated here. */ }; @@ -7640,6 +7668,10 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask) cmap_init(&subtable->rules); subtable->hit_cnt = 0; netdev_flow_key_clone(&subtable->mask, mask); + + /* decide which hash/lookup/verify function to use */ + subtable->lookup_func = dpcls_subtable_lookup_generic; So by default dpcls_subtable_lookup_generic is set as the lookup function. Makes sense as this is the only lookup available at this stage. I assume a mechanism to select a different lookup is introduced later in the series. Or by default does the lookup always start off as the generic option when a subtable is created even when other options are possible?
+ cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash); /* Add the new subtable at the end of the pvector (with no hits yet) */ pvector_insert(&cls->subtables, subtable, 0); @@ -7800,6 +7832,53 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule, return true; } +uint32_t +dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, + uint32_t keys_map, + const struct netdev_flow_key *keys[], + struct dpcls_rule **rules) +{ + int i; + /* Compute hashes for the remaining keys. Each search-key is + * masked with the subtable's mask to avoid hashing the wildcarded + * bits. */ + uint32_t hashes[NETDEV_MAX_BURST]; + ULLONG_FOR_EACH_1(i, keys_map) { + hashes[i] = netdev_flow_key_hash_in_mask(keys[i], + &subtable->mask); + } + + /* Lookup. */ + const struct cmap_node *no
Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports
Hi Kevin, Please consider this as a gentle reminder regarding the mail discussion below and let me know your thoughts if any. Thanks & Regards, Sriram. -Original Message- From: Sriram Vatala Sent: 26 June 2019 11:01 To: 'Venkatesan Pradeep' ; 'Kevin Traynor' ; 'ovs-dev@openvswitch.org' Cc: 'Ilya Maximets' Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports Hi, Agreed. As per the suggestions, I would go ahead with a generic name for packet drops due to transmission failure (i.e. packets left un-transmitted in the output packet buffer argument after the call to the transmit API), and since the most likely reason for this to happen is the transmit queue being full or having been filled up, I will highlight this reason in the documentation. Thanks & Regards, Sriram. -Original Message- From: Venkatesan Pradeep Sent: 24 June 2019 12:30 To: Sriram Vatala ; 'Kevin Traynor' Cc: 'Ilya Maximets' Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports Hi, Since the reason for introducing a new counter to track drops due to the tx queue being full was to quickly isolate that the problem was on the VM side rather than on OVS, having a generic counter and name wouldn't help. One would still need some knowledge of the code to understand that the likely reason for the tx failure was the queue being full. Short of having a new API that can return the failure code, we won't be able to say for sure what the drop reason is. If we want to use a generic name, perhaps we can update the documentation to highlight the likely reason. Thanks, Pradeep -Original Message- From: ovs-dev-boun...@openvswitch.org On Behalf Of Sriram Vatala via dev Sent: Wednesday, June 19, 2019 8:44 PM To: 'Kevin Traynor' ; ovs-dev@openvswitch.org Cc: 'Ilya Maximets' Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports Hi Kevin, Thanks for your inputs. I will send the updated patch ASAP. Thanks & Regards, Sriram.
-Original Message- From: Kevin Traynor Sent: 19 June 2019 19:07 To: Sriram Vatala ; ovs-dev@openvswitch.org Cc: 'Ilya Maximets' Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports On 19/06/2019 11:07, Sriram Vatala wrote: > > Hi Kevin, > > Thanks for reviewing the patch. Please find my answers to your > comments below. > > Comment-1 > I'm not sure about this name, you would need to know that it was the > only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not > send pkts. > --- > Agree that queue full is not the only reason for which the above DPDK > apis will fail to send packets. But queue full is the most common reason. > Instead of overlooking other reasons like invalid queue-id, i can > change the name of the counter to a generic name something like > "tx_failed_drops". > what do you think? > Sounds good to me. > Comment-2 > There can be other drops earlier in this function > ("__netdev_dpdk_vhost_send"), should they be logged also? > --- > These are the drops for invalid queue-id or vhost device id and if > device is not up. > Again these are very unlikely to happen, but i can add a rate limiting > warn log. > Well I guess it doesn't invalidate your patches which provides drop stats for mtu/qos/qfull(or new name), so maybe it's ok to not log those other drops in the context of this patchset. > Comment-3 >> rte_spinlock_lock(&dev->stats_lock); >> netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, >> total_pkts, >> + dropped); >> +dev->stats.tx_mtu_drops += mtu_drops; > > There is a function just above for updating vhost tx stats. Now the > updates are split with some updated in the function and some updated > here, it would be better to update them all in the same place. > --- > Agree. will change the implementation here. 
> > Comment-4 >> >> +dropped += mtu_drops + qos_drops + qfull_drops; > > = is enough, you don't need += > --- > Actually in the code above to this part in "dpdk_do_tx_copy" function, > dropped will updated if mbuf alloc fails. > Here is the code snippet: > > > pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp); > if (OVS_UNLIKELY(!pkts[txcnt])) { > dropped += cnt - i; > break; > } > > To take care not to miss this in total drops, i am using dropped+=, > Even if the above part of the code doesn't hit, as dropped variable is > initialized to Zero that expression should not result in incorrect > value for drops. > Yes, you're right, I missed it, thanks. > > Comment-5 > >> +> + >> filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-iface-st >> ats >> + > > Probably better to be consistent and not wrap this line > --- > Ok. I wrapped this line as utilities/checkpatch script is giving > warning as the line length is exceeding 79 characters. I will unwrap this. > > > > Comment-6 > >> I think
Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing]Re: [PATCH v2 1/3] netdev: Dynamic per-port Flow API.
On 2019-07-02 2:23 PM, Roi Dayan wrote: >>> Regarding deprecation, I'd like to remove all the functionality from >>> ovs-dpctl >>> utility keeping only ability to remove system datapaths, since this, IMHO, >>> is the only useful function. >>> Imagine some flows are in TC and some in OVS. I guess dpctl will only show the OVS ones. >>> Yes, and this is documented. >> I missed the documentation but bisected to find the offending commit. I >> think it's better to deprecate as you suggested above, maybe with some >> message to refer users to the documentation. > was added when u try to use the type arg. > > # ovs-dpctl dump-flows type=tc > ovs-dpctl: Invalid argument 'type'. Use 'ovs-appctl dpctl/dump-flows' instead. > > just output the msg all the time would have been annoying. > Hi Ilya, I didn't understand at first what Eli intended but I do now. Eli suggests that ovs-dpctl always output a deprecation notice to use ovs-appctl instead, so all users will see it and start migrating to ovs-appctl. Is that acceptable? It could be done in any ovs-dpctl command or in specific ones like dump-flows, but printed always, not only when the type arg is used. Thanks, Roi ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing]Re: [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable
On 2019-07-02 2:33 PM, Roi Dayan wrote: > > > On 2019-07-02 2:21 PM, Roi Dayan wrote: >> >> >> On 2019-06-30 9:16 AM, Eli Britstein wrote: >>> Ethernet type 0x1234 is used for testing and not being offloadable. For >>> testing offloadable features, warn about using this value. >>> >>> Signed-off-by: Eli Britstein >>> --- >>> lib/dpif-netlink.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c >>> index ba80a0079..327076784 100644 >>> --- a/lib/dpif-netlink.c >>> +++ b/lib/dpif-netlink.c >>> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct >>> dpif_flow_put *put) >>> >>> /* When we try to install a dummy flow from a probed feature. */ >>> if (match.flow.dl_type == htons(0x1234)) { >>> +VLOG_WARN("eth 0x1234 is special and not offloadable"); >>> return EOPNOTSUPP; >>> } >>> >>> >> >> why a warn is needed? > > nm. somehow skipped your commit msg. > did u check by any chance if unit tests won't fail as they need to filter > this from the log? > tested here. didn't need to filter it. https://travis-ci.org/roidayan/ovs/builds/553249802 Acked-by: Roi Dayan >> >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable
On 2019-06-30 9:16 AM, Eli Britstein wrote: > Ethernet type 0x1234 is used for testing and not being offloadable. For > testing offloadable features, warn about using this value. > > Signed-off-by: Eli Britstein > --- > lib/dpif-netlink.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index ba80a0079..327076784 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct > dpif_flow_put *put) > > /* When we try to install a dummy flow from a probed feature. */ > if (match.flow.dl_type == htons(0x1234)) { > +VLOG_WARN("eth 0x1234 is special and not offloadable"); > return EOPNOTSUPP; > } > > why a warn is needed? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable
On 2019-07-02 2:21 PM, Roi Dayan wrote: > > > On 2019-06-30 9:16 AM, Eli Britstein wrote: >> Ethernet type 0x1234 is used for testing and not being offloadable. For >> testing offloadable features, warn about using this value. >> >> Signed-off-by: Eli Britstein >> --- >> lib/dpif-netlink.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c >> index ba80a0079..327076784 100644 >> --- a/lib/dpif-netlink.c >> +++ b/lib/dpif-netlink.c >> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct >> dpif_flow_put *put) >> >> /* When we try to install a dummy flow from a probed feature. */ >> if (match.flow.dl_type == htons(0x1234)) { >> +VLOG_WARN("eth 0x1234 is special and not offloadable"); >> return EOPNOTSUPP; >> } >> >> > > why a warn is needed? nm. somehow skipped your commit msg. did u check by any chance if unit tests won't fail as they need to filter this from the log? > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [Suspected-Phishing]Re: [PATCH v2 1/3] netdev: Dynamic per-port Flow API.
On 2019-07-01 1:53 PM, Eli Britstein wrote: > > On 7/1/2019 1:46 PM, Ilya Maximets wrote: >> On 01.07.2019 13:24, Eli Britstein wrote: >>> On 7/1/2019 1:13 PM, Ilya Maximets wrote: On 30.06.2019 7:47, Eli Britstein wrote: > This patch breaks ovs-dpctl dump-flows, when using TC offloads (kernel). > > I added a print in netdev_flow_dump_create, and flow_api is NULL when > invoking ovs-dpctl dump-flows. > > I think new netdev objects are created for the ports (netdev_open), but > not properly initialized. > > Could you please have a look? > Hi. This is a known thing: ovs-dpctl is no longer suitable for dumping of HW offloaded flows. Please, use 'ovs-appctl dpctl/dump-flows' instead. There was a patch for documentation in v4 of the patch set: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359013.html Here is the relevant thread: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358985.html However, I missed updating the NEWS about that change. Will send the patch for this soon. Best regards, Ilya Maximets. >>> If that is so, don't you think this should be either fixed or deprecated? >> Fixing seems not possible, because ovs-dpctl is a standalone application >> that works only with default configuration and HW offloading is disabled >> by default.
>> >> Regarding deprecation, I'd like to remove all the functionality from >> ovs-dpctl >> utility keeping only ability to remove system datapaths, since this, IMHO, >> is the only useful function. >> >>> Imagine some flows are in TC and some in OVS. I guess dpctl will only >>> show the OVS ones. >> Yes, and this is documented. > I missed the documentation but bisected to find the offending commit. I > think it's better to deprecate as you suggested above, maybe with some > message to refer users to the documentation. was added when u try to use the type arg. # ovs-dpctl dump-flows type=tc ovs-dpctl: Invalid argument 'type'. Use 'ovs-appctl dpctl/dump-flows' instead. just output the msg all the time would have been annoying. >> >>> I know about that appctl, but at least currently it misses some >>> information in the dump like "dp" and "offloaded". >> appctl and dpctl work through the same code in lib/dpctl.c. So, everything >> is supported. If you don't see those fields, increase the verbosity with >> the '--more' argument. > Right. Sorry. >> >> Best regards, Ilya Maximets. > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] vswitchd: Always cleanup userspace datapath.
On 01.07.2019 19:52, Ben Pfaff wrote: > On Thu, Jun 27, 2019 at 08:24:46PM +0300, Ilya Maximets wrote: >> On 26.06.2019 21:27, Ben Pfaff wrote: >>> On Tue, Jun 25, 2019 at 01:12:11PM +0300, Ilya Maximets wrote: The 'netdev' datapath is implemented within the ovs-vswitchd process and cannot exist without it, so it should be gracefully terminated with a full cleanup of resources upon ovs-vswitchd exit. This change forces dpif cleanup for the 'netdev' datapath regardless of passing '--cleanup' to 'ovs-appctl exit'. Such a solution allows not passing this additional option every time for userspace datapath installations and also allows not terminating the system datapath in setups where both datapaths run at the same time. An exception is made for 'internal' ports that could have user ip/route configuration. These ports will not be removed without '--cleanup'. This change fixes OVS disappearing from the DPDK point of view (keeping HW NICs improperly configured, sudden closing of vhost-user connections) and will help with clearing Linux devices with the upcoming AF_XDP netdev support. Signed-off-by: Ilya Maximets >>> >>> I'm having trouble figuring out what the critical step is in the >>> destruction process that this enables or disables. It controls whether >>> dpif_port_del() gets called. There's a lot of stuff under >>> dpif_port_del(), most of it indirect. I'm not sure which bit is the >>> important one. Would you mind explaining what it is as part of the >>> commit message? >> >> """ >> The main part is that dpif_port_del() will lead to netdev_close() >> and subsequent netdev_class->destroy(dev), which will stop HW NICs >> and free their resources. For vhost-user interfaces it will invoke >> vhost driver unregistering with a properly closed vhost-user >> connection. For the upcoming AF_XDP netdev this will allow gracefully >> destroying xdp sockets and unloading xdp programs from Linux interfaces.
>> Another important thing is that port deletion will also trigger >> flushing of flows offloaded to HW NICs. >> """ >> >> Does the above shed some light on the main goals of this patch? >> I could add this information to the commit message while applying the >> patch. > > Thanks. It helps a lot. Please add it to the commit message. > > Acked-by: Ben Pfaff Thanks, Ben, Flavio and William! I added the above information to the commit message and pushed the patch to master. Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] netdev-offload-tc: Fix requesting match on wildcarded vlan tpid.
On 19.06.2019 12:16, Ilya Maximets wrote: > 'mask' must be checked first before configuring key in flower. > > CC: Eli Britstein > Fixes: 0b0a84783cd6 ("netdev-tc-offloads: Support match on priority tags") > Signed-off-by: Ilya Maximets > --- > lib/netdev-offload-tc.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) Thanks, Roi and Eli! Applied to master. Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] NEWS: Update regarding dumping HW offloaded flows.
On 01.07.2019 13:52, Ilya Maximets wrote: > NEWS update was missed while updating docs for dynamic Flow API. > Since this is a user visible change, it should be mentioned here. > > Fixes: d74ca2269e36 ("dpctl: Update docs about dump-flows and HW offloading.") > Signed-off-by: Ilya Maximets > --- > NEWS | 2 ++ > 1 file changed, 2 insertions(+) > Thanks, Roi and Eli! Applied to master. Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH v3 3/3] ovn: Send GARP for router port IPs of a router port connected to bridged logical switch
On Mon, Jul 1, 2019 at 9:45 AM wrote: > > From: Numan Siddique > > This patch handles sending GARPs for > > - router port IPs of a distributed router port > > - router port IPs of a router port which belongs to gateway router >(with the option - redirect-chassis set in Logical_Router.options) > > Signed-off-by: Numan Siddique I tested the patches on my setup and the code looks good to me. A bit unrelated to this change, at least for me the ovn_port_update_sbrec is getting quite hard to follow. Maybe we can consider refactoring it a bit in the future so different functionalities are handled by smaller functions (i.e., router-port, switch-port connected to vif, switch-port connected to router). Acked-by: Dumitru Ceara > --- > ovn/northd/ovn-northd.c | 44 > tests/ovn.at| 89 +++-- > 2 files changed, 105 insertions(+), 28 deletions(-) > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index e0af234f8..e8cbc3534 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -1983,9 +1983,23 @@ get_nat_addresses(const struct ovn_port *op, size_t *n) > } > } else { > /* Centralized NAT rule, either on gateway router or distributed > - * router. */ > -ds_put_format(&c_addresses, " %s", nat->external_ip); > -central_ip_address = true; > + * router. > + * Check if external_ip is same as router ip. If so, then there > + * is no need to add this to the nat_addresses. The router IPs > + * will be added separately. */ > +bool is_router_ip = false; > +for (size_t j = 0; j < op->lrp_networks.n_ipv4_addrs; j++) { > +if (!strcmp(nat->external_ip, > +op->lrp_networks.ipv4_addrs[j].addr_s)) { > +is_router_ip = true; > +break; > +} > +} > + > +if (!is_router_ip) { > +ds_put_format(&c_addresses, " %s", nat->external_ip); > +central_ip_address = true; > +} > } > } > > @@ -2531,13 +2545,26 @@ ovn_port_update_sbrec(struct northd_context *ctx, > * - op->peer has 'reside-on-gateway-chassis' set and the > *the logical router datapath has distributed router port. 
> * > + * - op->peer is distributed gateway router port. > + * > + * - op->peer's router is a gateway router and op has a localnet > + *port. > + * > * Note: Port_Binding.nat_addresses column is also used for > * sending the GARPs for the router port IPs. > * */ > +bool add_router_port_garp = false; > if (op->peer && op->peer->nbrp && op->peer->od->l3dgw_port && > op->peer->od->l3redirect_port && > -smap_get_bool(&op->peer->nbrp->options, > - "reside-on-redirect-chassis", false)) { > +(smap_get_bool(&op->peer->nbrp->options, > + "reside-on-redirect-chassis", false) || > +op->peer == op->peer->od->l3dgw_port)) { > +add_router_port_garp = true; > +} else if (chassis && op->od->localnet_port) { > +add_router_port_garp = true; > +} > + > +if (add_router_port_garp) { > struct ds garp_info = DS_EMPTY_INITIALIZER; > ds_put_format(&garp_info, "%s", op->peer->lrp_networks.ea_s); > for (size_t i = 0; i < op->peer->lrp_networks.n_ipv4_addrs; > @@ -2545,8 +2572,11 @@ ovn_port_update_sbrec(struct northd_context *ctx, > ds_put_format(&garp_info, " %s", > > op->peer->lrp_networks.ipv4_addrs[i].addr_s); > } > -ds_put_format(&garp_info, " is_chassis_resident(%s)", > - op->peer->od->l3redirect_port->json_key); > + > +if (op->peer->od->l3redirect_port) { > +ds_put_format(&garp_info, " is_chassis_resident(%s)", > + op->peer->od->l3redirect_port->json_key); > +} > > sbrec_port_binding_update_nat_addresses_addvalue( > op->sb, ds_cstr(&garp_info)); > diff --git a/tests/ovn.at b/tests/ovn.at > index ea627e128..2e266d94a 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -6730,6 +6730,9 @@ AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) > AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) > AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) > > +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0 > +OVS_WAIT_UNTIL([test 1 = `as hv1 ovs-vsctl show | \ > +grep "Por
Re: [ovs-dev] [PATCH v4 0/3] vhost tx retry updates
On 2 Jul 2019, at 2:32, Kevin Traynor wrote: > v4: > - 1/2 New patch: Move vhost tx retries doc to a separate section (David) > > - 2/3 > -- Changed tx_retries to be a custom stat for vhost (Ilya) > -- Added back in MIN() that was dropped in v2, as in retesting I >saw it is needed when the retry limit is reached to prevent >an accounting error > -- note: this is not a typo, it was just out of date > -/* 44 pad bytes here. */ > +/* 4 pad bytes here. */ > -- Removed acks due to the amount of changes made > > > - 3/3 > -- Added link between doc sections on vhost tx retries >and config (Ian) > -- Fixed spacing, fixed typo and added default in docs (Ilya) > -- Some minor renaming due to 2/3 changes > -- Changed config option name from "vhost-tx-retries" to "tx-retries-max" >in order to have a generic user-facing name in case it is ever extended >to other interfaces > -- Other minor edits > -- Kept acks as the operation didn't really change Changes look good to me, so ack for the series. Acked-by: Eelco Chaudron > v3: > - 2/4 really updated text to NETDEV_MAX_BURST this time > - 4/4 comments from Flavio. Made max retries limit 32. > > v2: > - 1/4 - split out minor fix from docs patch > - 2/4 - updated text, NETDEV_MAX_BURST, QEMU/libvirt versions > - 3/4 - removed unneeded MIN() > - 4/4 - explicitly range check, removed unneeded tx_counters fn changes > > Kevin Traynor (3): > doc: Move vhost tx retry info to separate section. > netdev-dpdk: Add custom stat for vhost tx retries. > netdev-dpdk: Enable vhost-tx-retries config. > > Documentation/topics/dpdk/vhost-user.rst | 105 +++ > lib/netdev-dpdk.c| 75 +++- > vswitchd/vswitch.xml | 12 +++ > 3 files changed, 152 insertions(+), 40 deletions(-) > > -- > 2.20.1
[ovs-dev] [PATCH 4/4] OVN: Add IGMP tests
Signed-off-by: Dumitru Ceara --- tests/ovn.at | 270 ++ 1 file changed, 270 insertions(+) diff --git a/tests/ovn.at b/tests/ovn.at index daf85a5..aa8c46f 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -14017,3 +14017,273 @@ ovn-hv4-0 OVN_CLEANUP([hv1], [hv2], [hv3]) AT_CLEANUP + +AT_SETUP([ovn -- IGMP snoop/querier]) +AT_SKIP_IF([test $HAVE_PYTHON = no]) +ovn_start + +# Logical network: +# Two independent logical switches (sw1 and sw2). +# sw1: +# - subnet 10.0.0.0/8 +# - 2 ports bound on hv1 (sw1-p11, sw1-p12) +# - 2 ports bound on hv2 (sw1-p21, sw1-p22) +# sw2: +# - subnet 20.0.0.0/8 +# - 1 port bound on hv1 (sw2-p1) +# - 1 port bound on hv2 (sw2-p2) +# - IGMP Querier from 20.0.0.254 + +reset_pcap_file() { +local iface=$1 +local pcap_file=$2 +ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ +options:rxq_pcap=dummy-rx.pcap +rm -f ${pcap_file}*.pcap +ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \ +options:rxq_pcap=${pcap_file}-rx.pcap +} + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +# +# send_igmp_v3_report INPORT HV ETH_SRC IP_SRC IP_CSUM GROUP REC_TYPE +# IGMP_CSUM OUTFILE +# +# This shell function causes an IGMPv3 report to be received on INPORT of HV. +# The packet's content has Ethernet destination 01:00:5E:00:00:22 and source +# ETH_SRC (exactly 12 hex digits). Ethernet type is set to IP. +# GROUP is the IP multicast group to be joined/to leave (based on REC_TYPE). +# REC_TYPE == 04: join GROUP +# REC_TYPE == 03: leave GROUP +# The packet hexdump is also stored in OUTFILE. 
+# +send_igmp_v3_report() { +local inport=$1 hv=$2 eth_src=$3 ip_src=$4 ip_chksum=$5 group=$6 +local rec_type=$7 igmp_chksum=$8 outfile=$9 + +local eth_dst=01005e16 +local ip_dst=$(ip_to_hex 224 0 0 22) +local ip_ttl=01 +local ip_ra_opt=9404 + +local igmp_type=2200 +local num_rec=0001 +local aux_dlen=00 +local num_src= + +local eth=${eth_dst}${eth_src}0800 +local ip=46c000284000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst}${ip_ra_opt} +local igmp=${igmp_type}${igmp_chksum}${num_rec}${rec_type}${aux_dlen}${num_src}${group} +local packet=${eth}${ip}${igmp} + +echo ${packet} >> ${outfile} +as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet} +} + +# +# store_igmp_v3_query ETH_SRC IP_SRC IP_CSUM OUTFILE +# +# This shell function builds an IGMPv3 general query from ETH_SRC and IP_SRC +# and stores the hexdump of the packet in OUTFILE. +# +store_igmp_v3_query() { +local eth_src=$1 ip_src=$2 ip_chksum=$3 outfile=$4 + +local eth_dst=01005e01 +local ip_dst=$(ip_to_hex 224 0 0 1) +local ip_ttl=01 +local igmp_type=11 +local max_resp=0a +local igmp_chksum=eeeb +local addr= + +local eth=${eth_dst}${eth_src}0800 +local ip=45204000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst} +local igmp=${igmp_type}${max_resp}${igmp_chksum}${addr}000a +local packet=${eth}${ip}${igmp} + +echo ${packet} >> ${outfile} +} + +# +# send_ip_multicast_pkt INPORT HV ETH_SRC ETH_DST IP_SRC IP_DST IP_LEN +#IP_PROTO DATA OUTFILE +# +# This shell function causes an IP multicast packet to be received on INPORT +# of HV. +# The hexdump of the packet is stored in OUTFILE. 
+# +send_ip_multicast_pkt() { +local inport=$1 hv=$2 eth_src=$3 eth_dst=$4 ip_src=$5 ip_dst=$6 +local ip_len=$7 ip_chksum=$8 proto=$9 data=${10} outfile=${11} + +local ip_ttl=20 + +local eth=${eth_dst}${eth_src}0800 +local ip=45${ip_len}95f14000${ip_ttl}${proto}${ip_chksum}${ip_src}${ip_dst} +local packet=${eth}${ip}${data} + +as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet} +echo ${packet} >> ${outfile} +} + +ovn-nbctl ls-add sw1 +ovn-nbctl ls-add sw2 + +ovn-nbctl lsp-add sw1 sw1-p11 +ovn-nbctl lsp-add sw1 sw1-p12 +ovn-nbctl lsp-add sw1 sw1-p21 +ovn-nbctl lsp-add sw1 sw1-p22 +ovn-nbctl lsp-add sw2 sw2-p1 +ovn-nbctl lsp-add sw2 sw2-p2 + +net_add n1 +sim_add hv1 +as hv1 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.1 +ovs-vsctl -- add-port br-int hv1-vif1 -- \ +set interface hv1-vif1 external-ids:iface-id=sw1-p11 \ +options:tx_pcap=hv1/vif1-tx.pcap \ +options:rxq_pcap=hv1/vif1-rx.pcap \ +ofport-request=1 +ovs-vsctl -- add-port br-int hv1-vif2 -- \ +set interface hv1-vif2 external-ids:iface-id=sw1-p12 \ +options:tx_pcap=hv1/vif2-tx.pcap \ +options:rxq_pcap=hv1/vif2-rx.pcap \ +ofport-request=1 +ovs-vsctl -- add-port br-int hv1-vif3 -- \ +set interface hv1-vif3 external-ids:iface-id=sw2-p1 \ +options:tx_pcap=hv1/vif3-tx.pcap \ +options:rxq_pcap=hv1/vif3-rx.pcap \ +ofport-request=1 + +sim_add hv2 +as hv2 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.2 +ovs-vsctl -- add-port br-int hv2-vif1 -- \ +set interface hv2-vif1 external-ids:iface-id=sw1-p21 \ +opti
[ovs-dev] [PATCH 3/4] OVN: Add ovn-northd IGMP support
New IP Multicast Snooping Options are added to the Northbound DB Logical_Switch:other_config column. These allow enabling IGMP snooping and querier on the logical switch and get translated by ovn-northd to rows in the IP_Multicast Southbound DB table. ovn-northd monitors for changes done by ovn-controllers in the Southbound DB IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates Multicast_Group entries in the Southbound DB, one per IGMP_Group address X, containing the list of logical switch ports (aggregated from all controllers) that have IGMP_Group entries for that datapath and address X. ovn-northd also creates a logical flow that matches on IP multicast traffic destined to address X and outputs it on the tunnel key of the corresponding Multicast_Group entry. Signed-off-by: Dumitru Ceara --- ovn/northd/ovn-northd.c | 446 --- ovn/ovn-nb.xml | 54 ++ 2 files changed, 471 insertions(+), 29 deletions(-) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 0b0a96a..3f24fbe 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -29,6 +29,7 @@ #include "openvswitch/json.h" #include "ovn/lex.h" #include "ovn/lib/chassis-index.h" +#include "ovn/lib/ip-mcast-index.h" #include "ovn/lib/ovn-l7.h" #include "ovn/lib/ovn-nb-idl.h" #include "ovn/lib/ovn-sb-idl.h" @@ -57,6 +58,7 @@ struct northd_context { struct ovsdb_idl_txn *ovnnb_txn; struct ovsdb_idl_txn *ovnsb_txn; struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name; +struct ovsdb_idl_index *sbrec_ip_mcast_by_dp; }; static const char *ovnnb_db; @@ -424,6 +426,32 @@ struct ipam_info { bool mac_only; }; +struct mcast_info { +bool enabled; +bool querier; +bool flood_unregistered; + +uint32_t table_size; +uint32_t idle_timeout; +uint32_t query_interval; +char *eth_src; +char *ipv4_src; +uint8_t query_max_response; + +uint32_t active_flows; +}; + +/* + * Multicast snooping idle timeout and query interval definitions. 
+ */ +#define OVN_MCAST_MIN_IDLE_TIMEOUT_S 15 +#define OVN_MCAST_MAX_IDLE_TIMEOUT_S 3600 +#define OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S 300 +#define OVN_MCAST_MIN_QUERY_INTERVAL_S 1 +#define OVN_MCAST_MAX_QUERY_INTERVAL_S OVN_MCAST_MAX_IDLE_TIMEOUT_S +#define OVN_MCAST_DEFAULT_QUERY_MAX_RESPONSE_S 1 +#define OVN_MCAST_DEFAULT_MAX_ENTRIES 2048 + /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or * sb->external_ids:logical-switch. */ struct ovn_datapath { @@ -448,6 +476,9 @@ struct ovn_datapath { /* IPAM data. */ struct ipam_info ipam_info; +/* Multicast data. */ +struct mcast_info mcast_info; + /* OVN northd only needs to know about the logical router gateway port for * NAT on a distributed router. This "distributed gateway port" is * populated only when there is a "redirect-chassis" specified for one of @@ -522,6 +553,8 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od) hmap_remove(datapaths, &od->key_node); destroy_tnlids(&od->port_tnlids); bitmap_free(od->ipam_info.allocated_ipv4s); +free(od->mcast_info.eth_src); +free(od->mcast_info.ipv4_src); free(od->router_ports); ovn_ls_port_group_destroy(&od->nb_pgs); free(od); @@ -659,6 +692,80 @@ init_ipam_info_for_datapath(struct ovn_datapath *od) } static void +init_mcast_info_for_datapath(struct ovn_datapath *od) +{ +if (!od->nbs) { +return; +} + +struct mcast_info *mcast_info = &od->mcast_info; + +mcast_info->enabled = +smap_get_bool(&od->nbs->other_config, "mcast_snoop", false); +mcast_info->querier = +smap_get_bool(&od->nbs->other_config, "mcast_querier", true); +mcast_info->flood_unregistered = +smap_get_bool(&od->nbs->other_config, "mcast_flood_unregistered", + false); + +mcast_info->table_size = +smap_get_ullong(&od->nbs->other_config, "mcast_table_size", +OVN_MCAST_DEFAULT_MAX_ENTRIES); + +uint32_t idle_timeout = +smap_get_ullong(&od->nbs->other_config, "mcast_idle_timeout", +OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S); +if (idle_timeout < OVN_MCAST_MIN_IDLE_TIMEOUT_S) { +idle_timeout = 
OVN_MCAST_MIN_IDLE_TIMEOUT_S; +} else if (idle_timeout > OVN_MCAST_MAX_IDLE_TIMEOUT_S) { +idle_timeout = OVN_MCAST_MAX_IDLE_TIMEOUT_S; +} +mcast_info->idle_timeout = idle_timeout; + +uint32_t query_interval = +smap_get_ullong(&od->nbs->other_config, "mcast_query_interval", +mcast_info->idle_timeout / 2); +if (query_interval < OVN_MCAST_MIN_QUERY_INTERVAL_S) { +query_interval = OVN_MCAST_MIN_QUERY_INTERVAL_S; +} else if (query_interval > OVN_MCAST_MAX_QUERY_INTERVAL_S) { +query_interval = OVN_MCAST_MAX_QUERY_
[ovs-dev] [PATCH 2/4] OVN: Add IGMP SB definitions and ovn-controller support
A new IP_Multicast table is added to Southbound DB. This table stores the multicast related configuration for each datapath. Each row will be populated by ovn-northd and will control: - if IGMP Snooping is enabled or not, the snooping table size and multicast group idle timeout. - if IGMP Querier is enabled or not (only if snooping is enabled too), query interval, query source addresses (Ethernet and IP) and the max-response field to be stored in outgoing queries. - an additional "seq_no" column is added such that ovn-sbctl or if needed a CMS can flush currently learned groups. This can be achieved by incrementing the "seq_no" value. A new IGMP_Group table is added to Southbound DB. This table stores all the multicast groups learned by ovn-controllers. The table is indexed by datapath, group address and chassis. For a learned multicast group on a specific datapath each ovn-controller will store its own row in this table. Each row contains the list of chassis-local ports on which the group was learned. Rows in the IGMP_Group table are updated or deleted only by the ovn-controllers that created them. A new action ("igmp") is added to punt IGMP packets on a specific logical switch datapath to ovn-controller if IGMP snooping is enabled. Per datapath IGMP multicast snooping support is added to pinctrl: - incoming IGMP reports are processed and multicast groups are maintained (using the OVS mcast-snooping library). - each OVN controller syncs its in-memory IGMP groups to the Southbound DB in the IGMP_Group table. - pinctrl also sends periodic IGMPv3 general queries for all datapaths where querier is enabled. 
Signed-off-by: Mark Michelson Co-authored-by: Mark Michelson Signed-off-by: Dumitru Ceara --- include/ovn/actions.h |6 ovn/controller/automake.mk |2 ovn/controller/ip-mcast.c | 164 + ovn/controller/ip-mcast.h | 52 +++ ovn/controller/ovn-controller.c | 23 + ovn/controller/pinctrl.c| 742 +++ ovn/controller/pinctrl.h|2 ovn/lib/actions.c | 22 + ovn/lib/automake.mk |2 ovn/lib/ip-mcast-index.c| 40 ++ ovn/lib/ip-mcast-index.h| 30 ++ ovn/lib/logical-fields.c|2 ovn/ovn-sb.ovsschema| 39 ++ ovn/ovn-sb.xml | 73 ovn/utilities/ovn-sbctl.c | 53 +++ ovn/utilities/ovn-trace.c | 14 + 16 files changed, 1261 insertions(+), 5 deletions(-) create mode 100644 ovn/controller/ip-mcast.c create mode 100644 ovn/controller/ip-mcast.h create mode 100644 ovn/lib/ip-mcast-index.c create mode 100644 ovn/lib/ip-mcast-index.h diff --git a/include/ovn/actions.h b/include/ovn/actions.h index f42bbc2..1cc022b 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -67,6 +67,7 @@ struct ovn_extend_table; OVNACT(ICMP4, ovnact_nest)\ OVNACT(ICMP4_ERROR, ovnact_nest)\ OVNACT(ICMP6, ovnact_nest)\ +OVNACT(IGMP, ovnact_nest)\ OVNACT(TCP_RESET, ovnact_nest)\ OVNACT(ND_NA, ovnact_nest)\ OVNACT(ND_NA_ROUTER, ovnact_nest)\ @@ -486,6 +487,11 @@ enum action_opcode { * The actions, in OpenFlow 1.3 format, follow the action_header. */ ACTION_OPCODE_ICMP4_ERROR, +/* "igmp()". + * + * Snoop IGMP, learn the multicast participants + */ +ACTION_OPCODE_IGMP, }; /* Header. 
*/ diff --git a/ovn/controller/automake.mk b/ovn/controller/automake.mk index fcdf7a4..193ea69 100644 --- a/ovn/controller/automake.mk +++ b/ovn/controller/automake.mk @@ -10,6 +10,8 @@ ovn_controller_ovn_controller_SOURCES = \ ovn/controller/encaps.h \ ovn/controller/ha-chassis.c \ ovn/controller/ha-chassis.h \ + ovn/controller/ip-mcast.c \ + ovn/controller/ip-mcast.h \ ovn/controller/lflow.c \ ovn/controller/lflow.h \ ovn/controller/lport.c \ diff --git a/ovn/controller/ip-mcast.c b/ovn/controller/ip-mcast.c new file mode 100644 index 000..ef36be2 --- /dev/null +++ b/ovn/controller/ip-mcast.c @@ -0,0 +1,164 @@ +/* Copyright (c) 2019, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include "ip-mcast.h" +#include "lport.h" +#include "ovn/lib/ovn-sb-idl.h" + +/* + * Used for (faster) upd
[ovs-dev] [PATCH 0/4] OVN: Add IGMP support
This series introduces support for IGMP Snooping and IGMP Querier. IGMP versions v1-v3 are supported for snooping and IGMP queries originated by ovn-controller are general IGMPv3 queries. The rationale behind using v3 for querier is that it's backward compatible with v1-v2. The majority of the code is IP version independent with the thought in mind that support for MLD snooping for IPv6 will be added next. Dumitru Ceara (4): packets: Add IGMPv3 query packet definitions OVN: Add IGMP SB definitions and ovn-controller support OVN: Add ovn-northd IGMP support OVN: Add IGMP tests include/ovn/actions.h |6 lib/packets.c | 44 ++ lib/packets.h | 19 + ovn/controller/automake.mk |2 ovn/controller/ip-mcast.c | 164 + ovn/controller/ip-mcast.h | 52 +++ ovn/controller/ovn-controller.c | 23 + ovn/controller/pinctrl.c| 742 +++ ovn/controller/pinctrl.h|2 ovn/lib/actions.c | 22 + ovn/lib/automake.mk |2 ovn/lib/ip-mcast-index.c| 40 ++ ovn/lib/ip-mcast-index.h| 30 ++ ovn/lib/logical-fields.c|2 ovn/northd/ovn-northd.c | 446 ++- ovn/ovn-nb.xml | 54 +++ ovn/ovn-sb.ovsschema| 39 ++ ovn/ovn-sb.xml | 73 ovn/utilities/ovn-sbctl.c | 53 +++ ovn/utilities/ovn-trace.c | 14 + tests/ovn.at| 270 ++ 21 files changed, 2064 insertions(+), 35 deletions(-) create mode 100644 ovn/controller/ip-mcast.c create mode 100644 ovn/controller/ip-mcast.h create mode 100644 ovn/lib/ip-mcast-index.c create mode 100644 ovn/lib/ip-mcast-index.h ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 1/4] packets: Add IGMPv3 query packet definitions
Signed-off-by: Dumitru Ceara --- lib/packets.c | 44 lib/packets.h | 19 ++- 2 files changed, 62 insertions(+), 1 deletion(-) diff --git a/lib/packets.c b/lib/packets.c index a8fd61f..ab0b1a3 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1281,6 +1281,50 @@ packet_set_icmp(struct dp_packet *packet, uint8_t type, uint8_t code) } } +/* Sets the IGMP type to IGMP_HOST_MEMBERSHIP_QUERY and populates the + * v3 query header fields in 'packet'. 'packet' must be a valid IGMPv3 + * query packet with its l4 offset properly populated. + */ +void +packet_set_igmp3_query(struct dp_packet *packet, uint8_t max_resp, + ovs_be32 group, bool srs, uint8_t qrv, uint8_t qqic) +{ +struct igmpv3_query_header *igh = dp_packet_l4(packet); +ovs_be16 orig_type_max_resp = +htons(igh->type << 8 | igh->max_resp); +ovs_be16 new_type_max_resp = +htons(IGMP_HOST_MEMBERSHIP_QUERY << 8 | max_resp); + +if (orig_type_max_resp != new_type_max_resp) { +igh->type = IGMP_HOST_MEMBERSHIP_QUERY; +igh->max_resp = max_resp; +igh->csum = recalc_csum16(igh->csum, orig_type_max_resp, + new_type_max_resp); +} + +ovs_be32 old_group = get_16aligned_be32(&igh->group); + +if (old_group != group) { +put_16aligned_be32(&igh->group, group); +igh->csum = recalc_csum32(igh->csum, old_group, group); +} + +/* See RFC 3376 4.1.6. 
*/ +if (qrv > 7) { +qrv = 0; +} + +ovs_be16 orig_srs_qrv_qqic = htons(igh->srs_qrv << 8 | igh->qqic); +ovs_be16 new_srs_qrv_qqic = htons(srs << 11 | qrv << 8 | qqic); + +if (orig_srs_qrv_qqic != new_srs_qrv_qqic) { +igh->srs_qrv = (srs << 3 | qrv); +igh->qqic = qqic; +igh->csum = recalc_csum16(igh->csum, orig_srs_qrv_qqic, + new_srs_qrv_qqic); +} +} + void packet_set_nd_ext(struct dp_packet *packet, const ovs_16aligned_be32 rso_flags, const uint8_t opt_type) diff --git a/lib/packets.h b/lib/packets.h index d293b35..4124490 100644 --- a/lib/packets.h +++ b/lib/packets.h @@ -681,6 +681,7 @@ char *ip_parse_cidr_len(const char *s, int *n, ovs_be32 *ip, #define IP_ECN_ECT_0 0x02 #define IP_ECN_CE 0x03 #define IP_ECN_MASK 0x03 +#define IP_DSCP_CS6 0xc0 #define IP_DSCP_MASK 0xfc static inline int @@ -763,6 +764,20 @@ struct igmpv3_header { }; BUILD_ASSERT_DECL(IGMPV3_HEADER_LEN == sizeof(struct igmpv3_header)); +#define IGMPV3_QUERY_HEADER_LEN 12 +struct igmpv3_query_header { +uint8_t type; +uint8_t max_resp; +ovs_be16 csum; +ovs_16aligned_be32 group; +uint8_t srs_qrv; +uint8_t qqic; +ovs_be16 nsrcs; +}; +BUILD_ASSERT_DECL( +IGMPV3_QUERY_HEADER_LEN == sizeof(struct igmpv3_query_header +)); + #define IGMPV3_RECORD_LEN 8 struct igmpv3_record { uint8_t type; @@ -1543,7 +1558,9 @@ void packet_set_nd(struct dp_packet *, const struct in6_addr *target, void packet_set_nd_ext(struct dp_packet *packet, const ovs_16aligned_be32 rso_flags, const uint8_t opt_type); - +void packet_set_igmp3_query(struct dp_packet *, uint8_t max_resp, +ovs_be32 group, bool srs, uint8_t qrv, +uint8_t qqic); void packet_format_tcp_flags(struct ds *, uint16_t); const char *packet_tcp_flag_to_string(uint32_t flag); void compose_arp__(struct dp_packet *); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev