Re: [ovs-dev] [PATCH] netdev-linux: correct unit of burst parameter
On Wed, Feb 24, 2021 at 6:38 AM Simon Horman wrote: > > From: Yong Xu > > Correct calculation of burst parameter used when configuring TC policer > action for ingress port-based policing in the case where TC offload is in > use. This now matches the value calculated for the case where TC offload is > not in use. > > The division by 8 is to convert from bits to bytes. > Its unclear why 64 was previously used. A minor nit. s/Its/It's > > Fixes: e7f6ba220 ("lib/tc: add ingress ratelimiting support for tc-offload") > Signed-off-by: Yong Xu > [simon: reworked changelog] > Signed-off-by: Simon Horman > Signed-off-by: Louis Peens > --- This patch LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] connmgr: Check nullptr inside ofmonitor_report()
On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun wrote: > > ovs-vswitchd could crash under these circumstances: > 1. When one bridge is being destroyed, ofproto_destroy() is called and > connmgr pointer of its ofproto struct is nullified. This ofproto struct is > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. > 2. Before RCU enters quiesce state to actually free this ofproto struct, > revalidator thread calls udpif_revalidator(), which could handle > a learn flow and calls ofproto_flow_mod_learn(), it later calls > ofmonitor_report() and ofproto struct's connmgr pointer is accessed. > > The crash stack trace is shown below: > > 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, > event=event@entry=NXFME_ADDED, > reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, > old_actions=old_actions@entry=0x0) > at ofproto/connmgr.c:2160 > 1 0x7fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, > ofm=, req=req@entry=0x0) > at ofproto/ofproto.c:5221 > 2 0x7fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, > ofproto=0x55d9075d4ab0) > at ofproto/ofproto.c:5823 > 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, > ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0) > at ofproto/ofproto.c:8088 > 4 0x7fa4d680372d in ofproto_flow_mod_learn_finish > (ofm=ofm@entry=0x7fa4980753f0, > orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 > 5 0x7fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, > keep_ref=keep_ref@entry=true, > limit=, below_limitp=below_limitp@entry=0x0) at > ofproto/ofproto.c:5499 > 6 0x7fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, > stats=stats@entry=0x7fa4d2701a10, > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127 > 7 0x7fa4d6835e3a in xlate_push_stats (xcache=, > stats=stats@entry=0x7fa4d2701a10, > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181 > 8 0x7fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, > ukey=ukey@entry=0x7fa4b0191660, > stats=stats@entry=0x7fa4d2705118, > odp_actions=odp_actions@entry=0x7fa4d2701b50, > reval_seq=reval_seq@entry=5655486242, > recircs=recircs@entry=0x7fa4d2701b40, offloaded=false) > at ofproto/ofproto-dpif-upcall.c:2294 > 9 0x7fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at > ofproto/ofproto-dpif-upcall.c:2683 > 10 0x7fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at > ofproto/ofproto-dpif-upcall.c:936 > 11 0x7fa4d6259c9f in ovsthread_wrapper (aux_=) at > lib/ovs-thread.c:423 > 12 0x7fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0 > 13 0x7fa4d504b96d in clone () from /usr/lib64/libc.so.6 > > At the time of crash, the involved ofproto was already deallocated: > > (gdb) print *ofproto > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., > one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... > > This patch fixes it. > > VMware-BZ: #2700626 > Signed-off-by: Yifeng Sun > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] rhel: Add option to enable AF_XDP on rpm package
On Tue, Feb 2, 2021 at 9:32 AM Ilya Maximets wrote: > > On 1/21/21 7:51 PM, Yi-Hung Wei wrote: > > This patch adds an RPMBUILD_OPT so that user can enable > > AF_XDP support in the rpm package by: > > > > $ make rpm-fedora RPMBUILD_OPT="--with afxdp" > > Thanks! > This change looks good to me in general. > > Could you, please, also update Documentation/intro/install/fedora.rst > with this new option? > > Best regards, Ilya Maximets. Thanks for Yifeng and Ilya for review. Please find the v2 with the documentation update here: https://patchwork.ozlabs.org/project/openvswitch/patch/1612398452-114205-1-git-send-email-yihung@gmail.com/ Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] rhel: Add option to enable AF_XDP on rpm package
This patch adds an RPMBUILD_OPT so that user can enable AF_XDP support in the rpm package by: $ make rpm-fedora RPMBUILD_OPT="--with afxdp" Signed-off-by: Yi-Hung Wei --- Documentation/intro/install/fedora.rst | 7 +++ rhel/openvswitch-fedora.spec.in| 8 2 files changed, 15 insertions(+) diff --git a/Documentation/intro/install/fedora.rst b/Documentation/intro/install/fedora.rst index 4a2f3507cbbb..06a0bd3d59e1 100644 --- a/Documentation/intro/install/fedora.rst +++ b/Documentation/intro/install/fedora.rst @@ -117,6 +117,13 @@ can be added: $ make rpm-fedora RPMBUILD_OPT="--with dpdk --without check" +To enable AF_XDP support in the openvswitch package, the ``--with afxdp`` +option can be added: + +:: + +$ make rpm-fedora RPMBUILD_OPT="--with afxdp --without check" + You can also have the above commands automatically run the Open vSwitch unit tests. This can take several minutes. diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 2c0c4fa186a3..e03b26b6af34 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -28,6 +28,8 @@ %bcond_without libcapng # To enable DPDK support, specify '--with dpdk' when building %bcond_with dpdk +# To enable AF_XDP support, specify '--with afxdp' when building +%bcond_with afxdp # If there is a need to automatically enable the package after installation, # specify the "--with autoenable" @@ -73,6 +75,9 @@ BuildRequires: libpcap-devel numactl-devel BuildRequires: dpdk-devel >= 17.05.1 Provides: %{name}-dpdk = %{version}-%{release} %endif +%if %{with afxdp} +BuildRequires: libbpf-devel numactl-devel +%endif BuildRequires: unbound unbound-devel Requires: openssl hostname iproute module-init-tools unbound @@ -164,6 +169,9 @@ This package provides IPsec tunneling support for OVS tunnels. %if %{with dpdk} --with-dpdk=$(dirname %{_datadir}/dpdk/*/.config) \ %endif +%if %{with afxdp} +--enable-afxdp \ +%endif --enable-ssl \ --disable-static \ --enable-shared \ -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] rhel: Add option to enable AF_XDP on rpm package
This patch adds an RPMBUILD_OPT so that user can enable AF_XDP support in the rpm package by: $ make rpm-fedora RPMBUILD_OPT="--with afxdp" Signed-off-by: Yi-Hung Wei --- rhel/openvswitch-fedora.spec.in | 8 1 file changed, 8 insertions(+) diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 2c0c4fa186a3..e03b26b6af34 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -28,6 +28,8 @@ %bcond_without libcapng # To enable DPDK support, specify '--with dpdk' when building %bcond_with dpdk +# To enable AF_XDP support, specify '--with afxdp' when building +%bcond_with afxdp # If there is a need to automatically enable the package after installation, # specify the "--with autoenable" @@ -73,6 +75,9 @@ BuildRequires: libpcap-devel numactl-devel BuildRequires: dpdk-devel >= 17.05.1 Provides: %{name}-dpdk = %{version}-%{release} %endif +%if %{with afxdp} +BuildRequires: libbpf-devel numactl-devel +%endif BuildRequires: unbound unbound-devel Requires: openssl hostname iproute module-init-tools unbound @@ -164,6 +169,9 @@ This package provides IPsec tunneling support for OVS tunnels. %if %{with dpdk} --with-dpdk=$(dirname %{_datadir}/dpdk/*/.config) \ %endif +%if %{with afxdp} +--enable-afxdp \ +%endif --enable-ssl \ --disable-static \ --enable-shared \ -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovs v2] conntrack: Fix the icmp conntrack new state.
On Thu, Jan 21, 2021 at 1:15 AM wrote: > > From: Tonghao Zhang > > The same icmp packet may traverse conntrack module more than once. > Or same icmp packets traverse contranck module in orderly. > > Don't change state to CS_ESTABLISHED before receiving reply or related > packets. > > Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") > Signed-off-by: Tonghao Zhang > --- > v2: > 1. add test case > 2. change the commit message, and title > 3. change the fix tag > --- Thanks for v2. LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovs] conntrack: Update the icmp stats accurately.
On Tue, Jan 19, 2021 at 6:45 AM Aaron Conole wrote: > > The fixes tag for this should be: > > Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Yes, commit a867c010ee91 ("conntrack: Fix conntrack new state") is the right one to fix. Commit a867c010ee91 only takes care of TCP, UDP (conntrack-others) but does not address ICMP properly. I checked Tonghao's patch and it does address the ICMP case properly. > Since that was the patch which attempted to make the behavior between > kernel-userspace consistent. > > Please submit a v2 with the following diff (and run 'make > check-system-userspace' and 'make check-kernel' to ensure proper > coverage of the tests). This way, we don't break the behavior in the > future (and can guarantee consistency). I did try this out and it > worked for me. > > 122: conntrack - Multiple ICMP traverse ok > > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -5898,6 +5898,58 @@ ovs-appctl dpif/dump-flows br0 > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > +# OFPROTO_CLEAR_DURATION_IDLE([]) > +# > +# Clear the duration from the piped input which would differ from test to > test > +# > +m4_define([OFPROTO_CLEAR_DURATION_IDLE], [[sed -e > 's/duration=.*s,/duration=,/g' -e > 's/idle_age=[0-9]*,/idle_age=,/g']]) Thanks Aaron for adding this new system traffic test. It does work on my machine for both kernel and userspace conntrack. One suggestion for the test is that usually we add the testing m4 marco for system traffic test on system-common-macros.at. The rest of the test case looks good to me. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-natvie-tnl: Fix geneve tunnel flag
On Wed, Dec 16, 2020 at 1:24 AM Ilya Maximets wrote: > > On 12/16/20 12:41 AM, Yi-Hung Wei wrote: > > On userspace datapath, geneve option flag FLOW_TNL_F_UDPIF should only > > be set when the geneve option is available. However, currently, we always > > set FLOW_TNL_F_UDPIF in the metadata flag, and that may result in megaflow > > revalidation issue when the geneve option length is zero due to the datapath > > flow key mis-match. That is the original flow key always has > > FLOW_TNL_F_UDPIF > > set, while the regenerated flow key during revalidation process only has > > FLOW_TNL_F_UDPIF set if geneve option length is zero. For reference, check > > tun_metadata_to_geneve_nlattr() to see how the flow key of geneve > > tunnel metadatafor are generated from netlink attributes. > > > > The following are the reproducible steps reported by Antonin. > > After step 6, OvS reports a warning about failing to update the datapath > > flow. > > > > 2020-12-15T01:56:21.324Z|7|dpif(revalidator4)|WARN|netdev@ovs-netdev: > > failed to put[modify] (No such file or directory) > > ufid:d252fe62-5b2a-44e6-846a-190328190e09 skb_priority(0/0), > > tunnel(tun_id=0x0,src=192.168.77.1,dst=192.168.77.2,ttl=64/0,tp_src=57391/0, > > tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0x21/0x21), > > ct_zone(0xfff0/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=10.10.0.1/0.0.0.0, > > dst=10.10.0.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0),recirc_id(0xa), > > dp_hash(0/0),in_port(5),packet_type(ns=0,id=0), > > eth(src=1e:9c:f0:fb:18:9c/00:00:00:00:00:00,dst=ee:9f:60:8a:c0:a5/00:00:00:00:00:00), > > eth_type(0x0800),ipv4(src=10.10.0.1/0.0.0.0,dst=10.10.0.2,proto=1/0,tos=0/0, > > ttl=64/0,frag=no),icmp(type=8/0,code=0/0), actions:drop > > > > Setup: > > 2 Nodes on the same subnet (192.168.77.101/24 - 192.168.77.102/24) > > > > Step 1 – Creating bridges (run on each Node): > > > > iface="enp0s8" > > > > hwaddr=$(ip link show $iface | grep link/ether | awk '{print $2}') > > inet=$(ip addr show $iface | grep "inet " | awk '{ print $2 }') > > > > ovs-vsctl add-br br-phy \ > > -- set Bridge br-phy datapath_type=netdev \ > > -- br-set-external-id br-phy bridge-id br-phy \ > > -- set bridge br-phy fail-mode=standalone \ > > other_config:hwaddr="$hwaddr" > > ovs-vsctl --timeout 10 add-port br-phy "$iface" > > ip addr add "$inet" dev br-phy > > ip link set br-phy up > > ip addr flush dev enp0s8 2>/dev/null > > ip link set enp0s8 up > > iptables -F > > > > ovs-vsctl add-br br-int \ > > -- set Bridge br-int datapath_type=netdev \ > > -- set bridge br-phy fail-mode=standalone > > > > Step 2 – Creating netns to for overlay endpoints: > > > > On first Node (192.168.77.101): > > ip netns add ns0 > > ip link add veth0 type veth peer name veth1 > > ip link set veth0 netns ns0 > > ovs-vsctl add-port br-int veth1 > > ip link set dev veth1 up > > ip netns exec ns0 ip addr add 10.10.0.1/24 dev veth0 > > ip netns exec ns0 ip link set dev veth0 up > > > > On second Node (192.168.77.102): > > ip netns add ns0 > > ip link add veth0 type veth peer name veth1 > > ip link set veth0 netns ns0 > > ovs-vsctl add-port br-int veth1 > > ip link set dev veth1 up > > ip netns exec ns0 ip addr add 10.10.0.2/24 dev veth0 > > ip netns exec ns0 ip link set dev veth0 up > > > > Step 3 – Create tunnel (run on each Node): > > > > ovs-vsctl add-port br-int tun0 -- set interface tun0 type=geneve \ > > ofport_request=100 options:remote_ip=flow options:key=flow > > > > Step 4 – Create the following flows: > > > > On first Node (192.168.77.101): > > root@ovs-test-node-1:/home/vagrant# ovs-ofctl dump-flows br-int --no-stats > > priority=100,ip actions=resubmit(,10) > > priority=0 actions=NORMAL > > priority=50 actions=resubmit(,40) > > table=10, priority=100,ip actions=ct(table=20,zone=65520) > > table=20, priority=200,ct_state=-new+trk,ip actions=resubmit(,30) > > table=20, priority=100,ip,nw_dst=10.10.0.2 actions=resubmit(,30) > > table=20, priority=0,ip actions=drop > > table=30, priority=100,ip actions=ct(commit,table=40,zone=65520) > > table=40, priority=100,in_port=veth1 > > actions=load:0xc0a84d66->NXM_NX_TUN_IPV4_DST[],output:tun0 > > table=40, priority=100,in_port=tun0 actions=output:veth1 > > table=40, priority=0 actions=drop > > > > On second Node (192
Re: [ovs-dev] [PATCH] netlink: removed incorrect optimization
On Mon, Nov 23, 2020 at 6:45 AM Cpp Code wrote: > > I would expect such checks to be commented as indeed it was not clear why > it was there. I looked in similar places where FLOW_TNL_F_UDPIF is used > and there is no such check. The commit which created this condition is > quite large so I am not able to conclude from that. > I assume this is an optimization because if we don't set > OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS there is less code to be executed later > when the parsing is done. I believe the assumption was made to exclude this > without considering specifics of Geneve related FLOW_TNL_F_UDPIF flag. > > > On Fri, Nov 20, 2020 at 8:52 AM Ben Pfaff wrote: > > > On Fri, Nov 20, 2020 at 05:12:46AM -0800, Toms Atteka wrote: > > > This optimization caused FLOW_TNL_F_UDPIF flag not to be used in > > > hash calculation for geneve tunnel when revalidating flows which > > > resulted in different cache hash values and incorrect behaviour. > > > > > > Reported-at: https://github.com/vmware-tanzu/antrea/issues/897 > > > Signed-off-by: Toms Atteka > > > > Why was the check there? It seems strange to have a very specific "if" > > test and then just remove it without some idea of why it was there to > > begin with. > > > > ... > > > > > -} else if (flow->metadata.present.len || is_mask) { > > > +} else { > > Hi Toms and Ben, I think the root cause for the reported issue observed on Antrea is because the netdev native tunnel always sets FLOW_TNL_F_UDPIF no matter the geneve option is present or not. So Toms's fix here will always generate OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS so that FLOW_TNL_F_UDPIF can be set when we convert the netlink attributes back to flow key. Instead of fixing the issue on this function, I think we can fix on the tunnel receive function, and only set FLOW_TNL_F_UDPIF when the geneve tunnel metadata is present. Here is the proposed fix: https://mail.openvswitch.org/pipermail/ovs-dev/2020-December/378711.html Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-natvie-tnl: Fix geneve tunnel flag
e completes. If you follow these steps, you should see the ping in the last step succeed. This is not expected because of the deleted flow. It also does not happen with VXLAN. Reported-by: Antonin Bas Reported-at: https://github.com/vmware-tanzu/antrea/issues/897 Fixes: 6b241d645291 ("netdev-vport: Factor-out tunnel Push-pop code into separate module.") Signed-off-by: Yi-Hung Wei --- lib/netdev-native-tnl.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index b89dfdd52a86..09cff3a1fc7d 100644 --- a/lib/netdev-native-tnl.c +++ b/lib/netdev-native-tnl.c @@ -1015,9 +1015,11 @@ netdev_geneve_pop_header(struct dp_packet *packet) tnl->tun_id = htonll(ntohl(get_16aligned_be32(&gnh->vni)) >> 8); tnl->flags |= FLOW_TNL_F_KEY; -memcpy(tnl->metadata.opts.gnv, gnh->options, opts_len); -tnl->metadata.present.len = opts_len; -tnl->flags |= FLOW_TNL_F_UDPIF; +if (opts_len > 0) { +memcpy(tnl->metadata.opts.gnv, gnh->options, opts_len); +tnl->metadata.present.len = opts_len; +tnl->flags |= FLOW_TNL_F_UDPIF; +} packet->packet_type = htonl(PT_ETH); dp_packet_reset_packet(packet, hlen); -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch v1] conntrack: Support zone limits.
On Wed, Dec 9, 2020 at 7:50 AM Harold Huang wrote: > > When I use ```ovs-appctl dpctl/ct-set-limits default=6``` to set the > default zone limit, the max connection limit of all zones is set to 6. I > think it is not resonable. First, the max connection limit can be set with > ```ovs-appctl dpctl/ct-set-maxconns```. And when I use ```ovs-appctl > dpctl/ct-get-limits zone=5,6,7,8```, the result is like: > ``` > $ovs-appctl dpctl/ct-get-limits zone=5,6,7,8 > default limit=6 > zone=5,limit=6,count=0 > zone=6,limit=6,count=0 > zone=7,limit=6,count=0 > zone=8,limit=6,count=0 > ``` It is the expected behavior. Basically, dpct/ct-set-limits sets the per-zone limit, if the per-zone limit is not configured, it defaults to the default per-zone limit. It is documented on ovs-vswitchd(8), dpctl/ct-set-limits[dp] [default=default_limit] [zone=zone,limit=limit]... Sets the maximum allowed number of connections in a connection tracking zone. A specific zone may be set to limit, and multi‐ ple zones may be specified with a comma-separated list. If a per-zone limit for a particular zone is not specified in the datapath, it defaults to the default per-zone limit. A default zone may be specified with the default=default_limit argument. Initially, the default per-zone limit is unlimited. An unlim‐ ited number of entries may be set with 0 limit. > It seems that each zone has a default limit(6), but the limit(6) is the > total connection limit for all zones if we do not set the limit for a > specific zone. > The total connection limit is set by dpctl/ct-set-maxconns for userpsace datapath. Quoted from ovs-vswitchd(8), dpctl/ct-set-maxconns [dp] maxconns Sets the maximum limit of connection tracker entries to maxconns on dp. This can be used to reduce the processing load on the system due to connection tracking or simply limiting connection tracking. If the number of connections is already over the new maximum limit request then the new maximum limit will be en‐ forced when the number of connections decreases to that limit, which normally happens due to connection expiry. Only supported for userspace datapath. For kernel datapath, the number of conntrack entries for all zones is limited by nf_conntrack_max, https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 2/3] compat: Fix build issue on RHEL 7.7
On Thu, Nov 12, 2020 at 3:10 PM Greg Rose wrote: > > RHEL 7.2 introduced a KABI fixup in struct sk_buff for the name > change of l4_rxhash to l4_hash. Then patch > 9ba57fc7 ("datapath: Add hash info to upcall") introduced a > compile error by using l4_hash and not fixing up the HAVE_L4_RXHASH > configuration flag. > > Remove all references to HAVE_L4_RXHASH and always use l4_hash to > resolve the issue. This will break compilation on RHEL 7.0 and > RHEL 7.1 but dropping support for these older kernels shouldn't be > a problem. > > Fixes: 9ba57fc7 ("datapath: Add hash info to upcall") > Signed-off-by: Greg Rose > > --- > V2 - Just removes l4_rxhash and ends support for RHEL 7.x < 7.2 > --- Thanks for v2. Looks good to me. Acked-by: Yi-Hung Wei > diff --git a/datapath/linux/compat/include/linux/skbuff.h > b/datapath/linux/compat/include/linux/skbuff.h > index bc73255d5..d3bc6c715 100644 > --- a/datapath/linux/compat/include/linux/skbuff.h > +++ b/datapath/linux/compat/include/linux/skbuff.h > @@ -278,9 +278,7 @@ static inline void skb_clear_hash(struct sk_buff *skb) > #ifdef HAVE_RXHASH > skb->rxhash = 0; > #endif > -#if defined(HAVE_L4_RXHASH) > - skb->l4_rxhash = 0; > -#endif > + skb->l4_hash = 0; There is a space before the tab in this line. When the maintainer commits this patch, please help to remove the space. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 3/3] compat: Fix compile warning
On Thu, Nov 12, 2020 at 3:10 PM Greg Rose wrote: > > In ../compat/nf_conntrack_reasm.c nf_frags_cache_name is declared > if OVS_NF_DEFRAG6_BACKPORT is defined. However, later in the patch > it is only used if HAVE_INET_FRAGS_WITH_FRAGS_WORK is defined and > HAVE_INET_FRAGS_RND is not defined. This will cause a compile warning > about unused variables. > > Fix it up by using the same defines that enable its use to decide > if it should be declared and avoid the compiler warning. > > Fixes: 4a90b277baca ("compat: Fixup ipv6 fragmentation on 4.9.135+ kernels") > Signed-off-by: Greg Rose > > --- > V2 No change > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 1/3] compat: Remove stale code
On Thu, Nov 12, 2020 at 3:10 PM Greg Rose wrote: > > Remove stale and unused code left over after support for kernels > older than 3.10 was removed. > > Fixes: 8063e0958780 ("datapath: Drop support for kernel older than 3.10") > Signed-off-by: Greg Rose > > --- > V2 Separate out stale code removal to it's own patch as per suggestion >from Yi-Hung > --- Looks good to me. Thanks. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/2] compat: Fix build issue on RHEL 7.7
On Tue, Nov 3, 2020 at 4:06 PM Gregory Rose wrote: > On 11/3/2020 2:17 PM, Yi-Hung Wei wrote: > > On Tue, Oct 20, 2020 at 10:07 AM Greg Rose wrote: > >> > >> RHEL 7.7 has a KABI fixup in struct sk_buff to backport the member > >> name change of l4_rxhash to l4_hash. This exposed a couple of > >> issues in patch 8063e0958780 which was intended to remove support > >> for kernels older than 3.10. > >> > >> Remove stale code and add a compat level check to detect the change. > >> This fixes a compile error on RHEL 7.7. > >> > >> Fixes: 8063e0958780 ("datapath: Drop support for kernel older than 3.10") > >> Signed-off-by: Greg Rose > > > > Hi Greg, > > > > Thanks for the patch. I found the compilation error on RHEL 7.7 > > actually happens starting from a more recent patch as follows. If > > that is the case, please update the fix tag. > > > > 2020-05-25 9ba57fc7 ("datapath: Add hash info to upcall") > > Hi Yi-Hung, > > I don't think I'm fixing anything in that patch. There's nothing > wrong with it that I can see. It properly depended on the > HAVE_L4_RXHASH define which is not part of that patch but was > from 75e93287db ("datapath: improve l4_rxhash regex"). > > Seems to me fixing something would require changes to the code > that's being fixed. Hi Greg, I thought your change in datapath.c is to fix the following new added lines in 9ba57fc7 ("datapath: Add hash info to upcall"). I found it because the compilation goes fine before this patch on RHEL 7.7. 9ba57fc7 ("datapath: Add hash info to upcall") @@ -523,6 +525,25 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, . +#ifdef HAVE_L4_RXHASH + if (skb->l4_rxhash) +#else + if (skb->l4_hash) +#endif + hash |= OVS_PACKET_HASH_L4_BIT; > >> diff --git a/acinclude.m4 b/acinclude.m4 > >> index 1460289ca..8e80d7930 100644 > >> --- a/acinclude.m4 > >> +++ b/acinclude.m4 > >> @@ -879,6 +879,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ > >> [OVS_DEFINE([HAVE_SKB_ZEROCOPY])]) > >> OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_rxhash], > >> [OVS_DEFINE([HAVE_L4_RXHASH])]) > >> + OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_hash], > >> + [OVS_DEFINE([HAVE_L4_HASH])]) > > Looks like the main compatibility issue on using either l4_rxhash or > > l4_hash is due to the kernel ABI update on skbuff.h from the following > > commit that firstly introduced in 3.15 kernel. > > > > 2014-03-24 61b905da33ae (("net: Rename skb->rxhash to skb->hash"). > > > > I also found that starting from RHEL 7.2, RHEL introduced the new ABI. > > > > __u8 RH_KABI_RENAME(l4_rxhash, l4_hash):1; > > > > From Documentation/faq/releases.rst, the oldest kernel that OVS > > supports from 2.10.x is 3.16, I think we can drop the compatibility > > support on "l4_rxhash" and only use "l4_hash" in the code base. > > > > If that makes sense to you, let's drop the following in acindlue.m4, > > and clean up the update on datapath.c and > > ./datapath/linux/compat/include/linux/skbuff.h accordingly. > > So we don't want to support RHEL 7.2 and older? I know we only > "officially" support 3.16 and above but RHEL 7.x 3.10 kernel is > a special case. The new ABI is introduced in RHEL7.2, so the support that we will drop is RHEL 7.0 and RHEL 7.1. Since RHEL 7.1 was introduced in 2015 March, I think it is probably fine. > >> @@ -975,8 +977,6 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ > >> > >> OVS_GREP_IFELSE([$KSRC/include/net/sock.h], [sk_no_check_tx]) > >> OVS_GREP_IFELSE([$KSRC/include/linux/udp.h], [no_check6_tx]) > >> - OVS_GREP_IFELSE([$KSRC/include/linux/utsrelease.h], [el6], > >> - [OVS_DEFINE([HAVE_RHEL6_PER_CPU])]) > >> OVS_FIND_PARAM_IFELSE([$KSRC/include/net/protocol.h], > >> [udp_add_offload], [net], > >> [OVS_DEFINE([HAVE_UDP_ADD_OFFLOAD_TAKES_NET])]) > > Good catch. I think it is a valid clean up. But since it does not fix > > 9ba57fc7 ("datapath: Add hash info to upcall"), should we move it > > along with the changes in > > ./datapath/linux/compat/include/linux/percpu.h to a separate patch? > > But it does fix 8063e0958780 > ("datapath: Drop support for kernel older than 3.10") Yes, it does, and we should clea
Re: [ovs-dev] [PATCH 2/2] compat: Fix compile warning
On Tue, Oct 20, 2020 at 10:07 AM Greg Rose wrote: > > In ../compat/nf_conntrack_reasm.c nf_frags_cache_name is declared > if OVS_NF_DEFRAG6_BACKPORT is defined. However, later in the patch > it is only used if HAVE_INET_FRAGS_WITH_FRAGS_WORK is defined and > HAVE_INET_FRAGS_RND is not defined. This will cause a compile warning > about unused variables. > > Fix it up by using the same defines that enable its use to decide > if it should be declared and avoid the compiler warning. > > Fixes: 4a90b277baca ("compat: Fixup ipv6 fragmentation on 4.9.135+ kernels") > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/2] compat: Fix build issue on RHEL 7.7
On Tue, Oct 20, 2020 at 10:07 AM Greg Rose wrote: > > RHEL 7.7 has a KABI fixup in struct sk_buff to backport the member > name change of l4_rxhash to l4_hash. This exposed a couple of > issues in patch 8063e0958780 which was intended to remove support > for kernels older than 3.10. > > Remove stale code and add a compat level check to detect the change. > This fixes a compile error on RHEL 7.7. > > Fixes: 8063e0958780 ("datapath: Drop support for kernel older than 3.10") > Signed-off-by: Greg Rose Hi Greg, Thanks for the patch. I found the compilation error on RHEL 7.7 actually happens starting from a more recent patch as follows. If that is the case, please update the fix tag. 2020-05-25 9ba57fc7 ("datapath: Add hash info to upcall") > diff --git a/acinclude.m4 b/acinclude.m4 > index 1460289ca..8e80d7930 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -879,6 +879,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ >[OVS_DEFINE([HAVE_SKB_ZEROCOPY])]) >OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_rxhash], >[OVS_DEFINE([HAVE_L4_RXHASH])]) > + OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_hash], > + [OVS_DEFINE([HAVE_L4_HASH])]) Looks like the main compatibility issue on using either l4_rxhash or l4_hash is due to the kernel ABI update on skbuff.h from the following commit that firstly introduced in 3.15 kernel. 2014-03-24 61b905da33ae (("net: Rename skb->rxhash to skb->hash"). I also found that starting from RHEL 7.2, RHEL introduced the new ABI. __u8 RH_KABI_RENAME(l4_rxhash, l4_hash):1; >From Documentation/faq/releases.rst, the oldest kernel that OVS supports from 2.10.x is 3.16, I think we can drop the compatibility support on "l4_rxhash" and only use "l4_hash" in the code base. If that makes sense to you, let's drop the following in acindlue.m4, and clean up the update on datapath.c and ./datapath/linux/compat/include/linux/skbuff.h accordingly. >OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_rxhash], >[OVS_DEFINE([HAVE_L4_RXHASH])]) > + OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [u8.*l4_hash], > + [OVS_DEFINE([HAVE_L4_HASH])]) > @@ -975,8 +977,6 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ > >OVS_GREP_IFELSE([$KSRC/include/net/sock.h], [sk_no_check_tx]) >OVS_GREP_IFELSE([$KSRC/include/linux/udp.h], [no_check6_tx]) > - OVS_GREP_IFELSE([$KSRC/include/linux/utsrelease.h], [el6], > - [OVS_DEFINE([HAVE_RHEL6_PER_CPU])]) >OVS_FIND_PARAM_IFELSE([$KSRC/include/net/protocol.h], > [udp_add_offload], [net], > [OVS_DEFINE([HAVE_UDP_ADD_OFFLOAD_TAKES_NET])]) Good catch. I think it is a valid clean up. But since it does not fix 9ba57fc7 ("datapath: Add hash info to upcall"), should we move it along with the changes in ./datapath/linux/compat/include/linux/percpu.h to a separate patch? > diff --git a/datapath/datapath.c b/datapath/datapath.c > index 52a59f135..09fb3b1fc 100644 > --- a/datapath/datapath.c > +++ b/datapath/datapath.c > @@ -529,7 +529,7 @@ static int queue_userspace_packet(struct datapath *dp, > struct sk_buff *skb, > hash |= OVS_PACKET_HASH_SW_BIT; > #endif > > -#ifdef HAVE_L4_RXHASH > +#if defined(HAVE_L4_RXHASH) && !defined(HAVE_L4_HASH) > if (skb->l4_rxhash) > #else > if (skb->l4_hash) Looks like we can get rid of all #ifdef, and only leave > if (skb->l4_hash) > diff --git a/datapath/linux/compat/include/linux/skbuff.h > b/datapath/linux/compat/include/linux/skbuff.h > index 204ce5497..94479f57b 100644 > --- a/datapath/linux/compat/include/linux/skbuff.h > +++ b/datapath/linux/compat/include/linux/skbuff.h > @@ -278,8 +278,10 @@ static inline void skb_clear_hash(struct sk_buff *skb) > #ifdef HAVE_RXHASH > skb->rxhash = 0; > #endif > -#if defined(HAVE_L4_RXHASH) && !defined(HAVE_RHEL_OVS_HOOK) > +#if defined(HAVE_L4_RXHASH) && !defined(HAVE_L4_HASH) > skb->l4_rxhash = 0; > +#else > + skb->l4_hash = 0; > #endif > } > #endif We can also clean up skb_clear_hash() if we only care l4_hash. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 00/24] Add support for Linux kernels up to 5.8.x
On Wed, Oct 14, 2020 at 8:41 AM Ilya Maximets wrote: > > On 10/12/20 10:24 PM, Greg Rose wrote: > > This patch set will add support for Linux kernels up to 5.8. In > > addition there are quite a few patches for openvswitch on the Linux > > kernel mailing list that have not been backported - here is a first > > batch attempting to catch up on some of that technical debt. There > > will be a follow up batch of patches to this one but I didn't want > > the patch bomb to get too large. > > > > V4 passes Travis here: > > https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/735114390 > > Thanks! > > I'm wondering if there was changes in patches beside #5 and #11? > If not, I guess, I could add Acks from Yi-Hung from v3 before applying > the series. > > Yi-Hung, could you, please, take a look at patches #5 and #11 and check > if comments was addressed? > > Best regards, Ilya Maximets. Hi IIya, Thanks for reminding me. I just acked both patch 5 and 11. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 11/24] datapath: fix possible memleak on destroy flow-table
On Mon, Oct 12, 2020 at 1:28 PM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:52 2019 +0800 > > net: openvswitch: fix possible memleak on destroy flow-table > > When we destroy the flow tables which may contain the flow_mask, > so release the flow mask struct. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Added additional compat layer fixup for WRITE_ONCE() > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > > --- > V4 - Simplify the WRITE_ONCE macro implemention as suggested by > Yihung. > --- Looks good to me. Thanks! Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 05/24] datapath: Set OvS recirc_id from tc chain index
On Mon, Oct 12, 2020 at 1:25 PM Greg Rose wrote: > > From: Paul Blakey > > Upstream commit: > commit 95a7233c452a58a4c2310c456c73997853b2ec46 > Author: Paul Blakey > Date: Wed Sep 4 16:56:37 2019 +0300 > > net: openvswitch: Set OvS recirc_id from tc chain index > > Offloaded OvS datapath rules are translated one to one to tc rules, > for example the following simplified OvS rule: > > recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) > actions:ct(),recirc(2) > > Will be translated to the following tc rule: > > $ tc filter add dev dev1 ingress \ > prio 1 chain 0 proto ip \ > flower tcp ct_state -trk \ > action ct pipe \ > action goto chain 2 > > Received packets will first travel though tc, and if they aren't stolen > by it, like in the above rule, they will continue to OvS datapath. > Since we already did some actions (action ct in this case) which might > modify the packets, and updated action stats, we would like to continue > the proccessing with the correct recirc_id in OvS (here recirc_id(2)) > where we left off. > > To support this, introduce a new skb extension for tc, which > will be used for translating tc chain to ovs recirc_id to > handle these miss cases. Last tc chain index will be set > by tc goto chain action and read by OvS datapath. > > Signed-off-by: Paul Blakey > Signed-off-by: Vlad Buslov > Acked-by: Jiri Pirko > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Backport the local datapath changes from this patch and add compat > layer fixup for the DECLARE_STATIC_KEY_FALSE macro. > > Cc: Paul Blakey > Signed-off-by: Greg Rose > > --- > V4 - Add in portion of commit for ovs_dp_cmd_set which was missed > in first patch. > --- Thanks Greg for this new version. It looks good to me. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 24/24] Documentation: Update faq and NEWS for kernel 5.8
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > Update the NEWS and faq now that we will support up to Linux kernel > 5.8. > > Signed-off-by: Greg Rose > --- > Documentation/faq/releases.rst | 1 + > NEWS | 2 ++ > 2 files changed, 3 insertions(+) > > diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst > index 9d5d2c3e1..dcba97e16 100644 > --- a/Documentation/faq/releases.rst > +++ b/Documentation/faq/releases.rst > @@ -72,6 +72,7 @@ Q: What Linux kernel versions does each Open vSwitch > release work with? > 2.12.x 3.16 to 5.0 > 2.13.x 3.16 to 5.0 > 2.14.x 3.16 to 5.5 > +2.15.x 3.16 to 5.8 > == > > Open vSwitch userspace should also work with the Linux kernel module > built > diff --git a/NEWS b/NEWS > index 2f67d5047..2cd0a7cbf 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,5 +1,7 @@ > Post-v2.14.0 > - > + - Linux datapath: > + * Support for kernel versions up to 5.8.x. This patch can not be cleanly applied to NEWS, but that should be an easy fix. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 23/24] travis: Update kernel list as of 5.8
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > Update the list to more closely track the LTS releases on kernel.org. > > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 22/24] acinclude: Enable builds up to Linux 5.8
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > Allow building openvswitch against Linux kernels up to and including > version 5.8. > > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 21/24] datapath: use hlist_for_each_entry_rcu instead of hlist_for_each_entry
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 64948427a63f49dd0ce403388d232f22cc1971a8 > Author: Tonghao Zhang > Date: Thu Mar 26 04:27:24 2020 +0800 > > net: openvswitch: use hlist_for_each_entry_rcu instead of > hlist_for_each_entry > > The struct sw_flow is protected by RCU, when traversing them, > use hlist_for_each_entry_rcu. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Reviewed-by: Greg Rose > Signed-off-by: David S. Miller > > Compat fixup - OVS doesn't support lockdep_ovsl_is_held() yet > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 19/24] datapath: use skb_list_walk_safe helper for gso segments
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: "Jason A. Donenfeld" > > Upstream commit: > commit 2cec4448db38758832c2edad439f99584bb8fa0d > Author: Jason A. Donenfeld > Date: Mon Jan 13 18:42:29 2020 -0500 > > net: openvswitch: use skb_list_walk_safe helper for gso segments > > This is a straight-forward conversion case for the new function, keeping > the flow of the existing code as intact as possible. > > Signed-off-by: Jason A. Donenfeld > Signed-off-by: David S. Miller > > Cc: Jason A. Donenfeld > Signed-off-by: Greg Rose LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 16/24] datapath: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Paolo Abeni > > Upstream commit: > commit 8ffeb03fbba3b599690b361467bfd2373e8c450f > Author: Paolo Abeni > Date: Sun Dec 1 18:41:24 2019 +0100 > > openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() > > All the callers of ovs_flow_cmd_build_info() already deal with > error return code correctly, so we can handle the error condition > in a more gracefull way. Still dump a warning to preserve > debuggability. > > v1 -> v2: > - clarify the commit message > - clean the skb and report the error (DaveM) > > Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") > Signed-off-by: Paolo Abeni > Signed-off-by: David S. Miller > > Cc: Paolo Abeni > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 20/24] datapath: Distribute switch variables for initialization
On Wed, Sep 16, 2020 at 10:35 AM Greg Rose wrote: > > From: Kees Cook > > Upstream commit: > commit 16a556eeb7ed2dc3709fe2c5be76accdfa4901ab > Author: Kees Cook > Date: Wed Feb 19 22:23:09 2020 -0800 > > openvswitch: Distribute switch variables for initialization > > Variables declared in a switch statement before any case statements > cannot be automatically initialized with compiler instrumentation (as > they are not part of any execution flow). With GCC's proposed automatic > stack variable initialization feature, this triggers a warning (and they > don't get initialized). Clang's automatic stack variable initialization > (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also > doesn't initialize such variables[1]. Note that these warnings (or silent > skipping) happen before the dead-store elimination optimization phase, > so even when the automatic initializations are later elided in favor of > direct initializations, the warnings remain. > > To avoid these problems, move such variables into the "case" where > they're used or lift them up into the main function body. > > net/openvswitch/flow_netlink.c: In function ‘validate_set’: > net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be > executed [-Wswitch-unreachable] > 2711 | const struct ovs_key_ipv4 *ipv4_key; > | ^~~~ > > [1] https://bugs.llvm.org/show_bug.cgi?id=44916 > > Signed-off-by: Kees Cook > Signed-off-by: David S. Miller > > Cc: Kees Cook > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 18/24] datapath: support asymmetric conntrack
On Wed, Sep 16, 2020 at 10:38 AM Greg Rose wrote: > > From: aaron conole > > Upstream commit: > commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 > author: aaron conole > date: tue dec 3 16:34:13 2019 -0500 > > openvswitch: support asymmetric conntrack > > the openvswitch module shares a common conntrack and nat infrastructure > exposed via netfilter. it's possible that a packet needs both snat and > dnat manipulation, due to e.g. tuple collision. netfilter can support > this because it runs through the nat table twice - once on ingress and > again after egress. the openvswitch module doesn't have such capability. > > like netfilter hook infrastructure, we should run through nat twice to > keep the symmetry. > > fixes: 05752523e565 ("openvswitch: interface with nat.") > signed-off-by: aaron conole > signed-off-by: david s. miller > > Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.") > Cc: aaron conole > Acked-by: Aaron Conole > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 17/24] datapath: remove another BUG_ON()
On Wed, Sep 16, 2020 at 10:36 AM Greg Rose wrote: > > From: Paolo Abeni > > Upstream commit: > commit 8a574f86652a4540a2433946ba826ccb87f398cc > Author: Paolo Abeni > Date: Sun Dec 1 18:41:25 2019 +0100 > > openvswitch: remove another BUG_ON() > > If we can't build the flow del notification, we can simply delete > the flow, no need to crash the kernel. Still keep a WARN_ON to > preserve debuggability. > > Note: the BUG_ON() predates the Fixes tag, but this change > can be applied only after the mentioned commit. > > v1 -> v2: > - do not leak an skb on error > > Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical > section.") > Signed-off-by: Paolo Abeni > Signed-off-by: David S. Miller > > Cc: Paolo Abeni > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 12/24] datapath: simplify the ovs_dp_cmd_new
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit eec62eadd1d757b0743ccbde55973814f3ad396e > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:54 2019 +0800 > > net: openvswitch: simplify the ovs_dp_cmd_new > > use the specified functions to init resource. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 11/24] datapath: fix possible memleak on destroy flow-table
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:52 2019 +0800 > > net: openvswitch: fix possible memleak on destroy flow-table > > When we destroy the flow tables which may contain the flow_mask, > so release the flow mask struct. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Added additional compat layer fixup for WRITE_ONCE() > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- > datapath/flow_table.c | 186 +- > .../linux/compat/include/linux/compiler.h | 13 ++ > 2 files changed, 111 insertions(+), 88 deletions(-) > > diff --git a/datapath/flow_table.c b/datapath/flow_table.c > index ca2efe94d..bd05dd394 100644 > --- a/datapath/flow_table.c > +++ b/datapath/flow_table.c > @@ -234,6 +234,74 @@ static int tbl_mask_array_realloc(struct flow_table > *tbl, int size) > return 0; > } > > +static int tbl_mask_array_add_mask(struct flow_table *tbl, > + struct sw_flow_mask *new) > +{ > + struct mask_array *ma = ovsl_dereference(tbl->mask_array); > + int err, ma_count = READ_ONCE(ma->count); > + > + if (ma_count >= ma->max) { > + err = tbl_mask_array_realloc(tbl, ma->max + > + MASK_ARRAY_SIZE_MIN); > + if (err) > + return err; > + > + ma = ovsl_dereference(tbl->mask_array); > + } > + > + BUG_ON(ovsl_dereference(ma->masks[ma_count])); > + > + rcu_assign_pointer(ma->masks[ma_count], new); > + WRITE_ONCE(ma->count, ma_count +1); > + > + return 0; > +} > + > +static void tbl_mask_array_del_mask(struct flow_table *tbl, > + struct sw_flow_mask *mask) > +{ > + struct mask_array *ma = ovsl_dereference(tbl->mask_array); > + int i, ma_count = READ_ONCE(ma->count); > + > + /* Remove the deleted mask pointers from the array */ > + for (i = 0; i < ma_count; i++) { > + if (mask == ovsl_dereference(ma->masks[i])) > + goto found; > + } > + > + BUG(); > + return; > + > +found: > + WRITE_ONCE(ma->count, ma_count -1); > + > + rcu_assign_pointer(ma->masks[i], ma->masks[ma_count -1]); > + RCU_INIT_POINTER(ma->masks[ma_count -1], NULL); > + > + kfree_rcu(mask, rcu); > + > + /* Shrink the mask array if necessary. */ > + if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) && > + ma_count <= (ma->max / 3)) > + tbl_mask_array_realloc(tbl, ma->max / 2); > +} > + > +/* Remove 'mask' from the mask list, if it is not needed any more. */ > +static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask > *mask) > +{ > + if (mask) { > + /* ovs-lock is required to protect mask-refcount and > +* mask list. > +*/ > + ASSERT_OVSL(); > + BUG_ON(!mask->ref_count); > + mask->ref_count--; > + > + if (!mask->ref_count) > + tbl_mask_array_del_mask(tbl, mask); > + } > +} > + > int ovs_flow_tbl_init(struct flow_table *table) > { > struct table_instance *ti, *ufid_ti; > @@ -280,7 +348,28 @@ static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu) > __table_instance_destroy(ti); > } > > -static void table_instance_destroy(struct table_instance *ti, > +static void table_instance_flow_free(struct flow_table *table, > + struct table_instance *ti, > + struct table_instance *ufid_ti, > + struct sw_flow *flow, > + bool count) > +{ > + hlist_del_rcu(&flow->flow_table.node[ti->node_ver]); > + if (count) > + table->count--; > + > + if (ovs_identifier_is_ufid(&flow->id)) { > + hlist_del_rcu(&flow->ufid_table.node[ufid_ti->node_ver]); > + > + if (count) > + table->ufid_count--; > + } > + > + flow_mask_remove(table, flow->mask); > +} > + > +static void table_instance_destroy(struct flow_table *table, > + struct table_instance *ti, >struct table_instance *ufid_ti, >bool deferred) > { > @@ -297,13 +386,12 @@ static void table_instance_destroy(struct > table_instance *ti, > struct sw_flow *flow; > struct hlist_head *head = &ti->buckets[i]; > struct hlist_node *n; > - int ver = ti->node_ver; > - int ufid_ver = ufid_ti
Re: [ovs-dev] [PATCH V3 14/24] datapath: don't call pad_packet if not necessary
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 61ca533c0e94104c35fcb7858a23ec9a05d78143 > Author: Tonghao Zhang > Date: Thu Nov 14 23:51:08 2019 +0800 > > net: openvswitch: don't call pad_packet if not necessary > > The nla_put_u16/nla_put_u32 makes sure that > *attrlen is align. The call tree is that: > > nla_put_u16/nla_put_u32 > -> nla_putattrlen = sizeof(u16) or sizeof(u32) > -> __nla_put attrlen > -> __nla_reserve attrlen > -> skb_put(skb, nla_total_size(attrlen)) > > nla_total_size returns the total length of attribute > including padding. > > Cc: Joe Stringer > Cc: William Tu > Signed-off-by: Tonghao Zhang > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 10/24] datapath: add likely in flow_lookup
On Wed, Sep 16, 2020 at 10:35 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 0a3e01371db17d753dd92ec4d0fc6247412d3b01 > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:51 2019 +0800 > > net: openvswitch: add likely in flow_lookup > > The most case *index < ma->max, and flow-mask is not NULL. > We add un/likely for performance. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: William Tu > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 13/24] datapath: select vport upcall portid directly
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 90ce9f23a886bdef7a4b7a9bd52c7a50a6a81635 > Author: Tonghao Zhang > Date: Thu Nov 7 00:34:28 2019 +0800 > > net: openvswitch: select vport upcall portid directly > > The commit 69c51582ff786 ("dpif-netlink: don't allocate per > thread netlink sockets"), in Open vSwitch ovs-vswitchd, has > changed the number of allocated sockets to just one per port > by moving the socket array from a per handler structure to > a per datapath one. In the kernel datapath, a vport will have > only one socket in most case, if so select it directly in > fast-path. > > Signed-off-by: Tonghao Zhang > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- LGTM Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 15/24] datapath: fix flow command message size
On Wed, Sep 16, 2020 at 10:39 AM Greg Rose wrote: > > From: Paolo Abeni > > Upstream commit: > commit 4e81c0b3fa93d07653e2415fa71656b080a112fd > Author: Paolo Abeni > Date: Tue Nov 26 12:55:50 2019 +0100 > > openvswitch: fix flow command message size > > When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant > flow has no UFID, we can exceed the computed size, as > ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY > attribute. > Take the above in account when computing the flow command message > size. > > Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") > Reported-by: Qi Jun Ding > Signed-off-by: Paolo Abeni > Signed-off-by: David S. Miller > > Cc: Paolo Abeni > Signed-off-by: Greg Rose > --- LGTM Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 09/24] datapath: simplify the flow_hash
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 515b65a4b99197ae062a795ab4de919e6d04be04 > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:50 2019 +0800 > > net: openvswitch: simplify the flow_hash > > Simplify the code and remove the unnecessary BUILD_BUG_ON. > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: William Tu > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 07/24] datapath: don't unlock mutex when changing the user_features fails
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 4c76bf696a608ea5cc555fe97ec59a9033236604 > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:53 2019 +0800 > > net: openvswitch: don't unlock mutex when changing the user_features fails > > Unlocking of a not locked mutex is not allowed. > Other kernel thread may be in critical section while > we unlock it because of setting user_feature fail. > > Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain > index") > Cc: Paul Blakey > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: William Tu > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 08/24] datapath: optimize flow-mask looking up
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Tonghao Zhang > > Upstream commit: > commit 57f7d7b9164426c496300d254fd5167fbbf205ea > Author: Tonghao Zhang > Date: Fri Nov 1 22:23:49 2019 +0800 > > net: openvswitch: optimize flow-mask looking up > > The full looking up on flow table traverses all mask array. > If mask-array is too large, the number of invalid flow-mask > increase, performance will be drop. > > One bad case, for example: M means flow-mask is valid and NULL > of flow-mask means deleted. > > +---+ > | M | NULL | ... | NULL | M| > +---+ > > In that case, without this patch, openvswitch will traverses all > mask array, because there will be one flow-mask in the tail. This > patch changes the way of flow-mask inserting and deleting, and the > mask array will be keep as below: there is not a NULL hole. In the > fast path, we can "break" "for" (not "continue") in flow_lookup > when we get a NULL flow-mask. > > "break" > v > +---+ > | M | M | NULL |... | NULL | NULL| > +---+ > > This patch don't optimize slow or control path, still using ma->max > to traverse. Slow path: > * tbl_mask_array_realloc > * ovs_flow_tbl_lookup_exact > * flow_mask_find > > Signed-off-by: Tonghao Zhang > Tested-by: Greg Rose > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Cc: Tonghao Zhang > Reviewed-by: Tonghao Zhang > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 06/24] datapath: fix GFP flags in rtnl_net_notifyid()
On Wed, Sep 16, 2020 at 10:34 AM Greg Rose wrote: > > From: Guillaume Nault > > Upstream commit: > commit d4e4fdf9e4a27c87edb79b1478955075be141f67 > Author: Guillaume Nault > Date: Wed Oct 23 18:39:04 2019 +0200 > > netns: fix GFP flags in rtnl_net_notifyid() > > In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to > rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, > but there are a few paths calling rtnl_net_notifyid() from atomic > context or from RCU critical sections. The later also precludes the use > of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() > call is wrong too, as it uses GFP_KERNEL unconditionally. > > Therefore, we need to pass the GFP flags as parameter and propagate it > through function calls until the proper flags can be determined. > > In most cases, GFP_KERNEL is fine. The exceptions are: > * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() > indirectly call rtnl_net_notifyid() from RCU critical section, > > * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as > parameter. > > Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used > by nlmsg_new(). The function is allowed to sleep, so better make the > flags consistent with the ones used in the following > ovs_vport_cmd_fill_info() call. > > Found by code inspection. > > Fixes: 9a9634545c70 ("netns: notify netns id events") > Signed-off-by: Guillaume Nault > Acked-by: Nicolas Dichtel > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Backport the datapath.c portion of this fix. > > Cc: Guillaume Nault > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 03/24] datapath: do not update max_headroom if new headroom is equal to old headroom
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Taehee Yoo > > Upstream commit: > commit 6b660c4177aaebdc73df7a3378f0e8b110aa4b51 > Author: Taehee Yoo > Date: Sat Jul 6 01:08:09 2019 +0900 > > net: openvswitch: do not update max_headroom if new headroom is equal to > old headroom > > When a vport is deleted, the maximum headroom size would be changed. > If the vport which has the largest headroom is deleted, > the new max_headroom would be set. > But, if the new headroom size is equal to the old headroom size, > updating routine is unnecessary. > > Signed-off-by: Taehee Yoo > Tested-by: Greg Rose > Reviewed-by: Greg Rose > Signed-off-by: David S. Miller > > Cc: Taehee Yoo > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 01/24] datapath: return an error instead of doing BUG_ON()
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Eelco Chaudron > > Upstream commit: > commit a734d1f4c2fc962ef4daa179e216df84a8ec5f84 > Author: Eelco Chaudron > Date: Thu May 2 16:12:38 2019 -0400 > > net: openvswitch: return an error instead of doing BUG_ON() > > For all other error cases in queue_userspace_packet() the error is > returned, so it makes sense to do the same for these two error cases. > > Reported-by: Davide Caratti > Signed-off-by: Eelco Chaudron > Acked-by: Flavio Leitner > Signed-off-by: David S. Miller > > Cc: Eelco Chaudron > Acked-by: Eelco Chaudron > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 02/24] datapath: drop unneeded likely() call around IS_ERR()
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Enrico Weigelt > > Upstream commit: > commit b90f5aa4d6268e81dd1fd51e5ef89d2892bf040d > Author: Enrico Weigelt > Date: Wed Jun 5 23:06:40 2019 +0200 > > net: openvswitch: drop unneeded likely() call around IS_ERR() > > IS_ERR() already calls unlikely(), so this extra likely() call > around the !IS_ERR() is not needed. > > Signed-off-by: Enrico Weigelt > Signed-off-by: David S. Miller > > Cc: Enrico Weigelt > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 05/24] datapath: Set OvS recirc_id from tc chain index
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Paul Blakey > > Upstream commit: > commit 95a7233c452a58a4c2310c456c73997853b2ec46 > Author: Paul Blakey > Date: Wed Sep 4 16:56:37 2019 +0300 > > net: openvswitch: Set OvS recirc_id from tc chain index > > Offloaded OvS datapath rules are translated one to one to tc rules, > for example the following simplified OvS rule: > > recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) > actions:ct(),recirc(2) > > Will be translated to the following tc rule: > > $ tc filter add dev dev1 ingress \ > prio 1 chain 0 proto ip \ > flower tcp ct_state -trk \ > action ct pipe \ > action goto chain 2 > > Received packets will first travel though tc, and if they aren't stolen > by it, like in the above rule, they will continue to OvS datapath. > Since we already did some actions (action ct in this case) which might > modify the packets, and updated action stats, we would like to continue > the proccessing with the correct recirc_id in OvS (here recirc_id(2)) > where we left off. > > To support this, introduce a new skb extension for tc, which > will be used for translating tc chain to ovs recirc_id to > handle these miss cases. Last tc chain index will be set > by tc goto chain action and read by OvS datapath. > > Signed-off-by: Paul Blakey > Signed-off-by: Vlad Buslov > Acked-by: Jiri Pirko > Acked-by: Pravin B Shelar > Signed-off-by: David S. Miller > > Backport the local datapath changes from this patch and add compat > layer fixup for the DECLARE_STATIC_KEY_FALSE macro. > > Cc: Paul Blakey > Signed-off-by: Greg Rose > --- > acinclude.m4 | 3 ++ > datapath/datapath.c | 34 --- > datapath/datapath.h | 2 ++ > datapath/flow.c | 13 +++ > .../linux/compat/include/linux/static_key.h | 7 > 5 files changed, 55 insertions(+), 4 deletions(-) > > diff --git a/acinclude.m4 b/acinclude.m4 > index 84f344da0..3d56510a0 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -631,6 +631,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ >[OVS_DEFINE([HAVE_UPSTREAM_STATIC_KEY])]) >OVS_GREP_IFELSE([$KSRC/include/linux/jump_label.h], > [DEFINE_STATIC_KEY_FALSE], >[OVS_DEFINE([HAVE_DEFINE_STATIC_KEY])]) > + OVS_GREP_IFELSE([$KSRC/include/linux/jump_label.h], > + [DECLARE_STATIC_KEY_FALSE], > + [OVS_DEFINE([HAVE_DECLARE_STATIC_KEY])]) > >OVS_GREP_IFELSE([$KSRC/include/linux/etherdevice.h], [eth_hw_addr_random]) >OVS_GREP_IFELSE([$KSRC/include/linux/etherdevice.h], [ether_addr_copy]) > diff --git a/datapath/datapath.c b/datapath/datapath.c > index c8c21d774..36f1e7894 100644 > --- a/datapath/datapath.c > +++ b/datapath/datapath.c > @@ -1635,10 +1635,34 @@ static void ovs_dp_reset_user_features(struct sk_buff > *skb, struct genl_info *in > dp->user_features = 0; > } > > -static void ovs_dp_change(struct datapath *dp, struct nlattr *a[]) > +DEFINE_STATIC_KEY_FALSE(tc_recirc_sharing_support); > + > +static int ovs_dp_change(struct datapath *dp, struct nlattr *a[]) > { > - if (a[OVS_DP_ATTR_USER_FEATURES]) > - dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]); > + u32 user_features = 0; > + > + if (a[OVS_DP_ATTR_USER_FEATURES]) { > + user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]); > + > + if (user_features & ~(OVS_DP_F_VPORT_PIDS | > + OVS_DP_F_UNALIGNED | > + OVS_DP_F_TC_RECIRC_SHARING)) > + return -EOPNOTSUPP; > + > +#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT) > + if (user_features & OVS_DP_F_TC_RECIRC_SHARING) > + return -EOPNOTSUPP; > +#endif > + } > + > + dp->user_features = user_features; > + > + if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING) > + static_branch_enable(&tc_recirc_sharing_support); > + else > + static_branch_disable(&tc_recirc_sharing_support); > + > + return 0; > } > > static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info) > @@ -1700,7 +1724,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct > genl_info *info) > parms.port_no = OVSP_LOCAL; > parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID]; > > - ovs_dp_change(dp, a); > + err = ovs_dp_change(dp, a); > + if (err) > + goto err_destroy_meters; > > /* So far only local changes have been made, now need the lock. */ > ovs_lock(); Hi Greg, In the upstream commit, it also updates ovs_dp_cmd_set() which seems to be missing in this patch. Can you d
Re: [ovs-dev] [PATCH V3 04/24] datapath: Print error when ovs_execute_actions() fails
On Wed, Sep 16, 2020 at 10:33 AM Greg Rose wrote: > > From: Yifeng Sun > > Upstream commit: > commit aa733660dbd8d9192b8c528ae0f4b84f3fef74e4 > Author: Yifeng Sun > Date: Sun Aug 4 19:56:11 2019 -0700 > > openvswitch: Print error when ovs_execute_actions() fails > > Currently in function ovs_dp_process_packet(), return values of > ovs_execute_actions() are silently discarded. This patch prints out > an debug message when error happens so as to provide helpful hints > for debugging. > Acked-by: Pravin B Shelar > > Signed-off-by: David S. Miller > > Cc: Yifeng Sun > Reviewed-by: Yifeng Sun > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ovsdb: Remove read permission of *.db from others
Currently, when ovsdb *.db is created by ovsdb-tool it grants read permission to others. This may incur security concerns, for example, IPsec Pre-shared keys are stored in ovs-vsitchd.conf.db. This patch addresses the concerns by removing permission for others. Reported-by: Antonin Bas Signed-off-by: Yi-Hung Wei --- ovsdb/log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index 41af77679178..4a28fa3db6da 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -212,7 +212,7 @@ ovsdb_log_open(const char *name, const char *magic, if (!strcmp(name, "/dev/stdin") && open_mode == OVSDB_LOG_READ_ONLY) { fd = dup(STDIN_FILENO); } else { -fd = open(name, flags, 0666); +fd = open(name, flags, 0660); } if (fd < 0) { const char *op = (open_mode == OVSDB_LOG_CREATE_EXCL ? "create" -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovs-lib: Handle daemon segfaults during exit.
On Fri, Sep 18, 2020 at 9:54 AM Gurucharan Shetty wrote: > > Currently, we terminate a daemon by trying > "ovs-appctl exit", "SIGTERM" and finally "SIGKILL". > But the logic fails if during "ovs-appctl exit", the > daemon crashes (segfaults). The monitor will automatically > restart the daemon with a new pid. The current logic of > checking the non-existance of old pid succeeds and we proceed > with the assumption that the daemon is dead. > > This is a problem during OVS upgrades as we will continue > to run the older version of OVS. > > With this commit, we take care of this situation. If there > is a segfault, the pidfile is not deleted. So, we wait a > little to give time for the monitor to restart the daemon > (which is usually instantaneous) and then re-read the pidfile. > > VMware-BZ: #2633995 > Signed-off-by: Gurucharan Shetty > --- Thanks for the patch. I think it does fix a case when --monitor is specified and ovs crash during "ovs-appctl exit". Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] selinux: Add missing permissions for ovs-kmod-ctl
On RHEL 8, a SELinux policy is missing when ovs-kmod-ctl use modprobe to load kernel modules. This patch adds the missing permissions based on /var/log/audit/audit.log Example log of the AVC violations: type=AVC msg=audit(1599075387.136:65): avc: denied { read } for pid=1472 comm="modprobe" name="modules.alias.bin" dev="dm-0" ino=586629 scontext=system_u:system_r:openvswitch_load_module_t:s0 tcontext=system_u:object_r:modules_dep_t:s0 tclass=file permissive=0 type=AVC msg=audit(1599085253.148:45): avc: denied { open } for pid=1355 comm="modprobe" path="/usr/lib/modules/4.18.0-193.el8.x86_64/modules.dep.bin" dev="dm-0" ino=624258 scontext=system_u:system_r:openvswitch_load_module_t:s0 tcontext=unconfined_u:object_r:modules_dep_t:s0 tclass=file permissive=0 VMWare-BZ: #2633569 Signed-off-by: Yi-Hung Wei --- selinux/openvswitch-custom.te.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/selinux/openvswitch-custom.te.in b/selinux/openvswitch-custom.te.in index 2adaf231fe63..beb0ab0d6612 100644 --- a/selinux/openvswitch-custom.te.in +++ b/selinux/openvswitch-custom.te.in @@ -19,6 +19,7 @@ require { type kernel_t; type hostname_exec_t; type modules_conf_t; +type modules_dep_t; type modules_object_t; type passwd_file_t; type plymouth_exec_t; @@ -121,6 +122,7 @@ allow openvswitch_load_module_t insmod_exec_t:file { execute execute_no_trans ge allow openvswitch_load_module_t kernel_t:system module_request; allow openvswitch_load_module_t modules_conf_t:dir { getattr open read search }; allow openvswitch_load_module_t modules_conf_t:file { getattr open read }; +allow openvswitch_load_module_t modules_dep_t:file { getattr map open read }; allow openvswitch_load_module_t modules_object_t:file { map getattr open read }; allow openvswitch_load_module_t modules_object_t:dir { getattr open read search }; allow openvswitch_load_module_t openvswitch_load_module_exec_t:file { entrypoint }; -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ofproto-dpif-xlate: Fix tun_metadata match after recirc
Consider the following OpenFlow rules that match on tun_metadata0 after recirculation. If we start ICMP flow with tun_metadata0=0x1 follow by a flow with tun_metadata0=0x3, OVS will incorrectly match the second flow with the tun_metadata0=0x1 rule. table=0, in_port=gnv1, icmp, action=ct(table=1) table=0, in_port=gnv0, icmp action=ct(table=1) table=1, in_port=gnv1, icmp, action=output:gnv0 table=1, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1 table=1, in_port=gnv0, icmp, ct_state=+trk+est, tun_metadata0=0x1 action=output:gnv1 table=1, in_port=gnv0, icmp, ct_state=+trk+est, tun_metadata0=0x3 action=output:gnv1 The root cause of this issue is that during recirculation OVS will overwrite the tun_metadata0 with the one stored in the frozen state. This will result in erroneous behavior if tun_metadata0 is wildcarded in the megaflow as shown below. Notice that both the second and the third megaflow both carry 0x1 in tun_metadata0, where the expected behavior in the thrid megaflow one should be 0x3. recirc_id(0),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100,geneve({}{}{}), flags(-df-csum+key)),in_port(5),eth(),eth_type(0x0800),ipv4(proto=1,frag=no), packets:213, bytes:20774, used:0.672s, actions:ct,recirc(0x4) recirc_id(0x4),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100, geneve({class=0x102,type=0x7,len=4,0x1}{class=0,type=0,len=0}), flags(-df-csum+key)),in_port(5),ct_state(-new+est+trk),eth(),eth_type(0x0800), ipv4(proto=1,tos=0/0x3,frag=no), packets:112, bytes:10976, used:0.672s, actions:set(tunnel(tun_id=0x2,dst=172.31.1.2,ttl=64,tp_src=64725,tp_dst=6081, geneve({class=0x102,type=0x7,len=4,0x1}),flags(df|key))),5 recirc_id(0x4),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100, geneve({class=0x102,type=0x7,len=4,0x3}{class=0,type=0,len=0}), flags(-df-csum+key)),in_port(5),ct_state(-new+est+trk),eth(),eth_type(0x0800), ipv4(proto=1,tos=0/0x3,frag=no), packets:98, bytes:9506, used:0.992s, actions:set(tunnel(tun_id=0x2,dst=172.31.1.2,ttl=64,tp_src=64725,tp_dst=6081, geneve({class=0x102,type=0x7,len=4,0x1}),flags(df|key))),5 There seems to be two ways to fix this issue, one is to unwildcarded all the tunnel metadata when OVS generates megaflow with recirculation. The other one is to add a boolean flag in the frozen state, set the flag if the flow is from a tunnel port, and ask OVS to honor the tunnel metadata from the flow instead of from the frozen state. Since the first approach will increase the number of megaflows and may incur performance impact, this patch takes the second approach. With this patch, the megaflows become as below: recirc_id(0),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100, geneve({}{}{}),flags(-df-csum+key)),in_port(5),eth(),eth_type(0x0800), ipv4(proto=1,frag=no), packets:60, bytes:5850, used:0.004s, actions:ct,recirc(0x8) recirc_id(0x8),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100, geneve({class=0x102,type=0x7,len=4,0x1}{class=0,type=0,len=0}), flags(-df-csum+key)),in_port(5),ct_state(-new+est+trk),eth(),eth_type(0x0800), ipv4(proto=1,tos=0/0x3,frag=no), packets:29, bytes:2842, used:0.356s, actions:set(tunnel(tun_id=0x2,dst=172.31.1.2,ttl=64,tp_src=64725,tp_dst=6081, geneve({class=0x102,type=0x7,len=4,0x1}),flags(df|key))),5 recirc_id(0x8),tunnel(tun_id=0x2,src=172.31.1.1,dst=172.31.1.100, geneve({class=0x102,type=0x7,len=4,0x3}{class=0,type=0,len=0}), flags(-df-csum+key)),in_port(5),ct_state(-new+est+trk), eth(),eth_type(0x0800), ipv4(proto=1,tos=0/0x3,frag=no), packets:28, bytes:2716, used:0.004s, actions:set(tunnel(tun_id=0x2,dst=172.31.1.2,ttl=64,tp_src=64725,tp_dst=6081, geneve({class=0x102,type=0x7,len=4,0x3}),flags(df|key))),5 VMWare-BZ: 2617696 Reported-by: Ran Gu Signed-off-by: Yi-Hung Wei --- https://travis-ci.org/github/YiHungWei/ovs/builds/719445339 --- ofproto/ofproto-dpif-rid.h| 8 ++-- ofproto/ofproto-dpif-upcall.c | 16 +++- ofproto/ofproto-dpif-xlate.c | 16 +++- tests/tunnel.at | 33 + 4 files changed, 69 insertions(+), 4 deletions(-) diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h index 30cd5275f24c..d6df1212cc0f 100644 --- a/ofproto/ofproto-dpif-rid.h +++ b/ofproto/ofproto-dpif-rid.h @@ -131,9 +131,12 @@ frozen_metadata_from_flow(struct frozen_metadata *md, static inline void frozen_metadata_to_flow(struct ofproto *ofproto, const struct frozen_metadata *md, -struct flow *flow) +struct flow *flow, +bool from_tunnel) { -flow->tunnel = md->tunnel; +if (!from_tunnel) { +flow->tunnel = md->tunnel; +} flow->tunnel.metadata.tab = ofproto_get_tun_tab(ofproto); flow->metadata = md->metadata; memcpy(flow->regs, md->regs, sizeof flow->regs); @@ -153,6 +156,7 @@ struct frozen_state { size_t stack_size; mirror_mask_t mirrors;/* Mirror
Re: [ovs-dev] [PATCH V2] rhel: ovs-kmod-manage.sh: Disable unneeded warning
On Thu, Aug 20, 2020 at 2:56 PM Greg Rose wrote: > > The script itself says which versions the script is needed for but > it is run on RHEL 8.x as well where it is not needed. Disable the > warning and change the exit code to zero since it may unnecessarily > alarm users and is really only for debugging anyway. > > Signed-off-by: Greg Rose > --- > > V2 - As per Yi-Hung's suggestion change exit code to zero. > --- LGTM Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] rhel: ovs-kmod-manage.sh: Disable unneeded warning
On Thu, Aug 20, 2020 at 11:00 AM Greg Rose wrote: > > The script itself says which versions the script is needed for but > it is run on RHEL 8.x as well where it is not needed. Disable the > warning since it may unnecessarily alarm users and is really only > for debugging anyway. > > Signed-off-by: Greg Rose > --- > rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > index c70e135..c2c8d18 100644 > --- a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > +++ b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > @@ -136,7 +136,7 @@ elif [ "$mainline_major" = "4" ] && [ "$mainline_minor" = > "12" ]; then > fi > > if [ X"$ver_offset" = X ]; then > -echo "This script is not intended to run on kernel $(uname -r)" > +#echo "This script is not intended to run on kernel $(uname -r)" > exit 1 > fi Hi Greg, >From /rhel/openvswitch-kmod-fedora.spec.in, looks like for RHEL 8.2, ovs-kmod-manage.sh is invoked in the %posttrans section, since in the %post section we do check if the kernel version is applied to ovs-kmod-manage.sh. Commenting out the warning message would make the rpm package installation warning from: $ rpm -ivh ./openvswitch-kmod-2.13.1.rhel82.16775750-1.el8.x86_64.rpm current kernel is 4.18.0-193.el8.x86_64 This script is not intended to run on kernel 4.18.0-193.el8.x86_64 warning: %posttrans(openvswitch-kmod-2.13.1.rhel82.16775750-1.el8.x86_64) scriptlet failed, exit status 1 To the following, which still contains confusing message "scriptlet failed, exit status 1". current kernel is 4.18.0-193.el8.x86_64 warning: %posttrans(openvswitch-kmod-2.13.1.rhel82.16775750-1.el8.x86_64) scriptlet failed, exit status 1 ovs-kmod-manage.sh is not intended to run on RHEL 8.2, should we consider to 1) add checking on the %posttrans section or 2) simply make the exit code from 1 to 0 so that "the scriptlet failed" is not triggered? Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] python: Fixup python shebangs to python3
On Tue, Aug 18, 2020 at 9:45 AM Greg Rose wrote: > > Builds on RHEL 8.2 systems are failing due to this issue. > > See [1] as to why this is necessary. > > I used the following command to identify files that need this fix: > find . -type f -executable | /usr/lib/rpm/redhat/brp-mangle-shebangs > > I also updated the copyright notices as needed. > > 1. https://fedoraproject.org/wiki/Changes/Make_ambiguous_python_shebangs_error > > Signed-off-by: Greg Rose > --- Thanks for the patch. LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovs-bugtool: Fix Python3 bytes str issue.
On Mon, Jul 13, 2020 at 1:46 PM William Tu wrote: > > The patch fixes two errors due to type mismatched, when converting > between str and bytes: > File "/usr/local/sbin/ovs-bugtool", line 649, in main > cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'dump-flows', '-m', d]) > File "/usr/local/sbin/ovs-bugtool", line 278, in cmd_output > label = ' '.join(a) > TypeError: sequence item 3: expected str instance, bytes found > > And > File "/usr/sbin/ovs-bugtool", line 721, in main > collect_data() > File "/usr/sbin/ovs-bugtool", line 366, in collect_data > run_procs(process_lists.values()) > File "/usr/sbin/ovs-bugtool", line 1354, in run_procs > p.inst.write("\n** timeout **\n") > File "/usr/sbin/ovs-bugtool", line 1403, in write > BytesIO.write(self, s) > TypeError: a bytes-like object is required, not 'str' > > VMware-BZ: #2602135 > Fixed: 9e6c00bca9af ("bugtool: Fix for Python3.") > Signed-off-by: William Tu > --- > utilities/bugtool/ovs-bugtool.in | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/utilities/bugtool/ovs-bugtool.in > b/utilities/bugtool/ovs-bugtool.in > index 1a5170d8c78b..8ca4c922a104 100755 > --- a/utilities/bugtool/ovs-bugtool.in > +++ b/utilities/bugtool/ovs-bugtool.in > @@ -643,7 +643,7 @@ exclude those logs from the archive. > if os.path.exists(OPENVSWITCH_VSWITCHD_PID): > cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'show', '-s']) > for d in dp_list(): > -cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'dump-flows', '-m', > d]) > +cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'dump-flows', '-m', > d.decode()]) Thanks for the patch. I got the following flake check warning. utilities/bugtool/ovs-bugtool.in:646:80: E501 line too long (87 > 79 characters) Other than that, it looks good to me. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-linux: Fix broken build on Ubuntu 14.04
Patch 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") uses __virtio16 which is defined in kernel 3.19. Ubuntu 14.04 is using 3.13 kernel that lacks the virtio_types definition. This patch fixes that. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Yi-Hung Wei --- Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/705960203 --- acinclude.m4 | 12 configure.ac | 1 + lib/netdev-linux.c | 8 3 files changed, 21 insertions(+) diff --git a/acinclude.m4 b/acinclude.m4 index 054ec2e3c43f..863a04349373 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -250,6 +250,18 @@ AC_DEFUN([OVS_CHECK_LINUX_SCTP_CT], [ [Define to 1 if SCTP_CONNTRACK_HEARTBEAT_SENT is available.])]) ]) +dnl OVS_CHECK_LINUX_VIRTIO_TYPES +dnl +dnl Checks for kernels that need virtio_types definition. +AC_DEFUN([OVS_CHECK_LINUX_VIRTIO_TYPES], [ + AC_COMPILE_IFELSE([ +AC_LANG_PROGRAM([#include ], [ +__virtio16 x = 0; +])], +[AC_DEFINE([HAVE_VIRTIO_TYPES], [1], +[Define to 1 if __virtio16 is available.])]) +]) + dnl OVS_FIND_DEPENDENCY(FUNCTION, SEARCH_LIBS, NAME_TO_PRINT) dnl dnl Check for a function in a library list. diff --git a/configure.ac b/configure.ac index 1877aae561d8..5ce510c2032f 100644 --- a/configure.ac +++ b/configure.ac @@ -188,6 +188,7 @@ OVS_CHECK_LINUX OVS_CHECK_LINUX_NETLINK OVS_CHECK_LINUX_TC OVS_CHECK_LINUX_SCTP_CT +OVS_CHECK_LINUX_VIRTIO_TYPES OVS_CHECK_DPDK OVS_CHECK_PRAGMA_MESSAGE AC_SUBST([OVS_CFLAGS]) diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 6269c24acf75..fe7fb9b29c0e 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -227,6 +227,14 @@ struct rtnl_link_stats64 { uint64_t tx_compressed; }; +/* Linux 3.19 introduced virtio_types.h. It might be missing + * if we are using old kernel. */ +#ifndef HAVE_VIRTIO_TYPES +typedef __u16 __bitwise__ __virtio16; +typedef __u32 __bitwise__ __virtio32; +typedef __u64 __bitwise__ __virtio64; +#endif + enum { VALID_IFINDEX = 1 << 0, VALID_ETHERADDR = 1 << 1, -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] bridge: Fix null dereference on ct_timeout_policy record
Accoridng to vswitch.ovsschema, each CT_Zone record may have zero or one associcated CT_Timeout_policy. Thus, this patch checks if ovsrec_ct_timeout_policy exist before accesses the record. VMWare-BZ: 2585825 Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.") Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables") Reported-by: Yang Song Signed-off-by: Yi-Hung Wei --- Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/699829754 Please backport this fix to branch-2.13. --- tests/ovs-vsctl.at| 8 utilities/ovs-vsctl.c | 10 +++--- vswitchd/bridge.c | 6 -- 3 files changed, 19 insertions(+), 5 deletions(-) diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at index 55c7a6e179cd..c8babe36120a 100644 --- a/tests/ovs-vsctl.at +++ b/tests/ovs-vsctl.at @@ -966,6 +966,14 @@ AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])]) AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3 ]) +AT_CHECK( + [RUN_OVS_VSCTL_TOGETHER([--id=@n create CT_Zone external_ids:"test"="123"], + [--id=@m create Datapath datapath_version=0 ct_zones:"10"=@n], + [set Open_vSwitch . datapaths:"netdev"=@m])], + [0], [stdout]) +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:10, Timeout Policies: system default +]) + AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 'capabilities={recirc=true}' -- set Open_vSwitch . datapaths:"system"=@m])], [0], [stdout]) AT_CHECK([RUN_OVS_VSCTL([list-dp-cap system])], [0], [recirc=true ]) diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c index bd3972636e66..37cc72d401d3 100644 --- a/utilities/ovs-vsctl.c +++ b/utilities/ovs-vsctl.c @@ -1344,9 +1344,13 @@ cmd_list_zone_tp(struct ctl_context *ctx) struct ovsrec_ct_timeout_policy *tp = zone->timeout_policy; -for (int j = 0; j < tp->n_timeouts; j++) { -ds_put_format(&ctx->output, "%s=%"PRIu64" ", - tp->key_timeouts[j], tp->value_timeouts[j]); +if (tp) { +for (int j = 0; j < tp->n_timeouts; j++) { +ds_put_format(&ctx->output, "%s=%"PRIu64" ", +tp->key_timeouts[j], tp->value_timeouts[j]); +} +} else { +ds_put_cstr(&ctx->output, "system default"); } ds_chomp(&ctx->output, ' '); ds_put_char(&ctx->output, '\n'); diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index f312efd8e128..0bb4fa6524e9 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -635,8 +635,10 @@ static void get_timeout_policy_from_ovsrec(struct simap *tp, const struct ovsrec_ct_timeout_policy *tp_cfg) { -for (size_t i = 0; i < tp_cfg->n_timeouts; i++) { -simap_put(tp, tp_cfg->key_timeouts[i], tp_cfg->value_timeouts[i]); +if (tp_cfg) { +for (size_t i = 0; i < tp_cfg->n_timeouts; i++) { +simap_put(tp, tp_cfg->key_timeouts[i], tp_cfg->value_timeouts[i]); +} } } -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Port patches to windows datapath: "conntrack: Fix conntrack new state"
On Thu, Jun 18, 2020 at 7:05 AM Rui Cao wrote: > > Hi, > > Recenty we found that two previous patches which fix the TCP conntrack new > state issue in userspace are also needed by windows datapath. > > Here's the details of the issue: > https://github.com/openvswitch/ovs-issues/issues/188 > > I port these two patches to windows datapath and verified it locally, could > you help review the patch: > https://github.com/openvswitch/ovs/pull/320 > [https://avatars3.githubusercontent.com/u/7143863?s=400&v=4]<https://github.com/openvswitch/ovs/pull/320> > Port patches to windows: "conntrack: Fix conntrack new state" by ruicao93 · > Pull Request #320 · > openvswitch/ovs<https://github.com/openvswitch/ovs/pull/320> > Port following commits to windows driver to fix the TCP conntrack new state > issue on windows: a867c01 ac23d20 @YiHungWei Fixes openvswitch/ovs-issues#188 > github.com > > > Port patches to windows: "conntrack: Fix conntrack new state" > > Port following commits to windows to fix the conntrack new state issue: > - a867c010ee9183885ee9d3eb76a0005c075c4d2e > - ac23d20fc90da3b1c9b2117d1e22102e99fba006 > > Signed-off-by: Rui Cao > Thanks for porting this patch to windows datapath. It looks good from conntrack's perspective. I do not have a windows environment to test it tho. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2] rhel: Support RHEL 7.8 kernel module rpm build.
On Wed, Jun 17, 2020 at 11:37 AM William Tu wrote: > > Add support for RHEL7.8 GA release with kernel 3.10.0-1127. > > VMware-BZ: #2582834 > Signed-off-by: William Tu > --- Looks good to me. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] rhel: Support RHEL 7.8 kernel module rpm build.
Hi William, Thanks for the patch. Should we update rhel/openvswitch-kmod-fedora.spec.in so that the ovs-kmod-manage.sh will be invoked in the post installation stage? I.e. 391b52f3c33a ("rhel: Support RHEL 7.8 kernel module rpm build") Thanks, -Yi-Hung On Wed, Jun 17, 2020 at 8:43 AM William Tu wrote: > > Add support for RHEL7.8 GA release with kernel 3.10.0-1127. > > VMware-BZ: #2582834 > Signed-off-by: William Tu > --- > rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > index a9b5cdd817da..93d487101253 100644 > --- a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > +++ b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh > @@ -19,7 +19,8 @@ > # - 3.10.0 major revision 693 (RHEL 7.4) > # - 3.10.0 major revision 957 (RHEL 7.6) > # - 3.10.0 major revision 1062 (RHEL 7.7) > -# - 3.10.0 major revision 1101 (RHEL 7.8) > +# - 3.10.0 major revision 1101 (RHEL 7.8 Beta) > +# - 3.10.0 major revision 1127 (RHEL 7.8 GA) > # - 4.4.x, x >= 73 (SLES 12 SP3) > # - 4.12.x, x >= 14 (SLES 12 SP4). > # It is packaged in the openvswitch kmod RPM and run in the post-install > @@ -113,6 +114,12 @@ if [ "$mainline_major" = "3" ] && [ "$mainline_minor" = > "10" ]; then > ver_offset=4 > installed_ver="$minor_rev" > fi > +elif [ "$major_rev" = "1127" ]; then > +#echo "rhel78" > +comp_ver=10 > +ver_offset=4 > +installed_ver="$minor_rev" > +fi > elif [ "$mainline_major" = "4" ] && [ "$mainline_minor" = "4" ]; then > if [ "$mainline_patch" -ge "73" ]; then > #echo "sles12sp3" > -- > 2.7.4 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] docs: Add note for AF_XDP installation
Add notes about some configuration issues when enabling AF_XDP support. Signed-off-by: Yi-Hung Wei --- Documentation/intro/install/afxdp.rst | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst index 99003e4dbdb2..3c8f78825b41 100644 --- a/Documentation/intro/install/afxdp.rst +++ b/Documentation/intro/install/afxdp.rst @@ -146,11 +146,20 @@ Make sure the libbpf.so is installed correctly:: ldconfig ldconfig -p | grep libbpf +.. note:: + Check /etc/ld.so.conf if libbpf is installed but can not be found by + ldconfig. + Third, ensure the standard OVS requirements are installed and bootstrap/configure the package:: ./boot.sh && ./configure --enable-afxdp +.. note:: + If you encounter "WARNING: bpf/libbpf.h: present but cannot be compiled", + check the Linux headers are in line with libbpf. For example, in Ubuntu, + check the installed linux-headers* and linux-libc-dev* dpkg. + Finally, build and install OVS:: make && make install -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] metaflow: Fix maskable conntrack orig tuple fields
On Thu, May 14, 2020 at 8:37 AM William Tu wrote: > > On Wed, May 13, 2020 at 01:11:17PM -0700, Yi-Hung Wei wrote: > > From man ovs-fields(7), the conntrack origin tuple fields > > ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed > > to be bitwise maskable, but they are not. This patch enables > > those fields to be maskable, and adds a regression test. > > > > Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.") > > Reported-by: Wenying Dong > > Signed-off-by: Yi-Hung Wei > > --- > > Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/686707703 > > --- > > Thanks for fixing it and adding tests! Applied to master. > William Thanks William for review. Can we backport it to older branches (as far as we can cleanly apply) ? Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] metaflow: Fix maskable conntrack orig tuple fields
>From man ovs-fields(7), the conntrack origin tuple fields ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed to be bitwise maskable, but they are not. This patch enables those fields to be maskable, and adds a regression test. Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.") Reported-by: Wenying Dong Signed-off-by: Yi-Hung Wei --- Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/686707703 --- lib/meta-flow.c | 30 +-- tests/ofproto-dpif.at | 56 +++ 2 files changed, 80 insertions(+), 6 deletions(-) diff --git a/lib/meta-flow.c b/lib/meta-flow.c index 9ab82460bfc4..c808d205d5b4 100644 --- a/lib/meta-flow.c +++ b/lib/meta-flow.c @@ -2328,12 +2328,6 @@ mf_set(const struct mf_field *mf, switch (mf->id) { case MFF_CT_ZONE: case MFF_CT_NW_PROTO: -case MFF_CT_NW_SRC: -case MFF_CT_NW_DST: -case MFF_CT_IPV6_SRC: -case MFF_CT_IPV6_DST: -case MFF_CT_TP_SRC: -case MFF_CT_TP_DST: case MFF_RECIRC_ID: case MFF_PACKET_TYPE: case MFF_CONJ_ID: @@ -2457,6 +2451,30 @@ mf_set(const struct mf_field *mf, ntoh128(mask->be128)); break; +case MFF_CT_NW_SRC: +match_set_ct_nw_src_masked(match, value->be32, mask->be32); +break; + +case MFF_CT_NW_DST: +match_set_ct_nw_dst_masked(match, value->be32, mask->be32); +break; + +case MFF_CT_IPV6_SRC: +match_set_ct_ipv6_src_masked(match, &value->ipv6, &mask->ipv6); +break; + +case MFF_CT_IPV6_DST: +match_set_ct_ipv6_dst_masked(match, &value->ipv6, &mask->ipv6); +break; + +case MFF_CT_TP_SRC: +match_set_ct_tp_src_masked(match, value->be16, mask->be16); +break; + +case MFF_CT_TP_DST: +match_set_ct_tp_dst_masked(match, value->be16, mask->be16); +break; + case MFF_ETH_DST: match_set_dl_dst_masked(match, value->mac, mask->mac); break; diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index d444cf0844a9..41164d735d8a 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -10570,6 +10570,62 @@ udp,vlan_tci=0x,dl_src=50:54:00:00:00:0a,dl_dst=50:54:00:00:00:09,nw_src=10. OVS_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([ofproto-dpif - conntrack - match masked ct fields]) +OVS_VSWITCHD_START + +add_of_ports br0 1 2 + +AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg vconn:info ofproto_dpif:info]) + +dnl Allow new connections on p1->p2. Allow only established connections p2->p1 +AT_DATA([flows.txt], [dnl +table=0,arp,action=normal +table=0,ip,in_port=1,udp,nw_src=10.1.2.1/24,action=ct(commit) +table=0,ip,in_port=1,udp6,ipv6_dst=2001:db8::1/64,action=ct(commit) +table=0,ip,in_port=1,udp,tp_src=3/0x1,action=ct(commit) +table=0,ip,in_port=2,actions=ct(table=1) +table=0,ip6,in_port=2,actions=ct(table=1) +table=1,priority=10,udp,ct_state=+trk+rpl,ct_nw_src=10.1.2.1/24,actions=controller +table=1,priority=10,udp6,ct_state=+trk+rpl,ct_ipv6_dst=2001:db8::1/64,actions=controller +table=1,priority=10,udp,ct_state=+trk+rpl,ct_tp_src=3/0x1,actions=controller +table=1,priority=1,action=drop +]) + +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CAPTURE_FILE([ofctl_monitor.log]) +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P nxt_packet_in --detach --no-chdir --pidfile 2> ofctl_monitor.log]) + +dnl Match ct_nw_src=10.1.2.1/24 +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.2.100,dst=10.1.2.200,proto=17,tos=0,ttl=64,frag=no),udp(src=6,dst=6)']) +AT_CHECK([ovs-appctl netdev-dummy/receive p2 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.2.200,dst=10.1.2.100,proto=17,tos=0,ttl=64,frag=no),udp(src=6,dst=6)']) + +dnl Match ct_ipv6_dst=2001:db8::1/64 +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x86dd),ipv6(src=2001:db8::1,dst=2001:db8::2,label=0,proto=17,tclass=0x70,hlimit=128,frag=no),udp(src=1,dst=2)']) +AT_CHECK([ovs-appctl netdev-dummy/receive p2 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x86dd),ipv6(src=2001:db8::2,dst=2001:db8::1,label=0,proto=17,tclass=0x70,hlimit=128,frag=no),udp(src=2,dst=1)']) + +dnl Match ct_tp_src=3/0x1 +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=2)']) +AT_CHECK([ovs-appctl netdev-dummy/receive p2 'in_port(2),eth(src=50:54:00:00:00:0a,dst=50:54:00:00:00:09),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=2,dst=1)']) + +OVS_WAIT_UNTIL([t
Re: [ovs-dev] [PATCHv5] userspace: Add conntrack timeout policy support.
On Wed, Apr 29, 2020 at 12:25 PM William Tu wrote: > > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout > policy support") adds conntrack timeout policy for kernel datapath. > This patch enables support for the userspace datapath. I tested > using the 'make check-system-userspace' which checks the timeout > policies for ICMP and UDP cases. > > Signed-off-by: William Tu > --- > v5: address feedback from Yi-Hung > - couple improvement for error handling > - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/681160951 > - currently failed due to kernel issue > > v4: address feedback from Yi-Hung > - move default policy value to lib/conntrack-tp.c > - separate icmp bug patch > - refactor and fix include issues > - fix the clang lock analysis annotation > - keep clean interval to 5 seconds > - improve tests in system-traffic.at > - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645 > > v3: > - address feedback from Yi-Hung > - use ID 0 as default policy > - move conn_{init,update}_expiration to lib/conntrack-tp.c > - s/tpid/tp_id/ > - add default timeout value to CT_DPIF_TP_*_ATTRs > - reduce the CT_CLEAN_INTERVAL from 5 to 3s, to make the tests > run faster. > - add more tests to system-traffic.at > - code refactoring and renaming > --- Thanks for this new version. It looks good to me. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] compat: Fix ipv6_dst_lookup build error
The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree. This causes travis build failure: * https://travis-ci.org/github/openvswitch/ovs/builds/681084038 This patch updates the backport logic to invoke the right function. Related patch in git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup") Signed-off-by: Yi-Hung Wei --- Travis test: https://travis-ci.org/github/YiHungWei/ovs/builds/681179784 --- acinclude.m4 | 3 +++ datapath/linux/compat/geneve.c | 11 +++ datapath/linux/compat/vxlan.c | 14 -- 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 0e90c333211e..dabbffd01cf7 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -589,7 +589,10 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_dst_lookup.*net], [OVS_DEFINE([HAVE_IPV6_DST_LOOKUP_NET])]) + OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_dst_lookup_flow.*net], + [OVS_DEFINE([HAVE_IPV6_DST_LOOKUP_FLOW_NET])]) OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_stub]) + OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_dst_lookup_flow]) OVS_GREP_IFELSE([$KSRC/include/linux/err.h], [ERR_CAST]) OVS_GREP_IFELSE([$KSRC/include/linux/err.h], [IS_ERR_OR_NULL]) diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c index 1551a37217ec..7bfc6d8822e5 100644 --- a/datapath/linux/compat/geneve.c +++ b/datapath/linux/compat/geneve.c @@ -962,15 +962,18 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb, return dst; } -#ifdef HAVE_IPV6_DST_LOOKUP_NET +#if defined(HAVE_IPV6_DST_LOOKUP_FLOW_NET) + if (ipv6_stub->ipv6_dst_lookup_flow(geneve->net, gs6->sock->sk, &dst, +fl6)) { +#elif defined(HAVE_IPV6_DST_LOOKUP_FLOW) + if (ipv6_stub->ipv6_dst_lookup_flow(gs6->sock->sk, &dst, fl6)) { +#elif defined(HAVE_IPV6_DST_LOOKUP_NET) if (ipv6_stub->ipv6_dst_lookup(geneve->net, gs6->sock->sk, &dst, fl6)) { -#else -#ifdef HAVE_IPV6_STUB +#elif defined(HAVE_IPV6_STUB) if (ipv6_stub->ipv6_dst_lookup(gs6->sock->sk, &dst, fl6)) { #else if (ip6_dst_lookup(gs6->sock->sk, &dst, fl6)) { #endif -#endif netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr); return ERR_PTR(-ENETUNREACH); } diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c index f8f667e9748b..b334870b768e 100644 --- a/datapath/linux/compat/vxlan.c +++ b/datapath/linux/compat/vxlan.c @@ -990,18 +990,20 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan, fl6.fl6_dport = dport; fl6.fl6_sport = sport; -#ifdef HAVE_IPV6_DST_LOOKUP_NET - err = ipv6_stub->ipv6_dst_lookup(vxlan->net, -sock6->sock->sk, +#if defined(HAVE_IPV6_DST_LOOKUP_FLOW_NET) + err = ipv6_stub->ipv6_dst_lookup_flow(vxlan->net, sock6->sock->sk, + &ndst, &fl6); +#elif defined(HAVE_IPV6_DST_LOOKUP_FLOW) + err = ipv6_stub->ipv6_dst_lookup_flow(sock6->sock->sk, &ndst, &fl6); +#elif defined(HAVE_IPV6_DST_LOOKUP_NET) + err = ipv6_stub->ipv6_dst_lookup(vxlan->net, sock6->sock->sk, &ndst, &fl6); -#else -#ifdef HAVE_IPV6_STUB +#elif defined(HAVE_IPV6_STUB) err = ipv6_stub->ipv6_dst_lookup(vxlan->vn6_sock->sock->sk, &ndst, &fl6); #else err = ip6_dst_lookup(vxlan->vn6_sock->sock->sk, &ndst, &fl6); #endif -#endif if (err < 0) return ERR_PTR(err); -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovs-bugtool: Add ethtool -l for combined channel.
On Wed, Apr 29, 2020 at 10:46 AM William Tu wrote: > > Users of netdev-afxdp has to setup the combined channel > on physical NIC. This helps debugging related issues. > Example output: > $ ethtool -l enp3s0f0 > Channel parameters for enp3s0f0: > Pre-set maximums: > RX:0 > TX:0 > Other: 1 > Combined: 63 > Current hardware settings: > RX:0 > TX:0 > Other: 1 > Combined: 1 > > Some previous discussion: > https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366631.html > > Signed-off-by: William Tu Given that each NIC has different implementation on their rx, tx, combined queue as mentioned in https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366631.html I think it is a good idea to at log that in the bugtool. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] bugtool: Add dump-tlv-map.
On Wed, Apr 29, 2020 at 9:30 AM William Tu wrote: > > This helps debugging the tlv map issues. > > Signed-off-by: William Tu > --- > utilities/bugtool/plugins/network-status/openvswitch.xml | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml > b/utilities/bugtool/plugins/network-status/openvswitch.xml > index 72aa449302b8..e6fa4fd15fff 100644 > --- a/utilities/bugtool/plugins/network-status/openvswitch.xml > +++ b/utilities/bugtool/plugins/network-status/openvswitch.xml > @@ -39,6 +39,7 @@ > repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges > "dump-ports" > filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges > "dump-groups" > repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges > "dump-group-stats" > + repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges > "dump-tlv-map" > filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa > ip -s -s link > show > filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats > -- Thanks for this patch. It will be useful to debug geneve tlv issue. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv4 2/2] userspace: Add conntrack timeout policy support.
On Mon, Apr 27, 2020 at 8:42 AM William Tu wrote: > > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout > policy support") adds conntrack timeout policy for kernel datapath. > This patch enables support for the userspace datapath. I tested > using the 'make check-system-userspace' which checks the timeout > policies for ICMP and UDP cases. > > Signed-off-by: William Tu > --- > v4: address feedback from Yi-Hung > - move default policy value to lib/conntrack-tp.c > - separate icmp bug patch > - refactor and fix include issues > - fix the clang lock analysis annotation > - keep clean interval to 5 seconds > - improve tests in system-traffic.at > - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645 > --- Thanks for v4. I only have a few minor comments below. Thanks, -Yi-Hung > +++ b/lib/conntrack-tp.c > @@ -0,0 +1,301 @@ > +/* > + * Copyright (c) 2020 VMware, Inc. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#include > + > +#include "conntrack-private.h" > +#include "conntrack-tp.h" > +#include "ct-dpif.h" > +#include "openvswitch/vlog.h" > + > +VLOG_DEFINE_THIS_MODULE(conntrack_tp); > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > + > +static const char *ct_timeout_str[] = { > +#define CT_TIMEOUT(NAME) #NAME, > +CT_TIMEOUTS > +#undef CT_TIMEOUT > +}; > + > +/* Default timeout policy in seconds. */ > +static unsigned int ct_dpif_netdev_tp_def[] = { > +[CT_DPIF_TP_ATTR_TCP_SYN_SENT] = 30, > +[CT_DPIF_TP_ATTR_TCP_SYN_RECV] = 30, > +[CT_DPIF_TP_ATTR_TCP_ESTABLISHED] = 24 * 60 * 60, > +[CT_DPIF_TP_ATTR_TCP_FIN_WAIT] = 15 * 60, > +[CT_DPIF_TP_ATTR_TCP_TIME_WAIT] = 45, > +[CT_DPIF_TP_ATTR_TCP_CLOSE] = 30, > +[CT_DPIF_TP_ATTR_UDP_FIRST] = 60, > +[CT_DPIF_TP_ATTR_UDP_SINGLE] = 60, > +[CT_DPIF_TP_ATTR_UDP_MULTIPLE] = 30, > +[CT_DPIF_TP_ATTR_ICMP_FIRST] = 60, > +[CT_DPIF_TP_ATTR_ICMP_REPLY] = 30, > +}; > + > +static struct timeout_policy * > +timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) > +OVS_REQUIRES(ct->ct_lock) > +{ > +struct timeout_policy *tp; > +uint32_t hash; > + > +hash = hash_int(tp_id, ct->hash_basis); > +HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) { > +if (tp->policy.id == tp_id) { > +return tp; > +} > +} > +return NULL; > +} > + > +struct timeout_policy * > +timeout_policy_get(struct conntrack *ct, int32_t tp_id) > +{ > +struct timeout_policy *tp; > + > +ovs_mutex_lock(&ct->ct_lock); > +tp = timeout_policy_lookup(ct, tp_id); > +if (!tp) { > +ovs_mutex_unlock(&ct->ct_lock); > +return NULL; > +} > + > +ovs_mutex_unlock(&ct->ct_lock); > +return tp; > +} > + > +static void > +update_existing_tp(struct timeout_policy *tp_dst, > + const struct timeout_policy *tp_src) > +{ > +struct ct_dpif_timeout_policy *dst; > +const struct ct_dpif_timeout_policy *src; > +int i; > + > +dst = &tp_dst->policy; > +src = &tp_src->policy; > + > +/* Set the value and present bit to dst if present > + * bit in src is set. > + */ > +for (i = 0; i < ARRAY_SIZE(dst->attrs); i++) { > +if (src->present & (1 << i)) { > +dst->attrs[i] = src->attrs[i]; > +dst->present |= (1 << i); > +} > +} > +} > + > +static void > +init_default_tp(struct timeout_policy *tp, uint32_t tp_id) > +{ > +tp->policy.id = tp_id; > +/* Initialize the timeout value to default, but not > + * setting the present bit. > + */ > +tp->policy.present = 0; > +memcpy(tp->policy.attrs, ct_dpif_netdev_tp_def, > + sizeof tp->policy.attrs); > +} > + > +static void > +timeout_policy_create(struct conntrack *ct, > + struct timeout_policy *new_tp) > +OVS_REQUIRES(ct->ct_lock) > +{ > +uint32_t tp_id = new_tp->policy.id; > +struct timeout_policy *tp; > +uint32_t hash; > + > +tp = xzalloc(sizeof *tp); > +init_default_tp(tp, tp_id); > +update_existing_tp(tp, new_tp); > +hash = hash_int(tp_id, ct->hash_basis); > +hmap_insert(&ct->timeout_policies, &tp->node, hash); > +} > + > +static void > +timeout_policy_clean(struct conntrack *ct, struct timeout_policy *tp) > +OVS_REQUIRES(ct->ct_lock) > +{ > +hmap_remove(&ct->timeout_policies, &tp->node); > +free(tp); > +} > + > +static void
Re: [ovs-dev] [PATCHv4 1/2] conntrack: Fix icmp conntrack state.
On Mon, Apr 27, 2020 at 8:42 AM William Tu wrote: > > ICMP conntrack state should be ICMPS_REPLY after seeing both > side of ICMP traffic. > > Signed-off-by: William Tu > --- > lib/conntrack-icmp.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c > index 63246f0124d0..6cbf9656dd93 100644 > --- a/lib/conntrack-icmp.c > +++ b/lib/conntrack-icmp.c > @@ -50,9 +50,12 @@ icmp_conn_update(struct conntrack *ct, struct conn *conn_, > struct dp_packet *pkt OVS_UNUSED, bool reply, long long now) > { > struct conn_icmp *conn = conn_icmp_cast(conn_); > -conn->state = reply ? ICMPS_REPLY : ICMPS_FIRST; > -conn_update_expiration(ct, &conn->up, icmp_timeouts[conn->state], now); > > +if (reply && conn->state == ICMPS_FIRST) { > + conn->state = ICMPS_REPLY; > +} > + > +conn_update_expiration(ct, &conn->up, icmp_timeouts[conn->state], now); > return CT_UPDATE_VALID; > } > > -- Thanks for the patch. Looks good to me. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3] userspace: Add conntrack timeout policy support.
On Sun, Apr 19, 2020 at 9:05 AM William Tu wrote: > > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout > policy support") adds conntrack timeout policy for kernel datapath. > This patch enables support for the userspace datapath. I tested > using the 'make check-system-userspace' which checks the timeout > policies for ICMP and UDP cases. > > Signed-off-by: William Tu > --- > v3: > - address feedback from Yi-Hung > - use ID 0 as default policy > - move conn_{init,update}_expiration to lib/conntrack-tp.c > - s/tpid/tp_id/ > - add default timeout value to CT_DPIF_TP_*_ATTRs > - reduce the CT_CLEAN_INTERVAL from 5 to 3s, to make the tests > run faster. > - add more tests to system-traffic.at > - code refactoring and renaming > - travis: > https://travis-ci.org/github/williamtu/ovs-travis/builds/676683451 > --- Thanks for the updated patch. > diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c > index 63246f0124d0..5c6ef37d2eb4 100644 > --- a/lib/conntrack-icmp.c > +++ b/lib/conntrack-icmp.c > @@ -22,6 +22,7 @@ > #include > > #include "conntrack-private.h" > +#include "conntrack-tp.h" > #include "dp-packet.h" > > enum OVS_PACKED_ENUM icmp_state { > @@ -50,9 +51,12 @@ icmp_conn_update(struct conntrack *ct, struct conn *conn_, > struct dp_packet *pkt OVS_UNUSED, bool reply, long long now) > { > struct conn_icmp *conn = conn_icmp_cast(conn_); > -conn->state = reply ? ICMPS_REPLY : ICMPS_FIRST; > -conn_update_expiration(ct, &conn->up, icmp_timeouts[conn->state], now); > > +if (reply && conn->state == ICMPS_FIRST) { > +conn->state = ICMPS_REPLY; > +} This code snippet looks like a reasonable fix on the ICMP state handling, should we pull it out as a separate patch? Previously, we will change the ICMP state from ICMP_REPLY back to ICMP_FIRST for every echo request packet after an echo reply packet. With this patch, the ICMP connection will stay in ICMP_REPLY state after the first echo reply packet. > diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c > new file mode 100644 > index ..5e59f56eb949 > --- /dev/null > +++ b/lib/conntrack-tp.c > @@ -0,0 +1,319 @@ > +#include > + > +#include "conntrack-private.h" > +#include "conntrack-tp.h" > +#include "ct-dpif.h" > +#include "dp-packet.h" Is "dp-packet.h" required? > +#include "openvswitch/vlog.h" May be put this line with the other #include? > +VLOG_DEFINE_THIS_MODULE(conntrack_tp); > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > + > +#define DEFAULT_TP_ID 0 DEFAULT_TP_ID is also defined in ct-dpif.h, we should keep it in one place. > +static const char *ct_timeout_str[] = { > +#define CT_TIMEOUT(NAME, VALUE) #NAME, > +CT_TIMEOUTS > +#undef CT_TIMEOUT > +}; > + > +static const char *dpif_ct_timeout_str[] = { > +#define xstr(s) #s > +#define CT_DPIF_TP_TCP_ATTR(NAME, VAL) xstr(TCP_##NAME), > +CT_DPIF_TP_TCP_ATTRS > +#undef CT_DPIF_TP_TCP_ATTR > +#define CT_DPIF_TP_UDP_ATTR(NAME, VAL) xstr(UDP_##NAME), > +CT_DPIF_TP_UDP_ATTRS > +#undef CT_DPIF_TP_UDP_ATTR > +#define CT_DPIF_TP_ICMP_ATTR(NAME, VAL) xstr(ICMP_##NAME), > +CT_DPIF_TP_ICMP_ATTRS > +#undef CT_DPIF_TP_ICMP_ATTR > +#undef xstr > +}; > + > +static unsigned int ct_dpif_timeout_value_def[] = { > +#define CT_DPIF_TP_TCP_ATTR(NAME, VAL) [CT_DPIF_TP_ATTR_TCP_##NAME] = VAL, > +CT_DPIF_TP_TCP_ATTRS > +#undef CT_DPIF_TP_TCP_ATTR > +#define CT_DPIF_TP_UDP_ATTR(NAME, VAL) [CT_DPIF_TP_ATTR_UDP_##NAME] = VAL, > +CT_DPIF_TP_UDP_ATTRS > +#undef CT_DPIF_TP_UDP_ATTR > +#define CT_DPIF_TP_ICMP_ATTR(NAME, VAL) [CT_DPIF_TP_ATTR_ICMP_##NAME] = VAL, > +CT_DPIF_TP_ICMP_ATTRS > +#undef CT_DPIF_TP_ICMP_ATTR > +}; We now have two places that define the default timeout of the L4 protocols, one in ct-dpif.h (CT_DPIF_TP_TCP_ATTRS, CT_DPIF_TP_UDP_ATTRS, CT_DPIF_ICMP_ATTRS), and one in conntrack-private.h (CT_TIMEOUTS). Since ct-dpif.h is a common conntrack dpif interface for both the kernel and the userspace datapath, and the default value is different in this two datapaths, should we put the userspace default timeout setting in lib/conntrack-tp.h, and clean up CT_TIMEOUTS in lib/conntrack-private.h ? Maybe rename ct_dpif_timeout_value_def to ct_dpif_netdev* ? > + > +static void OVS_UNUSED > +timeout_policy_dump(struct timeout_policy *tp) > +{ > +bool present; > +int i; > + > +VLOG_DBG("Timeout Policy ID %u:", tp->policy.id); > +for (i = 0; i < ARRAY_SIZE(tp->policy.attrs); i++) { > +if (tp->policy.present & (1 << i)) { > +present = !!(tp->policy.present & (1 << i)); > +VLOG_DBG(" Policy: %s present: %u value: %u", > + dpif_ct_timeout_str[i], present, tp->policy.attrs[i]); > +} > +} > +} Is this static function needed? It does not seem to be called within conntrack-tp.c. If it is not needed, we should remove dpif_ct_timeout_str as well. > + > +static struc
[ovs-dev] [PATCH] ovsdb: Remove duplicated function defintions
ovsdb_function_from_string() and ovsdb_function_to_string() are defined both in ovsdb/condition.c and lib/ovsdb-condidtion.c with the same function definition. Remove the one in ovsdb/condition.c to avoid duplication. This also resolves the following bazel building error. ./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string': /lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_from_string' ./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:34: first defined here ./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string': ./lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_to_string' ./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:335 Reported-by: Harold Lim Signed-off-by: Yi-Hung Wei --- Travis-CI: https://travis-ci.org/github/YiHungWei/ovs/builds/677890120 --- ovsdb/condition.c | 27 --- 1 file changed, 27 deletions(-) diff --git a/ovsdb/condition.c b/ovsdb/condition.c index 692c0932864c..388dd54a16cf 100644 --- a/ovsdb/condition.c +++ b/ovsdb/condition.c @@ -29,33 +29,6 @@ #include "table.h" #include "util.h" -struct ovsdb_error * -ovsdb_function_from_string(const char *name, enum ovsdb_function *function) -{ -#define OVSDB_FUNCTION(ENUM, NAME) \ -if (!strcmp(name, NAME)) { \ -*function = ENUM; \ -return NULL;\ -} -OVSDB_FUNCTIONS; -#undef OVSDB_FUNCTION - -return ovsdb_syntax_error(NULL, "unknown function", - "No function named %s.", name); -} - -const char * -ovsdb_function_to_string(enum ovsdb_function function) -{ -switch (function) { -#define OVSDB_FUNCTION(ENUM, NAME) case ENUM: return NAME; -OVSDB_FUNCTIONS; -#undef OVSDB_FUNCTION -} - -return NULL; -} - static struct ovsdb_error * ovsdb_clause_from_json(const struct ovsdb_table_schema *ts, const struct json *json, -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2] conntrack: Add coverage count for l4csum error.
On Thu, Apr 16, 2020 at 12:55 PM William Tu wrote: > > Add a coverage counter when userspace conntrack receives a packet > with invalid l4 checksum. When using veth for testing, users > often forget to turn off the tx offload on the other side of the > namespace, causing l4 checksum not calculated in packet header, > and at conntrack, return invalid conntrack state. > > Suggested-by: Yi-Hung Wei > Signed-off-by: William Tu > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: Add coverage count for l4csum error.
On Thu, Apr 16, 2020 at 7:17 AM William Tu wrote: > > Add a coverage counter when userspace conntrack receives a packet > with invalid l4 checksum. When using veth for testing, users > often forget to turn off the tx offload on the other side of the > namespace, causing l4 checksum not calculated in packet header, > and at conntrack, return invalid conntrack state. > > Suggested-by: Yi-Hung Wei > Signed-off-by: William Tu > --- > lib/conntrack.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index 0cbc8f6d2b25..98a62ce3bae6 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -44,6 +44,7 @@ VLOG_DEFINE_THIS_MODULE(conntrack); > > COVERAGE_DEFINE(conntrack_full); > COVERAGE_DEFINE(conntrack_long_cleanup); > +COVERAGE_DEFINE(conntrack_l4csum_err); > > struct conn_lookup_ctx { > struct conn_key key; > @@ -1661,6 +1662,7 @@ checksum_valid(const struct conn_key *key, const void > *data, size_t size, > } else if (key->dl_type == htons(ETH_TYPE_IPV6)) { > return packet_csum_upperlayer6(l3, data, key->nw_proto, size) == 0; > } else { > +COVERAGE_INC(conntrack_l4csum_err); > return false; > } > } > -- Hi William, I think this patch does count the l4 checksum error on tcp, udp and icmpv6. But it does not cover the case of icmp v4. Should we include the checksum error on icmp v4 as well? Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] ofp-actions: Add delete field action
This patch adds a new OpenFlow action, delete field, to delete a field in packets. Currently, only the tun_metadata fields are supported. One use case to add this action is to support multiple versions of geneve tunnel metadatas to be exchanged among different versions of networks. For example, we may introduce tun_metadata2 to replace old tun_metadata1, but still want to provide backward compatibility to the older release. In this case, in the new OpenFlow pipeline, we would like to support the case to receive a packet with tun_metadata1, do some processing. And if the packet is going to a switch in the newer release, we would like to delete the value in tun_metadata1 and set a value into tun_metadata2. Currently, ovs does not provide an action to remove a value in tun_metadata if the value is present. This patch fulfills the gap by adding the delete_field action. For example, the OpenFlow syntax to delete tun_metadata1 is: actions=delete_field:tun_metadata1 Signed-off-by: Yi-Hung Wei --- Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/674955333 --- NEWS | 1 + include/openvswitch/ofp-actions.h | 13 +- lib/nx-match.c| 20 - lib/nx-match.h| 4 +- lib/ofp-actions.c | 90 ++- lib/ovs-actions.xml | 16 +++ lib/tun-metadata.c| 17 lib/tun-metadata.h| 1 + ofproto/ofproto-dpif-xlate.c | 24 ++- tests/ofp-actions.at | 3 ++ tests/tunnel.at | 37 11 files changed, 221 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index 70bd17584594..58fbcedcb9bf 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,7 @@ Post-v2.13.0 * The OpenFlow ofp_desc/serial_num may now be configured by setting the value of other-config:dp-sn in the Bridge table. * Added support to watch CONTROLLER port status in fast failover group. + * New action "delete_field". - DPDK: * Deprecated DPDK pdump packet capture support removed. * Deprecated DPDK ring ports (dpdkr) are no longer supported. diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h index c8948e0d694f..226e86d0b0a5 100644 --- a/include/openvswitch/ofp-actions.h +++ b/include/openvswitch/ofp-actions.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2012, 2013, 2014, 2015, 2016, 2017, 2019 Nicira, Inc. + * Copyright (c) 2012-2017, 2019-2020 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -94,6 +94,7 @@ struct vl_mff_map; OFPACT(PUSH_MPLS, ofpact_push_mpls, ofpact, "push_mpls")\ OFPACT(POP_MPLS,ofpact_pop_mpls,ofpact, "pop_mpls") \ OFPACT(DEC_NSH_TTL, ofpact_null,ofpact, "dec_nsh_ttl") \ +OFPACT(DELETE_FIELD,ofpact_delete_field, ofpact, "delete_field") \ \ /* Generic encap & decap */ \ OFPACT(ENCAP, ofpact_encap, props, "encap") \ @@ -576,6 +577,16 @@ struct ofpact_pop_mpls { ); }; +/* OFPACT_DELETE_FIELD. + * + * Used for NXAST_DELETE_FIELD. */ +struct ofpact_delete_field { +OFPACT_PADDED_MEMBERS( +struct ofpact ofpact; +const struct mf_field *field; +); +}; + /* OFPACT_SET_TUNNEL. * * Used for NXAST_SET_TUNNEL, NXAST_SET_TUNNEL64. */ diff --git a/lib/nx-match.c b/lib/nx-match.c index 058816c7b88f..3ffd7d9d7711 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc. + * Copyright (c) 2010-2017, 2020 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -1994,6 +1994,24 @@ nxm_execute_stack_pop(const struct ofpact_stack *pop, } } +/* Parses a field from '*s' into '*field'. If successful, stores the + * reference to the field in '*field', and returns NULL. On failure, + * returns a malloc()'ed error message. + */ +char * OVS_WARN_UNUSED_RESULT +mf_parse_field(const struct mf_field **field, const char *s) +{ +const struct nxm_field *f; +int s_len = strlen(s); + +f = nxm_field_by_name(s, s_len); +(*field) = f ? mf_from_id(f->id) : mf_from_name_len(s, s_len); +if (!*field) { +return xasprintf("unknown field `%s'", s); +} +return NULL; +} + /* Formats 'sf' into 's' in a format normally acceptable to * mf_parse_subfield(). (It won't be acceptable if sf->field is NULL or if * sf-&g
Re: [ovs-dev] [PATCH] ofp-actions: Add delete field action
On Sat, Apr 11, 2020 at 8:10 AM William Tu wrote: > > Hi Yi-Hung, > > I read through the patch and have some questions and comments. > > On Fri, Apr 10, 2020 at 01:05:24PM -0700, Yi-Hung Wei wrote: > > This patch adds a new OpenFlow action, delete field, to delete a > > field in packets. Currently, only the tun_metadata fields are > > supported. > > > > One use case to add this action is to support multiple versions > > of geneve tunnel metadatas to be exchanged among different versions > > of networks. For example, we may introduce tun_metadata2 to > > replace old tun_metadata1, but still want to provide backward > > compatibility to the older release. In this case, in the new > > OpenFlow pipeline, we would like to support the case to receive a > > packet with tun_metadata1, do some processing. And if the packet > > is going to a switch in the newer release, we would like to delete > > the value in tun_metadata1 and set a value into tun_metadata2. > > > > Currently, ovs does not provide an action to remove a value in > > tun_metadata if the value is present. This patch fulfills the gap > > by adding the delete_field action. The OpenFlow syntax is: > > > > actions: delete_field:tun_metadata1 > actions=delete_field:tun_metadata Thanks for William's review. I will address your comment and send out v2. > > --- a/lib/ofp-actions.c > > +++ b/lib/ofp-actions.c > > @@ -4140,6 +4144,87 @@ check_SET_TUNNEL(const struct ofpact_tunnel *a > > OVS_UNUSED, > > return 0; > > } > > > > +/* Delete field action. */ > > + > > +/* Action structure for DELETE_FIELD */ > > +struct nx_action_delete_field { > > +ovs_be16 type; /* OFPAT_VENDOR */ > > +ovs_be16 len; /* Length is 24. */ > > +ovs_be32 vendor;/* NX_VENDOR_ID. */ > > +ovs_be16 subtype; /* NXAST_DELETE_FIELD. */ > > +/* Followed by: > > + * - OXM/NXM header for field to delete (4 or 8 bytes). > > + * - Enough 0-bytes to pad out the action to 24 bytes. */ > > +uint8_t pad[14]; > > +}; > > +OFP_ASSERT(sizeof(struct nx_action_delete_field ) == 24); > > + > > +static enum ofperr > > +decode_NXAST_RAW_DELETE_FIELD(const struct nx_action_delete_field *nadf, > > + enum ofp_version ofp_version OVS_UNUSED, > > + const struct vl_mff_map *vl_mff_map, > > + uint64_t *tlv_bitmap, struct ofpbuf *out) > > +{ > > +struct ofpact_delete_field *delete_field; > > +enum ofperr err; > > + > > +delete_field = ofpact_put_DELETE_FIELD(out); > > +delete_field->ofpact.raw = NXAST_RAW_DELETE_FIELD; > > + > > +struct ofpbuf b = ofpbuf_const_initializer(nadf, ntohs(nadf->len)); > > +ofpbuf_pull(&b, OBJECT_OFFSETOF(nadf, pad)); > > + > > +err = mf_vl_mff_nx_pull_header(&b, vl_mff_map, &delete_field->field, > > + NULL, tlv_bitmap); > > +if (err) { > > +return err; > > +} > > + > > +return 0; > > +} > > + > > +static void > > +encode_DELETE_FIELD(const struct ofpact_delete_field *delete_field, > > + enum ofp_version ofp_version OVS_UNUSED, > > + struct ofpbuf *out) > alignment, missing one space Yes, I will fix the alignment issue in v2. > > +{ > > +struct nx_action_delete_field *nadf = put_NXAST_DELETE_FIELD(out); > > +size_t size = out->size; > > + > > +out->size = size - sizeof nadf->pad; > > +nx_put_mff_header(out, delete_field->field, 0, false); > > +out->size = size; > > question: why setting the size back? > There is 14 bytes of pad in struct nx_action_delete_field, and put_NXAST_DELETE_FIELD() would allocate the whole structure with padding in the buffer. Therefore, we move the data pointer in the buffer back to put the OXM/NXM header in the correct position. > > diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml > > index ab8e08b84d8b..5acf6fe64fc5 100644 > > --- a/lib/ovs-actions.xml > > +++ b/lib/ovs-actions.xml > > @@ -1552,6 +1552,22 @@ for i in [1,n_slaves]: > > This action was added in Open vSwitch 2.11.90. > > > > > > + > > + > > + The delete_fieldaction > missing a space before 'action'? Thanks, I will add the missing space back. > > --- a/lib/tun-metadata.c > > +++ b/lib/tun-metadata.c > > @@ -261,6 +261,23 @@ tun_
Re: [ovs-dev] [PATCHv2] userspace: Add conntrack timeout policy support.
On Tue, Apr 7, 2020 at 2:08 PM William Tu wrote: > > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout > policy support") adds conntrack timeout policy for kernel datapath. > This patch enables support for the userspace datapath. I tested > using the 'make check-system-userspace' which checks the timeout > policies for ICMP and UDP cases. > > Signed-off-by: William Tu Thanks William for implementing this feature in the userspace. I have some comments as below. > diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst > index b3507bd1c7fa..ffcda37f9985 100644 > --- a/Documentation/faq/releases.rst > +++ b/Documentation/faq/releases.rst > @@ -118,7 +118,7 @@ Q: Are all features available with all datapaths? > == == == = > === > Connection tracking 4.32.5 2.6 YES > Conntrack Fragment Reass. 4.32.6 2.12 YES > -Conntrack Timeout Policies 5.22.12 NO NO > +Conntrack Timeout Policies 5.22.12 2.13 NO For the support of userspace timeout policies in userspace datapath, should it be 2.14? > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h > index 9a8ca3910157..852482303684 100644 > --- a/lib/conntrack-private.h > +++ b/lib/conntrack-private.h > @@ -118,6 +118,8 @@ struct conn { > /* Immutable data. */ > bool alg_related; /* True if alg data connection. */ > enum ct_conn_type conn_type; > + > +uint32_t tpid; /* Timeout policy ID. */ I have a minor suggestion here. I would recommend to replace tpid with tp_id so that when we grep tp_it it would not be confused with vlan's tpid (tunnel protocol id) field. > @@ -143,9 +145,9 @@ enum ct_update_res { > CT_TIMEOUT(ICMP_FIRST, 60 * 1000) \ > CT_TIMEOUT(ICMP_REPLY, 30 * 1000) > > -/* The smallest of the above values: it is used as an upper bound for the > - * interval between two rounds of cleanup of expired entries */ > -#define CT_TM_MIN (30 * 1000) > +/* This is used as an upper bound for the interval between two > + * rounds of cleanup of expired entries */ > +#define CT_TM_MIN (1 * 1000) I understand that while this patch enables users to customize the timeout policy, users can set the minimum timeout configuration to 1 second. Thus, updates CT_TM_MIN to 1 second would make the conntrack clean thread to check timer expiration every single second to meet that requirement. However, this may increase the system load even without any customized timeout policy. I would recommend to update CT_TM_MIN dynamically to the minimum customized timeout value on the fly. > diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c > new file mode 100644 > index ..6be4b8276340 > --- /dev/null > +++ b/lib/conntrack-tp.c > @@ -0,0 +1,247 @@ > +/* > + * Copyright (c) 2020 VMware, Inc. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#include > + > +#include "conntrack-private.h" > +#include "conntrack-tp.h" > +#include "dp-packet.h" > + > +#include "openvswitch/vlog.h" > +VLOG_DEFINE_THIS_MODULE(conntrack_tp); > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > + > +static const char *ct_timeout_str[] = { > +#define CT_TIMEOUT(NAME, VALUE) #NAME > +CT_TIMEOUTS > +#undef CT_TIMEOUT > +}; > + > +static inline bool > +check_present_and_set(struct timeout_policy *tp, uint32_t *v, > + enum ct_dpif_tp_attr attr) > +{ > +if (tp && (tp->p.present & (1 << attr))) { > +*v = tp->p.attrs[attr]; > +return true; > +} > +return false; > +} > + > +#define TP_HAS_FUNC(ATTR) \ > +static bool \ > +tp_has_##ATTR(struct timeout_policy *tp, uint32_t *v) \ > +{ \ > +return check_present_and_set(tp, v, CT_DPIF_TP_ATTR_##ATTR); \ > +} > + > +/* These functions check whether the timeout value is set from the > + * present bit. If true, then set the value to '*v'. For meaning > + * of each value, see CT_Timeout_Policy table at ovs-vswitchd.conf.db(5). > + */ > +TP_HAS_FUNC(TCP_SYN_SENT) > +TP_HAS_FUNC(TCP_SYN_RECV) > +TP_HAS_FUNC(TCP_ESTABLISHED) > +TP_HAS_FUNC(TCP_FIN_WAIT) > +TP_HAS_FUNC(TCP_TIME_WAIT) > +TP_HAS_FUNC(TCP_CLOSE) > +TP_HAS_FUNC(UDP_FIRST) > +TP_HAS_FUNC(UDP_SINGLE) > +TP_HAS_FUNC(UDP
[ovs-dev] [PATCH] ofp-actions: Add delete field action
This patch adds a new OpenFlow action, delete field, to delete a field in packets. Currently, only the tun_metadata fields are supported. One use case to add this action is to support multiple versions of geneve tunnel metadatas to be exchanged among different versions of networks. For example, we may introduce tun_metadata2 to replace old tun_metadata1, but still want to provide backward compatibility to the older release. In this case, in the new OpenFlow pipeline, we would like to support the case to receive a packet with tun_metadata1, do some processing. And if the packet is going to a switch in the newer release, we would like to delete the value in tun_metadata1 and set a value into tun_metadata2. Currently, ovs does not provide an action to remove a value in tun_metadata if the value is present. This patch fulfills the gap by adding the delete_field action. The OpenFlow syntax is: actions: delete_field:tun_metadata1 Signed-off-by: Yi-Hung Wei --- Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/673081110 --- NEWS | 1 + include/openvswitch/ofp-actions.h | 13 +- lib/nx-match.c| 20 - lib/nx-match.h| 4 +- lib/ofp-actions.c | 90 ++- lib/ovs-actions.xml | 16 +++ lib/tun-metadata.c| 17 lib/tun-metadata.h| 1 + ofproto/ofproto-dpif-xlate.c | 24 ++- tests/ofp-actions.at | 3 ++ tests/tunnel.at | 30 + 11 files changed, 214 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index 70bd17584594..58fbcedcb9bf 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,7 @@ Post-v2.13.0 * The OpenFlow ofp_desc/serial_num may now be configured by setting the value of other-config:dp-sn in the Bridge table. * Added support to watch CONTROLLER port status in fast failover group. + * New action "delete_field". - DPDK: * Deprecated DPDK pdump packet capture support removed. * Deprecated DPDK ring ports (dpdkr) are no longer supported. diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h index c8948e0d694f..226e86d0b0a5 100644 --- a/include/openvswitch/ofp-actions.h +++ b/include/openvswitch/ofp-actions.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2012, 2013, 2014, 2015, 2016, 2017, 2019 Nicira, Inc. + * Copyright (c) 2012-2017, 2019-2020 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -94,6 +94,7 @@ struct vl_mff_map; OFPACT(PUSH_MPLS, ofpact_push_mpls, ofpact, "push_mpls")\ OFPACT(POP_MPLS,ofpact_pop_mpls,ofpact, "pop_mpls") \ OFPACT(DEC_NSH_TTL, ofpact_null,ofpact, "dec_nsh_ttl") \ +OFPACT(DELETE_FIELD,ofpact_delete_field, ofpact, "delete_field") \ \ /* Generic encap & decap */ \ OFPACT(ENCAP, ofpact_encap, props, "encap") \ @@ -576,6 +577,16 @@ struct ofpact_pop_mpls { ); }; +/* OFPACT_DELETE_FIELD. + * + * Used for NXAST_DELETE_FIELD. */ +struct ofpact_delete_field { +OFPACT_PADDED_MEMBERS( +struct ofpact ofpact; +const struct mf_field *field; +); +}; + /* OFPACT_SET_TUNNEL. * * Used for NXAST_SET_TUNNEL, NXAST_SET_TUNNEL64. */ diff --git a/lib/nx-match.c b/lib/nx-match.c index 058816c7b88f..3ffd7d9d7711 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc. + * Copyright (c) 2010-2017, 2020 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -1994,6 +1994,24 @@ nxm_execute_stack_pop(const struct ofpact_stack *pop, } } +/* Parses a field from '*s' into '*field'. If successful, stores the + * reference to the field in '*field', and returns NULL. On failure, + * returns a malloc()'ed error message. + */ +char * OVS_WARN_UNUSED_RESULT +mf_parse_field(const struct mf_field **field, const char *s) +{ +const struct nxm_field *f; +int s_len = strlen(s); + +f = nxm_field_by_name(s, s_len); +(*field) = f ? mf_from_id(f->id) : mf_from_name_len(s, s_len); +if (!*field) { +return xasprintf("unknown field `%s'", s); +} +return NULL; +} + /* Formats 'sf' into 's' in a format normally acceptable to * mf_parse_subfield(). (It won't be acceptable if sf->field is NULL or if * sf->field has no NXM name.) */ diff --git a/lib/n
Re: [ovs-dev] [PATCH v2 1/2] tun_metadata: Fix coredump caused by use-after-free bug
On Tue, Apr 7, 2020 at 2:50 PM Yifeng Sun wrote: > > Tun_metadata can be referened by flow and frozen_state at the same > time. When ovs-vswitchd handles TLV table mod message, the involved > tun_metadata gets freed. The call trace to free tun_metadata is > shown as below: > > ofproto_run > - handle_openflow > - handle_single_part_openflow > - handle_tlv_table_mod > - tun_metadata_table_mod > - tun_metadata_postpone_free > > Unfortunately, this tun_metadata can be still used by some frozen_state, > and later on when frozen_state tries to access its tun_metadata table, > ovs-vswitchd crashes. The call trace to access tun_metadata from > frozen_state is shown as below: > > udpif_upcall_handler > - recv_upcalls > - process_upcall > - frozen_metadata_to_flow > > It is unsafe for frozen_state to reference tun_table because tun_table > is protected by RCU while the lifecycle of frozen_state can span several > RCU quiesce states. Current code violates OVS's RCU protection mechanism. > > This patch fixes it by simply stopping frozen_state from referencing > tun_table. If frozen_state needs tun_table, we can find the latest valid > tun_table through ofproto_get_tun_tab() efficiently. > > A previous commit seems fixing the samiliar issue: > 254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by > tun_table) > > VMware-BZ: #2526222 > Signed-off-by: Yifeng Sun > --- > v1->v2: Drop the fix based on reference count. It doesn't fit well with RCU > mechanism. Thanks William and YiHung for the offline discussion. > > ofproto/ofproto-dpif-rid.h| 7 +++ > ofproto/ofproto-dpif-upcall.c | 2 ++ > 2 files changed, 9 insertions(+) > > diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h > index e5d02caf28a3..5235764a9885 100644 > --- a/ofproto/ofproto-dpif-rid.h > +++ b/ofproto/ofproto-dpif-rid.h > @@ -115,6 +115,13 @@ frozen_metadata_from_flow(struct frozen_metadata *md, > { > memset(md, 0, sizeof *md); > md->tunnel = flow->tunnel; > +/* It is unsafe for frozen_state to reference tun_table because > + * tun_table is protected by RCU while the lifecycle of frozen_state > + * can span several RCU quiesce states. > + * > + * The latest valid tun_table can be found by ofproto_get_tun_tab() > + * efficiently. */ > +md->tunnel.metadata.tab = NULL; > md->metadata = flow->metadata; > memcpy(md->regs, flow->regs, sizeof md->regs); > md->in_port = flow->in_port.ofp_port; > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 8dfa05b71df4..949cd4dbaf6f 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -1535,6 +1535,8 @@ process_upcall(struct udpif *udpif, struct upcall > *upcall, > } > > frozen_metadata_to_flow(&state->metadata, &frozen_flow); > +frozen_flow.tunnel.metadata.tab = ofproto_get_tun_tab( > +&upcall->ofproto->up); Thanks for the fix. I wonder if it makes sense to move ofproto_get_tun_tab() into frozen_metadata_to_flow()? Therefore, we do not need to call ofproto_get_tun_tab() to reset the tun_table for other frozen state use case. Thanks, -Yi-Hung > flow_get_metadata(&frozen_flow, &am->pin.up.base.flow_metadata); > > ofproto_dpif_send_async_msg(upcall->ofproto, am); > -- ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 2/2] Documentation: Add extra repo info for RHEL 8
On Tue, Mar 24, 2020 at 8:42 AM Greg Rose wrote: > > The extra development repo for RHEL 8 has changed. Document it. > > Signed-off-by: Greg Rose > > --- > V2 - Break long line > --- > Documentation/intro/install/fedora.rst | 4 > 1 file changed, 4 insertions(+) LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 1/2] compat: Fix nf_ip_hook parameters for RHEL 8
On Tue, Mar 24, 2020 at 8:42 AM Greg Rose wrote: > > From: Greg Rose > > A RHEL release version check was only checking for RHEL releases > greater than 7.0 so that ended up including a compat fixup that > is not needed for 8.0. Fix up the version check. > > Signed-off-by: Greg Rose > --- LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 9/9] datapath: Update kernel test list
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > We are adding support for Linux kernels up to 5.5 so update the > Travis test list, NEWS and FAQ. > > Signed-off-by: Greg Rose > > --- > V3 - Squash separate commits for Travis, NEWS and FAQ > --- > .travis.yml| 2 +- > Documentation/faq/releases.rst | 1 + > NEWS | 2 ++ > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/.travis.yml b/.travis.yml > index abd2a91..ef9f867 100644 > --- a/.travis.yml > +++ b/.travis.yml > @@ -38,7 +38,7 @@ env: >- TESTSUITE=1 OPTS="--enable-shared" >- TESTSUITE=1 DPDK=1 >- TESTSUITE=1 LIBS=-ljemalloc > - - KERNEL_LIST="5.0 4.20 4.19 4.18 4.17 4.16" > + - KERNEL_LIST="5.5 4.20 4.19 4.18 4.17 4.16" Hi Greg, Any reason we want to skip build test with 5.0 kernel? >- KERNEL_LIST="4.15 4.14 4.9 4.4 3.19 3.16" >- AFXDP=1 KERNEL=5.3 >- M32=1 OPTS="--disable-ssl" > diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst > index 6ff47d7..a073b97 100644 > --- a/Documentation/faq/releases.rst > +++ b/Documentation/faq/releases.rst > @@ -70,6 +70,7 @@ Q: What Linux kernel versions does each Open vSwitch > release work with? > 2.10.x 3.10 to 4.17 > 2.11.x 3.10 to 4.18 > 2.12.x 3.10 to 5.0 > +2.13.x 3.10 to 5.5 Since this patch series is for the master branch, and we general do not backport new features back to older branch. Maybe we should update as the following? +2.13.x 3.10 to 5.0 +2.14.x 3.10 to 5.5 Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 8/9] datapath: conntrack: mark expected switch fall-through
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > From: "Gustavo A. R. Silva" > > Upstream commit: > commit 279badc2a85be83e0187b8c566e3b476b76a87a2 > Author: Gustavo A. R. Silva > Date: Thu Oct 19 12:55:03 2017 -0500 > > openvswitch: conntrack: mark expected switch fall-through > > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. > > Notice that in this particular case I placed a "fall through" comment on > its own line, which is what GCC is expecting to find. > > Signed-off-by: Gustavo A. R. Silva > Signed-off-by: David S. Miller > > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 7/9] compat: Use nla_parse deprecated functions
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > From: Johannes Berg > > Upstream commit: > commit 8cb081746c031fb164089322e2336a0bf5b3070c > Author: Johannes Berg > Date: Fri Apr 26 14:07:28 2019 +0200 > > netlink: make validation more configurable for future strictness > > We currently have two levels of strict validation: > > 1) liberal (default) > - undefined (type >= max) & NLA_UNSPEC attributes accepted > - attribute length >= expected accepted > - garbage at end of message accepted > 2) strict (opt-in) > - NLA_UNSPEC attributes accepted > - attribute length >= expected accepted > > Split out parsing strictness into four different options: > * TRAILING - check that there's no trailing data after parsing > attributes (in message or nested) > * MAXTYPE - reject attrs > max known type > * UNSPEC - reject attributes with NLA_UNSPEC policy entries > * STRICT_ATTRS - strictly validate attribute size > > The default for future things should be *everything*. > The current *_strict() is a combination of TRAILING and MAXTYPE, > and is renamed to _deprecated_strict(). > The current regular parsing has none of this, and is renamed to > *_parse_deprecated(). > > Additionally it allows us to selectively set one of the new flags > even on old policies. Notably, the UNSPEC flag could be useful in > this case, since it can be arranged (by filling in the policy) to > not be an incompatible userspace ABI change, but would then going > forward prevent forgetting attribute entries. Similar can apply > to the POLICY flag. > > We end up with the following renames: > * nla_parse -> nla_parse_deprecated > * nla_parse_strict-> nla_parse_deprecated_strict > * nlmsg_parse -> nlmsg_parse_deprecated > * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict > * nla_parse_nested-> nla_parse_nested_deprecated > * nla_validate_nested -> nla_validate_nested_deprecated > > Using spatch, of course: > @@ > expression TB, MAX, HEAD, LEN, POL, EXT; > @@ > -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) > +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) > > @@ > expression NLH, HDRLEN, TB, MAX, POL, EXT; > @@ > -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) > +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) > > @@ > expression NLH, HDRLEN, TB, MAX, POL, EXT; > @@ > -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) > +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) > > @@ > expression TB, MAX, NLA, POL, EXT; > @@ > -nla_parse_nested(TB, MAX, NLA, POL, EXT) > +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) > > @@ > expression START, MAX, POL, EXT; > @@ > -nla_validate_nested(START, MAX, POL, EXT) > +nla_validate_nested_deprecated(START, MAX, POL, EXT) > > @@ > expression NLH, HDRLEN, MAX, POL, EXT; > @@ > -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) > +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) > > For this patch, don't actually add the strict, non-renamed versions > yet so that it breaks compile if I get it wrong. > > Also, while at it, make nla_validate and nla_parse go down to a > common __nla_validate_parse() function to avoid code duplication. > > Ultimately, this allows us to have very strict validation for every > new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the > next patch, while existing things will continue to work as is. > > In effect then, this adds fully strict validation for any new command. > > Signed-off-by: Johannes Berg > Signed-off-by: David S. Miller > > Backport portions of this commit applicable to openvswitch and > added necessary compatibility layer changes to support older > kernels. > > Signed-off-by: Greg Rose > --- > V3 - As per Yi-Hung's suggestion just backport the upstream patch > to stay in sync with upstream kernel code. > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 6/9] datapath: Kbuild: Add kcompat.h header to front of NOSTDINC
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > Since this commit in the Linux upstream kernel: > 'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into > Makefile.modfinal")' > The openvswitch kernel module fails to build against the upstream > Linux kernel. The cause of the build failure is that the include of the > KBUILD_EXTMOD variable was dropped in Makefile.modfinal when > it was split out from Makefile.modpost. Our Kbuild was setting > the ccflags-y variable to include our kcompat.h header as the > first header file. The Linux kernel maintainer has said that > it is incorrect to rely on the ccflags-y variable for the modfinal > phase of the build so that is why KBUILD_EXTMOD is not included. > > We fix this by breaking a different Linux kernel make rule. We > add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC > variable setting in our Kbuild makefile. > > As noted already in the comment for the NOSTDINC setting: > \# These include directories have to go before -I$(KSRC)/include. > \# NOSTDINC_FLAGS just happens to be a variable that goes in the > \# right place, even though it's conceptually incorrect. > > So we continue the misuse of the NOSTDINC variable to fix this > issue as well. > > The assumption of the Linux kernel maintainers is that any > local, out-of-tree build include files can be added to the end > of the command line. In our case that is wrong of course, but > there is nothing we can do about it that I know of other than using > some utility like unifdef to strip out offending chunks of our > compatibility layer code before invocation of Makefile.modfinal. > That is a big change that would take a lot of work to implement. > > We could ask the Linux kernel maintainers to provide some > way for out-of-tree kernel modules to include their own header > files first in a proper manner. I consider that to be a very > low probability of success but something we could ask about. > > For now we cheat and take the easy way out. > > Reported-by: David Ahern > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 5/9] datapath: Use sizeof_field macro
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > From: Pankaj Bharadiya > > Upstream commit: > commit c593642c8be046915ca3a4a300243a68077cd207 > Author: Pankaj Bharadiya > Date: Mon Dec 9 10:31:43 2019 -0800 > > treewide: Use sizeof_field() macro > > Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except > at places where these are defined. Later patches will remove the unused > definition of FIELD_SIZEOF(). > > This patch is generated using following script: > > EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" > > git grep -l -e "\bFIELD_SIZEOF\b" | while read file; > do > > if [[ "$file" =~ $EXCLUDE_FILES ]]; then > continue > fi > sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; > done > > Signed-off-by: Pankaj Bharadiya > Link: > https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharad...@intel.com > Co-developed-by: Kees Cook > Signed-off-by: Kees Cook > Acked-by: David Miller # for net > > Also added a compatibility layer macro for older kernels that still > use SIZEOF_FIELD > > Signed-off-by: Greg Rose > --- > V3 - As suggested by Yi-Hung go ahead and just do the sizeof_field > backport rather than the cheap and easy way done in previous > versions of this patch. > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 4/9] compat: Remove flex_array code
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > Flex array support is removed since kernel 5.1. Convert to use > kvmalloc_array instead. > > Signed-off-by: Greg Rose > --- Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 3/9] compat: Move genl_ops policy to genl_family
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > From: Johannes Berg > > Upstream commit: > commit 3b0f31f2b8c9fb348e4530b88f6b64f9621f83d6 > Author: Johannes Berg > Date: Thu Mar 21 22:51:02 2019 +0100 > > genetlink: make policy common to family > > Since maxattr is common, the policy can't really differ sanely, > so make it common as well. > > The only user that did in fact manage to make a non-common policy > is taskstats, which has to be really careful about it (since it's > still using a common maxattr!). This is no longer supported, but > we can fake it using pre_doit. > > This reduces the size of e.g. nl80211.o (which has lots of commands): > >textdata bss dec hex filename > 398745 143232240 415308 6564c net/wireless/nl80211.o > (before) > 397913 143312240 414484 65314 net/wireless/nl80211.o (after) > >-832 +8 0-824 > > Which is obviously just 8 bytes for each command, and an added 8 > bytes for the new policy pointer. I'm not sure why the ops list is > counted as .text though. > > Most of the code transformations were done using the following spatch: > @ops@ > identifier OPS; > expression POLICY; > @@ > struct genl_ops OPS[] = { > ..., > { > - .policy = POLICY, > }, > ... > }; > > @@ > identifier ops.OPS; > expression ops.POLICY; > identifier fam; > expression M; > @@ > struct genl_family fam = { > .ops = OPS, > .maxattr = M, > + .policy = POLICY, > ... > }; > > This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing > the cb->data as ops, which we want to change in a later genl patch. > > Signed-off-by: Johannes Berg > Signed-off-by: David S. Miller > > Since commit 3b0f31f2b8c9f ("genetlink: make policy common to family") > the policy field of the genl_ops structure has been moved into the > genl_family structure. Add necessary compat layer infrastructure > to still support older kernels. > > Signed-off-by: Greg Rose > --- > V3 - Add policy back to genl_family instead of completely removing > it. > --- Thanks for the patch. LGTM. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 2/9] compat: Fix up changes to inet frags in 5.1+
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > Since Linux kernel release 5.1 the fragments field of the inet_frag_queue > structure is removed and now only the rb_fragments structure with an > rb_node pointer is used for both ipv4 and ipv6. In addition, the > atomic_sub and atomic_add functions are replaced with their > equivalent long counterparts. > > Signed-off-by: Greg Rose > --- > V3 - Use HAVE_CORRECT_MRU_HANDLING instead of less reliable kernel > version check for compile time handling of the rb_fragments > change. > --- Looks good to me. Acked-by: Yi-Hung Wei Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V3 1/9] acinclude: Enable Linux kernel 5.5
On Wed, Mar 4, 2020 at 3:04 PM Greg Rose wrote: > > Signed-off-by: Greg Rose > --- > acinclude.m4 | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/acinclude.m4 b/acinclude.m4 > index 1212a46..db64267 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -151,7 +151,7 @@ AC_DEFUN([OVS_CHECK_LINUX], [ > AC_MSG_RESULT([$kversion]) > > if test "$version" -ge 5; then > - if test "$version" = 5 && test "$patchlevel" -le 0; then > + if test "$version" = 5 && test "$patchlevel" -le 5; then >: # Linux 5.x > else >AC_ERROR([Linux kernel in $KBUILD is version $kversion, but > version newer than 5.0.x is not supported (please refer to the FAQ for > advice)]) Opps, I just found that we should update the version number in the AC_ERROR message as well. Somehow I did not find it in the previous review. With the version number update. Acked-by: Yi-Hung Wei -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 07/11] datapath: Kbuild: Add kcompat.h header to front of NOSTDINC
On Wed, Feb 26, 2020 at 9:42 AM Greg Rose wrote: > > Since this commit in the Linux upstream kernel: > 'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into > Makefile.modfinal")' > The openvswitch kernel module fails to build against the upstream > Linux kernel. The cause of the build failure is that the include of the > KBUILD_EXTMOD variable was dropped in Makefile.modfinal when > it was split out from Makefile.modpost. Our Kbuild was setting > the ccflags-y variable to include our kcompat.h header as the > first header file. The Linux kernel maintainer has said that > it is incorrect to rely on the ccflags-y variable for the modfinal > phase of the build so that is why KBUILD_EXTMOD is not included. > > We fix this by breaking a different Linux kernel make rule. We > add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC > variable setting in our Kbuild makefile. > > As noted already in the comment for the NOSTDINC setting: > \# These include directories have to go before -I$(KSRC)/include. > \# NOSTDINC_FLAGS just happens to be a variable that goes in the > \# right place, even though it's conceptually incorrect. > > So we continue the misuse of the NOSTDINC variable to fix this > issue as well. > > The assumption of the Linux kernel maintainers is that any > local, out-of-tree build include files can be added to the end > of the command line. In our case that is wrong of course, but > there is nothing we can do about it that I know of other than using > some utility like unifdef to strip out offending chunks of our > compatibility layer code before invocation of Makefile.modfinal. > That is a big change that would take a lot of work to implement. > > We could ask the Linux kernel maintainers to provide some > way for out-of-tree kernel modules to include their own header > files first in a proper manner. I consider that to be a very > low probability of success but something we could ask about. > > For now we cheat and take the easy way out. > > Reported-by: David Ahern > Signed-off-by: Greg Rose > --- Thanks for this patch. It must be a lot of effort to figure this out. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 09/11] compat: Use nla_parse deprecated functions
On Wed, Feb 26, 2020 at 9:42 AM Greg Rose wrote: > > Changes for in kernel generated netlink attribute parsing functions > require our out of tree driver to use the deprecated forms of those > functions. Otherwise the message parsing will return -EINVAL because > NLA_F_NESTED is not set in the nla_type field. > > Signed-off-by: Greg Rose > --- > acinclude.m4| 3 +++ > datapath/linux/compat/include/net/netlink.h | 5 + > 2 files changed, 8 insertions(+) > > diff --git a/acinclude.m4 b/acinclude.m4 > index a55c905..43d1576 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -1072,6 +1072,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ >OVS_FIND_FIELD_IFELSE([$KSRC/include/net/genetlink.h], [genl_ops], > [policy], > [OVS_DEFINE([HAVE_GENL_OPS_POLICY])]) > + OVS_GREP_IFELSE([$KSRC/include/net/netlink.h], > + [nla_parse_deprecated_strict], > + [OVS_DEFINE([HAVE_NLA_PARSE_DEPRECATED_STRICT])]) > >if cmp -s datapath/linux/kcompat.h.new \ > datapath/linux/kcompat.h >/dev/null 2>&1; then > diff --git a/datapath/linux/compat/include/net/netlink.h > b/datapath/linux/compat/include/net/netlink.h > index 34fc346..7c0d993 100644 > --- a/datapath/linux/compat/include/net/netlink.h > +++ b/datapath/linux/compat/include/net/netlink.h > @@ -143,6 +143,10 @@ static inline int nla_put_be64(struct sk_buff *skb, int > attrtype, __be64 value, > > #endif > > +#ifdef HAVE_NLA_PARSE_DEPRECATED_STRICT > +#define nla_parse_nested nla_parse_nested_deprecated > +#define nla_parse nla_parse_deprecated_strict Thanks for this patch. Looks like this change is related to upstream patch 8cb081746c031 ("netlink: make validation more configurable for future strictness"). To be in sync with the upstream kernel, should we use the new upstream netlink parsing APIs (i.e. nla_parse_nested_deprecated(), nla_parse_deprecated_strict(), genlmsg_parse_deprecated()), and backports them to the old APIs if they are not available in the older kernel? Thanks, -Yi-Hung > +#else > #ifndef HAVE_NETLINK_EXT_ACK > struct netlink_ext_ack; > > @@ -164,6 +168,7 @@ static inline int rpl_nla_parse(struct nlattr **tb, int > maxtype, > } > #define nla_parse rpl_nla_parse > #endif > +#endif /* HAVE_NLA_PARSE_DEPRECATED_STRICT */ > > #ifndef HAVE_NLA_NEST_START_NOFLAG > static inline struct nlattr *rpl_nla_nest_start_noflag(struct sk_buff *skb, > -- ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 06/11] compat: Add FIELD_SIZEOF macro
On Wed, Feb 26, 2020 at 9:42 AM Greg Rose wrote: > > The FIELD_SIZEOF macro is removed in Linux kernel release 5.5 but > is still needed in our out of tree kernel module for compatibility > with older kernels. > > Signed-off-by: Greg Rose > --- > datapath/linux/compat/include/linux/kernel.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/datapath/linux/compat/include/linux/kernel.h > b/datapath/linux/compat/include/linux/kernel.h > index 2e81abc..8529b1d 100644 > --- a/datapath/linux/compat/include/linux/kernel.h > +++ b/datapath/linux/compat/include/linux/kernel.h > @@ -32,4 +32,8 @@ > #define U32_MAX((u32)~0U) > #endif > > +#ifndef FIELD_SIZEOF > +#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f)) > +#endif > + > #endif /* linux/kernel.h */ > -- Hi Greg, Thanks for the patch. I think we generally want to keep files in ./datapath/*.c|h to be in sync with upstream kernel, and we backports the missing function in the older kernel. As upstream patch c593642c8be04 ("treewide: Use sizeof_field() macro") replaces FIELD_SIZEOF with sizeof_field(), should we update FIELD_SIZEOF to sizeof_field() in datapath.c and flow.h and backports sizeof_field() if necessary? Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 04/11] compat: Remove flex_array code
On Wed, Feb 26, 2020 at 9:41 AM Greg Rose wrote: > > Flex array support is removed since kernel 5.1. Convert to use > kvmalloc_array instead. > > Signed-off-by: Greg Rose > --- Thanks for the patch. Looks good to me. Acked-by: Yi-Hung Wei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 03/11] compat: Remove genl_ops policy field for newer kernels
On Wed, Feb 26, 2020 at 9:41 AM Greg Rose wrote: > > The policy field of the genl_ops structure has been removed in recent > kernels. > > Signed-off-by: Greg Rose > --- > acinclude.m4 | 3 +++ > datapath/conntrack.c | 8 > datapath/datapath.c | 32 > datapath/meter.c | 10 ++ > 4 files changed, 53 insertions(+) > > diff --git a/acinclude.m4 b/acinclude.m4 > index cad76c7..a55c905 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -1069,6 +1069,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ >[OVS_DEFINE([HAVE_DST_OPS_CONFIRM_NEIGH])]) >OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [fqdir], >[OVS_DEFINE([HAVE_INET_FRAG_FQDIR])]) > + OVS_FIND_FIELD_IFELSE([$KSRC/include/net/genetlink.h], [genl_ops], > +[policy], > +[OVS_DEFINE([HAVE_GENL_OPS_POLICY])]) > >if cmp -s datapath/linux/kcompat.h.new \ > datapath/linux/kcompat.h >/dev/null 2>&1; then > diff --git a/datapath/conntrack.c b/datapath/conntrack.c > index 838cf63..25da2a5 100644 > --- a/datapath/conntrack.c > +++ b/datapath/conntrack.c > @@ -134,10 +134,12 @@ struct ovs_ct_limit_info { > struct nf_conncount_data *data; > }; > > +#ifdef HAVE_GENL_OPS_POLICY > static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = { > [OVS_CT_LIMIT_ATTR_ZONE_LIMIT] = { .type = NLA_NESTED, }, > }; > #endif > +#endif > > static bool labels_nonzero(const struct ovs_key_ct_labels *labels); > > @@ -2312,7 +2314,9 @@ static struct genl_ops ct_limit_genl_ops[] = { > #endif > .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN >* privilege. */ > +#ifdef HAVE_GENL_OPS_POLICY > .policy = ct_limit_policy, > +#endif > .doit = ovs_ct_limit_cmd_set, > }, > { .cmd = OVS_CT_LIMIT_CMD_DEL, > @@ -2321,7 +2325,9 @@ static struct genl_ops ct_limit_genl_ops[] = { > #endif > .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN >* privilege. */ > +#ifdef HAVE_GENL_OPS_POLICY > .policy = ct_limit_policy, > +#endif > .doit = ovs_ct_limit_cmd_del, > }, > { .cmd = OVS_CT_LIMIT_CMD_GET, > @@ -2329,7 +2335,9 @@ static struct genl_ops ct_limit_genl_ops[] = { > .validate = GENL_DONT_VALIDATE_STRICT | > GENL_DONT_VALIDATE_DUMP, > #endif > .flags = 0, /* OK for unprivileged users. */ > +#ifdef HAVE_GENL_OPS_POLICY > .policy = ct_limit_policy, > +#endif > .doit = ovs_ct_limit_cmd_get, > }, > }; Hi Greg, I think this patch is related to upstream patch 3b0f31f2b8c9f ("genetlink: make policy common to family") where it moved the generic netlink policy from genl_ops to genl_family. This patch does remove policy from struct genl_ops, but it does not add the policy to genl_family. Can you update that part in various places as well? Also, it would be great to include that upstream commit into the commit message for reference purpose. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 02/11] compat: Fix up changes to inet frags in 5.1+
On Wed, Feb 26, 2020 at 9:41 AM Greg Rose wrote: > > Since Linux kernel release 5.1 the fragments field of the inet_frag_queue > structure is removed and now only the rb_fragments structure with an > rb_node pointer is used for both ipv4 and ipv6. In addition, the > atomic_sub and atomic_add functions are replaced with their > equivalent long counterparts. > > Signed-off-by: Greg Rose > --- > acinclude.m4 | 2 ++ > datapath/linux/compat/include/net/inet_frag.h | 21 + > 2 files changed, 23 insertions(+) > > diff --git a/acinclude.m4 b/acinclude.m4 > index db64267..cad76c7 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -1067,6 +1067,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ >[OVS_DEFINE([HAVE_RBTREE_RB_LINK_NODE_RCU])]) >OVS_GREP_IFELSE([$KSRC/include/net/dst_ops.h], [bool confirm_neigh], >[OVS_DEFINE([HAVE_DST_OPS_CONFIRM_NEIGH])]) > + OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [fqdir], > + [OVS_DEFINE([HAVE_INET_FRAG_FQDIR])]) > >if cmp -s datapath/linux/kcompat.h.new \ > datapath/linux/kcompat.h >/dev/null 2>&1; then > diff --git a/datapath/linux/compat/include/net/inet_frag.h > b/datapath/linux/compat/include/net/inet_frag.h > index 124c8be..e3c6df3 100644 > --- a/datapath/linux/compat/include/net/inet_frag.h > +++ b/datapath/linux/compat/include/net/inet_frag.h > @@ -18,7 +18,16 @@ static inline bool inet_frag_evicting(struct > inet_frag_queue *q) > #ifdef HAVE_INET_FRAG_QUEUE_WITH_LIST_EVICTOR > return !hlist_unhashed(&q->list_evictor); > #else > +/* > + * We can't use acinclude.m4 to check this as the field 'fragments' > + * also matches 'rb_fragments'. > + */ > +#if LINUX_VERSION_CODE < KERNEL_VERSION(5,1,0) > return (q_flags(q) & INET_FRAG_FIRST_IN) && q->fragments != NULL; > +#else > + return (q_flags(q) & INET_FRAG_FIRST_IN) && > + q->rb_fragments.rb_node != NULL; > +#endif > #endif /* HAVE_INET_FRAG_QUEUE_WITH_LIST_EVICTOR */ > } > #endif /* HAVE_INET_FRAG_EVICTING */ Yes, it's a bummer if we can not check some fields in the acinclude.m4. It looks like inet_frag_evicting() has been removed by upstream commit 399d1404be66 ("inet: frags: get rif of inet_frag_evicting()") since kernel 4.17, and the only place that we use is in rpl_ipfrag_init() -> ip_expire() when HAVE_CORRECT_MRU_HANDLING is not available. Therefore, how about the following approach that avoids the less reliable kernel version check. --- a/datapath/linux/compat/include/net/inet_frag.h +++ b/datapath/linux/compat/include/net/inet_frag.h @@ -12,6 +12,7 @@ #define qp_flags(qp) (qp->q.flags) #endif +#ifndef HAVE_CORRECT_MRU_HANDLING #ifndef HAVE_INET_FRAG_EVICTING static inline bool inet_frag_evicting(struct inet_frag_queue *q) { @@ -22,6 +23,7 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q) #endif /* HAVE_INET_FRAG_QUEUE_WITH_LIST_EVICTOR */ } #endif /* HAVE_INET_FRAG_EVICTING */ +#endif /* HAVE_CORRECT_MRU_HANDLING */ I am good with the rest of the backports in this patch. Thanks, -Yi-Hung > @@ -29,6 +38,10 @@ static inline bool inet_frag_evicting(struct > inet_frag_queue *q) > #define inet_frag_lru_move(q) > #endif > > +#ifdef HAVE_INET_FRAG_FQDIR > +#define netns_frags fqdir > +#endif > + > #ifndef HAVE_SUB_FRAG_MEM_LIMIT_ARG_STRUCT_NETNS_FRAGS > #ifdef HAVE_FRAG_PERCPU_COUNTER_BATCH > static inline void rpl_sub_frag_mem_limit(struct netns_frags *nf, int i) > @@ -45,13 +58,21 @@ static inline void rpl_add_frag_mem_limit(struct > netns_frags *nf, int i) > #else /* !frag_percpu_counter_batch */ > static inline void rpl_sub_frag_mem_limit(struct netns_frags *nf, int i) > { > +#ifdef HAVE_INET_FRAG_FQDIR > + atomic_long_sub(i, &nf->mem); > +#else > atomic_sub(i, &nf->mem); > +#endif > } > #define sub_frag_mem_limit rpl_sub_frag_mem_limit > > static inline void rpl_add_frag_mem_limit(struct netns_frags *nf, int i) > { > +#ifdef HAVE_INET_FRAG_FQDIR > + atomic_long_add(i, &nf->mem); > +#else > atomic_add(i, &nf->mem); > +#endif > } > #define add_frag_mem_limit rpl_add_frag_mem_limit > #endif /* frag_percpu_counter_batch */ > -- > 1.8.3.1 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] docs: Update conntrack established state description
Patch a867c010ee91 ("conntrack: Fix conntrack new state") fixes the userspace conntrack behavior. This patch updates the corresponding conntrack state description. Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Reported-by: Roni Bar Yanai Signed-off-by: Yi-Hung Wei --- Please backports to branch 2.13. --- lib/meta-flow.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/meta-flow.xml b/lib/meta-flow.xml index 90b405c73750..2f9c5ee16342 100644 --- a/lib/meta-flow.xml +++ b/lib/meta-flow.xml @@ -2566,8 +2566,8 @@ actions=clone(load:0->NXM_OF_IN_PORT[],output:123) est (0x02) - Part of an existing connection. Set to 1 if this is a committed - connection. + Part of an existing connection. Set to 1 if packets of a committed + connection have been seen by conntrack from both directions. rel (0x04) -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: Fix conntrack new state
On Thu, Feb 20, 2020 at 4:42 AM Roni Bar Yanai wrote: > > Hi Yi-Hung, > > From the man page, it looks like the +est is the result of committing action > and not related to the TCP state machine itself. > I'm aware that this is not how it works in kernel conntrack, if you want to > align both , the description might need an update. > > new (0x01) > A new connection. Set to 1 if this is an uncommitted con‐ > nection. > > est (0x02) > Part of an existing connection. Set to 1 if this is a > committed connection. > Hi Roni, Thanks for bringing this up. I will send a patch to update the documentation. Thanks, -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] Documentation: Fix literal blocks formating
Signed-off-by: Yi-Hung Wei --- Documentation/faq/openflow.rst | 2 +- Documentation/faq/qos.rst | 2 +- Documentation/howto/selinux.rst| 2 +- Documentation/howto/tunneling.rst | 2 +- Documentation/intro/install/rhel.rst | 2 +- Documentation/topics/userspace-tso.rst | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/faq/openflow.rst b/Documentation/faq/openflow.rst index 8c94891703a8..0111de78a380 100644 --- a/Documentation/faq/openflow.rst +++ b/Documentation/faq/openflow.rst @@ -385,7 +385,7 @@ but OVS drops the packets instead. $ ovs-ofctl add-flow br0 actions=load:0->NXM_OF_IN_PORT[],2,3,4,5,6 If the input port is important, then one may save and restore it on the -stack: +stack:: $ ovs-ofctl add-flow br0 actions=push:NXM_OF_IN_PORT[],\ load:0->NXM_OF_IN_PORT[],\ diff --git a/Documentation/faq/qos.rst b/Documentation/faq/qos.rst index 53ad89809671..33c319166f37 100644 --- a/Documentation/faq/qos.rst +++ b/Documentation/faq/qos.rst @@ -102,7 +102,7 @@ Q: How do I configure ingress policing? A: A policing policy can be configured on an interface to drop packets that arrive at a higher rate than the configured value. For example, the following commands will rate-limit traffic that vif1.0 may generate to -10Mbps: +10Mbps:: $ ovs-vsctl set interface vif1.0 ingress_policing_rate=1 $ ovs-vsctl set interface vif1.0 ingress_policing_burst=8000 diff --git a/Documentation/howto/selinux.rst b/Documentation/howto/selinux.rst index 4809639bc5bc..55c3e39cece4 100644 --- a/Documentation/howto/selinux.rst +++ b/Documentation/howto/selinux.rst @@ -117,7 +117,7 @@ see in Open vSwitch log files "Permission Denied" errors:: However, not all "Permission denied" errors are caused by SELinux. So, before blaming too strict SELinux policy, make sure that indeed SELinux was the one -that denied OVS access to certain resources, for example, run: +that denied OVS access to certain resources, for example, run:: $ grep "openvswitch_t" /var/log/audit/audit.log | tail type=AVC msg=audit(1453235431.640:114671): avc: denied { getopt } for pid=4583 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0 diff --git a/Documentation/howto/tunneling.rst b/Documentation/howto/tunneling.rst index 2645b9043e24..2cbca977ba19 100644 --- a/Documentation/howto/tunneling.rst +++ b/Documentation/howto/tunneling.rst @@ -130,7 +130,7 @@ Create a mirrored configuration on `host2` using the same basic steps: $ ovs-vsctl add-port br0 tap1 #. Create the GRE tunnel on `host2`, this time using the IP address for - ``eth0`` on `host1` when specifying the ``remote_ip`` option: + ``eth0`` on `host1` when specifying the ``remote_ip`` option:: $ ovs-vsctl add-port br0 gre0 \ -- set interface gre0 type=gre options:remote_ip= diff --git a/Documentation/intro/install/rhel.rst b/Documentation/intro/install/rhel.rst index 31f0eec3a4bd..b21b274b716a 100644 --- a/Documentation/intro/install/rhel.rst +++ b/Documentation/intro/install/rhel.rst @@ -201,7 +201,7 @@ On RHEL 6, to build the Open vSwitch kernel module run:: $ rpmbuild -bb rhel/kmod-openvswitch-rhel6.spec -You might have to specify a kernel version and/or variants, e.g.: +You might have to specify a kernel version and/or variants, e.g.:: $ rpmbuild -bb \ -D "kversion 2.6.32-131.6.1.el6.x86_64" \ diff --git a/Documentation/topics/userspace-tso.rst b/Documentation/topics/userspace-tso.rst index 94eddc0b2fd0..9da5d7ef2912 100644 --- a/Documentation/topics/userspace-tso.rst +++ b/Documentation/topics/userspace-tso.rst @@ -53,7 +53,7 @@ Enabling TSO The TSO support may be enabled via a global config value ``userspace-tso-enable``. Setting this to ``true`` enables TSO support for -all ports. +all ports.:: $ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] conntrack: Fix TCP conntrack state
If a TCP connection is in SYN_SENT state, receiving another SYN packet would just renew the timeout of that conntrack entry rather than create a new one. Thus, tcp_conn_update() should return CT_UPDATE_VALID_NEW. This also fixes regressions of a couple of OVN system tests. Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Reported-by: Dumitru Ceara Signed-off-by: Yi-Hung Wei --- Please backport to branch 2.13. --- lib/conntrack-tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 416cb769d22f..47261c7551d1 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -189,7 +189,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, } else if (src->state <= CT_DPIF_TCPS_SYN_SENT) { src->state = CT_DPIF_TCPS_SYN_SENT; conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIRST_PACKET, now); -return CT_UPDATE_NEW; +return CT_UPDATE_VALID_NEW; } } -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: Fix conntrack new state
On Fri, Jan 31, 2020 at 2:07 AM Dumitru Ceara wrote: > > On 1/29/20 7:58 PM, William Tu wrote: > > On Fri, Dec 20, 2019 at 01:16:39PM -0800, Ben Pfaff wrote: > >> On Fri, Dec 20, 2019 at 09:51:08AM -0800, Yi-Hung Wei wrote: > >>> In connection tracking system, a connection is established if we > >>> see packets from both directions. However, in userspace datapath's > >>> conntrack, if we send a connection setup packet in one direction > >>> twice, it will make the connection to be in established state. > >>> > >>> This patch fixes the aforementioned issue, and adds a system traffic > >>> test for UDP and TCP traffic to avoid regression. > >>> > >>> Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.") > >>> Signed-off-by: Yi-Hung Wei > >>> --- > > > > LGTM. I applied this to master, and branch 2.13. > > Thanks > > William > > > > Hi William, Yi-Hung, > > This patch breaks OVN switch load balancer system tests when running > with the OVS userspace datapath. I only had a glance at the code and > didn't dig into why exactly the tests fail yet but it's quite easy to > reproduce: > > $ cd /tmp/ > $ git clone https://github.com/openvswitch/ovs > $ cd /tmp/ovs > $ git checkout a867c010ee9183885ee9d3eb76a0005c075c4d2e > $ ./boot.sh && ./configure && make -j8 > $ cd /tmp/ > $ git clone https://github.com/ovn-org/ovn > $ cd /tmp/ovn/ > $ ./boot.sh && ./configure --with-ovs-source=/tmp/ovs > --with-ovs-build=/tmp/ovs && make -j8 > $ make -j8 check-system-userspace TESTSUITEFLAGS="-k ovnlb" > [...] > system-ovn > > 7: ovn -- load-balancing FAILED > (system-ovn.at:1121) > 8: ovn -- load-balancing - IPv6FAILED > (system-ovn.at:1268) > 9: ovn -- load-balancing - same subnet.FAILED > (system-ovn.at:1389) > 10: ovn -- load-balancing - same subnet. - IPv6 FAILED > (system-ovn.at:1498) > 11: ovn -- load balancing in gateway router ok > 12: ovn -- load balancing in gateway router - IPv6 ok > 13: ovn -- multiple gateway routers, load-balancing ok > 14: ovn -- multiple gateway routers, load-balancing - IPv6 ok > 15: ovn -- load balancing in router with gateway router port ok > 16: ovn -- load balancing in router with gateway router port - IPv6 ok > > # Reverting commit a867c010ee9183885ee9d3eb76a0005c075c4d2e > > $ cd /tmp/ovs > $ git checkout a867c010ee9183885ee9d3eb76a0005c075c4d2e~ > $ make -j8 > $ cd /tmp/ovn > $ make -j8 check-system-userspace TESTSUITEFLAGS="-k ovnlb" > [...] > system-ovn > > 7: ovn -- load-balancing ok > 8: ovn -- load-balancing - IPv6ok > 9: ovn -- load-balancing - same subnet.ok > 10: ovn -- load-balancing - same subnet. - IPv6 ok > 11: ovn -- load balancing in gateway router ok > 12: ovn -- load balancing in gateway router - IPv6 ok > 13: ovn -- multiple gateway routers, load-balancing ok > 14: ovn -- multiple gateway routers, load-balancing - IPv6 ok > 15: ovn -- load balancing in router with gateway router port ok > 16: ovn -- load balancing in router with gateway router port - IPv6 ok > > Thanks, > Dumitru Hi Dumitru, Thanks for reporting this issue. I am not familiar with OVN and the broken system tests, but I will take a look. -Yi-Hung ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev_afxdp: Detects combined channels and aborts wrong config
On Thu, Jan 9, 2020 at 5:39 AM Ilya Maximets wrote: > On 06.01.2020 22:48, William Tu wrote: > > On Mon, Dec 23, 2019 at 11:34 AM Yi-Hung Wei wrote: > >> > >> This patches detects the number of combined channels on a AF_XDP port > >> via using ethtool interface. If the number of combined channels > >> is different from the n_rxq, netdev_afxdp will not be able to get > >> all the packets from that port. Thus, abort the port setup when > >> the case is detected. > >> > >> Signed-off-by: Yi-Hung Wei > > > > Looks good to me, thanks! > > CC Ilya to see if he has more insight/comments. > > > > Acked-by: William Tu > > > >> --- > >> Travis CI: https://travis-ci.org/YiHungWei/ovs/builds/627972465 > >> > >> --- > >> lib/netdev-afxdp.c | 15 +++ > >> lib/netdev-linux.c | 27 +++ > >> lib/netdev-linux.h | 2 ++ > >> 3 files changed, 44 insertions(+) > >> > >> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c > >> index 58365ed483e3..6d002882d355 100644 > >> --- a/lib/netdev-afxdp.c > >> +++ b/lib/netdev-afxdp.c > >> @@ -601,6 +601,14 @@ netdev_afxdp_set_config(struct netdev *netdev, const > >> struct smap *args, > >> enum afxdp_mode xdp_mode; > >> bool need_wakeup; > >> int new_n_rxq; > >> +int combined_channels; > >> + > >> +if (netdev_linux_ethtool_get_combined_channels(netdev, > >> + &combined_channels)) { > >> +VLOG_INFO("Cannot get the number of combined channels of %s. " > >> + "Defaults it to 1.", netdev_get_name(netdev)); > >> +combined_channels = 1; > >> +} > >> > >> ovs_mutex_lock(&dev->mutex); > >> new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); > >> @@ -611,6 +619,13 @@ netdev_afxdp_set_config(struct netdev *netdev, const > >> struct smap *args, > >> return EINVAL; > >> } > >> > >> +if (new_n_rxq != combined_channels) { > >> +ovs_mutex_unlock(&dev->mutex); > >> +VLOG_ERR("%s: n_rxq: %d != combined channels: %d.", > >> + netdev_get_name(netdev), new_n_rxq, combined_channels); > >> +return EINVAL; > >> +} > >> + > >> str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort"); > >> for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT; > >> xdp_mode < OVS_AF_XDP_MODE_MAX; > >> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > >> index f8e59bacfb13..e3086fc1cbb6 100644 > >> --- a/lib/netdev-linux.c > >> +++ b/lib/netdev-linux.c > >> @@ -2166,6 +2166,33 @@ out: > >> netdev->get_features_error = error; > >> } > >> > >> +int > >> +netdev_linux_ethtool_get_combined_channels(struct netdev *netdev_, > >> + int *combined_channels) > >> +{ > >> +struct netdev_linux *netdev = netdev_linux_cast(netdev_); > >> +struct ethtool_channels echannels; > >> +int err; > >> + > >> +ovs_mutex_lock(&netdev->mutex); > >> + > >> +COVERAGE_INC(netdev_get_ethtool); > >> + > >> +memset(&echannels, 0, sizeof echannels); > >> +err = netdev_linux_do_ethtool(netdev_get_name(netdev_), > >> + (struct ethtool_cmd *) &echannels, > >> + ETHTOOL_GCHANNELS, "ETHTOOL_GCHANNELS"); > > Does this command return maximum possible number of channels or > number of currently enabled channels? Thanks for review. This command can return both the maximum possible number of channels and the currently enabled channels. > In first case we'll end up with a need to configure huge number of queues. > In second case it's not guaranteed that number of channels will not be > changed later. (Does enabling more channels generates if-notifier event?) Yes, it generates if-notifier event. > Another point is why we need a configurable parameter that is allowed to > be configured with a single value only? > > And there is a possible usecase for not having all the queues attached > to OVS. For example, if you have a custom xdp program loaded, you may > redirect specific traffic to fast afxdp queues and ot
Re: [ovs-dev] [PATCH] netdev_afxdp: Detects combined channels and aborts wrong config
Hi Ben, After taking closer look at different driver implementations, I found that the use of combined channels is actually driver dependent. >From ethtool (8), a channel is an IRQ and the set of queues that can trigger that IRQ. I tend to think channel as rx/tx queues from software's point of view. There are 4 types of channels that ethtool can configure, rx, tx, other, and combined. For Intel NICs, (at least for NICs that use ixgbe, i40 driver), we can only configure channels through 'combined' parameter. It will set both rx and tx channel to N with the following command. $ ethool -L eth0 combined N However, it does not support configuring rx and tx channel separately. We can only set/get number of combined channel, but not rx or tx channel. For example, $ ethtool -l eth0 Channel parameters for eth0: Pre-set maximums: RX:0 TX:0 Other:1 Combined:63 Current hardware settings: RX:0 TX:0 Other:1 Combined:8 On the other hand, NICs that use Broadcom tg3 drivers can configure its rx and tx channel individually, but can not set with 'combined' parameter. For Mellanox, mlx4 driver supports set/get rx/tx channels, but mlx5 driver can only configured through combined channel. Another ethtool example with Broadcom NetXtreme BCM5720 NIC. $ ethtool -L eno1 rx 2 tx 1 $ ethtool -l eno1 Channel parameters for eno1: Pre-set maximums: RX: 4 TX: 4 Other: 0 Combined: 0 Current hardware settings: RX: 2 TX: 1 Other: 0 Combined: 0 Back to this patch, what I was trying to do is to make sure that the number of rx channels reported from ethtool is equal to n_rxq in AF_XDP's port setting. Since, I was using a testbed with Intel NIC, I assume to get # of rx channels from 'combined'. To support other types of NICs, in the next version, I will first compare n_rxq with 'rx'. In case of 'rx' is 0, 'combined' will be used. I will update the related documentation as well. Thanks, -Yi-Hung On Mon, Jan 6, 2020 at 2:15 PM Ben Pfaff wrote: > > On Mon, Dec 23, 2019 at 11:33:57AM -0800, Yi-Hung Wei wrote: > > This patches detects the number of combined channels on a AF_XDP port > > via using ethtool interface. If the number of combined channels > > is different from the n_rxq, netdev_afxdp will not be able to get > > all the packets from that port. Thus, abort the port setup when > > the case is detected. > > > > Signed-off-by: Yi-Hung Wei > > --- > > Travis CI: https://travis-ci.org/YiHungWei/ovs/builds/627972465 > > I don't know what a combined channel is. Should the documentation talk > about this? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v9 2/2] netdev-afxdp: NUMA-aware memory allocation for XSK related memory
Currently, the AF_XDP socket (XSK) related memory are allocated by main thread in the main thread's NUMA domain. With the patch that detects netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on the AF_XDP netdev's NUMA domain. If the net device's NUMA domain is different from the main thread's NUMA domain, we will have two cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU). This patch addresses the aforementioned issue by allocating the memory in the net device's NUMA domain. Signed-off-by: Yi-Hung Wei --- v9: - Addreess review comments from Ilya in patch 2. * Add check on numa_available() * Properly restore memory policy with get/set_mempolicy. v8: - Addreess review comments from Eelco and Ilya in patch 2. * Use OVS_FIND_DEPENDENCY(). * Avoid the locking issue when calling netdev_get_numa_id(). * Check NETDEV_NUMA_UNSPEC. * Use return value from netdev_get_numa_id() directly, and check NETDEV_NUMA_UNSPEC case. * Use numa_set_preferred(). --- Documentation/intro/install/afxdp.rst | 2 +- acinclude.m4 | 2 ++ include/sparse/automake.mk| 1 + include/sparse/numa.h | 27 +++ lib/netdev-afxdp.c| 26 ++ 5 files changed, 57 insertions(+), 1 deletion(-) create mode 100644 include/sparse/numa.h diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst index 7b0736c96114..c4685fa7ebac 100644 --- a/Documentation/intro/install/afxdp.rst +++ b/Documentation/intro/install/afxdp.rst @@ -164,7 +164,7 @@ If a test case fails, check the log at:: Setup AF_XDP netdev --- -Before running OVS with AF_XDP, make sure the libbpf and libelf are +Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are set-up right:: ldd vswitchd/ovs-vswitchd diff --git a/acinclude.m4 b/acinclude.m4 index 542637ac8cb8..f73dc9bf7e3c 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -286,6 +286,8 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [ AC_CHECK_FUNCS([pthread_spin_lock], [], [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])]) +OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma]) + AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF_XDP support is available and enabled.]) LIBBPF_LDADD=" -lbpf -lelf" diff --git a/include/sparse/automake.mk b/include/sparse/automake.mk index 073631e8c082..974ad3fe55f7 100644 --- a/include/sparse/automake.mk +++ b/include/sparse/automake.mk @@ -5,6 +5,7 @@ noinst_HEADERS += \ include/sparse/bits/floatn.h \ include/sparse/assert.h \ include/sparse/math.h \ +include/sparse/numa.h \ include/sparse/netinet/in.h \ include/sparse/netinet/ip6.h \ include/sparse/netpacket/packet.h \ diff --git a/include/sparse/numa.h b/include/sparse/numa.h new file mode 100644 index ..3691a0eaf729 --- /dev/null +++ b/include/sparse/numa.h @@ -0,0 +1,27 @@ +/* + * Copyright (c) 2019 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef __CHECKER__ +#error "Use this header only with sparse. It is not a correct implementation." +#endif + +/* Avoid sparse warning: non-ANSI function declaration of function" */ +#define numa_get_membind_compat() numa_get_membind_compat(void) +#define numa_get_interleave_mask_compat() numa_get_interleave_mask_compat(void) +#define numa_get_run_node_mask_compat() numa_get_run_node_mask_compat(void) + +/* Get actual definitions for us to annotate and build on. */ +#include_next diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index 426dce944977..fad0aab9d550 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -26,6 +26,8 @@ #include #include #include +#include +#include #include #include #include @@ -669,6 +671,24 @@ netdev_afxdp_reconfigure(struct netdev *netdev) struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; int err = 0; +/* Allocate all the xsk related memory in the netdev's NUMA domain. */ +struct bitmask *old_bm = NULL; +int old_policy, numa_id; +if (numa_available() != -1) { +numa_id = netdev_get_numa_id(netdev); +if (numa_id != NETDEV_NUMA_UNSPEC) { +old_bm = numa_allo