Re: [ovs-dev] [PATCH v2 13/15] system-tests: Disable offloads in userspace tests.
On 26/04/2016 16:38, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:45PM -0700, Daniele Di Proietto wrote: >> The system userspace testsuite uses the userspace datapath with >> netdev-linux devices, connected to veth pairs with the AF_PACKET socket: >> >> (veth pair) (AF_PACKET) >> TCP stack -> p0 ---> ovs-p0 -> netdev-linux (userspace OVS) >> >> Unfortunately this configuration has some problems with offloads: a >> packet generated by the TCP stack maybe sent to p0 without being >> checksummed or segmented. The AF_PACKET socket, by default, ignores the >> offloads and just transmits the data of the packets to userspace, but: >> >> 1. The packet may need GSO, so the data will be too big to be received >>by the userspace datapath >> 2. The packet might have incomplete checksums, so it will likely be >>discarded by the receiver. >> >> Problem 1 causes TCP connections to see a congestion window smaller than >> the MTU, which hurts performance but doesn't prevent communication. >> >> Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by >> commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats >> packets with bad checksums as good"). In the kernels that include the >> fix, the userspace datapath is able to process pings, but not tcp or udp >> data. >> >> Unfortunately I couldn't find a way to ask the AF_PACKET to perform >> offloads in kernel. A possible fix would be to use the PACKET_VNET_HDR >> sockopt and perform the offloads in userspace. >> >> Until a proper fix is worked out for netdev-linux, this commit disables >> offloads on the non-OVS side of the veth pair, as a workaround. >> >> Signed-off-by: Daniele Di Proietto >> --- > >I understand why you need this but I don't think it should be part of >this patchset. > >We should start noting somewhere things that we need to revisit in >the future. > >Acked-by: Flavio Leitner You're right, I'll push that separately. I've only included it here because the system tests in the last patch need this. I can add a line in TODO.md for that Thanks for the review! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v9 06/15] hmap: Add HMAP_FOR_EACH_POP.
On 26/04/2016 08:24, "Ben Pfaff" wrote: >On Fri, Apr 22, 2016 at 06:02:58PM -0700, Daniele Di Proietto wrote: >> Makes popping each member of the hmap a bit easier. >> >> Signed-off-by: Daniele Di Proietto > >Much better, thank you! > >Acked-by: Ben Pfaff Thanks! I pushed this to master ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 14/15] system-tests: Add tcp simple test.
On 26/04/2016 16:38, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:46PM -0700, Daniele Di Proietto wrote: >> Useful to test the datapath ability to forward tcp packets without the >> complexity of connection tracking. >> >> Signed-off-by: Daniele Di Proietto >> --- > >This patch can be separated from this patchset as well. Agreed, I'll push that separately > >Thanks for improving the testsuite. > >Acked-by: Flavio Leitner Thanks for the reviews! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 09/15] dpif-netdev: Execute conntrack action.
On 26/04/2016 16:37, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:41PM -0700, Daniele Di Proietto wrote: >> This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev. >> >> To allow ofproto-dpif to detect the conntrack feature, flow_put will not >> discard anymore flows with ct_* fields set. We still shouldn't allow >> flows with NAT bits set, since there is no support for NAT. >> >> Signed-off-by: Daniele Di Proietto >> --- >> lib/dpif-netdev.c | 68 >> +-- >> tests/dpif-netdev.at | 14 +-- >> tests/ofproto-dpif.at | 20 +++ >> 3 files changed, 77 insertions(+), 25 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 1e8a37c..436359a 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -33,6 +33,7 @@ >> >> #include "bitmap.h" >> #include "cmap.h" >> +#include "conntrack.h" >> #include "coverage.h" >> #include "csum.h" >> #include "dp-packet.h" >> @@ -89,9 +90,17 @@ static struct shash dp_netdevs >> OVS_GUARDED_BY(dp_netdev_mutex) >> >> static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); >> >> +#define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \ >> + | CS_INVALID | CS_REPLY_DIR | >> CS_TRACKED) >> +#define DP_NETDEV_CS_UNSUPPORTED_MASK >> (~(uint32_t)DP_NETDEV_CS_SUPPORTED_MASK) >> + >> static struct odp_support dp_netdev_support = { >> .max_mpls_depth = SIZE_MAX, >> .recirc = true, >> +.ct_state = true, >> +.ct_zone = true, >> +.ct_mark = true, >> +.ct_label = true, >> }; >> >> /* Stores a miniflow with inline values */ >> @@ -224,6 +233,8 @@ struct dp_netdev { >> /* Cpu mask for pin of pmd threads. */ >> char *pmd_cmask; >> uint64_t last_tnl_conf_seq; >> + >> +struct conntrack conntrack; >> }; >> >> static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev >> *dp, >> @@ -906,6 +917,8 @@ create_dp_netdev(const char *name, const struct >> dpif_class *class, >> dp->upcall_aux = NULL; >> dp->upcall_cb = NULL; >> >> +conntrack_init(&dp->conntrack); >> + >> cmap_init(&dp->poll_threads); >> ovs_mutex_init_recursive(&dp->non_pmd_mutex); >> ovsthread_key_create(&dp->per_pmd_key, NULL); >> @@ -976,6 +989,8 @@ dp_netdev_free(struct dp_netdev *dp) >> ovs_mutex_destroy(&dp->non_pmd_mutex); >> ovsthread_key_delete(dp->per_pmd_key); >> >> +conntrack_destroy(&dp->conntrack); >> + >> ovs_mutex_lock(&dp->port_mutex); >> CMAP_FOR_EACH (port, node, &dp->ports) { >> /* PMD threads are destroyed here. do_del_port() cannot quiesce */ >> @@ -1965,9 +1980,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr >> *key, uint32_t key_len, >> return EINVAL; >> } >> >> -/* Userspace datapath doesn't support conntrack. */ >> -if (flow->ct_state || flow->ct_zone || flow->ct_mark >> -|| !ovs_u128_is_zero(&flow->ct_label)) { >> +if (flow->ct_state & DP_NETDEV_CS_UNSUPPORTED_MASK) { >> return EINVAL; >> } >> >> @@ -2604,6 +2617,9 @@ dpif_netdev_run(struct dpif *dpif) >> ovs_mutex_unlock(&dp->non_pmd_mutex); >> dp_netdev_pmd_unref(non_pmd); >> >> +/* XXX: If workload is too heavy we could add a separate thread. */ >> +conntrack_run(&dp->conntrack); >> + > >Do you have any results to share? Just curious. > >Acked-by: Flavio Leitner As also pointed out by Joe this probably need to be revisited. I'll do more testing. Thanks! > >> tnl_neigh_cache_run(); >> tnl_port_map_run(); >> new_tnl_seq = seq_read(tnl_conf_seq); >> @@ -3850,12 +3866,48 @@ dp_execute_cb(void *aux_, struct dp_packet >> **packets, int cnt, >> VLOG_WARN("Packet dropped. Max recirculation depth exceeded."); >> break; >> >> -case OVS_ACTION_ATTR_CT: >> -/* If a flow with this action is slow-pathed, datapath assistance is >> - * required to implement it. However, we don't support this action >> - * in the userspace datapath. */ >> -VLOG_WARN("Cannot execute conntrack action in userspace."); >> +case OVS_ACTION_ATTR_CT: { >> +const struct nlattr *b; >> +bool commit = false; >> +unsigned int left; >> +uint16_t zone = 0; >> +const char *helper = NULL; >> +const uint32_t *setmark = NULL; >> +const struct ovs_key_ct_labels *setlabel = NULL; >> + >> + >> +/* XXX parsing this everytime is expensive. We should do like >> kernel >> + * does and create a structure. */ >> +NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a), >> nl_attr_get_size(a)) { >> +enum ovs_ct_attr sub_type = nl_attr_type(b); >> + >> +switch(sub_type) { >> +case OVS_CT_ATTR_COMMIT: >> +commit = true; >> +break; >> +case OVS_CT_ATTR_ZONE
Re: [ovs-dev] [PATCH v2 00/15] Userspace (DPDK) connection tracker
Hi Antonio, great, thank you very much for performing this tests and share the results On 25/04/2016 07:00, "Fischetti, Antonio" wrote: >Hi Daniele, >below are > - Regression Tests results > - ALWAYS_INLINE check > > >Regression Tests on Performance >=== > >I did some regression tests on performance, it looks like there's >no performance drop. > >I've sent mono-directional data with 64-Byte packets at line rate >on one ixia port (Tx) and read the receiving rate on the other ixia >port (Rx), regardless of packet loss. > >Below the 2 tests I did and then a table with performance figures. > >I’ve used OVS master @ Commit ID: d87091c2a1f9d3db1a2352eba1b609405122d15a >and compared results with and without the ConnTracker v2 patches. > > >Test #1: 'Phy-to-Phy' >- > > +-+ >-->[ Port 1 (dpdk) ]---| | > | OVS | ><--[ Port 2 (dpdk) ]---| | > +-+ > >ovs-ofctl add-flow br0 in_port=1,action=output:2 > > > >Test #2: Phy-VM-Phy with vHost-User >--- > > +-++--+ >-->[ Port 1 (dpdk) ]---| |---[ Port 4 (dpdkvhostuser) ]---| | > | OVS || testpmd @ VM | > ><--[ Port 3 (dpdk) ]---| |---[ Port 2 (dpdkvhostuser) ]---| | > +-++--+ > >ovs-ofctl add-flow br0 idle_timeout=0,in_port=1,action=output:4 >ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,action=output:3 > > > >Performance figures >--- > > ++---+ > | OVS master | OVS master 2.5 + | > || C.T. Patches v2 | >+--++---+ >| Phy-to-Phy | 13.49 Mpps | 13.46 Mpps| >+--++---+ >| PVP (vHost-User) | 5.6 Mpps | 5.6 Mpps | >+--++---+ > > > >ALWAYS_INLINE >= >I've also checked what we already discussed with the patch set v1. >I applied patches one by one and measured the performance, it seems >that ALWAYS_INLINE in parse_vlan() and parse_ethertype() has fixed >the performance regression issue. > > > >Antonio > > >> -Original Message- >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di >> Proietto >> Sent: Saturday, April 16, 2016 1:03 AM >> To: dev@openvswitch.org >> Subject: [ovs-dev] [PATCH v2 00/15] Userspace (DPDK) connection tracker >> >> This series aims to implement the ct() action for the dpif-netdev datapath. >> The bulk of the code is in the new conntrack module: it contains some packet >> parsing code, some lookup tables and the logic to implements all the ct bits. >> >> The conntrack module is helped by conntrack-tcp, for TCP window and flags >> tracking: the bulk of the code of this submodule is from the FreeBSD's pf >> subsystem, therefore is BSD licensed. >> >> The rest of the series integrates the connection tracker with the rest of >> OVS: the ct() action is implemented in dpif-netdev, and the debugging >> interfaces required by dpctl/{dump,flush}-conntrack are implemented. >> >> Besides adding some unit tests, this series ports the existing conntrack >> system test to the userspace datapath. Some small modifications are >> required to pass the testsuite, and some tests still have to be skipped. >> >> On newer kernels the userspace testsuite has some problems with offloads, >> so a workaround is included. >> >> This can also be downloaded at: >> >> https://github.com/ddiproietto/ovs/tree/userconntrack_20160415 >> >> Any feedback is appreciated, thanks. >> >> v1 -> v2: >> * Fixed bug in tcp_get_wscale(), related to TCP options parsing. >> * Changed names of ICMP constants: now they're different from Linux and >> FreeBSD. >> * Fixed bug in parse_ipv6_ext_hdrs(). >> * Used ALWAYS_INLINE in parse_vlan and parse_ethertype, to avoid a >> performance regression in miniflow_extract(). >> * Updated copyright info in COPYING and debian/copyright.in. >> * Rebased. >> * Changed batching strategy in conntrack_execute() to allow a newly >> created connection to be picked up by packets in the same batch. >> * Added an ovs-test module to throw pcap files at the connection tracker. >> * Added a workaround for the userspace testsuite on new kernels and a tcp >> non-conntrack test. >> >> Daniele Di Proietto (15): >> packets: Define ICMP types. >> flow: Export parse_ipv6_ext_hdrs(). >> flow: Introduce parse_dl_type(). >> conntrack: New userspace connection tracker. >> tests: Add very simple conntrack benchmark. >> tests: Add test-conntrack pcap test. >> conntrack: Implement flush function. >> conntrack: Implement dumping t
Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.
Thank you for your comments, Flavio! On 26/04/2016 16:35, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:36PM -0700, Daniele Di Proietto wrote: >> This commit adds the conntrack module. >> >> It is a connection tracker that resides entirely in userspace. Its >> primary user will be the dpif-netdev datapath. >> >> The module main goal is to provide conntrack_execute(), which offers a >> convenient interface to implement the datapath ct() action. >> >> The conntrack module uses two submodules to deal with the l4 protocol >> details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP). >> >> The conntrack-tcp submodule implementation is adapted from FreeBSD's pf >> subsystem, therefore it's BSD licensed. It has been slightly altered to >> match the OVS coding style and to allow the pickup of already >> established connections. >> >> Signed-off-by: Daniele Di Proietto >> --- >> COPYING | 1 + >> debian/copyright.in | 4 + >> lib/automake.mk | 5 + >> lib/conntrack-other.c | 91 ++ >> lib/conntrack-private.h | 77 + >> lib/conntrack-tcp.c | 476 +++ >> lib/conntrack.c | 851 >> >> lib/conntrack.h | 144 >> 8 files changed, 1649 insertions(+) >> create mode 100644 lib/conntrack-other.c >> create mode 100644 lib/conntrack-private.h >> create mode 100644 lib/conntrack-tcp.c >> create mode 100644 lib/conntrack.c >> create mode 100644 lib/conntrack.h >> >> diff --git a/COPYING b/COPYING >> index 308e3ea..afb98b9 100644 >> --- a/COPYING >> +++ b/COPYING >> @@ -25,6 +25,7 @@ License, version 2. >> The following files are licensed under the 2-clause BSD license. >> include/windows/getopt.h >> lib/getopt_long.c >> +lib/conntrack-tcp.c >> >> The following files are licensed under the 3-clause BSD-license >> include/windows/netinet/icmp6.h >> diff --git a/debian/copyright.in b/debian/copyright.in >> index 57d007a..a15f4dd 100644 >> --- a/debian/copyright.in >> +++ b/debian/copyright.in >> @@ -21,6 +21,9 @@ Upstream Copyright Holders: >> Copyright (c) 2014 Michael Chapman >> Copyright (c) 2014 WindRiver, Inc. >> Copyright (c) 2014 Avaya, Inc. >> +Copyright (c) 2001 Daniel Hartmeier >> +Copyright (c) 2002 - 2008 Henning Brauer >> +Copyright (c) 2012 Gleb Smirnoff >> >> License: >> >> @@ -90,6 +93,7 @@ License: >> lib/getopt_long.c >> include/windows/getopt.h >> datapath-windows/ovsext/Conntrack-tcp.c >> +lib/conntrack-tcp.c >> >> * The following files are licensed under the 3-clause BSD-license >> >> diff --git a/lib/automake.mk b/lib/automake.mk >> index 1ec2115..ba30442 100644 >> --- a/lib/automake.mk >> +++ b/lib/automake.mk >> @@ -47,6 +47,11 @@ lib_libopenvswitch_la_SOURCES = \ >> lib/compiler.h \ >> lib/connectivity.c \ >> lib/connectivity.h \ >> +lib/conntrack-private.h \ >> +lib/conntrack-tcp.c \ >> +lib/conntrack-other.c \ >> +lib/conntrack.c \ >> +lib/conntrack.h \ >> lib/coverage.c \ >> lib/coverage.h \ >> lib/crc32c.c \ >> diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c >> new file mode 100644 >> index 000..65d02a9 >> --- /dev/null >> +++ b/lib/conntrack-other.c >> @@ -0,0 +1,91 @@ >> +/* >> + * Copyright (c) 2015, 2016 Nicira, Inc. >> + * >> + * Licensed under the Apache License, Version 2.0 (the "License"); >> + * you may not use this file except in compliance with the License. >> + * You may obtain a copy of the License at: >> + * >> + * http://www.apache.org/licenses/LICENSE-2.0 >> + * >> + * Unless required by applicable law or agreed to in writing, software >> + * distributed under the License is distributed on an "AS IS" BASIS, >> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. >> + * See the License for the specific language governing permissions and >> + * limitations under the License. >> + */ >> + >> +#include >> + >> +#include "conntrack-private.h" >> +#include "dp-packet.h" >> + >> +enum other_state { >> +OTHERS_FIRST, >> +OTHERS_MULTIPLE, >> +OTHERS_BIDIR, >> +}; >> + >> +struct conn_other { >> +struct conn up; >> +enum other_state state; >> +}; >> + >> +static const long long other_timeouts[] = { >> +[OTHERS_FIRST] = 60 * 1000, >> +[OTHERS_MULTIPLE] = 60 * 1000, >> +[OTHERS_BIDIR] = 30 * 1000, >> +}; > >I missed a description here, like the unit. Right, I added a comment. > >> + >> +static struct conn_other * >> +conn_other_cast(const struct conn *conn) >> +{ >> +return CONTAINER_OF(conn, struct conn_other, up); >> +} >> + >> +static void >> +update_expiration(struct conn_other *conn, long long now) >> +{ >> +conn->up.expiration = now + other_timeouts[conn->state]; >> +} >> + >> +static enum ct_update_res >> +other_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED, >> +bo
Re: [ovs-dev] [PATCH v2 11/15] dpif-netdev: Implement conntrack flush interface.
On 26/04/2016 16:38, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:43PM -0700, Daniele Di Proietto wrote: >> Signed-off-by: Daniele Di Proietto > >This could be squashed with "[07/15] conntrack: Implement flush >function" > >Just a suggestion. > >Acked-by: Flavio Leitner As also suggested by Joe I've squashed those Thanks > >> --- >> lib/dpif-netdev.c | 10 +- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 5ac2bf3..995cbc0 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3991,6 +3991,14 @@ dpif_netdev_ct_dump_done(struct dpif *dpif OVS_UNUSED, >> return err; >> } >> >> +static int >> +dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone) >> +{ >> +struct dp_netdev *dp = get_dp_netdev(dpif); >> + >> +return conntrack_flush(&dp->conntrack, zone); >> +} >> + >> const struct dpif_class dpif_netdev_class = { >> "netdev", >> dpif_netdev_init, >> @@ -4034,7 +4042,7 @@ const struct dpif_class dpif_netdev_class = { >> dpif_netdev_ct_dump_start, >> dpif_netdev_ct_dump_next, >> dpif_netdev_ct_dump_done, >> -NULL, /* ct_flush */ >> +dpif_netdev_ct_flush, >> }; >> >> static void >> -- >> 2.1.4 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev > >-- >fbl ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 05/15] tests: Add very simple conntrack benchmark.
On 26/04/2016 16:36, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:37PM -0700, Daniele Di Proietto wrote: >> This introduces a very limited but simple benchmark for >> conntrack_execute(). It just sends repeatedly the same batch of packets >> through the connection tracker and returns the time spent to process >> them. >> >> While this is not a realistic benchmark, it has proven useful during >> development to evaluate different batching and locking strategies. >> >> E.g. the line: >> >> `./test/ovstest test-conntrack benchmark 1 1488 32` > >14.88M? :-) You got me :-) >> >> starts 1 thread that will send 1488 packets to the connection >> tracker, 32 at a time. It will print the time taken to process them. >> >> Signed-off-by: Daniele Di Proietto >> --- > >Acked-by: Flavio Leitner Thanks for the review! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.
Hi Antonio, Thank you for your comments, replies inline On 25/04/2016 09:14, "Fischetti, Antonio" wrote: >Hi Daniele, >some comments inline prefixed with [Antonio F]. > >Regards, >Antonio > >> -Original Message- >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di >> Proietto >> Sent: Saturday, April 16, 2016 1:03 AM >> To: dev@openvswitch.org >> Subject: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection >> tracker. >> >> This commit adds the conntrack module. >> >> It is a connection tracker that resides entirely in userspace. Its >> primary user will be the dpif-netdev datapath. >> >> The module main goal is to provide conntrack_execute(), which offers a >> convenient interface to implement the datapath ct() action. >> >> The conntrack module uses two submodules to deal with the l4 protocol >> details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP). >> >> The conntrack-tcp submodule implementation is adapted from FreeBSD's pf >> subsystem, therefore it's BSD licensed. It has been slightly altered to >> match the OVS coding style and to allow the pickup of already >> established connections. >> >> Signed-off-by: Daniele Di Proietto >> --- >> COPYING | 1 + >> debian/copyright.in | 4 + >> lib/automake.mk | 5 + >> lib/conntrack-other.c | 91 ++ >> lib/conntrack-private.h | 77 + >> lib/conntrack-tcp.c | 476 +++ >> lib/conntrack.c | 851 >> >> lib/conntrack.h | 144 >> 8 files changed, 1649 insertions(+) >> create mode 100644 lib/conntrack-other.c >> create mode 100644 lib/conntrack-private.h >> create mode 100644 lib/conntrack-tcp.c >> create mode 100644 lib/conntrack.c >> create mode 100644 lib/conntrack.h >> >> diff --git a/COPYING b/COPYING >> index 308e3ea..afb98b9 100644 >> --- a/COPYING >> +++ b/COPYING >> @@ -25,6 +25,7 @@ License, version 2. >> The following files are licensed under the 2-clause BSD license. >> include/windows/getopt.h >> lib/getopt_long.c >> +lib/conntrack-tcp.c >> >> The following files are licensed under the 3-clause BSD-license >> include/windows/netinet/icmp6.h >> diff --git a/debian/copyright.in b/debian/copyright.in >> index 57d007a..a15f4dd 100644 >> --- a/debian/copyright.in >> +++ b/debian/copyright.in >> @@ -21,6 +21,9 @@ Upstream Copyright Holders: >> Copyright (c) 2014 Michael Chapman >> Copyright (c) 2014 WindRiver, Inc. >> Copyright (c) 2014 Avaya, Inc. >> +Copyright (c) 2001 Daniel Hartmeier >> +Copyright (c) 2002 - 2008 Henning Brauer >> +Copyright (c) 2012 Gleb Smirnoff >> >> License: >> >> @@ -90,6 +93,7 @@ License: >> lib/getopt_long.c >> include/windows/getopt.h >> datapath-windows/ovsext/Conntrack-tcp.c >> +lib/conntrack-tcp.c >> >> * The following files are licensed under the 3-clause BSD-license >> >> diff --git a/lib/automake.mk b/lib/automake.mk >> index 1ec2115..ba30442 100644 >> --- a/lib/automake.mk >> +++ b/lib/automake.mk >> @@ -47,6 +47,11 @@ lib_libopenvswitch_la_SOURCES = \ >> lib/compiler.h \ >> lib/connectivity.c \ >> lib/connectivity.h \ >> +lib/conntrack-private.h \ >> +lib/conntrack-tcp.c \ >> +lib/conntrack-other.c \ >> +lib/conntrack.c \ >> +lib/conntrack.h \ >> lib/coverage.c \ >> lib/coverage.h \ >> lib/crc32c.c \ >> diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c >> new file mode 100644 >> index 000..65d02a9 >> --- /dev/null >> +++ b/lib/conntrack-other.c >> @@ -0,0 +1,91 @@ >> +/* >> + * Copyright (c) 2015, 2016 Nicira, Inc. >> + * >> + * Licensed under the Apache License, Version 2.0 (the "License"); >> + * you may not use this file except in compliance with the License. >> + * You may obtain a copy of the License at: >> + * >> + * http://www.apache.org/licenses/LICENSE-2.0 >> + * >> + * Unless required by applicable law or agreed to in writing, software >> + * distributed under the License is distributed on an "AS IS" BASIS, >> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or >> implied. >> + * See the License for the specific language governing permissions and >> + * limitations under the License. >> + */ >> + >> +#include >> + >> +#include "conntrack-private.h" >> +#include "dp-packet.h" >> + >> +enum other_state { >> +OTHERS_FIRST, >> +OTHERS_MULTIPLE, >> +OTHERS_BIDIR, >> +}; >> + >> +struct conn_other { >> +struct conn up; >> +enum other_state state; >> +}; >> + >> +static const long long other_timeouts[] = { >> +[OTHERS_FIRST] = 60 * 1000, >> +[OTHERS_MULTIPLE] = 60 * 1000, >> +[OTHERS_BIDIR] = 30 * 1000, >> +}; >> + >> +static struct conn_other * >> +conn_other_cast(const struct conn *conn) >> +{ >> +return CONTAINER_OF(conn, struct conn_other, up); >> +} >> + >> +static void >> +update_expiration(struct conn_o
Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.
Thanks for the detailed review Joe, replies inline On 19/04/2016 14:36, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> This commit adds the conntrack module. >> >> It is a connection tracker that resides entirely in userspace. Its >> primary user will be the dpif-netdev datapath. >> >> The module main goal is to provide conntrack_execute(), which offers a >> convenient interface to implement the datapath ct() action. >> >> The conntrack module uses two submodules to deal with the l4 protocol >> details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP). >> >> The conntrack-tcp submodule implementation is adapted from FreeBSD's pf >> subsystem, therefore it's BSD licensed. It has been slightly altered to >> match the OVS coding style and to allow the pickup of already >> established connections. >> >> Signed-off-by: Daniele Di Proietto > >Thanks for submitting this, I know there's a few people excited about >this feature. > >I notice that there is no NEWS item about this, were you intending to >hold off on announcing it until there is feature parity with the >kernel, eg support for IPv[46] fragments; ALGs; NAT? Actually I didn't think about this :-) I'll add a line in NEWS > > >> diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c >> new file mode 100644 >> index 000..65d02a9 >> --- /dev/null >> +++ b/lib/conntrack-other.c >> @@ -0,0 +1,91 @@ >> +/* >> + * Copyright (c) 2015, 2016 Nicira, Inc. >> + * >> + * Licensed under the Apache License, Version 2.0 (the "License"); >> + * you may not use this file except in compliance with the License. >> + * You may obtain a copy of the License at: >> + * >> + * http://www.apache.org/licenses/LICENSE-2.0 >> + * >> + * Unless required by applicable law or agreed to in writing, software >> + * distributed under the License is distributed on an "AS IS" BASIS, >> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. >> + * See the License for the specific language governing permissions and >> + * limitations under the License. >> + */ >> + >> +#include >> + >> +#include "conntrack-private.h" >> +#include "dp-packet.h" >> + >> +enum other_state { >> +OTHERS_FIRST, >> +OTHERS_MULTIPLE, >> +OTHERS_BIDIR, >> +}; >> + >> +struct conn_other { >> +struct conn up; >> +enum other_state state; >> +}; >> + >> +static const long long other_timeouts[] = { >> +[OTHERS_FIRST] = 60 * 1000, >> +[OTHERS_MULTIPLE] = 60 * 1000, >> +[OTHERS_BIDIR] = 30 * 1000, >> +}; > >Are these timeouts in milliseconds? (a comment or #define might help with that) Sure, I'll add a comment. > > > >> diff --git a/lib/conntrack.c b/lib/conntrack.c >> new file mode 100644 >> index 000..840335b >> --- /dev/null >> +++ b/lib/conntrack.c >... >> +static struct ct_l4_proto *l4_protos[] = { >> +[IPPROTO_TCP] = &ct_proto_tcp, >> +[IPPROTO_UDP] = &ct_proto_other, >> +[IPPROTO_ICMP] = &ct_proto_other, >> +}; > >I see that conntrack-other is shared by UDP and ICMP. In the Linux >datapath, the UDP handler also checks UDP length and UDP checksum - I >wonder if we can share most of the code here for these protocols but >add these additional checks for the UDP case? Thanks for pointing this out! I think it's a good idea to check the UDP length, I've added a check later. This didn't handle at all checksum verification. I thought it was out of scope for the connection tracker, but since the kernel connection tracker does this, I've added the validation. None of this need (yet) a separate conntrack-icmp module, I'm implementing it when the key is being extracted (as you point out later). We would need a separate conntrack-icmp module if we wanted to track icmp type and code. > >> +static struct conn * >> +conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >> + struct conn_lookup_ctx *ctx, uint8_t *state, bool commit, >> + long long now) >> +{ >> +unsigned bucket = hash_to_bucket(ctx->hash); >> +struct conn *nc = NULL; >> + >> +if (!valid_new(pkt, &ctx->key)) { >> +*state |= CS_INVALID; >> +return nc; >> +} >> + >> +*state |= CS_NEW; >> + >> +if (commit) { >> +nc = new_conn(pkt, &ctx->key, now); >> + >> +memcpy(&nc->rev_key, &ctx->key, sizeof nc->rev_key); >> + >> +conn_key_reverse(&nc->rev_key); >> +hmap_insert(&ct->connections[bucket], &nc->node, ctx->hash); >> +} >> + >> +return nc; >> +} > >This function can return NULL > >> +static struct conn * >> +process_one(struct conntrack *ct, struct dp_packet *pkt, >> +struct conn_lookup_ctx *ctx, uint16_t zone, >> +bool commit, long long now) >> +{ >> +unsigned bucket = hash_to_bucket(ctx->hash); >> +struct conn *conn = ctx->conn; >> +uint8_t state = 0; >> + >> +if (conn) { >> +if (ctx->related) { >> +state |= CS_RELATED; >> +if (ctx->reply) { >> +
Re: [ovs-dev] [PATCH v2 13/15] system-tests: Disable offloads in userspace tests.
On 19/04/2016 15:15, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> The system userspace testsuite uses the userspace datapath with >> netdev-linux devices, connected to veth pairs with the AF_PACKET socket: >> >> (veth pair) (AF_PACKET) >> TCP stack -> p0 ---> ovs-p0 -> netdev-linux (userspace OVS) >> >> Unfortunately this configuration has some problems with offloads: a >> packet generated by the TCP stack maybe sent to p0 without being >> checksummed or segmented. The AF_PACKET socket, by default, ignores the >> offloads and just transmits the data of the packets to userspace, but: >> >> 1. The packet may need GSO, so the data will be too big to be received >>by the userspace datapath >> 2. The packet might have incomplete checksums, so it will likely be >>discarded by the receiver. >> >> Problem 1 causes TCP connections to see a congestion window smaller than >> the MTU, which hurts performance but doesn't prevent communication. >> >> Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by >> commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats >> packets with bad checksums as good"). In the kernels that include the >> fix, the userspace datapath is able to process pings, but not tcp or udp >> data. >> >> Unfortunately I couldn't find a way to ask the AF_PACKET to perform >> offloads in kernel. A possible fix would be to use the PACKET_VNET_HDR >> sockopt and perform the offloads in userspace. >> >> Until a proper fix is worked out for netdev-linux, this commit disables >> offloads on the non-OVS side of the veth pair, as a workaround. >> >> Signed-off-by: Daniele Di Proietto > >Thanks for the fix. As I understand this simply fixes the userspace >testsuite when running on newer kernels today, so we could apply this >independently of the rest of the series. > >Acked-by: Joe Stringer Thanks for the review, I'll push this in a minute ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace
On 19/04/2016 17:24, "Joe Stringer" wrote: >On 19 April 2016 at 16:53, Joe Stringer wrote: >> On 15 April 2016 at 17:02, Daniele Di Proietto >> wrote: >>> The userspace connection tracker doesn't support ALGs, frag reassembly >>> or NAT yet, so skip those tests. >>> >>> Also, connection tracking state input from a local port is not possible >>> in userspace. >>> >>> Finally, the userspace datapath pads all frames with 0, to make them at >>> least 64 bytes. >>> >>> Signed-off-by: Daniele Di Proietto >> >> Having three prongs to your commit message like this usually indicates >> that there are three separate patches folded inside ;-) >> >> FWIW the LOCAL tests should work whether the local stack provides >> conntrack state input or not - they just exercise slightly different >> codepaths in the kernel, once the packet leaves OVS, by going out a >> different netdev type (with a different destination netns). Probably >> the class of bugs these tests pick up are not applicable for userspace >> datapath, so it seems fine to skip the conntrack-related ones. I >> wouldn't mind seeing a basic sanity test that sends packets between >> the local stack and another namespace, though - which would run for >> both testsuites, and both with IPv4 and IPv6 traffic. > >I was mistaken here. I looked back over and apparently some of the >tests do rely on this behaviour, and they're passing on the latest >net-next kernels. They can be skipped for userspace. > >> A few comments on the comments below, but LGTM. >> >> Acked-by: Joe Stringer >> >>> --- >>> tests/system-kmod-macros.at | 28 +++ >>> tests/system-traffic.at | 49 >>> ++-- >>> tests/system-userspace-macros.at | 45 +--- >>> 3 files changed, 107 insertions(+), 15 deletions(-) >>> >>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at >>> index 8e60929..4cecc23 100644 >>> --- a/tests/system-kmod-macros.at >>> +++ b/tests/system-kmod-macros.at >>> @@ -65,3 +65,31 @@ m4_define([CHECK_CONNTRACK], >>> on_exit 'ovstest test-netlink-conntrack flush' >>> ] >>> ) >>> + >>> +# CHECK_CONNTRACK_ALG() >>> +# >>> +# Perform requirements checks for running conntrack ALG tests. The kernel >>> +# always supports ALG, so no check is needed. >>> +# >>> +m4_define([CHECK_CONNTRACK_ALG]) >> >> Saying the kernel "always" supports X is a little misleading, as these >> features can all be turned off in the CONFIG.. of course, all the >> major distros provide kernels with them turned on so it's not such a >> big deal. Maybe drop the 'always'. Good point. I guess it would be nice to come up with some feature detection. For the moment I'll drop the 'always' Thanks for the review! >> >>> +# CHECK_CONNTRACK_FRAG() >>> +# >>> +# Perform requirements checks for running conntrack fragmentations tests. >>> +# The kernel always supports fragmentation, so no check is needed. >>> +m4_define([CHECK_CONNTRACK_FRAG]) >>> + >>> +# CHECK_CONNTRACK_LOCAL_STACK() >>> +# >>> +# Perform requirements checks for running conntrack tests with local stack. >>> +# The kernel always supports reading the connection state of an skb coming >>> +# from an internal port, without an explicit ct() action, so no check is >>> +# needed. >>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK]) >> >> While the kernel module supports this, the skb is not guaranteed to >> have conntrack state when coming from an internal port, depending on >> the kernel version and whether iptables rules are installed in the >> host network namespace. > >Looks like I got a little mixed up - netfilter hooks are not installed >in all namespaces by default on the latest kernels, but I'd guess that >they're still set up in the root netns so it should be fairly reliable >there (at least for the moment). > >> >> >>> +# CHECK_CONNTRACK_LOCAL_STACK() >>> +# >>> +# Perform requirements checks for running conntrack tests with local stack. >>> +# While the kernel connection tracker automatically passes all the >>> connection >>> +# tracking state from an internal port to the OpenvSwitch kernel module, >>> there >>> +# is simply no way of doing that with the userspace, so skip the tests. >>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK], >>> +[ >>> +AT_SKIP_IF([:]) >>> +]) >> >> Same comment here, packets may not have conntrack state associated >> when arriving on internal port. >> >>> +# CHECK_CONNTRACK_NAT() >>> +# >>> +# Perform requirements checks for running conntrack NAT tests. The >>> userspace >>> +# doesn't support NATs yet, so skip the tests >>> +# >>> +m4_define([CHECK_CONNTRACK_NAT], >>> +[ >>> +AT_SKIP_IF([:]) >>> +]) >>> -- >>> 2.1.4 >>> >>> ___ >>> dev mailing list >>> dev@openvswitch.org >>> http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/
Re: [ovs-dev] [PATCH v2 03/15] flow: Introduce parse_dl_type().
On 26/04/2016 16:35, "Flavio Leitner" wrote: >On Fri, Apr 15, 2016 at 05:02:35PM -0700, Daniele Di Proietto wrote: >> The function simply returns the ethernet type of the packet (after >> eventually discarding the VLAN tag). It will be used by a following >> commit. >> >> Signed-off-by: Daniele Di Proietto >> --- > >I agree with Joe that we should avoid forcing inlining. > >Acked-by: Flavio Leitner I would try to avoid forcing inlining in general, but this is done to prevent a performance regression in miniflow_extract(). Antonio spotted this out in v1 of the series: http://openvswitch.org/pipermail/dev/2016-February/065454.html If you guys agree, I'd be inclined to keep the ALWAYS_INLINE here. Thanks for the review! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 02/15] flow: Export parse_ipv6_ext_hdrs().
On 19/04/2016 13:03, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> This will be used by a future commit. >> >> Signed-off-by: Daniele Di Proietto >> --- >> lib/flow.c | 140 >> ++--- >> lib/flow.h | 3 ++ >> 2 files changed, 81 insertions(+), 62 deletions(-) >> >> diff --git a/lib/flow.c b/lib/flow.c >> index 560a90f..972a996 100644 >> --- a/lib/flow.c >> +++ b/lib/flow.c >> @@ -439,6 +439,82 @@ invalid: >> arp_buf[1] = eth_addr_zero; >> } >> >> +static inline bool >> +parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, uint8_t *nw_proto, >> + uint8_t *nw_frag) >> +{ >> +while (1) { >> +if (OVS_LIKELY((*nw_proto != IPPROTO_HOPOPTS) >> + && (*nw_proto != IPPROTO_ROUTING) >> + && (*nw_proto != IPPROTO_DSTOPTS) >> + && (*nw_proto != IPPROTO_AH) >> + && (*nw_proto != IPPROTO_FRAGMENT))) { >> +/* It's either a terminal header (e.g., TCP, UDP) or one we >> + * don't understand. In either case, we're done with the >> + * packet, so use it to fill in 'nw_proto'. */ >> +return true; >> +} >> + >> +/* We only verify that at least 8 bytes of the next header are >> + * available, but many of these headers are longer. Ensure that >> + * accesses within the extension header are within those first 8 >> + * bytes. All extension headers are required to be at least 8 >> + * bytes. */ >> +if (OVS_UNLIKELY(*sizep < 8)) { >> +return false; >> +} >> + >> +if ((*nw_proto == IPPROTO_HOPOPTS) >> +|| (*nw_proto == IPPROTO_ROUTING) >> +|| (*nw_proto == IPPROTO_DSTOPTS)) { >> +/* These headers, while different, have the fields we care >> + * about in the same location and with the same >> + * interpretation. */ >> +const struct ip6_ext *ext_hdr = *datap; >> +*nw_proto = ext_hdr->ip6e_nxt; >> +if (OVS_UNLIKELY(!data_try_pull(datap, sizep, >> +(ext_hdr->ip6e_len + 1) * 8))) { >> +return false; >> +} >> +} else if (*nw_proto == IPPROTO_AH) { >> +/* A standard AH definition isn't available, but the fields >> + * we care about are in the same location as the generic >> + * option header--only the header length is calculated >> + * differently. */ >> +const struct ip6_ext *ext_hdr = *datap; >> +*nw_proto = ext_hdr->ip6e_nxt; >> +if (OVS_UNLIKELY(!data_try_pull(datap, sizep, >> +(ext_hdr->ip6e_len + 2) * 4))) { >> +return false; >> +} >> +} else if (*nw_proto == IPPROTO_FRAGMENT) { >> +const struct ovs_16aligned_ip6_frag *frag_hdr = *datap; >> + >> +*nw_proto = frag_hdr->ip6f_nxt; >> +if (!data_try_pull(datap, sizep, sizeof *frag_hdr)) { >> +return false; >> +} >> + >> +/* We only process the first fragment. */ >> +if (frag_hdr->ip6f_offlg != htons(0)) { >> +*nw_frag = FLOW_NW_FRAG_ANY; >> +if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) { >> +*nw_frag |= FLOW_NW_FRAG_LATER; >> +*nw_proto = IPPROTO_FRAGMENT; >> +return true; >> +} >> +} >> +} >> +} >> +} >> + >> +bool >> +parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto, >> +uint8_t *nw_frag) >> +{ >> +return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag); >> +} >> + > >I couldn't tell any difference between parse_ipv6_ext_hdrs() and >parse_ipv6_ext_hdrs__(); are they both necessary? > >Acked-by: Joe Stringer You're right, there's no difference, but miniflow_extract() can call the inline version, which usually generates better code. Thanks for the review ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 09/15] dpif-netdev: Execute conntrack action.
On 19/04/2016 14:52, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev. >> >> To allow ofproto-dpif to detect the conntrack feature, flow_put will not >> discard anymore flows with ct_* fields set. We still shouldn't allow >> flows with NAT bits set, since there is no support for NAT. >> >> Signed-off-by: Daniele Di Proietto >> --- > >... > >> @@ -2604,6 +2617,9 @@ dpif_netdev_run(struct dpif *dpif) >> ovs_mutex_unlock(&dp->non_pmd_mutex); >> dp_netdev_pmd_unref(non_pmd); >> >> +/* XXX: If workload is too heavy we could add a separate thread. */ >> +conntrack_run(&dp->conntrack); >> + > >Have you tried, eg, portscanning with several threads against the >implementation to see how bad it gets? Good point, I'm doing some testing and I'll revisit this. > > >> @@ -3850,12 +3866,48 @@ dp_execute_cb(void *aux_, struct dp_packet >> **packets, int cnt, >> VLOG_WARN("Packet dropped. Max recirculation depth exceeded."); >> break; >> >> -case OVS_ACTION_ATTR_CT: >> -/* If a flow with this action is slow-pathed, datapath assistance is >> - * required to implement it. However, we don't support this action >> - * in the userspace datapath. */ >> -VLOG_WARN("Cannot execute conntrack action in userspace."); >> +case OVS_ACTION_ATTR_CT: { >> +const struct nlattr *b; >> +bool commit = false; >> +unsigned int left; >> +uint16_t zone = 0; >> +const char *helper = NULL; >> +const uint32_t *setmark = NULL; >> +const struct ovs_key_ct_labels *setlabel = NULL; >> + >> + >> +/* XXX parsing this everytime is expensive. We should do like >> kernel >> + * does and create a structure. */ > >Seems reasonable. You say it's expensive (how expensive?), but it also >seems a little cleaner to store it in a more accessible manner. I tried creating a structure, but it doesn't seem to make a difference in performance. It is a little cleaner, but the I guess the complexity of maintaining a (not so different) action format doesn't seem worth for now. I think I'll remove this comment. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 10/15] dpif-netdev: Implement conntrack dump functions.
On 19/04/2016 14:54, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> Signed-off-by: Daniele Di Proietto > >Could be combined with the earlier patch that provides the actual >implementation? Done, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 08/15] conntrack: Implement dumping to ct_entry.
On 19/04/2016 14:49, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> Signed-off-by: Daniele Di Proietto > > > >> +static void >> +conn_key_to_tuple(const struct conn_key *key, struct ct_dpif_tuple *tuple) >> +{ >> +if (key->dl_type == htons(ETH_TYPE_IP)) { >> +tuple->l3_type = AF_INET; >> +} else if (key->dl_type == htons(ETH_TYPE_IPV6)) { >> +tuple->l3_type = AF_INET6; >> +} >> +tuple->ip_proto = key->nw_proto; >> +ct_endpoint_to_ct_dpif_inet_addr(&key->src.addr, &tuple->src, >> + key->dl_type); >> +ct_endpoint_to_ct_dpif_inet_addr(&key->dst.addr, &tuple->dst, >> + key->dl_type); >> + >> +if (key->nw_proto == IPPROTO_ICMP) { >> +tuple->icmp_id = key->src.port; >> +/* ICMP type and code are not tracked */ >> +tuple->icmp_type = 0; >> +tuple->icmp_code = 0; >> +} else { >> +tuple->src_port = key->src.port; >> +tuple->dst_port = key->dst.port; >> +} > >I think in the Linux implementation, the original ICMP message's icmp >type/code are stored/provided here. Might be a behaviour parity >question. Yes, but they're not part of the key. I guess they could be tracked separately with a conntrack-icmp module. > >> +} >> + >> +static void >> +conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, >> + long long now) >> +{ >> +struct ct_l4_proto *class; >> +long long expiration; >> +memset(entry, 0, sizeof *entry); >> +conn_key_to_tuple(&conn->key, &entry->tuple_orig); >> +conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply); >> + >> +entry->zone = conn->key.zone; >> +entry->mark = conn->mark; >> + >> +memcpy(&entry->labels, &conn->label, sizeof(entry->labels)); >> +/* Not implemented yet */ >> +entry->timestamp.start = 0; >> +entry->timestamp.stop = 0; > >Is it better to reflect that timestamps are unsupported up to the user >or simply report 0 when it's unsupported? (I don't have a particular >preference, just asking the question) I was leaning towards setting them to 0. It is a special value which will not be printed in ct_dpif_format_timestamp(). ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 07/15] conntrack: Implement flush function.
Good idea, I'll combine them Thanks for the review! On 19/04/2016 14:40, "Joe Stringer" wrote: >On 15 April 2016 at 17:02, Daniele Di Proietto wrote: >> Signed-off-by: Daniele Di Proietto > >Looks like this can be combined with patch #11 (to plumb it up through appctl). > >Acked-by: Joe Stringer ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] Message could not be delivered
The original message was received at Wed, 27 Apr 2016 12:05:14 +0700 from schleef.org [19.182.20.189] - The following addresses had permanent fatal errors - - Transcript of session follows - ... while talking to 162.107.103.184: >>> DATA <<< 400-aturner; %MAIL-E-OPENOUT, error opening !AS as output <<< 400-aturner; -RMS-E-CRE, ACP file create failed <<< 400-aturner; -SYSTEM-F-EXDISKQUOTA, disk quota exceeded <<< 400 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 0/5] Backport nf_defrag_ipv6 changes.
On Thu, Apr 21, 2016 at 2:07 PM, Joe Stringer wrote: > This short series addresses some of the netfilter/defrag-related changes > recently upstream and backports the equivalent fixes to our compat code. > The last two patches address bugs introduced from the second patch, I've > left them as separate patches here to mirror the upstream commits. I'm mostly relying on you to know which commits are important to backport since you are working in this area much more than me. However, where do we stand after this patch series? Based on a quick scan, I saw that it looks like at least one patch isn't included here that is conntrack related and hasn't been backported: 90c7afc96cbbd77f44094b5b651261968e97de67 ("openvswitch: Fix template leak in error cases.") Is there more that we still need (other than NAT, of course)? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/5] compat: nf_defrag_ipv6: avoid/free clone operations.
On Thu, Apr 21, 2016 at 2:07 PM, Joe Stringer wrote: > diff --git > a/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h > b/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h > index fe99ced37227..a3b86dab2c9c 100644 > --- a/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h > +++ b/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h > @@ -16,17 +16,17 @@ > #define OVS_NF_DEFRAG6_BACKPORT 1 > struct sk_buff *rpl_nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, >u32 user); > +#define nf_ct_frag6_gather rpl_nf_ct_frag6_gather > +#endif /* HAVE_NF_CT_FRAG6_CONSUME_ORIG */ > + > +#ifdef OVS_NF_DEFRAG6_BACKPORT > int __init rpl_nf_ct_frag6_init(void); > void rpl_nf_ct_frag6_cleanup(void); > -void rpl_nf_ct_frag6_consume_orig(struct sk_buff *skb); > -#define nf_ct_frag6_gather rpl_nf_ct_frag6_gather > -#else /* HAVE_NF_CT_FRAG6_CONSUME_ORIG */ > +#else /* !OVS_NF_DEFRAG6_BACKPORT */ > static inline int __init rpl_nf_ct_frag6_init(void) { return 0; } > static inline void rpl_nf_ct_frag6_cleanup(void) { } > -static inline void rpl_nf_ct_frag6_consume_orig(struct sk_buff *skb) { } > -#endif /* HAVE_NF_CT_FRAG6_CONSUME_ORIG */ > +#endif /* OVS_NF_DEFRAG6_BACKPORT */ > #define nf_ct_frag6_init rpl_nf_ct_frag6_init > #define nf_ct_frag6_cleanup rpl_nf_ct_frag6_cleanup > -#define nf_ct_frag6_consume_orig rpl_nf_ct_frag6_consume_orig I'm not sure that I totally understand what is going on in this file. In particular, although not directly related to this patch, I'm not sure why we define rpl_nf_ct_frag6_init()/cleanup() to be no-ops in cases where we don't have OVS_NF_DEFRAG6_BACKPORT. On new kernels, shouldn't we be using the upstream definitions of these functions? The values initialized seem to be used by functions which aren't backported. More minor but it also seems odd that rpl_nf_ct_frag6_gather() is declared in the first #ifdef but then actually defined under OVS_NF_DEFRAG6_BACKPORT like all of the other functions. What's the reason for separating it out this way? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 5/5] datapath: Orphan skbs before IPv6 defrag
On Thu, Apr 21, 2016 at 2:07 PM, Joe Stringer wrote: > Upstream commit: > openvswitch: Orphan skbs before IPv6 defrag > > This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always > orphan skbs inside ip_defrag()"). > > Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free > clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be > cloned (implicitly orphaning) prior to queueing for reassembly. As such, > when the IPv6 message is eventually reassembled, the skb->sk for all > fragments would be NULL. After that commit was introduced, rather than > cloning, the original skbs were queued directly without orphaning. The > end result is that all frags except for the first and last may have a > socket attached. > > This commit explicitly orphans such skbs during nf_ct_frag6_gather() to > prevent BUG_ON(skb->sk) during a later call to ip6_fragment(). > > kernel BUG at net/ipv6/ip6_output.c:631! > [...] > Call Trace: > > [] ? __lock_acquire+0x927/0x20a0 > [] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch] > [] ? __lock_is_held+0x52/0x70 > [] ovs_fragment+0x1f7/0x280 [openvswitch] > [] ? mark_held_locks+0x75/0xa0 > [] ? _raw_spin_unlock_irqrestore+0x36/0x50 > [] ? dst_discard_out+0x20/0x20 > [] ? dst_ifdown+0x80/0x80 > [] do_output.isra.28+0xf3/0x1b0 [openvswitch] > [] do_execute_actions+0x709/0x12c0 [openvswitch] > [] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch] > [] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch] > [] ? _raw_spin_unlock+0x27/0x40 > [] ovs_execute_actions+0x45/0x120 [openvswitch] > [] ovs_dp_process_packet+0x85/0x150 [openvswitch] > [] ? _raw_spin_unlock+0x27/0x40 > [] ovs_execute_actions+0xc4/0x120 [openvswitch] > [] ovs_dp_process_packet+0x85/0x150 [openvswitch] > [] ? key_extract+0x442/0xc10 [openvswitch] > [] ovs_vport_receive+0x5d/0xb0 [openvswitch] > [] ? __lock_acquire+0x927/0x20a0 > [] ? __lock_acquire+0x927/0x20a0 > [] ? __lock_acquire+0x927/0x20a0 > [] ? _raw_spin_unlock_irqrestore+0x36/0x50 > [] internal_dev_xmit+0x6d/0x150 [openvswitch] > [] ? internal_dev_xmit+0x5/0x150 [openvswitch] > [] dev_hard_start_xmit+0x2df/0x660 > [] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0 > [] __dev_queue_xmit+0x8f5/0x950 > [] ? __dev_queue_xmit+0x50/0x950 > [] ? mark_held_locks+0x75/0xa0 > [] dev_queue_xmit+0x10/0x20 > [] neigh_resolve_output+0x178/0x220 > [] ? ip6_finish_output2+0x219/0x7b0 > [] ip6_finish_output2+0x219/0x7b0 > [] ? ip6_finish_output2+0x65/0x7b0 > [] ? ip_idents_reserve+0x6b/0x80 > [] ? ip6_fragment+0x93f/0xc50 > [] ip6_fragment+0xba1/0xc50 > [] ? ip6_flush_pending_frames+0x40/0x40 > [] ip6_finish_output+0xcb/0x1d0 > [] ip6_output+0x5f/0x1a0 > [] ? ip6_fragment+0xc50/0xc50 > [] ip6_local_out+0x3d/0x80 > [] ip6_send_skb+0x2f/0xc0 > [] ip6_push_pending_frames+0x4d/0x50 > [] icmpv6_push_pending_frames+0xac/0xe0 > [] icmpv6_echo_reply+0x42e/0x500 > [] icmpv6_rcv+0x4cf/0x580 > [] ip6_input_finish+0x1a7/0x690 > [] ? ip6_input_finish+0x5/0x690 > [] ip6_input+0x30/0xa0 > [] ? ip6_rcv_finish+0x1a0/0x1a0 > [] ip6_rcv_finish+0x4e/0x1a0 > [] ipv6_rcv+0x45f/0x7c0 > [] ? ipv6_rcv+0x36/0x7c0 > [] ? ip6_make_skb+0x1c0/0x1c0 > [] __netif_receive_skb_core+0x229/0xb80 > [] ? mark_held_locks+0x75/0xa0 > [] ? process_backlog+0x6f/0x230 > [] __netif_receive_skb+0x16/0x70 > [] process_backlog+0x78/0x230 > [] ? process_backlog+0xdd/0x230 > [] net_rx_action+0x203/0x480 > [] ? mark_held_locks+0x75/0xa0 > [] __do_softirq+0xde/0x49f > [] ? ip6_finish_output2+0x228/0x7b0 > [] do_softirq_own_stack+0x1c/0x30 > > [] do_softirq.part.18+0x3b/0x40 > [] __local_bh_enable_ip+0xb6/0xc0 > [] ip6_finish_output2+0x251/0x7b0 > [] ? ip6_fragment+0xba1/0xc50 > [] ? ip_idents_reserve+0x6b/0x80 > [] ? ip6_fragment+0x93f/0xc50 > [] ip6_fragment+0xba1/0xc50 > [] ? ip6_flush_pending_frames+0x40/0x40 > [] ip6_finish_output+0xcb/0x1d0 > [] ip6_output+0x5f/0x1a0 > [] ? ip6_fragment+0xc50/0xc50 > [] ip6_local_out+0x3d/0x80 > [] ip6_send_skb+0x2f/0xc0 > [] ip6_push_pending_frames+0x4d/0x50 > [] rawv6_sendmsg+0xa28/0xe30 > [] ? inet_sendmsg+0xc7/0x1d0 > [] inet_sendmsg+0x106/0x1d0 > [] ? inet_sendmsg+0x5/0x1d0 > [] sock_sendmsg+0x38/0x50 > [] SYSC_sendto+0xf6/0x170 > [] ? trace_hardirqs_on_thunk+0x1b/0x1d > [] SyS_sendto+0xe/0x10 > [] entry_SYSCALL_64_fastpath+0x18/0xa8 > Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 > d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b > 86 cc 00 00 00 49 8# > RIP [] i
Re: [ovs-dev] [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit
On 23/04/16 01:31, Nicolas Dichtel wrote: > Goal of this patch is to use the new libnl API to align netlink attribute > when needed. > The layout of the netlink message will be a bit different after the patch, > because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested > attribute instead of before it. > > Signed-off-by: Nicolas Dichtel The layout will change/break user space -- I've not tested the patch though.. Balbir Singh. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [patch_v4] vtep: add source node replication support.
Offline discussion is documented in v1->v2 below. There are various considerations to what kind of replication configuration is needed and what is "better" varies with the specific requirements. Feel free to suggest otherwise. This patch series updates the vtep schema to support source node replication in addition to service node replication per logical switch. The default replication mode is service_node as that was the only mode previously supported. New vtep-ctl commands are added to set/reset the replication mode per logical switch. Setting the replication mode is optional and if no configuration is done, the replication mode is service node. Resetting the replication mode sets the replication mode back to a default of service node. In the event that a physical switch cannot support source node replication, an error is populated in the database that can be read by a controller requesting source node replication mode. All logical switches continue to use service node replication since source node replication cannot be supported. If the controller then resets the replication mode, the physical switch should clear the error state. The vtep emulator is also updated to read the vtep database logical switch replication mode and to support both types of replication. The vtep emulator help documentation is also updated. v1->v2: Offline discussion In this discussion, it was agreed only logical switch level replication mode configuration is needed. The rational was that physical switch level option configuration was only added as a convenience; using a controller, the added benefit is minimal and hence the physical switch level configuration portion is dropped and the logical switch level configuration retained. Although in v1, service node mode is the default replication mode, v2 makes replication mode an option and only source node mode can be optionally configured. If a physical switch cannot support source node replication, then an error should be set in the physical switch table that can be read by a controller. v2->v3: Fixed a couple bugs in ovs-vtep I noticed after v2. v3->v4: Added recommendation text for handling lack of source node replication mode support on a transport node in a logical switch, with input from Bruce and Anupam (offline); added to vtep.xml. Also added recommendation for proactive source node replication mode support. Reworded alternate replication mode description in vtep.xml per Bruce. Shorten name to Alternate Replication Mode from per logical switch Alternate Replication Mode. Darrell Ball (1): vtep: add source node replication support. tests/vtep-ctl.at | 17 ++ vtep/README.ovs-vtep.md | 19 ++-- vtep/ovs-vtep | 36 + vtep/vtep-ctl.8.in | 9 vtep/vtep-ctl.c | 42 ++ vtep/vtep.ovsschema | 9 ++-- vtep/vtep.xml | 60 - 7 files changed, 179 insertions(+), 13 deletions(-) -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [patch_v4] vtep: add source node replication support.
This patch series updates the vtep schema, vtep-ctl commands and vtep simulator to support source node replication in addition to service node replication per logical switch. The default replication mode is service node as that was the only mode previously supported. Source node replication mode is optionally configurable and resetting the replication mode implicitly sets the replication mode back to a default of service node. Signed-off-by: Darrell Ball --- tests/vtep-ctl.at | 17 ++ vtep/README.ovs-vtep.md | 19 ++-- vtep/ovs-vtep | 36 + vtep/vtep-ctl.8.in | 9 vtep/vtep-ctl.c | 42 ++ vtep/vtep.ovsschema | 9 ++-- vtep/vtep.xml | 60 - 7 files changed, 179 insertions(+), 13 deletions(-) diff --git a/tests/vtep-ctl.at b/tests/vtep-ctl.at index 99e97e8..d2323a0 100644 --- a/tests/vtep-ctl.at +++ b/tests/vtep-ctl.at @@ -318,6 +318,23 @@ CHECK_LSWITCHES([a]) VTEP_CTL_CLEANUP AT_CLEANUP +AT_SETUP([add-ls a, set-ls-replication-mode a source_node]) +AT_KEYWORDS([vtep-ctl]) +VTEP_CTL_SETUP +AT_CHECK([RUN_VTEP_CTL( + [add-ls a],[set-ls-replication-mode a source_node])], + [0], [], [], [VTEP_CTL_CLEANUP]) +VTEP_CTL_CLEANUP +AT_CLEANUP + +AT_SETUP([add-ls a, reset-ls-replication-mode a]) +AT_KEYWORDS([vtep-ctl]) +VTEP_CTL_SETUP +AT_CHECK([RUN_VTEP_CTL( + [add-ls a],[reset-ls-replication-mode a])], + [0], [], [], [VTEP_CTL_CLEANUP]) +VTEP_CTL_CLEANUP +AT_CLEANUP dnl -- AT_BANNER([vtep-ctl unit tests -- logical binding tests]) diff --git a/vtep/README.ovs-vtep.md b/vtep/README.ovs-vtep.md index 6734dab..74900f1 100644 --- a/vtep/README.ovs-vtep.md +++ b/vtep/README.ovs-vtep.md @@ -166,13 +166,28 @@ vtep-ctl bind-ls br0 p0 0 ls0 vtep-ctl set Logical_Switch ls0 tunnel_key=33 ``` -3. Direct unknown destinations out a tunnel: +3. Optionally, change the replication mode from a default of service_node to + an alternate mode of source_node, which can be done at the logical switch + level: + + ``` +vtep-ctl set-ls-replication-mode ls0 source_node + ``` + +4. The alternate replication mode can also be reset back to the default of + service node replication, at the logical switch level: + + ``` +vtep-ctl reset-ls-replication-mode ls0 + ``` + +5. Direct unknown destinations out a tunnel: ``` vtep-ctl add-mcast-remote ls0 unknown-dst 10.2.2.2 ``` -4. Direct unicast destinations out a different tunnel: +6. Direct unicast destinations out a different tunnel: ``` vtep-ctl add-ucast-remote ls0 00:11:22:33:44:55 10.2.2.3 diff --git a/vtep/ovs-vtep b/vtep/ovs-vtep index 31ff159..cd52d73 100755 --- a/vtep/ovs-vtep +++ b/vtep/ovs-vtep @@ -94,6 +94,7 @@ class Logical_Switch(object): self.unknown_dsts = set() self.tunnel_key = 0 self.setup_ls() +self.replication_mode = "service_node" def __del__(self): vlog.info("destroying lswitch %s" % self.name) @@ -141,13 +142,17 @@ class Logical_Switch(object): ovs_ofctl("add-flow %s table=1,priority=1,in_port=%s,action=%s" % (self.short_name, port_no, ",".join(flood_ports))) -# Traffic coming from a VTEP physical port should only be flooded to -# one 'unknown-dst' and to all other physical ports that belong to that -# VTEP device and this logical switch. +# Traffic coming from a VTEP physical port should always be flooded to +# to all the other physical ports that belong to that VTEP device and +# this logical switch. If the replication mode is service node then +# send to one unknown_dst node (the first one here); else we assume the +# replication mode is source node and we send the packet to all +# unknown_dst nodes. for tunnel in self.unknown_dsts: port_no = self.tunnels[tunnel][0] flood_ports.append(port_no) -break +if self.replication_mode == "service_node": +break ovs_ofctl("add-flow %s table=1,priority=0,action=%s" % (self.short_name, ",".join(flood_ports))) @@ -293,8 +298,31 @@ class Logical_Switch(object): self.remote_macs = remote_macs +replication_mode = vtep_ctl("get logical_switch %s alt_replication_mode" +% self.name) + +# Replication mode is an optional column and if it is not set, +# replication mode defaults to service_node. +if not replication_mode: +replication_mode = "service_node" + +# If the logical switch level replication mode has changed then +# update to that value. +replic_mode_change = False +if replication_mode != self.replication_mode: +
Re: [ovs-dev] [PATCH v2] stt: linearize for CONFIG_SLUB case
On Mon, Apr 25, 2016 at 2:35 PM, Pravin B Shelar wrote: > diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c > index eb397e8..a1b309a 100644 > --- a/datapath/linux/compat/stt.c > +++ b/datapath/linux/compat/stt.c > +static int coalesce_skb(struct sk_buff **headp) > +{ > + struct sk_buff *frag, *head = *headp; > + int tot_len = FRAG_CB(head)->first.tot_len; > + int err; > + > + if (!head->next) > + return 0; > + > + err = pskb_expand_head(head, skb_headroom(head), tot_len, GFP_ATOMIC); > + if (err) > + return err; The headroom and tailroom size seem a little big - I think they are supposed to be the delta amount, not the absolute value that you need. > + if (unlikely(!__pskb_pull_tail(head, head->data_len))) > + BUG(); > + > + for (frag = head->next; frag; frag = frag->next) { > + skb_copy_bits(frag, 0, skb_put(head, frag->len), frag->len); > + } Are we missing a free somewhere of the old fragments that we copied in? > +#ifdef SKIP_ZERO_COPY > +static int __copy_skb(struct sk_buff *to, struct sk_buff *from, > + int *delta, bool *headstolen) [...] > + *headstolen = false; > + err = pskb_expand_head(to, skb_headroom(to), > + to->len + from->len, GFP_ATOMIC); I think that this allocates more memory than is necessary. I believe that it already takes the existing headroom into account so this would double it. It looks like in the tailroom we should be using to->data_len instead of to->len as well. > + if (unlikely(to->data_len || (from->len > skb_tailroom(to > + return -EINVAL; This is probably a little overly cautious for a fast path check given that we just allocated space. > @@ -1154,8 +1247,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb) [...] > + if (copied_skb) { > + if (headstolen) { > + skb->len = 0; > + skb->data_len = 0; > + skb->truesize -= delta; > + } > + } Do we need to make these adjustments at all? The packet is about to be freed shortly. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] datapath-windows: Fix recirculation when it is not the last attribute
Thans for the fix. Acked-by: Nithin Raju -Original Message- From: dev on behalf of Sairam Venugopal Date: Tuesday, April 26, 2016 at 4:53 PM To: "dev@openvswitch.org" Subject: [ovs-dev] [PATCH] datapath-windows: Fix recirculation when it is not the last attribute >When the recirc action is in middle, the current code creates a clone of >the NBL. However, it overwrites the pointer to point to the cloned NBL >without completing it. This causes a memory leak that crashes the kernel. > >Signed-off-by: Sairam Venugopal >--- > datapath-windows/ovsext/Actions.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > >diff --git a/datapath-windows/ovsext/Actions.c >b/datapath-windows/ovsext/Actions.c >index 5dae6b4..5ad29ee 100644 >--- a/datapath-windows/ovsext/Actions.c >+++ b/datapath-windows/ovsext/Actions.c >@@ -1557,10 +1557,14 @@ OvsExecuteRecirc(OvsForwardingContext *ovsFwdCtx, > ovsActionStats.noCopiedNbl++; > return NDIS_STATUS_SUCCESS; > } >-ovsFwdCtx->curNbl = newNbl; > } > >-deferredAction = OvsAddDeferredActions(ovsFwdCtx->curNbl, key, NULL); >+if (newNbl) { >+deferredAction = OvsAddDeferredActions(newNbl, key, NULL); >+} else { >+deferredAction = OvsAddDeferredActions(ovsFwdCtx->curNbl, key, >NULL); >+} >+ > if (deferredAction) { > deferredAction->key.recircId = NlAttrGetU32(actions); > } else { >-- >1.9.5.msysgit.0 > >___ >dev mailing list >dev@openvswitch.org >https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailma >n_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=pN >HQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=mp-I9a4bh0khikMfErkQfKi0MxEFbe >WI0dOhNBsCSPo&s=gjRuXLD5XalccxouZ4ka8mIV0LaeH4__y6P_sRC6DFw&e= ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] datapath-windows: Fix recirculation when it is not the last attribute
When the recirc action is in middle, the current code creates a clone of the NBL. However, it overwrites the pointer to point to the cloned NBL without completing it. This causes a memory leak that crashes the kernel. Signed-off-by: Sairam Venugopal --- datapath-windows/ovsext/Actions.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/datapath-windows/ovsext/Actions.c b/datapath-windows/ovsext/Actions.c index 5dae6b4..5ad29ee 100644 --- a/datapath-windows/ovsext/Actions.c +++ b/datapath-windows/ovsext/Actions.c @@ -1557,10 +1557,14 @@ OvsExecuteRecirc(OvsForwardingContext *ovsFwdCtx, ovsActionStats.noCopiedNbl++; return NDIS_STATUS_SUCCESS; } -ovsFwdCtx->curNbl = newNbl; } -deferredAction = OvsAddDeferredActions(ovsFwdCtx->curNbl, key, NULL); +if (newNbl) { +deferredAction = OvsAddDeferredActions(newNbl, key, NULL); +} else { +deferredAction = OvsAddDeferredActions(ovsFwdCtx->curNbl, key, NULL); +} + if (deferredAction) { deferredAction->key.recircId = NlAttrGetU32(actions); } else { -- 1.9.5.msysgit.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace
On Fri, Apr 15, 2016 at 05:02:47PM -0700, Daniele Di Proietto wrote: > The userspace connection tracker doesn't support ALGs, frag reassembly > or NAT yet, so skip those tests. > > Also, connection tracking state input from a local port is not possible > in userspace. > > Finally, the userspace datapath pads all frames with 0, to make them at > least 64 bytes. > > Signed-off-by: Daniele Di Proietto > --- LGTM Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 14/15] system-tests: Add tcp simple test.
On Fri, Apr 15, 2016 at 05:02:46PM -0700, Daniele Di Proietto wrote: > Useful to test the datapath ability to forward tcp packets without the > complexity of connection tracking. > > Signed-off-by: Daniele Di Proietto > --- This patch can be separated from this patchset as well. Thanks for improving the testsuite. Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 12/15] tests: Add conntrack ofproto-dpif tests.
On Fri, Apr 15, 2016 at 05:02:44PM -0700, Daniele Di Proietto wrote: > While the system testsuite already has connection tracking tests, it > will be still useful to add some to the standard testsuite because: > > * They're run more often by developers. > * Some of them are more interesting for the userspace datapath. > > Signed-off-by: Daniele Di Proietto > --- Nice! Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 13/15] system-tests: Disable offloads in userspace tests.
On Fri, Apr 15, 2016 at 05:02:45PM -0700, Daniele Di Proietto wrote: > The system userspace testsuite uses the userspace datapath with > netdev-linux devices, connected to veth pairs with the AF_PACKET socket: > > (veth pair) (AF_PACKET) > TCP stack -> p0 ---> ovs-p0 -> netdev-linux (userspace OVS) > > Unfortunately this configuration has some problems with offloads: a > packet generated by the TCP stack maybe sent to p0 without being > checksummed or segmented. The AF_PACKET socket, by default, ignores the > offloads and just transmits the data of the packets to userspace, but: > > 1. The packet may need GSO, so the data will be too big to be received >by the userspace datapath > 2. The packet might have incomplete checksums, so it will likely be >discarded by the receiver. > > Problem 1 causes TCP connections to see a congestion window smaller than > the MTU, which hurts performance but doesn't prevent communication. > > Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by > commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats > packets with bad checksums as good"). In the kernels that include the > fix, the userspace datapath is able to process pings, but not tcp or udp > data. > > Unfortunately I couldn't find a way to ask the AF_PACKET to perform > offloads in kernel. A possible fix would be to use the PACKET_VNET_HDR > sockopt and perform the offloads in userspace. > > Until a proper fix is worked out for netdev-linux, this commit disables > offloads on the non-OVS side of the veth pair, as a workaround. > > Signed-off-by: Daniele Di Proietto > --- I understand why you need this but I don't think it should be part of this patchset. We should start noting somewhere things that we need to revisit in the future. Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 11/15] dpif-netdev: Implement conntrack flush interface.
On Fri, Apr 15, 2016 at 05:02:43PM -0700, Daniele Di Proietto wrote: > Signed-off-by: Daniele Di Proietto This could be squashed with "[07/15] conntrack: Implement flush function" Just a suggestion. Acked-by: Flavio Leitner > --- > lib/dpif-netdev.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 5ac2bf3..995cbc0 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -3991,6 +3991,14 @@ dpif_netdev_ct_dump_done(struct dpif *dpif OVS_UNUSED, > return err; > } > > +static int > +dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone) > +{ > +struct dp_netdev *dp = get_dp_netdev(dpif); > + > +return conntrack_flush(&dp->conntrack, zone); > +} > + > const struct dpif_class dpif_netdev_class = { > "netdev", > dpif_netdev_init, > @@ -4034,7 +4042,7 @@ const struct dpif_class dpif_netdev_class = { > dpif_netdev_ct_dump_start, > dpif_netdev_ct_dump_next, > dpif_netdev_ct_dump_done, > -NULL, /* ct_flush */ > +dpif_netdev_ct_flush, > }; > > static void > -- > 2.1.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev -- fbl ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 08/15] conntrack: Implement dumping to ct_entry.
On Fri, Apr 15, 2016 at 05:02:40PM -0700, Daniele Di Proietto wrote: > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 10/15] dpif-netdev: Implement conntrack dump functions.
On Fri, Apr 15, 2016 at 05:02:42PM -0700, Daniele Di Proietto wrote: > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 09/15] dpif-netdev: Execute conntrack action.
On Fri, Apr 15, 2016 at 05:02:41PM -0700, Daniele Di Proietto wrote: > This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev. > > To allow ofproto-dpif to detect the conntrack feature, flow_put will not > discard anymore flows with ct_* fields set. We still shouldn't allow > flows with NAT bits set, since there is no support for NAT. > > Signed-off-by: Daniele Di Proietto > --- > lib/dpif-netdev.c | 68 > +-- > tests/dpif-netdev.at | 14 +-- > tests/ofproto-dpif.at | 20 +++ > 3 files changed, 77 insertions(+), 25 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 1e8a37c..436359a 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -33,6 +33,7 @@ > > #include "bitmap.h" > #include "cmap.h" > +#include "conntrack.h" > #include "coverage.h" > #include "csum.h" > #include "dp-packet.h" > @@ -89,9 +90,17 @@ static struct shash dp_netdevs > OVS_GUARDED_BY(dp_netdev_mutex) > > static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); > > +#define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \ > + | CS_INVALID | CS_REPLY_DIR | > CS_TRACKED) > +#define DP_NETDEV_CS_UNSUPPORTED_MASK > (~(uint32_t)DP_NETDEV_CS_SUPPORTED_MASK) > + > static struct odp_support dp_netdev_support = { > .max_mpls_depth = SIZE_MAX, > .recirc = true, > +.ct_state = true, > +.ct_zone = true, > +.ct_mark = true, > +.ct_label = true, > }; > > /* Stores a miniflow with inline values */ > @@ -224,6 +233,8 @@ struct dp_netdev { > /* Cpu mask for pin of pmd threads. */ > char *pmd_cmask; > uint64_t last_tnl_conf_seq; > + > +struct conntrack conntrack; > }; > > static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev > *dp, > @@ -906,6 +917,8 @@ create_dp_netdev(const char *name, const struct > dpif_class *class, > dp->upcall_aux = NULL; > dp->upcall_cb = NULL; > > +conntrack_init(&dp->conntrack); > + > cmap_init(&dp->poll_threads); > ovs_mutex_init_recursive(&dp->non_pmd_mutex); > ovsthread_key_create(&dp->per_pmd_key, NULL); > @@ -976,6 +989,8 @@ dp_netdev_free(struct dp_netdev *dp) > ovs_mutex_destroy(&dp->non_pmd_mutex); > ovsthread_key_delete(dp->per_pmd_key); > > +conntrack_destroy(&dp->conntrack); > + > ovs_mutex_lock(&dp->port_mutex); > CMAP_FOR_EACH (port, node, &dp->ports) { > /* PMD threads are destroyed here. do_del_port() cannot quiesce */ > @@ -1965,9 +1980,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr *key, > uint32_t key_len, > return EINVAL; > } > > -/* Userspace datapath doesn't support conntrack. */ > -if (flow->ct_state || flow->ct_zone || flow->ct_mark > -|| !ovs_u128_is_zero(&flow->ct_label)) { > +if (flow->ct_state & DP_NETDEV_CS_UNSUPPORTED_MASK) { > return EINVAL; > } > > @@ -2604,6 +2617,9 @@ dpif_netdev_run(struct dpif *dpif) > ovs_mutex_unlock(&dp->non_pmd_mutex); > dp_netdev_pmd_unref(non_pmd); > > +/* XXX: If workload is too heavy we could add a separate thread. */ > +conntrack_run(&dp->conntrack); > + Do you have any results to share? Just curious. Acked-by: Flavio Leitner > tnl_neigh_cache_run(); > tnl_port_map_run(); > new_tnl_seq = seq_read(tnl_conf_seq); > @@ -3850,12 +3866,48 @@ dp_execute_cb(void *aux_, struct dp_packet **packets, > int cnt, > VLOG_WARN("Packet dropped. Max recirculation depth exceeded."); > break; > > -case OVS_ACTION_ATTR_CT: > -/* If a flow with this action is slow-pathed, datapath assistance is > - * required to implement it. However, we don't support this action > - * in the userspace datapath. */ > -VLOG_WARN("Cannot execute conntrack action in userspace."); > +case OVS_ACTION_ATTR_CT: { > +const struct nlattr *b; > +bool commit = false; > +unsigned int left; > +uint16_t zone = 0; > +const char *helper = NULL; > +const uint32_t *setmark = NULL; > +const struct ovs_key_ct_labels *setlabel = NULL; > + > + > +/* XXX parsing this everytime is expensive. We should do like kernel > + * does and create a structure. */ > +NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a), > nl_attr_get_size(a)) { > +enum ovs_ct_attr sub_type = nl_attr_type(b); > + > +switch(sub_type) { > +case OVS_CT_ATTR_COMMIT: > +commit = true; > +break; > +case OVS_CT_ATTR_ZONE: > +zone = nl_attr_get_u16(b); > +break; > +case OVS_CT_ATTR_HELPER: > +helper = nl_attr_get_string(b); > +break; > +case OVS_CT_ATTR_MARK: > +setmark = nl_attr_get(b); > +
Re: [ovs-dev] [PATCH v2 06/15] tests: Add test-conntrack pcap test.
On Fri, Apr 15, 2016 at 05:02:38PM -0700, Daniele Di Proietto wrote: > Simple program that runs the packet in a pcap file through the > connection tracker and prints the 'ct_state' for each packet. > > E.g. the line: > > `./test/ovstest test-conntrack capture.pcap 2` > > sends the packets in `capture.pcap` to the connection tracker, 2 per > call. > > Useful for debugging. > > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 07/15] conntrack: Implement flush function.
On Fri, Apr 15, 2016 at 05:02:39PM -0700, Daniele Di Proietto wrote: > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 05/15] tests: Add very simple conntrack benchmark.
On Fri, Apr 15, 2016 at 05:02:37PM -0700, Daniele Di Proietto wrote: > This introduces a very limited but simple benchmark for > conntrack_execute(). It just sends repeatedly the same batch of packets > through the connection tracker and returns the time spent to process > them. > > While this is not a realistic benchmark, it has proven useful during > development to evaluate different batching and locking strategies. > > E.g. the line: > > `./test/ovstest test-conntrack benchmark 1 1488 32` 14.88M? :-) > > starts 1 thread that will send 1488 packets to the connection > tracker, 32 at a time. It will print the time taken to process them. > > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.
On Fri, Apr 15, 2016 at 05:02:36PM -0700, Daniele Di Proietto wrote: > This commit adds the conntrack module. > > It is a connection tracker that resides entirely in userspace. Its > primary user will be the dpif-netdev datapath. > > The module main goal is to provide conntrack_execute(), which offers a > convenient interface to implement the datapath ct() action. > > The conntrack module uses two submodules to deal with the l4 protocol > details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP). > > The conntrack-tcp submodule implementation is adapted from FreeBSD's pf > subsystem, therefore it's BSD licensed. It has been slightly altered to > match the OVS coding style and to allow the pickup of already > established connections. > > Signed-off-by: Daniele Di Proietto > --- > COPYING | 1 + > debian/copyright.in | 4 + > lib/automake.mk | 5 + > lib/conntrack-other.c | 91 ++ > lib/conntrack-private.h | 77 + > lib/conntrack-tcp.c | 476 +++ > lib/conntrack.c | 851 > > lib/conntrack.h | 144 > 8 files changed, 1649 insertions(+) > create mode 100644 lib/conntrack-other.c > create mode 100644 lib/conntrack-private.h > create mode 100644 lib/conntrack-tcp.c > create mode 100644 lib/conntrack.c > create mode 100644 lib/conntrack.h > > diff --git a/COPYING b/COPYING > index 308e3ea..afb98b9 100644 > --- a/COPYING > +++ b/COPYING > @@ -25,6 +25,7 @@ License, version 2. > The following files are licensed under the 2-clause BSD license. > include/windows/getopt.h > lib/getopt_long.c > +lib/conntrack-tcp.c > > The following files are licensed under the 3-clause BSD-license > include/windows/netinet/icmp6.h > diff --git a/debian/copyright.in b/debian/copyright.in > index 57d007a..a15f4dd 100644 > --- a/debian/copyright.in > +++ b/debian/copyright.in > @@ -21,6 +21,9 @@ Upstream Copyright Holders: > Copyright (c) 2014 Michael Chapman > Copyright (c) 2014 WindRiver, Inc. > Copyright (c) 2014 Avaya, Inc. > + Copyright (c) 2001 Daniel Hartmeier > + Copyright (c) 2002 - 2008 Henning Brauer > + Copyright (c) 2012 Gleb Smirnoff > > License: > > @@ -90,6 +93,7 @@ License: > lib/getopt_long.c > include/windows/getopt.h > datapath-windows/ovsext/Conntrack-tcp.c > + lib/conntrack-tcp.c > > * The following files are licensed under the 3-clause BSD-license > > diff --git a/lib/automake.mk b/lib/automake.mk > index 1ec2115..ba30442 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -47,6 +47,11 @@ lib_libopenvswitch_la_SOURCES = \ > lib/compiler.h \ > lib/connectivity.c \ > lib/connectivity.h \ > + lib/conntrack-private.h \ > + lib/conntrack-tcp.c \ > + lib/conntrack-other.c \ > + lib/conntrack.c \ > + lib/conntrack.h \ > lib/coverage.c \ > lib/coverage.h \ > lib/crc32c.c \ > diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c > new file mode 100644 > index 000..65d02a9 > --- /dev/null > +++ b/lib/conntrack-other.c > @@ -0,0 +1,91 @@ > +/* > + * Copyright (c) 2015, 2016 Nicira, Inc. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#include > + > +#include "conntrack-private.h" > +#include "dp-packet.h" > + > +enum other_state { > +OTHERS_FIRST, > +OTHERS_MULTIPLE, > +OTHERS_BIDIR, > +}; > + > +struct conn_other { > +struct conn up; > +enum other_state state; > +}; > + > +static const long long other_timeouts[] = { > +[OTHERS_FIRST] = 60 * 1000, > +[OTHERS_MULTIPLE] = 60 * 1000, > +[OTHERS_BIDIR] = 30 * 1000, > +}; I missed a description here, like the unit. > + > +static struct conn_other * > +conn_other_cast(const struct conn *conn) > +{ > +return CONTAINER_OF(conn, struct conn_other, up); > +} > + > +static void > +update_expiration(struct conn_other *conn, long long now) > +{ > +conn->up.expiration = now + other_timeouts[conn->state]; > +} > + > +static enum ct_update_res > +other_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED, > +bool reply, long long now) > +{ > +struct conn_other *conn = conn_other_cast(conn_); > + > +if (reply && conn->state != OTHERS_BIDIR) { > +conn->state = OTHERS_BIDIR; > +} else if (conn->state == OTHERS_FIRST) { > +
Re: [ovs-dev] [PATCH v2 03/15] flow: Introduce parse_dl_type().
On Fri, Apr 15, 2016 at 05:02:35PM -0700, Daniele Di Proietto wrote: > The function simply returns the ethernet type of the packet (after > eventually discarding the VLAN tag). It will be used by a following > commit. > > Signed-off-by: Daniele Di Proietto > --- I agree with Joe that we should avoid forcing inlining. Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 02/15] flow: Export parse_ipv6_ext_hdrs().
On Fri, Apr 15, 2016 at 05:02:34PM -0700, Daniele Di Proietto wrote: > This will be used by a future commit. > > Signed-off-by: Daniele Di Proietto > --- I see that you tried to split the interface and the function code so that the miniflow_extract() could get the code inlined and have no performance impact. Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 01/15] packets: Define ICMP types.
On Fri, Apr 15, 2016 at 05:02:33PM -0700, Daniele Di Proietto wrote: > Linux and FreeBSD have slightly different names for these constants. > Windows doesn't define them. It is simpler to redefine them from > scratch for OVS. The new names are different than those used in Linux > and FreeBSD. > > These definitions will be used by a future commit. > > Signed-off-by: Daniele Di Proietto > --- Acked-by: Flavio Leitner ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 2/2] ovn: Send garp on localnet
> > Is there an equivalent of GARP for IPv6? > The "Unsolicited Neighbor Advertisement" seems to be the equivalent of GARP for IPv6. http://tools.ietf.org/html/rfc2461#section-7.2.6 The problem motivating this fix was seen with IPv4 though. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [patch_v3] vtep: add source node replication support.
Thanks Bruce I have one point that I would like to confirm. On Tue, Apr 26, 2016 at 11:41 AM, Bruce Davie wrote: > Darrell, > I don’t think you can handle the error of unsupported replication mode > the way it is described here: > > + +key="unsupported_source_node_replication"> > +Indicates that the requested source node replication mode cannot > be > +supported by the physical switch; this specifically means in this > +context that the physical switch lacks the capability to support > +source node replication mode. This error occurs when a controller > +attempts to set source node replication mode for one of the > logical > +switches that the physical switch is keeping context for. If this > +error occurs, logical switches continue to use service node > +replication mode. > + > > If a given logical switch spans 2 or more physical switches, and one of > them can’t support source mode replication, but the other one can, then > you’ll end up with the logical switch in a mixed state, with one switch > falling back to service node mode and the other staying in source node > mode. At this point I don’t know what happens, but clearly the controller > will not be able to report what mode the logical switch is actually in > (short of introducing an explicit “mixed mode”). > Let me paraphrase what the patch text implies, just to be sure. A physical switch keeps context regarding multiple logical switches. If a controller requests that a logical switch context on a given physical switch change its replication mode to source_node, but the capability of the physical switch cannot support it, then the logical switch context on this physical switch under discussion can only use service_node replication as presumably will all the other logical switch "contexts" on this particular physical switch. Now, with respect to the single logical switch under discussion, let say this logical switch spans 3 HVs and 2 physical switches (psA and psB) as an arbitrary example. Let assume the 3 HVs are using source node replication and psA is using source node replication. But psB can only support service node replication. This means when a BUM packet is received from the internet for example by psB, it sends the packet to a replication node to get to the other transport nodes (3 HVs and psA, in this example). Split horizon is still otherwise honored. Does it absolutely require that the other transport nodes revert to service node mode ? Can't this case still work depending on the way the designated service node behaves ? We have an actual use case in OVN where the HVs will only support source node replication. I got your overall concern that its "easier to shoot oneself in the foot" using mixed replication mode; however can we say its "recommended that all members of a logical switch use the same replication mode", rather than they must use the same replication mode ? > > I propose that the last sentence read: An NVC that observes this error > condition should take appropriate action (e.g. reverting the logical switch > to service node replication mode). > > Regarding the description of the alt_replication_mode column, I think it > could be clearer. Suggested wording follows: > > > > For handling broadcast, multicast (in default manner) and unknown > unicast traffic, packets can be sent to all members of a logical > switch. There are different modes > to replicate the packets. The default mode of replication is to send the > traffic to a service node, which can be a hypervisor, server or > appliance, and let the service node handle replication to other > transport nodes (hypervisors or other VTEP physical > switches). This mode is called service node replication. An alternate > mode of replication, called source node replication involves the > source node sending to all other transport nodes. Hypervisors are > always responsible for doing their own replication for locally > attached VMs in both modes. > Service node mode is the default (and was the only option for > prior versions of this schema). > Source node mode is an alternate replication mode that may be > configured using this column. > > > > > This column (if used) configures the alternate replication mode > for this > . There is one valid value > presently, > source_node. > > agreed > Finally, it’s not strictly necessary to give this group such a long name. > It’s a per-logical switch feature by definition (it exists in the table > where per logical switch information is configured) so you could shorten > the name as follows: > > http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] datapath: Fix datapath build on Centos 6.5 (2.6.31-431) kernels
On Sat, Apr 23, 2016 at 5:52 PM, Wanlong Gao wrote: > The following error is caused by the undefined dev_get_stats64() of kernel > v2.6.32-431. The HAVE_RTNL_LINK_STATS64 does not work because this series > kernel have the "struct rtnl_link_stats64" but neither have 64bit > dev_get_stats() nor dev_get_stats64(), so we should use the compat defined > rpl_dev_get_stats() here. > > I just made this patch against branch-2.5 because the master branch are > dropping the support of kernel less than v3.10 now. > > /root/rpmbuild/BUILD/openvswitch-2.5.1/_default/datapath/linux/vport.c: In > function 'ovs_vport_get_stats': > /root/rpmbuild/BUILD/openvswitch-2.5.1/_default/datapath/linux/vport.c:328: > error: implicit declaration of function 'dev_get_stats64' > /root/rpmbuild/BUILD/openvswitch-2.5.1/_default/datapath/linux/vport.c:328: > warning: assignment makes pointer from integer without a cast > > > Signed-off-by: Wanlong Gao Thanks, applied to branch-2.5. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2 2/2] ovn: send garp on localnet
In some usecases such as VM migration or when VMs reuse IP addresses, VMs become unreachable externally because external switches/routers on localnet have stale port-mac or arp caches. The problem resolves after some time when the caches ageout which could be minutes for port-mac bindings or hours for arp caches. To fix this, send some gratuitous ARPs when a logical port on a localnet datapath gets added. Such gratuitous arps help on a best-effort basis to update the mac-port bindings and arp-caches of external switches and routers on the localnet. Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1545897 Reported-by: Kyle Mestery Signed-off-by: Ramu Ramamurthy --- ovn/controller/ovn-controller.c | 2 +- ovn/controller/pinctrl.c| 244 +++- ovn/controller/pinctrl.h| 5 +- tests/ovn.at| 63 +++ 4 files changed, 311 insertions(+), 3 deletions(-) diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index f68f842..ae7d44b 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -373,7 +373,7 @@ main(int argc, char *argv[]) enum mf_field_id mff_ovn_geneve = ofctrl_run(br_int); -pinctrl_run(&ctx, &lports, br_int); +pinctrl_run(&ctx, &lports, br_int, chassis_id, &local_datapaths); struct hmap flow_table = HMAP_INITIALIZER(&flow_table); lflow_run(&ctx, &lports, &mcgroups, &local_datapaths, diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index cbac50e..aec2aa3 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -22,6 +22,8 @@ #include "dp-packet.h" #include "flow.h" #include "lport.h" +#include "ovn-controller.h" +#include "lib/sset.h" #include "openvswitch/ofp-actions.h" #include "openvswitch/ofp-msgs.h" #include "openvswitch/ofp-print.h" @@ -30,6 +32,7 @@ #include "ovn-controller.h" #include "ovn/lib/actions.h" #include "ovn/lib/logical-fields.h" +#include "ovn/lib/ovn-util.h" #include "poll-loop.h" #include "rconn.h" #include "socket-util.h" @@ -54,6 +57,14 @@ static void run_put_arps(struct controller_ctx *, static void wait_put_arps(struct controller_ctx *); static void flush_put_arps(void); +static void init_send_garps(void); +static void destroy_send_garps(void); +static void send_garp_wait(void); +static void send_garp_run(const struct ovsrec_bridge *, + const char *chassis_id, + const struct lport_index *lports, + struct hmap *local_datapaths); + COVERAGE_DEFINE(pinctrl_drop_put_arp); void @@ -62,6 +73,7 @@ pinctrl_init(void) swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); conn_seq_no = 0; init_put_arps(); +init_send_garps(); } static ovs_be32 @@ -266,7 +278,9 @@ pinctrl_recv(const struct ofp_header *oh, enum ofptype type) void pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, -const struct ovsrec_bridge *br_int) +const struct ovsrec_bridge *br_int, +const char *chassis_id, +struct hmap *local_datapaths) { if (br_int) { char *target; @@ -307,6 +321,7 @@ pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, } run_put_arps(ctx, lports); +send_garp_run(br_int, chassis_id, lports, local_datapaths); } void @@ -315,6 +330,7 @@ pinctrl_wait(struct controller_ctx *ctx) wait_put_arps(ctx); rconn_run_wait(swconn); rconn_recv_wait(swconn); +send_garp_wait(); } void @@ -322,6 +338,7 @@ pinctrl_destroy(void) { rconn_destroy(swconn); destroy_put_arps(); +destroy_send_garps(); } /* Implementation of the "put_arp" OVN action. This action sends a packet to @@ -484,3 +501,228 @@ flush_put_arps(void) free(pa); } } + +/* + * Send gratuitous arp for vif on localnet. + * + * When a new vif on localnet is added, gratuitous arps are sent announcing + * the port's mac,ip mapping. On localnet, such announcements are needed for + * switches and routers on the broadcast segment to update their port-mac + * and arp tables. + */ +struct garp_data { +struct eth_addr ea; /* Ethernet address of port. */ +ovs_be32 ipv4; /* Ipv4 address of port. */ +long long int announce_time; /* Next announcement in ms. */ +int backoff; /* Backoff for the next announcement. */ +int ofport; /* Ofport used to output this garp. */ +}; + +/* Contains garps to be sent. */ +static struct shash send_garp_data; + +/* Next garp announcement in ms. */ +static long long int send_garp_time; + +static void +init_send_garps(void) +{ +shash_init(&send_garp_data); +send_garp_time = LLONG_MAX; +} + +static void +destroy_send_garps(void) +{ +shash_destroy(&send_garp_data); +} + +/* Add or update a vif for which
[ovs-dev] [PATCH v2 1/2] ovn: Move extract_lport_addresses
Move the function extract_lport_addresses to a file in ovn/lib since that function can be used by ovn-controller also to parse addresses stored in the mac column of the port_binding table. Currently that function is used only in ovn_northd. Signed-off-by: Ramu Ramamurthy --- ovn/lib/automake.mk | 2 + ovn/lib/ovn-util.c | 104 ovn/lib/ovn-util.h | 44 ovn/northd/ovn-northd.c | 104 +--- 4 files changed, 151 insertions(+), 103 deletions(-) create mode 100644 ovn/lib/ovn-util.c create mode 100644 ovn/lib/ovn-util.h diff --git a/ovn/lib/automake.mk b/ovn/lib/automake.mk index 9e09352..2b178da 100644 --- a/ovn/lib/automake.mk +++ b/ovn/lib/automake.mk @@ -10,6 +10,8 @@ ovn_lib_libovn_la_SOURCES = \ ovn/lib/expr.h \ ovn/lib/lex.c \ ovn/lib/lex.h \ + ovn/lib/ovn-util.c \ + ovn/lib/ovn-util.h \ ovn/lib/logical-fields.h nodist_ovn_lib_libovn_la_SOURCES = \ ovn/lib/ovn-nb-idl.c \ diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c new file mode 100644 index 000..abdc247 --- /dev/null +++ b/ovn/lib/ovn-util.c @@ -0,0 +1,104 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include "ovn-util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(ovn_util); + +/* + * Extracts the mac, ipv4 and ipv6 addresses from the input param 'address' + * which should be of the format 'MAC [IP1 IP2 ..]" where IPn should be + * a valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and + * 'ipv6_addrs' fields of input param 'laddrs'. + * The caller has to free the 'ipv4_addrs' and 'ipv6_addrs' fields. + * If input param 'store_ipv6' is true only then extracted ipv6 addresses + * are stored in 'ipv6_addrs' fields. + * Return true if at least 'MAC' is found in 'address', false otherwise. + * Eg 1. + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = true + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 1. + * + * Eg. 2 + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = false + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 0. + * + * Eg 3. If 'address' = '00:00:00:00:00:01 10.0.0.4 addr 30.0.0.4', then + * returns true with laddrs->n_ipv4_addrs = 1 and laddrs->n_ipv6_addrs = 0. + */ +bool +extract_lport_addresses(char *address, struct lport_addresses *laddrs, +bool store_ipv6) +{ +char *buf = address; +int buf_index = 0; +char *buf_end = buf + strlen(address); +if (!ovs_scan_len(buf, &buf_index, ETH_ADDR_SCAN_FMT, + ETH_ADDR_SCAN_ARGS(laddrs->ea))) { +return false; +} + +ovs_be32 ip4; +struct in6_addr ip6; +unsigned int plen; +char *error; + +laddrs->n_ipv4_addrs = 0; +laddrs->n_ipv6_addrs = 0; +laddrs->ipv4_addrs = NULL; +laddrs->ipv6_addrs = NULL; + +/* Loop through the buffer and extract the IPv4/IPv6 addresses + * and store in the 'laddrs'. Break the loop if invalid data is found. + */ +buf += buf_index; +while (buf < buf_end) { +buf_index = 0; +error = ip_parse_cidr_len(buf, &buf_index, &ip4, &plen); +if (!error) { +laddrs->n_ipv4_addrs++; +laddrs->ipv4_addrs = xrealloc( +laddrs->ipv4_addrs, +sizeof (struct ipv4_netaddr) * laddrs->n_ipv4_addrs); +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].addr = ip4; +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].plen = plen; +buf += buf_index; +continue; +} +free(error); +error = ipv6_parse_cidr_len(buf, &buf_index, &ip6, &plen); +if (!error && store_ipv6) { +laddrs->n_ipv6_addrs++; +laddrs->ipv6_addrs = xrealloc( +laddrs->ipv6_addrs, +sizeof(struct ipv6_netaddr) * laddrs->n_ipv6_addrs); +memcpy(&laddrs->ipv6_addrs[laddrs->n_ipv6_addrs - 1].addr, &ip6, + sizeof(struct in6_addr)); +laddrs->ipv6_addrs[laddrs->n_ipv6_addrs - 1].plen = plen; +} + +if (error) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); +VLOG_INFO_RL(&rl
[ovs-dev] [PATCH v2 0/2] ovn: send garp on localnet
The first patch refactors the function extract_lport_addresses so that it can be used in ovn-northd and in ovn-controller. The second patch depends on the first patch, and implements the functionality to send gratuitous arps on localnet. Changes v1->v2: Changes per code-review: * send gratuitous arp request instead of a reply * use poll_timer_wait_until for timing * Various code changes to conform to coding style ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v12 6/6] netdev-dpdk: Check dpdk-extra when reading db
A previous patch introduced the ability to pass arbitrary EAL command line options via the dpdk_extras database entry. This commit enhances that by warning the user when such a configuration is detected and prefering the value in the database. Suggested-by: Sean K Mooney Signed-off-by: Aaron Conole Tested-by: Sean K Mooney Tested-by: Kevin Traynor Acked-by: Panu Matilainen Acked-by: Flavio Leitner --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069032.html v12: * No change lib/netdev-dpdk.c | 66 +-- 1 file changed, 55 insertions(+), 11 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index d676a3e..353954d 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2774,6 +2774,17 @@ dpdk_option_extend(char ***argv, int argc, const char *option, newargv[argc+1] = xstrdup(value); } +static char ** +move_argv(char ***argv, size_t cur_size, char **src_argv, size_t src_argc) +{ +char **newargv = grow_argv(argv, cur_size, src_argc); +while (src_argc--) { +newargv[cur_size+src_argc] = src_argv[src_argc]; +src_argv[src_argc] = 0; +} +return newargv; +} + static int extra_dpdk_args(const char *ovs_extra_config, char ***argv, int argc) { @@ -2791,9 +2802,21 @@ extra_dpdk_args(const char *ovs_extra_config, char ***argv, int argc) return ret; } +static bool +argv_contains(char **argv_haystack, const size_t argc_haystack, + const char *needle) +{ +for (size_t i = 0; i < argc_haystack; ++i) { +if (!strcmp(argv_haystack[i], needle)) +return true; +} +return false; +} + static int construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg, - char ***argv, const int initial_size) + char ***argv, const int initial_size, + char **extra_args, const size_t extra_argc) { struct dpdk_options_map { const char *ovs_configuration; @@ -2817,8 +2840,13 @@ construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg, } if (lookup) { -dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup); -ret += 2; +if (!argv_contains(extra_args, extra_argc, opts[i].dpdk_option)) { +dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup); +ret += 2; +} else { +VLOG_WARN("Ignoring database defined option '%s' due to " + "dpdk_extras config", opts[i].dpdk_option); +} } } @@ -2829,7 +2857,8 @@ construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg, static int construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg, - char ***argv, const int initial_size) + char ***argv, const int initial_size, + char **extra_args, const size_t extra_argc) { struct dpdk_exclusive_options_map { const char *category; @@ -2877,10 +2906,16 @@ construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg, popt->category); } -dpdk_option_extend(argv, ret, popt->eal_dpdk_options[found_pos], - found_value); -ret += 2; +if (!argv_contains(extra_args, extra_argc, + popt->eal_dpdk_options[found_pos])) { +dpdk_option_extend(argv, ret, popt->eal_dpdk_options[found_pos], + found_value); +ret += 2; +} else { +VLOG_WARN("Ignoring database defined option '%s' due to " + "dpdk_extras config", popt->eal_dpdk_options[found_pos]); +} } return ret; } @@ -2890,14 +2925,23 @@ get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv, int argc) { const char *extra_configuration; -int i = construct_dpdk_options(ovs_cfg, argv, argc); -i = construct_dpdk_mutex_options(ovs_cfg, argv, i); +char **extra_args = NULL; +int i; +size_t extra_argc = 0; extra_configuration = smap_get(&ovs_cfg->other_config, "dpdk-extra"); if (extra_configuration) { -i = extra_dpdk_args(extra_configuration, argv, i); +extra_argc = extra_dpdk_args(extra_configuration, &extra_args, 0); } -return i; + +i = construct_dpdk_options(ovs_cfg, argv, argc, extra_args, extra_argc); +i = construct_dpdk_mutex_options(ovs_cfg, argv, i, extra_args, extra_argc); + +if (extra_configuration) { +*argv = move_argv(argv, i, extra_args, extra_argc); +} + +return i + extra_argc; } static char **dpdk_argv; -- 2.5.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v12 5/6] netdev-dpdk: Allow arbitrary eal arguments
A previous change moved some commonly used arguments from commandline to the database, and with it the ability to pass arbitrary arguments to EAL. This change allows arbitrary eal arguments to be provided via a new db entry 'other_config:dpdk-extra' which will tokenize the string and add it to the argument list. The only argument which will not be supported with this change is '--no-huge', which appears to break the system in other ways. Signed-off-by: Aaron Conole Tested-by: Sean K Mooney Tested-by: RobertX Wojciechowicz Tested-by: Kevin Traynor Acked-by: Panu Matilainen Acked-by: Kevin Traynor Acked-by: Flavio Leitner --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069031.html v12: * Squashed NEWS related change into this patch * Dropped a useless assignment hunk INSTALL.DPDK.md | 5 + NEWS | 3 ++- lib/netdev-dpdk.c| 37 + utilities/ovs-dev.py | 7 +-- vswitchd/vswitch.xml | 11 +++ 5 files changed, 60 insertions(+), 3 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 3cc283a..9f907f8 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -173,6 +173,11 @@ Using the DPDK with ovs-vswitchd: * dpdk-hugepage-dir Directory where hugetlbfs is mounted + * dpdk-extra + Extra arguments to provide to DPDK EAL, as previously specified on the + command line. Do not pass '--no-huge' to the system in this way. Support + for running the system without hugepages is nonexistent. + * cuse-dev-name Option to set the vhost_cuse character device name. diff --git a/NEWS b/NEWS index f349819..e819438 100644 --- a/NEWS +++ b/NEWS @@ -30,7 +30,8 @@ Post-v2.5.0 * Sensible defaults have been introduced for many of the required configuration options * DB entries have been added for many of the DPDK EAL command line - arguments + arguments. Additional arguments can be passed via the dpdk-extra + entry. - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - ovs-appctl: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e7951df..d676a3e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -39,6 +39,7 @@ #include "netdev-provider.h" #include "netdev-vport.h" #include "odp-util.h" +#include "openvswitch/dynamic-string.h" #include "openvswitch/list.h" #include "openvswitch/ofp-print.h" #include "openvswitch/vlog.h" @@ -2774,6 +2775,23 @@ dpdk_option_extend(char ***argv, int argc, const char *option, } static int +extra_dpdk_args(const char *ovs_extra_config, char ***argv, int argc) +{ +int ret = argc; +char *release_tok = xstrdup(ovs_extra_config); +char *tok = release_tok, *endptr = NULL; + +for (tok = strtok_r(release_tok, " ", &endptr); tok != NULL; + tok = strtok_r(NULL, " ", &endptr)) { +char **newarg = grow_argv(argv, ret, 1); +*argv = newarg; +newarg[ret++] = xstrdup(tok); +} +free(release_tok); +return ret; +} + +static int construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv, const int initial_size) { @@ -2871,8 +2889,14 @@ static int get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv, int argc) { +const char *extra_configuration; int i = construct_dpdk_options(ovs_cfg, argv, argc); i = construct_dpdk_mutex_options(ovs_cfg, argv, i); + +extra_configuration = smap_get(&ovs_cfg->other_config, "dpdk-extra"); +if (extra_configuration) { +i = extra_dpdk_args(extra_configuration, argv, i); +} return i; } @@ -2979,6 +3003,19 @@ dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg) optind = 1; +if (VLOG_IS_INFO_ENABLED()) { +struct ds eal_args; +int opt; +ds_init(&eal_args); +ds_put_cstr(&eal_args, "EAL ARGS:"); +for (opt = 0; opt < argc; ++opt) { +ds_put_cstr(&eal_args, " "); +ds_put_cstr(&eal_args, argv[opt]); +} +VLOG_INFO("%s", ds_cstr_ro(&eal_args)); +ds_destroy(&eal_args); +} + /* Make sure things are initialized ... */ result = rte_eal_init(argc, argv); if (result < 0) { diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index a74b528..524f574 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -267,6 +267,8 @@ def run(): if options.dpdk: _sh("ovs-vsctl --no-wait set Open_vSwitch %s " \ "other_config:dpdk-init=true" % root_uuid) +_sh("ovs-vsctl --no-wait set Open_vSwitch %s other_config:" \ +"dpdk-extra=\"%s\"" % (root_uuid, ' '.join(options.dpdk))) else: _sh("ovs-vsctl --no-wait set Open_vSwitch %s " \ "other_config:dpdk-init=false" % root_uuid) @@ -423,8 +425,9 @@ def main(): help="run ovs-vswitchd under gdb") group.add_option("--valgrind", dest="valgrind", act
[ovs-dev] [PATCH v12 3/6] netdev-dpdk: Restrict vhost_sock_dir
Since the vhost-user sockets directory now comes from the database, it is possible for any user with database access to program an arbitrary filesystem location for the sockets directory. This could result in unprivileged users creating or deleting arbitrary filesystem files by using specially crafted names. To prevent this, use the introduced ovs_realpath function to resolve the filesystem location and ensure that it resolves under the ovs_rundir. Signed-off-by: Aaron Conole --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069030.html v12: * Converted to using strstr instead of canonicalization lib/netdev-dpdk.c| 27 --- vswitchd/vswitch.xml | 4 +++- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index a393cee..9793fc5 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2897,6 +2897,9 @@ dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg) int argc; int err; cpu_set_t cpuset; +#ifndef VHOST_CUSE +char *sock_dir_subcomponent; +#endif if (!smap_get_bool(&ovs_cfg->other_config, "dpdk-init", false)) { VLOG_INFO("DPDK Disabled - to change this requires a restart.\n"); @@ -2909,15 +2912,25 @@ dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg) if (process_vhost_flags("cuse-dev-name", xstrdup("vhost-net"), PATH_MAX, ovs_cfg, &cuse_dev_name)) { #else -if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()), -NAME_MAX, ovs_cfg, &vhost_sock_dir)) { +if (process_vhost_flags("vhost-sock-dir", xstrdup(""), +NAME_MAX, ovs_cfg, &sock_dir_subcomponent)) { struct stat s; - -err = stat(vhost_sock_dir, &s); -if (err) { -VLOG_ERR("vhost-user sock directory '%s' does not exist.", - vhost_sock_dir); +if (!strstr(sock_dir_subcomponent, "..")) { +vhost_sock_dir = xasprintf("%s/%s", ovs_rundir(), + sock_dir_subcomponent); + +err = stat(vhost_sock_dir, &s); +if (err) { +VLOG_ERR("vhost-user sock directory '%s' does not exist.", + vhost_sock_dir); +} +} else { +vhost_sock_dir = xstrdup(ovs_rundir()); +VLOG_ERR("vhost-user sock directory request '%s/%s' has invalid" + "characters '..' - using %s instead.", + ovs_rundir(), sock_dir_subcomponent, ovs_rundir()); } +free(sock_dir_subcomponent); #endif } diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 5965b10..e6d7359 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -302,7 +302,9 @@ - Specifies the path to the vhost-user unix domain socket files. + Specifies the path to the vhost-user unix domain socket files. This + path must exist and be a subdirectory tree of the Open vSwitch + run directory. Defaults to the working directory of the application. Changing this -- 2.5.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v12 0/6] Convert DPDK configuration from command line to DB based
Currently, configuration of DPDK parameters is done via the command line through a --dpdk **OPTIONS** -- command line argument. This has a number of challenges, including: * It must be the first option passed to ovs-vswitchd * It is the only datapath feature in OVS to be configured on the command line * It requires specialized knowledge of sub-component command switches * It also inteprets non-EAL arguments (confusing users) * It is a broken model for datapath configuration. This series brings the following changes to openvswitch: * All DPDK options are taken from the ovs database rather than the command line * Non-EAL arguments also have separate database entries * DPDK lcores are optionally auto-assigned to a single core based on the bridge coremask. * DPDK options have default behaviors * Updated documentation This series has been build tested (including `make check`) on OSX, Fedora 23, Windows (via appveyor), and FreeBSD 10.3; the v11 has had very basic testing applied (start, configure some of the settings). I have removed ACKs and Tested-bys for some of the patches since they underwent changes that I felt disqualified continued use of the Acked-by: and Tested-by: tags. Travis-ci build: https://travis-ci.org/orgcandman/ovs/builds/120081527 Appveyor build: https://ci.appveyor.com/project/orgcandman/ovs/build/1.0.9 This is a resend due to an accidentally omitted hunk in 4/8. A huge round of thanks on the work so far should be given to the following folks (in alphabetical order): * Ben Pfaff (Reviews, vhost-sock-dir escape suggestion) * Christian Erhardt (Testing) * Daniele Di Proietto (Reviews, general suggestions) * Flavio Leitner(Original efforts, reviews) * Kevin Traynor (Testing, general suggestions, reviews, doc reviews) * Panu Matilainen (Initialization ideas, eal arguments ideas, reviews) * RobertX Wojciechowicz (Testing, general suggestions) * Sean Mooney (Testing, general suggestions) Previous series: http://openvswitch.org/pipermail/dev/2016-April/069025.html v12: * Rebased to latest upstream * Dropped path canonicalization * Squashed NEWS changes into their specific commits. Aaron Conole (6): netdev-dpdk: Restore thread affinity after DPDK init netdev-dpdk: Convert initialization from cmdline to db netdev-dpdk: Restrict vhost_sock_dir netdev-dpdk: Autofill lcore coremask if absent netdev-dpdk: Allow arbitrary eal arguments netdev-dpdk: Check dpdk-extra when reading db FAQ.md | 6 +- INSTALL.DPDK.md| 87 +++-- NEWS | 6 + lib/automake.mk| 4 + lib/netdev-dpdk.c | 430 + lib/netdev-dpdk.h | 14 +- lib/netdev-nodpdk.c| 21 +++ tests/ofproto-macros.at| 3 +- utilities/ovs-dev.py | 10 +- vswitchd/bridge.c | 3 + vswitchd/ovs-vswitchd.8.in | 6 +- vswitchd/ovs-vswitchd.c| 27 +-- vswitchd/vswitch.xml | 133 ++ 13 files changed, 615 insertions(+), 135 deletions(-) create mode 100644 lib/netdev-nodpdk.c -- 2.5.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v12 4/6] netdev-dpdk: Autofill lcore coremask if absent
The user has control over the DPDK internal lcore coremask, but this parameter can be autofilled with a bit more intelligence. If the user does not fill this parameter in, we use the lowest set bit in the current task CPU affinity. Otherwise, we will reassign the current thread to the specified lcore mask, in addition to the dpdk lcore threads. Signed-off-by: Aaron Conole Tested-by: Sean K Mooney Tested-by: RobertX Wojciechowicz Tested-by: Kevin Traynor Acked-by: Panu Matilainen Acked-by: Kevin Traynor Acked-by: Flavio Leitner --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069029.html v12: * Dropped a whitespace change * Changed argc initialization lib/netdev-dpdk.c | 38 +- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 9793fc5..e7951df 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2868,9 +2868,10 @@ construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg, } static int -get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv) +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv, + int argc) { -int i = construct_dpdk_options(ovs_cfg, argv, 1); +int i = construct_dpdk_options(ovs_cfg, argv, argc); i = construct_dpdk_mutex_options(ovs_cfg, argv, i); return i; } @@ -2894,7 +2895,8 @@ dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg) { char **argv = NULL; int result; -int argc; +int argc, argc_tmp; +bool auto_determine = true; int err; cpu_set_t cpuset; #ifndef VHOST_CUSE @@ -2943,8 +2945,34 @@ dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg) } argv = grow_argv(&argv, 0, 1); +argc = 1; argv[0] = xstrdup(ovs_get_program_name()); -argc = get_dpdk_args(ovs_cfg, &argv); +argc_tmp = get_dpdk_args(ovs_cfg, &argv, argc); + +while (argc_tmp != argc) { +if (!strcmp("-c", argv[argc]) || !strcmp("-l", argv[argc])) { +auto_determine = false; +break; +} +argc++; +} +argc = argc_tmp; + +/** + * NOTE: This is an unsophisticated mechanism for determining the DPDK + * lcore for the DPDK Master. + */ +if (auto_determine) { +int i; +for (i = 0; i < CPU_SETSIZE; i++) { +if (CPU_ISSET(i, &cpuset)) { +argv = grow_argv(&argv, argc, 2); +argv[argc++] = xstrdup("-c"); +argv[argc++] = xasprintf("0x%08llX", (1ULL
[ovs-dev] [PATCH v12 1/6] netdev-dpdk: Restore thread affinity after DPDK init
When the DPDK init function is called, it changes the executing thread's CPU affinity to a single core specified in -c. This will result in the userspace bridge configuration thread being rebound, even if that is not the intent. This change fixes that behavior by rebinding to the original thread affinity after calling dpdk_init(). Co-authored-by: Kevin Traynor Signed-off-by: Kevin Traynor Signed-off-by: Aaron Conole Tested-by: RobertX Wojciechowicz Tested-by: Sean K Mooney Acked-by: Panu Matilainen Acked-by: Flavio Leitner --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069026.html v12: * No change lib/netdev-dpdk.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 208c5f5..c4b6476 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2763,6 +2763,9 @@ dpdk_init(int argc, char **argv) int result; int base = 0; char *pragram_name = argv[0]; +int err; +int isset; +cpu_set_t cpuset; if (argc < 2 || strcmp(argv[1], "--dpdk")) return 0; @@ -2804,6 +2807,14 @@ dpdk_init(int argc, char **argv) base = 2; } +/* Get the main thread affinity */ +CPU_ZERO(&cpuset); +err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset); +if (err) { +VLOG_ERR("Thread getaffinity error %d.", err); +return err; +} + /* Keep the program name argument as this is needed for call to * rte_eal_init() */ @@ -2815,6 +2826,13 @@ dpdk_init(int argc, char **argv) ovs_abort(result, "Cannot init EAL"); } +/* Set the main thread affinity back to pre rte_eal_init() value */ +err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset); +if (err) { +VLOG_ERR("Thread setaffinity error %d", err); +return err; +} + rte_memzone_dump(stdout); rte_eal_init_ret = 0; -- 2.5.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v12 2/6] netdev-dpdk: Convert initialization from cmdline to db
Existing DPDK integration is provided by use of command line options which must be split out and passed to librte in a special manner. However, this forces any configuration to be passed by way of a special DPDK flag, and interferes with ovs+dpdk packaging solutions. This commit delays dpdk initialization until after the OVS database connection is established, at which point ovs initializes librte. It pulls all of the config data from the OVS database, and assembles a new argv/argc pair to be passed along. Signed-off-by: Aaron Conole Acked-by: Kevin Traynor --- Previous: http://openvswitch.org/pipermail/dev/2016-April/069028.html v12: * Squashed NEWS change into this commit FAQ.md | 6 +- INSTALL.DPDK.md| 82 +--- NEWS | 5 + lib/automake.mk| 4 + lib/netdev-dpdk.c | 304 + lib/netdev-dpdk.h | 14 +-- lib/netdev-nodpdk.c| 21 tests/ofproto-macros.at| 3 +- utilities/ovs-dev.py | 13 +- vswitchd/bridge.c | 3 + vswitchd/ovs-vswitchd.8.in | 6 +- vswitchd/ovs-vswitchd.c| 27 +--- vswitchd/vswitch.xml | 120 ++ 13 files changed, 463 insertions(+), 145 deletions(-) create mode 100644 lib/netdev-nodpdk.c diff --git a/FAQ.md b/FAQ.md index 16e8a15..37d0a07 100644 --- a/FAQ.md +++ b/FAQ.md @@ -445,9 +445,9 @@ A: Yes. How you configure it depends on what you mean by "promiscuous A: Firstly, you must have a DPDK-enabled version of Open vSwitch. - If your version is DPDK-enabled it will support the --dpdk - argument on the command line and will display lines with - "EAL:..." during startup when --dpdk is supplied. + If your version is DPDK-enabled it will support the other-config:dpdk-init + configuration in the database and will display lines with "EAL:..." + during startup when other_config:dpdk-init is set to 'true'. Secondly, when adding a DPDK port, unlike a system port, the type for the interface must be specified. For example; diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 7f76df8..3cc283a 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -138,22 +138,66 @@ Using the DPDK with ovs-vswitchd: 5. Start vswitchd: - DPDK configuration arguments can be passed to vswitchd via `--dpdk` - argument. This needs to be first argument passed to vswitchd process. - dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter - for dpdk initialization. + DPDK configuration arguments can be passed to vswitchd via Open_vSwitch + other_config column. The recognized configuration options are listed. + Defaults will be provided for all values not explicitly set. + + * dpdk-init + Specifies whether OVS should initialize and support DPDK ports. This is + a boolean, and defaults to false. + + * dpdk-lcore-mask + Specifies the CPU cores on which dpdk lcore threads should be spawned. + The DPDK lcore threads are used for DPDK library tasks, such as + library internal message processing, logging, etc. Value should be in + the form of a hex string (so '0x123') similar to the 'taskset' mask + input. + If not specified, the value will be determined by choosing the lowest + CPU core from initial cpu affinity list. Otherwise, the value will be + passed directly to the DPDK library. + For performance reasons, it is best to set this to a single core on + the system, rather than allow lcore threads to float. + + * dpdk-mem-channels + This sets the number of memory spread channels per CPU socket. It is purely + an optimization flag. + + * dpdk-alloc-mem + This sets the total memory to preallocate from hugepages regardless of + processor socket. It is recommended to use dpdk-socket-mem instead. + + * dpdk-socket-mem + Comma separated list of memory to pre-allocate from hugepages on specific + sockets. + + * dpdk-hugepage-dir + Directory where hugetlbfs is mounted + + * cuse-dev-name + Option to set the vhost_cuse character device name. + + * vhost-sock-dir + Option to set the path to the vhost_user unix socket files. + + NOTE: Changing any of these options requires restarting the ovs-vswitchd + application. + + Open vSwitch can be started as normal. DPDK will be initialized as long + as the dpdk-init option has been set to 'true'. + ``` export DB_SOCK=/usr/local/var/run/openvswitch/db.sock - ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true + ovs-vswitchd unix:$DB_SOCK --pidfile --detach ``` If allocated more than one GB hugepage (as for IVSHMEM), set amount and use NUMA node 0 memory: ``` - ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \ - -- unix:$DB_SOCK --pidfile --detach + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0" + ovs-vswitchd
Re: [ovs-dev] [patch_v3] vtep: add source node replication support.
Darrell, I don’t think you can handle the error of unsupported replication mode the way it is described here: + +Indicates that the requested source node replication mode cannot be +supported by the physical switch; this specifically means in this +context that the physical switch lacks the capability to support +source node replication mode. This error occurs when a controller +attempts to set source node replication mode for one of the logical +switches that the physical switch is keeping context for. If this +error occurs, logical switches continue to use service node +replication mode. + If a given logical switch spans 2 or more physical switches, and one of them can’t support source mode replication, but the other one can, then you’ll end up with the logical switch in a mixed state, with one switch falling back to service node mode and the other staying in source node mode. At this point I don’t know what happens, but clearly the controller will not be able to report what mode the logical switch is actually in (short of introducing an explicit “mixed mode”). I propose that the last sentence read: An NVC that observes this error condition should take appropriate action (e.g. reverting the logical switch to service node replication mode). Regarding the description of the alt_replication_mode column, I think it could be clearer. Suggested wording follows: For handling broadcast, multicast (in default manner) and unknown unicast traffic, packets can be sent to all members of a logical switch. There are different modes to replicate the packets. The default mode of replication is to send the traffic to a service node, which can be a hypervisor, server or appliance, and let the service node handle replication to other transport nodes (hypervisors or other VTEP physical switches). This mode is called service node replication. An alternate mode of replication, called source node replication involves the source node sending to all other transport nodes. Hypervisors are always responsible for doing their own replication for locally attached VMs in both modes. Service node mode is the default (and was the only option for prior versions of this schema). Source node mode is an alternate replication mode that may be configured using this column. This column (if used) configures the alternate replication mode for this . There is one valid value presently, source_node. Finally, it’s not strictly necessary to give this group such a long name. It’s a per-logical switch feature by definition (it exists in the table where per logical switch information is configured) so you could shorten the name as follows: http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] Hallo
Hallo , Es tut mir leid, Ihnen in dieser Angelegenheit ohne Ihre vorherige Zustimmung zu kontaktieren. Es war aufgrund der Dringlichkeit befestigt. Mein Name ist Eduardus Charlemagne, Ich bin ein Rechtsanwalt und die persönlichen Anwalt zu einer verstorbenen kunde aus Ihre Land. Ich möchte wissen, ob wir zusammenarbeiten können? Seit dem Tod meines Klient hat sein Rechnung mit dem Gleichgewicht von US$ 18,7 Mio. [ Achtzehn Millionen siebenhunderttausend Dollar ] ruhend gewesen mit dem Bank hier in Togo. Der Verstorbene Kunde war ein ausländischer Kunde, der zusammen mit seiner ganzen Familie starb im Autounfall. Ich bin Kontaktaufnahme mit Ihnen, weil Sie den gleichen Nachnamen tragen und kann daher als nächsten Angehörigen des verstorbenes vorgestellt. Mit meiner Unterstützung können Sie von der Bank als nächsten Angehörigen genehmigt sein. Wenn Sie wirklich daran interessiert sind, schicken Sie mir Ihre Details unten: Ihr vollständiger Name: . Ihre Adresse: Beruf: Ihr Familienstand: ... Das Mobiltelefon und Fax: .. Mit freundlichen Grüßen, Eduardus Charlemagne , Esq. Tel: 00228 93501904 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] datapath-windows: Improved offloading on STT tunnel
Hosts after Windows 7 use LSOv2 by default. The issue is with using Windows XP VM on a Win2012R2 host. The latter will be forced to use LSOv1 since XP doesn't support V2. We discussed about this in the previous IRC meeting. You can check the history of the irc channel for more info. Regards, Sairam Venugopal On Apr 26, 2016, at 9:08 AM, Alin Serdean wrote: >>> -UINT32 encapMss = OvsGetExternalMtu(switchContext) - sizeof(IPHdr) - >>> sizeof(TCPHdr); >>> +UINT32 encapMss = OvsGetExternalMtu(switchContext) >>> + - sizeof(IPHdr) >>> + - sizeof(TCPHdr); >>>if (ipTotalLen > encapMss) { >>> -lsoInfo.Value = 0; >>> -lsoInfo.LsoV2Transmit.TcpHeaderOffset = tcpHeaderOffset; >>> -lsoInfo.LsoV2Transmit.MSS = encapMss; >>> -lsoInfo.LsoV2Transmit.Type = >> NDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE; >>> -lsoInfo.LsoV2Transmit.IPVersion = >>> NDIS_TCP_LARGE_SEND_OFFLOAD_IPv4; >>> +outerIpHdr->check = IPChecksum((UINT8 *)outerIpHdr, >>> + sizeof *outerIpHdr, 0); >>> +outerTcpHdr->check = IPPseudoChecksum(&fwdInfo->srcIpAddr, >>> + (uint32 *) &tunKey->dst, >>> + IPPROTO_TCP, (uint16) >>> + 0); >>> + >> >> Sai: We discussed about this in the Hyper-V IRC meeting. You can get rid of >> the V1 type since we can¹t enforce the type of an inner VM on outer host. > [Alin Gabriel Serdean: ] My question would be the following. What happens if > the miniport (the actual host adapter) does not support LSO V2. > > Alin. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] debian, rhel: Ship ovs shared libraries and header files
On Fri, Apr 15, 2016 at 01:28:18PM -0700, ec...@vmware.com wrote: > From: Edwin Chiu > > Compile and package ovs shared libraries and create new header > package for debian (openvswitch-dev) and rhel (openvswitch-devel). > > VMware-BZ: #1556299 > Signed-off-by: Edwin Chiu > Co-authored-by: Harold Lim > --- > debian/automake.mk|1 + > debian/control| 16 > debian/openvswitch-common.install |1 + > debian/openvswitch-dev.install| 11 +++ > debian/rules |4 +++- > rhel/openvswitch.spec.in | 38 > +++-- > 6 files changed, 68 insertions(+), 3 deletions(-) > create mode 100644 debian/openvswitch-dev.install > > diff --git a/debian/automake.mk b/debian/automake.mk > index de2350c..7b5b3aa 100644 > --- a/debian/automake.mk > +++ b/debian/automake.mk > @@ -18,6 +18,7 @@ EXTRA_DIST += \ > debian/openvswitch-datapath-source.copyright \ > debian/openvswitch-datapath-source.dirs \ > debian/openvswitch-datapath-source.install \ > + debian/openvswitch-dev.install \ > debian/openvswitch-ipsec.dirs \ > debian/openvswitch-ipsec.init \ > debian/openvswitch-ipsec.install \ > diff --git a/debian/control b/debian/control > index 2918d06..2c07a73 100644 > --- a/debian/control > +++ b/debian/control > @@ -288,3 +288,19 @@ Description: Open vSwitch VTEP utilities > . > This package provides utilities that are useful to interact with a > VTEP-configured database and a VTEP emulator. > + > +Package: openvswitch-dev > +Architecture: linux-any > +Depends: > + openvswitch-common (>= ${binary:Version}), > + ${misc:Depends} > +Description: Open vSwitch development package > + Open vSwitch is a production quality, multilayer, software-based, Ethernet > + virtual switch. It is designed to enable massive network automation through > + programmatic extension, while still supporting standard management > interfaces > + and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In > + addition, it is designed to support distribution across multiple physical > + servers similar to VMware's vNetwork distributed vswitch or Cisco's Nexus > + 1000V. > + . > + This package provides openvswitch headers and libopenvswitch for developers. > diff --git a/debian/openvswitch-common.install > b/debian/openvswitch-common.install > index 3264ea5..ebb7d5c 100644 > --- a/debian/openvswitch-common.install > +++ b/debian/openvswitch-common.install > @@ -9,3 +9,4 @@ usr/sbin/ovs-bugtool > usr/share/openvswitch/bugtool-plugins > usr/share/openvswitch/scripts/ovs-bugtool-* > usr/share/openvswitch/scripts/ovs-lib > +usr/lib/lib*.so.* > diff --git a/debian/openvswitch-dev.install b/debian/openvswitch-dev.install > new file mode 100644 > index 000..7a34765 > --- /dev/null > +++ b/debian/openvswitch-dev.install > @@ -0,0 +1,11 @@ > +usr/lib/lib*.so > +usr/lib/lib*.a > +usr/lib/pkgconfig > +include/*.h usr/local/include/openvswitch > +include/openflow/*.h usr/local/include/openvswitch/openflow > +include/openvswitch/*.h usr/local/include/openvswitch/openvswitch > +include/sparse/*.h usr/local/include/openvswitch/sparse > +include/sparse/arpa/*.h usr/local/include/openvswitch/sparse/arpa > +include/sparse/netinet/*.h usr/local/include/openvswitch/sparse/netinet > +include/sparse/sys/*.h usr/local/include/openvswitch/sparse/sys > +lib/*.h usr/local/include/openvswitch/lib > diff --git a/debian/rules b/debian/rules > index 2a70bd6..64bc779 100755 > --- a/debian/rules > +++ b/debian/rules > @@ -30,7 +30,7 @@ override_dh_autoreconf: > dh_autoreconf $(DH_AS_NEEDED) > > override_dh_auto_configure: > - dh_auto_configure -- --enable-ssl $(DATAPATH_CONFIGURE_OPTS) > + dh_auto_configure -- --enable-ssl --enable-shared > $(DATAPATH_CONFIGURE_OPTS) > > override_dh_auto_test: > ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS))) > @@ -91,3 +91,5 @@ override_dh_installinit: > > override_dh_strip: > dh_strip --dbg-package=openvswitch-dbg > + > +override_dh_usrlocal: > diff --git a/rhel/openvswitch.spec.in b/rhel/openvswitch.spec.in > index b86f5db..8a47367 100644 > --- a/rhel/openvswitch.spec.in > +++ b/rhel/openvswitch.spec.in > @@ -32,11 +32,19 @@ Open vSwitch provides standard network bridging functions > and > support for the OpenFlow protocol for remote per-flow control of > traffic. > > +%package devel > +Summary:Open vSwitch development package > +Group: Development/Libraries > + > +%description devel > +This package provides openvswitch headers and libopenvswitch for developers. > + > %prep > %setup -q > > %build > -./configure --prefix=/usr --sysconfdir=/etc > --localstatedir=%{_localstatedir} --enable-ssl > +./configure --prefix=/usr --sysconfdir=/etc > --localstatedir=%{_localstatedir} \ > +--libdir=%{_libdir} --enable-ssl --enable-shared > make %{_smp_mflags} > > %install > @@ -69,11 +77,30 @@ rm
Re: [ovs-dev] [PATCH] datapath-windows: Add ICMP types in NetProto.h
Just a small nit: we could use the values from ipmib.h/netiodef.h instead of defining our own types. > -Mesaj original- > De la: dev [mailto:dev-boun...@openvswitch.org] În numele Sairam > Venugopal > Trimis: Thursday, April 14, 2016 11:23 PM > Către: dev@openvswitch.org > Subiect: [ovs-dev] [PATCH] datapath-windows: Add ICMP types in > NetProto.h > > Update NetProto.h to include ICMP and ICMPv6 types. Update ICMP header > to keep it consistent with KVM. Add UDP and ICMP min length definitions. > > Signed-off-by: Sairam Venugopal > --- ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 0/8] netlink: align attributes when needed (patchset #3)
From: Lars Ellenberg Date: Tue, 26 Apr 2016 13:54:27 +0200 > On Tue, Apr 26, 2016 at 10:06:10AM +0200, Nicolas Dichtel wrote: >> >> This is the continuation (series #3) of the work done to align netlink >> attributes when these attributes contain some 64-bit fields. >> >> It's the last patchset from what I've seen. >> >> The last user of nla_put_u64() is block/drbd. This module does not use >> standard netlink API (see all the stuff in include/linux/genl_magic_struct.h >> and include/linux/genl_magic_func.h). I didn't modify it because it's seems >> hard to do it whithout testing and fully understanding the context > > Something like this should just work. Unfortunately we had problems using unspec, that's why an explicit new padding attribute is added for each netlink attribute set. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 3/8] fs/quota: use nla_put_u64_64bit()
From: Jan Kara Date: Tue, 26 Apr 2016 13:08:48 +0200 > On Tue 26-04-16 10:06:13, Nicolas Dichtel wrote: >> Signed-off-by: Nicolas Dichtel > > OK, so I somewhat miss a description of what will this do to the netlink > message so that I can judge whether the change is fine for the userspace > counterpart parsing these messages. AFAIU this changes the message format > by adding a QUOTA_NL_A_PAD field before each 64-bit field which needs an > alignment, am I guessing right? Thus when the userspace counterpart uses > genlmsg_parse() it should just silently ignore these attributes if I read > the documentation right. Did I understand this correctly? All userspace components using netlink should always ignore attributes they do not recognize in dumps. This is one of the most basic principles of netlink. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 0/8] netlink: align attributes when needed (patchset #3)
From: Nicolas Dichtel Date: Tue, 26 Apr 2016 10:06:10 +0200 > The last user of nla_put_u64() is block/drbd. This module does not use > standard netlink API (see all the stuff in include/linux/genl_magic_struct.h > and include/linux/genl_magic_func.h). Yet another example where doing things in a special unique way creates headaches and pain for everyone... sigh. > I didn't modify it because it's seems hard to do it whithout testing > and fully understanding the context (for example, why > include/linux/drbd_genl.h is not part of uapi?). Any thoughts? I think you'll need to work with the drbd maintainer(s) to resolve this and test the result. Series applied, thanks. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] datapath-windows: Validate netlink packets integrity
Beside the alignment problems, looks good to me. Acked-by: Alin Gabriel Serdean Thanks, Alin. > -Mesaj original- > De la: dev [mailto:dev-boun...@openvswitch.org] În numele Paul Boca > Trimis: Monday, April 25, 2016 2:16 PM > Către: dev@openvswitch.org > Subiect: [ovs-dev] [PATCH] datapath-windows: Validate netlink packets > integrity > > Solved access violation when trying to acces netling message - obtained with > forged IOCTLs > > Signed-off-by: Paul-Daniel Boca > --- > datapath-windows/ovsext/Datapath.c| 45 +--- > datapath-windows/ovsext/Flow.c| 42 ++- > datapath-windows/ovsext/Netlink/Netlink.c | 68 > ++- datapath- > windows/ovsext/Netlink/Netlink.h | 15 +-- > datapath-windows/ovsext/User.c| 5 ++- > datapath-windows/ovsext/Vport.c | 34 > lib/netlink-socket.c | 2 + > 7 files changed, 154 insertions(+), 57 deletions(-) > > diff --git a/datapath-windows/ovsext/Datapath.c b/datapath- > windows/ovsext/Datapath.c > index 0a25af0..6f97693 100644 > --- a/datapath-windows/ovsext/Datapath.c > +++ b/datapath-windows/ovsext/Datapath.c > @@ -307,6 +307,7 @@ static NTSTATUS MapIrpOutputBuffer(PIRP irp, static > NTSTATUS ValidateNetlinkCmd(UINT32 devOp, > POVS_OPEN_INSTANCE instance, > POVS_MESSAGE ovsMsg, > + UINT32 ovsMgsLength, > NETLINK_FAMILY *nlFamilyOps); static > NTSTATUS > InvokeNetlinkCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx, > NETLINK_FAMILY *nlFamilyOps, @@ > -693,6 +694,7 @@ > OvsDeviceControl(PDEVICE_OBJECT deviceObject, > UINT32 devOp; > OVS_MESSAGE ovsMsgReadOp; > POVS_MESSAGE ovsMsg; > +UINT32 ovsMsgLength = 0; > NETLINK_FAMILY *nlFamilyOps; > OVS_USER_PARAMS_CONTEXT usrParamsCtx; > > @@ -772,6 +774,7 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject, > } > > ovsMsg = inputBuffer; > +ovsMsgLength = inputBufferLen; > devOp = OVS_TRANSACTION_DEV_OP; > break; > > @@ -806,6 +809,7 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject, >OVS_CTRL_CMD_EVENT_NOTIFY : >OVS_CTRL_CMD_READ_NOTIFY; > ovsMsg->genlMsg.version = nlControlFamilyOps.version; > +ovsMsgLength = outputBufferLen; > > devOp = OVS_READ_DEV_OP; > break; > @@ -851,6 +855,7 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject, > > /* Create an NL message for consumption. */ > ovsMsg = &ovsMsgReadOp; > +ovsMsgLength = sizeof (ovsMsgReadOp); > devOp = OVS_READ_DEV_OP; > > break; > @@ -862,7 +867,21 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject, > goto done; > } > > +/* > + * Output buffer not mandatory but map it in case we have something > + * to return to requester. > +*/ > +if (outputBufferLen != 0) { > +status = MapIrpOutputBuffer(irp, outputBufferLen, > +sizeof *ovsMsg, &outputBuffer); > +if (status != STATUS_SUCCESS) { > +goto done; > +} > +ASSERT(outputBuffer); > +} > + > ovsMsg = inputBuffer; > +ovsMsgLength = inputBufferLen; > devOp = OVS_WRITE_DEV_OP; > break; > > @@ -901,7 +920,8 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject, > * "artificial" or was copied from a previously validated 'ovsMsg'. > */ > if (devOp != OVS_READ_DEV_OP) { > -status = ValidateNetlinkCmd(devOp, instance, ovsMsg, nlFamilyOps); > +status = ValidateNetlinkCmd(devOp, instance, ovsMsg, > +ovsMsgLength, nlFamilyOps); > if (status != STATUS_SUCCESS) { > goto done; > } > @@ -936,11 +956,18 @@ static NTSTATUS > ValidateNetlinkCmd(UINT32 devOp, > POVS_OPEN_INSTANCE instance, > POVS_MESSAGE ovsMsg, > + UINT32 ovsMsgLength, > NETLINK_FAMILY *nlFamilyOps) { > NTSTATUS status = STATUS_INVALID_PARAMETER; > UINT16 i; > > +// We need to ensure we have enough data to process > +if (NlMsgSize(&ovsMsg->nlMsg) > ovsMsgLength) { > +status = STATUS_INVALID_PARAMETER; > +goto done; > +} > + > for (i = 0; i < nlFamilyOps->opsCount; i++) { > if (nlFamilyOps->cmds[i].cmd == ovsMsg->genlMsg.cmd) { > /* Validate if the command is valid for the device operation. */ > @@ - > 975,6 +1002,14 @@ ValidateNetlinkCmd(UINT32 devOp, > } > } > > +// validate all NlAttrs > +if (!NlValidateAllAttrs(&ovsMsg->nlMsg, sizeof(*ovsMsg), > +NlMsgAttrsLen((PNL_MSG_HDR)o
Re: [ovs-dev] [PATCH v2] datapath-windows: Improved offloading on STT tunnel
> >-UINT32 encapMss = OvsGetExternalMtu(switchContext) - sizeof(IPHdr) - > >sizeof(TCPHdr); > >+UINT32 encapMss = OvsGetExternalMtu(switchContext) > >+ - sizeof(IPHdr) > >+ - sizeof(TCPHdr); > > if (ipTotalLen > encapMss) { > >-lsoInfo.Value = 0; > >-lsoInfo.LsoV2Transmit.TcpHeaderOffset = tcpHeaderOffset; > >-lsoInfo.LsoV2Transmit.MSS = encapMss; > >-lsoInfo.LsoV2Transmit.Type = > NDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE; > >-lsoInfo.LsoV2Transmit.IPVersion = > >NDIS_TCP_LARGE_SEND_OFFLOAD_IPv4; > >+outerIpHdr->check = IPChecksum((UINT8 *)outerIpHdr, > >+ sizeof *outerIpHdr, 0); > >+outerTcpHdr->check = IPPseudoChecksum(&fwdInfo->srcIpAddr, > >+ (uint32 *) &tunKey->dst, > >+ IPPROTO_TCP, (uint16) > >+ 0); > >+ > > Sai: We discussed about this in the Hyper-V IRC meeting. You can get rid of > the V1 type since we can¹t enforce the type of an inner VM on outer host. > [Alin Gabriel Serdean: ] My question would be the following. What happens if the miniport (the actual host adapter) does not support LSO V2. Alin. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] SAI ofproto provider for OvS
On Mon, Apr 25, 2016 at 06:07:36PM +0200, Petr Machata wrote: > Ben Pfaff writes: > > >> > want to make sure that the additional providers can be built without > >> > installing code that is available only under NDA. > >> > >> That is understood. The SAI backend will be optional. > > > > That's not exactly what I mean. I mean, there must be a way for users > > to compile your new backend without installing code that is available > > under NDA. To put it another way, if I cannot compile your new code > > without signing an agreement with you, then that is unacceptable and we > > will not take the patches. > > Sorry, this fell through the cracks, so I'm only getting to it now. > > SAI API is open source, so no NDAs would be necessary. Great, thanks. I'll look forwarding to reviewing patches when they become available. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v9 06/15] hmap: Add HMAP_FOR_EACH_POP.
On Fri, Apr 22, 2016 at 06:02:58PM -0700, Daniele Di Proietto wrote: > Makes popping each member of the hmap a bit easier. > > Signed-off-by: Daniele Di Proietto Much better, thank you! Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] datapath-windows: Pause switch state on PnP event
On Thu, Mar 10, 2016 at 01:33:42PM +, Alin Serdean wrote: > A PnP(plug and play) event will be triggered before trying to disable > the extension. We could use this PnP event to prepare for detaching > the datapath. > > This patch sets the switch into a paused state so no more net buffers > are queued. > > Also clean some commentaries. > > Signed-off-by: Alin Gabriel Serdean > --- > v2: Address comments Applied, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 0/2] ovn: QOS updates with DSCP support
On Mon, Apr 25, 2016 at 04:41:40PM +0530, Babu Shanmugam wrote: > On Friday 22 April 2016 10:51 PM, Ben Pfaff wrote: > >On Fri, Apr 22, 2016 at 12:44:12PM +0530, bscha...@redhat.com wrote: > >>From: Babu Shanmugam > >> > >>Following are done through this series > >>1. Changed the old approach of policing the packets. It is now shaped > >>with queues. Changed the Logical_Port options for NB db > >>2. Support of DSCP marking through options field in Logical_Port table > >> > >>Babu Shanmugam (2): > >> ovn: Replace the QOS policing parameters with the usage of QOS table > >> ovn: QOS DSCP markings for ports > >Have you tested this? There are at least two aspects that seem relevant > >to testing. First, propagating queuing through tunnels is somewhat > >indirect and one needs to make sure that the QoS configuration actually > >makes it to the physical device. Second, HTB has a reputation for poor > >quality for links above about 1 Gbps, which isn't very fast > >anymore--that's why we also support HFSC. > Ben, I have not tested these aspects. The reason I used HTB is mainly > because it supports > burst setting. From vswithc.conf.db man page, HFSC does not seem to have an > option > for burst setting. > I could not understand how "propagating queuing through tunnels is somewhat > indirect". I can test it if you > can give some more information on the problem. Usually for shaping it only makes sense to configure it on the physical NIC network device. Does your series do that? If you haven't tested it, it's hard for me to imagine it working. Why is burst setting valuable? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] datapath-windows: Pause switch state on PnP event
Acked-by: Nithin Raju -Original Message- From: dev on behalf of Sairam Venugopal Date: Monday, April 25, 2016 at 2:38 PM To: Alin Serdean , "dev@openvswitch.org" Subject: Re: [ovs-dev] [PATCH v2] datapath-windows: Pause switch state on PnP event >Thanks for incorporating the review comments. I noticed that V2 hasn¹t >been ACKED yet. > >Acked-by: Sairam Venugopal > > > >On 3/10/16, 5:33 AM, "Alin Serdean" >wrote: > >>A PnP(plug and play) event will be triggered before trying to disable >>the extension. We could use this PnP event to prepare for detaching >>the datapath. >> >>This patch sets the switch into a paused state so no more net buffers >>are queued. >> >>Also clean some commentaries. >> >>Signed-off-by: Alin Gabriel Serdean >>--- >>v2: Address comments >>--- >> datapath-windows/ovsext/Switch.c | 8 ++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >>diff --git a/datapath-windows/ovsext/Switch.c >>b/datapath-windows/ovsext/Switch.c >>index 77bafb4..2a19988 100644 >>--- a/datapath-windows/ovsext/Switch.c >>+++ b/datapath-windows/ovsext/Switch.c >>@@ -345,14 +345,13 @@ OvsExtPause(NDIS_HANDLE filterModuleContext, >> OVS_LOG_TRACE("Enter: filterModuleContext %p", >> filterModuleContext); >> >>-ASSERT(switchContext->dataFlowState == OvsSwitchRunning); >> switchContext->dataFlowState = OvsSwitchPaused; >> KeMemoryBarrier(); >> while(switchContext->pendingOidCount > 0) { >> NdisMSleep(1000); >> } >> >>-OVS_LOG_TRACE("Exit: OvsDetach Successfully"); >>+OVS_LOG_TRACE("Exit: OvsExtPause Successfully"); >> return NDIS_STATUS_SUCCESS; >> } >> >>@@ -605,6 +604,11 @@ OvsExtNetPnPEvent(NDIS_HANDLE filterModuleContext, >> } >> } >> >>+if (netPnPEvent->NetPnPEvent.NetEvent == NetEventFilterPreDetach) { >>+switchContext->dataFlowState = OvsSwitchPaused; >>+KeMemoryBarrier(); >>+} >>+ >> status = NdisFNetPnPEvent(switchContext->NdisFilterHandle, >> netPnPEvent); >> OVS_LOG_TRACE("Exit: OvsExtNetPnPEvent"); >>-- >>1.9.5.msysgit.0 >>___ >>dev mailing list >>dev@openvswitch.org >>https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailm >>an_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r= >>pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=PAT4l_htdzZf5en9YHeXbuJ_V8b >>cEK9etu52fQdn03k&s=-8S9mRqroWfEtzpHQOqA0GTM5IWhnuHVihkZjcuEe0c&e= > >___ >dev mailing list >dev@openvswitch.org >https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailma >n_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=pN >HQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=PAT4l_htdzZf5en9YHeXbuJ_V8bcEK >9etu52fQdn03k&s=-8S9mRqroWfEtzpHQOqA0GTM5IWhnuHVihkZjcuEe0c&e= ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 3/8] fs/quota: use nla_put_u64_64bit()
On Tue 26-04-16 14:31:58, Nicolas Dichtel wrote: > Le 26/04/2016 13:08, Jan Kara a écrit : > > On Tue 26-04-16 10:06:13, Nicolas Dichtel wrote: > >> Signed-off-by: Nicolas Dichtel > > > > OK, so I somewhat miss a description of what will this do to the netlink > > message so that I can judge whether the change is fine for the userspace > > counterpart parsing these messages. AFAIU this changes the message format > > by adding a QUOTA_NL_A_PAD field before each 64-bit field which needs an > > alignment, am I guessing right? Thus when the userspace counterpart uses > > genlmsg_parse() it should just silently ignore these attributes if I read > > the documentation right. Did I understand this correctly? > Yes, that's it. OK, then feel free to add: Acked-by: Jan Kara to the quota patch. Honza -- Jan Kara SUSE Labs, CR ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 3/8] fs/quota: use nla_put_u64_64bit()
Le 26/04/2016 13:08, Jan Kara a écrit : > On Tue 26-04-16 10:06:13, Nicolas Dichtel wrote: >> Signed-off-by: Nicolas Dichtel > > OK, so I somewhat miss a description of what will this do to the netlink > message so that I can judge whether the change is fine for the userspace > counterpart parsing these messages. AFAIU this changes the message format > by adding a QUOTA_NL_A_PAD field before each 64-bit field which needs an > alignment, am I guessing right? Thus when the userspace counterpart uses > genlmsg_parse() it should just silently ignore these attributes if I read > the documentation right. Did I understand this correctly? Yes, that's it. Regards, Nicolas ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [Drbd-dev] [PATCH net-next 0/8] netlink: align attributes when needed (patchset #3)
On Tue, Apr 26, 2016 at 01:54:27PM +0200, Lars Ellenberg wrote: > On Tue, Apr 26, 2016 at 10:06:10AM +0200, Nicolas Dichtel wrote: > > > > This is the continuation (series #3) of the work done to align netlink > > attributes when these attributes contain some 64-bit fields. > > > > It's the last patchset from what I've seen. > > > > The last user of nla_put_u64() is block/drbd. This module does not use > > standard netlink API (see all the stuff in include/linux/genl_magic_struct.h > > and include/linux/genl_magic_func.h). I didn't modify it because it's seems > > hard to do it whithout testing and fully understanding the context > > Something like this should just work. > + * @attrtype: attribute type > + * @value: numeric value > + */ > +static inline int nla_put_u64_64bit_unspec(struct sk_buff *skb, int attrtype, > + u64 value) > +{ > + return nla_put_64bit(skb, attrtype, sizeof(u64), &value, NLA_UNSPEC); Ok, I confused attribute and policy type there for a sec. Anyways, 0 works just fine, all our nested attribute enums are != 0, because nla_parse skips type 0. Lars ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 0/8] netlink: align attributes when needed (patchset #3)
On Tue, Apr 26, 2016 at 10:06:10AM +0200, Nicolas Dichtel wrote: > > This is the continuation (series #3) of the work done to align netlink > attributes when these attributes contain some 64-bit fields. > > It's the last patchset from what I've seen. > > The last user of nla_put_u64() is block/drbd. This module does not use > standard netlink API (see all the stuff in include/linux/genl_magic_struct.h > and include/linux/genl_magic_func.h). I didn't modify it because it's seems > hard to do it whithout testing and fully understanding the context Something like this should just work. diff --git a/include/linux/genl_magic_struct.h b/include/linux/genl_magic_struct.h index eecd19b..5715dac 100644 --- a/include/linux/genl_magic_struct.h +++ b/include/linux/genl_magic_struct.h @@ -80,7 +80,7 @@ extern void CONCAT_(GENL_MAGIC_FAMILY, _genl_unregister)(void); nla_get_u32, nla_put_u32, true) #define __u64_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U64, __u64, \ - nla_get_u64, nla_put_u64, false) + nla_get_u64, nla_put_u64_64bit_unspec, false) #define __str_field(attr_nr, attr_flag, name, maxlen) \ __array(attr_nr, attr_flag, name, NLA_NUL_STRING, char, maxlen, \ nla_strlcpy, nla_put, false) diff --git a/include/net/netlink.h b/include/net/netlink.h index e589cb3..38ba770 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -871,6 +871,18 @@ static inline int nla_put_u64_64bit(struct sk_buff *skb, int attrtype, } /** + * nla_put_u64_64bit_unspec - nla_put_u64_64bit() with padattr = 0 + * @skb: socket buffer to add attribute to + * @attrtype: attribute type + * @value: numeric value + */ +static inline int nla_put_u64_64bit_unspec(struct sk_buff *skb, int attrtype, + u64 value) +{ + return nla_put_64bit(skb, attrtype, sizeof(u64), &value, NLA_UNSPEC); +} + +/** * nla_put_be64 - Add a __be64 netlink attribute to a socket buffer and align it * @skb: socket buffer to add attribute to * @attrtype: attribute type > (for > example, why include/linux/drbd_genl.h is not part of uapi?). > Any thoughts? probably should be. Thanks, Lars ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] BORSA DI STUDIO CANADA (2016-2017)
Tramite la direzione regionale delle borse straniere canadese, il segretariato di stato in esame ed alla ricerca dell'università LAVAL lancia una chiamata alla candidatura per 400 borse di studio canadese di l'anno accademico 2016-2017. Queste borse sono destinate ai cittadini dei paesi della categoria A (paesi industrializzati europei, ed extra europei) e quelli dei paesi della categoria B (paese in via di sviluppo, del terzo mondo ed extra - europei) devono permettere loro di proseguire i loro studi, perfezionare le loro conoscenze per i lavori di ricerche nei domini ai quali l'università Laval presta un'attenzione particolare. SPECIFICITÀ DELLA BORSA: - L'università Laval intende facilitare l'immigrazione alle persone desiderose di proseguire i loro studi e conseguire diplomi di stato canadese. - I candidati trattenuti al termine della selezione di candidature saranno inseriti oltre al loro studio nei settori sensibili della vita economica e sociale del Canada: (salute, diritto, diplomazia, comunicazione, finanze, energia, industria, trasporto, agricoltura…). Quest'opzione dell'università Laval mira a dare un'attitudine professionale agli operatori di borsa per potere lavorare se lo desiderano in Canada alla fine della loro formazione. DURATA DELLA BORSA: Le borse coprono il periodo di un ciclo di formazione o al massimo sei (06) semestri vedere più. SPESE DI VIAGGIO: I biglietti di aereo andata e ritorno (origine – Canada Quebec), sono compito della direzione regionale delle borse straniere canadese. Condizioni preliminari alla candidatura di norma, i candidati alle borse straniere canadese devono: - Età necessaria 18 anni a 60 anni; - Comprendere e parlare correttamente una delle lingue d'insegnamento anno Canada (inglesi, francesi, spagnolo, tedesco, italiano); - Avere un diploma equivalente al brevetto di studio di primo ciclo d'insegnamento, alla maturità o al brevetto d'attitudine professionale dei paesi dell'unione europea. PROCEDURA DI SELEZIONE: - Volete fare la vostra domanda di borsa richiedendo il formulario d'iscrizione da riempire a invia all'indirizzo smalto del nostro segretariato: boursedetudecan...@usa.com Buona possibilità. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH net-next 3/8] fs/quota: use nla_put_u64_64bit()
On Tue 26-04-16 10:06:13, Nicolas Dichtel wrote: > Signed-off-by: Nicolas Dichtel OK, so I somewhat miss a description of what will this do to the netlink message so that I can judge whether the change is fine for the userspace counterpart parsing these messages. AFAIU this changes the message format by adding a QUOTA_NL_A_PAD field before each 64-bit field which needs an alignment, am I guessing right? Thus when the userspace counterpart uses genlmsg_parse() it should just silently ignore these attributes if I read the documentation right. Did I understand this correctly? Honza > --- > fs/quota/netlink.c | 12 +++- > include/uapi/linux/quota.h | 1 + > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/fs/quota/netlink.c b/fs/quota/netlink.c > index d07a2f91d858..8b252673d454 100644 > --- a/fs/quota/netlink.c > +++ b/fs/quota/netlink.c > @@ -47,7 +47,7 @@ void quota_send_warning(struct kqid qid, dev_t dev, > void *msg_head; > int ret; > int msg_size = 4 * nla_total_size(sizeof(u32)) + > -2 * nla_total_size(sizeof(u64)); > +2 * nla_total_size_64bit(sizeof(u64)); > > /* We have to allocate using GFP_NOFS as we are called from a >* filesystem performing write and thus further recursion into > @@ -68,8 +68,9 @@ void quota_send_warning(struct kqid qid, dev_t dev, > ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, qid.type); > if (ret) > goto attr_err_out; > - ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, > - from_kqid_munged(&init_user_ns, qid)); > + ret = nla_put_u64_64bit(skb, QUOTA_NL_A_EXCESS_ID, > + from_kqid_munged(&init_user_ns, qid), > + QUOTA_NL_A_PAD); > if (ret) > goto attr_err_out; > ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype); > @@ -81,8 +82,9 @@ void quota_send_warning(struct kqid qid, dev_t dev, > ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR, MINOR(dev)); > if (ret) > goto attr_err_out; > - ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, > - from_kuid_munged(&init_user_ns, current_uid())); > + ret = nla_put_u64_64bit(skb, QUOTA_NL_A_CAUSED_ID, > + from_kuid_munged(&init_user_ns, current_uid()), > + QUOTA_NL_A_PAD); > if (ret) > goto attr_err_out; > genlmsg_end(skb, msg_head); > diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h > index 38baddb807f5..4d2489ef6f10 100644 > --- a/include/uapi/linux/quota.h > +++ b/include/uapi/linux/quota.h > @@ -191,6 +191,7 @@ enum { > QUOTA_NL_A_DEV_MAJOR, > QUOTA_NL_A_DEV_MINOR, > QUOTA_NL_A_CAUSED_ID, > + QUOTA_NL_A_PAD, > __QUOTA_NL_A_MAX, > }; > #define QUOTA_NL_A_MAX (__QUOTA_NL_A_MAX - 1) > -- > 2.8.1 > > -- Jan Kara SUSE Labs, CR ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 8/8] sched: align nlattr properly when needed
Signed-off-by: Nicolas Dichtel --- Documentation/networking/gen_stats.txt | 6 -- include/net/gen_stats.h | 6 -- include/uapi/linux/gen_stats.h | 1 + include/uapi/linux/pkt_cls.h| 2 ++ include/uapi/linux/rtnetlink.h | 1 + include/uapi/linux/tc_act/tc_bpf.h | 1 + include/uapi/linux/tc_act/tc_connmark.h | 1 + include/uapi/linux/tc_act/tc_csum.h | 1 + include/uapi/linux/tc_act/tc_defact.h | 1 + include/uapi/linux/tc_act/tc_gact.h | 1 + include/uapi/linux/tc_act/tc_ife.h | 1 + include/uapi/linux/tc_act/tc_ipt.h | 1 + include/uapi/linux/tc_act/tc_mirred.h | 1 + include/uapi/linux/tc_act/tc_nat.h | 1 + include/uapi/linux/tc_act/tc_pedit.h| 1 + include/uapi/linux/tc_act/tc_skbedit.h | 1 + include/uapi/linux/tc_act/tc_vlan.h | 1 + net/core/gen_stats.c| 35 - net/sched/act_api.c | 7 +-- net/sched/act_bpf.c | 3 ++- net/sched/act_connmark.c| 3 ++- net/sched/act_csum.c| 2 +- net/sched/act_gact.c| 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 2 +- net/sched/act_mirred.c | 2 +- net/sched/act_nat.c | 2 +- net/sched/act_pedit.c | 2 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 2 +- net/sched/act_vlan.c| 2 +- net/sched/cls_u32.c | 7 --- net/sched/sch_api.c | 6 -- 33 files changed, 72 insertions(+), 37 deletions(-) diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.txt index 70e6275b757a..ff630a87b511 100644 --- a/Documentation/networking/gen_stats.txt +++ b/Documentation/networking/gen_stats.txt @@ -33,7 +33,8 @@ my_dumping_routine(struct sk_buff *skb, ...) { struct gnet_dump dump; - if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump) < 0) + if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump, + TCA_PAD) < 0) goto rtattr_failure; if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 || @@ -56,7 +57,8 @@ existing TLV types. my_dumping_routine(struct sk_buff *skb, ...) { if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS, - TCA_XSTATS, &mystruct->lock, &dump) < 0) +TCA_XSTATS, &mystruct->lock, &dump, +TCA_PAD) < 0) goto rtattr_failure; ... } diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h index cbafa3768d48..610cd397890e 100644 --- a/include/net/gen_stats.h +++ b/include/net/gen_stats.h @@ -19,17 +19,19 @@ struct gnet_dump { /* Backward compatibility */ int compat_tc_stats; int compat_xstats; + int padattr; void *xstats; int xstats_len; struct tc_stats tc_stats; }; int gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock, - struct gnet_dump *d); + struct gnet_dump *d, int padattr); int gnet_stats_start_copy_compat(struct sk_buff *skb, int type, int tc_stats_type, int xstats_type, -spinlock_t *lock, struct gnet_dump *d); +spinlock_t *lock, struct gnet_dump *d, +int padattr); int gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_cpu __percpu *cpu, diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h index 6487317ea619..52deccc2128e 100644 --- a/include/uapi/linux/gen_stats.h +++ b/include/uapi/linux/gen_stats.h @@ -10,6 +10,7 @@ enum { TCA_STATS_QUEUE, TCA_STATS_APP, TCA_STATS_RATE_EST64, + TCA_STATS_PAD, __TCA_STATS_MAX, }; #define TCA_STATS_MAX (__TCA_STATS_MAX - 1) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index c43c5f78b9c4..84660905fedf 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -66,6 +66,7 @@ enum { TCA_ACT_OPTIONS, TCA_ACT_INDEX, TCA_ACT_STATS, + TCA_ACT_PAD, __TCA_ACT_MAX }; @@ -173,6 +174,7 @@ enum { TCA_U32_PCNT, TCA_U32_MARK, TCA_U32_FLAGS, + TCA_U32_PAD, __TCA_U32_MAX }; diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index a94e0b69c769..262f0379d83a 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -542,6 +542,7 @@ enum { TCA_FCNT, TCA_STATS2, TCA
[ovs-dev] [PATCH net-next 2/8] drivers/wireless: use nla_put_u64_64bit()
Signed-off-by: Nicolas Dichtel --- drivers/net/wireless/mac80211_hwsim.c | 2 +- drivers/net/wireless/mac80211_hwsim.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index c757f14c4c00..9ed0ed1bf514 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -1030,7 +1030,7 @@ static void mac80211_hwsim_tx_frame_nl(struct ieee80211_hw *hw, data->pending_cookie++; cookie = data->pending_cookie; info->rate_driver_data[0] = (void *)cookie; - if (nla_put_u64(skb, HWSIM_ATTR_COOKIE, cookie)) + if (nla_put_u64_64bit(skb, HWSIM_ATTR_COOKIE, cookie, HWSIM_ATTR_PAD)) goto nla_put_failure; genlmsg_end(skb, msg_head); diff --git a/drivers/net/wireless/mac80211_hwsim.h b/drivers/net/wireless/mac80211_hwsim.h index 66e1c73bd507..39f22467ca2a 100644 --- a/drivers/net/wireless/mac80211_hwsim.h +++ b/drivers/net/wireless/mac80211_hwsim.h @@ -148,6 +148,7 @@ enum { HWSIM_ATTR_RADIO_NAME, HWSIM_ATTR_NO_VIF, HWSIM_ATTR_FREQ, + HWSIM_ATTR_PAD, __HWSIM_ATTR_MAX, }; #define HWSIM_ATTR_MAX (__HWSIM_ATTR_MAX - 1) -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 4/8] sock_diag: align nlattr properly when needed
I also fix the value of INET_DIAG_MAX. It's wrong since commit 8f840e47f190 which is only in net-next right now, thus I didn't make a separate patch. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Nicolas Dichtel --- include/uapi/linux/inet_diag.h | 4 +++- net/core/sock_diag.c | 2 +- net/ipv4/inet_diag.c | 9 ++--- net/sctp/sctp_diag.c | 5 +++-- 4 files changed, 13 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h index f5f3629dd553..a16643705669 100644 --- a/include/uapi/linux/inet_diag.h +++ b/include/uapi/linux/inet_diag.h @@ -115,9 +115,11 @@ enum { INET_DIAG_SKV6ONLY, INET_DIAG_LOCALS, INET_DIAG_PEERS, + INET_DIAG_PAD, + __INET_DIAG_MAX, }; -#define INET_DIAG_MAX INET_DIAG_SKV6ONLY +#define INET_DIAG_MAX (__INET_DIAG_MAX - 1) /* INET_DIAG_MEM */ diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index ca9e35bbe13c..6b10573cc9fa 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -120,7 +120,7 @@ static size_t sock_diag_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct inet_diag_msg) + nla_total_size(sizeof(u8)) /* INET_DIAG_PROTOCOL */ - + nla_total_size(sizeof(struct tcp_info))); /* INET_DIAG_INFO */ + + nla_total_size_64bit(sizeof(struct tcp_info))); /* INET_DIAG_INFO */ } static void sock_diag_broadcast_destroy_work(struct work_struct *work) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index ad7956fa659a..25af1243649b 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -220,8 +220,9 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, } if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) { - attr = nla_reserve(skb, INET_DIAG_INFO, - handler->idiag_info_size); + attr = nla_reserve_64bit(skb, INET_DIAG_INFO, +handler->idiag_info_size, +INET_DIAG_PAD); if (!attr) goto errout; @@ -1078,7 +1079,9 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk) } attr = handler->idiag_info_size - ? nla_reserve(skb, INET_DIAG_INFO, handler->idiag_info_size) + ? nla_reserve_64bit(skb, INET_DIAG_INFO, + handler->idiag_info_size, + INET_DIAG_PAD) : NULL; if (attr) info = nla_data(attr); diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c index bb2d8d9608e9..84829fff3bc9 100644 --- a/net/sctp/sctp_diag.c +++ b/net/sctp/sctp_diag.c @@ -161,8 +161,9 @@ static int inet_sctp_diag_fill(struct sock *sk, struct sctp_association *asoc, if (ext & (1 << (INET_DIAG_INFO - 1))) { struct nlattr *attr; - attr = nla_reserve(skb, INET_DIAG_INFO, - sizeof(struct sctp_info)); + attr = nla_reserve_64bit(skb, INET_DIAG_INFO, +sizeof(struct sctp_info), +INET_DIAG_PAD); if (!attr) goto errout; -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 7/8] neigh: align nlattr properly when needed
Signed-off-by: Nicolas Dichtel --- net/core/neighbour.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 6a395d440228..29dd8cc22bbf 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1857,7 +1857,8 @@ static int neightbl_fill_info(struct sk_buff *skb, struct neigh_table *tbl, ndst.ndts_table_fulls += st->table_fulls; } - if (nla_put(skb, NDTA_STATS, sizeof(ndst), &ndst)) + if (nla_put_64bit(skb, NDTA_STATS, sizeof(ndst), &ndst, + NDTA_PAD)) goto nla_put_failure; } -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 5/8] ovs: align nlattr properly when needed
I also fix commit 8b32ab9e6ef1: use nla_total_size_64bit() for OVS_FLOW_ATTR_USED in ovs_flow_cmd_msg_size(). Fixes: 8b32ab9e6ef1 ("ovs: use nla_put_u64_64bit()") Signed-off-by: Nicolas Dichtel --- include/uapi/linux/openvswitch.h | 2 ++ net/openvswitch/datapath.c | 27 +++ 2 files changed, 17 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index d6be1fb778a5..bb0d515b7654 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -84,6 +84,7 @@ enum ovs_datapath_attr { OVS_DP_ATTR_STATS, /* struct ovs_dp_stats */ OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */ OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */ + OVS_DP_ATTR_PAD, __OVS_DP_ATTR_MAX }; @@ -253,6 +254,7 @@ enum ovs_vport_attr { OVS_VPORT_ATTR_UPCALL_PID, /* array of u32 Netlink socket PIDs for */ /* receiving upcalls */ OVS_VPORT_ATTR_STATS, /* struct ovs_vport_stats */ + OVS_VPORT_ATTR_PAD, __OVS_VPORT_ATTR_MAX }; diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 22d9a5316304..856bd8dba676 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -738,9 +738,9 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts, len += nla_total_size(acts->orig_len); return len - + nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */ + + nla_total_size_64bit(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */ + nla_total_size(1) /* OVS_FLOW_ATTR_TCP_FLAGS */ - + nla_total_size(8); /* OVS_FLOW_ATTR_USED */ + + nla_total_size_64bit(8); /* OVS_FLOW_ATTR_USED */ } /* Called with ovs_mutex or RCU read lock. */ @@ -759,7 +759,9 @@ static int ovs_flow_cmd_fill_stats(const struct sw_flow *flow, return -EMSGSIZE; if (stats.n_packets && - nla_put(skb, OVS_FLOW_ATTR_STATS, sizeof(struct ovs_flow_stats), &stats)) + nla_put_64bit(skb, OVS_FLOW_ATTR_STATS, + sizeof(struct ovs_flow_stats), &stats, + OVS_FLOW_ATTR_PAD)) return -EMSGSIZE; if ((u8)ntohs(tcp_flags) && @@ -1435,8 +1437,8 @@ static size_t ovs_dp_cmd_msg_size(void) size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header)); msgsize += nla_total_size(IFNAMSIZ); - msgsize += nla_total_size(sizeof(struct ovs_dp_stats)); - msgsize += nla_total_size(sizeof(struct ovs_dp_megaflow_stats)); + msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats)); + msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_megaflow_stats)); msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */ return msgsize; @@ -1463,13 +1465,13 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb, goto nla_put_failure; get_dp_stats(dp, &dp_stats, &dp_megaflow_stats); - if (nla_put(skb, OVS_DP_ATTR_STATS, sizeof(struct ovs_dp_stats), - &dp_stats)) + if (nla_put_64bit(skb, OVS_DP_ATTR_STATS, sizeof(struct ovs_dp_stats), + &dp_stats, OVS_DP_ATTR_PAD)) goto nla_put_failure; - if (nla_put(skb, OVS_DP_ATTR_MEGAFLOW_STATS, - sizeof(struct ovs_dp_megaflow_stats), - &dp_megaflow_stats)) + if (nla_put_64bit(skb, OVS_DP_ATTR_MEGAFLOW_STATS, + sizeof(struct ovs_dp_megaflow_stats), + &dp_megaflow_stats, OVS_DP_ATTR_PAD)) goto nla_put_failure; if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features)) @@ -1838,8 +1840,9 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb, goto nla_put_failure; ovs_vport_get_stats(vport, &vport_stats); - if (nla_put(skb, OVS_VPORT_ATTR_STATS, sizeof(struct ovs_vport_stats), - &vport_stats)) + if (nla_put_64bit(skb, OVS_VPORT_ATTR_STATS, + sizeof(struct ovs_vport_stats), &vport_stats, + OVS_VPORT_ATTR_PAD)) goto nla_put_failure; if (ovs_vport_get_upcall_portids(vport, skb)) -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 3/8] fs/quota: use nla_put_u64_64bit()
Signed-off-by: Nicolas Dichtel --- fs/quota/netlink.c | 12 +++- include/uapi/linux/quota.h | 1 + 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/quota/netlink.c b/fs/quota/netlink.c index d07a2f91d858..8b252673d454 100644 --- a/fs/quota/netlink.c +++ b/fs/quota/netlink.c @@ -47,7 +47,7 @@ void quota_send_warning(struct kqid qid, dev_t dev, void *msg_head; int ret; int msg_size = 4 * nla_total_size(sizeof(u32)) + - 2 * nla_total_size(sizeof(u64)); + 2 * nla_total_size_64bit(sizeof(u64)); /* We have to allocate using GFP_NOFS as we are called from a * filesystem performing write and thus further recursion into @@ -68,8 +68,9 @@ void quota_send_warning(struct kqid qid, dev_t dev, ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, qid.type); if (ret) goto attr_err_out; - ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, - from_kqid_munged(&init_user_ns, qid)); + ret = nla_put_u64_64bit(skb, QUOTA_NL_A_EXCESS_ID, + from_kqid_munged(&init_user_ns, qid), + QUOTA_NL_A_PAD); if (ret) goto attr_err_out; ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype); @@ -81,8 +82,9 @@ void quota_send_warning(struct kqid qid, dev_t dev, ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR, MINOR(dev)); if (ret) goto attr_err_out; - ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, - from_kuid_munged(&init_user_ns, current_uid())); + ret = nla_put_u64_64bit(skb, QUOTA_NL_A_CAUSED_ID, + from_kuid_munged(&init_user_ns, current_uid()), + QUOTA_NL_A_PAD); if (ret) goto attr_err_out; genlmsg_end(skb, msg_head); diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h index 38baddb807f5..4d2489ef6f10 100644 --- a/include/uapi/linux/quota.h +++ b/include/uapi/linux/quota.h @@ -191,6 +191,7 @@ enum { QUOTA_NL_A_DEV_MAJOR, QUOTA_NL_A_DEV_MINOR, QUOTA_NL_A_CAUSED_ID, + QUOTA_NL_A_PAD, __QUOTA_NL_A_MAX, }; #define QUOTA_NL_A_MAX (__QUOTA_NL_A_MAX - 1) -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 6/8] rtnl: align nlattr properly when needed
Signed-off-by: Nicolas Dichtel --- net/core/rtnetlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 9efc1f34ef3b..5503dfe6a050 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -876,7 +876,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */ + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */ + nla_total_size(IFNAMSIZ) /* IFLA_QDISC */ - + nla_total_size(sizeof(struct rtnl_link_ifmap)) + + nla_total_size_64bit(sizeof(struct rtnl_link_ifmap)) + nla_total_size(sizeof(struct rtnl_link_stats)) + nla_total_size_64bit(sizeof(struct rtnl_link_stats64)) + nla_total_size(MAX_ADDR_LEN) /* IFLA_ADDRESS */ @@ -1181,7 +1181,7 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev) .dma = dev->dma, .port= dev->if_port, }; - if (nla_put(skb, IFLA_MAP, sizeof(map), &map)) + if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD)) return -EMSGSIZE; return 0; -- 2.8.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next 1/8] macsec: use nla_put_u64_64bit()
Signed-off-by: Nicolas Dichtel --- drivers/net/macsec.c | 121 ++--- include/uapi/linux/if_link.h | 1 + include/uapi/linux/if_macsec.h | 6 ++ 3 files changed, 95 insertions(+), 33 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 6caa72402de7..a172a1ffa151 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1405,9 +1405,10 @@ static sci_t nla_get_sci(const struct nlattr *nla) return (__force sci_t)nla_get_u64(nla); } -static int nla_put_sci(struct sk_buff *skb, int attrtype, sci_t value) +static int nla_put_sci(struct sk_buff *skb, int attrtype, sci_t value, + int padattr) { - return nla_put_u64(skb, attrtype, (__force u64)value); + return nla_put_u64_64bit(skb, attrtype, (__force u64)value, padattr); } static struct macsec_tx_sa *get_txsa_from_nl(struct net *net, @@ -2131,16 +2132,36 @@ static int copy_rx_sc_stats(struct sk_buff *skb, sum.InPktsUnusedSA+= tmp.InPktsUnusedSA; } - if (nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_OCTETS_VALIDATED, sum.InOctetsValidated) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_OCTETS_DECRYPTED, sum.InOctetsDecrypted) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_UNCHECKED, sum.InPktsUnchecked) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_DELAYED, sum.InPktsDelayed) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_OK, sum.InPktsOK) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_INVALID, sum.InPktsInvalid) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_LATE, sum.InPktsLate) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_NOT_VALID, sum.InPktsNotValid) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_NOT_USING_SA, sum.InPktsNotUsingSA) || - nla_put_u64(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_UNUSED_SA, sum.InPktsUnusedSA)) + if (nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_OCTETS_VALIDATED, + sum.InOctetsValidated, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_OCTETS_DECRYPTED, + sum.InOctetsDecrypted, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_UNCHECKED, + sum.InPktsUnchecked, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_DELAYED, + sum.InPktsDelayed, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_OK, + sum.InPktsOK, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_INVALID, + sum.InPktsInvalid, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_LATE, + sum.InPktsLate, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_NOT_VALID, + sum.InPktsNotValid, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_NOT_USING_SA, + sum.InPktsNotUsingSA, + MACSEC_RXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_RXSC_STATS_ATTR_IN_PKTS_UNUSED_SA, + sum.InPktsUnusedSA, + MACSEC_RXSC_STATS_ATTR_PAD)) return -EMSGSIZE; return 0; @@ -2169,10 +2190,18 @@ static int copy_tx_sc_stats(struct sk_buff *skb, sum.OutOctetsEncrypted += tmp.OutOctetsEncrypted; } - if (nla_put_u64(skb, MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED, sum.OutPktsProtected) || - nla_put_u64(skb, MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED, sum.OutPktsEncrypted) || - nla_put_u64(skb, MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED, sum.OutOctetsProtected) || - nla_put_u64(skb, MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED, sum.OutOctetsEncrypted)) + if (nla_put_u64_64bit(skb, MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED, + sum.OutPktsProtected, + MACSEC_TXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED, + sum.OutPktsEncrypted, + MACSEC_TXSC_STATS_ATTR_PAD) || + nla_put_u64_64bit(skb, MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED, + sum.OutOctetsProtected, +
[ovs-dev] [PATCH net-next 0/8] netlink: align attributes when needed (patchset #3)
This is the continuation (series #3) of the work done to align netlink attributes when these attributes contain some 64-bit fields. It's the last patchset from what I've seen. The last user of nla_put_u64() is block/drbd. This module does not use standard netlink API (see all the stuff in include/linux/genl_magic_struct.h and include/linux/genl_magic_func.h). I didn't modify it because it's seems hard to do it whithout testing and fully understanding the context (for example, why include/linux/drbd_genl.h is not part of uapi?). Any thoughts? Documentation/networking/gen_stats.txt | 6 +- drivers/net/macsec.c| 121 +++- drivers/net/wireless/mac80211_hwsim.c | 2 +- drivers/net/wireless/mac80211_hwsim.h | 1 + fs/quota/netlink.c | 12 ++-- include/net/gen_stats.h | 6 +- include/uapi/linux/gen_stats.h | 1 + include/uapi/linux/if_link.h| 1 + include/uapi/linux/if_macsec.h | 6 ++ include/uapi/linux/inet_diag.h | 4 +- include/uapi/linux/openvswitch.h| 2 + include/uapi/linux/pkt_cls.h| 2 + include/uapi/linux/quota.h | 1 + include/uapi/linux/rtnetlink.h | 1 + include/uapi/linux/tc_act/tc_bpf.h | 1 + include/uapi/linux/tc_act/tc_connmark.h | 1 + include/uapi/linux/tc_act/tc_csum.h | 1 + include/uapi/linux/tc_act/tc_defact.h | 1 + include/uapi/linux/tc_act/tc_gact.h | 1 + include/uapi/linux/tc_act/tc_ife.h | 1 + include/uapi/linux/tc_act/tc_ipt.h | 1 + include/uapi/linux/tc_act/tc_mirred.h | 1 + include/uapi/linux/tc_act/tc_nat.h | 1 + include/uapi/linux/tc_act/tc_pedit.h| 1 + include/uapi/linux/tc_act/tc_skbedit.h | 1 + include/uapi/linux/tc_act/tc_vlan.h | 1 + net/core/gen_stats.c| 35 + net/core/neighbour.c| 3 +- net/core/rtnetlink.c| 4 +- net/core/sock_diag.c| 2 +- net/ipv4/inet_diag.c| 9 ++- net/openvswitch/datapath.c | 27 +++ net/sched/act_api.c | 7 +- net/sched/act_bpf.c | 3 +- net/sched/act_connmark.c| 3 +- net/sched/act_csum.c| 2 +- net/sched/act_gact.c| 2 +- net/sched/act_ife.c | 2 +- net/sched/act_ipt.c | 2 +- net/sched/act_mirred.c | 2 +- net/sched/act_nat.c | 2 +- net/sched/act_pedit.c | 2 +- net/sched/act_simple.c | 2 +- net/sched/act_skbedit.c | 2 +- net/sched/act_vlan.c| 2 +- net/sched/cls_u32.c | 7 +- net/sched/sch_api.c | 6 +- net/sctp/sctp_diag.c| 5 +- 48 files changed, 211 insertions(+), 98 deletions(-) Comments are welcomed, Regards, Nicolas ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1] Support port level IPFIX
Hi Daniel, I had the same reaction as Ben. I would prefer to see any efforts being made towards improving the existing OpenFlow “sample” action. Some more comments inline below. > On Apr 25, 2016, at 8:48 PM, Daniel Ye wrote: > > Hi Ben, > > Thanks for your review. I have sent a second patch with signed-off tag. > Please review it again. > For your comments, I listed the answers of mine below: > 1. "I worry that we'll end up with a fourth, and a fifth, ..." >With flow IPFIX, Bridge IPFIX and Port level IPFIX, we can handle almost > all the IPFIX cases. There is no need to add other kinds of IPFIX. > > 2. " why can't the details of what packets it selects, etc., be controlled > from flows in the flow table, rather than by configuration in the database?" >We can't do this, because flow based IPFIX does not support to export > output tunnel information. I believe it should be straightforward to implement that. It’s a matter of extending the “sample” OpenFlow action to support a tunnel_out_port field which could be directly converted into the tunnel_out_port field of the corresponding “userspace” datapath action. > However, in overlay network, users want to >know tunnel information. With Bridge IPFIX, output tunnel information can > be retrieved. But, Bridge IPFIX will enable all ports on the bridge to >collects the IPFIX information and it's not granular. You can’t get more granular than per-flow sampling. If granularity is concern, I would extend the OpenFlow “sample” action to support setting the output port, cf. the design of the datapath action. > With Port IPFIX, users can just enable IPFIX on the ports, which they are > interested in and the >IPFIX information is preciser. Output tunnel information can be got if you > enable port IPFIX on the tunnel port. What's more, IPFIX >configuration of each port can be different. While Bridge IPFIX can only > support one IPFIX configuration per bridge. Users need configure IPFIX >granularly like port-based IPFIX. > > 3. If we want to use IPFIX to monitor the packets which ingress and egress > the port, it's easier to be configured by port than by flows. There shouldn’t be any significant impact on performance to monitor all ports of a bridge vs. only some of the ports. Since you didn't mention performance, I can only assume that’s not a concern, so there shouldn’t be any issue for users to enable monitoring of all ports of a bridge even when they are interested in only a subset of the ports. The per-bridge IPFIX exporter setting was only added to be consistent with the way the sFlow and netflow exporters can be configured in OVS, and that wasn’t meant to be the primary way of using IPFIX. Adding a per-port setting would break the consistency with sFlow and netflow in OVS, so I believe it’s not appropriate. The main feature was the OpenFlow “sample” action. Efforts should concentrate on that IMHO. > > Bests, > Daniel Benli Ye > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Saturday, April 23, 2016 12:55 AM > To: Daniel Ye > Cc: Wenyu Zhang; dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v1] Support port level IPFIX > > On Tue, Apr 19, 2016 at 03:55:10PM +0800, Daniel Benli Ye wrote: >> From: Benli Ye >> >> This patch enables port level IPFIX. Before this patch, OVS supported >> per bridge IPFIX and per flow IPFX, and exporting packet tunnel >> headers is only supported by bridge IPFIX. This patch adds port level >> IPFIX for easy configuration and port level IPFIX also supports >> exporting packet tunnel headers, just the same with bridge level IPFIX. >> Three main things are done in this patch. >> 1) Add a column ipfix in Port table to ref IPFIX table >> 2) Each interface in the port should use the port IPFiX configuration >> 3) A hash map is used to manage the port which is configured IPFIX >> >> CLI to configure Port IPFIX: >> 1) Configure >> ovs-vsctl -- set Port port0 ipfix=@i -- --id=@i create IPFIX \ >> targets=\"10.24.122.72:4739\" sampling=1 obs_domain_id=123 \ >> obs_point_id=456 cache_active_timeout=1 cache_max_flows=128 \ >> other_config:enable-tunnel-sampling=true >> 2) Clear >> ovs-vsctl clear Port port0 ipfix > > Thanks for working on IPFIX! We don't have enough IPFIX expertise around > here, so new contributors are always welcome. > > The patch lacks a Signed-off-by. We will need it before it can be applied. > CONTRIBUTING.md explains the format and the meaning, which is to agree to the > Developer's Certificate of Origin, which is also in CONTRIBUTING.md; please > read it. > > Due to the lack of signoff, I did not do a detailed review, but I have some > general comments. First, this patch follows the coding style remarkably > well, especially for a first patch--well done, thank you! > > Second, this is the third form of configuration to be introduced for IPFIX. > I worry that we'll end up