Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr substitute.
-Original Message- From:on behalf of Ben Pfaff Date: Friday, August 4, 2017 at 2:11 PM To: Darrell Ball Cc: "d...@openvswitch.org" Subject: Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr substitute. On Thu, Aug 03, 2017 at 09:34:41PM -0700, Darrell Ball wrote: > strcasestr is not defined for Windows, so implement a version > that could be used on Windows. This is needed for an upcoming > patch. > > Signed-off-by: Darrell Ball Thanks! I don't understand why this uses a macro to change the name of strcasestr to strcasestr_s. I don't know of a benefit to that. Maybe it follows the pattern in string.h.in that substitutes strerror_s for strerror_r and strtok_s to strtok_r, but those cases are because the Windows C library has functions named strerror_s and strtok_s that do what OVS wants. The Windows C library does not (as far as I know) have a function strcasestr_s. The header should provide a prototype for the function. It doesn't do clients any good to provide the prototype in the .c file. Well, there is an explanation for everything – although it may not be a good one : ) ; my intention was to provide a substitute with a separate name and use it for Windows presently (with the remapping of strcasestr to strcasestr_s) but in the most limited scope of source file usage. However, I agree it is better just to provide a Windows version outright. The implementation should follow POSIX in that searching for an empty substring should always succeed; otherwise, eventually we will end up with confusion. See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/strstr.html That was part of the reason for V9 - to support empty substr find success. The V9 patch does handle the empty substring success case. However, if both str and substr are empty, it should still be ‘success’; your version does that; mine version returns NULL, which is non-compliant. My understanding is that it is not a good idea, legally, to replace non-contiguous ranges of years by a range in a copyright notice. ohh; thanks Here is my suggestion. What do you think? Nice I was wondering if the below incremental looked ok to you ? I like the two do/while loops for consistency, affirmative character matching and the explicit else ? It seems more definitional, but it might be slower and is not as compact, but either way is fine with me ? +#ifdef _WIN32 +char * +strcasestr(const char *str, const char *substr) +{ +size_t i = 0; +do { +do { +if (!substr[i]) { +return CONST_CAST(char *, str); +} else if (tolower(substr[i]) == tolower(str[i])) { +i++; +} else { +i = 0; +break; +} +} while (1); +} while (*str++); +return NULL; +} +#endif Thanks, Ben. --8<--cut here-->8-- From: Darrell Ball Date: Thu, 3 Aug 2017 21:34:41 -0700 Subject: [PATCH] string: Implement strcasestr substitute. strcasestr is not defined for Windows, so implement a version that could be used on Windows. This is needed for an upcoming patch. Signed-off-by: Darrell Ball Signed-off-by: Ben Pfaff --- lib/string.c| 23 +-- lib/string.h.in | 4 +++- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/lib/string.c b/lib/string.c index 082359d858d8..a9ceabe47e9f 100644 --- a/lib/string.c +++ b/lib/string.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2011 Nicira, Inc. + * Copyright (c) 2009, 2011, 2017 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -15,9 +15,11 @@ */ #include - +#include #include +#include "util.h" + #ifndef HAVE_STRNLEN size_t strnlen(const char *s, size_t maxlen) @@ -26,3 +28,20 @@ strnlen(const char *s, size_t maxlen) return end ? end - s : maxlen; } #endif + +#ifdef _WIN32 +char * +strcasestr(const char *str, const char *substr) +{ +do { +for (size_t i = 0; ; i++) { +if (!substr[i]) { +return CONST_CAST(char *, str); +} else if (tolower(substr[i]) != tolower(str[i])) { +break; +} +} +} while (*str++); +return NULL; +} +#endif diff --git
Re: [ovs-dev] [PATCH] ovsdb-server: Document clarification for some bad wording in RFC 7047.
On Fri, Aug 04, 2017 at 01:45:56PM -0700, Andy Zhou wrote: > On Thu, Jul 27, 2017 at 4:20 PM, Ben Pfaffwrote: > > Reported-by: Harish Kanakaraju > > Signed-off-by: Ben Pfaff > > Acked-by: Andy Zhou Thanks, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch_v9 3/6] Userspace Datapath: Add TFTP support.
On Thu, Aug 03, 2017 at 09:34:43PM -0700, Darrell Ball wrote: > Both ipv4 and ipv6 are supported. Also, NAT support is included. > > Signed-off-by: Darrell BallI'm happy with this patch and the rest of them. Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch_v9 2/6] Userspace Datapath: Add ALG infra and FTP.
On Thu, Aug 03, 2017 at 09:34:42PM -0700, Darrell Ball wrote: > ALG infra and FTP (both V4 and V6) support is added to the userspace > datapath. Also, NAT support is included. > > Signed-off-by: Darrell BallThanks for implementing this. I have only a few minor comments. struct conn_key has at least one hole in it, between 'nw_proto' and 'zone'. That makes it risky, at best, to do byte-by-byte comparisons of two instances of it with memcmp(), but expectation_lookup() does such a comparison. It would probably be better to do a member-by-member comparison, or to carefully rearrange struct conn_key to avoid holes. It doesn't make sense to me to have a strcasestr_s() prototype in conntrack.c. I think it can be removed. As a possible direction for the future, usually the need to read-lock a data structure can be eliminated by using RCU. I'm appending some minor suggestions that help to better conform to the usual OVS style or that made the code easier for me to read and understand. --8<--cut here-->8-- diff --git a/lib/conntrack.c b/lib/conntrack.c index be7d0623b24f..a05c54019bc9 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -66,8 +66,6 @@ enum ct_alg_mode { CT_FTP_MODE_PASSIVE, }; -char *strcasestr_s(const char *str, const char *substr); - static bool conn_key_extract(struct conntrack *, struct dp_packet *, ovs_be16 dl_type, struct conn_lookup_ctx *, uint16_t zone); @@ -333,7 +331,6 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn, static uint8_t get_ip_proto(const struct dp_packet *pkt) { - uint8_t ip_proto; struct eth_header *l2 = dp_packet_eth(pkt); if (l2->eth_type == htons(ETH_TYPE_IPV6)) { @@ -1178,18 +1175,16 @@ sweep_bucket(struct conntrack *ct, struct conntrack_bucket *ctb, } } -#define MAX_ALG_EXP_TO_EXPIRE 1000 +enum { MAX_ALG_EXP_TO_EXPIRE = 1000 }; size_t alg_exp_count = hmap_count(>alg_expectations); /* XXX: revisit this. */ -size_t max_to_expire = -MAX(alg_exp_count/10, MAX_ALG_EXP_TO_EXPIRE); +size_t max_to_expire = MAX(alg_exp_count / 10, MAX_ALG_EXP_TO_EXPIRE); count = 0; ct_rwlock_wrlock(>resources_lock); struct alg_exp_node *alg_exp_node, *alg_exp_node_next; LIST_FOR_EACH_SAFE (alg_exp_node, alg_exp_node_next, exp_node, >alg_exp_list) { -if (now < alg_exp_node->expiration || -count >= max_to_expire) { +if (now < alg_exp_node->expiration || count >= max_to_expire) { min_expiration = MIN(min_expiration, alg_exp_node->expiration); break; } @@ -2355,7 +2350,6 @@ static struct alg_exp_node * expectation_lookup(struct hmap *alg_expectations, const struct conn_key *key, uint32_t basis) { - struct conn_key check_key = *key; check_key.src.port = ALG_WC_SRC_PORT; struct alg_exp_node *alg_exp_node; @@ -2410,7 +2404,7 @@ expectation_create(struct conntrack *ct, alg_exp_node->master_mark = master_conn->mark; alg_exp_node->master_label = master_conn->label; alg_exp_node->master_key = master_conn->key; -alg_exp_node->passive_mode = mode == CT_FTP_MODE_PASSIVE ? true : false; +alg_exp_node->passive_mode = mode == CT_FTP_MODE_PASSIVE; /* Take the write lock here because it is almost 100% * likely that the lookup will fail and * expectation_create() will be called below. */ @@ -2458,7 +2452,7 @@ repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 v4_addr_rep, char *ftp_data_start, size_t addr_offset_from_ftp_data_start) { -#define MAX_FTP_V4_NAT_DELTA 8 +enum { MAX_FTP_V4_NAT_DELTA = 8 }; /* Do conservative check for pathological MTU usage. */ uint32_t orig_used_size = dp_packet_size(pkt); @@ -2475,26 +2469,25 @@ repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 v4_addr_rep, int overall_delta = 0; char *byte_str = ftp_data_start + addr_offset_from_ftp_data_start; -char *next_delim; -size_t substr_size; -uint8_t rep_byte; -char rep_str[4]; -size_t replace_size; +/* Replace the existing IPv4 address by the new one. */ for (uint8_t i = 0; i < 4; i++) { -memset(rep_str, 0 , sizeof rep_str); -next_delim = memchr(byte_str,',',4); +/* Find the end of the string for this octet. */ +char *next_delim = memchr(byte_str, ',', 4); ovs_assert(next_delim); -substr_size = next_delim - byte_str; +int substr_size = next_delim - byte_str; remain_size -= substr_size; -rep_byte = get_v4_byte_be(v4_addr_rep, i); -replace_size = sprintf(rep_str, "%d", rep_byte); -ovs_assert(replace_size == strlen(rep_str)); + +/* Compose the new string for this octet, and replace it. */ +char rep_str[4]; +uint8_t
Re: [ovs-dev] [PATCH v3 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
On 08/01/2017 08:58 AM, Kevin Traynor wrote: Previously rxqs were assigned to pmds by round robin in port/queue order. Now that we have the processing cycles used for existing rxqs, use that information to try and produced a better balanced distribution of rxqs across pmds. i.e. given multiple pmds, the rxqs which have consumed the largest amount of processing cycles will be placed on different pmds. The rxqs are sorted by their processing cycles and assigned (in sorted order) round robin across pmds. Signed-off-by: Kevin TraynorKevin, Thanks for your work on openvswitch. Unfortunately, this patch fails when applied to the master branch. I didn't see a 'From:' sha id so I don't know which commit to base this series upon. Here's the error: Applying: dpif-netdev: Count the rxq processing cycles for an rxq. Applying patch #796309 using 'git am' Description: [ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles. Applying: dpif-netdev: Change rxq_scheduling to use rxq processing cycles. error: patch failed: lib/dpif-netdev.c:3306 error: lib/dpif-netdev.c: patch does not apply Patch failed at 0001 dpif-netdev: Change rxq_scheduling to use rxq processing cycles. I guess we need the series to be rebased. Thanks, - Greg --- Documentation/howto/dpdk.rst | 7 + lib/dpif-netdev.c| 73 +++- 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index af01d3e..a969285 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become thread. +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores) +automatically. The processing cycles that have been required for each rxq +will be used where known to assign rxqs with the highest consumption of +processing cycles to different pmds. + +Rxq to pmds assignment takes place whenever there are configuration changes. + QoS --- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 25a521a..a05e586 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr) } +/* Sort Rx Queues by the processing cycles they are consuming. */ +static int +rxq_cycle_sort(const void *a, const void *b) +{ +struct dp_netdev_rxq * qa; +struct dp_netdev_rxq * qb; + +qa = *(struct dp_netdev_rxq **) a; +qb = *(struct dp_netdev_rxq **) b; + +if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >= +dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) { +return -1; +} + +return 1; +} + /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned * queues and marks the pmds as isolated. Otherwise, assign non isolated * pmds to unpinned queues. * + * If 'pinned' is false queues will be sorted by processing cycles they are + * consuming and then assigned to pmds in round robin order. + * * The function doesn't touch the pmd threads, it just stores the assignment * in the 'pmd' member of each rxq. */ @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex) struct dp_netdev_port *port; struct rr_numa_list rr; - -rr_numa_list_populate(dp, ); +struct dp_netdev_rxq ** rxqs = NULL; +int i, n_rxqs = 0; +struct rr_numa *numa = NULL; +int numa_id; HMAP_FOR_EACH (port, node, >ports) { -struct rr_numa *numa; -int numa_id; - if (!netdev_is_pmd(port->netdev)) { continue; } -numa_id = netdev_get_numa_id(port->netdev); -numa = rr_numa_list_lookup(, numa_id); - for (int qid = 0; qid < port->n_rxq; qid++) { struct dp_netdev_rxq *q = >rxqs[qid]; @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex) } } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) { -if (!numa) { -VLOG_WARN("There's no available (non isolated) pmd thread " - "on numa node %d. Queue %d on port \'%s\' will " - "not be polled.", - numa_id, qid, netdev_get_name(port->netdev)); +if (n_rxqs == 0) { +rxqs = xmalloc(sizeof *rxqs); } else { -q->pmd = rr_numa_get_pmd(numa); +rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1)); } +/* Store the queue. */ +rxqs[n_rxqs++] = q; } } } +if (n_rxqs > 1) { +/* Sort the queues in order of the processing cycles + * they consumed during their last pmd interval. */ +qsort(rxqs,
Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr substitute.
On Thu, Aug 03, 2017 at 09:34:41PM -0700, Darrell Ball wrote: > strcasestr is not defined for Windows, so implement a version > that could be used on Windows. This is needed for an upcoming > patch. > > Signed-off-by: Darrell BallThanks! I don't understand why this uses a macro to change the name of strcasestr to strcasestr_s. I don't know of a benefit to that. Maybe it follows the pattern in string.h.in that substitutes strerror_s for strerror_r and strtok_s to strtok_r, but those cases are because the Windows C library has functions named strerror_s and strtok_s that do what OVS wants. The Windows C library does not (as far as I know) have a function strcasestr_s. The header should provide a prototype for the function. It doesn't do clients any good to provide the prototype in the .c file. The implementation should follow POSIX in that searching for an empty substring should always succeed; otherwise, eventually we will end up with confusion. See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/strstr.html My understanding is that it is not a good idea, legally, to replace non-contiguous ranges of years by a range in a copyright notice. Here is my suggestion. What do you think? Thanks, Ben. --8<--cut here-->8-- From: Darrell Ball Date: Thu, 3 Aug 2017 21:34:41 -0700 Subject: [PATCH] string: Implement strcasestr substitute. strcasestr is not defined for Windows, so implement a version that could be used on Windows. This is needed for an upcoming patch. Signed-off-by: Darrell Ball Signed-off-by: Ben Pfaff --- lib/string.c| 23 +-- lib/string.h.in | 4 +++- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/lib/string.c b/lib/string.c index 082359d858d8..a9ceabe47e9f 100644 --- a/lib/string.c +++ b/lib/string.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2011 Nicira, Inc. + * Copyright (c) 2009, 2011, 2017 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -15,9 +15,11 @@ */ #include - +#include #include +#include "util.h" + #ifndef HAVE_STRNLEN size_t strnlen(const char *s, size_t maxlen) @@ -26,3 +28,20 @@ strnlen(const char *s, size_t maxlen) return end ? end - s : maxlen; } #endif + +#ifdef _WIN32 +char * +strcasestr(const char *str, const char *substr) +{ +do { +for (size_t i = 0; ; i++) { +if (!substr[i]) { +return CONST_CAST(char *, str); +} else if (tolower(substr[i]) != tolower(str[i])) { +break; +} +} +} while (*str++); +return NULL; +} +#endif diff --git a/lib/string.h.in b/lib/string.h.in index bbdaeb43e612..59998aa36fc4 100644 --- a/lib/string.h.in +++ b/lib/string.h.in @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2011, 2013 Nicira, Inc. + * Copyright (c) 2009-2017 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -36,6 +36,8 @@ #define strcasecmp _stricmp #define strncasecmp _strnicmp #define strerror_r(errnum, buf, buflen) strerror_s(buf, buflen, errnum) + +char *strcasestr(const char *, const char *); #endif #ifndef HAVE_STRNLEN -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] debian.rst: Clarify that "dpkg" needs manual help with dependencies.
On Thu, Jul 6, 2017 at 4:47 PM, Ben Pfaffwrote: > On Mon, May 29, 2017 at 11:40:51AM -0700, Ben Pfaff wrote: >> Reported-by: Mircea Ulinic >> Signed-off-by: Ben Pfaff Acked-by: Andy Zhou ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovsdb-server: Document clarification for some bad wording in RFC 7047.
On Thu, Jul 27, 2017 at 4:20 PM, Ben Pfaffwrote: > Reported-by: Harish Kanakaraju > Signed-off-by: Ben Pfaff Acked-by: Andy Zhou ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 6/6] NSH unit test cases using encap and decap actions
On Thu, Aug 03, 2017 at 09:15:00AM +0800, Yi Yang wrote: > From: Jan Scheurich> > With the support of generic encap and decap actions for Ethernet and NSH > it is now possible to build test cases that mimic realistic OVS > configurations and OF pipelines for Service Function Chaining. Packets > are being encapsulated in NSH, forwarded based on NSH headers, sent over > Ethernet links and VXLAN-GPE tunnels, and decapsulated at the end of > a service chain. > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang Thanks for adding tests! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 5/6] Generic encap and decap support for NSH
On Thu, Aug 03, 2017 at 09:14:59AM +0800, Yi Yang wrote: > From: Jan Scheurich> > This commit adds translation and netdev datapath support for generic > encap and decap actions for the NSH MD1 header. The generic encap and > decap actions are mapped to specific encap_nsh and decap_nsh actions > in the datapath. > > The translation follows that general scheme that decap() of an NSH > packet triggers recirculation after decapsulation, while encap(nsh) > just modifies struct flow and sets the ctx->pending_encap flag to > generate the encap_nsh action at the next commit to be able to include > subsequent set_field actions for NSH headers. > > Support for the flexible MD2 format using TLV properties is foreseen > in encap(nsh), but not yet fully implemented. > > The CLI syntax for encap of NSH is > encap(nsh(md_type=1)) > encap(nsh(md_type=2[,tlv(,,),...])) > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang I don't see the new parts added to openvswitch.h in upstream Linux (even in net-next), so I think that all of them should be enclosed in #ifndef __KERNEL__ to make that clear. struct ovs_action_encap_nsh is absurdly large due to the metadata array. It does not make sense for it to be that big given only MD1 support. Ideally, struct ovs_action_encap_nsh would be converted into nested Netlink attributes; then, the metadata could be variable length. That is the right way to add kernel support. Before kernel support, though, it would make more sense to just use a __be32[4] metadata array. This patch seems to make a lot of changes that should have been made in whatever patch originally added the code. For example, all the changes to format_nsh_key() and format_be32_masked() appear to be in this category. There are some new compiler warnings or errors: ../ofproto/ofproto-dpif-ipfix.c:2793:17: error: enumeration values 'OVS_ACTION_ATTR_ENCAP_NSH' and 'OVS_ACTION_ATTR_DECAP_NSH' not explicitly handled in switch [-Werror,-Wswitch-enum] ../ofproto/ofproto-dpif-xlate.c:5824:43: error: cast from 'const char *' to 'struct ofpact_ed_prop *' increases required alignment from 1 to 2 [-Werror,-Wcast-align] ../lib/odp-util.c:1847:21: error: cast from 'uint8_t *' (aka 'unsigned char *') to 'struct nsh_md1_ctx *' increases required alignment from 1 to 4 [-Werror,-Wcast-align] ../lib/odp-util.c:6818:35: error: cast from 'uint8_t *' (aka 'unsigned char *') to 'struct nsh_md1_ctx *' increases required alignment from 1 to 4 [-Werror,-Wcast-align] ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 4/6] userspace: add NSH support to vxlan-gpe tunnels
On Thu, Aug 03, 2017 at 09:14:58AM +0800, Yi Yang wrote: > From: Jan Scheurich> > Signed-off-by: Yi Yang > Signed-off-by: Jan Scheurich Looks good to me, but are there any tests? (Do the later patches test this support?) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 3/6] Adding nsh.at for NSH unit tests
On Thu, Aug 03, 2017 at 09:14:57AM +0800, Yi Yang wrote: > From: Jan Scheurich> > First basic NSH test case implemented and working. > > Unconditionally show matched packet_type in megaflows, even when > matching on eth. > > Signed-off-by: Jan Scheurich I needed to add the following to fix one remaining test (maybe because it was added recently). Please fold it in. --8<--cut here-->8-- diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index 8aaa1d264ff7..284a65ec6524 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -4785,7 +4785,7 @@ AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: icmp,tun_id=0x6,in_port= ]) AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "$flow,recirc_id(1)" -generate], [0], [stdout]) -AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: recirc_id=0x1,icmp,tun_id=0x6,in_port=1,vlan_tci=0x,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=128,icmp_type=8,icmp_code=0 +AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: recirc_id=0x1,eth,icmp,tun_id=0x6,in_port=1,vlan_tci=0x,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=128,icmp_type=8,icmp_code=0 ]) OVS_VSWITCHD_STOP ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 2/6] userspace: enable set_field support for nsh fields
On Thu, Aug 03, 2017 at 09:14:56AM +0800, Yi Yang wrote: > From: Jan Scheurich> > Signed-off-by: Yi Yang > Signed-off-by: Jan Scheurich Thanks for working on this. I think that this should be folded into patch 1, since it fixes what is essentially a bug in patch 1. I noticed that struct nsh_hdr (introduced in the previous patch) isn't suitable for 16-bit alignment, since it has 32-bit members. Our practice is to use appropriate type to ensure safety for 16-bit alignment, like this: /** * struct nsh_md1_ctx - Keeps track of NSH context data * @nshc<1-4>: NSH Contexts. */ struct nsh_md1_ctx { ovs_16aligned_be32 c[4]; }; struct nsh_md2_tlv { ovs_be16 md_class; uint8_t type; uint8_t length; uint8_t md_value[]; }; struct nsh_hdr { ovs_be16 ver_flags_len; uint8_t md_type; uint8_t next_proto; ovs_16aligned_be32 path_hdr; union { struct nsh_md1_ctx md1; struct nsh_md2_tlv md2[0]; }; }; This makes it harder to forget to use the proper accessors to read misaligned data, since the types more or less ensure it. I noticed that it's easier to just use arrays of 4 elements (c[4]) than separate members (c1, c2, c3, c4). I'm appending an incremental that can be applied on top of patches 1 and 2 to achieve many of the suggestions I've made. It includes the incremental for patch 1. I haven't tested anything, beyond compiling. --8<--cut here-->8-- diff --git a/build-aux/extract-odp-netlink-h b/build-aux/extract-odp-netlink-h index a509adb88f79..bc1cc35a7e9b 100755 --- a/build-aux/extract-odp-netlink-h +++ b/build-aux/extract-odp-netlink-h @@ -45,7 +45,11 @@ s,#.*,, s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]*\]/struct eth_addr \1/ # Transform IPv6 addresses from an array to struct in6_addr -s/__be32[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct in6_addr \1/ +# +# As a very special case, only transform member names with more than +# one character because struct ovs_key_nsh has a member "__be32 c[4];" +# that is not an IPv6 address. +s/__be32[[:space:]]*\([a-zA-Z0-9_]\{2,\}\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct in6_addr \1/ # Transform most Linux-specific __u types into C99 uint_t types, # and most Linux-specific __be into Open vSwitch ovs_be, diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields index 7b6151f3384b..184b75e36bef 100755 --- a/build-aux/extract-ofp-fields +++ b/build-aux/extract-ofp-fields @@ -67,9 +67,9 @@ PREREQS = {"none": "MFP_NONE", # If a name matches more than one prefix, the longest one is used. OXM_CLASSES = {"NXM_OF_":(0, 0x), "NXM_NX_":(0, 0x0001), + "NXOXM_NSH_": (0x005ad650, 0x), "OXM_OF_":(0, 0x8000), "OXM_OF_PKT_REG": (0, 0x8001), - "OXM_NSH_": (0, 0x8004), "ONFOXM_ET_": (0x4f4e4600, 0x), # This is the experimenter OXM class for Nicira, which is the diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 2381ed2c8c3c..5806aba93f73 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -360,9 +360,6 @@ enum ovs_key_attr { OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking labels */ OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, /* struct ovs_key_ct_tuple_ipv4 */ OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, /* struct ovs_key_ct_tuple_ipv6 */ -#ifndef __KERNEL__ - OVS_KEY_ATTR_NSH, /* struct ovs_key_nsh */ -#endif #ifdef __KERNEL__ /* Only used within kernel data path. */ @@ -372,6 +369,7 @@ enum ovs_key_attr { #ifndef __KERNEL__ /* Only used within userspace data path. */ OVS_KEY_ATTR_PACKET_TYPE, /* be32 packet type */ + OVS_KEY_ATTR_NSH, /* struct ovs_key_nsh */ #endif __OVS_KEY_ATTR_MAX @@ -500,10 +498,7 @@ struct ovs_key_nsh { __u8 np; __u8 pad; __be32 path_hdr; -__be32 c1; -__be32 c2; -__be32 c3; -__be32 c4; +__be32 c[4]; }; /* OVS_KEY_ATTR_CT_STATE flags */ diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h index 6192898f76e3..1c4b56b3c051 100644 --- a/include/openvswitch/meta-flow.h +++ b/include/openvswitch/meta-flow.h @@ -1745,7 +1745,7 @@ enum OVS_PACKED_ENUM mf_field_id { /* "nsh_flags". * - * flags field in NSH base header (8 bits). + * flags field in NSH base header. * * Type: u8. * Maskable: bitwise. @@ -1753,13 +1753,13 @@ enum OVS_PACKED_ENUM mf_field_id { * Prerequisites: NSH. * Access: read/write. * NXM: none. - * OXM: OXM_NSH_FLAGS(1)
Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement policy.
-Original Message- From: Ilya MaximetsDate: Monday, July 31, 2017 at 7:25 AM To: Darrell Ball , "Wang, Yipeng1" , "ovs-dev@openvswitch.org" Cc: Heetae Ahn Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement policy. On 31.07.2017 04:41, Darrell Ball wrote: > > > -Original Message- > From: on behalf of "Wang, Yipeng1" > Date: Friday, July 28, 2017 at 11:04 AM > To: Ilya Maximets , "ovs-dev@openvswitch.org" > Cc: Heetae Ahn > Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement policy. > > Good catch. But I think the hash comparison is to "randomly" choose one of the two entries to replace when both entries are live. > Your change would always replace the first one in such case. It might cause some thrashing issue for certain traffic. Meanwhile, to my experience, the original "hash comparison" is also not a good way to choose random entry, I encountered some thrashing issue before. > > I think we want some condition like below, but a way to fast choose a random entry. > > if (!to_be_replaced || (emc_entry_alive(to_be_replaced) && !emc_entry_alive(current_entry) ) > to_be_replaced = current_entry; > else if((emc_entry_alive(to_be_replaced) && (emc_entry_alive(current_entry)) > to_be_replaced = random_entry; I agree that we need to have something like random choosing of active entry to replace. I though about this a little and came up with idea to reuse the random value generated for insertion probability. This should give a good distribution for replacement. I'll send v2 soon with that approach. The effect here is highly data dependent and in fact dominated by the packet distribution which will not be random or rather pseudo-random. I had done my own testing with pseudo random flows, fwiw. I did not see any thrashing with even at 4000 flows and saw one alive/alive choice at 8000. We can also see the data dependency with this patch in this first version. This patch removed all randomness when choosing an entry to replace when both candidates are alive and instead always choose the first entry. However, you observed that this fixed your problem of thrashing with your dataset – if so, the dataset used in your testing may not be very random. This change would have been worse in the general case, but seemed perfect for your dataset. > // > > I agree – we are trying to randomly select one of two live entries with the last condition. > Something like this maybe makes it more clear what we are trying to do ? Your code solves the issue with replacement of alive entries while dead ones exists, but you're still uses hashes as random values which is not right. Hashes are not random and there is no any difference in choosing the first entry or the entry with a bit set in a particular place. There always will be some bad case where you will replace same entries all the time and performance of EMC will be low. > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 47a9fa0..75cc039 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2051,12 +2051,15 @@ emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key, > } > > /* Replacement policy: put the flow in an empty (not alive) entry, or > - * in the first entry where it can be */ > -if (!to_be_replaced > -|| (emc_entry_alive(to_be_replaced) > -&& !emc_entry_alive(current_entry)) > -|| current_entry->key.hash < to_be_replaced->key.hash) { > + * randomly choose one of the two alive entries to be replaced. */ > +if (!to_be_replaced) { > to_be_replaced = current_entry; > +} else if (emc_entry_alive(to_be_replaced) && !emc_entry_alive(current_entry)) { > +to_be_replaced = current_entry; > +} else if (emc_entry_alive(to_be_replaced) && emc_entry_alive(current_entry)) { > +if (current_entry->key.hash & 0x1) { > +to_be_replaced = current_entry; > +} > } > } > /* We didn't find the miniflow in the cache. > > // > > Thanks > Yipeng > > > -Original Message- > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > > boun...@openvswitch.org] On Behalf Of Ilya Maximets > > Sent: Friday, July 28, 2017
Re: [ovs-dev] [PATCH v4 5/5] redhat: allow dpdk to also run as non-root user
On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conolewrote: > After this commit, users may start a dpdk-enabled ovs setup as a > non-root user. This is accomplished by exporting the $HOME directory, > which dpdk uses to fill in it's semi-persistent RTE configuration. > > This change may be a bit controversial since it modifies /dev/hugepages > as part of starting the ovs-vswitchd to set a hugetlbfs group > ownership. This is used to enable writing to /dev/hugepages so that the > dpdk_init will successfully complete. There is an alternate way of > accomplishing this - namely to initialize DPDK before dropping > privileges. However, this would mean that if DPDK ever grows an uninit > / reinit function, non-root ovs likely could never use it. Indeed ... the modifications to /dev/hugepages don't look ideal ... If this was truly limited to when DPDK was in use, I'd feel better about it. We want to build a single package for OVS, right? The package will have DPDK enabled, even for normal uses that won't use DPDK. That means these modifications take place even for non-DPDK use. I'd feel more comfortable if it could be restricted to only when DPDK was actually in use. Maybe some of this logic could be moved into ovs-ctl so that the check could be at runtime? > > This does not change OvS+DPDK's SELinux requirements. It still must be > disabled. > > Signed-off-by: Aaron Conole > --- > Documentation/intro/install/dpdk.rst| 7 +++ > NEWS| 1 + > rhel/README.RHEL.rst| 11 +++ > rhel/openvswitch-fedora.spec.in | 13 + > rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 5 + > 5 files changed, 37 insertions(+) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group
On Fri, Aug 4, 2017 at 2:31 PM, Russell Bryantwrote: > On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conole wrote: >> Under rpm based distributions, the only user:group that the rhel daemons run >> as is 'root:root'. This is fine as a default, but as part of a security >> procedure, users may want to run as an alternate uid/gid. This commit >> adds an OVS_USER_ID environment variable for systemd, which defaults to >> root:root, but can be overridden by changing the /etc/sysconfig/openvswitch >> environment file. >> >> Acked-by: Markos Chandras >> Signed-off-by: Aaron Conole >> --- >> rhel/automake.mk | 1 + >> rhel/etc_openvswitch_default.conf | 5 + >> rhel/openvswitch-fedora.spec.in | 4 >> rhel/usr_lib_systemd_system_ovs-vswitchd.service | 3 +++ >> rhel/usr_lib_systemd_system_ovsdb-server.service | 3 +++ >> rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++ >> 6 files changed, 19 insertions(+) >> create mode 100644 rhel/etc_openvswitch_default.conf > > >> diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template >> b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template >> index 3050a07..fdaee00 100644 >> --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template >> +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template >> @@ -21,3 +21,6 @@ >> # --ovsdb-server-wrapper=valgrind >> # >> OPTIONS="" >> + >> +# Uncomment and set the OVS User/Group value >> +#OVS_USER_ID="openvswitch:openvswitch" > > Is this really needed? How about just documenting the use of > --ovs-user with OPTIONS above? Nevermind, I see how else this is being used once I read the next patch ... -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.
-Original Message- From: Ilya MaximetsDate: Friday, August 4, 2017 at 7:17 AM To: "ovs-dev@openvswitch.org" Cc: Heetae Ahn , Darrell Ball , Yipeng Wang , Kevin Traynor , Ciara Loftus , Antonio Fischetti , Ilya Maximets Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability. Currently, real insertion probability is higher than configured for the maximum case because of wrong usage of the random value. i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min' equals to 1. In this case we're allowing insert if random vailue is less or equal to 1. So, two of 2**32 values (0 and 1) are allowed and real probability is 2 times higher than configured. This happens because 'random_uint32()' returns value in range [0; UINT32_MAX], but for the checking to be correct we should generate random value in range [0; UINT32_MAX - 1]. I understand the calculation is slightly off. If the user enters 4,294,967,295 then the probability to insert into emc will be 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295, However, is there a general concern about such a low probability anyways ? This max inverse value would be rarely, if ever used and if used, it would be impossible to see the difference in any real use case. The user might as well just disable emc rather than use such tiny probabilities. This existing api was discussed extensively and was very contentious. However, if patch 2 really has an absolute dependency on this patch 1, we can include it. I have done various testing and don’t see that, but I have some comments on the other threads. To fix this we have 4 possible solutions: 1. need to use uint64_t for 'emc-insert-min' and calculate it as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full range [0; UINT32_MAX]. This may decrease performance becaue of 64 bit atomic ops. 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min' because it's the only value we have issues with. This will require additional explanations and not very friendly for users. 3. Generate random value in range [0; UINT32_MAX - 1]. This will require heavy division operation. 4. Decrease the range of available values for 'emc-insert-inv-prob'. Actually, we don't need to have so much different values for that option. I beleve that values higher than 1M are completely useless. Choosing the upper limit as a power of 2 like 2**20 we will be able to mask the generated random value in a fast way and also avoid range issue, because same uint32_t can be used to store 2**20. This patch implements solution #4. CC: Ciara Loftus Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert") Signed-off-by: Ilya Maximets Acked-by: Antonio Fischetti --- Infrastructure and logic introduced here will be used for fixing emc replacement policy. lib/dpif-netdev.c| 14 ++ vswitchd/vswitch.xml | 3 ++- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8ecc9d1..e23c674 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -153,9 +153,14 @@ struct netdev_flow_key { #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) #define EM_FLOW_HASH_SEGS 2 +/* Set up maximum inverse EMC insertion probability to 2^20 - 1. + * Higher values considered useless in practice. */ +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20 +#define EM_FLOW_INSERT_INV_PROB_MAX (1 << EM_FLOW_INSERT_INV_PROB_SHIFT) +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1) /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ +#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\ DEFAULT_EM_FLOW_INSERT_INV_PROB) struct emc_entry { @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, uint32_t min; atomic_read_relaxed(>dp->emc_insert_min, ); -if (min && random_uint32() <= min) { +if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) { emc_insert(>flow_cache, key, flow); } } @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
Re: [ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group
On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conolewrote: > Under rpm based distributions, the only user:group that the rhel daemons run > as is 'root:root'. This is fine as a default, but as part of a security > procedure, users may want to run as an alternate uid/gid. This commit > adds an OVS_USER_ID environment variable for systemd, which defaults to > root:root, but can be overridden by changing the /etc/sysconfig/openvswitch > environment file. > > Acked-by: Markos Chandras > Signed-off-by: Aaron Conole > --- > rhel/automake.mk | 1 + > rhel/etc_openvswitch_default.conf | 5 + > rhel/openvswitch-fedora.spec.in | 4 > rhel/usr_lib_systemd_system_ovs-vswitchd.service | 3 +++ > rhel/usr_lib_systemd_system_ovsdb-server.service | 3 +++ > rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++ > 6 files changed, 19 insertions(+) > create mode 100644 rhel/etc_openvswitch_default.conf > diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template > b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template > index 3050a07..fdaee00 100644 > --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template > +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template > @@ -21,3 +21,6 @@ > # --ovsdb-server-wrapper=valgrind > # > OPTIONS="" > + > +# Uncomment and set the OVS User/Group value > +#OVS_USER_ID="openvswitch:openvswitch" Is this really needed? How about just documenting the use of --ovs-user with OPTIONS above? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/6] userspace: Add support for NSH MD1 match fields
On Thu, Aug 03, 2017 at 09:14:55AM +0800, Yi Yang wrote: > From: Jan Scheurich> > This patch adds support for NSH packet header fields to the OVS > control plane and the userspace datapath. Initially we support the > fields of the NSH base header as defined in > https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt > and the fixed context headers specified for metadata format MD1. > The variable length MD2 format is parsed but the TLV context headers > are not yet available for matching. > > The NSH fields are modelled as OXM fields with the dedicated OXM > class 0x8004 proposed for NSH in ONF. The following fields are defined: > > OXM codeofctl nameSize Comment > = > OXM_NSH_FLAGS nsh_flags 8 Bits 2-9 of 1st NSH word > (0x8004,1) > OXM_NSH_MDTYPE nsh_mdtype 8 Bits 16-23 > (0x8004,2) > OXM_NSH_NEXTPROTO nsh_np 8 Bits 24-31 > (0x8004,3) > OXM_NSH_SPI nsh_spi 24 Bits 0-23 of 2nd NSH word > (0x8004,4) > OXM_NSH_SI nsh_si 8 Bits 24-31 > (0x8004,5) > OXM_NSH_C1 nsh_c1 32 Maskable, nsh_mdtype==1 > (0x8004,6) > OXM_NSH_C2 nsh_c2 32 Maskable, nsh_mdtype==1 > (0x8004,7) > OXM_NSH_C3 nsh_c3 32 Maskable, nsh_mdtype==1 > (0x8004,8) > OXM_NSH_C4 nsh_c4 32 Maskable, nsh_mdtype==1 > (0x8004,9) > > Co-authored-by: Johnson Li > Signed-off-by: Yi Yang > Signed-off-by: Jan Scheurich Thanks for working on this! I do not think that we should use the ONF-reserved class 0x8004. Although some drafts assign 0x8004 for NSH use, I do not think that any of them have been officially approved. So, I suggest that we use a randomly chosen OUI to form an experimenter class. The documentation in meta-flow.xml is incomplete. Please expand it to include the kinds of details that we see in other sections of the document. Please add OVS_KEY_ATTR_NSH to the existing #ifndef __KERNEL__ block. In meta-flow.xml, please indicate the 24-bit size of nsh_spi as part of the type. Also, it's not necessary to redundantly include the size in a comment. I find the NSH version field confusing. The diagram shows the version as the most significant 2 bits of the first 16-bit field of the header, and NSH_VER_MASK is a 2-bit mask, but NSH_VER_SHIFT is defined as 12. Why isn't it 14? I don't see anything in parse_nsh() that verifies that the packet's length is long enough for the NSH header length. Probably, it needs to check before trying to access the length field, and then again when to verify that it's long enough for the encoded length. In format_nsh_masked(), it is redundant to check the masks before each call, because the format_*_masked() functions check them too. In mf_is_value_valid(), I think we should verify that the high 8 bits of MFF_NSH_SPI are zero, since it is a 24-bit field. I'm appending a suggested incremental that addresses many of the issues I noted above. Thanks, Ben. --8<--cut here-->8-- diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields index 7b6151f3384b..184b75e36bef 100755 --- a/build-aux/extract-ofp-fields +++ b/build-aux/extract-ofp-fields @@ -67,9 +67,9 @@ PREREQS = {"none": "MFP_NONE", # If a name matches more than one prefix, the longest one is used. OXM_CLASSES = {"NXM_OF_":(0, 0x), "NXM_NX_":(0, 0x0001), + "NXOXM_NSH_": (0x005ad650, 0x), "OXM_OF_":(0, 0x8000), "OXM_OF_PKT_REG": (0, 0x8001), - "OXM_NSH_": (0, 0x8004), "ONFOXM_ET_": (0x4f4e4600, 0x), # This is the experimenter OXM class for Nicira, which is the diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 2381ed2c8c3c..a98ecc0ebedc 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -360,9 +360,6 @@ enum ovs_key_attr { OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking labels */ OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, /* struct ovs_key_ct_tuple_ipv4 */ OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, /* struct ovs_key_ct_tuple_ipv6 */ -#ifndef __KERNEL__ - OVS_KEY_ATTR_NSH, /* struct ovs_key_nsh */ -#endif #ifdef __KERNEL__ /* Only used within kernel data path. */ @@ -372,6 +369,7 @@ enum ovs_key_attr { #ifndef __KERNEL__ /* Only used within userspace data path. */ OVS_KEY_ATTR_PACKET_TYPE, /* be32 packet type */ + OVS_KEY_ATTR_NSH, /* struct ovs_key_nsh */ #endif __OVS_KEY_ATTR_MAX diff --git
Re: [ovs-dev] [PATCH] tests: fix wrapped comment
On Fri, Aug 4, 2017 at 10:15 AM, Lance Richardsonwrote: > Add missing '#' to comment line. > > Signed-off-by: Lance Richardson > --- > It's odd that this doesn't result in failures in more environments, > but it does fail under Alpine Linux. Definitely odd ... thanks for the patch, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] tests: avoid non-posix options to wc
Thanks, applied to master. On Fri, Aug 4, 2017 at 10:26 AM, Lance Richardsonwrote: > The '--lines' option for the wc command is a GNU extension and is not > recognized by some implemenations. Use the POSIX 1003.1 '-l' option > instead. > > Signed-off-by: Lance Richardson > --- > tests/ovn.at | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tests/ovn.at b/tests/ovn.at > index 29c5fc0ab..7d816785e 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -7442,12 +7442,12 @@ > expected=${dst_mac}${src_mac}0800451c3f110100${src_ip}${dst_ip}00351 > echo $expected >> hv2-vif1.expected > OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [hv2-vif1.expected]) > > -AT_CHECK([ovn-sbctl --bare --columns _uuid find Port_Binding > logical_port=cr-alice | wc --lines], [0], [1 > +AT_CHECK([ovn-sbctl --bare --columns _uuid find Port_Binding > logical_port=cr-alice | wc -l], [0], [1 > ]) > > ovn-nbctl --timeout=3 --wait=sb remove Logical_Router_Port alice options > redirect-chassis > > -AT_CHECK([ovn-sbctl find Port_Binding logical_port=cr-alice | wc --lines], > [0], [0 > +AT_CHECK([ovn-sbctl find Port_Binding logical_port=cr-alice | wc -l], [0], [0 > ]) > > OVN_CLEANUP([hv1],[hv2],[hv3]) > -- > 2.13.3 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 5/5] redhat: allow dpdk to also run as non-root user
After this commit, users may start a dpdk-enabled ovs setup as a non-root user. This is accomplished by exporting the $HOME directory, which dpdk uses to fill in it's semi-persistent RTE configuration. This change may be a bit controversial since it modifies /dev/hugepages as part of starting the ovs-vswitchd to set a hugetlbfs group ownership. This is used to enable writing to /dev/hugepages so that the dpdk_init will successfully complete. There is an alternate way of accomplishing this - namely to initialize DPDK before dropping privileges. However, this would mean that if DPDK ever grows an uninit / reinit function, non-root ovs likely could never use it. This does not change OvS+DPDK's SELinux requirements. It still must be disabled. Signed-off-by: Aaron Conole--- Documentation/intro/install/dpdk.rst| 7 +++ NEWS| 1 + rhel/README.RHEL.rst| 11 +++ rhel/openvswitch-fedora.spec.in | 13 + rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 5 + 5 files changed, 37 insertions(+) diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index b2b91d4..4b0828c 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -134,6 +134,13 @@ has to be configured with DPDK support (``--with-dpdk``). Additional information can be found in :doc:`general`. +.. note:: + If you are running using the Fedora or Red Hat package, the Open vSwitch + daemon will run as a non-root user. This implies that you must have a + working IOMMU. Visit the `RHEL README`__ for additional information. + +__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst + Setup - diff --git a/NEWS b/NEWS index a89e718..a7bc1c3 100644 --- a/NEWS +++ b/NEWS @@ -70,6 +70,7 @@ Post-v2.7.0 First supported use case is encap/decap for Ethernet. - Fedora Packaging: * OVN services are no longer restarted automatically after upgrade. + * ovs-vswitchd and ovsdb-server run as non-root users by default. - Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)). - L3 tunneling: * Use new tunnel port option "packet_type" to configure L2 vs. L3. diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst index 6affdba..f3d2942 100644 --- a/rhel/README.RHEL.rst +++ b/rhel/README.RHEL.rst @@ -337,6 +337,17 @@ running. All other commands where executed when Open vSwitch was successfully running. +Non-root User Support +--- +Fedora and RHEL support running the Open vSwitch daemons as a non-root user. +By default, a fresh installation will create an *openvswitch* user, along +with any additional support groups needed (such as *hugetlbfs* for DPDK +support). + +This is controlled by modifying the ``OVS_USER_ID`` option. Setting this +to 'root:root', or commenting the variable out will revert this behavior. + + Reporting Bugs -- diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 1061824..2eccada 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -95,6 +95,10 @@ Requires: openssl hostname iproute module-init-tools Requires(post): /usr/bin/getent Requires(post): /usr/sbin/useradd Requires(post): /usr/bin/sed +%if %{with dpdk} +Requires(post): /usr/sbin/usermod +Requires(post): /usr/sbin/groupadd +%endif Requires(post): systemd-units Requires(preun): systemd-units Requires(postun): systemd-units @@ -379,6 +383,15 @@ if [ $1 -eq 1 ]; then sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch +%if %{with dpdk} +getent group hugetlbfs >/dev/null || \ +groupadd hugetlbfs +usermod -a -G hugetlbfs openvswitch +sed -i \ + 's@OVS_USER_ID="openvswitch:openvswitch"@OVS_USER_ID="openvswitch:hugetlbfs"@'\ +/etc/sysconfig/openvswitch +%endif + # In the case of upgrade, this is not needed. chown -R openvswitch:openvswitch /etc/openvswitch fi diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in index 9aff70b..bf0f058 100644 --- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in +++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in @@ -10,8 +10,13 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +Environment=HOME=/var/run/openvswitch EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch +@begin_dpdk@ +ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages +ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages +@end_dpdk@ ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovsdb-server --no-monitor --system-id=random \ --ovs-user=${OVS_USER_ID} \ -- 2.9.4 ___ dev mailing list
[ovs-dev] [PATCH v4 3/5] dpdkstrip: add a preprocessor tool for stripping dpdk blocks
Normally, in C code, pre-processing macros can be used to enable/disable specific functionality based on switches passed to configure. This works for DPDK using the --with-dpdk flag, which sets the DPDK_NETDEV define to the appropriate value. However, not all files are processed with the C pre-processor. For those files which are not, this commit adds a new pre-processor tool for .in files to either include or exclude those stanzas as appropriate. Signed-off-by: Aaron Conole--- Makefile.am| 1 + build-aux/dpdkstrip.pl | 35 +++ 2 files changed, 36 insertions(+) create mode 100644 build-aux/dpdkstrip.pl diff --git a/Makefile.am b/Makefile.am index 373ef6e..035afe6 100644 --- a/Makefile.am +++ b/Makefile.am @@ -82,6 +82,7 @@ EXTRA_DIST = \ build-aux/cksum-schema-check \ build-aux/calculate-schema-cksum \ build-aux/dist-docs \ + build-aux/dpdkstrip.pl \ build-aux/sodepends.pl \ build-aux/soexpand.pl \ build-aux/xml2nroff \ diff --git a/build-aux/dpdkstrip.pl b/build-aux/dpdkstrip.pl new file mode 100644 index 000..98539cc --- /dev/null +++ b/build-aux/dpdkstrip.pl @@ -0,0 +1,35 @@ +# Copyright (c) 2017 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +use strict; +use warnings; +use Getopt::Long; + +my ($check_dpdk) = 0; +my ($disabled_print) = 0; + +Getopt::Long::Configure ("bundling"); +GetOptions("dpdk!" => \$check_dpdk) or exit(1); + +OUTER: while () { +if (/@(begin|end)_dpdk@/) { +if (!$check_dpdk) { +$disabled_print = ! $disabled_print; +} +next; +} + +print $_ unless $disabled_print; +} +exit 0; -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 4/5] redhat: dynamic service file for vswitchd
This commit changes the service file from static configuration to an autogenerated file, produced during the build. This will be relevant in a future commit. Signed-off-by: Aaron Conole--- rhel/.gitignore | 1 + rhel/automake.mk | 4 +++- rhel/openvswitch-fedora.spec.in | 9 + ...hd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} | 0 4 files changed, 13 insertions(+), 1 deletion(-) rename rhel/{usr_lib_systemd_system_ovs-vswitchd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} (100%) diff --git a/rhel/.gitignore b/rhel/.gitignore index 164bb66..e584a1e 100644 --- a/rhel/.gitignore +++ b/rhel/.gitignore @@ -4,3 +4,4 @@ openvswitch-kmod-rhel6.spec openvswitch-kmod-fedora.spec openvswitch.spec openvswitch-fedora.spec +usr_lib_systemd_system_ovs-vswitchd.service diff --git a/rhel/automake.mk b/rhel/automake.mk index 2d9443f..11c8be0 100644 --- a/rhel/automake.mk +++ b/rhel/automake.mk @@ -28,13 +28,15 @@ EXTRA_DIST += \ rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template \ rhel/usr_lib_systemd_system_openvswitch.service \ rhel/usr_lib_systemd_system_ovsdb-server.service \ - rhel/usr_lib_systemd_system_ovs-vswitchd.service \ + rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \ rhel/usr_lib_systemd_system_ovn-controller.service \ rhel/usr_lib_systemd_system_ovn-controller-vtep.service \ rhel/usr_lib_systemd_system_ovn-northd.service \ rhel/usr_lib_firewalld_services_ovn-central-firewall-service.xml \ rhel/usr_lib_firewalld_services_ovn-host-firewall-service.xml +DISTCLEANFILES += rhel/usr_lib_systemd_system_ovs-vswitchd.service + update_rhel_spec = \ $(AM_V_GEN)($(ro_shell) && sed -e 's,[@]VERSION[@],$(VERSION),g') \ < $(srcdir)/rhel/$(@F).in > $(@F).tmp || exit 1; \ diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index fe86054..1061824 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -225,6 +225,15 @@ Docker network plugins for OVN. --enable-ssl \ --with-pkidir=%{_sharedstatedir}/openvswitch/pki +/usr/bin/perl build-aux/dpdkstrip.pl \ +%if %{with dpdk} + --dpdk \ +%else + --nodpdk \ +%endif + < rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \ + > rhel/usr_lib_systemd_system_ovs-vswitchd.service + make %{?_smp_mflags} cd selinux make -f %{_datadir}/selinux/devel/Makefile diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in similarity index 100% rename from rhel/usr_lib_systemd_system_ovs-vswitchd.service rename to rhel/usr_lib_systemd_system_ovs-vswitchd.service.in -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 2/5] redhat: dynamically allocate and reference ovs user
After this commit, the fedora RPM will create the openvswitch user, from the non-static pool, for use as an Open vSwitch daemon user. This only happens on install - not upgrade. This will be the default user:group combination for the openvswitch daemons. To do this in a way that doesn't impact existing installations, the /etc/openvswitch directory will be created during the installation, rather than being provided as part of the rpm. Acked-by: Markos ChandrasSigned-off-by: Aaron Conole --- rhel/openvswitch-fedora.spec.in | 13 + rhel/usr_lib_systemd_system_ovsdb-server.service | 1 + 2 files changed, 14 insertions(+) diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index a779c9a..fe86054 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -92,6 +92,9 @@ Requires: openssl hostname iproute module-init-tools #Upstream kernel commit 4f647e0a3c37b8d5086214128614a136064110c3 #Requires: kernel >= 3.15.0-0 +Requires(post): /usr/bin/getent +Requires(post): /usr/sbin/useradd +Requires(post): /usr/bin/sed Requires(post): systemd-units Requires(preun): systemd-units Requires(postun): systemd-units @@ -361,6 +364,16 @@ rm -rf $RPM_BUILD_ROOT %endif %post +if [ $1 -eq 1 ]; then +getent passwd openvswitch >/dev/null || \ +useradd -r -d / -s /sbin/nologin -c "Open vSwitch Daemons" openvswitch + +sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch + +# In the case of upgrade, this is not needed. +chown -R openvswitch:openvswitch /etc/openvswitch +fi + %if 0%{?systemd_post:1} %systemd_post %{name}.service %else diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service index b82cb33..7acd25f 100644 --- a/rhel/usr_lib_systemd_system_ovsdb-server.service +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service @@ -10,6 +10,7 @@ Type=forking Restart=on-failure EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch +ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovs-vswitchd --no-monitor --system-id=random \ --ovs-user=${OVS_USER_ID} \ -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group
Under rpm based distributions, the only user:group that the rhel daemons run as is 'root:root'. This is fine as a default, but as part of a security procedure, users may want to run as an alternate uid/gid. This commit adds an OVS_USER_ID environment variable for systemd, which defaults to root:root, but can be overridden by changing the /etc/sysconfig/openvswitch environment file. Acked-by: Markos ChandrasSigned-off-by: Aaron Conole --- rhel/automake.mk | 1 + rhel/etc_openvswitch_default.conf | 5 + rhel/openvswitch-fedora.spec.in | 4 rhel/usr_lib_systemd_system_ovs-vswitchd.service | 3 +++ rhel/usr_lib_systemd_system_ovsdb-server.service | 3 +++ rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++ 6 files changed, 19 insertions(+) create mode 100644 rhel/etc_openvswitch_default.conf diff --git a/rhel/automake.mk b/rhel/automake.mk index 1265fa7..2d9443f 100644 --- a/rhel/automake.mk +++ b/rhel/automake.mk @@ -10,6 +10,7 @@ EXTRA_DIST += \ rhel/automake.mk \ rhel/etc_init.d_openvswitch \ rhel/etc_logrotate.d_openvswitch \ + rhel/etc_openvswitch_default.conf \ rhel/etc_sysconfig_network-scripts_ifdown-ovs \ rhel/etc_sysconfig_network-scripts_ifup-ovs \ rhel/openvswitch-dkms.spec \ diff --git a/rhel/etc_openvswitch_default.conf b/rhel/etc_openvswitch_default.conf new file mode 100644 index 000..c74417d --- /dev/null +++ b/rhel/etc_openvswitch_default.conf @@ -0,0 +1,5 @@ +# DO NOT EDIT THIS FILE + +# The following is the *default* configuration for the openvswitch user ID. +# This is for backward compatibility. +OVS_USER_ID="root:root" diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 1cad940..a779c9a 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -246,6 +246,9 @@ done install -m 0755 rhel/etc_init.d_openvswitch \ $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/openvswitch.init +install -p -D -m 0644 rhel/etc_openvswitch_default.conf \ +$RPM_BUILD_ROOT/%{_sysconfdir}/openvswitch/default.conf + install -p -D -m 0644 rhel/etc_logrotate.d_openvswitch \ $RPM_BUILD_ROOT/%{_sysconfdir}/logrotate.d/openvswitch @@ -475,6 +478,7 @@ fi %{_sysconfdir}/bash_completion.d/ovs-appctl-bashcomp.bash %{_sysconfdir}/bash_completion.d/ovs-vsctl-bashcomp.bash %dir %{_sysconfdir}/openvswitch +%{_sysconfdir}/openvswitch/default.conf %config %ghost %{_sysconfdir}/openvswitch/conf.db %ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~ %config %ghost %{_sysconfdir}/openvswitch/system-id.conf diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service b/rhel/usr_lib_systemd_system_ovs-vswitchd.service index 886b68a..9aff70b 100644 --- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service +++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service @@ -10,12 +10,15 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovsdb-server --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ start $OPTIONS ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \ --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ restart $OPTIONS TimeoutSec=300 diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service index 68deace..b82cb33 100644 --- a/rhel/usr_lib_systemd_system_ovsdb-server.service +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service @@ -8,12 +8,15 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovs-vswitchd --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ start $OPTIONS ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \ + --ovs-user=${OVS_USER_ID} \ --no-monitor restart $OPTIONS RuntimeDirectory=openvswitch RuntimeDirectoryMode=0755 diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template index 3050a07..fdaee00 100644 --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template @@ -21,3 +21,6 @@ # --ovsdb-server-wrapper=valgrind # OPTIONS="" + +# Uncomment and set the OVS User/Group value
[ovs-dev] [PATCH v4 0/5] rhel/fedora: non-root OvS out of the box
This series attempts to introduce the ability to start and use Open vSwitch 'out of the box' as a non-root user. It does this by modifying the service files to pass the recently introduced --ovs-user argument around, and by making some minor tweaks to the passwd, group, and filesystem information. I prefixed the packaging work with 'redhat', but if rpm or packaging is a preferred prefx for that work, I can respin. The more controversial changes are: * This modifies the /etc/sysconfig/ file on install. * The dpdk support directly modifies /dev/hugepages with a call to chmod * A new user 'openvswitch', and up to two new groups 'openvswitch', and 'hugetlbfs' are created After this series: > [root at wsfd-netdev60 ~]# yum install openvswitch-2.7.90-1.fc25.x86_64.rpm > Loaded plugins: product-id, search-disabled-repos, subscription-manager > This system is not registered to Red Hat Subscription Management. You can use > subscription-manager to register. > Examining openvswitch-2.7.90-1.fc25.x86_64.rpm: > openvswitch-2.7.90-1.fc25.x86_64 > Marking openvswitch-2.7.90-1.fc25.x86_64.rpm to be installed > Resolving Dependencies > --> Running transaction check > ---> Package openvswitch.x86_64 0:2.7.90-1.fc25 will be installed > --> Finished Dependency Resolution > > Dependencies Resolved > > > Package ArchVersion Repository > Size > > Installing: > openvswitch x86_64 2.7.90-1.fc25/openvswitch-2.7.90-1.fc25.x86_64 11 > M > > Transaction Summary > > Install 1 Package > > Total size: 11 M > Installed size: 11 M > Is this ok [y/d/N]: y > Downloading packages: > Running transaction check > Running transaction test > Transaction test succeeded > Running transaction > Installing : openvswitch-2.7.90-1.fc25.x86_64 > 1/1 > Verifying : openvswitch-2.7.90-1.fc25.x86_64 > 1/1 > > Installed: > openvswitch.x86_64 0:2.7.90-1.fc25 > > > Complete! > [root at wsfd-netdev60 ~]# systemctl start openvswitch > [root at wsfd-netdev60 ~]# ps aux | grep ovs > openvsw+ 12642 0.0 0.0 47864 2296 ?S ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info > --remote=punix:/var/run/openvswitch/db.sock > --private-key=db:Open_vSwitch,SSL,private_key > --certificate=db:Open_vSwitch,SSL,certificate > --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user > openvswitch:openvswitch --no-chdir > --log-file=/var/log/openvswitch/ovsdb-server.log > --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach > openvsw+ 12688 0.0 0.0 49588 10600 ?S ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err > -vfile:info --mlockall --user openvswitch:openvswitch --no-chdir > --log-file=/var/log/openvswitch/ovs-vswitchd.log > --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach v1->v2: https://lists.linux-foundation.org/pipermail/ovs-dev/2017-June/333417.html The previous method used 3 different locations of configuration from environment variables: 1. The systemd file. 2. A new /etc/sysconfig/openvswitch-pre 3. The existing /etc/sysconfig/openvswitch Now, configuration is from two areas: 1. A new /etc/openvswitch/default.conf 2. The existing /etc/sysconfig/openvswitch As part of the install, we set the OVS_USER_ID to the new values. v2->v3: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/334946.html Refactor for the dpdk non-root user portion due to an issue discovered where the generated service file didn't honor new configuration when re-running ./configure. Also, converted the "Reviewed-by" to "Acked-by". This is because there is no such thing as Reviewed-by in the OVS contributing guide. Finally, included some documentation updates. v3->v4: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336558.html Remove the makefile modifications Aaron Conole (5): redhat: allow arbitrary user:group redhat: dynamically allocate and reference ovs user dpdkstrip: add a preprocessor tool for stripping dpdk blocks redhat: dynamic service file for vswitchd redhat: allow dpdk to also run as non-root user Documentation/intro/install/dpdk.rst | 7 Makefile.am| 1 + NEWS | 1 + build-aux/dpdkstrip.pl | 35 +++ rhel/.gitignore| 1 + rhel/README.RHEL.rst | 11 ++ rhel/automake.mk | 5 ++- rhel/etc_openvswitch_default.conf | 5 +++ rhel/openvswitch-fedora.spec.in
Re: [ovs-dev] [PATCH v3 6/6] redhat: allow dpdk to also run as non-root user
Sergio Gonzalez Monroywrites: > On 02/08/2017 16:07, Aaron Conole wrote: >> Sergio Gonzalez Monroy writes: >> >>> Hi Aaron, >> Hi Sergio, >> >>> On 01/08/2017 23:05, Aaron Conole wrote: After this commit, users may start a dpdk-enabled ovs setup as a non-root user. This is accomplished by exporting the $HOME directory, which dpdk uses to fill in it's semi-persistent RTE configuration. This change may be a bit controversial since it modifies /dev/hugepages as part of starting the ovs-vswitchd to set a hugetlbfs group ownership. This is used to enable writing to /dev/hugepages so that the dpdk_init will successfully complete. There is an alternate way of accomplishing this - namely to initialize DPDK before dropping privileges. However, this would mean that if DPDK ever grows an uninit / reinit function, non-root ovs likely could never use it. This does not change OvS+DPDK's SELinux requirements. It still must be disabled. Signed-off-by: Aaron Conole --- >>> Instead of modifying /dev/hugepages, what about creating a hugetlbfs >>> mount point for OvS? You could then point DPDK to use that specific >>> mount (--huge-dir). >> With this approach, I need to also insert new information into the >> ovsdb. Additionally, if the user wants to customize it, I'm not sure >> the best way of doing that. Do you have a concrete set of steps you >> think makes sense here? > > I was just checking with the team here (I am not very familiar with > the OvS specifics) and > the option is already in ovsdb 'dpdk-hugepage-dir'. > > Are you referring to something else? Whoops. I forgot to hit send on my reply, it seems. Sorry for that. I'm referring to populating that information in the database in a way that doesn't break the user and also can be done in an automated fashion. That would require either starting ovsdb during the install, only to stop it again, OR possibly adding commands to the startup sequence to set this directory value. Maybe there's some other way, though? > Thanks, > Sergio > >>> The only downside I can think of for this approach is that OvS would >>> be fixed to use a single size (either 2MB or 1GB whatever the mount >>> point is set to). >>> Without specifying the mount point directory, DPDK could use both >>> hugepage sizes. >>> >>> Could you elaborate on the OvS+DPDK's SELinux requirements? >> Sure. We need r/w permissions on vfio labeled devices, and some >> additional unix socket permissions for vhost-user sockets, as well as >> r/w and create bits for hugetlbfs labeled files. I think that was it >> off the top of my head. I have an selinux policy ready to go, but I >> want to get this change in first. >> >>> Quick summary of DPDK privileged/unprivileged user (assuming the >>> unprivileged user has permissions for hugepage allocation): >>> - if all devices are bound to vfio-pci driver (thus IOMMU enabled >>> system), then the DPDK can run as unprivileged user. This is the >>> recommended mode is possible. >>> - if any device is bound to igb_uio/uio_pci_generic (likely scenario >>> when running in VMs), then the DPDK needs privileged user to be able >>> to read each hugepage physical address from /proc/self/pagemap. >> Correct. I agree with the above. >> >> Thanks so much for looking at this! >> >>> Thanks, >>> Sergio >>> >>> Documentation/intro/install/dpdk.rst| 7 +++ NEWS| 1 + rhel/README.RHEL.rst| 11 +++ rhel/openvswitch-fedora.spec.in | 13 + rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 5 + 5 files changed, 37 insertions(+) diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index a05aa1a..0585c6a 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -134,6 +134,13 @@ has to be configured with DPDK support (``--with-dpdk``). Additional information can be found in :doc:`general`. +.. note:: + If you are running using the Fedora or Red Hat package, the Open vSwitch + daemon will run as a non-root user. This implies that you must have a + working IOMMU. Visit the `RHEL README`__ for additional information. + +__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst + Setup - diff --git a/NEWS b/NEWS index facea02..095272a 100644 --- a/NEWS +++ b/NEWS @@ -64,6 +64,7 @@ Post-v2.7.0 * OpenFlow 1.5 packet-out is now supported. - Fedora Packaging: * OVN services are no longer restarted automatically after upgrade. + * ovs-vswitchd and
Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.
HI Ilya, Thanks for looking in to this and providing your feedback. When this feature was first posted as RFC (https://mail.openvswitch.org/pipermail/ovs-dev/2016-July/318243.html), the implementation in OvS was done based on DPDK Keepalive library and keeping collectd in sync. As you can see from RFC it was pretty compact code and integrated well with ceilometer and provided end to end functionality. Much of the RFC code was to handle SHM. However the reviewers pointed below flaws. - Very DPDK specific. - Shared memory for inter process communication(Between OvS and collectd threads). - Tracks PMD cores and not threads. - Limited support to detect false negatives & false positives. - Limited support to query KA status. As per suggestions, below changes were introduced. - Basic infrastructure to register & track threads instead of cores. (Now only PMDs are tracked only but can be extended to track non-PMD threads). - Keep most of the APIs generic so that they can extended in the future. All generic APIs are in Keepalive.[hc] - Remove Shared memory and introduce OvSDB. - Add support to detect false negatives. - appctl options to query status. I agree that we have few issues but they can be reworked. - invoke dpdk_is_enabled() from generic code (vswitchd/bridge.c) isn't nice, I had to do to pass few unit test cases last time. - Half a dozen stub APIs. I couldn't avoid it as they are needed to get the kernel datapath build. The patch series can be categorized in to sub patchesets (KA infrastructure/ OvSDB changes/ Query KA stats / Check False positives). This patch series in the current form is using rte_keepalive library to handle PMD thread. But importantly has introduced basic infrastructure to deal with other threads in the future. Regards, Bhanuprakash. >-Original Message- >From: Ilya Maximets [mailto:i.maxim...@samsung.com] >Sent: Friday, August 4, 2017 2:40 PM >To: ovs-dev@openvswitch.org; Bodireddy, Bhanuprakash >>Cc: Darrell Ball ; Ben Pfaff ; Aaron >Conole >Subject: Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive >functionality. > >Hi Bhanuprakash, > >Thanks for working on this. >I have a general concern about implementation of this functionality: > >*What is the profit from using rte_keepalive library ?* > >Pros: > >* No need to implement 'rte_keepalive_dispatch_pings()' (40 LOC) > and struct rte_keepalive (30 LOC, can be significantly decreased > by removing not needed elements) ---> ~70 LOC. > >Cons: > >* DPDK dependency: > >* Implementation of PMD threads management (KA) inside netdev code > (netdev-dpdk) looks very strange. >* Many DPDK references in generic code (like dpdk_is_enabled). >* Feature isn't available for the common threads (main?) wihtout DPDK. >* Many stubs and placeholders for cases without dpdk. >* No ability for unit testing. > >So, does it worth to use rte_keepalive? To make functionality fully generic we >only need to implement 'rte_keepalive_dispatch_pings()' >and few helpers. As soon as this function does nothing dpdk-specific it's a >really simple task which will allow to greatly clean up the code. The feature >is >too big to use external library for 70 LOCs of really simple code. (Clean up >should save much more). > >Am I missed something? >Any thoughts? > >Best regards, Ilya Maximets. > >> Keepalive feature is aimed at achieving Fastpath Service Assurance in >> OVS-DPDK deployments. It adds support for monitoring the packet >> processing cores(PMD thread cores) by dispatching heartbeats at >> regular intervals. Incase of heartbeat misses additional health checks >> are enabled on the PMD thread to detect the failure and the same shall >> be reported to higher level fault management systems/frameworks. >> >> The implementation uses OVSDB for reporting the health of the PMD >threads. >> Any external monitoring application can read the status from OVSDB at >> regular intervals (or) subscribe to the updates in OVSDB so that they >> get notified when the changes happen on OVSDB. >> >> keepalive info struct is created and initialized for storing the >> status of the PMD threads. This is initialized by main >> thread(vswitchd) as part of init process and will be periodically updated by >'keepalive' >> thread. keepalive feature can be enabled through below OVSDB settings. >> >> enable-keepalive=true >> - Keepalive feature is disabled by default. >> >> keepalive-interval="5000" >> - Timer interval in milliseconds for monitoring the packet >> processing cores. >> >> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes >> up at regular intervals to update the timestamp and status of pmd >> cores in keepalive info struct. This information shall be read by >> vswitchd thread and write the status in to 'keepalive' column of
Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy.
LGTM Acked-by: Antonio Fischetti> -Original Message- > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > Sent: Friday, August 4, 2017 3:17 PM > To: ovs-dev@openvswitch.org > Cc: Heetae Ahn ; Darrell Ball ; > Wang, Yipeng1 ; Kevin Traynor ; > Loftus, Ciara ; Fischetti, Antonio > ; Ilya Maximets > Subject: [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy. > > Current EMC replacement policy allows to replace active EMC entry > even if there are dead (empty) entries available. This leads to > EMC trashing even on few hundreds of flows. In some cases PMD > threads starts to execute classifier lookups even in tests with > 50 - 100 active flows. > > Looks like the hash comparison rule was introduced to randomly > choose one of alive entries to replace. But it doesn't work as > needed and also hashes has nothing common with randomness. > > Lets fix the replacement policy by removing hash checking and > using the random value passed from 'emc_probabilistic_insert()' > only while considering replace of the alive entry. > This should give us nearly fair way to choose the entry to replace. > > We are avoiding calculation of the new random value by reusing > bits of already generated random for probabilistic EMC insertion. > Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because > lower bits are less than 'min' and not fully random. > > Not replacing of alive entries while dead ones exists allows to > significantly decrease EMC trashing. > > Testing shows stable work of exact match cache without misses > with up to 3072 - 6144 active flows (depends on traffic pattern). > > Signed-off-by: Ilya Maximets > --- ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side
> > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum. > So L4 packets's cksum were calculated in VM side but performance is not > good. > Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput and > makes virtio-net frontend-driver support NETIF_F_SG as well > > Signed-off-by: Zhenyu Gao> --- > > Here is some performance number: > > Setup: > > qperf client > +-+ > | VM| > +-+ > | > | qperf server > +--+ ++ > | vswitch+dpdk | | bare-metal | > +--+ ++ >|| >|| > pNic-PhysicalSwitch > > do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on' > in VM side. > It offload cksum job to ovs-dpdk side. > > do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM > side. > VM calculate cksum for tcp/udp packets. > > We can see huge improvment in TCP throughput if we leverage ovs-dpdk > cksum. Hi Zhenyu, Thanks for the patch. I tested some alternative use cases and unfortunately I see a degradation for phy-phy and phy-vm-phy topologies. Here are my results: phy-vm-phy: without patch: 0.871Mpps with patch (offload=on): 0.877Mpps with patch (offload=off): 0.891Mpps phy-phy: without patch: 13.581Mpps with patch: 13.055Mpps The half a million pps drop for the second test case is concerning to me but not surprising since we're adding extra complexity to netdev_dpdk_send() Could this be avoided? Would it make sense to put this functionality somewhere else eg. vhost receive? On another note I have a general concern. I understand similar functionality is present in the DPDK vhost sample app. I wonder if it would be feasible for this to be implemented in the DPDK vhost library and leveraged here, rather than having two implementations in two separate code bases. I have some other comments inline. Thanks, Ciara > > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2 host-qperf-server01 > tcp_bw tcp_lat udp_bw udp_lat > do cksum in ovs-dpdk do cksum in VM without this patch > tcp_bw: > bw = 2.05 MB/secbw = 1.92 MB/secbw = 1.95 MB/sec > tcp_bw: > bw = 3.9 MB/sec bw = 3.99 MB/secbw = 3.98 MB/sec > tcp_bw: > bw = 8.09 MB/secbw = 7.82 MB/secbw = 8.19 MB/sec > tcp_bw: > bw = 14.9 MB/secbw = 14.8 MB/secbw = 15.7 MB/sec > tcp_bw: > bw = 27.7 MB/secbw = 28 MB/sec bw = 29.7 MB/sec > tcp_bw: > bw = 51.2 MB/secbw = 50.9 MB/secbw = 54.9 MB/sec > tcp_bw: > bw = 86.7 MB/secbw = 86.8 MB/secbw = 95.1 MB/sec > tcp_bw: > bw = 149 MB/sec bw = 160 MB/sec bw = 149 MB/sec > tcp_bw: > bw = 211 MB/sec bw = 205 MB/sec bw = 216 MB/sec > tcp_bw: > bw = 271 MB/sec bw = 254 MB/sec bw = 275 MB/sec > tcp_bw: > bw = 326 MB/sec bw = 303 MB/sec bw = 321 MB/sec > tcp_bw: > bw = 407 MB/sec bw = 359 MB/sec bw = 361 MB/sec > tcp_bw: > bw = 816 MB/sec bw = 512 MB/sec bw = 419 MB/sec > tcp_bw: > bw = 840 MB/sec bw = 756 MB/sec bw = 457 MB/sec > tcp_bw: > bw = 1.07 GB/secbw = 880 MB/sec bw = 480 MB/sec > tcp_bw: > bw = 1.17 GB/secbw = 1.01 GB/secbw = 488 MB/sec > tcp_bw: > bw = 1.17 GB/secbw = 1.11 GB/secbw = 483 MB/sec > tcp_lat: > latency = 29 us latency = 29.2 us latency = 29.6 us > tcp_lat: > latency = 28.9 us latency = 29.3 us latency = 29.5 us > tcp_lat: > latency = 29 us latency = 29.3 us latency = 29.6 us > tcp_lat: > latency = 29 us latency = 29.4 us latency = 29.5 us > tcp_lat: > latency = 29 us latency = 29.2 us latency = 29.6 us > tcp_lat: > latency = 29.1 us latency = 29.3 us latency = 29.7 us > tcp_lat: > latency = 29.4 us latency = 29.6 us latency = 30 us > tcp_lat: > latency = 29.8 us latency = 30.1 us latency = 30.2 us > tcp_lat: > latency = 30.9 us latency = 30.9 us latency = 31 us > tcp_lat: > latency = 46.9 us latency = 46.2 us latency = 32.2 us > tcp_lat: > latency = 51.5 us latency = 52.6 us latency = 34.5 us > tcp_lat: > latency = 43.9 us latency = 43.8 us latency = 43.6 us > tcp_lat: > latency = 47.6 us latency = 48 us latency = 48.1 us > tcp_lat: > latency = 77.7 us latency = 78.8 us
Re: [ovs-dev] [PATCH v5 2/3] ovn-controller: Add a new action - 'put_nd_ra_opts'
On Fri, Aug 4, 2017 at 4:09 PM, Numan Siddiquewrote: > > > On Fri, Aug 4, 2017 at 3:02 AM, Ben Pfaff wrote: > >> On Mon, Jul 31, 2017 at 06:11:35PM +0530, nusid...@redhat.com wrote: >> > From: Numan Siddique >> > >> > This patch adds a new OVN action 'put_nd_ra_opts' to support native >> > IPv6 Router Advertisement in OVN. This action can be used to respond >> > to the IPv6 Router Solicitation requests. >> > >> > ovn-controller parses this action and adds a NXT_PACKET_IN2 OF flow >> > with 'pause' flag set and the RA options stored in 'userdata' field. >> > This action is similar to 'put_dhcp_opts' and 'put_dhcpv6_opts'. >> > >> > When a valid IPv6 RS packet is received by the pinctrl module of >> > ovn-controller, it frames a new RA packet and sets the RA options >> > from the 'userdata' field and resumes the packet storing 1 in the >> > 1-bit result sub-field. If the packet is invalid, it resumes the >> > packet without any modifications storing 0 in the 1-bit result >> > sub-field. >> > >> > Eg. reg0[5] = put_nd_ra_opts(address_mode = "slaac", mtu = 1450, >> > slla = 01:02:03:04:05:06, prefix = >> aef0::/64) >> > >> > Note that unlike DHCPv4/v6, a new table to store the supported IPv6 ND >> RA >> > options is not added in SB DB since there are only 3 ND RA options. >> > >> > Co-authored-by: Zongkai LI >> > Signed-off-by: Zongkai LI >> > Signed-off-by: Numan Siddique >> > Acked-by: Miguel Angel Ajo >> >> Thanks for working on this. >> >> Looking at it, the treatment of some of the options strikes me as odd. >> Some of the options (SLL) are actually required, and others could be >> supplied as fixed data since there's a default value (mo_flags, mtu). >> Only the prefixes seem to really be options in what I think of as the >> usual sense. It looks to me like those prefixes could be supplied >> directly in the userdata as bytes to append to the packet. It might be >> cleaner to make these changes--then there would be less parsing of text >> options, encoding them as binary options, decoding those binary options, >> and then re-encoding them again in the packet. >> >> > Thanks for the review Ben. I think what you are suggesting makes sense and > simplies the code. Based on your comments, this is what I am planning to > modify this patch. > > "put_nd_ra_opts" action would be like > > res_bit = put_nd_ra_opts(slla, mtu, mo_flags, ...) > > ovn-northd would include 0 or more prefixes following the mo_flags field > depending the address mode defined by the user. > > Eg. > reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 0, aef0::/64) > reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 64, aef0::/64, > bef0::/64) > reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 128) > > > If this makes sense, I will update the patches accordingly. Please let me > know. > > And also it is possible to store the complete ICMPv6 response data (which includes ICMP v6 header, flags and options) in the userdata field as you mentioned, making the job of framing the reply easier. Numan Thanks again > > Numan > > > If that doesn't make sense, though, this ALIGNED_CAST in >> pinctrl_handle_put_nd_ra_opts()looks wrong--I see no reason to believe >> that opt_data is 32-bit aligned. Probably should use >> get_unaligned_be32(): >> if (opt_len == sizeof(ovs_be32)) { >> mtu = *(ALIGNED_CAST(const ovs_be32 *, opt_data)); >> } >> >> Thanks, >> >> Ben. >> > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.
Currently, real insertion probability is higher than configured for the maximum case because of wrong usage of the random value. i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min' equals to 1. In this case we're allowing insert if random vailue is less or equal to 1. So, two of 2**32 values (0 and 1) are allowed and real probability is 2 times higher than configured. This happens because 'random_uint32()' returns value in range [0; UINT32_MAX], but for the checking to be correct we should generate random value in range [0; UINT32_MAX - 1]. To fix this we have 4 possible solutions: 1. need to use uint64_t for 'emc-insert-min' and calculate it as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full range [0; UINT32_MAX]. This may decrease performance becaue of 64 bit atomic ops. 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min' because it's the only value we have issues with. This will require additional explanations and not very friendly for users. 3. Generate random value in range [0; UINT32_MAX - 1]. This will require heavy division operation. 4. Decrease the range of available values for 'emc-insert-inv-prob'. Actually, we don't need to have so much different values for that option. I beleve that values higher than 1M are completely useless. Choosing the upper limit as a power of 2 like 2**20 we will be able to mask the generated random value in a fast way and also avoid range issue, because same uint32_t can be used to store 2**20. This patch implements solution #4. CC: Ciara LoftusFixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert") Signed-off-by: Ilya Maximets Acked-by: Antonio Fischetti --- Infrastructure and logic introduced here will be used for fixing emc replacement policy. lib/dpif-netdev.c| 14 ++ vswitchd/vswitch.xml | 3 ++- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8ecc9d1..e23c674 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -153,9 +153,14 @@ struct netdev_flow_key { #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) #define EM_FLOW_HASH_SEGS 2 +/* Set up maximum inverse EMC insertion probability to 2^20 - 1. + * Higher values considered useless in practice. */ +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20 +#define EM_FLOW_INSERT_INV_PROB_MAX (1 << EM_FLOW_INSERT_INV_PROB_SHIFT) +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1) /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ +#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\ DEFAULT_EM_FLOW_INSERT_INV_PROB) struct emc_entry { @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, uint32_t min; atomic_read_relaxed(>dp->emc_insert_min, ); -if (min && random_uint32() <= min) { +if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) { emc_insert(>flow_cache, key, flow); } } @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) } atomic_read_relaxed(>emc_insert_min, _min); -if (insert_prob <= UINT32_MAX) { -insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob; +if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) { +insert_min = insert_prob == 0 + ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob; } else { insert_min = DEFAULT_EM_FLOW_INSERT_MIN; insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB; diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 074535b..61f252e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -381,7 +381,8 @@ + type='{"type": "integer", + "minInteger": 0, "maxInteger": 1048575}'> Specifies the inverse probability (1/emc-insert-inv-prob) of a flow being inserted into the Exact Match Cache (EMC). On average one in -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 0/2] EMC management fixes.
Version 3: * Added comment to EM_FLOW_INSERT_INV_PROB_SHIFT. Ilya Maximets (2): dpif-netdev: Decrease range of values for EMC probability. dpif-netdev: Fix emc replacement policy. lib/dpif-netdev.c| 35 +-- vswitchd/vswitch.xml | 3 ++- 2 files changed, 27 insertions(+), 11 deletions(-) -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] tests: fix wrapped comment
Add missing '#' to comment line. Signed-off-by: Lance Richardson--- It's odd that this doesn't result in failures in more environments, but it does fail under Alpine Linux. tests/ovn.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/ovn.at b/tests/ovn.at index 40fa817f9..29c5fc0ab 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -7134,7 +7134,7 @@ ovn_start # for the localnet on hv2. These tests are expected to succeed. # Create three hypervisors and create OVS ports corresponding -to logical ports. +# to logical ports. net_add n1 sim_add hv1 -- 2.13.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] 答复: Re: [PATCH] ovn: Support for taas(tap-as-a-service) function
On Thu, Aug 3, 2017 at 8:52 PM,wrote: > Miguel Ángel and Russell > > Thanks for your reviews. > > Current taas function just for port monitor, in this situation, we can > simplify the design by just add new port type. But we have the plane to add > flow_classifier to tap_flow to monitor special flows of given port. The > flow_classifier definition may like as follow: > 'flow_classifiers': { > 'id': {'allow_post': False, 'allow_put': False, >'validate': {'type:uuid': None}, 'is_visible': True, >'primary_key': True}, > 'tenant_id': {'allow_post': True, 'allow_put': False, > 'validate': {'type:string': None}, > 'required_by_policy': True, 'is_visible': True}, > 'name': {'allow_post': True, 'allow_put': True, > 'validate': {'type:string': None}, > 'is_visible': True, 'default': ''}, > 'description': {'allow_post': True, 'allow_put': True, > 'validate': {'type:string': None}, > 'is_visible': True, 'default': ''}, > 'protocol': {'allow_post': True, 'allow_put': True, > 'validate': {'type:string': None}, > 'is_visible': True, 'default': ''}, > 'src_port_range_min': {'allow_post': True, 'allow_put': True, >'convert_to': attr.convert_to_int, >'is_visible': True, 'default': 0}, > 'src_port_range_max': {'allow_post': True, 'allow_put': True, >'convert_to': attr.convert_to_int, >'is_visible': True, 'default': 0}, > 'dst_port_range_min': {'allow_post': True, 'allow_put': True, >'convert_to': attr.convert_to_int, >'is_visible': True, 'default': 0}, > 'dst_port_range_max': {'allow_post': True, 'allow_put': True, >'convert_to': attr.convert_to_int, >'is_visible': True, 'default': 0}, > 'src_ip_prefix': {'allow_post': True, 'allow_put': True, > 'validate': {'type:subnet': > attr._validate_subnet}, > 'is_visible': True, 'default': '0.0.0.0/0'}, > 'dst_ip_prefix': {'allow_post': True, 'allow_put': True, > 'validate': {'type:subnet': > attr._validate_subnet}, > 'is_visible': True, 'default': '0.0.0.0/0'} > } > > This may need more complex pipeline. So I think add a new table and new > pipeline may be a easier way. > Thanks for sharing the info on future capabilities. We have a very flexible syntax for traffic classification in OVN. It's the logical flow match syntax (see logical flows in the southbound database). We expose this syntax in the northbound database in the "match" column of the ACL table. This would be another use case where we could use this syntax in the northbound database. Expanding on my preview proposal: - a new port type of 'mirror' - when port type=mirror, an option to identify which port is being mirrored - (the new part) when port type=mirror, an option that may be used to specify traffic classification for the subset of traffic on a port to mirror, in "match" syntax Do you think this captures the use case? -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovs-vsctl-bashcomp: Make compatible with busybox "awk".
> From: "Ben Pfaff"> To: d...@openvswitch.org > Cc: "Ben Pfaff" , "Stuart Cardall" > Sent: Friday, 14 July, 2017 12:42:54 AM > Subject: [ovs-dev] [PATCH] ovs-vsctl-bashcomp: Make compatible with busybox > "awk". > > It seems that awk in busybox doesn't think that an empty string is part of > a larger string, but that GNU awk does. This commit adds an extra test to > make _ovs_vsctl_check_startswith_string work either way. > > This allows the following tests to pass with busybox awk: > > vsctl bashcomp unit tests > > 7: vsctl-bashcomp - basic verification ok > 8: vsctl-bashcomp - argument completionok > > Reported-by: Stuart Cardall > Signed-off-by: Ben Pfaff > --- Makes sense, verified that these test cases now pass in the Alpine environment with busybox awk. Acked-by: Lance Richardson ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for EMC probability.
My reply inline. Acked-by: Antonio Fischetti> -Original Message- > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > Sent: Friday, August 4, 2017 12:38 PM > To: Fischetti, Antonio ; ovs-dev@openvswitch.org > Cc: Heetae Ahn ; Loftus, Ciara > > Subject: Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values > for > EMC probability. > > Hi Antonio, > > Thanks for review and testing. Comments inline. > > Best regards, Ilya Maximets. > > On 03.08.2017 18:37, Fischetti, Antonio wrote: > > LGTM, > > just wondering if a further comment would help, > > eg something like > > > > +/* random_uint32() returns a value in the [0; UINT32_MAX] range. > > + For our checking to be correct we would need instead a random value > > + in the range [0; UINT32_MAX - 1]. To avoid further computation > > + we use a decreased range of available values for 'emc-insert-inv-prob' > > + ie [0; 2**20 - 1]. */ > > +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20 > > +#define EM_FLOW_INSERT_INV_PROB_MAX (1 << EM_FLOW_INSERT_INV_PROB_SHIFT) > > +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1) > > > > ? > > > > -Antonio > > I don't think that such a comment will be useful for the reader. > This is more like explanation why we made this change and it should > be in commit message (which already has this information). > > In addition, the next patch adds build time assert which will forbid > using of EM_FLOW_INSERT_INV_PROB_SHIFT higher than 30. > > One thing we may clarify here is why 20 was chosen. > Something like this: > > /* Set up maximum inverse EMC insertion probability to 2^20 - 1. > * Higher values considered useless in practice. */ > > What do you think? [Antonio] yes, sounds good. Thanks. > > > > >> -Original Message- > >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] > >> On Behalf Of Ilya Maximets > >> Sent: Monday, July 31, 2017 3:41 PM > >> To: ovs-dev@openvswitch.org > >> Cc: Heetae Ahn ; Ilya Maximets > >> > >> Subject: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for > EMC > >> probability. > >> > >> Currently, real insertion probability is higher than configured > >> for the maximum case because of wrong usage of the random value. > >> > >> i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min' > >> equals to 1. In this case we're allowing insert if random vailue > >> is less or equal to 1. So, two of 2**32 values (0 and 1) are > >> allowed and real probability is 2 times higher than configured. > >> > >> This happens because 'random_uint32()' returns value in range > >> [0; UINT32_MAX], but for the checking to be correct we should > >> generate random value in range [0; UINT32_MAX - 1]. > >> > >> To fix this we have 4 possible solutions: > >> > >> 1. need to use uint64_t for 'emc-insert-min' and calculate it > >> as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full > >> range [0; UINT32_MAX]. > >> > >> This may decrease performance becaue of 64 bit atomic ops. > >> > >> 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min' > >> because it's the only value we have issues with. > >> > >> This will require additional explanations and not very friendly > >> for users. > >> > >> 3. Generate random value in range [0; UINT32_MAX - 1]. > >> > >> This will require heavy division operation. > >> > >> 4. Decrease the range of available values for 'emc-insert-inv-prob'. > >> > >> Actually, we don't need to have so much different values for > >> that option. I beleve that values higher than 1M are completely > >> useless. Choosing the upper limit as a power of 2 like 2**20 we > >> will be able to mask the generated random value in a fast way > >> and also avoid range issue, because same uint32_t can be used to > >> store 2**20. > >> > >> This patch implements solution #4. > >> > >> CC: Ciara Loftus > >> Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert") > >> Signed-off-by: Ilya Maximets > >> --- > >> > >> Infrastructure and logic introduced here will be used for fixing > >> emc replacement policy. > >> > >> lib/dpif-netdev.c| 12 > >> vswitchd/vswitch.xml | 3 ++- > >> 2 files changed, 10 insertions(+), 5 deletions(-) > >> > >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > >> index 47a9fa0..123a7c9 100644 > >> --- a/lib/dpif-netdev.c > >> +++ b/lib/dpif-netdev.c > >> @@ -152,9 +152,12 @@ struct netdev_flow_key { > >> #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) > >> #define EM_FLOW_HASH_SEGS 2 > >> > >> +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20 > >> +#define EM_FLOW_INSERT_INV_PROB_MAX (1 << EM_FLOW_INSERT_INV_PROB_SHIFT) > >> +#define EM_FLOW_INSERT_INV_PROB_MASK
Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.
Hi Bhanuprakash, Thanks for working on this. I have a general concern about implementation of this functionality: *What is the profit from using rte_keepalive library ?* Pros: * No need to implement 'rte_keepalive_dispatch_pings()' (40 LOC) and struct rte_keepalive (30 LOC, can be significantly decreased by removing not needed elements) ---> ~70 LOC. Cons: * DPDK dependency: * Implementation of PMD threads management (KA) inside netdev code (netdev-dpdk) looks very strange. * Many DPDK references in generic code (like dpdk_is_enabled). * Feature isn't available for the common threads (main?) wihtout DPDK. * Many stubs and placeholders for cases without dpdk. * No ability for unit testing. So, does it worth to use rte_keepalive? To make functionality fully generic we only need to implement 'rte_keepalive_dispatch_pings()' and few helpers. As soon as this function does nothing dpdk-specific it's a really simple task which will allow to greatly clean up the code. The feature is too big to use external library for 70 LOCs of really simple code. (Clean up should save much more). Am I missed something? Any thoughts? Best regards, Ilya Maximets. > Keepalive feature is aimed at achieving Fastpath Service Assurance > in OVS-DPDK deployments. It adds support for monitoring the packet > processing cores(PMD thread cores) by dispatching heartbeats at regular > intervals. Incase of heartbeat misses additional health checks are > enabled on the PMD thread to detect the failure and the same shall be > reported to higher level fault management systems/frameworks. > > The implementation uses OVSDB for reporting the health of the PMD threads. > Any external monitoring application can read the status from OVSDB at > regular intervals (or) subscribe to the updates in OVSDB so that they get > notified when the changes happen on OVSDB. > > keepalive info struct is created and initialized for storing the > status of the PMD threads. This is initialized by main thread(vswitchd) > as part of init process and will be periodically updated by 'keepalive' > thread. keepalive feature can be enabled through below OVSDB settings. > > enable-keepalive=true > - Keepalive feature is disabled by default. > > keepalive-interval="5000" > - Timer interval in milliseconds for monitoring the packet > processing cores. > > When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes > up at regular intervals to update the timestamp and status of pmd cores > in keepalive info struct. This information shall be read by vswitchd thread > and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB. > > An external monitoring framework like collectd with ovs events support > can read (or) subscribe to the datapath status changes in ovsdb. When the > state > is updated, the collectd shall be notified and will eventually relay the > status > to ceilometer service running in the controller. Below is the high level > overview of deployment model. > > Compute NodeControllerCompute Node > > Collectd <--> Ceilometer <> Collectd > > OvS DPDK OvS DPDK > > +-+ > | VM | > +--+--+ > \---+---/ > | > +--+---+ ++--+ +--+---+ > | OVS |-> | ovsevents plugin| --> | collectd | > +--+---+ ++--+ +--+---+ > > +--+-+ +---++ | > | Ceilometer | <-- | collectd ceilometer plugin | <--- > +--+-+ +---++ > > github: The patches can be found here: > https://github.com/bbodired/ovs (Last master commit e7cd8c363) > > Performance impact: > No noticeable performance or latency impact is observed with > KA feature enabled. > > - > v2 -> v3 > * Rebase. > * Verified with dpdk-stable-17.05.1 release. > * Fixed build issues with MSVC and cross checked with appveyor. > > v1 -> v2 > * Rebase > * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it > is already applied as separate patch. > > RFCv3 -> v1 > * Made changes to fix failures in some unit test cases. > * some more code cleanup w.r.t process related APIs. > > RFCv2 -> RFCv3 > * Remove POSIX shared memory block implementation (suggested by Aaron). > * Rework the logic to register and track threads instead of cores. This way > in the future any thread can be registered to KA framework. For now only > PMD > threads are tracked (suggested by Aaron). > * Refactor few APIs and further clean up the code. > > RFCv1 -> RFCv2 > * Merged the xml and schema commits to later commit where the actual > implementation is done(suggested by Ben). > * Fix ovs-appctl keepalive/* hang issue
Re: [ovs-dev] [PATCH] OF1.5/EXT-334 OXS/Extensible Flow Entry Statistics Support
Hi Ben, Much Thanks for the Review. We will address the comments and will submit the updated patch soon. Thanks & Regards Satya Valli Tata Consultancy Services Mailto: satyavalli.r...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Consulting From: Ben PfaffTo: SatyaValli Cc: d...@openvswitch.org, Surya Muttamsetty , SatyaValli Date: 08/03/2017 01:41 AM Subject:Re: [ovs-dev] [PATCH] OF1.5/EXT-334 OXS/Extensible Flow Entry Statistics Support Thanks for the revision. I have some more comments. On Mon, Jun 19, 2017 at 01:14:40PM +0530, SatyaValli wrote: > commit 49a3592b2878a33033e5fd2e6e5ce82ebccef780 > Author: Satya Valli > Co-authored-by: Lavanya Harivelam > Co-authored-by: Surya Muttamsetty None of the above should be at the top of the commit message. Coauthors go with the sign-offs at the end. > OpenVswitch: OF1.5/EXT-334 Extensible Flow Entry Statistics Please don't repeat the subject again in the body. > OpenFlow 1.5 introduces the Extensible Statistics (OXS) by redefining the existing > flow entry statistics with OXS Fields. > This Patch provides implementation for OXS fields encoding in TLV format. > > To support this implementation below two messages are newly added > > OFPST_OXS_FLOW_STATS_REQUEST > OFPST_OXS_FLOW_STATS_REPLY > OFPST_OXS_AGGREGATE_STATS_REQUEST > OFPST_OXS_AGGREGATE_STATS_REPLY > OFPST_FLOW_REMOVED > As per the openflow specification-1.5, this enhancement should take place > on the existing flow entry statistics with the OXS fields on all the messages > that carries flow entry statistics. > > The current commit adds support for the new feature in flow statistics multipart messages, > aggregate multipart messages and OXS flow statistics support for flow removal message. > > Some more fields are added to ovs-ofctl dump-flows command to support OpenFlow15 OXS stats. > Below are Commands to display OXS stats field wise > > Flow Statistics Multipart > ovs-ofctl dump-flows -O OpenFlow15 oxs-duration > ovs-ofctl dump-flows -O OpenFlow15 oxs-idle_time > ovs-ofctl dump-flows -O OpenFlow15 oxs-packet_count > ovs-ofctl dump-flows -O OpenFlow15 oxs-byte_count > > Aggregate Flow Statistics Multipart > ovs-ofctl dump-aggregate -O OpenFlow15 oxs-packet_count > ovs-ofctl dump-aggregate -O OpenFlow15 oxs-byte_count > ovs-ofctl dump-aggregate -O OpenFlow15 oxs-flow_count > > Make Check Changes in - ofproto.at and ofproto-dpif.at: > For 1.5 version "flags" variable has been removed from flow_stats_reply structure. > For avoiding the make check failures for OF1.5 version add-flow with flags, > below test cases has been modified for 1.5 version in respective .at files > > Files testcase > ofproto.at 0884 > ofproto-dpif1126 Please don't refer to tests by number. The numbers are not meaningful and change often. Use the name of the test. > In ofproto.at there is one test case numbered 0884 and 1126 in ofproto-dpif.at > which is validating flagsfor OF1.5 this testcase is not valid, because > in stats reply flags are not explicitly communicated,I've removed this test case. > Before removing this test case is failing for 1.5 version. > > Since FLOW_REMOVED is not supported in OF1.5 version, > testing has been done by adding test case in ofproto.at. > While doing make check after adding test case we are able to see > OFPT_FLOW_REMOVED (OF1.5) message has sent successfully. > > But make check is failing because of Signal 15 termination, > so we've removed that test case from the patch. Why isn't FLOW_REMOVED supported in OF1.5? Support is needed. Signal 15 is SIGTERM, meaning that something actually used "kill" to kill it intentionally. Please investigate and fix the problem rather than removing a test. This adds features that users can use via ovs-ofctl, but it does not document them. Please document them in ovs-ofctl(8). Please also add appropriate NEWS items to explain the new features. This adds support for new OpenFlow messages, but it does not add any tests for printing them in ofp-print.at. Please add at least one representative test there for every new message. struct ofp15_flow_removed doesn't seem to be the same as ofp_flow_removed in OpenFlow 1.5. This adds a global variable oxs_field_set. This is inappropriate. Do not use global variables. The variable oxs_field_set has a bunch of unnamed magic numbers. This is inappropriate. Do not use unnamed magic numbers. This code is weird. Fix it please: +if (parse_oxs_field(name, )) { +if (f->fl_type == OFPXST_OFB_DURATION) { +oxs_field_set |= (1 <<
[ovs-dev] [PATCH v3 12/19] keepalive: Add support to query keepalive status.
This commit adds support to query if keepalive status is enabled/disabled. $ ovs-appctl keepalive/status keepAlive Status: Enabled Signed-off-by: Bhanuprakash Bodireddy--- lib/keepalive.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/lib/keepalive.c b/lib/keepalive.c index a2f0314..43f8f11 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -429,6 +429,19 @@ out: ds_destroy(); } +static void +ka_unixctl_status(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED) +{ +struct ds ds = DS_EMPTY_INITIALIZER; + +ds_put_format(, "keepAlive Status: %s", + ka_is_enabled() ? "Enabled" : "Disabled"); + +unixctl_command_reply(conn, ds_cstr()); +ds_destroy(); +} + static int ka_init__(void) { @@ -473,6 +486,8 @@ ka_init(const struct smap *ovs_other_config) unixctl_command_register("keepalive/pmd-health-show", "", 0, 0, ka_unixctl_pmd_health_show, ka_info); +unixctl_command_register("keepalive/status", "", 0, 0, + ka_unixctl_status, NULL); ovsthread_once_done(_enable); } -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 11/19] keepalive: Add support to query keepalive statistics.
This commit adds support to query keepalive statistics. Datapath health status can be retrieved as follows: $ ovs-appctl keepalive/pmd-health-show Keepalive status keepalive status : Enabled keepalive interval: 1000 ms PMD threads : 8 PMDCORESTATE LAST SEEN TIMESTAMP pmd620 ALIVE 8632183482028293 pmd631 ALIVE 8632183482028425 pmd642 ALIVE 8632190191004294 pmd653 ALIVE 8632183482028525 pmd664 GONE8612183482028117 pmd675 ALIVE 8632190191004984 pmd686 ALIVE 8632190191005713 pmd697 ALIVE 8632190191006555 Signed-off-by: Bhanuprakash Bodireddy--- lib/keepalive.c | 78 + 1 file changed, 78 insertions(+) diff --git a/lib/keepalive.c b/lib/keepalive.c index 83ce26d..a2f0314 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -23,9 +23,11 @@ #include "dpdk.h" #include "keepalive.h" #include "lib/vswitch-idl.h" +#include "openvswitch/dynamic-string.h" #include "openvswitch/vlog.h" #include "ovs-thread.h" #include "process.h" +#include "unixctl.h" VLOG_DEFINE_THIS_MODULE(keepalive); @@ -354,6 +356,79 @@ ka_stats_run(void) return ka_stats; } +static void +ka_unixctl_pmd_health_show(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *ka_info_) +{ +struct ds ds = DS_EMPTY_INITIALIZER; +ds_put_format(, + "\n\t\tKeepalive status\n\n"); + +ds_put_format(, "keepalive status : %s\n", + ka_is_enabled() ? "Enabled" : "Disabled"); + +if (!ka_is_enabled()) { +goto out; +} + +ds_put_format(, "keepalive interval: %"PRIu32" ms\n", + get_ka_interval()); + +struct keepalive_info *ka_info = (struct keepalive_info *)ka_info_; +if (OVS_UNLIKELY(!ka_info)) { +goto out; +} + +ds_put_format(, "PMD threads : %"PRIu32" \n", ka_info->pmd_cnt); +ds_put_format(, + "\n PMD\tCORE\tSTATE\tLAST SEEN TIMESTAMP\n"); + +struct ka_process_info *pinfo, *pinfo_next; + +ovs_mutex_lock(_info->proclist_mutex); +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, _info->process_list) { +char *state = NULL; +if (pinfo->core_state == KA_STATE_UNUSED || + pinfo->core_state == KA_STATE_SLEEP) +continue; + +switch (pinfo->core_state) { +case KA_STATE_ALIVE: +state = "ALIVE"; +break; +case KA_STATE_MISSING: +state = "MISSING"; +break; +case KA_STATE_DEAD: +state = "DEAD"; +break; +case KA_STATE_GONE: +state = "GONE"; +break; +case KA_STATE_DOZING: +state = "DOZING"; +break; +case KA_STATE_SLEEP: +state = "SLEEP"; +break; +case KA_STATE_CHECK: +state = "HEALTH_CHECK_RUNNING"; +break; +case KA_STATE_UNUSED: +break; +} +ds_put_format(, "%s\t%2d\t%s\t%"PRIu64"\n", + pinfo->name, pinfo->core_id, state, + pinfo->core_last_seen_times); +} +ovs_mutex_unlock(_info->proclist_mutex); + +ds_put_format(, "\n"); +out: +unixctl_command_reply(conn, ds_cstr()); +ds_destroy(); +} + static int ka_init__(void) { @@ -396,6 +471,9 @@ ka_init(const struct smap *ovs_other_config) } } +unixctl_command_register("keepalive/pmd-health-show", "", 0, 0, + ka_unixctl_pmd_health_show, ka_info); + ovsthread_once_done(_enable); } } -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 10/19] bridge: Update keepalive status in OVSDB
This commit allows vswitchd thread to update the OVSDB with the status of all registered PMD threads. The status can be monitored using ovsdb-client and the sample output is below. $ ovsdb-client monitor Open_vSwitch Open_vSwitch keepalive rowaction keepalive 7b746190-ee71-4dcc-becf-f8cb9c7cb909 old { "pmd62"="ALIVE,0,9226457935188922" "pmd63"="ALIVE,1,9226457935189628" "pmd64"="ALIVE,2,9226457935189897" "pmd65"="ALIVE,3,9226457935190127"} new { "pmd62"="ALIVE,0,9226460230167364" "pmd63"="ALIVE,1,9226460230168100" "pmd64"="ALIVE,2,9226460230168905" "pmd65"="ALIVE,3,9226460230169632"} Signed-off-by: Bhanuprakash Bodireddy--- lib/keepalive.c | 15 +++ lib/keepalive.h | 1 + vswitchd/bridge.c | 26 ++ 3 files changed, 42 insertions(+) diff --git a/lib/keepalive.c b/lib/keepalive.c index 509a0d1..83ce26d 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -339,6 +339,21 @@ get_ka_stats(void) ovs_mutex_unlock(); } +struct smap * +ka_stats_run(void) +{ +struct smap *ka_stats = NULL; + +ovs_mutex_lock(); +if (keepalive_stats) { +ka_stats = keepalive_stats; +keepalive_stats = NULL; +} +ovs_mutex_unlock(); + +return ka_stats; +} + static int ka_init__(void) { diff --git a/lib/keepalive.h b/lib/keepalive.h index b18f6e0..cedc390 100644 --- a/lib/keepalive.h +++ b/lib/keepalive.h @@ -101,6 +101,7 @@ int get_ka_init_status(void); int ka_alloc_portstats(unsigned, int); void ka_destroy_portstats(void); void get_ka_stats(void); +struct smap *ka_stats_run(void); void dispatch_heartbeats(void); #endif /* keepalive.h */ diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index 8ff91df..56c55f5 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -286,6 +286,7 @@ static bool port_is_synthetic(const struct port *); static void reconfigure_system_stats(const struct ovsrec_open_vswitch *); static void run_system_stats(void); +static void run_keepalive_stats(void); static void bridge_configure_mirrors(struct bridge *); static struct mirror *mirror_create(struct bridge *, @@ -403,6 +404,7 @@ bridge_init(const char *remote) ovsdb_idl_omit_alert(idl, _open_vswitch_col_cur_cfg); ovsdb_idl_omit_alert(idl, _open_vswitch_col_statistics); +ovsdb_idl_omit_alert(idl, _open_vswitch_col_keepalive); ovsdb_idl_omit_alert(idl, _open_vswitch_col_datapath_types); ovsdb_idl_omit_alert(idl, _open_vswitch_col_iface_types); ovsdb_idl_omit(idl, _open_vswitch_col_external_ids); @@ -2686,6 +2688,29 @@ run_system_stats(void) } } +void +run_keepalive_stats(void) +{ +struct smap *ka_stats; +const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(idl); + +ka_stats = ka_stats_run(); +if (ka_stats && cfg) { +struct ovsdb_idl_txn *txn; +struct ovsdb_datum datum; + +txn = ovsdb_idl_txn_create(idl); +ovsdb_datum_from_smap(, ka_stats); +smap_destroy(ka_stats); +ovsdb_idl_txn_write(>header_, _open_vswitch_col_keepalive, +); +ovsdb_idl_txn_commit(txn); +ovsdb_idl_txn_destroy(txn); + +free(ka_stats); +} +} + static const char * ofp12_controller_role_to_str(enum ofp12_controller_role role) { @@ -3038,6 +3063,7 @@ bridge_run(void) run_stats_update(); run_status_update(); run_system_stats(); +run_keepalive_stats(); } void -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 09/19] keepalive: Retrieve PMD status periodically.
This commit implements APIs to retrieve the PMD thread status and return the status in the below format for each PMD thread. Format: pmdid="status,core id,last_seen_timestamp" eg: pmd62="ALIVE,2,9220698256784207" pmd63="GONE,3,9220698256786231" The status is periodically retrieved by keepalive thread and stored in keepalive_stats struc which later shall be retrieved by vswitchd thread. In case of four PMD threads the status is as below: "pmd62"="ALIVE,0,9220698256784207" "pmd63"="ALIVE,1,9220698256784913" "pmd64"="ALIVE,2,9220698256785902" "pmd65"="ALIVE,3,9220698256786231" Signed-off-by: Bhanuprakash Bodireddy--- lib/dpif-netdev.c | 1 + lib/keepalive.c | 73 +++ lib/keepalive.h | 1 + 3 files changed, 75 insertions(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 009ee24..65db5fd 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -991,6 +991,7 @@ ovs_keepalive(void *f_) int n_pmds = cmap_count(>poll_threads) - 1; if (n_pmds > 0) { dispatch_heartbeats(); +get_ka_stats(); } xusleep(get_ka_interval() * 1000); diff --git a/lib/keepalive.c b/lib/keepalive.c index 3022789..509a0d1 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -24,6 +24,7 @@ #include "keepalive.h" #include "lib/vswitch-idl.h" #include "openvswitch/vlog.h" +#include "ovs-thread.h" #include "process.h" VLOG_DEFINE_THIS_MODULE(keepalive); @@ -33,6 +34,9 @@ static bool ka_init_status = ka_init_failure; /* Keepalive initialization */ static uint32_t keepalive_timer_interval; /* keepalive timer interval */ static struct keepalive_info *ka_info = NULL; +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER; +static struct smap *keepalive_stats OVS_GUARDED_BY(mutex); + inline bool ka_is_enabled(void) { @@ -266,6 +270,75 @@ keepalive_info_create(void) return ka_info; } +static void +get_pmd_status(struct smap *ka_pmd_stats) +OVS_REQUIRES(ka_info->proclist_mutex) +{ +if (OVS_UNLIKELY(!ka_info)) { +return; +} + +struct ka_process_info *pinfo, *pinfo_next; +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, _info->process_list) { +int core_id = pinfo->core_id; +char *state = NULL; +if (pinfo->core_state == KA_STATE_UNUSED || + pinfo->core_state == KA_STATE_SLEEP ) { +continue; +} + +switch (pinfo->core_state) { +case KA_STATE_ALIVE: +state = "ALIVE"; +break; +case KA_STATE_MISSING: +state = "MISSING"; +break; +case KA_STATE_DEAD: +state = "DEAD"; +break; +case KA_STATE_GONE: +state = "GONE"; +break; +case KA_STATE_DOZING: +state = "DOZING"; +break; +case KA_STATE_SLEEP: +state = "SLEEP"; +break; +case KA_STATE_CHECK: +state = "HEALTH_CHECK_RUNNING"; +break; +case KA_STATE_UNUSED: +break; +} + +smap_add_format(ka_pmd_stats, pinfo->name, "%s,%d,%ld", +state, core_id, pinfo->core_last_seen_times); +} +} + +void +get_ka_stats(void) +{ +struct smap *ka_pmd_stats; +ka_pmd_stats = xmalloc(sizeof *ka_pmd_stats); +smap_init(ka_pmd_stats); + +ovs_mutex_lock(_info->proclist_mutex); +get_pmd_status(ka_pmd_stats); +ovs_mutex_unlock(_info->proclist_mutex); + +ovs_mutex_lock(); +if (keepalive_stats) { +smap_destroy(keepalive_stats); +free(keepalive_stats); +keepalive_stats = NULL; +} +keepalive_stats = ka_pmd_stats; +ovs_mutex_unlock(); +} + static int ka_init__(void) { diff --git a/lib/keepalive.h b/lib/keepalive.h index 31ee34c..b18f6e0 100644 --- a/lib/keepalive.h +++ b/lib/keepalive.h @@ -100,6 +100,7 @@ uint32_t get_ka_interval(void); int get_ka_init_status(void); int ka_alloc_portstats(unsigned, int); void ka_destroy_portstats(void); +void get_ka_stats(void); void dispatch_heartbeats(void); #endif /* keepalive.h */ -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 08/19] dpif-netdev: Enable heartbeats for DPDK datapath.
This commit adds heartbeat mechanism support for DPDK datapath. Heartbeats are sent to registered PMD threads at predefined intervals (as set in ovsdb with 'keepalive-interval'). The heartbeats are only enabled when there is atleast one port added to the bridge and with active PMD thread polling the port. Signed-off-by: Bhanuprakash Bodireddy--- lib/dpdk-stub.c | 6 ++ lib/dpdk.c| 7 +++ lib/dpdk.h| 2 ++ lib/dpif-netdev.c | 9 - lib/keepalive.c | 9 + lib/keepalive.h | 1 + 6 files changed, 33 insertions(+), 1 deletion(-) diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index b4f111a..740ec11 100644 --- a/lib/dpdk-stub.c +++ b/lib/dpdk-stub.c @@ -78,3 +78,9 @@ dpdk_mark_pmd_core_sleep(void) { /* Nothing */ } + +void +dpdk_dispatch_pmd_hb(void) +{ +/* Nothing */ +} diff --git a/lib/dpdk.c b/lib/dpdk.c index 250cc2f..59ac035 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -542,3 +542,10 @@ dpdk_mark_pmd_core_sleep(void) { rte_keepalive_mark_sleep(rte_global_keepalive_info); } + +/* Dispatch pings */ +void +dpdk_dispatch_pmd_hb(void) +{ +rte_keepalive_dispatch_pings(NULL, rte_global_keepalive_info); +} diff --git a/lib/dpdk.h b/lib/dpdk.h index 7619730..2532cb7 100644 --- a/lib/dpdk.h +++ b/lib/dpdk.h @@ -47,4 +47,6 @@ void dpdk_unregister_pmd_core(unsigned core_id); void dpdk_mark_pmd_core_alive(void); void dpdk_mark_pmd_core_sleep(void); +void dpdk_dispatch_pmd_hb(void); + #endif /* dpdk.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5737223..009ee24 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -981,11 +981,18 @@ sorted_poll_thread_list(struct dp_netdev *dp, } static void * -ovs_keepalive(void *f_ OVS_UNUSED) +ovs_keepalive(void *f_) { +struct dp_netdev *dp = f_; + pthread_detach(pthread_self()); for (;;) { +int n_pmds = cmap_count(>poll_threads) - 1; +if (n_pmds > 0) { +dispatch_heartbeats(); +} + xusleep(get_ka_interval() * 1000); } diff --git a/lib/keepalive.c b/lib/keepalive.c index eeaa25a..3022789 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -248,6 +248,15 @@ ka_destroy_portstats(void) } } +/* Dispatch pings */ +void +dispatch_heartbeats(void) +{ +#ifdef DPDK_NETDEV +dpdk_dispatch_pmd_hb(); +#endif +} + static struct keepalive_info * keepalive_info_create(void) { diff --git a/lib/keepalive.h b/lib/keepalive.h index 605e41c..31ee34c 100644 --- a/lib/keepalive.h +++ b/lib/keepalive.h @@ -101,4 +101,5 @@ int get_ka_init_status(void); int ka_alloc_portstats(unsigned, int); void ka_destroy_portstats(void); +void dispatch_heartbeats(void); #endif /* keepalive.h */ -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 07/19] dpif-netdev: Register packet processing cores to KA framework.
This commit registers the packet processing PMD cores to keepalive framework. Only PMDs that have rxqs mapped will be registered and actively monitored by KA framework. This commit spawns a keepalive thread that will dispatch heartbeats to PMD cores. The pmd threads respond to heartbeats by marking themselves alive. As long as PMD responds to heartbeats it is considered 'healthy'. Signed-off-by: Bhanuprakash Bodireddy--- lib/dpif-netdev.c | 99 +++ lib/keepalive.c | 138 +++--- lib/keepalive.h | 25 +- lib/util.c| 10 lib/util.h| 1 + 5 files changed, 254 insertions(+), 19 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index b51674f..5737223 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -73,6 +73,7 @@ #include "seq.h" #include "smap.h" #include "sset.h" +#include "svec.h" #include "timeval.h" #include "tnl-neigh-cache.h" #include "tnl-ports.h" @@ -979,6 +980,94 @@ sorted_poll_thread_list(struct dp_netdev *dp, *n = k; } +static void * +ovs_keepalive(void *f_ OVS_UNUSED) +{ +pthread_detach(pthread_self()); + +for (;;) { +xusleep(get_ka_interval() * 1000); +} + +return NULL; +} + +static void +ka_thread_start(struct dp_netdev *dp) +{ +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + +if (ovsthread_once_start()) { +ovs_thread_create("ovs_keepalive", ovs_keepalive, dp); + +ovsthread_once_done(); +} +} + +static void +pmd_num_poll_ports(struct dp_netdev_pmd_thread *pmd, int *num_poll_ports) +{ +struct svec pmd_port_poll_list; +svec_init(_port_poll_list); + +struct rxq_poll *poll; +const char *port_name; +int i = 0; + +HMAP_FOR_EACH (poll, node, >poll_list) { +svec_add(_port_poll_list, netdev_rxq_get_name(poll->rxq->rx)); +} +/* With MQ enabled, remove the duplicates. */ +svec_sort_unique(_port_poll_list); +SVEC_FOR_EACH (i, port_name, _port_poll_list) { +VLOG_DBG("%d Port:%s", i, port_name); +} +svec_destroy(_port_poll_list); + +*num_poll_ports = i; +VLOG_DBG("PMD thread [%d] polling [%d] ports", + pmd->core_id, *num_poll_ports); +} + +static void +ka_register_datapath_threads(struct dp_netdev *dp) +{ +int ka_init = get_ka_init_status(); +VLOG_DBG("Keepalive: Was initialization successful? [%s]", +ka_init ? "Success" : "Failure"); +if (!ka_init) { +return; +} + +ka_thread_start(dp); + +struct dp_netdev_pmd_thread *pmd; +CMAP_FOR_EACH (pmd, node, >poll_threads) { +/* Skip PMD thread with no rxqs mapping. */ +if (OVS_UNLIKELY(!hmap_count(>poll_list))) { +continue; +} + +/* Register only PMD threads. */ +if (pmd->core_id != NON_PMD_CORE_ID) { +int err; +int nports; +pmd_num_poll_ports(pmd, ); +err = ka_alloc_portstats(pmd->core_id, nports); +if (err) { +VLOG_FATAL("Unable to allocate memory for PMD core %d", +pmd->core_id); +return; +} + +int tid = ka_get_pmd_tid(pmd->core_id); +ka_register_thread(tid, true); +VLOG_DBG("Registered PMD thread [%d] on Core [%d] to KA framework", + tid, pmd->core_id); +} +} +} + static void dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], void *aux) @@ -3626,6 +3715,9 @@ reconfigure_datapath(struct dp_netdev *dp) /* Reload affected pmd threads. */ reload_affected_pmds(dp); + +/* Register datapath threads to KA monitoring. */ +ka_register_datapath_threads(dp); } /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */ @@ -3862,6 +3954,9 @@ reload: : PMD_CYCLES_IDLE); } +/* Mark PMD thread alive. */ +ka_mark_pmd_thread_alive(); + if (lc++ > 1024) { bool reload; @@ -3895,6 +3990,10 @@ reload: } emc_cache_uninit(>flow_cache); + +int tid = ka_get_pmd_tid(pmd->core_id); +ka_unregister_thread(tid, true); + free(poll_list); pmd_free_cached_ports(pmd); return NULL; diff --git a/lib/keepalive.c b/lib/keepalive.c index 0087e5c..eeaa25a 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -24,6 +24,7 @@ #include "keepalive.h" #include "lib/vswitch-idl.h" #include "openvswitch/vlog.h" +#include "process.h" VLOG_DEFINE_THIS_MODULE(keepalive); @@ -77,21 +78,85 @@ ka_store_pmd_id(unsigned core_idx) /* Register thread to KA framework. */ void -ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED, - unsigned core_id) +ka_register_thread(int tid, bool
[ovs-dev] [PATCH v3 06/19] dpif-netdev: Add helper function to store datapath tids.
This commit adds an API to store the PMD thread ids in to KA info struct. The thread ids shall be used to check false positives and for status and statistics reporting. Signed-off-by: Bhanuprakash Bodireddy--- lib/dpif-netdev.c | 3 +++ lib/keepalive.c | 13 + lib/keepalive.h | 1 + 3 files changed, 17 insertions(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8ecc9d1..b51674f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -49,6 +49,7 @@ #include "flow.h" #include "hmapx.h" #include "id-pool.h" +#include "keepalive.h" #include "latch.h" #include "netdev.h" #include "netdev-vport.h" @@ -3824,6 +3825,8 @@ pmd_thread_main(void *f_) poll_list = NULL; +ka_store_pmd_id(pmd->core_id); + /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */ ovsthread_setspecific(pmd->dp->per_pmd_key, pmd); ovs_numa_thread_setaffinity_core(pmd->core_id); diff --git a/lib/keepalive.c b/lib/keepalive.c index 66c969d..0087e5c 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -62,6 +62,19 @@ get_ka_init_status(void) return ka_init_status; } +void +ka_store_pmd_id(unsigned core_idx) +{ +int tid = -1; +#ifdef DPDK_NETDEV +tid = rte_sys_gettid(); +#endif + +if (ka_is_enabled()) { +ka_info->thread_id[core_idx] = tid; +} +} + /* Register thread to KA framework. */ void ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED, diff --git a/lib/keepalive.h b/lib/keepalive.h index c96f2d0..f74b23a 100644 --- a/lib/keepalive.h +++ b/lib/keepalive.h @@ -76,6 +76,7 @@ void ka_unregister_pmd_thread(int, bool, unsigned); void ka_mark_pmd_thread_alive(void); void ka_mark_pmd_thread_sleep(void); +void ka_store_pmd_id(unsigned core); uint32_t get_ka_interval(void); int get_ka_init_status(void); -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 05/19] keepalive: Add more helper functions to KA framework.
This commit introduces helper functions in 'keepalive' module that are needed to register/unregister PMD threads to KA framework. Also introduce APIs to mark the PMD core states. Signed-off-by: Bhanuprakash Bodireddy--- lib/keepalive.c | 51 +++ lib/keepalive.h | 8 2 files changed, 59 insertions(+) diff --git a/lib/keepalive.c b/lib/keepalive.c index e93dc99..66c969d 100644 --- a/lib/keepalive.c +++ b/lib/keepalive.c @@ -49,6 +49,57 @@ ka_get_pmd_tid(unsigned core_idx) return tid; } +/* Return the Keepalive timer interval. */ +inline uint32_t +get_ka_interval(void) +{ +return keepalive_timer_interval; +} + +int +get_ka_init_status(void) +{ +return ka_init_status; +} + +/* Register thread to KA framework. */ +void +ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED, + unsigned core_id) +{ +if (ka_is_enabled()) { +dpdk_register_pmd_core(core_id); +} +} + +/* Unregister thread from KA framework. */ +void +ka_unregister_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED, + unsigned core_id) +{ +if (ka_is_enabled()) { +dpdk_unregister_pmd_core(core_id); +} +} + +/* Mark packet processing core alive. */ +void +ka_mark_pmd_thread_alive(void) +{ +if (ka_is_enabled()) { +dpdk_mark_pmd_core_alive(); +} +} + +/* Mark packet processing core as idle. */ +inline void +ka_mark_pmd_thread_sleep(void) +{ +if (ka_is_enabled()) { +dpdk_mark_pmd_core_sleep(); +} +} + void ka_set_pmd_state_ts(unsigned core_id, enum keepalive_state state, uint64_t last_alive) diff --git a/lib/keepalive.h b/lib/keepalive.h index b87b66f..c96f2d0 100644 --- a/lib/keepalive.h +++ b/lib/keepalive.h @@ -71,4 +71,12 @@ void ka_set_pmd_state_ts(unsigned, enum keepalive_state, uint64_t); int ka_get_pmd_tid(unsigned core); bool ka_is_enabled(void); +void ka_register_pmd_thread(int, bool, unsigned); +void ka_unregister_pmd_thread(int, bool, unsigned); +void ka_mark_pmd_thread_alive(void); +void ka_mark_pmd_thread_sleep(void); + +uint32_t get_ka_interval(void); +int get_ka_init_status(void); + #endif /* keepalive.h */ -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 04/19] Keepalive: Add initial keepalive support.
This commit introduces the initial keepalive support by adding 'keepalive' module and also helper and initialization functions that will be invoked by later commits. This commit adds new ovsdb column "keepalive" that shows the status of the datapath threads. This is implemented for DPDK datapath and only status of PMD threads is reported. For eg: To enable keepalive feature. 'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true' To set timer interval of 5000ms for monitoring packet processing cores. 'ovs-vsctl --no-wait set Open_vSwitch . \ other_config:keepalive-interval="5000" Signed-off-by: Bhanuprakash Bodireddy--- lib/automake.mk| 2 + lib/dpdk-stub.c| 6 ++ lib/dpdk.c | 30 +++-- lib/dpdk.h | 4 ++ lib/keepalive.c| 157 + lib/keepalive.h| 74 + lib/netdev-dpdk.c | 61 +- lib/netdev-dpdk.h | 5 ++ vswitchd/bridge.c | 8 +++ vswitchd/vswitch.ovsschema | 8 ++- vswitchd/vswitch.xml | 49 ++ 11 files changed, 397 insertions(+), 7 deletions(-) create mode 100644 lib/keepalive.c create mode 100644 lib/keepalive.h diff --git a/lib/automake.mk b/lib/automake.mk index 2415f4c..0d99f0a 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/json.c \ lib/jsonrpc.c \ lib/jsonrpc.h \ + lib/keepalive.c \ + lib/keepalive.h \ lib/lacp.c \ lib/lacp.h \ lib/latch.h \ diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index d7fb19b..b4f111a 100644 --- a/lib/dpdk-stub.c +++ b/lib/dpdk-stub.c @@ -49,6 +49,12 @@ dpdk_get_vhost_sock_dir(void) return NULL; } +bool +dpdk_is_enabled(void) +{ +return false; +} + void dpdk_register_pmd_core(unsigned core_id OVS_UNUSED) { diff --git a/lib/dpdk.c b/lib/dpdk.c index 8db63bf..250cc2f 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -32,6 +32,7 @@ #include "dirs.h" #include "fatal-signal.h" +#include "keepalive.h" #include "netdev-dpdk.h" #include "openvswitch/dynamic-string.h" #include "openvswitch/vlog.h" @@ -42,6 +43,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk); static FILE *log_stream = NULL; /* Stream for DPDK log redirection */ static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ +static bool dpdk_enabled = false; /* DPDK status. */ static int process_vhost_flags(char *flag, const char *default_val, int size, @@ -303,6 +305,12 @@ static cookie_io_functions_t dpdk_log_func = { .write = dpdk_log_write, }; +bool +dpdk_is_enabled(void) +{ +return dpdk_enabled; +} + static void dpdk_init__(const struct smap *ovs_other_config) { @@ -456,9 +464,7 @@ dpdk_init__(const struct smap *ovs_other_config) void dpdk_init(const struct smap *ovs_other_config) { -static bool enabled = false; - -if (enabled || !ovs_other_config) { +if (dpdk_enabled || !ovs_other_config) { return; } @@ -468,7 +474,7 @@ dpdk_init(const struct smap *ovs_other_config) if (ovsthread_once_start(_enable)) { VLOG_INFO("DPDK Enabled - initializing..."); dpdk_init__(ovs_other_config); -enabled = true; +dpdk_enabled = true; VLOG_INFO("DPDK Enabled - initialized"); ovsthread_once_done(_enable); } @@ -477,6 +483,22 @@ dpdk_init(const struct smap *ovs_other_config) } } +int +dpdk_ka_init(struct keepalive_info *ka_info) +{ +/* Initialize keepalive subsystem */ +if ((rte_global_keepalive_info = +rte_keepalive_create(_failcore_cb, ka_info)) == NULL) { +VLOG_ERR("Keepalive initialization failed."); +return -1; +} else { +rte_keepalive_register_relay_callback(rte_global_keepalive_info, +dpdk_ka_update_core_state, ka_info); +} + +return 0; +} + const char * dpdk_get_vhost_sock_dir(void) { diff --git a/lib/dpdk.h b/lib/dpdk.h index 3f31211..7619730 100644 --- a/lib/dpdk.h +++ b/lib/dpdk.h @@ -17,6 +17,7 @@ #ifndef DPDK_H #define DPDK_H +#include "stdbool.h" #ifdef DPDK_NETDEV #include @@ -31,9 +32,12 @@ #endif /* DPDK_NETDEV */ struct smap; +struct keepalive_info; struct rte_keepalive *rte_global_keepalive_info; +bool dpdk_is_enabled(void); void dpdk_init(const struct smap *ovs_other_config); +int dpdk_ka_init(struct keepalive_info *ka_info); void dpdk_set_lcore_id(unsigned cpu); const char *dpdk_get_vhost_sock_dir(void); diff --git a/lib/keepalive.c b/lib/keepalive.c new file mode 100644 index 000..e93dc99 --- /dev/null +++ b/lib/keepalive.c @@ -0,0 +1,157 @@ +/* + * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with
[ovs-dev] [PATCH v3 03/19] process: Add helper function to retrieve process status.
Implement helper function to retrieve the process status. This commit also enables the fields relating to process name, state and core the process was last scheduled. The APIs will be used by keepalive monitoring framework in future commits. Signed-off-by: Bhanuprakash Bodireddy--- lib/process.c | 56 ++-- lib/process.h | 13 + 2 files changed, 63 insertions(+), 6 deletions(-) diff --git a/lib/process.c b/lib/process.c index 0b8d195..cbf83db 100644 --- a/lib/process.c +++ b/lib/process.c @@ -64,7 +64,23 @@ struct raw_process_info { long long int uptime; /* ms since started. */ long long int cputime; /* ms of CPU used during 'uptime'. */ pid_t ppid; /* Parent. */ -char name[18]; /* Name (surrounded by parentheses). */ +char state; /* Process state. */ +int core_id;/* Core id last executed on. */ +char name[18]; /* Name. */ +}; + +struct pstate2Num { +char pidstate; +int num; +}; + +const struct pstate2Num pstate_map[] = { +{ 'S', STOPPED_STATE }, +{ 'R', ACTIVE_STATE }, +{ 't', TRACED_STATE }, +{ 'Z', DEFUNC_STATE }, +{ 'D', UNINTERRUPTIBLE_SLEEP_STATE }, +{ '\0', UNUSED_STATE }, }; /* Pipe used to signal child termination. */ @@ -421,8 +437,8 @@ get_raw_process_info(pid_t pid, struct raw_process_info *raw) n = fscanf(stream, "%*d " /* (1. pid) */ - "%17s " /* 2. process name */ - "%*c " /* (3. state) */ + "(%17[^)]) " /* 2. process name */ + "%c "/* 3. state */ "%lu " /* 4. ppid */ "%*d " /* (5. pgid) */ "%*d " /* (6. sid) */ @@ -458,7 +474,7 @@ get_raw_process_info(pid_t pid, struct raw_process_info *raw) "%*u " /* (36. always 0) */ "%*u " /* (37. always 0) */ "%*d " /* (38. exit_signal) */ - "%*d " /* (39. task_cpu) */ + "%d "/* 39. task_cpu */ #if 0 /* These are here for documentation but #if'd out to save * actually parsing them from the stream for no benefit. */ @@ -468,9 +484,10 @@ get_raw_process_info(pid_t pid, struct raw_process_info *raw) "%*lu " /* (43. gtime) */ "%*ld" /* (44. cgtime) */ #endif - , raw->name, , , , _time, , ); + , raw->name, >state, , , , _time, + , , >core_id); fclose(stream); -if (n != 7) { +if (n != 9) { VLOG_ERR_ONCE("%s: fscanf failed", file_name); return false; } @@ -496,12 +513,15 @@ get_process_info(pid_t pid, struct process_info *pinfo) return false; } +ovs_strlcpy(pinfo->name, child.name, sizeof pinfo->name); +pinfo->state = child.state; pinfo->vsz = child.vsz; pinfo->rss = child.rss; pinfo->booted = child.uptime; pinfo->crashes = 0; pinfo->uptime = child.uptime; pinfo->cputime = child.cputime; +pinfo->core_id = child.core_id; if (child.ppid) { struct raw_process_info parent; @@ -579,6 +599,30 @@ process_run(void) #endif } +bool +process_is_active(int pid) +{ +struct process_info pinfo; +int pstate = UNUSED_STATE; + +ovs_assert(LINUX); + +int success = get_process_info(pid, ); +if (success) { +for (int i = 0; pstate_map[i].pidstate != '\0'; i++) { +if (pstate_map[i].pidstate == pinfo.state) { +VLOG_DBG_ONCE("The state is %c", pstate_map[i].pidstate); +pstate = pstate_map[i].num; +break; +} +} + +if (pstate == ACTIVE_STATE) { +return true; +} +} +return false; +} /* Causes the next call to poll_block() to wake up when process 'p' has * exited. */ diff --git a/lib/process.h b/lib/process.h index 999ac68..69ab77f 100644 --- a/lib/process.h +++ b/lib/process.h @@ -20,6 +20,15 @@ #include #include +enum process_states { +UNUSED_STATE, +STOPPED_STATE, +ACTIVE_STATE, +TRACED_STATE, +DEFUNC_STATE, +UNINTERRUPTIBLE_SLEEP_STATE +}; + struct process; struct process_info { @@ -29,6 +38,9 @@ struct process_info { int crashes;/* # of crashes (usually 0). */ long long int uptime; /* ms since last (re)started by monitor. */ long long int cputime; /* ms of CPU used during 'uptime'. */ +char state; +int core_id; +char name[18]; }; /* Starting and monitoring subprocesses. @@ -47,6 +59,7 @@ bool process_exited(struct process *); int process_status(const struct process *); void process_run(void); void process_wait(struct process
[ovs-dev] [PATCH v3 02/19] process: Avoid warnings compiling process.c
This commit fixes the following "sparse" warning: lib/process.c:439:16: error: use of assignment suppression and length modifier together in gnu_scanf format [-Werror=format=]. This fix doesn't need any other changes as the fields aren't used for now. Signed-off-by: Bhanuprakash Bodireddy--- lib/process.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/lib/process.c b/lib/process.c index 3e119b5..0b8d195 100644 --- a/lib/process.c +++ b/lib/process.c @@ -444,24 +444,24 @@ get_raw_process_info(pid_t pid, struct raw_process_info *raw) "%llu " /* 22. start_time */ "%llu " /* 23. vsize */ "%llu " /* 24. rss */ + "%*u " /* (25. rsslim) */ + "%*u " /* (26. start_code) */ + "%*u " /* (27. end_code) */ + "%*u " /* (28. start_stack) */ + "%*u " /* (29. esp) */ + "%*u " /* (30. eip) */ + "%*u " /* (31. pending signals) */ + "%*u " /* (32. blocked signals) */ + "%*u " /* (33. ignored signals) */ + "%*u " /* (34. caught signals) */ + "%*u " /* (35. whcan) */ + "%*u " /* (36. always 0) */ + "%*u " /* (37. always 0) */ + "%*d " /* (38. exit_signal) */ + "%*d " /* (39. task_cpu) */ #if 0 /* These are here for documentation but #if'd out to save * actually parsing them from the stream for no benefit. */ - "%*lu " /* (25. rsslim) */ - "%*lu " /* (26. start_code) */ - "%*lu " /* (27. end_code) */ - "%*lu " /* (28. start_stack) */ - "%*lu " /* (29. esp) */ - "%*lu " /* (30. eip) */ - "%*lu " /* (31. pending signals) */ - "%*lu " /* (32. blocked signals) */ - "%*lu " /* (33. ignored signals) */ - "%*lu " /* (34. caught signals) */ - "%*lu " /* (35. whcan) */ - "%*lu " /* (36. always 0) */ - "%*lu " /* (37. always 0) */ - "%*d " /* (38. exit_signal) */ - "%*d " /* (39. task_cpu) */ "%*u " /* (40. rt_priority) */ "%*u " /* (41. policy) */ "%*llu " /* (42. blkio_ticks) */ -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 01/19] dpdk: Add helper functions for DPDK datapath keepalive.
Introduce helper functions in 'dpdk' module that are needed for DPDK keepalive functionality. Also add dummy functions in 'dpdk-stub' module that are needed when DPDK datapath is not available. Signed-off-by: Bhanuprakash Bodireddy--- lib/dpdk-stub.c | 24 lib/dpdk.c | 31 +++ lib/dpdk.h | 7 +++ 3 files changed, 62 insertions(+) diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..d7fb19b 100644 --- a/lib/dpdk-stub.c +++ b/lib/dpdk-stub.c @@ -48,3 +48,27 @@ dpdk_get_vhost_sock_dir(void) { return NULL; } + +void +dpdk_register_pmd_core(unsigned core_id OVS_UNUSED) +{ +/* Nothing */ +} + +void +dpdk_unregister_pmd_core(unsigned core_id OVS_UNUSED) +{ +/* Nothing */ +} + +void +dpdk_mark_pmd_core_alive(void) +{ +/* Nothing */ +} + +void +dpdk_mark_pmd_core_sleep(void) +{ +/* Nothing */ +} diff --git a/lib/dpdk.c b/lib/dpdk.c index 8da6c32..8db63bf 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -22,6 +22,7 @@ #include #include +#include #include #include #ifdef DPDK_PDUMP @@ -489,3 +490,33 @@ dpdk_set_lcore_id(unsigned cpu) ovs_assert(cpu != NON_PMD_CORE_ID); RTE_PER_LCORE(_lcore_id) = cpu; } + +/* Register packet processing core 'core_id' for liveness checks. */ +void +dpdk_register_pmd_core(unsigned core) +{ +rte_keepalive_register_core(rte_global_keepalive_info, core); +} + +void +dpdk_unregister_pmd_core(unsigned core OVS_UNUSED) +{ +/* XXX: DPDK unfortunately hasn't implemented unregister API + * This will be fixed later, instead use sleep API now. + */ +rte_keepalive_mark_sleep(rte_global_keepalive_info); +} + +/* Mark packet processing core alive. */ +void +dpdk_mark_pmd_core_alive(void) +{ +rte_keepalive_mark_alive(rte_global_keepalive_info); +} + +/* Mark packet processing core as idle. */ +void +dpdk_mark_pmd_core_sleep(void) +{ +rte_keepalive_mark_sleep(rte_global_keepalive_info); +} diff --git a/lib/dpdk.h b/lib/dpdk.h index 673a1f1..3f31211 100644 --- a/lib/dpdk.h +++ b/lib/dpdk.h @@ -32,8 +32,15 @@ struct smap; +struct rte_keepalive *rte_global_keepalive_info; void dpdk_init(const struct smap *ovs_other_config); void dpdk_set_lcore_id(unsigned cpu); const char *dpdk_get_vhost_sock_dir(void); +/* Keepalive APIs */ +void dpdk_register_pmd_core(unsigned core_id); +void dpdk_unregister_pmd_core(unsigned core_id); +void dpdk_mark_pmd_core_alive(void); +void dpdk_mark_pmd_core_sleep(void); + #endif /* dpdk.h */ -- 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.
Keepalive feature is aimed at achieving Fastpath Service Assurance in OVS-DPDK deployments. It adds support for monitoring the packet processing cores(PMD thread cores) by dispatching heartbeats at regular intervals. Incase of heartbeat misses additional health checks are enabled on the PMD thread to detect the failure and the same shall be reported to higher level fault management systems/frameworks. The implementation uses OVSDB for reporting the health of the PMD threads. Any external monitoring application can read the status from OVSDB at regular intervals (or) subscribe to the updates in OVSDB so that they get notified when the changes happen on OVSDB. keepalive info struct is created and initialized for storing the status of the PMD threads. This is initialized by main thread(vswitchd) as part of init process and will be periodically updated by 'keepalive' thread. keepalive feature can be enabled through below OVSDB settings. enable-keepalive=true - Keepalive feature is disabled by default. keepalive-interval="5000" - Timer interval in milliseconds for monitoring the packet processing cores. When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes up at regular intervals to update the timestamp and status of pmd cores in keepalive info struct. This information shall be read by vswitchd thread and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB. An external monitoring framework like collectd with ovs events support can read (or) subscribe to the datapath status changes in ovsdb. When the state is updated, the collectd shall be notified and will eventually relay the status to ceilometer service running in the controller. Below is the high level overview of deployment model. Compute NodeControllerCompute Node Collectd <--> Ceilometer <> Collectd OvS DPDK OvS DPDK +-+ | VM | +--+--+ \---+---/ | +--+---+ ++--+ +--+---+ | OVS |-> | ovsevents plugin| --> | collectd | +--+---+ ++--+ +--+---+ +--+-+ +---++ | | Ceilometer | <-- | collectd ceilometer plugin | <--- +--+-+ +---++ github: The patches can be found here: https://github.com/bbodired/ovs (Last master commit e7cd8c363) Performance impact: No noticeable performance or latency impact is observed with KA feature enabled. - v2 -> v3 * Rebase. * Verified with dpdk-stable-17.05.1 release. * Fixed build issues with MSVC and cross checked with appveyor. v1 -> v2 * Rebase * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it is already applied as separate patch. RFCv3 -> v1 * Made changes to fix failures in some unit test cases. * some more code cleanup w.r.t process related APIs. RFCv2 -> RFCv3 * Remove POSIX shared memory block implementation (suggested by Aaron). * Rework the logic to register and track threads instead of cores. This way in the future any thread can be registered to KA framework. For now only PMD threads are tracked (suggested by Aaron). * Refactor few APIs and further clean up the code. RFCv1 -> RFCv2 * Merged the xml and schema commits to later commit where the actual implementation is done(suggested by Ben). * Fix ovs-appctl keepalive/* hang issue when KA disabled. * Fixed memory leaks with appctl commands for keepalive/pmd-health-show, pmd-xstats-show. * Refactor code and fixed APIs dealing with PMD health monitoring. Bhanuprakash Bodireddy (19): dpdk: Add helper functions for DPDK datapath keepalive. process: Avoid warnings compiling process.c process: Add helper function to retrieve process status. Keepalive: Add initial keepalive support. keepalive: Add more helper functions to KA framework. dpif-netdev: Add helper function to store datapath tids. dpif-netdev: Register packet processing cores to KA framework. dpif-netdev: Enable heartbeats for DPDK datapath. keepalive: Retrieve PMD status periodically. bridge: Update keepalive status in OVSDB keepalive: Add support to query keepalive statistics. keepalive: Add support to query keepalive status. dpif-netdev: Add additional datapath health checks. keepalive: Check the link status as part of PMD health checks. keepalive: Check the packet statistics as part of PMD health checks. keepalive: Check the PMD cycle stats as part of PMD health checks. netdev-dpdk: Enable PMD health checks on heartbeat failure. keepalive: Display extended Keepalive status. Documentation: Update DPDK doc with Keepalive feature. Documentation/howto/dpdk.rst | 90 + lib/automake.mk | 2 + lib/dpdk-stub.c | 36 ++ lib/dpdk.c