Re: [ovs-dev] [PATCH 1/2] revalidator: Revalidate ukeys created from flows.

2017-05-01 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On May 1, 2017, at 12:58 PM, Joe Stringer  wrote:
> 
> If there is no active ukey for a particular datapath flow, and it is
> dumped from the datapath, then the revalidator threads will assemble a
> ukey based on the datapath flow. This will allow tracking of the stats
> for proper attribution, and future validation of the flow.
> 
> However, until now when creating the ukey in this context, the ukey's
> 'reval_seq' has been set to the current udpif's reval_seq. This implies
> that the flow has been validated against the current flow table.
> However, this is not true - The flow appeared in the datapath without
> any prior knowledge in this OVS instance so we should set up the
> reval_seq of the ukey to ensure that the flow will be validated during
> the current dump/revalidation cycle.
> 
> See also revalidate_ukey().
> 
> Fixes: 23597df05226 ("upcall: Create ukeys in handler threads.")
> Signed-off-by: Joe Stringer 
> ---
> ofproto/ofproto-dpif-upcall.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 18be901d0b8a..2e23fe702281 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1612,7 +1612,7 @@ ukey_create_from_dpif_flow(const struct udpif *udpif,
> }
> 
> dump_seq = seq_read(udpif->dump_seq);
> -reval_seq = seq_read(udpif->reval_seq);
> +reval_seq = seq_read(udpif->reval_seq) - 1; /* Ensure revalidation. */
> ofpbuf_use_const(&actions, &flow->actions, flow->actions_len);
> *ukey = ukey_create__(flow->key, flow->key_len,
>   flow->mask, flow->mask_len, flow->ufid_present,
> -- 
> 2.11.1
> 
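
The effect of the one-line change can be sketched in isolation. This is a toy model only: `toy_ukey`, `toy_needs_revalidation()` and `toy_ukey_from_dump()` are hypothetical names, and the "not equal to the current sequence" check is an assumption about how a stale ukey is detected, not a copy of the OVS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the ukey field discussed in the patch; not the OVS struct. */
struct toy_ukey {
    uint64_t reval_seq;   /* Sequence number the ukey was last validated at. */
};

/* Assumed check: a ukey counts as up to date only if it was validated at
 * the current revalidation sequence number. */
static bool
toy_needs_revalidation(const struct toy_ukey *ukey, uint64_t current_seq)
{
    return ukey->reval_seq != current_seq;
}

/* Models the patched behavior: a ukey built from a dumped datapath flow
 * starts one step behind, so the current cycle has to validate it. */
static struct toy_ukey
toy_ukey_from_dump(uint64_t current_seq)
{
    struct toy_ukey ukey = { .reval_seq = current_seq - 1 };

    assert(toy_needs_revalidation(&ukey, current_seq));
    return ukey;
}
```

With the pre-patch initialization (`reval_seq` equal to the current sequence), the same check would report the ukey as already validated.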

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] revalidator: Fix logging of xlate_key() failure.

2017-05-01 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On May 1, 2017, at 12:58 PM, Joe Stringer  wrote:
> 
> This was being logged using xlate_strerror(), but the return code is
> actually an errno code. Use ovs_strerror() instead.
> 
> Fixes: dd0dc9eda0e0 ("revalidator: Reuse xlate_ukey from deletion.")
> Signed-off-by: Joe Stringer 
> ---
> ofproto/ofproto-dpif-upcall.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 2e23fe702281..21916731fa07 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2217,8 +2217,8 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
> size_t n_ops)
> if (error) {
> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> 
> -VLOG_WARN_RL(&rl, "xlate_actions failed (%s)!",
> - xlate_strerror(error));
> +VLOG_WARN_RL(&rl, "xlate_key failed (%s)!",
> + ovs_strerror(error));
> } else {
> xlate_out_uninit(&ctx.xout);
> if (netflow) {
> -- 
> 2.11.1
> 
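
To illustrate why the stringifier has to match the kind of error code, here is a small sketch. The enum and all `toy_` names are hypothetical and only stand in for a translation-specific error space; `strerror()` stands in for an errno-based stringifier such as ovs_strerror().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical translation-error enum; the names are illustrative only. */
enum toy_xlate_error {
    TOY_XLATE_OK = 0,
    TOY_XLATE_BRIDGE_NOT_FOUND,
    TOY_XLATE_RECURSION_TOO_DEEP,
    TOY_XLATE_N_ERRORS
};

/* Stringifier that is only meaningful for enum toy_xlate_error values. */
static const char *
toy_xlate_strerror(enum toy_xlate_error error)
{
    switch (error) {
    case TOY_XLATE_OK: return "ok";
    case TOY_XLATE_BRIDGE_NOT_FOUND: return "bridge not found";
    case TOY_XLATE_RECURSION_TOO_DEEP: return "recursion too deep";
    case TOY_XLATE_N_ERRORS: break;
    }
    return "unknown error";
}

/* Stand-in for a function that reports failure as an errno value. */
static int
toy_lookup(int have_key)
{
    return have_key ? 0 : ENOENT;
}

/* Correct pairing: errno values go through the errno stringifier. */
static const char *
toy_describe_lookup_failure(int error)
{
    return strerror(error);
}
```

Feeding the errno value into `toy_xlate_strerror()` still compiles, but it yields an unrelated message, which is the kind of misleading log line the patch removes.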



Re: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath by computing the recirculate actions at translate time.

2017-05-10 Thread Jarno Rajahalme

> On May 10, 2017, at 12:59 PM, Andy Zhou  wrote:
> 
> On Wed, May 10, 2017 at 7:56 AM, William Tu wrote:
>>> It may be cleaner if we add a new trunc action for the datapath, say
>>> trunc2, that applies to all outputs within the clone.
>>> 
>>> So the translation will look like: clone(trunc2, native tunnel
>>> translation). Would this approach work?
>>> 
>>> 
>> 
>> Or how about we apply actual packet truncation when clone action
>> follows truncate action?
>> Now we apply actual packet truncation when:
>> actions=trunc, output
>> actions=trunc, tunnel_push
>> actions=trunc, sample
> 
>> 
>> If we add clone as another truncate target, then
>> actions = trunc(100), clone(tnl(...)),  actionX,
>> Inside clone will see packet of size 100, and actionX sees original
>> size. Then I think we don't need to introduce trunc2?
> 
> This is a reasonable approach. Thanks for the suggestion.
> 
> Picking up the topic of trunc on patch port.
> 
> Instead of banning trunc output to a patch port, is there any downside
> to translating that to trunc, clone()? After all, native tunneling
> looks a lot like a patch port conceptually.
> 

Right, why should truncated OUTPUT to a patch port behave any differently from
any other OUTPUT port?

  Jarno

> 
>> 
>> Regards,
>> William
>> 
 
>>>> Without the "Avoid recirculation" patch we have two datapath flows,
>>>> because the packet is recirculated. At the end of the first flow the
>>>> packet size is changed and the packet with modified size enters the
>>>> OF pipeline again.
>>>> 
>>>> What is the reason not to change packet size when truncate action is
>>>> applied?
 
>>> 
>>> One of the reasons could be that we introduced trunc before clone.
>>> Otherwise, clone(trunc2, output:x) is equivalent to trunc, output:x.
>>> Note that the trunc datapath action is different from other datapath
>>> actions, which usually apply to all following actions. Native
>>> tunneling may be the first use case that motivates trunc2, which
>>> should have the normal datapath action behavior.
>>> 
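
The semantics discussed in this thread (trunc affects only the next target action, and clone is proposed as one more such target) can be modeled with a toy interpreter. Everything here is hypothetical: the `toy_` types and the rule that a pending truncation is cleared after one target are a sketch of the proposal, not the datapath implementation.

```c
#include <assert.h>
#include <stddef.h>

enum toy_action_type { TOY_TRUNC, TOY_OUTPUT, TOY_CLONE_TNL };

struct toy_action {
    enum toy_action_type type;
    size_t arg;              /* max_len for TOY_TRUNC, otherwise unused. */
};

/* Walks the action list and records in seen[i] the packet size that
 * action i observes.  A pending trunc applies only to the next
 * non-trunc action and is then cleared, so later actions see the
 * original size again. */
static void
toy_execute(const struct toy_action *actions, size_t n, size_t pkt_len,
            size_t *seen)
{
    size_t pending = 0;      /* 0 means no truncation is pending. */

    for (size_t i = 0; i < n; i++) {
        size_t len = pkt_len;

        if (actions[i].type == TOY_TRUNC) {
            pending = actions[i].arg;
        } else {
            if (pending && pending < len) {
                len = pending;
            }
            pending = 0;
        }
        seen[i] = len;
    }
}
```

For `trunc(100), clone(tnl(...)), actionX` on a 1500-byte packet, this model reports 100 bytes inside the clone and 1500 bytes for actionX, matching the behavior William describes.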



Re: [ovs-dev] [PATCH] ofp-actions: Properly interpret "output:in_port".

2017-06-12 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme <ja...@ovn.org>

Maybe it would be worthwhile to elaborate on the subtle difference in the commit
message? I.e., “in_port” should map to the special OpenFlow port number that
outputs to the incoming port, whereas output to the incoming port would
normally be suppressed?

  Jarno

> On Jun 12, 2017, at 8:41 AM, Ben Pfaff  wrote:
> 
> It was being misinterpreted as output:NXM_OF_IN_PORT[], which is subtly
> different and doesn't do anything useful.
> 
> CC: Jarno Rajahalme 
> Fixes: 21b2fa617126 ("ofp-parse: Allow match field names in actions and brackets in matches.")
> Reported-by: nickcooper-zhangtonghao 
> Signed-off-by: Ben Pfaff 
> ---
> lib/ofp-actions.c  | 36 +++-
> tests/ovs-ofctl.at |  2 ++
> 2 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
> index d5e4623d0291..f9140f4e9a7e 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -635,27 +635,29 @@ parse_OUTPUT(const char *arg,
> 
> output_trunc = ofpact_put_OUTPUT_TRUNC(ofpacts);
> return parse_truncate_subfield(output_trunc, arg, port_map);
> -} else {
> -struct mf_subfield src;
> -char *error = mf_parse_subfield(&src, arg);
> -if (!error) {
> -struct ofpact_output_reg *output_reg;
> +}
> 
> -output_reg = ofpact_put_OUTPUT_REG(ofpacts);
> -output_reg->max_len = UINT16_MAX;
> -output_reg->src = src;
> -} else {
> -free(error);
> -struct ofpact_output *output;
> +ofp_port_t port;
> +if (ofputil_port_from_string(arg, port_map, &port)) {
> +struct ofpact_output *output = ofpact_put_OUTPUT(ofpacts);
> +output->port = port;
> +output->max_len = output->port == OFPP_CONTROLLER ? UINT16_MAX : 0;
> +return NULL;
> +}
> 
> -output = ofpact_put_OUTPUT(ofpacts);
> -if (!ofputil_port_from_string(arg, port_map, &output->port)) {
> -return xasprintf("%s: output to unknown port", arg);
> -}
> -output->max_len = output->port == OFPP_CONTROLLER ? UINT16_MAX : 0;
> -}
> +struct mf_subfield src;
> +char *error = mf_parse_subfield(&src, arg);
> +if (!error) {
> +struct ofpact_output_reg *output_reg;
> +
> +output_reg = ofpact_put_OUTPUT_REG(ofpacts);
> +output_reg->max_len = UINT16_MAX;
> +output_reg->src = src;
> return NULL;
> }
> +free(error);
> +
> +return xasprintf("%s: output to unknown port", arg);
> }
> 
> static void
> diff --git a/tests/ovs-ofctl.at b/tests/ovs-ofctl.at
> index 6afe8f766627..52eaf0320cd5 100644
> --- a/tests/ovs-ofctl.at
> +++ b/tests/ovs-ofctl.at
> @@ -207,6 +207,7 @@ 
> ipv6,actions=ct(commit,nat(src=fe80::20c:29ff:fe88:a18b,random))
> ipv6,actions=ct(commit,nat(src=fe80::20c:29ff:fe88:1-fe80::20c:29ff:fe88:a18b,random))
> ipv6,actions=ct(commit,nat(src=[fe80::20c:29ff:fe88:1]-[fe80::20c:29ff:fe88:a18b]:255-4096,random))
> tcp,actions=ct(commit,nat(src=10.1.1.240-10.1.1.255),alg=ftp)
> +actions=in_port,output:in_port
> ]])
> 
> AT_CHECK([ovs-ofctl parse-flows flows.txt
> @@ -240,6 +241,7 @@ OFPT_FLOW_MOD: ADD ipv6 
> actions=ct(commit,nat(src=fe80::20c:29ff:fe88:a18b,rando
> OFPT_FLOW_MOD: ADD ipv6 
> actions=ct(commit,nat(src=fe80::20c:29ff:fe88:1-fe80::20c:29ff:fe88:a18b,random))
> OFPT_FLOW_MOD: ADD ipv6 
> actions=ct(commit,nat(src=[fe80::20c:29ff:fe88:1]-[fe80::20c:29ff:fe88:a18b]:255-4096,random))
> OFPT_FLOW_MOD: ADD tcp 
> actions=ct(commit,nat(src=10.1.1.240-10.1.1.255),alg=ftp)
> +OFPT_FLOW_MOD: ADD actions=IN_PORT,IN_PORT
> ]])
> AT_CLEANUP
> 
> -- 
> 2.10.2
> 
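
The reordering in parse_OUTPUT() comes down to which interpretation is tried first when a name is valid both as a port and as a field. A minimal sketch, with hypothetical lookup tables and `toy_` helpers standing in for ofputil_port_from_string() and mf_parse_subfield():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_output_kind { TOY_OUT_PORT, TOY_OUT_REG, TOY_OUT_ERROR };

/* Returns true if 'name' appears in 'table'. */
static bool
toy_name_in(const char *name, const char *const *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(name, table[i])) {
            return true;
        }
    }
    return false;
}

/* Hypothetical name tables.  "in_port" is deliberately in both. */
static const char *const toy_ports[] = { "in_port", "controller", "local" };
static const char *const toy_fields[] = { "in_port", "reg0", "reg1" };

/* Mirrors the patched order: try a port name first, fall back to a
 * field, and report an error only if neither matches. */
static enum toy_output_kind
toy_parse_output(const char *arg)
{
    if (toy_name_in(arg, toy_ports, sizeof toy_ports / sizeof *toy_ports)) {
        return TOY_OUT_PORT;
    }
    if (toy_name_in(arg, toy_fields, sizeof toy_fields / sizeof *toy_fields)) {
        return TOY_OUT_REG;
    }
    return TOY_OUT_ERROR;
}
```

With the old order (field first), "in_port" in this model would resolve to the register form, which is the misinterpretation the commit message describes.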



Re: [ovs-dev] [PATCH] ovs-ofctl: Avoid read overrun in ofperr_decode_msg().

2017-06-13 Thread Jarno Rajahalme
Seems like I leaped from the fact that an error message’s payload must contain at
least 64 bytes of the message causing the error (or less, if the message
length was less than 64) to the erroneous notion that the whole error message
would only need 64 bytes of storage. Thanks for fixing this.

Acked-by: Jarno Rajahalme <ja...@ovn.org>
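
The overread can be shown with a toy message format. The header layout and the `toy_` helpers below are assumptions for illustration (a host-order length field, no real OpenFlow structures): a decoder that trusts the header length is only safe if that many bytes were actually stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy header: total message length in bytes, host byte order. */
struct toy_header {
    uint16_t length;
};

/* Copies the whole message as declared by its header.  Returns NULL on
 * allocation failure or a malformed length.  The caller frees the copy. */
static uint8_t *
toy_clone_msg(const void *msg, size_t *lenp)
{
    struct toy_header hdr;
    uint8_t *copy;

    memcpy(&hdr, msg, sizeof hdr);
    if (hdr.length < sizeof hdr) {
        return NULL;
    }
    copy = malloc(hdr.length);
    if (copy) {
        memcpy(copy, msg, hdr.length);
        *lenp = hdr.length;
    }
    return copy;
}

/* A decoder that reads 'length' bytes is safe only if at least that many
 * bytes are present in the stored buffer. */
static bool
toy_safe_to_decode(const void *stored, size_t stored_len)
{
    struct toy_header hdr;

    if (stored_len < sizeof hdr) {
        return false;
    }
    memcpy(&hdr, stored, sizeof hdr);
    return hdr.length >= sizeof hdr && hdr.length <= stored_len;
}
```

Keeping only the first 64 bytes of a 100-byte message fails the check, while a full copy passes, which is the reasoning behind storing the whole error message.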

> On Jun 13, 2017, at 4:04 PM, Ben Pfaff  wrote:
> 
> vconn_add_bundle_error() was keeping at most 64 bytes of an OpenFlow
> error message, then it was passing it to ofperr_decode_msg(), which assumed
> that the full message was available.  This led to a buffer overread.
> There's no good reason why it was only keeping the first 64 bytes, so this
> commit changes it to keep the whole error message, sidestepping the
> problem.
> 
> struct vconn_bundle_error only existed for this special case, so remove it
> in favor of a chain of ofpbufs.
> 
> Found via gcc's address sanitizer.
> 
> Reported-by: Lance Richardson 
> CC: Jarno Rajahalme 
> Fixes: 506c1ddb3404 ("vconn: Better bundle error management.")
> Signed-off-by: Ben Pfaff 
> ---
> include/openvswitch/vconn.h | 12 
> lib/vconn.c | 25 -
> utilities/ovs-ofctl.c   | 10 ++
> 3 files changed, 14 insertions(+), 33 deletions(-)
> 
> diff --git a/include/openvswitch/vconn.h b/include/openvswitch/vconn.h
> index 40ca9edfe868..90f9bad2c1c9 100644
> --- a/include/openvswitch/vconn.h
> +++ b/include/openvswitch/vconn.h
> @@ -61,18 +61,6 @@ int vconn_dump_flows(struct vconn *, const struct 
> ofputil_flow_stats_request *,
>  enum ofputil_protocol,
>  struct ofputil_flow_stats **fsesp, size_t *n_fsesp);
> 
> -/* Bundle errors must be free()d by the caller. */
> -struct vconn_bundle_error {
> -struct ovs_list list_node;
> -
> -/* OpenFlow header and some of the message contents for error reporting. 
> */
> -union {
> -struct ofp_header ofp_msg;
> -uint8_t ofp_msg_data[64];
> -};
> -};
> -
> -/* Bundle errors must be free()d by the caller. */
> int vconn_bundle_transact(struct vconn *, struct ovs_list *requests,
>   uint16_t bundle_flags,
>   struct ovs_list *errors);
> diff --git a/lib/vconn.c b/lib/vconn.c
> index 6997eaa96e2c..8a9f0ca8fa96 100644
> --- a/lib/vconn.c
> +++ b/lib/vconn.c
> @@ -744,18 +744,6 @@ vconn_recv_block(struct vconn *vconn, struct ofpbuf 
> **msgp)
> return retval;
> }
> 
> -static void
> -vconn_add_bundle_error(const struct ofp_header *oh, struct ovs_list *errors)
> -{
> -if (errors) {
> -struct vconn_bundle_error *err = xmalloc(sizeof *err);
> -size_t len = ntohs(oh->length);
> -
> -memcpy(err->ofp_msg_data, oh, MIN(len, sizeof err->ofp_msg_data));
> -ovs_list_push_back(errors, &err->list_node);
> -}
> -}
> -
> static int
> vconn_recv_xid__(struct vconn *vconn, ovs_be32 xid, struct ofpbuf **replyp,
>  struct ovs_list *errors)
> @@ -781,13 +769,13 @@ vconn_recv_xid__(struct vconn *vconn, ovs_be32 xid, 
> struct ofpbuf **replyp,
> 
> error = ofptype_decode(&type, oh);
> if (!error && type == OFPTYPE_ERROR) {
> -vconn_add_bundle_error(oh, errors);
> +ovs_list_push_back(errors, &reply->list_node);
> } else {
> VLOG_DBG_RL(&bad_ofmsg_rl, "%s: received reply with xid %08"PRIx32
> " != expected %08"PRIx32,
> vconn->name, ntohl(recv_xid), ntohl(xid));
> +ofpbuf_delete(reply);
> }
> -ofpbuf_delete(reply);
> }
> }
> 
> @@ -1078,7 +1066,8 @@ vconn_bundle_reply_validate(struct ofpbuf *reply,
> }
> 
> if (type == OFPTYPE_ERROR) {
> -vconn_add_bundle_error(oh, errors);
> +struct ofpbuf *copy = ofpbuf_clone(reply);
> +ovs_list_push_back(errors, &copy->list_node);
> return ofperr_decode_msg(oh, NULL);
> }
> if (type != OFPTYPE_BUNDLE_CONTROL) {
> @@ -1150,13 +1139,13 @@ vconn_recv_error(struct vconn *vconn, struct ovs_list 
> *errors)
> oh = reply->data;
> ofperr = ofptype_decode(&type, oh);
> if (!ofperr && type == OFPTYPE_ERROR) {
> -vconn_add_bundle_error(oh, errors);
> +ovs_list_push_back(errors, &reply->list_node);
> } else {
> VLOG_DBG_RL(&bad_ofmsg_rl,
> "%s: received unexpected reply with xid 
> %08&

Re: [ovs-dev] [BUG] upcall handler thread crash

2017-02-06 Thread Jarno Rajahalme

> On Feb 5, 2017, at 10:49 PM, wangyunjian  wrote:
> 
> My OVS version is
> openvswitch-2.5.0 (http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz).
> I modified the code as follows and got another crash. Does it need a lock
> to protect the operations on the mbridge->mbundles hmap (xbridge->xport hmap)
> between the ovs-vswitchd thread and the upcall handler (revalidator) thread?
> 
> static struct mbundle *
> mbundle_lookup(const struct mbridge *mbridge, struct ofbundle *ofbundle)
> {
>     struct mbundle *mbundle;
> 
>     HMAP_FOR_EACH_IN_BUCKET (mbundle, hmap_node, hash_pointer(ofbundle, 0),
>                              &mbridge->mbundles) {
>         xsleep(2);    /* The only change: add xsleep(2). */

xsleep() causes the thread to quiesce, basically telling the main thread it is 
OK to delete the bridge, so the new crash you see is not a bug, but caused by 
your change.
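
As a rough single-reader model of why quiescing inside the loop is dangerous: memory handed to a postponed free stays valid only until the reader next declares a quiescent point. The `toy_` names are hypothetical, and real RCU waits for every thread to quiesce, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_POSTPONED 8

/* Toy deferred-free queue for a single reader. */
struct toy_rcu {
    void *postponed[TOY_MAX_POSTPONED];
    size_t n;
};

/* Queues 'p' to be freed at the next quiescent point.  Returns 0 on
 * success, -1 if the queue is full. */
static int
toy_postpone_free(struct toy_rcu *rcu, void *p)
{
    if (rcu->n >= TOY_MAX_POSTPONED) {
        return -1;
    }
    rcu->postponed[rcu->n++] = p;
    return 0;
}

/* The reader declares that it holds no pointers to protected data, so
 * everything queued so far is freed.  Returns the number of frees. */
static size_t
toy_quiesce(struct toy_rcu *rcu)
{
    size_t freed = rcu->n;

    for (size_t i = 0; i < rcu->n; i++) {
        free(rcu->postponed[i]);
    }
    rcu->n = 0;
    return freed;
}
```

A reader that calls `toy_quiesce()` in the middle of an iteration and then keeps using a pointer obtained earlier is in the same position as the modified lookup loop with xsleep() in it.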

Instead, try it out with these changes:

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 6f8079a..15a398f 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -18,7 +18,7 @@
 
 #include 
 
-#include "hmap.h"
+#include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
 #include "vlan-bitmap.h"
@@ -31,7 +31,7 @@ BUILD_ASSERT_DECL(sizeof(mirror_mask_t) * CHAR_BIT >= 
MAX_MIRRORS);
 
 struct mbridge {
 struct mirror *mirrors[MAX_MIRRORS];
-struct hmap mbundles;
+struct cmap mbundles;
 
 bool need_revalidate;
 bool has_mirrors;
@@ -40,7 +40,7 @@ struct mbridge {
 };
 
 struct mbundle {
-struct hmap_node hmap_node; /* In parent 'mbridge' map. */
+struct cmap_node cmap_node; /* In parent 'mbridge' map. */
 struct ofbundle *ofbundle;
 
 mirror_mask_t src_mirrors;  /* Mirrors triggered when packet received. */
@@ -84,7 +84,7 @@ mbridge_create(void)
 mbridge = xzalloc(sizeof *mbridge);
 ovs_refcount_init(&mbridge->ref_cnt);
 
-hmap_init(&mbridge->mbundles);
+cmap_init(&mbridge->mbundles);
 return mbridge;
 }
 
@@ -101,7 +101,7 @@ mbridge_ref(const struct mbridge *mbridge_)
 void
 mbridge_unref(struct mbridge *mbridge)
 {
-struct mbundle *mbundle, *next;
+struct mbundle *mbundle;
 size_t i;
 
 if (!mbridge) {
@@ -115,11 +115,11 @@ mbridge_unref(struct mbridge *mbridge)
 }
 }
 
-HMAP_FOR_EACH_SAFE (mbundle, next, hmap_node, &mbridge->mbundles) {
+CMAP_FOR_EACH (mbundle, cmap_node, &mbridge->mbundles) {
 mbridge_unregister_bundle(mbridge, mbundle->ofbundle);
 }
 
-hmap_destroy(&mbridge->mbundles);
+cmap_destroy(&mbridge->mbundles);
 free(mbridge);
 }
 }
@@ -147,7 +147,7 @@ mbridge_register_bundle(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 
 mbundle = xzalloc(sizeof *mbundle);
 mbundle->ofbundle = ofbundle;
-hmap_insert(&mbridge->mbundles, &mbundle->hmap_node,
+cmap_insert(&mbridge->mbundles, &mbundle->cmap_node,
 hash_pointer(ofbundle, 0));
 }
 
@@ -173,8 +173,9 @@ mbridge_unregister_bundle(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 }
 
-hmap_remove(&mbridge->mbundles, &mbundle->hmap_node);
-free(mbundle);
+cmap_remove(&mbridge->mbundles, &mbundle->cmap_node,
+hash_pointer(ofbundle, 0));
+ovsrcu_postpone(free, mbundle);
 }
 
 mirror_mask_t
@@ -269,7 +270,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
-HMAP_FOR_EACH (mbundle, hmap_node, &mirror->mbridge->mbundles) {
+CMAP_FOR_EACH (mbundle, cmap_node, &mirror->mbridge->mbundles) {
 if (hmapx_contains(&mirror->srcs, mbundle)) {
 mbundle->src_mirrors |= mirror_bit;
 } else {
@@ -308,7 +309,7 @@ mirror_destroy(struct mbridge *mbridge, void *aux)
 }
 
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
-HMAP_FOR_EACH (mbundle, hmap_node, &mbridge->mbundles) {
+CMAP_FOR_EACH (mbundle, cmap_node, &mbridge->mbundles) {
 mbundle->src_mirrors &= ~mirror_bit;
 mbundle->dst_mirrors &= ~mirror_bit;
 mbundle->mirror_out &= ~mirror_bit;
@@ -414,9 +415,9 @@ static struct mbundle *
 mbundle_lookup(const struct mbridge *mbridge, struct ofbundle *ofbundle)
 {
 struct mbundle *mbundle;
+uint32_t hash = hash_pointer(ofbundle, 0);
 
-HMAP_FOR_EACH_IN_BUCKET (mbundle, hmap_node, hash_pointer(ofbundle, 0),
- &mbridge->mbundles) {
+CMAP_FOR_EACH_WITH_HASH (mbundle, cmap_node, hash, &mbridge->mbundles) {
 if (mbundle->ofbundle == ofbundle) {
 return mbundle;
 }
@@ -424,7 +425,7 @@ mbundle_lookup(const struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 return NULL;
 }
 
-/* Looks up each of the 'n_ofbundlees' pointers in 'ofbundlees'

Re: [ovs-dev] [PATCH] doc: Describe backporting process.

2017-02-15 Thread Jarno Rajahalme
Thanks for writing this up - makes my life a little bit easier :-)

Acked-by: Jarno Rajahalme 

> On Feb 15, 2017, at 3:05 PM, Joe Stringer  wrote:
> 
> This patch documents the backporting process, and provides a walkthrough
> for developers who would like to backport upstream Linux patches into
> the Open vSwitch tree. Nothing in this documentation should be
> surprising or new; it merely puts the existing process into words.
> 
> Signed-off-by: Joe Stringer 
> Acked-by: Ben Pfaff 
> ---
> Documentation/automake.mk  |   1 +
> Documentation/index.rst|   1 +
> Documentation/internals/contributing/backports.rst | 232 +
> Documentation/internals/contributing/index.rst |   1 +
> 4 files changed, 235 insertions(+)
> create mode 100644 Documentation/internals/contributing/backports.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index 42553f0b57ff..610d8ccc6f96 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -80,6 +80,7 @@ EXTRA_DIST += \
>   Documentation/internals/release-process.rst \
>   Documentation/internals/security.rst \
>   Documentation/internals/contributing/index.rst \
> + Documentation/internals/contributing/backports.rst \
>   Documentation/internals/contributing/coding-style.rst \
>   Documentation/internals/contributing/coding-style-windows.rst \
>   Documentation/internals/contributing/documentation-style.rst \
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 02b376fc2a08..8cfb9f3f47a8 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -98,6 +98,7 @@ Learn more about the Open vSwitch project and about how you 
> can contribute:
>   :doc:`internals/security`
> 
> - **Contributing:** :doc:`internals/contributing/submitting-patches` |
> +  :doc:`internals/contributing/backports` |
>   :doc:`internals/contributing/coding-style` |
>   :doc:`internals/contributing/coding-style-windows`
> 
> diff --git a/Documentation/internals/contributing/backports.rst 
> b/Documentation/internals/contributing/backports.rst
> new file mode 100644
> index ..d1fa35007f01
> --- /dev/null
> +++ b/Documentation/internals/contributing/backports.rst
> @@ -0,0 +1,232 @@
> +..
> +  Licensed under the Apache License, Version 2.0 (the "License"); you may
> +  not use this file except in compliance with the License. You may obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS, 
> WITHOUT
> +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See 
> the
> +  License for the specific language governing permissions and limitations
> +  under the License.
> +
> +  Convention for heading levels in Open vSwitch documentation:
> +
> +  =======  Heading 0 (reserved for the title in a document)
> +  -------  Heading 1
> +  ~~~~~~~  Heading 2
> +  +++++++  Heading 3
> +  '''''''  Heading 4
> +
> +  Avoid deeper levels because they do not render well.
> +
> +===================
> +Backporting patches
> +===================
> +
> +.. note::
> +
> +This is an advanced topic for developers and maintainers. Readers should
> +familiarize themselves with building and running Open vSwitch, with the 
> git
> +tool, and with the Open vSwitch patch submission process.
> +
> +The backporting of patches from one git tree to another takes multiple forms
> +within Open vSwitch, but is broadly applied in the following fashion:
> +
> +- Contributors submit their proposed changes to the latest development branch
> +- Contributors and maintainers provide feedback on the patches
> +- When the change is satisfactory, maintainers apply the patch to the
> +  development branch.
> +- Maintainers backport changes from a development branch to release branches.
> +
> +With regards to Open vSwitch user space code and code that does not comprise
> +the Linux datapath and compat code, the development branch is `master` in the
> +Open vSwitch repository. Patches are applied first to this branch, then to 
> the
> +most recent `branch-X.Y`, then earlier `branch-X.Z`, and so on. The most 
> common
> +kind of patch in this category is a bugfix which affects master and other
> +branches.
> +
> +For Linux datapath code, the primary development branch is in the `net-next`_
> +tree as described in the section bel

[ovs-dev] [PATCH 1/8] datapath: fix flow stats accounting when node 0 is not possible

2017-02-15 Thread Jarno Rajahalme
From: Thadeu Lima de Souza Cascardo 

Upstream commit:

commit 40773966ccf1985a1b2bb570a03cbeaf1cbd4e00
Author: Thadeu Lima de Souza Cascardo 
Date:   Thu Sep 15 19:11:52 2016 -0300

openvswitch: fix flow stats accounting when node 0 is not possible

On a system with only node 1 as possible, all statistics is going to be
accounted on node 0 as it will have a single writer.

However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.

Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

This patch contained a memory leak that is fixed in this backport.
The next patch silently fixed that in upstream, too.

Signed-off-by: Jarno Rajahalme 
---
 datapath/flow.c   | 6 --
 datapath/flow_table.c | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/datapath/flow.c b/datapath/flow.c
index 390286c..6d56644 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -141,7 +141,8 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
*tcp_flags = 0;
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
-   for_each_node(node) {
+   /* We open code this to make sure node 0 is always considered */
+   for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map)) {
 struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
 
if (stats) {
@@ -164,7 +165,8 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
 {
int node;
 
-   for_each_node(node) {
+   /* We open code this to make sure node 0 is always considered */
+   for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map)) {
struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
 
if (stats) {
diff --git a/datapath/flow_table.c b/datapath/flow_table.c
index d4204e5..3829b92 100644
--- a/datapath/flow_table.c
+++ b/datapath/flow_table.c
@@ -154,7 +154,8 @@ static void flow_free(struct sw_flow *flow)
kfree(flow->id.unmasked_key);
if (flow->sf_acts)
 ovs_nla_free_flow_actions((struct sw_flow_actions __force *)flow->sf_acts);
-   for_each_node(node)
+   /* We open code this to make sure node 0 is always considered */
+   for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map))
if (flow->stats[node])
kmem_cache_free(flow_stats_cache,
rcu_dereference_raw(flow->stats[node]));
-- 
2.1.4



[ovs-dev] [PATCH 0/8] Upstream Linux kernel datapath backports.

2017-02-15 Thread Jarno Rajahalme
Many contributors are currently working on backporting upstream Linux
kernel datapath changes to the OVS tree kernel datapath.  This series
addresses apparent gaps in this work as follows:

In this series:
08733a0 netfilter: handle NF_REPEAT from nf_conntrack_in()

Already applied:
56989f6 genetlink: mark families as __ro_after_init
489111e genetlink: statically initialize families
a07ea4d genetlink: no longer support using static family IDs

In this series:
9157208 net: use core MTU range checking in core net infra
76e4cc7 openvswitch: remove unnecessary EXPORT_SYMBOLs
f33eb0c openvswitch: remove unused functions

Empty merge commit:
8eed1cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Skipped (Should be addressed with the main 802.1AD backports):
3145c03 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
72ec108 openvswitch: fix vlan subtraction from packet length
20ecf1e openvswitch: vlan: remove wrong likely statement

Skipped (Should be addressed with the main "net: mpls: Fixups for GSO"
backports):
c66549f openvswitch: correctly fragment packet with mpls headers
85de4a2 openvswitch: use mpls_hdr
f7d49bc openvswitch: mpls: set network header correctly on key extract

In this series:
2279994 openvswitch: avoid resetting flow key while installing new flow.
190aa3e openvswitch: Fix Frame-size larger than 1024 bytes warning.
db74a33 openvswitch: use percpu flow stats
4077396 openvswitch: fix flow stats accounting when node 0 is not possible

Already applied:
2679d04 openvswitch: avoid deferred execution of recirc actions
ed22709 openvswitch: use alias for genetlink family names


Jarod Wilson (1):
  datapath: use core MTU range checking in core net infra

Jiri Benc (2):
  datapath: remove unused functions
  datapath: remove unnecessary EXPORT_SYMBOLs

Pablo Neira Ayuso (1):
  datapath: handle NF_REPEAT from nf_conntrack_in()

Thadeu Lima de Souza Cascardo (2):
  datapath: fix flow stats accounting when node 0 is not possible
  datapath: use percpu flow stats

pravin shelar (2):
  datapath: Fix Frame-size larger than 1024 bytes warning.
  datapath: avoid resetting flow key while installing new flow.

 acinclude.m4   |  2 ++
 datapath/conntrack.c   |  5 ++-
 datapath/datapath.c| 23 +++---
 datapath/flow.c| 42 ++
 datapath/flow.h|  4 +--
 datapath/flow_netlink.c|  6 ++--
 datapath/flow_netlink.h|  3 +-
 datapath/flow_table.c  | 25 ++-
 datapath/linux/compat/include/linux/if_ether.h |  8 +
 datapath/vport-internal_dev.c  | 22 --
 datapath/vport-netdev.c|  1 -
 datapath/vport.c   | 17 ---
 datapath/vport.h   |  1 -
 13 files changed, 84 insertions(+), 75 deletions(-)

-- 
2.1.4



[ovs-dev] [PATCH 2/8] datapath: use percpu flow stats

2017-02-15 Thread Jarno Rajahalme
From: Thadeu Lima de Souza Cascardo 

Upstream commit:

commit db74a3335e0f645e3139c80bcfc90feb01d8e304
Author: Thadeu Lima de Souza Cascardo 
Date:   Thu Sep 15 19:11:53 2016 -0300

openvswitch: use percpu flow stats

Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.

On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Cc: pravin shelar 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 
Signed-off-by: Jarno Rajahalme 

Signed-off-by: Jarno Rajahalme 
---
 datapath/flow.c   | 42 ++
 datapath/flow.h   |  4 ++--
 datapath/flow_table.c | 26 +-
 3 files changed, 33 insertions(+), 39 deletions(-)

diff --git a/datapath/flow.c b/datapath/flow.c
index 6d56644..58b0e13 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -71,32 +72,33 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
tcp_flags,
 {
struct flow_stats *stats;
int node = numa_node_id();
+   int cpu = smp_processor_id();
int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
 
-   stats = rcu_dereference(flow->stats[node]);
+   stats = rcu_dereference(flow->stats[cpu]);
 
-   /* Check if already have node-specific stats. */
+   /* Check if already have CPU-specific stats. */
if (likely(stats)) {
spin_lock(&stats->lock);
/* Mark if we write on the pre-allocated stats. */
-   if (node == 0 && unlikely(flow->stats_last_writer != node))
-   flow->stats_last_writer = node;
+   if (cpu == 0 && unlikely(flow->stats_last_writer != cpu))
+   flow->stats_last_writer = cpu;
} else {
stats = rcu_dereference(flow->stats[0]); /* Pre-allocated. */
spin_lock(&stats->lock);
 
-   /* If the current NUMA-node is the only writer on the
+   /* If the current CPU is the only writer on the
 * pre-allocated stats keep using them.
 */
-   if (unlikely(flow->stats_last_writer != node)) {
+   if (unlikely(flow->stats_last_writer != cpu)) {
/* A previous locker may have already allocated the
-* stats, so we need to check again.  If node-specific
+* stats, so we need to check again.  If CPU-specific
 * stats were already allocated, we update the pre-
 * allocated stats as we have already locked them.
 */
-   if (likely(flow->stats_last_writer != NUMA_NO_NODE)
-   && likely(!rcu_access_pointer(flow->stats[node]))) {
-   /* Try to allocate node-specific stats. */
+   if (likely(flow->stats_last_writer != -1) &&
+   likely(!rcu_access_pointer(flow->stats[cpu]))) {
+   /* Try to allocate CPU-specific stats. */
struct flow_stats *new_stats;
 
new_stats =
@@ -113,12 +115,12 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
tcp_flags,
new_stats->tcp_flags = tcp_flags;
spin_lock_init(&new_stats->lock);
 
-   rcu_assign_pointer(flow->stats[node],
+   rcu_assign_pointer(flow->stats[cpu],
   new_stats);
goto unlock;
}
}
-   flow->stats_last_writer = node;
+   flow->stats_last_writer = cpu;
}
}
 
@@ -135,15 +137,15 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
struct ovs_flow_stats *ovs_stats,
unsigned long *used, __be16 *tcp_flags)
 {
-   int node;
+   int cpu;
 
*used = 0;
*tcp_flags = 0;
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
-   /* We open code this to make sure node 0 is always considered */
-   for (node = 0; node < MAX_NUMNODES; node = next_node(node, 
node_possible_map)) {
-  

[ovs-dev] [PATCH 3/8] datapath: Fix Frame-size larger than 1024 bytes warning.

2017-02-15 Thread Jarno Rajahalme
From: pravin shelar 

Upstream commit:

commit 190aa3e77880a05332ea1ccb382a51285d57adb5
Author: pravin shelar 
Date:   Mon Sep 19 13:50:59 2016 -0700

openvswitch: Fix Frame-size larger than 1024 bytes warning.

There is no need to declare separate key on stack,
we can just use sw_flow->key to store the key directly.

This commit fixes following warning:

net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’:
net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes
is larger than 1024 bytes [-Wframe-larger-than=]

Signed-off-by: Pravin B Shelar 
Signed-off-by: David S. Miller 
Signed-off-by: Jarno Rajahalme 

Signed-off-by: Jarno Rajahalme 
---
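With the separate stack key gone, the patch masks new_flow->key in place by
passing the same key as both destination and source of ovs_flow_mask_key().
A minimal sketch of why aliasing is harmless for a bytewise AND, using
hypothetical sketch_* names and a toy 4-byte key rather than the real
struct sw_flow_key:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_KEY_LEN 4   /* Toy key size; the real key is much larger. */

struct sketch_key {
    uint8_t bytes[SKETCH_KEY_LEN];
};

/* dst = src & mask.  'dst' may alias 'src': each byte is read before the
 * same byte is written, so masking in place gives the same result. */
static void sketch_mask_key(struct sketch_key *dst,
                            const struct sketch_key *src,
                            const struct sketch_key *mask)
{
    size_t i;

    assert(dst && src && mask);
    for (i = 0; i < SKETCH_KEY_LEN; i++)
        dst->bytes[i] = src->bytes[i] & mask->bytes[i];
}
```
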
 datapath/datapath.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index ce2364a..e4089ef 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -939,7 +939,6 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
struct sw_flow_mask mask;
struct sk_buff *reply;
struct datapath *dp;
-   struct sw_flow_key key;
struct sw_flow_actions *acts;
struct sw_flow_match match;
u32 ufid_flags = ovs_nla_get_ufid_flags(a[OVS_FLOW_ATTR_UFID_FLAGS]);
@@ -967,20 +966,24 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
}
 
/* Extract key. */
-   ovs_match_init(&match, &key, &mask);
+   ovs_match_init(&match, &new_flow->key, &mask);
error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
if (error)
goto err_kfree_flow;
 
-   ovs_flow_mask_key(&new_flow->key, &key, true, &mask);
-
/* Extract flow identifier. */
error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID],
-  &key, log);
+  &new_flow->key, log);
if (error)
goto err_kfree_flow;
 
+   /* unmasked key is needed to match when ufid is not used. */
+   if (ovs_identifier_is_key(&new_flow->id))
+   match.key = new_flow->id.unmasked_key;
+
+   ovs_flow_mask_key(&new_flow->key, &new_flow->key, true, &mask);
+
/* Validate actions. */
error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
 &new_flow->key, &acts, log);
@@ -1007,7 +1010,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
if (ovs_identifier_is_ufid(&new_flow->id))
flow = ovs_flow_tbl_lookup_ufid(&dp->table, &new_flow->id);
if (!flow)
-   flow = ovs_flow_tbl_lookup(&dp->table, &key);
+   flow = ovs_flow_tbl_lookup(&dp->table, &new_flow->key);
if (likely(!flow)) {
rcu_assign_pointer(new_flow->sf_acts, acts);
 
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 4/8] datapath: avoid resetting flow key while installing new flow.

2017-02-15 Thread Jarno Rajahalme
From: pravin shelar 

Upstream commit:

commit 2279994d07ab67ff7a1d09bfbd65588332dfb6d8
Author: pravin shelar 
Date:   Mon Sep 19 13:51:00 2016 -0700

openvswitch: avoid resetting flow key while installing new flow.

Since commit db74a3335e0f6 ("openvswitch: use percpu
flow stats") flow alloc resets flow-key. So there is no need
to reset the flow-key again if OVS is using newly allocated
flow-key.

Signed-off-by: Pravin B Shelar 
Signed-off-by: David S. Miller 
Signed-off-by: Jarno Rajahalme 

Signed-off-by: Jarno Rajahalme 
---
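A minimal sketch of the reset_key idea, with hypothetical sketch_* types
standing in for struct sw_flow_match and struct sw_flow_key: a caller whose
key is already zeroed can skip the second memset, while a caller with an
uninitialized stack key still asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct sketch_key {
    unsigned char bytes[64];   /* Toy stand-in for the flow key contents. */
};

struct sketch_match {
    struct sketch_key *key;
};

/* Bind 'key' to 'match'; zero the key only when the caller requests it. */
static void sketch_match_init(struct sketch_match *match,
                              struct sketch_key *key, bool reset_key)
{
    assert(match && key);
    memset(match, 0, sizeof *match);
    match->key = key;

    if (reset_key)
        memset(key, 0, sizeof *key);
}
```
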
 datapath/datapath.c | 8 
 datapath/flow.c | 2 --
 datapath/flow_netlink.c | 6 --
 datapath/flow_netlink.h | 3 ++-
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index e4089ef..be433ba 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -966,7 +966,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
}
 
/* Extract key. */
-   ovs_match_init(&match, &new_flow->key, &mask);
+   ovs_match_init(&match, &new_flow->key, false, &mask);
error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
if (error)
@@ -1135,7 +1135,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
 
ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
if (a[OVS_FLOW_ATTR_KEY]) {
-   ovs_match_init(&match, &key, &mask);
+   ovs_match_init(&match, &key, true, &mask);
error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
} else if (!ufid_present) {
@@ -1252,7 +1252,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct 
genl_info *info)
 
ufid_present = ovs_nla_get_ufid(&ufid, a[OVS_FLOW_ATTR_UFID], log);
if (a[OVS_FLOW_ATTR_KEY]) {
-   ovs_match_init(&match, &key, NULL);
+   ovs_match_init(&match, &key, true, NULL);
err = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY], NULL,
log);
} else if (!ufid_present) {
@@ -1311,7 +1311,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct 
genl_info *info)
 
ufid_present = ovs_nla_get_ufid(&ufid, a[OVS_FLOW_ATTR_UFID], log);
if (a[OVS_FLOW_ATTR_KEY]) {
-   ovs_match_init(&match, &key, NULL);
+   ovs_match_init(&match, &key, true, NULL);
err = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
NULL, log);
if (unlikely(err))
diff --git a/datapath/flow.c b/datapath/flow.c
index 58b0e13..d6d0556 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -736,8 +736,6 @@ int ovs_flow_key_extract_userspace(struct net *net, const 
struct nlattr *attr,
 {
int err;
 
-   memset(key, 0, OVS_SW_FLOW_KEY_METADATA_SIZE);
-
/* Extract metadata from netlink attributes. */
err = ovs_nla_get_flow_metadata(net, attr, key, log);
if (err)
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 0f32664..61ae396 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -1884,13 +1884,15 @@ static int validate_and_copy_sample(struct net *net, 
const struct nlattr *attr,
 
 void ovs_match_init(struct sw_flow_match *match,
struct sw_flow_key *key,
+   bool reset_key,
struct sw_flow_mask *mask)
 {
memset(match, 0, sizeof(*match));
match->key = key;
match->mask = mask;
 
-   memset(key, 0, sizeof(*key));
+   if (reset_key)
+   memset(key, 0, sizeof(*key));
 
if (mask) {
memset(&mask->key, 0, sizeof(mask->key));
@@ -1937,7 +1939,7 @@ static int validate_and_copy_set_tun(const struct nlattr 
*attr,
struct nlattr *a;
int err = 0, start, opts_type;
 
-   ovs_match_init(&match, &key, NULL);
+   ovs_match_init(&match, &key, true, NULL);
opts_type = ip_tun_from_nlattr(nla_data(attr), &match, false, log);
if (opts_type < 0)
return opts_type;
diff --git a/datapath/flow_netlink.h b/datapath/flow_netlink.h
index 1c4208b..f837e7c 100644
--- a/datapath/flow_netlink.h
+++ b/datapath/flow_netlink.h
@@ -41,7 +41,8 @@ size_t ovs_tun_key_attr_size(void);
 size_t ovs_key_attr_size(void);
 
 void ovs_match_init(struct sw_flow_match *match,
-   struct sw_flow_key *key, struct sw_flow_mask *mask);
+   struct sw_flow_key *key, bool reset_key,
+   struct sw_flow_m

[ovs-dev] [PATCH 5/8] datapath: remove unused functions

2017-02-15 Thread Jarno Rajahalme
From: Jiri Benc 

Upstream commit:

commit f33eb0cf9984f79e8643eaac888e4b6a06a8e221
Author: Jiri Benc 
Date:   Wed Oct 19 11:26:36 2016 +0200

openvswitch: remove unused functions

ovs_vport_deferred_free is not used anywhere. It's the only caller of
free_vport_rcu thus this one can be removed, too.

Signed-off-by: Jiri Benc 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 
Signed-off-by: Jarno Rajahalme 

Signed-off-by: Jarno Rajahalme 
---
 datapath/vport.c | 16 
 datapath/vport.h |  1 -
 2 files changed, 17 deletions(-)

diff --git a/datapath/vport.c b/datapath/vport.c
index c29f0b0..7ac4632 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -509,22 +509,6 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
 }
 EXPORT_SYMBOL_GPL(ovs_vport_receive);
 
-static void free_vport_rcu(struct rcu_head *rcu)
-{
-   struct vport *vport = container_of(rcu, struct vport, rcu);
-
-   ovs_vport_free(vport);
-}
-
-void ovs_vport_deferred_free(struct vport *vport)
-{
-   if (!vport)
-   return;
-
-   call_rcu(&vport->rcu, free_vport_rcu);
-}
-EXPORT_SYMBOL_GPL(ovs_vport_deferred_free);
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
diff --git a/datapath/vport.h b/datapath/vport.h
index 47995be..d14908f 100644
--- a/datapath/vport.h
+++ b/datapath/vport.h
@@ -152,7 +152,6 @@ struct vport_ops {
 struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *,
  const struct vport_parms *);
 void ovs_vport_free(struct vport *);
-void ovs_vport_deferred_free(struct vport *vport);
 
 #define VPORT_ALIGN 8
 
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 6/8] datapath: remove unnecessary EXPORT_SYMBOLs

2017-02-15 Thread Jarno Rajahalme
From: Jiri Benc 

Upstream commit:

commit 76e4cc7731a1e0c07e202999b9834f9d9be66de4
Author: Jiri Benc 
Date:   Wed Oct 19 11:26:37 2016 +0200

openvswitch: remove unnecessary EXPORT_SYMBOLs

Some symbols exported to other modules are really used only by
openvswitch.ko. Remove the exports.

Tested by loading all 4 openvswitch modules, nothing breaks.

Signed-off-by: Jiri Benc 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/datapath.c | 2 --
 datapath/vport-netdev.c | 1 -
 datapath/vport.c| 1 -
 3 files changed, 4 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index be433ba..64cd781 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -62,7 +62,6 @@
 #include "vport-netdev.h"
 
 int ovs_net_id __read_mostly;
-EXPORT_SYMBOL_GPL(ovs_net_id);
 
 static struct genl_family dp_packet_genl_family;
 static struct genl_family dp_flow_genl_family;
@@ -135,7 +134,6 @@ int lockdep_ovsl_is_held(void)
else
return 1;
 }
-EXPORT_SYMBOL_GPL(lockdep_ovsl_is_held);
 #endif
 
 static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
index 970f7d3..fd97246 100644
--- a/datapath/vport-netdev.c
+++ b/datapath/vport-netdev.c
@@ -167,7 +167,6 @@ void ovs_netdev_detach_dev(struct vport *vport)
netdev_master_upper_dev_get(vport->dev));
dev_set_promiscuity(vport->dev, -1);
 }
-EXPORT_SYMBOL_GPL(ovs_netdev_detach_dev);
 
 static void netdev_destroy(struct vport *vport)
 {
diff --git a/datapath/vport.c b/datapath/vport.c
index 7ac4632..9c8c0f1 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -507,7 +507,6 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
ovs_dp_process_packet(skb, &key);
return 0;
 }
-EXPORT_SYMBOL_GPL(ovs_vport_receive);
 
 static unsigned int packet_length(const struct sk_buff *skb)
 {
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 7/8] datapath: use core MTU range checking in core net infra

2017-02-15 Thread Jarno Rajahalme
From: Jarod Wilson 

Upstream commit:

commit 61e84623ace35ce48975e8f90bbbac7557c43d61
Author: Jarod Wilson 
Date:   Fri Oct 7 22:04:33 2016 -0400

net: centralize net_device min/max MTU checking

While looking into an MTU issue with sfc, I started noticing that almost
every NIC driver with an ndo_change_mtu function implemented almost
exactly the same range checks, and in many cases, that was the only
practical thing their ndo_change_mtu function was doing. Quite a few
drivers have either 68, 64, 60 or 46 as their minimum MTU value checked,
and then various sizes from 1500 to 65535 for their maximum MTU value. We
can remove a whole lot of redundant code here if we simply store min_mtu
and max_mtu in net_device, and check against those in net/core/dev.c's
dev_set_mtu().

In theory, there should be zero functional change with this patch, it just
puts the infrastructure in place. Subsequent patches will attempt to start
using said infrastructure, with theoretically zero change in
functionality.

CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
Signed-off-by: David S. Miller 

Upstream commit:

commit 91572088e3fdbf4fe31cf397926d8b890fdb3237
Author: Jarod Wilson 
Date:   Thu Oct 20 13:55:20 2016 -0400

net: use core MTU range checking in core net infra

...

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

...

Signed-off-by: Jarod Wilson 
Signed-off-by: David S. Miller 
Signed-off-by: Jarno Rajahalme 

Upstream commit:

commit 425df17ce3a26d98f76e2b6b0af2acf4aeb0b026
    Author: Jarno Rajahalme 
Date:   Tue Feb 14 21:16:28 2017 -0800

openvswitch: Set internal device max mtu to ETH_MAX_MTU.

Commit 91572088e3fd ("net: use core MTU range checking in core net
infra") changed the openvswitch internal device to use the core net
infra for controlling the MTU range, but failed to actually set the
max_mtu as described in the commit message, which now defaults to
ETH_DATA_LEN.

This patch fixes this by setting max_mtu to ETH_MAX_MTU after
ether_setup() call.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Jarno Rajahalme 
Signed-off-by: David S. Miller 


This backport detects the new max_mtu field in struct net_device
and uses the upstream code if it exists, and local backport code if
not.  The latter case is amended with bounds checks with new upstream
macros ETH_MIN_MTU and ETH_MAX_MTU and the corresponding error
messages from the upstream commit.

Signed-off-by: Jarno Rajahalme 
---
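The local backport path bounds-checks the requested MTU against ETH_MIN_MTU
and ETH_MAX_MTU. A minimal userspace sketch of such a range check follows.
The sketch_* names are hypothetical, the limits 68 and 65535 are the values
given in the compat header of this patch, and returning -EINVAL for an
out-of-range value is an assumption about the error convention.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_ETH_MIN_MTU 68      /* Min IPv4 MTU per RFC791. */
#define SKETCH_ETH_MAX_MTU 65535   /* Largest MTU the check accepts. */

/* Return 0 if 'new_mtu' is within [min, max], -EINVAL otherwise. */
static int sketch_check_mtu(int new_mtu)
{
    assert(SKETCH_ETH_MIN_MTU <= SKETCH_ETH_MAX_MTU);

    if (new_mtu < SKETCH_ETH_MIN_MTU)
        return -EINVAL;
    if (new_mtu > SKETCH_ETH_MAX_MTU)
        return -EINVAL;
    return 0;
}
```
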
 acinclude.m4   |  2 ++
 datapath/linux/compat/include/linux/if_ether.h |  8 
 datapath/vport-internal_dev.c  | 22 +++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index e8b64b5..052a18f 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -510,6 +510,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_PARAM_IFELSE([$KSRC/include/linux/netdevice.h],
 [netdev_master_upper_dev_link], [upper_priv],
 [OVS_DEFINE([HAVE_NETDEV_MASTER_UPPER_DEV_LINK_PRIV])])
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netdevice.h], [net_device],
+[max_mtu])
 
   OVS_GREP_IFELSE([$KSRC/include/linux/netfilter.h], [nf_hook_state])
   OVS_GREP_IFELSE([$KSRC/include/linux/netfilter.h], [nf_register_net_hook])
diff --git a/datapath/linux/compat/include/linux/if_ether.h 
b/datapath/linux/compat/include/linux/if_ether.h
index ac0f1ed..5eb99bc 100644
--- a/datapath/linux/compat/include/linux/if_ether.h
+++ b/datapath/linux/compat/include/linux/if_ether.h
@@ -3,6 +3,14 @@
 
 #include_next 
 
+#ifndef ETH_MIN_MTU
+#define ETH_MIN_MTU 68 /* Min IPv4 MTU per RFC791 */
+#endif
+
+#ifndef ETH_MAX_MTU
+#define ETH_MAX_MTU 0xFFFFU /* 65535, same as IP_MAX_MTU */
+#endif
+
 #ifndef ETH_P_802_3_MIN
 #define ETH_P_802_3_MIN 0x0600
 #endif
diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c
index cc01c9c..b5db664 100644
--- a/datapath/vport-internal_dev.c
+++ b/datapath/vport-internal_dev.c
@@ -89,14 +89,25 @@ static const struct ethtool_ops internal_dev_ethtool_ops = {
.get_link   = ethtool_op_get_link,
 };
 
-static int internal_dev_change_mtu(struct net_device *netdev, int new_mtu)
+#ifndef HAVE_NET_DEVICE_WITH_MAX_MTU
+static int internal_dev_change_mtu(struct net_device *dev, int new_mtu)
 {
-   if (new_mtu < 68)
+   if (new_mtu < ETH_MIN_MTU) {
+   net_err_ratelimited("%s: Invalid MTU %d requested, hw min %d\n",
+   dev->name,

[ovs-dev] [PATCH 8/8] datapath: handle NF_REPEAT from nf_conntrack_in()

2017-02-15 Thread Jarno Rajahalme
From: Pablo Neira Ayuso 

Upstream commit:

commit 08733a0cb7decce40bbbd0331a0449465f13c444
Author: Pablo Neira Ayuso 
Date:   Thu Nov 3 10:56:43 2016 +0100

netfilter: handle NF_REPEAT from nf_conntrack_in()

NF_REPEAT is only needed from nf_conntrack_in() under a very specific
case required by the TCP protocol tracker, we can handle this case
without returning to the core hook path. Handling of NF_REPEAT from the
nf_reinject() is left untouched.

Signed-off-by: Pablo Neira Ayuso 

This upstream change is impossible to detect at module compile time,
so we keep the NF_REPEAT check after the nf_conntrack_in() call.

Signed-off-by: Jarno Rajahalme 
---
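A minimal sketch of the retained repeat loop, using hypothetical sketch_*
names and made-up verdict values rather than the netfilter ones: the caller
re-invokes the hook for as long as it asks to be repeated, and a hook that
never asks simply runs once.

```c
#include <assert.h>

enum sketch_verdict {
    SKETCH_ACCEPT,   /* Illustrative values, not the NF_* constants. */
    SKETCH_REPEAT
};

typedef enum sketch_verdict (*sketch_hook_fn)(void *ctx);

/* Call 'hook' until it returns something other than SKETCH_REPEAT. */
static enum sketch_verdict sketch_run_hook(sketch_hook_fn hook, void *ctx)
{
    enum sketch_verdict verdict;

    assert(hook);
    do {
        verdict = hook(ctx);
    } while (verdict == SKETCH_REPEAT);

    return verdict;
}

/* Test hook: asks for a repeat '*ctx' more times, then accepts. */
static enum sketch_verdict sketch_repeat_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_REPEAT;
    }
    return SKETCH_ACCEPT;
}
```

Keeping the loop costs one extra comparison when the hook never returns the
repeat verdict, which is why it is safe to leave in place on newer kernels.
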
 datapath/conntrack.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 3c51ce6..72d25ec 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -769,7 +769,10 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
skb->nfctinfo = IP_CT_NEW;
}
 
-   /* Repeat if requested, see nf_iterate(). */
+   /* Repeat if requested, see nf_iterate().
+* Newer conntrack code no longer returns NF_REPEAT, but
+* it is impossible to detect that at module compile time.
+*/
do {
err = nf_conntrack_in(net, info->family,
  NF_INET_PRE_ROUTING, skb);
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] mirror: Allow concurrent lookups.

2017-02-21 Thread Jarno Rajahalme
Handler threads use a selection of mirror functions with the
assumption that the data referred to is RCU protected, while the
implementation has not provided for this, which can lead to an OVS
crash.

This patch fixes this by making the mbundle lookup RCU-safe by using
cmap instead of hmap and postponing mbundle memory free, as well as
postponing the frees of the mirrors and the vlan bitmaps of each
mirror.

Note that mirror stats update is still not accurate if multiple
threads do it simultaneously.

A less complete version of this patch (using cmap and RCU postpone
just for the mbridge itself) was tested by Yunjian Wang and was found
to fix the observed crash when running a script that adds and deletes
a port repeatedly.

Reported-by: Yunjian Wang 
Signed-off-by: Jarno Rajahalme 
---
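A minimal, single-threaded sketch of what "postponing a free" means,
loosely modeled on the idea behind ovsrcu_postpone() but not using the OVS
implementation. The sketch_* names are hypothetical, and the explicit
sketch_quiesce() call stands in for the point at which no reader can still
hold a pointer to the object.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_deferred {
    void (*fn)(void *);            /* Callback to run later. */
    void *arg;                     /* Argument for the callback. */
    struct sketch_deferred *next;
};

static struct sketch_deferred *sketch_pending;   /* Pending callbacks. */

/* Queue 'fn(arg)' instead of running it now. */
static void sketch_postpone(void (*fn)(void *), void *arg)
{
    struct sketch_deferred *d = malloc(sizeof *d);

    assert(fn);
    if (!d)
        abort();
    d->fn = fn;
    d->arg = arg;
    d->next = sketch_pending;
    sketch_pending = d;
}

/* Run all queued callbacks (most recent first); returns how many ran. */
static int sketch_quiesce(void)
{
    int n = 0;

    while (sketch_pending) {
        struct sketch_deferred *d = sketch_pending;

        sketch_pending = d->next;
        d->fn(d->arg);
        free(d);
        n++;
    }
    return n;
}

/* Stand-in for free() that records the call so the ordering is visible. */
static void sketch_mark_freed(void *arg)
{
    *(int *) arg = 1;
}
```

The object stays valid between sketch_postpone() and sketch_quiesce(), which
is the window a concurrent lookup needs.
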
 ofproto/ofproto-dpif-mirror.c | 74 +--
 ofproto/ofproto-dpif-mirror.h | 27 ++--
 2 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 675adf3..7d308aa 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -18,7 +18,7 @@
 
 #include 
 
-#include "openvswitch/hmap.h"
+#include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
 #include "vlan-bitmap.h"
@@ -31,7 +31,7 @@ BUILD_ASSERT_DECL(sizeof(mirror_mask_t) * CHAR_BIT >= 
MAX_MIRRORS);
 
 struct mbridge {
 struct mirror *mirrors[MAX_MIRRORS];
-struct hmap mbundles;
+struct cmap mbundles;
 
 bool need_revalidate;
 bool has_mirrors;
@@ -40,7 +40,7 @@ struct mbridge {
 };
 
 struct mbundle {
-struct hmap_node hmap_node; /* In parent 'mbridge' map. */
+struct cmap_node cmap_node; /* In parent 'mbridge' map. */
 struct ofbundle *ofbundle;
 
 mirror_mask_t src_mirrors;  /* Mirrors triggered when packet received. */
@@ -56,7 +56,12 @@ struct mirror {
 /* Selection criteria. */
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
-unsigned long *vlans;   /* Bitmap of chosen VLANs, NULL selects all. */
+
+/* This is accessed by handler threads assuming RCU protection (see
+ * mirror_get()), but can be manipulated by mirror_set() without any
+ * explicit synchronization. */
+OVSRCU_TYPE(unsigned long *) vlans;   /* Bitmap of chosen VLANs, NULL
+   * selects all. */
 
 /* Output (exactly one of out == NULL and out_vlan == -1 is true). */
 struct mbundle *out;/* Output port or NULL. */
@@ -86,7 +91,7 @@ mbridge_create(void)
 mbridge = xzalloc(sizeof *mbridge);
 ovs_refcount_init(&mbridge->ref_cnt);
 
-hmap_init(&mbridge->mbundles);
+cmap_init(&mbridge->mbundles);
 return mbridge;
 }
 
@@ -103,7 +108,7 @@ mbridge_ref(const struct mbridge *mbridge_)
 void
 mbridge_unref(struct mbridge *mbridge)
 {
-struct mbundle *mbundle, *next;
+struct mbundle *mbundle;
 size_t i;
 
 if (!mbridge) {
@@ -117,12 +122,12 @@ mbridge_unref(struct mbridge *mbridge)
 }
 }
 
-HMAP_FOR_EACH_SAFE (mbundle, next, hmap_node, &mbridge->mbundles) {
+CMAP_FOR_EACH (mbundle, cmap_node, &mbridge->mbundles) {
 mbridge_unregister_bundle(mbridge, mbundle->ofbundle);
 }
 
-hmap_destroy(&mbridge->mbundles);
-free(mbridge);
+cmap_destroy(&mbridge->mbundles);
+ovsrcu_postpone(free, mbridge);
 }
 }
 
@@ -149,7 +154,7 @@ mbridge_register_bundle(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 
 mbundle = xzalloc(sizeof *mbundle);
 mbundle->ofbundle = ofbundle;
-hmap_insert(&mbridge->mbundles, &mbundle->hmap_node,
+cmap_insert(&mbridge->mbundles, &mbundle->cmap_node,
 hash_pointer(ofbundle, 0));
 }
 
@@ -175,8 +180,9 @@ mbridge_unregister_bundle(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 }
 
-hmap_remove(&mbridge->mbundles, &mbundle->hmap_node);
-free(mbundle);
+cmap_remove(&mbridge->mbundles, &mbundle->cmap_node,
+hash_pointer(ofbundle, 0));
+ovsrcu_postpone(free, mbundle);
 }
 
 mirror_mask_t
@@ -233,6 +239,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 mirror->snaplen = 0;
 }
 
+unsigned long *vlans = ovsrcu_get(unsigned long *, &mirror->vlans);
+
 /* Get the new configuration. */
 if (out_bundle) {
 out = mbundle_lookup(mbridge, out_bundle);
@@ -250,7 +258,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(&srcs_map, &mirror->srcs)
 && hmapx_eq

Re: [ovs-dev] [BUG] upcall handler thread crash

2017-02-21 Thread Jarno Rajahalme
Again, thanks for reporting this bug!

I just posted a more complete patch to fix this issue to the ovs-dev list. 
Please have a look.

  Jarno

> On Feb 7, 2017, at 4:36 AM, wangyunjian  wrote:
> 
> I have tested the patch without issue. Will you submit it as an official patch?
> 
> Thanks,
> 
> Yunjian.
> 
>>> On Feb 5, 2017, at 10:49 PM, wangyunjian  wrote:
>>> 
>>> My ovs version is 
>>> openvswitch-2.5.0(http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz).
>>>  
>>> I modified the code as follows and got another crash. Does it need a 
>>> lock to protect operations
>>> on the mbridge->mbundles hmap (xbridge->xport hmap) between the ovs-vswitchd thread 
>>> and the upcall handler (revalidator) thread?
>>> 
>>> 413 static struct mbundle *
>>> 414 mbundle_lookup(const struct mbridge *mbridge, struct ofbundle *ofbundle)
>>> 415 {
>>> 416 struct mbundle *mbundle;
>>> 417 
>>> 418 HMAP_FOR_EACH_IN_BUCKET (mbundle, hmap_node, hash_pointer(ofbundle, 
>>> 0),
>>> 419  &mbridge->mbundles) {
>>> 420 xsleep(2);  
>>>//only add xsleep(2)
>> 
>> xsleep() causes the thread to quiesce, basically telling the main thread it 
>> is OK to delete the bridge, so the new crash you see is not a bug, but 
>> caused by your change.
>> 
>> Instead, try it out with these changes:
>> 
>> diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
>> index 6f8079a..15a398f 100644
>> --- a/ofproto/ofproto-dpif-mirror.c
>> +++ b/ofproto/ofproto-dpif-mirror.c
>> @@ -18,7 +18,7 @@
>> 
>> #include 
>> 
>> -#include "hmap.h"
>> +#include "cmap.h"
>> #include "hmapx.h"
>> #include "ofproto.h"
>> #include "vlan-bitmap.h"
>> @@ -31,7 +31,7 @@ BUILD_ASSERT_DECL(sizeof(mirror_mask_t) * CHAR_BIT >= 
>> MAX_MIRRORS);
>> 
>> struct mbridge {
>> struct mirror *mirrors[MAX_MIRRORS];
>> -struct hmap mbundles;
>> +struct cmap mbundles;
>> 
>> bool need_revalidate;
>> bool has_mirrors;
>> @@ -40,7 +40,7 @@ struct mbridge {
>> };
>> 
>> struct mbundle {
>> -struct hmap_node hmap_node; /* In parent 'mbridge' map. */
>> +struct cmap_node cmap_node; /* In parent 'mbridge' map. */
>> struct ofbundle *ofbundle;
>> 
>> mirror_mask_t src_mirrors;  /* Mirrors triggered when packet received. */
>> @@ -84,7 +84,7 @@ mbridge_create(void)
>> mbridge = xzalloc(sizeof *mbridge);
>> ovs_refcount_init(&mbridge->ref_cnt);
>> 
>> -hmap_init(&mbridge->mbundles);
>> +cmap_init(&mbridge->mbundles);
>> return mbridge;
>> }
>> 
>> @@ -101,7 +101,7 @@ mbridge_ref(const struct mbridge *mbridge_)
>> void
>> mbridge_unref(struct mbridge *mbridge)
>> {
>> -struct mbundle *mbundle, *next;
>> +struct mbundle *mbundle;
>> size_t i;
>> 
>> if (!mbridge) {
>> @@ -115,11 +115,11 @@ mbridge_unref(struct mbridge *mbridge)
>> }
>> }
>> 
>> -HMAP_FOR_EACH_SAFE (mbundle, next, hmap_node, &mbridge->mbundles) {
>> +CMAP_FOR_EACH (mbundle, cmap_node, &mbridge->mbundles) {
>> mbridge_unregister_bundle(mbridge, mbundle->ofbundle);
>> }
>> 
>> -hmap_destroy(&mbridge->mbundles);
>> +cmap_destroy(&mbridge->mbundles);
>> free(mbridge);
>> }
>> }
>> @@ -147,7 +147,7 @@ mbridge_register_bundle(struct mbridge *mbridge, struct 
>> ofbundle *ofbundle)
>> 
>> mbundle = xzalloc(sizeof *mbundle);
>> mbundle->ofbundle = ofbundle;
>> -hmap_insert(&mbridge->mbundles, &mbundle->hmap_node,
>> +cmap_insert(&mbridge->mbundles, &mbundle->cmap_node,
>> hash_pointer(ofbundle, 0));
>> }
>> 
>> @@ -173,8 +173,9 @@ mbridge_unregister_bundle(struct mbridge *mbridge, 
>> struct ofbundle *ofbundle)
>> }
>> }
>> 
>> -hmap_remove(&mbridge->mbundles, &mbundle->hmap_node);
>> -free(mbundle);
>> +cmap_remove(&mbridge->mbundles, &mbundle->cmap_node,
>> +hash_pointer(ofbundle, 0));
>> +ovsrcu_postpone(free, mbundle);
>> }
>> 
>> mirror_mask_t
>> @@ -269,7 +270,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const 
>> char *name,
>> 
>> /* Update mbundles. */
>> mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
>> -HMAP_FOR_EACH (mbundle, hmap_node, &mirror->mbridge->mbundles) {
>> +CMAP_FOR_EACH (mbundle, cmap_node, &mirror->mbridge->mbundles) {
>> if (hmapx_contains(&mirror->srcs, mbundle)) {
>> mbundle->src_mirrors |= mirror_bit;
>> } else {
>> @@ -308,7 +309,7 @@ mirror_destroy(struct mbridge *mbridge, void *aux)
>> }
>> 
>> mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
>> -HMAP_FOR_EACH (mbundle, hmap_node, &mbridge->mbundles) {
>> +CMAP_FOR_EACH (mbundle, cmap_node, &mbridge->mbundles) {
>> mbundle->src_mirrors &= ~mirror_bit;
>> mbundle->dst_mirrors &= ~mirror_bit;
>> mbundle->mirror_out &= ~mirror_bit;
>> @@ -414,9 +415,9 @@ static struct mbundle *
>> m

[ovs-dev] [PATCH v4 0/4] Userspace meter implementation.

2017-02-22 Thread Jarno Rajahalme
This series is a minor cleanup of the series Andy posted a month
ago. Changes from v3:

- Do not remove required mutex from ofproto_check_ofpacts(), as doing
  so would take more analysis.
- Note that the "execution with help" is broken.  Userspace datapath
  execution should never need help, so this is OK for now.
- Reduce the number of mutexes for userspace datapath meters from 64k
  to 64.
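
One common way to go from one mutex per meter to a small fixed pool is lock
striping, where the meter ID selects a lock by modulo. The sketch below
illustrates that technique only; it is an assumption about the approach, and
the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_N_METER_LOCKS 64   /* Size of the lock pool. */

/* Map a meter ID to the index of the lock that guards it. */
static unsigned int sketch_meter_lock_index(uint32_t meter_id)
{
    unsigned int idx = meter_id % SKETCH_N_METER_LOCKS;

    assert(idx < SKETCH_N_METER_LOCKS);
    return idx;
}
```

Meters whose IDs differ by a multiple of 64 share a lock, trading a little
contention for far less memory.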

Jarno Rajahalme (4):
  dpif: Meter framework.
  ofproto: Fix thread safety annotation.
  ofproto: Meter translation.
  dpif-netdev: Simple DROP meter implementation.

 datapath/linux/compat/include/linux/openvswitch.h |   4 +-
 include/openvswitch/ofp-actions.h |   1 +
 lib/dpif-netdev.c | 371 ++
 lib/dpif-netlink.c|  46 ++-
 lib/dpif-provider.h   |  29 ++
 lib/dpif.c| 133 +++-
 lib/dpif.h|  13 +-
 lib/odp-execute.c |   3 +
 lib/odp-util.c|  14 +
 lib/ofp-actions.c |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-trace.c  |   2 +
 ofproto/ofproto-dpif-xlate.c  |  11 +-
 ofproto/ofproto-dpif.c|  60 +++-
 ofproto/ofproto-provider.h|  16 +-
 ofproto/ofproto.c |  50 +--
 tests/dpif-netdev.at  | 106 +++
 17 files changed, 818 insertions(+), 43 deletions(-)

-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 2/4] ofproto: Fix thread safety annotation.

2017-02-22 Thread Jarno Rajahalme
ofproto_check_ofpacts() requires ofproto_mutex, but the header did not
say so, and as a result the trace code did not take the mutex.

Signed-off-by: Jarno Rajahalme 
---
 ofproto/ofproto-dpif-trace.c | 2 ++
 ofproto/ofproto-provider.h   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index b01a131..3c9e3d4 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -387,8 +387,10 @@ ofproto_unixctl_trace_actions(struct unixctl_conn *conn, 
int argc,
ofproto->up.n_tables, &usable_protocols);
 }
 if (!retval) {
+ovs_mutex_lock(&ofproto_mutex);
 retval = ofproto_check_ofpacts(&ofproto->up, ofpacts.data,
ofpacts.size);
+ovs_mutex_unlock(&ofproto_mutex);
 }
 
 if (retval) {
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index e21cb26..1361436 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -1949,7 +1949,8 @@ void ofproto_flush_flows(struct ofproto *);
 
 enum ofperr ofproto_check_ofpacts(struct ofproto *,
   const struct ofpact ofpacts[],
-  size_t ofpacts_len);
+  size_t ofpacts_len)
+OVS_REQUIRES(ofproto_mutex);
 
 static inline const struct rule_actions *
 rule_get_actions(const struct rule *rule)
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 1/4] dpif: Meter framework.

2017-02-22 Thread Jarno Rajahalme
Add DPIF-level infrastructure for meters.  Allow meter_set to modify
the meter configuration (e.g. set the burst size if unspecified).

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
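A minimal sketch of why meter_set takes a modifiable configuration: the
provider can fill in values the caller left unspecified and hand them back.
The sketch_* names are hypothetical, and both the "0 means unspecified"
convention and the choice of the rate as the default burst size are
assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct sketch_meter_config {
    uint32_t rate;         /* Configured rate; must be nonzero here. */
    uint32_t burst_size;   /* 0 means "unspecified" in this sketch. */
};

/* Validate 'config' and fill in a default burst size if none was given.
 * Returns 0 on success or a positive errno value on failure. */
static int sketch_meter_set(struct sketch_meter_config *config)
{
    assert(config);
    if (config->rate == 0)
        return EINVAL;

    if (config->burst_size == 0)
        config->burst_size = config->rate;   /* Assumed default. */

    return 0;
}
```
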
 datapath/linux/compat/include/linux/openvswitch.h |  4 +-
 lib/dpif-netdev.c | 45 
 lib/dpif-netlink.c| 46 +++-
 lib/dpif-provider.h   | 29 
 lib/dpif.c| 88 +++
 lib/dpif.h| 13 +++-
 lib/odp-execute.c |  3 +
 lib/odp-util.c| 14 
 ofproto/ofproto-dpif-sflow.c  |  1 +
 ofproto/ofproto-dpif.c| 60 ++--
 ofproto/ofproto-provider.h| 13 ++--
 ofproto/ofproto.c |  2 +-
 12 files changed, 304 insertions(+), 14 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 425d3a4..b121391 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -787,13 +787,14 @@ enum ovs_nat_attr {
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
  * type may not be changed.
  *
- *
  * @OVS_ACTION_ATTR_SET_TO_MASKED: Kernel internal masked set action translated
  * from the @OVS_ACTION_ATTR_SET.
  * @OVS_ACTION_ATTR_TUNNEL_PUSH: Push tunnel header described by struct
  * ovs_action_push_tnl.
  * @OVS_ACTION_ATTR_TUNNEL_POP: Lookup tunnel port by port-no passed and pop
  * tunnel header.
+ * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the
+ * packet, or modify the packet (e.g., change the DSCP field).
  */
 
 enum ovs_action_attr {
@@ -819,6 +820,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_CLONE, /* Nested OVS_CLONE_ATTR_*.  */
+   OVS_ACTION_ATTR_METER, /* u32 meter number. */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 30907b7..87beb01 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3651,6 +3651,46 @@ dp_netdev_disable_upcall(struct dp_netdev *dp)
 fat_rwlock_wrlock(&dp->upcall_rwlock);
 }
 
+
+/* Meters */
+static void
+dpif_netdev_meter_get_features(const struct dpif * dpif OVS_UNUSED,
+   struct ofputil_meter_features *features)
+{
+features->max_meters = 0;
+features->band_types = 0;
+features->capabilities = 0;
+features->max_bands = 0;
+features->max_color = 0;
+}
+
+static int
+dpif_netdev_meter_set(struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id *meter_id OVS_UNUSED,
+  struct ofputil_meter_config *config OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+static int
+dpif_netdev_meter_get(const struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id meter_id OVS_UNUSED,
+  struct ofputil_meter_stats *stats OVS_UNUSED,
+  uint16_t n_bands OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+static int
+dpif_netdev_meter_del(struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id meter_id OVS_UNUSED,
+  struct ofputil_meter_stats *stats OVS_UNUSED,
+  uint16_t n_bands OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+
 static void
 dpif_netdev_disable_upcall(struct dpif *dpif)
 OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -4773,6 +4813,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 break;
 }
 
+case OVS_ACTION_ATTR_METER:
 case OVS_ACTION_ATTR_PUSH_VLAN:
 case OVS_ACTION_ATTR_POP_VLAN:
 case OVS_ACTION_ATTR_PUSH_MPLS:
@@ -4910,6 +4951,10 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_ct_dump_next,
 dpif_netdev_ct_dump_done,
 dpif_netdev_ct_flush,
+dpif_netdev_meter_get_features,
+dpif_netdev_meter_set,
+dpif_netdev_meter_get,
+dpif_netdev_meter_del,
 };
 
 static void
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 9762a87..b491ea6 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2356,6 +2356,46 @@ dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, 
const uint16_t *zone)
 }
 }
 
+
+/* Meters */
+static void
+dpif_netlink_meter_get_features(const struct dpif * dpif OVS_UNUSED,
+struct ofputil_meter_features *features)
+{
+features->max_meters = 0;
+features->band_types = 0;
+features->

[ovs-dev] [PATCH v4 3/4] ofproto: Meter translation.

2017-02-22 Thread Jarno Rajahalme
Translate OpenFlow METER instructions to datapath meter actions.

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
 include/openvswitch/ofp-actions.h |  1 +
 lib/dpif.c| 47 +++---
 lib/ofp-actions.c |  1 +
 ofproto/ofproto-dpif-xlate.c  | 11 -
 ofproto/ofproto.c | 48 +++
 5 files changed, 79 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 88f573d..b487e6de 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -534,6 +534,7 @@ struct ofpact_metadata {
 struct ofpact_meter {
 struct ofpact ofpact;
 uint32_t meter_id;
+uint32_t provider_meter_id;
 };
 
 /* OFPACT_WRITE_ACTIONS, OFPACT_CLONE.
diff --git a/lib/dpif.c b/lib/dpif.c
index 3f36ccd..aa6f37a 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1104,6 +1104,7 @@ struct dpif_execute_helper_aux {
 struct dpif *dpif;
 const struct flow *flow;
 int error;
+const struct nlattr *meter_action; /* Non-NULL, if have a meter action. */
 };
 
 /* This is called for actions that need the context of the datapath to be
@@ -1119,6 +1120,13 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 ovs_assert(packets_->count == 1);
 
 switch ((enum ovs_action_attr)type) {
+case OVS_ACTION_ATTR_METER:
+/* Maintain a pointer to the first meter action seen. */
+if (!aux->meter_action) {
+aux->meter_action = action;
+}
+   break;
+
 case OVS_ACTION_ATTR_CT:
 case OVS_ACTION_ATTR_OUTPUT:
 case OVS_ACTION_ATTR_TUNNEL_PUSH:
@@ -1129,15 +1137,36 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 struct ofpbuf execute_actions;
 uint64_t stub[256 / 8];
 struct pkt_metadata *md = &packet->md;
-bool dst_set;
 
-dst_set = flow_tnl_dst_is_set(&md->tunnel);
-if (dst_set) {
+if (flow_tnl_dst_is_set(&md->tunnel) || aux->meter_action) {
+ofpbuf_use_stub(&execute_actions, stub, sizeof stub);
+
+if (aux->meter_action) {
+const struct nlattr *a = aux->meter_action;
+
+/* XXX: This code collects meter actions since the last action
+ * execution via the datapath to be executed right before the
+ * current action that needs to be executed by the datapath.
+ * This is only an approximation, but better than nothing.
+ * Fundamentally, we should have a mechanism by which the
+ * datapath could return the result of the meter action so that
+ * we could execute them at the right order. */
+do {
+ofpbuf_put(&execute_actions, a, NLA_ALIGN(a->nla_len));
+/* Find next meter action before 'action', if any. */
+do {
+a = nl_attr_next(a);
+} while (a != action &&
+ nl_attr_type(a) != OVS_ACTION_ATTR_METER);
+} while (a != action);
+}
+
 /* The Linux kernel datapath throws away the tunnel information
  * that we supply as metadata.  We have to use a "set" action to
  * supply it. */
-ofpbuf_use_stub(&execute_actions, stub, sizeof stub);
-odp_put_tunnel_action(&md->tunnel, &execute_actions);
+if (md->tunnel.ip_dst) {
+odp_put_tunnel_action(&md->tunnel, &execute_actions);
+}
 ofpbuf_put(&execute_actions, action, NLA_ALIGN(action->nla_len));
 
 execute.actions = execute_actions.data;
@@ -1170,14 +1199,16 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 
 dp_packet_delete(clone);
 
-if (dst_set) {
+if (flow_tnl_dst_is_set(&md->tunnel) || aux->meter_action) {
 ofpbuf_uninit(&execute_actions);
+
+/* Do not re-use the same meters for later output actions. */
+aux->meter_action = NULL;
 }
 break;
 }
 
 case OVS_ACTION_ATTR_HASH:
-case OVS_ACTION_ATTR_METER:
 case OVS_ACTION_ATTR_PUSH_VLAN:
 case OVS_ACTION_ATTR_POP_VLAN:
 case OVS_ACTION_ATTR_PUSH_MPLS:
@@ -1201,7 +1232,7 @@ dpif_execute_helper_cb(void *aux_, struct dp_packet_batch 
*packets_,
 static int
 dpif_execute_with_help(struct dpif *dpif, struct dpif_execute *execute)
 {
-struct dpif_execute_helper_aux aux = {dpif, execute->flow, 0};
+struct dpif_execute_helper_aux aux = {dpif, execute->flow, 0, NULL};
 struct dp_packet_batch pb;
 
 COVERAGE_INC(dpif_execute_with_help);
diff --git

[ovs-dev] [PATCH v4 4/4] dpif-netdev: Simple DROP meter implementation.

2017-02-22 Thread Jarno Rajahalme
Meters may be used by any flow, so some kind of locking must be used.
In this version we have an adaptive mutex for each meter, which may
not be optimal for DPDK.  However, this should serve as a basis for
further improvement.

A batch of packets is first tried as a whole, and only if some of the
meter bands are hit, we need to process the packets individually.

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
 lib/dpif-netdev.c| 362 ---
 tests/dpif-netdev.at | 106 +++
 2 files changed, 450 insertions(+), 18 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 87beb01..4257f45 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -86,6 +86,8 @@ DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
 
 /* Configuration parameters. */
 enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
+enum { MAX_METERS = 65536 };/* Maximum number of meters. */
+enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -198,6 +200,31 @@ static bool dpcls_lookup(struct dpcls *cls,
  struct dpcls_rule **rules, size_t cnt,
  int *num_lookups_p);
 
+/* Set of supported meter flags */
+#define DP_SUPPORTED_METER_FLAGS_MASK \
+(OFPMF13_STATS | OFPMF13_PKTPS | OFPMF13_KBPS | OFPMF13_BURST)
+
+/* Set of supported meter band types */
+#define DP_SUPPORTED_METER_BAND_TYPES   \
+( 1 << OFPMBT13_DROP )
+
+struct dp_meter_band {
+struct ofputil_meter_band up; /* type, prec_level, pad, rate, burst_size */
+uint32_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for KBPS) */
+uint64_t packet_count;
+uint64_t byte_count;
+};
+
+struct dp_meter {
+uint16_t flags;
+uint16_t n_bands;
+uint32_t max_delta_t;
+uint64_t used;
+uint64_t packet_count;
+uint64_t byte_count;
+struct dp_meter_band bands[];
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -228,6 +255,11 @@ struct dp_netdev {
 struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
+/* Meters. */
+struct ovs_mutex meter_locks[MAX_METERS];
+struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
+uint32_t meter_free; /* Next free meter. */
+
 /* Protects access to ofproto-dpif-upcall interface during revalidator
  * thread synchronization. */
 struct fat_rwlock upcall_rwlock;
@@ -1067,6 +1099,10 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 dp->reconfigure_seq = seq_create();
 dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
 
+for (int i = 0; i < MAX_METERS; ++i) {
+ovs_mutex_init_adaptive(&dp->meter_locks[i]);
+}
+
 /* Disable upcalls by default. */
 dp_netdev_disable_upcall(dp);
 dp->upcall_aux = NULL;
@@ -1146,6 +1182,16 @@ dp_netdev_destroy_upcall_lock(struct dp_netdev *dp)
 fat_rwlock_destroy(&dp->upcall_rwlock);
 }
 
+static void
+dp_delete_meter(struct dp_netdev *dp, uint32_t meter_id)
+OVS_REQUIRES(dp->meter_locks[meter_id])
+{
+if (dp->meters[meter_id]) {
+free(dp->meters[meter_id]);
+dp->meters[meter_id] = NULL;
+}
+}
+
 /* Requires dp_netdev_mutex so that we can't get a new reference to 'dp'
  * through the 'dp_netdevs' shash while freeing 'dp'. */
 static void
@@ -1161,6 +1207,7 @@ dp_netdev_free(struct dp_netdev *dp)
 do_del_port(dp, port);
 }
 ovs_mutex_unlock(&dp->port_mutex);
+
 dp_netdev_destroy_all_pmds(dp, true);
 cmap_destroy(&dp->poll_threads);
 
@@ -1179,6 +1226,13 @@ dp_netdev_free(struct dp_netdev *dp)
 /* Upcalls must be disabled at this point */
 dp_netdev_destroy_upcall_lock(dp);
 
+for (int i = 0; i < MAX_METERS; ++i) {
+ovs_mutex_lock(&dp->meter_locks[i]);
+dp_delete_meter(dp, i);
+ovs_mutex_unlock(&dp->meter_locks[i]);
+ovs_mutex_destroy(&dp->meter_locks[i]);
+}
+
 free(dp->pmd_cmask);
 free(CONST_CAST(char *, dp->name));
 free(dp);
@@ -3657,37 +3711,304 @@ static void
 dpif_netdev_meter_get_features(const struct dpif * dpif OVS_UNUSED,
struct ofputil_meter_features *features)
 {
-features->max_meters = 0;
-features->band_types = 0;
-features->capabilities = 0;
-features->max_bands = 0;
+features->max_meters = MAX_METERS;
+features->band_types = DP_SUPPORTED_METER_BAND_TYPES;
+features->capabilities = DP_SUPPORTED_METER_FLAGS_MASK;
+features->max_bands = MAX_BANDS;
 features->max_color = 0;
 }
 
+/* Returns false when packet needs to be dropped. */

[ovs-dev] [PATCH v5 0/4] Userspace meter implementation.

2017-02-23 Thread Jarno Rajahalme
This series is a minor cleanup of the series Andy posted a month
ago.

Changes from v4 to v5:

- Actually reduce the number of mutexes for userspace datapath meters from 64k
  to 64.  I forgot to add the change to git before sending v4.
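
Going from 64k mutexes to 64 is a form of lock striping: many meters share a
small, fixed pool of locks, and the meter ID selects the lock.  A minimal
sketch of the idea with plain pthreads follows.  The names striped_init(),
stripe_index(), striped_lock() and striped_unlock() are hypothetical and do
not come from the series, which uses OVS's own mutex wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { N_LOCKS = 64 };              /* Size of the shared lock pool. */

static pthread_mutex_t locks[N_LOCKS];

/* Initializes every lock in the pool.  Call once before any other use. */
static void
striped_init(void)
{
    for (int i = 0; i < N_LOCKS; i++) {
        int error = pthread_mutex_init(&locks[i], NULL);

        assert(!error);
        (void) error;               /* Silences a warning under NDEBUG. */
    }
}

/* Maps an ID to its lock.  IDs that differ by a multiple of N_LOCKS share
 * the same lock, so unrelated IDs may occasionally contend. */
static uint32_t
stripe_index(uint32_t id)
{
    return id % N_LOCKS;
}

/* Locks the stripe that protects 'id'.  Returns 0 on success. */
static int
striped_lock(uint32_t id)
{
    return pthread_mutex_lock(&locks[stripe_index(id)]);
}

/* Unlocks the stripe that protects 'id'.  Returns 0 on success. */
static int
striped_unlock(uint32_t id)
{
    return pthread_mutex_unlock(&locks[stripe_index(id)]);
}
```

The trade-off is memory against contention: the pool no longer grows with the
number of meters, but two meters that map to the same stripe serialize on
each other.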

Changes from v3 to v4:

- Do not remove required mutex from ofproto_check_ofpacts(), as doing
  so would take more analysis.
- Note that the "execution with help" is broken.  Userspace datapath
  execution should never need help, so this is OK for now.

Jarno Rajahalme (4):
  dpif: Meter framework.
  ofproto: Fix thread safety annotation.
  ofproto: Meter translation.
  dpif-netdev: Simple DROP meter implementation.

 datapath/linux/compat/include/linux/openvswitch.h |   4 +-
 include/openvswitch/ofp-actions.h |   1 +
 lib/dpif-netdev.c | 389 ++
 lib/dpif-netlink.c|  46 ++-
 lib/dpif-provider.h   |  29 ++
 lib/dpif.c| 133 +++-
 lib/dpif.h|  13 +-
 lib/odp-execute.c |   3 +
 lib/odp-util.c|  14 +
 lib/ofp-actions.c |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-trace.c  |   2 +
 ofproto/ofproto-dpif-xlate.c  |  11 +-
 ofproto/ofproto-dpif.c|  60 +++-
 ofproto/ofproto-provider.h|  16 +-
 ofproto/ofproto.c |  50 +--
 tests/dpif-netdev.at  | 106 ++
 17 files changed, 836 insertions(+), 43 deletions(-)

-- 
2.1.4


[ovs-dev] [PATCH v5 1/4] dpif: Meter framework.

2017-02-23 Thread Jarno Rajahalme
Add DPIF-level infrastructure for meters.  Allow meter_set to modify
the meter configuration (e.g. set the burst size if unspecified).

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
 datapath/linux/compat/include/linux/openvswitch.h |  4 +-
 lib/dpif-netdev.c | 45 
 lib/dpif-netlink.c| 46 +++-
 lib/dpif-provider.h   | 29 
 lib/dpif.c| 88 +++
 lib/dpif.h| 13 +++-
 lib/odp-execute.c |  3 +
 lib/odp-util.c| 14 
 ofproto/ofproto-dpif-sflow.c  |  1 +
 ofproto/ofproto-dpif.c| 60 ++--
 ofproto/ofproto-provider.h| 13 ++--
 ofproto/ofproto.c |  2 +-
 12 files changed, 304 insertions(+), 14 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 425d3a4..b121391 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -787,13 +787,14 @@ enum ovs_nat_attr {
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
  * type may not be changed.
  *
- *
  * @OVS_ACTION_ATTR_SET_TO_MASKED: Kernel internal masked set action translated
  * from the @OVS_ACTION_ATTR_SET.
  * @OVS_ACTION_ATTR_TUNNEL_PUSH: Push tunnel header described by struct
  * ovs_action_push_tnl.
  * @OVS_ACTION_ATTR_TUNNEL_POP: Lookup tunnel port by port-no passed and pop
  * tunnel header.
+ * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the
+ * packet, or modify the packet (e.g., change the DSCP field).
  */
 
 enum ovs_action_attr {
@@ -819,6 +820,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_CLONE, /* Nested OVS_CLONE_ATTR_*.  */
+   OVS_ACTION_ATTR_METER, /* u32 meter number. */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 30907b7..87beb01 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3651,6 +3651,46 @@ dp_netdev_disable_upcall(struct dp_netdev *dp)
 fat_rwlock_wrlock(&dp->upcall_rwlock);
 }
 
+
+/* Meters */
+static void
+dpif_netdev_meter_get_features(const struct dpif * dpif OVS_UNUSED,
+   struct ofputil_meter_features *features)
+{
+features->max_meters = 0;
+features->band_types = 0;
+features->capabilities = 0;
+features->max_bands = 0;
+features->max_color = 0;
+}
+
+static int
+dpif_netdev_meter_set(struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id *meter_id OVS_UNUSED,
+  struct ofputil_meter_config *config OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+static int
+dpif_netdev_meter_get(const struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id meter_id OVS_UNUSED,
+  struct ofputil_meter_stats *stats OVS_UNUSED,
+  uint16_t n_bands OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+static int
+dpif_netdev_meter_del(struct dpif *dpif OVS_UNUSED,
+  ofproto_meter_id meter_id OVS_UNUSED,
+  struct ofputil_meter_stats *stats OVS_UNUSED,
+  uint16_t n_bands OVS_UNUSED)
+{
+return EFBIG; /* meter_id out of range */
+}
+
+
 static void
 dpif_netdev_disable_upcall(struct dpif *dpif)
 OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -4773,6 +4813,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 break;
 }
 
+case OVS_ACTION_ATTR_METER:
 case OVS_ACTION_ATTR_PUSH_VLAN:
 case OVS_ACTION_ATTR_POP_VLAN:
 case OVS_ACTION_ATTR_PUSH_MPLS:
@@ -4910,6 +4951,10 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_ct_dump_next,
 dpif_netdev_ct_dump_done,
 dpif_netdev_ct_flush,
+dpif_netdev_meter_get_features,
+dpif_netdev_meter_set,
+dpif_netdev_meter_get,
+dpif_netdev_meter_del,
 };
 
 static void
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 9762a87..b491ea6 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2356,6 +2356,46 @@ dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, 
const uint16_t *zone)
 }
 }
 
+
+/* Meters */
+static void
+dpif_netlink_meter_get_features(const struct dpif * dpif OVS_UNUSED,
+struct ofputil_meter_features *features)
+{
+features->max_meters = 0;
+features->band_types = 0;
+features->

[ovs-dev] [PATCH v5 2/4] ofproto: Fix thread safety annotation.

2017-02-23 Thread Jarno Rajahalme
ofproto_check_ofpacts() requires ofproto_mutex, but the header did not
say so, and as a result the trace code did not take the mutex.
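
For context, OVS_REQUIRES wraps Clang's thread safety annotations, so without
it on the prototype the compiler cannot flag a caller that skips the lock.
The self-contained sketch below shows the same pattern.  The SK_* macros,
struct sk_mutex, table_mutex, check_table() and checked_lookup() are all
hypothetical names for illustration; this is not OVS code.  With Clang and
-Wthread-safety, calling check_table() without holding table_mutex should
produce a warning; other compilers see empty macros.

```c
#include <assert.h>
#include <pthread.h>

#if defined(__clang__)
#define SK_CAPABILITY(x)  __attribute__((capability(x)))
#define SK_REQUIRES(...)  __attribute__((requires_capability(__VA_ARGS__)))
#define SK_ACQUIRES(...)  __attribute__((acquire_capability(__VA_ARGS__)))
#define SK_RELEASES(...)  __attribute__((release_capability(__VA_ARGS__)))
#define SK_NO_ANALYSIS    __attribute__((no_thread_safety_analysis))
#else
#define SK_CAPABILITY(x)
#define SK_REQUIRES(...)
#define SK_ACQUIRES(...)
#define SK_RELEASES(...)
#define SK_NO_ANALYSIS
#endif

/* A mutex type that the analysis recognizes as a capability. */
struct SK_CAPABILITY("mutex") sk_mutex {
    pthread_mutex_t lock;
};

static struct sk_mutex table_mutex = { PTHREAD_MUTEX_INITIALIZER };
static int table_size = 3;

/* The annotations live on the prototypes, as they would in a header. */
static void sk_lock(struct sk_mutex *m) SK_ACQUIRES(m) SK_NO_ANALYSIS;
static void sk_unlock(struct sk_mutex *m) SK_RELEASES(m) SK_NO_ANALYSIS;
static int check_table(int index) SK_REQUIRES(table_mutex);
static int checked_lookup(int index);

static void
sk_lock(struct sk_mutex *m)
{
    pthread_mutex_lock(&m->lock);
}

static void
sk_unlock(struct sk_mutex *m)
{
    pthread_mutex_unlock(&m->lock);
}

/* Returns 1 if 'index' is valid.  The caller must hold table_mutex. */
static int
check_table(int index)
{
    return index >= 0 && index < table_size;
}

/* Takes the mutex around the annotated call, as the fix does for the
 * trace code. */
static int
checked_lookup(int index)
{
    int ok;

    sk_lock(&table_mutex);
    ok = check_table(index);
    sk_unlock(&table_mutex);
    return ok;
}
```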

Signed-off-by: Jarno Rajahalme 
---
 ofproto/ofproto-dpif-trace.c | 2 ++
 ofproto/ofproto-provider.h   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index b01a131..3c9e3d4 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -387,8 +387,10 @@ ofproto_unixctl_trace_actions(struct unixctl_conn *conn, 
int argc,
ofproto->up.n_tables, &usable_protocols);
 }
 if (!retval) {
+ovs_mutex_lock(&ofproto_mutex);
 retval = ofproto_check_ofpacts(&ofproto->up, ofpacts.data,
ofpacts.size);
+ovs_mutex_unlock(&ofproto_mutex);
 }
 
 if (retval) {
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index e21cb26..1361436 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -1949,7 +1949,8 @@ void ofproto_flush_flows(struct ofproto *);
 
 enum ofperr ofproto_check_ofpacts(struct ofproto *,
   const struct ofpact ofpacts[],
-  size_t ofpacts_len);
+  size_t ofpacts_len)
+OVS_REQUIRES(ofproto_mutex);
 
 static inline const struct rule_actions *
 rule_get_actions(const struct rule *rule)
-- 
2.1.4


[ovs-dev] [PATCH v5 3/4] ofproto: Meter translation.

2017-02-23 Thread Jarno Rajahalme
Translate OpenFlow METER instructions to datapath meter actions.

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
 include/openvswitch/ofp-actions.h |  1 +
 lib/dpif.c| 47 +++---
 lib/ofp-actions.c |  1 +
 ofproto/ofproto-dpif-xlate.c  | 11 -
 ofproto/ofproto.c | 48 +++
 5 files changed, 79 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 88f573d..b487e6de 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -534,6 +534,7 @@ struct ofpact_metadata {
 struct ofpact_meter {
 struct ofpact ofpact;
 uint32_t meter_id;
+uint32_t provider_meter_id;
 };
 
 /* OFPACT_WRITE_ACTIONS, OFPACT_CLONE.
diff --git a/lib/dpif.c b/lib/dpif.c
index 3f36ccd..aa6f37a 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1104,6 +1104,7 @@ struct dpif_execute_helper_aux {
 struct dpif *dpif;
 const struct flow *flow;
 int error;
+const struct nlattr *meter_action; /* Non-NULL, if have a meter action. */
 };
 
 /* This is called for actions that need the context of the datapath to be
@@ -1119,6 +1120,13 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 ovs_assert(packets_->count == 1);
 
 switch ((enum ovs_action_attr)type) {
+case OVS_ACTION_ATTR_METER:
+/* Maintain a pointer to the first meter action seen. */
+if (!aux->meter_action) {
+aux->meter_action = action;
+}
+   break;
+
 case OVS_ACTION_ATTR_CT:
 case OVS_ACTION_ATTR_OUTPUT:
 case OVS_ACTION_ATTR_TUNNEL_PUSH:
@@ -1129,15 +1137,36 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 struct ofpbuf execute_actions;
 uint64_t stub[256 / 8];
 struct pkt_metadata *md = &packet->md;
-bool dst_set;
 
-dst_set = flow_tnl_dst_is_set(&md->tunnel);
-if (dst_set) {
+if (flow_tnl_dst_is_set(&md->tunnel) || aux->meter_action) {
+ofpbuf_use_stub(&execute_actions, stub, sizeof stub);
+
+if (aux->meter_action) {
+const struct nlattr *a = aux->meter_action;
+
+/* XXX: This code collects meter actions since the last action
+ * execution via the datapath to be executed right before the
+ * current action that needs to be executed by the datapath.
+ * This is only an approximation, but better than nothing.
+ * Fundamentally, we should have a mechanism by which the
+ * datapath could return the result of the meter action so that
+ * we could execute them at the right order. */
+do {
+ofpbuf_put(&execute_actions, a, NLA_ALIGN(a->nla_len));
+/* Find next meter action before 'action', if any. */
+do {
+a = nl_attr_next(a);
+} while (a != action &&
+ nl_attr_type(a) != OVS_ACTION_ATTR_METER);
+} while (a != action);
+}
+
 /* The Linux kernel datapath throws away the tunnel information
  * that we supply as metadata.  We have to use a "set" action to
  * supply it. */
-ofpbuf_use_stub(&execute_actions, stub, sizeof stub);
-odp_put_tunnel_action(&md->tunnel, &execute_actions);
+if (md->tunnel.ip_dst) {
+odp_put_tunnel_action(&md->tunnel, &execute_actions);
+}
 ofpbuf_put(&execute_actions, action, NLA_ALIGN(action->nla_len));
 
 execute.actions = execute_actions.data;
@@ -1170,14 +1199,16 @@ dpif_execute_helper_cb(void *aux_, struct 
dp_packet_batch *packets_,
 
 dp_packet_delete(clone);
 
-if (dst_set) {
+if (flow_tnl_dst_is_set(&md->tunnel) || aux->meter_action) {
 ofpbuf_uninit(&execute_actions);
+
+/* Do not re-use the same meters for later output actions. */
+aux->meter_action = NULL;
 }
 break;
 }
 
 case OVS_ACTION_ATTR_HASH:
-case OVS_ACTION_ATTR_METER:
 case OVS_ACTION_ATTR_PUSH_VLAN:
 case OVS_ACTION_ATTR_POP_VLAN:
 case OVS_ACTION_ATTR_PUSH_MPLS:
@@ -1201,7 +1232,7 @@ dpif_execute_helper_cb(void *aux_, struct dp_packet_batch 
*packets_,
 static int
 dpif_execute_with_help(struct dpif *dpif, struct dpif_execute *execute)
 {
-struct dpif_execute_helper_aux aux = {dpif, execute->flow, 0};
+struct dpif_execute_helper_aux aux = {dpif, execute->flow, 0, NULL};
 struct dp_packet_batch pb;
 
 COVERAGE_INC(dpif_execute_with_help);
diff --git

[ovs-dev] [PATCH v5 4/4] dpif-netdev: Simple DROP meter implementation.

2017-02-23 Thread Jarno Rajahalme
Meters may be used by any flow, so some kind of locking must be used.
In this version we have an adaptive mutex for each meter, which may
not be optimal for DPDK.  However, this should serve as a basis for
further improvement.

A batch of packets is first tried as a whole, and only if some of the
meter bands are hit, we need to process the packets individually.
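
The batch-first strategy can be illustrated with a small token-bucket sketch.
This is not the code from the patch, which is truncated in this archive.
struct sketch_meter and the sketch_* functions are hypothetical, and the
sketch assumes a single packet-rate drop band whose bucket is kept in 1/1000
packets, as the dp_meter_band comment suggests.  Locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_meter {
    uint32_t rate;      /* Packets per second. */
    uint32_t burst;     /* Largest allowed burst, in packets. */
    uint64_t bucket;    /* Tokens available, in 1/1000 packets. */
    uint64_t used_ms;   /* Time of the last refill, in milliseconds. */
};

/* Adds the tokens earned since the last refill, capped at the burst size. */
static void
sketch_meter_refill(struct sketch_meter *m, uint64_t now_ms)
{
    uint64_t max = (uint64_t) m->burst * 1000;
    uint64_t delta;

    assert(now_ms >= m->used_ms);   /* Assumes a monotonic clock. */
    delta = now_ms - m->used_ms;
    m->used_ms = now_ms;
    if (!m->rate) {
        return;
    }

    /* Cap the elapsed time so that the multiplication cannot overflow. */
    if (delta > max / m->rate + 1) {
        delta = max / m->rate + 1;
    }

    /* 'rate' packets per second is 'rate' thousandths of a packet per ms. */
    m->bucket += delta * m->rate;
    if (m->bucket > max) {
        m->bucket = max;
    }
}

/* Runs 'n' packets through 'm' at time 'now_ms'.  Sets drop[i] for each
 * packet and returns the number of packets to drop. */
static size_t
sketch_meter_run(struct sketch_meter *m, size_t n, uint64_t now_ms,
                 bool drop[])
{
    size_t n_dropped = 0;

    sketch_meter_refill(m, now_ms);

    /* Fast path: the whole batch fits, so no per-packet work is needed. */
    if (m->bucket >= (uint64_t) n * 1000) {
        m->bucket -= (uint64_t) n * 1000;
        for (size_t i = 0; i < n; i++) {
            drop[i] = false;
        }
        return 0;
    }

    /* Slow path: the band is hit, so decide packet by packet. */
    for (size_t i = 0; i < n; i++) {
        if (m->bucket >= 1000) {
            m->bucket -= 1000;
            drop[i] = false;
        } else {
            drop[i] = true;
            n_dropped++;
        }
    }
    return n_dropped;
}
```

In the common case where traffic stays under the configured rate, the fast
path costs one comparison and one subtraction per batch.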

Signed-off-by: Jarno Rajahalme 
Signed-off-by: Andy Zhou 
---
 lib/dpif-netdev.c| 380 ---
 tests/dpif-netdev.at | 106 ++
 2 files changed, 468 insertions(+), 18 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 87beb01..6e04c89 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -86,6 +86,9 @@ DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
 
 /* Configuration parameters. */
 enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
+enum { MAX_METERS = 65536 };/* Maximum number of meters. */
+enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */
+enum { N_METER_LOCKS = 64 };/* Number of meter locks. */
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -198,6 +201,31 @@ static bool dpcls_lookup(struct dpcls *cls,
  struct dpcls_rule **rules, size_t cnt,
  int *num_lookups_p);
 
+/* Set of supported meter flags */
+#define DP_SUPPORTED_METER_FLAGS_MASK \
+(OFPMF13_STATS | OFPMF13_PKTPS | OFPMF13_KBPS | OFPMF13_BURST)
+
+/* Set of supported meter band types */
+#define DP_SUPPORTED_METER_BAND_TYPES   \
+( 1 << OFPMBT13_DROP )
+
+struct dp_meter_band {
+struct ofputil_meter_band up; /* type, prec_level, pad, rate, burst_size */
+uint32_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for KBPS) */
+uint64_t packet_count;
+uint64_t byte_count;
+};
+
+struct dp_meter {
+uint16_t flags;
+uint16_t n_bands;
+uint32_t max_delta_t;
+uint64_t used;
+uint64_t packet_count;
+uint64_t byte_count;
+struct dp_meter_band bands[];
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -228,6 +256,11 @@ struct dp_netdev {
 struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
+/* Meters. */
+struct ovs_mutex meter_locks[N_METER_LOCKS];
+struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
+uint32_t meter_free; /* Next free meter. */
+
 /* Protects access to ofproto-dpif-upcall interface during revalidator
  * thread synchronization. */
 struct fat_rwlock upcall_rwlock;
@@ -264,6 +297,19 @@ struct dp_netdev {
 OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min;
 };
 
+static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
+OVS_ACQUIRES(dp->meter_locks[meter_id % N_METER_LOCKS])
+{
+ovs_mutex_lock(&dp->meter_locks[meter_id % N_METER_LOCKS]);
+}
+
+static void meter_unlock(const struct dp_netdev *dp, uint32_t meter_id)
+OVS_RELEASES(dp->meter_locks[meter_id % N_METER_LOCKS])
+{
+ovs_mutex_unlock(&dp->meter_locks[meter_id % N_METER_LOCKS]);
+}
+
+
 static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
 odp_port_t)
 OVS_REQUIRES(dp->port_mutex);
@@ -1067,6 +1113,10 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 dp->reconfigure_seq = seq_create();
 dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
 
+for (int i = 0; i < N_METER_LOCKS; ++i) {
+ovs_mutex_init_adaptive(&dp->meter_locks[i]);
+}
+
 /* Disable upcalls by default. */
 dp_netdev_disable_upcall(dp);
 dp->upcall_aux = NULL;
@@ -1146,6 +1196,16 @@ dp_netdev_destroy_upcall_lock(struct dp_netdev *dp)
 fat_rwlock_destroy(&dp->upcall_rwlock);
 }
 
+static void
+dp_delete_meter(struct dp_netdev *dp, uint32_t meter_id)
+OVS_REQUIRES(dp->meter_locks[meter_id % N_METER_LOCKS])
+{
+if (dp->meters[meter_id]) {
+free(dp->meters[meter_id]);
+dp->meters[meter_id] = NULL;
+}
+}
+
 /* Requires dp_netdev_mutex so that we can't get a new reference to 'dp'
  * through the 'dp_netdevs' shash while freeing 'dp'. */
 static void
@@ -1161,6 +1221,7 @@ dp_netdev_free(struct dp_netdev *dp)
 do_del_port(dp, port);
 }
 ovs_mutex_unlock(&dp->port_mutex);
+
 dp_netdev_destroy_all_pmds(dp, true);
 cmap_destroy(&dp->poll_threads);
 
@@ -1179,6 +1240,17 @@ dp_netdev_free(struct dp_netdev *dp)
 /* Upcalls must be disabled at this point */
 dp_netdev_destroy_upcall_lock(dp);
 
+int i;
+
+for (i = 0; i < MAX_METERS; ++i) {
+meter_lock(dp, i);
+dp_delete_meter(dp, i);
+meter_unl

Re: [ovs-dev] [PATCH v4 4/4] dpif-netdev: Simple DROP meter implementation.

2017-02-23 Thread Jarno Rajahalme
I forgot to add the advertised changes to git before posting, so I just sent a
v5.

  Jarno

> On Feb 22, 2017, at 6:34 PM, Jarno Rajahalme  wrote:
> 
> Meters may be used by any flow, so some kind of locking must be used.
> In this version we have an adaptive mutex for each meter, which may
> not be optimal for DPDK.  However, this should serve as a basis for
> further improvement.
> 
> A batch of packets is first tried as a whole, and only if some of the
> meter bands are hit, we need to process the packets individually.
> 
> Signed-off-by: Jarno Rajahalme 
> Signed-off-by: Andy Zhou 
> ---
> lib/dpif-netdev.c| 362 ---
> tests/dpif-netdev.at | 106 +++
> 2 files changed, 450 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 87beb01..4257f45 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -86,6 +86,8 @@ DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
> 
> /* Configuration parameters. */
> enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
> +enum { MAX_METERS = 65536 };/* Maximum number of meters. */
> +enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */
> 
> /* Protects against changes to 'dp_netdevs'. */
> static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> @@ -198,6 +200,31 @@ static bool dpcls_lookup(struct dpcls *cls,
>  struct dpcls_rule **rules, size_t cnt,
>  int *num_lookups_p);
> 
> +/* Set of supported meter flags */
> +#define DP_SUPPORTED_METER_FLAGS_MASK \
> +(OFPMF13_STATS | OFPMF13_PKTPS | OFPMF13_KBPS | OFPMF13_BURST)
> +
> +/* Set of supported meter band types */
> +#define DP_SUPPORTED_METER_BAND_TYPES   \
> +( 1 << OFPMBT13_DROP )
> +
> +struct dp_meter_band {
> +struct ofputil_meter_band up; /* type, prec_level, pad, rate, burst_size 
> */
> +uint32_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for KBPS) 
> */
> +uint64_t packet_count;
> +uint64_t byte_count;
> +};
> +
> +struct dp_meter {
> +uint16_t flags;
> +uint16_t n_bands;
> +uint32_t max_delta_t;
> +uint64_t used;
> +uint64_t packet_count;
> +uint64_t byte_count;
> +struct dp_meter_band bands[];
> +};
> +
> /* Datapath based on the network device interface from netdev.h.
>  *
>  *
> @@ -228,6 +255,11 @@ struct dp_netdev {
> struct hmap ports;
> struct seq *port_seq;   /* Incremented whenever a port changes. */
> 
> +/* Meters. */
> +struct ovs_mutex meter_locks[MAX_METERS];
> +struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
> +uint32_t meter_free; /* Next free meter. */
> +
> /* Protects access to ofproto-dpif-upcall interface during revalidator
>  * thread synchronization. */
> struct fat_rwlock upcall_rwlock;
> @@ -1067,6 +1099,10 @@ create_dp_netdev(const char *name, const struct 
> dpif_class *class,
> dp->reconfigure_seq = seq_create();
> dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> 
> +for (int i = 0; i < MAX_METERS; ++i) {
> +ovs_mutex_init_adaptive(&dp->meter_locks[i]);
> +}
> +
> /* Disable upcalls by default. */
> dp_netdev_disable_upcall(dp);
> dp->upcall_aux = NULL;
> @@ -1146,6 +1182,16 @@ dp_netdev_destroy_upcall_lock(struct dp_netdev *dp)
> fat_rwlock_destroy(&dp->upcall_rwlock);
> }
> 
> +static void
> +dp_delete_meter(struct dp_netdev *dp, uint32_t meter_id)
> +OVS_REQUIRES(dp->meter_locks[meter_id])
> +{
> +if (dp->meters[meter_id]) {
> +free(dp->meters[meter_id]);
> +dp->meters[meter_id] = NULL;
> +}
> +}
> +
> /* Requires dp_netdev_mutex so that we can't get a new reference to 'dp'
>  * through the 'dp_netdevs' shash while freeing 'dp'. */
> static void
> @@ -1161,6 +1207,7 @@ dp_netdev_free(struct dp_netdev *dp)
> do_del_port(dp, port);
> }
> ovs_mutex_unlock(&dp->port_mutex);
> +
> dp_netdev_destroy_all_pmds(dp, true);
> cmap_destroy(&dp->poll_threads);
> 
> @@ -1179,6 +1226,13 @@ dp_netdev_free(struct dp_netdev *dp)
> /* Upcalls must be disabled at this point */
> dp_netdev_destroy_upcall_lock(dp);
> 
> +for (int i = 0; i < MAX_METERS; ++i) {
> +ovs_mutex_lock(&dp->meter_locks[i]);
> +dp_delete_meter(dp, i);
> +ovs_mutex_unlock(&dp->meter_locks[i]);
> +ovs_mutex_destroy(&dp->meter_lo

Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH

2017-02-23 Thread Jarno Rajahalme
Thanks for the invite. In general, I’d prefer that the email title or body
also contain the proposed meeting date and time.

I started review of the L3 userspace patches 3 weeks ago, but unfortunately 
have been swamped by urgent release tasks ever since. This should ease a bit 
next week.

  Jarno

> On Feb 23, 2017, at 6:24 AM, Jan Scheurich  wrote:
> 
> Hi,
>  
> It’s been a while since we last had a sync meeting. Now that OVS 2.7 is
> released, I would like to resume the calls.
>  
> The first user space patch series to support L3 tunnels with a non-PTAP 
> bridge was posted 3 weeks ago
> (https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328391.html 
> ) 
> but unfortunately hasn’t received any reviews yet. 
>  
> The next patch series adding support for PTAP as an optional feature of a 
> bridge is completed and could be posted but we are waiting for the first 
> patch set to be reviewed.
>  
> In the meantime there has also been some progress with back-porting Jiri’s 
> kernel datapath patches for L3 tunnels (and some earlier required kernel 
> patches) to the OVS tree and making them configurable from user space through 
> the rtnetlink API.
>  
> Let’s have a look at the status and work out a plan for how to proceed so 
> that we meet the agreed target of upstreaming these changes in time for OVS 2.8.
>  
> Thank you,
> Jan
>  
> Link to the Google design doc:
> https://docs.google.com/document/d/1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8/edit


Re: [ovs-dev] [PATCH 1/2] ofproto/bond: Fix bond reconfiguration race condition.

2017-02-23 Thread Jarno Rajahalme
LGTM,

Acked-by: Jarno Rajahalme 

> On Feb 23, 2017, at 1:31 PM, Andy Zhou  wrote:
> 
> During the upcall thread bond output translation, bond_may_recirc()
> is currently called outside the lock. In case the main thread executes
> bond_reconfigure() at the same time, the upcall thread may find bond
> state to be inconsistent when calling bond_update_post_recirc_rules().
> 
> This patch fixes the race condition by acquiring the write lock
> before calling bond_may_recirc(). The APIs are refactored slightly.
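
The refactoring described above can be sketched in isolation. This is a minimal
illustration, not the code from the patch: the toy_* names, the toy lock and
the trimmed-down bond struct are all assumptions made for the example. It shows
the two points of the change: the "may recirculate" check and the rule update
happen under a single write-lock acquisition, and the out-parameters are always
written, even when the answer is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the module-wide rwlock (hypothetical).  It only records
 * whether the write lock is held, so the sketch can check its own locking
 * discipline without real threads. */
struct toy_rwlock {
    bool write_held;
};

static struct toy_rwlock toy_lock;

static inline void
toy_wrlock(struct toy_rwlock *l)
{
    assert(!l->write_held);
    l->write_held = true;
}

static inline void
toy_unlock(struct toy_rwlock *l)
{
    assert(l->write_held);
    l->write_held = false;
}

/* Hypothetical, trimmed-down bond state. */
struct toy_bond {
    bool tcp_balanced;          /* Stands in for 'balance == BM_TCP'. */
    uint32_t recirc_id;
    uint32_t basis;
    unsigned int rule_updates;  /* Counts simulated rule updates. */
};

/* Always writes the out-parameters, so a caller never reads a stale value
 * when the function returns false. */
static inline bool
toy_may_recirc(const struct toy_bond *bond, uint32_t *recirc_id,
               uint32_t *hash_basis)
{
    bool may_recirc = bond->tcp_balanced && bond->recirc_id;

    if (recirc_id) {
        *recirc_id = may_recirc ? bond->recirc_id : 0;
    }
    if (hash_basis) {
        *hash_basis = may_recirc ? bond->basis : 0;
    }
    return may_recirc;
}

/* Check and update under one lock acquisition: a reconfiguration cannot
 * slip in between the check and the update. */
static inline void
toy_update_post_recirc_rules(struct toy_bond *bond, uint32_t *recirc_id,
                             uint32_t *hash_basis)
{
    toy_wrlock(&toy_lock);
    if (toy_may_recirc(bond, recirc_id, hash_basis)) {
        assert(toy_lock.write_held);
        bond->rule_updates++;
    }
    toy_unlock(&toy_lock);
}
```

A caller would pass the addresses of its own recirc_id and hash basis variables
and then test whether recirc_id is nonzero, instead of calling the predicate
separately before taking the lock.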
> 
> The race condition can result in the following stack trace. Copied
> from 'Reported-at':
> 
>Thread 23 handler69:
>Invalid write of size 8
>update_recirc_rules (bond.c:385)
>bond_update_post_recirc_rules__ (bond.c:952)
>bond_update_post_recirc_rules (bond.c:960)
>output_normal (ofproto-dpif-xlate.c:2102)
>xlate_normal (ofproto-dpif-xlate.c:2858)
>xlate_output_action (ofproto-dpif-xlate.c:4407)
>do_xlate_actions (ofproto-dpif-xlate.c:5335)
>xlate_actions (ofproto-dpif-xlate.c:6198)
>upcall_xlate (ofproto-dpif-upcall.c:1129)
>process_upcall (ofproto-dpif-upcall.c:1271)
>recv_upcalls (ofproto-dpif-upcall.c:822)
>udpif_upcall_handler (ofproto-dpif-upcall.c:740)
>Address 0x18630490 is 1,904 bytes inside a block of size 12,288 free'd
>free (vg_replace_malloc.c:529)
>bond_entry_reset (bond.c:1635)
>bond_reconfigure (bond.c:457)
>bundle_set (ofproto-dpif.c:2896)
>ofproto_bundle_register (ofproto.c:1343)
>port_configure (bridge.c:1159)
>bridge_reconfigure (bridge.c:785)
>bridge_run (bridge.c:3099)
>main (ovs-vswitchd.c:111)
>Block was alloc'd at
>malloc (vg_replace_malloc.c:298)
>xmalloc (util.c:110)
>bond_entry_reset (bond.c:1629)
>bond_reconfigure (bond.c:457)
>bond_create (bond.c:245)
>bundle_set (ofproto-dpif.c:2900)
>ofproto_bundle_register (ofproto.c:1343)
>port_configure (bridge.c:1159)
>bridge_reconfigure (bridge.c:785)
>bridge_run (bridge.c:3099)
>main (ovs-vswitchd.c:111)
> 
> Reported-by: Huanle Han 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328969.html
> CC: Huanle Han 
> Signed-off-by: Andy Zhou 
> ---
> ofproto/bond.c   | 27 +++
> ofproto/bond.h   |  3 ++-
> ofproto/ofproto-dpif-xlate.c | 24 
> 3 files changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index 260023e4bb64..6e10c5143c0e 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -916,17 +916,16 @@ bool
> bond_may_recirc(const struct bond *bond, uint32_t *recirc_id,
> uint32_t *hash_bias)
> {
> -if (bond->balance == BM_TCP && bond->recirc_id) {
> -if (recirc_id) {
> -*recirc_id = bond->recirc_id;
> -}
> -if (hash_bias) {
> -*hash_bias = bond->basis;
> -}
> -return true;
> -} else {
> -return false;
> +bool may_recirc = bond->balance == BM_TCP && bond->recirc_id;
> +
> +if (recirc_id) {
> +*recirc_id = may_recirc ? bond->recirc_id : 0;
> }
> +if (hash_bias) {
> +*hash_bias = may_recirc ? bond->basis : 0;
> +}
> +
> +return may_recirc;
> }
> 
> static void
> @@ -954,12 +953,16 @@ bond_update_post_recirc_rules__(struct bond* bond, 
> const bool force)
> }
> 
> void
> -bond_update_post_recirc_rules(struct bond* bond, const bool force)
> +bond_update_post_recirc_rules(struct bond *bond, uint32_t *recirc_id,
> +  uint32_t *hash_basis)
> {
> ovs_rwlock_wrlock(&rwlock);
> -bond_update_post_recirc_rules__(bond, force);
> +if (bond_may_recirc(bond, recirc_id, hash_basis)) {
> +bond_update_post_recirc_rules__(bond, false);
> +}
> ovs_rwlock_unlock(&rwlock);
> }
> +
> 
> /* Rebalancing. */
> 
> diff --git a/ofproto/bond.h b/ofproto/bond.h
> index 9a5ea9e21040..6e1221d2381b 100644
> --- a/ofproto/bond.h
> +++ b/ofproto/bond.h
> @@ -120,7 +120,8 @@ void bond_rebalance(struct bond *);
>  * Bond module pulls stats from those post recirculation rules. If rebalancing
>  * is needed, those rules are updated with new output actions.
> */
> -void bond_update_post_recirc_rules(struct bond *, const bool force);
> +void bond_update_post_recirc_rules(struct bond *, uint32_t *recirc_id,
> +  

Re: [ovs-dev] [PATCH 2/2] ofproto/bond: Fix bond post recirc rule leak.

2017-02-23 Thread Jarno Rajahalme
Looks right to me,

Acked-by: Jarno Rajahalme 

> On Feb 23, 2017, at 1:31 PM, Andy Zhou  wrote:
> 
> When a bond is removed or when its configuration changes,
> the post recirculation rules that are installed by the current
> bond configuration, if any, should also be removed.
> 
> Reported-by: Huanle Han 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328969.html
> CC: Huanle Han 
> Signed-off-by: Andy Zhou 
> ---
> ofproto/bond.c | 36 ++--
> 1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index 6e10c5143c0e..5bb124bda5ad 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -190,6 +190,7 @@ static struct bond_slave *choose_output_slave(const 
> struct bond *,
>   struct flow_wildcards *,
>   uint16_t vlan)
> OVS_REQ_RDLOCK(rwlock);
> +static void update_recirc_rules__(struct bond *bond);
> 
> /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
>  * stores the mode in '*balance' and returns true.  Otherwise returns false
> @@ -264,7 +265,6 @@ bond_ref(const struct bond *bond_)
> void
> bond_unref(struct bond *bond)
> {
> -struct bond_pr_rule_op *pr_op;
> struct bond_slave *slave;
> 
> if (!bond || ovs_refcount_unref_relaxed(&bond->ref_cnt) != 1) {
> @@ -283,18 +283,18 @@ bond_unref(struct bond *bond)
> hmap_destroy(&bond->slaves);
> 
> ovs_mutex_destroy(&bond->mutex);
> -free(bond->hash);
> -free(bond->name);
> -
> -HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
> -free(pr_op);
> -}
> -hmap_destroy(&bond->pr_rule_ops);
> 
> +/* Free bond resources. Remove existing post recirc rules. */
> if (bond->recirc_id) {
> recirc_free_id(bond->recirc_id);
> +bond->recirc_id = 0;
> }
> +free(bond->hash);
> +bond->hash = NULL;
> +update_recirc_rules__(bond);
> 
> +hmap_destroy(&bond->pr_rule_ops);
> +free(bond->name);
> free(bond);
> }
> 
> @@ -322,9 +322,17 @@ add_pr_rule(struct bond *bond, const struct match *match,
> hmap_insert(&bond->pr_rule_ops, &pr_op->hmap_node, hash);
> }
> 
> +/* This function should almost never be called directly.
> + * 'update_recirc_rules()' should be called instead.  Since
> + * this function modifies 'bond->pr_rule_ops', it is only
> + * safe when 'rwlock' is held.
> + *
> + * However, when the 'bond' is the only reference in the system,
> + * calling this function avoid acquiring lock only to satisfy
> + * lock annotation. Currently, only 'bond_unref()' calls
> + * this function directly.  */
> static void
> -update_recirc_rules(struct bond *bond)
> -OVS_REQ_WRLOCK(rwlock)
> +update_recirc_rules__(struct bond *bond)
> {
> struct match match;
> struct bond_pr_rule_op *pr_op, *next_op;
> @@ -394,6 +402,12 @@ update_recirc_rules(struct bond *bond)
> ofpbuf_uninit(&ofpacts);
> }
> 
> +static void
> +update_recirc_rules(struct bond *bond)
> +OVS_REQ_RDLOCK(rwlock)
> +{
> +update_recirc_rules__(bond);
> +}
> 
> /* Updates 'bond''s overall configuration to 's'.
>  *
> @@ -1640,6 +1654,8 @@ bond_entry_reset(struct bond *bond)
> } else {
> free(bond->hash);
> bond->hash = NULL;
> +/* Remove existing post recirc rules. */
> +update_recirc_rules(bond);
> }
> }
> 
> -- 
> 1.9.1
> 

[ovs-dev] [PATCH 00/21] Conntrack enhancements.

2017-02-23 Thread Jarno Rajahalme
This patch set backports the recent upstream conntrack fixes and new
features to the OVS tree kernel module, and adds the OVS userspace
support.

Patch 1/21 is an unrelated datapath backport.

Each new feature is introduced in two different commits, the first is
the datapath backport, the second the corresponding userspace datapath
and non-datapath functionality, including OVS system tests.  In one
instance I have squashed the system test with the datapath backport.
Compile would fail after the first patch due to missing userspace code
for new enums.  We may decide to squash the datapath and userspace
changes together for the merge, but for now the review should be more
straightforward with the separation.

System tests have been most recently run on Linux 3.16, on which the
geneve tests fail, but that should have nothing to do with this
series.

Florian Westphal (2):
  datapath: add and use skb_nfct helper
  datapath: add and use nf_ct_set helper

Jarno Rajahalme (18):
  datapath: Fix comments for skb->_nfct
  datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.
  datapath: Do not trigger events for unconfirmed connections.
  datapath: Unionize ovs_key_ct_label with a u32 array.
  datapath: Simplify labels length logic.
  datapath: Refactor labels initialization.
  datapath: Inherit master's labels.
  netlink: Simplify nl_msg_start_nested().
  lib: Check match and action prerequisities with 'match'.
  datapath: Add original direction conntrack tuple to sw_flow_key.
  flow: Make room after ct_state.
  odp: Support conntrack orig tuple key.
  actions: Add resubmit with conntrack tuple.
  compat: nf_ct_delete compat.
  datapath: Add force commit.
  conntrack: Force commit.
  datapath: Add a missing comment.
  tests: Add an FTP test without conntrack.

stephen hemminger (1):
  datapath: make ndo_get_stats64 a void function

 acinclude.m4   |   7 +
 build-aux/extract-ofp-fields   |   3 +
 datapath/actions.c |   2 +
 datapath/conntrack.c   | 292 +
 datapath/conntrack.h   |  10 +-
 datapath/flow.c|  34 +-
 datapath/flow.h|  49 ++-
 datapath/flow_netlink.c|  85 +++--
 datapath/flow_netlink.h|   7 +-
 datapath/linux/compat/include/linux/openvswitch.h  |  33 +-
 datapath/linux/compat/include/linux/skbuff.h   |  11 +
 .../compat/include/net/netfilter/nf_conntrack.h|   8 +
 .../include/net/netfilter/nf_conntrack_core.h  |  10 +
 datapath/vport-internal_dev.c  |   6 +-
 include/openvswitch/flow.h |  15 +-
 include/openvswitch/match.h|  16 +
 include/openvswitch/meta-flow.h| 141 +++-
 include/openvswitch/ofp-actions.h  |  15 +-
 lib/bundle.c   |   4 +-
 lib/bundle.h   |   3 +-
 lib/conntrack.c|  59 +++-
 lib/conntrack.h|   2 +-
 lib/dpif-netdev.c  |   8 +-
 lib/flow.c | 221 +
 lib/flow.h |  50 +++
 lib/learn.c|  15 +-
 lib/learn.h|   3 +-
 lib/match.c| 118 ++-
 lib/meta-flow.c| 193 ++-
 lib/meta-flow.xml  |  92 ++
 lib/multipath.c|   4 +-
 lib/multipath.h|   3 +-
 lib/netlink.c  |   2 +-
 lib/nx-match.c |  55 +++-
 lib/nx-match.h |  10 +-
 lib/odp-execute.c  |   4 +
 lib/odp-util.c | 144 -
 lib/odp-util.h |   8 +-
 lib/ofp-actions.c  | 161 +++---
 lib/ofp-parse.c|   2 +-
 lib/ofp-util.c |   9 +-
 lib/packets.h  |   7 +-
 ofproto/ofproto-dpif-rid.h |   2 +-
 ofproto/ofproto-dpif-sflow.c   |   2 +
 ofproto/ofproto-dpif-trace.c   |  13 +-
 ofproto/ofproto-dpif-xlate.c   |  84 -
 ofproto/ofproto-dpif.c |   4 +-
 ofproto/ofproto.c  |   5 +-
 tests/odp.at   |  18 +-
 tests/ofp-actions.at   |  1

[ovs-dev] [PATCH 01/21] datapath: make ndo_get_stats64 a void function

2017-02-23 Thread Jarno Rajahalme
From: stephen hemminger 

Upstream commit:

commit bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221
Author: stephen hemminger 
Date:   Fri Jan 6 19:12:52 2017 -0800

net: make ndo_get_stats64 a void function

The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger 
Signed-off-by: David S. Miller 

This seems to be fine for all prior Linux versions as well.

Signed-off-by: Jarno Rajahalme 
---
 datapath/vport-internal_dev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c
index cc01c9c..fec1331 100644
--- a/datapath/vport-internal_dev.c
+++ b/datapath/vport-internal_dev.c
@@ -106,7 +106,7 @@ static void internal_dev_destructor(struct net_device *dev)
free_netdev(dev);
 }
 
-static struct rtnl_link_stats64 *
+static void
 internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
int i;
@@ -134,8 +134,6 @@ internal_get_stats(struct net_device *dev, struct 
rtnl_link_stats64 *stats)
stats->tx_bytes += local_stats.tx_bytes;
stats->tx_packets   += local_stats.tx_packets;
}
-
-   return stats;
 }
 
 #ifdef HAVE_IFF_PHONY_HEADROOM
@@ -151,7 +149,7 @@ static const struct net_device_ops internal_dev_netdev_ops 
= {
.ndo_start_xmit = internal_dev_xmit,
.ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = internal_dev_change_mtu,
-   .ndo_get_stats64 = internal_get_stats,
+   .ndo_get_stats64 = (void *)internal_get_stats,
 #ifdef HAVE_IFF_PHONY_HEADROOM
 #ifndef HAVE_NET_DEVICE_OPS_WITH_EXTENDED
.ndo_set_rx_headroom = internal_set_rx_headroom,
-- 
2.1.4


[ovs-dev] [PATCH 02/21] datapath: add and use skb_nfct helper

2017-02-23 Thread Jarno Rajahalme
From: Florian Westphal 

Upstream commit:

commit cb9c68363efb6d1f950ec55fb06e031ee70db5fc
Author: Florian Westphal 
Date:   Mon Jan 23 18:21:56 2017 +0100

skbuff: add and use skb_nfct helper

Followup patch renames skb->nfct and changes its type so add a helper to
avoid intrusive rename change later.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4 |  1 +
 datapath/conntrack.c |  6 +++---
 datapath/linux/compat/include/linux/skbuff.h | 11 +++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index e8b64b5..f26bcc1 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -604,6 +604,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_clear_hash_if_not_l4])
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_postpush_rcsum])
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [lco_csum])
+  OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_nfct])
 
   OVS_GREP_IFELSE([$KSRC/include/linux/types.h], [bool],
   [OVS_DEFINE([HAVE_BOOL_TYPE])])
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 3c51ce6..4a1b1ba 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -762,8 +762,8 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 
/* Associate skb with specified zone. */
if (tmpl) {
-   if (skb->nfct)
-   nf_conntrack_put(skb->nfct);
+   if (skb_nfct(skb))
+   nf_conntrack_put(skb_nfct(skb));
nf_conntrack_get(&tmpl->ct_general);
skb->nfct = &tmpl->ct_general;
skb->nfctinfo = IP_CT_NEW;
@@ -864,7 +864,7 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
 
-   ct = (struct nf_conn *)skb->nfct;
+   ct = (struct nf_conn *)skb_nfct(skb);
if (ct)
nf_ct_deliver_cached_events(ct);
}
diff --git a/datapath/linux/compat/include/linux/skbuff.h 
b/datapath/linux/compat/include/linux/skbuff.h
index a2cbd78..943d5f8 100644
--- a/datapath/linux/compat/include/linux/skbuff.h
+++ b/datapath/linux/compat/include/linux/skbuff.h
@@ -371,4 +371,15 @@ static inline __wsum lco_csum(struct sk_buff *skb)
return csum_partial(l4_hdr, csum_start - l4_hdr, partial);
 }
 #endif
+
+#ifndef HAVE_SKB_NFCT
+static inline struct nf_conntrack *skb_nfct(const struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   return skb->nfct;
+#else
+   return NULL;
+#endif
+}
+#endif
 #endif
-- 
2.1.4


[ovs-dev] [PATCH 03/21] datapath: add and use nf_ct_set helper

2017-02-23 Thread Jarno Rajahalme
From: Florian Westphal 

Upstream commit:

commit c74454fadd5ea6fc866ffe2c417a0dba56b2bf1c
Author: Florian Westphal 
Date:   Mon Jan 23 18:21:57 2017 +0100

netfilter: add and use nf_ct_set helper

Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
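
To illustrate why accessor helpers make the later field merge non-intrusive,
here is a minimal sketch. It is not kernel code: the toy_* types and the 3-bit
info mask are assumptions made for the example. Callers only use the setter and
the getters, so the storage can change from two fields to one packed word
without touching them.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits reserved for the conntrack info value (assumed to be 3 bits in
 * this sketch). */
#define TOY_INFOMASK ((uintptr_t) 7)

/* Hypothetical connection object, aligned so that the low 3 bits of its
 * address are always zero. */
struct toy_conn {
    _Alignas(8) int refcount;
};

/* Hypothetical packet buffer with pointer and info merged into one word. */
struct toy_skb {
    uintptr_t _nfct;
};

/* Setter in the spirit of nf_ct_set(): the only place that knows how the
 * pointer and the info bits are stored. */
static inline void
toy_ct_set(struct toy_skb *skb, struct toy_conn *ct, unsigned int info)
{
    assert(((uintptr_t) ct & TOY_INFOMASK) == 0);
    assert(info <= TOY_INFOMASK);
    skb->_nfct = (uintptr_t) ct | info;
}

/* Getter in the spirit of skb_nfct(): returns the pointer part. */
static inline struct toy_conn *
toy_skb_nfct(const struct toy_skb *skb)
{
    return (struct toy_conn *) (skb->_nfct & ~TOY_INFOMASK);
}

/* Returns the info bits stored alongside the pointer. */
static inline unsigned int
toy_skb_ctinfo(const struct toy_skb *skb)
{
    return (unsigned int) (skb->_nfct & TOY_INFOMASK);
}
```

With separate fields, the same three functions would simply read and write two
struct members; code written against the helpers compiles unchanged either way.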

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4   | 2 ++
 datapath/conntrack.c   | 6 ++
 datapath/linux/compat/include/net/netfilter/nf_conntrack.h | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index f26bcc1..926ec8a 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -529,6 +529,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
   [nf_ct_get_tuplepr], [struct.net],
   [OVS_DEFINE([HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET])])
+  OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+  [nf_ct_set])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_zones.h],
   [nf_ct_zone_init])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_labels.h],
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 4a1b1ba..df2bd9c 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -501,8 +501,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
-   skb->nfct = &ct->ct_general;
-   skb->nfctinfo = ovs_ct_get_info(h);
+   nf_ct_set(skb, ct, ovs_ct_get_info(h));
return ct;
 }
 
@@ -765,8 +764,7 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
if (skb_nfct(skb))
nf_conntrack_put(skb_nfct(skb));
nf_conntrack_get(&tmpl->ct_general);
-   skb->nfct = &tmpl->ct_general;
-   skb->nfctinfo = IP_CT_NEW;
+   nf_ct_set(skb, tmpl, IP_CT_NEW);
}
 
/* Repeat if requested, see nf_iterate(). */
diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h 
b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
index e02e20b..bb40b0f 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
@@ -14,4 +14,12 @@ static inline bool rpl_nf_ct_get_tuplepr(const struct 
sk_buff *skb,
 #define nf_ct_get_tuplepr rpl_nf_ct_get_tuplepr
 #endif
 
+#ifndef HAVE_NF_CT_SET
+static inline void
+nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info)
+{
+   skb->nfct = &ct->ct_general;
+   skb->nfctinfo = info;
+}
+#endif
 #endif /* _NF_CONNTRACK_WRAPPER_H */
-- 
2.1.4


[ovs-dev] [PATCH 04/21] datapath: Fix comments for skb->_nfct

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit 5e17da634a21b1200853fe82ba67d6571f2beabe
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:52 2017 -0800

openvswitch: Fix comments for skb->_nfct

Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index df2bd9c..e78196a 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -173,7 +173,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
ovs_ct_get_labels(ct, &key->ct.labels);
 }
 
-/* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
+/* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
  * previously sent the packet to conntrack via the ct action.  If
  * 'keep_nat_flags' is true, the existing NAT flags retained, else they are
  * initialized from the connection status.
@@ -462,12 +462,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 
 /* Find an existing connection which this packet belongs to without
  * re-attributing statistics or modifying the connection state.  This allows an
- * skb->nfct lost due to an upcall to be recovered during actions execution.
+ * skb->_nfct lost due to an upcall to be recovered during actions execution.
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->nfct and skb->nfctinfo, and returns the
- * connection.  Returns NULL if there is no existing entry.
+ * On success, populates skb->_nfct and returns the connection.  Returns NULL
+ * if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
@@ -505,7 +505,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return ct;
 }
 
-/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool skb_nfct_cached(struct net *net,
const struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -516,7 +516,7 @@ static bool skb_nfct_cached(struct net *net,
 
ct = nf_ct_get(skb, &ctinfo);
/* If no ct, check if we have evidence that an existing conntrack entry
-* might be found for this skb.  This happens when we lose a skb->nfct
+* might be found for this skb.  This happens when we lose a skb->_nfct
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
@@ -739,7 +739,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
- * Note that if the packet is deemed invalid by conntrack, skb->nfct will be
+ * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
  * set to NULL and 0 will be returned.
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
-- 
2.1.4


[ovs-dev] [PATCH 06/21] datapath: Do not trigger events for unconfirmed connections.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit 193e30967897f3a8b6f9f137ac30571d832c2c5c
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:54 2017 -0800

openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Upstream commit:

commit 2317c6b51e4249dbfa093e1b88cab0a9f0564b7f
Author: Jarno Rajahalme 
Date:   Fri Feb 17 18:11:58 2017 -0800

openvswitch: Set event bit after initializing labels.

Connlabels are included in conntrack netlink event messages only if
the IPCT_LABEL bit is set in the event cache (see
ctnetlink_conntrack_event()).  Set it after initializing labels for a
new connection.

Found upon further system testing, where it was noticed that labels
were missing from the conntrack events.
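
For an unconfirmed entry the diff below copies the labels directly instead of
calling nf_connlabels_replace(). The masked merge it uses can be sketched on
its own. This is a standalone illustration with hypothetical toy_* names, and
it assumes 128-bit labels handled as four 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed label size for this sketch: 128 bits as four 32-bit words. */
#define TOY_LABEL_WORDS 4

/* Overwrites only the bits selected by 'msk' with the corresponding bits of
 * 'lbl'; every bit of 'dst' outside the mask keeps its previous value. */
static inline void
toy_labels_merge(uint32_t dst[TOY_LABEL_WORDS],
                 const uint32_t lbl[TOY_LABEL_WORDS],
                 const uint32_t msk[TOY_LABEL_WORDS])
{
    for (size_t i = 0; i < TOY_LABEL_WORDS; i++) {
        dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
    }
}
```

A zero mask word leaves the destination word untouched, and an all-ones mask
word replaces it entirely, which matches the "keeping any bits we are not
explicitly setting" comment in the patch.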

Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.")
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark")
Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 10a7b91..9595fca 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -261,7 +261,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -278,7 +279,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, &ctinfo);
@@ -294,10 +294,31 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
+   OVS_CT_LABELS_LEN / 
sizeof(u32));
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   const u32 *msk = (const u32 *)mask->ct_labels;
+   const u32 *lbl = (const u32 *)labels->ct_labels;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+
+   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if
+* the IPCT_LABEL bit it set in the event cache.
+*/
+   nf_conntrack_event_cache(IPCT_LABEL, ct);
+   }
 
ovs_ct_get_labels(ct, &key->ct.labels);
return 0;
-- 
2.1.4


[ovs-dev] [PATCH 05/21] datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit 9ff464db50e437eef131f719cc2e9902eea9c607
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:53 2017 -0800

openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing a packet received from
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

This patch also adds a test case to OVS system tests to verify the
behavior.

The following is a more thorough explanation of what is going on:

When we have evidence that an existing conntrack entry could exist, we
must invert the tuple if NAT has already been applied, as the current
packet headers do not match any tuple stored in conntrack.  For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:

Original direction tuple: (X,B)
Reply direction tuple: (B,A)

Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:

Current packet tuple: (B,X)

This does not match either of the conntrack tuples above.  Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost.  When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again.  Prior to this patch this
second lookup failed.  The datapath flow setup corresponding to the
upcall can succeed, however, allowing all further packets in the reply
direction to re-use the conntrack entry pointer in the skb, so
typically the lookup failure only causes a packet drop.

The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:

Inverted tuple: (X,B)

With this the conntrack entry can be found, matching the original
direction tuple.

This same logic also works for the original direction packets:

Current packet tuple (after reverse NAT): (A,B)
Inverted tuple: (B,A)

While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.

Since the inverted tuple matches the reverse direction tuple the
direction of the packet must be reversed as well.
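
The X/A/B walk-through above can be sketched as a toy lookup. This is an
illustration only, not the datapath code: the toy_* names are hypothetical and
a tuple is reduced to a source and a destination character.

```c
#include <stdbool.h>

/* A tuple reduced to source and destination address for the sketch. */
struct toy_tuple {
    char src;
    char dst;
};

/* A conntrack entry stores one tuple per direction. */
struct toy_conn {
    struct toy_tuple orig;
    struct toy_tuple reply;
};

enum toy_dir {
    TOY_DIR_NONE = -1,
    TOY_DIR_ORIGINAL,
    TOY_DIR_REPLY
};

static inline struct toy_tuple
toy_invert(struct toy_tuple t)
{
    struct toy_tuple inv = { t.dst, t.src };
    return inv;
}

static inline bool
toy_tuple_equal(struct toy_tuple a, struct toy_tuple b)
{
    return a.src == b.src && a.dst == b.dst;
}

/* Returns the direction of the packet whose current headers form 't', or
 * TOY_DIR_NONE if no stored tuple matches.  If the packet has already been
 * transformed by NAT, the tuple is inverted before the comparison and the
 * resulting direction is reversed as well. */
static inline enum toy_dir
toy_find_existing(const struct toy_conn *ct, struct toy_tuple t, bool natted)
{
    enum toy_dir matched;

    if (natted) {
        t = toy_invert(t);
    }
    if (toy_tuple_equal(t, ct->orig)) {
        matched = TOY_DIR_ORIGINAL;
    } else if (toy_tuple_equal(t, ct->reply)) {
        matched = TOY_DIR_REPLY;
    } else {
        return TOY_DIR_NONE;
    }
    if (natted) {
        matched = matched == TOY_DIR_ORIGINAL ? TOY_DIR_REPLY
                                              : TOY_DIR_ORIGINAL;
    }
    return matched;
}
```

With an entry holding (X,B) and (B,A), a reply packet already translated to
(B,X) is found only when the NAT flag triggers the inversion, and it is then
classified as reply direction.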

Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c| 24 +--
 tests/system-traffic.at | 52 +
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index e78196a..10a7b91 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -471,7 +471,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -494,6 +494,17 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, &tuple);
if (!h)
@@ -501,6 

[ovs-dev] [PATCH 07/21] datapath: Unionize ovs_key_ct_label with a u32 array.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit cb80d58fae76d8ea93555149b2b16e19b89a1f4f
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:55 2017 -0800

openvswitch: Unionize ovs_key_ct_label with a u32 array.

Make the array of labels in struct ovs_key_ct_label a union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at a time, which is also
the alignment of netlink attributes.
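
As a rough userspace sketch of the idea (struct labels and the LABELS_*
macros are hypothetical names, not the ones used in the patch), the
union lets the same 16 bytes be scanned as four 32-bit words:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABELS_LEN_32 4
#define LABELS_LEN    (LABELS_LEN_32 * sizeof(uint32_t))

/* Hypothetical stand-in for struct ovs_key_ct_labels. */
struct labels {
    union {
        uint8_t  bytes[LABELS_LEN];     /* Byte-wise view. */
        uint32_t words[LABELS_LEN_32];  /* Same storage, 32 bits at a time. */
    };
};

/* Four word comparisons instead of sixteen byte comparisons. */
static bool labels_nonzero(const struct labels *l)
{
    size_t i;

    for (i = 0; i < LABELS_LEN_32; i++)
        if (l->words[i])
            return true;
    return false;
}
```

A label set in any byte is still seen through the word view, since both
arrays cover the same storage.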

Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c  | 15 ---
 datapath/linux/compat/include/linux/openvswitch.h |  8 ++--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 9595fca..a827c6d 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -298,20 +298,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
/* Triggers a change event, which makes sense only for
 * confirmed connections.
 */
-   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
} else {
u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = (const u32 *)mask->ct_labels;
-   const u32 *lbl = (const u32 *)labels->ct_labels;
+   const u32 *msk = mask->ct_labels_32;
+   const u32 *lbl = labels->ct_labels_32;
int i;
 
/* No-one else has access to the non-confirmed entry, copy
 * labels over, keeping any bits we are not explicitly setting.
 */
-   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if
@@ -915,8 +916,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
index 425d3a4..d185860 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -472,9 +472,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
-- 
2.1.4



[ovs-dev] [PATCH 08/21] datapath: Simplify labels length logic.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit b87cec3814ccc7f6afb0a1378ee7e5110d07cdd3
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:56 2017 -0800

openvswitch: Simplify labels length logic.

Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of the conntrack labels extension has been
fixed at 128 bits, so we do not need to check for labels sizes shorter
than 128 bits at run-time.  This patch simplifies the labels length logic
accordingly, but allows the conntrack labels size to be increased in the
future without breaking the build.  In the event of conntrack labels
increasing in size, OVS would still be able to deal with the first 128
label bits.
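
A minimal sketch of this pattern follows.  SRC_LABELS_MAX_SIZE and
KEY_LABELS_LEN are hypothetical stand-ins for NF_CT_LABELS_MAX_SIZE and
OVS_CT_LABELS_LEN, and the source size of 32 bytes is an assumed example
of a larger future extension:

```c
#include <string.h>

/* Hypothetical stand-ins: SRC_LABELS_MAX_SIZE plays the role of
 * NF_CT_LABELS_MAX_SIZE, KEY_LABELS_LEN that of OVS_CT_LABELS_LEN. */
#define SRC_LABELS_MAX_SIZE 32
#define KEY_LABELS_LEN      16

/* Build-time guard: a source smaller than the key would make the
 * fixed-size copy below read out of bounds. */
#if SRC_LABELS_MAX_SIZE < KEY_LABELS_LEN
#error SRC_LABELS_MAX_SIZE must be at least KEY_LABELS_LEN bytes
#endif

/* Copy the first KEY_LABELS_LEN bytes, or zero the key if there is no
 * labels extension.  No run-time length check is needed. */
static void copy_labels(unsigned char dst[KEY_LABELS_LEN],
                        const unsigned char *src_bits)
{
    if (src_bits)
        memcpy(dst, src_bits, KEY_LABELS_LEN);
    else
        memset(dst, 0, KEY_LABELS_LEN);
}
```

If the source grows beyond 16 bytes the build still succeeds and only
the first 128 bits are copied.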

Suggested-by: Joe Stringer 
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index a827c6d..dacf34c 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -145,22 +145,20 @@ static size_t ovs_ct_get_labels_len(struct nf_conn_labels *cl)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = ovs_ct_get_labels_len(cl);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits, OVS_CT_LABELS_LEN);
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
-- 
2.1.4



[ovs-dev] [PATCH 09/21] datapath: Refactor labels initialization.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

Refactoring conntrack labels initialization makes changes in later
patches easier to review.

Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 113 ++-
 1 file changed, 66 insertions(+), 47 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index dacf34c..adc4315 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -243,19 +243,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, &ctinfo);
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -270,56 +263,71 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, &ctinfo);
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+   if (cl && ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
+   return NULL;
+
+   return cl;
+}
+
+/* Initialize labels for a new, yet to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
 
-   if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = mask->ct_labels_32;
-   const u32 *lbl = labels->ct_labels_32;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
+* IPCT_LABEL bit it set in the event cache.
+*/
+   nf_conntrack_event_cache(IPCT_LABEL, ct);
 
-   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if
-* the IPCT_LABEL bit it set in the event cache.
-*/
-   nf_conntrack_event_cache(IPCT_LABEL, ct);
-   }
+   memcpy(&key->ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (

[ovs-dev] [PATCH 10/21] datapath: Inherit master's labels.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit 09aa98ad496d6b11a698b258bc64d7f64c55d682
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:58 2017 -0800

openvswitch: Inherit master's labels.

We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information to be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in the
presence of a master connection.
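
The resulting behavior can be sketched in plain C as below.  This is a
simplified model, not the datapath function: struct label_set and
init_labels() are hypothetical, and the conntrack extension handling and
event caching are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_WORDS 4

/* Hypothetical 128-bit label container. */
struct label_set {
    uint32_t w[LABEL_WORDS];
};

static bool any_bits(const struct label_set *s)
{
    size_t i;

    for (i = 0; i < LABEL_WORDS; i++)
        if (s->w[i])
            return true;
    return false;
}

/* Initialize 'dst' for a new connection.  'master' may be NULL.
 * Returns false if there was nothing to do. */
static bool init_labels(struct label_set *dst, const struct label_set *master,
                        const struct label_set *value,
                        const struct label_set *mask)
{
    bool have_mask = any_bits(mask);
    size_t i;

    if (!master && !have_mask)
        return false;

    /* Default: inherit the master's labels, if any. */
    if (master)
        *dst = *master;

    /* Explicitly set bits override the inherited ones. */
    if (have_mask)
        for (i = 0; i < LABEL_WORDS; i++)
            dst->w[i] = (dst->w[i] & ~mask->w[i])
                        | (value->w[i] & mask->w[i]);
    return true;
}
```

Bits outside the mask keep the inherited value, so a commit with an
all-zero mask on a related connection ends up with exactly the master's
labels.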

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Fixes: a94ebc39996b ("datapath: Add conntrack action")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index adc4315..16a7773 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -80,6 +80,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -286,18 +288,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = ct->master ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
 * IPCT_LABEL bit it set in the event cache.
@@ -957,13 +973,14 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(&info->labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, &info->labels.value,
-&info->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, &info->labels.value,
-   &info->labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, &info->labels.value,
+&info->labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(&info->labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, &info->labels.value,
+   &info->labels.mask);
if (err)
return err;
}
-- 
2.1.4



[ovs-dev] [PATCH 11/21] netlink: Simplify nl_msg_start_nested().

2017-02-23 Thread Jarno Rajahalme
Since there is no data to copy nl_msg_put_unspec_uninit() may be used
directly, rather than via nl_msg_put_unspec().

Signed-off-by: Jarno Rajahalme 
---
 lib/netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/netlink.c b/lib/netlink.c
index ad7d35a..f253f80 100644
--- a/lib/netlink.c
+++ b/lib/netlink.c
@@ -454,7 +454,7 @@ size_t
 nl_msg_start_nested(struct ofpbuf *msg, uint16_t type)
 {
 size_t offset = msg->size;
-nl_msg_put_unspec(msg, type, NULL, 0);
+nl_msg_put_unspec_uninit(msg, type, 0);
 return offset;
 }
 
-- 
2.1.4



[ovs-dev] [PATCH 12/21] lib: Check match and action prerequisities with 'match'.

2017-02-23 Thread Jarno Rajahalme
Supply the match mask to prerequisite checking when available.  This
allows checking for zero-valued matches.  Non-zero valued matches
imply the presence of the corresponding mask bits, but for zero-valued
matches we must explicitly check the mask, too.

This is currently required only for conntrack validity checking, because
the conntrack state has an 'invalid' bit but no 'valid' bit.  One
way to match a valid conntrack state is to match on the 'tracked' bit
being one and the 'invalid' bit being zero.  The latter requires that
the corresponding mask bit be verified.
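
A minimal sketch of such a check, assuming a plain value/mask pair
rather than a struct match; the function name is hypothetical and the
bit values are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch. */
#define CS_INVALID 0x10
#define CS_TRACKED 0x20

/* True if (value, mask) matches only tracked, non-invalid packets.
 * A set value bit implies its mask bit, but a clear value bit is
 * ambiguous: it is either "must be zero" or "wildcarded", and only
 * the mask can tell the two apart. */
static bool matches_valid_ct_state(uint8_t value, uint8_t mask)
{
    /* 'tracked' must be matched as one. */
    if (!(mask & CS_TRACKED) || !(value & CS_TRACKED))
        return false;

    /* 'invalid' must be matched as zero: mask bit set, value bit clear. */
    return (mask & CS_INVALID) && !(value & CS_INVALID);
}
```

With only the flow value available, a wildcarded 'invalid' bit and an
'invalid' bit matched as zero would look the same, which is why the mask
has to be consulted.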

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/meta-flow.h   |  5 ++--
 include/openvswitch/ofp-actions.h |  4 ++--
 lib/bundle.c  |  4 ++--
 lib/bundle.h  |  3 ++-
 lib/learn.c   | 15 ++--
 lib/learn.h   |  3 ++-
 lib/meta-flow.c   | 38 ++---
 lib/multipath.c   |  4 ++--
 lib/multipath.h   |  3 ++-
 lib/nx-match.c| 17 ++---
 lib/nx-match.h|  6 ++---
 lib/ofp-actions.c | 50 ---
 lib/ofp-parse.c   |  2 +-
 lib/ofp-util.c|  2 +-
 ofproto/ofproto-dpif-trace.c  | 13 +-
 ofproto/ofproto.c |  5 ++--
 utilities/ovs-ofctl.c |  6 ++---
 17 files changed, 105 insertions(+), 75 deletions(-)

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 83e2599..aac9945 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -1898,6 +1898,7 @@ void mf_get_mask(const struct mf_field *, const struct flow_wildcards *,
 /* Prerequisites. */
 bool mf_are_prereqs_ok(const struct mf_field *mf, const struct flow *flow,
struct flow_wildcards *wc);
+bool mf_are_match_prereqs_ok(const struct mf_field *, const struct match *);
 
 static inline bool
 mf_is_l3_or_higher(const struct mf_field *mf)
@@ -1959,8 +1960,8 @@ void mf_subfield_swap(const struct mf_subfield *,
   const struct mf_subfield *,
   struct flow *flow, struct flow_wildcards *);
 
-enum ofperr mf_check_src(const struct mf_subfield *, const struct flow *);
-enum ofperr mf_check_dst(const struct mf_subfield *, const struct flow *);
+enum ofperr mf_check_src(const struct mf_subfield *, const struct match *);
+enum ofperr mf_check_dst(const struct mf_subfield *, const struct match *);
 
 /* Parsing and formatting. */
 char *mf_parse(const struct mf_field *, const char *,
diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 88f573d..53d6b44 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -954,11 +954,11 @@ ofpacts_pull_openflow_instructions(struct ofpbuf *openflow,
const struct vl_mff_map *vl_mff_map,
struct ofpbuf *ofpacts);
 enum ofperr ofpacts_check(struct ofpact[], size_t ofpacts_len,
-  struct flow *, ofp_port_t max_ports,
+  struct match *, ofp_port_t max_ports,
   uint8_t table_id, uint8_t n_tables,
   enum ofputil_protocol *usable_protocols);
 enum ofperr ofpacts_check_consistency(struct ofpact[], size_t ofpacts_len,
-  struct flow *, ofp_port_t max_ports,
+  struct match *, ofp_port_t max_ports,
   uint8_t table_id, uint8_t n_tables,
   enum ofputil_protocol usable_protocols);
 enum ofperr ofpact_check_output_port(ofp_port_t port, ofp_port_t max_ports);
diff --git a/lib/bundle.c b/lib/bundle.c
index 70a743b..620318e 100644
--- a/lib/bundle.c
+++ b/lib/bundle.c
@@ -105,13 +105,13 @@ bundle_execute(const struct ofpact_bundle *bundle,
 
 enum ofperr
 bundle_check(const struct ofpact_bundle *bundle, ofp_port_t max_ports,
- const struct flow *flow)
+ const struct match *match)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 size_t i;
 
 if (bundle->dst.field) {
-enum ofperr error = mf_check_dst(&bundle->dst, flow);
+enum ofperr error = mf_check_dst(&bundle->dst, match);
 if (error) {
 return error;
 }
diff --git a/lib/bundle.h b/lib/bundle.h
index f5ce321..48b9b79 100644
--- a/lib/bundle.h
+++ b/lib/bundle.h
@@ -29,6 +29,7 @@
 struct ds;
 struct flow;
 struct flow_wildcards;
+struct match;
 struct ofpact_bundle;
 struct ofpbuf;
 
@@ -43,7 +44,7 @@ ofp_port_t bundle_execute(const struct ofpact_bundle *, const struct flow *,
 bool (*slave_enabled)(ofp_port_t ofp_port, void *aux),
  

[ovs-dev] [PATCH 14/21] flow: Make room after ct_state.

2017-02-23 Thread Jarno Rajahalme
'ct_state' currently only needs 8 bits, so we can make room for a new
CT field introduced in the next patch.
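
The layout effect can be sketched with two hypothetical structs that
model only the neighboring fields (they are not the real struct flow):
narrowing 'ct_state' and adding a one-byte pad keeps every later field
at the same offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the layout before the change. */
struct meta_before {
    uint32_t recirc_id;
    uint16_t ct_state;
    uint16_t ct_zone;
    uint32_t ct_mark;
};

/* Hypothetical excerpt after the change: 'pad0' is the freed byte. */
struct meta_after {
    uint32_t recirc_id;
    uint8_t  ct_state;
    uint8_t  pad0;
    uint16_t ct_zone;
    uint32_t ct_mark;
};

/* Truncate a wider mask to the bits the 8-bit field can hold. */
static uint8_t ct_state_mask8(uint32_t mask)
{
    return (uint8_t)(mask & UINT8_MAX);
}
```

On common ABIs both structs are 12 bytes, with 'ct_zone' at offset 6 in
each.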

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/flow.h | 3 ++-
 lib/flow.c | 3 ++-
 lib/match.c| 8 
 lib/packets.h  | 2 +-
 ofproto/ofproto-dpif.c | 2 +-
 tests/ovs-ofctl.at | 2 +-
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index df80dfe..9169272 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -91,7 +91,8 @@ struct flow {
  * computation is opaque to the user space. */
 union flow_in_port in_port; /* Input port.*/
 uint32_t recirc_id; /* Must be exact match. */
-uint16_t ct_state;  /* Connection tracking state. */
+uint8_t ct_state;   /* Connection tracking state. */
+uint8_t pad0;
 uint16_t ct_zone;   /* Connection tracking zone. */
 uint32_t ct_mark;   /* Connection mark.*/
 uint8_t pad1[4];/* Pad to 64 bits. */
diff --git a/lib/flow.c b/lib/flow.c
index fb7bfeb..0c95b75 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -593,7 +593,8 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
 miniflow_push_uint32(mf, in_port, odp_to_u32(md->in_port.odp_port));
 if (md->recirc_id || md->ct_state) {
 miniflow_push_uint32(mf, recirc_id, md->recirc_id);
-miniflow_push_uint16(mf, ct_state, md->ct_state);
+miniflow_push_uint8(mf, ct_state, md->ct_state);
+miniflow_push_uint8(mf, pad0, 0);
 miniflow_push_uint16(mf, ct_zone, md->ct_zone);
 }
 
diff --git a/lib/match.c b/lib/match.c
index 3fcaec5..882bf0c 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -340,8 +340,8 @@ match_set_ct_state(struct match *match, uint32_t ct_state)
 void
 match_set_ct_state_masked(struct match *match, uint32_t ct_state, uint32_t mask)
 {
-match->flow.ct_state = ct_state & mask & UINT16_MAX;
-match->wc.masks.ct_state = mask & UINT16_MAX;
+match->flow.ct_state = ct_state & mask & UINT8_MAX;
+match->wc.masks.ct_state = mask & UINT8_MAX;
 }
 
 void
@@ -,7 +,7 @@ match_format(const struct match *match, struct ds *s, int priority)
 }
 
 if (wc->masks.ct_state) {
-if (wc->masks.ct_state == UINT16_MAX) {
+if (wc->masks.ct_state == UINT8_MAX) {
 ds_put_format(s, "%sct_state=%s", colors.param, colors.end);
 if (f->ct_state) {
 format_flags(s, ct_state_to_string, f->ct_state, '|');
@@ -1120,7 +1120,7 @@ match_format(const struct match *match, struct ds *s, int priority)
 }
 } else {
 format_flags_masked(s, "ct_state", ct_state_to_string,
-f->ct_state, wc->masks.ct_state, UINT16_MAX);
+f->ct_state, wc->masks.ct_state, UINT8_MAX);
 }
 ds_put_char(s, ',');
 }
diff --git a/lib/packets.h b/lib/packets.h
index c4d3799..f7e1d82 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -99,7 +99,7 @@ struct pkt_metadata {
action. */
 uint32_t skb_priority;  /* Packet priority for QoS. */
 uint32_t pkt_mark;  /* Packet mark. */
-uint16_t ct_state;  /* Connection state. */
+uint8_t  ct_state;  /* Connection state. */
 uint16_t ct_zone;   /* Connection zone. */
 uint32_t ct_mark;   /* Connection mark. */
 ovs_u128 ct_label;  /* Connection label. */
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 4007a3a..7c7201d 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3994,7 +3994,7 @@ check_mask(struct ofproto_dpif *ofproto, const struct miniflow *flow)
 uint32_t ct_mark;
 
 support = &ofproto->backer->support.odp;
-ct_state = MINIFLOW_GET_U16(flow, ct_state);
+ct_state = MINIFLOW_GET_U8(flow, ct_state);
 if (support->ct_state && support->ct_zone && support->ct_mark
 && support->ct_label && support->ct_state_nat) {
 return ct_state & CS_UNSUPPORTED_MASK ? OFPERR_OFPBMC_BAD_MASK : 0;
diff --git a/tests/ovs-ofctl.at b/tests/ovs-ofctl.at
index 7e26735..9c32788 100644
--- a/tests/ovs-ofctl.at
+++ b/tests/ovs-ofctl.at
@@ -1215,7 +1215,7 @@ NXM_NX_REG0(a0e0d050)
 dnl
 dnl When re-serialising, bits 16-31 are wildcarded, because current OVS userspace
 dnl doesn't understand (or store) those bits.
-NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/)
+NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/00ff)
 nx_pull_match() returned error OFPBMC_BAD_VALUE
 NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/0020)
 NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/00f0)
-- 
2.1.4



[ovs-dev] [PATCH 13/21] datapath: Add original direction conntrack tuple to sw_flow_key.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:59 2017 -0800

openvswitch: Add original direction conntrack tuple to sw_flow_key.

Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections' reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using original
direction 5-tuple matching, the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/actions.c|  2 +
 datapath/conntrack.c  | 86 +--
 datapath/conntrack.h  | 10 ++-
 datapath/flow.c   | 34 +++--
 datapath/flow.h   | 49 ++---
 datapath/flow_netlink.c   | 85 --
 datapath/flow_netlink.h   |  7 +-
 datapath/linux/compat/include/linux/openvswitch.h | 18 +
 8 files changed, 246 insertions(+), 45 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 82833d0..71ec14c 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -1011,6 +1011,8 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
case OVS_KEY_ATTR_CT_LABELS:
+   case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4:
+   case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6:
err = -EINVAL;
break;
}
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 16a7773..d8309c9 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c

[ovs-dev] [PATCH 17/21] compat: nf_ct_delete compat.

2017-02-23 Thread Jarno Rajahalme
Upstream commit:

commit f330a7fdbe1611104622faff7e614a246a7d20f0
Author: Florian Westphal 
Date:   Thu Aug 25 15:33:31 2016 +0200

netfilter: conntrack: get rid of conntrack timer

With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.

Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de657f8, 'timers: Switch to
a non-cascading wheel').

Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.

During conntrack lookup, even before doing tuple comparison, check
the timeout value and evict the entry in case it is too old.

The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.

Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.

This is the standard/expected way when conntrack entries are destroyed.

Followup patches will introduce garbage collection via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.

Signed-off-by: Florian Westphal 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 

Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
conntrack timer") changes the way nf_ct_delete() is called.  Prior to
that commit, the call pattern was like this:

   if (del_timer(&ct->timeout))
   nf_ct_delete(ct, ...);

After this change nf_ct_delete() is called directly:

   nf_ct_delete(ct, ...);

This patch provides a replacement implementation for nf_ct_delete()
that first calls del_timer().  This replacement is only used if
struct nf_conn has a member 'timeout' of type 'struct timer_list'.

The following patch introduces the first caller to nf_ct_delete() in
the OVS kernel module.

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4   |  4 
 .../linux/compat/include/net/netfilter/nf_conntrack_core.h | 10 ++
 2 files changed, 14 insertions(+)

diff --git a/acinclude.m4 b/acinclude.m4
index 926ec8a..b73eff1 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -523,6 +523,10 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netfilter_ipv6.h], [nf_ipv6_ops],
                         [fragment.*sock], [OVS_DEFINE([HAVE_NF_IPV6_OPS_FRAGMENT])])
 
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+[nf_conn], [struct timer_list[[ \t]]*timeout],
+[OVS_DEFINE([HAVE_NF_CONN_TIMER])])
+
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
   [nf_ct_tmpl_alloc], [nf_conntrack_zone],
   [OVS_DEFINE([HAVE_NF_CT_TMPL_ALLOC_TAKES_STRUCT_ZONE])])
diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
index 09a53c3..a84a477 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
@@ -67,4 +67,14 @@ static inline bool rpl_nf_ct_get_tuple(const struct sk_buff *skb,
 #define nf_ct_get_tuple rpl_nf_ct_get_tuple
 #endif /* HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET */
 
+#ifdef HAVE_NF_CONN_TIMER
+static inline bool rpl_nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
+{
+   if (del_timer(&ct->timeout))
+   return nf_ct_delete(ct, portid, report);
+   return false;
+}
+#define nf_ct_delete rpl_nf_ct_delete
+#endif /* HAVE_NF_CONN_TIMER */
+
 #endif /* _NF_CONNTRACK_CORE_WRAPPER_H */
-- 
2.1.4



[ovs-dev] [PATCH 16/21] actions: Add resubmit with conntrack tuple.

2017-02-23 Thread Jarno Rajahalme
Add a resubmit option that uses the conntrack original direction tuple
in place of the corresponding packet header fields during the lookup.
This could allow the same ACL table to be used for admitting return
and/or related traffic as is used for admitting the original direction
traffic.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h |   4 +-
 lib/ofp-actions.c |  82 +++--
 ofproto/ofproto-dpif-xlate.c  |  68 ++---
 tests/ofp-actions.at  |   6 ++
 tests/ofproto-dpif.at |  89 +--
 tests/system-traffic.at   | 122 --
 utilities/ovs-ofctl.8.in  |  19 +-
 7 files changed, 310 insertions(+), 80 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 53d6b44..5ea0763 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -640,11 +640,13 @@ struct ofpact_nat {
 
 /* OFPACT_RESUBMIT.
  *
- * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE. */
+ * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, NXAST_RESUBMIT_TABLE_CT. */
 struct ofpact_resubmit {
 struct ofpact ofpact;
 ofp_port_t in_port;
 uint8_t table_id;
+bool with_ct_orig;   /* Resubmit with Conntrack original direction tuple
+  * fields in place of IP header fields. */
 };
 
 /* Bits for 'flags' in struct nx_action_learn.
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 2869e0f..4d35a77 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -265,6 +265,8 @@ enum ofp_raw_action_type {
 NXAST_RAW_RESUBMIT,
 /* NX1.0+(14): struct nx_action_resubmit. */
 NXAST_RAW_RESUBMIT_TABLE,
+/* NX1.0+(44): struct nx_action_resubmit. */
+NXAST_RAW_RESUBMIT_TABLE_CT,
 
 /* NX1.0+(2): uint32_t. */
 NXAST_RAW_SET_TUNNEL,
@@ -3850,19 +3852,20 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, 
struct ds *s)
 ds_put_format(s, "%s)%s", colors.paren, colors.end);
 }
 
-/* Action structures for NXAST_RESUBMIT and NXAST_RESUBMIT_TABLE.
+/* Action structures for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, and
+ * NXAST_RESUBMIT_TABLE_CT.
  *
  * These actions search one of the switch's flow tables:
  *
- *- For NXAST_RESUBMIT_TABLE only, if the 'table' member is not 255, then
- *  it specifies the table to search.
+ *- For NXAST_RESUBMIT_TABLE and NXAST_RESUBMIT_TABLE_CT only, if the
+ *  'table' member is not 255, then it specifies the table to search.
  *
- *- Otherwise (for NXAST_RESUBMIT_TABLE with a 'table' of 255, or for
- *  NXAST_RESUBMIT regardless of 'table'), it searches the current flow
- *  table, that is, the OpenFlow flow table that contains the flow from
- *  which this action was obtained.  If this action did not come from a
- *  flow table (e.g. it came from an OFPT_PACKET_OUT message), then table 0
- *  is the current table.
+ *- Otherwise (for NXAST_RESUBMIT_TABLE or NXAST_RESUBMIT_TABLE_CT with a
+ *  'table' of 255, or for NXAST_RESUBMIT regardless of 'table'), it
+ *  searches the current flow table, that is, the OpenFlow flow table that
+ *  contains the flow from which this action was obtained.  If this action
+ *  did not come from a flow table (e.g. it came from an OFPT_PACKET_OUT
+ *  message), then table 0 is the current table.
  *
  * The flow table lookup uses a flow that may be slightly modified from the
  * original lookup:
@@ -3870,9 +3873,12 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, 
struct ds *s)
  *- For NXAST_RESUBMIT, the 'in_port' member of struct nx_action_resubmit
  *  is used as the flow's in_port.
  *
- *- For NXAST_RESUBMIT_TABLE, if the 'in_port' member is not OFPP_IN_PORT,
- *  then its value is used as the flow's in_port.  Otherwise, the original
- *  in_port is used.
+ *- For NXAST_RESUBMIT_TABLE and NXAST_RESUBMIT_TABLE_CT, if the 'in_port'
+ *  member is not OFPP_IN_PORT, then its value is used as the flow's
+ *  in_port.  Otherwise, the original in_port is used.
+ *
+ *- For NXAST_RESUBMIT_TABLE_CT the Conntrack 5-tuple fields are used as
+ *  the packet's IP header fields during the lookup.
  *
  *- If actions that modify the flow (e.g. OFPAT_SET_VLAN_VID) precede the
  *  resubmit action, then the flow is updated with the new values.
@@ -3905,11 +3911,12 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, 
struct ds *s)
  *  a total limit of 4,096 resubmits per flow translation (earlier versions
  *  did not impose any total limit).
  *
- * NXAST_RESUBMIT ignores 'table' and 'pad'.  NXAST_RESUBMIT_TABLE requires
- * 'pad' to be all-bits-zero.
+ * NXAST_RESUBMIT ignores 'table' and '

[ovs-dev] [PATCH 18/21] datapath: Add force commit.

2017-02-23 Thread Jarno Rajahalme
Upstream patch:

commit dd41d33f0b033885211a5d6f3ee19e73238aa9ee
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:22:00 2017 -0800

openvswitch: Add force commit.

Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
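
The decision described above can be sketched as a small pure function.
This is only an illustration under stated assumptions, not the datapath
code: 'enum ct_dir', 'struct ct_lookup' and ct_commit_decide() are
hypothetical names.

```c
#include <stdbool.h>

/* Direction of the current packet relative to an existing conntrack entry. */
enum ct_dir {
    CT_DIR_ORIGINAL,
    CT_DIR_REPLY
};

enum ct_commit_result {
    CT_CREATE_NEW,      /* No entry exists: create one for this packet. */
    CT_KEEP_EXISTING,   /* Existing entry is used as is. */
    CT_REPLACE          /* Delete the existing entry, then create a new one
                         * whose original direction is this packet's. */
};

/* Hypothetical lookup result. */
struct ct_lookup {
    bool found;             /* An entry matched the packet. */
    enum ct_dir pkt_dir;    /* Valid only if 'found'. */
};

/* Decides what a commit should do.  With 'force' set, an entry that the
 * packet matched in the reply direction is replaced.  Without 'force' the
 * existing entry is kept regardless of direction. */
enum ct_commit_result
ct_commit_decide(const struct ct_lookup *lookup, bool force)
{
    if (!lookup->found) {
        return CT_CREATE_NEW;
    }
    if (force && lookup->pkt_dir == CT_DIR_REPLY) {
        return CT_REPLACE;
    }
    return CT_KEEP_EXISTING;
}
```

The sketch keeps the decision separate from the deletion itself, which
makes the three outcomes easy to check in isolation.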

Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c  | 26 +--
 datapath/linux/compat/include/linux/openvswitch.h |  5 +
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index d8309c9..041a557 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -72,6 +72,7 @@ struct ovs_conntrack_info {
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
u8 random_fully_compat : 1; /* bool */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -658,10 +659,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, &ctinfo);
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(&ct->ct_net)))
@@ -675,6 +679,18 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(&ct->ct_general);
+   nf_ct_set(skb, NULL, 0);
+   return false;
+   }
 
return true;
 }
@@ -1259,6 +1275,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1298,6 +1315,9 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info->force = true;
+   /* fall through. */
case OVS_CT_ATTR_COMMIT:
info->commit = true;
break;
@@ -1528,7 +1548,9 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info 
*ct_info,
if (!start)
return -EMSGSIZE;
 
-   if (ct_info->commit && nla_put_flag(skb, OVS_CT_ATTR_COMMIT))
+   if (ct_info->commit && nla_put_flag(skb, ct_info->force
+   ? OVS_CT_ATTR_FORCE_COMMIT
+   : OVS_CT_ATTR_COMMIT))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_

[ovs-dev] [PATCH 20/21] datapath: Add a missing comment.

2017-02-23 Thread Jarno Rajahalme
Make openvswitch.h better match upstream by adding a missing comment.

Signed-off-by: Jarno Rajahalme 
---
 datapath/linux/compat/include/linux/openvswitch.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2fd0963..86abc96 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -718,6 +718,8 @@ struct ovs_action_push_tnl {
  * mask. For each bit set in the mask, the corresponding bit in the value is
  * copied to the connection tracking label field in the connection.
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
+ * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
+ * translation (NAT) on the packet.
  * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
  * nothing if the connection is already committed will check that the current
  * packet is in conntrack entry's original direction.  If directionality does
-- 
2.1.4



[ovs-dev] [PATCH 19/21] conntrack: Force commit.

2017-02-23 Thread Jarno Rajahalme
Userspace support for force commit.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h |  7 +++-
 lib/conntrack.c   | 16 ++--
 lib/conntrack.h   |  2 +-
 lib/dpif-netdev.c |  8 +++-
 lib/odp-util.c| 20 -
 lib/ofp-actions.c | 29 +++--
 ofproto/ofproto-dpif-xlate.c  |  3 +-
 tests/odp.at  | 16 
 tests/ofp-actions.at  | 10 +
 tests/ofproto-dpif.at | 85 +++
 tests/system-traffic.at   | 53 
 tests/test-conntrack.c|  9 +++--
 utilities/ovs-ofctl.8.in  | 12 ++
 13 files changed, 253 insertions(+), 17 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 5ea0763..622dd7a 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -556,9 +556,14 @@ ofpact_nest_get_action_len(const struct ofpact_nest *on)
 /* Bits for 'flags' in struct nx_action_conntrack.
  *
  * If NX_CT_F_COMMIT is set, then the connection entry is moved from the
- * unconfirmed to confirmed list in the tracker. */
+ * unconfirmed to confirmed list in the tracker.
+ * If NX_CT_F_FORCE is set, in addition to NX_CT_F_COMMIT, then the conntrack
+ * entry is replaced with a new one in case the original direction of the
+ * existing entry is opposite of the current packet direction.
+ */
 enum nx_conntrack_flags {
 NX_CT_F_COMMIT = 1 << 0,
+NX_CT_F_FORCE  = 1 << 1,
 };
 
 /* Magic value for struct nx_action_conntrack 'recirc_table' field, to specify
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1b66c8d..0b05be4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -239,12 +239,21 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
 static struct conn *
 process_one(struct conntrack *ct, struct dp_packet *pkt,
 struct conn_lookup_ctx *ctx, uint16_t zone,
-bool commit, long long now)
+bool force, bool commit, long long now)
 {
 unsigned bucket = hash_to_bucket(ctx->hash);
 struct conn *conn = ctx->conn;
 uint16_t state = 0;
 
+/* Delete found entry if in wrong direction. 'force' implies commit. */
+if (conn && force && ctx->reply) {
+ovs_list_remove(&conn->exp_node);
+hmap_remove(&ct->buckets[bucket].connections, &conn->node);
+atomic_count_dec(&ct->n_conn);
+delete_conn(conn);
+conn = NULL;
+}
+
 if (conn) {
 if (ctx->related) {
 state |= CS_RELATED;
@@ -301,7 +310,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
  * 'setlabel' behaves similarly for the connection label.*/
 int
 conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
-  ovs_be16 dl_type, bool commit, uint16_t zone,
+  ovs_be16 dl_type, bool force, bool commit, uint16_t zone,
   const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper)
@@ -364,7 +373,8 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 
 conn_key_lookup(ctb, &ctxs[j], now);
 
-conn = process_one(ct, pkts[j], &ctxs[j], zone, commit, now);
+conn = process_one(ct, pkts[j], &ctxs[j], zone, force, commit,
+   now);
 
 if (conn && setmark) {
 set_mark(pkts[j], conn, setmark[0], setmark[1]);
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 254f61c..0437cd3 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -65,7 +65,7 @@ void conntrack_init(struct conntrack *);
 void conntrack_destroy(struct conntrack *);
 
 int conntrack_execute(struct conntrack *, struct dp_packet_batch *,
-  ovs_be16 dl_type, bool commit,
+  ovs_be16 dl_type, bool force, bool commit,
   uint16_t zone, const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 30907b7..de844d3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4734,6 +4734,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 
 case OVS_ACTION_ATTR_CT: {
 const struct nlattr *b;
+bool force = false;
 bool commit = false;
 unsigned int left;
 uint16_t zone = 0;
@@ -4746,6 +4747,9 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 enum ovs_ct_attr sub_type = nl_attr_type(b);
 
 switch(sub_type) {
+case OVS_CT_ATTR_FORCE_COMMIT:
+force = true;
+/* fall through. */
 

[ovs-dev] [PATCH 15/21] odp: Support conntrack orig tuple key.

2017-02-23 Thread Jarno Rajahalme
Userspace support for datapath original direction conntrack tuple.

Signed-off-by: Jarno Rajahalme 
---
 build-aux/extract-ofp-fields|   3 +
 include/openvswitch/flow.h  |  14 ++-
 include/openvswitch/match.h |  16 +++
 include/openvswitch/meta-flow.h | 136 +
 lib/conntrack.c |  43 ++--
 lib/flow.c  | 220 
 lib/flow.h  |  50 +
 lib/match.c | 110 +++-
 lib/meta-flow.c | 157 +++-
 lib/meta-flow.xml   |  92 +
 lib/nx-match.c  |  40 ++--
 lib/nx-match.h  |   4 +-
 lib/odp-execute.c   |   4 +
 lib/odp-util.c  | 124 ++
 lib/odp-util.h  |   8 +-
 lib/ofp-util.c  |   7 +-
 lib/packets.h   |   5 +
 ofproto/ofproto-dpif-rid.h  |   2 +-
 ofproto/ofproto-dpif-sflow.c|   2 +
 ofproto/ofproto-dpif-xlate.c|  13 ++-
 ofproto/ofproto-dpif.c  |   2 +
 tests/odp.at|   2 +-
 tests/ofproto-dpif.at   |  30 +++---
 tests/ofproto.at|   7 ++
 tests/system-traffic.at | 142 --
 25 files changed, 1114 insertions(+), 119 deletions(-)

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 498b887..a26d558 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -44,6 +44,9 @@ PREREQS = {"none": "MFP_NONE",
"IPv4": "MFP_IPV4",
"IPv6": "MFP_IPV6",
"IPv4/IPv6": "MFP_IP_ANY",
+   "CT": "MFP_CT_VALID",
+   "CTv4": "MFP_CTV4_VALID",
+   "CTv6": "MFP_CTV6_VALID",
"MPLS": "MFP_MPLS",
"TCP": "MFP_TCP",
"UDP": "MFP_UDP",
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 9169272..68399b8 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -23,7 +23,7 @@
 /* This sequence number should be incremented whenever anything involving flows
  * or the wildcarding of flows changes.  This will cause build assertion
  * failures in places which likely need to be updated. */
-#define FLOW_WC_SEQ 36
+#define FLOW_WC_SEQ 37
 
 /* Number of Open vSwitch extension 32-bit registers. */
 #define FLOW_N_REGS 16
@@ -92,7 +92,7 @@ struct flow {
 union flow_in_port in_port; /* Input port.*/
 uint32_t recirc_id; /* Must be exact match. */
 uint8_t ct_state;   /* Connection tracking state. */
-uint8_t pad0;
+uint8_t ct_nw_proto;/* CT orig tuple IP protocol. */
 uint16_t ct_zone;   /* Connection tracking zone. */
 uint32_t ct_mark;   /* Connection mark.*/
 uint8_t pad1[4];/* Pad to 64 bits. */
@@ -110,8 +110,12 @@ struct flow {
 /* L3 (64-bit aligned) */
 ovs_be32 nw_src;/* IPv4 source address or ARP SPA. */
 ovs_be32 nw_dst;/* IPv4 destination address or ARP TPA. */
+ovs_be32 ct_nw_src; /* CT orig tuple IPv4 source address. */
+ovs_be32 ct_nw_dst; /* CT orig tuple IPv4 destination address. */
 struct in6_addr ipv6_src;   /* IPv6 source address. */
 struct in6_addr ipv6_dst;   /* IPv6 destination address. */
+struct in6_addr ct_ipv6_src; /* CT orig tuple IPv6 source address. */
+struct in6_addr ct_ipv6_dst; /* CT orig tuple IPv6 destination address. */
 ovs_be32 ipv6_label;/* IPv6 flow label. */
 uint8_t nw_frag;/* FLOW_FRAG_* flags. */
 uint8_t nw_tos; /* IP ToS (including DSCP and ECN). */
@@ -126,6 +130,8 @@ struct flow {
 /* L4 (64-bit aligned) */
 ovs_be16 tp_src;/* TCP/UDP/SCTP source port/ICMP type. */
 ovs_be16 tp_dst;/* TCP/UDP/SCTP destination port/ICMP code. */
+ovs_be16 ct_tp_src; /* CT original tuple source port/ICMP type. */
+ovs_be16 ct_tp_dst; /* CT original tuple dst port/ICMP code. */
 ovs_be32 igmp_group_ip4;/* IGMP group IPv4 address.
  * Keep last for BUILD_ASSERT_DECL below. */
 };
@@ -136,8 +142,8 @@ BUILD_ASSERT_DECL(sizeof(struct flow_tnl) % 
sizeof(uint64_t) == 0);
 
 /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
 BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
-  == sizeof(struct flow_tnl) + 248
-  && FLOW_WC_SEQ == 36);
+  == sizeof(struct flow_tnl) + 292
+  && FLOW_WC_SEQ == 37);
 
 /* Incremental points at which flow classification may be performed i

[ovs-dev] [PATCH 21/21] tests: Add an FTP test without conntrack.

2017-02-23 Thread Jarno Rajahalme
If FTP tests with conntrack fail, it is informative to know if the
problem is with the FTP client and/or server, or with conntrack
itself.

Signed-off-by: Jarno Rajahalme 
---
 tests/system-traffic.at | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index ac9f989..1cc41b7 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2040,6 +2040,35 @@ 
tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([FTP - no conntrack])
+AT_SKIP_IF([test $HAVE_PYFTPDLIB = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+table=0,action=normal
+])
+
+AT_CHECK([ovs-ofctl --bundle replace-flows br0 flows.txt])
+
+NETNS_DAEMONIZE([at_ns0], [[$PYTHON $srcdir/test-l7.py ftp]], [ftp1.pid])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py ftp]], [ftp0.pid])
+OVS_WAIT_UNTIL([ip netns exec at_ns1 netstat -l | grep ftp])
+
+dnl FTP requests from p0->p1 should work fine.
+NS_CHECK_EXEC([at_ns0], [wget ftp://10.1.1.2 --no-passive-ftp -t 3 -T 1 
--retry-connrefused -v -o wget0.log])
+
+AT_CHECK([find -name index.html], [0], [dnl
+./index.html
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - FTP])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
-- 
2.1.4



[ovs-dev] [PATCH] FAQ: Update kernel support info.

2017-02-27 Thread Jarno Rajahalme
OVS 2.7 works with Linux kernels 3.10-4.9.

Signed-off-by: Jarno Rajahalme 
---
 Documentation/faq/releases.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 319c2d7..118c88d 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -64,6 +64,7 @@ Q: What Linux kernel versions does each Open vSwitch release 
work with?
 2.4.x2.6.32 to 4.0
 2.5.x2.6.32 to 4.3
 2.6.x3.10 to 4.7
+2.7.x3.10 to 4.9
  ==
 
 Open vSwitch userspace should also work with the Linux kernel module built
-- 
2.1.4



Re: [ovs-dev] [PATCH] FAQ: Update kernel support info.

2017-02-27 Thread Jarno Rajahalme
Thanks,

Pushed to master and cherry-picked to branch-2.7.

  Jarno

> On Feb 27, 2017, at 5:49 PM, Justin Pettit  wrote:
> 
> 
>> On Feb 27, 2017, at 5:45 PM, Jarno Rajahalme  wrote:
>> 
>> OVS 2.7 works with Linux kernels 3.10-4.9.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Justin Pettit 
> 
> --Justin
> 
> 
> 
> 



Re: [ovs-dev] [PATCH] FAQ: Update kernel support info.

2017-02-27 Thread Jarno Rajahalme
Forgot to add your ack to commit, sorry!

 Jarno

> On Feb 27, 2017, at 5:49 PM, Justin Pettit  wrote:
> 
> 
>> On Feb 27, 2017, at 5:45 PM, Jarno Rajahalme  wrote:
>> 
>> OVS 2.7 works with Linux kernels 3.10-4.9.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Justin Pettit 
> 
> --Justin
> 
> 
> 
> 



Re: [ovs-dev] Can we use ovs to implement load balancer by openflow?

2017-02-28 Thread Jarno Rajahalme

> On Feb 27, 2017, at 10:42 PM, Han Zhou  wrote:
> 
> On Mon, Feb 27, 2017 at 7:42 PM, Yang, Yi Y  wrote:
>> 
>> Hi, all
>> 
>> Can we use ovs to implement load balancer? Our target is to let ovs
> distribute the traffic to different service VMs based on 5 tuple (src ip,
> dst ip, src port, dst port, transport protocol).
> 
> Yes. Try: man ovs-ofctl, look for keyword "group" for details.

I would add that you’d want to use the “selection_method=dp_hash” option for 
the select group for performance reasons.

  Jarno



[ovs-dev] [PATCH v2 00/22] Conntrack enhancements.

2017-02-28 Thread Jarno Rajahalme
This patch set backports the recent upstream conntrack fixes and new
features to the OVS tree kernel module, and adds the OVS userspace
support.

Patch 1/22 is an unrelated datapath backport, and patch 22/22 allows
compiling against Linux 4.10.

Each new feature is introduced in two separate commits: the first is
the datapath backport, and the second adds the corresponding userspace
datapath and non-datapath functionality, including OVS system tests.  In
one instance I have squashed the system test with the datapath backport.
Compilation would fail after the first patch due to missing userspace
code for new enums.  We may decide to squash the datapath and userspace
changes together for the merge, but for now the review should be more
straightforward with the separation.

System tests have been most recently run on Linux 3.16, on which the
geneve tests fail, but that should have nothing to do with this
series.

v2:
 - Fix compilation on Linux 3.10 and 3.11
 - Fix 32-bit compiles.

Florian Westphal (2):
  datapath: add and use skb_nfct helper
  datapath: add and use nf_ct_set helper

Jarno Rajahalme (19):
  datapath: Fix comments for skb->_nfct
  datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.
  datapath: Do not trigger events for unconfirmed connections.
  datapath: Unionize ovs_key_ct_label with a u32 array.
  datapath: Simplify labels length logic.
  datapath: Refactor labels initialization.
  datapath: Inherit master's labels.
  netlink: Simplify nl_msg_start_nested().
  lib: Check match and action prerequisities with 'match'.
  datapath: Add original direction conntrack tuple to sw_flow_key.
  flow: Make room after ct_state.
  odp: Support conntrack orig tuple key.
  actions: Add resubmit with conntrack tuple.
  compat: nf_ct_delete compat.
  datapath: Add force commit.
  conntrack: Force commit.
  datapath: Add a missing comment.
  tests: Add an FTP test without conntrack.
  datapath: Allow compiling against Linux 4.10

stephen hemminger (1):
  datapath: make ndo_get_stats64 a void function

 acinclude.m4   |  13 +-
 build-aux/extract-ofp-fields   |   3 +
 datapath/actions.c |   2 +
 datapath/conntrack.c   | 292 +
 datapath/conntrack.h   |  10 +-
 datapath/flow.c|  34 +-
 datapath/flow.h|  49 ++-
 datapath/flow_netlink.c|  85 +++--
 datapath/flow_netlink.h|   7 +-
 datapath/linux/compat/include/linux/openvswitch.h  |  33 +-
 datapath/linux/compat/include/linux/skbuff.h   |  11 +
 .../compat/include/net/netfilter/nf_conntrack.h|   8 +
 .../include/net/netfilter/nf_conntrack_core.h  |  37 +++
 datapath/vport-internal_dev.c  |   6 +-
 include/openvswitch/flow.h |  16 +-
 include/openvswitch/match.h|  16 +
 include/openvswitch/meta-flow.h| 141 +++-
 include/openvswitch/ofp-actions.h  |  15 +-
 lib/bundle.c   |   4 +-
 lib/bundle.h   |   3 +-
 lib/conntrack.c|  59 +++-
 lib/conntrack.h|   2 +-
 lib/dpif-netdev.c  |   8 +-
 lib/flow.c | 221 +
 lib/flow.h |  50 +++
 lib/learn.c|  15 +-
 lib/learn.h|   3 +-
 lib/match.c| 118 ++-
 lib/meta-flow.c| 193 ++-
 lib/meta-flow.xml  |  92 ++
 lib/multipath.c|   4 +-
 lib/multipath.h|   3 +-
 lib/netlink.c  |   2 +-
 lib/nx-match.c |  55 +++-
 lib/nx-match.h |  10 +-
 lib/odp-execute.c  |   4 +
 lib/odp-util.c | 144 -
 lib/odp-util.h |   8 +-
 lib/ofp-actions.c  | 161 +++---
 lib/ofp-parse.c|   2 +-
 lib/ofp-util.c |   9 +-
 lib/packets.h  |   7 +-
 ofproto/ofproto-dpif-rid.h |   2 +-
 ofproto/ofproto-dpif-sflow.c   |   2 +
 ofproto/ofproto-dpif-trace.c   |  13 +-
 ofproto/ofproto-dpif-xlate.c   |  84 -
 ofproto/ofproto-dpif.c |   4 +-

[ovs-dev] [PATCH v2 01/22] datapath: make ndo_get_stats64 a void function

2017-02-28 Thread Jarno Rajahalme
From: stephen hemminger 

Upstream commit:

commit bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221
Author: stephen hemminger 
Date:   Fri Jan 6 19:12:52 2017 -0800

net: make ndo_get_stats64 a void function

The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger 
Signed-off-by: David S. Miller 

This seems to be fine for all prior Linux versions as well.

Signed-off-by: Jarno Rajahalme 
---
 datapath/vport-internal_dev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c
index cc01c9c..fec1331 100644
--- a/datapath/vport-internal_dev.c
+++ b/datapath/vport-internal_dev.c
@@ -106,7 +106,7 @@ static void internal_dev_destructor(struct net_device *dev)
free_netdev(dev);
 }
 
-static struct rtnl_link_stats64 *
+static void
 internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
int i;
@@ -134,8 +134,6 @@ internal_get_stats(struct net_device *dev, struct 
rtnl_link_stats64 *stats)
stats->tx_bytes += local_stats.tx_bytes;
stats->tx_packets   += local_stats.tx_packets;
}
-
-   return stats;
 }
 
 #ifdef HAVE_IFF_PHONY_HEADROOM
@@ -151,7 +149,7 @@ static const struct net_device_ops internal_dev_netdev_ops 
= {
.ndo_start_xmit = internal_dev_xmit,
.ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = internal_dev_change_mtu,
-   .ndo_get_stats64 = internal_get_stats,
+   .ndo_get_stats64 = (void *)internal_get_stats,
 #ifdef HAVE_IFF_PHONY_HEADROOM
 #ifndef HAVE_NET_DEVICE_OPS_WITH_EXTENDED
.ndo_set_rx_headroom = internal_set_rx_headroom,
-- 
2.1.4



[ovs-dev] [PATCH v2 02/22] datapath: add and use skb_nfct helper

2017-02-28 Thread Jarno Rajahalme
From: Florian Westphal 

Upstream commit:

commit cb9c68363efb6d1f950ec55fb06e031ee70db5fc
Author: Florian Westphal 
Date:   Mon Jan 23 18:21:56 2017 +0100

skbuff: add and use skb_nfct helper

Followup patch renames skb->nfct and changes its type so add a helper to
avoid intrusive rename change later.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4 |  1 +
 datapath/conntrack.c |  6 +++---
 datapath/linux/compat/include/linux/skbuff.h | 11 +++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index e8b64b5..f26bcc1 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -604,6 +604,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_clear_hash_if_not_l4])
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_postpush_rcsum])
   OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [lco_csum])
+  OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_nfct])
 
   OVS_GREP_IFELSE([$KSRC/include/linux/types.h], [bool],
   [OVS_DEFINE([HAVE_BOOL_TYPE])])
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 3c51ce6..4a1b1ba 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -762,8 +762,8 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 
/* Associate skb with specified zone. */
if (tmpl) {
-   if (skb->nfct)
-   nf_conntrack_put(skb->nfct);
+   if (skb_nfct(skb))
+   nf_conntrack_put(skb_nfct(skb));
nf_conntrack_get(&tmpl->ct_general);
skb->nfct = &tmpl->ct_general;
skb->nfctinfo = IP_CT_NEW;
@@ -864,7 +864,7 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
 
-   ct = (struct nf_conn *)skb->nfct;
+   ct = (struct nf_conn *)skb_nfct(skb);
if (ct)
nf_ct_deliver_cached_events(ct);
}
diff --git a/datapath/linux/compat/include/linux/skbuff.h 
b/datapath/linux/compat/include/linux/skbuff.h
index a2cbd78..943d5f8 100644
--- a/datapath/linux/compat/include/linux/skbuff.h
+++ b/datapath/linux/compat/include/linux/skbuff.h
@@ -371,4 +371,15 @@ static inline __wsum lco_csum(struct sk_buff *skb)
return csum_partial(l4_hdr, csum_start - l4_hdr, partial);
 }
 #endif
+
+#ifndef HAVE_SKB_NFCT
+static inline struct nf_conntrack *skb_nfct(const struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   return skb->nfct;
+#else
+   return NULL;
+#endif
+}
+#endif
 #endif
-- 
2.1.4



[ovs-dev] [PATCH v2 03/22] datapath: add and use nf_ct_set helper

2017-02-28 Thread Jarno Rajahalme
From: Florian Westphal 

Upstream commit:

commit c74454fadd5ea6fc866ffe2c417a0dba56b2bf1c
Author: Florian Westphal 
Date:   Mon Jan 23 18:21:57 2017 +0100

netfilter: add and use nf_ct_set helper

Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
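
For readers unfamiliar with why a set/get helper pair eases such a
merge, here is a minimal userspace sketch. It assumes the merged field
stores the info bits in the low bits of an aligned pointer, and every
name in it (struct pkt, struct conn, pkt_ct_set(), pkt_ct(),
pkt_ctinfo()) is hypothetical rather than kernel API.

```c
#include <stdint.h>

#define CT_INFO_MASK ((uintptr_t) 7)    /* Low three bits carry the info. */
#define CT_PTR_MASK (~CT_INFO_MASK)

/* Stand-in for a connection object.  The 8-byte alignment keeps the low
 * three pointer bits zero, which frees them for the info value. */
struct conn {
    _Alignas(8) long refcnt;
};

/* Stand-in for a packet with one merged field instead of a pointer plus a
 * separate info field. */
struct pkt {
    uintptr_t _ct;
};

/* Callers go through these helpers and never touch '_ct' directly, so the
 * storage layout can change without touching the callers. */
void
pkt_ct_set(struct pkt *p, struct conn *ct, unsigned int info)
{
    p->_ct = (uintptr_t) ct | (info & CT_INFO_MASK);
}

struct conn *
pkt_ct(const struct pkt *p)
{
    return (struct conn *) (p->_ct & CT_PTR_MASK);
}

unsigned int
pkt_ctinfo(const struct pkt *p)
{
    return (unsigned int) (p->_ct & CT_INFO_MASK);
}
```

Code written against the helpers compiles unchanged whether the packet
has two fields or one merged field, which is the point of introducing
the helper before the layout change.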

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4   | 2 ++
 datapath/conntrack.c   | 6 ++
 datapath/linux/compat/include/net/netfilter/nf_conntrack.h | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index f26bcc1..926ec8a 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -529,6 +529,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
   [nf_ct_get_tuplepr], [struct.net],
   [OVS_DEFINE([HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET])])
+  OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+  [nf_ct_set])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_zones.h],
   [nf_ct_zone_init])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_labels.h],
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 4a1b1ba..df2bd9c 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -501,8 +501,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
-   skb->nfct = &ct->ct_general;
-   skb->nfctinfo = ovs_ct_get_info(h);
+   nf_ct_set(skb, ct, ovs_ct_get_info(h));
return ct;
 }
 
@@ -765,8 +764,7 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
if (skb_nfct(skb))
nf_conntrack_put(skb_nfct(skb));
nf_conntrack_get(&tmpl->ct_general);
-   skb->nfct = &tmpl->ct_general;
-   skb->nfctinfo = IP_CT_NEW;
+   nf_ct_set(skb, tmpl, IP_CT_NEW);
}
 
/* Repeat if requested, see nf_iterate(). */
diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
index e02e20b..bb40b0f 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h
@@ -14,4 +14,12 @@ static inline bool rpl_nf_ct_get_tuplepr(const struct sk_buff *skb,
 #define nf_ct_get_tuplepr rpl_nf_ct_get_tuplepr
 #endif
 
+#ifndef HAVE_NF_CT_SET
+static inline void
+nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info)
+{
+   skb->nfct = &ct->ct_general;
+   skb->nfctinfo = info;
+}
+#endif
 #endif /* _NF_CONNTRACK_WRAPPER_H */
-- 
2.1.4



[ovs-dev] [PATCH v2 04/22] datapath: Fix comments for skb->_nfct

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit 5e17da634a21b1200853fe82ba67d6571f2beabe
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:52 2017 -0800

openvswitch: Fix comments for skb->_nfct

Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

    Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index df2bd9c..e78196a 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -173,7 +173,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
ovs_ct_get_labels(ct, &key->ct.labels);
 }
 
-/* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
+/* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
  * previously sent the packet to conntrack via the ct action.  If
  * 'keep_nat_flags' is true, the existing NAT flags retained, else they are
  * initialized from the connection status.
@@ -462,12 +462,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 
 /* Find an existing connection which this packet belongs to without
  * re-attributing statistics or modifying the connection state.  This allows an
- * skb->nfct lost due to an upcall to be recovered during actions execution.
+ * skb->_nfct lost due to an upcall to be recovered during actions execution.
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->nfct and skb->nfctinfo, and returns the
- * connection.  Returns NULL if there is no existing entry.
+ * On success, populates skb->_nfct and returns the connection.  Returns NULL
+ * if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
@@ -505,7 +505,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
return ct;
 }
 
-/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool skb_nfct_cached(struct net *net,
const struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -516,7 +516,7 @@ static bool skb_nfct_cached(struct net *net,
 
ct = nf_ct_get(skb, &ctinfo);
/* If no ct, check if we have evidence that an existing conntrack entry
-* might be found for this skb.  This happens when we lose a skb->nfct
+* might be found for this skb.  This happens when we lose a skb->_nfct
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
@@ -739,7 +739,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
- * Note that if the packet is deemed invalid by conntrack, skb->nfct will be
+ * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
  * set to NULL and 0 will be returned.
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
-- 
2.1.4



[ovs-dev] [PATCH v2 05/22] datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit 9ff464db50e437eef131f719cc2e9902eea9c607
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:53 2017 -0800

openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing a packet received from
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

This patch also adds a test case to OVS system tests to verify the
behavior.

The following is a more thorough explanation of what is going on:

When we have evidence that an existing conntrack entry could exist, we
must invert the tuple if NAT has already been applied, as the current
packet headers do not match any tuple stored in conntrack.  For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:

Original direction tuple: (X,B)
Reply direction tuple: (B,A)

Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:

Current packet tuple: (B,X)

This does not match either of the conntrack tuples above.  Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost.  When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again.  Prior to this patch this
second lookup failed.  The datapath flow setup corresponding to the
upcall can succeed, however, allowing all further packets in the reply
direction to re-use the conntrack entry pointer in the skb, so
typically the lookup failure only causes a packet drop.

The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:

Inverted tuple: (X,B)

With this the conntrack entry can be found, matching the original
direction tuple.

This same logic also works for the original direction packets:

Current packet tuple (after reverse NAT): (A,B)
Inverted tuple: (B,A)

While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.

Since the inverted tuple matches the reverse direction tuple the
direction of the packet must be reversed as well.
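
The tuple arithmetic above can be checked with a small user-space sketch. This is not the kernel code: `struct tuple`, `invert()` and `lookup()` are hypothetical names for illustration, addresses are single characters, and protocol and port numbers are ignored as in the explanation.

```c
#include <stdbool.h>

/* Hypothetical, simplified tuple: addresses only, no protocol or ports. */
struct tuple {
    char src;
    char dst;
};

/* Swap source and destination; a stand-in for what nf_ct_invert_tuple()
 * does for the address part of a real tuple. */
static struct tuple invert(struct tuple t)
{
    struct tuple inv = { t.dst, t.src };
    return inv;
}

static bool tuple_eq(struct tuple a, struct tuple b)
{
    return a.src == b.src && a.dst == b.dst;
}

/* Conntrack entry for a connection from X to B, source-NATted to A. */
static const struct tuple orig_dir = { 'X', 'B' };
static const struct tuple reply_dir = { 'B', 'A' };

/* Returns 0 if the lookup key matches the original direction tuple, 1 if it
 * matches the reply direction tuple, and -1 if nothing matches.  When
 * 'natted' is true the key is the inverted packet tuple, so the matched
 * direction is the opposite of the packet's own direction. */
static int lookup(struct tuple pkt, bool natted)
{
    struct tuple key = natted ? invert(pkt) : pkt;

    if (tuple_eq(key, orig_dir))
        return 0;
    if (tuple_eq(key, reply_dir))
        return 1;
    return -1;
}
```

With these definitions, the reply packet (B,X) is not found without inversion, is found via the original direction tuple with inversion, and the reverse-NATted original direction packet (A,B) is found via the reply direction tuple.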

Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c| 24 +--
 tests/system-traffic.at | 52 +
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index e78196a..10a7b91 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -471,7 +471,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -494,6 +494,17 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, &tuple);
if (!h)
@@ -501,6 

[ovs-dev] [PATCH v2 07/22] datapath: Unionize ovs_key_ct_label with a u32 array.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit cb80d58fae76d8ea93555149b2b16e19b89a1f4f
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:55 2017 -0800

openvswitch: Unionize ovs_key_ct_label with a u32 array.

Make the array of labels in struct ovs_key_ct_labels a union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at a time, which is also
the alignment of netlink attributes.
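
As a rough user-space illustration of the idea (not the header change itself), the sketch below overlays a byte array and a 32-bit word array in an anonymous union and scans the words. The names `struct ct_labels`, `bytes`, `words` and the `LABELS_LEN*` macros are made up for this example; C11 is assumed for the anonymous union.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABELS_LEN_32 4                                  /* 4 x 32 bits */
#define LABELS_LEN (LABELS_LEN_32 * sizeof(uint32_t))    /* 16 bytes */

/* Hypothetical stand-in for the labels structure: both views share storage. */
struct ct_labels {
    union {
        uint8_t  bytes[LABELS_LEN];
        uint32_t words[LABELS_LEN_32];
    };
};

/* Checks for any set bit using 4 word comparisons instead of 16 byte ones. */
static bool labels_nonzero(const struct ct_labels *labels)
{
    size_t i;

    for (i = 0; i < LABELS_LEN_32; i++)
        if (labels->words[i])
            return true;

    return false;
}
```

Writing through `bytes` and reading through `words` is what lets existing byte-oriented users keep working while the loops operate on words.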

Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c  | 15 ---
 datapath/linux/compat/include/linux/openvswitch.h |  8 ++--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 9595fca..a827c6d 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -298,20 +298,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
/* Triggers a change event, which makes sense only for
 * confirmed connections.
 */
-   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
} else {
u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = (const u32 *)mask->ct_labels;
-   const u32 *lbl = (const u32 *)labels->ct_labels;
+   const u32 *msk = mask->ct_labels_32;
+   const u32 *lbl = labels->ct_labels_32;
int i;
 
/* No-one else has access to the non-confirmed entry, copy
 * labels over, keeping any bits we are not explicitly setting.
 */
-   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if
@@ -915,8 +916,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
index 425d3a4..d185860 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -472,9 +472,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
-- 
2.1.4



[ovs-dev] [PATCH v2 06/22] datapath: Do not trigger events for unconfirmed connections.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit 193e30967897f3a8b6f9f137ac30571d832c2c5c
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:54 2017 -0800

openvswitch: Do not trigger events for unconfirmed connections.

Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Upstream commit:

commit 2317c6b51e4249dbfa093e1b88cab0a9f0564b7f
Author: Jarno Rajahalme 
Date:   Fri Feb 17 18:11:58 2017 -0800

openvswitch: Set event bit after initializing labels.

Connlabels are included in conntrack netlink event messages only if
the IPCT_LABEL bit is set in the event cache (see
ctnetlink_conntrack_event()).  Set it after initializing labels for a
new connection.

Found upon further system testing, where it was noticed that labels
were missing from the conntrack events.

Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.")
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark")
Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 10a7b91..9595fca 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -261,7 +261,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -278,7 +279,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, &ctinfo);
@@ -294,10 +294,31 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
+   OVS_CT_LABELS_LEN / sizeof(u32));
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   const u32 *msk = (const u32 *)mask->ct_labels;
+   const u32 *lbl = (const u32 *)labels->ct_labels;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+
+   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if
+* the IPCT_LABEL bit it set in the event cache.
+*/
+   nf_conntrack_event_cache(IPCT_LABEL, ct);
+   }
 
ovs_ct_get_labels(ct, &key->ct.labels);
return 0;
-- 
2.1.4



[ovs-dev] [PATCH v2 08/22] datapath: Simplify labels length logic.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit b87cec3814ccc7f6afb0a1378ee7e5110d07cdd3
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:56 2017 -0800

openvswitch: Simplify labels length logic.

Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of the conntrack labels extension has been
fixed to 128 bits, so we do not need to check for labels sizes shorter
than 128 at run-time.  This patch simplifies labels length logic
accordingly, but allows the conntrack labels size to be increased in
the future without breaking the build.  In the event of conntrack
labels increasing in size OVS would still be able to deal with the
first 128 label bits.
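
A minimal user-space sketch of the same pattern, assuming made-up names (`EXAMPLE_CT_LABELS_MAX_SIZE` stands in for the conntrack constant, `struct example_conn_labels` for the labels extension): a preprocessor guard rejects a labels area that is too small, after which the copy can use a fixed length with no run-time size checks.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real constants live in kernel headers. */
#define EXAMPLE_CT_LABELS_MAX_SIZE 16   /* bytes provided by conntrack */
#define EXAMPLE_OVS_LABELS_LEN     16   /* bytes OVS reads (128 bits) */

/* Fail the build, rather than misbehave at run-time, if the source shrinks. */
#if EXAMPLE_CT_LABELS_MAX_SIZE < EXAMPLE_OVS_LABELS_LEN
#error "conntrack labels must be at least 128 bits"
#endif

struct example_conn_labels {
    uint8_t bits[EXAMPLE_CT_LABELS_MAX_SIZE];
};

/* Copies the first 128 label bits, or zeroes the output if there are none. */
static void get_labels(const struct example_conn_labels *cl,
                       uint8_t out[EXAMPLE_OVS_LABELS_LEN])
{
    if (cl)
        memcpy(out, cl->bits, EXAMPLE_OVS_LABELS_LEN);
    else
        memset(out, 0, EXAMPLE_OVS_LABELS_LEN);
}
```

If the source area grows beyond 16 bytes, the guard still passes and only the first 128 bits are copied, which matches the behavior described above.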

Suggested-by: Joe Stringer 
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index a827c6d..dacf34c 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -145,22 +145,20 @@ static size_t ovs_ct_get_labels_len(struct nf_conn_labels *cl)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = ovs_ct_get_labels_len(cl);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits, OVS_CT_LABELS_LEN);
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
-- 
2.1.4



[ovs-dev] [PATCH v2 09/22] datapath: Refactor labels initialization.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

Refactoring conntrack labels initialization makes changes in later
patches easier to review.

Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 113 ++-
 1 file changed, 66 insertions(+), 47 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index dacf34c..adc4315 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -243,19 +243,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, &ctinfo);
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -270,56 +263,71 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, &ctinfo);
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+   if (cl && ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
+   return NULL;
+
+   return cl;
+}
+
+/* Initialize labels for a new, yet to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
 
-   if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = mask->ct_labels_32;
-   const u32 *lbl = labels->ct_labels_32;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
+* IPCT_LABEL bit it set in the event cache.
+*/
+   nf_conntrack_event_cache(IPCT_LABEL, ct);
 
-   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if
-* the IPCT_LABEL bit it set in the event cache.
-*/
-   nf_conntrack_event_cache(IPCT_LABEL, ct);
-   }
+   memcpy(&key->ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (

[ovs-dev] [PATCH v2 10/22] datapath: Inherit master's labels.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit 09aa98ad496d6b11a698b258bc64d7f64c55d682
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:58 2017 -0800

openvswitch: Inherit master's labels.

We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information to be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in presence
of a master connection.
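
The intended order of operations (inherit first, then apply any explicitly set bits) can be sketched in plain C as follows. `struct conn`, `struct labels` and `init_labels()` are hypothetical simplifications; the real code also handles a missing labels extension and event caching, which are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define LABEL_WORDS 4

struct labels {
    uint32_t w[LABEL_WORDS];
};

/* Hypothetical connection: an optional master plus its own labels. */
struct conn {
    const struct conn *master;
    struct labels labels;
};

/* Initialize labels of a new connection: start from the master's labels, if
 * any, then overwrite only the bits selected by 'mask' with 'value'. */
static void init_labels(struct conn *ct, const struct labels *value,
                        const struct labels *mask)
{
    int i;

    if (ct->master)
        ct->labels = ct->master->labels;

    for (i = 0; i < LABEL_WORDS; i++)
        ct->labels.w[i] = (ct->labels.w[i] & ~mask->w[i])
                          | (value->w[i] & mask->w[i]);
}
```

With an all-zero mask the loop leaves the inherited labels untouched, so the master's labels become the defaults for the related connection.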

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Fixes: a94ebc39996b ("datapath: Add conntrack action")
Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index adc4315..16a7773 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -80,6 +80,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -286,18 +288,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = ct->master ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
 * IPCT_LABEL bit it set in the event cache.
@@ -957,13 +973,14 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(&info->labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, &info->labels.value,
-&info->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, &info->labels.value,
-   &info->labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, &info->labels.value,
+&info->labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(&info->labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, &info->labels.value,
+   &info->labels.mask);
if (err)
return err;
}
-- 
2.1.4



[ovs-dev] [PATCH v2 11/22] netlink: Simplify nl_msg_start_nested().

2017-02-28 Thread Jarno Rajahalme
Since there is no data to copy nl_msg_put_unspec_uninit() may be used
directly, rather than via nl_msg_put_unspec().

Signed-off-by: Jarno Rajahalme 
---
 lib/netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/netlink.c b/lib/netlink.c
index ad7d35a..f253f80 100644
--- a/lib/netlink.c
+++ b/lib/netlink.c
@@ -454,7 +454,7 @@ size_t
 nl_msg_start_nested(struct ofpbuf *msg, uint16_t type)
 {
 size_t offset = msg->size;
-nl_msg_put_unspec(msg, type, NULL, 0);
+nl_msg_put_unspec_uninit(msg, type, 0);
 return offset;
 }
 
-- 
2.1.4



[ovs-dev] [PATCH v2 12/22] lib: Check match and action prerequisities with 'match'.

2017-02-28 Thread Jarno Rajahalme
Supply the match mask to prerequisite checking when available.  This
allows checking for zero-valued matches.  Non-zero valued matches
imply the presence of corresponding mask bits, but for zero-valued
matches we must explicitly check the mask, too.

This is required now only for conntrack validity checking due to the
conntrack state having an 'invalid' bit, but no 'valid' bit.  One
way to match a valid conntrack state is to match on the 'tracked' bit
being one and the 'invalid' bit being zero.  The latter requires that
the corresponding mask bit be verified.
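
To make the zero-valued case concrete, here is a small sketch, assuming illustrative bit values and a made-up `struct ex_match` (it is not the mf_are_match_prereqs_ok() implementation). A zero 'invalid' bit in the value only means "not invalid" when the mask also covers that bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments chosen for illustration only. */
#define EX_CS_INVALID 0x10
#define EX_CS_TRACKED 0x20

/* Hypothetical masked match on the conntrack state. */
struct ex_match {
    uint8_t value;
    uint8_t mask;
};

/* True only if the match requires 'tracked' to be one and 'invalid' to be
 * zero.  A zero value bit that is not covered by the mask is a wildcard. */
static bool matches_valid_ct(const struct ex_match *m)
{
    bool tracked_set = (m->mask & EX_CS_TRACKED)
                       && (m->value & EX_CS_TRACKED);
    bool invalid_clear = (m->mask & EX_CS_INVALID)
                         && !(m->value & EX_CS_INVALID);

    return tracked_set && invalid_clear;
}
```

A match on the 'tracked' bit alone has a zero 'invalid' value bit but does not qualify, because invalid packets would also match it.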

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/meta-flow.h   |  5 ++--
 include/openvswitch/ofp-actions.h |  4 ++--
 lib/bundle.c  |  4 ++--
 lib/bundle.h  |  3 ++-
 lib/learn.c   | 15 ++--
 lib/learn.h   |  3 ++-
 lib/meta-flow.c   | 38 ++---
 lib/multipath.c   |  4 ++--
 lib/multipath.h   |  3 ++-
 lib/nx-match.c| 17 ++---
 lib/nx-match.h|  6 ++---
 lib/ofp-actions.c | 50 ---
 lib/ofp-parse.c   |  2 +-
 lib/ofp-util.c|  2 +-
 ofproto/ofproto-dpif-trace.c  | 13 +-
 ofproto/ofproto.c |  5 ++--
 utilities/ovs-ofctl.c |  6 ++---
 17 files changed, 105 insertions(+), 75 deletions(-)

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 83e2599..aac9945 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -1898,6 +1898,7 @@ void mf_get_mask(const struct mf_field *, const struct flow_wildcards *,
 /* Prerequisites. */
 bool mf_are_prereqs_ok(const struct mf_field *mf, const struct flow *flow,
struct flow_wildcards *wc);
+bool mf_are_match_prereqs_ok(const struct mf_field *, const struct match *);
 
 static inline bool
 mf_is_l3_or_higher(const struct mf_field *mf)
@@ -1959,8 +1960,8 @@ void mf_subfield_swap(const struct mf_subfield *,
   const struct mf_subfield *,
   struct flow *flow, struct flow_wildcards *);
 
-enum ofperr mf_check_src(const struct mf_subfield *, const struct flow *);
-enum ofperr mf_check_dst(const struct mf_subfield *, const struct flow *);
+enum ofperr mf_check_src(const struct mf_subfield *, const struct match *);
+enum ofperr mf_check_dst(const struct mf_subfield *, const struct match *);
 
 /* Parsing and formatting. */
 char *mf_parse(const struct mf_field *, const char *,
diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 88f573d..53d6b44 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -954,11 +954,11 @@ ofpacts_pull_openflow_instructions(struct ofpbuf *openflow,
const struct vl_mff_map *vl_mff_map,
struct ofpbuf *ofpacts);
 enum ofperr ofpacts_check(struct ofpact[], size_t ofpacts_len,
-  struct flow *, ofp_port_t max_ports,
+  struct match *, ofp_port_t max_ports,
   uint8_t table_id, uint8_t n_tables,
   enum ofputil_protocol *usable_protocols);
 enum ofperr ofpacts_check_consistency(struct ofpact[], size_t ofpacts_len,
-  struct flow *, ofp_port_t max_ports,
+  struct match *, ofp_port_t max_ports,
   uint8_t table_id, uint8_t n_tables,
   enum ofputil_protocol usable_protocols);
 enum ofperr ofpact_check_output_port(ofp_port_t port, ofp_port_t max_ports);
diff --git a/lib/bundle.c b/lib/bundle.c
index 70a743b..620318e 100644
--- a/lib/bundle.c
+++ b/lib/bundle.c
@@ -105,13 +105,13 @@ bundle_execute(const struct ofpact_bundle *bundle,
 
 enum ofperr
 bundle_check(const struct ofpact_bundle *bundle, ofp_port_t max_ports,
- const struct flow *flow)
+ const struct match *match)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 size_t i;
 
 if (bundle->dst.field) {
-enum ofperr error = mf_check_dst(&bundle->dst, flow);
+enum ofperr error = mf_check_dst(&bundle->dst, match);
 if (error) {
 return error;
 }
diff --git a/lib/bundle.h b/lib/bundle.h
index f5ce321..48b9b79 100644
--- a/lib/bundle.h
+++ b/lib/bundle.h
@@ -29,6 +29,7 @@
 struct ds;
 struct flow;
 struct flow_wildcards;
+struct match;
 struct ofpact_bundle;
 struct ofpbuf;
 
@@ -43,7 +44,7 @@ ofp_port_t bundle_execute(const struct ofpact_bundle *, const struct flow *,
 bool (*slave_enabled)(ofp_port_t ofp_port, void *aux),
  

[ovs-dev] [PATCH v2 14/22] flow: Make room after ct_state.

2017-02-28 Thread Jarno Rajahalme
'ct_state' currently only needs 8 bits, so we can make room for a new
CT field introduced in the next patch.
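
The layout effect can be sketched with a cut-down structure. `struct ex_flow_meta` is a hypothetical excerpt, not the real struct flow, and the asserted offsets assume a typical ABI with natural alignment: narrowing the state to 8 bits frees one byte without moving the fields that follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the metadata fields around ct_state. */
struct ex_flow_meta {
    uint32_t recirc_id;
    uint8_t ct_state;   /* Was 16 bits wide; 8 bits are enough. */
    uint8_t pad0;       /* Freed byte, available for a new CT field. */
    uint16_t ct_zone;
    uint32_t ct_mark;
};

/* The fields after ct_state keep their previous offsets. */
_Static_assert(offsetof(struct ex_flow_meta, ct_zone) == 6,
               "ct_zone must stay at offset 6");
_Static_assert(sizeof(struct ex_flow_meta) == 12,
               "overall size must not change");
```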

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/flow.h | 3 ++-
 lib/flow.c | 3 ++-
 lib/match.c| 8 
 lib/packets.h  | 2 +-
 ofproto/ofproto-dpif.c | 2 +-
 tests/ovs-ofctl.at | 2 +-
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index df80dfe..9169272 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -91,7 +91,8 @@ struct flow {
  * computation is opaque to the user space. */
 union flow_in_port in_port; /* Input port.*/
 uint32_t recirc_id; /* Must be exact match. */
-uint16_t ct_state;  /* Connection tracking state. */
+uint8_t ct_state;   /* Connection tracking state. */
+uint8_t pad0;
 uint16_t ct_zone;   /* Connection tracking zone. */
 uint32_t ct_mark;   /* Connection mark.*/
 uint8_t pad1[4];/* Pad to 64 bits. */
diff --git a/lib/flow.c b/lib/flow.c
index fb7bfeb..0c95b75 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -593,7 +593,8 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
 miniflow_push_uint32(mf, in_port, odp_to_u32(md->in_port.odp_port));
 if (md->recirc_id || md->ct_state) {
 miniflow_push_uint32(mf, recirc_id, md->recirc_id);
-miniflow_push_uint16(mf, ct_state, md->ct_state);
+miniflow_push_uint8(mf, ct_state, md->ct_state);
+miniflow_push_uint8(mf, pad0, 0);
 miniflow_push_uint16(mf, ct_zone, md->ct_zone);
 }
 
diff --git a/lib/match.c b/lib/match.c
index 3fcaec5..882bf0c 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -340,8 +340,8 @@ match_set_ct_state(struct match *match, uint32_t ct_state)
 void
 match_set_ct_state_masked(struct match *match, uint32_t ct_state, uint32_t mask)
 {
-match->flow.ct_state = ct_state & mask & UINT16_MAX;
-match->wc.masks.ct_state = mask & UINT16_MAX;
+match->flow.ct_state = ct_state & mask & UINT8_MAX;
+match->wc.masks.ct_state = mask & UINT8_MAX;
 }
 
 void
@@ -,7 +,7 @@ match_format(const struct match *match, struct ds *s, int priority)
 }
 
 if (wc->masks.ct_state) {
-if (wc->masks.ct_state == UINT16_MAX) {
+if (wc->masks.ct_state == UINT8_MAX) {
 ds_put_format(s, "%sct_state=%s", colors.param, colors.end);
 if (f->ct_state) {
 format_flags(s, ct_state_to_string, f->ct_state, '|');
@@ -1120,7 +1120,7 @@ match_format(const struct match *match, struct ds *s, int priority)
 }
 } else {
 format_flags_masked(s, "ct_state", ct_state_to_string,
-f->ct_state, wc->masks.ct_state, UINT16_MAX);
+f->ct_state, wc->masks.ct_state, UINT8_MAX);
 }
 ds_put_char(s, ',');
 }
diff --git a/lib/packets.h b/lib/packets.h
index c4d3799..f7e1d82 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -99,7 +99,7 @@ struct pkt_metadata {
action. */
 uint32_t skb_priority;  /* Packet priority for QoS. */
 uint32_t pkt_mark;  /* Packet mark. */
-uint16_t ct_state;  /* Connection state. */
+uint8_t  ct_state;  /* Connection state. */
 uint16_t ct_zone;   /* Connection zone. */
 uint32_t ct_mark;   /* Connection mark. */
 ovs_u128 ct_label;  /* Connection label. */
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 89c7b7f..e595f3b 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3994,7 +3994,7 @@ check_mask(struct ofproto_dpif *ofproto, const struct miniflow *flow)
 uint32_t ct_mark;
 
 support = &ofproto->backer->support.odp;
-ct_state = MINIFLOW_GET_U16(flow, ct_state);
+ct_state = MINIFLOW_GET_U8(flow, ct_state);
 if (support->ct_state && support->ct_zone && support->ct_mark
 && support->ct_label && support->ct_state_nat) {
 return ct_state & CS_UNSUPPORTED_MASK ? OFPERR_OFPBMC_BAD_MASK : 0;
diff --git a/tests/ovs-ofctl.at b/tests/ovs-ofctl.at
index 7e26735..9c32788 100644
--- a/tests/ovs-ofctl.at
+++ b/tests/ovs-ofctl.at
@@ -1215,7 +1215,7 @@ NXM_NX_REG0(a0e0d050)
 dnl
 dnl When re-serialising, bits 16-31 are wildcarded, because current OVS userspace
 dnl doesn't understand (or store) those bits.
-NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/)
+NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/00ff)
 nx_pull_match() returned error OFPBMC_BAD_VALUE
 NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/0020)
 NXM_OF_ETH_TYPE(0800), NXM_NX_CT_STATE_W(0020/00f0)
-- 
2.1.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
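
The narrowing in match_set_ct_state_masked() above can be illustrated with a
small standalone sketch.  The struct and function names below are hypothetical
stand-ins, not OVS code; only the "& UINT8_MAX" masking is taken from the
patch.  It shows that any value or mask bits above bit 7 are dropped when the
field is stored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the ct_state parts of 'struct match'
 * (the value lives in match->flow, the mask in match->wc.masks). */
struct ct_state_match_sketch {
    uint8_t value;  /* Bits to match, already ANDed with 'mask'. */
    uint8_t mask;   /* 1-bits are matched, 0-bits are wildcarded. */
};

/* Mirrors the patched masking: callers still pass 32-bit arguments, but only
 * the low 8 bits survive once ct_state is a uint8_t. */
static inline void
ct_state_match_set_masked(struct ct_state_match_sketch *m,
                          uint32_t ct_state, uint32_t mask)
{
    m->value = ct_state & mask & UINT8_MAX;
    m->mask = mask & UINT8_MAX;

    /* The stored value never has bits set outside the stored mask. */
    assert((m->value & ~m->mask) == 0);
}
```

With this sketch, a request of 0x0020/0xffff is stored as 0x20/0xff, and a
request that only touches bit 8 or above is stored as fully wildcarded.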


[ovs-dev] [PATCH v2 13/22] datapath: Add original direction conntrack tuple to sw_flow_key.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:21:59 2017 -0800

openvswitch: Add original direction conntrack tuple to sw_flow_key.

Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections' reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/actions.c|  2 +
 datapath/conntrack.c  | 86 +--
 datapath/conntrack.h  | 10 ++-
 datapath/flow.c   | 34 +++--
 datapath/flow.h   | 49 ++---
 datapath/flow_netlink.c   | 85 --
 datapath/flow_netlink.h   |  7 +-
 datapath/linux/compat/include/linux/openvswitch.h | 18 +
 8 files changed, 246 insertions(+), 45 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 82833d0..71ec14c 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -1011,6 +1011,8 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
case OVS_KEY_ATTR_CT_LABELS:
+   case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4:
+   case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6:
err = -EINVAL;
break;
}
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 16a7773..d8309c9 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
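
The commit message above says that the original-tuple IP addresses are
overlaid in a union with the ARP/ND fields, and that the new fields must
never be read for ARP or ND packets.  A minimal sketch of that idea for the
IPv4/ARP case follows.  All type and function names here are hypothetical and
simplified; this is not the layout of struct sw_flow_key, and the ND case,
which the message says needs more care, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_TYPE_ARP_SKETCH 0x0806  /* Ethernet type for ARP. */

/* Hypothetical original-direction IPv4 addresses of a conntrack entry. */
struct ct_orig_v4_sketch {
    uint32_t src;
    uint32_t dst;
};

/* Hypothetical ARP hardware addresses. */
struct arp_hw_sketch {
    uint8_t sha[6];
    uint8_t tha[6];
};

/* ARP packets are never tracked, so the two members are never needed at the
 * same time and can share storage. */
union ct_or_arp_sketch {
    struct ct_orig_v4_sketch ct_orig;
    struct arp_hw_sketch arp;
};

/* Sharing storage keeps the key smaller than two separate members would. */
static_assert(sizeof(union ct_or_arp_sketch)
              < sizeof(struct ct_orig_v4_sketch) + sizeof(struct arp_hw_sketch),
              "overlay should be smaller than separate fields");

/* The ct_orig member may only be read when a conntrack entry was found and
 * the packet is not ARP; otherwise the bytes belong to the ARP member. */
static bool
ct_orig_readable(uint16_t eth_type, bool ct_entry_found)
{
    return ct_entry_found && eth_type != ETH_TYPE_ARP_SKETCH;
}
```

The guard function is the important part: the union saves space only because
every reader checks the packet type before choosing which member to use.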

[ovs-dev] [PATCH v2 17/22] compat: nf_ct_delete compat.

2017-02-28 Thread Jarno Rajahalme
Upstream commit:

commit f330a7fdbe1611104622faff7e614a246a7d20f0
Author: Florian Westphal 
Date:   Thu Aug 25 15:33:31 2016 +0200

netfilter: conntrack: get rid of conntrack timer

With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.

Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de657f8, 'timers: Switch to
a non-cascading wheel').

Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.

During conntrack lookup, even before doing tuple comparison, check
the timeout value and evict the entry in case it is too old.

The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.

Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.

This is the standard/expected way when conntrack entries are destroyed.

Followup patches will introduce garbage collection via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.

Signed-off-by: Florian Westphal 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 

Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
conntrack timer") changes the way nf_ct_delete() is called.  Prior to
that commit, the call pattern was like this:

   if (del_timer(&ct->timeout))
   nf_ct_delete(ct, ...);

After this change nf_ct_delete() is called directly:

   nf_ct_delete(ct, ...);

This patch provides a replacement implementation for nf_ct_delete()
that first calls the del_timer().  This replacement is only used if
the struct nf_conn has member 'timeout' of type 'struct timer_list'.

The following patch introduces the first caller to nf_ct_delete() in
the OVS kernel module.

Linux <3.12 does not have nf_ct_delete() at all, so we inline it if it
does not exist.  The inlined code is from 3.11 death_by_timeout(),
which in later versions simply calls nf_ct_delete().

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4   |  6 
 .../include/net/netfilter/nf_conntrack_core.h  | 37 ++
 2 files changed, 43 insertions(+)

diff --git a/acinclude.m4 b/acinclude.m4
index 926ec8a..f4cbabd 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -523,6 +523,12 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netfilter_ipv6.h], [nf_ipv6_ops],
 [fragment.*sock], 
[OVS_DEFINE([HAVE_NF_IPV6_OPS_FRAGMENT])])
 
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+[nf_conn], [struct timer_list[[ \t]]*timeout],
+[OVS_DEFINE([HAVE_NF_CONN_TIMER])])
+  OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+  [nf_ct_delete(], [OVS_DEFINE([HAVE_NF_CT_DELETE])])
+
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
   [nf_ct_tmpl_alloc], [nf_conntrack_zone],
   [OVS_DEFINE([HAVE_NF_CT_TMPL_ALLOC_TAKES_STRUCT_ZONE])])
diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
index 09a53c3..7834c8c 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
@@ -67,4 +67,41 @@ static inline bool rpl_nf_ct_get_tuple(const struct sk_buff *skb,
 #define nf_ct_get_tuple rpl_nf_ct_get_tuple
 #endif /* HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET */
 
+#ifdef HAVE_NF_CONN_TIMER
+
+#ifndef HAVE_NF_CT_DELETE
+#include 
+#endif
+
+static inline bool rpl_nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
+{
+   if (del_timer(&ct->timeout))
+#ifdef HAVE_NF_CT_DELETE
+   return nf_ct_delete(ct, portid, report);
+#else
+   {
+   struct nf_conn_tstamp *tstamp;
+
+   tstamp = nf_conn_tstamp_find(ct);
+   if (tstamp && tstamp->stop == 0)
+   tstamp->stop = ktime_to_ns(ktime_get_real());
+
+   if (!test_bit(IPS_DYING_BIT, &ct->status) &&
+   unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
+   /* destroy event was not delivered */
+   nf_ct_delete_from_lists(ct);
+   nf_ct_dying_timeout(ct);
+   return false;
+   }
+   set_bit(IPS_DYING_BIT, &ct->status);
+   nf_ct_delete_from_lis
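
The compat idea described in the commit message, deleting an entry only if
the caller is the one that cancelled its pending timer, can be modelled in
userspace as below.  This is a sketch with hypothetical names and a boolean
standing in for the kernel timer; it is not the kernel API.  Returning false
when the timer was not pending is an assumption, since the diff above is cut
off before that branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a connection entry on a kernel that still keeps a
 * per-entry timeout timer. */
struct conn_sketch {
    bool timer_pending;  /* Stand-in for a pending 'struct timer_list'. */
    bool deleted;
};

/* Models del_timer(): deactivates the timer and reports whether it was
 * still pending. */
static bool
del_timer_sketch(struct conn_sketch *c)
{
    bool was_pending = c->timer_pending;

    c->timer_pending = false;
    return was_pending;
}

/* Models the real deletion routine. */
static bool
conn_delete_sketch(struct conn_sketch *c)
{
    c->deleted = true;
    return true;
}

/* Replacement in the spirit of rpl_nf_ct_delete(): only the caller that
 * cancels the timer performs the deletion. */
static bool
rpl_conn_delete_sketch(struct conn_sketch *c)
{
    assert(c != NULL);
    if (del_timer_sketch(c)) {
        return conn_delete_sketch(c);
    }
    return false;  /* Assumed: someone else already handled this entry. */
}
```

Calling the replacement twice on the same entry deletes it once; the second
call finds no pending timer and does nothing.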

[ovs-dev] [PATCH v2 18/22] datapath: Add force commit.

2017-02-28 Thread Jarno Rajahalme
Upstream patch:

commit dd41d33f0b033885211a5d6f3ee19e73238aa9ee
Author: Jarno Rajahalme 
Date:   Thu Feb 9 11:22:00 2017 -0800

openvswitch: Add force commit.

Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. the stateful network admission policy, it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme 
Acked-by: Pravin B Shelar 
Acked-by: Joe Stringer 
Signed-off-by: David S. Miller 

Signed-off-by: Jarno Rajahalme 
---
 datapath/conntrack.c  | 26 +--
 datapath/linux/compat/include/linux/openvswitch.h |  5 +
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index d8309c9..041a557 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -72,6 +72,7 @@ struct ovs_conntrack_info {
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
u8 random_fully_compat : 1; /* bool */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -658,10 +659,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, &ctinfo);
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(&ct->ct_net)))
@@ -675,6 +679,18 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(&ct->ct_general);
+   nf_ct_set(skb, NULL, 0);
+   return false;
+   }
 
return true;
 }
@@ -1259,6 +1275,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1298,6 +1315,9 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info->force = true;
+   /* fall through. */
case OVS_CT_ATTR_COMMIT:
info->commit = true;
break;
@@ -1528,7 +1548,9 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
if (!start)
return -EMSGSIZE;
 
-   if (ct_info->commit && nla_put_flag(skb, OVS_CT_ATTR_COMMIT))
+   if (ct_info->commit && nla_put_flag(skb, ct_info->force
+   ? OVS_CT_ATTR_FORCE_COMMIT
+   : OVS_CT_ATTR_COMMIT))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_
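
The direction check added to skb_nfct_cached() above can be summarised as a
small decision function.  The enum and function names below are hypothetical;
the sketch only restates the branch structure visible in the hunk (force set,
cached entry not in the original direction, confirmed or not) and does not
touch any kernel object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direction of the current packet relative to the cached
 * conntrack entry. */
enum ct_dir_sketch {
    CT_DIR_ORIGINAL_SKETCH,
    CT_DIR_REPLY_SKETCH
};

/* What the caller should do with the cached entry. */
enum cached_action_sketch {
    CT_USE_CACHED,             /* Keep using the cached entry. */
    CT_DELETE_AND_RELOOKUP,    /* Confirmed entry: delete it, then redo. */
    CT_DROP_REF_AND_RELOOKUP   /* Unconfirmed entry: just release it. */
};

/* With 'force' set, an entry whose original direction is opposed to the
 * current packet must not be reused. */
static enum cached_action_sketch
force_commit_check(bool force, enum ct_dir_sketch dir, bool confirmed)
{
    assert(dir == CT_DIR_ORIGINAL_SKETCH || dir == CT_DIR_REPLY_SKETCH);

    if (force && dir != CT_DIR_ORIGINAL_SKETCH) {
        return confirmed ? CT_DELETE_AND_RELOOKUP : CT_DROP_REF_AND_RELOOKUP;
    }
    return CT_USE_CACHED;
}
```

Without the force flag the function always keeps the cached entry, which
matches the plain commit behaviour.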

[ovs-dev] [PATCH v2 20/22] datapath: Add a missing comment.

2017-02-28 Thread Jarno Rajahalme
Make openvswitch.h better match upstream by adding a missing comment.

Signed-off-by: Jarno Rajahalme 
---
 datapath/linux/compat/include/linux/openvswitch.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
index 2fd0963..86abc96 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -718,6 +718,8 @@ struct ovs_action_push_tnl {
  * mask. For each bit set in the mask, the corresponding bit in the value is
  * copied to the connection tracking label field in the connection.
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
+ * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
+ * translation (NAT) on the packet.
  * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
  * nothing if the connection is already committed will check that the current
  * packet is in conntrack entry's original direction.  If directionality does
-- 
2.1.4


[ovs-dev] [PATCH v2 16/22] actions: Add resubmit with conntrack tuple.

2017-02-28 Thread Jarno Rajahalme
Add resubmit option to use the Conntrack original direction tuple
swapped with the corresponding packet header fields during the lookup.
This could allow the same ACL table to be used for admitting return
and/or related traffic as is used for admitting the original direction
traffic.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h |   4 +-
 lib/ofp-actions.c |  82 +++--
 ofproto/ofproto-dpif-xlate.c  |  68 ++---
 tests/ofp-actions.at  |   6 ++
 tests/ofproto-dpif.at |  89 +--
 tests/system-traffic.at   | 122 --
 utilities/ovs-ofctl.8.in  |  19 +-
 7 files changed, 310 insertions(+), 80 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 53d6b44..5ea0763 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -640,11 +640,13 @@ struct ofpact_nat {
 
 /* OFPACT_RESUBMIT.
  *
- * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE. */
+ * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, NXAST_RESUBMIT_TABLE_CT. */
 struct ofpact_resubmit {
 struct ofpact ofpact;
 ofp_port_t in_port;
 uint8_t table_id;
+bool with_ct_orig;   /* Resubmit with Conntrack original direction tuple
+  * fields in place of IP header fields. */
 };
 
 /* Bits for 'flags' in struct nx_action_learn.
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 2869e0f..4d35a77 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -265,6 +265,8 @@ enum ofp_raw_action_type {
 NXAST_RAW_RESUBMIT,
 /* NX1.0+(14): struct nx_action_resubmit. */
 NXAST_RAW_RESUBMIT_TABLE,
+/* NX1.0+(44): struct nx_action_resubmit. */
+NXAST_RAW_RESUBMIT_TABLE_CT,
 
 /* NX1.0+(2): uint32_t. */
 NXAST_RAW_SET_TUNNEL,
@@ -3850,19 +3852,20 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, struct ds *s)
 ds_put_format(s, "%s)%s", colors.paren, colors.end);
 }
 
-/* Action structures for NXAST_RESUBMIT and NXAST_RESUBMIT_TABLE.
+/* Action structures for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, and
+ * NXAST_RESUBMIT_TABLE_CT.
  *
  * These actions search one of the switch's flow tables:
  *
- *- For NXAST_RESUBMIT_TABLE only, if the 'table' member is not 255, then
- *  it specifies the table to search.
+ *- For NXAST_RESUBMIT_TABLE and NXAST_RESUBMIT_TABLE_CT only, if the
+ *  'table' member is not 255, then it specifies the table to search.
  *
- *- Otherwise (for NXAST_RESUBMIT_TABLE with a 'table' of 255, or for
- *  NXAST_RESUBMIT regardless of 'table'), it searches the current flow
- *  table, that is, the OpenFlow flow table that contains the flow from
- *  which this action was obtained.  If this action did not come from a
- *  flow table (e.g. it came from an OFPT_PACKET_OUT message), then table 0
- *  is the current table.
+ *- Otherwise (for NXAST_RESUBMIT_TABLE or NXAST_RESUBMIT_TABLE_CT with a
+ *  'table' of 255, or for NXAST_RESUBMIT regardless of 'table'), it
+ *  searches the current flow table, that is, the OpenFlow flow table that
+ *  contains the flow from which this action was obtained.  If this action
+ *  did not come from a flow table (e.g. it came from an OFPT_PACKET_OUT
+ *  message), then table 0 is the current table.
  *
  * The flow table lookup uses a flow that may be slightly modified from the
  * original lookup:
@@ -3870,9 +3873,12 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, struct ds *s)
  *- For NXAST_RESUBMIT, the 'in_port' member of struct nx_action_resubmit
  *  is used as the flow's in_port.
  *
- *- For NXAST_RESUBMIT_TABLE, if the 'in_port' member is not OFPP_IN_PORT,
- *  then its value is used as the flow's in_port.  Otherwise, the original
- *  in_port is used.
+ *- For NXAST_RESUBMIT_TABLE and NXAST_RESUBMIT_TABLE_CT, if the 'in_port'
+ *  member is not OFPP_IN_PORT, then its value is used as the flow's
+ *  in_port.  Otherwise, the original in_port is used.
+ *
+ *- For NXAST_RESUBMIT_TABLE_CT the Conntrack 5-tuple fields are used as
+ *  the packets IP header fields during the lookup.
  *
  *- If actions that modify the flow (e.g. OFPAT_SET_VLAN_VID) precede the
  *  resubmit action, then the flow is updated with the new values.
@@ -3905,11 +3911,12 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout *a, struct ds *s)
  *  a total limit of 4,096 resubmits per flow translation (earlier versions
  *  did not impose any total limit).
  *
- * NXAST_RESUBMIT ignores 'table' and 'pad'.  NXAST_RESUBMIT_TABLE requires
- * 'pad' to be all-bits-zero.
+ * NXAST_RESUBMIT ignores 'table' and '
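
The commit message describes a lookup performed with the conntrack original
direction tuple swapped with the corresponding packet header fields.  A
minimal sketch of such a swap is shown below, under the assumption that the
swap is undone after the lookup so the flow is left unchanged.  The struct is
a hypothetical reduction of the IPv4 case and the helper names are invented;
the actual code in ofproto-dpif-xlate.c is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the fields involved (IPv4 only). */
struct flow_sketch {
    uint32_t nw_src, nw_dst;        /* Packet IP addresses. */
    uint32_t ct_nw_src, ct_nw_dst;  /* CT original tuple IP addresses. */
    uint16_t tp_src, tp_dst;        /* Packet L4 ports. */
    uint16_t ct_tp_src, ct_tp_dst;  /* CT original tuple L4 ports. */
    uint8_t nw_proto;               /* Packet IP protocol. */
    uint8_t ct_nw_proto;            /* CT original tuple IP protocol. */
};

static void swap_u32(uint32_t *a, uint32_t *b) { uint32_t t = *a; *a = *b; *b = t; }
static void swap_u16(uint16_t *a, uint16_t *b) { uint16_t t = *a; *a = *b; *b = t; }
static void swap_u8(uint8_t *a, uint8_t *b) { uint8_t t = *a; *a = *b; *b = t; }

/* Exchanges each header field with its CT original tuple counterpart.
 * Applying the function twice restores the flow. */
static void
flow_swap_ct_orig_sketch(struct flow_sketch *f)
{
    assert(f);
    swap_u32(&f->nw_src, &f->ct_nw_src);
    swap_u32(&f->nw_dst, &f->ct_nw_dst);
    swap_u16(&f->tp_src, &f->ct_tp_src);
    swap_u16(&f->tp_dst, &f->ct_tp_dst);
    swap_u8(&f->nw_proto, &f->ct_nw_proto);
}
```

A caller would swap, run the table lookup against the modified flow, and swap
again, so an ACL written against the original direction 5-tuple also matches
return traffic of the same connection.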

[ovs-dev] [PATCH v2 21/22] tests: Add an FTP test without conntrack.

2017-02-28 Thread Jarno Rajahalme
If FTP tests with conntrack fail, it is informative to know if the
problem is with the FTP client and/or server, or with conntrack
itself.

Signed-off-by: Jarno Rajahalme 
---
 tests/system-traffic.at | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index ac9f989..1cc41b7 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2040,6 +2040,35 @@ tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([FTP - no conntrack])
+AT_SKIP_IF([test $HAVE_PYFTPDLIB = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+table=0,action=normal
+])
+
+AT_CHECK([ovs-ofctl --bundle replace-flows br0 flows.txt])
+
+NETNS_DAEMONIZE([at_ns0], [[$PYTHON $srcdir/test-l7.py ftp]], [ftp1.pid])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py ftp]], [ftp0.pid])
+OVS_WAIT_UNTIL([ip netns exec at_ns1 netstat -l | grep ftp])
+
+dnl FTP requests from p0->p1 should work fine.
+NS_CHECK_EXEC([at_ns0], [wget ftp://10.1.1.2 --no-passive-ftp -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+AT_CHECK([find -name index.html], [0], [dnl
+./index.html
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - FTP])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
-- 
2.1.4


[ovs-dev] [PATCH v2 19/22] conntrack: Force commit.

2017-02-28 Thread Jarno Rajahalme
Userspace support for force commit.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h |  7 +++-
 lib/conntrack.c   | 16 ++--
 lib/conntrack.h   |  2 +-
 lib/dpif-netdev.c |  8 +++-
 lib/odp-util.c| 20 -
 lib/ofp-actions.c | 29 +++--
 ofproto/ofproto-dpif-xlate.c  |  3 +-
 tests/odp.at  | 16 
 tests/ofp-actions.at  | 10 +
 tests/ofproto-dpif.at | 85 +++
 tests/system-traffic.at   | 53 
 tests/test-conntrack.c|  9 +++--
 utilities/ovs-ofctl.8.in  | 12 ++
 13 files changed, 253 insertions(+), 17 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 5ea0763..622dd7a 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -556,9 +556,14 @@ ofpact_nest_get_action_len(const struct ofpact_nest *on)
 /* Bits for 'flags' in struct nx_action_conntrack.
  *
  * If NX_CT_F_COMMIT is set, then the connection entry is moved from the
- * unconfirmed to confirmed list in the tracker. */
+ * unconfirmed to confirmed list in the tracker.
+ * If NX_CT_F_FORCE is set, in addition to NX_CT_F_COMMIT, then the conntrack
+ * entry is replaced with a new one in case the original direction of the
+ * existing entry is opposite of the current packet direction.
+ */
 enum nx_conntrack_flags {
 NX_CT_F_COMMIT = 1 << 0,
+NX_CT_F_FORCE  = 1 << 1,
 };
 
 /* Magic value for struct nx_action_conntrack 'recirc_table' field, to specify
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1b66c8d..0b05be4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -239,12 +239,21 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 static struct conn *
 process_one(struct conntrack *ct, struct dp_packet *pkt,
 struct conn_lookup_ctx *ctx, uint16_t zone,
-bool commit, long long now)
+bool force, bool commit, long long now)
 {
 unsigned bucket = hash_to_bucket(ctx->hash);
 struct conn *conn = ctx->conn;
 uint16_t state = 0;
 
+/* Delete found entry if in wrong direction. 'force' implies commit. */
+if (conn && force && ctx->reply) {
+ovs_list_remove(&conn->exp_node);
+hmap_remove(&ct->buckets[bucket].connections, &conn->node);
+atomic_count_dec(&ct->n_conn);
+delete_conn(conn);
+conn = NULL;
+}
+
 if (conn) {
 if (ctx->related) {
 state |= CS_RELATED;
@@ -301,7 +310,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
  * 'setlabel' behaves similarly for the connection label.*/
 int
 conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
-  ovs_be16 dl_type, bool commit, uint16_t zone,
+  ovs_be16 dl_type, bool force, bool commit, uint16_t zone,
   const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper)
@@ -364,7 +373,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
 
 conn_key_lookup(ctb, &ctxs[j], now);
 
-conn = process_one(ct, pkts[j], &ctxs[j], zone, commit, now);
+conn = process_one(ct, pkts[j], &ctxs[j], zone, force, commit,
+   now);
 
 if (conn && setmark) {
 set_mark(pkts[j], conn, setmark[0], setmark[1]);
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 254f61c..0437cd3 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -65,7 +65,7 @@ void conntrack_init(struct conntrack *);
 void conntrack_destroy(struct conntrack *);
 
 int conntrack_execute(struct conntrack *, struct dp_packet_batch *,
-  ovs_be16 dl_type, bool commit,
+  ovs_be16 dl_type, bool force, bool commit,
   uint16_t zone, const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 30907b7..de844d3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4734,6 +4734,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 
 case OVS_ACTION_ATTR_CT: {
 const struct nlattr *b;
+bool force = false;
 bool commit = false;
 unsigned int left;
 uint16_t zone = 0;
@@ -4746,6 +4747,9 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 enum ovs_ct_attr sub_type = nl_attr_type(b);
 
 switch(sub_type) {
+case OVS_CT_ATTR_FORCE_COMMIT:
+force = true;
+/* fall through. */
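
Both the kernel parser and the userspace dp_execute_cb() hunk above rely on a
switch fall-through so that a force-commit attribute also sets the commit
flag, and process_one() then deletes a found entry when force is set and the
packet is in the reply direction.  The following sketch restates those two
steps with hypothetical enum and struct names; it is not the OVS code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute types, standing in for OVS_CT_ATTR_*. */
enum ct_attr_sketch {
    CT_ATTR_COMMIT_SKETCH,
    CT_ATTR_FORCE_COMMIT_SKETCH,
    CT_ATTR_OTHER_SKETCH
};

struct ct_opts_sketch {
    bool force;
    bool commit;
};

/* Force commit implies commit, so the first case deliberately falls
 * through into the second. */
static void
ct_opts_apply(struct ct_opts_sketch *opts, enum ct_attr_sketch attr)
{
    assert(opts);
    switch (attr) {
    case CT_ATTR_FORCE_COMMIT_SKETCH:
        opts->force = true;
        /* fall through. */
    case CT_ATTR_COMMIT_SKETCH:
        opts->commit = true;
        break;
    case CT_ATTR_OTHER_SKETCH:
        break;
    }
}

/* Condition under which an existing entry is removed before processing:
 * it was found, force is requested, and the packet matched the entry's
 * reply direction. */
static bool
conn_must_be_replaced(bool found, const struct ct_opts_sketch *opts,
                      bool reply)
{
    return found && opts->force && reply;
}
```

A plain commit never triggers the replacement, whatever the packet direction.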
 

[ovs-dev] [PATCH v2 22/22] datapath: Allow compiling against Linux 4.10

2017-02-28 Thread Jarno Rajahalme
OVS in-tree datapath compiles against Linux 4.10 kernel, so allow it.

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index f4cbabd..3de635f 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -134,10 +134,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
 AC_MSG_RESULT([$kversion])
 
 if test "$version" -ge 4; then
-   if test "$version" = 4 && test "$patchlevel" -le 9; then
+   if test "$version" = 4 && test "$patchlevel" -le 10; then
   : # Linux 4.x
else
-  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.9.x is not supported (please refer to the FAQ for advice)])
+  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.10.x is not supported (please refer to the FAQ for advice)])
fi
 elif test "$version" = 3 && test "$patchlevel" -ge 10; then
: # Linux 3.x
-- 
2.1.4


[ovs-dev] [PATCH v2 15/22] odp: Support conntrack orig tuple key.

2017-02-28 Thread Jarno Rajahalme
Userspace support for datapath original direction conntrack tuple.

Signed-off-by: Jarno Rajahalme 
---
 build-aux/extract-ofp-fields|   3 +
 include/openvswitch/flow.h  |  15 ++-
 include/openvswitch/match.h |  16 +++
 include/openvswitch/meta-flow.h | 136 +
 lib/conntrack.c |  43 ++--
 lib/flow.c  | 220 
 lib/flow.h  |  50 +
 lib/match.c | 110 +++-
 lib/meta-flow.c | 157 +++-
 lib/meta-flow.xml   |  92 +
 lib/nx-match.c  |  40 ++--
 lib/nx-match.h  |   4 +-
 lib/odp-execute.c   |   4 +
 lib/odp-util.c  | 124 ++
 lib/odp-util.h  |   8 +-
 lib/ofp-util.c  |   7 +-
 lib/packets.h   |   5 +
 ofproto/ofproto-dpif-rid.h  |   2 +-
 ofproto/ofproto-dpif-sflow.c|   2 +
 ofproto/ofproto-dpif-xlate.c|  13 ++-
 ofproto/ofproto-dpif.c  |   2 +
 tests/odp.at|   2 +-
 tests/ofproto-dpif.at   |  30 +++---
 tests/ofproto.at|   7 ++
 tests/system-traffic.at | 142 --
 25 files changed, 1115 insertions(+), 119 deletions(-)

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 498b887..a26d558 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -44,6 +44,9 @@ PREREQS = {"none": "MFP_NONE",
"IPv4": "MFP_IPV4",
"IPv6": "MFP_IPV6",
"IPv4/IPv6": "MFP_IP_ANY",
+   "CT": "MFP_CT_VALID",
+   "CTv4": "MFP_CTV4_VALID",
+   "CTv6": "MFP_CTV6_VALID",
"MPLS": "MFP_MPLS",
"TCP": "MFP_TCP",
"UDP": "MFP_UDP",
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 9169272..5cd78e4 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -23,7 +23,7 @@
 /* This sequence number should be incremented whenever anything involving flows
  * or the wildcarding of flows changes.  This will cause build assertion
  * failures in places which likely need to be updated. */
-#define FLOW_WC_SEQ 36
+#define FLOW_WC_SEQ 37
 
 /* Number of Open vSwitch extension 32-bit registers. */
 #define FLOW_N_REGS 16
@@ -92,7 +92,7 @@ struct flow {
 union flow_in_port in_port; /* Input port.*/
 uint32_t recirc_id; /* Must be exact match. */
 uint8_t ct_state;   /* Connection tracking state. */
-uint8_t pad0;
+uint8_t ct_nw_proto;/* CT orig tuple IP protocol. */
 uint16_t ct_zone;   /* Connection tracking zone. */
 uint32_t ct_mark;   /* Connection mark.*/
 uint8_t pad1[4];/* Pad to 64 bits. */
@@ -110,8 +110,12 @@ struct flow {
 /* L3 (64-bit aligned) */
 ovs_be32 nw_src;/* IPv4 source address or ARP SPA. */
 ovs_be32 nw_dst;/* IPv4 destination address or ARP TPA. */
+ovs_be32 ct_nw_src; /* CT orig tuple IPv4 source address. */
+ovs_be32 ct_nw_dst; /* CT orig tuple IPv4 destination address. */
 struct in6_addr ipv6_src;   /* IPv6 source address. */
 struct in6_addr ipv6_dst;   /* IPv6 destination address. */
+struct in6_addr ct_ipv6_src; /* CT orig tuple IPv6 source address. */
+struct in6_addr ct_ipv6_dst; /* CT orig tuple IPv6 destination address. */
 ovs_be32 ipv6_label;/* IPv6 flow label. */
 uint8_t nw_frag;/* FLOW_FRAG_* flags. */
 uint8_t nw_tos; /* IP ToS (including DSCP and ECN). */
@@ -126,8 +130,11 @@ struct flow {
 /* L4 (64-bit aligned) */
 ovs_be16 tp_src;/* TCP/UDP/SCTP source port/ICMP type. */
 ovs_be16 tp_dst;/* TCP/UDP/SCTP destination port/ICMP code. */
+ovs_be16 ct_tp_src; /* CT original tuple source port/ICMP type. */
+ovs_be16 ct_tp_dst; /* CT original tuple dst port/ICMP code. */
 ovs_be32 igmp_group_ip4;/* IGMP group IPv4 address.
  * Keep last for BUILD_ASSERT_DECL below. */
+ovs_be32 pad4;  /* Pad to 64 bits. */
 };
 BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0);
 BUILD_ASSERT_DECL(sizeof(struct flow_tnl) % sizeof(uint64_t) == 0);
@@ -136,8 +143,8 @@ BUILD_ASSERT_DECL(sizeof(struct flow_tnl) % sizeof(uint64_t) == 0);
 
 /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
 BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
-  == sizeof(struct flow_tnl) + 248
-  && FLOW
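
The hunks above reuse a former pad byte for ct_nw_proto and add pad4 so that the struct stays a multiple of 64 bits, with build assertions guarding the layout. A minimal sketch of that pattern follows. It assumes a hypothetical, much smaller `toy_flow` struct (not the real `struct flow`) and uses C11 `_Static_assert` as a stand-in for BUILD_ASSERT_DECL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced layout; only the padding idea is modeled. */
struct toy_flow {
    uint32_t recirc_id;
    uint8_t ct_state;
    uint8_t ct_nw_proto;        /* Occupies what used to be a pad byte. */
    uint16_t ct_zone;
    uint32_t ct_mark;
    uint8_t pad1[4];            /* Pad to 64 bits. */

    uint16_t tp_src;
    uint16_t tp_dst;
    uint16_t ct_tp_src;
    uint16_t ct_tp_dst;
    uint32_t igmp_group_ip4;
    uint32_t pad4;              /* Pad to 64 bits. */
};

/* The struct must be a whole number of 64-bit words. */
_Static_assert(sizeof(struct toy_flow) % sizeof(uint64_t) == 0,
               "toy_flow must be a multiple of 64 bits");

/* The last real field plus the explicit pad must end the struct, so that
 * any newly added field forces this check to be revisited. */
_Static_assert(offsetof(struct toy_flow, igmp_group_ip4)
               + sizeof(uint32_t) + sizeof(uint32_t)
               == sizeof(struct toy_flow),
               "update this check when changing toy_flow");
```

If a field were added without adjusting the padding, the build would fail at these checks rather than at run time.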

Re: [ovs-dev] [PATCH 0/8] Upstream Linux kernel datapath backports.

2017-03-02 Thread Jarno Rajahalme

> On Mar 1, 2017, at 7:27 PM, Joe Stringer  wrote:
> 
> On 15 February 2017 at 17:34, Jarno Rajahalme  wrote:
>> Many contributors are currently working on backporting upstream Linux
>> kernel datapath changes to the OVS tree kernel datapath.  This series
>> addresses apparent gaps in this work as follows:
>> 
>> In this series:
>> 08733a0 netfilter: handle NF_REPEAT from nf_conntrack_in()
>> 
>> Already applied:
>> 56989f6 genetlink: mark families as __ro_after_init
>> 489111e genetlink: statically initialize families
>> a07ea4d genetlink: no longer support using static family IDs
>> 
>> In this series:
>> 9157208 net: use core MTU range checking in core net infra
>> 76e4cc7 openvswitch: remove unnecessary EXPORT_SYMBOLs
>> f33eb0c openvswitch: remove unused functions
>> 
>> Empty merge commit:
>> 8eed1cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>> 
>> Skipped (Should be addressed with the main 802.1AD backports):
>> 3145c03 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
>> 72ec108 openvswitch: fix vlan subtraction from packet length
>> 20ecf1e openvswitch: vlan: remove wrong likely statement
>> 
>> Skipped (Should be addressed with the main "net: mpls: Fixups for GSO"
>> backports):
>> c66549f openvswitch: correctly fragment packet with mpls headers
>> 85de4a2 openvswitch: use mpls_hdr
>> f7d49bc openvswitch: mpls: set network header correctly on key extract
>> 
>> In this series:
>> 2279994 openvswitch: avoid resetting flow key while installing new flow.
>> 190aa3e openvswitch: Fix Frame-size larger than 1024 bytes warning.
>> db74a33 openvswitch: use percpu flow stats
>> 4077396 openvswitch: fix flow stats accounting when node 0 is not possible
>> 
>> Already applied:
>> 2679d04 openvswitch: avoid deferred execution of recirc actions
>> ed22709 openvswitch: use alias for genetlink family names
> 
> 
> Thanks for submitting this series, and outlining the status of all of
> the patches you looked at. I've reviewed them and they look good.
> 
> There was one patch for which the backport included more invasive
> changes to datapath/*.c files than the original changes, which I've
> amended here:
> 
> https://github.com/joestringer/openvswitch/commit/807e09f0372b848001f6fe797d9040e9da30abb7
> 
> I'm working through the various backports series at the moment. I
> noticed that the following backports were missing:
> https://github.com/joestringer/openvswitch/commit/8fa9841b9b06139f69f0138868ed9d6df730b748
> https://github.com/joestringer/openvswitch/commit/c598981830e1274391fb80bae6ca9862e6c53fda
> 
> Would you mind taking a look at these in place on github?

I checked out all three at github and they seem good to me:

Acked-by: Jarno Rajahalme 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] lib: Refactor nested netlink APIs.

2017-03-02 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Feb 16, 2017, at 5:11 PM, Andy Zhou  wrote:
> 
> Future patches will make use of those changes.
> 
> Signed-off-by: Andy Zhou 
> ---
> lib/netlink.c | 19 ---
> lib/netlink.h |  3 ++-
> 2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/netlink.c b/lib/netlink.c
> index ad7d35a..ae4c72a 100644
> --- a/lib/netlink.c
> +++ b/lib/netlink.c
> @@ -467,16 +467,29 @@ nl_msg_end_nested(struct ofpbuf *msg, size_t offset)
> attr->nla_len = msg->size - offset;
> }
> 
> -/* Same as nls_msg_end_nested() when the nested Netlink contains non empty
> - * message. Otherwise, drop the nested message header from 'msg'.*/
> +/* Cancel a nested Netlink attribute in 'msg'.  'offset' should be the value
> + * returned by nl_msg_start_nested(). */
> void
> +nl_msg_cancel_nested(struct ofpbuf *msg, size_t offset)
> +{
> +msg->size = offset;
> +}
> +
> +/* Same as nls_msg_end_nested() when the nested Netlink contains non empty
> + * message. Otherwise, drop the nested message header from 'msg'.
> + *
> + * Return true if the nested message has been dropped.  */
> +bool
> nl_msg_end_non_empty_nested(struct ofpbuf *msg, size_t offset)
> {
> nl_msg_end_nested(msg, offset);
> 
> struct nlattr *attr = ofpbuf_at_assert(msg, offset, sizeof *attr);
> if (!nl_attr_get_size(attr)) {
> -msg->size = offset;
> +nl_msg_cancel_nested(msg, offset);
> +return true;
> +} else {
> +return false;
> }
> }
> 
> diff --git a/lib/netlink.h b/lib/netlink.h
> index 7646f91..bb4dbf6 100644
> --- a/lib/netlink.h
> +++ b/lib/netlink.h
> @@ -79,7 +79,8 @@ void nl_msg_put_string(struct ofpbuf *, uint16_t type, const char *value);
> 
> size_t nl_msg_start_nested(struct ofpbuf *, uint16_t type);
> void nl_msg_end_nested(struct ofpbuf *, size_t offset);
> -void nl_msg_end_non_empty_nested(struct ofpbuf *, size_t offset);
> +void nl_msg_cancel_nested(struct ofpbuf *, size_t offset);
> +bool nl_msg_end_non_empty_nested(struct ofpbuf *, size_t offset);
> void nl_msg_put_nested(struct ofpbuf *, uint16_t type,
>const void *data, size_t size);
> 
> -- 
> 1.8.3.1
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
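
The start/end/cancel pattern in the patch quoted above can be sketched in isolation. The following is only a toy model under stated assumptions: `toy_buf` and `toy_attr` are hypothetical stand-ins for `struct ofpbuf` and `struct nlattr`, there is no alignment padding or bounds checking, and none of the `toy_*` functions are OVS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for struct ofpbuf and struct nlattr; not the OVS types.
 * No bounds checking is done, for brevity. */
struct toy_buf {
    uint8_t data[256];
    size_t size;
};

struct toy_attr {
    uint16_t len;               /* Header plus payload, in bytes. */
    uint16_t type;
};

/* Appends an attribute header with a placeholder length of zero and
 * returns its offset, to be passed to the end/cancel helpers. */
static size_t
toy_start_nested(struct toy_buf *b, uint16_t type)
{
    size_t offset = b->size;
    struct toy_attr hdr = { 0, type };

    memcpy(b->data + b->size, &hdr, sizeof hdr);
    b->size += sizeof hdr;
    return offset;
}

/* Appends a complete attribute carrying a 32-bit payload. */
static void
toy_put_u32(struct toy_buf *b, uint16_t type, uint32_t value)
{
    struct toy_attr hdr = { (uint16_t) (sizeof hdr + sizeof value), type };

    memcpy(b->data + b->size, &hdr, sizeof hdr);
    memcpy(b->data + b->size + sizeof hdr, &value, sizeof value);
    b->size += sizeof hdr + sizeof value;
}

/* Closes the nested attribute by patching in its final length. */
static void
toy_end_nested(struct toy_buf *b, size_t offset)
{
    uint16_t len = (uint16_t) (b->size - offset);

    memcpy(b->data + offset, &len, sizeof len);
}

/* Drops the nested attribute and everything written inside it. */
static void
toy_cancel_nested(struct toy_buf *b, size_t offset)
{
    b->size = offset;
}

/* Closes the nested attribute if it has content; otherwise drops it.
 * Returns true if the attribute was dropped. */
static bool
toy_end_non_empty_nested(struct toy_buf *b, size_t offset)
{
    if (b->size - offset == sizeof(struct toy_attr)) {
        toy_cancel_nested(b, offset);
        return true;
    }
    toy_end_nested(b, offset);
    return false;
}
```

Returning a bool from the "non-empty" variant lets a caller that opened an outer attribute around the inner one cancel the outer attribute as well when the inner one turns out to be empty.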


Re: [ovs-dev] [PATCH 3/3] xlate: Translate openflow clone into odp sample action.

2017-03-02 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Feb 16, 2017, at 5:11 PM, Andy Zhou  wrote:
> 
> When datapath does not support the 'clone' action directly, generate
> sample action (with 100% probability) instead.
> 
> Specifically, currently, there is no plan to support the 'clone'
> action on the Linux kernel datapath directly, so the sample action
> will be used to translate the openflow clone action for this datapath.
> 
> Signed-off-by: Andy Zhou 
> ---
> ofproto/ofproto-dpif-xlate.c | 38 --
> tests/ofproto-dpif.at|  2 +-
> 2 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index c4ca5d2..1a5fdf8 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -4659,18 +4659,36 @@ xlate_sample_action(struct xlate_ctx *ctx,
>   tunnel_out_port, false);
> }
> 
> -/* Only called if the datapath supports 'OVS_ACTION_ATTR_CLONE'.
> - *
> - * Translates 'oc' within OVS_ACTION_ATTR_CLONE. */
> +/* Use datapath 'clone' or sample to enclose the translation of 'oc'.   */
> static void
> compose_clone_action(struct xlate_ctx *ctx, const struct ofpact_nest *oc)
> {
> size_t clone_offset = nl_msg_start_nested(ctx->odp_actions,
>   OVS_ACTION_ATTR_CLONE);
> +do_xlate_actions(oc->actions, ofpact_nest_get_action_len(oc), ctx);
> +nl_msg_end_non_empty_nested(ctx->odp_actions, clone_offset);
> +}
> +
> +/* Use datapath 'sample' action to translate clone.  */
> +static void
> +compose_clone_action_using_sample(struct xlate_ctx *ctx,
> +  const struct ofpact_nest *oc)
> +{
> +size_t offset = nl_msg_start_nested(ctx->odp_actions,
> +OVS_ACTION_ATTR_SAMPLE);
> +
> +size_t ac_offset = nl_msg_start_nested(ctx->odp_actions,
> +   OVS_SAMPLE_ATTR_ACTIONS);
> 
> do_xlate_actions(oc->actions, ofpact_nest_get_action_len(oc), ctx);
> 
> -nl_msg_end_non_empty_nested(ctx->odp_actions, clone_offset);
> +if (nl_msg_end_non_empty_nested(ctx->odp_actions, ac_offset)) {
> +nl_msg_cancel_nested(ctx->odp_actions, offset);
> +} else {
> +nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
> +   UINT32_MAX); /* 100% probability. */
> +nl_msg_end_nested(ctx->odp_actions, offset);
> +}
> }
> 
> static void
> @@ -4690,16 +4708,16 @@ xlate_clone(struct xlate_ctx *ctx, const struct ofpact_nest *oc)
> ofpbuf_use_stub(&ctx->action_set, actset_stub, sizeof actset_stub);
> ofpbuf_put(&ctx->action_set, old_action_set.data, old_action_set.size);
> 
> +/* Datapath clone action will make sure the pre clone packets
> + * are used for actions after clone. Save and restore
> + * ctx->base_flow to reflect this for the openflow pipeline. */
> +struct flow old_base_flow = ctx->base_flow;
> if (ctx->xbridge->support.clone) {
> -/* Datapath clone action will make sure the pre clone packets
> - * are used for actions after clone. Save and restore
> - * ctx->base_flow to reflect this for the openflow pipeline. */
> -struct flow old_base_flow = ctx->base_flow;
> compose_clone_action(ctx, oc);
> -ctx->base_flow = old_base_flow;
> } else {
> -do_xlate_actions(oc->actions, ofpact_nest_get_action_len(oc), ctx);
> +compose_clone_action_using_sample(ctx, oc);
> }
> +ctx->base_flow = old_base_flow;
> 
> ofpbuf_uninit(&ctx->action_set);
> ctx->action_set = old_action_set;
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index e861d9f..f1415e4 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -6457,7 +6457,7 @@ AT_CHECK([ovs-appctl dpif/disable-dp-clone br0], [0],
> AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.10.10.2,dst=10.10.10.1,proto=1,tos=1,ttl=128,frag=no),icmp(type=8,code=0)'], [0], [stdout])
> 
> AT_CHECK([tail -1 stdout], [0], [dnl
> -Datapath actions: set(ipv4(src=10.10.10.2,dst=192.168.4.4)),2,set(eth(src=80:81:81:81:81:81)),set(ipv4(src=10.10.10.2,dst=192.168.5.5)),3,set(eth(src=50:54:00:00:00:09)),set(ipv4(src=10.10.10.2,dst=10.10.10.1)),4
> +Datapath actions: sample(sample=100.0%,actions(set(ipv4(src=10.10.10.2,dst=192.168.4.4)),2)),sample(sample=100.0%,actions(set(eth(src=80:81:81:81:81:81)),set(ipv4(src=10.10.10.2,dst=192.168.5.5)),3)),4
> ])
> 
> OVS_VSWITCHD_STOP
> -- 
> 1.8.3.1
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
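
The patch quoted above encodes "always sample" by setting the 32-bit probability attribute to UINT32_MAX, and the updated test output shows it as sample=100.0%. A small sketch of that scaling follows. It assumes the probability is a fraction of UINT32_MAX, and `toy_sample_percent` is a hypothetical helper, not an OVS function:

```c
#include <assert.h>
#include <stdint.h>

/* Converts a 32-bit sample probability, taken as a fraction of UINT32_MAX,
 * to a percentage.  UINT32_MAX therefore means "sample every packet". */
static double
toy_sample_percent(uint32_t probability)
{
    return 100.0 * probability / UINT32_MAX;
}
```

With this scaling, a clone translated into a sample action at UINT32_MAX runs its nested actions for every packet, which is what makes sample usable as a substitute for clone.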


Re: [ovs-dev] [PATCH 1/3] ofproto-dpif: Enhance execute_controller_action().

2017-03-02 Thread Jarno Rajahalme
With the notes below:

Acked-by: Jarno Rajahalme 

> On Feb 16, 2017, at 5:11 PM, Andy Zhou  wrote:
> 
> Allow execute_controller_action() to accept actions encoded with
> nested netlink attributes.
> 
> execute_controller_action() can be called during 'xlate_actions'. It
> tries to execute all actions translated so far to get the current packet
> that needs to be sent to the controller.  This works fine until
> the action is enclosed within a nested netlink message and the
> action translation has not finished yet.
> 
> For example:
> A, clone(B, controller, C)
> 
> In this case, we cannot execute 'clone' since its translation has not
> been finished (C is missing). However, A still needs to be executed before
> the packet can be sent to the controller.
> 
> The solution is to make a copy of the odp actions translated so far,
> and 'fix up' the copy so that it can be executed. The original odp
> actions are left intact so that xlate can continue.
> 
> Signed-off-by: Andy Zhou 
> ---
> ofproto/ofproto-dpif-xlate.c | 149 +--
> 1 file changed, 144 insertions(+), 5 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 503a347..c4ca5d2 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3805,13 +3805,150 @@ flood_packets(struct xlate_ctx *ctx, bool all)
> ctx->nf_output_iface = NF_OUT_FLOOD;
> }
> 
> +/* Copy and reformat a partially xlated odp actions to a new
> + * odp actions list in 'b', so that the new actions list
> + * can be executed by odp_execute_actions.
> + *
> + * When xlate using nested odp actions, such as sample and clone,
> + * The nested action created by nl_msg_start_nested() may not

“the”

> + * have been properly closed yet, thus can not be executed
> + * directly.
> + *
> + * Since unclosed nested action has to be last action, it can be
> + * fixed by skip the outer header, and treat the actions within

“skipping”, “treating”

> + * as if they are outside the nested attribute. Since the effect

“, since”

> + * of executing them on packet is the same.
> + *
> + * As an optimization, a fully closed 'sample' or 'clone' action
> + * is skipped since their execution has no effect to the packet.
> + *

In this case the actions are executed without a datapath helper, so none of the 
datapath dependent actions (HASH, OUTPUT, USERSPACE, RECIRC, CT) are actually 
executed, so maybe they could be skipped as well? Same for the TRUNC, as it 
only has an effect on OUTPUT, which will not be executed.

> + * Returns true if success. 'b' contains the new actions list.
> + * The caller is responsible for dispose 'b'.
> + *

“disposing”

> + * Returns false if error, 'b' has been freed already.  */
> +static bool
> +xlate_fixup_actions(struct ofpbuf *b, const struct nlattr *actions,
> +size_t actions_len)
> +{
> +const struct nlattr *a;
> +unsigned int left;
> +
> +NL_ATTR_FOR_EACH_UNSAFE (a, left, actions, actions_len) {
> +int type = nl_attr_type(a);
> +
> +switch ((enum ovs_action_attr) type) {
> +case OVS_ACTION_ATTR_HASH:
> +case OVS_ACTION_ATTR_PUSH_VLAN:
> +case OVS_ACTION_ATTR_POP_VLAN:
> +case OVS_ACTION_ATTR_PUSH_MPLS:
> +case OVS_ACTION_ATTR_POP_MPLS:
> +case OVS_ACTION_ATTR_SET:
> +case OVS_ACTION_ATTR_SET_MASKED:
> +case OVS_ACTION_ATTR_TRUNC:
> +case OVS_ACTION_ATTR_OUTPUT:
> +case OVS_ACTION_ATTR_TUNNEL_PUSH:
> +case OVS_ACTION_ATTR_TUNNEL_POP:
> +case OVS_ACTION_ATTR_USERSPACE:
> +case OVS_ACTION_ATTR_RECIRC:
> +case OVS_ACTION_ATTR_CT:
> +ofpbuf_put(b, a, nl_attr_len_pad(a, left));
> +break;
> +
> +case OVS_ACTION_ATTR_CLONE:
> +/* If the clone action has been fully xlated, it can
> + * be skipped, since any actions executed within clone
> + * do not affect the current packet.
> + *
> + * When xlating actions wihtin clone, the clone action,

“within”

> + * because it is an nested netlink attribute, do not have
> + * a vlaid 'nla_len'; it will be zero instead.  Skip

“valid”

> + * the clone heaer to find the start of the actions

“header”

> + * enclosed. Treat those actions as if they are written
> + * outside of clone.   */
> +if (!a->nla_len) {
> +bool ok;
> +if (left < NLA_HDRLEN) {
> +   
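
The fix-up idea described in the commit message above (copy ordinary actions, skip fully closed clones, and flatten a still-open clone into the top level) can be sketched on a simplified encoding. This is a toy model under stated assumptions: actions are 16-bit words laid out as [length in words, type, payload...], a length of 0 marks an unclosed nested action, and the `TOY_*` types and `toy_fixup_actions` are hypothetical, not the netlink format or the OVS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_SET = 1, TOY_OUTPUT = 2, TOY_CLONE = 3 };
#define TOY_HDR_WORDS 2

/* Copies the 'n'-word action list 'in' to 'out' in executable form.
 * 'out' must have room for at least 'n' words.
 * Returns the number of words written, or -1 on a malformed list. */
static long
toy_fixup_actions(const uint16_t *in, size_t n, uint16_t *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        if (n - i < TOY_HDR_WORDS) {
            return -1;
        }
        uint16_t len = in[i];
        uint16_t type = in[i + 1];

        if (type == TOY_CLONE && len == 0) {
            /* Unclosed clone: skip only the header so that the actions
             * inside it are handled as if they were at the top level. */
            i += TOY_HDR_WORDS;
            continue;
        }
        if (len < TOY_HDR_WORDS || len > n - i) {
            return -1;
        }
        if (type != TOY_CLONE) {
            /* Ordinary action: copy it unchanged. */
            memcpy(out + o, in + i, len * sizeof *in);
            o += len;
        }
        /* A fully closed clone does not affect the current packet, so it
         * is skipped entirely. */
        i += len;
    }
    return (long) o;
}
```

For the "A, clone(B, controller, C)" example, the list at the time the controller action runs is A, an open clone header, then B; the sketch turns that into A, B.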

Re: [ovs-dev] [PATCH v3 00/16] port Jiri Benc's L3 patchset to ovs

2017-03-02 Thread Jarno Rajahalme

> On Mar 2, 2017, at 11:56 AM, Joe Stringer  wrote:
> 
> Thanks for looking it over, that sounds reasonable. I'll be looking
> forward along the other backports to try to get us back in better sync
> with upstream.
> 
> I will mention that at this stage, the tree that I pointed to is
> missing the MPLS GSO backport but everything else should be up to date
> until the end of the L3 tunnelling series. I'll follow up on the MPLS
> GSO thread regarding that[0].
> 
> [0] http://patchwork.ozlabs.org/patch/725891/
> 

I agree that we should not block everything else on this MPLS backport issue, 
so IMO you could merge the tree with the understanding that we’ll deal with the 
MPLS backport soon after.

  Jarno

> On 1 March 2017 at 20:23, Yang, Yi Y  wrote:
>> Joe, I checked this tree 
>> https://github.com/joestringer/openvswitch/commits/dev/backport_review_v0.6 
>> , it included all the 802.1ad patches and Jiri's l3 kernel data path 
>> patches, so I think this is the first step we should take, once they are 
>> officially merged, Jan and I will rework userspace l3 patches and vxlangpe 
>> patches and resubmit them based on your tree.
>> 
>> -Original Message-
>> From: Joe Stringer [mailto:j...@ovn.org]
>> Sent: Thursday, March 2, 2017 11:43 AM
>> To: Yang, Yi Y 
>> Cc: ovs dev ; Jarno Rajahalme 
>> Subject: Re: [ovs-dev] [PATCH v3 00/16] port Jiri Benc's L3 patchset to ovs
>> 
>> On 6 February 2017 at 05:04, Yi Yang  wrote:
>>> This patch set just ports Jiri Benc's L3 8 support patches for layer 3 
>>> encapsulated packets from net-next to current ovs, it also includes Jiri 
>>> Benc's 3 userspace patches, Jarno Rajahalme and Pravin Shelar's vlan fix 
>>> patches for L3 patchset as well as my 3 patches which enabled vxlangpe in 
>>> compat mode and dpdk netdev in both L2 and L3(layer3=true) mode.
>>> 
>>> This patchset has been verified on Ubuntu 14.04 x86_64 with Linux kernel 
>>> 3.13.0-24-generic and 4.9.7, it also passed "make check"
>>> and "sudo make check-kmod RECHECK=yes" in Fedora 23 with kernel
>>> 4.2.3-300.fc23.x86_64
>>> 
>>> This patch set is based on 
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328492.html 
>>> ([PATCH v2 0/4] Backport 802.1ad patches), please merge this one after 
>>> merging [PATCH v2 0/4] Backport 802.1ad patches.
>>> 
>>> Yi Yang (16):
>>>  datapath: use hard_header_len instead of hardcoded ETH_HLEN
>>>  datapath: add mac_proto field to the flow key
>>>  datapath: pass mac_proto to ovs_vport_send
>>>  datapath: support MPLS push and pop for L3 packets
>>>  datapath: add processing of L3 packets
>>>  datapath: netlink: support L3 packets
>>>  datapath: add Ethernet push and pop actions
>>>  datapath: allow L3 netdev ports
>>>  userspace: add support for pop_eth and push_eth actions
>>>  userspace: add layer 3 flow and switching support
>>>  userspace: add non-tap (l3) support to GRE vports
>>>  datapath: Add a missing break statement
>>>  datapath: upcall: Fix vlan handling.
>>>  datapath: enable vxlangpe creation in compat mode
>>>  userspace: enable layer3 option for vxlan-gpe
>>>  userspace: add vxlan-gpe support for dpdk netdev
>> 
>> Picking this thread back up, apologies for the delay..
>> 
>> Given that these backports + vlan + other series by Jarno may be 
>> interdependent on each other, I figured that I will assemble a single tree 
>> that brings them all together, run travis and local kmod testing on a 
>> variety of platforms, as well as per-commit compile checks with kernels 4.9 
>> and 3.13.
>> 
>> Here's the current tree I'm testing:
>> https://github.com/joestringer/openvswitch/commits/dev/backport_review_v0.6
>> 
>> Travis looks good (build check against a range of kernels):
>> https://travis-ci.org/joestringer/openvswitch/builds/206841787
>> 
>> My local system-kmod testing on a variety of platforms is coming out looking 
>> good so far.
>> 
>> I dropped the userspace changes since they are decoupled and superseded. 
>> Where they were necessary to fix the build, I folded in the minimal changes 
>> necessary to fix the patch so that the tree successfully compiles on each 
>> individual commit.
>> 
>> I'd appreciate if you could look over that tree once more, there's a couple 
>> of new patches that I backported but otherwise it's all series that have 
>> been out on the mailinglist o

Re: [ovs-dev] [PATCH v2 09/22] datapath: Refactor labels initialization.

2017-03-03 Thread Jarno Rajahalme

> On Mar 2, 2017, at 5:26 PM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
>> Upstream commit:
>> 
>>Refactoring conntrack labels initialization makes changes in later
>>patches easier to review.
>> 
>>Signed-off-by: Jarno Rajahalme 
>>Acked-by: Pravin B Shelar 
>>Acked-by: Joe Stringer 
>>    Signed-off-by: David S. Miller 
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
> >> datapath/conntrack.c | 113 ++-
>> 1 file changed, 66 insertions(+), 47 deletions(-)
>> 
>> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
>> index dacf34c..adc4315 100644
>> --- a/datapath/conntrack.c
>> +++ b/datapath/conntrack.c
>> @@ -243,19 +243,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, 
>> struct sk_buff *skb)
>>return 0;
>> }
>> 
>> -static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
>> +static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
>>   u32 ct_mark, u32 mask)
>> {
>> #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
>> -   enum ip_conntrack_info ctinfo;
>> -   struct nf_conn *ct;
>>u32 new_mark;
>> 
> >> -   /* The connection could be invalid, in which case set_mark is no-op. */
>> -   ct = nf_ct_get(skb, &ctinfo);
>> -   if (!ct)
>> -   return 0;
>> -
>>new_mark = ct_mark | (ct->mark & ~(mask));
>>if (ct->mark != new_mark) {
>>ct->mark = new_mark;
>> @@ -270,56 +263,71 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
>> sw_flow_key *key,
>> #endif
>> }
>> 
>> -static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
>> -const struct ovs_key_ct_labels *labels,
>> -const struct ovs_key_ct_labels *mask)
>> +static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
>> {
>> -   enum ip_conntrack_info ctinfo;
>>struct nf_conn_labels *cl;
>> -   struct nf_conn *ct;
>> -
> >> -   /* The connection could be invalid, in which case set_label is no-op.*/
>> -   ct = nf_ct_get(skb, &ctinfo);
>> -   if (!ct)
>> -   return 0;
>> 
>>cl = nf_ct_labels_find(ct);
>>if (!cl) {
>>nf_ct_labels_ext_add(ct);
>>cl = nf_ct_labels_find(ct);
>>}
>> +   if (cl && ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
>> +   return NULL;
> 
> The above two lines were not introduced in the upstream code. Do you
> intend to introduce them?
> 

I should have mentioned this in the commit message or in a comment. The 
inclusion of this test is intentional; the rationale is that the kernel might 
be configured with too little space for labels. However, the way the OVS kernel 
module initializes the number of words in labels for older kernels may already 
take care of this. Do you have a take on this?

  Jarno

> For my current working tree for review/build/test, I will drop these lines.
> 
>> +   return cl;
>> +}
>> +
> >> +/* Initialize labels for a new, yet to be committed conntrack entry.  Note that
>> + * since the new connection is not yet confirmed, and thus no-one else has
>> + * access to it's labels, we simply write them over.
>> + */
>> +static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
>> + const struct ovs_key_ct_labels *labels,
>> + const struct ovs_key_ct_labels *mask)
>> +{
>> +   struct nf_conn_labels *cl;
>> +   u32 *dst;
>> +   int i;
>> 
>> -   if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
>> +   cl = ovs_ct_get_conn_labels(ct);
>> +   if (!cl)
>>return -ENOSPC;
>> 
>> -   if (nf_ct_is_confirmed(ct)) {
>> -   /* Triggers a change event, which makes sense only for
>> -* confirmed connections.
>> -*/
>> -   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
>> -   mask->ct_labels_32,
>> -   OVS_CT_LABELS_LEN_32);
>> -   if (err)
>> -  

Re: [ovs-dev] [PATCH v2 12/22] lib: Check match and action prerequisities with 'match'.

2017-03-03 Thread Jarno Rajahalme

> On Mar 2, 2017, at 5:34 PM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
> >> Supply the match mask to prerequisites checking when available.  This
> >> allows checking for zero-valued matches.  Non-zero valued matches
> >> imply the presence of corresponding mask bits, but for zero valued
> >> matches we must explicitly check the mask, too.
> >> 
> >> This is required now only for conntrack validity checking due to the
> >> conntrack state having an 'invalid' bit, but not a 'valid' bit.  One
> >> way to match a valid conntrack state is to match on the 'tracked' bit
> >> being one and the 'invalid' bit being zero.  The latter requires the
> >> corresponding mask bit be verified.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> include/openvswitch/meta-flow.h   |  5 ++--
>> include/openvswitch/ofp-actions.h |  4 ++--
>> lib/bundle.c  |  4 ++--
>> lib/bundle.h  |  3 ++-
>> lib/learn.c   | 15 ++--
>> lib/learn.h   |  3 ++-
>> lib/meta-flow.c   | 38 ++---
>> lib/multipath.c   |  4 ++--
>> lib/multipath.h   |  3 ++-
>> lib/nx-match.c| 17 ++---
>> lib/nx-match.h|  6 ++---
> >> lib/ofp-actions.c | 50 ---
>> lib/ofp-parse.c   |  2 +-
>> lib/ofp-util.c|  2 +-
>> ofproto/ofproto-dpif-trace.c  | 13 +-
>> ofproto/ofproto.c |  5 ++--
>> utilities/ovs-ofctl.c |  6 ++---
>> 17 files changed, 105 insertions(+), 75 deletions(-)
>> 
> >> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
>> index 83e2599..aac9945 100644
>> --- a/include/openvswitch/meta-flow.h
>> +++ b/include/openvswitch/meta-flow.h
>> @@ -1898,6 +1898,7 @@ void mf_get_mask(const struct mf_field *, const struct 
>> flow_wildcards *,
>> /* Prerequisites. */
>> bool mf_are_prereqs_ok(const struct mf_field *mf, const struct flow *flow,
>>struct flow_wildcards *wc);
>> +bool mf_are_match_prereqs_ok(const struct mf_field *, const struct match *);
>> 
>> static inline bool
>> mf_is_l3_or_higher(const struct mf_field *mf)
>> @@ -1959,8 +1960,8 @@ void mf_subfield_swap(const struct mf_subfield *,
>>   const struct mf_subfield *,
>>   struct flow *flow, struct flow_wildcards *);
>> 
>> -enum ofperr mf_check_src(const struct mf_subfield *, const struct flow *);
>> -enum ofperr mf_check_dst(const struct mf_subfield *, const struct flow *);
>> +enum ofperr mf_check_src(const struct mf_subfield *, const struct match *);
>> +enum ofperr mf_check_dst(const struct mf_subfield *, const struct match *);
>> 
>> /* Parsing and formatting. */
>> char *mf_parse(const struct mf_field *, const char *,
> 
> I assume that the above is OK from a library standpoint because we are
> not currently guaranteeing that APIs remain stable between release
> versions, only within the minor revisions of a particular release. IE,
> library users compiling against libopenvswitch from 2.7 will need to
> change their code to compile against libopenvswitch from 2.8.
> 

I did not remember this, but it seems all right to me.

> >> diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
>> index 88f573d..53d6b44 100644
>> --- a/include/openvswitch/ofp-actions.h
>> +++ b/include/openvswitch/ofp-actions.h
>> @@ -954,11 +954,11 @@ ofpacts_pull_openflow_instructions(struct ofpbuf 
>> *openflow,
>>const struct vl_mff_map *vl_mff_map,
>>struct ofpbuf *ofpacts);
>> enum ofperr ofpacts_check(struct ofpact[], size_t ofpacts_len,
>> -  struct flow *, ofp_port_t max_ports,
>> +  struct match *, ofp_port_t max_ports,
>>   uint8_t table_id, uint8_t n_tables,
>>   enum ofputil_protocol *usable_protocols);
>> enum ofperr ofpacts_check_consistency(struct ofpact[], size_t ofpacts_len,
>> -  struct flow *, ofp_port_t max_ports,
>> +  struct match *, ofp_port_t max_ports,
>>   uint8_t table_id, uint8_t n_tables,
>&
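
The commit message quoted above explains why the mask has to be consulted: a zero 'invalid' bit in the match value only means "not invalid" if the mask actually covers that bit. A minimal sketch of such a check follows. The `toy_match` struct, the `TOY_CS_*` bit positions, and the function name are all assumptions made for illustration, not the OVS definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; assumptions, not copied from OVS headers. */
#define TOY_CS_INVALID 0x10
#define TOY_CS_TRACKED 0x20

/* Hypothetical reduced match: a value and the mask of bits that matter. */
struct toy_match {
    uint8_t ct_state;
    uint8_t ct_state_mask;
};

/* Returns true only if the match guarantees a tracked, non-invalid
 * connection state: both bits must be covered by the mask, 'tracked' must
 * be one and 'invalid' must be zero. */
static bool
toy_match_implies_ct_valid(const struct toy_match *m)
{
    uint8_t need = TOY_CS_TRACKED | TOY_CS_INVALID;

    if ((m->ct_state_mask & need) != need) {
        return false;           /* At least one bit is wildcarded. */
    }
    return (m->ct_state & need) == TOY_CS_TRACKED;
}
```

Checking the value alone would accept a match whose 'invalid' bit is wildcarded, since a wildcarded bit also appears as zero in the value.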

Re: [ovs-dev] [PATCH v2 13/22] datapath: Add original direction conntrack tuple to sw_flow_key.

2017-03-03 Thread Jarno Rajahalme

> On Mar 2, 2017, at 5:57 PM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
>> Upstream commit:
>> 
>>commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
>>Author: Jarno Rajahalme 
>>Date:   Thu Feb 9 11:21:59 2017 -0800
>> 
>>openvswitch: Add original direction conntrack tuple to sw_flow_key.
>> 
>>Add the fields of the conntrack original direction 5-tuple to struct
>>sw_flow_key.  The new fields are initially marked as non-existent, and
>>are populated whenever a conntrack action is executed and either finds
>>or generates a conntrack entry.  This means that these fields exist
>>for all packets that were not rejected by conntrack as untrackable.
>> 
>>The original tuple fields in the sw_flow_key are filled from the
>>original direction tuple of the conntrack entry relating to the
>>current packet, or from the original direction tuple of the master
>>conntrack entry, if the current conntrack entry has a master.
>>Generally, expected connections of connections having an assigned
>>helper (e.g., FTP), have a master conntrack entry.
>> 
>>The main purpose of the new conntrack original tuple fields is to
>>allow matching on them for policy decision purposes, with the premise
>>that the admissibility of tracked connections reply packets (as well
>>as original direction packets), and both direction packets of any
>>related connections may be based on ACL rules applying to the master
>>connection's original direction 5-tuple.  This also makes it easier to
>>make policy decisions when the actual packet headers might have been
>>transformed by NAT, as the original direction 5-tuple represents the
>>packet headers before any such transformation.
>> 
>>When using the original direction 5-tuple the admissibility of return
>>and/or related packets need not be based on the mere existence of a
>>conntrack entry, allowing separation of admission policy from the
>>established conntrack state.  While existence of a conntrack entry is
>>required for admission of the return or related packets, policy
>>changes can render connections that were initially admitted to be
>>rejected or dropped afterwards.  If the admission of the return and
>>related packets was based on mere conntrack state (e.g., connection
>>being in an established state), a policy change that would make the
>>connection rejected or dropped would need to find and delete all
>>conntrack entries affected by such a change.  When using the original
>>direction 5-tuple matching the affected conntrack entries can be
>>allowed to time out instead, as the established state of the
>>connection would not need to be the basis for packet admission any
>>more.
>> 
>>It should be noted that the directionality of related connections may
>>be the same or different than that of the master connection, and
>>neither the original direction 5-tuple nor the conntrack state bits
>>carry this information.  If needed, the directionality of the master
>>connection can be stored in master's conntrack mark or labels, which
>>are automatically inherited by the expected related connections.
>> 
>>The fact that neither ARP nor ND packets are trackable by conntrack
>>allows mutual exclusion between ARP/ND and the new conntrack original
>>tuple fields.  Hence, the IP addresses are overlaid in union with ARP
>>    and ND fields.  This allows the sw_flow_key to not grow much due to
>>this patch, but it also means that we must be careful to never use the
>>new key fields with ARP or ND packets.  ARP is easy to distinguish and
>>keep mutually exclusive based on the ethernet type, but ND being an
>>ICMPv6 protocol requires a bit more attention.
>> 
>>Signed-off-by: Jarno Rajahalme 
>>Acked-by: Joe Stringer 
>>Acked-by: Pravin B Shelar 
>>Signed-off-by: David S. Miller 
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
> 
> I had to roll in the following incremental (derived from your later
> patch) to fix the build with this commit:
> 

Right, I forgot to mention that I left these patches separate knowing that they 
will not compile individually.

> diff --git a/lib/odp-execute.c b/lib/odp-execute.c
> index 1f6812a6dd02..50bbafaa0231 100644
> --- a/lib/odp-execute.c
> +++ b/lib/odp-execute.c
> @@ -381,6 +381,8 @@ odp_execute_set_action(struct dp_pa
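
The upstream commit message quoted above says the original tuple fields come from the packet's conntrack entry, or from its master entry when there is one (for example, an FTP data connection expected by a control connection). A rough sketch of that selection follows. `toy_tuple`, `toy_conn`, and `toy_orig_tuple` are hypothetical and much simpler than the kernel's `struct nf_conn`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4-only 5-tuple. */
struct toy_tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Hypothetical conntrack entry: its own original direction tuple and an
 * optional master (set for expected, related connections). */
struct toy_conn {
    struct toy_tuple orig;
    const struct toy_conn *master;
};

/* Returns the tuple to expose for matching: the master's original tuple if
 * the entry has a master, otherwise the entry's own.  Returns NULL for an
 * untracked packet ('ct' is NULL), in which case the fields do not exist. */
static const struct toy_tuple *
toy_orig_tuple(const struct toy_conn *ct)
{
    if (!ct) {
        return NULL;
    }
    if (ct->master) {
        ct = ct->master;
    }
    return &ct->orig;
}
```

Because related connections resolve to the master's tuple, an ACL written against the control connection's original 5-tuple also covers the data connections it spawns.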

Re: [ovs-dev] [PATCH v2 14/22] flow: Make room after ct_state.

2017-03-03 Thread Jarno Rajahalme

> On Mar 2, 2017, at 6:41 PM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
>> 'ct_state' currently only needs 8 bits, so we can make room for a new
>> CT field introduced in the next patch.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> include/openvswitch/flow.h | 3 ++-
>> lib/flow.c | 3 ++-
>> lib/match.c| 8 
>> lib/packets.h  | 2 +-
>> ofproto/ofproto-dpif.c | 2 +-
>> tests/ovs-ofctl.at | 2 +-
>> 6 files changed, 11 insertions(+), 9 deletions(-)
>> 
>> diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
>> index df80dfe..9169272 100644
>> --- a/include/openvswitch/flow.h
>> +++ b/include/openvswitch/flow.h
>> @@ -91,7 +91,8 @@ struct flow {
>>  * computation is opaque to the user space. 
>> */
>> union flow_in_port in_port; /* Input port.*/
>> uint32_t recirc_id; /* Must be exact match. */
>> -uint16_t ct_state;  /* Connection tracking state. */
>> +uint8_t ct_state;   /* Connection tracking state. */
>> +uint8_t pad0;
>> uint16_t ct_zone;   /* Connection tracking zone. */
>> uint32_t ct_mark;   /* Connection mark.*/
>> uint8_t pad1[4];/* Pad to 64 bits. */
>> diff --git a/lib/flow.c b/lib/flow.c
>> index fb7bfeb..0c95b75 100644
>> --- a/lib/flow.c
>> +++ b/lib/flow.c
>> @@ -593,7 +593,8 @@ miniflow_extract(struct dp_packet *packet, struct 
>> miniflow *dst)
>> miniflow_push_uint32(mf, in_port, odp_to_u32(md->in_port.odp_port));
>> if (md->recirc_id || md->ct_state) {
>> miniflow_push_uint32(mf, recirc_id, md->recirc_id);
>> -miniflow_push_uint16(mf, ct_state, md->ct_state);
>> +miniflow_push_uint8(mf, ct_state, md->ct_state);
>> +miniflow_push_uint8(mf, pad0, 0);
>> miniflow_push_uint16(mf, ct_zone, md->ct_zone);
>> }
>> 
>> diff --git a/lib/match.c b/lib/match.c
>> index 3fcaec5..882bf0c 100644
>> --- a/lib/match.c
>> +++ b/lib/match.c
>> @@ -340,8 +340,8 @@ match_set_ct_state(struct match *match, uint32_t 
>> ct_state)
>> void
>> match_set_ct_state_masked(struct match *match, uint32_t ct_state, uint32_t 
>> mask)
>> {
>> -match->flow.ct_state = ct_state & mask & UINT16_MAX;
>> -match->wc.masks.ct_state = mask & UINT16_MAX;
>> +match->flow.ct_state = ct_state & mask & UINT8_MAX;
>> +match->wc.masks.ct_state = mask & UINT8_MAX;
>> }
>> 
>> void
>> @@ -,7 +,7 @@ match_format(const struct match *match, struct ds *s, 
>> int priority)
>> }
>> 
>> if (wc->masks.ct_state) {
>> -if (wc->masks.ct_state == UINT16_MAX) {
>> +if (wc->masks.ct_state == UINT8_MAX) {
>> ds_put_format(s, "%sct_state=%s", colors.param, colors.end);
>> if (f->ct_state) {
>> format_flags(s, ct_state_to_string, f->ct_state, '|');
>> @@ -1120,7 +1120,7 @@ match_format(const struct match *match, struct ds *s, 
>> int priority)
>> }
>> } else {
>> format_flags_masked(s, "ct_state", ct_state_to_string,
>> -f->ct_state, wc->masks.ct_state, 
>> UINT16_MAX);
>> +f->ct_state, wc->masks.ct_state, UINT8_MAX);
>> }
>> ds_put_char(s, ',');
>> }
>> diff --git a/lib/packets.h b/lib/packets.h
>> index c4d3799..f7e1d82 100644
>> --- a/lib/packets.h
>> +++ b/lib/packets.h
>> @@ -99,7 +99,7 @@ struct pkt_metadata {
>>action. */
>> uint32_t skb_priority;  /* Packet priority for QoS. */
>> uint32_t pkt_mark;  /* Packet mark. */
>> -uint16_t ct_state;  /* Connection state. */
>> +uint8_t  ct_state;  /* Connection state. */
>> uint16_t ct_zone;   /* Connection zone. */
>> uint32_t ct_mark;   /* Connection mark. */
>> ovs_u128 ct_label;  /* Connection label. */
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index 89c7b7f..e595f3b 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -3994,7 +3994,7 @@ check_mask(struct ofproto_dpif *ofproto, const struct
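
The effect of the quoted change on the struct layout can be sketched as
follows.  The two structs are cut-down, hypothetical fragments that only
borrow field names from the diff; they are not the full struct flow.  The
sketch assumes a conventional ABI with no unusual padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fragment before the change: ct_state takes 16 bits. */
struct ct_meta_old {
    uint32_t recirc_id;
    uint16_t ct_state;
    uint16_t ct_zone;
    uint32_t ct_mark;
};

/* Fragment after the change: ct_state takes 8 bits and 'pad0' is the
 * byte freed up for a later field. */
struct ct_meta_new {
    uint32_t recirc_id;
    uint8_t ct_state;
    uint8_t pad0;
    uint16_t ct_zone;
    uint32_t ct_mark;
};

/* Mirrors the masking in the quoted match_set_ct_state_masked(): any
 * bits above the low 8 are dropped from both value and mask. */
static void set_ct_state_masked(struct ct_meta_new *flow,
                                struct ct_meta_new *masks,
                                uint32_t ct_state, uint32_t mask)
{
    assert(flow != NULL && masks != NULL);
    flow->ct_state = ct_state & mask & UINT8_MAX;
    masks->ct_state = mask & UINT8_MAX;
}
```

Because the following fields keep their offsets, code that reads ct_zone
or ct_mark is unaffected by the narrowing.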

Re: [ovs-dev] [PATCH 2/2] datapath: Simplify do_execute_actions().

2017-03-03 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Mar 2, 2017, at 7:29 PM, Joe Stringer  wrote:
> 
> From: andy zhou 
> 
> Upstream commit:
>commit 5b8784aaf29be20ba8d363e1124d7436d42ef9bf
>Author: Andy Zhou 
>Date: Fri Jan 27 13:45:28 2017 -0800
> 
>openvswitch: Simplify do_execute_actions().
> 
>do_execute_actions() implements a worthwhile optimization: in case
>an output action is the last action in an action list, skb_clone()
>can be avoided by outputting the current skb. However, the
>implementation is more complicated than necessary.  This patch
>simplifies this logic.
> 
>Signed-off-by: Andy Zhou 
>Acked-by: Pravin B Shelar 
>Signed-off-by: David S. Miller 
> 
> Upstream: 5b8784aaf29b ("openvswitch: Simplify do_execute_actions().")
> Signed-off-by: Joe Stringer 
> ---
> datapath/actions.c | 42 --
> 1 file changed, 20 insertions(+), 22 deletions(-)
> 
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 3af34357ecd0..abb6637133b0 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> @@ -1125,12 +1125,6 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
> struct sw_flow_key *key,
> const struct nlattr *attr, int len)
> {
> - /* Every output action needs a separate clone of 'skb', but the common
> -  * case is just a single output action, so that doing a clone and
> -  * then freeing the original skbuff is wasteful.  So the following code
> -  * is slightly obscure just to avoid that.
> -  */
> - int prev_port = -1;
>   const struct nlattr *a;
>   int rem;
> 
> @@ -1138,20 +1132,28 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>a = nla_next(a, &rem)) {
>   int err = 0;
> 
> - if (unlikely(prev_port != -1)) {
> - struct sk_buff *out_skb = skb_clone(skb, GFP_ATOMIC);
> -
> - if (out_skb)
> - do_output(dp, out_skb, prev_port, key);
> + switch (nla_type(a)) {
> + case OVS_ACTION_ATTR_OUTPUT: {
> + int port = nla_get_u32(a);
> + struct sk_buff *clone;
> +
> + /* Every output action needs a separate clone
> +  * of 'skb', In case the output action is the
> +  * last action, cloning can be avoided.
> +  */
> + if (nla_is_last(a, rem)) {
> + do_output(dp, skb, port, key);
> + /* 'skb' has been used for output.
> +  */
> + return 0;
> + }
> 
> + clone = skb_clone(skb, GFP_ATOMIC);
> + if (clone)
> + do_output(dp, clone, port, key);
>   OVS_CB(skb)->cutlen = 0;
> - prev_port = -1;
> - }
> -
> - switch (nla_type(a)) {
> - case OVS_ACTION_ATTR_OUTPUT:
> - prev_port = nla_get_u32(a);
>   break;
> + }
> 
>   case OVS_ACTION_ATTR_TRUNC: {
>   struct ovs_action_trunc *trunc = nla_data(a);
> @@ -1241,11 +1243,7 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>   }
>   }
> 
> - if (prev_port != -1)
> - do_output(dp, skb, prev_port, key);
> - else
> - consume_skb(skb);
> -
> + consume_skb(skb);
>   return 0;
> }
> 
> -- 
> 2.11.1
> 
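
The control flow after the quoted patch can be modelled in user space as
follows.  This is only a sketch: the action enum, the stats struct and
run_actions() are hypothetical, and real buffers are replaced by counters
so that the "clone unless last" rule is easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum act_type { ACT_OUTPUT, ACT_OTHER };

struct exec_stats {
    size_t clones;        /* Buffers duplicated for non-final outputs. */
    size_t outputs;       /* Total buffers sent. */
    bool orig_sent;       /* Original buffer handed to a final output. */
    bool orig_consumed;   /* Original buffer released after the loop. */
};

/* Walks the action list the way the simplified loop does: an output
 * that is the last action reuses the original buffer and returns
 * early; any other output sends a clone. */
static struct exec_stats run_actions(const enum act_type *acts, size_t n)
{
    struct exec_stats st = { 0, 0, false, false };
    size_t i;

    assert(acts != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (acts[i] != ACT_OUTPUT) {
            continue;
        }
        st.outputs++;
        if (i == n - 1) {
            st.orig_sent = true;   /* No clone needed. */
            return st;
        }
        st.clones++;
    }
    st.orig_consumed = true;       /* Stands in for consume_skb(). */
    return st;
}
```

A single trailing output therefore costs no clone, while an output
followed by any other action still pays for one.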


Re: [ovs-dev] [PATCH 1/2] datapath: maintain correct checksum state in conntrack actions.

2017-03-03 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Mar 2, 2017, at 7:29 PM, Joe Stringer  wrote:
> 
> From: Lance Richardson 
> 
> Upstream commit:
>commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee
>Author: Lance Richardson 
>Date: Thu Jan 12 19:33:18 2017 -0500
> 
>openvswitch: maintain correct checksum state in conntrack actions
> 
>When executing conntrack actions on skbuffs with checksum mode
>CHECKSUM_COMPLETE, the checksum must be updated to account for
>header pushes and pulls. Otherwise we get "hw csum failure"
>logs similar to this (ICMP packet received on geneve tunnel
>via ixgbe NIC):
> 
>[  405.740065] genev_sys_6081: hw csum failure
>[  405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G  I 
> 4.10.0-rc3+ #1
>[  405.740108] Call Trace:
>[  405.740110]  
>[  405.740113]  dump_stack+0x63/0x87
>[  405.740116]  netdev_rx_csum_fault+0x3a/0x40
>[  405.740118]  __skb_checksum_complete+0xcf/0xe0
>[  405.740120]  nf_ip_checksum+0xc8/0xf0
>[  405.740124]  icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
>[  405.740132]  nf_conntrack_in+0xe1/0x550 [nf_conntrack]
>[  405.740137]  ? find_bucket.isra.2+0x62/0x70 [openvswitch]
>[  405.740143]  __ovs_ct_lookup+0x95/0x980 [openvswitch]
>[  405.740145]  ? netif_rx_internal+0x44/0x110
>[  405.740149]  ovs_ct_execute+0x147/0x4b0 [openvswitch]
>[  405.740153]  do_execute_actions+0x22e/0xa70 [openvswitch]
>[  405.740157]  ovs_execute_actions+0x40/0x120 [openvswitch]
>[  405.740161]  ovs_dp_process_packet+0x84/0x120 [openvswitch]
>[  405.740166]  ovs_vport_receive+0x73/0xd0 [openvswitch]
>[  405.740168]  ? udp_rcv+0x1a/0x20
>[  405.740170]  ? ip_local_deliver_finish+0x93/0x1e0
>[  405.740172]  ? ip_local_deliver+0x6f/0xe0
>[  405.740174]  ? ip_rcv_finish+0x3a0/0x3a0
>[  405.740176]  ? ip_rcv_finish+0xdb/0x3a0
>[  405.740177]  ? ip_rcv+0x2a7/0x400
>[  405.740180]  ? __netif_receive_skb_core+0x970/0xa00
>[  405.740185]  netdev_frame_hook+0xd3/0x160 [openvswitch]
>[  405.740187]  __netif_receive_skb_core+0x1dc/0xa00
>[  405.740194]  ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
>[  405.740197]  __netif_receive_skb+0x18/0x60
>[  405.740199]  netif_receive_skb_internal+0x40/0xb0
>[  405.740201]  napi_gro_receive+0xcd/0x120
>[  405.740204]  gro_cell_poll+0x57/0x80 [geneve]
>[  405.740206]  net_rx_action+0x260/0x3c0
>[  405.740209]  __do_softirq+0xc9/0x28c
>[  405.740211]  irq_exit+0xd9/0xf0
>[  405.740213]  do_IRQ+0x51/0xd0
>[  405.740215]  common_interrupt+0x93/0x93
> 
>Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
>Signed-off-by: Lance Richardson 
>Acked-by: Pravin B Shelar 
>Signed-off-by: David S. Miller 
> 
> Upstream: 75f01a4c9cc2 ("openvswitch: maintain correct checksum state in 
> conntrack actions")
> Signed-off-by: Joe Stringer 
> ---
> datapath/conntrack.c | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index e4e76b836805..36db32abbf63 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -555,7 +555,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
> nf_conn *ct,
>   int hooknum, nh_off, err = NF_ACCEPT;
> 
>   nh_off = skb_network_offset(skb);
> - skb_pull(skb, nh_off);
> + skb_pull_rcsum(skb, nh_off);
> 
>   /* See HOOK2MANIP(). */
>   if (maniptype == NF_NAT_MANIP_SRC)
> @@ -620,6 +620,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
> nf_conn *ct,
>   err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> push:
>   skb_push(skb, nh_off);
> + skb_postpush_rcsum(skb, skb->data, nh_off);
> 
>   return err;
> }
> @@ -927,7 +928,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
> 
>   /* The conntrack module expects to be working at L3. */
>   nh_ofs = skb_network_offset(skb);
> - skb_pull(skb, nh_ofs);
> + skb_pull_rcsum(skb, nh_ofs);
> 
>   if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
>   err = handle_fragments(net, key, info->zone.id, skb);
> @@ -941,6 +942,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
>   err = ovs_ct_lookup(net, key, info, skb);
> 
>   skb_push(skb, nh_ofs);
> + skb_postpush_rcsum(skb, skb->data, nh_ofs);
>   if (err)
>   kfree_skb(skb);
>   return err;
> -- 
> 2.11.1
> 
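
The bookkeeping that the quoted fix restores can be illustrated in user
space.  In this sketch struct pkt, pkt_pull_rcsum() and
pkt_postpush_rcsum() are hypothetical stand-ins and not the kernel's skb
helpers; the checksum is a plain 16-bit ones' complement sum over the
bytes from the data pointer to the end, and the header length is assumed
to be even so that word alignment is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Ones' complement sum of 'len' bytes read as big-endian 16-bit words. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) p[i] << 8) | p[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) p[len - 1] << 8;
    }
    return csum_fold(sum);
}

struct pkt {
    const uint8_t *data;   /* Current start of packet data. */
    size_t len;            /* Bytes from 'data' to the end. */
    uint16_t csum;         /* Sum covering data[0..len). */
};

/* Advances past 'n' header bytes and removes them from the sum. */
static void pkt_pull_rcsum(struct pkt *p, size_t n)
{
    assert(n <= p->len && n % 2 == 0);
    p->csum = csum_fold((uint32_t) p->csum + (uint16_t) ~csum16(p->data, n));
    p->data += n;
    p->len -= n;
}

/* Moves back over 'n' header bytes and adds them to the sum again. */
static void pkt_postpush_rcsum(struct pkt *p, size_t n)
{
    assert(n % 2 == 0);
    p->data -= n;
    p->len += n;
    p->csum = csum_fold((uint32_t) p->csum + csum16(p->data, n));
}
```

Moving the data pointer without adjusting the sum, as a bare pull or push
does, leaves the stored value describing bytes that are no longer the
ones being verified, which matches the "hw csum failure" symptom in the
quoted log.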


Re: [ovs-dev] [PATCH v2 15/22] odp: Support conntrack orig tuple key.

2017-03-03 Thread Jarno Rajahalme
Thanks for the review Joe!

> On Mar 3, 2017, at 10:09 AM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme  wrote:
>> Userspace support for datapath original direction conntrack tuple.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Thanks for the submission. Some feedback below.
> 
> 
> 
>> diff --git a/include/openvswitch/meta-flow.h 
>> b/include/openvswitch/meta-flow.h
>> index aac9945..94cee20 100644
>> --- a/include/openvswitch/meta-flow.h
>> +++ b/include/openvswitch/meta-flow.h
>> @@ -740,6 +740,139 @@ enum OVS_PACKED_ENUM mf_field_id {
>>  */
>> MFF_CT_LABEL,
>> 
>> +/* "ct_nw_proto".
>> + *
>> + * The "protocol" byte in the IPv4 or IPv6 header for the original
>> + * direction conntrack tuple, or of the master conntrack entry, if the
>> + * current connection is a related connection.
>> + *
>> + * The value is initially zero and populated by the CT action.  The 
>> value
>> + * remains zero after the CT action only if the packet can not be
>> + * associated with a tracked connection, in which case the prerequisites
> 
> "Tracked" in the current API documentation refers to whether the
> packet was submitted to the connection tracker during the current
> pipeline processing, and not connection state. To refer to connections
> which have been committed, we call that "committed". See the
> "Connection Tracking Fields" section of ovs-fields(7) for more
> details.
> 

The intent is to not require the connection to be committed, as the value is 
properly populated also for the “new” packets. In the above, I was using the 
term “tracked connection” in a more general sense, and did not intend to refer 
to the “packet is tracked” ct_state bit. Maybe I should change this to “valid 
connection”?

> 
> 
>> @@ -383,61 +388,63 @@ parse_ethertype(const void **datap, size_t *sizep)
>> return htons(FLOW_DL_TYPE_NONE);
>> }
>> 
>> -static inline void
>> +/* Returns 'true' if the packet is an ND packet. */
>> +static inline bool
>> parse_icmpv6(const void **datap, size_t *sizep, const struct icmp6_hdr *icmp,
>>  const struct in6_addr **nd_target,
>>  struct eth_addr arp_buf[2])
>> {
>> -if (icmp->icmp6_code == 0 &&
>> -(icmp->icmp6_type == ND_NEIGHBOR_SOLICIT ||
>> - icmp->icmp6_type == ND_NEIGHBOR_ADVERT)) {
>> +if (icmp->icmp6_code != 0 ||
>> +(icmp->icmp6_type != ND_NEIGHBOR_SOLICIT &&
>> + icmp->icmp6_type != ND_NEIGHBOR_ADVERT)) {
>> +return false;
>> +}
>> 
>> -*nd_target = data_try_pull(datap, sizep, sizeof **nd_target);
>> -if (OVS_UNLIKELY(!*nd_target)) {
>> -return;
>> -}
>> +*nd_target = data_try_pull(datap, sizep, sizeof **nd_target);
>> +if (OVS_UNLIKELY(!*nd_target)) {
>> +return true;
>> +}
>> 
>> -while (*sizep >= 8) {
>> -/* The minimum size of an option is 8 bytes, which also is
>> - * the size of Ethernet link-layer options. */
>> -const struct ovs_nd_opt *nd_opt = *datap;
>> -int opt_len = nd_opt->nd_opt_len * ND_OPT_LEN;
>> +while (*sizep >= 8) {
>> +/* The minimum size of an option is 8 bytes, which also is
>> + * the size of Ethernet link-layer options. */
>> +const struct ovs_nd_opt *nd_opt = *datap;
>> +int opt_len = nd_opt->nd_opt_len * ND_OPT_LEN;
>> 
>> -if (!opt_len || opt_len > *sizep) {
>> -return;
>> -}
>> +if (!opt_len || opt_len > *sizep) {
>> +return true;
>> +}
>> 
>> -/* Store the link layer address if the appropriate option is
>> - * provided.  It is considered an error if the same link
>> - * layer option is specified twice. */
>> -if (nd_opt->nd_opt_type == ND_OPT_SOURCE_LINKADDR
>> -&& opt_len == 8) {
>> -if (OVS_LIKELY(eth_addr_is_zero(arp_buf[0]))) {
>> -arp_buf[0] = nd_opt->nd_opt_mac;
>> -} else {
>> -goto invalid;
>> -}
>> -} else if (nd_opt->nd_opt_type == ND_OPT_TARGET_LINKADDR
>> -   && opt_len == 8) {
>> -if (OVS_LIKELY(eth_addr_is_zero

Re: [ovs-dev] [PATCH v2 16/22] actions: Add resubmit with conntrack tuple.

2017-03-03 Thread Jarno Rajahalme

> On Mar 3, 2017, at 10:35 AM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
>> Add resubmit option to use the Conntrack original direction tuple
>> swapped with the corresponding packet header fields during the lookup.
>> This could allow the same ACL table be used for admitting return
>> and/or related traffic as is used for admitting the original direction
>> traffic.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> include/openvswitch/ofp-actions.h |   4 +-
>> lib/ofp-actions.c |  82 +++--
>> ofproto/ofproto-dpif-xlate.c  |  68 ++---
>> tests/ofp-actions.at  |   6 ++
>> tests/ofproto-dpif.at |  89 +--
>> tests/system-traffic.at   | 122 
>> --
>> utilities/ovs-ofctl.8.in  |  19 +-
>> 7 files changed, 310 insertions(+), 80 deletions(-)
>> 
>> diff --git a/include/openvswitch/ofp-actions.h 
>> b/include/openvswitch/ofp-actions.h
>> index 53d6b44..5ea0763 100644
>> --- a/include/openvswitch/ofp-actions.h
>> +++ b/include/openvswitch/ofp-actions.h
>> @@ -640,11 +640,13 @@ struct ofpact_nat {
>> 
>> /* OFPACT_RESUBMIT.
>>  *
>> - * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE. */
>> + * Used for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, NXAST_RESUBMIT_TABLE_CT. 
>> */
>> struct ofpact_resubmit {
>> struct ofpact ofpact;
>> ofp_port_t in_port;
>> uint8_t table_id;
>> +bool with_ct_orig;   /* Resubmit with Conntrack original direction tuple
>> +  * fields in place of IP header fields. */
>> };
>> 
>> /* Bits for 'flags' in struct nx_action_learn.
>> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
>> index 2869e0f..4d35a77 100644
>> --- a/lib/ofp-actions.c
>> +++ b/lib/ofp-actions.c
>> @@ -265,6 +265,8 @@ enum ofp_raw_action_type {
>> NXAST_RAW_RESUBMIT,
>> /* NX1.0+(14): struct nx_action_resubmit. */
>> NXAST_RAW_RESUBMIT_TABLE,
>> +/* NX1.0+(44): struct nx_action_resubmit. */
>> +NXAST_RAW_RESUBMIT_TABLE_CT,
>> 
>> /* NX1.0+(2): uint32_t. */
>> NXAST_RAW_SET_TUNNEL,
>> @@ -3850,19 +3852,20 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout 
>> *a, struct ds *s)
>> ds_put_format(s, "%s)%s", colors.paren, colors.end);
>> }
>> 
>> -/* Action structures for NXAST_RESUBMIT and NXAST_RESUBMIT_TABLE.
>> +/* Action structures for NXAST_RESUBMIT, NXAST_RESUBMIT_TABLE, and
>> + * NXAST_RESUBMIT_TABLE_CT.
>>  *
>>  * These actions search one of the switch's flow tables:
>>  *
>> - *- For NXAST_RESUBMIT_TABLE only, if the 'table' member is not 255, 
>> then
>> - *  it specifies the table to search.
>> + *- For NXAST_RESUBMIT_TABLE and NXAST_RESUBMIT_TABLE_CT only, if the
>> + *  'table' member is not 255, then it specifies the table to search.
> 
> 'only' is a bit superfluous - it's now for 2 of the 3 cases.

removed.

> 
>>  *
>> - *- Otherwise (for NXAST_RESUBMIT_TABLE with a 'table' of 255, or for
>> - *  NXAST_RESUBMIT regardless of 'table'), it searches the current flow
>> - *  table, that is, the OpenFlow flow table that contains the flow from
>> - *  which this action was obtained.  If this action did not come from a
>> - *  flow table (e.g. it came from an OFPT_PACKET_OUT message), then 
>> table 0
>> - *  is the current table.
>> + *- Otherwise (for NXAST_RESUBMIT_TABLE or NXAST_RESUBMIT_TABLE_CT with 
>> a
>> + *  'table' of 255, or for NXAST_RESUBMIT regardless of 'table'), it
>> + *  searches the current flow table, that is, the OpenFlow flow table 
>> that
>> + *  contains the flow from which this action was obtained.  If this 
>> action
>> + *  did not come from a flow table (e.g. it came from an OFPT_PACKET_OUT
>> + *  message), then table 0 is the current table.
>>  *
>>  * The flow table lookup uses a flow that may be slightly modified from the
>>  * original lookup:
>> @@ -3870,9 +3873,12 @@ format_FIN_TIMEOUT(const struct ofpact_fin_timeout 
>> *a, struct ds *s)
>>  *- For NXAST_RESUBMIT, the 'in_port' member of struct nx_action_resubmit
>>  *  is used as the flow's in_port.
>>  *
>> - *- For NXAST_RESUBMIT_TABLE, if the 'in_po

Re: [ovs-dev] [PATCH v2 09/22] datapath: Refactor labels initialization.

2017-03-03 Thread Jarno Rajahalme

> On Mar 3, 2017, at 1:44 PM, Joe Stringer  wrote:
> 
> 
> 
> On 3/03/2017 10:37, "Jarno Rajahalme" wrote:
> 
>> On Mar 2, 2017, at 5:26 PM, Joe Stringer wrote:
>> 
>> On 28 February 2017 at 17:17, Jarno Rajahalme wrote:
>>> Upstream commit:
>>> 
>>>Refactoring conntrack labels initialization makes changes in later
>>>patches easier to review.
>>> 
>>>Signed-off-by: Jarno Rajahalme
>>>Acked-by: Pravin B Shelar
>>>Acked-by: Joe Stringer
>>>Signed-off-by: David S. Miller
>>> 
>>> Signed-off-by: Jarno Rajahalme
>>> ---
>>> datapath/conntrack.c | 113 
>>> ++-
>>> 1 file changed, 66 insertions(+), 47 deletions(-)
>>> 
>>> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
>>> index dacf34c..adc4315 100644
>>> --- a/datapath/conntrack.c
>>> +++ b/datapath/conntrack.c
>>> @@ -243,19 +243,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, 
>>> struct sk_buff *skb)
>>>return 0;
>>> }
>>> 
>>> -static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
>>> +static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
>>>   u32 ct_mark, u32 mask)
>>> {
>>> #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
>>> -   enum ip_conntrack_info ctinfo;
>>> -   struct nf_conn *ct;
>>>u32 new_mark;
>>> 
>>> -   /* The connection could be invalid, in which case set_mark is 
>>> no-op. */
>>> -   ct = nf_ct_get(skb, &ctinfo);
>>> -   if (!ct)
>>> -   return 0;
>>> -
>>>new_mark = ct_mark | (ct->mark & ~(mask));
>>>if (ct->mark != new_mark) {
>>>ct->mark = new_mark;
>>> @@ -270,56 +263,71 @@ static int ovs_ct_set_mark(struct sk_buff *skb, 
>>> struct sw_flow_key *key,
>>> #endif
>>> }
>>> 
>>> -static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
>>> -const struct ovs_key_ct_labels *labels,
>>> -const struct ovs_key_ct_labels *mask)
>>> +static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
>>> {
>>> -   enum ip_conntrack_info ctinfo;
>>>struct nf_conn_labels *cl;
>>> -   struct nf_conn *ct;
>>> -
>>> -   /* The connection could be invalid, in which case set_label is 
>>> no-op.*/
>>> -   ct = nf_ct_get(skb, &ctinfo);
>>> -   if (!ct)
>>> -   return 0;
>>> 
>>>cl = nf_ct_labels_find(ct);
>>>if (!cl) {
>>>nf_ct_labels_ext_add(ct);
>>>cl = nf_ct_labels_find(ct);
>>>}
>>> +   if (cl && ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
>>> +   return NULL;
>> 
>> The above two lines were not introduced in the upstream code. Do you
>> intend to introduce them?
>> 
> 
> Should have mentioned this in a commit message or in a comment. The inclusion 
> of this test is intentional, and the rationale is that it might be possible 
> that the kernel is configured with too little space for labels. However, it 
> is possible that the way OVS kernel module initializes the number of words in 
> labels for older kernels already takes care of this, do you have a take on 
> this?
> 
> When we compile the out of tree module for a particular kernel, this 
> information should be available. I don't think that we try to support 
> compiling against one kernel with one definition of the labels length, then 
> allow that same module to run on another kernel with a different definition. 
> So it should be fine to omit so long as there are still the compile time 
> checks.
> 

But my understanding is that the compile time checks only apply to newer 
kernels, where the available storage for labels is a compile time 
configuration, rather than a run-time number of words.
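
The run-time test under discussion can be sketched as follows.  The
struct is a hypothetical stand-in for a labels extension whose capacity
is a word count set at run time, and 16 bytes corresponds to the 128-bit
label; none of the names below are the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CT_LABELS_LEN 16    /* 128-bit label, in bytes. */

/* Hypothetical labels extension with a run-time capacity. */
struct conn_labels {
    uint8_t words;          /* Number of 32-bit words available. */
    uint32_t bits[8];
};

static size_t conn_labels_len(const struct conn_labels *cl)
{
    assert(cl != NULL);
    return cl->words * sizeof cl->bits[0];
}

/* Returns 'cl' only if it can hold a full label, mirroring the extra
 * check quoted above; NULL input stays NULL. */
static struct conn_labels *get_conn_labels(struct conn_labels *cl)
{
    if (cl && conn_labels_len(cl) < CT_LABELS_LEN) {
        return NULL;
    }
    return cl;
}
```

If the capacity is instead fixed when the module is built, the same
condition can be checked at compile time and the run-time branch becomes
redundant, which is the point Joe raises.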

  Jarno

> 
>   Jarno
> 
>> For my current working tree for review/build/test, I will drop these lines.
>> 
>>

Re: [ovs-dev] [PATCH v2 22/22] datapath: Allow compiling against Linux 4.10

2017-03-03 Thread Jarno Rajahalme

> On Mar 3, 2017, at 3:43 PM, Joe Stringer  wrote:
> 
> On 28 February 2017 at 17:17, Jarno Rajahalme  wrote:
>> OVS in-tree datapath compiles against Linux 4.10 kernel, so allow it.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Joe Stringer 
> 
> We should probably update the .travis.yml soon, syncing with the
> currently supported versions on kernel.org.

The caveat is that OVS tree MPLS code compiles, but does not work correctly 
until the MPLS backports are done.

  Jarno


[ovs-dev] [PATCH] ofproto-dpif-xlate: fix build.

2017-03-06 Thread Jarno Rajahalme
Recent patch 27d931da3ac ("ofproto-dpif: Enhance
execute_controller_action().") missed some new action enumerations
added previously.

Fixes: 27d931da3ac ("ofproto-dpif: Enhance execute_controller_action().")
Signed-off-by: Jarno Rajahalme 
---
 ofproto/ofproto-dpif-xlate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index d915ba1..eda34f0 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3859,6 +3859,8 @@ xlate_fixup_actions(struct ofpbuf *b, const struct nlattr 
*actions,
 case OVS_ACTION_ATTR_USERSPACE:
 case OVS_ACTION_ATTR_RECIRC:
 case OVS_ACTION_ATTR_CT:
+case OVS_ACTION_ATTR_PUSH_ETH:
+case OVS_ACTION_ATTR_POP_ETH:
 ofpbuf_put(b, a, nl_attr_len_pad(a, left));
 break;
 
-- 
2.1.4


[ovs-dev] [PATCH v3 00/22] Conntrack enhancements

2017-03-06 Thread Jarno Rajahalme
This patch set backports the recent upstream conntrack fixes and new
features to the OVS tree kernel module, and adds the OVS userspace
support.

Patch 1/22 is an unrelated datapath backport, and patch 22/22 allows
compiling against Linux 4.10.

Each new feature is introduced in two commits: the first is the
datapath backport, and the second adds the corresponding userspace
datapath and non-datapath functionality, including OVS system tests.
In one instance I have squashed the system test with the datapath
backport.  Compilation fails after the first patch of each pair because
the userspace code for the new enums is missing.  We may decide to
squash the datapath and userspace changes together for the merge, but
for now the review should be more straightforward with the separation.

System tests have been most recently run on Linux 3.16, on which the
geneve tests fail, but that should have nothing to do with this
series.

v3: Address Joe's feedback.

Florian Westphal (2):
  datapath: add and use skb_nfct helper
  datapath: add and use nf_ct_set helper

Jarno Rajahalme (20):
  datapath: Allow compiling against Linux 4.10
  datapath: Fix comments for skb->_nfct
  datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.
  datapath: Do not trigger events for unconfirmed connections.
  datapath: Unionize ovs_key_ct_label with a u32 array.
  datapath: Simplify labels length logic.
  datapath: Refactor labels initialization.
  datapath: Inherit master's labels.
  netlink: Simplify nl_msg_start_nested().
  lib: Check match and action prerequisities with 'match'.
  datapath: Add original direction conntrack tuple to sw_flow_key.
  flow: Make room after ct_state.
  ofp-util: Ignore unknown fields in ofputil_decode_packet_in2().
  odp: Support conntrack orig tuple key.
  actions: Add resubmit with conntrack tuple.
  compat: nf_ct_delete compat.
  datapath: Add force commit.
  conntrack: Force commit.
  datapath: Add a missing comment.
  tests: Add an FTP test without conntrack.

 acinclude.m4   |  13 +-
 build-aux/extract-ofp-fields   |   3 +
 datapath/actions.c |   2 +
 datapath/conntrack.c   | 295 +
 datapath/conntrack.h   |  10 +-
 datapath/flow.c|  34 +-
 datapath/flow.h|  49 ++-
 datapath/flow_netlink.c|  85 +++--
 datapath/flow_netlink.h|   7 +-
 datapath/linux/compat/include/linux/openvswitch.h  |  33 +-
 datapath/linux/compat/include/linux/skbuff.h   |  11 +
 .../compat/include/net/netfilter/nf_conntrack.h|   8 +
 .../include/net/netfilter/nf_conntrack_core.h  |  37 +++
 include/openvswitch/flow.h |  16 +-
 include/openvswitch/match.h|  16 +
 include/openvswitch/meta-flow.h| 141 +++-
 include/openvswitch/ofp-actions.h  |  15 +-
 lib/bundle.c   |   4 +-
 lib/bundle.h   |   3 +-
 lib/conntrack.c|  59 +++-
 lib/conntrack.h|   2 +-
 lib/dpif-netdev.c  |   8 +-
 lib/flow.c | 229 +
 lib/flow.h |  50 +++
 lib/learn.c|  15 +-
 lib/learn.h|   3 +-
 lib/match.c| 118 ++-
 lib/meta-flow.c| 193 ++-
 lib/meta-flow.xml  |  92 ++
 lib/multipath.c|   4 +-
 lib/multipath.h|   3 +-
 lib/netlink.c  |   2 +-
 lib/nx-match.c |  58 ++--
 lib/nx-match.h |  10 +-
 lib/odp-execute.c  |   4 +
 lib/odp-util.c | 144 -
 lib/odp-util.h |   8 +-
 lib/ofp-actions.c  | 161 +++---
 lib/ofp-parse.c|   2 +-
 lib/ofp-util.c |   9 +-
 lib/packets.h  |   7 +-
 ofproto/ofproto-dpif-rid.h |   2 +-
 ofproto/ofproto-dpif-sflow.c   |   2 +
 ofproto/ofproto-dpif-trace.c   |  13 +-
 ofproto/ofproto-dpif-xlate.c   |  91 +-
 ofproto/ofproto-dpif.c |   4 +-
 ofproto/ofproto.c  |   5 +-
 tests/odp.at   

[ovs-dev] [PATCH v3 01/22] datapath: Allow compiling against Linux 4.10

2017-03-06 Thread Jarno Rajahalme
OVS in-tree datapath compiles against Linux 4.10 kernel, so allow it.

Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
---
 acinclude.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 19cffe0..5ffb5a7 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -134,10 +134,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
 AC_MSG_RESULT([$kversion])
 
 if test "$version" -ge 4; then
-   if test "$version" = 4 && test "$patchlevel" -le 9; then
+   if test "$version" = 4 && test "$patchlevel" -le 10; then
   : # Linux 4.x
else
-  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.9.x is not supported (please refer to the FAQ for advice)])
+  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.10.x is not supported (please refer to the FAQ for advice)])
fi
 elif test "$version" = 3 && test "$patchlevel" -ge 10; then
: # Linux 3.x
-- 
2.1.4
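
The version gate in the hunk above amounts to the following predicate,
shown here as a C sketch rather than the m4/shell of acinclude.m4.  The
function name is hypothetical, and treating every version not covered by
the two visible branches as unsupported is an assumption, since the rest
of the macro is not shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a kernel 'version.patchlevel' passes the check as it
 * reads after the patch. */
static bool kernel_version_allowed(int version, int patchlevel)
{
    assert(version >= 0 && patchlevel >= 0);
    if (version >= 4) {
        /* Linux 4.x is accepted up to and including 4.10. */
        return version == 4 && patchlevel <= 10;
    } else if (version == 3 && patchlevel >= 10) {
        return true;
    }
    /* Assumption: remaining branches are not visible in the hunk. */
    return false;
}
```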

