Re: [PATCH 2/2] openvswitch: Add eventmask support to CT action.

2017-04-21 Thread Jarno Rajahalme
Thanks Joe, I’ll issue a v2 with the comment fix and retain the acks, so it 
should be good to go.

  Jarno

> On Apr 20, 2017, at 11:53 AM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 19 April 2017 at 18:49, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
>> which can be used in conjunction with the commit flag
>> (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
>> conntrack events (IPCT_*) should be delivered via the Netfilter
>> netlink multicast groups.  Default behavior depends on the system
>> configuration, but typically a lot of events are delivered.  This can be
>> very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
>> types of events are of interest.
>> 
>> Netfilter core init_conntrack() adds the event cache extension, so we
>> only need to set the ctmask value.  However, if the system is
>> configured without support for events, the setting will be skipped due
>> to extension not being found.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
> 
> Thanks Jarno, looks good overall. Minor nit in the API description below.
> 
> Acked-by: Joe Stringer <j...@ovn.org>
> 
>> include/uapi/linux/openvswitch.h | 12 
>> net/openvswitch/conntrack.c  | 27 +++
>> 2 files changed, 39 insertions(+)
>> 
>> diff --git a/include/uapi/linux/openvswitch.h 
>> b/include/uapi/linux/openvswitch.h
>> index 66d1c3c..38ae95d 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -693,6 +693,17 @@ struct ovs_action_hash {
>>  * nothing if the connection is already committed will check that the current
>>  * packet is in conntrack entry's original direction.  If directionality does
>>  * not match, will delete the existing conntrack entry and commit a new one.
>> + * @OVS_CT_ATTR_EVENTMASK: Mask of bits indicating which conntrack event 
>> types
>> + * (enum ip_conntrack_events IPCT_*) should be reported.  For any bit set to
>> + * zero, the corresponding event type is not generated.  Default behavior
>> + * depends on system configuration, but typically all event types are
>> + * generated, hence listening on UPDATE events may get a lot of events.
> 
> s/UPDATE/NFNLGRP_CONNTRACK_UPDATE/
> 
>> + * Explicitly passing this attribute allows limiting the updates received to
>> + * the events of interest.  The bit 1 << IPCT_NEW, 1 << IPCT_RELATED, and
>> + * 1 << IPCT_DESTROY must be set to ones for those events to be received on
>> + * NFNLGRP_CONNTRACK_NEW and NFNLGRP_CONNTRACK_DESTROY groups, respectively.
>> + * Remaining bits control the changes for which an event is delivered on the
>> + * NFNLGRP_CONNTRACK_UPDATE group.
>>  */
>> enum ovs_ct_attr {
>>OVS_CT_ATTR_UNSPEC,
>> @@ -704,6 +715,7 @@ enum ovs_ct_attr {
>>   related connections. */
>>OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
>>OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
>> +   OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
>>__OVS_CT_ATTR_MAX
>> };
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 58de4c2..4f7c3b5 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -66,7 +66,9 @@ struct ovs_conntrack_info {
>>u8 commit : 1;
>>u8 nat : 3; /* enum ovs_ct_nat */
>>u8 force : 1;
>> +   u8 have_eventmask : 1;
>>u16 family;
>> +   u32 eventmask;  /* Mask of 1 << IPCT_*. */
>>struct md_mark mark;
>>struct md_labels labels;
>> #ifdef CONFIG_NF_NAT_NEEDED
>> @@ -1007,6 +1009,20 @@ static int ovs_ct_commit(struct net *net, struct 
>> sw_flow_key *key,
>>if (!ct)
>>return 0;
>> 
>> +   /* Set the conntrack event mask if given.  NEW and DELETE events have
>> +* their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
>> +* typically would receive many kinds of updates.  Setting the event
>> +* mask allows those events to be filtered.  The set event mask will
>> +* remain in effect for the lifetime of the connection unless changed
>> +* by a further CT action with both the commit flag and the eventmask
>> +* option. */
>> +   if (info->have_eventmask) {
>> +   struct nf_conntrack_ecache *cache

[PATCH net-next v2 2/2] openvswitch: Add eventmask support to CT action.

2017-04-21 Thread Jarno Rajahalme
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
which can be used in conjunction with the commit flag
(OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
conntrack events (IPCT_*) should be delivered via the Netfilter
netlink multicast groups.  Default behavior depends on the system
configuration, but typically a lot of events are delivered.  This can be
very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
types of events are of interest.

Netfilter core init_conntrack() adds the event cache extension, so we
only need to set the ctmask value.  However, if the system is
configured without support for events, the setting will be skipped due
to extension not being found.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Reviewed-by: Greg Rose <gvrose8...@gmail.com>
Acked-by: Joe Stringer <j...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 12 
 net/openvswitch/conntrack.c  | 27 +++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 66d1c3c..61b7d36 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -693,6 +693,17 @@ struct ovs_action_hash {
  * nothing if the connection is already committed will check that the current
  * packet is in conntrack entry's original direction.  If directionality does
  * not match, will delete the existing conntrack entry and commit a new one.
+ * @OVS_CT_ATTR_EVENTMASK: Mask of bits indicating which conntrack event types
+ * (enum ip_conntrack_events IPCT_*) should be reported.  For any bit set to
+ * zero, the corresponding event type is not generated.  Default behavior
+ * depends on system configuration, but typically all event types are
+ * generated, hence listening on NFNLGRP_CONNTRACK_UPDATE events may get a lot
+ * of events.  Explicitly passing this attribute allows limiting the updates
+ * received to the events of interest.  The bit 1 << IPCT_NEW, 1 <<
+ * IPCT_RELATED, and 1 << IPCT_DESTROY must be set to ones for those events to
+ * be received on NFNLGRP_CONNTRACK_NEW and NFNLGRP_CONNTRACK_DESTROY groups,
+ * respectively.  Remaining bits control the changes for which an event is
+ * delivered on the NFNLGRP_CONNTRACK_UPDATE group.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -704,6 +715,7 @@ enum ovs_ct_attr {
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
+   OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 58de4c2..4f7c3b5 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -66,7 +66,9 @@ struct ovs_conntrack_info {
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
u8 force : 1;
+   u8 have_eventmask : 1;
u16 family;
+   u32 eventmask;  /* Mask of 1 << IPCT_*. */
struct md_mark mark;
struct md_labels labels;
 #ifdef CONFIG_NF_NAT_NEEDED
@@ -1007,6 +1009,20 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (!ct)
return 0;
 
+   /* Set the conntrack event mask if given.  NEW and DELETE events have
+* their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
+* typically would receive many kinds of updates.  Setting the event
+* mask allows those events to be filtered.  The set event mask will
+* remain in effect for the lifetime of the connection unless changed
+* by a further CT action with both the commit flag and the eventmask
+* option. */
+   if (info->have_eventmask) {
+   struct nf_conntrack_ecache *cache = nf_ct_ecache_find(ct);
+
+   if (cache)
+   cache->ctmask = info->eventmask;
+   }
+
/* Apply changes before confirming the connection so that the initial
 * conntrack NEW netlink event carries the values given in the CT
 * action.
@@ -1238,6 +1254,8 @@ static const struct ovs_ct_len_tbl 
ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
/* NAT length is checked when parsing the nested attributes. */
[OVS_CT_ATTR_NAT]   = { .minlen = 0, .maxlen = INT_MAX },
 #endif
+   [OVS_CT_ATTR_EVENTMASK] = { .minlen = sizeof(u32),
+   .maxlen = sizeof(u32) },
 };
 
 static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
@@ -1316,6 +1334,11 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
break;
}
 #endif
+   case OVS_CT_ATTR_EVENTMASK:
+   info->have_eventmask = true;
+   info->

[PATCH net-next v2 1/2] openvswitch: Typo fix.

2017-04-21 Thread Jarno Rajahalme
Fix typo in a comment.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Greg Rose <gvrose8...@gmail.com>
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7b2c2fc..58de4c2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -373,7 +373,7 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
}
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
-* IPCT_LABEL bit it set in the event cache.
+* IPCT_LABEL bit is set in the event cache.
 */
nf_conntrack_event_cache(IPCT_LABEL, ct);
 
-- 
2.1.4



Re: [PATCH 1/2] openvswitch: Typo fix.

2017-04-19 Thread Jarno Rajahalme
Sorry for the chatter, forgot to include “net-next” in the title, sending again.

  Jarno

> On Apr 19, 2017, at 6:49 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
> 
> Fix typo in a comment.
> 
> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> ---
> net/openvswitch/conntrack.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 7b2c2fc..58de4c2 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -373,7 +373,7 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
> sw_flow_key *key,
>   }
> 
>   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
> -  * IPCT_LABEL bit it set in the event cache.
> +  * IPCT_LABEL bit is set in the event cache.
>*/
>   nf_conntrack_event_cache(IPCT_LABEL, ct);
> 
> -- 
> 2.1.4
> 



[PATCH net-next 1/2] openvswitch: Typo fix.

2017-04-19 Thread Jarno Rajahalme
Fix typo in a comment.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7b2c2fc..58de4c2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -373,7 +373,7 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
}
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
-* IPCT_LABEL bit it set in the event cache.
+* IPCT_LABEL bit is set in the event cache.
 */
nf_conntrack_event_cache(IPCT_LABEL, ct);
 
-- 
2.1.4



[PATCH 2/2] openvswitch: Add eventmask support to CT action.

2017-04-19 Thread Jarno Rajahalme
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
which can be used in conjunction with the commit flag
(OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
conntrack events (IPCT_*) should be delivered via the Netfilter
netlink multicast groups.  Default behavior depends on the system
configuration, but typically a lot of events are delivered.  This can be
very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
types of events are of interest.

Netfilter core init_conntrack() adds the event cache extension, so we
only need to set the ctmask value.  However, if the system is
configured without support for events, the setting will be skipped due
to extension not being found.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 12 
 net/openvswitch/conntrack.c  | 27 +++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 66d1c3c..38ae95d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -693,6 +693,17 @@ struct ovs_action_hash {
  * nothing if the connection is already committed will check that the current
  * packet is in conntrack entry's original direction.  If directionality does
  * not match, will delete the existing conntrack entry and commit a new one.
+ * @OVS_CT_ATTR_EVENTMASK: Mask of bits indicating which conntrack event types
+ * (enum ip_conntrack_events IPCT_*) should be reported.  For any bit set to
+ * zero, the corresponding event type is not generated.  Default behavior
+ * depends on system configuration, but typically all event types are
+ * generated, hence listening on UPDATE events may get a lot of events.
+ * Explicitly passing this attribute allows limiting the updates received to
+ * the events of interest.  The bit 1 << IPCT_NEW, 1 << IPCT_RELATED, and
+ * 1 << IPCT_DESTROY must be set to ones for those events to be received on
+ * NFNLGRP_CONNTRACK_NEW and NFNLGRP_CONNTRACK_DESTROY groups, respectively.
+ * Remaining bits control the changes for which an event is delivered on the
+ * NFNLGRP_CONNTRACK_UPDATE group.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -704,6 +715,7 @@ enum ovs_ct_attr {
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
+   OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 58de4c2..4f7c3b5 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -66,7 +66,9 @@ struct ovs_conntrack_info {
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
u8 force : 1;
+   u8 have_eventmask : 1;
u16 family;
+   u32 eventmask;  /* Mask of 1 << IPCT_*. */
struct md_mark mark;
struct md_labels labels;
 #ifdef CONFIG_NF_NAT_NEEDED
@@ -1007,6 +1009,20 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (!ct)
return 0;
 
+   /* Set the conntrack event mask if given.  NEW and DELETE events have
+* their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
+* typically would receive many kinds of updates.  Setting the event
+* mask allows those events to be filtered.  The set event mask will
+* remain in effect for the lifetime of the connection unless changed
+* by a further CT action with both the commit flag and the eventmask
+* option. */
+   if (info->have_eventmask) {
+   struct nf_conntrack_ecache *cache = nf_ct_ecache_find(ct);
+
+   if (cache)
+   cache->ctmask = info->eventmask;
+   }
+
/* Apply changes before confirming the connection so that the initial
 * conntrack NEW netlink event carries the values given in the CT
 * action.
@@ -1238,6 +1254,8 @@ static const struct ovs_ct_len_tbl 
ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
/* NAT length is checked when parsing the nested attributes. */
[OVS_CT_ATTR_NAT]   = { .minlen = 0, .maxlen = INT_MAX },
 #endif
+   [OVS_CT_ATTR_EVENTMASK] = { .minlen = sizeof(u32),
+   .maxlen = sizeof(u32) },
 };
 
 static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
@@ -1316,6 +1334,11 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
break;
}
 #endif
+   case OVS_CT_ATTR_EVENTMASK:
+   info->have_eventmask = true;
+   info->eventmask = nla_get_u32(a);
+   break;
+
default:
 

[PATCH 1/2] openvswitch: Typo fix.

2017-04-19 Thread Jarno Rajahalme
Fix typo in a comment.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7b2c2fc..58de4c2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -373,7 +373,7 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
}
 
/* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
-* IPCT_LABEL bit it set in the event cache.
+* IPCT_LABEL bit is set in the event cache.
 */
nf_conntrack_event_cache(IPCT_LABEL, ct);
 
-- 
2.1.4



[PATCH net-next 2/2] openvswitch: Add eventmask support to CT action.

2017-04-19 Thread Jarno Rajahalme
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
which can be used in conjunction with the commit flag
(OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
conntrack events (IPCT_*) should be delivered via the Netfilter
netlink multicast groups.  Default behavior depends on the system
configuration, but typically a lot of events are delivered.  This can be
very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
types of events are of interest.

Netfilter core init_conntrack() adds the event cache extension, so we
only need to set the ctmask value.  However, if the system is
configured without support for events, the setting will be skipped due
to extension not being found.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 12 
 net/openvswitch/conntrack.c  | 27 +++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 66d1c3c..38ae95d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -693,6 +693,17 @@ struct ovs_action_hash {
  * nothing if the connection is already committed will check that the current
  * packet is in conntrack entry's original direction.  If directionality does
  * not match, will delete the existing conntrack entry and commit a new one.
+ * @OVS_CT_ATTR_EVENTMASK: Mask of bits indicating which conntrack event types
+ * (enum ip_conntrack_events IPCT_*) should be reported.  For any bit set to
+ * zero, the corresponding event type is not generated.  Default behavior
+ * depends on system configuration, but typically all event types are
+ * generated, hence listening on UPDATE events may get a lot of events.
+ * Explicitly passing this attribute allows limiting the updates received to
+ * the events of interest.  The bit 1 << IPCT_NEW, 1 << IPCT_RELATED, and
+ * 1 << IPCT_DESTROY must be set to ones for those events to be received on
+ * NFNLGRP_CONNTRACK_NEW and NFNLGRP_CONNTRACK_DESTROY groups, respectively.
+ * Remaining bits control the changes for which an event is delivered on the
+ * NFNLGRP_CONNTRACK_UPDATE group.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -704,6 +715,7 @@ enum ovs_ct_attr {
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
+   OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 58de4c2..4f7c3b5 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -66,7 +66,9 @@ struct ovs_conntrack_info {
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
u8 force : 1;
+   u8 have_eventmask : 1;
u16 family;
+   u32 eventmask;  /* Mask of 1 << IPCT_*. */
struct md_mark mark;
struct md_labels labels;
 #ifdef CONFIG_NF_NAT_NEEDED
@@ -1007,6 +1009,20 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (!ct)
return 0;
 
+   /* Set the conntrack event mask if given.  NEW and DELETE events have
+* their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
+* typically would receive many kinds of updates.  Setting the event
+* mask allows those events to be filtered.  The set event mask will
+* remain in effect for the lifetime of the connection unless changed
+* by a further CT action with both the commit flag and the eventmask
+* option. */
+   if (info->have_eventmask) {
+   struct nf_conntrack_ecache *cache = nf_ct_ecache_find(ct);
+
+   if (cache)
+   cache->ctmask = info->eventmask;
+   }
+
/* Apply changes before confirming the connection so that the initial
 * conntrack NEW netlink event carries the values given in the CT
 * action.
@@ -1238,6 +1254,8 @@ static const struct ovs_ct_len_tbl 
ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
/* NAT length is checked when parsing the nested attributes. */
[OVS_CT_ATTR_NAT]   = { .minlen = 0, .maxlen = INT_MAX },
 #endif
+   [OVS_CT_ATTR_EVENTMASK] = { .minlen = sizeof(u32),
+   .maxlen = sizeof(u32) },
 };
 
 static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
@@ -1316,6 +1334,11 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
break;
}
 #endif
+   case OVS_CT_ATTR_EVENTMASK:
+   info->have_eventmask = true;
+   info->eventmask = nla_get_u32(a);
+   break;
+
default:
 

[PATCH net] openvswitch: Fix refcount leak on force commit.

2017-03-28 Thread Jarno Rajahalme
The reference count held for skb needs to be released when the skb's
nfct pointer is cleared regardless of if nf_ct_delete() is called or
not.

Failing to release the skb's reference cound led to deferred conntrack
cleanup spinning forever within nf_conntrack_cleanup_net_list() when
cleaning up a network namespace:

   kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 
[120] R ==> rcu_preempt:7 [120]
   kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: 
=> ___preempt_schedule (a001ed36)
=> _raw_spin_unlock_bh (a0713290)
=> nf_ct_iterate_cleanup (c00a4454)
=> nf_conntrack_cleanup_net_list (c00a5e1e)
=> nf_conntrack_pernet_exit (c00a63dd)
=> ops_exit_list.isra.1 (a06075f3)
=> cleanup_net (a0607df0)
=> process_one_work (a0084c31)
=> worker_thread (a008592b)
=> kthread (a008bee2)
=> ret_from_fork (a071b67c)

Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")
Reported-by: Yang Song <yangs...@vmware.com>
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e0a8777..7b2c2fc 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -643,8 +643,8 @@ static bool skb_nfct_cached(struct net *net,
 */
if (nf_ct_is_confirmed(ct))
nf_ct_delete(ct, 0, 0);
-   else
-   nf_conntrack_put(>ct_general);
+
+   nf_conntrack_put(>ct_general);
nf_ct_set(skb, NULL, 0);
return false;
}
-- 
2.1.4



[PATCH net-next 2/2] netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.

2017-02-21 Thread Jarno Rajahalme
Commit 4dee62b1 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1 due to
__nf_ct_expect_check() returning 1 on success.  Prevent this
regression in the future by changing the return value of
__nf_ct_expect_check() to 0 on success.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/netfilter/nf_conntrack_expect.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index d6ace69..4b2e1fb 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -410,7 +410,7 @@ static inline int __nf_ct_expect_check(struct 
nf_conntrack_expect *expect)
struct net *net = nf_ct_exp_net(expect);
struct hlist_node *next;
unsigned int h;
-   int ret = 1;
+   int ret = 0;
 
if (!master_help) {
ret = -ESHUTDOWN;
@@ -460,7 +460,7 @@ int nf_ct_expect_related_report(struct nf_conntrack_expect 
*expect,
 
spin_lock_bh(_conntrack_expect_lock);
ret = __nf_ct_expect_check(expect);
-   if (ret <= 0)
+   if (ret < 0)
goto out;
 
nf_ct_expect_insert(expect);
-- 
2.1.4



[PATCH net-next 1/2] netfilter: nf_ct_expect: nf_ct_expect_related_report(): Return zero on success,

2017-02-21 Thread Jarno Rajahalme
Commit 4dee62b1 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1, which caused openvswitch
conntrack integration fail in FTP test cases.

Fix this by always returning zero on the success code path.

Fixes: 4dee62b1 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/netfilter/nf_conntrack_expect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index e19a697..d6ace69 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -467,7 +467,7 @@ int nf_ct_expect_related_report(struct nf_conntrack_expect 
*expect,
 
spin_unlock_bh(_conntrack_expect_lock);
nf_ct_expect_event_report(IPEXP_NEW, expect, portid, report);
-   return ret;
+   return 0;
 out:
spin_unlock_bh(_conntrack_expect_lock);
return ret;
-- 
2.1.4



[PATCH net-next] openvswitch: Set event bit after initializing labels.

2017-02-17 Thread Jarno Rajahalme
Connlabels are included in conntrack netlink event messages only if
the IPCT_LABEL bit is set in the event cache (see
ctnetlink_conntrack_event()).  Set it after initializing labels for a
new connection.

Found upon further system testing, where it was noticed that labels
were missing from the conntrack events.

Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed 
connections.")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c2d452e..85cd595 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -339,9 +339,7 @@ static struct nf_conn_labels *ovs_ct_get_conn_labels(struct 
nf_conn *ct)
 
 /* Initialize labels for a new, yet to be committed conntrack entry.  Note that
  * since the new connection is not yet confirmed, and thus no-one else has
- * access to it's labels, we simply write them over.  Also, we refrain from
- * triggering events, as receiving change events before the create event would
- * be confusing.
+ * access to it's labels, we simply write them over.
  */
 static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
@@ -374,6 +372,11 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
 & mask->ct_labels_32[i]);
}
 
+   /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the
+* IPCT_LABEL bit it set in the event cache.
+*/
+   nf_conntrack_event_cache(IPCT_LABEL, ct);
+
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
return 0;
-- 
2.1.4



[PATCH net-next] openvswitch: Set internal device max mtu to ETH_MAX_MTU.

2017-02-14 Thread Jarno Rajahalme
Commit 91572088e3fd ("net: use core MTU range checking in core net
infra") changed the openvswitch internal device to use the core net
infra for controlling the MTU range, but failed to actually set the
max_mtu as described in the commit message, which now defaults to
ETH_DATA_LEN.

This patch fixes this by setting max_mtu to ETH_MAX_MTU after
ether_setup() call.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/vport-internal_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c
index 09141a1..89193a6 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -149,6 +149,8 @@ static void do_setup(struct net_device *netdev)
 {
ether_setup(netdev);
 
+   netdev->max_mtu = ETH_MAX_MTU;
+
netdev->netdev_ops = _dev_netdev_ops;
 
netdev->priv_flags &= ~IFF_TX_SKB_SHARING;
-- 
2.1.4



Re: [PATCH v3 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-09 Thread Jarno Rajahalme

> On Feb 9, 2017, at 8:44 AM, Pravin Shelar <pshe...@ovn.org> wrote:
> 
> On Wed, Feb 8, 2017 at 5:30 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
>> This series improves the conntrack integration code in the openvswitch
>> module by fixing outdated comments (patch 1), bugs (patches 2, 3, and
>> 7), clarifying code (patches 4, 5, and 6), improving performance
>> (patch 10), and adding new features enabling better translation from
>> firewall admission policy to network configuration requested by user
>> communities (patches 8 and 9).
>> 
>> v3: Rebase to the current net-next, add the comment only changing
>>patch 1 and reshuffle some of the patches as requested by Joe.
>> 
> 
> All patches looks good to me.
> 
> Acked-by: Pravin B Shelar <pshe...@ovn.org>
> 
> Thanks.

Thanks for the review!

Joe spotted some english language and coding style issues I had not addressed 
in v3, so I just posted a v4. I included Joes and yours Acks, but not sure if 
that is the proper protocol, or if you need to Ack the v4 again. Sorry for the 
inconvenience!

  Jarno



Re: [PATCH v2 net-next 3/9] openvswitch: Simplify labels length logic.

2017-02-09 Thread Jarno Rajahalme

> On Feb 8, 2017, at 2:47 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 8 February 2017 at 11:32, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
>> distinct labels"), the size of conntrack labels extension has fixed to
>> 128 bits, so we do not need to check for labels sizes shorter than 128
>> at run-time.  This patch simplifies labels length logic accordingly,
>> but allows the conntrack labels size to be increased in the future
>> without breaking the build.  In the event of conntrack labels
>> increasing in size OVS would still be able to deal with the 128 first
>> label bits.
>> 
>> Suggested-by: Joe Stringer <j...@ovn.org>
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
>> net/openvswitch/conntrack.c | 22 +++---
>> 1 file changed, 11 insertions(+), 11 deletions(-)
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 6730f09..a07e5cd 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
>> #endif
>> }
>> 
>> +/* Guard against conntrack labels max size shrinking below 128 bits. */
>> +#if NF_CT_LABELS_MAX_SIZE < 16
>> +#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
>> +#endif
>> +
>> static void ovs_ct_get_labels(const struct nf_conn *ct,
>>  struct ovs_key_ct_labels *labels)
>> {
>>struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
>> 
>> -   if (cl) {
>> -   size_t len = sizeof(cl->bits);
>> -
>> -   if (len > OVS_CT_LABELS_LEN)
>> -   len = OVS_CT_LABELS_LEN;
>> -   else if (len < OVS_CT_LABELS_LEN)
>> -   memset(labels, 0, OVS_CT_LABELS_LEN);
>> -   memcpy(labels, cl->bits, len);
>> -   } else {
>> +   if (cl)
>> +   memcpy(labels, cl->bits,
>> +  sizeof(cl->bits) > OVS_CT_LABELS_LEN
>> +  ? OVS_CT_LABELS_LEN : sizeof(cl->bits));
> 
> Is this to be defensive? If sizeof(cl->bits) is larger than
> OVS_CT_LABELS_LEN, we'll use OVS_CT_LABELS_LEN; if it's equal, we'll
> still use OVS_CT_LABELS_LEN; if it's less, the precompiler will fail
> above. Why not memcpy(.., OVS_CT_LABELS_LEN) ?

Right, I’ll post v4 simplifying this as suggested. Functionality is still the 
same, though.

  Jarno




Re: [PATCH v2 net-next 8/9] openvswitch: Add force commit.

2017-02-09 Thread Jarno Rajahalme

> On Feb 8, 2017, at 3:53 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 8 February 2017 at 11:32, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Stateful network admission policy may allow connections to one
>> direction and reject connections initiated in the other direction.
>> After policy change it is possible that for a new connection an
>> overlapping conntrack entry already exists, where the original
>> direction of the existing connection is opposed to the new
>> connection's initial packet.
>> 
>> Most importantly, conntrack state relating to the current packet gets
>> the "reply" designation based on whether the original direction tuple
>> or the reply direction tuple matched.  If this "directionality" is
>> wrong w.r.t. to the stateful network admission policy it may happen
>> that packets in neither direction are correctly admitted.
>> 
>> This patch adds a new "force commit" option to the OVS conntrack
>> action that checks the original direction of an existing conntrack
>> entry.  If that direction is opposed to the current packet, the
>> existing conntrack entry is deleted and a new one is subsequently
>> created in the correct direction.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> 
> 
>>if (help && rcu_access_pointer(help->helper) != info->helper)
>>return false;
>>}
>> +   /* Force conntrack entry direction to the current packet? */
>> +   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
>> +   /* Delete the conntrack entry if confirmed, else just release
>> +* the reference.
>> +*/
>> +   if (nf_ct_is_confirmed(ct))
>> +   nf_ct_delete(ct, 0, 0);
>> +   else
>> +   nf_conntrack_put(>ct_general);
>> +   skb->nfct = NULL;
>> +   skb->nfctinfo = 0;
> 
> This can use nf_ct_set().

This is in v3.

  Jarno




[PATCH v4 net-next 05/10] openvswitch: Simplify labels length logic.

2017-02-09 Thread Jarno Rajahalme
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time.  This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build.  In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.

Suggested-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index f23934c..fe2a410 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -129,22 +129,20 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = sizeof(cl->bits);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits, OVS_CT_LABELS_LEN);
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
@@ -274,7 +272,7 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
-   if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+   if (!cl)
return -ENOSPC;
 
if (nf_ct_is_confirmed(ct)) {
-- 
2.1.4



[PATCH v4 net-next 01/10] openvswitch: Fix comments for skb->_nfct

2017-02-09 Thread Jarno Rajahalme
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fbffe0e..5de6d12 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -157,7 +157,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
ovs_ct_get_labels(ct, >ct.labels);
 }
 
-/* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
+/* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
  * previously sent the packet to conntrack via the ct action.  If
  * 'keep_nat_flags' is true, the existing NAT flags retained, else they are
  * initialized from the connection status.
@@ -421,12 +421,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 
 /* Find an existing connection which this packet belongs to without
  * re-attributing statistics or modifying the connection state.  This allows an
- * skb->nfct lost due to an upcall to be recovered during actions execution.
+ * skb->_nfct lost due to an upcall to be recovered during actions execution.
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->nfct and skb->nfctinfo, and returns the
- * connection.  Returns NULL if there is no existing entry.
+ * On success, populates skb->_nfct and returns the connection.  Returns NULL
+ * if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
@@ -464,7 +464,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return ct;
 }
 
-/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool skb_nfct_cached(struct net *net,
const struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -475,7 +475,7 @@ static bool skb_nfct_cached(struct net *net,
 
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
-* might be found for this skb.  This happens when we lose a skb->nfct
+* might be found for this skb.  This happens when we lose a skb->_nfct
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
@@ -699,7 +699,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
- * Note that if the packet is deemed invalid by conntrack, skb->nfct will be
+ * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
  * set to NULL and 0 will be returned.
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
-- 
2.1.4



[PATCH v4 net-next 04/10] openvswitch: Unionize ovs_key_ct_label with a u32 array.

2017-02-09 Thread Jarno Rajahalme
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 include/uapi/linux/openvswitch.h |  8 ++--
 net/openvswitch/conntrack.c  | 15 ---
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a6ff374..f23934c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -281,20 +281,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
/* Triggers a change event, which makes sense only for
 * confirmed connections.
 */
-   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / 
sizeof(u32));
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
} else {
u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = (const u32 *)mask->ct_labels;
-   const u32 *lbl = (const u32 *)labels->ct_labels;
+   const u32 *msk = mask->ct_labels_32;
+   const u32 *lbl = labels->ct_labels_32;
int i;
 
/* No-one else has access to the non-confirmed entry, copy
 * labels over, keeping any bits we are not explicitly setting.
 */
-   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
}
 
@@ -866,8 +867,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels 
*labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
-- 
2.1.4



[PATCH v4 net-next 08/10] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-09 Thread Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 20 +-
 net/openvswitch/actions.c|  2 +
 net/openvswitch/conntrack.c  | 86 +---
 net/openvswitch/conntrack.h  | 10 -
 net/openvswitch/flow.c   | 34 +---
 net/openvswitch/flow.h   | 49 ++-
 net/openvswitch/flow_netlink.c   | 85 +--
 net/openvswitch/flow_netlink.h   |  7 +++-
 8 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 96aee34..90af8b8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@
 
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {
 
 #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
+struct ovs_key_ct_tuple_ipv4 {
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+

[PATCH v4 net-next 10/10] openvswitch: Pack struct sw_flow_key.

2017-02-09 Thread Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there.  In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c| 40 
 net/openvswitch/conntrack.h|  8 
 net/openvswitch/flow.h | 14 --
 net/openvswitch/flow_netlink.c | 11 +++
 4 files changed, 39 insertions(+), 34 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 8b15bab..c2d452e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -152,7 +152,7 @@ static void __ovs_ct_update_key_orig_tp(struct sw_flow_key 
*key,
const struct nf_conntrack_tuple *orig,
u8 icmp_proto)
 {
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == icmp_proto) {
key->ct.orig_tp.src = htons(orig->dst.u.icmp.type);
key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code);
@@ -166,8 +166,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
 {
-   key->ct.state = state;
-   key->ct.zone = zone->id;
+   key->ct_state = state;
+   key->ct_zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, >ct.labels);
 
@@ -195,10 +195,10 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, 
u8 state,
return;
}
}
-   /* Clear 'ct.orig_proto' to mark the non-existence of conntrack
+   /* Clear 'ct_orig_proto' to mark the non-existence of conntrack
 * original direction key fields.
 */
-   key->ct.orig_proto = 0;
+   key->ct_orig_proto = 0;
 }
 
 /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
@@ -228,7 +228,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
if (ct->master)
state |= OVS_CS_F_RELATED;
if (keep_nat_flags) {
-   state |= key->ct.state & OVS_CS_F_NAT_MASK;
+   state |= key->ct_state & OVS_CS_F_NAT_MASK;
} else {
if (ct->status & IPS_SRC_NAT)
state |= OVS_CS_F_SRC_NAT;
@@ -259,11 +259,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
   const struct sw_flow_key *output, struct sk_buff *skb)
 {
-   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
-   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
@@ -275,14 +275,14 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
>ct.labels))
return -EMSGSIZE;
 
-   if (swkey->ct.orig_proto) {
+   if (swkey->ct_orig_proto) {
if (swkey->eth.type == htons(ETH_P_IP)) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
sizeof(orig), ))
@@ -293,7 +293,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,
sizeof(orig), ))
@@ -612,11 +612,11 @@ static bool skb_nfct_cached(struct net *net,
 

[PATCH v4 net-next 03/10] openvswitch: Do not trigger events for unconfirmed connections.

2017-02-09 Thread Jarno Rajahalme
Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4df9a54..a6ff374 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -245,7 +245,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -262,7 +263,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, );
@@ -277,10 +277,26 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
+   OVS_CT_LABELS_LEN / 
sizeof(u32));
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   const u32 *msk = (const u32 *)mask->ct_labels;
+   const u32 *lbl = (const u32 *)labels->ct_labels;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+   }
 
ovs_ct_get_labels(ct, >ct.labels);
return 0;
-- 
2.1.4



[PATCH v4 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-09 Thread Jarno Rajahalme
This series improves the conntrack integration code in the openvswitch
module by fixing outdated comments (patch 1), bugs (patches 2, 3, and
7), clarifying code (patches 4, 5, and 6), improving performance
(patch 10), and adding new features enabling better translation from
firewall admission policy to network configuration requested by user
communities (patches 8 and 9).

Please note that v3 of the series was Acked by Pravin, but I posted a
v4 addressing the remaining english language and coding style issues
posted by Joe on v2.

v4: Address remaining language and coding style issues from Joe.
v3: Rebase to the current net-next, add the comment only changing
patch 1 and reshuffle some of the patches as requested by Joe.

Jarno Rajahalme (10):
  openvswitch: Fix comments for skb->_nfct
  openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
  openvswitch: Do not trigger events for unconfirmed connections.
  openvswitch: Unionize ovs_key_ct_label with a u32 array.
  openvswitch: Simplify labels length logic.
  openvswitch: Refactor labels initialization.
  openvswitch: Inherit master's labels.
  openvswitch: Add original direction conntrack tuple to sw_flow_key.
  openvswitch: Add force commit.
  openvswitch: Pack struct sw_flow_key.

 include/uapi/linux/openvswitch.h |  33 -
 net/openvswitch/actions.c|   2 +
 net/openvswitch/conntrack.c  | 296 ++-
 net/openvswitch/conntrack.h  |  14 +-
 net/openvswitch/flow.c   |  34 -
 net/openvswitch/flow.h   |  55 ++--
 net/openvswitch/flow_netlink.c   |  92 +---
 net/openvswitch/flow_netlink.h   |   7 +-
 8 files changed, 420 insertions(+), 113 deletions(-)

-- 
2.1.4



[PATCH v4 net-next 06/10] openvswitch: Refactor labels initialization.

2017-02-09 Thread Jarno Rajahalme
Refactoring conntrack labels initialization makes changes in later
patches easier to review.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 104 ++--
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fe2a410..7c5bb98 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -227,19 +227,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct 
sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -254,50 +247,66 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+
+   return cl;
+}
+
+/* Initialize labels for a new, yet to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.  Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
+
+   cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = mask->ct_labels_32;
-   const u32 *lbl = labels->ct_labels_32;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
-   }
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
+   return -ENOSPC;
+
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
+   if (err)
+   return err;
+
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
-   ovs_ct_get_labels(ct, >ct.labels);
return 0;
 }
 
@@ -877,25 +886,36 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
   

[PATCH v4 net-next 07/10] openvswitch: Inherit master's labels.

2017-02-09 Thread Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7c5bb98..f989ccf 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -73,6 +73,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -270,18 +272,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = ct->master ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
@@ -909,13 +925,14 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(>labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, >labels.value,
->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, >labels.value,
-   >labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, >labels.value,
+>labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, >labels.value,
+   >labels.mask);
if (err)
return err;
}
-- 
2.1.4



[PATCH v4 net-next 09/10] openvswitch: Add force commit.

2017-02-09 Thread Jarno Rajahalme
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 include/uapi/linux/openvswitch.h |  5 +
 net/openvswitch/conntrack.c  | 26 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 90af8b8..7f41f7d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -674,6 +674,10 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
  * translation (NAT) on the packet.
+ * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
+ * nothing if the connection is already committed will check that the current
+ * packet is in conntrack entry's original direction.  If directionality does
+ * not match, will delete the existing conntrack entry and commit a new one.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -684,6 +688,7 @@ enum ovs_ct_attr {
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
+   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index bfd7606..8b15bab 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -65,6 +65,7 @@ struct ovs_conntrack_info {
struct nf_conn *ct;
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -613,10 +614,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, );
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
@@ -630,6 +634,18 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(>ct_general);
+   nf_ct_set(skb, NULL, 0);
+   return false;
+   }
 
return true;
 }
@@ -1207,6 +1223,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1246,6 +1263,9 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info-

[PATCH v4 net-next 02/10] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-09 Thread Jarno Rajahalme
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5de6d12..4df9a54 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(, , l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, );
if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   /* Inverted packet tuple matches the reverse direction conntrack tuple,
+* select the other tuplehash to get the right 'ctinfo' bits for this
+* packet.
+*/
+   if (natted)
+   h = >tuplehash[!h->tuple.dst.dir];
+
nf_ct_set(skb, ct, ovs_ct_get_info(h));
return ct;
 }
@@ -482,7 +500,9 @@ static bool skb_nfct_cached(struct net *net,
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
key->ct.zone == info->zone.id)
-   ct = ovs_ct_find_existing(net, >zone, info->family, skb);
+   ct = ovs_ct_find_existing(net, >zone, info->family, skb,
+ !!(key->ct.state
+& OVS_CS_F_NAT_MASK));
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
-- 
2.1.4



Re: [PATCH v2 net-next 5/9] openvswitch: Refactor labels initialization.

2017-02-09 Thread Jarno Rajahalme

> On Feb 8, 2017, at 3:06 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 8 February 2017 at 11:32, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Refactoring conntrack labels initialization makes chenges in later
> 
> *changes
> 
>> patches easier to review.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> Minor other nit:
> 
>> 
>>cl = nf_ct_labels_find(ct);
>>if (!cl) {
>>nf_ct_labels_ext_add(ct);
>>cl = nf_ct_labels_find(ct);
>>}
>> +
>> +   return cl;
>> +}
>> +
>> +/* Initialize labels for a new, to be committed conntrack entry.  Note that
> 
> Maybe insert 'yet':
> 
> Initialize labels for a new, yet to be committed conntrack entry.  Note that


Fixed these typos for v4.

  Jarno



Re: [PATCH v2 net-next 6/9] openvswitch: Inherit master's labels.

2017-02-09 Thread Jarno Rajahalme

> On Feb 8, 2017, at 3:25 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 8 February 2017 at 11:32, Jarno Rajahalme <ja...@ovn.org> wrote:
>> We avoid calling into nf_conntrack_in() for expected connections, as
>> that would remove the expectation that we want to stick around until
>> we are ready to commit the connection.  Instead, we do a lookup in the
>> expectation table directly.  However, after a successful expectation
>> lookup we have set the flow key label field from the master
>> connection, whereas nf_conntrack_in() does not do this.  This leads to
>> master's labels being inherited after an expectation lookup, but those
>> labels not being inherited after the corresponding conntrack action
>> with a commit flag.
>> 
>> This patch resolves the problem by changing the commit code path to
>> also inherit the master's labels to the expected connection.
>> Resolving this conflict in favor or inheriting the labels allows more
> 
> *of inheriting
> 
>> information be passed from the master connection to related
>> connections, which would otherwise be much harder if the 32 bits in
>> the connmark are not enough.  Labels can still be set explicitly, so
>> this change only affects the default values of the labels in presense
>> of a master connection.
>> 
>> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> 
> 
>> @@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, 
>> struct sw_flow_key *key,
>>  const struct ovs_key_ct_labels *labels,
>>  const struct ovs_key_ct_labels *mask)
>> {
>> -   struct nf_conn_labels *cl;
>> -   u32 *dst;
>> -   int i;
>> +   struct nf_conn_labels *cl, *master_cl;
>> +   bool have_mask = labels_nonzero(mask);
>> +
>> +   /* Inherit master's labels to the related connection? */
>> +   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
> 
> I don't think (ct->master) needs the parentheses.

Fixed these style issues for v4.

  Jarno




[PATCH v3 net-next 10/10] openvswitch: Pack struct sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there.  In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c| 40 
 net/openvswitch/conntrack.h|  8 
 net/openvswitch/flow.h | 14 --
 net/openvswitch/flow_netlink.c | 11 +++
 4 files changed, 39 insertions(+), 34 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index de47782..a12825e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -154,7 +154,7 @@ static void __ovs_ct_update_key_orig_tp(struct sw_flow_key 
*key,
const struct nf_conntrack_tuple *orig,
u8 icmp_proto)
 {
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == icmp_proto) {
key->ct.orig_tp.src = htons(orig->dst.u.icmp.type);
key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code);
@@ -168,8 +168,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
 {
-   key->ct.state = state;
-   key->ct.zone = zone->id;
+   key->ct_state = state;
+   key->ct_zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, >ct.labels);
 
@@ -197,10 +197,10 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, 
u8 state,
return;
}
}
-   /* Clear 'ct.orig_proto' to mark the non-existence of conntrack
+   /* Clear 'ct_orig_proto' to mark the non-existence of conntrack
 * original direction key fields.
 */
-   key->ct.orig_proto = 0;
+   key->ct_orig_proto = 0;
 }
 
 /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
@@ -230,7 +230,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
if (ct->master)
state |= OVS_CS_F_RELATED;
if (keep_nat_flags) {
-   state |= key->ct.state & OVS_CS_F_NAT_MASK;
+   state |= key->ct_state & OVS_CS_F_NAT_MASK;
} else {
if (ct->status & IPS_SRC_NAT)
state |= OVS_CS_F_SRC_NAT;
@@ -261,11 +261,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
   const struct sw_flow_key *output, struct sk_buff *skb)
 {
-   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
-   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
@@ -277,14 +277,14 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
>ct.labels))
return -EMSGSIZE;
 
-   if (swkey->ct.orig_proto) {
+   if (swkey->ct_orig_proto) {
if (swkey->eth.type == htons(ETH_P_IP)) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
sizeof(orig), ))
@@ -295,7 +295,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,
sizeof(orig), ))
@@ -614,11 +614,11 @@ static bool skb_nfct_cached(struct net *net,
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run thro

[PATCH v3 net-next 04/10] openvswitch: Unionize ovs_key_ct_label with a u32 array.

2017-02-08 Thread Jarno Rajahalme
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h |  8 ++--
 net/openvswitch/conntrack.c  | 15 ---
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a6ff374..f23934c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -281,20 +281,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
/* Triggers a change event, which makes sense only for
 * confirmed connections.
 */
-   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / 
sizeof(u32));
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
} else {
u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = (const u32 *)mask->ct_labels;
-   const u32 *lbl = (const u32 *)labels->ct_labels;
+   const u32 *msk = mask->ct_labels_32;
+   const u32 *lbl = labels->ct_labels_32;
int i;
 
/* No-one else has access to the non-confirmed entry, copy
 * labels over, keeping any bits we are not explicitly setting.
 */
-   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
}
 
@@ -866,8 +867,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels 
*labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
-- 
2.1.4



[PATCH v3 net-next 07/10] openvswitch: Inherit master's labels.

2017-02-08 Thread Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor or inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 0e038ee..5fbadcd 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -73,6 +73,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
@@ -911,13 +927,14 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(>labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, >labels.value,
->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, >labels.value,
-   >labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, >labels.value,
+>labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, >labels.value,
+   >labels.mask);
if (err)
return err;
}
-- 
2.1.4



[PATCH v3 net-next 05/10] openvswitch: Simplify labels length logic.

2017-02-08 Thread Jarno Rajahalme
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time.  This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build.  In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.

Suggested-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index f23934c..c7db4da 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = sizeof(cl->bits);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits,
+  sizeof(cl->bits) > OVS_CT_LABELS_LEN
+  ? OVS_CT_LABELS_LEN : sizeof(cl->bits));
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
@@ -274,7 +274,7 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
-   if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+   if (!cl)
return -ENOSPC;
 
if (nf_ct_is_confirmed(ct)) {
-- 
2.1.4



[PATCH v3 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-08 Thread Jarno Rajahalme
This series improves the conntrack integration code in the openvswitch
module by fixing outdated comments (patch 1), bugs (patches 2, 3, and
7), clarifying code (patches 4, 5, and 6), improving performance
(patch 10), and adding new features enabling better translation from
firewall admission policy to network configuration requested by user
communities (patches 8 and 9).

v3: Rebase to the current net-next, add the comment only changing
patch 1 and reshuffle some of the patches as requested by Joe.

Jarno Rajahalme (10):
  openvswitch: Fix comments for skb->_nfct
  openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
  openvswitch: Do not trigger events for unconfirmed connections.
  openvswitch: Unionize ovs_key_ct_label with a u32 array.
  openvswitch: Simplify labels length logic.
  openvswitch: Refactor labels initialization.
  openvswitch: Inherit master's labels.
  openvswitch: Add original direction conntrack tuple to sw_flow_key.
  openvswitch: Add force commit.
  openvswitch: Pack struct sw_flow_key.

 include/uapi/linux/openvswitch.h |  33 -
 net/openvswitch/actions.c|   2 +
 net/openvswitch/conntrack.c  | 298 ++-
 net/openvswitch/conntrack.h  |  14 +-
 net/openvswitch/flow.c   |  34 -
 net/openvswitch/flow.h   |  55 ++--
 net/openvswitch/flow_netlink.c   |  92 +---
 net/openvswitch/flow_netlink.h   |   7 +-
 8 files changed, 422 insertions(+), 113 deletions(-)

-- 
2.1.4



[PATCH v3 net-next 06/10] openvswitch: Refactor labels initialization.

2017-02-08 Thread Jarno Rajahalme
Refactoring conntrack labels initialization makes chenges in later
patches easier to review.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 104 ++--
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7db4da..0e038ee 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -229,19 +229,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct 
sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -256,50 +249,66 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+
+   return cl;
+}
+
+/* Initialize labels for a new, to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.  Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
+
+   cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = mask->ct_labels_32;
-   const u32 *lbl = labels->ct_labels_32;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
-   }
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
+   return -ENOSPC;
+
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
+   if (err)
+   return err;
+
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
-   ovs_ct_get_labels(ct, >ct.labels);
return 0;
 }
 
@@ -879,25 +888,36 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
+   

[PATCH v3 net-next 03/10] openvswitch: Do not trigger events for unconfirmed connections.

2017-02-08 Thread Jarno Rajahalme
Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4df9a54..a6ff374 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -245,7 +245,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -262,7 +263,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, );
@@ -277,10 +277,26 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
+   OVS_CT_LABELS_LEN / 
sizeof(u32));
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   const u32 *msk = (const u32 *)mask->ct_labels;
+   const u32 *lbl = (const u32 *)labels->ct_labels;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+   }
 
ovs_ct_get_labels(ct, >ct.labels);
return 0;
-- 
2.1.4



[PATCH v3 net-next 01/10] openvswitch: Fix comments for skb->_nfct

2017-02-08 Thread Jarno Rajahalme
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fbffe0e..5de6d12 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -157,7 +157,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
ovs_ct_get_labels(ct, >ct.labels);
 }
 
-/* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
+/* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
  * previously sent the packet to conntrack via the ct action.  If
  * 'keep_nat_flags' is true, the existing NAT flags retained, else they are
  * initialized from the connection status.
@@ -421,12 +421,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 
 /* Find an existing connection which this packet belongs to without
  * re-attributing statistics or modifying the connection state.  This allows an
- * skb->nfct lost due to an upcall to be recovered during actions execution.
+ * skb->_nfct lost due to an upcall to be recovered during actions execution.
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->nfct and skb->nfctinfo, and returns the
- * connection.  Returns NULL if there is no existing entry.
+ * On success, populates skb->_nfct and returns the connection.  Returns NULL
+ * if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
@@ -464,7 +464,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return ct;
 }
 
-/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool skb_nfct_cached(struct net *net,
const struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -475,7 +475,7 @@ static bool skb_nfct_cached(struct net *net,
 
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
-* might be found for this skb.  This happens when we lose a skb->nfct
+* might be found for this skb.  This happens when we lose a skb->_nfct
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
@@ -699,7 +699,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
- * Note that if the packet is deemed invalid by conntrack, skb->nfct will be
+ * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
  * set to NULL and 0 will be returned.
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
-- 
2.1.4



[PATCH v3 net-next 09/10] openvswitch: Add force commit.

2017-02-08 Thread Jarno Rajahalme
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h |  5 +
 net/openvswitch/conntrack.c  | 26 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 90af8b8..7f41f7d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -674,6 +674,10 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
  * translation (NAT) on the packet.
+ * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
+ * nothing if the connection is already committed will check that the current
+ * packet is in conntrack entry's original direction.  If directionality does
+ * not match, will delete the existing conntrack entry and commit a new one.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -684,6 +688,7 @@ enum ovs_ct_attr {
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
+   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 8685bcd..de47782 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -65,6 +65,7 @@ struct ovs_conntrack_info {
struct nf_conn *ct;
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -615,10 +616,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, );
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
@@ -632,6 +636,18 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(>ct_general);
+   nf_ct_set(skb, NULL, 0);
+   return false;
+   }
 
return true;
 }
@@ -1209,6 +1225,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1248,6 +1265,9 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info->force = true;
+   /* fall throug

[PATCH v3 net-next 02/10] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-08 Thread Jarno Rajahalme
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5de6d12..4df9a54 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(, , l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, );
if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   /* Inverted packet tuple matches the reverse direction conntrack tuple,
+* select the other tuplehash to get the right 'ctinfo' bits for this
+* packet.
+*/
+   if (natted)
+   h = >tuplehash[!h->tuple.dst.dir];
+
nf_ct_set(skb, ct, ovs_ct_get_info(h));
return ct;
 }
@@ -482,7 +500,9 @@ static bool skb_nfct_cached(struct net *net,
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
key->ct.zone == info->zone.id)
-   ct = ovs_ct_find_existing(net, >zone, info->family, skb);
+   ct = ovs_ct_find_existing(net, >zone, info->family, skb,
+ !!(key->ct.state
+& OVS_CS_F_NAT_MASK));
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
-- 
2.1.4



[PATCH v3 net-next 08/10] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP not ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 20 +-
 net/openvswitch/actions.c|  2 +
 net/openvswitch/conntrack.c  | 86 +---
 net/openvswitch/conntrack.h  | 10 -
 net/openvswitch/flow.c   | 34 +---
 net/openvswitch/flow.h   | 49 ++-
 net/openvswitch/flow_netlink.c   | 85 +--
 net/openvswitch/flow_netlink.h   |  7 +++-
 8 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 96aee34..90af8b8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@
 
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {
 
 #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
+struct ovs_key_ct_tuple_ipv4 {
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+   __be16 src_port;
+   __be16 dst_port;
+   __u8   ipv4_proto;
+};
+
+

[PATCH v2 net-next 3/9] openvswitch: Simplify labels length logic.

2017-02-08 Thread Jarno Rajahalme
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time.  This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build.  In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.

Suggested-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 6730f09..a07e5cd 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = sizeof(cl->bits);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits,
+  sizeof(cl->bits) > OVS_CT_LABELS_LEN
+  ? OVS_CT_LABELS_LEN : sizeof(cl->bits));
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
@@ -274,7 +274,7 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
-   if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+   if (!cl)
return -ENOSPC;
 
err = nf_connlabels_replace(ct, labels->ct_labels_32,
-- 
2.1.4



[PATCH v2 net-next 4/9] openvswitch: Do not trigger events for unconfirmed connections.

2017-02-08 Thread Jarno Rajahalme
Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a07e5cd..6e3e5e7 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -245,7 +245,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -262,7 +263,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, );
@@ -277,11 +277,27 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] &
+mask->ct_labels_32[i]);
+   }
 
ovs_ct_get_labels(ct, >ct.labels);
return 0;
-- 
2.1.4



[PATCH v2 net-next 0/9] openvswitch: Conntrack integration improvements.

2017-02-08 Thread Jarno Rajahalme
This series improves the conntrack integration code in the openvswitch
module by fixing bugs (patches 1, 4, and 6), clarifying code (patches
2, 3, and 5), improving performance (patch 9), and adding new features
enabling better translation from firewall admission policy to network
configuration requested by user communities (patches 7 and 8).

Jarno Rajahalme (9):
  openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
  openvswitch: Unionize ovs_key_ct_label with a u32 array.
  openvswitch: Simplify labels length logic.
  openvswitch: Do not trigger events for unconfirmed connections.
  openvswitch: Refactor labels initialization.
  openvswitch: Inherit master's labels.
  openvswitch: Add original direction conntrack tuple to sw_flow_key.
  openvswitch: Add force commit.
  openvswitch: Pack struct sw_flow_key.

 include/uapi/linux/openvswitch.h |  33 -
 net/openvswitch/actions.c|   2 +
 net/openvswitch/conntrack.c  | 285 +++
 net/openvswitch/conntrack.h  |  14 +-
 net/openvswitch/flow.c   |  34 -
 net/openvswitch/flow.h   |  55 ++--
 net/openvswitch/flow_netlink.c   |  92 ++---
 net/openvswitch/flow_netlink.h   |   7 +-
 8 files changed, 416 insertions(+), 106 deletions(-)

-- 
2.1.4



[PATCH v2 net-next 1/9] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-08 Thread Jarno Rajahalme
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 54253ea..b91baa2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(, , l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, );
if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   /* Inverted packet tuple matches the reverse direction conntrack tuple,
+* select the other tuplehash to get the right 'ctinfo' bits for this
+* packet.
+*/
+   if (natted)
+   h = >tuplehash[!h->tuple.dst.dir];
+
skb->nfct = >ct_general;
skb->nfctinfo = ovs_ct_get_info(h);
return ct;
@@ -483,7 +501,9 @@ static bool skb_nfct_cached(struct net *net,
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
key->ct.zone == info->zone.id)
-   ct = ovs_ct_find_existing(net, >zone, info->family, skb);
+   ct = ovs_ct_find_existing(net, >zone, info->family, skb,
+ !!(key->ct.state
+& OVS_CS_F_NAT_MASK));
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
-- 
2.1.4



[PATCH v2 net-next 2/9] openvswitch: Unionize ovs_key_ct_label with a u32 array.

2017-02-08 Thread Jarno Rajahalme
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 8 ++--
 net/openvswitch/conntrack.c  | 9 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index b91baa2..6730f09 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -277,8 +277,9 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
 
@@ -852,8 +853,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels 
*labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
-- 
2.1.4



[PATCH v2 net-next 7/9] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP not ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 20 +-
 net/openvswitch/actions.c|  2 +
 net/openvswitch/conntrack.c  | 86 +---
 net/openvswitch/conntrack.h  | 10 -
 net/openvswitch/flow.c   | 34 +---
 net/openvswitch/flow.h   | 49 ++-
 net/openvswitch/flow_netlink.c   | 85 +--
 net/openvswitch/flow_netlink.h   |  7 +++-
 8 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 96aee34..90af8b8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@
 
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {
 
 #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
+struct ovs_key_ct_tuple_ipv4 {
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+   __be16 src_port;
+   __be16 dst_port;
+   __u8   ipv4_proto;
+};
+
+

[PATCH v2 net-next 9/9] openvswitch: Pack struct sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there.  In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c| 40 
 net/openvswitch/conntrack.h|  8 
 net/openvswitch/flow.h | 14 --
 net/openvswitch/flow_netlink.c | 11 +++
 4 files changed, 39 insertions(+), 34 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index f15c855..6ae18bd 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -154,7 +154,7 @@ static void __ovs_ct_update_key_orig_tp(struct sw_flow_key 
*key,
const struct nf_conntrack_tuple *orig,
u8 icmp_proto)
 {
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == icmp_proto) {
key->ct.orig_tp.src = htons(orig->dst.u.icmp.type);
key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code);
@@ -168,8 +168,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
 {
-   key->ct.state = state;
-   key->ct.zone = zone->id;
+   key->ct_state = state;
+   key->ct_zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, >ct.labels);
 
@@ -197,10 +197,10 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, 
u8 state,
return;
}
}
-   /* Clear 'ct.orig_proto' to mark the non-existence of conntrack
+   /* Clear 'ct_orig_proto' to mark the non-existence of conntrack
 * original direction key fields.
 */
-   key->ct.orig_proto = 0;
+   key->ct_orig_proto = 0;
 }
 
 /* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
@@ -230,7 +230,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
if (ct->master)
state |= OVS_CS_F_RELATED;
if (keep_nat_flags) {
-   state |= key->ct.state & OVS_CS_F_NAT_MASK;
+   state |= key->ct_state & OVS_CS_F_NAT_MASK;
} else {
if (ct->status & IPS_SRC_NAT)
state |= OVS_CS_F_SRC_NAT;
@@ -261,11 +261,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
   const struct sw_flow_key *output, struct sk_buff *skb)
 {
-   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
-   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
@@ -277,14 +277,14 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
>ct.labels))
return -EMSGSIZE;
 
-   if (swkey->ct.orig_proto) {
+   if (swkey->ct_orig_proto) {
if (swkey->eth.type == htons(ETH_P_IP)) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
sizeof(orig), ))
@@ -295,7 +295,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,
sizeof(orig), ))
@@ -615,11 +615,11 @@ static bool skb_nfct_cached(struct net *net,
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run thro

[PATCH v2 net-next 5/9] openvswitch: Refactor labels initialization.

2017-02-08 Thread Jarno Rajahalme
Refactoring conntrack labels initialization makes chenges in later
patches easier to review.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 104 ++--
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 6e3e5e7..026f4a9 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -229,19 +229,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct 
sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -256,50 +249,66 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+
+   return cl;
+}
+
+/* Initialize labels for a new, to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.  Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
+
+   cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] &
-mask->ct_labels_32[i]);
-   }
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
+   return -ENOSPC;
+
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
+   if (err)
+   return err;
+
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
-   ovs_ct_get_labels(ct, >ct.labels);
return 0;
 }
 
@@ -881,25 +890,36 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
+

[PATCH v2 net-next 6/9] openvswitch: Inherit master's labels.

2017-02-08 Thread Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor or inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 026f4a9..5b408f9 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -73,6 +73,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
@@ -913,13 +929,14 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(>labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, >labels.value,
->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, >labels.value,
-   >labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, >labels.value,
+>labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, >labels.value,
+   >labels.mask);
if (err)
return err;
}
-- 
2.1.4



[PATCH v2 net-next 8/9] openvswitch: Add force commit.

2017-02-08 Thread Jarno Rajahalme
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h |  5 +
 net/openvswitch/conntrack.c  | 27 +--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 90af8b8..7f41f7d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -674,6 +674,10 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
  * translation (NAT) on the packet.
+ * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
+ * nothing if the connection is already committed will check that the current
+ * packet is in conntrack entry's original direction.  If directionality does
+ * not match, will delete the existing conntrack entry and commit a new one.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -684,6 +688,7 @@ enum ovs_ct_attr {
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
+   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 14c35b7..f15c855 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -65,6 +65,7 @@ struct ovs_conntrack_info {
struct nf_conn *ct;
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -616,10 +617,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, );
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
@@ -633,6 +637,19 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(>ct_general);
+   skb->nfct = NULL;
+   skb->nfctinfo = 0;
+   return false;
+   }
 
return true;
 }
@@ -1211,6 +1228,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1250,6 +1268,9 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info->for

Re: [PATCH net-next 7/7] openvswitch: Pack struct sw_flow_key.

2017-02-07 Thread Jarno Rajahalme

> On Feb 6, 2017, at 11:15 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> struct sw_flow_key has two 16-bit holes. Move the most matched
>> conntrack match fields there.  In some typical cases this reduces the
>> size of the key that needs to be hashed into half and into one cache
>> line.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> Looks like this misses the zeroing in ovs_nla_get_flow_metadata();
> might want to double-check for any other memset/copies of the key->ct
> field.

Good catch. Looked, there are no other places to change.

Will rebase to current net-next and repost.

 Jarno



Re: [PATCH net-next 6/7] openvswitch: Add force commit.

2017-02-07 Thread Jarno Rajahalme

> On Feb 7, 2017, at 2:15 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Stateful network admission policy may allow connections to one
>> direction and reject connections initiated in the other direction.
>> After policy change it is possible that for a new connection an
>> overlapping conntrack entry already exist, where the connection
>> original direction is opposed to the new connection's initial packet.
>> 
>> Most importantly, conntrack state relating to the current packet gets
>> the "reply" designation based on whether the original direction tuple
>> or the reply direction tuple matched.  If this "directionality" is
>> wrong w.r.t. to the stateful network admission policy it may happen
>> that packets in neither direction are correctly admitted.
>> 
>> This patch adds a new "force commit" option to the OVS conntrack
>> action that checks the original direction of an existing conntrack
>> entry.  If that direction is opposed to the current packet, the
>> existing conntrack entry is deleted and a new one is subsequently
>> created in the correct direction.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
>> include/uapi/linux/openvswitch.h | 10 ++
>> net/openvswitch/conntrack.c  | 27 +--
>> 2 files changed, 35 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/openvswitch.h 
>> b/include/uapi/linux/openvswitch.h
>> index 90af8b8..d5ba9a9 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -674,6 +674,10 @@ struct ovs_action_hash {
>>  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
>>  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
>>  * translation (NAT) on the packet.
>> + * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
>> + * nothing if the connection is already committed will check that the 
>> current
>> + * packet is in conntrack entry's original direction.  If directionality 
>> does
>> + * not match, will delete the existing conntrack entry and commit a new one.
>>  */
>> enum ovs_ct_attr {
>>OVS_CT_ATTR_UNSPEC,
>> @@ -684,6 +688,12 @@ enum ovs_ct_attr {
>>OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
>>   related connections. */
>>OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
>> +   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument, commits connection.  If 
>> the
>> +   * conntrack entry original direction 
>> tuple
>> +   * does not match the current packet 
>> header
>> +   * values, will delete the current 
>> conntrack
>> +   * entry and create a new one.
>> +   */
> 
> We only need one copy of the explanation, keep it above the enum, then
> the inline comment can be /* No argument */.
> 

OK.

>>__OVS_CT_ATTR_MAX
>> };
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 1afe153..1f27f44 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -65,6 +65,7 @@ struct ovs_conntrack_info {
>>struct nf_conn *ct;
>>u8 commit : 1;
>>u8 nat : 3; /* enum ovs_ct_nat */
>> +   u8 force : 1;
>>u16 family;
>>struct md_mark mark;
>>struct md_labels labels;
>> @@ -631,10 +632,13 @@ static bool skb_nfct_cached(struct net *net,
>> */
>>if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
>>!(key->ct.state & OVS_CS_F_INVALID) &&
>> -   key->ct.zone == info->zone.id)
>> +   key->ct.zone == info->zone.id) {
>>ct = ovs_ct_find_existing(net, >zone, info->family, skb,
>>  !!(key->ct.state
>> & OVS_CS_F_NAT_MASK));
>> +   if (ct)
>> +   nf_ct_get(skb, );
>> +   }
> 
> If ctinfo is only used with the new call below, we can unconditionally
> fetch this just before it's used...
> 
>>if (!ct)
>>return false;
>>   

Re: [PATCH net-next 4/7] openvswitch: Inherit master's labels.

2017-02-07 Thread Jarno Rajahalme

> On Feb 6, 2017, at 1:53 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> We avoid calling into nf_conntrack_in() for expected connections, as
>> that would remove the expectation that we want to stick around until
>> we are ready to commit the connection.  Instead, we do a lookup in the
>> expectation table directly.  However, after a successful expectation
>> lookup we have set the flow key label field from the master
>> connection, whereas nf_conntrack_in() does not do this.  This leads to
>> master's labels being iherited after an expectation lookup, but those
>> labels not being inherited after the corresponding conntrack action
>> with a commit flag.
>> 
>> This patch resolves the problem by changing the commit code path to
>> also inherit the master's labels to the expected connection.
>> Resolving this conflict in favor or inheriting the labels allows
>> information be passed from the master connection to related
>> connections, which would otherwise be much harder.  Labels can still
>> be set explicitly, so this change only affects the default values of
>> the labels in presense of a master connection.
>> 
>> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
>> net/openvswitch/conntrack.c | 48 
>> -
>> 1 file changed, 34 insertions(+), 14 deletions(-)
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index a163c44..738a4fa 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -265,6 +265,8 @@ static struct nf_conn_labels 
>> *ovs_ct_get_conn_labels(struct nf_conn *ct)
>>return cl;
>> }
>> 
>> +static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
>> +
> 
> These declarations typically live at the top of the file, not
> somewhere in the middle.
> 

Moved.

>> /* Initialize labels for a new, to be committed conntrack entry.  Note that
>>  * since the new connection is not yet confirmed, and thus no-one else has
>>  * access to it's labels, we simply write them over.  Also, we refrain from
>> @@ -275,18 +277,35 @@ static int ovs_ct_init_labels(struct nf_conn *ct, 
>> struct sw_flow_key *key,
>>  const struct ovs_key_ct_labels *labels,
>>  const struct ovs_key_ct_labels *mask)
>> {
>> -   struct nf_conn_labels *cl;
>> -   u32 *dst;
>> -   int i;
>> +   struct nf_conn_labels *cl, *master_cl;
>> +   bool have_mask = labels_nonzero(mask);
>> +
>> +   /* Inherit master's labels to the related connection? */
>> +   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
>> +
>> +   if (!master_cl && !have_mask)
>> +   return 0;   /* Nothing to do. */
>> 
>>cl = ovs_ct_get_conn_labels(ct);
>>if (!cl)
>>return -ENOSPC;
>> 
>> -   dst = (u32 *)cl->bits;
>> -   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
>> -   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
>> -   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
>> +   /* Inherit the master's labels, if any. */
>> +   if (master_cl) {
>> +   size_t len = sizeof(master_cl->bits);
>> +
>> +   memcpy(>bits, _cl->bits,
>> +  len > OVS_CT_LABELS_LEN ? OVS_CT_LABELS_LEN : len);
> 
> Looks like this is another spot where we're trying to handle differing
> label lengths, which we could simplify if there was a stronger
> guarantee they're the same.

Indeed, here the ‘cl’s are different instances of the same structure, so we do 
need not worry about the sizes at all. I’ll change this to an simple structure 
assignment for v2.

> 
>> +   }
>> +   if (have_mask) {
>> +   u32 *dst = (u32 *)cl->bits;
>> +   int i;
>> +
>> +   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
>> +   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
>> +   (labels->ct_labels_32[i]
>> +& mask->ct_labels_32[i]);
>> +   }
> 
> By the way, is this open-coding nf_connlabels_replace()? Can
> ovs_ct_set_labels() and this share the code?

nf_connlabels_r

Re: [PATCH net-next 3/7] openvswitch: Do not trigger events for unconfirmed connection.

2017-02-07 Thread Jarno Rajahalme
Thanks for the review! Comments below,

  Jarno

> On Feb 6, 2017, at 1:46 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Avoid triggering change events for setting conntrack mark or labels
>> before the conntrack entry has been confirmed.  Refactoring on this
>> patch also makes chenges in later patches easier to review.
>> 
>> Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
>> Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> Functional and cosmetic changes should be in separate patches.
> 

OK, will split.

>> ---
>> net/openvswitch/conntrack.c | 87 
>> -
>> 1 file changed, 63 insertions(+), 24 deletions(-)
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 6730f09..a163c44 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -229,23 +229,17 @@ int ovs_ct_put_key(const struct sw_flow_key *key, 
>> struct sk_buff *skb)
>>return 0;
>> }
>> 
>> -static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
>> +static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
>>   u32 ct_mark, u32 mask)
>> {
>> #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
>> -   enum ip_conntrack_info ctinfo;
>> -   struct nf_conn *ct;
>>u32 new_mark;
>> 
>> -   /* The connection could be invalid, in which case set_mark is no-op. 
>> */
>> -   ct = nf_ct_get(skb, );
>> -   if (!ct)
>> -   return 0;
>> -
>>new_mark = ct_mark | (ct->mark & ~(mask));
>>if (ct->mark != new_mark) {
>>ct->mark = new_mark;
>> -   nf_conntrack_event_cache(IPCT_MARK, ct);
>> +   if (nf_ct_is_confirmed(ct))
>> +   nf_conntrack_event_cache(IPCT_MARK, ct);
>>key->ct.mark = new_mark;
>>}
>> 
>> @@ -255,26 +249,59 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
>> sw_flow_key *key,
>> #endif
>> }
>> 
>> -static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
>> -const struct ovs_key_ct_labels *labels,
>> -const struct ovs_key_ct_labels *mask)
>> +static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
>> {
>> -   enum ip_conntrack_info ctinfo;
>>struct nf_conn_labels *cl;
>> -   struct nf_conn *ct;
>> -   int err;
>> -
>> -   /* The connection could be invalid, in which case set_label is 
>> no-op.*/
>> -   ct = nf_ct_get(skb, );
>> -   if (!ct)
>> -   return 0;
>> 
>>cl = nf_ct_labels_find(ct);
>>if (!cl) {
>>nf_ct_labels_ext_add(ct);
>>cl = nf_ct_labels_find(ct);
>>}
>> +
>>if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
>> +   return NULL;
>> +
>> +   return cl;
>> +}
>> +
>> +/* Initialize labels for a new, to be committed conntrack entry.  Note that
>> + * since the new connection is not yet confirmed, and thus no-one else has
>> + * access to it's labels, we simply write them over.  Also, we refrain from
>> + * triggering events, as receiving change events before the create event 
>> would
>> + * be confusing.
>> + */
>> +static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
>> + const struct ovs_key_ct_labels *labels,
>> + const struct ovs_key_ct_labels *mask)
>> +{
>> +   struct nf_conn_labels *cl;
>> +   u32 *dst;
>> +   int i;
>> +
>> +   cl = ovs_ct_get_conn_labels(ct);
>> +   if (!cl)
>> +   return -ENOSPC;
>> +
>> +   dst = (u32 *)cl->bits;
> 
> Is it worth extending the union to include unsigned long, to avoid
> casting it to u32 here?
> 

This cast is on the struct nf_conn_labels, I would not unionize it at this 
point. This type of cast is typical in conntrack code.

>> +   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
>> +   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
>> +   (labels->ct_lab

Re: [PATCH net-next 5/7] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-07 Thread Jarno Rajahalme

> On Feb 6, 2017, at 11:15 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Add the fields of the conntrack original direction 5-tuple to struct
>> sw_flow_key.  The new fields are initially zeroed, and are populated
>> whenever a conntrack action is executed and either finds or generates
>> a conntrack entry.  This means that these fields exist for all packets
>> were not rejected by conntrack as untrackable.
>> 
>> The original tuple fields in the sw_flow_key are filled from the
>> original direction tuple of the conntrack entry relating to the
>> current packet, or from the original direction tuple of the master
>> conntrack entry, if the current conntrack entry has a master.
>> Generally, expected connections of connections having an assigned
>> helper (e.g., FTP), have a master conntrack entry.
>> 
>> The main purpose of the new conntrack original tuple fields is to
>> allow matching on them for policy decision purposes, with the premise
>> that the admissibility of tracked connections reply packets (as well
>> as original direction packets), and both direction packets of any
>> related connections may be based on ACL rules applying to the master
>> connection's original direction 5-tuple.  This also makes it easier to
>> make policy decisions when the actual packet headers might have been
>> transformed by NAT, as the original direction 5-tuple represents the
>> packet headers before any such transformation.
>> 
>> When using the original direction 5-tuple the admissibility of return
>> and/or related packets need not be based on the mere existence of a
>> conntrack entry, allowing separation of admission policy from the
>> established conntrack state.  While existence of a conntrack entry is
>> required for admission of the return or related packets, policy
>> changes can render connections that were initially admitted to be
>> rejected or dropped afterwards.  If the admission of the return and
>> related packets was based on mere conntrack state (e.g., connection
>> being in an established state), a policy change that would make the
>> connection rejected or dropped would need to find and delete all
>> conntrack entries affected by such a change.  When using the original
>> direction 5-tuple matching the affected conntrack entries can be
>> allowed to time out instead, as the established state of the
>> connection would not need to be the basis for packet admission any
>> more.
>> 
>> It should be noted that the directionality of related connections may
>> be the same or different than that of the master connection, and
>> neither the original direction 5-tuple nor the conntrack state bits
>> carry this information.  If needed, the directionality of the master
>> connection can be stored in master's conntrack mark or labels, which
>> are automatically inherited by the expected related connections.
>> 
>> The fact that neither ARP not ND packets are trackable by conntrack
>> allows mutual exclusion between ARP/ND and the new conntrack original
>> tuple fields.  Hence, the IP addresses are overlaid in union with ARP
>> and ND fields.  This allows the sw_flow_key to not grow much due to
>> this patch, but it also means that we must be careful to never use the
>> new key fields with ARP or ND packets.  ARP is easy to distinguish and
>> keep mutually exclusive based on the ethernet type, but ND being an
>> ICMPv6 protocol requires a bit more attention.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
> 
> OK, maybe we need to do something a bit more to handle the NATed
> related connections to address the problem in patch 1.
> 
> 
> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 738a4fa..1afe153 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -155,6 +155,59 @@ static void __ovs_ct_update_key(struct sw_flow_key 
>> *key, u8 state,
>>key->ct.zone = zone->id;
>>key->ct.mark = ovs_ct_get_mark(ct);
>>ovs_ct_get_labels(ct, >ct.labels);
>> +
>> +   /* Use the master if we have one. */
>> +   if (ct && ct->master)
>> +   ct = ct->master;
> 
> Perhaps:
> 
> if (!ct || sw_flow_key_is_nd(key) || !is_ip_any(key->eth.type)) {
>/* zero everything */
>return;
> }
> 
> One of the things this helps us to avoid is having a comment in the
> middle of an if statement.
> 
> Then afterwards,

Re: [PATCH net-next 1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-07 Thread Jarno Rajahalme

> On Feb 6, 2017, at 9:07 AM, Pravin Shelar <pshe...@ovn.org> wrote:
> 
> On Thu, Feb 2, 2017 at 5:10 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
>> When looking for an existing conntrack entry, the packet 5-tuple
>> must be inverted if NAT has already been applied, as the current
>> packet headers do not match any conntrack tuple.  For
>> example, if a packet from private address X to a public address B is
>> source-NATted to A, the conntrack entry will have the following tuples
>> (ignoring the protocol and port numbers) after the conntrack entry is
>> committed:
>> 
>> Original direction tuple: (X,B)
>> Reply direction tuple: (B,A)
>> 
>> Now, if a reply packet is already transformed back to the private
>> address space (e.g., with a CT(nat) action), the tuple corresponding
>> to the current packet headers is:
>> 
>> Current packet tuple: (B,X)
>> 
>> This does not match either of the conntrack tuples above.  Normally
>> this does not matter, as the conntrack lookup was already done using
>> the tuple (B,A), but if the current packet does not match any flow in
>> the OVS datapath, the packet is sent to userspace via an upcall,
>> during which the packet's skb is freed, and the conntrack entry
>> pointer in the skb is lost.  When the packet is reintroduced to the
>> datapath, any further conntrack action will need to perform a new
>> conntrack lookup to find the entry again.  Prior to this patch this
>> second lookup failed for NATted packets.  The datapath flow setup
>> corresponding to the upcall can succeed, however, allowing all further
>> packets in the reply direction to re-use the conntrack entry pointer
>> in the skb, so typically the lookup failure only causes a packet drop.
>> 
>> The solution is to invert the tuple derived from the current packet
>> headers in case the conntrack state stored in the packet metadata
>> indicates that the packet has been transformed by NAT:
>> 
>> Inverted tuple: (X,B)
>> 
>> With this the conntrack entry can be found, matching the original
>> direction tuple.
>> 
>> This same logic also works for the original direction packets:
>> 
>> Current packet tuple (after NAT): (A,B)
>> Inverted tuple: (B,A)
>> 
>> While the current packet tuple (A,B) does not match either of the
>> conntrack tuples, the inverted one (B,A) does match the reply
>> direction tuple.
>> 
>> Since the inverted tuple matches the reverse direction tuple the
>> direction of the packet must be reversed as well.
>> 
>> Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> I could not apply this patch series to net-next branch. But it does
> applies to net, which branch are you targeting it for?

The patches were against net-next, but there likely was a merge from netfilter 
around the time of me sending the email out causing the difficulty. Will 
address all comments, rebase and post a v2 later today.

 Jarno




Re: [PATCH net-next 7/7] openvswitch: Pack struct sw_flow_key.

2017-02-07 Thread Jarno Rajahalme

> On Feb 6, 2017, at 11:15 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <ja...@ovn.org> wrote:
>> struct sw_flow_key has two 16-bit holes. Move the most matched
>> conntrack match fields there.  In some typical cases this reduces the
>> size of the key that needs to be hashed into half and into one cache
>> line.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> Looks like this misses the zeroing in ovs_nla_get_flow_metadata();
> might want to double-check for any other memset/copies of the key->ct
> field.

Good catch. Looked, there are no other places to change.

Will rebase to current net-next and repost.

  Jarno



Re: [PATCH net-next 1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-07 Thread Jarno Rajahalme

> On Feb 7, 2017, at 9:14 AM, Pravin Shelar <pshe...@ovn.org> wrote:
> 
> On Mon, Feb 6, 2017 at 9:15 AM, David Miller <da...@davemloft.net> wrote:
>> From: Pravin Shelar <pshe...@ovn.org>
>> Date: Mon, 6 Feb 2017 09:06:29 -0800
>> 
>>> On Sun, Feb 5, 2017 at 2:28 PM, David Miller <da...@davemloft.net> wrote:
>>>> From: Jarno Rajahalme <ja...@ovn.org>
>>>> Date: Thu,  2 Feb 2017 17:10:00 -0800
>>>> 
>>>>> This does not match either of the conntrack tuples above.  Normally
>>>>> this does not matter, as the conntrack lookup was already done using
>>>>> the tuple (B,A), but if the current packet does not match any flow in
>>>>> the OVS datapath, the packet is sent to userspace via an upcall,
>>>>> during which the packet's skb is freed, and the conntrack entry
>>>>> pointer in the skb is lost.
>>>> 
>>>> This is the real bug.
>>>> 
>>>> If the metadata for a packet is important, as it obviously is here,
>>>> then upcalls should preserve that metadata across the upcall.  This
>>>> is exactly how NF_QUEUE handles this problem and so should OVS.
>>> 
>>> OVS kernel datapath serializes skb context and sends it along with skb
>>> during upcall via genetlink parameters. userspace echoes same
>>> information back to kernel modules. This way OVS does maintains
>>> metadata across upcall.
>> 
>> Then the conntrack state looked up before the NAT operation done in
>> the upcall should be maintained and therefore this bug should not
>> exist.
> 

We already maintain enough metadata across the userspace upcall, but so far we 
have failed to use it correctly in the case of NATted packets. The NAT flags 
are there, based on which we (now) know that we have to do inverted lookup to 
locate the conntrack entry. Conntrack NAT by design only stores tuples for the 
incoming packets, and there are various instances of tuple inversions in 
netfilter code for this reason (see, for example, 
net/netfilter/nf_nat_core.c:nf_nat_used_tuple()). I’ll make the commit message 
clearer about this for the v2 of the series.

> I think it fails because after upcall OVS is performing lookup for
> nonexistent conntrack entry. This is due to the fact that the packet
> is already Nat-ed. So one could argue that there is already enough
> context available in OVS to lookup original conntract entry after the
> upcall as shown in this patch.
> But I am also fine with using original context to lookup the conntract
> entry as Joe has suggested.

For expected (i.e., related) connection, using the master’s original direction 
5-tuple added to the flow meta-data in patch 5 would find the master 
connection, not the related connection, so it would not work.

  Jarno



[PATCH net-next 3/7] openvswitch: Do not trigger events for unconfirmed connection.

2017-02-02 Thread Jarno Rajahalme
Avoid triggering change events for setting conntrack mark or labels
before the conntrack entry has been confirmed.  Refactoring on this
patch also makes chenges in later patches easier to review.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 87 -
 1 file changed, 63 insertions(+), 24 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 6730f09..a163c44 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -229,23 +229,17 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct 
sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -255,26 +249,59 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct 
sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-   int err;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+   return NULL;
+
+   return cl;
+}
+
+/* Initialize labels for a new, to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.  Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
+   return -ENOSPC;
+
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
return -ENOSPC;
 
err = nf_connlabels_replace(ct, labels->ct_labels_32,
@@ -283,7 +310,8 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (err)
return err;
 
-   ovs_ct_get_labels(ct, >ct.labels);
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
return 0;
 }
 
@@ -865,25 +893,36 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
int err;
 
err = __ovs_ct_lookup(net, key, info, skb);
if (err)
return err;
 
+   /* The connection could be invalid, in which case this is a no-op.*/
+   ct = nf_ct_get(skb, );
+   if (!ct)
+   return 0;
+
/* Apply changes before confirming the connection so that the initial
  

[PATCH net-next 5/7] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-02 Thread Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially zeroed, and are populated
whenever a conntrack action is executed and either finds or generates
a conntrack entry.  This means that these fields exist for all packets
were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP not ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 20 -
 net/openvswitch/actions.c|  2 +
 net/openvswitch/conntrack.c  | 95 +---
 net/openvswitch/conntrack.h  | 13 +-
 net/openvswitch/flow.c   | 34 +++---
 net/openvswitch/flow.h   | 49 -
 net/openvswitch/flow_netlink.c   | 91 +-
 net/openvswitch/flow_netlink.h   |  7 ++-
 8 files changed, 264 insertions(+), 47 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 96aee34..90af8b8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@
 
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {
 
 #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
+struct ovs_key_ct_tuple_ipv4 {
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+   __be16 src_port;
+   __be16 dst_port;
+   __u8   ipv4_proto;
+};
+
+struct ovs_key_ct_tuple_ipv6 {
+   __be32 ip

[PATCH net-next 2/7] openvswitch: Unionize ovs_key_ct_label with a u32 array.

2017-02-02 Thread Jarno Rajahalme
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 8 ++--
 net/openvswitch/conntrack.c  | 9 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index b91baa2..6730f09 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -277,8 +277,9 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
 
@@ -852,8 +853,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels 
*labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
-- 
2.1.4



[PATCH net-next 6/7] openvswitch: Add force commit.

2017-02-02 Thread Jarno Rajahalme
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exist, where the connection
original direction is opposed to the new connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/uapi/linux/openvswitch.h | 10 ++
 net/openvswitch/conntrack.c  | 27 +--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 90af8b8..d5ba9a9 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -674,6 +674,10 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
  * translation (NAT) on the packet.
+ * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
+ * nothing if the connection is already committed will check that the current
+ * packet is in conntrack entry's original direction.  If directionality does
+ * not match, will delete the existing conntrack entry and commit a new one.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -684,6 +688,12 @@ enum ovs_ct_attr {
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
+   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument, commits connection.  If the
+   * conntrack entry original direction tuple
+   * does not match the current packet header
+   * values, will delete the current conntrack
+   * entry and create a new one.
+   */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 1afe153..1f27f44 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -65,6 +65,7 @@ struct ovs_conntrack_info {
struct nf_conn *ct;
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -631,10 +632,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, );
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
@@ -648,6 +652,19 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_ct_put(ct);
+   skb->nfct = NULL;
+   skb->nfctinfo = 0;
+   return false;
+   }
 
return true;
 }
@@ -1226,6 +1243,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.ma

[PATCH net-next 4/7] openvswitch: Inherit master's labels.

2017-02-02 Thread Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being iherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor or inheriting the labels allows
information be passed from the master connection to related
connections, which would otherwise be much harder.  Labels can still
be set explicitly, so this change only affects the default values of
the labels in presense of a master connection.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 48 -
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a163c44..738a4fa 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -265,6 +265,8 @@ static struct nf_conn_labels *ovs_ct_get_conn_labels(struct 
nf_conn *ct)
return cl;
 }
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 /* Initialize labels for a new, to be committed conntrack entry.  Note that
  * since the new connection is not yet confirmed, and thus no-one else has
  * access to it's labels, we simply write them over.  Also, we refrain from
@@ -275,18 +277,35 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl) {
+   size_t len = sizeof(master_cl->bits);
+
+   memcpy(>bits, _cl->bits,
+  len > OVS_CT_LABELS_LEN ? OVS_CT_LABELS_LEN : len);
+   }
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
@@ -916,13 +935,14 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(>labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, >labels.value,
->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, >labels.value,
-   >labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, >labels.value,
+>labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, >labels.value,
+   >labels.mask);
if (err)
return err;
}
-- 
2.1.4



[PATCH net-next 1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-02 Thread Jarno Rajahalme
When looking for an existing conntrack entry, the packet 5-tuple
must be inverted if NAT has already been applied, as the current
packet headers do not match any conntrack tuple.  For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:

Original direction tuple: (X,B)
Reply direction tuple: (B,A)

Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:

Current packet tuple: (B,X)

This does not match either of the conntrack tuples above.  Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost.  When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again.  Prior to this patch this
second lookup failed for NATted packets.  The datapath flow setup
corresponding to the upcall can succeed, however, allowing all further
packets in the reply direction to re-use the conntrack entry pointer
in the skb, so typically the lookup failure only causes a packet drop.

The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:

Inverted tuple: (X,B)

With this the conntrack entry can be found, matching the original
direction tuple.

This same logic also works for the original direction packets:

Current packet tuple (after NAT): (A,B)
Inverted tuple: (B,A)

While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.

Since the inverted tuple matches the reverse direction tuple the
direction of the packet must be reversed as well.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 54253ea..b91baa2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(, , l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, );
if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   /* Inverted packet tuple matches the reverse direction conntrack tuple,
+* select the other tuplehash to get the right 'ctinfo' bits for this
+* packet.
+*/
+   if (natted)
+   h = >tuplehash[!h->tuple.dst.dir];
+
skb->nfct = >ct_general;
skb->nfctinfo = ovs_ct_get_info(h);
return ct;
@@ -483,7 +501,9 @@ static bool skb_nfct_cached(struct net *net,
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
key->ct.zone == info->zone.id)
-   ct = ovs_ct_find_existing(net, >zone, info->family, skb);
+   ct = ovs_ct_find_existing(net, >zone, info->family, skb,
+ !!(key->ct.state
+& OVS_CS_F_NAT_MASK));
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
-- 
2.1.4



[PATCH net-next 7/7] openvswitch: Pack struct sw_flow_key.

2017-02-02 Thread Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there.  In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c| 42 +-
 net/openvswitch/conntrack.h|  6 +++---
 net/openvswitch/flow.h | 14 --
 net/openvswitch/flow_netlink.c |  8 
 4 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 1f27f44..e8d29c0 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -152,8 +152,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
 {
-   key->ct.state = state;
-   key->ct.zone = zone->id;
+   key->ct_state = state;
+   key->ct_zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, >ct.labels);
 
@@ -161,7 +161,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
if (ct && ct->master)
ct = ct->master;
 
-   key->ct.orig_proto = 0;
+   key->ct_orig_proto = 0;
key->ct.orig_tp.src = 0;
key->ct.orig_tp.dst = 0;
if (key->eth.type == htons(ETH_P_IP)) {
@@ -172,7 +172,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
 
key->ipv4.ct_orig.src = orig->src.u3.ip;
key->ipv4.ct_orig.dst = orig->dst.u3.ip;
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == IPPROTO_ICMP) {
key->ct.orig_tp.src
= htons(orig->dst.u.icmp.type);
@@ -195,7 +195,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 
state,
 
key->ipv6.ct_orig.src = orig->src.u3.in6;
key->ipv6.ct_orig.dst = orig->dst.u3.in6;
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == IPPROTO_ICMPV6) {
key->ct.orig_tp.src
= htons(orig->dst.u.icmp.type);
@@ -238,7 +238,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
if (ct->master)
state |= OVS_CS_F_RELATED;
if (keep_nat_flags) {
-   state |= key->ct.state & OVS_CS_F_NAT_MASK;
+   state |= key->ct_state & OVS_CS_F_NAT_MASK;
} else {
if (ct->status & IPS_SRC_NAT)
state |= OVS_CS_F_SRC_NAT;
@@ -269,11 +269,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
   const struct sw_flow_key *output, struct sk_buff *skb)
 {
-   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
-   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
@@ -285,25 +285,25 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
>ct.labels))
return -EMSGSIZE;
 
-   if (swkey->eth.type == htons(ETH_P_IP) && swkey->ct.orig_proto) {
+   if (swkey->eth.type == htons(ETH_P_IP) && swkey->ct_orig_proto) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, sizeof(orig),
))
return -EMSGSIZE;
} else if (swkey->eth.type == htons(ETH_P_IPV6) &&
-  swkey->ct.orig_proto) {
+  swkey->ct_orig_proto) {
struct ovs_key_ct_tuple_ipv6 orig = {
IN6

Re: [net-next] openvswitch: Simplify do_execute_actions().

2017-01-26 Thread Jarno Rajahalme
Nice clean-up.

Acked-by: Jarno Rajahalme <ja...@ovn.org>

> On Jan 25, 2017, at 9:24 PM, Andy Zhou <az...@ovn.org> wrote:
> 
> do_execute_actions() implements a worthwhile optimization: in case
> an output action is the last action in an action list, skb_clone()
> can be avoided by outputing the current skb. However, the
> implementation is more complicated than necessary.  This patch
> simplify this logic.
> 
> Signed-off-by: Andy Zhou <az...@ovn.org>
> ---
> net/openvswitch/actions.c | 40 +++-
> 1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 514f7bc..3866608 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -830,6 +830,9 @@ static void do_output(struct datapath *dp, struct sk_buff 
> *skb, int out_port,
> {
>   struct vport *vport = ovs_vport_rcu(dp, out_port);
> 
> + if (unlikely(!skb))
> + return;
> +
>   if (likely(vport)) {
>   u16 mru = OVS_CB(skb)->mru;
>   u32 cutlen = OVS_CB(skb)->cutlen;
> @@ -1141,12 +1144,6 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
> struct sw_flow_key *key,
> const struct nlattr *attr, int len)
> {
> - /* Every output action needs a separate clone of 'skb', but the common
> -  * case is just a single output action, so that doing a clone and
> -  * then freeing the original skbuff is wasteful.  So the following code
> -  * is slightly obscure just to avoid that.
> -  */
> - int prev_port = -1;
>   const struct nlattr *a;
>   int rem;
> 
> @@ -1154,20 +1151,25 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>a = nla_next(a, )) {
>   int err = 0;
> 
> - if (unlikely(prev_port != -1)) {
> - struct sk_buff *out_skb = skb_clone(skb, GFP_ATOMIC);
> + switch (nla_type(a)) {
> + case OVS_ACTION_ATTR_OUTPUT: {
> + int port = nla_get_u32(a);
> 
> - if (out_skb)
> - do_output(dp, out_skb, prev_port, key);
> + /* Every output action needs a separate clone
> +  * of 'skb', In case the output action is the
> +  * last action, cloning can be avoided.
> +  */
> + if (nla_is_last(a, rem)) {
> + do_output(dp, skb, port, key);
> + /* 'skb' has been used for output.
> +  */
> + return 0;
> + }
> 
> + do_output(dp, skb_clone(skb, GFP_ATOMIC), port, key);
>   OVS_CB(skb)->cutlen = 0;
> - prev_port = -1;
> - }
> -
> - switch (nla_type(a)) {
> - case OVS_ACTION_ATTR_OUTPUT:
> - prev_port = nla_get_u32(a);
>   break;
> + }
> 
>   case OVS_ACTION_ATTR_TRUNC: {
>   struct ovs_action_trunc *trunc = nla_data(a);
> @@ -1257,11 +1259,7 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>   }
>   }
> 
> - if (prev_port != -1)
> - do_output(dp, skb, prev_port, key);
> - else
> - consume_skb(skb);
> -
> + consume_skb(skb);
>   return 0;
> }
> 
> -- 
> 1.8.3.1
> 



Re: [PATCH net] openvswitch: Add a missing break statement.

2016-12-19 Thread Jarno Rajahalme

> On Dec 19, 2016, at 5:06 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
> 
> Add a break statement to prevent fall-through from
> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
> actions setting ethernet addresses fail to validate with log messages
> complaining about invalid tunnel attributes.
> 
> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> Acked-by: Pravin B Shelar <pshe...@ovn.org>
> Acked-by: Jiri Benc <jb...@redhat.com>
> ---
> net/openvswitch/flow_netlink.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index d19044f..c87d359 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
>   case OVS_KEY_ATTR_ETHERNET:
>   if (mac_proto != MAC_PROTO_ETHERNET)
>   return -EINVAL;
> + break;
> 
>   case OVS_KEY_ATTR_TUNNEL:
>   if (masked)
> -- 
> 2.1.4
> 

I posted this separately from an earlier net-next patch series. Pravin agreed 
to pick up that rest of the series (skb->protocol w/ vlans).

  Jarno


Re: [PATCH v3 net-next 1/3] openvswitch: Add a missing break statement.

2016-12-19 Thread Jarno Rajahalme

> On Dec 13, 2016, at 9:07 PM, Pravin Shelar <pshe...@ovn.org> wrote:
> 
> On Tue, Nov 29, 2016 at 3:30 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Add a break statement to prevent fall-through from
>> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
>> actions setting ethernet addresses fail to validate with log messages
>> complaining about invalid tunnel attributes.
>> 
>> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> Acked-by: Pravin B Shelar <pshe...@ovn.org>
>> Acked-by: Jiri Benc <jb...@redhat.com>
> 
> Hi Jarno,
> Since this is straight forward patch. can you send it separately so
> that we can get it merged soon?
> 

I just did, against net. You’ll take over the rest?

  Jarno

> Thanks,
> Pravin.



[PATCH net] openvswitch: Add a missing break statement.

2016-12-19 Thread Jarno Rajahalme
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
Acked-by: Jiri Benc <jb...@redhat.com>
---
 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..c87d359 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_ETHERNET:
if (mac_proto != MAC_PROTO_ETHERNET)
return -EINVAL;
+   break;
 
case OVS_KEY_ATTR_TUNNEL:
if (masked)
-- 
2.1.4



Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.

2016-11-30 Thread Jarno Rajahalme

> On Nov 30, 2016, at 5:51 AM, Jiri Benc <jb...@redhat.com> wrote:
> 
> On Tue, 29 Nov 2016 15:30:52 -0800, Jarno Rajahalme wrote:
>> @@ -504,11 +485,20 @@ void ovs_vport_send(struct vport *vport, struct 
>> sk_buff *skb, u8 mac_proto)
>>  goto drop;
>>  }
>> 
>> -if (unlikely(packet_length(skb, vport->dev) > mtu &&
>> - !skb_is_gso(skb))) {
>> -net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
>> - vport->dev->name,
>> - packet_length(skb, vport->dev), mtu);
>> +if (unlikely(!is_skb_forwardable(vport->dev, skb))) {
> 
> How does this work when the vlan tag is accelerated? Then we can be
> over MTU, yet the check will pass.
> 

I’ll check how other call sites use this.

  Jarno

> Jiri



Re: [PATCH v2 net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.

2016-11-29 Thread Jarno Rajahalme

> On Nov 28, 2016, at 11:21 PM, Pravin Shelar <pshe...@ovn.org> wrote:
> 
> On Mon, Nov 28, 2016 at 6:41 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Do not set skb->protocol to be the ethertype of the L3 header, unless
>> the packet only has the L3 header.  For a non-hardware offloaded VLAN
>> frame skb->protocol needs to be one of the VLAN ethertypes.
>> 
>> Any VLAN offloading is undone on the OVS netlink interface.  Also any
>> VLAN tags added by userspace are non-offloaded.
>> 
>> Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
>> causes packet drop due to failing MTU check, as the VLAN header should
>> not be counted in when considering MTU in ovs_vport_send().
>> 
> I think we should move to is_skb_forwardable() type of packet length
> check in vport-send and get rid of skb-protocol checks altogether.
> 

I added a new patch to do this, thanks for the suggestion.

>> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
>> ---
>> v2: Set skb->protocol when an ETH_P_TEB frame is received via ARPHRD_NONE
>>interface.
>> 
>> net/openvswitch/datapath.c |  1 -
>> net/openvswitch/flow.c | 30 ++
>> 2 files changed, 22 insertions(+), 9 deletions(-)
> ...
> ...
>> @@ -531,15 +538,22 @@ static int key_extract(struct sk_buff *skb, struct 
>> sw_flow_key *key)
>>if (unlikely(parse_vlan(skb, key)))
>>return -ENOMEM;
>> 
>> -   skb->protocol = parse_ethertype(skb);
>> -   if (unlikely(skb->protocol == htons(0)))
>> +   key->eth.type = parse_ethertype(skb);
>> +   if (unlikely(key->eth.type == htons(0)))
>>return -ENOMEM;
>> 
>> +   if (skb->protocol == htons(ETH_P_TEB)) {
>> +   if (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)
>> +   && !skb_vlan_tag_present(skb))
>> +   skb->protocol = key->eth.vlan.tpid;
>> +   else
>> +   skb->protocol = key->eth.type;
>> +   }
>> +
> 
> I am not sure if this work in case of nested vlans.
> Can we move skb-protocol assignment to parse_vlan() to avoid checking
> for non-accelerated vlan case again here?

I did this for the v3 that I sent out a moment ago.

Thanks for the review,

  Jarno




[PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.

2016-11-29 Thread Jarno Rajahalme
Use is_skb_forwardable() instead of an explicit length check.  This
gets around the apparent MTU check failure in OVS test cases when
skb->protocol is not properly set in case of non-accelerated VLAN
skbs.

Suggested-by: Pravin Shelar <pshe...@ovn.org>
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
v3: New patch suggested by Pravin.

net/openvswitch/vport.c | 38 ++
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index b6c8524..076b39f 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -464,27 +464,8 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
return 0;
 }
 
-static unsigned int packet_length(const struct sk_buff *skb,
- struct net_device *dev)
-{
-   unsigned int length = skb->len - dev->hard_header_len;
-
-   if (!skb_vlan_tag_present(skb) &&
-   eth_type_vlan(skb->protocol))
-   length -= VLAN_HLEN;
-
-   /* Don't subtract for multiple VLAN tags. Most (all?) drivers allow
-* (ETH_LEN + VLAN_HLEN) in addition to the mtu value, but almost none
-* account for 802.1ad. e.g. is_skb_forwardable().
-*/
-
-   return length;
-}
-
 void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 {
-   int mtu = vport->dev->mtu;
-
switch (vport->dev->type) {
case ARPHRD_NONE:
if (mac_proto == MAC_PROTO_ETHERNET) {
@@ -504,11 +485,20 @@ void ovs_vport_send(struct vport *vport, struct sk_buff 
*skb, u8 mac_proto)
goto drop;
}
 
-   if (unlikely(packet_length(skb, vport->dev) > mtu &&
-!skb_is_gso(skb))) {
-   net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
-vport->dev->name,
-packet_length(skb, vport->dev), mtu);
+   if (unlikely(!is_skb_forwardable(vport->dev, skb))) {
+   /* Log only if the device is up. */
+   if (vport->dev->flags & IFF_UP) {
+   unsigned int length = skb->len
+   - vport->dev->hard_header_len;
+
+   if (!skb_vlan_tag_present(skb)
+   && eth_type_vlan(skb->protocol))
+   length -= VLAN_HLEN;
+
+   net_warn_ratelimited("%s: dropped over-mtu packet %d > 
%d\n",
+vport->dev->name, length,
+vport->dev->mtu);
+   }
vport->dev->stats.tx_errors++;
goto drop;
}
-- 
2.1.4



[PATCH v3 net-next 1/3] openvswitch: Add a missing break statement.

2016-11-29 Thread Jarno Rajahalme
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
Acked-by: Jiri Benc <jb...@redhat.com>
---
v3: No change.

 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..c87d359 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_ETHERNET:
if (mac_proto != MAC_PROTO_ETHERNET)
return -EINVAL;
+   break;
 
case OVS_KEY_ATTR_TUNNEL:
if (masked)
-- 
2.1.4



[PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.

2016-11-29 Thread Jarno Rajahalme
Do not always set skb->protocol to be the ethertype of the L3 header.
For a packet with non-accelerated VLAN tags skb->protocol needs to be
the ethertype of the outermost non-accelerated VLAN ethertype.

Any VLAN offloading is undone on the OVS netlink interface, and any
VLAN tags added by userspace are non-accelerated, as are double tagged
VLAN packets.

Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, 
netlink attributes")
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
v3: Set skb->protocol properly also for double tagged packets, as suggested
by Pravin.  This patch is no longer needed for the MTU test failure, as
the new patch 2/3 addresses that.

 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c | 62 +++---
 2 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
rcu_assign_pointer(flow->sf_acts, acts);
packet->priority = flow->key.phy.priority;
packet->mark = flow->key.phy.skb_mark;
-   packet->protocol = flow->key.eth.type;
 
rcu_read_lock();
dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..e2fe2e5 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -354,6 +354,7 @@ static int parse_vlan(struct sk_buff *skb, struct 
sw_flow_key *key)
res = parse_vlan_tag(skb, >eth.vlan);
if (res <= 0)
return res;
+   skb->protocol = key->eth.vlan.tpid;
}
 
/* Parse inner vlan tag. */
@@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct 
sw_flow_key *key)
if (res <= 0)
return res;
 
+   /* If the outer vlan tag was accelerated, skb->protocol should
+* refelect the inner vlan type. */
+   if (!eth_type_vlan(skb->protocol))
+   skb->protocol = key->eth.cvlan.tpid;
+
return 0;
 }
 
@@ -477,12 +483,18 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
 }
 
 /**
- * key_extract - extracts a flow key from an Ethernet frame.
+ * key_extract - extracts a flow key from a packet with or without an
+ * Ethernet header.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
- * Ethernet header
+ * beginning of the packet.
  * @key: output flow key
  *
- * The caller must ensure that skb->len >= ETH_HLEN.
+ * 'key->mac_proto' must be initialized to indicate the frame type.  For an L3
+ * frame 'key->mac_proto' must equal 'MAC_PROTO_NONE', and the caller must
+ * ensure that 'skb->protocol' is set to the ethertype of the L3 header.
+ * Otherwise the presence of an Ethernet header is assumed and the caller must
+ * ensure that skb->len >= ETH_HLEN and that 'skb->protocol' is initialized to
+ * zero.
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
@@ -498,8 +510,9 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
  *  of a correct length, otherwise the same as skb->network_header.
  *  For other key->eth.type values it is left untouched.
  *
- *- skb->protocol: the type of the data starting at skb->network_header.
- *  Equals to key->eth.type.
+ *- skb->protocol: For non-accelerated VLAN, one of the VLAN ether types,
+ *  otherwise the same as key->eth.type, the ether type of the payload
+ *  starting at skb->network_header.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -531,15 +544,16 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;
 
-   skb->protocol = parse_ethertype(skb);
-   if (unlikely(skb->protocol == htons(0)))
+   key->eth.type = parse_ethertype(skb);
+   if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;
 
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
-   key->eth.type = skb->protocol;
+   if (!eth_type_vlan(skb->protocol))
+   skb->protocol = key->eth.type;
 
/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {
@@ -800,29 +814,15 @@ int ovs_flow_key_extract_userspace(struct net *net, const 
struct nlattr *attr,
if (err)
return err;

[PATCH v2 net-next 1/2] openvswitch: Add a missing break statement.

2016-11-28 Thread Jarno Rajahalme
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
Acked-by: Jiri Benc <jb...@redhat.com>
---
v2: No change.

 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..c87d359 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_ETHERNET:
if (mac_proto != MAC_PROTO_ETHERNET)
return -EINVAL;
+   break;
 
case OVS_KEY_ATTR_TUNNEL:
if (masked)
-- 
2.1.4



[PATCH v2 net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.

2016-11-28 Thread Jarno Rajahalme
Do not set skb->protocol to be the ethertype of the L3 header, unless
the packet only has the L3 header.  For a non-hardware offloaded VLAN
frame skb->protocol needs to be one of the VLAN ethertypes.

Any VLAN offloading is undone on the OVS netlink interface.  Also any
VLAN tags added by userspace are non-offloaded.

Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
causes packet drop due to failing MTU check, as the VLAN header should
not be counted in when considering MTU in ovs_vport_send().

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
v2: Set skb->protocol when an ETH_P_TEB frame is received via ARPHRD_NONE
interface.

 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c | 30 ++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
rcu_assign_pointer(flow->sf_acts, acts);
packet->priority = flow->key.phy.priority;
packet->mark = flow->key.phy.skb_mark;
-   packet->protocol = flow->key.eth.type;
 
rcu_read_lock();
dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..b9aae99 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -477,12 +477,17 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
 }
 
 /**
- * key_extract - extracts a flow key from an Ethernet frame.
+ * key_extract - extracts a flow key from a packet with or without an
+ * Ethernet header.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
- * Ethernet header
+ * beginning of the packet.
  * @key: output flow key
  *
- * The caller must ensure that skb->len >= ETH_HLEN.
+ * 'key->mac_proto' must be initialized to indicate the frame type.
+ * For an L3 frame 'key->mac_proto' must equal 'MAC_PROTO_NONE', and the
+ * caller must ensure that 'skb->protocol' is set to the ethertype of the L3
+ * header.  Otherwise the presence of an Ethernet header is assumed and
+ * the caller must ensure that skb->len >= ETH_HLEN.
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
@@ -498,8 +503,9 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
  *  of a correct length, otherwise the same as skb->network_header.
  *  For other key->eth.type values it is left untouched.
  *
- *- skb->protocol: the type of the data starting at skb->network_header.
- *  Equals to key->eth.type.
+ *- skb->protocol: For non-accelerated VLAN, one of the VLAN ether types,
+ *  otherwise the same as key->eth.type, the ether type of the payload
+ *  starting at skb->network_header.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -518,6 +524,7 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
return -EINVAL;
 
skb_reset_network_header(skb);
+   key->eth.type = skb->protocol;
} else {
eth = eth_hdr(skb);
ether_addr_copy(key->eth.src, eth->h_source);
@@ -531,15 +538,22 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;
 
-   skb->protocol = parse_ethertype(skb);
-   if (unlikely(skb->protocol == htons(0)))
+   key->eth.type = parse_ethertype(skb);
+   if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;
 
+   if (skb->protocol == htons(ETH_P_TEB)) {
+   if (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)
+   && !skb_vlan_tag_present(skb))
+   skb->protocol = key->eth.vlan.tpid;
+   else
+   skb->protocol = key->eth.type;
+   }
+
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
-   key->eth.type = skb->protocol;
 
/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {
-- 
2.1.4



Re: [PATCH net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.

2016-11-28 Thread Jarno Rajahalme

> On Nov 28, 2016, at 2:42 PM, Jiri Benc <jb...@redhat.com> wrote:
> 
> On Mon, 28 Nov 2016 14:29:39 -0800, Jarno Rajahalme wrote:
>> I’m not sure what you suggest here. Obviously the kernel ABI can not
>> be changed as existing userspace code expects upcalled packets to be
>> non-accelerated. Also, if userspace pushes vlan headers, the packet
>> will actually have them.
> 
> The user space API needs to be preserved, of course. I'm talking about
> what happens internally in the kernel.
> 
> See this patchset: https://www.spinics.net/lists/netdev/msg398827.html
> 

I did not try to apply this series yet, but given the recent L3 changes maybe 
it needs a rebase?

>> Would this incremental fix this:
>> 
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> index 9be9fda..37f1bb9 100644
>> --- a/net/openvswitch/flow.c
>> +++ b/net/openvswitch/flow.c
>> @@ -354,6 +354,8 @@ static int parse_vlan(struct sk_buff *skb, struct
>> sw_flow_key *key) res = parse_vlan_tag(skb, >eth.vlan);
>>  if (res <= 0)
>>  return res;
>> +if (skb->protocol == htons(ETH_P_TEB))
>> +skb->protocol = key->eth.vlan.tpid;
>>  }
>> 
>>  /* Parse inner vlan tag. */
> 
> I'll look at this tomorrow. But it seems we're adding more and more
> hacks instead of cleaning up the vlan handling.
> 

Right, I just noticed that the incremental only handles the VLAN case.

I’ll post a v2 later today with a proper fix that solves the immediate issue. 
IMO this should be fixed independently of the VLAN handling series for which I 
have no informed opinion yet.

  Jarno

> Jiri



Re: [PATCH net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.

2016-11-28 Thread Jarno Rajahalme

> On Nov 24, 2016, at 8:10 AM, Jiri Benc <jb...@redhat.com> wrote:
> 
> On Tue, 22 Nov 2016 20:09:34 -0800, Jarno Rajahalme wrote:
>> Do not set skb->protocol to be the ethertype of the L3 header, unless
>> the packet only has the L3 header.  For a non-hardware offloaded VLAN
>> frame skb->protocol needs to be one of the VLAN ethertypes.
>> 
>> Any VLAN offloading is undone on the OVS netlink interface.  Due to
>> this all VLAN packets sent to openvswitch module from userspace are
>> non-offloaded.
> 
> This is exactly why I wanted to always accelerate the vlan tag, the
> same way it is done in other parts of the networking stack: to prevent
> all those weird corner cases.
> 
> Looks to me this is the only real way forward.
> 

I’m not sure what you suggest here. Obviously the kernel ABI can not be changed 
as existing userspace code expects upcalled packets to be non-accelerated. 
Also, if userspace pushes vlan headers, the packet will actually have them.

> This patch is wrong, it would leave skb->protocol as ETH_P_TEB for L2
> frames received via ARPHRD_NONE interface.
> 

Would this incremental fix this:

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 9be9fda..37f1bb9 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -354,6 +354,8 @@ static int parse_vlan(struct sk_buff *skb, struct 
sw_flow_key *key)
res = parse_vlan_tag(skb, >eth.vlan);
if (res <= 0)
return res;
+   if (skb->protocol == htons(ETH_P_TEB))
+   skb->protocol = key->eth.vlan.tpid;
}
 
/* Parse inner vlan tag. */


  Jarno


[PATCH net-next 1/2] openvswitch: Add a missing break statement.

2016-11-22 Thread Jarno Rajahalme
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..c87d359 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_ETHERNET:
if (mac_proto != MAC_PROTO_ETHERNET)
return -EINVAL;
+   break;
 
case OVS_KEY_ATTR_TUNNEL:
if (masked)
-- 
2.1.4



[PATCH net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.

2016-11-22 Thread Jarno Rajahalme
Do not set skb->protocol to be the ethertype of the L3 header, unless
the packet only has the L3 header.  For a non-hardware offloaded VLAN
frame skb->protocol needs to be one of the VLAN ethertypes.

Any VLAN offloading is undone on the OVS netlink interface.  Due to
this all VLAN packets sent to openvswitch module from userspace are
non-offloaded.

Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
causes packet drop due to failing MTU check, as the VLAN header should
not be counted in when considering MTU in ovs_vport_send().

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c | 20 +++-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
rcu_assign_pointer(flow->sf_acts, acts);
packet->priority = flow->key.phy.priority;
packet->mark = flow->key.phy.skb_mark;
-   packet->protocol = flow->key.eth.type;
 
rcu_read_lock();
dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..9be9fda 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -477,12 +477,17 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
 }
 
 /**
- * key_extract - extracts a flow key from an Ethernet frame.
+ * key_extract - extracts a flow key from a packet with or without an
+ * Ethernet header.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
- * Ethernet header
+ * beginning of the packet.
  * @key: output flow key
  *
- * The caller must ensure that skb->len >= ETH_HLEN.
+ * 'key->mac_proto' must be initialized to indicate the frame type.
+ * For an L3 frame 'key->mac_proto' must equal 'MAC_PROTO_NONE', and the
+ * caller must ensure that 'skb->protocol' is set to the ethertype of the L3
+ * header.  Otherwise the presence of an Ethernet header is assumed and
+ * the caller must ensure that skb->len >= ETH_HLEN.
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
@@ -497,9 +502,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct 
sw_flow_key *key,
  *  on output, then just past the IP header, if one is present and
  *  of a correct length, otherwise the same as skb->network_header.
  *  For other key->eth.type values it is left untouched.
- *
- *- skb->protocol: the type of the data starting at skb->network_header.
- *  Equals to key->eth.type.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -518,6 +520,7 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
return -EINVAL;
 
skb_reset_network_header(skb);
+   key->eth.type = skb->protocol;
} else {
eth = eth_hdr(skb);
ether_addr_copy(key->eth.src, eth->h_source);
@@ -531,15 +534,14 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;
 
-   skb->protocol = parse_ethertype(skb);
-   if (unlikely(skb->protocol == htons(0)))
+   key->eth.type = parse_ethertype(skb);
+   if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;
 
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
-   key->eth.type = skb->protocol;
 
/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {
-- 
2.1.4



Re: Long delays creating a netns after deleting one (possibly RCU related)

2016-11-18 Thread Jarno Rajahalme

> On Nov 14, 2016, at 3:09 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> 
> On Mon, 2016-11-14 at 14:46 -0800, Eric Dumazet wrote:
>> On Mon, 2016-11-14 at 16:12 -0600, Eric W. Biederman wrote:
>> 
>>> synchronize_rcu_expidited is not enough if you have multiple network
>>> devices in play.
>>> 
>>> Looking at the code it comes down to this commit, and it appears there
>>> is a promise add rcu grace period combining by Eric Dumazet.
>>> 
>>> Eric since people are hitting noticable stalls because of the rcu grace
>>> period taking a long time do you think you could look at this code path
>>> a bit more?
>>> 
>>> commit 93d05d4a320cb16712bb3d57a9658f395d8cecb9
>>> Author: Eric Dumazet <eduma...@google.com>
>>> Date:   Wed Nov 18 06:31:03 2015 -0800
>> 
>> Absolutely, I will take a loop asap.
> 
> The worst offender should be fixed by the following patch.
> 
> busy poll needs to poll the physical device, not a virtual one...
> 
> diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
> index 
> d15214d673b2e8e08fd6437b572278fb1359f10d..2a1abbf8da74368cd01adc40cef6c0644e059ef2
>  100644
> --- a/include/net/gro_cells.h
> +++ b/include/net/gro_cells.h
> @@ -68,6 +68,9 @@ static inline int gro_cells_init(struct gro_cells *gcells, 
> struct net_device *de
>   struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
> 
>   __skb_queue_head_init(>napi_skbs);
> +
> + set_bit(NAPI_STATE_NO_BUSY_POLL, >napi.state);
> +
>   netif_napi_add(dev, >napi, gro_cell_poll, 64);
>   napi_enable(>napi);
>   }
> 
> 
> 
> 
> 

This fixes the problem for me, so for whatever it’s worth:

Tested-by: Jarno Rajahalme <ja...@ovn.org>



[PATCH net-next v2 3/5] virtio_net: Do not clear memory for struct virtio_net_hdr twice.

2016-11-18 Thread Jarno Rajahalme
virtio_net_hdr_from_skb() clears the memory for the header, so there
is no point for the callers to do the same.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/tun.c  | 3 +--
 include/linux/virtio_net.h | 2 +-
 net/packet/af_packet.c | 2 --
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3b8d8cc..64e694c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1360,8 +1360,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
}
 
if (vnet_hdr_sz) {
-   struct virtio_net_hdr gso = { 0 }; /* no info leak */
-   int ret;
+   struct virtio_net_hdr gso;
 
if (iov_iter_count(iter) < vnet_hdr_sz)
return -EINVAL;
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 74f1e33..6620400 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -58,7 +58,7 @@ static inline int virtio_net_hdr_from_skb(const struct 
sk_buff *skb,
  struct virtio_net_hdr *hdr,
  bool little_endian)
 {
-   memset(hdr, 0, sizeof(*hdr));
+   memset(hdr, 0, sizeof(*hdr));   /* no info leak */
 
if (skb_is_gso(skb)) {
struct skb_shared_info *sinfo = skb_shinfo(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d2238b2..abe6c0b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1970,8 +1970,6 @@ static unsigned int run_filter(struct sk_buff *skb,
 static int __packet_rcv_vnet(const struct sk_buff *skb,
 struct virtio_net_hdr *vnet_hdr)
 {
-   *vnet_hdr = (const struct virtio_net_hdr) { 0 };
-
if (virtio_net_hdr_from_skb(skb, vnet_hdr, vio_le()))
BUG();
 
-- 
2.1.4



[PATCH net-next v2 4/5] af_packet: Use virtio_net_hdr_to_skb().

2016-11-18 Thread Jarno Rajahalme
Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit fd2a0437dc, but this one was
missed, maybe because it is split in two parts of the source code.

Interim comparisons of 'vnet_hdr->gso_type' still work as both the
vnet_hdr and skb notion of gso_type is zero when there is no gso.

Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/packet/af_packet.c | 51 +++---
 1 file changed, 3 insertions(+), 48 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index abe6c0b..1816b77 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2388,8 +2388,6 @@ static void tpacket_set_protocol(const struct net_device 
*dev,
 
 static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len)
 {
-   unsigned short gso_type = 0;
-
if ((vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
(__virtio16_to_cpu(vio_le(), vnet_hdr->csum_start) +
 __virtio16_to_cpu(vio_le(), vnet_hdr->csum_offset) + 2 >
@@ -2401,29 +2399,6 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr 
*vnet_hdr, size_t len)
if (__virtio16_to_cpu(vio_le(), vnet_hdr->hdr_len) > len)
return -EINVAL;
 
-   if (vnet_hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-   switch (vnet_hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
-   case VIRTIO_NET_HDR_GSO_TCPV4:
-   gso_type = SKB_GSO_TCPV4;
-   break;
-   case VIRTIO_NET_HDR_GSO_TCPV6:
-   gso_type = SKB_GSO_TCPV6;
-   break;
-   case VIRTIO_NET_HDR_GSO_UDP:
-   gso_type = SKB_GSO_UDP;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   if (vnet_hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
-   gso_type |= SKB_GSO_TCP_ECN;
-
-   if (vnet_hdr->gso_size == 0)
-   return -EINVAL;
-   }
-
-   vnet_hdr->gso_type = gso_type;  /* changes type, temporary storage */
return 0;
 }
 
@@ -2443,27 +2418,6 @@ static int packet_snd_vnet_parse(struct msghdr *msg, 
size_t *len,
return __packet_snd_vnet_parse(vnet_hdr, *len);
 }
 
-static int packet_snd_vnet_gso(struct sk_buff *skb,
-  struct virtio_net_hdr *vnet_hdr)
-{
-   if (vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-   u16 s = __virtio16_to_cpu(vio_le(), vnet_hdr->csum_start);
-   u16 o = __virtio16_to_cpu(vio_le(), vnet_hdr->csum_offset);
-
-   if (!skb_partial_csum_set(skb, s, o))
-   return -EINVAL;
-   }
-
-   skb_shinfo(skb)->gso_size =
-   __virtio16_to_cpu(vio_le(), vnet_hdr->gso_size);
-   skb_shinfo(skb)->gso_type = vnet_hdr->gso_type;
-
-   /* Header must be checked, and gso_segs computed. */
-   skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
-   skb_shinfo(skb)->gso_segs = 0;
-   return 0;
-}
-
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
void *frame, struct net_device *dev, void *data, int tp_len,
__be16 proto, unsigned char *addr, int hlen, int copylen,
@@ -2723,7 +2677,8 @@ static int tpacket_snd(struct packet_sock *po, struct 
msghdr *msg)
}
}
 
-   if (po->has_vnet_hdr && packet_snd_vnet_gso(skb, vnet_hdr)) {
+   if (po->has_vnet_hdr && virtio_net_hdr_to_skb(skb, vnet_hdr,
+ vio_le())) {
tp_len = -EINVAL;
goto tpacket_error;
}
@@ -2914,7 +2869,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
packet_pick_tx_queue(dev, skb);
 
if (po->has_vnet_hdr) {
-   err = packet_snd_vnet_gso(skb, _hdr);
+   err = virtio_net_hdr_to_skb(skb, _hdr, vio_le());
if (err)
goto out_free;
len += sizeof(vnet_hdr);
-- 
2.1.4



[PATCH net-next v2 2/5] virtio_net.h: Fix comment.

2016-11-18 Thread Jarno Rajahalme
Fix incorrent comment after the final #endif.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 include/linux/virtio_net.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1c912f8..74f1e33 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -98,4 +98,4 @@ static inline int virtio_net_hdr_from_skb(const struct 
sk_buff *skb,
return 0;
 }
 
-#endif /* _LINUX_VIRTIO_BYTEORDER */
+#endif /* _LINUX_VIRTIO_NET_H */
-- 
2.1.4



[PATCH net-next v2 5/5] af_packet: Use virtio_net_hdr_from_skb() directly.

2016-11-18 Thread Jarno Rajahalme
Remove static function __packet_rcv_vnet(), which only called
virtio_net_hdr_from_skb() and BUG()ged out if an error code was
returned.  Instead, call virtio_net_hdr_from_skb() from the former
call sites of __packet_rcv_vnet() and actually use the error handling
code that is already there.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/packet/af_packet.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1816b77..fab9bbf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1967,15 +1967,6 @@ static unsigned int run_filter(struct sk_buff *skb,
return res;
 }
 
-static int __packet_rcv_vnet(const struct sk_buff *skb,
-struct virtio_net_hdr *vnet_hdr)
-{
-   if (virtio_net_hdr_from_skb(skb, vnet_hdr, vio_le()))
-   BUG();
-
-   return 0;
-}
-
 static int packet_rcv_vnet(struct msghdr *msg, const struct sk_buff *skb,
   size_t *len)
 {
@@ -1985,7 +1976,7 @@ static int packet_rcv_vnet(struct msghdr *msg, const 
struct sk_buff *skb,
return -EINVAL;
*len -= sizeof(vnet_hdr);
 
-   if (__packet_rcv_vnet(skb, _hdr))
+   if (virtio_net_hdr_from_skb(skb, _hdr, vio_le()))
return -EINVAL;
 
return memcpy_to_msg(msg, (void *)_hdr, sizeof(vnet_hdr));
@@ -2244,8 +2235,9 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,
spin_unlock(>sk_receive_queue.lock);
 
if (po->has_vnet_hdr) {
-   if (__packet_rcv_vnet(skb, h.raw + macoff -
-  sizeof(struct virtio_net_hdr))) {
+   if (virtio_net_hdr_from_skb(skb, h.raw + macoff -
+   sizeof(struct virtio_net_hdr),
+   vio_le())) {
spin_lock(>sk_receive_queue.lock);
goto drop_n_account;
}
-- 
2.1.4



[PATCH net-next v2 1/5] virtio_net: Simplify call sites for virtio_net_hdr_{from,to}_skb().

2016-11-18 Thread Jarno Rajahalme
No point storing the return value of virtio_net_hdr_to_skb() or
virtio_net_hdr_from_skb() to a variable when the value is used only
once as a boolean in an immediately following if statement.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/macvtap.c | 5 ++---
 drivers/net/tun.c | 8 +++-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 070e329..5da9861 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -821,9 +821,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (iov_iter_count(iter) < vnet_hdr_len)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, _hdr,
- macvtap_is_little_endian(q));
-   if (ret)
+   if (virtio_net_hdr_from_skb(skb, _hdr,
+   macvtap_is_little_endian(q)))
BUG();
 
if (copy_to_iter(_hdr, sizeof(vnet_hdr), iter) !=
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1588469..3b8d8cc 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1252,8 +1252,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
return -EFAULT;
}
 
-   err = virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun));
-   if (err) {
+   if (virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun))) {
this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
kfree_skb(skb);
return -EINVAL;
@@ -1367,9 +1366,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
if (iov_iter_count(iter) < vnet_hdr_sz)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, ,
- tun_is_little_endian(tun));
-   if (ret) {
+   if (virtio_net_hdr_from_skb(skb, ,
+   tun_is_little_endian(tun))) {
struct skb_shared_info *sinfo = skb_shinfo(skb);
pr_err("unexpected GSO type: "
   "0x%x, gso_size %d, hdr_len %d\n",
-- 
2.1.4



Re: [net-next] af_packet: Use virtio_net_hdr_to_skb().

2016-11-18 Thread Jarno Rajahalme
Sorry for my transgressions and wasting your time. I’ll send a v2 in a moment.

  Jarno
 
> On Nov 18, 2016, at 8:35 AM, David Miller <da...@davemloft.net> wrote:
> 
> From: Jarno Rajahalme <ja...@ovn.org>
> Date: Wed, 16 Nov 2016 18:06:42 -0800
> 
>> Use the common virtio_net_hdr_to_skb() instead of open coding it.
>> Other call sites were changed by commit fd2a0437dc, but this one was
>> missed, maybe because it is split in two parts of the source code.
>> 
>> Also fix other call sites to be more uniform.
>> 
>> Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> This patch is doing many more things that just this.
> 
> Do not mix unrelated changes together:
> 
>> @@ -821,9 +821,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>>  if (iov_iter_count(iter) < vnet_hdr_len)
>>  return -EINVAL;
>> 
>> -ret = virtio_net_hdr_from_skb(skb, _hdr,
>> -  macvtap_is_little_endian(q));
>> -if (ret)
>> +if (virtio_net_hdr_from_skb(skb, _hdr,
>> +macvtap_is_little_endian(q)))
>>  BUG();
>> 
>>  if (copy_to_iter(_hdr, sizeof(vnet_hdr), iter) !=
> 
> This has nothing to do with modifying code to use vrtio_net_hdr_to_skb(), it
> doesn't belong in this patch.
> 
>> @@ -1361,15 +1360,12 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>>  }
>> 
>>  if (vnet_hdr_sz) {
>> -struct virtio_net_hdr gso = { 0 }; /* no info leak */
>> -int ret;
>> -
>> +struct virtio_net_hdr gso;
> 
> This is _extremely_ opaque.  The initializer is trying to prevent kernel 
> memory
> info leaks onto the network or into user space.
> 
> Maybe this transformation is valid but:
> 
> 1) YOU DON'T EVEN MENTION IT IN YOUR COMMIT MESSAGE.
> 
> 2) It's unrelated to this specific change, therefore it belongs in
>   a separate change.
> 
> 3) You don't explain that it is a valid transformation, not why.
> 
> It is extremely disappointing to catch unrelated, potentially far
> reaching things embedded in a patch when I review it.
> 
> Please do not ever do this.
> 
>> @@ -98,4 +98,4 @@ static inline int virtio_net_hdr_from_skb(const struct 
>> sk_buff *skb,
>>  return 0;
>> }
>> 
>> -#endif /* _LINUX_VIRTIO_BYTEORDER */
>> +#endif /* _LINUX_VIRTIO_NET_H */
> 
> Another unrelated change.
> 
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 11db0d6..09abb88 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1971,8 +1971,6 @@ static unsigned int run_filter(struct sk_buff *skb,
>> static int __packet_rcv_vnet(const struct sk_buff *skb,
>>   struct virtio_net_hdr *vnet_hdr)
>> {
>> -*vnet_hdr = (const struct virtio_net_hdr) { 0 };
>> -
> 
> There is no way this belongs in this patch, and again you do not explain
> why removing this initializer is valid.



Re: Long delays creating a netns after deleting one (possibly RCU related)

2016-11-17 Thread Jarno Rajahalme

> On Nov 14, 2016, at 3:09 PM, Eric Dumazet  wrote:
> 
> On Mon, 2016-11-14 at 14:46 -0800, Eric Dumazet wrote:
>> On Mon, 2016-11-14 at 16:12 -0600, Eric W. Biederman wrote:
>> 
>>> synchronize_rcu_expidited is not enough if you have multiple network
>>> devices in play.
>>> 
>>> Looking at the code it comes down to this commit, and it appears there
>>> is a promise add rcu grace period combining by Eric Dumazet.
>>> 
>>> Eric since people are hitting noticable stalls because of the rcu grace
>>> period taking a long time do you think you could look at this code path
>>> a bit more?
>>> 
>>> commit 93d05d4a320cb16712bb3d57a9658f395d8cecb9
>>> Author: Eric Dumazet 
>>> Date:   Wed Nov 18 06:31:03 2015 -0800
>> 
>> Absolutely, I will take a loop asap.
> 
> The worst offender should be fixed by the following patch.
> 
> busy poll needs to poll the physical device, not a virtual one...
> 
> diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
> index 
> d15214d673b2e8e08fd6437b572278fb1359f10d..2a1abbf8da74368cd01adc40cef6c0644e059ef2
>  100644
> --- a/include/net/gro_cells.h
> +++ b/include/net/gro_cells.h
> @@ -68,6 +68,9 @@ static inline int gro_cells_init(struct gro_cells *gcells, 
> struct net_device *de
>   struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
> 
>   __skb_queue_head_init(>napi_skbs);
> +
> + set_bit(NAPI_STATE_NO_BUSY_POLL, >napi.state);
> +
>   netif_napi_add(dev, >napi, gro_cell_poll, 64);
>   napi_enable(>napi);
>   }
> 

This solved a ~20 second slowdown between OVS datapath unit tests for me.

  Jarno



Re: Virtio_net support vxlan encapsulation package TSO offload discuss

2016-11-17 Thread Jarno Rajahalme
I worked on the same issue a few months back. I rebased my proof-of-concept 
code to the current net-next and posted an RFC patch a moment ago.

I have zero experience on QEMU feature negotiation or extending the virtio_net 
spec. Since the virtio_net handling code is now all done using shared code, 
this should work for macvtap as well, not sure if macvtap needs some control 
plane changes.

I posted a separate patch to make af_packet also use the shared infra for 
virtio_net handling yesterday. My RFC patch assumes that af_packet need not be 
touched, i.e., assumes the af_packet patch is applied, even though the patches 
apply to net-next in either order.

  Jarno

> On Nov 16, 2016, at 11:27 PM, Jason Wang  wrote:
> 
> 
> 
> On 2016年11月17日 09:31, Zhangming (James, Euler) wrote:
>> On 2016年11月15日 11:28, Jason Wang wrote:
>>> On 2016年11月10日 14:19, Zhangming (James, Euler) wrote:
 On 2016年11月09日 15:14, Jason Wang wrote:
> On 2016年11月08日 19:58, Zhangming (James, Euler) wrote:
>> On 2016年11月08日 19:17, Jason Wang wrote:
>> 
>>> On 2016年11月08日 19:13, Jason Wang wrote:
 Cc Michael
 
 On 2016年11月08日 16:34, Zhangming (James, Euler) wrote:
> In container scenario, OVS is installed in the Virtual machine,
> and all the containers connected to the OVS will communicated
> through VXLAN encapsulation.
> 
> By now, virtio_net does not support TSO offload for VXLAN
> encapsulated TSO package. In this condition, the performance is
> not good, sender is bottleneck
> 
> I googled this scenario, but I didn’t find any information. Will
> virtio_net support VXLAN encapsulation package TSO offload later?
> 
 Yes and for both sender and receiver.
 
> My idea is virtio_net open encapsulated TSO offload, and
> transport encapsulation info to TUN, TUN will parse the info and
> build skb with encapsulation info.
> 
> OVS or kernel on the host should be modified to support this.
> Using this method, the TCP performance aremore than 2x as before.
> 
> Any advice and suggestions for this idea or new idea will be
> greatly appreciated!
> 
> Best regards,
> 
>  James zhang
> 
 Sounds very good. And we may also need features bits
 (VIRTIO_NET_F_GUEST|HOST_GSO_X) for this.
 
 This is in fact one of items in networking todo list. (See
 http://www.linux-kvm.org/page/NetworkingTodo). While at it, we'd
 better support not only VXLAN but also other tunnels.
>>> Cc Vlad who is working on extending virtio-net headers.
>>> 
 We can start with the spec work, or if you've already had some
 bits you can post them as RFC for early review.
 
 Thanks
>> Below is my demo code
>> Virtio_net.c
>> static int virtnet_probe(struct virtio_device *vdev), add belows codes:
>>   if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
>> // avoid gso segment, it should be negotiation 
>> later, because in the demo I reuse num_buffers.
>>   virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>   dev->hw_enc_features |= NETIF_F_TSO;
>>   dev->hw_enc_features |= NETIF_F_ALL_CSUM;
>>   dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
>>   dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
>>   dev->hw_enc_features |=
>> NETIF_F_GSO_TUNNEL_REMCSUM;
>> 
>>   dev->features |= NETIF_F_GSO_UDP_TUNNEL;
>>   dev->features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
>>   dev->features |= NETIF_F_GSO_TUNNEL_REMCSUM;
>>   }
>> 
>> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb), add
>> below to pieces of codes
>> 
>>   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
>>   hdr->hdr.gso_type |= VIRTIO_NET_HDR_GSO_TUNNEL;
>>   if (skb_shinfo(skb)->gso_type & 
>> SKB_GSO_UDP_TUNNEL_CSUM)
>>   hdr->hdr.gso_type |= 
>> VIRTIO_NET_HDR_GSO_TUNNEL_CSUM;
>>   if (skb_shinfo(skb)->gso_type & SKB_GSO_TUNNEL_REMCSUM)
>>   hdr->hdr.gso_type |=
>> VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM;
>> 
>>   if (skb->encapsulation && skb_is_gso(skb)) {
>>   inner_mac_len = skb_inner_network_header(skb) - 
>> skb_inner_mac_header(skb);
>>   tnl_len = skb_inner_mac_header(skb) - 
>> skb_mac_header(skb);
>>   if ( !(inner_mac_len >> DATA_LEN_SHIFT) && !(tnl_len 
>> >> DATA_LEN_SHIFT) ) {
>>   

[RFC PATCH net-next] virtio_net: Support UDP Tunnel offloads.

2016-11-17 Thread Jarno Rajahalme
This patch is a proof-of-concept I did a few months ago for UDP tunnel
offload support in virtio_net interface, and rebased on to the current
net-next.

Real implementation needs to extend the virtio_net header rather than
piggy-backing on existing fields.  Inner MAC length (or inner network
offset) also needs to be passed as a new field.  Control plane (QEMU)
also needs to be updated.

All testing was done using Geneve, but this should work for all UDP
tunnels the same.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/tun.c   |  7 -
 drivers/net/virtio_net.c| 16 +++---
 include/linux/skbuff.h  |  5 
 include/linux/virtio_net.h  | 66 ++---
 include/uapi/linux/virtio_net.h |  7 +
 5 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1588469..36f3219 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -198,7 +198,9 @@ struct tun_struct {
struct net_device   *dev;
netdev_features_t   set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6|NETIF_F_UFO)
+  NETIF_F_TSO6|NETIF_F_UFO|NETIF_F_GSO_UDP_TUNNEL| \
+  NETIF_F_GSO_UDP_TUNNEL_CSUM| \
+  NETIF_F_GSO_TUNNEL_REMCSUM)
 
int align;
int vnet_hdr_sz;
@@ -1877,6 +1879,9 @@ static int set_offload(struct tun_struct *tun, unsigned 
long arg)
 
if (arg & TUN_F_UFO) {
features |= NETIF_F_UFO;
+#if 1
+   features |= 
NETIF_F_GSO_UDP_TUNNEL|NETIF_F_GSO_UDP_TUNNEL_CSUM|NETIF_F_GSO_TUNNEL_REMCSUM;
+#endif
arg &= ~TUN_F_UFO;
}
}
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ca5239a..eb8d887 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1789,7 +1789,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 
if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
-   | NETIF_F_TSO_ECN | NETIF_F_TSO6;
+   | NETIF_F_TSO_ECN | NETIF_F_TSO6
+   | NETIF_F_GSO_UDP_TUNNEL
+   | NETIF_F_GSO_UDP_TUNNEL_CSUM
+   | NETIF_F_GSO_TUNNEL_REMCSUM;
}
/* Individual feature bits: what can host handle? */
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_TSO4))
@@ -1798,13 +1801,18 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->hw_features |= NETIF_F_TSO6;
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
dev->hw_features |= NETIF_F_TSO_ECN;
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO)) {
dev->hw_features |= NETIF_F_UFO;
-
+#if 1
+   dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+   dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+   dev->hw_features |= NETIF_F_GSO_TUNNEL_REMCSUM;
+#endif
+   }
dev->features |= NETIF_F_GSO_ROBUST;
 
if (gso)
-   dev->features |= dev->hw_features & 
(NETIF_F_ALL_TSO|NETIF_F_UFO);
+   dev->features |= dev->hw_features & 
(NETIF_F_ALL_TSO|NETIF_F_UFO|NETIF_F_GSO_UDP_TUNNEL|NETIF_F_GSO_UDP_TUNNEL_CSUM|NETIF_F_GSO_TUNNEL_REMCSUM);
/* (!csum && gso) case will be fixed by register_netdev() */
}
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a4aeeca..992ad30 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2115,6 +2115,11 @@ static inline unsigned char *skb_inner_mac_header(const 
struct sk_buff *skb)
return skb->head + skb->inner_mac_header;
 }
 
+static inline int skb_inner_mac_offset(const struct sk_buff *skb)
+{
+   return skb_inner_mac_header(skb) - skb->data;
+}
+
 static inline void skb_reset_inner_mac_header(struct sk_buff *skb)
 {
skb->inner_mac_header = skb->data - skb->head;
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1c912f8..17384d1 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -8,10 +8,19 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
const struct virtio_net_hdr *hdr,
bool little_endian)
 {
-   unsigned short gso_type = 0;
+   u16

[PATCH net-next] af_packet: Use virtio_net_hdr_to_skb().

2016-11-16 Thread Jarno Rajahalme
Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit fd2a0437dc, but this one was
missed, maybe because it is split in two parts of the source code.

Also fix other call sites to be more uniform.

Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/macvtap.c  |  5 ++---
 drivers/net/tun.c  | 12 
 include/linux/virtio_net.h |  2 +-
 net/packet/af_packet.c | 44 +---
 4 files changed, 8 insertions(+), 55 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 070e329..5da9861 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -821,9 +821,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (iov_iter_count(iter) < vnet_hdr_len)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, _hdr,
- macvtap_is_little_endian(q));
-   if (ret)
+   if (virtio_net_hdr_from_skb(skb, _hdr,
+   macvtap_is_little_endian(q)))
BUG();
 
if (copy_to_iter(_hdr, sizeof(vnet_hdr), iter) !=
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9328568..864aae3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1252,8 +1252,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
return -EFAULT;
}
 
-   err = virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun));
-   if (err) {
+   if (virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun))) {
this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
kfree_skb(skb);
return -EINVAL;
@@ -1361,15 +1360,12 @@ static ssize_t tun_put_user(struct tun_struct *tun,
}
 
if (vnet_hdr_sz) {
-   struct virtio_net_hdr gso = { 0 }; /* no info leak */
-   int ret;
-
+   struct virtio_net_hdr gso;
if (iov_iter_count(iter) < vnet_hdr_sz)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, ,
- tun_is_little_endian(tun));
-   if (ret) {
+   if (virtio_net_hdr_from_skb(skb, ,
+   tun_is_little_endian(tun))) {
struct skb_shared_info *sinfo = skb_shinfo(skb);
pr_err("unexpected GSO type: "
   "0x%x, gso_size %d, hdr_len %d\n",
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1c912f8..74f1e33 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -98,4 +98,4 @@ static inline int virtio_net_hdr_from_skb(const struct 
sk_buff *skb,
return 0;
 }
 
-#endif /* _LINUX_VIRTIO_BYTEORDER */
+#endif /* _LINUX_VIRTIO_NET_H */
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 11db0d6..09abb88 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1971,8 +1971,6 @@ static unsigned int run_filter(struct sk_buff *skb,
 static int __packet_rcv_vnet(const struct sk_buff *skb,
 struct virtio_net_hdr *vnet_hdr)
 {
-   *vnet_hdr = (const struct virtio_net_hdr) { 0 };
-
if (virtio_net_hdr_from_skb(skb, vnet_hdr, vio_le()))
BUG();
 
@@ -2391,8 +2389,6 @@ static void tpacket_set_protocol(const struct net_device 
*dev,
 
 static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len)
 {
-   unsigned short gso_type = 0;
-
if ((vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
(__virtio16_to_cpu(vio_le(), vnet_hdr->csum_start) +
 __virtio16_to_cpu(vio_le(), vnet_hdr->csum_offset) + 2 >
@@ -2404,29 +2400,6 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr 
*vnet_hdr, size_t len)
if (__virtio16_to_cpu(vio_le(), vnet_hdr->hdr_len) > len)
return -EINVAL;
 
-   if (vnet_hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-   switch (vnet_hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
-   case VIRTIO_NET_HDR_GSO_TCPV4:
-   gso_type = SKB_GSO_TCPV4;
-   break;
-   case VIRTIO_NET_HDR_GSO_TCPV6:
-   gso_type = SKB_GSO_TCPV6;
-   break;
-   case VIRTIO_NET_HDR_GSO_UDP:
-   gso_type = SKB_GSO_UDP;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   if (vnet_hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
-   gso_type |= SKB_GSO

[net-next] af_packet: Use virtio_net_hdr_to_skb().

2016-11-16 Thread Jarno Rajahalme
Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit fd2a0437dc, but this one was
missed, maybe because it is split in two parts of the source code.

Also fix other call sites to be more uniform.

Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/macvtap.c  |  5 ++---
 drivers/net/tun.c  | 12 
 include/linux/virtio_net.h |  2 +-
 net/packet/af_packet.c | 44 +---
 4 files changed, 8 insertions(+), 55 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 070e329..5da9861 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -821,9 +821,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (iov_iter_count(iter) < vnet_hdr_len)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, _hdr,
- macvtap_is_little_endian(q));
-   if (ret)
+   if (virtio_net_hdr_from_skb(skb, _hdr,
+   macvtap_is_little_endian(q)))
BUG();
 
if (copy_to_iter(_hdr, sizeof(vnet_hdr), iter) !=
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9328568..864aae3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1252,8 +1252,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
return -EFAULT;
}
 
-   err = virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun));
-   if (err) {
+   if (virtio_net_hdr_to_skb(skb, , tun_is_little_endian(tun))) {
this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
kfree_skb(skb);
return -EINVAL;
@@ -1361,15 +1360,12 @@ static ssize_t tun_put_user(struct tun_struct *tun,
}
 
if (vnet_hdr_sz) {
-   struct virtio_net_hdr gso = { 0 }; /* no info leak */
-   int ret;
-
+   struct virtio_net_hdr gso;
if (iov_iter_count(iter) < vnet_hdr_sz)
return -EINVAL;
 
-   ret = virtio_net_hdr_from_skb(skb, ,
- tun_is_little_endian(tun));
-   if (ret) {
+   if (virtio_net_hdr_from_skb(skb, ,
+   tun_is_little_endian(tun))) {
struct skb_shared_info *sinfo = skb_shinfo(skb);
pr_err("unexpected GSO type: "
   "0x%x, gso_size %d, hdr_len %d\n",
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1c912f8..74f1e33 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -98,4 +98,4 @@ static inline int virtio_net_hdr_from_skb(const struct 
sk_buff *skb,
return 0;
 }
 
-#endif /* _LINUX_VIRTIO_BYTEORDER */
+#endif /* _LINUX_VIRTIO_NET_H */
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 11db0d6..09abb88 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1971,8 +1971,6 @@ static unsigned int run_filter(struct sk_buff *skb,
 static int __packet_rcv_vnet(const struct sk_buff *skb,
 struct virtio_net_hdr *vnet_hdr)
 {
-   *vnet_hdr = (const struct virtio_net_hdr) { 0 };
-
if (virtio_net_hdr_from_skb(skb, vnet_hdr, vio_le()))
BUG();
 
@@ -2391,8 +2389,6 @@ static void tpacket_set_protocol(const struct net_device 
*dev,
 
 static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len)
 {
-   unsigned short gso_type = 0;
-
if ((vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
(__virtio16_to_cpu(vio_le(), vnet_hdr->csum_start) +
 __virtio16_to_cpu(vio_le(), vnet_hdr->csum_offset) + 2 >
@@ -2404,29 +2400,6 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr 
*vnet_hdr, size_t len)
if (__virtio16_to_cpu(vio_le(), vnet_hdr->hdr_len) > len)
return -EINVAL;
 
-   if (vnet_hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-   switch (vnet_hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
-   case VIRTIO_NET_HDR_GSO_TCPV4:
-   gso_type = SKB_GSO_TCPV4;
-   break;
-   case VIRTIO_NET_HDR_GSO_TCPV6:
-   gso_type = SKB_GSO_TCPV6;
-   break;
-   case VIRTIO_NET_HDR_GSO_UDP:
-   gso_type = SKB_GSO_UDP;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   if (vnet_hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
-   gso_type |= SKB_GSO

[PATCH net] openvswitch: Remove incorrect WARN_ONCE().

2016-08-01 Thread Jarno Rajahalme
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen.  However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen.  This
has resulted into somewhat confusing kernel log messages.  This patch
removes this check and warning.

Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer <j...@ovn.org>
Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c644c78..e054a74 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -433,7 +433,6 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
struct nf_conntrack_l4proto *l4proto;
struct nf_conntrack_tuple tuple;
struct nf_conntrack_tuple_hash *h;
-   enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
unsigned int dataoff;
u8 protonum;
@@ -458,13 +457,8 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
-   ctinfo = ovs_ct_get_info(h);
-   if (ctinfo == IP_CT_NEW) {
-   /* This should not happen. */
-   WARN_ONCE(1, "ovs_ct_find_existing: new packet for %p\n", ct);
-   }
skb->nfct = >ct_general;
-   skb->nfctinfo = ctinfo;
+   skb->nfctinfo = ovs_ct_get_info(h);
return ct;
 }
 
-- 
2.1.4



Re: [ovs-dev] [PATCH net-next] openvswitch: Only set mark and labels when commiting a connection.

2016-06-21 Thread Jarno Rajahalme
Thanks for the review!

> On Jun 21, 2016, at 1:57 PM, Joe Stringer <j...@ovn.org> wrote:
> 
> On 20 June 2016 at 17:19, Jarno Rajahalme <ja...@ovn.org> wrote:
>> Only allow setting conntrack mark or labels when the commit flag is
>> specified.  This makes sure we can not set them before the connection
>> has been persisted, as in that case the mark and labels would be lost
>> in an event of an userspace upcall.
>> 
>> OVS userspace already requires the commit flag to accept setting
>> ct_mark and/or ct_labels.  Validate for this on the kernel API.
>> 
>> Finally, set conntrack mark and labels right before committing so that
>> the initial conntrack NEW event has the mark and labels.
>> 
>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> 
> The structure of this commit message suggests there are multiple
> changes trying to be addressed in one patch. I suggest splitting them
> out.
> 

Done for v2 I just sent for net.

> In terms of applying the mark and labels before committing the
> connection, that's actually the behaviour I would expect if you were
> to execute ct(mark=foo,commit). The NEW event should include these
> pieces, and should have all along.

Right, the v2 patch 1/2 does this.

> 
>> @@ -1145,6 +1155,20 @@ static int parse_ct(const struct nlattr *attr, struct 
>> ovs_conntrack_info *info,
>>}
>>}
>> 
>> +#ifdef CONFIG_NF_CONNTRACK_MARK
>> +   if (!info->commit && info->mark.mask) {
>> +   OVS_NLERR(log,
>> + "Setting conntrack mark requires 'commit' flag.");
>> +   return -EINVAL;
>> +   }
>> +#endif
>> +#ifdef CONFIG_NF_CONNTRACK_LABELS
>> +   if (!info->commit && labels_nonzero(>labels.mask)) {
>> +   OVS_NLERR(log,
>> + "Setting conntrack labels requires 'commit' 
>> flag.");
>> +   return -EINVAL;
>> +   }
>> +#endif
> 
> I'm of mixed minds about this, but I lean towards agreeing with it. On
> one hand, it's applying more restrictions on an otherwise fairly loose
> interface and if anyone is relying on this behaviour then it would be
> surprising to have this restriction introduced. On the other hand, it
> doesn't make a lot of sense to set a label/mark but not to commit the
> connection. As you say, the behaviour isn't exactly consistent in that
> case today anyway: If there was a flow with
> actions=ct(mark=foo),recirc() followed by a userspace upcall, then the
> mark would be reflected in the flow key but not saved to any persisted
> connection. A subsequent ct(commit) after upcall wouldn't persist it,
> either. However if there were two flows already in the datapath to do
> this, then it /would/ be persisted. Restricting the mark/labels
> modification to only if you have the "commit" flag would address that
> consistency issue. The OVS userspace enforcing this constraint also
> hints that this was an unintentional omission from kernel validation.

I separated this out to the v2 patch 2/2.

  Jarno



[PATCH net v2 1/2] openvswitch: Set mark and labels before confirming.

2016-06-21 Thread Jarno Rajahalme
Set conntrack mark and labels right before committing so that
the initial conntrack NEW event has the mark and labels.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
v2: Separate Kernel API change to an RFC patch (2/2).

 net/openvswitch/conntrack.c | 33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 3d5feed..23fd4fb 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -824,23 +824,6 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
-/* Lookup connection and confirm if unconfirmed. */
-static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
-const struct ovs_conntrack_info *info,
-struct sk_buff *skb)
-{
-   int err;
-
-   err = __ovs_ct_lookup(net, key, info, skb);
-   if (err)
-   return err;
-   /* This is a no-op if the connection has already been confirmed. */
-   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
-   return -EINVAL;
-
-   return 0;
-}
-
 static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 {
size_t i;
@@ -873,21 +856,33 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
}
 
if (info->commit)
-   err = ovs_ct_commit(net, key, info, skb);
+   err = __ovs_ct_lookup(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
if (err)
goto err;
 
+   /* Apply changes before confirming the connection so that the initial
+* conntrack NEW netlink event carries the values given in the CT
+* action.
+*/
if (info->mark.mask) {
err = ovs_ct_set_mark(skb, key, info->mark.value,
  info->mark.mask);
if (err)
goto err;
}
-   if (labels_nonzero(>labels.mask))
+   if (labels_nonzero(>labels.mask)) {
err = ovs_ct_set_labels(skb, key, >labels.value,
>labels.mask);
+   if (err)
+   goto err;
+   }
+   /* This will take care of sending queued events even if the connection
+* is already confirmed.
+*/
+   if (info->commit && nf_conntrack_confirm(skb) != NF_ACCEPT)
+   err = -EINVAL;
 err:
skb_push(skb, nh_ofs);
if (err)
-- 
2.1.4



[RFC PATCH net v2 2/2] openvswitch: Only set mark and labels with a commit flag.

2016-06-21 Thread Jarno Rajahalme
Only set conntrack mark or labels when the commit flag is specified.
This makes sure we can not set them before the connection has been
persisted, as in that case the mark and labels would be lost in an
event of an userspace upcall.

OVS userspace already requires the commit flag to accept setting
ct_mark and/or ct_labels.  Validate for this in the kernel API.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 76 ++---
 1 file changed, 51 insertions(+), 25 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 23fd4fb..52f3b9b 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -835,6 +835,42 @@ static bool labels_nonzero(const struct ovs_key_ct_labels 
*labels)
return false;
 }
 
+/* Lookup connection and confirm if unconfirmed. */
+static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
+const struct ovs_conntrack_info *info,
+struct sk_buff *skb)
+{
+   int err;
+
+   err = __ovs_ct_lookup(net, key, info, skb);
+   if (err)
+   return err;
+
+   /* Apply changes before confirming the connection so that the initial
+* conntrack NEW netlink event carries the values given in the CT
+* action.
+*/
+   if (info->mark.mask) {
+   err = ovs_ct_set_mark(skb, key, info->mark.value,
+ info->mark.mask);
+   if (err)
+   return err;
+   }
+   if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(skb, key, >labels.value,
+   >labels.mask);
+   if (err)
+   return err;
+   }
+   /* This will take care of sending queued events even if the connection
+* is already confirmed.
+*/
+   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
+   return -EINVAL;
+
+   return 0;
+}
+
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
@@ -856,34 +892,10 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
}
 
if (info->commit)
-   err = __ovs_ct_lookup(net, key, info, skb);
+   err = ovs_ct_commit(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
-   if (err)
-   goto err;
 
-   /* Apply changes before confirming the connection so that the initial
-* conntrack NEW netlink event carries the values given in the CT
-* action.
-*/
-   if (info->mark.mask) {
-   err = ovs_ct_set_mark(skb, key, info->mark.value,
- info->mark.mask);
-   if (err)
-   goto err;
-   }
-   if (labels_nonzero(>labels.mask)) {
-   err = ovs_ct_set_labels(skb, key, >labels.value,
-   >labels.mask);
-   if (err)
-   goto err;
-   }
-   /* This will take care of sending queued events even if the connection
-* is already confirmed.
-*/
-   if (info->commit && nf_conntrack_confirm(skb) != NF_ACCEPT)
-   err = -EINVAL;
-err:
skb_push(skb, nh_ofs);
if (err)
kfree_skb(skb);
@@ -1140,6 +1152,20 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
}
 
+#ifdef CONFIG_NF_CONNTRACK_MARK
+   if (!info->commit && info->mark.mask) {
+   OVS_NLERR(log,
+ "Setting conntrack mark requires 'commit' flag.");
+   return -EINVAL;
+   }
+#endif
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+   if (!info->commit && labels_nonzero(>labels.mask)) {
+   OVS_NLERR(log,
+ "Setting conntrack labels requires 'commit' flag.");
+   return -EINVAL;
+   }
+#endif
if (rem > 0) {
OVS_NLERR(log, "Conntrack attr has %d unknown bytes", rem);
return -EINVAL;
-- 
2.1.4



[PATCH net-next] openvswitch: Only set mark and labels when commiting a connection.

2016-06-20 Thread Jarno Rajahalme
Only allow setting conntrack mark or labels when the commit flag is
specified.  This makes sure we can not set them before the connection
has been persisted, as in that case the mark and labels would be lost
in an event of an userspace upcall.

OVS userspace already requires the commit flag to accept setting
ct_mark and/or ct_labels.  Validate for this on the kernel API.

Finally, set conntrack mark and labels right before committing so that
the initial conntrack NEW event has the mark and labels.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 net/openvswitch/conntrack.c | 72 ++---
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 3d5feed..f1612f5 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -824,6 +824,17 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
+{
+   size_t i;
+
+   for (i = 0; i < sizeof(*labels); i++)
+   if (labels->ct_labels[i])
+   return true;
+
+   return false;
+}
+
 /* Lookup connection and confirm if unconfirmed. */
 static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 const struct ovs_conntrack_info *info,
@@ -834,24 +845,35 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
err = __ovs_ct_lookup(net, key, info, skb);
if (err)
return err;
-   /* This is a no-op if the connection has already been confirmed. */
+
+   /* As any changes to an unconfirmed connection may be lost due
+* to an upcall, we require the presence of the 'commit' flag
+* for setting mask and/or labels.  Perform the changes before
+* confirming the connection so that the initial mark and label
+* values are present in the initial CT NEW netlink event.
+*/
+   if (info->mark.mask) {
+   err = ovs_ct_set_mark(skb, key, info->mark.value,
+ info->mark.mask);
+   if (err)
+   return err;
+   }
+   if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(skb, key, >labels.value,
+   >labels.mask);
+   if (err)
+   return err;
+   }
+
+   /* Will take care of sending queued events even if the connection is
+* already confirmed.
+*/
if (nf_conntrack_confirm(skb) != NF_ACCEPT)
return -EINVAL;
 
return 0;
 }
 
-static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
-{
-   size_t i;
-
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
-   return true;
-
-   return false;
-}
-
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
@@ -876,19 +898,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
err = ovs_ct_commit(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
-   if (err)
-   goto err;
 
-   if (info->mark.mask) {
-   err = ovs_ct_set_mark(skb, key, info->mark.value,
- info->mark.mask);
-   if (err)
-   goto err;
-   }
-   if (labels_nonzero(>labels.mask))
-   err = ovs_ct_set_labels(skb, key, >labels.value,
-   >labels.mask);
-err:
skb_push(skb, nh_ofs);
if (err)
kfree_skb(skb);
@@ -1145,6 +1155,20 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
}
 
+#ifdef CONFIG_NF_CONNTRACK_MARK
+   if (!info->commit && info->mark.mask) {
+   OVS_NLERR(log,
+ "Setting conntrack mark requires 'commit' flag.");
+   return -EINVAL;
+   }
+#endif
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+   if (!info->commit && labels_nonzero(>labels.mask)) {
+   OVS_NLERR(log,
+ "Setting conntrack labels requires 'commit' flag.");
+   return -EINVAL;
+   }
+#endif
if (rem > 0) {
OVS_NLERR(log, "Conntrack attr has %d unknown bytes", rem);
return -EINVAL;
-- 
2.1.4



Re: [PATCH net-next] openvswitch: Only set mark and labels when commiting a connection.

2016-06-20 Thread Jarno Rajahalme
The title should have been:

openvswitch: Only set mark and labels with a commit flag.

This reflects the fact that modifying the mark and/or labels of an existing 
connection is allowed. The commit flag is still required to do that, though.

  Jarno

> On Jun 20, 2016, at 5:19 PM, Jarno Rajahalme <ja...@ovn.org> wrote:
> 
> Only allow setting conntrack mark or labels when the commit flag is
> specified.  This makes sure we can not set them before the connection
> has been persisted, as in that case the mark and labels would be lost
> in an event of an userspace upcall.
> 
> OVS userspace already requires the commit flag to accept setting
> ct_mark and/or ct_labels.  Validate for this on the kernel API.
> 
> Finally, set conntrack mark and labels right before committing so that
> the initial conntrack NEW event has the mark and labels.
> 
> Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
> ---
> net/openvswitch/conntrack.c | 72 ++---
> 1 file changed, 48 insertions(+), 24 deletions(-)
> 
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 3d5feed..f1612f5 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -824,6 +824,17 @@ static int ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>   return 0;
> }
> 
> +static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
> +{
> + size_t i;
> +
> + for (i = 0; i < sizeof(*labels); i++)
> + if (labels->ct_labels[i])
> + return true;
> +
> + return false;
> +}
> +
> /* Lookup connection and confirm if unconfirmed. */
> static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>const struct ovs_conntrack_info *info,
> @@ -834,24 +845,35 @@ static int ovs_ct_commit(struct net *net, struct 
> sw_flow_key *key,
>   err = __ovs_ct_lookup(net, key, info, skb);
>   if (err)
>   return err;
> - /* This is a no-op if the connection has already been confirmed. */
> +
> + /* As any changes to an unconfirmed connection may be lost due
> +  * to an upcall, we require the presence of the 'commit' flag
> +  * for setting mask and/or labels.  Perform the changes before
> +  * confirming the connection so that the initial mark and label
> +  * values are present in the initial CT NEW netlink event.
> +  */
> + if (info->mark.mask) {
> + err = ovs_ct_set_mark(skb, key, info->mark.value,
> +   info->mark.mask);
> + if (err)
> + return err;
> + }
> + if (labels_nonzero(>labels.mask)) {
> + err = ovs_ct_set_labels(skb, key, >labels.value,
> + >labels.mask);
> + if (err)
> + return err;
> + }
> +
> + /* Will take care of sending queued events even if the connection is
> +  * already confirmed.
> +  */
>   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
>   return -EINVAL;
> 
>   return 0;
> }
> 
> -static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
> -{
> - size_t i;
> -
> - for (i = 0; i < sizeof(*labels); i++)
> - if (labels->ct_labels[i])
> - return true;
> -
> - return false;
> -}
> -
> /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
>  * value if 'skb' is freed.
>  */
> @@ -876,19 +898,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
>   err = ovs_ct_commit(net, key, info, skb);
>   else
>   err = ovs_ct_lookup(net, key, info, skb);
> - if (err)
> - goto err;
> 
> - if (info->mark.mask) {
> - err = ovs_ct_set_mark(skb, key, info->mark.value,
> -   info->mark.mask);
> - if (err)
> - goto err;
> - }
> - if (labels_nonzero(>labels.mask))
> - err = ovs_ct_set_labels(skb, key, >labels.value,
> - >labels.mask);
> -err:
>   skb_push(skb, nh_ofs);
>   if (err)
>   kfree_skb(skb);
> @@ -1145,6 +1155,20 @@ static int parse_ct(const struct nlattr *attr, struct 
> ovs_conntrack_info *info,
>   }
>   }
> 
> +#ifdef CONFIG_NF_CONNTRACK_MARK
> + if (!info->commit && info->mark.mask) {
> + OVS_NLERR(log,
> +   "Setting conntrack mark requires 'commit' flag.");
> + return -EINVAL;
> + }
> +#endif
> +#ifdef CONFIG_NF_CONNTRACK_LABELS
> + if (!info->commit && labels_nonzero(>labels.mask)) {
> + OVS_NLERR(log,
> +   "Setting conntrack labels requires 'commit' flag.");
> + return -EINVAL;
> + }
> +#endif
>   if (rem > 0) {
>   OVS_NLERR(log, "Conntrack attr has %d unknown bytes", rem);
>   return -EINVAL;
> -- 
> 2.1.4
> 



Re: [PATCH net] openvswitch: Fix cached ct with helper.

2016-05-10 Thread Jarno Rajahalme
This would result in inconsistent helper assignment if a first CT action 
assigns a helper and a further CT action tries to assign a different helper; 
Typically the second helper assignment would be ignored, but if the unconfirmed 
conntrack entry is lost due to an upcall the second helper assignment would be 
successful.  This is best resolved by allowing helper assignment by a 
committing CT action only by testing the 'info->commit' flag in addition to the 
conditions you have there already. It may also be helpful to fail helper 
assignment without the commit flag in parse_ct().

  Jarno

> On May 10, 2016, at 11:40 AM, Joe Stringer  wrote:
> 
> When using conntrack helpers from OVS, a common configuration is to
> perform a lookup without specifying a helper, then go through a
> firewalling policy, only to decide to attach a helper afterwards.
> 
> In this case, the initial lookup will cause a ct entry to be attached to
> the skb, then the later commit with helper should attach the helper and
> confirm the connection. However, the helper attachment has been missing.
> If the user has enabled automatic helper attachment, then this issue
> will be masked as it will be applied in init_conntrack(). It is also
> masked if the action is executed from ovs_packet_cmd_execute() as that
> will construct a fresh skb.
> 
> This patch fixes the issue by making an explicit call to try to assign
> the helper if there is a discrepancy between the action's helper and the
> current skb->nfct.
> 
> Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
> Signed-off-by: Joe Stringer 
> ---
> net/openvswitch/conntrack.c | 12 
> 1 file changed, 12 insertions(+)
> 
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index b5fea1101faa..89f61a1720eb 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -776,6 +776,18 @@ static int __ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>   return -EINVAL;
>   }
> 
> + /* Userspace may decide to perform a ct lookup without a helper
> +  * specified followed by a (recirculate and) commit with one.
> +  * Therefore, for unconfirmed connections we need to attach the
> +  * helper here.
> +  */
> + if (!nf_ct_is_confirmed(ct) && info->helper && !nfct_help(ct)) {
> + int err = __nf_ct_try_assign_helper(ct, info->ct,
> + GFP_ATOMIC);
> + if (err)
> + return err;
> + }
> +
>   /* Call the helper only if:
>* - nf_conntrack_in() was executed above ("!cached") for a
>*   confirmed connection, or
> -- 
> 2.1.4
> 



[PATCH net v3 1/2] udp_tunnel: Remove redundant udp_tunnel_gro_complete().

2016-05-03 Thread Jarno Rajahalme
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 drivers/net/geneve.c | 2 --
 drivers/net/vxlan.c  | 2 --
 include/net/udp_tunnel.h | 9 -
 net/ipv4/fou.c   | 2 --
 4 files changed, 15 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index bc16889..98f1224 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -504,8 +504,6 @@ static int geneve_gro_complete(struct sk_buff *skb, int 
nhoff,
int gh_len;
int err = -ENOSYS;
 
-   udp_tunnel_gro_complete(skb, nhoff);
-
gh = (struct genevehdr *)(skb->data + nhoff);
gh_len = geneve_hlen(gh);
type = gh->proto_type;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1c0fa36..dd2d032 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -616,8 +616,6 @@ out:
 static int vxlan_gro_complete(struct sk_buff *skb, int nhoff,
  struct udp_offload *uoff)
 {
-   udp_tunnel_gro_complete(skb, nhoff);
-
return eth_gro_complete(skb, nhoff + sizeof(struct vxlanhdr));
 }
 
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index b831140..a114024 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -106,15 +106,6 @@ static inline struct sk_buff 
*udp_tunnel_handle_offloads(struct sk_buff *skb,
return iptunnel_handle_offloads(skb, type);
 }
 
-static inline void udp_tunnel_gro_complete(struct sk_buff *skb, int nhoff)
-{
-   struct udphdr *uh;
-
-   uh = (struct udphdr *)(skb->data + nhoff - sizeof(struct udphdr));
-   skb_shinfo(skb)->gso_type |= uh->check ?
-   SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
-}
-
 static inline void udp_tunnel_encap_enable(struct socket *sock)
 {
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index a39068b..305d9ac 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -228,8 +228,6 @@ static int fou_gro_complete(struct sk_buff *skb, int nhoff,
int err = -ENOSYS;
const struct net_offload **offloads;
 
-   udp_tunnel_gro_complete(skb, nhoff);
-
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
-- 
2.7.4



  1   2   3   >