Re: [ovs-dev] [PATCH net] net: ensure all external references are released in deferred skbuffs

2022-06-22 Thread Florian Westphal
Eric Dumazet  wrote:
> On Sun, Jun 19, 2022 at 2:39 AM Ilya Maximets  wrote:
> >
> > Open vSwitch system test suite is broken due to inability to
> > load/unload netfilter modules.  kworker thread is getting trapped
> > in the infinite loop while running a net cleanup inside the
> > nf_conntrack_cleanup_net_list, because deferred skbuffs are still
> > holding nfct references and not being freed by their CPU cores.
> >
> > In general, the idea that we will have an rx interrupt on every
> > CPU core at some point in a near future doesn't seem correct.
> > Devices are getting created and destroyed, interrupts are getting
> > re-scheduled, CPUs are going online and offline dynamically.
> > Any of these events may leave packets stuck in defer list for a
> > long time.  It might be OK, if they are just a piece of memory,
> > but we can't afford them holding references to any other resources.
> >
> > In case of OVS, nfct reference keeps the kernel thread in busy loop
> > while holding a 'pernet_ops_rwsem' semaphore.  That blocks the
> > later modprobe request from user space:
> >
> >   # ps
> >299 root  R  99.3  200:25.89 kworker/u96:4+
> >
> >   # journalctl
> >   INFO: task modprobe:11787 blocked for more than 1228 seconds.
> > Not tainted 5.19.0-rc2 #8
> >   task:modprobe state:D
> >   Call Trace:
> >
> >__schedule+0x8aa/0x21d0
> >schedule+0xcc/0x200
> >rwsem_down_write_slowpath+0x8e4/0x1580
> >down_write+0xfc/0x140
> >register_pernet_subsys+0x15/0x40
> >nf_nat_init+0xb6/0x1000 [nf_nat]
> >do_one_initcall+0xbb/0x410
> >do_init_module+0x1b4/0x640
> >load_module+0x4c1b/0x58d0
> >__do_sys_init_module+0x1d7/0x220
> >do_syscall_64+0x3a/0x80
> >entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >
> > At this point OVS testsuite is unresponsive and never recover,
> > because these skbuffs are never freed.
> >
> > Solution is to make sure no external references attached to skb
> > before pushing it to the defer list.  Using skb_release_head_state()
> > for that purpose.  The function modified to be re-enterable, as it
> > will be called again during the defer list flush.
> >
> > Another approach that can fix the OVS use-case, is to kick all
> > cores while waiting for references to be released during the net
> > cleanup.  But that sounds more like a workaround for a current
> > issue rather than a proper solution and will not cover possible
> > issues in other parts of the code.
> >
> > Additionally checking for skb_zcopy() while deferring.  This might
> > not be necessary, as I'm not sure if we can actually have zero copy
> > packets on this path, but seems worth having for completeness as we
> > should never defer such packets regardless.
> >
> > CC: Eric Dumazet 
> > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu 
> > lists")
> > Signed-off-by: Ilya Maximets 
> > ---
> >  net/core/skbuff.c | 16 +++-
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> I do not think this patch is doing the right thing.
> 
> Packets sitting in TCP receive queues should not hold state that is
> not relevant for TCP recvmsg().

Agree, but tcp_v4/6_rcv() already call nf_reset_ct(), else it would
not be possible to remove nf_conntrack module in practice.

I wonder where the deferred skbs are coming from, any and all
queued skbs need the conntrack state dropped.

I don't mind a new helper that does a combined dst+ct release though.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] openvswitch: Ensure nf_ct_put is not called with null pointer

2022-04-09 Thread Florian Westphal
Mark Mielke  wrote:
> A recent commit replaced calls to nf_conntrack_put() with calls
> to nf_ct_put(). nf_conntrack_put() permitted the caller to pass
> null without side effects, while nf_ct_put() performs WARN_ON()
> and proceeds to try and de-reference the pointer. ovs-vswitchd
> triggers the warning on startup:
> 
> [   22.178881] WARNING: CPU: 69 PID: 2157 at 
> include/net/netfilter/nf_conntrack.h:176 __ovs_ct_lookup+0x4e2/0x6a0 
> [openvswitch]
> ...
> [   22.213573] Call Trace:
> [   22.214318]  
> [   22.215064]  ovs_ct_execute+0x49c/0x7f0 [openvswitch]

Applied to nf.git, thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] openvswitch: Ensure nf_ct_put is not called with null pointer

2022-04-10 Thread Florian Westphal
Mark Mielke  wrote:
> A recent commit replaced calls to nf_conntrack_put() with calls
> to nf_ct_put(). nf_conntrack_put() permitted the caller to pass
> null without side effects, while nf_ct_put() performs WARN_ON()
> and proceeds to try and de-reference the pointer. ovs-vswitchd
> triggers the warning on startup:
> 
> [   22.178881] WARNING: CPU: 69 PID: 2157 at 
> include/net/netfilter/nf_conntrack.h:176 __ovs_ct_lookup+0x4e2/0x6a0 
> [openvswitch]
> ...
> [   22.213573] Call Trace:
> [   22.214318]  
> [   22.215064]  ovs_ct_execute+0x49c/0x7f0 [openvswitch]
> ...
> Cc: sta...@vger.kernel.org
> Fixes: 408bdcfce8df ("net: prefer nf_ct_put instead of nf_conntrack_put")

Actually, no.  As Pablo Neira just pointed out to me Upstream kernel is fine.
The preceeding commit made nf_ct_out() a noop when ct is NULL.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] openvswitch: Ensure nf_ct_put is not called with null pointer

2022-04-25 Thread Florian Westphal
Ilya Maximets  wrote:
> Hi, Florian.
> 
> There is a problem on 5.15 longterm tree where the offending commit
> got backported, but the previous one was not, so it triggers an issue
> while loading the openvswitch module.
> 
> To be more clear, v5.15.35 contains the following commit:
>   408bdcfce8df ("net: prefer nf_ct_put instead of nf_conntrack_put")
> backported as commit 72dd9e61fa319bc44020c2d365275fc8f6799bff, but
> it doesn't have the previous one:
>   6ae7989c9af0 ("netfilter: conntrack: avoid useless indirection during 
> conntrack destruction")
> that adds the NULL pointer check to the nf_ct_put().
> 
> Either 6ae7989c9af0 should be backported to 5.15 or 72dd9e61fa31
> reverted on that tree.

The commit was never meant to be backported to stable, it doesn't fix any bug.

I suspect it was done to take 'net/sched: act_ct: fix ref leak when
switching zones' without munging it.

I suggest to add stable-only patch that makes nf_ct_put(NULL)
legal, like in linux.git, but I don't know stable team preferences.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: fix zone sync issue

2020-12-16 Thread Florian Westphal
Flavio Leitner  wrote:
> 
> This email has 'To' field pointing to ovs-dev, but the patch
> seems to fix another code other than OVS.
> 
> You might have realized by now, but in case you're still waiting... :)

Thanks for pointing that out, patch has been applied to conntrack-tools
repo.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH nf-next 5/5] net: prefer nf_ct_put instead of nf_conntrack_put

2022-01-07 Thread Florian Westphal
Its the same as nf_conntrack_put(), but without the
need for an indirect call.  The downside is a module dependency on
nf_conntrack, but all of these already depend on conntrack anyway.

Cc: Paul Blakey 
Cc: d...@openvswitch.org
Signed-off-by: Florian Westphal 
---
 net/netfilter/nf_conntrack_core.c |  4 ++--
 net/openvswitch/conntrack.c   | 14 ++
 net/sched/act_ct.c|  6 +++---
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 7a2063abae04..c74933a6039f 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -989,7 +989,7 @@ static int __nf_ct_resolve_clash(struct sk_buff *skb,
 
nf_ct_acct_merge(ct, ctinfo, loser_ct);
nf_ct_add_to_dying_list(loser_ct);
-   nf_conntrack_put(&loser_ct->ct_general);
+   nf_ct_put(loser_ct);
nf_ct_set(skb, ct, ctinfo);
 
NF_CT_STAT_INC(net, clash_resolve);
@@ -1921,7 +1921,7 @@ nf_conntrack_in(struct sk_buff *skb, const struct 
nf_hook_state *state)
/* Invalid: inverse of the return code tells
 * the netfilter core what to do */
pr_debug("nf_conntrack_in: Can't track with proto module\n");
-   nf_conntrack_put(&ct->ct_general);
+   nf_ct_put(ct);
skb->_nfct = 0;
NF_CT_STAT_INC_ATOMIC(state->net, invalid);
if (ret == -NF_DROP)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 121664e52271..a921c5c00a1b 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -574,7 +574,7 @@ ovs_ct_expect_find(struct net *net, const struct 
nf_conntrack_zone *zone,
struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 
nf_ct_delete(ct, 0, 0);
-   nf_conntrack_put(&ct->ct_general);
+   nf_ct_put(ct);
}
}
 
@@ -723,7 +723,7 @@ static bool skb_nfct_cached(struct net *net,
if (nf_ct_is_confirmed(ct))
nf_ct_delete(ct, 0, 0);
 
-   nf_conntrack_put(&ct->ct_general);
+   nf_ct_put(ct);
nf_ct_set(skb, NULL, 0);
return false;
}
@@ -967,7 +967,8 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 
/* Associate skb with specified zone. */
if (tmpl) {
-   nf_conntrack_put(skb_nfct(skb));
+   ct = nf_ct_get(skb, &ctinfo);
+   nf_ct_put(ct);
nf_conntrack_get(&tmpl->ct_general);
nf_ct_set(skb, tmpl, IP_CT_NEW);
}
@@ -1328,7 +1329,12 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
 
 int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key)
 {
-   nf_conntrack_put(skb_nfct(skb));
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+
+   ct = nf_ct_get(skb, &ctinfo);
+
+   nf_ct_put(ct);
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
ovs_ct_fill_key(skb, key, false);
 
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 3fa904cf8f27..9db291b45878 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -598,7 +598,7 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
if (nf_ct_is_confirmed(ct))
nf_ct_kill(ct);
 
-   nf_conntrack_put(&ct->ct_general);
+   nf_ct_put(ct);
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 
return false;
@@ -763,7 +763,7 @@ static void tcf_ct_params_free(struct rcu_head *head)
tcf_ct_flow_table_put(params);
 
if (params->tmpl)
-   nf_conntrack_put(¶ms->tmpl->ct_general);
+   nf_ct_put(params->tmpl);
kfree(params);
 }
 
@@ -967,7 +967,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
qdisc_skb_cb(skb)->post_ct = false;
ct = nf_ct_get(skb, &ctinfo);
if (ct) {
-   nf_conntrack_put(&ct->ct_general);
+   nf_ct_put(ct);
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
}
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: make masks cache size configurable

2020-07-22 Thread Florian Westphal
Eelco Chaudron  wrote:
> This patch makes the masks cache size configurable, or with
> a size of 0, disable it.
> 
> Reviewed-by: Paolo Abeni 
> Signed-off-by: Eelco Chaudron 
> ---
>  include/uapi/linux/openvswitch.h |1 
>  net/openvswitch/datapath.c   |   11 +
>  net/openvswitch/flow_table.c |   86 
> ++
>  net/openvswitch/flow_table.h |   10 
>  4 files changed, 98 insertions(+), 10 deletions(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 7cb76e5ca7cf..8300cc29dec8 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -86,6 +86,7 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
> + OVS_DP_ATTR_MASKS_CACHE_SIZE,

This new attr should probably get an entry in
datapath.c datapath_policy[].

> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1535,6 +1535,10 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, 
> struct sk_buff *skb,
>   if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
>   goto nla_put_failure;
>  
> + if (nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE,
> + ovs_flow_tbl_masks_cache_size(&dp->table)))
> + goto nla_put_failure;
> +
>   genlmsg_end(skb, ovs_header);
>   return 0;


ovs_dp_cmd_msg_size() should add another nla_total_size(sizeof(u32))
to make sure there is enough space.

> + if (a[OVS_DP_ATTR_MASKS_CACHE_SIZE]) {
> + u32 cache_size;
> +
> + cache_size = nla_get_u32(a[OVS_DP_ATTR_MASKS_CACHE_SIZE]);
> + ovs_flow_tbl_masks_cache_resize(&dp->table, cache_size);
> + }

I see a 0 cache size is legal (turns it off) and that the allocation
path has a few sanity checks as well.

Would it make sense to add min/max policy to datapath_policy[] for this
as well?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: make masks cache size configurable

2020-07-23 Thread Florian Westphal
Eelco Chaudron  wrote:
> On 22 Jul 2020, at 21:22, Florian Westphal wrote:
> > I see a 0 cache size is legal (turns it off) and that the allocation
> > path has a few sanity checks as well.
> > 
> > Would it make sense to add min/max policy to datapath_policy[] for this
> > as well?
> 
> Yes I could add the following:
> 
> @@ -1906,7 +1906,8 @@ static const struct nla_policy
> datapath_policy[OVS_DP_ATTR_MAX + 1] = {
> [OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1
> },
> [OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
> [OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
> +   [OVS_DP_ATTR_MASKS_CACHE_SIZE] =  NLA_POLICY_RANGE(NLA_U32, 0,
> +   PCPU_MIN_UNIT_SIZE / sizeof(struct mask_cache_entry)),
>  };
> Let me know your thoughts

I think its a good idea.  When 'max' becomes too restricted one could
rework internal kernel logic to support larger size and userspace
can detect it by probing with a larger size first.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH nf-next v2] openvswitch: use nf_ct_get_tuplepr, invert_tuplepr

2018-06-25 Thread Florian Westphal
These versions deal with the l3proto/l4proto details internally.
It removes only caller of nf_ct_get_tuple, so make it static.

After this, l3proto->get_l4proto() can be removed in a followup patch.

Signed-off-by: Florian Westphal 
---
 No changes since v1.

 This is a preparation patch to remove the l3proto indirections.
 Evanutally nf_conntrack_l3proto will be removed.

 ipv4 and ipv6 protocol trackers will be part of nf_conntrack itself.

 include/net/netfilter/nf_conntrack_core.h |  7 ---
 net/netfilter/nf_conntrack_core.c |  3 +--
 net/openvswitch/conntrack.c   | 17 +++--
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h 
b/include/net/netfilter/nf_conntrack_core.h
index 9b5e7634713e..90df45022c51 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -40,13 +40,6 @@ void nf_conntrack_cleanup_start(void);
 void nf_conntrack_init_end(void);
 void nf_conntrack_cleanup_end(void);
 
-bool nf_ct_get_tuple(const struct sk_buff *skb, unsigned int nhoff,
-unsigned int dataoff, u_int16_t l3num, u_int8_t protonum,
-struct net *net,
-struct nf_conntrack_tuple *tuple,
-const struct nf_conntrack_l3proto *l3proto,
-const struct nf_conntrack_l4proto *l4proto);
-
 bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_l3proto *l3proto,
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 3465da2a98bd..160493f95fed 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -222,7 +222,7 @@ static u32 hash_conntrack(const struct net *net,
return scale_hash(hash_conntrack_raw(tuple, net));
 }
 
-bool
+static bool
 nf_ct_get_tuple(const struct sk_buff *skb,
unsigned int nhoff,
unsigned int dataoff,
@@ -244,7 +244,6 @@ nf_ct_get_tuple(const struct sk_buff *skb,
 
return l4proto->pkt_to_tuple(skb, dataoff, net, tuple);
 }
-EXPORT_SYMBOL_GPL(nf_ct_get_tuple);
 
 bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
   u_int16_t l3num,
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 284aca2a252d..e05bd3e53f0f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -607,23 +607,12 @@ static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 u8 l3num, struct sk_buff *skb, bool natted)
 {
-   const struct nf_conntrack_l3proto *l3proto;
-   const struct nf_conntrack_l4proto *l4proto;
struct nf_conntrack_tuple tuple;
struct nf_conntrack_tuple_hash *h;
struct nf_conn *ct;
-   unsigned int dataoff;
-   u8 protonum;
 
-   l3proto = __nf_ct_l3proto_find(l3num);
-   if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
-&protonum) <= 0) {
-   pr_debug("ovs_ct_find_existing: Can't get protonum\n");
-   return NULL;
-   }
-   l4proto = __nf_ct_l4proto_find(l3num, protonum);
-   if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
-protonum, net, &tuple, l3proto, l4proto)) {
+   if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), l3num,
+  net, &tuple)) {
pr_debug("ovs_ct_find_existing: Can't get tuple\n");
return NULL;
}
@@ -632,7 +621,7 @@ ovs_ct_find_existing(struct net *net, const struct 
nf_conntrack_zone *zone,
if (natted) {
struct nf_conntrack_tuple inverse;
 
-   if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+   if (!nf_ct_invert_tuplepr(&inverse, &tuple)) {
pr_debug("ovs_ct_find_existing: Inversion failed!\n");
return NULL;
}
-- 
2.16.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] net: openvswitch: Add support to lookup invalid packet in ct action.

2020-10-06 Thread Florian Westphal
nusid...@redhat.com  wrote:
> From: Numan Siddique 
> 
> For a tcp packet which is part of an existing committed connection,
> nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> out of tcp window. ct action for this packet will set the ct_state
> to +inv which is as expected.

This is because from conntrack p.o.v., such TCP packet is NOT part of
the existing connection.

For example, because it is considered part of a previous incarnation
of the same connection.

> But a controller cannot add an OVS flow as
> 
> table=21,priority=100,ct_state=+inv, actions=drop
> 
> to drop such packets. That is because when ct action is executed on other
> packets which are not part of existing committed connections, ct_state
> can be set to invalid. Few such cases are:
>- ICMP reply packets.

Can you elaborate? Echo reply should not be invalid. Conntrack should
mark it as established (unless such echo reply came out of the blue).

>- TCP SYN/ACK packets during connection establishment.

SYN/ACK should also be established state.
INVALID should only be matched for packets that were never seen
by conntrack, or that are deemed out of date / corrupted.

> To distinguish between an invalid packet part of committed connection
> and others, this patch introduces as a new ct attribute
> OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
> it tries to find the ct entry and if present, sets the ct_state to
> +inv,+trk and also sets the mark and labels associated with the
> connection.
> 
> With this,  a controller can add flows like
> 
> 
> 
> table=20,ip, action=ct(table=21, lookup_invalid)
> table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
> table=21,ip, actions=resubmit(,22)
> 
> 

What exactly is the feature/problem that needs to be solved?
I suspect this would help me to provide better feedback than the
semi-random comments below  :-)

My only problem with how conntrack does things ATM is that the ruleset
cannot distinguish:

1. packet was not even seen by conntrack
2. packet matches existing connection, but is "bad", for example:
  - contradicting tcp flags
  - out of window
  - invalid checksum

There are a few sysctls to modify default behaviour, e.g. relax window
checks, or ignore/skip checksum validation.

The other problem i see (solveable for sure by yet-another-sysctl but i
see that as last-resort) is usual compatibility problem:

ct state invalid drop
ct mark gt 0 accept

If standard netfilter conntrack were to set skb->_nfct e.g. even if
state is invalid, we could still make the above work via some internal
flag.

But if you reverse it, you get different behaviour:

ct mark gt 0 accept
ct state invalid drop

First rule might now accept out-of-window packet even when "be_liberal"
sysctl is off.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] net: openvswitch: Add support to lookup invalid packet in ct action.

2020-10-06 Thread Florian Westphal
Numan Siddique  wrote:
> On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal  wrote:
> >
> > nusid...@redhat.com  wrote:
> > > From: Numan Siddique 
> > >
> > > For a tcp packet which is part of an existing committed connection,
> > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > > out of tcp window. ct action for this packet will set the ct_state
> > > to +inv which is as expected.
> >
> > This is because from conntrack p.o.v., such TCP packet is NOT part of
> > the existing connection.
> >
> > For example, because it is considered part of a previous incarnation
> > of the same connection.
> >
> > > But a controller cannot add an OVS flow as
> > >
> > > table=21,priority=100,ct_state=+inv, actions=drop
> > >
> > > to drop such packets. That is because when ct action is executed on other
> > > packets which are not part of existing committed connections, ct_state
> > > can be set to invalid. Few such cases are:
> > >- ICMP reply packets.
> >
> > Can you elaborate? Echo reply should not be invalid. Conntrack should
> > mark it as established (unless such echo reply came out of the blue).
> 
> Hi Florian,
> 
> Thanks for providing the comments.
> 
> Sorry for not being very clear.
> 
> Let me brief about the present problem we see in OVN (which is a
> controller using ovs)
> 
> When a VM/container sends a packet (in the ingress direction), we don't send 
> all
> the packets to conntrack. If a packet is destined to an OVN load
> balancer virtual ip,
> only then we send the packet to conntrack in the ingress direction and
> then we do dnat
> to the backend.

Ah, okay.  That explains the INVALID then, but in that case I think this
patch/direction is less desirable from conntrack point of view.

More below.

> table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
> tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
> nat=BACKEND_IP)
> ...
> ..
> 
> However for the egress direction (when the packet is to be delivered
> to the VM/container),
> we send all the packets to conntrack and if the ct.est is set, we do
> undnat before delivering
> the packet to the VM/container.
> ...
> table=40, match = ip, action = ct(table=41)
> table=41, match = ct_state=+est+trk, action = ct(nat)
> ...
> 
> What I mean here is that, since we send all the packets in the egress
> pipeline to conntrack,
> we can't add a flow like - match = ct_state=+inv, action = drop.

Basically this is like a normal routing/netfitler (no ovs), where
the machine in question only sees unidirectional traffic (reply packets
taking different route).

> i.e When a VM/container sends an ICMP request packet, it will not be
> sent to conntrack, but
> the reply ICMP will be sent to conntrack and it will be marked as invalid.

Yes.  For plain netfilter, the solution would be to accept INVALID icmp
replies in the iptables/nftables ruleset.

> So is the case with TCP, the TCP SYN from the VM is not sent to
> conntrack, but the SYN/ACK
> from the server would be sent to conntrack and it will be marked as invalid.

Right.

> > 1. packet was not even seen by conntrack
> > 2. packet matches existing connection, but is "bad", for example:
> >   - contradicting tcp flags
> >   - out of window
> >   - invalid checksum
> 
> We want the below to be solved (using OVS flows) :
>   - If the packet is marked as invalid due to (2) which you mentioned above,
> we would like to read the ct_mark and ct_label fields as the packet is
> part of existing connection, so that we can add an OVS flow like
>
> ct_state=+inv+trk,ct_label=0x2 actions=drop

Wouldn't it make more sense to let tcp conntrack skip the in-window
check?  AFAIU thats the only problem and its identical to other cases
that we have at the moment, for example, conntrack supports mid-stream
pickup (on by default) which also disables tcp window checks.

> I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
> set for (2), OVS
> datapath module set the ct_state to +est.

Yes.  This flag can be set programatically on a per-ct basis.

See nft_flow_offload_eval() for example (BE_LIBERAL).
Given we now have multiple places that set this I think it would make
sense to add a helper, say, e.g.

void nf_ct_tcp_be_liberal(struct nf_conn *ct);
?

(Or perhaps nf_ct_tcp_disable_window_checks() , might be more
 clear/descriptive as to what this is doing).

Do you think that this resolves the OVN problem?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [net-next] netfiler: conntrack: Add the option to set ct tcp flag - BE_LIBERAL per-ct basis.

2020-11-09 Thread Florian Westphal
nusid...@redhat.com  wrote:
> From: Numan Siddique 
> 
> Before calling nf_conntrack_in(), caller can set this flag in the
> connection template for a tcp packet and any errors in the
> tcp_in_window() will be ignored.
> 
> A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
> sets this flag for both the directions of the nf_conn.
> 
> openvswitch makes use of this feature so that any out of window tcp
> packets are not marked invalid. Prior to this there was no easy way
> to distinguish if conntracked packet is marked invalid because of
> tcp_in_window() check error or because it doesn't belong to an
> existing connection.
> 
> An earlier attempt (see the link) tried to solve this problem for
> openvswitch in a different way. Florian Westphal instead suggested
> to be liberal in openvswitch for tcp packets.
> 
> Link: 
> https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/
> 
> Suggested-by: Florian Westphal 
> Signed-off-by: Numan Siddique 
> ---
>  include/net/netfilter/nf_conntrack_l4proto.h |  6 ++
>  net/netfilter/nf_conntrack_core.c| 13 +++--
>  net/openvswitch/conntrack.c  |  1 +
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
> b/include/net/netfilter/nf_conntrack_l4proto.h
> index 88186b95b3c2..572ae8d2a622 100644
> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> @@ -203,6 +203,12 @@ static inline struct nf_icmp_net 
> *nf_icmpv6_pernet(struct net *net)
>  {
>   return &net->ct.nf_ct_proto.icmpv6;
>  }
> +
> +static inline void nf_ct_set_tcp_be_liberal(struct nf_conn *ct)
> +{
> + ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> + ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
> +}
>  #endif
>  
>  #ifdef CONFIG_NF_CT_PROTO_DCCP
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> index 234b7cab37c3..8290c5b04e88 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1748,10 +1748,18 @@ static int nf_conntrack_handle_packet(struct nf_conn 
> *ct,
> struct sk_buff *skb,
> unsigned int dataoff,
> enum ip_conntrack_info ctinfo,
> -   const struct nf_hook_state *state)
> +   const struct nf_hook_state *state,
> +   union nf_conntrack_proto *tmpl_proto)

I would prefer if we could avoid adding yet another function argument.

AFAICS its enough to call the new nf_ct_set_tcp_be_liberal() helper
before nf_conntrack_confirm() in ovs_ct_commit(), e.g.:

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1235,10 +1235,13 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
if (!nf_ct_is_confirmed(ct)) {
err = ovs_ct_init_labels(ct, key, &info->labels.value,
 &info->labels.mask);
if (err)
return err;
+
+   if (nf_ct_protonum(ct) == IPPROTO_TCP)
+   nf_ct_set_tcp_be_liberal(ct);

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [net-next] netfiler: conntrack: Add the option to set ct tcp flag - BE_LIBERAL per-ct basis.

2020-11-10 Thread Florian Westphal
Numan Siddique  wrote:
> On Tue, Nov 10, 2020 at 3:06 AM Florian Westphal  wrote:
> Thanks for the comments. I actually tried this approach first, but it
> doesn't seem to work.
> I noticed that for the committed connections, the ct tcp flag -
> IP_CT_TCP_FLAG_BE_LIBERAL is
> not set when nf_conntrack_in() calls resolve_normal_ct().

Yes, it won't be set during nf_conntrack_in, thats why I suggested
to do it before confirming the connection.

> Would you expect that the tcp ct flags should have been preserved once
> the connection is committed ?

Yes, they are preserved when you set them after nf_conntrack_in(), else
we would already have trouble with hw flow offloading which needs to
turn off window checks as well.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [net-next] netfiler: conntrack: Add the option to set ct tcp flag - BE_LIBERAL per-ct basis.

2020-11-10 Thread Florian Westphal
Numan Siddique  wrote:
> On Tue, Nov 10, 2020 at 5:55 PM Florian Westphal  wrote:
> >
> > Numan Siddique  wrote:
> > > On Tue, Nov 10, 2020 at 3:06 AM Florian Westphal  wrote:
> > > Thanks for the comments. I actually tried this approach first, but it
> > > doesn't seem to work.
> > > I noticed that for the committed connections, the ct tcp flag -
> > > IP_CT_TCP_FLAG_BE_LIBERAL is
> > > not set when nf_conntrack_in() calls resolve_normal_ct().
> >
> > Yes, it won't be set during nf_conntrack_in, thats why I suggested
> > to do it before confirming the connection.
> 
> Sorry for the confusion. What I mean is - I tested  your suggestion - i.e 
> called
> nf_ct_set_tcp_be_liberal()  before calling nf_conntrack_confirm().
> 
>  Once the connection is established, for subsequent packets, openvswitch
>  calls nf_conntrack_in() [1] to see if the packet is part of the
> existing connection or not (i.e ct.new or ct.est )
> and if the packet happens to be out-of-window then skb->_nfct is set
> to NULL. And the tcp
> be flags set during confirmation are not reflected when
> nf_conntrack_in() calls resolve_normal_ct().

Can you debug where this happens?  This looks very very wrong.
resolve_normal_ct() has no business to check any of those flags
(and I don't see where it uses them, it should only deal with the
 tuples).

The flags come into play when nf_conntrack_handle_packet() gets called
after resolve_normal_ct has found an entry, since that will end up
calling the tcp conntrack part.

The entry found/returned by resolve_normal_ct should be the same
nf_conn entry that was confirmed earlier, i.e. it should be in "liberal"
mode.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: Be liberal in tcp conntrack.

2020-11-19 Thread Florian Westphal
Jakub Kicinski  wrote:
> On Mon, 16 Nov 2020 18:31:26 +0530 nusid...@redhat.com wrote:
> > From: Numan Siddique 
> > 
> > There is no easy way to distinguish if a conntracked tcp packet is
> > marked invalid because of tcp_in_window() check error or because
> > it doesn't belong to an existing connection. With this patch,
> > openvswitch sets liberal tcp flag for the established sessions so
> > that out of window packets are not marked invalid.
> > 
> > A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
> > sets this flag for both the directions of the nf_conn.
> > 
> > Suggested-by: Florian Westphal 
> > Signed-off-by: Numan Siddique 
> 
> Florian, LGTY?

Sorry, this one sailed past me.

Acked-by: Florian Westphal 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)

2019-05-08 Thread Florian Westphal
Geert Uytterhoeven  wrote:
> Commit 4806e975729f99c7 ("netfilter: replace NF_NAT_NEEDED with
> IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user
> popped up afterwards.

Thnaks for spotting this.

Acked-by: Florian Westphal 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Florian Westphal
Xin Long  wrote:
> > master connection only if it is not yet confirmed.  Users may commit 
> > different
> > labels for the related connection.  This should be more in line with the
> > previous behavior.
> >
> > What do you think?
> >
> You're right.
> Also, I noticed the related ct->mark is set to master ct->mark in
> init_conntrack() as well as secmark when creating the related ct.
> 
> Hi, Florian,
> 
> Any reason why the labels are not set to master ct's in there?

The intent was to have lables be set only via ctnetlink (userspace)
or ruleset.

The original use case was for tagging connections based on
observed behaviour/properties at a later time, not at start of flow.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Florian Westphal
Xin Long  wrote:
> Got it, I will fix this in ovs.

Thanks!

I don't want to throw more work at you, but since you are
already working on ovs+conntrack:

ovs_ct_init() is bad, as this runs unconditionally.

modprobe openvswitch -> conntrack becomes active in all
existing and future namespaces.

Conntrack is slow, we should avoid this behaviour.

ovs_ct_limit_init() should be called only when the feature is
configured (the problematic call is nf_conncount_init, as that
turns on connection tracking, the kmalloc etc is fine).

Likewise, nf_connlabels_get() should only be called when
labels are added/configured, not at the start.

I resolved this for tc in 70f06c115bcc but it seems i never
got around to fix it for ovs.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next] openvswitch: prepare for stolen verdict coming from conntrack and nat engine

2024-07-03 Thread Florian Westphal
At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP
verdict with NF_DROP_REASON() helper,

This helper releases the skb instantly (so drop_monitor can pinpoint
precise location) and returns NF_STOLEN.

Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.

Signed-off-by: Florian Westphal 
---
 net/openvswitch/conntrack.c | 47 +
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 3b980bf2770b..8eb1d644b741 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -679,6 +679,8 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
action |= BIT(NF_NAT_MANIP_DST);
 
err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit);
+   if (err != NF_ACCEPT)
+   return err;
 
if (action & BIT(NF_NAT_MANIP_SRC))
ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC);
@@ -697,6 +699,22 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 }
 #endif
 
+static int verdict_to_errno(unsigned int verdict)
+{
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
+   return 0;
+   case NF_DROP:
+   return -EINVAL;
+   case NF_STOLEN:
+   return -EINPROGRESS;
+   default:
+   break;
+   }
+
+   return -EINVAL;
+}
+
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
@@ -735,7 +753,7 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 
err = nf_conntrack_in(skb, &state);
if (err != NF_ACCEPT)
-   return -ENOENT;
+   return verdict_to_errno(err);
 
/* Clear CT state NAT flags to mark that we have not yet done
 * NAT after the nf_conntrack_in() call.  We can actually clear
@@ -762,9 +780,12 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 * the key->ct_state.
 */
if (info->nat && !(key->ct_state & OVS_CS_F_NAT_MASK) &&
-   (nf_ct_is_confirmed(ct) || info->commit) &&
-   ovs_ct_nat(net, key, info, skb, ct, ctinfo) != NF_ACCEPT) {
-   return -EINVAL;
+   (nf_ct_is_confirmed(ct) || info->commit)) {
+   int err = ovs_ct_nat(net, key, info, skb, ct, ctinfo);
+
+   err = verdict_to_errno(err);
+   if (err)
+   return err;
}
 
/* Userspace may decide to perform a ct lookup without a helper
@@ -795,9 +816,12 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 * - When committing an unconfirmed connection.
 */
if ((nf_ct_is_confirmed(ct) ? !cached || add_helper :
- info->commit) &&
-   nf_ct_helper(skb, ct, ctinfo, info->family) != NF_ACCEPT) {
-   return -EINVAL;
+ info->commit)) {
+   int err = nf_ct_helper(skb, ct, ctinfo, info->family);
+
+   err = verdict_to_errno(err);
+   if (err)
+   return err;
}
 
if (nf_ct_protonum(ct) == IPPROTO_TCP &&
@@ -1001,10 +1025,9 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
/* This will take care of sending queued events even if the connection
 * is already confirmed.
 */
-   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
-   return -EINVAL;
+   err = nf_conntrack_confirm(skb);
 
-   return 0;
+   return verdict_to_errno(err);
 }
 
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
@@ -1039,6 +1062,10 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
else
err = ovs_ct_lookup(net, key, info, skb);
 
+   /* conntrack core returned NF_STOLEN */
+   if (err == -EINPROGRESS)
+   return err;
+
skb_push_rcsum(skb, nh_ofs);
if (err)
ovs_kfree_skb_reason(skb, OVS_DROP_CONNTRACK);
-- 
2.44.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] openvswitch: prepare for stolen verdict coming from conntrack and nat engine

2024-07-03 Thread Florian Westphal
Aaron Conole  wrote:
> > verdict with NF_DROP_REASON() helper,
> >
> > This helper releases the skb instantly (so drop_monitor can pinpoint
> > precise location) and returns NF_STOLEN.
> >
> > Prepare call sites to deal with this before introducing such changes
> > in conntrack and nat core.
> >
> > Signed-off-by: Florian Westphal 
> > ---
> 
> AFAIU, these changes are only impacting the existing NF_DROP cases, and
> won't impact how ovs + netfilter communicate about invalid packets.  One
> important thing to note is that we rely on:
> 
>  * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
>  * set to NULL and 0 will be returned.

Right, this is about how to communicate 'packet dropped'.

NF_DROP means 'please call kfree_skb for me'.  Problem from introspection point
of view is that drop monitor will blame nf_hook_slow() (for netfilter)
and ovs resp. act_ct for the drop.

Plan is to allow conntrack/nat engine to return STOLEN verdict ("skb
might have been free'd already").

Example change:
@@ -52,10 +53,8 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int 
hooknum,
rt = skb_rtable(skb);
nh = rt_nexthop(rt, ip_hdr(skb)->daddr);
newsrc = inet_select_addr(out, nh, RT_SCOPE_UNIVERSE);
-   if (!newsrc) {
-   pr_info("%s ate my IP address\n", out->name);
-   return NF_DROP;
-   }
+   if (!newsrc)
+   return NF_DROP_REASON(skb, SKB_DROP_REASON_NETFILTER_DROP, 
EADDRNOTAVAIL);


Where NF_DROP_REASON() is:

static __always_inline int
NF_DROP_REASON(struct sk_buff *skb, enum skb_drop_reason reason, u32 err)
{
BUILD_BUG_ON(err > 0x);

kfree_skb_reason(skb, reason);

return ((err << 16) | NF_STOLEN);
}

So drop monitoring tools will blame nf_nat_masquerade.c:nf_nat_masquerade_ipv4 
and not
the consumer of the NF_DROP verdict.

I can't make such changes ATM because ovs and act_ct assume conntrack
returns only ACCEPT and DROP, so we'd get double-free.  Hope that makes
sense.

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Florian Westphal
Xin Long  wrote:
> I can avoid this warning by not allocating ext for commit ct in ovs:
> 
> @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
> struct sw_flow_key *key,
> struct nf_conn_labels *cl;
> int err;
> 
> -   cl = ovs_ct_get_conn_labels(ct);
> +   cl = nf_ct_labels_find(ct);
> if (!cl)
> return -ENOSPC;
> 
> However, the test case would fail, although the failure can be worked around
> by setting ct_label in the 1st rule:
> 
>   
> table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1)
> 
> So I'm worrying our change may break some existing OVS user cases.

Then ovs_ct_limit_init() and nf_connlabels_get() need to be called
once on the first conntrack operatation, regardless if labels are asked
for or not.

Not nice but still better than current state.

ovs_ct_execute() perhaps?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Florian Westphal
Xin Long  wrote:
> > Xin Long  wrote:
> > > I can avoid this warning by not allocating ext for commit ct in ovs:
> > >
> > > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
> > > struct sw_flow_key *key,
> > > struct nf_conn_labels *cl;
> > > int err;
> > >
> > > -   cl = ovs_ct_get_conn_labels(ct);
> > > +   cl = nf_ct_labels_find(ct);
> > > if (!cl)
> > > return -ENOSPC;
> ovs_ct_get_conn_labels() must be replaced with nf_ct_labels_find() in here
> anyway, thinking that the confirmed ct without labels was created in other
> places (not by OVS conntrack), the warning may still be triggered when
> trying to set labels in OVS after.

Right.

> > Then ovs_ct_limit_init() and nf_connlabels_get() need to be called
> > once on the first conntrack operatation, regardless if labels are asked
> > for or not.
> >
> > Not nice but still better than current state.
> Right, not nice, it undermines the bits check against NF_CT_LABELS_MAX_SIZE.

Yes, but OTOH this bit check isn't used anymore, it comes from days when
label area was dynamically sized, today is hardcoded to 128 anyway.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 net-next] openvswitch: switch to per-action label counting in conntrack

2024-08-13 Thread Florian Westphal
Xin Long  wrote:
> Similar to commit 70f06c115bcc ("sched: act_ct: switch to per-action
> label counting"), we should also switch to per-action label counting
> in openvswitch conntrack, as Florian suggested.
> 
> The difference is that nf_connlabels_get() is called unconditionally
> when creating an ct action in ovs_ct_copy_action(). As with these
> flows:
> 
>   table=0,ip,actions=ct(commit,table=1)
>   table=1,ip,actions=ct(commit,exec(set_field:0xac->ct_label),table=2)
> 
> it needs to make sure the label ext is created in the 1st flow before
> the ct is committed in ovs_ct_commit(). Otherwise, the warning in
> nf_ct_ext_add() when creating the label ext in the 2nd flow will
> be triggered:

With this and 
https://patchwork.ozlabs.org/project/netfilter-devel/patch/7380c37e2d58a93164b7f2212c90cd23f9d910f8.1721268584.git.lucien@gmail.com/
applied new netns doesn't have conntrack enabled anymore, so

Acked-by: Florian Westphal 

Thanks Xinlong!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-18 Thread Florian Westphal
Jakub Kicinski  wrote:
> On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> > With the OVS upcall, the original ct in the skb will be dropped, and when
> > the skb comes back from userspace it has to create a new ct again through
> > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> > 
> > However, the new ct will not be able to have the exp as the original ct
> > has taken it away from the hash table in nf_ct_find_expectation(). This
> > will cause some flow never to be matched, like:
> > 
> >   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> > 
> > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> > matched.
> > 
> > OVS conntrack works around this by adding its own exp lookup function to
> > not remove the exp from the hash table and saving the exp and its master
> > info to the flow keys instead of create a real ct. But this way doesn't
> > work for TC act_ct.
> > 
> > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> > allows both OVS conntrack and TC act_ct to have a simple and clear fix
> > for this problem in the patch 2/3 and 3/3.
> 
> Florian, Pablo, any opinion on these?

Sorry for the silence.  I dislike moving tc/ovs artifacts into
the conntrack core.

But so far I could not come up with a counter-proposal,
and it doesn't look too outrageous.

Let me look at it again later today (its early morning here)
and I will get back to this in my late evening.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-19 Thread Florian Westphal
Florian Westphal  wrote:
> Jakub Kicinski  wrote:
> > On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> > > With the OVS upcall, the original ct in the skb will be dropped, and when
> > > the skb comes back from userspace it has to create a new ct again through
> > > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> > > 
> > > However, the new ct will not be able to have the exp as the original ct
> > > has taken it away from the hash table in nf_ct_find_expectation(). This
> > > will cause some flow never to be matched, like:
> > > 
> > >   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
> > >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
> > >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> > > 
> > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> > > matched.
> > > 
> > > OVS conntrack works around this by adding its own exp lookup function to
> > > not remove the exp from the hash table and saving the exp and its master
> > > info to the flow keys instead of create a real ct. But this way doesn't
> > > work for TC act_ct.
> > > 
> > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> > > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> > > allows both OVS conntrack and TC act_ct to have a simple and clear fix
> > > for this problem in the patch 2/3 and 3/3.
> > 
> > Florian, Pablo, any opinion on these?
> 
> Sorry for the silence.  I dislike moving tc/ovs artifacts into
> the conntrack core.

Can't find a better solution, feel free to take this though the net-next tree.

Acked-by: Florian Westphal 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs

2023-02-10 Thread Florian Westphal
Jakub Kicinski  wrote:
> On Tue,  7 Feb 2023 17:52:05 -0500 Xin Long wrote:
> > We've moved some duplicate code into nf_nat_ovs in:
> > 
> >   "net: eliminate the duplicate code in the ct nat functions of ovs and tc"
> > 
> > This patchset addresses more code duplication in the conntrack of ovs
> > and tc then creates nf_conntrack_ovs for them, and four functions will
> > be extracted and moved into it:
> > 
> >   nf_ct_handle_fragments()
> >   nf_ct_skb_network_trim()
> >   nf_ct_helper()
> >   nf_ct_add_helper()
> 
> Hi Pablo, do you prefer to take this or should we?

Looks like Pablo is very busy atm, I have no objections
if this is applied to net-next.

You may add
Acked-by: Florian Westphal 

if you like.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev