On 11/6/25 2:39 AM, Florian Westphal wrote:
Fernando Fernandez Mancera <[email protected]> wrote:
Since commit d265929930e2 ("netfilter: nf_conncount: reduce unnecessary
GC") if nftables/iptables connlimit is used without a check for new
connections there can be duplicated entries tracked.
Pass the nf_conn struct directly to the nf_conncount API and check
whether the connection is confirmed or not inside nf_conncount_add(). If
the connection is confirmed, skip it.
I think there is a bit too much noise here, can this be
split in several chunks?
Not sure, I could try but the noise comes from removing zone and tuple
which requires many changes around. Otherwise this would compile with
warnings/errors. I am not sure I can split this more.
-unsigned int nf_conncount_count(struct net *net,
+unsigned int nf_conncount_count(struct net *net, const struct nf_conn *ct,
struct nf_conncount_data *data,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone);
+ const u32 *key);
nf_conn *ct has pitfalls that I did not realize earlier.
Current code is also buggy in that regard.
Maybe we need additional function to ease this for callers.
See below.
+static int __nf_conncount_add(const struct nf_conn *ct,
+ struct nf_conncount_list *list)
{
const struct nf_conntrack_tuple_hash *found;
struct nf_conncount_tuple *conn, *conn_n;
+ const struct nf_conntrack_tuple *tuple;
+ const struct nf_conntrack_zone *zone;
struct nf_conn *found_ct;
unsigned int collect = 0;
+ if (!ct || nf_ct_is_confirmed(ct))
+ return -EINVAL;
+
When is the caller expected to pass a NULL ct?
Maybe this just misses a comment.
No, I don't think the caller is expected to pass a NULL ct. That should
be removed.
index fc35a11cdca2..e815c0235b62 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -24,28 +24,35 @@ static inline void nft_connlimit_do_eval(struct
nft_connlimit *priv,
const struct nft_pktinfo *pkt,
const struct nft_set_ext *ext)
{
- const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
- const struct nf_conntrack_tuple *tuple_ptr;
+ struct nf_conntrack_tuple_hash *h;
struct nf_conntrack_tuple tuple;
enum ip_conntrack_info ctinfo;
const struct nf_conn *ct;
unsigned int count;
-
- tuple_ptr = &tuple;
+ int err;
ct = nf_ct_get(pkt->skb, &ctinfo);
- if (ct != NULL) {
- tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
- zone = nf_ct_zone(ct);
- } else if (!nf_ct_get_tuplepr(pkt->skb, skb_network_offset(pkt->skb),
- nft_pf(pkt), nft_net(pkt), &tuple)) {
- regs->verdict.code = NF_DROP;
- return;
+ if (!ct) {
+ if (!nf_ct_get_tuplepr(pkt->skb, skb_network_offset(pkt->skb),
+ nft_pf(pkt), nft_net(pkt), &tuple)) {
+ regs->verdict.code = NF_DROP;
+ return;
+ }
+
+ h = nf_conntrack_find_get(nft_net(pkt), &nf_ct_zone_dflt,
&tuple);
+ if (!h) {
+ regs->verdict.code = NF_DROP;
+ return;
+ }
+ ct = nf_ct_tuplehash_to_ctrack(h);
This ct had its refcount incremented.
Thank you! Will fix it.
I also see that this shared copypaste with xtables.
Would it be possible to pass 'const struct sk_buff *'
and let the nf_conncount core handle this internally?
nf_conncount_add(net, pf, skb, priv->list);
which does:
ct = nf_ct_get(skb, &ctinfo);
if (ct && !nf_ct_is_template(ct))
return __nf_conncount_add(ct, list);
if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), pf, net,
&tuple))
return -ERR;
if (ct) /* its a template, so do lookup in right zone */
zone = nf_ct_zone(ct);
else
zone = &nf_ct_zone_dflt;
h = nf_conntrack_find_get(nft_net(pkt), zone, &tuple);
if (!h)
return -ERR;
ct = nf_ct_tuplehash_to_ctrack(h);
err = __nf_conncount_add(ct, list);
nf_ct_put(ct):
return err;
I.e., the existing nf_conncount_add() becomes a helper that takes
a ct, as you have already proposed, but its renamed and turned into
an internal helper so frontends don't need to duplicate tuple lookup.
Alternatively, no need to rename it and instead add a new API call
that takes the ct argument, e.g. 'nf_conncount_add_ct' or whatever,
and then make nf_conncount_add() internal in a followup patch.
Unfortunately, I do not think this is possible. xt_connlimit is using
the rbtree with nf_conncount_count() while nft_connlimit isn't. I
believe we do not want to change that. In addition, for the rbtree we
need to calculate the key.. I would leave this code as duplicated given
that is shared only between xt_connlimit and nft_connlimit. Openvswitch
doesn't care about this as they always call nf_conncount_count() while
holding a reference to a ct..
+ h = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple);
We should not restrict the lookup to the default zone,
but we should follow what the template (if any) says.
I under-estimated the work to get this right, I think this is too much
code to copypaste between nft+xt frontends. Sorry for making that
suggestion originally.
No worries at all, I think there is some benefit from this change even
with the copy-paste. Maybe we can create a helper function to just get
the ct from the sk_buff.. what about "nf_conntrack_get_or_find()"? I
accept suggestions for a better name :)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev