27.04.2022 17:40, Vitaliy Makkoveev wrote:
On Mon, Apr 25, 2022 at 06:09:13PM +0200, Alexander Bluhm wrote:
On Mon, Apr 25, 2022 at 05:49:49PM +0300, Vitaliy Makkoveev wrote:
I would suggest to replace the timeout garbage collector with
reference counting.  This diff is only compile tested as I have no
setup for IPsec ids yet.  Should work on 7.1 an -current.

bluhm
Your diff is broken: it keeps `id_gc_list' as part of 'ipsec_ids' but
still removes `ids' from this non-existent list:
This happens when I send code that I have not tested.  Updated diff
below.

Also, it modifies existing behaviour: we stop reusing dead `ids' for
some period after their removal. I don't know why we do this; maybe
someone involved with this layer could explain.
I will ask markus@

bluhm

I found the race which triggered the assertion. We do the atomic
decrement of `id_refcount' without the `ipsec_flows_mtx' mutex(9) held:

1288 ipsp_ids_free(struct ipsec_ids *ids)
1289 {
1290         if (ids == NULL)
1291                 return;
1292
1293         /*
1294          * If the refcount becomes zero, then a timeout is started. This
1295          * timeout must be cancelled if refcount is increased from zero.
1296          */
1297         DPRINTF("ids %p count %d", ids, ids->id_refcount);
1298         KASSERT(ids->id_refcount > 0);
1299
1300         if (atomic_dec_int_nv(&ids->id_refcount) > 0)
1301                 return;
1302
1303         mtx_enter(&ipsec_flows_mtx);
1304
1305         /*
1306          * Add second for the case ipsp_ids_gc() is already running and
1307          * awaits netlock to be released.
1308          */
1309         ids->id_gc_ttl = ipsec_ids_idle + 1;
1310
1311         if (LIST_EMPTY(&ipsp_ids_gc_list))
1312                 timeout_add_sec(&ipsp_ids_gc_timeout, 1);
1313         LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);


So we have this `ids' linked into the `ipsec_ids_tree' but not yet
linked into the `id_gc_list'. This `ids' has a zeroed `id_refcount'.
Also, the `ipsec_flows_mtx' mutex(9) is not yet locked, before line 1303.

On another CPU we perform ipsp_ids_insert(), so we lock the
`ipsec_flows_mtx' mutex(9). If we win the race, the ipsp_ids_free()
thread will wait for us. We search the `ipsec_ids_tree' tree, where this
`ids' is still linked. Its `id_refcount' is zero, so we restore and
reuse it, bumping its `id_refcount'. But this `ids' is not yet linked to
the `id_gc_list' list, so we break the list at line 1209:

1198 ipsp_ids_insert(struct ipsec_ids *ids)
1199 {
1200         struct ipsec_ids *found;
1201         u_int32_t start_flow;
1202
1203         mtx_enter(&ipsec_flows_mtx);
1204
1205         found = RBT_INSERT(ipsec_ids_tree, &ipsec_ids_tree, ids);
1206         if (found) {
1207                 /* if refcount was zero, then timeout is running */
1208                 if (atomic_inc_int_nv(&found->id_refcount) == 1) {
1209                         LIST_REMOVE(found, id_gc_list);

When we release the `ipsec_flows_mtx' mutex(9), the ipsp_ids_free()
thread continues its work and places this `ids' on the `id_gc_list',
even though it is in use again.
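
To make the window explicit, here is the interleaving as a timeline
(line numbers refer to the excerpts above; this is an illustration of
the race, not code from the tree):

    CPU A: ipsp_ids_free()              CPU B: ipsp_ids_insert()
    ----------------------              ------------------------
    1300 atomic_dec_int_nv() -> 0
                                        1203 mtx_enter(&ipsec_flows_mtx)
                                        1205 RBT_INSERT() returns the
                                             dying `ids' as `found'
                                        1208 atomic_inc_int_nv() -> 1
                                        1209 LIST_REMOVE() on an `ids'
                                             that was never put on
                                             `ipsp_ids_gc_list'
                                             mtx_leave(&ipsec_flows_mtx)
    1303 mtx_enter(&ipsec_flows_mtx)
    1311 LIST_INSERT_HEAD() queues the
         revived `ids' for garbage
         collection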

This diff fixes the race and keeps the current behaviour. The
`id_refcount' decrement and the placement of `ids' on the `id_gc_list'
must be consistent and protected by the same lock. Since all
`id_refcount' modifications are then protected by the `ipsec_flows_mtx'
mutex(9), there is no reason to keep them atomic.
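
For illustration, here is a minimal sketch of the resulting pattern,
using hypothetical `obj' names rather than the actual kernel code. The
point is that the decrement and the gc-list insertion form one critical
section, so a concurrent reviver either sees `refcount > 0' or finds
the object already on the gc list:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/queue.h>
#include <sys/mutex.h>

struct obj {
	LIST_ENTRY(obj)	 gc_entry;
	u_int		 refcount;	/* [F] protected by obj_mtx */
};

struct mutex obj_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
LIST_HEAD(, obj) obj_gc_list = LIST_HEAD_INITIALIZER(obj_gc_list);

void
obj_release(struct obj *o)
{
	mtx_enter(&obj_mtx);
	/* A plain decrement suffices: every modification holds obj_mtx. */
	if (--o->refcount > 0) {
		mtx_leave(&obj_mtx);
		return;
	}
	/*
	 * The refcount reached zero.  Queue the object for gc inside
	 * the same critical section, so a reviver can never observe
	 * refcount == 0 while the object is missing from the gc list.
	 */
	LIST_INSERT_HEAD(&obj_gc_list, o, gc_entry);
	mtx_leave(&obj_mtx);
}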

kasak, could you test this diff?

Am I supposed to revert Alexander's patch first? Because with both patches applied, a hunk failed.

I have reverted the first patch, successfully applied your patch, recompiled the kernel and am now testing. As I said earlier, it will take a couple of days.


Index: sys/netinet/ip_ipsp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ipsp.c,v
retrieving revision 1.269
diff -u -p -r1.269 ip_ipsp.c
--- sys/netinet/ip_ipsp.c       10 Mar 2022 15:21:08 -0000      1.269
+++ sys/netinet/ip_ipsp.c       27 Apr 2022 14:36:26 -0000
@@ -1205,7 +1205,7 @@ ipsp_ids_insert(struct ipsec_ids *ids)
        found = RBT_INSERT(ipsec_ids_tree, &ipsec_ids_tree, ids);
        if (found) {
                /* if refcount was zero, then timeout is running */
-               if (atomic_inc_int_nv(&found->id_refcount) == 1) {
+               if ((++found->id_refcount) == 1) {
                        LIST_REMOVE(found, id_gc_list);
                        if (LIST_EMPTY(&ipsp_ids_gc_list))
@@ -1248,7 +1248,7 @@ ipsp_ids_lookup(u_int32_t ipsecflowinfo)
        mtx_enter(&ipsec_flows_mtx);
        ids = RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
-       atomic_inc_int(&ids->id_refcount);
+       ids->id_refcount++;
        mtx_leave(&ipsec_flows_mtx);
        return ids;
@@ -1290,6 +1290,8 @@ ipsp_ids_free(struct ipsec_ids *ids)
        if (ids == NULL)
                return;
+       mtx_enter(&ipsec_flows_mtx);
+
        /*
         * If the refcount becomes zero, then a timeout is started. This
         * timeout must be cancelled if refcount is increased from zero.
@@ -1297,10 +1299,10 @@ ipsp_ids_free(struct ipsec_ids *ids)
        DPRINTF("ids %p count %d", ids, ids->id_refcount);
        KASSERT(ids->id_refcount > 0);
-       if (atomic_dec_int_nv(&ids->id_refcount) > 0)
+       if ((--ids->id_refcount) > 0) {
+               mtx_leave(&ipsec_flows_mtx);
                return;
-
-       mtx_enter(&ipsec_flows_mtx);
+       }
        /*
         * Add second for the case ipsp_ids_gc() is already running and
Index: sys/netinet/ip_ipsp.h
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ipsp.h,v
retrieving revision 1.238
diff -u -p -r1.238 ip_ipsp.h
--- sys/netinet/ip_ipsp.h       21 Apr 2022 15:22:50 -0000      1.238
+++ sys/netinet/ip_ipsp.h       27 Apr 2022 14:36:26 -0000
@@ -241,7 +241,7 @@ struct ipsec_ids {
        struct ipsec_id         *id_local;      /* [I] */
        struct ipsec_id         *id_remote;     /* [I] */
        u_int32_t               id_flow;        /* [I] */
-       u_int                   id_refcount;    /* [a] */
+       u_int                   id_refcount;    /* [F] */
        u_int                   id_gc_ttl;      /* [F] */
  };
  RBT_HEAD(ipsec_ids_flows, ipsec_ids);
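
As a side note, while testing one could make the new `[F]' rule
self-checking with a debug assertion in the paths that now rely on it
(a hypothetical addition, not part of the diff):

	/* hypothetical: assert the locking rule during development */
	MUTEX_ASSERT_LOCKED(&ipsec_flows_mtx);
	KASSERT(ids->id_refcount > 0);

mutex(9) provides MUTEX_ASSERT_LOCKED() for documenting and enforcing
such ownership rules.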
