-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Håvard,

> Thanks for enduring my rant. :) Looking forward to see what you
> find.

Thanks again for bringing this to our attention, and for your analysis.
My grip on the problem has improved over the last few days. Although I
do not have a proper fix yet, I do know how to alleviate the pain in
the short term, and I can comment on how to recover from a situation
where zones get stuck.

First of all, as you have found out, there is a list of current TCP
connections. As long as the number of concurrent connections stays
below the size of this list, everything works as it should. When it
doesn't, the signerd takes a different code path, holding off those
connections for later. This code path doesn't work as it should.
Judging from your analysis you are well aware of this.
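
To make the idea concrete, here is a rough sketch of a fixed-size
connection set. It is an illustration only and not the signer's code:
the struct, the function names and SKETCH_TCPSET_MAX are all made up
for this example. The point is just that once every slot is taken, the
caller gets -1 and has to defer the connection, which is the path that
misbehaves.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real limit; not the signer's define. */
#define SKETCH_TCPSET_MAX 50

/* Illustrative fixed-size set of connection slots. */
struct conn_set {
    int in_use[SKETCH_TCPSET_MAX]; /* 1 if the slot holds a connection */
    size_t count;                  /* number of slots currently taken */
};

/* Claim a free slot. Returns the slot index, or -1 when the set is
 * full and the caller has to hold the connection off for later. */
static int conn_set_acquire(struct conn_set *set)
{
    size_t i;

    if (set->count >= SKETCH_TCPSET_MAX) {
        return -1;
    }
    for (i = 0; i < SKETCH_TCPSET_MAX; i++) {
        if (!set->in_use[i]) {
            set->in_use[i] = 1;
            set->count++;
            return (int)i;
        }
    }
    return -1; /* not reached while count stays consistent */
}

/* Give a slot back so a later connection can use it. */
static void conn_set_release(struct conn_set *set, int slot)
{
    if (slot >= 0 && slot < SKETCH_TCPSET_MAX && set->in_use[slot]) {
        set->in_use[slot] = 0;
        set->count--;
    }
}
```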

So, to cut to the chase: based on my testing, your troubles should go
away in the short term if you increase the value of this define in
tcpset.h:

#define TCPSET_MAX 50

Make this something on the order of the number of zones you are adding
at once. I'd stay a bit below 1024 to leave the signerd some room for
other file descriptors. So I'd advise maybe 500 to 900.
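
If you want to check the headroom on your own machine, something along
these lines could help. It is a sketch under assumptions: I am assuming
the relevant ceiling is the process's soft limit on open files, and the
helper name and the 'reserve' parameter are invented for this example
(they are not part of ODS). It only uses the POSIX getrlimit() call.

```c
#include <sys/resource.h>

/* Hypothetical helper: does tcpset_max leave at least `reserve` file
 * descriptors free under the current soft RLIMIT_NOFILE?
 * Returns 1 if it fits, 0 if it does not, -1 if the limit is unknown. */
static int tcpset_max_fits(unsigned long tcpset_max, unsigned long reserve)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 1; /* no soft limit configured */
    }
    return tcpset_max + reserve <= (unsigned long)lim.rlim_cur;
}
```

For example, tcpset_max_fits(900, 100) asks whether 900 connection
slots still leave 100 descriptors for everything else.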

Then there is also some SOA handling that behaves a bit inconsistently
depending on the number of notifies coming in. I recommend applying
the attached patch: serial_handling.diff
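
Roughly, the patch makes every notify go through one three-way
decision. The sketch below restates that decision as a standalone
function so it can be read (and tested) in isolation. The enum, the
function names and the serial comparison are my own stand-ins: I am
assuming util_serial_gt() behaves like an RFC 1982 style serial
comparison, and the real code also takes a lock, logs and talks to
xfrd, all of which is left out here.

```c
#include <stdint.h>

/* Hypothetical outcome labels; the patch itself has no such enum. */
enum notify_action {
    NOTIFY_IGNORE_ON_DISK,     /* serial on disk is already as new */
    NOTIFY_IGNORE_IN_PROGRESS, /* a transfer was already requested */
    NOTIFY_FORWARD             /* record the serial, forward to xfrd */
};

/* Assumed stand-in for util_serial_gt(): RFC 1982 style "a newer than
 * b" for 32-bit serials, so wrap-around is handled. */
static int serial_gt(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* The decision order after the patch: compare against the serial on
 * disk first, then check whether a notify is already being handled. */
static enum notify_action classify_notify(uint32_t serial,
    uint32_t serial_disk, int notify_acquired)
{
    if (!serial_gt(serial, serial_disk)) {
        return NOTIFY_IGNORE_ON_DISK;
    }
    if (notify_acquired) {
        return NOTIFY_IGNORE_IN_PROGRESS;
    }
    return NOTIFY_FORWARD;
}
```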

We'd appreciate it if you could test these changes. We have not yet
decided whether to release this or to wait until we have found a
proper fix. Some test feedback would be awesome!

Lastly, I'd like to address 'getting unstuck'. You mentioned that
restarting the signer and removing temp files helps partially. I can
clarify this a bit. In my tests I've seen two scenarios: 1) N zones
were added to ODS, where N > TCPSET_MAX. 2) N zones received a notify,
where N > TCPSET_MAX.

1) For me, in the first scenario stopping and starting the signer
helps, though the signerd will get stuck again after the next
TCPSET_MAX connections. So when adding 320 zones with TCPSET_MAX at 50
you'd have to go through 7 stop/start iterations. Of course, these
zones will likely update at the same time later on, in which case you
end up in situation 2).

2) This time restarting alone does not work; the stuckness is
persistent. What helps is stopping the signer, removing the tmp files
like you did, starting the signer, and then applying the stop/start
strategy from scenario 1).
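
The number of stop/start rounds in scenario 1) is just a rounded-up
division of the zone count by TCPSET_MAX. As a small worked example
(the function name is made up for illustration):

```c
/* Hypothetical helper: how many stop/start rounds are needed when each
 * round gets at most tcpset_max zones through before getting stuck. */
static unsigned restart_rounds(unsigned zones, unsigned tcpset_max)
{
    if (tcpset_max == 0) {
        return 0; /* no meaningful answer for an empty connection list */
    }
    /* integer ceiling of zones / tcpset_max */
    return zones / tcpset_max + (zones % tcpset_max != 0);
}
```

With 320 zones and the TCPSET_MAX of 50 shown earlier this gives 7;
with TCPSET_MAX raised to 500 it gives 1.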

These two 'fixes' work better for higher values of TCPSET_MAX. I don't
really see a disadvantage to raising it. You'll use a couple of
kilobytes more memory on the heap; you won't see it in top. ;)

In the meantime we'll keep looking into an actual fix. I hope, though,
that these suggestions will relieve some of your pain.

Regards,
Yuri
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlZytM4ACgkQI3PTR4mhavi6zwCfddnqXoq7JKtr3HgcBgsy7XiZ
z9YAoJmpZgQnuNxMJ+ZpnxDrYj9T8cwI
=QqUh
-----END PGP SIGNATURE-----
commit 88bcdcf507c415f85f7aff9ca1cfd985aa5dd658
Author: Yuri Schaeffer <[email protected]>
Date:   Tue Dec 15 17:09:35 2015 +0100

    Make handling of notifies more consistent. Previous implementation would
    hop back and forth between code paths.

diff --git a/signer/src/wire/query.c b/signer/src/wire/query.c
index e0ae910..965996e 100644
--- a/signer/src/wire/query.c
+++ b/signer/src/wire/query.c
@@ -354,56 +354,47 @@ query_process_notify(query_type* q, ldns_rr_type qtype, void* engine)
             return QUERY_DISCARDED;
         }
         lock_basic_lock(&q->zone->xfrd->serial_lock);
-        if (q->zone->xfrd->serial_notify_acquired) {
-            if (!util_serial_gt(q->zone->xfrd->serial_notify,
-                q->zone->xfrd->serial_disk)) {
-
-                if (addr2ip(q->addr, address, sizeof(address))) {
-                    ods_log_info("[%s] ignore notify from %s: already got "
-                        "zone %s serial %u on disk (received %u)", query_str,
-                        address, q->zone->name, q->zone->xfrd->serial_disk,
-                        q->zone->xfrd->serial_notify);
-                } else {
-                    ods_log_info("[%s] ignore notify: already got zone %s "
-                        "serial %u on disk (received %u)", query_str,
-                        q->zone->name, q->zone->xfrd->serial_disk,
-                        q->zone->xfrd->serial_notify);
-                }
-                q->zone->xfrd->serial_notify_acquired = 0;
+
+        if (!util_serial_gt(serial, q->zone->xfrd->serial_disk))
+        {
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_info("[%s] ignore notify from %s: already got "
+                    "zone %s serial %u on disk (received %u)", query_str,
+                    address, q->zone->name, q->zone->xfrd->serial_disk,
+                    serial);
             } else {
-                if (addr2ip(q->addr, address, sizeof(address))) {
-                    ods_log_info("[%s] ignore notify from %s: zone %s "
-                        "transfer in progress", query_str, address,
-                        q->zone->name);
-                } else {
-                    ods_log_info("[%s] ignore notify: zone %s transfer in "
-                        "progress", query_str, q->zone->name);
-                }
-                /* update values */
-                q->zone->xfrd->serial_notify = serial;
-                q->zone->xfrd->serial_notify_acquired = time_now();
+                ods_log_info("[%s] ignore notify: already got zone %s "
+                    "serial %u on disk (received %u)", query_str,
+                    q->zone->name, q->zone->xfrd->serial_disk, serial);
             }
             lock_basic_unlock(&q->zone->xfrd->serial_lock);
-            goto send_notify_ok;
+        } else if (q->zone->xfrd->serial_notify_acquired) {
+            lock_basic_unlock(&q->zone->xfrd->serial_lock);
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_info("[%s] ignore notify from %s: zone %s "
+                    "transfer in progress", query_str, address,
+                    q->zone->name);
+            } else {
+                ods_log_info("[%s] ignore notify: zone %s transfer in "
+                    "progress", query_str, q->zone->name);
+            }
+        } else {
+            q->zone->xfrd->serial_notify = serial;
+            q->zone->xfrd->serial_notify_acquired = time_now();
+            lock_basic_unlock(&q->zone->xfrd->serial_lock);
+            /* forward notify to xfrd */
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_verbose("[%s] forward notify for zone %s from client %s",
+                    query_str, q->zone->name, address);
+            } else {
+                ods_log_verbose("[%s] forward notify for zone %s", query_str,
+                    q->zone->name);
+            }
+            xfrd_set_timer_now(q->zone->xfrd);
+            dnshandler_fwd_notify(e->dnshandler, buffer_begin(q->buffer),
+                buffer_remaining(q->buffer));
         }
-        q->zone->xfrd->serial_notify = serial;
-        q->zone->xfrd->serial_notify_acquired = time_now();
-        lock_basic_unlock(&q->zone->xfrd->serial_lock);
-    }
-
-    /* forward notify to xfrd */
-    if (addr2ip(q->addr, address, sizeof(address))) {
-        ods_log_verbose("[%s] forward notify for zone %s from client %s",
-            query_str, q->zone->name, address);
-    } else {
-        ods_log_verbose("[%s] forward notify for zone %s", query_str,
-            q->zone->name);
     }
-    xfrd_set_timer_now(q->zone->xfrd);
-    dnshandler_fwd_notify(e->dnshandler, buffer_begin(q->buffer),
-        buffer_remaining(q->buffer));
-
-send_notify_ok:
     /* send notify ok */
     buffer_pkt_set_qr(q->buffer);
     buffer_pkt_set_aa(q->buffer);

Attachment: serial_handling.diff.sig
Description: PGP signature

_______________________________________________
Opendnssec-user mailing list
[email protected]
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
