-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Håvard,

> Thanks for enduring my rant. :) Looking forward to see what you
> find.

Thanks again for bringing this to our attention, and for your analysis.
My grip on the problem has improved over the last few days. Although I
do not have a proper fix yet, I do know how to alleviate the pain in the
short term, and I can comment on how to recover from a situation where
zones get stuck.

First of all, as you have found out, there is a list of current TCP
connections. As long as the number of concurrent connections stays below
the size of this list, everything works as it should. When it doesn't,
signerd executes a different code path that holds those connections off
for later. This code path doesn't work as it should. Judging from your
analysis you are well aware of this.

So to cut to the chase: based on my testing, in the short term your
troubles should go away if you increase the value of this define in
tcpset.h:

#define TCPSET_MAX 50

Make this something in the order of the number of zones you are adding
at once. I'd stay a bit below 1024 to leave signerd some room for other
file descriptors. So I'd advise maybe 500 to 900.

Then there is also some SOA serial handling that behaves a bit
inconsistently depending on the number of notifies coming in. I
recommend applying the attached patch: serial_handling.diff

We'd appreciate you testing these changes. We have not yet decided
whether we'll release this or wait until we have found a proper fix.
Some test feedback would be awesome!

Last, I'd like to address 'getting unstuck'. You mentioned that
restarting the signer and removing temp files helps partially. I can
clarify this a bit. In my tests I've seen two scenarios:

1) N zones were added to ODS, where N > TCPSET_MAX.
2) N zones received a notify, where N > TCPSET_MAX.

1) For me, in the first scenario stopping and starting the signer helps,
though signerd will get stuck again after the next TCPSET_MAX
connections. So when adding 320 zones with TCPSET_MAX at 50, you'd have
to go through 7 stop/start iterations. Of course, these zones will
likely update at the same time later on, in which case you end up in
situation 2).

2) This time restarting does not work. The stuckness is persistent.
What helps is stopping the signer, removing the tmp files like you did,
starting the signer, and then applying the stop/start strategy from
scenario 1).

These two 'fixes' work better for higher values of TCPSET_MAX. I don't
really see a disadvantage to raising it. You'll use a couple of
kilobytes more memory on the heap; you won't see it in top. ;)

In the meantime we'll keep looking into an actual fix. I hope, though,
that these suggestions will relieve some of your pain.

Regards,
Yuri
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlZytM4ACgkQI3PTR4mhavi6zwCfddnqXoq7JKtr3HgcBgsy7XiZ
z9YAoJmpZgQnuNxMJ+ZpnxDrYj9T8cwI
=QqUh
-----END PGP SIGNATURE-----
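
As a rough illustration of the stop/start arithmetic in scenario 1), here is a minimal sketch. It assumes each restart lets signerd handle at most TCPSET_MAX zones before it gets stuck again, which is how the 320-zone example in the mail works out. The helper name restart_iterations is hypothetical and not part of OpenDNSSEC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not OpenDNSSEC code): estimate how many stop/start
 * cycles are needed when every cycle gets at most tcpset_max zones through
 * before signerd stalls again. This is plain ceiling division. */
static size_t restart_iterations(size_t zones, size_t tcpset_max)
{
    assert(tcpset_max > 0);  /* a zero-sized connection list makes no sense */
    return (zones + tcpset_max - 1) / tcpset_max;
}
```

With 320 zones and TCPSET_MAX at 50 this gives 7 cycles; with TCPSET_MAX raised to 500 the same batch fits in a single pass, which is why a higher value makes both workarounds less painful.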
commit 88bcdcf507c415f85f7aff9ca1cfd985aa5dd658
Author: Yuri Schaeffer <[email protected]>
Date:   Tue Dec 15 17:09:35 2015 +0100

    Make handling of notifies more consistent.

    Previous implementation would hop back and forth between code paths.

diff --git a/signer/src/wire/query.c b/signer/src/wire/query.c
index e0ae910..965996e 100644
--- a/signer/src/wire/query.c
+++ b/signer/src/wire/query.c
@@ -354,56 +354,47 @@ query_process_notify(query_type* q, ldns_rr_type qtype, void* engine)
             return QUERY_DISCARDED;
         }
         lock_basic_lock(&q->zone->xfrd->serial_lock);
-        if (q->zone->xfrd->serial_notify_acquired) {
-            if (!util_serial_gt(q->zone->xfrd->serial_notify,
-                q->zone->xfrd->serial_disk)) {
-
-                if (addr2ip(q->addr, address, sizeof(address))) {
-                    ods_log_info("[%s] ignore notify from %s: already got "
-                        "zone %s serial %u on disk (received %u)", query_str,
-                        address, q->zone->name, q->zone->xfrd->serial_disk,
-                        q->zone->xfrd->serial_notify);
-                } else {
-                    ods_log_info("[%s] ignore notify: already got zone %s "
-                        "serial %u on disk (received %u)", query_str,
-                        q->zone->name, q->zone->xfrd->serial_disk,
-                        q->zone->xfrd->serial_notify);
-                }
-                q->zone->xfrd->serial_notify_acquired = 0;
+
+        if (!util_serial_gt(serial, q->zone->xfrd->serial_disk))
+        {
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_info("[%s] ignore notify from %s: already got "
+                    "zone %s serial %u on disk (received %u)", query_str,
+                    address, q->zone->name, q->zone->xfrd->serial_disk,
+                    serial);
             } else {
-                if (addr2ip(q->addr, address, sizeof(address))) {
-                    ods_log_info("[%s] ignore notify from %s: zone %s "
-                        "transfer in progress", query_str, address,
-                        q->zone->name);
-                } else {
-                    ods_log_info("[%s] ignore notify: zone %s transfer in "
-                        "progress", query_str, q->zone->name);
-                }
-                /* update values */
-                q->zone->xfrd->serial_notify = serial;
-                q->zone->xfrd->serial_notify_acquired = time_now();
+                ods_log_info("[%s] ignore notify: already got zone %s "
+                    "serial %u on disk (received %u)", query_str,
+                    q->zone->name, q->zone->xfrd->serial_disk, serial);
             }
             lock_basic_unlock(&q->zone->xfrd->serial_lock);
-            goto send_notify_ok;
+        } else if (q->zone->xfrd->serial_notify_acquired) {
+            lock_basic_unlock(&q->zone->xfrd->serial_lock);
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_info("[%s] ignore notify from %s: zone %s "
+                    "transfer in progress", query_str, address,
+                    q->zone->name);
+            } else {
+                ods_log_info("[%s] ignore notify: zone %s transfer in "
+                    "progress", query_str, q->zone->name);
+            }
+        } else {
+            q->zone->xfrd->serial_notify = serial;
+            q->zone->xfrd->serial_notify_acquired = time_now();
+            lock_basic_unlock(&q->zone->xfrd->serial_lock);
+            /* forward notify to xfrd */
+            if (addr2ip(q->addr, address, sizeof(address))) {
+                ods_log_verbose("[%s] forward notify for zone %s from client %s",
+                    query_str, q->zone->name, address);
+            } else {
+                ods_log_verbose("[%s] forward notify for zone %s", query_str,
+                    q->zone->name);
+            }
+            xfrd_set_timer_now(q->zone->xfrd);
+            dnshandler_fwd_notify(e->dnshandler, buffer_begin(q->buffer),
+                buffer_remaining(q->buffer));
         }
-        q->zone->xfrd->serial_notify = serial;
-        q->zone->xfrd->serial_notify_acquired = time_now();
-        lock_basic_unlock(&q->zone->xfrd->serial_lock);
-    }
-
-    /* forward notify to xfrd */
-    if (addr2ip(q->addr, address, sizeof(address))) {
-        ods_log_verbose("[%s] forward notify for zone %s from client %s",
-            query_str, q->zone->name, address);
-    } else {
-        ods_log_verbose("[%s] forward notify for zone %s", query_str,
-            q->zone->name);
     }
-    xfrd_set_timer_now(q->zone->xfrd);
-    dnshandler_fwd_notify(e->dnshandler, buffer_begin(q->buffer),
-        buffer_remaining(q->buffer));
-
-send_notify_ok:
     /* send notify ok */
     buffer_pkt_set_qr(q->buffer);
     buffer_pkt_set_aa(q->buffer);
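
The three-way decision the patched code makes for an incoming notify can be summarised in a small standalone sketch. It is a simplification and not the OpenDNSSEC source: the names notify_action, serial_gt and classify_notify are hypothetical, locking and logging are left out, and serial_gt is assumed to behave like RFC 1982 serial number arithmetic, which is what util_serial_gt is taken to implement here.

```c
#include <stdint.h>

/* Hypothetical outcome labels for the three branches in the patch. */
enum notify_action {
    NOTIFY_IGNORE_ON_DISK,      /* serial not newer than what is on disk */
    NOTIFY_IGNORE_IN_PROGRESS,  /* a transfer was already requested */
    NOTIFY_FORWARD              /* record the serial and forward to xfrd */
};

/* Assumed stand-in for util_serial_gt(): RFC 1982 "a is greater than b"
 * for 32-bit serials, so comparisons survive wrap-around. */
static int serial_gt(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* Mirrors the order of checks after the patch: first compare the notified
 * serial against the serial on disk, then check for a pending transfer,
 * and only forward when both checks pass. */
static enum notify_action classify_notify(uint32_t serial,
                                          uint32_t serial_disk,
                                          int transfer_pending)
{
    if (!serial_gt(serial, serial_disk)) {
        return NOTIFY_IGNORE_ON_DISK;
    }
    if (transfer_pending) {
        return NOTIFY_IGNORE_IN_PROGRESS;
    }
    return NOTIFY_FORWARD;
}
```

The point of the ordering is that every notify takes exactly one of the three branches, instead of the old flow where the outcome depended on whether serial_notify_acquired happened to be set when the notify arrived.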
serial_handling.diff.sig
Description: PGP signature
_______________________________________________ Opendnssec-user mailing list [email protected] https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
