Hi,

I recently had occasion to bump the number of zones in our OpenDNSSEC installation by a significant amount -- around 320 zones were added in "one go", first by a sequence of "ods-ksmutil zone add" commands, and then with "ods-ksmutil update zonelist", followed by backing up the SoftHSM and KASP databases, and then notifying the enforcer.

This reveals once again that I'm not able to operationally figure out how to handle OpenDNSSEC, and I find it to be a quite frustrating experience, probably because of bugs. As usual, I'm running OpenDNSSEC 1.4.7 in "DNS in, DNS out" mode.

The problem I appear to have is that a largish number of the newly added zones have not been transferred from the hidden master to OpenDNSSEC. What's more, OpenDNSSEC's signer doesn't seem to take any new initiative to actually re-try the zone transfers. Instead, it has gotten it into its head that it's already doing the zone transfers, but this is untrue, and no coaxing of the running signer appears to be able to persuade it otherwise. Instead, the signer keeps logging

  Dec 3 17:37:09 hugin ods-signerd: [tools] unable to read zone <zonename>: adapter failed (Incoming zone transfer not ready)
  Dec 3 17:37:09 hugin ods-signerd: [worker[4]] backoff task [read] for zone <zonename> with 3600 seconds

However, a packet trace of the traffic to the hidden master, taken on the signer machine, reveals that ods-signerd did at this instant *NOT* try to initiate a zone transfer. It's as if ods-signerd sits there idle in a loop expecting the zone files to somehow magically appear in the file system, when I've given clear instructions that it needs to use zone transfers to fetch the data.

This, I suspect, goes back to the old complaint I've raised before that there seems to be insufficient synchronization between the different internal tasks in ods-signerd. This can, among other things, lead to alarming log messages which are actually (I hope!) benign:

  Dec 3 16:28:26 hugin ods-signerd: [worker[4]] CRITICAL: failed to sign zone <zone>: General error

(because that zone has yet to be transferred from the hidden master, and is therefore not available) and this makes it quite difficult as an operator to relate to *anything* OpenDNSSEC logs -- it all too frequently cries "Wolf! Wolf!". I believe this is a problem which needs to be solved.
I do realize that's no small task...

...and in the sequence where the zones were added, ods-enforcerd complained:

  Dec 3 16:12:58 hugin ods-enforcerd: INFO: Promoting ZSK from publish to active as this is the first pass for the zone
  Dec 3 16:12:58 hugin ods-enforcerd: ERROR: Trying to make non-backed up ZSK active when RequireBackup flag is set

Yes, I've set RequireBackup, but surely that hasn't caused me to commit an operational error? Seen from an operator's side, this is "Wolf! Wolf!" once again.

Meanwhile, I've run "ods-signer" and listed the work queue. It remains more or less steady at 366 tasks scheduled, one per zone, many of them of this type:

  On Thu Dec 3 18:33:43 2015 I will [read] zone <zone>

Typically, it's "working" on 4 of them:

  cmd> queue
  It is now Thu Dec 3 17:40:04 2015
  Working with task [read] on zone <zone1>
  Working with task [read] on zone <zone2>
  Working with task [read] on zone <zone3>
  Working with task [read] on zone <zone4>

  I have 362 tasks scheduled.
  ...

However, again, when ods-signerd says "working with task [read]", it appears it's always talking about "reading from the file system". While it's doing this, *NO* activity related to these zones is seen towards the hidden master with my packet sniffer.

I can give "ods-signer" the "flush" command, and while it re-schedules the various tasks it has queued, it is not making any progress AT ALL on transferring ANY of the newly added 320 (minus 51) zones which remain.

According to the log (and the packet sniffer), ods-signerd's xfrd task is periodically probing some of the old already-established zones, but out of the 320 zones added, 51 have made it, and by the looks of it, no more of the newly added zones will ever automatically be transferred from the hidden master.
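
For anyone wanting to put numbers on this from the logs, here is a minimal sketch (not part of OpenDNSSEC) that tallies, per zone, how often the signer logged the "Incoming zone transfer not ready" message quoted above. The regular expression is taken from that log line as it appears in this message; the function name and the sample zone names are made up for illustration, and other OpenDNSSEC versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the ods-signerd syslog line quoted in this message.
# The exact wording is an assumption for versions other than 1.4.7.
NOT_READY = re.compile(
    r"ods-signerd: \[tools\] unable to read zone (\S+): "
    r"adapter failed \(Incoming zone transfer not ready\)"
)


def not_ready_zones(lines):
    """Count 'transfer not ready' read failures per zone (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input only: example.org and example.net are placeholder zones.
sample = [
    "Dec 3 17:37:09 hugin ods-signerd: [tools] unable to read zone example.org: "
    "adapter failed (Incoming zone transfer not ready)",
    "Dec 3 17:37:09 hugin ods-signerd: [worker[4]] backoff task [read] "
    "for zone example.org with 3600 seconds",
    "Dec 3 18:37:10 hugin ods-signerd: [tools] unable to read zone example.org: "
    "adapter failed (Incoming zone transfer not ready)",
    "Dec 3 18:37:11 hugin ods-signerd: [tools] unable to read zone example.net: "
    "adapter failed (Incoming zone transfer not ready)",
]

for zone, failures in sorted(not_ready_zones(sample).items()):
    print(zone, failures)
```

The function accepts any iterable of lines, so an open syslog file can be passed in directly; where that file lives depends on the local syslog setup.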

The last entry I have in the log from the xfrd sub-task of ods-signerd related to the newly added zones is:

  Dec 3 16:28:26 hugin ods-signerd: [xfrd] zone <zone> transfer done [notify acquired 0, serial on disk 2015120315, notify serial 0]

and the local time is now well past 18:30.

Bumping the serial number on the hidden master for some of the new not-yet-transferred zones and sending a notify just produces this message in the OpenDNSSEC log:

  Dec 3 18:21:30 hugin ods-signerd: [query] ignore notify from <hidden-master>: zone <zone> transfer in progress

to which I can only say "rubbish!", as a zone transfer is most definitely *NOT* in progress -- both the packet sniffer and the list of ods-signerd's open FDs disagree.

Is it any wonder I'm frustrated with what appears to be an utter lack of robustness in this area of functionality?

It looks like the only recourse I have is to restart OpenDNSSEC, and then I'll once again get the problem that it falls over due to the contents of the tmp/ files it has created itself.

Double sigh!

Regards,

- Håvard
