>> Can someone familiar with this piece of code please explain what
>> problem this assertion might indicate? The text in the log
>> message itself isn't very descriptive. Looking at the code this
>> appears to be related to use of IXFR, but I can't figure out the
>> sequence of events which might trigger this problem, or where the
>> data parts would come from (DNS neighbor? Already-existing IXFR
>> log?)
>
> An IXFR message is a list of differences between two versions of a zone.
> In OpenDNSSEC each such difference is called a part.
>
> A part is a list of deleted records and a list of added records. The SOA
> records in these lists are kind of special and are stored separately. The
> soamin is the SOA record that is deleted, the soaplus is the SOA record
> that is added.
>
> If one of those SOA records is missing in the part, then there is
> something broken, and we fail on the assertion.
>
> Where the data part comes from: every time a change is made we keep a
> journal in the working directory: <zone>.ixfr. That is read out when the
> secondary requests an IXFR from OpenDNSSEC.
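To make the soamin/soaplus invariant concrete, here is a minimal Python sketch of a part as described above. The Part class, its method names and the (owner, rrtype, rdata) record tuples are hypothetical illustrations, not OpenDNSSEC's actual C data structures; check() only mimics the kind of assertion being discussed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical record representation: (owner, rrtype, rdata).
Record = Tuple[str, str, str]


@dataclass
class Part:
    """Illustrative model of one IXFR "part" (not OpenDNSSEC's real struct)."""
    soamin: Optional[Record] = None   # SOA of the old version (deleted)
    soaplus: Optional[Record] = None  # SOA of the new version (added)
    deleted: List[Record] = field(default_factory=list)
    added: List[Record] = field(default_factory=list)

    def delete(self, rr: Record) -> None:
        # SOA records are kept separately from the ordinary deletions.
        if rr[1] == "SOA":
            self.soamin = rr
        else:
            self.deleted.append(rr)

    def add(self, rr: Record) -> None:
        # Likewise, the added SOA is stored apart from the ordinary additions.
        if rr[1] == "SOA":
            self.soaplus = rr
        else:
            self.added.append(rr)

    def check(self) -> None:
        # Analogue of the assertion: a part missing either SOA is broken.
        if self.soamin is None:
            raise ValueError("soamin not set")
        if self.soaplus is None:
            raise ValueError("soaplus not set")
```

For example, a part built from a deleted SOA with serial 1, an added A record and an added SOA with serial 2 passes check(), while a part that only ever saw an added SOA raises "soamin not set".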
Hm, ok. From my recollection the <zone>.ixfr files are also read when
OpenDNSSEC is started, possibly due to "pull" from the slave which does
zone transfers(?)

We've also observed problems in this area, which has given rise to

  https://issues.opendnssec.org/browse/SUPPORT-181

where the crux is

  ods-signerd: [backup] bad ixfr journal: trailing RRs after final SOA

Normally this would cause the .ixfr file to be deleted; I'm running with
a local patch which simply renames the file to <zone>.ixfr-bad for later
debugging. I've given [email protected] copies of older
/var/opendnssec/tmp/ directories; I now have two more stashed which I've
not yet sent. Looking at the files in one of them, both the *.ixfr-bad
and the *.ixfr files have an even number of SOA records in them...

>> Having assertions fire in long-running daemons in a normal
>> operational environment is a bug, plain and simple.
>
> I agree, however for a developer the assertion helps to find
> the actual bug.

Mm... I'm wondering if the assertion about "soamin not set" and the
problem reported above are related. They at least concern the same
piece of functionality; both are related to IXFR handling.

>> When this happens, if we try to restart the signer, it will
>> shortly thereafter exit again with the same message; I then have
>> to remove the temporary files and push new zone content via notify
>> messages from the hidden master to set things up again. So, this
>> time I have two sets of such files which I can supply to a
>> developer who would be willing to take a closer look (if indeed
>> the source of the data can be found in the files on disk).
>
> The contents of the backup files may help, because as you say,
> restarting will give you the same error. That tells me that reading the
> soamin from backup fails, so that file should be able to be used to
> trigger the bug.

As I said, if any of you are interested in copies of the "bad" .ixfr
files (and the presumed-good ones as well), I can send them privately.
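The "trailing RRs after final SOA" message hints at how the journal is framed. The sketch below assumes, purely for illustration, that <zone>.ixfr stores records in the same order as an IXFR response in RFC 1995: the newest SOA, then for each change the old SOA, the deleted RRs, the new SOA and the added RRs, and finally the newest SOA again. The function names and the tuple record format are hypothetical, not OpenDNSSEC code. Under that assumption a well-formed journal always holds an even number of SOA records, and stray non-SOA records after the final SOA leave that count even while still triggering the error, so an even SOA count alone does not rule this failure out.

```python
from typing import List, Tuple

# Hypothetical record representation: (owner, rrtype, rdata).
Record = Tuple[str, str, str]
ParsedPart = Tuple[Record, List[Record], Record, List[Record]]


def soa_serial(rr: Record) -> int:
    # SOA rdata: mname rname serial refresh retry expire minimum
    return int(rr[2].split()[2])


def parse_ixfr(records: List[Record]) -> List[ParsedPart]:
    """Split an RFC 1995 style sequence into (soamin, deleted, soaplus, added)."""
    if not records or records[0][1] != "SOA":
        raise ValueError("journal must start with an SOA")
    if len(records) == 1:
        return []  # a lone SOA: no differences at all
    target = soa_serial(records[0])  # serial of the newest version
    parts: List[ParsedPart] = []
    i = 1
    done = False
    while i < len(records):
        rr = records[i]
        if rr[1] != "SOA":
            raise ValueError("expected SOA at start of part")
        i += 1
        if soa_serial(rr) == target:
            done = True  # this is the closing SOA
            break
        soamin = rr
        deleted: List[Record] = []
        while i < len(records) and records[i][1] != "SOA":
            deleted.append(records[i])
            i += 1
        if i == len(records):
            raise ValueError("part has no soaplus")
        soaplus = records[i]
        i += 1
        added: List[Record] = []
        while i < len(records) and records[i][1] != "SOA":
            added.append(records[i])
            i += 1
        parts.append((soamin, deleted, soaplus, added))
    if not done:
        raise ValueError("missing final SOA")
    if i < len(records):
        raise ValueError("trailing RRs after final SOA")
    return parts
```

Feeding this a journal with two changes (serial 1 to 2 and 2 to 3) yields two parts; appending one extra A record after the closing SOA makes it fail with "trailing RRs after final SOA" while the number of SOA records stays at six.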
Curing this bug would, I believe, considerably improve the resiliency of
OpenDNSSEC in our deployment.

> There aren't many lines that alter soamin.

Right, that agrees with my grep'ing of the source.

Regards,

- Håvard
_______________________________________________
Opendnssec-user mailing list
[email protected]
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user
