Hi André,
I understand your concerns, but you can still set rrsig-refresh explicitly.
Based on our experience DNS deployments are very diverse. So what is the right
default value?
Regards,
Daniel
On 8/31/22 13:53, André Keller wrote:
Hi Libor,
On 31.08.22 13:14, libor.peltan wrote:
What rrsig-refresh is actually for is to refresh RRSIGs soon enough that
they don't expire due to delays in:
1) propagation among authoritative servers, that means synchronization of
secondaries with primaries, including e.g. the lengthy process of signing
itself (in the case of a huge zone)
2) propagation to resolvers' caches
When I thought about this, I actually saw that (1) is exactly propagation-delay
and (2) is exactly the RRSIG's TTL. Setting rrsig-refresh default to the sum of
both values was a logical conclusion.
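The sum described above can be written out as a small calculation. This is only a sketch of the arithmetic, not Knot's actual code: the function name is made up for illustration, the one-hour propagation-delay is the default mentioned in this thread, and the one-hour RRSIG TTL is an assumed example value.

```python
def default_rrsig_refresh(propagation_delay_s: int, rrsig_ttl_s: int) -> int:
    """Return the refresh lead time as described in the thread:
    delay among authoritative servers plus time spent in resolver caches."""
    return propagation_delay_s + rrsig_ttl_s


# One-hour propagation-delay default (from the thread).
PROPAGATION_DELAY_S = 3600
# Assumed RRSIG TTL for the example; the real value depends on the zone.
RRSIG_TTL_S = 3600

refresh_s = default_rrsig_refresh(PROPAGATION_DELAY_S, RRSIG_TTL_S)
print(f"signatures are refreshed {refresh_s} s before they expire")
```

With these example values the signatures would be refreshed only two hours before expiry.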
I see, I guess from a knot standpoint this makes sense, but I feel this does
not take into account any potential delays that are caused by operational
issues.
To paint a very simplified picture of our own architecture:
* We use Puppet configuration management to create/maintain the knot
configuration on all involved servers
* We have a hidden primary, that holds the zonedata and does the signing
* We have public secondaries that sync these zones via the normal
TSIG/AXFR/IXFR protocol
* Zonedata updates are pushed to the hidden primary via a CI pipeline
So for the actual "public" facing service, only the secondaries are relevant as long as we do not need to change zonedata. That means the hidden primary also has no redundancy built in. If it breaks,
we will simply redeploy it with puppet, rerun the pipeline and we are up and running again. However, this would take time, depending on when the outage happens. So having signatures refreshed well before they
expire gives us some headroom during which the secondaries can serve the current zonedata without being dependent on the primary.
Another issue I can think of, could be temporary network issues between the
primary and the secondaries.
I'd say that the setting of propagation-delay is still in your hands, as is setting a non-default rrsig-refresh. The only disadvantage of too high an rrsig-refresh is that zone signing takes place
more often and creates larger change-sets to be propagated to secondaries. In other words, it uses more of all resources (CPU, memory, disk, network).
For our deployment this is not really a concern. We do not have huge zones, we
just have many of them. Also, they are mostly static. So signing performance
was never an issue until now.
I would probably prefer to set a higher rrsig-refresh rather than increase propagation-delay; it seems clearer to me what it does. Propagation delay for me is the time it takes during normal
operations for all primaries and secondaries to be in sync, plus some margin for taking into account caching on resolvers. On top of that I'd like to have some sort of safety margin against
operational issues, so setting rrsig-refresh is probably the way we go about in the future.
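To put a rough number on that safety margin, here is a simplified model, not a description of Knot's internals. It assumes that every signature always has at least rrsig-refresh of validity left, and that propagation-delay plus the RRSIG TTL must still be reserved for a freshly signed zone to reach resolvers. The function name and the 7-day value are hypothetical.

```python
def outage_headroom(rrsig_refresh_s: int,
                    propagation_delay_s: int,
                    rrsig_ttl_s: int) -> int:
    """Seconds a primary may stay down before signatures served by the
    secondaries risk expiring in resolver caches (simplified model)."""
    reserved_s = propagation_delay_s + rrsig_ttl_s
    return max(0, rrsig_refresh_s - reserved_s)


HOUR = 3600
DAY = 24 * HOUR

# rrsig-refresh equal to propagation-delay + TTL leaves no spare time.
print(outage_headroom(2 * HOUR, HOUR, HOUR))

# A hypothetical explicit rrsig-refresh of 7 days leaves most of a week.
print(outage_headroom(7 * DAY, HOUR, HOUR) / DAY)
```

Under these assumptions, an rrsig-refresh that only equals the sum of both values leaves no room for rebuilding a broken primary, while a larger explicit value buys that time at the cost of more frequent signing.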
This all makes me wonder whether the one-hour default of propagation-delay is
perhaps not optimal...?
Please let me know your ideas/opinions in more detail. Any real operational
experience is very very valuable for us!
As already said, at least to me propagation-delay is not what I would associate with operational issues. I would expect all my primaries and secondaries to be in sync during normal operation well within
the default of one hour.
I guess choosing default values is always hard and I do not have an issue with making our configuration more explicit to cover our specific use case. I just wish this change in behavior, which at least for us is quite
significant, had been made a bit more obvious in the changelog. It caught us by surprise :)
Regards
André