I'm +1 on throttling reloads; I think that is the most obvious and
critical work item for the MAAS team to address. I have filed that as
bug #1710308.
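
To make the throttling idea concrete, here is a minimal sketch of coalescing reload requests so that at most one reload runs per interval. This is not MAAS code: the class name, the `min_interval` default, and the `reload_func` callback are all hypothetical, and a real implementation would need to fit whatever scheduler MAAS already uses.

```python
import time


class ReloadThrottle:
    """Coalesce reload requests so that at most one runs per interval.

    Hypothetical sketch; names and defaults are assumptions.
    """

    def __init__(self, reload_func, min_interval=5.0, clock=time.monotonic):
        self._reload = reload_func        # e.g. something that reloads bind9
        self._min_interval = min_interval
        self._clock = clock               # injectable so it can be tested
        self._last_run = None
        self._pending = False

    def request(self):
        """Ask for a reload; return True if it ran immediately."""
        self._pending = True
        return self.flush()

    def flush(self):
        """Run a pending reload if the minimum interval has elapsed."""
        if not self._pending:
            return False
        now = self._clock()
        if (self._last_run is not None
                and now - self._last_run < self._min_interval):
            # Too soon: keep the request pending for a later flush().
            return False
        self._pending = False
        self._last_run = now
        self._reload()
        return True
```

Callers would invoke `request()` whenever zone data changes, and a periodic timer would call `flush()` so that a deferred reload still happens once the interval has passed. Any number of requests inside one interval collapse into a single reload.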

I'm also +1 on better service monitoring using actual queries; I've
filed that as bug #1710310. I think something equivalent to 'dig
@127.0.0.1 <test-query>' on the region should be enough to detect a
deadlock condition. I also like the idea of monitoring it from the
rack's perspective, though a failure there feels more like a non-fatal
warning, because we don't want to restart bind9 in the event of
transient firewall hiccups.
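
As a rough illustration of the region-side check, the sketch below shells out to dig with a hard timeout and treats "no reply" as unhealthy. The function name, the timeout values, and the query name in the usage line are hypothetical; the only thing taken from the discussion above is the idea of querying 127.0.0.1. It relies on dig exiting non-zero when it gets no reply from the server, and on exit status 0 meaning a response was received (even an NXDOMAIN), which is what matters for spotting a hung named.

```python
import subprocess


def dns_responds(name, server="127.0.0.1", timeout=3.0, run=subprocess.run):
    """Return True if the DNS server answered a query for `name`.

    Hypothetical helper; `run` is injectable so it can be tested
    without a real dig binary.
    """
    # +time and +tries bound dig's own wait; `timeout` is a hard backstop.
    cmd = ["dig", "@" + server, name, "+time=2", "+tries=1"]
    try:
        result = run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # dig hung, or is not installed: treat as "no answer".
        return False
    # Non-zero (for example, no reply from the server) means unhealthy.
    return result.returncode == 0
```

A monitor might call `dns_responds("health-check.example")` (a made-up name) and only act after several consecutive failures. The same function pointed at the region's address from a rack could feed the non-fatal warning.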

Finally, I think your last bullet requires more discussion before we can
work on it. MAAS currently uses sudoers rules specific to the init
system to start and stop services like bind9; we do not currently have
permission to 'kill -9' arbitrary processes. I'm concerned that if we go
down that road, we would open up the possibility that MAAS could
erroneously (or due to a malicious attack) believe that bind9 isn't
working and repeatedly kill it without good cause, or be convinced to
'kill -9' an incorrect process.

In summary, I think the most urgent thing for MAAS to do is throttle
reloads. That should greatly reduce the window of opportunity for the
deadlock to occur. In parallel, the deadlock itself should be fixed
upstream in bind9.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1710278

Title:
  [2.3a1] named stuck on reload, DNS broken

To manage notifications about this bug go to:
https://bugs.launchpad.net/bind/+bug/1710278/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
