On Sat, Oct 07, 2017 at 06:53:32PM -0700, Henry B (Hank) Hotz, CISSP wrote:
> One thing that’s conspicuously missing from this discussion is any
> historical context for how the version numbers are *supposed* to be
> handled. It seems like most of these problems are recent, or at least
> recent-ish.

The previous system had... a lot of serious issues. We rewrote most of
lib/kadm5/log.c, but mostly kept the design as it was. We added the
uberblock to help with atomicity and to help find the end of the log
quickly, and made the iprop log function as a roll-forward log for the
HDB. We mostly did not rewrite the ipropd daemons though.

> IIUC the deal is (should be? used to be? Please correct!):
>
> 1) On initial creation, the log contains a version 0 no-op, making the
> db version 1.

Pretty much.

> 2) On connection, the slave tells the master what version it has. If
> it doesn’t match what the master has then the master sends updates to
> bring them in sync.

Yes.

> 2a) If the master’s change log is insufficient, (or the difference is
> “too big”), then it sends the whole DB.

There is not and never was an "is too big" heuristic. If the slave was
1e6 entries behind, and the master had those 1e6 entries, then those
would be sent.

The new system automatically truncates the log (rewrites it, actually,
preserving N entries) as necessary. This functions as an "is too big"
heuristic, since a slave that has fallen behind the retained entries
hits the "insufficient log" case instead.

> 2b) If the difference is small enough, then the master just replays
> the change log from where the slave is.

Yes.

> 3) Seems to me that the handling of the heartbeat messages ought to
> mirror the initial connection logic, or else make no attempt to do
> anything to the DB at all. Anything else is clearly risky and
> unnecessarily complex. (I never worried about them because I had
> already implemented external processes to deal with the issue.
> Somebody else should write this bullet.)

I don't follow.

> A new DB (on a slave) is guaranteed to have a smaller version number
> than the master (if the master is actually populated), so will always
> get a complete download.
>
> Truncation, preserving the version number is safe and periodically
> necessary.

Yes.

> I do not remember the --reset option, but it’s clearly dangerous. How
> can it be used safely, knowing only the above?

It's no different than removing the log and restarting the master.

We didn't change the iprop _protocol_, but we've considered it. If we
did modify it, then we'd a) make the version numbers larger, and b) use
{vno, timestamp} rather than just {vno} to identify state. Then if you
reset the log on the master, the master would be able to send_complete()
to slaves with, say, version 2.

So far we've tried hard to support graceful upgrades. But we've been
tempted to make more radical changes.

For example, one thing we might do (no promises) is to make the HDB
interface for the kadm5 API just... a SQL interface. (That is, if we
don't just burn that API altogether. As much as we dislike it, it's
actually valuable just because of the existing codebase using it, such
as Russ' Wallet, or Roland's krb5_admin stack.) We'd probably keep
libhdb for the KDC, and have the iprop system write old-style HDBs for
the KDC, but not for the admin interfaces. We might then throw away the
existing iprop system and replace it with an RDBMS replication system.
PostgreSQL comes to mind, though we could also build a suitable system
out of SQLite3.

Key to all of that would be an implementation plan that makes it easy to
do all of this, otherwise it couldn't happen. In the Heimdal tradition,
that would probably imply some sort of compiler. (One thing I've toyed
with is modifying asn1_compile to support generation of code to use "SQL
rules", as it were, to encode to/from an RDBMS.)

But anyway, for now, and for as long as we don't choose to make such
radical changes, graceful upgrades are supported, at least to some
degree: the iprop protocol has not been modified, and the iprop log
format has not been modified (the uberblock is a nop, which already
existed).

This has limited us somewhat. In particular we have problems to deal
with like vno rollover, and how to gracefully deal with master-side
iprop log reset (spoilers: we can't!).
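
To make the {vno, timestamp} idea above a bit more concrete, here is a
minimal sketch in C of how a master might choose between doing nothing,
replaying the log, and a complete send. None of this is Heimdal code:
struct log_state, enum sync_action and choose_sync() are hypothetical
names, the log is modelled as an in-memory array with consecutive
version numbers, and today's protocol identifies state by version number
alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: state identified by {vno, timestamp}, not just {vno}. */
struct log_state {
    uint32_t vno;     /* version number of a log entry */
    time_t   tstamp;  /* timestamp recorded with that entry */
};

enum sync_action { SYNC_NONE, SYNC_INCREMENTAL, SYNC_COMPLETE };

/*
 * entries[0..n-1] models the entries the master still retains, oldest
 * first, with consecutive version numbers (a simplifying assumption).
 * slave is the state the slave reports when it connects.
 */
enum sync_action
choose_sync(const struct log_state *entries, size_t n, struct log_state slave)
{
    assert(n > 0);

    const struct log_state *oldest = &entries[0];
    const struct log_state *newest = &entries[n - 1];

    /* The slave's version is outside what the master retains. */
    if (slave.vno < oldest->vno || slave.vno > newest->vno)
        return SYNC_COMPLETE;

    /*
     * Same vno but a different timestamp: the slave's entry comes from
     * a different history, e.g. from before a master-side log reset.
     */
    const struct log_state *e = &entries[slave.vno - oldest->vno];
    if (e->tstamp != slave.tstamp)
        return SYNC_COMPLETE;

    /* Same history: nothing to do, or replay the missing entries. */
    return slave.vno == newest->vno ? SYNC_NONE : SYNC_INCREMENTAL;
}
```

The timestamp comparison is the part a {vno}-only scheme cannot do: with
just a version number, a slave at version 2 of the old log and a master
at version 2 of a freshly reset log look identical.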

We've still managed to make significant improvements to the iprop
system, and we'll be making more (mostly we'll make ipropd-master fork()
per-slave processes, do a complete review of the ipropd daemons, and fix
the bugs we're aware of so far).

Nico
--