On Sat, Oct 07, 2017 at 06:53:32PM -0700, Henry B (Hank) Hotz, CISSP wrote:
> One thing that’s conspicuously missing from this discussion is any
> historical context for how the version numbers are *supposed* to be
> handled. It seems like most of these problems are recent, or at least
> recent-ish.

The previous system had... a lot of serious issues.  We mostly rewrote
lib/kadm5/log.c, but mostly also kept the design as it was.  We added
the uberblock to help with atomicity and to help find the end of the log
quickly, and made the iprop log function as a roll-forward log for the
HDB.  We mostly did not rewrite the ipropd daemons though.

> IIUC the deal is (should be? used to be? Please correct!):
> 
> 1) On initial creation, the log contains a version 0 no-op, making the
> db version 1.

Pretty much.

> 2) On connection, the slave tells the master what version it has. If
> it doesn’t match what the master has then the master sends updates to
> bring them in sync.

Yes. 

> 2a) If the master’s change log is insufficient, (or the difference is
> “too big”), then it sends the whole DB.

There is not and never was an "is too big" heuristic.  If the slave was
1e6 entries behind, and the master had those 1e6 entries, then those
would be sent.

The new system automatically truncates the log (rewrites it, actually,
preserving N entries) as necessary.  This functions as an "is too big"
heuristic.
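
To illustrate why truncation acts like an "is too big" heuristic, here
is a rough sketch in Python (not Heimdal's C, and not the actual
lib/kadm5/log.c logic).  The names truncate_log and can_replay_from, and
the (version, payload) tuple layout, are made up for this example.

```python
def truncate_log(entries, keep):
    """Rewrite the log, keeping only the newest `keep` entries.

    `entries` is a list of (version, payload) tuples in ascending
    version order.  Versions are preserved, never renumbered.
    (Hypothetical layout, for illustration only.)
    """
    if keep <= 0:
        raise ValueError("keep must be positive")
    return entries[-keep:]


def can_replay_from(entries, slave_version):
    """True if the log still holds every entry after slave_version."""
    if not entries:
        return False
    first, last = entries[0][0], entries[-1][0]
    # The slave needs entry slave_version + 1 onward, so that entry
    # must not have been truncated away, and the slave must not be
    # ahead of the log.
    return first <= slave_version + 1 and slave_version <= last


log = [(v, f"op{v}") for v in range(1, 11)]   # versions 1..10
short = truncate_log(log, keep=3)             # versions 8..10 remain
```

With the full log a slave at version 5 could be brought up to date by
replay; after truncation to three entries it cannot, so it has fallen
"too far" behind even though nothing ever measured the distance.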

> 2b) If the difference is small enough, then the master just replays
> the change log from where the slave is.

Yes.
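
Points 2, 2a and 2b amount to a small decision on the master.  A minimal
sketch of it in Python, under some assumptions: plan_sync, log_first and
log_last are hypothetical names, treating version 0 as "no database yet"
is an assumption of this sketch, and so is answering a slave that is
ahead of the master with a complete dump.  This is not the
ipropd-master code.

```python
def plan_sync(slave_version, log_first, log_last):
    """Decide how to bring a slave up to date.

    log_first and log_last are the oldest and newest versions still
    present in the master's log (hypothetical parameters).
    Returns (action, versions_to_send).
    """
    # Assumption: version 0 means the slave has no usable database.
    if slave_version == 0:
        return ("send_complete", [])
    if slave_version == log_last:
        return ("in_sync", [])
    # The slave is ahead of the master, or the entries it needs have
    # been truncated away: the log cannot help, send the whole DB.
    if slave_version > log_last or slave_version + 1 < log_first:
        return ("send_complete", [])
    # Otherwise replay the log from just after where the slave is.
    return ("replay", list(range(slave_version + 1, log_last + 1)))
```

For a master whose log holds versions 8 through 10, a slave at 7 gets
entries 8, 9 and 10 replayed, while a slave at 5 gets the whole DB.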

> 3) Seems to me that the handling of the heartbeat messages ought to
> mirror the initial connection logic, or else make no attempt to do
> anything to the DB at all. Anything else is clearly risky and
> unnecessarily complex. (I never worried about them because I had
> already implemented external processes to deal with the issue.
> Somebody else should write this bullet.)

I don't follow.

> A new DB (on a slave) is guaranteed to have a smaller version number
> than the master (if the master is actually populated), so will always
> get a complete download.
> 
> Truncation, preserving the version number is safe and periodically
> necessary. 

Yes.

> I do not remember the --reset option, but it’s clearly dangerous. How
> can it be used safely, knowing only the above?

It's no different than removing the log and restarting the master.

We didn't change the iprop _protocol_, but we've considered it.  If we
did modify it, then we'd a) make the version numbers larger, and b) use
{vno, timestamp} rather than just {vno} to identify state.  Then if you
reset the log on the master, the master would know to send_complete()
to slaves with, say, version 2.
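
A sketch of why the timestamp would help, in Python.  Nothing here is
an existing or planned wire format: LogState, the dict-based logs, and
both decision functions are hypothetical, and the timestamps are
arbitrary integers chosen for the example.

```python
from typing import NamedTuple


class LogState(NamedTuple):
    vno: int
    timestamp: int  # time of the log entry at `vno` (hypothetical field)


def vno_only_decision(master_log, slave_vno):
    """Decide using {vno} alone: any vno the master has looks familiar."""
    return "replay" if slave_vno in master_log else "send_complete"


def vno_ts_decision(master_log, slave_state):
    """Decide using {vno, timestamp}: the master's entry at that vno
    must also carry the same timestamp, otherwise the histories differ."""
    ts = master_log.get(slave_state.vno)
    return "replay" if ts == slave_state.timestamp else "send_complete"


# Logs modelled as {vno: timestamp}.  The master's log was reset and
# has since grown back past version 2.
old_master = {1: 100, 2: 110, 3: 120}
new_master = {1: 500, 2: 510, 3: 520, 4: 530, 5: 540}
slave = LogState(vno=2, timestamp=110)   # state from before the reset
```

With {vno} alone the reset master sees a slave at version 2 and would
replay 3..5 on top of unrelated data.  With {vno, timestamp} the
mismatch at version 2 is visible and the slave gets a complete dump.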

So far we've tried hard to support graceful upgrades.  But we've been
tempted to make more radical changes.

For example, one thing we might do (no promises) is to make the HDB
interface for the kadm5 API (if we don't just burn that API altogether,
though as much as we dislike it, it's actually valuable just because of
the existing codebase using it, such as Russ' Wallet, or Roland's
krb5_admin stack) just... a SQL interface.  We'd probably keep libhdb
for the KDC, and have the iprop system write old-style HDBs for the KDC,
but not for the admin interfaces.  We might then throw away the existing
iprop system and replace it with an RDBMS replication system.
PostgreSQL comes to mind, though we could also build a suitable system
out of SQLite3.

Key to all of that would be an implementation plan that makes it easy to
do all of this, otherwise it couldn't happen.  In the Heimdal tradition,
that would probably imply some sort of compiler.  (One thing I've toyed
with is modifying asn1_compile to support generation of code to use "SQL
rules", as it were, to encode to/from an RDBMS.)

But anyways, for now, and for as long as we don't choose to make such
radical changes, graceful upgrades are supported, at least to some
degree: the iprop protocol has not been modified, the iprop log format
has not been modified (the uberblock is a nop, which already existed).
This has limited us somewhat.  In particular we have problems to deal
with like vno rollover, and how to gracefully deal with master-side
iprop log reset (spoilers: we can't!).

We've still managed to make significant improvements to the iprop
system, and we'll be making more (mostly we'll make ipropd-master fork()
per-slave processes, do a complete review of the ipropd daemons, and fix
the bugs we're aware of so far).

Nico