On Mon, Apr 29, 2019 at 3:28 PM Paul Davis <paul.joseph.da...@gmail.com> wrote:

> On Mon, Apr 29, 2019 at 1:29 PM Nick Vatamaniuc <vatam...@apache.org>
> wrote:
> >
> > After discussing how replicator might be implemented with fdb
> >
> https://lists.apache.org/thread.html/9338bd50f39d7fdec68d7ab2441c055c166041bd84b403644f662735@%3Cdev.couchdb.apache.org%3E
> ,
> > and thinking about a global jobs queue for replicator (and indexing as
> > well, as Garren had mentioned), I had noticed that we might want to have
> > something like a couch_event subsystem. We'd need that to drive creation
> > of replication jobs or pending indexing requests. So that's basically what
> > this discussion thread is about.
> >
> > couch_event, for those who don't know, is a node-local event bus for
> > database-level events. When dbs are created, deleted, or updated it
> > publishes a corresponding event. Any interested listeners subscribe to
> > those events. Currently some of the listeners are:
> >
> >  * ddoc cache : for ddoc updates, and db creates and deletes
> >  * ken : for ddoc updates to know when to start building indexes
> >  * _replicator db updates : to monitor changes to replication docs
> >  * _users db auth : to drive auth cache updates
> >  * _db_updates handler : drive db updates
> >  * smoosh : drive compaction triggers
> >
> > This all happens in memory on each node. We use the nifty and fast khash
> > library to distribute events. If listeners register and then die, we
> > automatically clean that up.
> >
> > Now with fdb, some of those users of couch_event go away (smoosh) but most
> > remain, and we need something to drive them.
> >
> > There was a separate discussion for how to implement _db_updates handler,
> > which is a starting point for this one, but I had made a new thread as it
> > involves not just the _db_updates but those other things:
> >
> >
> https://lists.apache.org/thread.html/a7bf140aea864286817bbd4f16f5a2f0a295f4d046950729400e0e2a@%3Cdev.couchdb.apache.org%3E
> >
> Just to make sure, this basically reads like _db_updates but with the
> addition that we're tracking updates to design docs and maybe local
> docs except the only thing with local docs is going away.
Right, but this is a more fundamental component. couch_event drives
it, and it also drives those other components. We even discussed the
possibility of not having _db_updates in the proposal, but I think we'd
definitely need something like couch_event. _db_updates also persists
updates (though that is configurable) but couch_event is purely ephemeral,
so the design requirements are a bit different for each.

> > From the _db_updates discussion I liked the proposal to use atomic ops to
> > accumulate and deduplicate events and consumers periodically reading and
> > resetting the stats.
> >
> > The list of events currently being published is the following:
> >
> >  * DbName, created
> >  * DbName, deleted
> >  * DbName, updated
> >  * DbName, local_updated (not needed anymore, used by smoosh only)
> >  * DbName, ddoc_updated
> >  * DbName, {ddoc_updated, DDocId}
> >
> > (The {ddoc_updated, DDocId} makes it slightly more difficult as we'd need
> > to track specific DDocIDs. Maybe we could forgo such detailed updates and
> > let consumers keep track of design documents on their own?)
> >
> > But the idea is to have consumer worker processes and queues for each
> > consumer of the API. We could share them per-node. So, if on the same node
> > replicator and indexer want to find out about db updates, we'd just add
> > extra callbacks for one, but they'd share the same consumer worker. Each
> > consumer would receive updates in their queue from the producers. Producers
> > would mostly end up doing atomic ops to update counts and periodically
> > monitor if consumers are still alive. Consumers would poll changes (or use
> > watches) to their update queues and notify their callbacks (start
> > replication jobs, update _db_updates DB, start indexing jobs, etc.).
> >
> > Because consumers can die at any time, we don't want to have a growing
> > event queue, so each consumer will periodically report its health in the
> > db. The producer will periodically monitor all the consumers' health
> > (asynchronously, using snapshot reads) and if consumers stop updating their
> > health their queues will be cleared. If they are alive next time they go to
> > read they'd have to re-register themselves in the consumers list. (This is
> > the same pattern from the replication proposal to replace process
> > linking/monitoring and cleanup that we have now).
> >
> I'm super confused about this discussion of queues and producers and
> consumers. Comparing with the _db_updates discussion it sounds like
> "producers" would be anything that modifies a database or otherwise
> generates an "event". And the queue is maybe the counters subspace?

Producers and consumers vs notifiers and consumers? Notifiers it is then! I
had started thinking in the context of replication job queues as we had
already discussed workers and job queues there. But yeah, they are tables I

> Though it's not really a queue, right? It's just a counter table? In the
> _db_updates discussion we'd have a process that would reference the
> counters table and then periodically update the _db_updates subspace
> with whatever changed which sounds like something you'd expect on
> every node?
>
> > The data model might look something like:
> >
> >  ("couch_event", "consumers") = [C1, C2,...]
> >  ("couch_event", Ci, "health") = (MaxTimeout, Timestamp)
> >  ("couch_event", Ci, "events", DbName, DbVer) = CreateDeleteUpdates
> >  ("couch_event", Ci, "events", DbName, DbVer, "ddoc_updates", DDocId) =
> > Updates
> >
> > CreateDeleteUpdates is an integer that will encode create, delete, and
> > updates in one value using atomic ops:
> >
> > * The value is initialized to 2, this is equivalent to "unknown" or "idle"
> > state.
> > * On a delete: min(1), so we reset the value down to 1
> > * On create: max(3) so we bring it up 3 (and if it was higher, it gets
> > reset to 3)
> > * On update: add(4)
> >
> This seems awkwardly complicated to me. The very first thing that
> strikes me is that I'd argue against creating some second
> implementation of worker coordination/health status thing. If there
> are workers involved I'd like to have a central "worker"/"work
> distribution" implementation that would be re-used by all the things
> that need that rather than having some unknown number of bespoke
> implementations.

I can see each worker node getting identified by a worker ID and then
handling a set of "roles". They'd have a single health reporting key and
the same code handling time-outs and all that. One subtlety is who notices
and takes custom action when they are down. In the replication proposal it
was one of the replication workers (a neighbor) and then it re-queues its
jobs. In this case it was the producer. Those could just be configuration
parameters I'd think.

> Second the status aspect confuses me. I *think* what you're
> considering is how to extend the _db_updates discussion to include
> event types beyond `created | deleted | updated` as discussed there by
> using logical comparisons? I'm also not sure what the lifetime of the
> counter thing is? You say "initialized to 2" but I don't understand
> when initialization would happen.

It is initialized when the consumer starts up and after each polling cycle.
Then it resets to 2 with an explicit conflict range for the updater. See the
pattern here with the "reset"

But now that you've said we can use padding and maybe don't need specific
ddoc IDs, it could look much nicer.

> I'd probably simplify this by first removing the design doc id
> specificity and returning to the more general "design doc updated"
> message. Then we'd just expand that message to the related design
> document ids if/when we need to.

> Second, you can use multiple counters in a single value by padding
> them out. So instead of using all of the logical comparisons you could
> do something like such:
>
> %% Each clause returns a 16-byte operand with a 1 in exactly one field.
> gen_increment(created) ->
>     gen_increment_bin(1, 0, 0, 0);
> gen_increment(deleted) ->
>     gen_increment_bin(0, 1, 0, 0);
> gen_increment(doc_updated) ->
>     gen_increment_bin(0, 0, 1, 0);
> gen_increment(ddoc_updated) ->
>     gen_increment_bin(0, 0, 0, 1).
>
> %% Pack four 32-bit little-endian unsigned counters into one binary.
> gen_increment_bin(Created, Deleted, Updated, DDocUpdated) ->
>     <<
>         Created:32/little-unsigned-integer,
>         Deleted:32/little-unsigned-integer,
>         Updated:32/little-unsigned-integer,
>         DDocUpdated:32/little-unsigned-integer
>     >>.
>
> And then you just use the ADD atomic op and you now have your event
> types changing independently.

Had no idea about packed ranges. And that's not a tuple, it is an "integer"
concatenated from 4 other integers?
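
To check my understanding of the packed value, here is a rough Python model of it. It is only an in-memory stand-in, not the fdb client: `atomic_add` is a hypothetical helper that assumes the ADD op treats both byte strings as little-endian integers, and the field names just mirror the quoted Erlang.

```python
import struct

# Field order mirrors the quoted gen_increment_bin/4 arguments.
FIELDS = ("created", "deleted", "doc_updated", "ddoc_updated")


def gen_increment(event: str) -> bytes:
    """Pack a +1 for one event type into four 32-bit little-endian fields."""
    counts = [1 if name == event else 0 for name in FIELDS]
    return struct.pack("<4I", *counts)


def atomic_add(current: bytes, operand: bytes) -> bytes:
    """Hypothetical model of an ADD on little-endian integers (assumption)."""
    width = len(operand)
    padded = current.ljust(width, b"\x00")
    total = int.from_bytes(padded, "little") + int.from_bytes(operand, "little")
    # Truncate to the operand width, as a fixed-size integer add would.
    return (total % (1 << (8 * width))).to_bytes(width, "little")


def decode(value: bytes) -> dict:
    """Split the 16-byte value back into its four named counters."""
    return dict(zip(FIELDS, struct.unpack("<4I", value)))


value = b"\x00" * 16
for event in ("created", "doc_updated", "doc_updated", "ddoc_updated"):
    value = atomic_add(value, gen_increment(event))
counters = decode(value)
```

So it is one 128-bit integer as far as the add is concerned, and each event type lands in its own 32-bit slice. In this model a slice that overflows 32 bits would carry into its neighbour.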

I think one thing it doesn't do is clear the creation and update counter on
deletion, and creation doesn't clear the deletion counter.

With the add, min and max ops a [Created, Deleted, Created] event list would
look like:
2, max(2,3)=3, min(1,3)=1, max(1,3)=3, emit "Created"

Deleted, Created, Deleted would look like:
2, min(1,2)=1, max(1,3)=3, min(1,3)=1, emit "Deleted".
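
The same two sequences can be replayed with a small Python sketch. It only models the integer arithmetic in memory. The names `apply_event` and `replay` are made up for illustration, and the constants follow the encoding from my original mail (2 = idle, 1 = deleted, 3 = created, add 4 per update).

```python
IDLE, DELETED, CREATED = 2, 1, 3


def apply_event(value: int, event: str) -> int:
    """Apply one event to the counter using the min/max/add encoding."""
    if event == "deleted":
        return min(value, DELETED)  # min(1): pull the value down to 1
    if event == "created":
        return max(value, CREATED)  # max(3): bring the value up to at least 3
    if event == "updated":
        return value + 4            # add(4)
    raise ValueError(f"unknown event: {event}")


def replay(events) -> int:
    """Start from the idle value and apply the events in order."""
    value = IDLE
    for event in events:
        value = apply_event(value, event)
    return value


last_created = replay(["created", "deleted", "created"])  # ends at 3
last_deleted = replay(["deleted", "created", "deleted"])  # ends at 1
```

In both cases the final value reflects whichever of create or delete happened last, which is the property the separate counters lose.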

With completely separate bit fields we wouldn't know whether to emit
"Deleted" or "Created" if it looks like (1, 1, 0, 0).

But I think we can just do more than one atomic op and apply a min(0) or
and(0) to some fields in the same transaction as that add(...) operation.

Or, maybe we'd use {DbName, DbVersion}. Then for the same database we'd know
the obvious sequence of events, since if it gets created again it would get a
new DbVersion.

> > The idea is that a consumer looking at the value periodically could figure
> > out if the database was deleted, created, created and updated, just
> > updated, or it was idle.
> >
> > During each poll interval, for each database, the consumer would read the
> > value and reset it to 2 with a read conflict range on it. For ddocs it
> > would read the range and then clear it. If the value read was idle (2), it
> > will clear all the db event counter and ddoc range.
> >
> > We could also enhance the consumers to accept only updates, and producers
> > would do filtering ahead of time. Db names for example could be
> > */_replicator or */_users, docs would be "_design/*". Or that we only need
> > creates and deletes not updates. But that would mean not being able to
> > share consumer workers per node.
> >
> > What does everyone think?
> >
> > Cheers,
> > -Nick
> I'd say all in all it sounds a lot more complicated than I'd expect.
> And the difference between this proposal and the db_updates discussion
> is not large enough to really consider them separately as it seems
> like either approach could drive the other (assuming we make a minor
> extension to the db_updates proposal).
> Now that db_updates and couch_event are both in the discussion I'm
> gonna use a third word so I don't confuse anyone by appearing to
> reference one or the other. I can't think of anything amusing so I'm
> gonna just go with calling this "notifications".

"Bus" a shorter version! :-) Notification sounds fine though.

> For notifications, I'd suggest that we do something similar to the
> db_updates proposal with a few extensions:
> a) all individual processes that want to generate a notification would
> use atomic counters with padding to track the various types of
> notification in the "counters" subspace
> b) a worker process would periodically scan this subspace and make any
> updates to a notification table
> c) to provide couch_event style events, a single process per couchdb
> node would follow the notifications space and translate each update to
> current couch_event behavior
> d) db_updates would just be a scan on this table with possible
> dropping ddoc_updated events
> The counters subspace would be like such:
>     (?COUNTERS, DbName) -> (CounterType1, CounterType2, CounterType3, ...)
> And the notifications subspace would look like:
>     (?NOTIFICATIONS, versionstamp_of_notification) ->
>         (DbName, db_uuid, versionstamp_of_db,
>          versionstamp_of_last_ddoc_change?)
>         | (DbName, ?DELETED)
> The first question I'd ask then would be how hard we try to have a
> single coordinating process that's translating from counters to
> notifications. For v1 I'd probably just create one per CouchDB node
> and accept that there will be contention. And we could make it a role
> per node so that if someone wants to have fancy dancy deployments they
> could just deploy that role N times or something. The only other
> consideration I have is I'm not accounting for the summary KV that
> Koco proposed. I assume that's something so that it doesn't matter
> which worker created the summary that's comparing the new values of
> counters? Though I suddenly wonder if "summary" might have meant
> "hash of data" or something. Assuming we generate notifications and
> clear counters in a single transaction I'm not 100% sure what the
> summary is for.

A big difference here is that this is a shared / global notification table
where I had it per consumer. The reason was that when a consumer goes away
or unregisters, its event table is cleared. Also, in an idle state
all notification subspaces would contain just a list of consumers and their
health and no other data. Then _db_updates, indexer, replicator would
consume the events and then would persist that data or spawn or
whatever they'd want, but it would be out of couch_event's hands at that
point (so to speak). With a global table, I'm not sure I follow how we clear
the ?COUNTERS or ?NOTIFICATIONS table. We'd perhaps have a list of consumers
and their last known consumed versionstamp_of_notification, then the worker
process would find the minimum and clear everything from the start to that
minimum. I think that might work?
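
A rough sketch of that cleanup idea in Python, just to make it concrete. Everything here is a stand-in: plain integers play the role of versionstamps, a dict plays the role of the notifications subspace, and `trim_notifications` is a hypothetical helper, not anything that exists in CouchDB or fdb.

```python
def trim_notifications(notifications: dict, consumed: dict) -> int:
    """Drop every notification at or below the slowest consumer's position.

    notifications: versionstamp -> payload (stand-in for the global table)
    consumed: consumer name -> last consumed versionstamp
    Returns the number of entries removed.
    """
    if not consumed:
        # Assumption: with no registered consumers nothing is trimmed.
        return 0
    low_water = min(consumed.values())
    stale = [stamp for stamp in notifications if stamp <= low_water]
    for stamp in stale:
        del notifications[stamp]
    return len(stale)


notifications = {
    10: ("db1", "updated"),
    11: ("db2", "created"),
    12: ("db1", "ddoc_updated"),
    13: ("db3", "deleted"),
}
consumed = {"replicator": 12, "indexer": 11, "db_updates": 13}
removed = trim_notifications(notifications, consumed)
```

Here the indexer is the slowest consumer, so only entries 10 and 11 are dropped. One consequence in this sketch is that a consumer which stops advancing holds back the trim for everyone.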

Another nice thing about separate consumers is that they each can have
different polling periods. Indexer could poll every minute, replicator
every 10 seconds, change feeds consumer every 5, etc. The downside is
having to copy updates to all of them. A pretty standard message queue
trade-off (queue per worker vs. global queue).

I'm also worried about d), with db_updates just scanning the table and
this being used as its backing store. I have seen _db_updates disabled in
production in a few instances, and wonder if we'd want the option of not
storing db updates permanently (and make that part of _db_updates internals)
as opposed to making couch_event depend on it.
