After discussing how the replicator might be implemented with fdb
https://lists.apache.org/thread.html/9338bd50f39d7fdec68d7ab2441c055c166041bd84b403644f662735@%3Cdev.couchdb.apache.org%3E,
and thinking about a global jobs queue for the replicator (and for
indexing as well, as Garren had mentioned), I noticed that we might
want something like a couch_event subsystem. We'd need it to drive the
creation of replication jobs and pending indexing requests. That's
basically what this discussion thread is about.

couch_event, for those who don't know, is a node-local event bus for
database-level events. When dbs are created, deleted, or updated, it
publishes a corresponding event, and any interested listeners
subscribe to those events. Currently some of the listeners are:

 * ddoc cache : for ddoc updates, and db creates and deletes
 * ken : for ddoc updates, to know when to start building indexes
 * _replicator db updates : to monitor changes to replication docs
 * _users db auth : to drive auth cache updates
 * _db_updates handler : to drive the _db_updates feed
 * smoosh : to drive compaction triggers

This all happens in memory on each node. We use the nifty and fast khash
library to distribute events. If listeners register and then die, we
automatically clean that up.

Now with fdb some of those couch_event users go away (smoosh, for
example), but most remain, and we need something to drive them.

There was a separate discussion about how to implement the _db_updates
handler, which is a starting point for this one, but I made a new
thread since this involves not just _db_updates but the other
consumers as well:

https://lists.apache.org/thread.html/a7bf140aea864286817bbd4f16f5a2f0a295f4d046950729400e0e2a@%3Cdev.couchdb.apache.org%3E

From the _db_updates discussion, I liked the proposal to use atomic
ops to accumulate and deduplicate events, with consumers periodically
reading and resetting the counters.

The list of events currently being published is the following:

 * DbName, created
 * DbName, deleted
 * DbName, updated
 * DbName, local_updated (not needed anymore, used by smoosh only)
 * DbName, ddoc_updated
 * DbName, {ddoc_updated, DDocId}

(The {ddoc_updated, DDocId} event makes it slightly more difficult, as
we'd need to track specific DDocIds. Maybe we could forgo such
detailed updates and let consumers keep track of design documents on
their own?)

But the idea is to have consumer worker processes and queues for each
consumer of the API. We could share them per node. So if, on the same
node, both the replicator and the indexer want to find out about db
updates, we'd just register extra callbacks on the same shared
consumer worker. Each consumer would receive updates in its queue from
the producers. Producers would mostly end up doing atomic ops to
update counts, and would periodically check that consumers are still
alive. Consumers would poll their update queues for changes (or use
watches) and notify their callbacks (start replication jobs, update
the _db_updates db, start indexing jobs, etc.).
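
Just to illustrate the shape of it, here is a minimal sketch of the
per-node fan-out. The module and function names are made up, and a
real version would probably live in a gen_server rather than
persistent_term:

    %% Hypothetical per-node fan-out: one shared consumer worker
    %% dispatching each decoded event to all registered callbacks.
    -module(couch_event_consumer).
    -export([register_callback/1, notify_all/2]).

    %% Callbacks kept in persistent_term for brevity; a real version
    %% would likely hold them in a gen_server's state.
    register_callback(Fun) when is_function(Fun, 2) ->
        Callbacks = persistent_term:get({?MODULE, callbacks}, []),
        persistent_term:put({?MODULE, callbacks}, [Fun | Callbacks]).

    %% Called by the poll loop for each (DbName, Event) it decodes.
    notify_all(DbName, Event) ->
        Callbacks = persistent_term:get({?MODULE, callbacks}, []),
        [(catch Callback(DbName, Event)) || Callback <- Callbacks],
        ok.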

Because consumers can die at any time, we don't want event queues to
grow without bound, so each consumer will periodically report its
health in the db. The producers will periodically monitor all the
consumers' health (asynchronously, using snapshot reads), and if a
consumer stops updating its health, its queue will be cleared. If it
is still alive the next time it goes to read, it will have to
re-register itself in the consumers list. (This is the same pattern
from the replication proposal to replace the process
linking/monitoring and cleanup that we have now.)

The data model might look something like:

 ("couch_event", "consumers") = [C1, C2,...]
 ("couch_event", Ci, "heath") = (MaxTimeout, Timestamp)
 ("couch_event", Ci, "events", DbName, DbVer) = CreateDeleteUpdates
 ("couch_event", Ci, "events", DbName, DbVer, "ddoc_updates", DDocId) =
Updates
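
To make that concrete, here is a rough Erlang sketch of the health
reporting and reaping pattern described above, against the keys just
shown. It assumes the erlfdb tuple layer and API (erlfdb_tuple:pack/1,
erlfdb_tuple:range/1, erlfdb:get_ss/2 for snapshot reads); the helper
names are made up:

    -define(PREFIX, <<"couch_event">>).

    health_key(ConsumerId) ->
        erlfdb_tuple:pack({?PREFIX, ConsumerId, <<"health">>}).

    %% Consumer side: refresh the health key every poll interval.
    report_health(Db, ConsumerId, MaxTimeoutMsec) ->
        erlfdb:transactional(Db, fun(Tx) ->
            Now = erlang:system_time(millisecond),
            Val = erlfdb_tuple:pack({MaxTimeoutMsec, Now}),
            erlfdb:set(Tx, health_key(ConsumerId), Val)
        end).

    %% Producer side: snapshot-read the health key (so we don't
    %% conflict with the consumer's own writes) and clear everything
    %% under ("couch_event", ConsumerId, ...) if it has gone stale.
    maybe_reap(Tx, ConsumerId) ->
        case erlfdb:wait(erlfdb:get_ss(Tx, health_key(ConsumerId))) of
            not_found ->
                clear_consumer(Tx, ConsumerId);
            Bin ->
                {MaxTimeout, Then} = erlfdb_tuple:unpack(Bin),
                Now = erlang:system_time(millisecond),
                case Now - Then > MaxTimeout of
                    true -> clear_consumer(Tx, ConsumerId);
                    false -> ok
                end
        end.

    clear_consumer(Tx, ConsumerId) ->
        {Start, End} = erlfdb_tuple:range({?PREFIX, ConsumerId}),
        erlfdb:clear_range(Tx, Start, End).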

CreateDeleteUpdates is an integer that encodes creates, deletes, and
updates in one value using atomic ops:

* The value is initialized to 2; this is equivalent to an "unknown" or
"idle" state.
* On delete: min(1), so we reset the value down to 1.
* On create: max(3), so we bring it up to 3 (and if it was higher, it
gets reset to 3).
* On update: add(4).
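
As a sketch, the producer side then reduces to a single atomic op per
event (assuming the erlfdb binding's min/max/add helpers, which apply
fdb's atomic ops on 64-bit little-endian values):

    %% Sketch: publish one event into one consumer's queue with a
    %% single atomic op, so concurrent producers never conflict.
    event_key(ConsumerId, DbName, DbVer) ->
        erlfdb_tuple:pack({<<"couch_event">>, ConsumerId, <<"events">>,
            DbName, DbVer}).

    publish(Tx, ConsumerId, DbName, DbVer, deleted) ->
        erlfdb:min(Tx, event_key(ConsumerId, DbName, DbVer), 1);
    publish(Tx, ConsumerId, DbName, DbVer, created) ->
        erlfdb:max(Tx, event_key(ConsumerId, DbName, DbVer), 3);
    publish(Tx, ConsumerId, DbName, DbVer, updated) ->
        erlfdb:add(Tx, event_key(ConsumerId, DbName, DbVer), 4).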

The idea is that a consumer looking at the value periodically could
figure out whether the database was deleted, created, created and
updated, just updated, or idle.
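
Spelled out, the decoding could look like this (a sketch; one wrinkle
is that fdb's add treats a missing key as 0, so a counter that was
cleared and then updated shows up as a multiple of 4):

    %% Sketch: recover the event from the deduplicated counter. The
    %% value mod 4 carries the create/delete state and the quotient
    %% counts updates; a delete followed by a create collapses to
    %% created, which is the point of the dedup.
    decode(2) -> idle;
    decode(1) -> deleted;
    decode(3) -> created;
    decode(V) when V rem 4 =:= 1 -> deleted_then_updated;
    decode(V) when V rem 4 =:= 2 -> updated;
    decode(V) when V rem 4 =:= 3 -> created_and_updated;
    decode(V) when V rem 4 =:= 0 -> updated.  %% add() on a cleared key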

During each poll interval, for each database, the consumer would read
the value and reset it to 2, with a read conflict range on it. For
ddocs it would read the range and then clear it. If the value read was
idle (2), it would clear the db event counter and the ddoc range
altogether.
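
Here's a hedged sketch of that poll step for a single (DbName, DbVer)
entry, reusing event_key/3 and decode/1 from the sketches above. The
erlfdb calls (get_ss/2 for the snapshot read, add_read_conflict_key/2
to conflict on just that key) are my assumptions about the binding:

    %% Sketch: one poll pass over one db entry in the consumer's queue.
    poll_one(Db, ConsumerId, DbName, DbVer) ->
        erlfdb:transactional(Db, fun(Tx) ->
            Key = event_key(ConsumerId, DbName, DbVer),
            Val = erlfdb:wait(erlfdb:get_ss(Tx, Key)),
            erlfdb:add_read_conflict_key(Tx, Key),
            {Start, End} = erlfdb_tuple:range({<<"couch_event">>,
                ConsumerId, <<"events">>, DbName, DbVer,
                <<"ddoc_updates">>}),
            case decode_val(Val) of
                idle ->
                    %% Nothing since the last poll: drop the counter
                    %% and any leftover ddoc entries.
                    erlfdb:clear(Tx, Key),
                    erlfdb:clear_range(Tx, Start, End),
                    [];
                Event ->
                    %% Reset to idle, drain the ddoc updates, and
                    %% return the events for the callbacks to act on.
                    erlfdb:set(Tx, Key, <<2:64/unsigned-little>>),
                    DDocKVs = erlfdb:get_range(Tx, Start, End),
                    erlfdb:clear_range(Tx, Start, End),
                    DDocIds = [ddoc_id(K) || {K, _} <- DDocKVs],
                    [{DbName, Event, DDocIds}]
            end
        end).

    decode_val(not_found) -> idle;
    decode_val(<<V:64/unsigned-little>>) -> decode(V).

    ddoc_id(PackedKey) ->
        Tuple = erlfdb_tuple:unpack(PackedKey),
        element(tuple_size(Tuple), Tuple).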

We could also enhance consumers to accept only certain updates, with
producers doing the filtering ahead of time. Db names, for example,
could be matched against */_replicator or */_users, and doc ids
against _design/*. Or a consumer might need only creates and deletes,
not updates. But that would mean not being able to share consumer
workers per node.
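
For example, a producer-side name filter for a replicator consumer
might be as simple as (hypothetical helper):

    %% Hypothetical producer-side filter: publish only events for
    %% db names ending in "/_replicator" (or the root _replicator db).
    wants_event(DbName) ->
        Suffix = <<"/_replicator">>,
        binary:longest_common_suffix([DbName, Suffix]) =:= byte_size(Suffix)
            orelse DbName =:= <<"_replicator">>.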

What does everyone think?

Cheers,
-Nick
