On Mon, Apr 29, 2019 at 1:29 PM Nick Vatamaniuc <vatam...@apache.org> wrote:
>
> After discussing how replicator might be implemented with fdb
> https://lists.apache.org/thread.html/9338bd50f39d7fdec68d7ab2441c055c166041bd84b403644f662735@%3Cdev.couchdb.apache.org%3E,
> and thinking about a global jobs queue for replicator (and indexing as
> well, as Garren had mentioned), I had noticed that we might want to have
> something like a couch_event subsystem. We'd need it to drive creation
> of replication jobs or pending indexing requests. So that's basically what
> this discussion thread is about.
>
> couch_event, for those who don't know, is a node-local event bus for
> database-level events. When dbs are created, deleted, or updated it
> publishes a corresponding event. Any interested listeners subscribe to
> those events. Currently some of the listeners are:
>
>  * ddoc cache : for ddoc updates, and db creates and deletes
>  * ken : for ddoc updates to know when to start building indexes
>  * _replicator db updates : to monitor changes to replication docs
>  * _users db auth : to drive auth cache updates
>  * _db_updates handler : drive db updates
>  * smoosh : drive compaction triggers
>
> This all happens in memory on each node. We use the nifty and fast khash
> library to distribute events. If listeners register and then die, we
> automatically clean that up.
>
> Now with fdb, some of those users of couch_event go away (smoosh) but most
> remain, and we need something to drive them.
>
> There was a separate discussion for how to implement _db_updates handler,
> which is a starting point for this one, but I had made a new thread as it
> involves not just the _db_updates but those other things:
>
> https://lists.apache.org/thread.html/a7bf140aea864286817bbd4f16f5a2f0a295f4d046950729400e0e2a@%3Cdev.couchdb.apache.org%3E
>

Just to make sure I follow: this basically reads like _db_updates, with the
addition that we're tracking updates to design docs and maybe local
docs (except the only consumer of local doc events is going away).

> From the _db_updates discussion I liked the proposal to use atomic ops to
> accumulate and deduplicate events and consumers periodically reading and
> resetting the stats.
>
> The list of events currently being published is the following:
>
>  * DbName, created
>  * DbName, deleted
>  * DbName, updated
>  * DbName, local_updated (not needed anymore, used by smoosh only)
>  * DbName, ddoc_updated
>  * DbName, {ddoc_updated, DDocId}
>
> (The {ddoc_updated, DDocId} makes it slightly more difficult as we'd need
> to track specific DDocIDs. Maybe we could forgo such detailed updates and
> let consumers keep track of design documents on their own?)
>
> But the idea is to have consumer worker processes and queues for each
> consumer of the API. We could share them per-node. So, if on the same node
> the replicator and the indexer both want to find out about db updates, we'd
> just add an extra callback for each, but they'd share the same consumer
> worker. Each consumer would receive updates in their queue from the
> producers. Producers would mostly end up doing atomic ops to update counts
> and periodically monitor if consumers are still alive. Consumers would poll
> changes (or use
> watches) to their update queues and notify their callbacks (start
> replication jobs, update _db_updates DB, start indexing jobs, etc.).
>
> Because consumers can die at any time, we don't want to have a growing
> event queue, so each consumer will periodically report its health in the
> db. The producer will periodically monitor all the consumers' health
> (asynchronously, using snapshot reads), and if consumers stop updating their
> health, their queues will be cleared. If they are still alive, the next time
> they go to read they'd have to re-register themselves in the consumers list. (This is
> the same pattern from the replication proposal to replace process
> linking/monitoring and cleanup that we have now).
>

I'm super confused about this discussion of queues and producers and
consumers. Comparing with the _db_updates discussion, it sounds like
"producers" would be anything that modifies a database or otherwise
generates an "event". And the queue is maybe the counters subspace?
Though it's not really a queue, right, it's just a counter table? In the
_db_updates discussion we'd have a process that would reference the
counters table and then periodically update the _db_updates subspace
with whatever changed, which sounds like something you'd expect on
every node?

> The data model might look something like:
>
>  ("couch_event", "consumers") = [C1, C2,...]
>  ("couch_event", Ci, "health") = (MaxTimeout, Timestamp)
>  ("couch_event", Ci, "events", DbName, DbVer) = CreateDeleteUpdates
>  ("couch_event", Ci, "events", DbName, DbVer, "ddoc_updates", DDocId) =
> Updates
>
> CreateDeleteUpdates is an integer that will encode create, delete, and
> updates in one value using atomic ops:
>
> * The value is initialized to 2, which is equivalent to the "unknown" or
> "idle" state.
> * On a delete: min(1), so we reset the value down to 1
> * On a create: max(3), so we bring it up to 3 (and if it was higher, it gets
> reset to 3)
> * On an update: add(4)
>
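To make sure I'm parsing that encoding correctly, here's a toy
simulation (Python rather than Erlang for brevity; `apply_event` and
`interpret` are made-up names, and min/max/add here stand in for FDB's
atomic mutations on the single per-db value):

```python
def apply_event(value, event):
    # Stand-ins for FDB's MIN / MAX / ADD atomic mutations applied to
    # the single per-db counter value described above.
    if event == "deleted":
        return min(value, 1)   # min(1)
    if event == "created":
        return max(value, 3)   # max(3)
    if event == "updated":
        return value + 4       # add(4)
    raise ValueError(event)

def interpret(value):
    # Decode what a consumer could conclude at poll time: value mod 4
    # tracks deleted/existing/created (a bare 2 with no updates is the
    # "idle" state), and every update added 4.
    state = {1: "deleted", 2: "existing", 3: "created"}[value % 4]
    return state, value // 4   # (state, number of updates seen)

v = 2                          # initialized to "idle"
v = apply_event(v, "updated")
v = apply_event(v, "updated")
print(interpret(v))            # ('existing', 2): existing db, updated twice
```

A delete followed by a recreate also round-trips: min(1) drops the
value to 1, then max(3) brings it back to 3, which decodes as
"created" with the update count reset.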

This seems awkwardly complicated to me. The very first thing that
strikes me is that I'd argue against creating some second
implementation of worker coordination/health status thing. If there
are workers involved I'd like to have a central "worker"/"work
distribution" implementation that would be re-used by all the things
that need that rather than having some unknown number of bespoke
implementations.

Second the status aspect confuses me. I *think* what you're
considering is how to extend the _db_updates discussion to include
event types beyond `created | deleted | updated` as discussed there by
using logical comparisons? I'm also not sure what the lifetime of the
counter thing is? You say "initialized to 2" but I don't understand
when initialization would happen.

I'd probably simplify this by first removing the design doc id
specificity and returning to the more general "design doc updated"
message. Then we'd just expand that message to the related design
document ids if/when we need to.

Second, you can use multiple counters in a single value by padding
them out. So instead of using all of the logical comparisons, you could
do something like this:

gen_increment(created) ->
    gen_increment_bin(1, 0, 0, 0);
gen_increment(deleted) ->
    gen_increment_bin(0, 1, 0, 0);
gen_increment(doc_updated) ->
    gen_increment_bin(0, 0, 1, 0);
gen_increment(ddoc_updated) ->
    gen_increment_bin(0, 0, 0, 1).

gen_increment_bin(Created, Deleted, Updated, DDocUpdated) ->
    <<
        Created:32/little-unsigned-integer,
        Deleted:32/little-unsigned-integer,
        Updated:32/little-unsigned-integer,
        DDocUpdated:32/little-unsigned-integer
    >>.

And then you just use the ADD atomic op and you now have your event
types changing independently.
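As a sanity check of that layout (in Python for brevity; the field
order and helper names just mirror the sketch above): FDB's ADD
mutation treats the whole value as one little-endian integer, which is
equivalent to adding the four fields independently as long as no
32-bit field overflows and carries into its neighbor.

```python
import struct

FMT = "<4I"  # four unsigned 32-bit little-endian counters, 16 bytes total

def gen_increment(event):
    # Build the 16-byte operand to hand to FDB's ADD mutation,
    # mirroring gen_increment_bin/4 above.
    idx = ["created", "deleted", "doc_updated", "ddoc_updated"].index(event)
    counts = [0, 0, 0, 0]
    counts[idx] = 1
    return struct.pack(FMT, *counts)

def add(value, operand):
    # Simulate the ADD mutation field-by-field; FDB's whole-value
    # little-endian add gives the same result barring field overflow.
    fields = zip(struct.unpack(FMT, value), struct.unpack(FMT, operand))
    return struct.pack(FMT, *(a + b for a, b in fields))

def decode(value):
    created, deleted, updated, ddoc_updated = struct.unpack(FMT, value)
    return {"created": created, "deleted": deleted,
            "updated": updated, "ddoc_updated": ddoc_updated}

v = bytes(16)  # fresh counter value
for e in ["created", "doc_updated", "doc_updated"]:
    v = add(v, gen_increment(e))
print(decode(v))  # {'created': 1, 'deleted': 0, 'updated': 2, 'ddoc_updated': 0}
```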

> The idea is that a consumer looking at the value periodically could figure
> out if the database was deleted, created, created and updated, just
> updated, or it was idle.
>
> During each poll interval, for each database, the consumer would read the
> value and reset it to 2 with a read conflict range on it. For ddocs it
> would read the range and then clear it. If the value read was idle (2), it
> would clear the db event counter and the ddoc range.
>
> We could also enhance the consumers to accept only certain updates, with
> producers doing the filtering ahead of time. Db names, for example, could be
> */_replicator or */_users, and docs could be "_design/*". Or we might only
> need creates and deletes, not updates. But that would mean not being able to
> share consumer workers per node.
>
> What does everyone think?
>
> Cheers,
> -Nick

I'd say all in all it sounds a lot more complicated than I'd expect.
And the difference between this proposal and the db_updates discussion
is not large enough to really consider them separately, as it seems
like either approach could drive the other (assuming we make a minor
extension to the db_updates proposal).

Now that db_updates and couch_event are both in the discussion I'm
gonna use a third word so I don't confuse anyone by appearing to
reference one or the other. I can't think of anything amusing so I'm
gonna just go with calling this "notifications".

For notifications, I'd suggest that we do something similar to the
db_updates proposal with a few extensions:

a) all individual processes that want to generate a notification would
use atomic counters with padding to track the various types of
notification in the "counters" subspace
b) a worker process would periodically scan this subspace and make any
updates to a notification table
c) to provide couch_event style events, a single process per couchdb
node would follow the notifications space and translate each update to
current couch_event behavior
d) db_updates would just be a scan on this table, possibly dropping
ddoc_updated events

The counters subspace would look like:

    (?COUNTERS, DbName) -> (CounterType1, CounterType2, CounterType3, ...)

And the notifications subspace would look like:

    (?NOTIFICATIONS, versionstamp_of_notification) ->
        (DbName, db_uuid, versionstamp_of_db, versionstamp_of_last_ddoc_change?)
        | (DbName, ?DELETED)
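To sketch step (b) concretely (again in Python rather than Erlang,
with made-up names; db uuids and versionstamps are elided), the
worker's per-scan logic could be a pure function over whatever it read
from the counters subspace, with the read, the notification appends,
and the counter clears all happening in one transaction:

```python
def translate(counter_kvs):
    """Turn counters-subspace entries into notification records plus
    the counter keys to clear. counter_kvs is a list of
    (db_name, (created, deleted, updated, ddoc_updated)) pairs.
    """
    notifications = []
    to_clear = []
    for db_name, (created, deleted, updated, ddoc_updated) in counter_kvs:
        to_clear.append(db_name)
        if deleted:
            # A delete trumps everything else seen this interval; note
            # that with independent counters the relative order of a
            # delete and a create in the same interval is ambiguous.
            notifications.append((db_name, "deleted"))
        elif created or updated or ddoc_updated:
            # A real record would carry the db uuid and versionstamps.
            notifications.append((db_name, "updated", bool(ddoc_updated)))
        # All-zero counters: stale key, nothing to notify.
    return notifications, to_clear

notes, clears = translate([
    ("db/one", (0, 0, 3, 1)),
    ("db/two", (0, 1, 2, 0)),
    ("db/zzz", (0, 0, 0, 0)),
])
print(notes)   # [('db/one', 'updated', True), ('db/two', 'deleted')]
```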

The first question I'd ask then would be how hard we try to have a
single coordinating process that's translating from counters to
notifications. For v1 I'd probably just create one per CouchDB node
and accept that there will be contention. And we could make it a role
per node so that if someone wants to have fancy dancy deployments they
could just deploy that role N times or something. The only other
consideration I have is that I'm not accounting for the summary KV that
Koco proposed. I assume that's something so that it doesn't matter
which worker created the summary that's comparing the new values of
counters? Though I suddenly wonder whether "summary" might have meant
"hash of data" or something. Assuming we generate notifications and
clear counters in a single transaction, I'm not 100% sure what the
summary is for.

And that's all I have to say about that.
