nickva opened a new pull request, #4962: URL: https://github.com/apache/couchdb/pull/4962
Make couch_multidb_changes shard map aware

The couch_multidb_changes module monitors shards whose names match a particular suffix and notifies users with found, updated and deleted events. This is the module which drives replicator jobs when `*/_replicator` databases are updated.

Previously, couch_multidb_changes reacted only to node-local shard file events and was not aware of the shard map membership of those files. This was most evident during shard moves: the target shard file could be created long before it became part of the shard map. The replicator could notice the new target shard file and spawn a replication job on the new node, while keeping the same replication job running on the source node. The two replication jobs would eventually conflict in the PG system (https://www.erlang.org/doc/man/pg.html) and one of them would start crashing with a "duplicate job" error. This could last for days, depending on how long it took to populate the data on the target. Even after recovery, the target shard could be backed off for up to another 8 hours before it could run again.

To avoid issues like that, make couch_multidb_changes aware of shard map membership updates. When a shard file is discovered and it is not in the shard map, mark it with a `wait_shard_map = true` flag. Then, re-use the existing db event monitoring mechanism to notice when the shards db is updated, and schedule a delayed membership check for the shards tracked in our ETS table.

Other changes to the module are mostly cosmetic:

* Remove the unused `created` callback. `db_found` is used instead, both when dbs are created and during startup when they are discovered.
* In the ETS table, use a proper `#row{}` record since we now have 5 items in the tuple. This simplifies some of the existing code as well.
* During deletion and creation, actually delete the entries from the ETS table. Previously we didn't do it, so they would hang around until the node was restarted.
* Add comments to a few tricky sections explaining what should be happening there.
* Add more tests for both the old and new functionality. Increase coverage from 96% to 98%.
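
The `wait_shard_map` flow described above can be modeled roughly as follows. This is a minimal, language-neutral sketch in Python and not the Erlang code from the PR: the `Row` and `ShardTracker` names, the plain set standing in for the shard map, the shard name, and the choice to emit `found` once membership is confirmed are all assumptions made for illustration. The real module schedules a delayed membership check, while this sketch runs the check immediately.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical stand-in for a tracked shard entry (not the PR's #row{} record)."""
    shard: str
    wait_shard_map: bool = False


class ShardTracker:
    """Toy model: announce a shard only once it is a member of the shard map."""

    def __init__(self, shard_map, notify):
        self.shard_map = set(shard_map)  # assumption: shard map modeled as a set of names
        self.notify = notify             # callback taking (event, shard)
        self.rows = {}                   # stands in for the ETS table

    def shard_file_found(self, shard):
        in_map = shard in self.shard_map
        # Track the shard either way; flag it if the shard map does not list it yet.
        self.rows[shard] = Row(shard, wait_shard_map=not in_map)
        if in_map:
            self.notify("found", shard)

    def shard_map_updated(self, new_map):
        # The PR schedules a delayed check; here it runs right away for simplicity.
        self.shard_map = set(new_map)
        for row in self.rows.values():
            if row.wait_shard_map and row.shard in self.shard_map:
                row.wait_shard_map = False
                self.notify("found", row.shard)

    def shard_file_deleted(self, shard):
        # Remove the entry so it does not linger in the table.
        row = self.rows.pop(shard, None)
        # Assumption: only shards that were announced get a "deleted" event.
        if row is not None and not row.wait_shard_map:
            self.notify("deleted", shard)


events = []
tracker = ShardTracker(shard_map=set(), notify=lambda ev, s: events.append((ev, s)))
shard = "shards/00000000-7fffffff/_replicator.1700000000"  # hypothetical shard name

tracker.shard_file_found(shard)      # file exists, but is not in the shard map yet
assert events == []                  # nothing announced, so no duplicate job is spawned
tracker.shard_map_updated({shard})   # shard move completes, membership is confirmed
assert events == [("found", shard)]
```

In this model a target shard file created early during a shard move stays silent until the shard map lists it, which is the property that keeps a second replication job from starting while the source node still runs the first one.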