nickva opened a new pull request, #4962:
URL: https://github.com/apache/couchdb/pull/4962

   Make couch_multidb_changes shard map aware

   The couch_multidb_changes module monitors shards whose names match a
particular suffix and notifies users with found, updated, and deleted events.
This is the module that drives replicator jobs when `*/_replicator` databases
are updated.
   
   Previously, couch_multidb_changes reacted only to node-local shard file
events and was not aware of the shard map membership of those files. This was
most evident during shard moves: the target shard could be created long before
the shard file became part of the shard map. The replicator could notice the
new target shard file and spawn a replication job on the new node, while the
same replication job kept running on the source node. The two replication jobs
would eventually conflict in the PG system
(https://www.erlang.org/doc/man/pg.html) and one of them would start crashing
with a "duplicate job" error. This could last for days, depending on how long
it took to populate the data on the target. Even after recovery, the job on the
target could be backed off for up to an extra 8 hours before it ran again.
   
   To avoid issues like that, make couch_multidb_changes aware of shard map
membership updates. When a shard file is discovered and it is not in the shard
map, mark it with a `wait_shard_map = true` flag. Then, re-use the existing db
event monitoring mechanism to notice when the shards db is updated, and
schedule a delayed membership check for the shards tracked in our ETS table.
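   The flag-and-recheck flow above can be sketched roughly as follows. This is a
Python model purely for illustration (the actual module is Erlang); the names
`Row`, `SHARD_MAP`, `on_shard_file_found`, and `on_shards_db_updated` are
assumptions standing in for the real identifiers, and the dict stands in for
the ETS table:

   ```python
from dataclasses import dataclass

SHARD_MAP: set[str] = set()   # stand-in for the cluster shard map
found: list[str] = []         # records "db found" notifications to users

@dataclass
class Row:
    db_name: str
    wait_shard_map: bool = False  # true while the file is not yet a shard map member

rows: dict[str, Row] = {}     # stand-in for the ETS table, keyed by shard name

def on_shard_file_found(db_name: str) -> None:
    # A shard file appeared on disk. Only notify users (e.g. spawn a
    # replicator job) if it is already in the shard map; otherwise flag
    # it and wait for a membership update.
    in_map = db_name in SHARD_MAP
    rows[db_name] = Row(db_name, wait_shard_map=not in_map)
    if in_map:
        found.append(db_name)

def on_shards_db_updated() -> None:
    # The shards db changed: re-check membership for every waiting row.
    # (The real module schedules this check with a delay via the existing
    # db event monitoring mechanism.)
    for row in rows.values():
        if row.wait_shard_map and row.db_name in SHARD_MAP:
            row.wait_shard_map = False
            found.append(row.db_name)

# A target shard file shows up before the shard map includes it:
on_shard_file_found("shards/00000000-7fffffff/db1._replicator")

# Later, the shard map is updated to include it and a membership
# re-check fires, at which point the notification is delivered:
SHARD_MAP.add("shards/00000000-7fffffff/db1._replicator")
on_shards_db_updated()
```

   In this model, no notification is delivered until the shard file is both on
disk and in the shard map, which is the property that prevents the premature
replication job on the move target described above.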
   
   Other changes to the module are mostly cosmetic:
   
    * Remove the unused `created` callback. `db_found` is used instead, both 
when dbs are created and during startup when they are discovered.
   
    * In the ETS table use a proper `#row{}` record since we now have 5 items 
in the tuple. This simplifies some of the existing code as well.
   
    * During deletion and creation, actually delete the entries from the ETS 
table. Previously we didn't, so the entries would hang around forever until the 
node was restarted.
   
    * Add comments to a few tricky sections explaining what should be happening 
there.
   
    * Add more tests covering both the old and new functionality. Increase 
coverage from 96% to 98%.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
