nickva opened a new pull request, #5827: URL: https://github.com/apache/couchdb/pull/5827
Previously, when purge checkpoints were first created while compaction was running, compaction could finish first and remove too many purge infos before the internal replicator checkpointed. That could leave a "hole" between the minimum (checkpointed) purge sequence and the oldest purge sequence. The internal replicator would then start crashing: when fetching the minimum purge sequence, it correctly detects that one of the purge clients is asking for a sequence that is too low (that is, the client "skipped" and hasn't processed the intermediate purge sequences). The tell-tale sign of this in production is repeated `invalid_start_purge_seq` errors in the logs. One way to get out of this state is to delete the checkpoint docs and let them be re-created.

To fix the race condition, when compaction starts, first check whether all the expected checkpoints from the other shard copies have been created. Only then use the minimum purge sequence version; otherwise, use the oldest purge sequence version.
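As a rough illustration of the decision described above, here is a minimal Python sketch. It is not the CouchDB implementation (which is Erlang); the function name `compaction_purge_seq`, its parameters, and the shard copy names are hypothetical and chosen only for this example.

```python
from typing import AbstractSet, Mapping


def compaction_purge_seq(
    expected_copies: AbstractSet[str],
    checkpoints: Mapping[str, int],
    oldest_purge_seq: int,
) -> int:
    """Pick the purge sequence that compaction should start from.

    Hypothetical model, not CouchDB's API:
    - expected_copies: names of the other shard copies that should have
      an internal replicator checkpoint.
    - checkpoints: checkpointed purge sequence per shard copy, for the
      checkpoints that exist so far.
    - oldest_purge_seq: the oldest purge sequence still stored.
    """
    missing = [copy for copy in expected_copies if copy not in checkpoints]
    if missing:
        # Some checkpoints have not been created yet, so the minimum
        # would be computed from incomplete data. Fall back to the
        # oldest purge sequence instead.
        return oldest_purge_seq
    # All expected checkpoints exist, so the minimum is safe to use.
    # The default covers the (assumed) case of no expected copies.
    return min(
        (checkpoints[copy] for copy in expected_copies),
        default=oldest_purge_seq,
    )
```

For example, with expected copies `{"node2", "node3"}` and only `node2` checkpointed, the sketch returns the oldest purge sequence; once `node3` has a checkpoint as well, it returns the smaller of the two checkpointed sequences.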
