nickva opened a new pull request, #5827:
URL: https://github.com/apache/couchdb/pull/5827

   Previously, when the purge checkpoints were first created while compaction 
was running, compaction could finish first and remove too many purge infos 
before the internal replicator had checkpointed. In that case we could end up 
with a "hole" between the minimum (checkpointed) purge sequence and the oldest 
remaining purge sequence. The internal replicator would then start crashing: 
when fetching the minimum purge sequence, it correctly detects that one of the 
purge clients is asking for a sequence that is too low (that is, the client 
"skipped" intermediate purge sequences and has not processed them). The 
tell-tale sign of this in production is repeated `invalid_start_purge_seq` 
errors in the logs. One way to recover is to delete the checkpoint docs and 
let them be re-created.
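
   As a rough illustration of the "hole" described above, here is a minimal 
Python sketch (not the actual CouchDB Erlang code). The function name, the 
integer sequences, and the exact comparison are assumptions made for 
illustration; only the `invalid_start_purge_seq` error name comes from the 
description.

```python
def check_start_purge_seq(client_seq: int, oldest_purge_seq: int) -> None:
    """Hypothetical model of the check that rejects a too-low start sequence.

    client_seq: the purge sequence a purge client last checkpointed.
    oldest_purge_seq: the oldest purge info still retained after compaction.
    """
    # The client has processed everything up to client_seq, so it next needs
    # client_seq + 1. If that entry was already removed, there is a hole.
    if client_seq + 1 < oldest_purge_seq:
        raise ValueError("invalid_start_purge_seq")


# A client checkpointed at 4 can resume when the oldest retained info is 5.
check_start_purge_seq(4, 5)

# A client checkpointed at 2 would skip purge infos 3 and 4.
try:
    check_start_purge_seq(2, 5)
except ValueError as err:
    print(err)
```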
   
   To fix the race condition, when compaction starts it now checks whether all 
the expected checkpoints from the other shard copies have been created. Only 
then does it use the minimum checkpointed purge sequence; otherwise it uses 
the oldest purge sequence.
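
   The selection rule can be sketched as follows. This is a simplified Python 
model under stated assumptions, not the Erlang implementation in the PR: 
`compaction_purge_seq`, `expected_clients`, and the dict of checkpoints are 
hypothetical names, and the handling of an empty client set is an assumption.

```python
def compaction_purge_seq(expected_clients, checkpoints, oldest_purge_seq):
    """Pick the purge sequence that compaction treats as its lower bound.

    expected_clients: ids of the shard copies expected to have a checkpoint.
    checkpoints: mapping of client id -> checkpointed purge sequence.
    oldest_purge_seq: the oldest purge sequence currently stored.
    """
    # Assumption: with no expected clients there is nothing to take a
    # minimum over, so fall back to the oldest purge sequence.
    if not expected_clients:
        return oldest_purge_seq

    # If any expected checkpoint has not been created yet, be conservative:
    # use the oldest purge sequence so no purge infos are removed too early.
    if any(client not in checkpoints for client in expected_clients):
        return oldest_purge_seq

    # All expected checkpoints exist, so the minimum is safe to use.
    return min(checkpoints[client] for client in expected_clients)


# One copy has not checkpointed yet: keep everything from the oldest sequence.
print(compaction_purge_seq({"copy_a", "copy_b"}, {"copy_a": 5}, 2))

# Both copies have checkpointed: use the minimum checkpointed sequence.
print(compaction_purge_seq({"copy_a", "copy_b"}, {"copy_a": 5, "copy_b": 7}, 2))
```

   The conservative branch is what closes the race in this model: a checkpoint 
that does not exist yet can no longer be "skipped" by compaction.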
   

