bruno-roustant commented on PR #2021:
URL: https://github.com/apache/solr/pull/2021#issuecomment-1768027856

   Here is my understanding.
   
   VersionBucket
   It serves 2 purposes.
   First, it is a lock object that guards atomic operations on a doc ID. It is 
a bucket because it is used to lock operations on all the doc IDs whose hash 
falls into this bucket.
   Second, it keeps the highest version, which is the max of the versions of 
the docs in the bucket. This is an optimization that applies only when the 
leader forwards the update request to another replica (there could be an 
option to not store this long if only the leader updates and never forwards). 
When adding a doc with version v, we first compare v with the highest version 
vh of the bucket. If v > vh, then we know the doc version is ordered, without 
having to look in the transaction log or index for the precise indexed doc 
version.
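
   A minimal Java sketch of the two purposes described above. This is not 
Solr's actual code: the class names (BucketSketch, BucketTableSketch), the 
method tryFastPath, and the use of String.hashCode() with a power-of-two mask 
are all assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names and layout do not mirror Solr's real classes.
final class BucketSketch {
    // Purpose 1: one lock shared by every doc ID that hashes to this bucket.
    final ReentrantLock lock = new ReentrantLock();
    // Purpose 2: max version seen among the docs of this bucket.
    long highest;

    BucketSketch(long seed) {
        this.highest = seed;
    }
}

final class BucketTableSketch {
    private final BucketSketch[] buckets;

    BucketTableSketch(int numBuckets, long seed) {
        // A power-of-two size lets us pick a bucket with a simple bit mask.
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        buckets = new BucketSketch[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            buckets[i] = new BucketSketch(seed);
        }
    }

    BucketSketch bucketFor(String docId) {
        // Masking with a positive value always yields a non-negative index.
        return buckets[docId.hashCode() & (buckets.length - 1)];
    }

    // Returns true when v > vh, i.e. the version is known to be ordered
    // without any lookup. Returns false when the caller would have to
    // consult the transaction log or the index (not modeled here).
    boolean tryFastPath(String docId, long v) {
        BucketSketch b = bucketFor(docId);
        b.lock.lock();
        try {
            if (v > b.highest) {
                b.highest = v;
                return true;
            }
            return false;
        } finally {
            b.lock.unlock();
        }
    }
}
```

   With a seed of 0, a first call such as tryFastPath("a", 5L) takes the fast 
path, and a later call with version 3 for the same doc ID does not.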
   
   Why 65536 VersionBucket?
   For both VersionBucket goals, the more buckets there are, the more precise 
the locking and the highest-version optimization become. The drawback is the 
memory usage.
   In SOLR-XXX, the number of buckets was studied. If there are not enough 
buckets, update threads can end up blocked waiting on the same lock when an 
update operation takes a long time. With a large number of buckets, the 
probability of updates locking the same bucket, within the duration of a long 
operation, is low enough to not impact perf.
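
   To make the probability argument concrete, here is a small Java sketch. It 
assumes doc IDs hash uniformly and independently across buckets, which is a 
simplification and not something measured in Solr. Under that assumption, the 
chance that at least one of k other updates lands on the bucket held by a long 
operation is 1 - (1 - 1/n)^k for n buckets. The class and parameter names are 
hypothetical.

```java
// Hypothetical model of lock contention, assuming uniform hashing.
final class BucketContentionSketch {

    // Probability that at least one of `concurrentUpdates` other updates
    // hashes to the one bucket currently held by a long operation.
    static double collisionProbability(int numBuckets, int concurrentUpdates) {
        return 1.0 - Math.pow(1.0 - 1.0 / numBuckets, concurrentUpdates);
    }

    public static void main(String[] args) {
        // Compare a small table with the 65536 buckets discussed above,
        // for an arbitrary example of 100 updates during the long operation.
        for (int n : new int[] {256, 65536}) {
            System.out.printf("buckets=%d p=%.6f%n", n, collisionProbability(n, 100));
        }
    }
}
```

   In this model the probability shrinks roughly as k/n once n is much larger 
than k, which matches the intuition that more buckets mean fewer threads 
blocked on the same lock.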
   
   Can we manage buckets dynamically?
   For the locking aspect, yes, but more synchronization is required if we 
support bucket removal. If we only support lazy creation, we only need to 
synchronize when creating a bucket.
   For the highest-version optimization, only partially: we cannot remove the 
highest value if it is different from the common "seed" value.
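
   A possible shape for the lazy-creation-only variant, sketched in Java. This 
is an assumption about how it could be done, not the approach taken in the PR: 
buckets are created on first use with a compare-and-set on an 
AtomicReferenceArray, so creation is the only synchronized step. Bucket 
removal is deliberately not modeled. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of lazily created buckets; removal is not supported.
final class LazyBucketsSketch {

    static final class Bucket {
        long highest;

        Bucket(long seed) {
            this.highest = seed;
        }
    }

    private final AtomicReferenceArray<Bucket> slots;
    private final long seed;

    LazyBucketsSketch(int numBuckets, long seed) {
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        this.slots = new AtomicReferenceArray<>(numBuckets);
        this.seed = seed;
    }

    Bucket bucketFor(String docId) {
        int i = docId.hashCode() & (slots.length() - 1);
        Bucket b = slots.get(i);
        if (b == null) {
            Bucket created = new Bucket(seed);
            // The compare-and-set is the only synchronization point: if
            // another thread won the race, we use the bucket it installed.
            b = slots.compareAndSet(i, null, created) ? created : slots.get(i);
        }
        return b;
    }

    // Number of buckets actually allocated so far.
    int allocated() {
        int count = 0;
        for (int i = 0; i < slots.length(); i++) {
            if (slots.get(i) != null) {
                count++;
            }
        }
        return count;
    }
}
```

   Because a slot never goes back to null once set, a reader that sees a 
non-null bucket can use it without further coordination. Supporting removal 
would break that property, which is where the extra synchronization mentioned 
above would come in.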


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

