I think Apache Solr could explore leveraging the returned sequence number for its transaction logs.
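
To make the replay use case from the quoted message below concrete, here is a minimal sketch. It is not Solr or Lucene code: `FakeWriter`, `LogEntry`, `recover` and `demo` are hypothetical names invented for illustration. The stand-in writer only mimics the contract under discussion, where each indexing call returns a sequence number and commit returns the last sequence number it covers. It also assumes the application persists the committed sequence number itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types are Lucene or Solr classes.
final class ReplaySketch {

    /** One transaction-log entry: a document id plus the sequence number it was assigned. */
    static final class LogEntry {
        final long seqNo;
        final String docId;

        LogEntry(long seqNo, String docId) {
            this.seqNo = seqNo;
            this.docId = docId;
        }
    }

    /** Stand-in writer: operations return sequence numbers, commit returns the last one covered. */
    static final class FakeWriter {
        private final AtomicLong nextSeqNo = new AtomicLong(1);
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        synchronized long addDocument(String docId) {
            pending.add(docId);
            return nextSeqNo.getAndIncrement();
        }

        synchronized long commit() {
            committed.addAll(pending);
            pending.clear();
            // Every operation with a sequence number <= this value is in the commit.
            return nextSeqNo.get() - 1;
        }

        /** Simulates a crash: anything not yet committed is lost. */
        synchronized void crash() {
            pending.clear();
        }

        synchronized List<String> committedDocs() {
            return new ArrayList<>(committed);
        }
    }

    /** Re-applies only the log entries that the last commit did not capture. */
    static List<String> recover(FakeWriter writer, List<LogEntry> tlog, long committedSeqNo) {
        List<String> replayed = new ArrayList<>();
        for (LogEntry entry : tlog) {
            if (entry.seqNo > committedSeqNo) {
                writer.addDocument(entry.docId); // replayed operations get fresh sequence numbers
                replayed.add(entry.docId);
            }
        }
        return replayed;
    }

    /** Index two docs, commit, index two more, crash, then replay from the log. */
    static List<String> demo() {
        FakeWriter writer = new FakeWriter();
        List<LogEntry> tlog = new ArrayList<>();

        for (String id : new String[] {"doc-1", "doc-2"}) {
            tlog.add(new LogEntry(writer.addDocument(id), id));
        }
        long committedSeqNo = writer.commit(); // assumed to be stored durably by the application

        for (String id : new String[] {"doc-3", "doc-4"}) {
            tlog.add(new LogEntry(writer.addDocument(id), id));
        }
        writer.crash(); // doc-3 and doc-4 never reached a commit

        recover(writer, tlog, committedSeqNo);
        writer.commit();
        return writer.committedDocs();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only point of the sketch is the filter `entry.seqNo > committedSeqNo`: with append-only updates, replaying anything at or below the committed sequence number would duplicate documents, and skipping anything above it would lose them.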

On Tue, 25 Apr 2023 at 18:36, Michael McCandless <luc...@mikemccandless.com> wrote:

> On Sun, Apr 23, 2023 at 6:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Having the sequence number public in API does not bring any benefit, as
>> you cannot use it for anything.
>
> Actually there are some interesting use cases for sequence numbers:
>
> They enable the caller to know the effective order of operations of
> concurrent indexing events. This can be useful for applications that might
> sometimes update the same document at the same time across threads: they
> can implement optimistic concurrency, re-indexing the same document if the
> order was not correct according to the application's external version
> tracking for out-of-order updates. OpenSearch has an array of locks to
> implement pessimistic concurrency (ensuring that the same id is never
> updated concurrently), but for cases where conflicts are rare, the
> optimistic implementation based on Lucene's sequence numbers is likely
> more efficient.
>
> Another use case is precise indexing operation replay (e.g. from a Kinesis
> queue or transaction log or whatever) on recovering from a commit point:
> upon commit, you know which precise indexing event was captured in the
> commit, and on recovering you can resume indexing from precisely the next
> indexing event. This doesn't matter for idempotent updates, but, for other
> cases like append only, it is useful and performant.
>
> I also don't see why flush should return a sequence number -- it is not an
> externally visible event. Patrick maybe you had an interesting use case in
> mind? Note that commit also writes (and fsyncs) the next segments_N file,
> to light all the newly written/fsync'd segments for the next reader to open.
>
> Mike McCandless
>
> http://blog.mikemccandless.com