I think Apache Solr could explore leveraging the returned sequence number
for its transaction logs.

On Tue, 25 Apr 2023 at 18:36, Michael McCandless <luc...@mikemccandless.com>
wrote:

> On Sun, Apr 23, 2023 at 6:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Having the sequence number public in API does not bring any benefit, as
>> you cannot use it for anything.
>>
>
> Actually there are some interesting use cases for sequence numbers:
>
> They enable the caller to know the effective order of operations of
> concurrent indexing events.  This can be useful for applications that might
> sometimes update the same document at the same time across threads: they can
> implement optimistic concurrency, re-indexing the same document if the order
> was not correct according to the application's external version tracking for
> out-of-order updates.  OpenSearch has an array of locks to implement
> pessimistic concurrency (ensuring that the same id is never updated
> concurrently) but for cases where the conflicts are rare, the optimistic
> implementation based on Lucene's sequence numbers is likely more efficient.
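
To make the optimistic approach above concrete, here is a minimal sketch. It does not use Lucene itself: FakeWriter is a hypothetical stand-in whose updateDocument returns an increasing sequence number, as IndexWriter.updateDocument does, and the class names, the State bookkeeping, and the use of a bare long as the "document" are all assumptions made for illustration. After each update the caller records which external version the highest sequence number wrote; if that version is older than the newest one seen for the id, the newest version is written again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class OptimisticReindex {

    /** Hypothetical stand-in for an IndexWriter: the last update per id wins. */
    static final class FakeWriter {
        private long nextSeqNo = 1;
        private final Map<String, Long> indexed = new HashMap<>();

        /** Stores the version for this id and returns the operation's sequence number. */
        synchronized long updateDocument(String id, long externalVersion) {
            indexed.put(id, externalVersion);
            return nextSeqNo++;
        }

        /** Version currently held for this id, or -1 if the id was never indexed. */
        synchronized long indexedVersion(String id) {
            return indexed.getOrDefault(id, -1L);
        }
    }

    /** Per-id bookkeeping kept by the application (not by the writer). */
    static final class State {
        final long seqNo;          // highest sequence number observed for this id
        final long indexedVersion; // external version written by that operation
        final long newestVersion;  // highest external version seen so far

        State(long seqNo, long indexedVersion, long newestVersion) {
            this.seqNo = seqNo;
            this.indexedVersion = indexedVersion;
            this.newestVersion = newestVersion;
        }

        /** The operation with the higher sequence number decides what is in the index. */
        static State merge(State a, State b) {
            State last = a.seqNo > b.seqNo ? a : b;
            return new State(last.seqNo, last.indexedVersion,
                             Math.max(a.newestVersion, b.newestVersion));
        }
    }

    final FakeWriter writer = new FakeWriter();
    private final Map<String, State> states = new ConcurrentHashMap<>();

    /** Indexes one update; returns true if a stale write had to be repaired. */
    boolean index(String id, long version) {
        boolean repaired = false;
        long toWrite = version;
        while (true) {
            long seqNo = writer.updateDocument(id, toWrite);
            // ConcurrentHashMap.merge applies the remapping function atomically per key.
            State s = states.merge(id, new State(seqNo, toWrite, toWrite), State::merge);
            if (s.indexedVersion >= s.newestVersion) {
                return repaired; // the effective last write carries the newest version
            }
            toWrite = s.newestVersion; // an older version landed last: write the newest again
            repaired = true;
        }
    }
}
```

No lock is held per id while indexing; the extra write happens only when the sequence numbers show that an older version landed after a newer one. A real application would re-index the full document for the newest version, not just a version number.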
>
> Another use case is precise indexing operation replay (e.g. from a Kinesis
> queue or transaction log or whatever) on recovering from a commit point:
> upon commit, you know which precise indexing event was captured in the
> commit, and on recovering you can resume indexing from precisely the next
> indexing event.  This doesn't matter for idempotent updates, but, for other
> cases like append only, it is useful and performant.
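
The replay idea above can be sketched the same way. Again this is not Lucene code: FakeWriter is a hypothetical stand-in, and the sketch assumes that commit returns the sequence number of the last operation included in the commit. The log is a plain list, and the offset bookkeeping (seqNoToOffset, resumeOffset) is illustrative and would have to be persisted alongside the commit in a real system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReplayFromCommit {

    /** Hypothetical stand-in for an IndexWriter over an append-only index. */
    static final class FakeWriter {
        private long nextSeqNo = 1;
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        long addDocument(String doc) {
            pending.add(doc);
            return nextSeqNo++;
        }

        /** Assumed contract: returns the sequence number of the last operation in the commit. */
        long commit() {
            committed.addAll(pending);
            pending.clear();
            return nextSeqNo - 1;
        }

        /** Simulates a crash: everything not yet committed is lost. */
        void crash() {
            pending.clear();
        }

        List<String> committedDocs() {
            return committed;
        }
    }

    final FakeWriter writer = new FakeWriter();
    // Maps each operation's sequence number to the log offset it came from.
    private final TreeMap<Long, Integer> seqNoToOffset = new TreeMap<>();
    private int resumeOffset = 0; // first log offset not covered by the last commit

    /** Indexes log entries in the range [from, to). */
    void index(List<String> log, int from, int to) {
        for (int offset = from; offset < to; offset++) {
            long seqNo = writer.addDocument(log.get(offset));
            seqNoToOffset.put(seqNo, offset);
        }
    }

    /** Commits and records exactly which log offset the commit covers. */
    void commit() {
        long commitSeqNo = writer.commit();
        Map.Entry<Long, Integer> last = seqNoToOffset.floorEntry(commitSeqNo);
        if (last != null) {
            resumeOffset = last.getValue() + 1;
        }
        seqNoToOffset.headMap(commitSeqNo, true).clear(); // covered entries are no longer needed
    }

    /** Simulates a crash; only resumeOffset (assumed to be persisted) survives. */
    void crash() {
        writer.crash();
        seqNoToOffset.clear();
    }

    int resumeOffset() {
        return resumeOffset;
    }
}
```

On recovery the application resumes with index(log, resumeOffset(), log.size()), so no append-only event is applied twice and none is skipped.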
>
> I also don't see why flush should return a sequence number -- it is not an
> externally visible event.  Patrick maybe you had an interesting use case in
> mind?  Note that commit also writes (and fsyncs) the next segments_N file,
> to light up all the newly written/fsync'd segments for the next reader to open.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
