Index documents in async way

Cao Mạnh Đạt Thu, 08 Oct 2020 07:21:16 -0700

Hi guys,

First of all, it seems that I have used the term "async" a lot recently :D.
I have been thinking a lot about changing Solr's current indexing model,
which is synchronous (the user submits an update request and waits for the
response). What about changing it to an async model, where nodes only
persist the update into the tlog and then return immediately, much like
what the tlog is doing now? Then we would have a dedicated executor that
reads from the tlog to do the indexing (a producer-consumer model with the
tlog acting as the queue).
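
To make the idea concrete, here is a minimal sketch of that producer-consumer
model, in Python for brevity. Everything in it is an assumption for
illustration: AsyncIndexer, submit and wait_until_indexed are hypothetical
names, queue.Queue stands in for the tlog (it is not durable), and a plain
list stands in for the Lucene index. It is not how Solr is implemented.

```python
import queue
import threading


class AsyncIndexer:
    """Toy model: queue.Queue stands in for the tlog, a list for the index."""

    def __init__(self, max_batch=100):
        self.tlog = queue.Queue()      # stand-in for the tlog (not durable here)
        self.index = []                # stand-in for the Lucene index
        self.batch_sizes = []          # how many docs each indexing batch held
        self.max_batch = max_batch
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def submit(self, doc):
        """Producer side: append to the log and return immediately."""
        self.tlog.put(doc)

    def _consume(self):
        """Consumer side: drain whatever is queued and index it as one batch."""
        while True:
            batch = [self.tlog.get()]              # block until there is work
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.tlog.get_nowait())
                except queue.Empty:
                    break
            self.index.extend(batch)               # one bulk add per batch
            self.batch_sizes.append(len(batch))
            for _ in batch:
                self.tlog.task_done()

    def wait_until_indexed(self):
        """Block until every submitted update has been indexed."""
        self.tlog.join()
```

The single consumer thread preserves the submission order, and the batch size
grows naturally when updates arrive faster than they are indexed.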


I do see several big benefits of this approach:

   - We can batch updates in a single call. Right now we do not use the
   bulk writer.addDocuments(documents) API from Lucene; batching updates
   should boost indexing performance.
   - One common problem with Solr now is that we have a lot of threads
   doing indexing, which can end up producing many small segments. Using this
   model we can have bigger segments and so less merge cost.
   - Another huge reason is that after switching to this model, we can
   remove the tlog and use a distributed queue like Kafka or Pulsar. Since
   the purpose of the leader in SolrCloud now is ordering updates, and the
   distributed queue already orders updates for us, there is no need for a
   dedicated leader. That is just the beginning of what we can do once we
   use a distributed queue.
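
A rough sketch of the last point, again in Python and again purely
illustrative: OrderedLog and Replica are hypothetical names, and OrderedLog
is an in-memory stand-in for a single totally ordered partition, not the
Kafka or Pulsar API. The assumption is that every replica reads the same log
from its own offset and applies updates in log order, so replicas converge
without a leader deciding the order.

```python
class OrderedLog:
    """Stand-in for one totally ordered, append-only partition of a queue."""

    def __init__(self):
        self._entries = []

    def append(self, update):
        self._entries.append(update)
        return len(self._entries) - 1          # offset of the new entry

    def read_from(self, offset):
        return self._entries[offset:]


class Replica:
    """Applies updates strictly in log order; no leader is involved."""

    def __init__(self, log):
        self.log = log
        self.offset = 0                        # next log position to apply
        self.docs = {}                         # id -> latest version of the doc

    def catch_up(self):
        for op, doc_id, body in self.log.read_from(self.offset):
            if op == "add":
                self.docs[doc_id] = body
            elif op == "delete":
                self.docs.pop(doc_id, None)
            self.offset += 1
```

A replica that was behind only needs to call catch_up() to reach the same
state as the others, because the order of updates is fixed by the log itself.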

What do you guys think about this? I just want to hear from you before
going deep into this rabbit hole.

Thanks!
