Re: Index documents in async way

Mike Drob Thu, 08 Oct 2020 07:31:30 -0700

Interesting idea! Can you explain a little more on how this would impact
durability of updates? What does a failure look like, and how does that
information get propagated back to the client app?


Mike

On Thu, Oct 8, 2020 at 9:21 AM Cao Mạnh Đạt <[email protected]> wrote:

> Hi guys,
>
> First of all, it seems that I have used the term "async" a lot recently :D.
> I have been thinking a lot about changing Solr's current indexing model,
> which is synchronous (the user submits an update request and waits for the
> response). What about changing it to an async model, where nodes only
> persist the update into the tlog and then return immediately, much like
> what the tlog is doing now? We would then have a dedicated executor that
> reads from the tlog and does the indexing (a producer-consumer model with
> the tlog acting as the queue).
>
> I do see several big benefits of this approach:
>
>    - We can batch updates into a single call. Right now we do not use the
>    writer.add(documents) API from Lucene; batching updates is going to boost
>    indexing performance.
>    - One common problem with Solr now is that we have a lot of threads doing
>    indexing, which can end up producing many small segments. With this model
>    we can have bigger segments and therefore lower merge cost.
>    - Another huge reason is that after switching to this model, we can
>    remove the tlog and use a distributed queue like Kafka or Pulsar. The
>    purpose of the leader in SolrCloud now is ordering updates, and the
>    distributed queue already orders updates for us, so there is no need for
>    a dedicated leader. That is just the beginning of what we can do after
>    moving to a distributed queue.
>
> What do you guys think about this? I just want to hear from you before
> going deep into this rabbit hole.
>
> Thanks!
>
>
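
For readers following the thread, the producer-consumer model described in the quoted message can be sketched roughly as below. This is a toy illustration, not Solr or Lucene code: the class `AsyncIndexer`, its methods, and the `max_batch` parameter are hypothetical names, an in-memory `queue.Queue` stands in for the tlog, and a plain list stands in for the index. Because the queue here lives only in memory, the sketch says nothing about durability or about how a failed batch would be reported back to the client.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the indexing thread to finish


class AsyncIndexer:
    """Toy model: request threads append to a log, one thread indexes in batches."""

    def __init__(self, max_batch=100):
        self.log = queue.Queue()   # assumption: stands in for the tlog
        self.index = []            # assumption: stands in for the Lucene index
        self.batch_sizes = []      # size of each batched indexing call
        self.max_batch = max_batch
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def submit(self, doc):
        # The caller only appends to the log and returns immediately.
        self.log.put(doc)

    def close(self):
        # Ask the worker to stop once everything queued so far is indexed.
        self.log.put(_STOP)
        self._worker.join()

    def _add_documents(self, batch):
        # Placeholder for a single batched indexing call.
        self.index.extend(batch)
        self.batch_sizes.append(len(batch))

    def _run(self):
        done = False
        while not done:
            batch = []
            item = self.log.get()  # block until at least one update arrives
            while True:
                if item is _STOP:
                    done = True
                    break
                batch.append(item)
                if len(batch) >= self.max_batch:
                    break
                try:
                    item = self.log.get_nowait()  # drain whatever is waiting
                except queue.Empty:
                    break
            if batch:
                self._add_documents(batch)


indexer = AsyncIndexer(max_batch=4)
for i in range(10):
    indexer.submit({"id": i})  # returns without waiting for indexing
indexer.start()
indexer.close()
print(indexer.batch_sizes)
```

A single consumer draining a FIFO queue keeps updates in submission order and turns many small requests into a few larger indexing calls, which is the batching benefit the proposal points at.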
