Can there be a situation where the index writer fails after the document has been added to the tlog and a success has already been sent to the user? I think we want to avoid such a situation, don't we?
On Thu, 8 Oct, 2020, 8:25 pm Cao Mạnh Đạt, <[email protected]> wrote:

> > Can you explain a little more on how this would impact durability of
> > updates?
> Since we persist updates into the tlog, I do not think this will be an
> issue.
>
> > What does a failure look like, and how does that information get
> > propagated back to the client app?
> I have not been able to do much research, but I think this is going to
> be the same as the way our asyncId works today. In this case the asyncId
> will be the version of an update (with a distributed queue it will be
> the offset). Failed updates will be put into a time-to-live map so users
> can query the failure; for successes we can skip that by leveraging the
> max succeeded version so far.
>
> On Thu, Oct 8, 2020 at 9:31 PM Mike Drob <[email protected]> wrote:
>
>> Interesting idea! Can you explain a little more on how this would impact
>> durability of updates? What does a failure look like, and how does that
>> information get propagated back to the client app?
>>
>> Mike
>>
>> On Thu, Oct 8, 2020 at 9:21 AM Cao Mạnh Đạt <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> First of all, it seems that I have used the term async a lot recently :D.
>>> Recently I have been thinking a lot about changing Solr's indexing model
>>> from the current synchronous one (the user submits an update request and
>>> waits for the response). What about changing it to an async model, where
>>> nodes only persist the update into the tlog and then return immediately,
>>> much like what the tlog is doing now? Then we have a dedicated executor
>>> which reads from the tlog to do the indexing (a producer-consumer model
>>> with the tlog acting as the queue).
>>>
>>> I do see several big benefits of this approach:
>>>
>>> - We can batch updates in a single call. Right now we do not use the
>>>   writer.add(documents) api from lucene; batching updates is going to
>>>   boost indexing performance.
>>> - One common problem with Solr now is that we have a lot of threads
>>>   doing indexing, which can end up producing many small segments. With
>>>   this model we can have bigger segments and thus less merge cost.
>>> - Another huge reason is that after switching to this model, we can
>>>   remove the tlog and use a distributed queue like Kafka or Pulsar.
>>>   Since the purpose of the leader in SolrCloud now is ordering updates,
>>>   and the distributed queue already orders updates for us, there is no
>>>   need for a dedicated leader. That is just the beginning of the things
>>>   we can do after using a distributed queue.
>>>
>>> What do you guys think about this? I just want to hear from you before
>>> going deep into this rabbit hole.
>>>
>>> Thanks!
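To make the flow discussed in the quoted thread concrete, here is a rough, single-threaded sketch of "persist to a log, acknowledge with a version, index later in batches", together with the failure bookkeeping described above (a time-to-live map for failures plus a high-water mark for everything else). It is not Solr or Lucene code: `AsyncIndexer`, `index_batch`, `drain_once` and `status` are hypothetical names, and the in-memory deque only stands in for the tlog or a distributed queue.

```python
import time
from collections import deque


class AsyncIndexer:
    """Toy model of 'append to a log, acknowledge, index later in batches'."""

    def __init__(self, index_batch, failure_ttl=60.0, max_batch=100,
                 clock=time.monotonic):
        # index_batch is a hypothetical callback standing in for a batched
        # index write; it must return {version: error} for failed updates.
        self._index_batch = index_batch
        self._failure_ttl = failure_ttl
        self._max_batch = max_batch
        self._clock = clock
        self._log = deque()        # stands in for the tlog / distributed queue
        self._next_version = 1
        self._max_processed = 0    # highest version the consumer has handled
        self._failures = {}        # version -> (error message, expiry time)

    def submit(self, doc):
        """Producer side: persist the update and return its version at once."""
        version = self._next_version
        self._next_version += 1
        self._log.append((version, doc))
        return version

    def drain_once(self):
        """Consumer side: index one batch. A dedicated executor would loop on this."""
        count = min(self._max_batch, len(self._log))
        batch = [self._log.popleft() for _ in range(count)]
        if not batch:
            return 0
        failed = self._index_batch(batch)
        expiry = self._clock() + self._failure_ttl
        for version, error in failed.items():
            self._failures[version] = (error, expiry)
        self._max_processed = batch[-1][0]
        return len(batch)

    def status(self, version):
        """Return 'pending', 'succeeded' or 'failed: <error>' for a version."""
        self._evict_expired()
        if version in self._failures:
            return "failed: " + self._failures[version][0]
        if version <= self._max_processed:
            return "succeeded"
        return "pending"

    def _evict_expired(self):
        now = self._clock()
        expired = [v for v, (_, exp) in self._failures.items() if exp <= now]
        for version in expired:
            del self._failures[version]


def reject_missing_id(batch):
    # Pretend index write: any document without an "id" field fails.
    return {version: "missing id" for version, doc in batch if "id" not in doc}


indexer = AsyncIndexer(reject_missing_id)
v1 = indexer.submit({"id": "a"})
v2 = indexer.submit({"title": "no id"})
print(indexer.status(v2))                            # pending
indexer.drain_once()
print(indexer.status(v1), "|", indexer.status(v2))   # succeeded | failed: missing id
```

Two things stand out in this sketch. The acknowledgement is returned before any index write happens, so an index failure after a "success" response is possible by construction, and the client only learns of it by polling. Also, once a failure entry expires from the time-to-live map, its version falls back to the high-water-mark check and reads as succeeded, so the TTL would need to be long enough for clients to ask.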
