Ahh, we are indeed doing that. maxBufferedDocs is set to total-doc-count /
555, to provoke precisely a consistent "5 big segments + 5 medium segments +
5 baby segments" geometry in the end.
But that works out to:
maxBufferedDocs=49774
Which is not too tiny?
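
For anyone following the arithmetic, here is a rough sketch of where the 555
comes from. It assumes a merge factor of 10 (the LogDocMergePolicy default),
so each merge level is 10x the one below it; the thread does not state the
merge factor the benchmark actually uses, and the class and method names are
made up for illustration:

```java
public class SegmentGeometry {
    // Assumption: merge factor 10 (LogDocMergePolicy's default); not confirmed in this thread.
    static final int MERGE_FACTOR = 10;
    // Target geometry from the message: 5 big + 5 medium + 5 baby segments.
    static final int SEGMENTS_PER_LEVEL = 5;
    static final int LEVELS = 3;

    // Number of flush-sized units in the target geometry: 5*1 + 5*10 + 5*100.
    public static long flushUnits() {
        long units = 0;
        long levelSize = 1; // size of one segment at this level, in flushes
        for (int level = 0; level < LEVELS; level++) {
            units += SEGMENTS_PER_LEVEL * levelSize;
            levelSize *= MERGE_FACTOR;
        }
        return units;
    }

    // maxBufferedDocs as described above: total doc count divided by the flush units.
    public static long maxBufferedDocs(long totalDocCount) {
        return totalDocCount / flushUnits();
    }

    public static void main(String[] args) {
        System.out.println(flushUnits()); // prints 555
        // Smallest total doc count that yields maxBufferedDocs=49774 under integer division.
        long minTotalDocs = 49_774L * flushUnits();
        System.out.println(minTotalDocs); // prints 27624570
        System.out.println(maxBufferedDocs(minTotalDocs)); // prints 49774
    }
}
```

So under that assumption the index holds roughly 27.6M docs, and each flush
buffers about 50K of them.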
Mike McCandless
http://blog.mikemccandless.com
On Thu, Oct 21, 2021 at 8:52 AM Robert Muir <[email protected]> wrote:
> Yeah, I'm pretty lost in all the ways we index here. But if we are
> passing maxBufferedDocs <low number> for this deterministic indexing,
> I think it would cause the issue? I have no idea what the IW config
> here is...
>
> On Thu, Oct 21, 2021 at 8:48 AM Robert Muir <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 8:36 AM Robert Muir <[email protected]> wrote:
> > >
> > > But also the internal reuse of IndexingChain.PerField (which houses
> > > the reused tokenstream) isn't just per-thread, it is
> > > per-thread-per-segment, right? So if Mike is indexing with 100
> > > threads, and flushes 200 times, I'd expect 20k of these things to be
> > > made. There's a lot going on in the benchmark code for nightly and it
> > > is tricky for me to try to navigate the various cases (1KB,
> > > 1KB-with-vectors, 4KB, "deterministic indexing", etc)
> >
> > I think this might be the case with your link. If you look at the URL
> > of your actual link, you see it ends with #profiler_4kb_indexing_1_cpu
> > ?
> > This makes me think I'm looking at the profiler output of the
> > "deterministic indexing".
> > For this one, LogDocMergePolicy is used.
>