Re: BlockTreeTermsReader consumes crazy amount of memory

Vitaly Funstein Thu, 28 Aug 2014 14:39:46 -0700

On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless <
[email protected]> wrote:


>
> The segments_N file can be different, that's fine: after that, we then
> re-use SegmentReaders when they are in common between the two commit
> points.  Each segments_N file refers to many segments...
>
>
Yes, you are totally right; I didn't follow the code far enough the first
time around. :) This is an excellent idea, actually. I can probably keep
the commit points I maintain in an MRU data structure (e.g. a
LinkedHashMap with access order), and simply grab the most recently opened
reader to pass in when opening a new one on another commit point, to
maximize segment reader reuse.
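A minimal sketch of that MRU idea, in Java to match Lucene. It deliberately avoids Lucene types so it stands alone: `K` stands for a commit key and `R` for a reader, and the class name `MruCommitReaders` and its `opener` callback are hypothetical. The opener is handed the most recently used reader (or null) so that, in a real setup, unchanged segments could be shared with it.

```java
import java.util.LinkedHashMap;
import java.util.function.BiFunction;

/**
 * Hypothetical MRU cache of readers keyed by commit point.
 * K stands in for a commit key, R for a reader type.
 */
final class MruCommitReaders<K, R> {
    // accessOrder = true: iteration runs from least to most recently accessed
    private final LinkedHashMap<K, R> readers = new LinkedHashMap<>(16, 0.75f, true);
    // Opens a reader for a commit, given the most recently used reader (may be null)
    private final BiFunction<R, K, R> opener;

    MruCommitReaders(BiFunction<R, K, R> opener) {
        this.opener = opener;
    }

    /** Most recently accessed reader, or null if the cache is empty. */
    R mostRecent() {
        R last = null;
        // Iterating the map's views does not change access order.
        for (R r : readers.values()) {
            last = r;
        }
        return last;
    }

    /** Returns the cached reader for a commit, opening and caching it if needed. */
    R get(K commit) {
        R cached = readers.get(commit); // a hit moves this entry to the MRU end
        if (cached != null) {
            return cached;
        }
        R reader = opener.apply(mostRecent(), commit);
        readers.put(commit, reader); // new entries are inserted at the MRU end
        return reader;
    }
}
```

In Lucene 4.x the opener could wrap `DirectoryReader.openIfChanged(prev, commit)`, falling back to `DirectoryReader.open(commit)` when `prev` is null. Note that `openIfChanged` returns null when the old reader is already on that commit, and its `IOException` would need handling, since `BiFunction` cannot throw checked exceptions. Closing evicted readers and reference counting are left out of this sketch.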


> You can set it (min and max) as high as you want; the only hard
> requirement is that max >= 2*(min-1), I believe.
>

Looks like this is used inside Lucene41PostingsFormat, which simply passes
in those defaults. So you are effectively saying the minimum (and therefore
the maximum) block size can be raised to reduce the size of the terms
index inside those TreeMap nodes?
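The constraint quoted above is easy to check before building a postings format. A small sketch, assuming `max >= 2*(min-1)` plus a minimum block size of at least 2; the class and method names are hypothetical, and the defaults 25 and 48 are assumed to match Lucene 4.x's `BlockTreeTermsWriter.DEFAULT_MIN_BLOCK_SIZE` and `DEFAULT_MAX_BLOCK_SIZE` (48 = 2*(25-1), so the defaults sit exactly on the bound).

```java
/**
 * Hypothetical helper for the term block size constraint:
 * max >= 2 * (min - 1), with min >= 2.
 */
final class TermBlockSizes {
    // Assumed to mirror Lucene 4.x BlockTreeTermsWriter defaults
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    /** True if the (min, max) pair satisfies the constraint. */
    static boolean isValid(int min, int max) {
        return min >= 2 && max >= 2 * (min - 1);
    }

    /** Smallest max block size allowed for a given min block size. */
    static int smallestMax(int min) {
        if (min < 2) {
            throw new IllegalArgumentException("min must be >= 2, got " + min);
        }
        return 2 * (min - 1);
    }
}
```

A pair that passes could then be handed to `new Lucene41PostingsFormat(min, max)` from a custom codec. Raising the minimum should mean fewer, larger term blocks and so fewer terms index entries held in memory, at the cost of scanning more terms within a block on each seek.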


>
> > We are already using a customized codec though, so perhaps adding
> > this to the codec is okay and transparent?
>
> Hmmm :)  Customized in what manner?
>
>
We need to be able to turn off stored fields compression, so there is one
codec for when the system is configured that way. The other one is used
when compression is on, but there I tweaked the stored fields format to
favor decompression speed and to use a smaller chunk size, based on some
empirical observations from our tests. I guess I'll just add another
customization to both codecs that sets the block sizes for the postings
format, and see what difference that makes...
