On Thu, Aug 28, 2014 at 5:38 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
> On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>>
>> The segments_N file can be different, that's fine: after that, we then
>> re-use SegmentReaders when they are in common between the two commit
>> points.  Each segments_N file refers to many segments...
>>
>>
> Yes, you are totally right - I didn't follow the code far enough the first
> time around. :) This is an excellent idea, actually - I can probably
> arrange maintained commit points as an MRU data structure (e.g.
> LinkedHashMap with access order), and simply grab the most recently opened
> reader to pass in when obtaining a new one from the new commit point - to
> maximize segment reader reuse.

That's great!
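
A minimal sketch of the MRU idea described above, using a LinkedHashMap in access order. The class name MruReaderCache, the choice of commit generation as the key, and the type parameter R (standing in for the reader type) are all assumptions for illustration; nothing here is Lucene API.

```java
import java.util.LinkedHashMap;

/**
 * Hypothetical MRU cache of readers keyed by commit generation.
 * R stands in for the reader type (e.g. a DirectoryReader).
 */
class MruReaderCache<R> {
    // accessOrder=true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<Long, R> readers =
        new LinkedHashMap<>(16, 0.75f, true);

    /** Registers a reader for a commit; it becomes the most recently used entry. */
    synchronized void put(long generation, R reader) {
        readers.put(generation, reader);
    }

    /** Looks up a reader; a hit also marks it as most recently used. */
    synchronized R get(long generation) {
        return readers.get(generation);
    }

    /** Returns the most recently used reader, or null if the cache is empty. */
    synchronized R mostRecent() {
        R last = null;
        // Iterating the values() view does not count as an access,
        // so this scan leaves the ordering untouched.
        for (R reader : readers.values()) {
            last = reader;
        }
        return last;
    }
}
```

The value returned by mostRecent() is what would be passed in as the old reader when opening the new commit point, for instance to DirectoryReader.openIfChanged(oldReader, commit) if that is the call in use. The scan is linear in the number of cached readers, which should be fine for a small number of maintained commit points.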

>> You can set it (min and max) as high as you want; the only hard
>> requirement is that max >= 2*(min-1), I believe.
>>
>
> Looks like this is used inside Lucene41PostingsFormat, which simply passes
> in those defaults - so you are effectively saying the minimum (and
> therefore, maximum) block size can be raised to reduce the size of the terms
> index inside those TreeMap nodes?

Yes, but it then increases cost at search time to locate a given term,
because more scanning is then required once we seek to the block that
might have the term.

Raising the block sizes reduces the size of the FST, but if the RAM is
being used by something else inside BlockTree (BT), it won't help.  But
from your screenshot it looked like it was almost entirely the FST,
which is what I would expect.
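
To make the constraint on the block sizes concrete, here is a small sketch that checks a candidate (min, max) pair against max >= 2*(min-1) before it is handed to the postings format. The class TermBlockSizing and its methods are hypothetical helpers, not Lucene API, and the min >= 2 check is an extra assumption beyond what is stated above.

```java
/** Hypothetical helper, not part of Lucene: checks candidate term block sizes. */
final class TermBlockSizing {
    private TermBlockSizing() {}

    /** Smallest max block size satisfying max >= 2*(min-1) for a given min. */
    static int smallestValidMax(int min) {
        return 2 * (min - 1);
    }

    /** Throws IllegalArgumentException if the (min, max) pair breaks the constraint. */
    static void validate(int min, int max) {
        if (min < 2) { // assumption: a block needs at least two entries
            throw new IllegalArgumentException("min must be >= 2, got " + min);
        }
        if (max < smallestValidMax(min)) {
            throw new IllegalArgumentException(
                "max must be >= 2*(min-1) = " + smallestValidMax(min) + ", got " + max);
        }
    }
}
```

For example, validate(100, 198) passes while validate(100, 150) is rejected. A pair that passes could then be given to the Lucene41PostingsFormat(int minTermBlockSize, int maxTermBlockSize) constructor from the custom codec. Larger values mean fewer blocks for the FST to index, and more entries to scan within a block on each term lookup.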

>> > We are already using a customized codec though, so perhaps adding
>> > this to the codec is okay and transparent?
>>
>> Hmmm :)  Customized in what manner?
>>
>>
> We need to have the ability to turn off stored fields compression, so there
> is one codec in case the system is configured that way. The other one
> exists for compression on, but there I tweaked stored fields format for
> bias toward decompression, as well as a smaller chunk size - based on some
> empirical observations in executed tests. I am guessing I'll just add
> another customization to both that deals with the block sizing for postings
> format, and see what difference that makes...

Ahh, OK.  Yes, just add this custom terms index block sizing too.

Mike McCandless

http://blog.mikemccandless.com
