That's fine, Andrzej :) Doing the split in just one pass really matters for big indexes.
I hope we will use it in our application.
Thanks,
Ivan

Andrzej Bialecki wrote:
On 2010-05-12 14:29, Ivan Vasilev wrote:
Hi Michael,
Thanks for your answer.
What we do now:
1. Splitting indexes. We do not do it by reading indexes and distributing
docs into separate indexes the way MultiPassIndexSplitter does. We do it by
binary-copying segments to different folders and then recreating the segments
descriptor file for each one (we have created a tool for this). The decision
of which segment goes to which new index is made from the segment sizes,
calculated so that the resulting indexes are almost equal in size. If we have
a .cfx file, this would be an obstacle for the current division logic.
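For illustration, here is a minimal sketch of the balancing step described above.
It is not our actual tool, just the greedy "largest segment into the currently
smallest index" idea; the class name, segment names and sizes are placeholders,
and rewriting the segments_N descriptor file is not shown:

import java.util.*;

public class SegmentAssigner {

    // Greedy assignment: biggest segment first, always into the smallest bucket so far.
    public static List<List<String>> assign(Map<String, Long> segmentSizes, int numIndexes) {
        List<Map.Entry<String, Long>> segments = new ArrayList<>(segmentSizes.entrySet());
        segments.sort((a, b) -> Long.compare(b.getValue(), a.getValue())); // biggest first

        long[] bucketSize = new long[numIndexes];
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numIndexes; i++) buckets.add(new ArrayList<>());

        for (Map.Entry<String, Long> seg : segments) {
            int smallest = 0;
            for (int i = 1; i < numIndexes; i++) {
                if (bucketSize[i] < bucketSize[smallest]) smallest = i;
            }
            buckets.get(smallest).add(seg.getKey());
            bucketSize[smallest] += seg.getValue();
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("_0", 12_000_000_000L);
        sizes.put("_1",  9_000_000_000L);
        sizes.put("_2",  7_000_000_000L);
        sizes.put("_3",  5_000_000_000L);
        System.out.println(assign(sizes, 2)); // e.g. [[_0, _3], [_1, _2]]
    }
}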
I saw the class MultiPassIndexSplitter. It offers splitting an index by
docs (not by segments). It has a big advantage - the index can be split
more evenly (into parts of more similar size). That works even if the index
was just optimized and we have only one big segment. But it also has
disadvantages. The index is read as many times as there are new indexes
(which is bad for ~40 GB indexes). Also, the original index remains on disk
the whole time, which means that if we do the split on one and the same
partition we need double the disk space.
Maybe we should offer both index-split approaches to the user... this
depends on higher levels :)
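For reference, a hedged usage sketch of MultiPassIndexSplitter as shipped in
contrib/misc around the Lucene 3.x line; the split(IndexReader, Directory[],
boolean) signature and the paths here are assumptions to check against the
version actually in use:

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiPassIndexSplitter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SplitExample {
    public static void main(String[] args) throws Exception {
        Directory source = FSDirectory.open(new File("/indexes/big-index"));
        Directory[] targets = new Directory[] {
            FSDirectory.open(new File("/indexes/part-0")),
            FSDirectory.open(new File("/indexes/part-1"))
        };

        IndexReader reader = IndexReader.open(source, true); // read-only
        try {
            // seq=true: contiguous doc-id ranges per output; seq=false: round-robin.
            // The input is re-read once per output directory - the multi-pass cost
            // discussed above.
            new MultiPassIndexSplitter().split(reader, targets, true);
        } finally {
            reader.close();
        }
    }
}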

Hi,

I wrote the MultiPassIndexSplitter. Yes, multi-pass is problematic with
large indexes. I'm currently working on a single-pass TrueSplitter :)
which should be ready within a couple of weeks.

However, even this new tool will make a copy of the original index, so
you will need twice as much space. But in that case perhaps you could
put the original index on a network FS and split it into the target
partition - that way the data would be read just once.

