I don't think (though I don't know for sure) that optimizing before the
end of the run buys you anything. And you're right, it takes a while.
I've assumed that it's best done at the end of the entire run,
but that's only an assumption.

Search the archives for the thread titled

MergeFactor and MaxBufferedDocs value should ...?

for an exposition on how all the indexing factors relate.

Also, look at the (new in 2.1) IndexWriter.ramSizeInBytes() method.
Rather than worrying about MergeFactor and the other parameters
and guessing, this may allow you to flush RAM to disk when needed
rather than after every N documents. CAUTION: there's a bug (detailed
in the thread referenced above) with this code, so do look at the
thread...
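
Something like this is what I have in mind (just a sketch: the 48 MB
threshold is made up, and depending on your exact release the flush
call may be named flushRamSegments() or flush(), so check your
javadocs):

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class RamFlushExample {
        // Arbitrary threshold -- tune it against your heap size.
        private static final long MAX_RAM = 48 * 1024 * 1024; // 48 MB

        public static void indexAll(IndexWriter writer, Document[] docs)
                throws IOException {
            for (int i = 0; i < docs.length; i++) {
                writer.addDocument(docs[i]);
                // Flush by actual RAM use instead of every N documents.
                if (writer.ramSizeInBytes() > MAX_RAM) {
                    writer.flushRamSegments(); // flush() in later releases
                }
            }
        }
    }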

In fact, yesterday I experimented with what I call an "AdaptiveWriter".
The idea is to query the operating system for the amount of RAM I'm
using and track how much RAM indexing a document consumes relative
to its size, then flush when I get "too close for comfort" to an
OOM error. Yes, this is a digression from optimizing, but it is
related to indexing as fast as possible <G>.

By monitoring the growth of the in-memory index against the size of
the incoming document, I can build a crude measure of how much
RAM the index needs for a document of a given size. Specifically,
I tracked the ratio of the change in memory to the size of the
incoming doc. When the available RAM for the process is less
than 2X the largest ratio seen so far times the size of the incoming
document, I flush. I'm not sure how much this changes things,
but I thought that building one of these was better than
experimenting with the various factors for each new project...
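
In case it's useful, here's roughly the shape of it (a sketch under my
assumptions, not the actual class: the names are mine, the 2X fudge
factor is as described above, I ask the JVM's Runtime for memory rather
than the operating system, and again the flush call may be
flushRamSegments() or flush() depending on your release):

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class AdaptiveWriter {
        private final IndexWriter writer;
        // Worst RAM-consumed-per-document-byte ratio seen so far.
        private double worstRatio = 1.0;

        public AdaptiveWriter(IndexWriter writer) {
            this.writer = writer;
        }

        public void addDocument(Document doc, long docSize)
                throws IOException {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();
            long available = rt.maxMemory() - used;
            // "Too close for comfort": less head room left than 2X the
            // worst ratio times the size of the incoming document.
            if (available < 2.0 * worstRatio * docSize) {
                writer.flushRamSegments(); // flush() in later releases
            }
            long before = writer.ramSizeInBytes();
            writer.addDocument(doc);
            long delta = writer.ramSizeInBytes() - before;
            if (docSize > 0 && delta > 0) {
                worstRatio = Math.max(worstRatio, (double) delta / docSize);
            }
        }
    }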

Anyway, do look at that thread for ideas on how to make
this as efficient as possible, and you can probably ignore
the rest <G>...

Best
Erick



On 5/3/07, Aleksander M. Stensby <[EMAIL PROTECTED]> wrote:

OK, but then you would not optimize at all? Not even at the end of the
indexing run?

On Thu, 03 May 2007 12:17:40 +0200, Mark Miller <[EMAIL PROTECTED]>
wrote:

> I think it is worth your time to do some benchmarking. I think
> mergeFactor is not very helpful in the end... if you set it high, you'll
> index faster, but then your searches will be slower, prompting you to
> optimize... after which you'll find that you paid all your gains back.
> Test things out for yourself, but I'd recommend a low merge factor; then
> you can forget about the hassle of optimizing. Amortize, amortize...
>
> - Mark
>
> Aleksander M. Stensby wrote:
>> Hello everyone!
>> I'm wondering if any of you have any helpful advice on what MergeFactor
>> I should use...
>> The indexing process handles a large number of documents, and I
>> would like to index as fast as possible.
>> My initial thought was to increase the mergeFactor to make the indexer
>> work more in memory and write to disk less often. However, this gave me
>> a "TOO-MANY-OPEN-FILES" problem... of course, since I chose
>> 2000 as my mergeFactor :) Well, I could do an optimize from time to
>> time, but the big question is what's more efficient? Optimize tends to
>> take a loooong time on our system since it is quite a large index.
>>
>> Any helpful advice on what I should do? 10 as the mergeFactor can't
>> possibly be the best solution here? Any advice would be highly appreciated!
>>
>> - Aleksander
>>
>> --
>> Aleksander M. Stensby
>> Software Developer
>> Integrasco A/S
>> [EMAIL PROTECTED]



--
Aleksander M. Stensby
Software Developer
Integrasco A/S
[EMAIL PROTECTED]
Tlf.: +47 41 22 82 72
