Watch out for merge factors > 100: won't you run out of file descriptors?
Also, does mergeFactor have any effect in a RAMDirectory?
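
To put a rough number on the file descriptor concern, here is a minimal back-of-the-envelope sketch. It assumes a merge keeps every file of each input segment open at once, plus the files of the segment being written. The class and method names, the files-per-segment count, the descriptor limit and the headroom are all illustrative assumptions, not measured values and not part of Lucene; list your own index directory and check `ulimit -n` before trusting any of it.

```java
// Rough estimate of how many files a single merge may hold open at once.
// Assumption: each of the mergeFactor input segments keeps all of its files
// open, and the merged output segment has the same number of files.
final class MergeFdEstimate {

    /** Files read from the input segments plus files written for the new segment. */
    static long openFilesDuringMerge(int mergeFactor, int filesPerSegment) {
        if (mergeFactor < 2 || filesPerSegment < 1) {
            throw new IllegalArgumentException("mergeFactor must be >= 2 and filesPerSegment >= 1");
        }
        long inputs = (long) mergeFactor * filesPerSegment;
        long output = filesPerSegment;
        return inputs + output;
    }

    /** True when the estimate, plus some reserved headroom, fits under the descriptor limit. */
    static boolean fitsUnderLimit(int mergeFactor, int filesPerSegment, long fdLimit, long reserved) {
        return openFilesDuringMerge(mergeFactor, filesPerSegment) + reserved <= fdLimit;
    }

    public static void main(String[] args) {
        int filesPerSegment = 8; // illustrative value, count the files of one segment yourself
        long fdLimit = 1024;     // illustrative value, check `ulimit -n` on your system
        long reserved = 64;      // assumed headroom for stdio, jars and sockets
        for (int mergeFactor : new int[] {10, 100, 1000}) {
            System.out.println("mergeFactor=" + mergeFactor
                    + " estimated open files=" + openFilesDuringMerge(mergeFactor, filesPerSegment)
                    + " fits=" + fitsUnderLimit(mergeFactor, filesPerSegment, fdLimit, reserved));
        }
    }
}
```

Whether a given mergeFactor is actually safe depends on the real per-process limit and on how many files each segment has, so treat this only as a way to frame the question.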

  Winton

>I've loaded a large (but not as large as yours) index with mergeFactor
>set to 1000.  Was substantially faster than with default setting.
>Making it higher didn't seem to make things much faster but did cause
>it to use more memory. In addition I loaded the data in chunks in
>separate processes and optimized the index after each chunk, again
>in a separate process.  All done straight to disk, no messing about
>with RAMDirectories.
>
>Didn't play with maxMergeDocs and am not sure what you mean by
>"maximum heap size" but 1MB doesn't sound very large.
>
>
>
>--
>Ian.
>[EMAIL PROTECTED]
>
>
>Chantal Ackermann wrote:
>>
>> hi to all,
>>
>> please help! I think I mixed my brain up already with this stuff...
>>
>> I'm trying to index about 29 text files, where the biggest one is ~700MB and
>> the smallest ~300MB. I managed once to run the whole index, with a merge
>> factor = 10 and maxMergeDocs=10000. This took more than 35 hours, I think
>> (don't know exactly), and it didn't use much RAM (though it could have).
>> Unfortunately I had a call to optimize at the end, and during optimization an
>> IOException (File too big) occurred (while merging).
>>
>> As I run the program on a multi-processor machine, I have now changed the code to
>> index each file in its own thread and write to one single IndexWriter. The
>> merge factor is still at 10. maxMergeDocs is at 1,000,000. I set the maximum
>> heap size to 1MB.
>>
>> I tried to use RAMDirectory (as mentioned on the mailing list) and just use
>> IndexWriter.addDocument(). At the moment it does not seem to make any
>> difference.
>> After a while _all_ the threads exit one after another (not all at once!)
>> with an OutOfMemoryError. The priority of all of them is at the minimum.
>>
>> even if the multithreading doesn't increase performance I would be glad if I
>> could just once get it running again.
>>
>> I would be even happier if someone could give me a hint as to what would be the
>> best way to index this amount of data. (The average size of an entry that
>> gets parsed for a Document is about 1KB.)
>>
>> thanx for any help!
>> chantal
>
>--
>To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
>For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
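
Ian's approach quoted above (load the data in chunks, each in a separate process, and optimize after each chunk, again in a separate process) could be planned roughly as below. This is only a sketch under stated assumptions: `IndexChunk` and `OptimizeIndex` are hypothetical main classes you would have to write yourself, and the chunk size, file names and index path are made up. The code only builds the command lines and does not touch Lucene or start any process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plans a chunked load: one JVM per chunk to index, then one JVM to optimize.
// "IndexChunk" and "OptimizeIndex" are hypothetical class names, not Lucene classes.
final class ChunkedLoadPlan {

    /** Splits the input files into consecutive chunks of at most chunkSize files. */
    static List<List<String>> chunks(List<String> files, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < files.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, files.size());
            result.add(new ArrayList<>(files.subList(i, end)));
        }
        return result;
    }

    /** Builds the command lines in run order: index chunk, optimize, index next chunk, ... */
    static List<List<String>> commands(List<String> files, int chunkSize, String indexDir) {
        List<List<String>> cmds = new ArrayList<>();
        for (List<String> chunk : chunks(files, chunkSize)) {
            List<String> index = new ArrayList<>(Arrays.asList("java", "IndexChunk", indexDir));
            index.addAll(chunk);
            cmds.add(index);
            cmds.add(new ArrayList<>(Arrays.asList("java", "OptimizeIndex", indexDir)));
        }
        return cmds;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 29; i++) {
            files.add("data" + i + ".txt"); // made-up file names
        }
        // Print the plan; each line could be run in turn with
        // new ProcessBuilder(cmd).inheritIO().start().waitFor()
        for (List<String> cmd : commands(files, 5, "/tmp/index")) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Running each step in its own process means any memory used by one chunk is released when that process exits, which seems to be the point of doing it this way.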


Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/
