Watch out for merge factors over 100; won't you run out of file descriptors? Also, does mergeFactor have any effect in a RAMDirectory?
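To make the file-descriptor concern concrete, here is a rough back-of-the-envelope sketch in Java. It assumes (this is not stated in the thread) that each non-compound segment consists of about seven core files plus one norms file per indexed field, and that a merge keeps all of its input segments open at once. The field count of 3 and the 1024-descriptor limit are hypothetical values, so substitute your own.

```java
// Rough estimate of how many files one merge holds open, to illustrate why a
// very large mergeFactor can exhaust the per-process file-descriptor limit.
// ASSUMPTIONS (not from the thread): each non-compound segment has about 7 core
// files plus one norms file per indexed field, and a merge opens every input
// segment's files at the same time.
class MergeFdEstimate {

    // Assumed core files per segment: .fnm .fdx .fdt .tis .tii .frq .prx
    static final int CORE_FILES_PER_SEGMENT = 7;

    /** Files held open by the input segments while merging mergeFactor of them. */
    static long filesOpenDuringMerge(int mergeFactor, int indexedFields) {
        long perSegment = CORE_FILES_PER_SEGMENT + indexedFields;
        return (long) mergeFactor * perSegment;
    }

    public static void main(String[] args) {
        int fdLimit = 1024;     // a common default for "ulimit -n"; yours may differ
        int indexedFields = 3;  // hypothetical document schema
        for (int mergeFactor : new int[] {10, 100, 1000}) {
            long files = filesOpenDuringMerge(mergeFactor, indexedFields);
            System.out.printf("mergeFactor=%4d -> ~%,d files open (%s a limit of %d)%n",
                    mergeFactor, files, files > fdLimit ? "exceeds" : "within", fdLimit);
        }
    }
}
```

Under those assumptions, a mergeFactor of 10 or 100 needs about 100 or 1,000 open files, while 1000 needs about 10,000, well past a 1024 limit. Fewer fields or a higher process limit change the picture, so treat this only as an order-of-magnitude check.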
Winton

>I've loaded a large (but not as large as yours) index with mergeFactor
>set to 1000. It was substantially faster than with the default setting.
>Making it higher didn't seem to make things much faster, but it did cause
>it to use more memory. In addition, I loaded the data in chunks in
>separate processes and optimized the index after each chunk, again
>in a separate process. All done straight to disk, no messing about
>with RAMDirectories.
>
>I didn't play with maxMergeDocs, and I'm not sure what you mean by
>"maximum heap size", but 1MB doesn't sound very large.
>
>--
>Ian.
>[EMAIL PROTECTED]
>
>
>Chantal Ackermann wrote:
>>
>> hi to all,
>>
>> please help! I think I've already mixed my brain up with this stuff...
>>
>> I'm trying to index about 29 text files, where the biggest one is ~700 MB
>> and the smallest ~300 MB. I once managed to build the whole index with a
>> merge factor of 10 and maxMergeDocs=10000. This took more than 35 hours, I
>> think (I don't know exactly), and it didn't use much RAM (though it could
>> have). Unfortunately, I had a call to optimize at the end, and during
>> optimization an IOException (File too big) occurred while merging.
>>
>> As I run the program on a multi-processor machine, I have now changed the
>> code to index each file in its own thread and write to one single
>> IndexWriter. The merge factor is still 10. maxMergeDocs is at 1,000,000.
>> I set the maximum heap size to 1MB.
>>
>> I tried to use RAMDirectory (as mentioned in the mailing list) and just use
>> IndexWriter.addDocument(). At the moment it doesn't seem to make any
>> difference. After a while _all_ the threads exit one after another (not all
>> at once!) with an OutOfMemoryError. The priority of all of them is at the
>> minimum.
>>
>> Even if the multithreading doesn't increase performance, I would be glad if
>> I could just get it running again once.
>>
>> I would be even happier if someone could give me a hint about the
>> best way to index this amount of data.
>> (The average size of an entry that
>> gets parsed into a Document is about 1 KB.)
>>
>> thanx for any help!
>> chantal
>
>--
>To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
>For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/
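For a sense of scale, the following Java sketch estimates how many documents Chantal's corpus produces and how many full-size segments each maxMergeDocs setting would leave behind. It assumes every file is 500 MB (the midpoint of the stated 300 to 700 MB range), 1 MB = 1024 KB, and exactly 1 KB per document, and it models maxMergeDocs simply as a cap that stops segments from growing past that many documents. The class and method names are illustrative only, not Lucene API.

```java
// Back-of-the-envelope scale estimate for the corpus described in the thread.
// ASSUMPTIONS: each input file is 500 MB (midpoint of 300-700 MB),
// 1 MB = 1024 KB, and each parsed entry (Document) is exactly 1 KB.
// maxMergeDocs is modelled as a hard cap on documents per segment.
class CorpusEstimate {

    /** Documents produced by `files` inputs of avgFileMb each, at avgDocKb per document. */
    static long estimatedDocs(int files, long avgFileMb, long avgDocKb) {
        return files * avgFileMb * 1024 / avgDocKb; // MB -> KB, then divide by KB per doc
    }

    /** Segments that reach the maxMergeDocs cap; leftover docs sit in smaller segments. */
    static long fullSegments(long docs, long maxMergeDocs) {
        return docs / maxMergeDocs;
    }

    public static void main(String[] args) {
        long docs = estimatedDocs(29, 500, 1);
        System.out.printf("~%,d documents%n", docs);
        for (long maxMergeDocs : new long[] {10_000, 1_000_000}) {
            System.out.printf("maxMergeDocs=%,d -> %,d full-size segments%n",
                    maxMergeDocs, fullSegments(docs, maxMergeDocs));
        }
    }
}
```

With these assumptions the corpus comes to roughly 14.8 million documents: about 1,484 full 10,000-document segments under the first setting versus 14 full 1,000,000-document segments under the second, with the leftover documents spread over a few smaller segments.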