On Apr 3, 2006, at 6:26 PM, Marvin Humphrey wrote:


> On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote:
>
>> Marvin Humphrey wrote:
>>> Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000.
>>
>> I would not recommend that. With a merge factor that high you may run out of file handles, and, moreover, I doubt that disks are very efficient when reading from that many streams.
>
> Running out of file handles is a solvable problem because you can set ulimit -n to whatever you like on OS X -- and you pretty much have to, since the default is 256.
>
> The streams issue is more complicated. N-way merges from disk tend to be IO-bound. The best I can do is try a couple of numbers and see what works. IIRC, the number 100 has gone by on the Plucene mailing list as a good value.
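The open-file limit can also be raised from inside a process rather than with `ulimit -n` in the shell. Below is a minimal Python sketch using the standard `resource` module (POSIX only). The helper name `raise_nofile_limit` and the target of 4096 are hypothetical. An unprivileged process can only raise its soft limit up to the hard limit, and some kernels (OS X included) may refuse values above their own per-process cap, so the sketch keeps the current limit when the request is rejected.

```python
import resource


def raise_nofile_limit(target: int) -> int:
    """Try to raise the soft RLIMIT_NOFILE to `target` (hypothetical helper).

    Never lowers the limit and never asks for more than the hard limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to do
    # An unlimited hard cap is reported as RLIM_INFINITY.
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if cap <= soft:
        return soft  # already high enough; leave it alone
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    except (ValueError, OSError):
        pass  # the kernel refused the request; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("open-file limit is now", raise_nofile_limit(4096))
```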

The higher the better, it seems. Here are the times to index 1000 docs:

merge_factor  secs
10            141
30            123
100           107
250           100
1000           89
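Put in terms of throughput and relative savings, the table reads as follows. This small Python sketch only recomputes figures from the numbers above; it adds no new measurements.

```python
# Seconds to index 1000 docs with Plucene, by merge_factor (from the table above).
timings = {10: 141, 30: 123, 100: 107, 250: 100, 1000: 89}
DOCS = 1000
baseline = timings[10]

for factor, secs in timings.items():
    docs_per_sec = DOCS / secs
    saved = 1 - secs / baseline  # fraction of wall-clock time saved vs. merge_factor 10
    print(f"{factor:>5}  {docs_per_sec:5.2f} docs/s  {saved:6.1%} less time than 10")
```

Going from 10 to 1000 cuts the time by about 37%, from roughly 7.1 to 11.2 docs per second.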

I suspect that Plucene is so CPU-bound that the IO doesn't come into play.
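One way to see why a larger merge_factor could help when indexing is CPU-bound: under logarithmic merging, each document is rewritten once per level it climbs. Raising the merge factor reduces the number of levels, but widens each merge and leaves more segments on disk at once. The Python sketch below is a simplified model, not Plucene's or Lucene's actual merge code. It assumes each flushed batch becomes one segment and that exactly merge_factor same-level segments are merged into one. The function name and the `batch_size` parameter are hypothetical, and any final optimize is ignored.

```python
def simulate_merges(num_docs: int, batch_size: int, merge_factor: int):
    """Simplified logarithmic merging (a model, not Lucene's code).

    Returns (docs_copied, widest_merge, max_segments):
      docs_copied  -- total times any document is rewritten by a merge
      widest_merge -- most segments merged in a single pass
      max_segments -- most segments alive at once between flushes
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []  # levels[i] = sizes (in docs) of the segments at level i
    docs_copied = widest_merge = max_segments = 0
    for start in range(0, num_docs, batch_size):
        if not levels:
            levels.append([])
        # Flush one batch as a new level-0 segment.
        levels[0].append(min(batch_size, num_docs - start))
        level = 0
        while len(levels[level]) == merge_factor:
            merged = sum(levels[level])  # every doc in these segments is copied
            docs_copied += merged
            widest_merge = max(widest_merge, len(levels[level]))
            levels[level] = []
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
        max_segments = max(max_segments, sum(len(segs) for segs in levels))
    return docs_copied, widest_merge, max_segments


for mf in (10, 100, 1000):
    print(mf, simulate_merges(1000, 1, mf))
```

For 1000 one-doc segments, the model gives (3000, 10, 27) for merge_factor 10, (1000, 100, 108) for 100, and (1000, 1000, 999) for 1000. As the factor grows, documents are copied fewer times, but each merge gets much wider and many more segments (and their files) exist at once, which is the file-handle concern raised earlier in the thread.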

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

