On Apr 3, 2006, at 6:26 PM, Marvin Humphrey wrote:
> On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote:
>> Marvin Humphrey wrote:
>>> Plucene is a Lucene 1.3 port, so it doesn't have
>>> max_buffered_docs -- but I can set merge_factor to 1000.
>>
>> I would not recommend that. With a merge factor that high you may
>> run out of file handles, and, moreover, I doubt that disks are
>> very efficient when reading from that many streams.
>
> Running out of filehandles is a solvable problem because you can
> set ulimit -n to whatever on OS X -- and you pretty much have to,
> since the default is 256.
>
> The streams issue is more complicated. N-way merges from disk tend
> to be IO-bound. The best I can do is try a couple numbers and see
> what works. IIRC, the number 100 has gone by on the Plucene
> mailing list as a good value.
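
For anyone following along, here is a rough sketch of the two concerns above: the per-process descriptor limit and the shape of an N-way merge. It is Python rather than Perl and has nothing to do with Plucene's actual merge code; descriptor_limit and merge_runs are hypothetical names made up for illustration, and the in-memory runs only stand in for on-disk segments.

```python
import heapq
import resource  # Unix-only; reports per-process resource limits


def descriptor_limit():
    """Return the soft limit on open file descriptors (what `ulimit -n` prints)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def merge_runs(runs):
    """Merge already-sorted runs into one sorted stream.

    heapq.merge keeps every run's iterator live for the whole merge, which
    is the in-memory analogue of holding one open stream per segment.
    """
    return heapq.merge(*runs)


if __name__ == "__main__":
    merge_factor = 1000  # value discussed in the thread
    print("merge_factor:", merge_factor)
    print("soft descriptor limit:", descriptor_limit())
    # Toy sorted runs standing in for segments (illustrative data only).
    runs = [range(i, 30, 3) for i in range(3)]
    print(list(merge_runs(runs)))
```

The point of the sketch is only that an N-way merge reads from N inputs at once, so N has to fit under the descriptor limit before the question of disk efficiency even comes up.
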
The higher the better, it seems. Here are the times to index 1000 docs:

  merge_factor   secs
  ------------   ----
            10    141
            30    123
           100    107
           250    100
          1000     89

I suspect that Plucene is so CPU-bound that the IO doesn't come into
play.
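
Put in terms of throughput, the numbers above go from roughly 7 docs/sec at merge_factor 10 to roughly 11 docs/sec at merge_factor 1000, about a 1.6x speedup. A small Python sketch of that arithmetic follows; the timings are copied from the table, while the names (TIMINGS, docs_per_sec, speedup) are just illustrative.

```python
DOCS = 1000  # documents indexed in each run

# merge_factor -> elapsed seconds, copied from the table above
TIMINGS = {10: 141, 30: 123, 100: 107, 250: 100, 1000: 89}


def docs_per_sec(merge_factor):
    """Indexing throughput for one of the measured merge factors."""
    return DOCS / TIMINGS[merge_factor]


def speedup(merge_factor, baseline=10):
    """Elapsed-time ratio relative to the baseline merge factor."""
    return TIMINGS[baseline] / TIMINGS[merge_factor]


if __name__ == "__main__":
    for mf in sorted(TIMINGS):
        print(f"{mf:>5}  {docs_per_sec(mf):6.2f} docs/sec  {speedup(mf):4.2f}x")
```
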
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/