TL wrote:
You mentioned that as a rule of thumb each node should
only have about 20M pages. What's the main bottleneck
that's encountered around 20M pages? Disk i/o or CPU
speed?
Either or both, depending on your hardware, index, traffic, etc.
With indexes of ~20M pages, the CPU time to compute results serially can
average a second or more per query, and the total i/o time per query can
also exceed a second. If you can spread the i/o over multiple spindles
then it may not be the bottleneck.
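
As a rough back-of-the-envelope sketch of that reasoning (not a measurement
and not Nutch code): the function names, the even split of i/o across
spindles, and the one-second inputs below are all assumptions for
illustration only.

```python
def estimate_query_seconds(cpu_seconds, io_seconds, spindles):
    """Rough per-query latency on one node.

    Assumes (hypothetically) that i/o divides evenly across spindles
    and is not overlapped with the CPU work.
    """
    if spindles < 1:
        raise ValueError("need at least one spindle")
    return cpu_seconds + io_seconds / spindles


def max_queries_per_second(cpu_seconds, io_seconds, spindles, cpus=1):
    """Rough sustained throughput: whichever resource saturates first."""
    if spindles < 1 or cpus < 1:
        raise ValueError("need at least one spindle and one cpu")
    cpu_rate = cpus / cpu_seconds      # queries/sec the CPUs can compute
    io_rate = spindles / io_seconds    # queries/sec the disks can feed
    return min(cpu_rate, io_rate)


if __name__ == "__main__":
    # Illustrative inputs: ~1s CPU and ~1s i/o per query at ~20M pages.
    for spindles in (1, 2, 4):
        latency = estimate_query_seconds(1.0, 1.0, spindles)
        rate = max_queries_per_second(1.0, 1.0, spindles)
        print(spindles, "spindle(s):", latency, "s/query,", rate, "queries/s")
```

Under these assumptions, adding spindles shrinks the i/o share of each
query, but throughput stays capped by the CPU once the disks are no longer
the slower resource.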
Doug
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general