Dennis Kubes wrote:

> 100 million pages = 50-100 servers and 20-40T of space distributed. 
> Ideally the setup would be processing machines and search servers.  You 

[..]

That's a very nice description - thanks, Dennis. I think it would be 
useful to include it on the Wiki as a case study.


> This is all dependent on the size of each local index.  Approximately 
> 2-4M pages per index split is good.  Over that you may see performance 
> decreases.  Scaling that out over many servers you will see almost 
> linear response time.  We have almost 100M pages in the index and are 
> seeing subsecond response times on most queries.
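
(For the archives: the split-index setup Dennis describes is wired up
through Nutch's distributed search. A rough sketch only, with
hypothetical host names and paths:

   # on each search node, serve one 2-4M page index split:
   bin/nutch server 9999 /data/nutch/crawl-part-01

   # on the front end, searcher.dir points to a directory containing
   # search-servers.txt with one "host port" line per backend:
   search01 9999
   search02 9999
   ...
)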

Are you running with a sorted index, and using a non-zero 
searcher.max.hits? If you use a well-defined PR-like scoring, this 
feature can work wonders for performance and increase the maximum 
number of docs per server.
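
In case it helps as a reference, the index is sorted with the
IndexSorter tool and the cutoff is set in nutch-site.xml. Roughly like
this, where the invocation details and the cutoff value are only
illustrative (check the IndexSorter usage for your Nutch version):

   # sort the Lucene index by document boost:
   bin/nutch org.apache.nutch.indexer.IndexSorter /data/nutch/crawl

   <!-- in nutch-site.xml: stop collecting hits per index after this
        many, relying on the boost-sorted order -->
   <property>
     <name>searcher.max.hits</name>
     <value>1000</value>
   </property>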


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

