Dennis Kubes wrote:
> 100 million pages = 50-100 servers and 20-40T of space distributed.
> Ideally the setup would be processing machines and search servers. You
[..]

That's a very nice description - thanks, Dennis. I think it would be
useful to include it on the Wiki as a case study.

> This is all dependent on the size of each local index. Approximately
> 2-4M pages per index split is good. Over that you may see performance
> decreases. Scaling that out over many servers you will see almost
> linear response time. We have almost 100M pages in the index and are
> seeing subsecond response times on most queries.

Are you running with a sorted index, and using a non-zero
searcher.max.hits? If you use a well-defined PageRank-like scoring,
this feature can work wonders for performance and increase the
maximum number of docs per server.
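A minimal sketch of the setting in question, assuming the stock
searcher.max.hits property in conf/nutch-site.xml and the IndexSorter
tool shipped with recent Nutch releases (the value 1000 is only an
illustration - tune it against your own index):

    <property>
      <name>searcher.max.hits</name>
      <value>1000</value>
      <description>If positive, stop searching after this many matching
      documents have been collected, instead of scoring every match.
      Most effective on an index sorted by per-document boost (e.g. via
      IndexSorter), where the best documents come first, so a truncated
      result list loses very little quality.</description>
    </property>

The sorted index is what makes the truncation safe: with PageRank-like
boosts assigned at index time and documents ordered by boost, the first
hits collected are also the best ones.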
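Back of the envelope, the sizing above also hangs together:

    100M pages / 2-4M pages per split ~= 25-50 search servers,
    plus a similar number of processing machines, i.e. roughly the
    50-100 servers quoted at the top of the thread.

Raising the per-split capacity this way directly reduces that first
number.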
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web / Embedded Unix, System Integration
http://www.sigram.com   Contact: info at sigram dot com