Andrzej Bialecki wrote:
> Dennis Kubes wrote:
> 
>> 100 million pages = 50-100 servers and 20-40T of space distributed. 
>> Ideally the setup would be processing machines and search servers.  You 
> 
> [..]
> 
> That's a very nice description - thanks, Dennis. I think it would be 
> useful to include it on the Wiki as a case study.

I will polish it up a bit and put it out there.

> 
> 
>> This is all dependent on the size of each local index.  Approximately 
>> 2-4M pages per index split is good.  Beyond that you may see performance 
>> degrade.  Scaling that out over many servers, response times stay 
>> almost linear.  We have almost 100M pages in the index and are 
>> seeing sub-second response times on most queries.
> 
> Are you running with a sorted index, and using a non-zero 
> searcher.max.hits? If you use well-defined PR-like scoring, then using 
> this feature could work wonders for performance, and increase the max 
> number of docs per server.

I don't know about the sorted index.  How do I learn about that?

We basically took the current indexer and extended it to split the index 
into parts.  The indexer also splits the segments and linkdb into the same 
parts, so all data for a single URL ends up in the same split on the same 
search server.  We are running searcher.max.hits at 1000 and we did see a 
performance increase from that.
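
The basic idea is just that the split is a deterministic function of the 
URL, so the index parts, segment parts, and linkdb parts always line up. 
A rough sketch of that kind of hash-based assignment is below; this is 
illustrative only (class and method names are made up, not our actual 
patch), and it assumes a simple hash-the-URL scheme:

    import java.net.MalformedURLException;
    import java.net.URL;

    public class UrlSplitter {

        private final int numSplits;   // e.g. one split per search server

        public UrlSplitter(int numSplits) {
            this.numSplits = numSplits;
        }

        /** Returns the split (0..numSplits-1) that holds all data for this URL. */
        public int getSplit(String url) {
            String key = url;
            try {
                // Hashing the full URL spreads pages evenly; hashing the host
                // instead would keep a site's pages together.  Either works,
                // as long as the same function is used when splitting the
                // index, the segments, and the linkdb.
                key = new URL(url).toString();
            } catch (MalformedURLException e) {
                // fall back to the raw string
            }
            return (key.hashCode() & Integer.MAX_VALUE) % numSplits;
        }

        public static void main(String[] args) {
            UrlSplitter splitter = new UrlSplitter(25);  // say, 25 search servers
            System.out.println(splitter.getSplit("http://lucene.apache.org/nutch/"));
            System.out.println(splitter.getSplit("http://lucene.apache.org/hadoop/"));
        }
    }

Because the assignment is consistent across all three jobs, each search 
server can serve hits and summaries for its URLs from its own local data 
alone.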

Dennis Kubes
