At the beginning the ratio is approximately 10 to 1: for every page I 
crawl, I get 10 more pages to crawl that are not currently in the index. 
As you move towards 50 million pages it becomes more like 6 to 1.  If 
you seed the entire dmoz, your first crawl will be around 5.5 million 
pages.  Your second crawl will be around 54 million pages, and a depth 
of 3 will give you over 300 million pages.  These are the numbers that 
we are currently seeing.
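
As a rough back-of-the-envelope check, the figures above can be 
reproduced by multiplying each crawl by the observed ratio.  This is 
only a sketch: the function name estimate_crawl_sizes is made up for 
illustration and is not part of Nutch, and it assumes each crawl is 
simply the previous crawl times the ratio, ignoring duplicates, fetch 
failures and any topN limit.

```python
def estimate_crawl_sizes(first_crawl_pages, ratios):
    """Estimate the number of pages fetched at each crawl depth.

    first_crawl_pages: pages fetched in the first crawl.
    ratios: assumed new-pages-per-fetched-page ratio for each
            following crawl (hypothetical inputs, not Nutch settings).
    """
    sizes = [first_crawl_pages]
    for ratio in ratios:
        # Each fetched page is assumed to yield `ratio` unseen pages.
        sizes.append(sizes[-1] * ratio)
    return sizes


# Numbers taken from the message: about 5.5 million pages in the first
# crawl, roughly 10 to 1 early on, closer to 6 to 1 near 50 million.
sizes = estimate_crawl_sizes(5_500_000, [10, 6])
for depth, pages in enumerate(sizes, start=1):
    print(f"depth {depth}: ~{pages / 1e6:.1f} million pages")
```

The estimate lands close to the observed numbers (about 55 million 
against the 54 million seen, and about 330 million against "over 300 
million"), which suggests the two ratios describe the growth fairly 
well at this scale.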

Dennis Kubes

bbrown wrote:
> This is kind of a generic question. Are there any stats on how many pages 
> will get crawled based on some initial seed?  For example, if you seed the 
> list from dmoz, how many pages will get indexed?  Let's say there are 4 
> million, will only 4 million get indexed?
> 
> Or let's say I have 4000, will I get 30,000 crawled/indexed pages?
> 
> --
> Berlin Brown
> [berlin dot brown at gmail dot com]
> http://botspiritcompany.com/botlist/?
> 

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
