e w wrote:
> (The message below was posted to nutch-dev a few days ago.) Can anyone
> (anonymous or otherwise) confirm whether it's possible to use Nutch 0.7 for
> a "4-6 billion page search engine"? Is this a typo or for real? Just curious
> and if it's true what were the major issues e.g. time, RAM, (storage
> presumably)? My understanding was that the practical limit on 0.7 was about
> 100 million pages whatever hardware you have.

Unless we are talking about an extensively rewritten version of 0.7, I'd say it's next to impossible to use an out-of-the-box 0.7 for anything more than 200-300 million URLs, if even that many. The main bottleneck was the DB operations, which could take days to complete on any type of hardware. These limitations have been largely removed in 0.8 and later, due to the Hadoop framework.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
