e w wrote:
> (The message below was posted to nutch-dev a few days ago.) Can anyone
> (anonymous or otherwise) confirm whether it's possible to use Nutch 0.7 for
> a "4-6 billion page search engine"? Is this a typo or for real? Just curious
> and if it's true what were the major issues e.g. time, RAM, (storage
> presumably)? My understanding was that the practical limit on 0.7 was about
> 100 million pages whatever hardware you have.

Unless we are talking about an extensively rewritten version of 0.7, I'd 
say it's next to impossible to use an out-of-the-box 0.7 for anything 
more than 200-300 million URLs, if even that many. The main bottleneck 
was the DB operations, which on any type of hardware would take days 
to complete.

These limitations have been largely removed in 0.8 and later, thanks to 
the Hadoop framework.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
