Hi Otis,

If you have to handle a very large set of tasks, you may be interested in having a look at the Hazelcast distributed queue. You should be able to use it with the current version of trunk, since DROIDS-56 has been applied. In such a cluster, no task will be lost in case of failure.
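To give a rough idea of the pattern, here is a minimal sketch, not the actual Droids integration. It relies only on the fact that Hazelcast's IQueue implements java.util.concurrent.BlockingQueue, so crawler code written against that interface can run on a local queue or a distributed one. The class name LinkQueueSketch and the helpers enqueue and next are made up for illustration, and a LinkedBlockingQueue stands in for the Hazelcast queue so the example runs without a cluster.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: none of these names come from Droids or Hazelcast.
public class LinkQueueSketch {

    // Hypothetical helper: adds a link to the shared queue without blocking.
    public static boolean enqueue(BlockingQueue<String> queue, String url) {
        return queue.offer(url);
    }

    // Hypothetical helper: takes the next link, or returns null if the queue is empty.
    public static String next(BlockingQueue<String> queue) {
        return queue.poll();
    }

    public static void main(String[] args) {
        // Local stand-in (assumption). In a cluster, a Hazelcast distributed
        // queue would be passed in here instead, since it is also a BlockingQueue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>();

        enqueue(queue, "http://example.com/");
        enqueue(queue, "http://example.com/about");

        // Drain the queue the way a crawler worker would.
        String url;
        while ((url = next(queue)) != null) {
            System.out.println("crawl " + url);
        }
    }
}
```

Because the workers only see the BlockingQueue interface, swapping the in-memory queue for a distributed one should not require changes to the worker loop itself.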
Best regards,
Bertil

Sent from my iPad

On Mar 25, 2011, at 9:09 PM, Otis Gospodnetic <[email protected]> wrote:

> Hi,
>
> Somebody (Paul?) mentioned using Droids for doing a 50M page crawl. Anyone else
> using Droids for crawls of that size?
>
> I'm asking because I have a need to do a "semi-vertical" crawl on up to 10K
> domains and I'm considering Droids vs. Nutch. This may translate to several
> times that many different servers - say 100K. And that may translate to a few
> 100M web pages. Too big for Droids without having a persistent link queue,
> right?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
