On 1/2/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> There actually isn't much of a reason to generate "huge" multi-million page
> fetch lists when you can create lots of smaller ones and merge them together.
> This allows for more of a ladder-style approach, and in some cases reduces
> the risk of errors in terms of Hadoop versions (0.8+) with large
> unrecoverable fetches or failed parse-reduce stages.
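For reference, the ladder-style approach Sean describes might look roughly like this with the Nutch 0.8/0.9 command line; the -topN value and the segment directory names below are illustrative, not from the original thread:

    # generate a smaller fetch list by capping its size with -topN, then fetch it
    bin/nutch generate crawl/crawldb crawl/segments -topN 50000
    bin/nutch fetch crawl/segments/20070102123456

    # repeat generate/fetch for more small segments, then merge them into one
    bin/nutch mergesegs crawl/segments_merged -dir crawl/segments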
The problem I'm faced with is that I'm not sure how to merge my indexes together. For example, I fetch about 200,000 pages in 3 or 4 separate fetches. Once those are done, I run the index command and everything goes well: my index is built. But if I then run a new fetch and try to index it, I get an error saying "crawl/indexes" already exists. How does one actually merge different fetches into the same index without having to recreate the index each time?

Thanks!

Justin
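One way around the "crawl/indexes already exists" error, assuming Nutch 0.8/0.9 where "bin/nutch merge" runs IndexMerger, is to index each new segment into its own directory and then merge the per-segment indexes into a fresh combined index. A sketch; the directory and segment names are illustrative:

    # index only the new segment into a separate directory,
    # leaving the existing crawl/indexes untouched
    bin/nutch index crawl/indexes_new crawl/crawldb crawl/linkdb crawl/segments/20070102123456

    # optionally delete duplicate documents across the two indexes
    bin/nutch dedup crawl/indexes crawl/indexes_new

    # merge the old and new indexes into a fresh output directory
    # (the output must not already exist)
    bin/nutch merge crawl/index_merged crawl/indexes crawl/indexes_new

After the merge, point the searcher at the merged index directory instead of the original one.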
