I tested this last night, so in case anyone wants to know the answer: yes,
this can be done.
If all you need is the Lucene index for your website, you can run one
crawl, run a second crawl, and then run IndexMerger (from the nutch.crawl
api dir) on the two resulting indexes.
Then run DeleteDuplicates on the new merged index.

Whamo, a new index with the data from both crawls.
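
For anyone who wants to see the idea rather than the Nutch tooling, here is a
rough toy model of "merge, then dedup" in Python. It is only a sketch: plain
dicts stand in for Lucene documents, the names merge_indexes and
delete_duplicates are made up for illustration and are not Nutch or Lucene
APIs, and the rule "keep the most recently fetched document per URL" is an
assumption about how the de-duper behaves, so check it against your version.

```python
# Toy model only: plain dicts stand in for Lucene documents.
# None of these names are Nutch or Lucene APIs.

def merge_indexes(*indexes):
    """Concatenate several 'indexes' (lists of docs) into one list."""
    merged = []
    for index in indexes:
        merged.extend(index)
    return merged


def delete_duplicates(index):
    """Keep one doc per URL, preferring the latest fetch time (assumed rule)."""
    newest = {}
    for doc in index:
        kept = newest.get(doc["url"])
        if kept is None or doc["fetched"] > kept["fetched"]:
            newest[doc["url"]] = doc
    # Sort by URL so the result is stable and easy to inspect.
    return sorted(newest.values(), key=lambda d: d["url"])


# Two crawls seeded with overlapping URLs, as in the setup described here.
crawl_1 = [
    {"url": "http://example.com/a", "fetched": 1, "title": "A (old)"},
    {"url": "http://example.com/b", "fetched": 1, "title": "B"},
]
crawl_2 = [
    {"url": "http://example.com/a", "fetched": 2, "title": "A (new)"},
    {"url": "http://example.com/c", "fetched": 2, "title": "C"},
]

combined = delete_duplicates(merge_indexes(crawl_1, crawl_2))
for doc in combined:
    print(doc["url"], doc["title"])
```

The point is just that the merged index briefly holds both copies of any page
fetched in both crawls, and the dedup pass is what brings it back to one entry
per URL.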
S



sdeck wrote:
> 
> Thanks for everyone's help so far with my postings.
> Here is another question.
> 
> I am currently merging my crawls, but am wondering if I can skip a few
> steps and how to do it.
> I inject a whole slew of urls into a crawl each time, and then merge it
> with the previous crawl.
> The urls injected are the same each time.
> 
> Now, my merged segments directory is starting to get larger and the
> indexing is starting to get slower. However, I only use the generated
> Lucene index for my website, not any of the segments, etc. Plus, I restart
> the crawl each and every time. So, would I be able to give the de-duper
> the two Lucene index directories I have, and then use IndexMerger to
> combine the indexes into a new Lucene index, and skip over the merge of
> the linkdb, crawldb, and segments?
> 
> Thanks,
> S
> 

-- 
View this message in context: 
http://www.nabble.com/Fun-question-for-index-merge-tf2861621.html#a8012047
Sent from the Nutch - User mailing list archive at Nabble.com.


