I asked a similar question last week, but I don't think I explained myself properly. I have created a Nutch/Lucene index using the normal crawl, merge, dedup process. The problem is that this whole process takes a long time. I would like to be able to inject single URLs and have them appear in search results very quickly, without rebuilding the whole index (for example, when a document changes). How can this be done?
I have been trying the following, without success:

1. Crawl and index the new URL.
2. Copy the live index.
3. Dedup against the live copy.
4. Merge with the live copy.
5. Replace the live index with the new index.

The process seems to work apart from step 3: I cannot seem to dedup a previously merged index against an unmerged one. I imagine I am looking at the problem from completely the wrong direction.

Cheers,
Rob

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
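Steps 2 and 5 of the list above (copy the live index, then replace it) can be sketched independently of Nutch. This is only a minimal Python sketch, assuming the live index is a plain directory and that the staging copy sits on the same filesystem. The names `update_live_index` and `update` are hypothetical, and the `update` callback merely stands in for the dedup and merge steps (3 and 4), which are not shown here.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def update_live_index(live: Path, update: Callable[[Path], None]) -> None:
    """Apply `update` to a copy of `live`, then swap the copy into place.

    `update` is a hypothetical hook for the dedup and merge steps; it
    receives the path of the working copy and modifies it in place.
    """
    # Stage next to the live index so the renames below stay on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=live.name + ".staging.", dir=live.parent))
    work = staging / "index"
    shutil.copytree(live, work)  # step 2: copy the live index
    try:
        update(work)  # steps 3 and 4: dedup and merge (not shown)
    except Exception:
        # Leave the live index untouched if the update fails.
        shutil.rmtree(staging)
        raise
    # Step 5: replace the live index. There is a brief window between the two
    # renames in which `live` does not exist, so searchers should not be
    # opening the index at that moment.
    os.rename(live, staging / "old")
    os.rename(work, live)
    shutil.rmtree(staging)  # discard the previous index
```

A call might look like `update_live_index(Path("crawl/index"), merge_new_documents)`, where both the path and `merge_new_documents` are placeholders. Working on a copy means a failed dedup or merge never touches the index that searches are served from.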
