I have been trying to get to grips with org.apache.nutch.crawl.Injector for a requirement on the project I'm working on, and I'm a little confused about one part. On lines 120-121, any existing CrawlDatum is used instead of the newly injected one. This doesn't make sense from my point of view, though I'm guessing I just can't see the issue from the other side. My scenario is as follows: when I inject a URL, it is because I want it re-indexed, perhaps because it has changed. I don't care whether that URL is already in the crawldb; I want it re-indexed. As far as I can see, if that weren't the case I wouldn't be injecting it in the first place.
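To make the two behaviours concrete, here is a minimal Java sketch of the merge decision as described above. It is not the Nutch API: `Datum` is a hypothetical stand-in for CrawlDatum, the `injected` flag is an assumption used to tell the two inputs apart, and `preferExisting` / `preferInjected` are hypothetical helpers, not Injector methods. The first mirrors what lines 120-121 appear to do; the second is the behaviour wanted here.

```java
import java.util.List;

public class InjectMergeSketch {

    // Hypothetical stand-in for CrawlDatum (NOT the real class):
    // a status string plus a flag saying whether this value came from the injected seed list.
    record Datum(String status, boolean injected) {}

    // Policy described in the question: if the crawldb already holds a datum
    // for the URL, keep it and discard the injected one.
    static Datum preferExisting(List<Datum> values) {
        Datum injected = null;
        for (Datum d : values) {
            if (!d.injected()) {
                return d; // an existing crawldb entry wins
            }
            injected = d;
        }
        return injected; // only reached when the URL was not in the crawldb yet
    }

    // Desired policy: an injected URL always replaces the existing entry,
    // so it would be treated as new and picked up again.
    static Datum preferInjected(List<Datum> values) {
        Datum existing = null;
        for (Datum d : values) {
            if (d.injected()) {
                return d; // the injected datum wins
            }
            existing = d;
        }
        return existing; // no injected value for this URL
    }

    public static void main(String[] args) {
        Datum old = new Datum("db_fetched", false);
        Datum fresh = new Datum("db_unfetched", true);
        List<Datum> both = List.of(fresh, old);
        System.out.println(preferExisting(both).status()); // db_fetched
        System.out.println(preferInjected(both).status()); // db_unfetched
    }
}
```

The only difference between the two policies is which value wins when both an existing and an injected datum arrive for the same URL; for a URL not yet in the crawldb they behave identically.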
What am I missing here? Why is the existing CrawlDatum used instead of the newly injected one?

Cheers
Rob

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
