I have been trying to get to grips with
org.apache.nutch.crawl.Injector to help with a requirement on the
project I'm working on, and I'm a little confused about one place.
On lines 120 - 121, any existing CrawlDatum is used instead of the
newly injected one. This doesn't make sense from my point of view,
though I'm guessing I just can't see the issue from the other side.
My scenario is this: when I inject a URL, it is because I want it
re-indexed, perhaps because it has changed. I don't care whether that
URL is already in the crawldb; I want it re-indexed. As far as I can
see, if that weren't the case I wouldn't be trying to inject it.
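
To make the behaviour I mean concrete, here is a minimal sketch of the
merge as I read it. It is not the Nutch source: InjectMergeSketch,
Datum, Status and merge() are hypothetical stand-ins, and the step
that turns an injected entry into an unfetched one is my assumption
about what happens to a newly injected record. With preferInjected
set to false it models what I think the Injector does today; with
true it models what I expected.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the real Nutch classes.
class InjectMergeSketch {

    enum Status { INJECTED, DB_UNFETCHED, DB_FETCHED }

    static final class Datum {
        final Status status;
        final long fetchTime;

        Datum(Status status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // Merge all records seen for one URL into the single record kept in the db.
    // preferInjected == false: an existing entry always wins (behaviour as I read it).
    // preferInjected == true:  a freshly injected entry wins (behaviour I expected).
    static Datum merge(List<Datum> values, boolean preferInjected) {
        Datum existing = null;
        Datum injected = null;
        for (Datum d : values) {
            if (d.status == Status.INJECTED) {
                // Assumption: an injected record is stored as "unfetched".
                injected = new Datum(Status.DB_UNFETCHED, d.fetchTime);
            } else {
                existing = d;
            }
        }
        if (existing == null) {
            return injected;
        }
        if (injected == null) {
            return existing;
        }
        return preferInjected ? injected : existing;
    }

    public static void main(String[] args) {
        Datum existing = new Datum(Status.DB_FETCHED, 1000L);
        Datum injected = new Datum(Status.INJECTED, 2000L);
        List<Datum> values = Arrays.asList(existing, injected);

        System.out.println("keep existing:   " + merge(values, false).fetchTime);
        System.out.println("prefer injected: " + merge(values, true).fetchTime);
    }
}
```

In this sketch the first call keeps the old record (fetch time 1000),
so re-injecting the URL has no effect, while the second returns the
injected one (fetch time 2000), which is what I need.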

What am I missing here? Why is the existing CrawlDatum used instead of
the newly injected one?

Cheers
Rob

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers