Would you say it's worth writing it up as a patch and adding it to JIRA?

On 7/9/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Robert Young wrote:
> > I have been trying to get to grips with
> > org.apache.nutch.crawl.Injector to help with a requirement I have for
> > the project I'm working on and I'm a little confused about one place.
> > On lines 120 - 121 any existing CrawlDatum is used instead of the
> > newly injected one. This doesn't seem to make sense from my point of
> > view, I'm guessing it's just a matter of not being able to see the
> > issue from the other side. The scenario I am in is as such, when I
> > inject a url it is because I want it to be re-indexed, maybe because
> > it's changed, I don't care if that url's already in the crawldb I want
> > it re-indexed. As far as I can see, if this wasn't the case I wouldn't
> > be trying to inject it.
> >
> > What am I missing here? Why is the existing CrawlDatum used instead of
> > the newly injected one?
>
> That's indeed a place in Nutch that I planned to change for a long time
> ... This behavior is not obvious, what's worse it's undocumented.
>
> It would be relatively simple to extend this behavior so that only
> selected parts of data would be updated or replaced when a seed list
> contains the same URL as the one already in CrawlDb.
>
> For now, just add the code that you need in Injector.InjectReducer.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
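
To make the discussion above concrete, here is a rough, self-contained sketch of the merge decision being talked about. It is not the Nutch source: Datum, Policy, the status strings and reduce() are all hypothetical stand-ins invented for illustration, and the real Injector.InjectReducer works on CrawlDatum inside a Hadoop reducer. KEEP_EXISTING mirrors the behavior described in the thread, OVERWRITE is what the original question asks for, and UPDATE_FETCH_TIME is one guess at what "only selected parts of data would be updated" could mean.

```java
import java.util.Arrays;
import java.util.List;

public class InjectMergeSketch {

    // Hypothetical stand-in for a CrawlDb entry; NOT org.apache.nutch.crawl.CrawlDatum.
    static final class Datum {
        final String status;   // "injected" marks a seed-list entry (assumed convention)
        final long fetchTime;  // when the URL is next due to be fetched
        final float score;

        Datum(String status, long fetchTime, float score) {
            this.status = status;
            this.fetchTime = fetchTime;
            this.score = score;
        }
    }

    // Hypothetical policies for a URL that is both in the CrawlDb and in the seed list.
    enum Policy { KEEP_EXISTING, OVERWRITE, UPDATE_FETCH_TIME }

    // A freshly injected entry enters the db as an unfetched one.
    static Datum asUnfetched(Datum injected) {
        return new Datum("unfetched", injected.fetchTime, injected.score);
    }

    // Decide which datum survives for a single URL.
    static Datum reduce(List<Datum> values, Policy policy) {
        Datum injected = null;
        Datum existing = null;
        for (Datum d : values) {
            if ("injected".equals(d.status)) {
                injected = d;
            } else {
                existing = d;
            }
        }
        if (injected == null) {
            return existing;              // URL was not in the seed list
        }
        if (existing == null) {
            return asUnfetched(injected); // brand-new URL
        }
        switch (policy) {
            case OVERWRITE:
                // Discard the old entry so the URL is treated as new again.
                return asUnfetched(injected);
            case UPDATE_FETCH_TIME:
                // Keep status and score, take only the fetch time from the seed entry.
                return new Datum(existing.status, injected.fetchTime, existing.score);
            default:
                // KEEP_EXISTING: the behavior questioned in this thread.
                return existing;
        }
    }

    public static void main(String[] args) {
        Datum existing = new Datum("fetched", 1000L, 2.5f);
        Datum injected = new Datum("injected", 5000L, 1.0f);
        List<Datum> both = Arrays.asList(existing, injected);
        for (Policy p : Policy.values()) {
            Datum out = reduce(both, p);
            System.out.println(p + ": status=" + out.status
                    + " fetchTime=" + out.fetchTime + " score=" + out.score);
        }
    }
}
```

With a policy switch along these lines, re-injecting a URL to force a re-index would be a matter of choosing OVERWRITE (or a narrower update) rather than patching the reducer each time. How such a policy would be configured in Nutch is left open here.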
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
