Koch Martina wrote:
Hi all,
I'd like to add a new field to the CrawlDatum to capture the date when a URL
was first found. The field should be called FoundFirst. Can anyone tell me
which classes I need to modify in order to achieve this? In my opinion, it
should be sufficient to change the CrawlDatum and CrawlDbReader classes, but I
think I've missed something because the CrawlDbMerger crashes now. I know
that I lose compatibility with Nutch, but still...
The easiest (and compatible) way to do this is to use
CrawlDatum.getMetaData(), which returns a MapWritable that can store
arbitrary key/value pairs, so no class needs to be modified.
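
As a rough illustration of the idea, here is a minimal sketch of the
"store it only once" logic. To keep it runnable without Nutch on the
classpath, a plain java.util.Map stands in for the MapWritable returned by
CrawlDatum.getMetaData(); the class name, the method name and the key
string "FoundFirst" are all hypothetical and only follow the field name
proposed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain Map plays the role of the MapWritable
// returned by CrawlDatum.getMetaData(), so the sketch runs without Nutch.
class FoundFirstSketch {
    // Hypothetical metadata key, named after the field proposed above.
    static final String FOUND_FIRST_KEY = "FoundFirst";

    // Store the discovery time only if none is stored yet, so later
    // sightings of the same URL never overwrite the original value.
    // Returns the value that is stored after the call.
    static long recordFoundFirst(Map<String, Long> metaData, long nowMillis) {
        Long existing = metaData.putIfAbsent(FOUND_FIRST_KEY, nowMillis);
        return existing != null ? existing : nowMillis;
    }

    public static void main(String[] args) {
        Map<String, Long> metaData = new HashMap<>();
        recordFoundFirst(metaData, 1000L); // first sighting is stored
        recordFoundFirst(metaData, 2000L); // later sighting is ignored
        System.out.println(metaData.get(FOUND_FIRST_KEY)); // prints 1000
    }
}
```

With the real MapWritable, the key and value would have to be Writable
instances (presumably something like a Text key and a LongWritable value)
rather than String and Long, but the check-before-put logic should stay
the same.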
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com