I've been twiddling with the new feed parsing code in nutch-1.0-dev as a model for some changes to an existing Nutch setup that I've been running for a while. Using the new ParseResult structure, I'm able to create a set of ParseData objects for each document I've crawled. All of the documents I'm crawling are RSS feeds, and I'd like to index each item in the feeds as its own document. Everything works fine up to the point where I try to create the index. At that point nothing happens. A quick check through the Indexer code and a dump file of the segment shows me that I don't have any CrawlDatum entries with a DB status for the items parsed from each feed.
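To make the symptom concrete, here is a rough stand-alone model of what I think is happening. None of these classes are real Nutch classes: `SegmentEntry`, `wouldIndex`, and `indexable` are hypothetical names I made up for illustration, and the check is only my reading of the Indexer (an entry with no DB-status CrawlDatum is skipped), not a copy of its code.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexerCheckSketch {

    // Hypothetical stand-in for what the segment holds for one URL.
    // Not a Nutch class; it only records which pieces are present.
    static final class SegmentEntry {
        final String url;
        final boolean hasDbStatus;   // a CrawlDatum with a DB status exists
        final boolean hasParseData;  // ParseData was written for this URL

        SegmentEntry(String url, boolean hasDbStatus, boolean hasParseData) {
            this.url = url;
            this.hasDbStatus = hasDbStatus;
            this.hasParseData = hasParseData;
        }
    }

    // Assumed check: an entry is only indexed when both pieces are present.
    static boolean wouldIndex(SegmentEntry e) {
        return e.hasDbStatus && e.hasParseData;
    }

    // Collects the URLs that pass the assumed check.
    static List<String> indexable(List<SegmentEntry> entries) {
        List<String> urls = new ArrayList<String>();
        for (SegmentEntry e : entries) {
            if (wouldIndex(e)) {
                urls.add(e.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<SegmentEntry> entries = new ArrayList<SegmentEntry>();
        // The feed itself was crawled, so it has a DB status.
        entries.add(new SegmentEntry("http://example.com/feed.xml", true, true));
        // Items produced by the parser have ParseData but were never in the crawl DB.
        entries.add(new SegmentEntry("http://example.com/item-1", false, true));
        entries.add(new SegmentEntry("http://example.com/item-2", false, true));

        System.out.println("Would be indexed: " + indexable(entries));
    }
}
```

In this model the feed items are dropped even though they have ParseData, which matches what I see in the segment dump.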
I never intended to crawl the items from the feeds, so my question is: can I add the DB status to the ParseData for each individual item during the parsing/fetching stage? I'd rather not mess around in the Indexer and remove checks for things that should probably exist.

patrik
