Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without having to fetch them individually.

Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for each parse should be that of its link, since you don't want to fetch that separately. Right?
Exactly.
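
To make the proposal concrete, here is a minimal sketch of what a parse method returning one entry per feed item might look like. It is not the actual Nutch API: SimpleParse, FeedItem, MultiParser and FeedParserSketch are hypothetical names invented for illustration, and SimpleParse only stands in for Nutch's much richer Parse type. The one point it tries to show is that each parse is keyed by the item's own link, so the item never needs to be fetched separately.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Nutch's Parse; the real type carries more data.
final class SimpleParse {
    final String title;
    final String text;

    SimpleParse(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Hypothetical feed entry, as it might come out of an RSS/Atom reader.
final class FeedItem {
    final String link;
    final String title;
    final String description;

    FeedItem(String link, String title, String description) {
        this.link = link;
        this.title = title;
        this.description = description;
    }
}

// Assumed shape of the proposed contract: one fetched document
// can yield several parses, each keyed by the URL it belongs to.
interface MultiParser {
    Map<String, SimpleParse> parse(List<FeedItem> items);
}

final class FeedParserSketch implements MultiParser {
    @Override
    public Map<String, SimpleParse> parse(List<FeedItem> items) {
        // LinkedHashMap keeps the items in feed order.
        Map<String, SimpleParse> parses = new LinkedHashMap<>();
        for (FeedItem item : items) {
            // The key is the item's link, not the URL of the feed itself.
            parses.put(item.link, new SimpleParse(item.title, item.description));
        }
        return parses;
    }
}
```

A caller would then iterate over the map entries and index each one under its key, instead of handling a single parse for the feed URL.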

So now the question is, how much impact would this change to the Parser API have on the rest of Nutch? It would require changes to all Parser implementations, to ParseSegment, to ParseUtil, and to Fetcher. But, as far as I can tell, most of these changes look straightforward.
I think so, too. I have opened an issue in JIRA (https://issues.apache.org/jira/browse/NUTCH-443) and will give it a try.
Doğacan, have you started working on it yet?

Thanks,
Renaud
