Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without
having to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL
for each parse should be that of its link, since you don't want to
fetch that separately. Right?
Exactly.
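
To make the idea concrete, here is a rough sketch of what a Map-returning parse could look like for a feed. This is not the actual Nutch API: MultiParser, FeedItem, and the simplified Parse class are made-up stand-ins for illustration, and the real Parser takes a Content object rather than a list of items.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedParseSketch {

    /** Hypothetical stand-in for Nutch's Parse; it only carries the text. */
    public static final class Parse {
        public final String text;
        public Parse(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for one entry of an already-fetched feed. */
    public static final class FeedItem {
        public final String link;
        public final String description;
        public FeedItem(String link, String description) {
            this.link = link;
            this.description = description;
        }
    }

    /** Assumed shape of the changed API: one fetched document, many parses keyed by URL. */
    public interface MultiParser {
        Map<String, Parse> parse(String feedUrl, List<FeedItem> items);
    }

    /** Keys each parse by the item's own link, so the item is never fetched separately. */
    public static final class FeedParser implements MultiParser {
        @Override
        public Map<String, Parse> parse(String feedUrl, List<FeedItem> items) {
            // LinkedHashMap keeps the items in feed order.
            Map<String, Parse> parses = new LinkedHashMap<>();
            for (FeedItem item : items) {
                parses.put(item.link, new Parse(item.description));
            }
            return parses;
        }
    }

    public static void main(String[] args) {
        List<FeedItem> items = List.of(
            new FeedItem("http://example.com/a", "First item"),
            new FeedItem("http://example.com/b", "Second item"));
        Map<String, Parse> parses =
            new FeedParser().parse("http://example.com/feed.xml", items);
        // A caller such as the fetcher would loop over the entries instead of
        // handling a single Parse.
        for (Map.Entry<String, Parse> e : parses.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().text);
        }
    }
}
```

The point of the sketch is only that the map key is the item's link rather than the feed URL, which is what would let each item be indexed under its own URL.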
So now the question is, how much impact would this change to the
Parser API have on the rest of Nutch? It would require changes to all
Parser implementations, to ParseSegment, to ParseUtil, and to
Fetcher. But, as far as I can tell, most of these changes look
straightforward.
I think so, too. I have opened an issue in JIRA
(https://issues.apache.org/jira/browse/NUTCH-443) and will give it a try.
Doğacan, have you started working on it yet?
Thanks,
Renaud