[ https://issues.apache.org/jira/browse/NUTCH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671133#action_12671133 ]
Andrzej Bialecki commented on NUTCH-357:
-----------------------------------------

Closing this issue - the suggested solution seems to address the problem in a sufficient way.

> crawling simulation
> -------------------
>
>                 Key: NUTCH-357
>                 URL: https://issues.apache.org/jira/browse/NUTCH-357
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: protocol-simulation-pluginV1.patch
>
>
> We recently discovered some serious issues related to crawling and scoring.
> Reproducing these problems is difficult: first, it is not polite to
> re-crawl a set of pages again and again; second, it is difficult to catch
> the page that causes a problem.
> Therefore it would be very useful to have a testbed to simulate crawls where
> we can control the responses of the "web servers".
> To begin with, simulating very basic situations, such as a page that points
> to itself, link chains, or internal links, would already be very useful.
> Later on, however, simulating crawls against existing data collections like
> TREC or a webgraph would be much more interesting, for instance to calculate
> the quality of the Nutch OPIC implementation against the PageRank scores of
> the webgraph, or to evaluate crawling strategies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.