[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090257#comment-13090257 ]

Markus Jelsma commented on NUTCH-1090:
--------------------------------------

You can patch o.a.n.crawl.LinkDb.configure() to log this information.

> LinkDb (invertlinks) should inform the user when it ignores internal links
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1090
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1090
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Marek Bachmann
>            Priority: Trivial
>              Labels: configuration, information, log
>             Fix For: 1.3
>
>         Attachments: LinkDb.patch
>
>
> I used Nutch to crawl sites on a single domain. After the crawl was complete,
> I tried to build a LinkDb. The LinkDb was empty.
> It turns out that this happens because the invertlinks command ignores
> internal links to the same domain by default.
> Unfortunately, the LinkDb class does not report this, so it was
> hard to find out why the LinkDb was empty.
> I suggest adding a message for the user when the invertlinks command is
> ignoring internal links.
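
As a rough illustration of the suggestion in the comment above, the following self-contained Java sketch logs a notice at configure time when internal links will be ignored. It is not the attached LinkDb.patch and not the actual Nutch code: the class LinkDbConfigSketch, its methods, and the use of java.util.Properties and java.util.logging in place of the Hadoop job configuration and Nutch's own logger are all hypothetical stand-ins. The property name db.ignore.internal.links and its default of true are assumptions; check nutch-default.xml for your version.

```java
import java.util.Properties;
import java.util.logging.Logger;

/**
 * Hypothetical stand-in for the configure step of a link inverter.
 * This is NOT org.apache.nutch.crawl.LinkDb; it only mirrors the idea of
 * logging the effective setting so an empty LinkDb is easier to explain.
 */
public class LinkDbConfigSketch {
    private static final Logger LOG =
            Logger.getLogger(LinkDbConfigSketch.class.getName());

    // Assumed property name; verify against nutch-default.xml.
    static final String IGNORE_INTERNAL_LINKS = "db.ignore.internal.links";

    private boolean ignoreInternalLinks;

    /** Reads the setting (assumed default: true) and logs a notice if it is on. */
    public void configure(Properties conf) {
        ignoreInternalLinks = Boolean.parseBoolean(
                conf.getProperty(IGNORE_INTERNAL_LINKS, "true"));
        if (ignoreInternalLinks) {
            LOG.info(notice());
        }
    }

    /** Returns whether the last configure() call enabled the filter. */
    public boolean ignoresInternalLinks() {
        return ignoreInternalLinks;
    }

    /** Builds the message shown to the user; the wording is illustrative. */
    static String notice() {
        return "LinkDb: internal links will be ignored ("
                + IGNORE_INTERNAL_LINKS + "=true).";
    }
}
```

Calling configure(new Properties()) on this sketch logs the notice once, because the assumed default is true. Setting db.ignore.internal.links to false in the Properties object suppresses it. Logging at configure time rather than per link keeps the output to a single line per job.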