[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marek Bachmann updated NUTCH-1090:
----------------------------------

    Attachment: LinkDb.patch

Inserted a {{LOG.info}} call in the {{invert}} method that runs when db.ignore.internal.links is set to true.
Added a constant {{IGNORE_INTERNAL_LINKS}} for the {{"db.ignore.internal.links"}} string.
Moved the creation of the {{JobConf}} object to the top of the {{invert}} method.

> LinkDb (invertlinks) should inform the user when it ignores internal links
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1090
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1090
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Marek Bachmann
>            Priority: Trivial
>              Labels: configuration, information, log
>             Fix For: 1.3
>
>         Attachments: LinkDb.patch
>
>
> I used Nutch to crawl sites on a single domain. After the crawl was complete
> I tried to build a LinkDb. The LinkDb was empty.
> It turns out that this happens because the invertlinks command ignores
> internal links to the same domain by default.
> Unfortunately, the LinkDb class does not report this, so it was
> hard to find out why the LinkDb was empty.
> I suggest adding an informational message for the user when the invertlinks
> command is ignoring internal links.
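
For readers without the attachment at hand, the change described in the comment can be sketched roughly as follows. This is not the contents of LinkDb.patch: the class {{LinkDbSketch}}, its method signatures, and the log wording are hypothetical, and {{java.util.Properties}} and {{java.util.logging}} stand in for the Hadoop {{JobConf}} and the Nutch logger so that the sketch runs on its own. The default of true is taken from the issue description, which says internal links are ignored by default.

```java
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical stand-in for the LinkDb class; not the actual Nutch source.
final class LinkDbSketch {
    private static final Logger LOG = Logger.getLogger(LinkDbSketch.class.getName());

    // Named constant replacing the repeated string literal, as the comment describes.
    static final String IGNORE_INTERNAL_LINKS = "db.ignore.internal.links";

    // Reads the flag. A missing property counts as true, matching the
    // default behaviour reported in the issue.
    static boolean ignoresInternalLinks(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(IGNORE_INTERNAL_LINKS, "true"));
    }

    // Sketch of invert(): the configuration is consulted first, so the user is
    // told about the setting before any work starts. Returns true if the
    // notice was logged.
    static boolean invert(Properties conf) {
        boolean ignore = ignoresInternalLinks(conf);
        if (ignore) {
            // Assumed wording; the patch may phrase the message differently.
            LOG.info("LinkDb: internal links will be ignored.");
        }
        // ... the actual link inversion job would run here ...
        return ignore;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        invert(conf);                                      // logs the notice
        conf.setProperty(IGNORE_INTERNAL_LINKS, "false");
        invert(conf);                                      // logs nothing
    }
}
```

With a message like this in the log, an empty LinkDb after a single-domain crawl points directly at the db.ignore.internal.links setting.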