[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marek Bachmann updated NUTCH-1090:
----------------------------------

    Attachment: LinkDb.patch

Inserted a {{LOG.info}} call in the {{invert}} method that is executed when
db.ignore.internal.links is set to true.
Added a constant {{IGNORE_INTERNAL_LINKS}} for the
{{"db.ignore.internal.links"}} string.
Moved the creation of the {{JobConf}} object to the top of the {{invert}} method.
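
The three changes above can be sketched in a small standalone class. This is a hypothetical illustration, not the attached patch: java.util.Properties stands in for Nutch's JobConf, the default of true follows the issue's note that internal links are ignored by default, and the class name and exact log wording are assumptions.

```java
import java.util.Properties;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the described change (not the actual LinkDb patch).
 * Properties is used here as a stand-in for Nutch's JobConf.
 */
public class InvertLogSketch {
    private static final Logger LOG = Logger.getLogger(InvertLogSketch.class.getName());

    // Constant replacing the repeated "db.ignore.internal.links" string literal.
    public static final String IGNORE_INTERNAL_LINKS = "db.ignore.internal.links";

    /** Returns the message logged at the start of invert(), or null if nothing is logged. */
    public static String invert(Properties conf) {
        // Configuration is read first, mirroring "JobConf created at the top of invert".
        // Assumed default: true (internal links ignored unless configured otherwise).
        boolean ignoreInternal = Boolean.parseBoolean(
                conf.getProperty(IGNORE_INTERNAL_LINKS, "true"));
        String message = null;
        if (ignoreInternal) {
            // Hypothetical wording; tells the user why internal links will be missing.
            message = "LinkDb: internal links will be ignored.";
            LOG.info(message);
        }
        // ... the real link-inversion job would be configured and run here ...
        return message;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(invert(conf));   // default: the informational message
        conf.setProperty(IGNORE_INTERNAL_LINKS, "false");
        System.out.println(invert(conf));   // internal links kept: prints null
    }
}
```

Returning the message makes the behavior easy to check without capturing log output; the real method would only log.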

> LinkDb (invertlinks) should inform the user when it ignores internal links
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1090
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1090
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Marek Bachmann
>            Priority: Trivial
>              Labels: configuration, information, log
>             Fix For: 1.3
>
>         Attachments: LinkDb.patch
>
>
> I used Nutch to crawl sites on a single domain. After the crawl was complete,
> I tried to build a LinkDb, but the LinkDb was empty.
> It turned out that this happens because the invertlinks command ignores
> internal links to the same domain by default.
> Unfortunately, the LinkDb class gives no indication of this, so it was
> hard to find out why the LinkDb was empty.
> I suggest informing the user when the invertlinks command is
> ignoring internal links.
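
Why a single-domain crawl yields an empty LinkDb can be seen in a minimal sketch of host-based filtering. This assumes internal links are detected by comparing the host of the source and target URLs; the class and method names are hypothetical and this is not Nutch's actual LinkDb code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: drops outlinks whose host equals the source page's host. */
public class InternalLinkFilterSketch {

    // Lower-cased host of a URL, or null if it cannot be parsed or has no host.
    static String host(String url) {
        try {
            String h = new URI(url).getHost();
            return h == null ? null : h.toLowerCase();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Outlinks of fromUrl that survive when internal links are (optionally) ignored. */
    static List<String> keptOutlinks(String fromUrl, List<String> outlinks, boolean ignoreInternal) {
        List<String> kept = new ArrayList<>();
        String fromHost = host(fromUrl);
        for (String toUrl : outlinks) {
            if (ignoreInternal) {
                String toHost = host(toUrl);
                if (toHost == null || toHost.equals(fromHost)) {
                    continue; // same host (or unparseable): treated as internal, skipped
                }
            }
            kept.add(toUrl);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of("http://example.com/a", "http://example.com/b");
        System.out.println(keptOutlinks("http://example.com/", links, true));  // []
        System.out.println(keptOutlinks("http://example.com/", links, false)); // [http://example.com/a, http://example.com/b]
    }
}
```

When every page is on the same host, every link is filtered out, so nothing reaches the LinkDb unless the filter is disabled.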

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira