[
https://issues.apache.org/jira/browse/NUTCH-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743856#comment-13743856
]
Osy commented on NUTCH-1623:
Sure Lewis. For Nutch 2.2.1, nutch-default.xml already has a description for
this functionality (!! NO IMPLEMENTED YET !!):
If true, no file content will be saved during fetch.
And it is probably what we want to set most of time, since file:// URLs
are meant to be local and we can always use them directly at parsing
and indexing stages. Otherwise file contents will be saved.
Exactly what I need.
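For reference, a minimal sketch of how the property would be set once it is implemented. This assumes the usual override in conf/nutch-site.xml; the value shown is only an example, and per the description above it has no effect in 2.2.1:

```xml
<!-- conf/nutch-site.xml (assumed location for site-specific overrides) -->
<configuration>
  <property>
    <name>file.content.ignored</name>
    <!-- true: do not store file:// content at fetch time (not implemented in 2.2.1) -->
    <value>true</value>
  </property>
</configuration>
```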
Thanks
> Implement file.content.ignored function
> ---
>
> Key: NUTCH-1623
> URL: https://issues.apache.org/jira/browse/NUTCH-1623
> Project: Nutch
> Issue Type: New Feature
> Components: crawldb, fetcher
> Affects Versions: 2.2, 2.2.1
> Reporter: Osy
>