[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2032.
------------------------------------
    Resolution: Duplicate

Thanks, [~betolink]! Closing this old issue. The functionality to index the raw content has been available since 1.11 (see NUTCH-1785).

> Plugin to index the raw content of a readable document.
> --------------------------------------------------------
>
>                 Key: NUTCH-2032
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2032
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, parser
>    Affects Versions: 1.10
>            Reporter: Luis Lopez
>            Assignee: Lewis John McGibbney
>            Priority: Major
>              Labels: content, index, index-rawcontent, parser, raw
>
> This is related to https://issues.apache.org/jira/browse/NUTCH-1785 and
> https://issues.apache.org/jira/browse/NUTCH-1458
> We created a couple of plugins to index the raw content of readable documents.
> If we include these plugins in the plugin chain, we'll index the raw content
> of a readable document, i.e. XML, HTML, CSV, TXT, etc. The index-rawcontent
> plugin is not designed to index binary files; however, having the full content
> of an HTML/XML or CSV document is really critical for some of us.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)