[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Luis Betancourt Gonzalez resolved NUTCH-1985.
---------------------------------------------------
    Resolution: Fixed

> Adding a main() method to the MimeTypeIndexingFilter
> ----------------------------------------------------
>
>                 Key: NUTCH-1985
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1985
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, metadata, plugin
>    Affects Versions: 1.10
>            Reporter: Jorge Luis Betancourt Gonzalez
>            Priority: Minor
>              Labels: features, patch, test
>             Fix For: 1.10
>
>         Attachments: NUTCH-1985.patch
>
>
> This makes it easy to test different rules files and to check the
> expressions used to filter content based on the detected MIME type. Until
> now, the only way to check this was to run test crawls and inspect the
> data stored in Solr/Elasticsearch.
> The class can now be invoked through the {{bin/nutch plugin}} command,
> for example:
> {{bin/nutch plugin mimetype-filter
> org.apache.nutch.indexer.filter.MimeTypeIndexingFilter -h}}
> Two options are accepted: {{-h, --help}} shows the help, and {{-rules}}
> specifies the rules file to use. This makes it easy to experiment with
> different rules files until you get the desired behavior.
> After invoking the class, enter one valid MIME type per line. The output
> is the same MIME type prefixed with a {{+}} or {{-}} sign, indicating
> whether that MIME type is allowed or denied, respectively.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
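
For readers who want a feel for the interactive check described above, here is a minimal, self-contained sketch in Java. It is not the code from NUTCH-1985.patch. The class name {{MimeRuleChecker}}, the rule syntax (a {{+}} or {{-}} sign followed by a regular expression, first match wins) and the deny-by-default policy are all assumptions made for illustration, and may differ from what the mimetype-filter plugin actually accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical stand-in for the interactive check; NOT the Nutch implementation.
public class MimeRuleChecker {

    /** One rule: allow ('+') or deny ('-'), plus a regex matched against the whole MIME type. */
    static final class Rule {
        final boolean allow;
        final Pattern pattern;

        Rule(boolean allow, Pattern pattern) {
            this.allow = allow;
            this.pattern = pattern;
        }
    }

    /** Parses lines such as "+image/.*" (assumed format); blank lines and '#' comments are skipped. */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** Returns the MIME type prefixed with '+' (allowed) or '-' (denied); the first matching rule wins. */
    static String check(List<Rule> rules, boolean defaultAllow, String mimeType) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(mimeType).matches()) {
                return (rule.allow ? "+" : "-") + mimeType;
            }
        }
        // No rule matched: fall back to the default policy.
        return (defaultAllow ? "+" : "-") + mimeType;
    }

    public static void main(String[] args) {
        // Hard-coded illustrative rules; a real tool would read them from a rules file.
        List<Rule> rules = parseRules(Arrays.asList("# sample rules", "+image/.*", "-text/plain"));
        try (Scanner in = new Scanner(System.in)) {
            while (in.hasNextLine()) {
                String mimeType = in.nextLine().trim();
                if (!mimeType.isEmpty()) {
                    System.out.println(check(rules, false, mimeType));
                }
            }
        }
    }
}
```

With the sample rules above, a type matching {{image/.*}} is echoed with a leading {{+}}, while {{text/plain}} and any type that matches no rule are echoed with a leading {{-}}.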