[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509167#comment-14509167 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-1985:
-------------------------------------------------------

Should we commit this for the 1.10 release, or wait for 1.11?

> Adding a main() method to the MimeTypeIndexingFilter
> ----------------------------------------------------
>
>                 Key: NUTCH-1985
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1985
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, metadata, plugin
>    Affects Versions: 1.10
>            Reporter: Jorge Luis Betancourt Gonzalez
>            Priority: Minor
>              Labels: features, patch, test
>             Fix For: 1.10
>
>         Attachments: NUTCH-1985.patch
>
>
> This makes it very easy to test different rules files and check the
> expressions used to filter content based on the detected MIME type. Until
> now, the only way to check this was to run test crawls and inspect the
> data stored in Solr/Elasticsearch.
> This allows invoking the class with the {{bin/nutch plugin}} command,
> for example:
> {{bin/nutch plugin mimetype-filter 
> org.apache.nutch.indexer.filter.MimeTypeIndexingFilter -h}}
> Two options are accepted: {{-h, --help}} to show the help, and {{-rules}}
> to specify the rules file to use. This makes it easy to experiment with
> different rules files until you get the desired behavior.
> After the class is invoked, a valid MIME type must be entered on each line,
> and the output will be the same MIME type prefixed with a {{+}} or {{-}}
> sign, indicating whether the given MIME type is allowed or denied,
> respectively.
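To make the {{+}}/{{-}} output described above more concrete, here is a minimal stand-alone Java sketch of such a line-by-line checker. It is not the code from NUTCH-1985.patch: the class name {{MimeRuleChecker}}, the rule syntax (a {{+}} or {{-}} followed by a regular expression, first matching rule wins, unmatched types denied) and the hard-coded example rules are all assumptions made for illustration; the real mimetype-filter rules file format may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (NOT Nutch's MimeTypeIndexingFilter): reads MIME types
 * from stdin, one per line, and prints each one prefixed with '+' (allowed)
 * or '-' (denied).
 *
 * Assumed rule format: each rule starts with '+' or '-' followed by a regex;
 * the first rule whose regex fully matches the MIME type decides, and a MIME
 * type matching no rule is denied.
 */
public class MimeRuleChecker {

    // One parsed rule: allow/deny flag plus the compiled regex.
    private static final class Rule {
        final boolean allow;
        final Pattern pattern;

        Rule(boolean allow, Pattern pattern) {
            this.allow = allow;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rules such as "+text/.*" or "-image/.*"; blank lines and '#' comments are skipped. */
    public MimeRuleChecker(String... ruleLines) {
        for (String raw : ruleLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1).trim())));
        }
    }

    /** Returns the trimmed MIME type prefixed with '+' if allowed or '-' if denied. */
    public String check(String mimeType) {
        String mime = mimeType.trim();
        for (Rule rule : rules) {
            if (rule.pattern.matcher(mime).matches()) {
                return (rule.allow ? "+" : "-") + mime;
            }
        }
        return "-" + mime; // no rule matched: deny by default (an assumption)
    }

    public static void main(String[] args) throws IOException {
        // Example rules only; a real tool would load them from the file passed with -rules.
        MimeRuleChecker checker = new MimeRuleChecker("+text/.*", "-image/.*", "+application/pdf");
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                System.out.println(checker.check(line));
            }
        }
    }
}
```

With the example rules, piping {{printf 'text/html\nimage/png\nvideo/mp4\n'}} into the sketch would print {{+text/html}}, {{-image/png}} and {{-video/mp4}}. Keeping the check in a separate {{check()}} method means the same logic can be exercised from unit tests without going through stdin.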



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
